Project Overview
DrowCheck delivers real-time drowsiness and yawning detection using standard webcams. It combines facial-ratio metrics, an LSTM sequence classifier, and a live capture pipeline to monitor user alertness and trigger warnings when fatigue indicators exceed configurable thresholds.
When to Use DrowCheck
- Driver monitoring systems in vehicles
- Workplace fatigue surveillance (e.g., control rooms)
- Healthcare and research applications tracking sleepiness
- Any scenario requiring continuous, non-invasive alertness monitoring
Core Components and Workflow
Frame Acquisition
- Capture frames from the webcam via OpenCV.
Landmark Detection (face_detection.py)
- Detect face, eye, and mouth landmarks with dlib's 68-point predictor.
Metric Computation (attention_metrics.py)
- eye_aspect_ratio (EAR): blink and closure detection
- mouth_aspect_ratio (MAR): yawning detection
- PERCLOS: percent eye closure over a sliding window
Sequence Buffer
- Maintain a time-ordered buffer of (EAR, MAR) pairs.
LSTM Inference (models.py)
- LSTMClassifier(input_size=2, hidden_size=64, num_classes=2)
- Classifies each sequence as "alert" or "drowsy."
Alert Generation
- Overlay visual warnings on frames
- Emit audio alarms or callbacks when thresholds are breached
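These stages compose into a loop like the sketch below. It is illustrative only: detect_landmarks stands in for the dlib step inside face_detection.py, and model is a loaded sequence classifier as in the models.py snippet later on this page.

# Illustrative end-to-end loop: capture -> landmarks -> metrics -> buffer -> LSTM -> alert
import collections
import cv2
import torch
from attention_metrics import eye_aspect_ratio, mouth_aspect_ratio

SEQ_LEN = 30
buffer = collections.deque(maxlen=SEQ_LEN)  # rolling (EAR, MAR) window

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    landmarks = detect_landmarks(frame)  # hypothetical stand-in for the dlib step
    if landmarks is not None:
        left_eye, right_eye, mouth = landmarks
        ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
        buffer.append((ear, mouth_aspect_ratio(mouth)))
        if len(buffer) == SEQ_LEN:
            seq = torch.tensor(list(buffer), dtype=torch.float32).unsqueeze(0)  # (1, seq_len, 2)
            with torch.no_grad():
                if model(seq).argmax(dim=1).item() == 1:  # 1 = drowsy
                    cv2.putText(frame, "DROWSY", (10, 30),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
    cv2.imshow("DrowCheck", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC
        break
cap.release()
cv2.destroyAllWindows()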
Module Breakdown
attention_metrics.py
Computes instantaneous and time-windowed facial ratios.
from attention_metrics import eye_aspect_ratio, mouth_aspect_ratio, PERCLOS
# Compute EAR and MAR from 2D landmark arrays
ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
mar = mouth_aspect_ratio(mouth_landmarks)
# Track eye-closure over last 60 seconds
perclos = PERCLOS(window_seconds=60.0)
perclos.update(ear < 0.25)
current_perclos = perclos.value() # percentage closed
models.py
Defines the LSTM sequence classifier.
import torch
from models import LSTMClassifier
# Load the pretrained model (map_location keeps this working on CPU-only machines)
model = LSTMClassifier(input_size=2, hidden_size=64, num_classes=2)
model.load_state_dict(torch.load("models/drowsiness_lstm.pth", map_location="cpu"))
model.eval()
# Prepare a sequence tensor of shape (batch=1, seq_len, features=2) for the batch-first LSTM
sequence = torch.tensor(metric_buffer, dtype=torch.float32).unsqueeze(0)
with torch.no_grad():
    logits = model(sequence)
_, pred_class = torch.max(logits, 1)
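To act on the result, you can map the predicted index to a label and read a confidence from a softmax; a small follow-on sketch reusing logits and pred_class from above:

# Convert logits to probabilities and a human-readable status
probs = torch.softmax(logits, dim=1)   # shape (1, 2)
confidence = probs[0, pred_class].item()
status = "drowsy" if pred_class.item() == 1 else "alert"
print(f"{status} ({confidence:.0%} confidence)")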
face_detection.py
Orchestrates capture → detection → metrics → classification → alert.
# Download dlib’s landmark predictor first:
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bunzip2 shape_predictor_68_face_landmarks.dat.bz2
# Run the real-time pipeline
python face_detection.py \
--shape-predictor shape_predictor_68_face_landmarks.dat \
--model models/drowsiness_lstm.pth \
--eye-threshold 0.25 \
--mar-threshold 0.7 \
--perclos-window 30
Adjust thresholds and window lengths to fine-tune sensitivity for your application.
Getting Started
Follow these steps to set up the environment, download required models, and run the real-time drowsiness detection demo.
1. Prerequisites
- Python 3.6 or later
- pip
- Webcam (USB or built-in)
2. Clone the Repository
git clone https://github.com/KaitoEight/DrowCheck.git
cd DrowCheck
3. Install Dependencies
Create and activate a virtual environment (optional but recommended):
python3 -m venv venv
source venv/bin/activate # Linux & macOS
venv\Scripts\activate # Windows
Install required packages:
pip install torch torchvision # PyTorch
pip install opencv-python # OpenCV
pip install dlib # Dlib for landmark detection
pip install numpy # Numeric operations
(Alternatively, if a requirements.txt is provided: pip install -r requirements.txt.)
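If you prefer to pin dependencies yourself, a plausible requirements.txt might look like this (the version floors are illustrative, not the project's tested set):

torch>=1.7
torchvision>=0.8
opencv-python>=4.2
dlib>=19.18
numpy>=1.19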
4. Download Pretrained Models
Dlib facial landmark predictor
Download shape_predictor_68_face_landmarks.dat from http://dlib.net/files/ and place it in a models/ folder:
mkdir models
mv shape_predictor_68_face_landmarks.dat models/
LSTM Drowsiness Model
Copy or download lstm_model.pth (or drowsy_lstm.pth) into models/:
mv lstm_model.pth models/
mv drowsy_lstm.pth models/   # optional alternate checkpoint
Your models/ directory should now contain:
models/
├── shape_predictor_68_face_landmarks.dat
├── lstm_model.pth
└── drowsy_lstm.pth
5. Run the Demo
Launch the real-time detection using your default camera:
python main.py \
--predictor models/shape_predictor_68_face_landmarks.dat \
--model models/lstm_model.pth
If main.py does not expose CLI flags, edit the paths inside the file:
# main.py (excerpt)
detector = DrowsinessDetector(
    predictor_path="models/shape_predictor_68_face_landmarks.dat",
    model_path="models/lstm_model.pth",
    device="cuda"  # or "cpu"
)
Then run:
python main.py
A window titled “Drowsiness Detection” appears. The script overlays facial landmarks, EAR/MAR metrics, and alerts. Press ESC to exit.
6. Verifying Your Setup
- On startup, the console prints:
▶ Phát hiện buồn ngủ... (nhấn ESC để thoát)  ("Detecting drowsiness... (press ESC to exit)")
- A video window shows landmark annotations and “DROWSY” or “YAWN” alerts when thresholds are exceeded.
- If no face is detected, the frame renders normally without errors.
7. Troubleshooting & Tips
- Camera access error: Ensure no other app is using the webcam, then try a different index (or use the probe sketch after this list):
  cap = cv2.VideoCapture(1)
- Performance:
  - Resize frames before process_frame for higher FPS: frame = cv2.resize(frame, (640, 360))
  - Run on GPU by installing the CUDA-enabled PyTorch build and passing device="cuda".
- Threshold tuning: Adjust EYE_AR_THRESH (default 0.25) and MOUTH_AR_THRESH (default 0.75) in face_detection.py to suit your camera and lighting.
- Model updates: To fine-tune or swap models, replace the .pth files and ensure the new checkpoint matches the network architecture defined in face_detection.py.
- Cleanup: The demo releases resources automatically on ESC, but in your own scripts you can explicitly call:
  cap.release()
  cv2.destroyAllWindows()
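If you are unsure which camera index to use, a quick probe loop like this (a sketch, not part of the repo) lists the indices OpenCV can open:

import cv2

# Probe the first few camera indices and report which ones open successfully
for index in range(4):
    cap = cv2.VideoCapture(index)
    if cap.isOpened():
        print(f"Camera found at index {index}")
    cap.release()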
Data Collection & Model Training
This section guides you through gathering custom EAR/MAR data, understanding the CSV schema, retraining the LSTM classifier, and swapping in your new model.
1. Gathering Data with collect_data.py
Use collect_data.py to capture real-time EAR/MAR values and auto-label frames based on thresholds.
Usage
python collect_data.py \
  --output custom_data.csv \
  --ear-thresh 0.25 \
  --mar-thresh 0.7 \
  --duration 300
- --output: path of the CSV file to write
- --ear-thresh: EAR below this marks "drowsy" (label 1)
- --mar-thresh: MAR above this marks "yawn" (label 1)
- --duration: recording time in seconds
The script displays the webcam feed, computes EAR/MAR for each frame, assigns a LABEL (0/1), and appends rows:
TIMESTAMP, EAR, MAR, LABEL.
2. CSV Format
Ensure your CSV matches the columns expected by DrowsyDataset:
| TIMESTAMP     | EAR  | MAR  | LABEL |
|---------------|------|------|-------|
| 1609459200.0  | 0.32 | 0.45 | 0     |
| 1609459200.03 | 0.21 | 0.50 | 1     |
| …             | …    | …    | …     |
- EAR, MAR: float features
- LABEL: integer 0 (alert) or 1 (drowsy/yawn)
No header reordering is needed; columns are mapped by name in dataset.py.
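As a quick sanity check before training, a short pandas snippet (illustrative, not part of the repo) can verify the required columns and report label balance:

import pandas as pd

df = pd.read_csv("custom_data.csv")
# DrowsyDataset only needs these three columns; extras (e.g. TIMESTAMP) are ignored
assert {"EAR", "MAR", "LABEL"}.issubset(df.columns), "missing required columns"
assert not df[["EAR", "MAR"]].isna().any().any(), "missing EAR/MAR values"
print(df["LABEL"].value_counts(normalize=True))  # check class balance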
3. Preparing the PyTorch Dataset
Load sequences of fixed length (seq_len) from your CSV:
from torch.utils.data import DataLoader
from dataset import DrowsyDataset
seq_len = 30
dataset = DrowsyDataset('custom_data.csv', seq_len=seq_len)
loader = DataLoader(dataset, batch_size=16, shuffle=True)
for seq_batch, label_batch in loader:
    # seq_batch: [16, 30, 2], label_batch: [16]
    ...
- Each seq_batch is a FloatTensor of (EAR, MAR) pairs.
- Labels correspond to the last frame in each window.
4. Retraining LSTMDrowsinessDetector
Use the standard training loop to fit your model on collected data.
import torch
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from models import LSTMDrowsinessDetector
from dataset import DrowsyDataset
# Hyperparameters
seq_len, batch_size, epochs, lr = 50, 32, 20, 1e-3
# DataLoader
dataset = DrowsyDataset('custom_data.csv', seq_len=seq_len)
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
# Model setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LSTMDrowsinessDetector(input_size=2, hidden_size=64, num_layers=2, num_classes=2)
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=lr)
# Training loop
for epoch in range(1, epochs + 1):
    model.train()
    total_loss = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits = model(x)                  # (batch, 2)
        loss = F.cross_entropy(logits, y)  # scalar
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    avg = total_loss / len(loader)
    print(f"Epoch {epoch}/{epochs} Loss: {avg:.4f}")
# Save trained state
torch.save(model.state_dict(), 'models/lstm_drowsiness_custom.pth')
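The loop above reports only training loss; a held-out accuracy check is worth adding. A minimal sketch, assuming you have built a separate val_loader (e.g. from a train/val split of the same CSV):

# Evaluate accuracy on a held-out loader (val_loader is assumed to exist)
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in val_loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.size(0)
print(f"Validation accuracy: {correct / total:.2%}")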
5. Replacing the Shipped Model
- Copy your checkpoint into the repo's models/ folder (e.g. overwrite lstm_best.pth).
- In your inference or app script, load and replace:
import torch
from models import LSTMDrowsinessDetector
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LSTMDrowsinessDetector(input_size=2, hidden_size=64, num_layers=2, num_classes=2)
model.load_state_dict(torch.load('models/lstm_drowsiness_custom.pth', map_location=device))
model.eval()
- Restart the application. The new weights will power live drowsiness detection.
Practical Tips
- Record at least a few minutes per class for balanced training.
- Visualize EAR/MAR distributions in your CSV to adjust thresholds.
- Experiment with hidden_size / num_layers to match your data complexity.
- Use class weights in F.cross_entropy for imbalanced labels (see the sketch below).
- For production, convert your model to TorchScript (torch.jit.script) to improve inference speed.
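The last two tips in code form, reusing the names from the training loop above (the weight values are illustrative and should be derived from your own label counts):

import torch
import torch.nn.functional as F

# Class-weighted loss: upweight the rarer class (weights here are illustrative)
class_weights = torch.tensor([1.0, 3.0], device=device)  # [alert, drowsy/yawn]
loss = F.cross_entropy(logits, y, weight=class_weights)

# TorchScript export for faster, dependency-light inference
scripted = torch.jit.script(model.eval())
scripted.save("models/lstm_drowsiness_scripted.pt")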
Code Reference & Extensibility
DrowCheck provides modular components for facial-metric computation, real-time detection, data collection, and sequence-based modeling. Use these references to integrate, extend, or contribute new functionality.
attention_metrics.py – EAR, MAR, and PERCLOS
Compute real-time facial ratios and track eye closure over time.
Functions
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """Compute EAR from a 6×2 array of eye landmarks."""
    # vertical distances
    A = np.linalg.norm(eye[1] - eye[5])
    B = np.linalg.norm(eye[2] - eye[4])
    # horizontal distance
    C = np.linalg.norm(eye[0] - eye[3])
    return (A + B) / (2.0 * C)

def mouth_aspect_ratio(mouth: np.ndarray) -> float:
    """Compute MAR from a 20×2 array of mouth landmarks."""
    # vertical distances across the inner lips
    A = np.linalg.norm(mouth[13] - mouth[19])
    B = np.linalg.norm(mouth[14] - mouth[18])
    C = np.linalg.norm(mouth[15] - mouth[17])
    # horizontal mouth width
    D = np.linalg.norm(mouth[12] - mouth[16])
    return (A + B + C) / (3.0 * D)
PERCLOS – Sliding-Window Percentage of Eye Closure
Provide a real-time measure of how often the eyes remain closed over a rolling time window.
import time
from typing import List, Tuple

class PERCLOS:
    def __init__(self, window_seconds: float = 60.0):
        """
        window_seconds: length of the rolling history in seconds
        """
        self.window = window_seconds
        self.history: List[Tuple[float, bool]] = []

    def update(self, eyes_closed: bool) -> None:
        """Record a new eyes_closed flag at the current timestamp."""
        now = time.time()
        self.history.append((now, eyes_closed))
        # discard entries older than the window
        cutoff = now - self.window
        self.history = [(t, flag) for t, flag in self.history if t >= cutoff]

    def value(self) -> float:
        """Return the percentage of entries where eyes_closed == True."""
        if not self.history:
            return 0.0
        closed = sum(flag for _, flag in self.history)
        return closed / len(self.history) * 100.0
Practical Usage:
from attention_metrics import eye_aspect_ratio, PERCLOS
EAR_THRESHOLD = 0.22
perclos = PERCLOS(window_seconds=60.0)
for frame in video_stream:
    left_eye, right_eye = detect_eyes(frame)  # 6×2 landmark arrays
    ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2
    perclos.update(ear < EAR_THRESHOLD)
    print(f"PERCLOS: {perclos.value():.1f}%")
Tuning Tips:
- Aim for a 60–120 s window, alerting at 70–80% (see the example after these tips).
- Ensure consistent frame rates to avoid sampling bias.
- Combine with MAR or other metrics for robust alerts.
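For example, an alert rule along those lines, as a small sketch (the 90 s window and 75% threshold sit inside the suggested ranges; the values are illustrative, not fixed constants):

perclos = PERCLOS(window_seconds=90.0)  # within the suggested 60–120 s range

def check_fatigue(ear: float, ear_threshold: float = 0.22) -> bool:
    """Update the rolling window and report whether PERCLOS crosses the alert band."""
    perclos.update(ear < ear_threshold)
    return perclos.value() >= 75.0      # within the suggested 70–80% band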
dataset.py – DrowsyDataset
Load sequential EAR‐MAR features and labels for PyTorch training.
from torch.utils.data import Dataset
import pandas as pd
import torch
class DrowsyDataset(Dataset):
    def __init__(self, csv_file: str, seq_len: int = 30):
        """
        csv_file: path to a CSV with columns [EAR, MAR, LABEL]
        seq_len: number of consecutive frames per sample
        """
        df = pd.read_csv(csv_file)
        self.features = df[['EAR', 'MAR']].values
        self.labels = df['LABEL'].values.astype(int)
        self.seq_len = seq_len

    def __len__(self):
        return len(self.labels) - self.seq_len + 1

    def __getitem__(self, idx):
        seq = self.features[idx : idx + self.seq_len]
        label = self.labels[idx + self.seq_len - 1]
        return torch.tensor(seq, dtype=torch.float32), torch.tensor(label, dtype=torch.long)
Usage with DataLoader:
from torch.utils.data import DataLoader
dataset = DrowsyDataset("data/drowsy.csv", seq_len=30)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4, pin_memory=True)
for seq_batch, label_batch in loader:
    seq_batch, label_batch = seq_batch.to(device), label_batch.to(device)
    # forward pass...
Tips:
- Ensure no missing EAR/MAR values.
- LABEL: 0 = alert, 1 = drowsy.
- Split train/val before wrapping in DataLoader (a minimal sketch follows).
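For that last tip, a minimal split using torch.utils.data.random_split (the 80/20 ratio is illustrative):

from torch.utils.data import DataLoader, random_split
from dataset import DrowsyDataset

dataset = DrowsyDataset("data/drowsy.csv", seq_len=30)
n_val = int(0.2 * len(dataset))  # 80/20 split (illustrative)
train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
# Note: windows overlap, so a strict temporal split (e.g. by recording session)
# avoids train/val leakage better than a random split.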
face_detection.py – DrowsinessDetector.process_frame
Process incoming frames to detect landmarks, compute ratios, and infer drowsiness.
def process_frame(self, frame: np.ndarray) -> tuple:
    """
    Returns a 10-tuple:
        np.ndarray      output frame (BGR)
        dlib.rectangle  face rect (None when no face is detected)
        np.ndarray      leftEye (6×2)
        np.ndarray      rightEye (6×2)
        np.ndarray      mouth (20×2)
        float           EAR
        float           MAR
        int             LSTM label (0/1)
        bool            instantaneous drowsy (EAR < thresh)
        bool            instantaneous yawning (MAR > thresh)
    """
    ...
Example integration:
from face_detection import DrowsinessDetector
import cv2
detector = DrowsinessDetector(
    predictor_path="shape_predictor_68_face_landmarks.dat",
    model_path="lstm_model.pth",
    device="cpu"
)
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    (out, rect, le, re, mo, ear, mar, label, is_drowsy, is_yawning) = detector.process_frame(frame)
    if rect:
        # draw eye and mouth landmark outlines
        for pts, color in [(le, (0, 255, 0)), (re, (0, 255, 0)), (mo, (0, 0, 255))]:
            cv2.polylines(out, [pts], True, color, 1)
        status = f"Drowsy(LSTM)={label}, EAR={ear:.2f}, MAR={mar:.2f}"
        cv2.putText(out, status, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    cv2.imshow("Drowsiness Monitor", out)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Extension Points:
- Swap in a custom landmark detector (sketched below).
- Expose buffer length or thresholds for runtime tuning.
- Add additional flags (e.g., head pose).
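As an example of the first extension point, a subclass could override the landmark step. This is a sketch that assumes DrowsinessDetector exposes an overridable detect_landmarks hook; the actual method name in face_detection.py may differ:

import numpy as np
from face_detection import DrowsinessDetector

class CustomLandmarkDetector(DrowsinessDetector):
    """Hypothetical variant swapping dlib for another landmark backend."""

    def detect_landmarks(self, frame: np.ndarray):
        # Return (left_eye, right_eye, mouth) arrays in the same 6×2/6×2/20×2
        # layout the metrics expect, or None when no face is found.
        raise NotImplementedError("plug in your landmark model here")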
models.py – LSTMDrowsinessDetector
Define and train the sequence classifier for alertness.
import torch
import torch.nn as nn
class LSTMDrowsinessDetector(nn.Module):
    def __init__(self, input_size=2, hidden_size=64, num_layers=2, num_classes=2):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))  # (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])    # classify from the final time step
Quick Start:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMDrowsinessDetector().to(device)
x = torch.randn(16, 30, 2, device=device)
logits = model(x) # (16, 2)
preds = logits.argmax(dim=1) # (16,)
Customization:
- Enable a bidirectional LSTM and adjust the fc input size (combined with dropout in the sketch below).
- Add dropout between LSTM layers.
- Replace the final head with attention or a deeper classifier.
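For instance, the first two ideas combined, as a sketch (not a shipped variant of models.py):

import torch.nn as nn

class BiLSTMDrowsinessDetector(nn.Module):
    """Illustrative variant: bidirectional LSTM with inter-layer dropout."""

    def __init__(self, input_size=2, hidden_size=64, num_layers=2, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True, dropout=0.3)
        # a bidirectional LSTM doubles the feature size seen by the classifier head
        self.fc = nn.Linear(hidden_size * 2, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)          # (batch, seq_len, hidden_size * 2)
        return self.fc(out[:, -1, :])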
collect_data.py – Data Collection Module
Automate EAR/MAR capture and labeling to CSV for model training.
pip install opencv-python
python collect_data.py
# Press ESC to stop; outputs data.csv
Key constants and flow:
import csv
import os

import cv2

CSV_FILE = "data.csv"
# Header: ["EAR", "MAR", "LABEL"]
if not os.path.exists(CSV_FILE):
    with open(CSV_FILE, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["EAR", "MAR", "LABEL"])

# detector (a DrowsinessDetector) and the render() overlay helper are assumed
# to be constructed earlier in collect_data.py
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame, face, le, re, mo, ear, mar, _, drowsy, yawn = detector.process_frame(frame)
    frame = render(frame, face, le, re, mo, ear, mar, None, drowsy, yawn)
    label = 1 if drowsy or yawn else 0  # auto-label from instantaneous thresholds
    with open(CSV_FILE, "a", newline="") as f:
        csv.writer(f).writerow([ear, mar, label])
    cv2.imshow("Collecting Data", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC
        break
cap.release()
cv2.destroyAllWindows()
Best Practices:
- Calibrate in uniform lighting.
- Balance label distribution per subject.
- Expose the CSV path and thresholds via CLI args for batch collection (see the argparse sketch below).
- Extend the schema to include additional sensor data (e.g., head pose, steering-wheel activity).
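A minimal argparse wrapper for that CLI practice might look like this (flag names mirror the usage section above; wiring the values into the collection loop is left to the reader):

import argparse

parser = argparse.ArgumentParser(description="Collect EAR/MAR training data")
parser.add_argument("--output", default="data.csv", help="CSV output path")
parser.add_argument("--ear-thresh", type=float, default=0.25, help="drowsy EAR threshold")
parser.add_argument("--mar-thresh", type=float, default=0.7, help="yawn MAR threshold")
parser.add_argument("--duration", type=int, default=300, help="recording time in seconds")
args = parser.parse_args()

CSV_FILE = args.output  # feed into the collection loop above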
By leveraging and extending these modules, you can adapt DrowCheck to new sensors, custom detection logic, or advanced sequence models. Contributions and plugin modules should follow the existing API patterns and include unit tests for new metrics or detectors.