Project Overview
DrowCheck delivers real-time drowsiness and yawning detection using standard webcams. It combines facial-ratio metrics, an LSTM sequence classifier, and a live capture pipeline to monitor user alertness and trigger warnings when fatigue indicators exceed configurable thresholds.
When to Use DrowCheck
- Driver monitoring systems in vehicles
- Workplace fatigue surveillance (e.g., control rooms)
- Healthcare and research applications tracking sleepiness
- Any scenario requiring continuous, non-invasive alertness monitoring
Core Components and Workflow
Frame Acquisition
- Capture frames from the webcam via OpenCV.
Landmark Detection (face_detection.py)
- Detect face, eye, and mouth landmarks with dlib's 68-point predictor.
Metric Computation (attention_metrics.py)
- eye_aspect_ratio (EAR): blink and closure detection
- mouth_aspect_ratio (MAR): yawning detection
- PERCLOS: percent eye closure over a sliding window
Sequence Buffer
- Maintain a time-ordered buffer of (EAR, MAR) pairs.
LSTM Inference (models.py)
- LSTMClassifier(input_size=2, hidden_size=64, num_classes=2)
- Classifies each sequence as "alert" or "drowsy."
Alert Generation
- Overlay visual warnings on frames
- Emit audio alarms or callbacks when thresholds are breached
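These stages compose into a loop like the sketch below. It is illustrative only: detect_landmarks stands in for the dlib step inside face_detection.py, and model is a loaded sequence classifier as in the models.py snippet later on this page.

# Illustrative end-to-end loop: capture -> landmarks -> metrics -> buffer -> LSTM -> alert
import collections
import cv2
import torch
from attention_metrics import eye_aspect_ratio, mouth_aspect_ratio

SEQ_LEN = 30
buffer = collections.deque(maxlen=SEQ_LEN)  # rolling (EAR, MAR) window

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    landmarks = detect_landmarks(frame)  # hypothetical stand-in for the dlib step
    if landmarks is not None:
        left_eye, right_eye, mouth = landmarks
        ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
        buffer.append((ear, mouth_aspect_ratio(mouth)))
        if len(buffer) == SEQ_LEN:
            seq = torch.tensor(list(buffer), dtype=torch.float32).unsqueeze(0)  # (1, seq_len, 2)
            with torch.no_grad():
                if model(seq).argmax(dim=1).item() == 1:  # 1 = drowsy
                    cv2.putText(frame, "DROWSY", (10, 30),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
    cv2.imshow("DrowCheck", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC
        break
cap.release()
cv2.destroyAllWindows()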
Module Breakdown
attention_metrics.py
Computes instantaneous and time-windowed facial ratios.
from attention_metrics import eye_aspect_ratio, mouth_aspect_ratio, PERCLOS
# Compute EAR and MAR from 2D landmark arrays
ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
mar = mouth_aspect_ratio(mouth_landmarks)
# Track eye-closure over last 60 seconds
perclos = PERCLOS(window_seconds=60.0)
perclos.update(ear < 0.25)
current_perclos = perclos.value() # percentage closed
models.py
Defines the LSTM sequence classifier.
import torch
from models import LSTMClassifier
# Load the pretrained model (map_location keeps this working on CPU-only machines)
model = LSTMClassifier(input_size=2, hidden_size=64, num_classes=2)
model.load_state_dict(torch.load("models/drowsiness_lstm.pth", map_location="cpu"))
model.eval()
# Prepare a sequence tensor of shape (batch=1, seq_len, features=2) for the batch-first LSTM
sequence = torch.tensor(metric_buffer, dtype=torch.float32).unsqueeze(0)
with torch.no_grad():
    logits = model(sequence)
_, pred_class = torch.max(logits, 1)
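To act on the result, you can map the predicted index to a label and read a confidence from a softmax; a small follow-on sketch reusing logits and pred_class from above:

# Convert logits to probabilities and a human-readable status
probs = torch.softmax(logits, dim=1)   # shape (1, 2)
confidence = probs[0, pred_class].item()
status = "drowsy" if pred_class.item() == 1 else "alert"
print(f"{status} ({confidence:.0%} confidence)")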
face_detection.py
Orchestrates capture → detection → metrics → classification → alert.
# Download dlib’s landmark predictor first:
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bunzip2 shape_predictor_68_face_landmarks.dat.bz2
# Run the real-time pipeline
python face_detection.py \
--shape-predictor shape_predictor_68_face_landmarks.dat \
--model models/drowsiness_lstm.pth \
--eye-threshold 0.25 \
--mar-threshold 0.7 \
--perclos-window 30
Adjust thresholds and window lengths to fine-tune sensitivity for your application.
Getting Started
Follow these steps to set up the environment, download required models, and run the real-time drowsiness detection demo.
1. Prerequisites
- Python 3.6 or later
- pip
- Webcam (USB or built-in)
2. Clone the Repository
git clone https://github.com/KaitoEight/DrowCheck.git
cd DrowCheck
3. Install Dependencies
Create and activate a virtual environment (optional but recommended):
python3 -m venv venv
source venv/bin/activate # Linux & macOS
venv\Scripts\activate # Windows
Install required packages:
pip install torch torchvision # PyTorch
pip install opencv-python # OpenCV
pip install dlib # Dlib for landmark detection
pip install numpy # Numeric operations
(Alternatively, if a requirements.txt is provided: pip install -r requirements.txt.)
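If you prefer to pin dependencies yourself, a plausible requirements.txt might look like this (the version floors are illustrative, not the project's tested set):

torch>=1.7
torchvision>=0.8
opencv-python>=4.2
dlib>=19.18
numpy>=1.19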
4. Download Pretrained Models
Dlib facial landmark predictor
Download shape_predictor_68_face_landmarks.dat from http://dlib.net/files/ and place it in a models/ folder:
mkdir models
mv shape_predictor_68_face_landmarks.dat models/
LSTM Drowsiness Model
Copy or download lstm_model.pth (or drowsy_lstm.pth) into models/:
mv lstm_model.pth models/
mv drowsy_lstm.pth models/   # optional alternate checkpoint
Your models/ directory should now contain:
models/
├── shape_predictor_68_face_landmarks.dat
├── lstm_model.pth
└── drowsy_lstm.pth
5. Run the Demo
Launch the real-time detection using your default camera:
python main.py \
--predictor models/shape_predictor_68_face_landmarks.dat \
--model models/lstm_model.pth
If main.py does not expose CLI flags, edit the paths inside the file:
# main.py (excerpt)
detector = DrowsinessDetector(
    predictor_path="models/shape_predictor_68_face_landmarks.dat",
    model_path="models/lstm_model.pth",
    device="cuda"  # or "cpu"
)
Then run:
python main.py
A window titled “Drowsiness Detection” appears. The script overlays facial landmarks, EAR/MAR metrics, and alerts. Press ESC to exit.
6. Verifying Your Setup
- On startup, the console prints:
▶ Phát hiện buồn ngủ... (nhấn ESC để thoát)  ("Detecting drowsiness... (press ESC to exit)")
- A video window shows landmark annotations and “DROWSY” or “YAWN” alerts when thresholds are exceeded.
- If no face is detected, the frame renders normally without errors.
7. Troubleshooting & Tips
- Camera access error: Ensure no other app is using the webcam, then try a different index (or use the probe sketch after this list):
  cap = cv2.VideoCapture(1)
- Performance:
  - Resize frames before process_frame for higher FPS: frame = cv2.resize(frame, (640, 360))
  - Run on GPU by installing the CUDA-enabled PyTorch build and passing device="cuda".
- Threshold tuning: Adjust EYE_AR_THRESH (default 0.25) and MOUTH_AR_THRESH (default 0.75) in face_detection.py to suit your camera and lighting.
- Model updates: To fine-tune or swap models, replace the .pth files and ensure the new checkpoint matches the network architecture defined in face_detection.py.
- Cleanup: The demo releases resources automatically on ESC, but in your own scripts you can explicitly call:
  cap.release()
  cv2.destroyAllWindows()
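If you are unsure which camera index to use, a quick probe loop like this (a sketch, not part of the repo) lists the indices OpenCV can open:

import cv2

# Probe the first few camera indices and report which ones open successfully
for index in range(4):
    cap = cv2.VideoCapture(index)
    if cap.isOpened():
        print(f"Camera found at index {index}")
    cap.release()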
Data Collection & Model Training
This section guides you through gathering custom EAR/MAR data, understanding the CSV schema, retraining the LSTM classifier, and swapping in your new model.
1. Gathering Data with collect_data.py
Use collect_data.py to capture real-time EAR/MAR values and auto-label frames based on thresholds.
Usage
python collect_data.py \
  --output custom_data.csv \
  --ear-thresh 0.25 \
  --mar-thresh 0.7 \
  --duration 300
- --output: path of the CSV file to write
- --ear-thresh: EAR below this marks "drowsy" (label 1)
- --mar-thresh: MAR above this marks "yawn" (label 1)
- --duration: recording time in seconds
The script displays the webcam feed, computes EAR/MAR for each frame, assigns a LABEL (0/1), and appends rows:
TIMESTAMP, EAR, MAR, LABEL.
2. CSV Format
Ensure your CSV matches the columns expected by DrowsyDataset:
| TIMESTAMP     | EAR  | MAR  | LABEL |
|---------------|------|------|-------|
| 1609459200.0  | 0.32 | 0.45 | 0     |
| 1609459200.03 | 0.21 | 0.50 | 1     |
| …             | …    | …    | …     |
- EAR, MAR: float features
- LABEL: integer 0 (alert) or 1 (drowsy/yawn)
No header reordering is needed; columns are mapped by name in dataset.py.
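As a quick sanity check before training, a short pandas snippet (illustrative, not part of the repo) can verify the required columns and report label balance:

import pandas as pd

df = pd.read_csv("custom_data.csv")
# DrowsyDataset only needs these three columns; extras (e.g. TIMESTAMP) are ignored
assert {"EAR", "MAR", "LABEL"}.issubset(df.columns), "missing required columns"
assert not df[["EAR", "MAR"]].isna().any().any(), "missing EAR/MAR values"
print(df["LABEL"].value_counts(normalize=True))  # check class balance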
3. Preparing the PyTorch Dataset
Load sequences of fixed length (seq_len) from your CSV:
from torch.utils.data import DataLoader
from dataset import DrowsyDataset
seq_len = 30
dataset = DrowsyDataset('custom_data.csv', seq_len=seq_len)
loader = DataLoader(dataset, batch_size=16, shuffle=True)
for seq_batch, label_batch in loader:
    # seq_batch: [16, 30, 2], label_batch: [16]
    ...
- Each seq_batch is a FloatTensor of (EAR, MAR) pairs.
- Labels correspond to the last frame in each window.
4. Retraining LSTMDrowsinessDetector
Use the standard training loop to fit your model on collected data.
import torch
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from models import LSTMDrowsinessDetector
from dataset import DrowsyDataset
# Hyperparameters
seq_len, batch_size, epochs, lr = 50, 32, 20, 1e-3
# DataLoader
dataset = DrowsyDataset('custom_data.csv', seq_len=seq_len)
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
# Model setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LSTMDrowsinessDetector(input_size=2, hidden_size=64, num_layers=2, num_classes=2)
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=lr)
# Training loop
for epoch in range(1, epochs + 1):
    model.train()
    total_loss = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits = model(x)                  # (batch, 2)
        loss = F.cross_entropy(logits, y)  # scalar
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    avg = total_loss / len(loader)
    print(f"Epoch {epoch}/{epochs} Loss: {avg:.4f}")
# Save trained state
torch.save(model.state_dict(), 'models/lstm_drowsiness_custom.pth')
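The loop above reports only training loss; a held-out accuracy check is worth adding. A minimal sketch, assuming you have built a separate val_loader (e.g. from a train/val split of the same CSV):

# Evaluate accuracy on a held-out loader (val_loader is assumed to exist)
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in val_loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.size(0)
print(f"Validation accuracy: {correct / total:.2%}")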
5. Replacing the Shipped Model
- Copy your checkpoint into the repo's models/ folder (e.g. overwrite lstm_best.pth).
- In your inference or app script, load and replace:
import torch
from models import LSTMDrowsinessDetector
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LSTMDrowsinessDetector(input_size=2, hidden_size=64, num_layers=2, num_classes=2)
model.load_state_dict(torch.load('models/lstm_drowsiness_custom.pth', map_location=device))
model.eval()
- Restart the application. The new weights will power live drowsiness detection.
Practical Tips
- Record at least a few minutes per class for balanced training.
- Visualize EAR/MAR distributions in your CSV to adjust thresholds.
- Experiment with hidden_size / num_layers to match your data complexity.
- Use class weights in F.cross_entropy for imbalanced labels (see the sketch below).
- For production, convert your model to TorchScript (torch.jit.script) to improve inference speed.
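The last two tips in code form, reusing the names from the training loop above (the weight values are illustrative and should be derived from your own label counts):

import torch
import torch.nn.functional as F

# Class-weighted loss: upweight the rarer class (weights here are illustrative)
class_weights = torch.tensor([1.0, 3.0], device=device)  # [alert, drowsy/yawn]
loss = F.cross_entropy(logits, y, weight=class_weights)

# TorchScript export for faster, dependency-light inference
scripted = torch.jit.script(model.eval())
scripted.save("models/lstm_drowsiness_scripted.pt")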
Code Reference & Extensibility
DrowCheck provides modular components for facial-metric computation, real-time detection, data collection, and sequence-based modeling. Use these references to integrate, extend, or contribute new functionality.
attention_metrics.py – EAR, MAR, and PERCLOS
Compute real-time facial ratios and track eye closure over time.
Functions
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """Compute EAR from a 6×2 array of eye landmarks."""
    # vertical distances
    A = np.linalg.norm(eye[1] - eye[5])
    B = np.linalg.norm(eye[2] - eye[4])
    # horizontal distance
    C = np.linalg.norm(eye[0] - eye[3])
    return (A + B) / (2.0 * C)

def mouth_aspect_ratio(mouth: np.ndarray) -> float:
    """Compute MAR from a 20×2 array of mouth landmarks."""
    # vertical distances across the inner lips
    A = np.linalg.norm(mouth[13] - mouth[19])
    B = np.linalg.norm(mouth[14] - mouth[18])
    C = np.linalg.norm(mouth[15] - mouth[17])
    # horizontal mouth width
    D = np.linalg.norm(mouth[12] - mouth[16])
    return (A + B + C) / (3.0 * D)
PERCLOS – Sliding-Window Percentage of Eye Closure
Provide a real-time measure of how often the eyes remain closed over a rolling time window.
import time
from typing import List, Tuple

class PERCLOS:
    def __init__(self, window_seconds: float = 60.0):
        """
        window_seconds: length of the rolling history in seconds
        """
        self.window = window_seconds
        self.history: List[Tuple[float, bool]] = []

    def update(self, eyes_closed: bool) -> None:
        """Record a new eyes_closed flag at the current timestamp."""
        now = time.time()
        self.history.append((now, eyes_closed))
        # discard entries older than the window
        cutoff = now - self.window
        self.history = [(t, flag) for t, flag in self.history if t >= cutoff]

    def value(self) -> float:
        """Return the percentage of entries where eyes_closed == True."""
        if not self.history:
            return 0.0
        closed = sum(flag for _, flag in self.history)
        return closed / len(self.history) * 100.0
Practical Usage:
from attention_metrics import eye_aspect_ratio, PERCLOS
EAR_THRESHOLD = 0.22
perclos = PERCLOS(window_seconds=60.0)
for frame in video_stream:
    left_eye, right_eye = detect_eyes(frame)  # 6×2 landmark arrays
    ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2
    perclos.update(ear < EAR_THRESHOLD)
    print(f"PERCLOS: {perclos.value():.1f}%")
Tuning Tips:
- Aim for a 60–120 s window, alerting at 70–80% (see the example after these tips).
- Ensure consistent frame rates to avoid sampling bias.
- Combine with MAR or other metrics for robust alerts.
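For example, an alert rule along those lines, as a small sketch (the 90 s window and 75% threshold sit inside the suggested ranges; the values are illustrative, not fixed constants):

perclos = PERCLOS(window_seconds=90.0)  # within the suggested 60–120 s range

def check_fatigue(ear: float, ear_threshold: float = 0.22) -> bool:
    """Update the rolling window and report whether PERCLOS crosses the alert band."""
    perclos.update(ear < ear_threshold)
    return perclos.value() >= 75.0      # within the suggested 70–80% band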
dataset.py – DrowsyDataset
Load sequential EAR‐MAR features and labels for PyTorch training.
from torch.utils.data import Dataset
import pandas as pd
import torch
class DrowsyDataset(Dataset):
    def __init__(self, csv_file: str, seq_len: int = 30):
        """
        csv_file: path to a CSV with columns [EAR, MAR, LABEL]
        seq_len: number of consecutive frames per sample
        """
        df = pd.read_csv(csv_file)
        self.features = df[['EAR', 'MAR']].values
        self.labels = df['LABEL'].values.astype(int)
        self.seq_len = seq_len

    def __len__(self):
        return len(self.labels) - self.seq_len + 1

    def __getitem__(self, idx):
        seq = self.features[idx : idx + self.seq_len]
        label = self.labels[idx + self.seq_len - 1]
        return torch.tensor(seq, dtype=torch.float32), torch.tensor(label, dtype=torch.long)
Usage with DataLoader:
from torch.utils.data import DataLoader
dataset = DrowsyDataset("data/drowsy.csv", seq_len=30)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4, pin_memory=True)
for seq_batch, label_batch in loader:
    seq_batch, label_batch = seq_batch.to(device), label_batch.to(device)
    # forward pass...
Tips:
- Ensure no missing EAR/MAR values.
- LABEL: 0 = alert, 1 = drowsy.
- Split train/val before wrapping in DataLoader (a minimal sketch follows).
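For that last tip, a minimal split using torch.utils.data.random_split (the 80/20 ratio is illustrative):

from torch.utils.data import DataLoader, random_split
from dataset import DrowsyDataset

dataset = DrowsyDataset("data/drowsy.csv", seq_len=30)
n_val = int(0.2 * len(dataset))  # 80/20 split (illustrative)
train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
# Note: windows overlap, so a strict temporal split (e.g. by recording session)
# avoids train/val leakage better than a random split.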
face_detection.py – DrowsinessDetector.process_frame
Process incoming frames to detect landmarks, compute ratios, and infer drowsiness.
def process_frame(self, frame: np.ndarray) -> tuple:
    """
    Returns a 10-tuple:
        np.ndarray      output frame (BGR)
        dlib.rectangle  face rect (None when no face is detected)
        np.ndarray      leftEye (6×2)
        np.ndarray      rightEye (6×2)
        np.ndarray      mouth (20×2)
        float           EAR
        float           MAR
        int             LSTM label (0/1)
        bool            instantaneous drowsy (EAR < thresh)
        bool            instantaneous yawning (MAR > thresh)
    """
    ...
Example integration:
from face_detection import DrowsinessDetector
import cv2
detector = DrowsinessDetector(
    predictor_path="shape_predictor_68_face_landmarks.dat",
    model_path="lstm_model.pth",
    device="cpu"
)
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    (out, rect, le, re, mo, ear, mar, label, is_drowsy, is_yawning) = detector.process_frame(frame)
    if rect:
        # draw eye and mouth landmark outlines
        for pts, color in [(le, (0, 255, 0)), (re, (0, 255, 0)), (mo, (0, 0, 255))]:
            cv2.polylines(out, [pts], True, color, 1)
        status = f"Drowsy(LSTM)={label}, EAR={ear:.2f}, MAR={mar:.2f}"
        cv2.putText(out, status, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    cv2.imshow("Drowsiness Monitor", out)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Extension Points:
- Swap in a custom landmark detector (sketched below).
- Expose buffer length or thresholds for runtime tuning.
- Add additional flags (e.g., head pose).
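As an example of the first extension point, a subclass could override the landmark step. This is a sketch that assumes DrowsinessDetector exposes an overridable detect_landmarks hook; the actual method name in face_detection.py may differ:

import numpy as np
from face_detection import DrowsinessDetector

class CustomLandmarkDetector(DrowsinessDetector):
    """Hypothetical variant swapping dlib for another landmark backend."""

    def detect_landmarks(self, frame: np.ndarray):
        # Return (left_eye, right_eye, mouth) arrays in the same 6×2/6×2/20×2
        # layout the metrics expect, or None when no face is found.
        raise NotImplementedError("plug in your landmark model here")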
models.py – LSTMDrowsinessDetector
Define and train the sequence classifier for alertness.
import torch
import torch.nn as nn
class LSTMDrowsinessDetector(nn.Module):
    def __init__(self, input_size=2, hidden_size=64, num_layers=2, num_classes=2):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))  # (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])    # classify from the final time step
Quick Start:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMDrowsinessDetector().to(device)
x = torch.randn(16, 30, 2, device=device)
logits = model(x) # (16, 2)
preds = logits.argmax(dim=1) # (16,)
Customization:
- Enable a bidirectional LSTM and adjust the fc input size (combined with dropout in the sketch below).
- Add dropout between LSTM layers.
- Replace the final head with attention or a deeper classifier.
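For instance, the first two ideas combined, as a sketch (not a shipped variant of models.py):

import torch.nn as nn

class BiLSTMDrowsinessDetector(nn.Module):
    """Illustrative variant: bidirectional LSTM with inter-layer dropout."""

    def __init__(self, input_size=2, hidden_size=64, num_layers=2, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True, dropout=0.3)
        # a bidirectional LSTM doubles the feature size seen by the classifier head
        self.fc = nn.Linear(hidden_size * 2, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)          # (batch, seq_len, hidden_size * 2)
        return self.fc(out[:, -1, :])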
collect_data.py – Data Collection Module
Automate EAR/MAR capture and labeling to CSV for model training.
pip install opencv-python
python collect_data.py
# Press ESC to stop; outputs data.csv
Key constants and flow:
import csv
import os

import cv2

CSV_FILE = "data.csv"
# Header: ["EAR", "MAR", "LABEL"]
if not os.path.exists(CSV_FILE):
    with open(CSV_FILE, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["EAR", "MAR", "LABEL"])

# detector (a DrowsinessDetector) and the render() overlay helper are assumed
# to be constructed earlier in collect_data.py
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame, face, le, re, mo, ear, mar, _, drowsy, yawn = detector.process_frame(frame)
    frame = render(frame, face, le, re, mo, ear, mar, None, drowsy, yawn)
    label = 1 if drowsy or yawn else 0  # auto-label from instantaneous thresholds
    with open(CSV_FILE, "a", newline="") as f:
        csv.writer(f).writerow([ear, mar, label])
    cv2.imshow("Collecting Data", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC
        break
cap.release()
cv2.destroyAllWindows()
Best Practices:
- Calibrate in uniform lighting.
- Balance label distribution per subject.
- Expose the CSV path and thresholds via CLI args for batch collection (see the argparse sketch below).
- Extend the schema to include additional sensor data (e.g., head pose, steering-wheel activity).
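A minimal argparse wrapper for that CLI practice might look like this (flag names mirror the usage section above; wiring the values into the collection loop is left to the reader):

import argparse

parser = argparse.ArgumentParser(description="Collect EAR/MAR training data")
parser.add_argument("--output", default="data.csv", help="CSV output path")
parser.add_argument("--ear-thresh", type=float, default=0.25, help="drowsy EAR threshold")
parser.add_argument("--mar-thresh", type=float, default=0.7, help="yawn MAR threshold")
parser.add_argument("--duration", type=int, default=300, help="recording time in seconds")
args = parser.parse_args()

CSV_FILE = args.output  # feed into the collection loop above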
By leveraging and extending these modules, you can adapt DrowCheck to new sensors, custom detection logic, or advanced sequence models. Contributions and plugin modules should follow the existing API patterns and include unit tests for new metrics or detectors.