BlinkGesture
Send keyboard commands hands-free by blinking twice (double-blink gesture).
App Screenshot
Check out the older version
1. App Overview
BlinkGesture captures webcam video, locates and tracks the eyes, extracts an aligned eye strip, enhances contrast, classifies the eye state per frame, and detects single and double blinks. On a double blink, it issues a command via xdotool.
2. Data Flow
- Frame Capture: OpenCV VideoCapture grabs frames at ~30 ms intervals.
- Detection: Every N frames, a Haar cascade (haarcascade_eye.xml) locates eyes in the downscaled grayscale image.
- Tracking: Two KCF trackers maintain eye bounding boxes between detections; if tracking fails repeatedly, detection resets.
- Strip Extraction: Compute eye centers, angle, inter‑eye distance; define a rotated rectangle; apply affine warp to align eyes horizontally and crop a fixed‑size strip.
- CLAHE Preprocessing: Apply Contrast Limited Adaptive Histogram Equalization to the strip to normalize lighting.
- Patching & Classification: Slide a 64×64 window over the strip at scales 1.0 and 0.75, with stride 10. Each patch is converted to INT8, fed to the ONNX INT8 model, and softmaxed. Confident full‑scale results return early; smaller‑scale patches vote (open votes weighted higher).
- Temporal Logic: Maintain a buffer of recent open/closed states. Require a minimum number of closed frames before registering an open transition as a blink. Track blink timestamps to detect double‑blinks within a frame gap.
- Command Dispatch: When a double‑blink is detected, call xdotool with the configured key or mouse click string.
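The Temporal Logic step can be sketched in a few lines of Python. This is a minimal illustration, not the code in blink_detector.cpp: the class name, the `min_closed_frames` and `max_gap_frames` parameters, and their default values are all assumptions, and frame indices stand in for blink timestamps.

```python
class DoubleBlinkDetector:
    """Hypothetical sketch: turns per-frame eye states into double-blink events."""

    def __init__(self, min_closed_frames=2, max_gap_frames=15):
        # Both thresholds are assumed values, not taken from the project.
        self.min_closed_frames = min_closed_frames
        self.max_gap_frames = max_gap_frames
        self.closed_run = 0           # consecutive closed frames seen so far
        self.frame_index = 0          # frames processed so far
        self.last_blink_frame = None  # frame index of the previous blink

    def update(self, eye_closed):
        """Feed one frame state; return True when a double blink completes."""
        self.frame_index += 1
        if eye_closed:
            self.closed_run += 1
            return False
        # Eye is open: a blink is a closed run that was long enough.
        blink = self.closed_run >= self.min_closed_frames
        self.closed_run = 0
        if not blink:
            return False
        previous = self.last_blink_frame
        if previous is not None and self.frame_index - previous <= self.max_gap_frames:
            self.last_blink_frame = None  # consume both blinks
            return True
        self.last_blink_frame = self.frame_index
        return False


detector = DoubleBlinkDetector()
states = [False, True, True, False, False, True, True, False]
events = [detector.update(s) for s in states]
print(events)
```

Requiring a minimum closed run filters out single misclassified frames, and resetting `last_blink_frame` after a hit keeps a third blink from immediately triggering a second command.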
3. Python Pipeline
3.1 Training (train_eye_cnn.py)
- Dataset: Images under dataset/train/open eyes and dataset/train/close eyes.
- Preprocessing:
- Convert to grayscale.
- Resize to 64×64.
- Apply CLAHE (clipLimit=2.0, grid=8×8).
- ToTensor.
- Class Balancing: Downsample “closed eyes” to 80% to roughly match “open eyes”.
- Model Architecture:
EyeCNN(
Conv2d(1→16, 3×3, pad1) → ReLU → BatchNorm
Conv2d(16→32, 3×3, pad1) → ReLU → BatchNorm
MaxPool2d(2)
Conv2d(32→64, 3×3, pad1) → ReLU → BatchNorm
MaxPool2d(2)
AdaptiveAvgPool2d(4×4)
Flatten
Linear(64*4*4→128) → ReLU → Dropout(0.3)
Linear(128→2)
)
- Loss & Optimization:
- CrossEntropy with class weights [1.0, 1.8].
- Label smoothing 0.15.
- Adam optimizer (lr=1e-3, weight_decay=1e-4).
- Train for 7 epochs; save best on validation accuracy.
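As a sanity check on the architecture listed above, the following sketch traces the spatial size through the layers and counts trainable parameters. It assumes stride‑1 convolutions with bias and affine BatchNorm (the PyTorch defaults); those settings and the helper names are assumptions, not read from train_eye_cnn.py.

```python
def conv_params(c_in, c_out, k=3):
    # Weights plus one bias per output channel.
    return c_in * c_out * k * k + c_out


def bn_params(channels):
    # Affine BatchNorm: one scale and one shift per channel.
    return 2 * channels


def linear_params(n_in, n_out):
    return n_in * n_out + n_out


def trace_eyecnn(size=64):
    """Return (flattened feature count, trainable parameter count)."""
    # 3x3 convs with padding 1 and stride 1 keep the spatial size unchanged.
    params = conv_params(1, 16) + bn_params(16)
    params += conv_params(16, 32) + bn_params(32)
    size //= 2                      # MaxPool2d(2): 64 -> 32
    params += conv_params(32, 64) + bn_params(64)
    size //= 2                      # MaxPool2d(2): 32 -> 16
    size = 4                        # AdaptiveAvgPool2d(4x4) fixes the output size
    flat = 64 * size * size         # Flatten: channels * height * width
    params += linear_params(flat, 128) + linear_params(128, 2)
    return flat, params


flat, params = trace_eyecnn()
print(flat, params)
```

The flattened size comes out as 64*4*4, which matches the input width of the first Linear layer; under these assumptions most of the parameters sit in that layer.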
3.2 Export to ONNX (export_cnn_to_onnx.py)
- Load eye_cnn.pth into EyeCNN model.
- Dummy input 1×1×64×64 on CPU or CUDA.
- torch.onnx.export → eye_cnn.onnx (opset 11, dynamic batch).
3.3 Quantization to INT8 (quantize_to_int8.py)
- Use calibration subset (256 images, balanced open/closed) with CLAHE preprocessing.
- quantize_static → eye_cnn_quant_tmp.onnx (QOperator, QInt8).
- Patch graph:
- Remove initializers from the graph inputs; change the input tensor type to INT8, with fallback dims for the shape.
- Add scale (1/255.0) and zero_point (0) initializers.
- Remove QuantizeLinear nodes immediately after input.
- Clean duplicate initializers.
- Save final eye_cnn_int8.onnx.
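Once the QuantizeLinear node is removed, the caller has to quantize the input itself. The sketch below applies the standard affine mapping with the scale (1/255.0) and zero_point (0) listed above; the function names are hypothetical, and the assumption that inputs are normalized to [0, 1] beforehand is mine, not stated in quantize_to_int8.py.

```python
def quantize_int8(x, scale=1 / 255.0, zero_point=0):
    """Affine quantization: round(x / scale) + zero_point, saturated to int8."""
    q = round(x / scale) + zero_point   # Python rounds halves to even
    return max(-128, min(127, q))


def dequantize_int8(q, scale=1 / 255.0, zero_point=0):
    """Inverse mapping back to an approximate real value."""
    return (q - zero_point) * scale


for value in (0.0, 0.25, 1.0):
    q = quantize_int8(value)
    print(value, q, dequantize_int8(q))
```

With a signed 8‑bit range and these parameters, anything above roughly 0.498 saturates at 127, so the project's actual normalization may differ from the [0, 1] range assumed here.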
3.4 Evaluation (confusionmatrix.py)
- Load eye_cnn_int8.onnx via ONNX Runtime.
- For each test image in dataset/test/{open eyes, close eyes}:
- Grayscale → CLAHE → Resize → Normalize → Quantize to int8 → reshape to (1,1,64,64).
- Run session.run; softmax; if max(prob) ≥ 0.7, record prediction.
- Print classification report and confusion matrix (scikit-learn).
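The softmax and confidence gate from the evaluation loop look roughly like this. It is a stdlib‑only sketch that takes raw logits as a plain list; the function names are hypothetical, and which index means "open" or "closed" depends on how the dataset was loaded, so no label mapping is assumed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confident_prediction(logits, threshold=0.7):
    """Return the winning class index, or None if max(prob) < threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


print(confident_prediction([2.0, 0.0]))   # confident: class 0 wins
print(confident_prediction([0.2, 0.0]))   # too close to call: None
```

Images that fall below the threshold are skipped rather than counted as errors, so the reported metrics only cover predictions the model was confident about.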
4. C++ Runtime
- blink_detector.cpp:
- Loads eye_cnn_int8.onnx with ONNX Runtime C++.
- processStrip(): CLAHE + patch scanning logic + ONNX inference + thresholds → state.
- Temporal buffer for blink detection; flags double‑blink.
- eyestrip.cpp: Extraction and affine alignment of eye strip; history‑based smoothing; FPS overlay.
- command_dispatcher.cpp: Parses command string; maps to xdotool key or click commands; executes via QProcess.
- main.cpp / gui.cpp: Qt GUI for selecting camera, cascade, parameters; timer loop calls strip + detector; displays debug image.
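To illustrate what the dispatcher does, here is a Python sketch of mapping a command string to an xdotool invocation. The `key:<keys>` / `click:<button>` string format is a hypothetical convention chosen for this example (the real format parsed by command_dispatcher.cpp is not documented here), and `subprocess` stands in for QProcess.

```python
import subprocess


def build_xdotool_argv(command):
    """Translate 'key:ctrl+Right' or 'click:1' into an xdotool argument list."""
    kind, sep, value = command.partition(":")
    kind, value = kind.strip().lower(), value.strip()
    if not sep or not value:
        raise ValueError(f"expected '<kind>:<value>', got {command!r}")
    if kind == "key":
        return ["xdotool", "key", value]
    if kind == "click":
        button = int(value)  # raises ValueError for non-numeric buttons
        return ["xdotool", "click", str(button)]
    raise ValueError(f"unknown command kind: {kind!r}")


def dispatch(command):
    """Run the command; requires xdotool to be installed (Linux/X11)."""
    subprocess.run(build_xdotool_argv(command), check=False)


print(build_xdotool_argv("key:ctrl+Right"))
print(build_xdotool_argv("click:1"))
```

Passing an argument list instead of a shell string avoids quoting problems when the configured key combination contains special characters.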
5. Build & Dependencies
cmake .. \
-DUSE_SYSTEM_OPENCV=ON \
-DUSE_SYSTEM_ONNX=ON
make -j
- Qt5 Widgets
- OpenCV: core, imgproc, videoio, tracking, objdetect, imgcodecs
- ONNX Runtime C++ API
- xdotool (Linux utility)
- Haar cascade XML file in working directory
6. Pros & Cons
Pros
- Fast on‑CPU inference via INT8 quantization.
- Lighting invariance through CLAHE.
- Tracker/detector hybrid reduces jitter.
Cons
- Haar cascades can fail on extreme poses or occlusions.
- Patch scanning is CPU-heavy.
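To put a rough number on the patch‑scanning cost, the sketch below counts how many 64×64 windows a single strip produces at stride 10 and scales 1.0 and 0.75. The 256×96 strip size is a made‑up example, and it assumes the strip is resized by each scale before the fixed‑size window slides over it; the real runtime may scale differently and also returns early on confident results.

```python
def window_origins(length, window=64, stride=10):
    """Start offsets of every window that fits fully inside `length`."""
    if length < window:
        return []
    return list(range(0, length - window + 1, stride))


def count_patches(strip_w, strip_h, scales=(1.0, 0.75), window=64, stride=10):
    """Worst-case number of patches (model invocations) for one strip."""
    total = 0
    for scale in scales:
        w, h = int(strip_w * scale), int(strip_h * scale)
        cols = len(window_origins(w, window, stride))
        rows = len(window_origins(h, window, stride))
        total += cols * rows
    return total


print(count_patches(256, 96))  # 93 for this hypothetical strip size
```

Every patch is one model invocation, so at ~30 ms per frame the worst case here is on the order of three thousand inferences per second, which is why a single‑pass model is listed as a future addition.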
7. Future Additions
- Replace the sliding‑window CNN with a single‑pass CNN.
- GPU support using CUDA.
- Windows support.