SignDecode is a computer vision application that performs real-time American Sign Language (ASL) recognition using skeletal hand tracking and neural network classification. The system processes webcam input to detect hand landmarks and classify gestures into alphanumeric characters (A-Z, 0-9).
Key Technical Achievement: By using skeletal coordinate data instead of raw pixels, the system reduces input dimensionality by roughly 98.5% (63 features vs. 4,096 values for a single-channel 64x64 image) while maintaining 92% classification accuracy.
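The reduction figure is simple arithmetic on the two input sizes. This sketch assumes the pixel baseline is a single-channel (grayscale) 64x64 image, which is what the 4,096 count implies.

```python
# Input sizes compared in the dimensionality claim
landmark_features = 21 * 3   # 21 MediaPipe landmarks, (x, y, z) each
pixel_features = 64 * 64     # one channel of a 64x64 image (assumed grayscale)

reduction = 1 - landmark_features / pixel_features
print(f"{landmark_features} vs {pixel_features} inputs: {reduction:.1%} fewer")
# prints: 63 vs 4096 inputs: 98.5% fewer
```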
Camera Input → MediaPipe Hand Detection → Landmark Extraction (21 points × 3 coords)
→ MLP Classifier → Temporal Smoothing → Output Display
Input Layer: 63 features (21 hand landmarks × xyz coordinates)
Hidden Layer 1: 128 neurons, ReLU activation, 30% dropout
Hidden Layer 2: 64 neurons, ReLU activation, 30% dropout
Hidden Layer 3: 64 neurons, ReLU activation
Output Layer: 36 neurons (A-Z, 0-9), Softmax activation
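At inference time, dropout is inactive, so the network reduces to three Dense+ReLU layers followed by a softmax. The NumPy sketch below mirrors that computation to make the layer shapes and parameter count concrete. It is not the project's code: `init_params` and `forward` are hypothetical helpers, and the weights are random placeholders rather than values loaded from models/sign_language_model.h5. Counting weights plus biases per Dense layer gives 22,948 parameters, which a Keras `model.summary()` of the same architecture should also report (Dropout layers add none).

```python
import numpy as np

# Layer widths from the architecture list: 63 -> 128 -> 64 -> 64 -> 36
LAYER_SIZES = [63, 128, 64, 64, 36]

def init_params(sizes, seed=0):
    """Random placeholder (W, b) pairs; NOT the trained weights."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Inference-time forward pass; dropout is a no-op outside training."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)             # Dense + ReLU
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)   # softmax over 36 classes

params = init_params(LAYER_SIZES)
n_params = sum(W.size + b.size for W, b in params)
probs = forward(np.zeros((1, 63)), params)
print(n_params, probs.shape)  # 22948 (1, 36)
```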
Training Configuration:
Performance Metrics:
```
SignDecode/
├── src/ # Application source code
│ ├── app.py # Flask server, API endpoints
│ ├── model.py # Model wrapper class
│ ├── utils.py # Keypoint extraction utilities
│ ├── labels.py # Class label mappings
│ ├── text_to_speech.py # Audio output module
│ ├── static/ # Frontend assets (CSS, JS)
│ └── templates/ # HTML templates
├── training/ # Model training pipeline
│ ├── collect_data.py # Data collection utility
│ ├── train_model.py # Model training script
│ └── dataset/ # Training data storage
├── models/ # Trained model artifacts
│ └── sign_language_model.h5
├── run.py # Application entry point
├── requirements.txt # Python dependencies
└── README.md
```
```bash
# Clone repository
git clone https://github.com/thesakshidigg/SignDecode-Sign-Language-Recognition-.git
cd SignDecode-Sign-Language-Recognition-

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run application
python run.py
```
Navigate to http://localhost:5000 in your browser.
```bash
python run.py
```
The Flask server starts on port 5000. The web interface provides:
Step 1: Collect Training Data
```bash
cd training
python collect_data.py
```
Follow the prompts to record hand gestures for each character. Data is saved to training/dataset/sign_data.csv.
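The exact column layout of sign_data.csv is not documented here. The sketch below assumes one row per captured frame, with the character label first followed by the 63 landmark values; `append_sample` is a hypothetical helper, and an in-memory buffer stands in for the real file.

```python
import csv
import io

NUM_FEATURES = 63  # 21 landmarks x (x, y, z)

def append_sample(writer, label, features):
    """Write one sample as: label, f0, f1, ..., f62 (assumed layout)."""
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")
    writer.writerow([label, *features])

# Stands in for open("dataset/sign_data.csv", "a", newline="")
buffer = io.StringIO()
writer = csv.writer(buffer)
append_sample(writer, "A", [0.5] * NUM_FEATURES)

row = buffer.getvalue().strip().split(",")
print(row[0], len(row))  # A 64
```

Validating the feature count at write time keeps malformed frames (for example, a partial detection) out of the training set.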
Step 2: Train Model
```bash
python train_model.py
```
Trains the neural network and saves the model to models/sign_language_model.h5.
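Before training, each CSV row has to be split into a feature vector and an integer class id for the 36-way softmax. This is a minimal sketch under two assumptions: rows follow the `label, f0..f62` layout with no header row, and classes are ordered A-Z then 0-9. The project's real mapping lives in src/labels.py and may differ; `load_dataset`, `CLASSES`, and `LABEL_TO_INDEX` are hypothetical names.

```python
import csv
import io
import string

# Assumed class order: A-Z (indices 0-25), then 0-9 (indices 26-35)
CLASSES = list(string.ascii_uppercase) + list(string.digits)
LABEL_TO_INDEX = {c: i for i, c in enumerate(CLASSES)}

def load_dataset(fileobj):
    """Read rows of `label, f0..f62` into feature lists and integer class ids."""
    X, y = [], []
    for row in csv.reader(fileobj):
        if not row:
            continue  # skip blank lines
        label, *values = row
        X.append([float(v) for v in values])
        y.append(LABEL_TO_INDEX[label])
    return X, y

sample_csv = io.StringIO("A," + ",".join(["0.1"] * 63) + "\n"
                         "7," + ",".join(["0.2"] * 63) + "\n")
X, y = load_dataset(sample_csv)
print(len(CLASSES), y, len(X[0]))  # 36 [0, 33] 63
```

Integer ids like these pair with a sparse categorical cross-entropy loss; one-hot encoding them would be needed for plain categorical cross-entropy.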
MediaPipe detects 21 anatomical landmarks per hand:
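Each landmark provides (x, y, z) coordinates. x and y are normalized to the [0, 1] range by image width and height; z is a relative depth with the wrist as origin (smaller values are closer to the camera), on roughly the same scale as x.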
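The landmark names below follow MediaPipe's hand landmark ordering (0 = wrist, 4 = thumb tip, 8 = index fingertip, and so on). Flattening them into the 63-value input is sketched with a hypothetical `extract_keypoints` helper; the per-landmark x, y, z interleaving is an assumption about what src/utils.py does. With MediaPipe's Hands solution, the input would be `results.multi_hand_landmarks[0].landmark`; here, dummy points stand in for a real detection.

```python
from types import SimpleNamespace

# MediaPipe hand landmark order, index 0 through 20
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

def extract_keypoints(landmarks):
    """Flatten 21 landmarks into [x0, y0, z0, x1, y1, z1, ...] (63 floats)."""
    if len(landmarks) != len(HAND_LANDMARKS):
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    return [coord for lm in landmarks for coord in (lm.x, lm.y, lm.z)]

# Dummy points with .x/.y/.z attributes, standing in for MediaPipe output
fake_hand = [SimpleNamespace(x=i / 20, y=0.5, z=-0.01 * i) for i in range(21)]
features = extract_keypoints(fake_hand)
print(len(features))  # 63
```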
The system uses a Multi-Layer Perceptron (MLP) rather than a Convolutional Neural Network (CNN) because:
A 15-frame consistency filter prevents prediction flickering:
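```python
if predicted_label == last_predicted_label:
    frame_count += 1
else:  # label changed: restart the count for the new candidate
    last_predicted_label = predicted_label
    frame_count = 1
if frame_count == THRESHOLD_FRAMES:  # THRESHOLD_FRAMES = 15; emit once per stable run
    output_text += predicted_character
```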
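The same filter can be packaged as a small, self-contained class so its state (last label, run length, accumulated text) lives in one place. `PredictionSmoother` is a hypothetical name, and treating `None` (no hand detected) as a reset is an assumption rather than documented behavior. A threshold of 3 is used in the demo only to keep the frame sequence short; the app uses 15.

```python
class PredictionSmoother:
    """Emit a label only after it is predicted for `threshold` consecutive frames."""

    def __init__(self, threshold=15):
        self.threshold = threshold
        self.last_label = None
        self.count = 0
        self.text = ""

    def update(self, label):
        """Feed one per-frame prediction; return the label if just emitted, else None."""
        if label is not None and label == self.last_label:
            self.count += 1
        else:
            # New candidate (or no hand, label=None): restart the run
            self.last_label = label
            self.count = 1 if label is not None else 0
        if label is not None and self.count == self.threshold:
            self.text += label  # fires once per stable run, not every frame after
            return label
        return None

smoother = PredictionSmoother(threshold=3)
for frame_label in ["A", "A", "B", "A", "A", "A", "A", None, "C", "C", "C"]:
    smoother.update(frame_label)
print(smoother.text)  # AC
```

Using equality with the threshold (rather than `>=`) means holding a sign longer does not repeat the character; the user lowers the hand or switches signs to type it again.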
Processes a single video frame for hand detection and classification.
Request:
```json
{
  "image": "data:image/jpeg;base64,..."
}
```
Response:
```json
{
  "prediction": "A",
  "image": "data:image/jpeg;base64,..."
}
```
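Both the request and response carry frames as base64 data URLs. A minimal sketch of converting between that string and raw JPEG bytes, using only the standard library, is shown below; `decode_data_url` and `encode_data_url` are hypothetical helpers, not functions from src/app.py. Turning the decoded bytes into a pixel array would be a further step (for example with OpenCV's `cv2.imdecode`), which is omitted here.

```python
import base64

def decode_data_url(data_url):
    """Split 'data:image/jpeg;base64,<payload>' into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)

def encode_data_url(raw_bytes, mime_type="image/jpeg"):
    """Inverse of decode_data_url, e.g. for the "image" field of the response."""
    return f"data:{mime_type};base64," + base64.b64encode(raw_bytes).decode("ascii")

url = encode_data_url(b"\xff\xd8\xff\xe0fake-jpeg")
mime, data = decode_data_url(url)
print(mime)  # image/jpeg
```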
Returns current recognized text.
Clears the output text buffer.
Contributions are welcome. Please follow standard Git workflow:
1. Create a feature branch (git checkout -b feature/improvement)
2. Commit your changes (git commit -m 'Add feature')
3. Push the branch (git push origin feature/improvement)

MIT License - see the LICENSE file for details.
Sakshi Diggikar
GitHub: @thesakshidigg