This repository contains the code and resources for building a real-time Indonesian sign language (Bisindo) object detection model specifically designed for a mobile app. The model utilizes transfer learning techniques with SSD MobileNet V2 FPNLite architecture to accurately detect and localize sign language gestures in real-time video streams.
This real-time Bisindo sign language object detection model is built with transfer learning. It starts from SSD MobileNet V2 FPNLite 320x320 weights pre-trained on a large-scale image dataset and is then fine-tuned on a custom dataset of sign language images to recognize and classify sign language gestures in real time.
More information about the model: here
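SSD MobileNet V2 FPNLite 320x320 is distributed through the TensorFlow 2 Detection Model Zoo together with a `pipeline.config` file, and with the TensorFlow Object Detection API, setting up transfer learning largely means editing that file. Assuming that toolchain (this README does not name it), the sketch below patches the fields usually changed for fine-tuning: the class count, the pre-trained checkpoint path, the checkpoint type, and the label map path. `configure_pipeline` and the example paths are hypothetical; the API's own `object_detection.utils.config_util` helpers parse the file as a protobuf and are more robust than regular expressions.

```python
import re


def configure_pipeline(config_text: str, num_classes: int,
                       checkpoint_path: str, label_map_path: str) -> str:
    """Patch a TF Object Detection API pipeline.config for fine-tuning.

    Hypothetical helper: real configs are better edited as protobufs,
    but plain substitutions show which fields matter.
    """
    replacements = [
        # Number of gesture classes the detection head must predict
        (r"num_classes: \d+", f"num_classes: {num_classes}"),
        # Start training from the pre-trained checkpoint
        (r'fine_tune_checkpoint: "[^"]*"',
         f'fine_tune_checkpoint: "{checkpoint_path}"'),
        # The model-zoo checkpoint is a detection checkpoint, not a classification-only one
        (r'fine_tune_checkpoint_type: "[^"]*"',
         'fine_tune_checkpoint_type: "detection"'),
        # Train and eval input readers both point at the same label map
        (r'label_map_path: "[^"]*"', f'label_map_path: "{label_map_path}"'),
    ]
    for pattern, value in replacements:
        # A function replacement stops re from treating backslashes in paths as escapes
        config_text = re.sub(pattern, lambda _match, v=value: v, config_text)
    return config_text
```

For the Abjad model this would be called with `num_classes=26`; the Angka and Kata models would use 11 and 23.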
The system consists of three separate models, each trained to detect a specific category of signs:
- Abjad (Alphabet): Detects and classifies alphabet signs (26 classes).
- Angka (Number): Detects and classifies number signs (11 classes).
- Kata (Word): Detects and classifies word signs (23 classes).
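Each of the three detectors needs its own label map listing its classes. Assuming the TensorFlow Object Detection API's `.pbtxt` label-map format, a sketch for the Abjad model follows. The class names `A` to `Z` are an assumption (the README only gives the class count), `build_label_map` is a hypothetical helper, and the Angka and Kata maps would be built the same way from their own class lists.

```python
import string
from typing import Iterable


def build_label_map(class_names: Iterable[str]) -> str:
    """Render class names as a TF Object Detection API label map (.pbtxt).

    IDs start at 1 because ID 0 is reserved for the background class.
    """
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append(f"item {{\n  id: {class_id}\n  name: '{name}'\n}}")
    return "\n".join(items) + "\n"


# Abjad model: one class per letter A-Z (assumed names, 26 classes)
abjad_label_map = build_label_map(string.ascii_uppercase)
```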
The trained models have been converted into the optimized .tflite format for efficient deployment on mobile devices. They are also quantized to reduce model size while maintaining accuracy.
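The README does not say which post-training quantization mode was used (the TensorFlow Lite converter supports dynamic-range, float16, and full-integer modes, typically enabled with `converter.optimizations = [tf.lite.Optimize.DEFAULT]`). To illustrate where the size saving comes from, the sketch below implements in plain Python the affine int8 mapping TensorFlow Lite uses for quantized tensors, real_value = scale * (q - zero_point). The function names and sample weights are illustrative; TensorFlow Lite computes these parameters itself (per tensor or per channel) during conversion.

```python
from array import array
from typing import List, Tuple

INT8_MIN, INT8_MAX = -128, 127


def quantization_params(values: List[float]) -> Tuple[float, int]:
    """Choose scale and zero point so the value range maps onto [-128, 127]."""
    # Widen the range to include 0.0 so that zero is exactly representable
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (INT8_MAX - INT8_MIN) or 1.0  # avoid a zero scale
    zero_point = int(round(INT8_MIN - lo / scale))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """q = round(x / scale) + zero_point, clamped to the int8 range."""
    return [max(INT8_MIN, min(INT8_MAX, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q_values: List[int], scale: float, zero_point: int) -> List[float]:
    """real_value = scale * (q - zero_point)"""
    return [scale * (q - zero_point) for q in q_values]


weights = [-0.9, -0.1, 0.0, 0.42, 1.3]
scale, zero_point = quantization_params(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)

# Storage per value: 4 bytes as float32 versus 1 byte as int8
float_bytes = array("f", weights).itemsize * len(weights)
int8_bytes = array("b", q).itemsize * len(q)
```

Each restored value stays within roughly half a quantization step (`scale / 2`) of the original, while storage drops from 4 bytes to 1 byte per value, which is where the roughly 4x reduction in weight size comes from.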
We trained the models three times, with each version using an improved dataset and different training parameters. The following links show the TensorBoard visualizations of the training process for each model:
The training dataset used for this project consists of a large collection of annotated sign language images. The dataset includes diverse samples of different sign language gestures, captured under various lighting conditions, backgrounds, and hand orientations. The annotations provide bounding box coordinates and corresponding labels for each sign language gesture.
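Annotations like these are commonly stored as Pascal VOC XML files, one per image, and the setup steps for this project mention downloading the dataset in VOC format. The stdlib sketch below reads one such file and normalizes each box to the [0, 1] range, the form the TensorFlow Object Detection API stores in its TFRecords. `parse_voc_annotation` is a hypothetical helper, not code from this repository.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def parse_voc_annotation(xml_text: str) -> Tuple[Tuple[int, int], List[Dict]]:
    """Return the image size and a list of labelled, normalized boxes."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append({
            "label": obj.findtext("name"),
            # Pixel coordinates divided by the image size give values in [0, 1]
            "box": {
                "xmin": xmin / width,
                "ymin": ymin / height,
                "xmax": xmax / width,
                "ymax": ymax / height,
            },
        })
    return (width, height), objects
```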
Link to the dataset:
You can train the model locally or on Google Colab. We trained it locally on a machine with a CUDA-enabled GPU, in a Linux environment running under WSL2 (Ubuntu 20.04 LTS). A Linux environment is required because some of the commands used in the training process are Linux-specific.
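Because several training commands are Linux-specific, a quick preflight check can confirm the environment before starting. The sketch below uses only the Python standard library; treating a kernel release string containing "microsoft" as WSL, and the presence of `nvidia-smi` on the PATH as a sign of an installed NVIDIA driver, are heuristics and do not guarantee that CUDA and cuDNN are set up correctly. `describe_environment` is a hypothetical helper.

```python
import platform
import shutil


def describe_environment() -> dict:
    """Report whether this machine looks suitable for the Linux-only training steps."""
    system = platform.system()
    release = platform.release().lower()
    return {
        "is_linux": system == "Linux",
        # WSL kernels include "microsoft" in their release string
        "is_wsl": system == "Linux" and "microsoft" in release,
        # nvidia-smi on the PATH is a rough sign that the NVIDIA driver is installed
        "has_nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```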
To train the model locally, you can follow the steps below:
- Ensure that CUDA and cuDNN are installed on your machine. You can follow the steps here to install CUDA and here to install cuDNN. We followed this guide to install CUDA on our WSL2 Ubuntu setup.
- Install the WSL extension in VS Code.
- Prepare a virtual environment to train your model. You can follow the steps here to create a virtual environment.
- Clone the repository, then open the notebooks either in their own directory or with the repository root as the working directory.
- Get your API key from the account settings of your Roboflow account and store it in a .env file. Alternatively, you can download the dataset manually in VOC format from the dataset links above and move it into the images folder of the project directory; this lets you skip the dataset download step in the notebook.
- Follow the steps in the notebook to train the model.
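The setup steps store the Roboflow API key in a .env file for the notebook to read. The exact variable name the notebook expects is not stated here, so `ROBOFLOW_API_KEY` below is a hypothetical name, and both helpers are illustrative. The stdlib sketch parses simple KEY=VALUE lines the way a .env file is usually written; in practice the `python-dotenv` package's `load_dotenv()` does the same job more completely.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and # comments."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    values = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="value"
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_roboflow_api_key(path: str = ".env") -> str:
    # ROBOFLOW_API_KEY is a hypothetical name; use whatever the notebook expects
    key = os.environ.get("ROBOFLOW_API_KEY") or load_env_file(path).get("ROBOFLOW_API_KEY")
    if not key:
        raise RuntimeError("ROBOFLOW_API_KEY not found in the environment or in " + path)
    return key
```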
To train the model on Google Colab, set the Python version to 3.8.10 and then follow the steps below:
- Create a new notebook on Google Colab.
- Import the notebook from this repository.
- Get your API key from the account settings of your Roboflow account and store it in a .env file. Alternatively, you can download the dataset manually in VOC format from the dataset links above and move it into the images folder of the project directory; this lets you skip the dataset download step in the notebook.
- Follow the steps in the notebook to train the model.
- After you have finished training the model, you can download the model from the notebook.
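One way to download the trained model from Colab is to zip the exported model directory and trigger a browser download with `google.colab.files.download`, which is only importable inside Colab. The helper names and paths such as `exported-models/abjad` are hypothetical; adjust them to wherever the notebook writes its exported model.

```python
import shutil
from pathlib import Path


def archive_model(model_dir: str, output_base: str) -> str:
    """Zip an exported model directory and return the archive path."""
    if not Path(model_dir).is_dir():
        raise FileNotFoundError(f"No exported model found at {model_dir}")
    return shutil.make_archive(output_base, "zip", root_dir=model_dir)


def download_in_colab(path: str) -> bool:
    """Trigger a browser download when running inside Google Colab."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False
    files.download(path)
    return True


# Example (hypothetical paths):
# zip_path = archive_model("exported-models/abjad", "abjad_model")
# download_in_colab(zip_path)
```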