Android application that recognizes voice from input sound.
Status: in progress.
- Gradle 8.2
- JVM 11
- Android SDK 34
- Kotlin 1.9.20
- Jetpack Compose 1.6.10
- Compose Multiplatform 1.6.10
- KotlinDL 0.5.2
- cuDNN 7.6.3
- app: mobile application.
- model: CNN model execution.
- processing: common library.
- Clone the repository:
https://github.com/ExaggeratedRumors/demooder.git
- Download the AudioWav data from Kaggle.
- Unzip the WAV files into the data_audio directory (demooder-model/data_audio from the repository root).
- [optional] Run the data augmentation task:
./gradlew :model:dataAugmentation
- Run the spectrogram creation task:
./gradlew :model:createSpectrograms
- Run the model training task:
./gradlew :model:trainModel
- The output model is saved in the data_models directory.
Source: CREMA-D
- Audio data augmentation:
  - Gaussian noise.
  - Time stretching.
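The two augmentations listed above can be sketched in plain Kotlin. This is a minimal illustration, not the repository's implementation: the names `addGaussianNoise` and `timeStretch` and their parameters are assumptions. The stretch here is done by plain linear-interpolation resampling, which also shifts pitch; a pitch-preserving stretch would need something like a phase vocoder.

```kotlin
import java.util.Random

/** Adds zero-mean Gaussian noise with standard deviation [sigma] to each sample. */
fun addGaussianNoise(
    samples: DoubleArray,
    sigma: Double,
    random: Random = Random()
): DoubleArray =
    DoubleArray(samples.size) { i -> samples[i] + sigma * random.nextGaussian() }

/**
 * Naive time stretch by linear interpolation (hypothetical helper).
 * rate > 1 shortens the signal, rate < 1 lengthens it.
 * Because this is plain resampling, the pitch changes together with the duration.
 */
fun timeStretch(samples: DoubleArray, rate: Double): DoubleArray {
    require(rate > 0.0) { "rate must be positive" }
    if (samples.isEmpty()) return samples
    val outSize = maxOf(1, (samples.size / rate).toInt())
    return DoubleArray(outSize) { i ->
        val pos = i * rate
        val left = pos.toInt().coerceAtMost(samples.size - 1)
        val right = (left + 1).coerceAtMost(samples.size - 1)
        val frac = pos - left
        samples[left] * (1.0 - frac) + samples[right] * frac
    }
}
```

Passing a seeded `Random` makes the noise reproducible, which helps when the augmented set has to be regenerated.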
- Read WAV files according to the WAV file format header scheme.
- Convert byte data to complex numbers.
- Apply signal windowing.
- Apply the Short-Time Fourier Transform (STFT), computed with the FFT.
- Filter with A-weighting or C-weighting.
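The windowing and STFT steps above can be sketched as follows. The source does not name the window function, frame size, or hop, so the Hann window, the 512-sample frame, and the 256-sample hop are assumptions, as are all function names. The FFT is a simple recursive radix-2 version written for readability rather than speed.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.hypot
import kotlin.math.sin

/** Hann window coefficients for a frame of [size] samples (assumed window type). */
fun hannWindow(size: Int): DoubleArray {
    require(size > 1) { "window needs at least two samples" }
    return DoubleArray(size) { n -> 0.5 * (1.0 - cos(2.0 * PI * n / (size - 1))) }
}

/** Recursive radix-2 FFT. Input length must be a power of two. Returns (re, im). */
fun fft(re: DoubleArray, im: DoubleArray): Pair<DoubleArray, DoubleArray> {
    val n = re.size
    require(n > 0 && (n and (n - 1)) == 0) { "length must be a power of two" }
    if (n == 1) return re.copyOf() to im.copyOf()
    val half = n / 2
    val (evenRe, evenIm) = fft(DoubleArray(half) { re[2 * it] }, DoubleArray(half) { im[2 * it] })
    val (oddRe, oddIm) = fft(DoubleArray(half) { re[2 * it + 1] }, DoubleArray(half) { im[2 * it + 1] })
    val outRe = DoubleArray(n)
    val outIm = DoubleArray(n)
    for (k in 0 until half) {
        // Twiddle factor e^(-2*pi*i*k/n) applied to the odd half.
        val angle = -2.0 * PI * k / n
        val tRe = cos(angle) * oddRe[k] - sin(angle) * oddIm[k]
        val tIm = cos(angle) * oddIm[k] + sin(angle) * oddRe[k]
        outRe[k] = evenRe[k] + tRe
        outIm[k] = evenIm[k] + tIm
        outRe[k + half] = evenRe[k] - tRe
        outIm[k + half] = evenIm[k] - tIm
    }
    return outRe to outIm
}

/** Magnitude spectrogram: one row per frame, frameSize / 2 + 1 bins per row. */
fun stftMagnitude(signal: DoubleArray, frameSize: Int = 512, hop: Int = 256): List<DoubleArray> {
    val window = hannWindow(frameSize)
    val frames = mutableListOf<DoubleArray>()
    var start = 0
    while (start + frameSize <= signal.size) {
        // Real samples become complex numbers with a zero imaginary part.
        val re = DoubleArray(frameSize) { signal[start + it] * window[it] }
        val (outRe, outIm) = fft(re, DoubleArray(frameSize))
        frames += DoubleArray(frameSize / 2 + 1) { hypot(outRe[it], outIm[it]) }
        start += hop
    }
    return frames
}
```

Trailing samples that do not fill a whole frame are dropped in this sketch; padding the last frame with zeros is a common alternative.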
- Read classifier model.
- Record voice signal.
- Save as WAV file.
- Downsample the signal from 48000 Hz to 16000 Hz.
- Convert byte data to complex.
- Signal windowing and filter by weighting.
- Predict.
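The byte conversion and downsampling steps above might look like the sketch below. It assumes 16-bit little-endian mono PCM, which the source does not state, and the function names are hypothetical. The 48000 Hz to 16000 Hz conversion is an integer factor of 3, so each output sample is taken here as the mean of three input samples: a crude moving-average low-pass followed by decimation, not a production-grade resampler.

```kotlin
/** Converts 16-bit little-endian mono PCM bytes (assumed format) to samples in [-1, 1). */
fun pcm16ToSamples(bytes: ByteArray): DoubleArray =
    DoubleArray(bytes.size / 2) { i ->
        val lo = bytes[2 * i].toInt() and 0xFF   // low byte, treated as unsigned
        val hi = bytes[2 * i + 1].toInt()        // high byte, sign-extended
        ((hi shl 8) or lo) / 32768.0
    }

/**
 * Downsamples by an integer [factor] (3 for 48000 Hz -> 16000 Hz).
 * Each output sample is the mean of [factor] consecutive input samples.
 * Leftover samples at the end that do not fill a group are dropped.
 */
fun downsample(samples: DoubleArray, factor: Int = 3): DoubleArray {
    require(factor >= 1) { "factor must be at least 1" }
    return DoubleArray(samples.size / factor) { i ->
        var sum = 0.0
        for (j in 0 until factor) sum += samples[i * factor + j]
        sum / factor
    }
}
```

A windowed-sinc or polyphase filter would suppress aliasing better than the averaging used here, at the cost of more code.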
- Read data.
- Use FFT.
- Convert the FFT output to spectral amplitude.
- Convert to octave or third-octave bands.
- Filter by A-weighting or C-weighting.
- Build a VGG architecture model.
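The spectral steps in the list above (amplitude, third-octave bands, A-weighting) can be illustrated with the sketch below. The function names, the base-2 band centres `1000 * 2^(k/3)`, and the default band range are assumptions, not taken from the repository. The A-weighting curve uses the standard analog formula with the usual +2.00 dB normalisation, so that the gain at 1 kHz is approximately 0 dB.

```kotlin
import kotlin.math.hypot
import kotlin.math.log10
import kotlin.math.pow
import kotlin.math.sqrt

/** One-sided amplitude spectrum from full FFT output of length n (bins 0..n/2). */
fun spectralAmplitude(re: DoubleArray, im: DoubleArray): DoubleArray {
    val n = re.size
    return DoubleArray(n / 2 + 1) { k ->
        // DC and Nyquist bins appear once in the full spectrum, the others twice.
        val scale = if (k == 0 || k == n / 2) 1.0 / n else 2.0 / n
        scale * hypot(re[k], im[k])
    }
}

/** Third-octave centre frequencies 1000 * 2^(k/3) for k in [kMin, kMax] (assumed range). */
fun thirdOctaveCenters(kMin: Int = -16, kMax: Int = 9): List<Double> =
    (kMin..kMax).map { k -> 1000.0 * 2.0.pow(k / 3.0) }

/**
 * Sums the energy of FFT bins into third-octave bands and returns band amplitudes.
 * amplitudes[i] is assumed to belong to frequency i * sampleRate / fftSize.
 */
fun toThirdOctaveBands(
    amplitudes: DoubleArray,
    sampleRate: Int,
    fftSize: Int,
    centers: List<Double>
): DoubleArray {
    val edge = 2.0.pow(1.0 / 6.0)   // band edges lie at fc / 2^(1/6) and fc * 2^(1/6)
    return DoubleArray(centers.size) { b ->
        val lo = centers[b] / edge
        val hi = centers[b] * edge
        var energy = 0.0
        for (i in amplitudes.indices) {
            val f = i.toDouble() * sampleRate / fftSize
            if (f >= lo && f < hi) energy += amplitudes[i] * amplitudes[i]
        }
        sqrt(energy)
    }
}

/** A-weighting gain in dB at frequency [f] in Hz (f > 0). */
fun aWeightingDb(f: Double): Double {
    val f2 = f * f
    val numerator = 12194.0.pow(2) * f2 * f2
    val denominator = (f2 + 20.6.pow(2)) *
        sqrt((f2 + 107.7.pow(2)) * (f2 + 737.9.pow(2))) *
        (f2 + 12194.0.pow(2))
    return 20.0 * log10(numerator / denominator) + 2.0
}
```

To weight a band amplitude, multiply it by `10.0.pow(aWeightingDb(fc) / 20.0)`, where `fc` is the band's centre frequency.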
- CUDA for training the model on a GPU (NVIDIA graphics cards).
- NNAPI for hardware acceleration on mobile devices.
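For the VGG architecture mentioned earlier, the sketch below shows how feature-map shapes and convolution parameter counts follow from a VGG-style layout: stacks of 3x3 same-padded convolutions, each stack followed by a 2x2 max pool with stride 2. The block configuration and the 64x64x1 spectrogram input used in the usage note are purely hypothetical; the source does not describe the model's actual layers, and this code does not use KotlinDL.

```kotlin
/** One VGG-style block: [convLayers] 3x3 same-padded convolutions, then a 2x2 max pool. */
data class VggBlock(val convLayers: Int, val filters: Int)

/** Feature-map shape in height x width x channels order. */
data class Shape(val height: Int, val width: Int, val channels: Int) {
    val flattened: Int get() = height * width * channels
}

/** Hypothetical three-block layout, chosen only for illustration. */
fun exampleBlocks(): List<VggBlock> =
    listOf(VggBlock(2, 32), VggBlock(2, 64), VggBlock(3, 128))

/**
 * Same-padded 3x3 convolutions keep height and width unchanged;
 * each 2x2 stride-2 pool halves them (rounding down).
 */
fun outputShape(input: Shape, blocks: List<VggBlock>): Shape =
    blocks.fold(input) { s, b -> Shape(s.height / 2, s.width / 2, b.filters) }

/** Trainable parameters in the convolutional part: (3 * 3 * cin + 1) * cout per layer. */
fun convParameterCount(input: Shape, blocks: List<VggBlock>): Long {
    var cin = input.channels
    var total = 0L
    for (b in blocks) {
        repeat(b.convLayers) {
            total += (9L * cin + 1) * b.filters
            cin = b.filters
        }
    }
    return total
}
```

With a 64x64x1 input, `outputShape(Shape(64, 64, 1), exampleBlocks())` gives an 8x8x128 feature map, that is 8192 values entering the dense classifier head after flattening.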