Whisper API is a functional and scalable speech-to-text API written in Python, built on OpenAI's Whisper model [1] and FastAPI [2]. The goal of this repository is to provide an easy-to-configure base API that can be integrated into specific use cases. Note that it follows a client-side pattern, which can be changed depending on the case. A Docker container is also provided to simplify testing and deployment; check the packages section of the repository.
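Assuming the client-side pattern means that clients send recorded audio to the API over HTTP, the following is a minimal sketch of such a client using only the Python standard library. The endpoint path `/transcribe`, the form field name `file` and the JSON response are hypothetical placeholders, not the repository's confirmed API; FastAPI's default `/docs` page on the running server shows the real routes.

```python
import json
import os
import uuid
import urllib.request


def build_upload_request(base_url: str, filename: str, audio: bytes,
                         path: str = "/transcribe",  # hypothetical endpoint
                         field: str = "file") -> urllib.request.Request:  # hypothetical field name
    """Build (but do not send) a multipart/form-data POST carrying one audio file."""
    boundary = uuid.uuid4().hex
    # Part header: disposition with the form field and file name, then a blank line.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    # Closing delimiter marks the end of the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=head + audio + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(base_url: str, filepath: str) -> dict:
    """Upload a local file and decode the JSON reply (response shape is an assumption)."""
    with open(filepath, "rb") as fh:
        req = build_upload_request(base_url, os.path.basename(filepath), fh.read())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage, once the server is running: `transcribe("http://localhost:8000", "clip.wav")`. Building the request separately from sending it keeps the multipart encoding easy to inspect without a live server.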
-
Install system dependencies
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg libasound-dev libportaudio2 libportaudiocpp0 portaudio19-dev
# on Arch Linux (Arch package names differ from Debian's)
sudo pacman -S ffmpeg alsa-lib portaudio
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg pyaudio
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg pyaudio
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg pyaudio
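A quick way to confirm the installation step worked is a small check script such as the sketch below; it is not part of the repository. It assumes `ffmpeg` must be on `PATH` (Whisper runs the `ffmpeg` command to decode audio) and that the PortAudio library is needed because the packages listed point to PyAudio. `ctypes.util.find_library` may miss libraries installed in non-standard locations, so a "missing" result is a hint rather than proof.

```python
import ctypes.util
import shutil

# Hypothetical checklist for this project: ffmpeg is the command Whisper
# runs to decode audio; PortAudio is the native library PyAudio links against.
REQUIRED_TOOLS = ("ffmpeg",)
REQUIRED_LIBRARIES = ("portaudio",)


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the command-line tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def missing_libraries(libraries=REQUIRED_LIBRARIES) -> list[str]:
    """Return the shared libraries that ctypes cannot locate."""
    return [name for name in libraries if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    missing = missing_tools() + missing_libraries()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All checked dependencies were found.")
```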
-
Create and activate a Python environment, then install the requirements
conda create --name whisper python=3.10 -y
conda activate whisper
pip install -r requirements.txt
-
Run (development server with auto-reload; Uvicorn listens on http://127.0.0.1:8000 by default)
uvicorn app:app --reload
-
Docker (maps port 8000 on the host to port 80 inside the container)
docker run -d --name WhisperAPI -p 8000:80 ghcr.io/danielsarmiento04/custom_whisper_api:latest
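Once the server is running, either through Uvicorn (port 8000 by default) or the Docker command (host port 8000), FastAPI publishes an OpenAPI schema at `/openapi.json` by default. The sketch below lists the routes that schema declares; it assumes the app has not disabled or moved the schema URL and that the server is reachable at `http://localhost:8000`.

```python
import json
import urllib.request

# Both `uvicorn app:app` (default port 8000) and the Docker mapping (-p 8000:80)
# expose the API on port 8000 of the local machine.
BASE_URL = "http://localhost:8000"
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema FastAPI serves at /openapi.json by default."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_routes(schema: dict) -> list[str]:
    """Return 'METHOD /path' strings for every operation in an OpenAPI schema."""
    routes = []
    for path, item in sorted(schema.get("paths", {}).items()):
        # Path items may also hold keys like "parameters"; keep only HTTP methods.
        for method in item:
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return routes


if __name__ == "__main__":
    try:
        for route in list_routes(fetch_openapi()):
            print(route)
    except OSError as exc:  # URLError and connection errors are OSError subclasses
        print(f"Could not reach {BASE_URL}: {exc}")
```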
- Idea: a WebSocket service
This repository is licensed under the Apache 2.0 License.
[1] Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. doi:10.48550/ARXIV.2212.04356
[2] Ramírez, S. FastAPI [Computer software]. https://github.com/tiangolo/fastapi