We have only tested the TensorRT backend in Docker, so we recommend Docker for a smooth TensorRT backend setup. Note: We use our fork to set up TensorRT.
- Install docker
- Install nvidia-container-toolkit
- Clone this repo.
git clone https://github.com/collabora/WhisperLive.git
cd WhisperLive
- Pull the TensorRT-LLM Docker image that we prebuilt for the WhisperLive TensorRT backend.
docker pull ghcr.io/collabora/whisperbot-base:latest
- Next, run the Docker image and mount the WhisperLive repo under the container's /home directory.
docker run -it --gpus all --shm-size=8g \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-p 9090:9090 -v /path/to/WhisperLive:/home/WhisperLive \
ghcr.io/collabora/whisperbot-base:latest
- Make sure to test the installation.
# export ENV=${ENV:-/etc/shinit_v2}
# source $ENV
python -c "import torch; import tensorrt; import tensorrt_llm"
NOTE: Uncomment and update library paths if imports fail.
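The import check above can also be run one module at a time, which makes it easier to see which library fails. This is a minimal sketch that assumes only the three module names from the command above; `check_imports` is a hypothetical helper and not part of WhisperLive.

```python
import importlib


def check_imports(names):
    """Try to import each module and return {name: error message or None}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # e.g. ImportError, or OSError from a missing shared library
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Module names taken from the one-line check above.
    report = check_imports(["torch", "tensorrt", "tensorrt_llm"])
    for name, error in report.items():
        status = "ok" if error is None else f"FAILED ({error})"
        print(f"{name}: {status}")
```

If a module fails with a shared-library error, that is most likely the case the NOTE above refers to.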
- We build the small.en and small multilingual TensorRT engines. The script logs the path of the directory containing the Whisper TensorRT engine. We need this path (the model_path) to run the server.
# convert small.en
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small.en
# convert small multilingual model
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small
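The build script logs the engine directory, but if that log line has scrolled away, the directory can be found again by searching for engine files. This is a minimal sketch that assumes the built engines are files with an `.engine` extension (an assumption about the build output, not something stated above); `find_engine_dirs` is a hypothetical helper.

```python
from pathlib import Path


def find_engine_dirs(root):
    """Return sorted directories under root that directly contain *.engine files."""
    root = Path(root)
    if not root.is_dir():
        return []
    # ASSUMPTION: TensorRT engines are stored as files named *.engine.
    return sorted({p.parent for p in root.rglob("*.engine") if p.is_file()})


if __name__ == "__main__":
    # Search root taken from the build commands above; adjust if your build writes elsewhere.
    for directory in find_engine_dirs("/root/TensorRT-LLM-examples"):
        print(directory)
```

If the engines sit in subdirectories of the model directory, the path to pass to the server may be the parent of what this prints, so check it against the path the script logged.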
cd /home/WhisperLive
# Install requirements
apt update && bash scripts/setup.sh
pip install -r requirements/server.txt
# Required to create the mel spectrogram
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
# Run the English-only model
python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step"
# Run Multilingual model
python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step" \
--trt_multilingual
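The two server invocations differ only in the engine path and the `--trt_multilingual` flag, so one small launcher can assemble both. This sketch uses only the flags shown above; `build_server_command` and `launch` are hypothetical helpers, and the engine path remains a placeholder for the directory logged by the build step.

```python
import subprocess
import sys
from pathlib import Path


def build_server_command(trt_model_path, port=9090, multilingual=False):
    """Assemble the run_server.py command line for the TensorRT backend."""
    cmd = [
        sys.executable, "run_server.py",
        "--port", str(port),
        "--backend", "tensorrt",
        "--trt_model_path", str(trt_model_path),
    ]
    if multilingual:
        cmd.append("--trt_multilingual")
    return cmd


def launch(trt_model_path, **kwargs):
    """Check that the engine directory exists, then start the server (blocks until it exits)."""
    if not Path(trt_model_path).is_dir():
        raise FileNotFoundError(f"engine directory not found: {trt_model_path}")
    return subprocess.run(build_server_command(trt_model_path, **kwargs), check=True)
```

For example, calling `launch("path/to/whisper_trt/from/build/step", multilingual=True)` from /home/WhisperLive corresponds to the second command above.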