Use Discord chat threads as the UI for your local LLM inference server (or actually any OpenAI-compatible API server; local or not).
This is basically for those who live in Discord, and for those who don't want to spin up another UI for interacting with LLMs.
Follow the instructions for creating a bot and installing it to your server: Creating a Bot Account | discord.py Docs
Important bot permissions are: `Send Messages`, `Send Messages in Threads`, and `Read Message History`.
Also, take note of your bot token; you'll need it later.
Create a channel named `local-llm-wmll` (case sensitive). The bot will only work inside threads that are in this channel.
The quickest way to run the bot is to pull the Docker image:
docker pull ghcr.io/neil-vqa/wumlla-server:latest
Set the `DISCORD_BOT_TOKEN`, `INFERENCE_SERVER`, and `BOT_SYSTEM_PROMPT` environment variables when running a container.
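For example, a run command along these lines should work; the `INFERENCE_SERVER` value below is only an assumption (an OpenAI-compatible endpoint on the host, reached via `host.docker.internal`), so adjust it to wherever your server actually listens:

```bash
docker run -d --name wumlla \
  -e DISCORD_BOT_TOKEN="your-bot-token" \
  -e INFERENCE_SERVER="http://host.docker.internal:8080/v1" \
  -e BOT_SYSTEM_PROMPT="You are a helpful assistant." \
  ghcr.io/neil-vqa/wumlla-server:latest
```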
If you want to extend or configure something, clone this repository to your local machine or VPS, then follow the steps below.
Rename `.env.sample` to `.env`, then provide `DISCORD_BOT_TOKEN`, `INFERENCE_SERVER`, and `BOT_SYSTEM_PROMPT`.
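A filled-in `.env` might look like this sketch (placeholder values; check `.env.sample` for the exact format the server expects, e.g. whether `INFERENCE_SERVER` should include a path such as `/v1`):

```
DISCORD_BOT_TOKEN=your-bot-token
INFERENCE_SERVER=http://localhost:8080/v1
BOT_SYSTEM_PROMPT="You are a helpful assistant."
```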
The `instructions.txt` file provides copy-pastable commands to build and run the bot server. It covers the usual steps: 1) create a virtual environment, 2) install dependencies, 3) run `python serve.py`.
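A typical version of those steps is sketched below; the `requirements.txt` filename is an assumption, so defer to `instructions.txt` for the exact commands:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt  # dependency file name assumed
python serve.py
```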
- Build the system in a way that allows plugins/extensions for the LLM bot's functionality.
Here is a screenshot of Wumlla as a podcaster assistant, working in a `docker` thread inside the `local-llm-wmll` channel. I have llama.cpp serving `SmolLM2-1.7B-Instruct-Q6_K` locally.
Wumlla the podcaster
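For reference, serving a GGUF build of that model with llama.cpp's bundled server could look roughly like this (model path and port are placeholders; point `INFERENCE_SERVER` at the resulting endpoint):

```bash
llama-server -m SmolLM2-1.7B-Instruct-Q6_K.gguf --port 8080
```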