A local, private Knowledge Graph generator that extends GraphRAG-SDK to process additional file types (such as PPT, DOC, and PDF), storing the generated graphs in FalkorDB.
- Core Framework: GraphRAG-SDK
- Document Processing: Unstructured-IO
- Graph Database: FalkorDB
- Container Runtime: Docker
- CI/CD: GitHub Actions
- Testing: pytest
- Create a virtual environment:

  ```bash
  python3 -m venv .venv
  ```

- Activate it:

  ```bash
  source .venv/bin/activate
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Launch a FalkorDB instance in a new terminal:

  ```bash
  docker run -p 6379:6379 -p 3000:3000 -it --rm -v ./data:/data falkordb/falkordb:edge
  ```

- Configure the environment variables for your preferred LLM in a `.env` file:

  ```bash
  # leave these empty if you want to use Ollama
  OPENAI_API_KEY=""
  GOOGLE_API_KEY=""
  ```

- Run the docker compose:

  ```bash
  docker-compose -f docker/docker-compose.yml up -d
  ```
First, give the program some files to process:

```bash
python main.py --folder <path_to_folder_with_files>
```

Once the GraphRAG engine finishes processing, you'll enter chat mode and can start asking questions about your data (you can keep adding files with the same command).
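Under the hood, the processing and chat steps follow GraphRAG-SDK's usual flow. A minimal sketch of that flow is shown below; the file path, the graph name `demo_kg`, and the choice of the OpenAI model are illustrative assumptions, not this project's exact code:

```python
from dotenv import load_dotenv

from graphrag_sdk import KnowledgeGraph, Ontology
from graphrag_sdk.source import Source
from graphrag_sdk.models.openai import OpenAiGenerativeModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig

load_dotenv()  # picks up OPENAI_API_KEY from the .env file

# Wrap the input files as sources (the path is illustrative)
sources = [Source("docs/badbadnotgood.pdf")]

# Let the LLM infer an ontology from the sources
model = OpenAiGenerativeModel(model_name="gpt-4o")
ontology = Ontology.from_sources(sources=sources, model=model)

# Build the knowledge graph inside the FalkorDB instance started above
kg = KnowledgeGraph(
    name="demo_kg",  # illustrative graph name
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
    host="127.0.0.1",
    port=6379,
)
kg.process_sources(sources)

# Chat mode: ask questions about the generated graph
chat = kg.chat_session()
print(chat.send_message("Name all of the members of the band BadBadNotGood"))
```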
Other commands are:

- `--delete-files` (erase the previously generated graph and exit)
- `--delete-ontology` (erase all internal files and exit)
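A rough sketch of how these flags could be exposed in `main.py` with argparse; the wiring to the actual cleanup and processing logic is omitted and hypothetical:

```python
import argparse

parser = argparse.ArgumentParser(description="Local Knowledge Graph generator")
parser.add_argument("--folder", help="path to a folder with files to process")
parser.add_argument("--delete-files", action="store_true",
                    help="erase the previously generated graph and exit")
parser.add_argument("--delete-ontology", action="store_true",
                    help="erase all internal files and exit")
args = parser.parse_args()  # e.g. python main.py --folder ./docs
```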
For basic testing, I processed two Wikipedia articles separately.
The generated knowledge graph can be viewed with the built-in FalkorDB browser (exposed on port 3000).
To query the entire FalkorDB graph, you can run this query:

```cypher
MATCH (n) OPTIONAL MATCH (n)-[e]-(m) RETURN *
```
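The same query can also be issued programmatically with the `falkordb` Python client; a small sketch, assuming the graph was created under the name `demo_kg`:

```python
from falkordb import FalkorDB

# Connect to the FalkorDB instance started with Docker above
db = FalkorDB(host="127.0.0.1", port=6379)

# Select the generated graph ("demo_kg" is an assumed name; db.list_graphs() shows what exists)
graph = db.select_graph("demo_kg")

result = graph.query("MATCH (n) OPTIONAL MATCH (n)-[e]-(m) RETURN *")
for row in result.result_set:
    print(row)
```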
This is the console output:
```text
Question: Retrieve all the drum equipment that you know
############################################################### ("I don't have any information about drum equipment.", <graphrag_sdk.models.ollama.OllamaChatSession object at 0x31cc3a7e0>) ###############################################################

Question: Name all of the integrants of the band BadBadNotGood
############################################################### ("BadBadNotGood. (MATCH (b:Band {name: 'BadBadNotGood'})-[:MEMBER_OF]->(p:Person) RETURN b, p) Returns: b - Band {name: 'BadBadNotGood'} p - Person {name: 'Abe Rubenstein'} p - Person {name: 'Matt Huber'}" ###############################################################
```
For the integration that supports PPT, DOC, and PDF files, the key is selecting a reliable font that supports Unicode characters. In this case `DejaVuSans` was used, and it achieved great results.
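One way this integration can be sketched (an assumption about the pipeline, not necessarily this project's exact code): extract the text from PPT/DOC files with Unstructured-IO and re-render it to an intermediate PDF with a Unicode-capable font such as DejaVuSans, which the GraphRAG-SDK PDF source can then ingest. The input file name and the use of `fpdf2` are illustrative:

```python
from unstructured.partition.auto import partition  # Unstructured-IO
from fpdf import FPDF                               # fpdf2

# Extract the raw text from a PPT/DOC file; partition() picks the right parser
elements = partition(filename="slides.pptx")        # illustrative input file
text = "\n".join(str(el) for el in elements)

# Re-render the text as a PDF, using DejaVuSans so Unicode characters survive
pdf = FPDF()
pdf.add_font("DejaVu", fname="DejaVuSans.ttf")      # font file path is an assumption
pdf.set_font("DejaVu", size=11)
pdf.add_page()
pdf.multi_cell(0, 6, text)
pdf.output("slides.pdf")                            # intermediate PDF for GraphRAG-SDK
```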
This is an overview of the system:
- The system architecture is composed of two main services: the FalkorDB instance and the main process. If, for example, a suite of AWS services is chosen, a small instance is more than enough for the FalkorDB service. An interesting case to analyze would be assigning the main process to an instance such as an EC2 G4, which is purpose-built for ML inference with plenty of GPU power; combined with GraphRAG-SDK's ability to integrate Ollama models, a self-hosted LLM would be possible (at roughly $0.5/hour, I think it is reasonable to try; see the sketch below). Further accommodations for manual scaling with nginx can be made. On a production server I'd use AWS ECS (Elastic Container Service) instead of a plain Docker setup.
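The self-hosted option mentioned above maps to a small change in the model wiring. A hedged sketch, assuming GraphRAG-SDK's Ollama model class (the model name is illustrative):

```python
from graphrag_sdk.models.ollama import OllamaGenerativeModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig

# Swap the cloud model for one served by a local (or EC2-hosted) Ollama instance.
# "llama3" is illustrative; any model pulled into that Ollama instance works.
model = OllamaGenerativeModel(model_name="llama3")
model_config = KnowledgeGraphModelConfig.with_model(model)
# Pass model_config to KnowledgeGraph(...) exactly as in the earlier sketch.
```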