Knowledge Graph Generator 🕷 🕸

Project Overview 🪻

A local, private Knowledge Graph generator that extends GraphRAG-SDK to process additional file types (PDF, DOC, PPT, and more), storing the generated graphs in FalkorDB.
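Under the hood, ingestion follows GraphRAG-SDK's usual pipeline: wrap each file in a Source, let an LLM derive an Ontology from it, and materialize a KnowledgeGraph backed by the local FalkorDB instance. A minimal sketch of that flow (class names follow the GraphRAG-SDK docs, but exact signatures vary between SDK versions, and the Ollama model name is just an example):

from graphrag_sdk import KnowledgeGraph, Ontology
from graphrag_sdk.source import Source
from graphrag_sdk.models.ollama import OllamaGenerativeModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig

# Wrap the input documents (the SDK ships loaders for several formats)
sources = [Source("data/article.pdf")]

# Let the LLM infer an ontology (entities + relations) from the sources
model = OllamaGenerativeModel(model_name="llama3")  # model name is an assumption
ontology = Ontology.from_sources(sources=sources, model=model)

# Build the graph in the FalkorDB instance running on localhost:6379
kg = KnowledgeGraph(
    name="my_kg",
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
    host="127.0.0.1",
    port=6379,
)
kg.process_sources(sources)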

Tech Stack & Tools 🥞

  • Core Framework: GraphRAG-SDK
  • Document Processing: Unstructured-IO
  • Graph Database: FalkorDB
  • Container Runtime: Docker
  • CI/CD: GitHub Actions
  • Testing: pytest

Setup 🛠️

Local 🏠

  1. Create a virtual environment
python3 -m venv .venv
  2. Activate it
source .venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt
  4. Launch a FalkorDB instance in a new terminal (you can verify the connection with the sketch after this list)
docker run -p 6379:6379 -p 3000:3000 -it --rm -v ./data:/data falkordb/falkordb:edge
  5. Configure environment variables for your favorite LLM model in a .env file
# leave these empty if you want to use Ollama
OPENAI_API_KEY=""
GOOGLE_API_KEY=""

Containerization 🐙

  1. Run Docker Compose
docker-compose -f docker/docker-compose.yml up -d

How to use 🦥 🌿 🌸

First you have to give the program some files to process:

python main.py --folder <path_to_folder_with_files>

After the graph-rag engine finishes processing, you'll enter chat mode and can start asking questions about your data. (You can keep adding files with the same command.)
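Chat mode presumably maps onto GraphRAG-SDK's chat-session API (the console log in the Results section shows an OllamaChatSession at work). A sketch, assuming a KnowledgeGraph built as in the Project Overview snippet:

# `kg` is the KnowledgeGraph from the Project Overview sketch
chat = kg.chat_session()
print(chat.send_message("Name all of the members of the band BadBadNotGood"))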

Other commands are:

--delete-files

- Erase the previously generated graph and exit.

--delete-ontology

- Erase all internal files and exit.
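
A flag layout like this is straightforward to express with argparse; the following is only a sketch of what main.py's argument handling might look like (the helper functions are hypothetical stubs):

import argparse

def delete_generated_graph():  # hypothetical helper: drop the FalkorDB graph
    ...

def delete_internal_files():   # hypothetical helper: remove cached ontology files
    ...

parser = argparse.ArgumentParser(description="Local, private knowledge graph generator")
parser.add_argument("--folder", metavar="PATH", help="folder with files to process")
parser.add_argument("--delete-files", action="store_true",
                    help="erase the previously generated graph and exit")
parser.add_argument("--delete-ontology", action="store_true",
                    help="erase all internal files and exit")
args = parser.parse_args()

if args.delete_files:
    delete_generated_graph()
    raise SystemExit(0)
if args.delete_ontology:
    delete_internal_files()
    raise SystemExit(0)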

Results 📊

For basic testing, I processed two Wikipedia articles separately. The resulting graph can be inspected in the FalkorDB browser (port 3000) with:

MATCH (n) OPTIONAL MATCH (n)-[e]-(m) RETURN *
  • This is the console output:

    Console log

    Question: Retrieve all the drum equipment that you know

    ###############################################################
    ("I don't have any information about drum equipment.", <graphrag_sdk.models.ollama.OllamaChatSession object at 0x31cc3a7e0>)
    ###############################################################

    Question: Name all of the integrants of the band BadBadNotGood

    ###############################################################
    ("BadBadNotGood.
    
    (MATCH (b:Band {name: 'BadBadNotGood'})-[:MEMBER_OF]->(p:Person) RETURN b, p)
    
    Returns:
     b - Band {name: 'BadBadNotGood'}
      p - Person {name: 'Abe Rubenstein'}
      p - Person {name: 'Matt Huber'}"
    ###############################################################
  • In terms of the integration to support PPT, DOC, and PDF files, the key is selecting a reliable font that supports Unicode characters. In this case DejaVuSans was used and it achieved great results (see the sketch after this list).

  • This is an overview of the system:

[system overview diagram]

  • The system architecture is composed of two main services: the FalkorDB instance and the main process. If, for example, a suite of AWS services is chosen, a simple instance is more than enough for the FalkorDB service. A nice case to analyze would be assigning the main process to an instance such as an EC2 G4, which is specially prepared for ML inference with plenty of GPU power; combined with GraphRAG-SDK's ability to integrate Ollama models, a self-hosted LLM instance would be possible (at ~$0.50/hour, I think it is reasonable to try). Further accommodations for manual scaling with nginx could be made. On a production server I'd use AWS ECS (Elastic Container Service) instead of a plain Docker strategy.
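
On the Unicode point above: with fpdf2, for example, registering DejaVuSans looks like the snippet below (fpdf2 here is an assumption; the project may use a different PDF library, but the font-registration idea is the same):

from fpdf import FPDF  # fpdf2; library choice is an assumption

pdf = FPDF()
pdf.add_page()
# Register a TTF with broad Unicode coverage instead of the core Latin-1 fonts
pdf.add_font("DejaVu", fname="DejaVuSans.ttf")
pdf.set_font("DejaVu", size=12)
pdf.multi_cell(0, 8, "Unicode text renders correctly: café, naïve, 東京")
pdf.output("unicode_ok.pdf")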
