This repository provides a collection of Retrieval-Augmented Generation (RAG) implementations built with a range of modern frameworks and tools. Each implementation demonstrates a different approach to building a robust RAG pipeline.
- Tech Stack: Verba, Weaviate, OpenAI
- Key Features:
  - PDF document ingestion and processing
  - Document chunking with overlap
  - Vector storage using Weaviate
  - Conversational retrieval using GPT models
  - Source attribution for answers
- Best For: Document-heavy applications requiring precise retrieval and attribution
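The chunking-with-overlap step above is framework-agnostic, so it can be illustrated without Verba's own API. Below is a minimal sketch; the `chunk_size` and `overlap` values are illustrative, not Verba's defaults:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so information that
    spans a chunk boundary is still fully contained in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# A 1200-character document becomes three overlapping chunks.
print(len(chunk_text("x" * 1200)))  # 3
```

Each chunk is then embedded and stored in Weaviate, and retrieved chunks keep their source metadata so answers can cite where they came from.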
- Tech Stack: LangChain, ChromaDB, OpenAI
- Key Features:
  - Processing of multiple PDF documents
  - Automatic text chunking
  - Semantic search using ChromaDB
  - Conversational QA interface
  - Chat history support
- Best For: Complex RAG pipelines with multiple components
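A conversational pipeline of this shape can be sketched in a few lines. The import paths below assume a recent LangChain release split into the `langchain-openai` and `langchain-community` packages and differ in older versions; the file name and model are placeholders, and this is not the repository's exact code:

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain

docs = PyPDFLoader("example.pdf").load()                       # load one PDF
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(docs)  # automatic chunking
vectordb = Chroma.from_documents(splits, OpenAIEmbeddings())   # semantic index in ChromaDB

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)

chat_history = []  # (question, answer) tuples carried across turns
result = chain.invoke({"question": "What is the document about?",
                       "chat_history": chat_history})
print(result["answer"])
```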
- Tech Stack: LlamaIndex, OpenAI, Vector Store
- Key Features:
  - Advanced data connectors
  - Structured data handling
  - Optimized query engines
  - Custom data indexing
- Best For: Data-intensive applications with diverse sources
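In LlamaIndex the ingest-index-query loop is compact. A minimal sketch, assuming the `llama_index.core` package layout of recent releases (older versions import from `llama_index` directly) and a local `data/` folder of documents:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data/").load_data()    # data connector: read local files
index = VectorStoreIndex.from_documents(documents)        # build the vector index
query_engine = index.as_query_engine(similarity_top_k=3)  # retrieval + answer synthesis
print(query_engine.query("Summarise the key findings."))
```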
- Tech Stack: Phoenix, Vector DB
- Key Features:
  - Real-time vector indexing
  - High-performance retrieval
  - Scalable architecture
  - Modern deployment options
- Best For: High-performance, scalable RAG systems
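The Phoenix implementation's API is not reproduced here; the core idea behind real-time vector indexing, where newly added vectors become searchable immediately, can be sketched framework-agnostically. The class and method names below are purely illustrative and are not part of Phoenix:

```python
import numpy as np

class InMemoryVectorIndex:
    """Toy incremental vector index: vectors can be added at any time and
    are immediately searchable, using cosine similarity for retrieval."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.payloads: list[str] = []

    def add(self, vector: np.ndarray, payload: str) -> None:
        v = vector / np.linalg.norm(vector)          # normalise once on insert
        self.vectors = np.vstack([self.vectors, v])
        self.payloads.append(payload)

    def search(self, query: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        q = query / np.linalg.norm(query)
        scores = self.vectors @ q                    # cosine similarity against all vectors
        top = np.argsort(scores)[::-1][:k]
        return [(self.payloads[i], float(scores[i])) for i in top]

index = InMemoryVectorIndex(dim=4)
index.add(np.array([1.0, 0.0, 0.0, 0.0]), "doc about apples")
index.add(np.array([0.0, 1.0, 0.0, 0.0]), "doc about oranges")
print(index.search(np.array([0.9, 0.1, 0.0, 0.0]), k=1))
```

A production system replaces this in-memory array with an approximate-nearest-neighbour index to keep retrieval fast as the collection grows.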
- Tech Stack: MongoDB Atlas, Vector Search
- Key Features:
  - Atlas Vector Search integration
  - Enterprise-grade security
  - Scalable document storage
  - Native vector indexing
- Best For: Enterprise applications requiring robust data management
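Retrieval against Atlas Vector Search is typically a `$vectorSearch` aggregation stage issued through `pymongo`. A sketch, in which the connection string, database, collection, index name, and embedding field are placeholders that depend on your own Atlas setup:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
collection = client["rag_db"]["documents"]

query_embedding = [0.01] * 1536  # normally produced by an embedding model

results = collection.aggregate([
    {
        "$vectorSearch": {
            "index": "vector_index",      # name of the Atlas vector index
            "path": "embedding",          # field holding the stored vectors
            "queryVector": query_embedding,
            "numCandidates": 100,         # candidates scanned before ranking
            "limit": 5,                   # top-k documents returned
        }
    },
    {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
])
for doc in results:
    print(doc["text"], doc["score"])
```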
- Tech Stack: Haystack Framework
- Key Features:
  - Modular pipeline architecture
  - Multiple retriever options
  - Production-ready components
  - Flexible deployment
- Best For: Production-grade search and QA systems
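A minimal Haystack pipeline with a single retriever component, assuming the Haystack 2.x API (component and import paths differ in 1.x); the documents are toy examples:

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

store = InMemoryDocumentStore()
store.write_documents([
    Document(content="RAG combines retrieval with generation."),
    Document(content="Haystack pipelines are built from swappable components."),
])

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=store))

result = pipeline.run({"retriever": {"query": "What are Haystack pipelines?", "top_k": 1}})
print(result["retriever"]["documents"][0].content)
```

Swapping the BM25 retriever for an embedding retriever, or adding a generator component after it, only changes which components are wired into the pipeline.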
- Tech Stack: NeMo Guardrails, LLM
- Key Features:
  - Content filtering
  - Topic boundaries
  - Conversation flow control
  - Safety mechanisms
- Best For: Applications requiring controlled AI interactions
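Guardrails wrap the LLM call rather than the retriever. A minimal sketch using the NeMo Guardrails Python API, assuming a local `./config` directory containing the YAML and Colang rail definitions (the user message is illustrative):

```python
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # loads config.yml plus *.co rail files
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Tell me about the refund policy."}
])
print(response["content"])  # reply after input/output rails have been applied
```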
Each implementation directory contains:
- Detailed README with setup instructions
- Complete source code
- Configuration examples
- Usage demonstrations
- Testing scripts
- Python 3.9+
- Conda (for environment management)
- Relevant API keys (OpenAI, etc.)
- Vector store setup (varies by implementation)
- Clone the repository:
  ```bash
  git clone <repository-url>
  cd RAG-Toolkit
  ```
- Choose an implementation directory
- Follow the specific README instructions
- Set up required API keys and services
- Run the example code
| Implementation | Vector Store | LLM Support | Document Types | Deployment Complexity |
|---|---|---|---|---|
| Verba | Weaviate | OpenAI | PDF | Medium |
| LangChain | ChromaDB | Multiple | Multiple | Low |
| LlamaIndex | Multiple | Multiple | Multiple | Medium |
| Phoenix | Custom Vector DB | Multiple | Multiple | High |
| MongoDB | Atlas Search | Multiple | Multiple | Medium |
| Haystack | Multiple | Multiple | Multiple | Medium |
| NeMo | - | Multiple | Text | Low |
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.