(This is a work in progress.)
- Technical Prerequisites, Guidance
- Source Content Considerations
- RAG Pipeline Considerations
- Project Design Considerations
Deckard's architecture is based on a core concept of 'data units': representations of one or more ideas in textual form.
Collectors gather data units from one or more sources and transform them into plain-text representations. Collector modules can be written to access data via the web, from a database, or from another endpoint. A pipeline can use multiple collectors.
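As a rough illustration of the collector role, the sketch below shows one way several collectors could feed a single pipeline. All names here (`DataUnit`, `Collector`, `InMemoryCollector`, `collect_all`) are hypothetical and are not taken from Deckard's code; the in-memory source stands in for a web, database, or other endpoint.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Mapping, Protocol


@dataclass
class DataUnit:
    """Hypothetical container: a source identifier plus plain text."""
    source_id: str
    text: str


class Collector(Protocol):
    """Assumed interface: anything that can yield data units."""

    def collect(self) -> Iterator[DataUnit]: ...


class InMemoryCollector:
    """Stand-in for a real source (web page, database row, API response)."""

    def __init__(self, items: Mapping[str, str]) -> None:
        self.items = items

    def collect(self) -> Iterator[DataUnit]:
        for source_id, raw in self.items.items():
            # Reduce the raw content to plain text by collapsing whitespace.
            yield DataUnit(source_id=source_id, text=" ".join(raw.split()))


def collect_all(collectors: Iterable[Collector]) -> List[DataUnit]:
    """Run every collector in order and gather the resulting data units."""
    return [unit for collector in collectors for unit in collector.collect()]


units = collect_all([
    InMemoryCollector({"a": "  hello   world "}),
    InMemoryCollector({"b": "second source"}),
])
```

Keeping the interface down to a single `collect` method is a design choice made for this sketch: it lets a pipeline treat every source the same way regardless of where the text comes from.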
Chunkers divide data units into smaller fragments called 'chunks'. By chunking data, we:
- Allow relevant information from the data unit's content to fit within the (limited) LLM context space.
- Segregate semantic concepts within the data unit for improved matching in the RAG pipeline.
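The first goal above (fitting content into a limited context window) can be sketched with a simple word-window chunker. This is a minimal example under stated assumptions, not Deckard's chunker: the function name `chunk_words` and its `max_words` and `overlap` parameters are hypothetical, and word counts stand in for real token counts. A chunker aimed at the second goal would more likely split on paragraph or heading boundaries.

```python
from typing import List


def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> List[str]:
    """Split text into windows of at most max_words words.

    Consecutive windows share `overlap` words so that an idea spanning a
    boundary still appears whole in at least one chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


example = chunk_words("a b c d e f g", max_words=4, overlap=1)
# example == ["a b c d", "d e f g"]
```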
Encoders map chunks into vector representations.
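To make the encoder's input and output concrete, here is a toy encoder that hashes words into a fixed-size vector. It is only an illustration of the chunk-to-vector mapping: a real pipeline would most likely use a learned embedding model, and the names `encode` and `cosine` and the `dim` parameter are assumptions made for this sketch.

```python
import hashlib
import math
from typing import List


def encode(chunk: str, dim: int = 64) -> List[float]:
    """Map a chunk to a unit-length vector using the hashing trick."""
    vec = [0.0] * dim
    for token in chunk.lower().split():
        # A stable hash keeps the mapping identical across runs.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty chunk has no tokens, so it stays the zero vector.
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; for unit-length vectors this is the dot product."""
    return sum(x * y for x, y in zip(a, b))


vector = encode("chunks become vectors")
```

Chunks that share words land in the same vector positions, so their `cosine` score rises; this is the property the later retrieval step relies on, although learned embeddings capture meaning far better than word overlap.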
Database modules provide an API for reading from and writing to local storage.
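One possible shape for such a module is sketched below, assuming SQLite as the local store and JSON-encoded vectors. The class name `ChunkStore`, its methods, and the table layout are hypothetical and chosen only for illustration.

```python
import json
import sqlite3
from typing import List, Tuple


class ChunkStore:
    """Hypothetical storage API backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            "id INTEGER PRIMARY KEY, "
            "source_id TEXT NOT NULL, "
            "text TEXT NOT NULL, "
            "vector TEXT NOT NULL)"
        )

    def add(self, source_id: str, text: str, vector: List[float]) -> int:
        """Store one chunk with its vector and return the new row id."""
        cursor = self.conn.execute(
            "INSERT INTO chunks (source_id, text, vector) VALUES (?, ?, ?)",
            (source_id, text, json.dumps(vector)),
        )
        self.conn.commit()
        return cursor.lastrowid

    def all(self) -> List[Tuple[str, str, List[float]]]:
        """Return every stored chunk in insertion order."""
        rows = self.conn.execute(
            "SELECT source_id, text, vector FROM chunks ORDER BY id"
        )
        return [(s, t, json.loads(v)) for s, t, v in rows]


store = ChunkStore()
row_id = store.add("a", "hello", [0.5, 0.25])
```

Hiding the SQL behind `add` and `all` means the rest of the pipeline does not need to know which storage engine is in use.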
Interfaces expose data to users.
RAG builders build the underlying data needed for RAG queries.