This Streamlit application allows you to ask questions about scientific documents and receive answers based on the content. It utilizes text extraction from various file formats and leverages the power of a pre-trained language model for information retrieval.
- Extract text from PDF, DOCX, HTML, and image files.
- Process extracted text into meaningful chunks.
- Handle user questions and provide answers based on the document content.
- Maintain a chat history for previous interactions.
This application relies on the meta_ai_api
library, which serves as a mock for a real Meta AI API. Thanks to the work of @Strvm at Strvm/meta-ai-api on this library, we don't require an actual Meta AI API key.
Please refer to the requirements.txt
file for a list of dependencies needed to run this application.
- Clone the repository:
git clone https://github.com/vansh-khaneja/Chat-Multiple-Docs-Indexify.git
cd Chat-Multiple-Docs-Indexify
- Install the required dependencies:
pip install -r requirements.txt
Start the Streamlit application with the following command:
streamlit run test.py
- Start the application.
- Select the file types you want to process (PDF, DOCX, HTML, Image).
- Upload your documents.
- Click the "Process" button to extract text.
- Once processed, ask a question about the document content in the text box.
- Click "Enter" or submit the question.
- The application will display an answer based on the extracted text and the pre-trained model.
We welcome contributions to this project! Feel free to fork the repository and submit pull requests with your enhancements.