Private Local OCR with Llama 3.2 Vision and Streamlit


Welcome to the Private Local OCR project! This tool uses Meta's Llama 3.2 Vision model via the Ollama platform, with a user-friendly Streamlit interface for efficient, private text extraction from images. It was created as part of an accompanying blog series.

Features

  • Local OCR Processing: Perform OCR tasks entirely on your local machine, ensuring data privacy and eliminating the need for internet connectivity.
  • Advanced Vision Model: Utilize Meta's Llama 3.2 Vision model for accurate text extraction.
  • User-Friendly Interface: Interact seamlessly through a Streamlit-based front-end, allowing easy image uploads and text viewing.
  • Support for Various Image Formats: Process multiple image formats, including JPEG, PNG, and BMP.
  • Extensible Architecture: Easily integrate additional functionalities or models as needed.
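
Under the hood, the OCR step comes down to a single chat call to the locally running model through the Ollama Python client. The snippet below is a minimal sketch of that call; the prompt wording and the "llama3.2-vision" model tag are assumptions and may differ from the repository's app.py.

    import ollama  # Python client for the local Ollama server

    def extract_text(image_path: str) -> str:
        """Ask the locally served Llama 3.2 Vision model to transcribe an image."""
        response = ollama.chat(
            model="llama3.2-vision",  # assumed model tag; use whatever tag you pulled
            messages=[{
                "role": "user",
                "content": "Extract all readable text from this image.",
                "images": [image_path],  # the Ollama client accepts file paths or raw bytes
            }],
        )
        return response["message"]["content"]

    print(extract_text("sample.png"))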

Prerequisites

  • Ollama Platform: Ensure the Ollama platform is installed and configured on your machine to access the Llama 3.2 Vision model.
  • Python 3.10 or Higher: Verify that Python 3.10 or a newer version is installed on your system.
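
To confirm both prerequisites, pull the vision model (for example with `ollama pull llama3.2-vision`; the exact tag is an assumption) and check that the local Ollama server responds. A minimal check using the Ollama Python client:

    import ollama

    # Lists the models the local Ollama server has available. If the call fails,
    # the Ollama service is not installed or not running; if no llama3.2-vision
    # entry appears, pull the model first with `ollama pull llama3.2-vision`.
    print(ollama.list())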

Installation

  1. Clone the Repository:

    git clone https://github.com/dwain-barnes/llama3.2-vision-ocr-streamlit.git
    cd llama3.2-vision-ocr-streamlit
  2. Set Up a Virtual Environment (optional but recommended):

    python3 -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install Dependencies:

    pip install -r requirements.txt

Usage

  1. Run the Streamlit Application:

    streamlit run app.py
  2. Access the Interface: Open your web browser and navigate to http://localhost:8501.

  3. Upload and Process Images:

    • Use the "Browse files" button to upload an image.
    • Click the "Process" button to extract text from the uploaded image using the Llama 3.2 Vision model via Ollama.
    • View the extracted text displayed on the interface.
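
As an illustration of how this flow fits together, here is a simplified sketch of a Streamlit app wired to Ollama. It is not a copy of the repository's app.py; the model tag and prompt are assumptions.

    import ollama
    import streamlit as st
    from PIL import Image

    st.title("Private Local OCR")

    # "Browse files" widget, restricted to the supported image formats.
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png", "bmp"])

    if uploaded is not None:
        st.image(Image.open(uploaded), caption="Uploaded image")

        if st.button("Process"):
            # Send the raw image bytes to the local Llama 3.2 Vision model.
            response = ollama.chat(
                model="llama3.2-vision",  # assumed model tag
                messages=[{
                    "role": "user",
                    "content": "Extract all readable text from this image.",
                    "images": [uploaded.getvalue()],  # raw bytes from the uploader
                }],
            )
            st.subheader("Extracted text")
            st.write(response["message"]["content"])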

Dependencies

  • Python 3.10 or higher
  • Streamlit
  • Ollama
  • Pillow
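
A requirements.txt consistent with this list would contain the following (unpinned here; the repository may pin specific versions):

    streamlit
    ollama
    pillow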

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your enhancements. Ensure that your code adheres to the project's coding standards and includes appropriate tests.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Special thanks to Meta for developing the Llama 3.2 Vision model and to the Ollama platform for providing access to advanced AI models. Their work has been instrumental in the development of this project.
