A Llama-2-7B chatbot with memory, running on CPU and optimized using SmoothQuant, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
meta cpu optimization chatbot intel llama numa int8 ipex 4-bit-cpu huggingface streamlit bfloat16 neural-compression chatgpt langchain llama2 meta-ai smooth-quantization chatbot-memory
Updated Feb 27, 2024 - Python