Potential data format that gives the courses to take and the reasoning behind them. Will be replaced with LPI Dataset courses/info.
input: I want to learn about llm and how to finetune them. I'm intermediate and i want to build a rag pipeline
output: Given your intermediate level and interest in building a RAG pipeline using LLMs, I recommend the following courses and modules:
| Course/Module | Source | Level | Duration (estimate) | Keywords | Reason |
|---|---|---|---|---|---|
| Introduction to Large Language Models (LLMs) | Various online courses (e.g., Coursera, edX) | Beginner/Intermediate | Varies (4-8 weeks) | LLMs, transformers, attention mechanisms, language modeling, tokenization | Provides foundational knowledge of LLMs, their architecture, and capabilities. Essential before tackling fine-tuning or RAG. |
| Natural Language Processing (NLP) Fundamentals | Various online courses (e.g., Stanford NLP) | Intermediate | Varies (6-10 weeks) | NLP, text preprocessing, word embeddings, sentiment analysis, named entity recognition | Necessary for understanding how LLMs process and understand text. Many RAG techniques rely on NLP for data preprocessing and query understanding. |
| Fine-tuning LLMs | Hugging Face Course, Papers with Code | Intermediate/Advanced | Varies (2-4 weeks) | Fine-tuning, transfer learning, hyperparameter tuning, model evaluation | Teaches you how to adapt pre-trained LLMs to specific tasks, crucial for building a high-performing RAG system. Learn techniques like prompt engineering. |
| | | | | | Focuses on the architecture and implementation of RAG pipelines. You'll learn to select and integrate components like vector databases (e.g., Pinecone, Weaviate) and retrieval methods. |
| Python for Data Science (if needed) | DataCamp, Codecademy, Fast.ai | Intermediate | Varies (2-4 weeks) | Python, pandas, numpy, scikit-learn | Reinforce your Python skills for data manipulation and model building within your RAG pipeline. |
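As a minimal sketch of how one input/output pair like the example above could be stored, the snippet below serializes it as a single JSON Lines record. The field names (`input`, `output`, `summary`, `recommendations`) and the `CourseRecommendation` dataclass are hypothetical choices that mirror the table columns; they are not an agreed-upon schema.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical record layout mirroring the table columns above.
@dataclass
class CourseRecommendation:
    course: str
    source: str
    level: str
    duration: str
    keywords: list[str]
    reason: str


def to_jsonl_line(user_input: str, intro: str, recs: list[CourseRecommendation]) -> str:
    """Serialize one input/output training pair as a single JSON Lines record."""
    record = {
        "input": user_input,
        "output": {
            "summary": intro,
            "recommendations": [asdict(r) for r in recs],
        },
    }
    # json.dumps without indent emits one line, as JSON Lines requires.
    return json.dumps(record, ensure_ascii=False)


example = to_jsonl_line(
    "I want to learn about llm and how to finetune them. I'm intermediate and i want to build a rag pipeline",
    "Given your intermediate level and interest in building a RAG pipeline using LLMs, "
    "I recommend the following courses and modules:",
    [
        CourseRecommendation(
            course="Fine-tuning LLMs",
            source="Hugging Face Course, Papers with Code",
            level="Intermediate/Advanced",
            duration="Varies (2-4 weeks)",
            keywords=["Fine-tuning", "transfer learning", "hyperparameter tuning", "model evaluation"],
            reason="Teaches you how to adapt pre-trained LLMs to specific tasks.",
        )
    ],
)
print(example)
```

Keeping the recommendations structured (rather than one pre-rendered string) means the same record can later be rendered as a table or flattened into plain text for whatever fine-tuning format is chosen.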
As discussed on Discord, you should create a new dataset based on the LPI dataset and use it for your training and other purposes.
The LPI dataset is still needed for our other purposes (and DOES NOT need to be replaced). It also holds metadata about the courses, which is additional information that can be used if needed.
Can you please amend your wording to avoid confusing others who are referring to or reading this issue?
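A minimal sketch of deriving a record for the new dataset from LPI course metadata while leaving the LPI data itself untouched. The LPI field names (`title`, `level`, `keywords`), the sample entries, and the keyword-overlap matching are all assumptions for illustration; the real LPI schema and selection logic may differ.

```python
from typing import Iterable

# Hypothetical LPI course entries; field names are assumptions, not the real LPI schema.
SAMPLE_LPI_COURSES = [
    {"title": "Fine-tuning LLMs", "level": "Intermediate", "keywords": ["Fine-tuning", "LLMs"]},
    {"title": "Python Basics", "level": "Beginner", "keywords": ["Python"]},
]


def build_training_pair(query: str, level: str, interests: set[str],
                        lpi_courses: Iterable[dict]) -> dict:
    """Derive one record for the new dataset from LPI metadata.

    The LPI entries are only read, never modified, so the original
    dataset stays intact for its other uses.
    """
    wanted = {i.lower() for i in interests}
    recommendations = []
    for course in lpi_courses:
        overlap = wanted & {k.lower() for k in course["keywords"]}
        # Keep courses at the learner's level that share at least one keyword.
        if course["level"] == level and overlap:
            recommendations.append({
                "course": course["title"],
                "reason": "Matches your interest in " + ", ".join(sorted(overlap)),
            })
    return {"input": query, "output": recommendations}


pair = build_training_pair(
    "I want to learn about llm and how to finetune them.",
    "Intermediate",
    {"llms", "fine-tuning"},
    SAMPLE_LPI_COURSES,
)
print(pair)
```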