https://www.linkedin.com/posts/tirtha-debnath-57b6401a0_there-is-another-ai-ml-based-data-science-activity-7123755159486164992-skyd?utm_source=share&utm_medium=member_desktop
https://onlinecourseclassifier.onrender.com/
This ML model is designed especially for EdTech organizations that struggle to categorize their courses properly. It is built to help solve exactly this problem.
In EdTech startups, a content manager from a non-technical background often faces difficulties when organizing courses. Some courses require deep knowledge of the underlying tech stack to be organized properly on a website. For example, Data Science and Generative AI topics differ in important ways, but when it comes to separating them and launching distinct courses for each, a content manager can run into multiple difficulties. In such cases, this AI model can make the task easier.
This is one of the most complex projects I have built with PyTorch & Generative AI. Going from scraping data to building a multilabel model took several stages of complex code.
- Identifying the problem
- Identifying a website to scrape
- Building a web scraping script using Selenium & storing the data
- Initial training before building the multilabel model
- Multilabel classification
- ONNX deployment
- Deploying to Hugging Face
- Building a web application & deploying it to a server
- Conclusion
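The "Multilabel classification" stage treats each course as possibly belonging to several categories at once, so a course can sit under both Data Science and Generative AI instead of being forced into one. The sketch below illustrates that framing in pure Python; it is not the project's actual PyTorch code, and the label names are hypothetical. Targets are multi-hot vectors, each label gets its own independent sigmoid, and training uses per-label binary cross-entropy.

```python
import math

# Hypothetical label set; a real model would presumably use the scraped Udemy categories.
LABELS = ["Data Science", "Generative AI", "Web Development"]


def encode(course_labels, labels=LABELS):
    """Multi-hot target: 1.0 for every label the course carries, else 0.0."""
    present = set(course_labels)
    return [1.0 if name in present else 0.0 for name in labels]


def decode(logits, labels=LABELS, threshold=0.5):
    """Apply an independent sigmoid per label and keep every label at or above threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(labels, probs) if p >= threshold]


def bce_with_logits(logits, targets):
    """Mean per-label binary cross-entropy, computed in a numerically stable form."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Equivalent to -(y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

For example, `decode([2.0, 0.3, -3.0])` keeps both "Data Science" and "Generative AI", which a single-label softmax classifier picking one class could not do. In PyTorch, `torch.nn.BCEWithLogitsLoss` computes the same loss on tensors; the 0.5 threshold here is an assumed default and could be tuned per label.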
The purposes of this project are:
- Helping content managers organize their courses.
- Helping learning enthusiasts find suitable courses.
- Supporting content-labeling research.
To collect the dataset with a web scraping script, I chose udemy.com. Udemy is currently one of the biggest and best-known online course sites, and its courses are categorized in a detailed, well-organized way, which is why I chose it. Udemy has 12 main categories & 80+ subcategories, so the web scraping part was going to be a tough & challenging task.
The challenging part of building an efficient web scraping script was identifying the page elements that contain the desired data and interacting with the site. My approach was:
- Go to udemy.com
- Navigate to each main category
- Open each subcategory
- Gather course names, prices, categories & URLs
- Open each individual course page
- Combine all details with the course description
- Repeat the process for every category
Through these steps, I gathered 350k+ records across 12 main categories & 80+ subcategories of courses.
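The scraping loop described above can be outlined roughly as follows. This is a hedged sketch, not the author's script: the category tree is passed in rather than discovered by navigating udemy.com, and page access is injected as two hypothetical callables, `list_courses` and `get_description`. A real version would implement them with Selenium (for example `driver.get(url)` followed by `driver.find_elements(By.CSS_SELECTOR, ...)`, with selectors found by inspecting the site). Merging a course that appears under several subcategories into one record with several labels is also an assumption, chosen because it feeds naturally into a multilabel model.

```python
import csv
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


@dataclass
class Course:
    url: str
    name: str
    price: str
    description: str = ""
    # "Main/Sub" category labels; a course listed in several subcategories gets several.
    categories: set[str] = field(default_factory=set)


def crawl(
    category_tree: dict[str, list[str]],
    list_courses: Callable[[str, str], Iterable[dict]],
    get_description: Callable[[str], str],
) -> dict[str, Course]:
    """Walk main category -> subcategory -> course, merging duplicates by URL."""
    courses: dict[str, Course] = {}
    for main, subs in category_tree.items():
        for sub in subs:
            # Each row is expected to carry "url", "name" and "price" keys.
            for row in list_courses(main, sub):
                url = row["url"]
                if url not in courses:
                    # First sighting: open the course page once for its description.
                    courses[url] = Course(
                        url=url,
                        name=row["name"],
                        price=row["price"],
                        description=get_description(url),
                    )
                courses[url].categories.add(f"{main}/{sub}")
    return courses


def save_csv(courses: dict[str, Course], path: str) -> None:
    """Store one row per unique course; multiple labels are joined with '|'."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "name", "price", "categories", "description"])
        for c in courses.values():
            writer.writerow(
                [c.url, c.name, c.price, "|".join(sorted(c.categories)), c.description]
            )
```

Fetching each description only on a course's first sighting avoids reopening the same page once per subcategory when a course is listed in several places.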