This is a basic news crawler application. A script crawls the Times of India news website every morning at 9 am and stores each headline, the headline's date, and related information in a database, so the headlines can be retrieved whenever needed. Cron jobs are used to schedule the crawl.
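The crawl, store, and retrieve flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `<h2 class="headline">` markup, the `headline` table, and all function names are hypothetical, the HTML is an inline sample rather than a live page, and an in-memory SQLite database stands in for the MySQL database the project uses.

```python
import sqlite3
from datetime import date
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text of every <h2 class="headline"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "h2" and ("class", "headline") in attrs:
            self._in_headline = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_headline:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            text = "".join(self._buffer).strip()
            if text:
                self.headlines.append(text)
            self._in_headline = False


def extract_headlines(html):
    """Return the headline strings found in an HTML document."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


def store_headlines(conn, headlines, crawl_date):
    """Save each headline together with the date it was crawled."""
    conn.execute("CREATE TABLE IF NOT EXISTS headline (title TEXT, crawl_date TEXT)")
    conn.executemany(
        "INSERT INTO headline (title, crawl_date) VALUES (?, ?)",
        [(title, crawl_date.isoformat()) for title in headlines],
    )
    conn.commit()


def headlines_for(conn, crawl_date):
    """Retrieve the headlines stored for a given date, in insertion order."""
    rows = conn.execute(
        "SELECT title FROM headline WHERE crawl_date = ? ORDER BY rowid",
        (crawl_date.isoformat(),),
    )
    return [row[0] for row in rows]


# Inline sample page; a real crawl would download the page first.
SAMPLE_HTML = """
<html><body>
  <h2 class="headline">First sample headline</h2>
  <p>Unrelated text</p>
  <h2 class="headline">Second <b>sample</b> headline</h2>
</body></html>
"""

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
store_headlines(conn, extract_headlines(SAMPLE_HTML), date(2024, 1, 1))
print(headlines_for(conn, date(2024, 1, 1)))
```

Storing the crawl date next to each title is what makes the "retrieve whenever you want" part a simple lookup by date.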
Here is how to run the code:
1. Create a directory named assignment using the command: mkdir assignment
2. Navigate to the directory using the command: cd assignment
3. Clone the project using the command: git clone https://github.com/itssunny322/NewsCrawler.git
4. Navigate to the directory that contains requirements.txt
5. Create a virtual environment using the command:
-> python3 -m venv env
6. Activate the virtual environment, then install the dependencies listed in requirements.txt, using the commands:
-> source env/bin/activate
-> pip install -r requirements.txt
7. Create the database using the commands:
-> mysql -u root -p
-> enter your MySQL root password when prompted
-> create database newsdb;
8. Navigate to the directory that contains manage.py
9. Add the scheduled task that runs the crawler every 24 hours using the command: -> python manage.py crontab add
10. Run the server using the command: -> python manage.py runserver
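For the database and scheduling steps to work, the Django settings need matching entries. The fragment below is a sketch of what they might look like, assuming the `crontab add` command comes from the django-crontab package. The dotted path `news.cron.crawl_headlines`, the user, host, and port are placeholders, not values taken from this repository; only the database name `newsdb` and the 9 am daily schedule come from the steps above.

```python
# settings.py fragment (sketch; adjust to the real project layout)

INSTALLED_APPS = [
    # ... the project's existing apps ...
    "django_crontab",  # provides the `python manage.py crontab add` command
]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "newsdb",  # the database created in step 7
        "USER": "root",  # assumption: the same user as in step 7
        "PASSWORD": "<your-password>",  # placeholder
        "HOST": "localhost",  # assumption: MySQL runs locally
        "PORT": "3306",  # MySQL's default port
    }
}

CRONJOBS = [
    # "0 9 * * *" means every day at 09:00.
    # The dotted path to the crawl function is hypothetical.
    ("0 9 * * *", "news.cron.crawl_headlines"),
]
```

With django-crontab, `python manage.py crontab show` lists the jobs that are currently registered and `python manage.py crontab remove` unregisters them.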
Technologies and tools used:
- Python
- Django, DRF
- HTML, CSS, JS
- PyCharm or VS Code