This project is a web scraping tool designed to extract product information from an e-commerce site based on specific search terms. The tool automates the process of data collection and organization for further analysis.
Run the following command to install the required modules:
pip install -r requirements.txt
The complete scraping script is located in the scraper.ipynb file. Execute its cells sequentially from the top.
You can use the provided search terms in ./queries/M11207321_queries.txt or prepare your own.
If you want to use custom search terms, save them in a .txt file with one search term per line:
筆電
衣服
餅乾
洗衣精
衛生紙
...
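A query file in this format can be read with a few lines of Python. Below is a minimal sketch; the function name `load_queries` is illustrative, not part of the provided notebook:

```python
from pathlib import Path

def load_queries(query_path: str) -> list[str]:
    """Read search terms from a UTF-8 text file, one term per line."""
    text = Path(query_path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and drop any blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `load_queries("./queries/M11207321_queries.txt")` returns the terms as a list of strings.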
Open scraper.ipynb and locate the Parameter Setting section. Here, you can define or modify the following parameters:
- student_id: Your student ID.
- query_path: The path to the search terms.
- results_path: The path where scraping results will be saved.
- search_url: The e-commerce site URL to scrape (must be the Taiwan Coupang page).
- short_time_sleep: Short wait time.
- medium_time_sleep: Medium wait time.
- long_time_sleep: Long wait time.
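A Parameter Setting cell covering the list above might look like the sketch below. All values are placeholders you must replace with your own; the sleep units are assumed to be seconds, and the actual Coupang URL is not filled in here:

```python
# Parameter Setting (example values -- replace with your own)
student_id = "M11207321"                        # your student ID, letters uppercased
query_path = "./queries/M11207321_queries.txt"  # path to the search-term file
results_path = "./results"                      # folder where .csv results are written
search_url = "https://..."                      # the Taiwan Coupang page URL goes here
short_time_sleep = 2    # assumed seconds: brief pause between small actions
medium_time_sleep = 5   # assumed seconds: wait for page elements to load
long_time_sleep = 10    # assumed seconds: wait after navigating to a new page
```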
During the scraping process, ensure you collect the following product information:
- Product Name
- Product Price
- Product URL
Save the collected data as a .csv file with the following columns:
- product_name: Product name
- product_price: Product price
- product_url: Product URL
Ensure the .csv file is encoded in UTF-8-SIG.
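Writing the three required columns with UTF-8-SIG encoding can be done with the standard `csv` module, as in this sketch (the helper name `save_results` is illustrative):

```python
import csv

def save_results(rows: list[dict], csv_path: str) -> None:
    """Write scraped rows to a CSV with the required columns and encoding."""
    fieldnames = ["product_name", "product_price", "product_url"]
    # utf-8-sig prepends a BOM so spreadsheet tools detect the encoding correctly.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```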
After scraping product data for each search term, save the results using the file naming convention StudentID_QueryName.csv, e.g., M11207321_口罩.csv (if your student ID contains letters, use uppercase letters).
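The naming convention can be enforced with a one-line helper, sketched below (`result_filename` is an illustrative name, not part of the notebook):

```python
def result_filename(student_id: str, query: str) -> str:
    """Build the StudentID_QueryName.csv filename, uppercasing letters in the ID."""
    return f"{student_id.upper()}_{query}.csv"
```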
If you need to submit the scraped data, place all result files for the search terms into a folder named after your student ID, then compress that folder into a .zip file named after your student ID, e.g., M11207321.zip (again, use uppercase letters if the ID contains any).
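The packaging step can be automated with `shutil.make_archive`. This sketch assumes the result files already sit in a folder named after the (uppercased) student ID in the current directory; the helper name is illustrative:

```python
import shutil

def make_submission_zip(student_id: str) -> str:
    """Zip the folder named after the uppercased student ID into StudentID.zip."""
    sid = student_id.upper()
    # base_dir=sid keeps the folder itself inside the archive;
    # make_archive appends the .zip extension automatically.
    return shutil.make_archive(sid, "zip", root_dir=".", base_dir=sid)
```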
- While scraping, you may open other windows, but do not close or minimize the Chrome window running the scraper (important).
- Ensure that the screen remains on during the scraping process (important).