This project is a web scraping tool designed to extract product information from an e-commerce site based on specific search terms. The tool automates the process of data collection and organization for further analysis.
Run the following command to install the required modules:
pip install -r requirements.txt
The complete scraping script is located in the scraper.ipynb file. Execute its cells sequentially from the top.
You can use the provided search terms in ./queries/M11207321_queries.txt or prepare your own.
If you want to use custom search terms, save them in a .txt file with one search term per line:
筆電
衣服
餅乾
洗衣精
衛生紙
...
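A query file in this format can be read with a few lines of Python. Below is a minimal sketch; the function name `load_queries` is illustrative, not part of the provided notebook:

```python
from pathlib import Path

def load_queries(query_path: str) -> list[str]:
    """Read search terms from a UTF-8 text file, one term per line."""
    text = Path(query_path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and drop any blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `load_queries("./queries/M11207321_queries.txt")` returns the terms as a list of strings.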
Open scraper.ipynb and locate the Parameter Setting section. Here, you can define or modify the following parameters:
- student_id: Your student ID.
- query_path: The path to the search terms.
- results_path: The path where scraping results will be saved.
- search_url: The e-commerce site URL to scrape (must be the Taiwan Coupang page).
- short_time_sleep: Short wait time.
- medium_time_sleep: Medium wait time.
- long_time_sleep: Long wait time.
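A Parameter Setting cell covering the list above might look like the sketch below. All values are placeholders you must replace with your own; the sleep units are assumed to be seconds, and the actual Coupang URL is not filled in here:

```python
# Parameter Setting (example values -- replace with your own)
student_id = "M11207321"                        # your student ID, letters uppercased
query_path = "./queries/M11207321_queries.txt"  # path to the search-term file
results_path = "./results"                      # folder where .csv results are written
search_url = "https://..."                      # the Taiwan Coupang page URL goes here
short_time_sleep = 2    # assumed seconds: brief pause between small actions
medium_time_sleep = 5   # assumed seconds: wait for page elements to load
long_time_sleep = 10    # assumed seconds: wait after navigating to a new page
```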
During the scraping process, ensure you collect the following product information:
- Product Name
- Product Price
- Product URL
Save the collected data as a .csv file with the following columns:
- product_name: Product name
- product_price: Product price
- product_url: Product URL
Ensure the .csv file is encoded in UTF-8-SIG.
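Writing the three required columns with UTF-8-SIG encoding can be done with the standard `csv` module, as in this sketch (the helper name `save_results` is illustrative):

```python
import csv

def save_results(rows: list[dict], csv_path: str) -> None:
    """Write scraped rows to a CSV with the required columns and encoding."""
    fieldnames = ["product_name", "product_price", "product_url"]
    # utf-8-sig prepends a BOM so spreadsheet tools detect the encoding correctly.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```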
After scraping product data for each search term, save the results using the file naming convention StudentID_QueryName.csv, e.g., M11207321_口罩.csv (if your student ID contains letters, use uppercase letters).
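The naming convention can be enforced with a one-line helper, sketched below (`result_filename` is an illustrative name, not part of the notebook):

```python
def result_filename(student_id: str, query: str) -> str:
    """Build the StudentID_QueryName.csv filename, uppercasing letters in the ID."""
    return f"{student_id.upper()}_{query}.csv"
```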
If you need to submit the scraped data, place all result files for the search terms into a folder named after your student ID, then compress that folder into a .zip file named after your student ID, e.g., M11207321.zip (again, use uppercase letters if the ID contains any).
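The packaging step can be automated with `shutil.make_archive`. This sketch assumes the result files already sit in a folder named after the (uppercased) student ID in the current directory; the helper name is illustrative:

```python
import shutil

def make_submission_zip(student_id: str) -> str:
    """Zip the folder named after the uppercased student ID into StudentID.zip."""
    sid = student_id.upper()
    # base_dir=sid keeps the folder itself inside the archive;
    # make_archive appends the .zip extension automatically.
    return shutil.make_archive(sid, "zip", root_dir=".", base_dir=sid)
```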
- While scraping, you may open other windows, but do not close or minimize the Chrome window running the scraper (important).
- Ensure that the screen remains on during the scraping process (important).