This Rust project scrapes the top perfumes listed on Parfumo for different categories (Women, Men, Unisex) and saves the results into CSV files. The output includes the perfume's name, brand, URL, and image source.
- HTTP Request: Sends a GET request to the Parfumo website for each category.
- HTML Parsing: Parses the returned HTML using the `scraper` crate.
- Data Extraction:
  - Extracts perfume details using CSS selectors.
  - Handles missing or incomplete data gracefully.
- CSV Export: Writes the extracted data into category-specific CSV files.
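One way to handle missing or incomplete data is to model each field as an `Option` and substitute a placeholder only when writing a row. The following is a minimal standard-library sketch of that idea. The `Perfume` struct, the `clean` helper, and the placeholder value are hypothetical names chosen for illustration and are not taken from the project's source.

```rust
/// One scraped entry. Every field is optional because any of them may be
/// absent from the page. (Illustrative type, not the project's own.)
#[derive(Debug, Clone, PartialEq)]
pub struct Perfume {
    pub name: Option<String>,
    pub brand: Option<String>,
    pub url: Option<String>,
    pub image: Option<String>,
}

/// Trim a raw extracted value and treat empty or whitespace-only text as missing.
pub fn clean(raw: Option<&str>) -> Option<String> {
    raw.map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
}

impl Perfume {
    /// Convert the entry to four cells, filling gaps with `placeholder`.
    pub fn to_row(&self, placeholder: &str) -> [String; 4] {
        let cell = |v: &Option<String>| v.clone().unwrap_or_else(|| placeholder.to_owned());
        [
            cell(&self.name),
            cell(&self.brand),
            cell(&self.url),
            cell(&self.image),
        ]
    }
}
```

With this shape, a selector that matches nothing or an attribute that is absent simply yields `None`, and the row is still written instead of the whole scrape failing.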
- Scrape Perfume Data: Extracts the name, brand, URL, image source, and score for each perfume.
- Handles Multiple Categories: Scrapes data for Women, Men, and Unisex categories.
- CSV Export: Saves the scraped data in neatly formatted CSV files.
Ensure that you have Rust installed on your system. If not, you can install it using rustup:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
This project uses the following Rust crates:
- `reqwest`: For making HTTP requests.
- `scraper`: For parsing and scraping HTML content.
- `csv`: For writing CSV files.
Clone this repository to your local machine:
git clone https://github.com/yourusername/parfumo-scraper.git
cd parfumo-scraper
Run the scraper using `cargo`:
cargo run
The scraper will fetch perfume data from the following pages:
/Tops/Women
/Tops/Men
/Tops/Unisex
Each category's data is saved to a separate CSV file:
perfumes_Women.csv
perfumes_Men.csv
perfumes_Unisex.csv
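The mapping from category to page and output file can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and joining the listed paths onto `https://www.parfumo.com` is an assumption based on the host shown in the sample output.

```rust
/// Host taken from the sample output URLs; combining it with the
/// `/Tops/...` paths is an assumption.
const BASE_URL: &str = "https://www.parfumo.com";

/// The three categories named in this README.
pub const CATEGORIES: [&str; 3] = ["Women", "Men", "Unisex"];

/// Build the page URL for a category, e.g. ".../Tops/Women".
pub fn category_url(category: &str) -> String {
    format!("{}/Tops/{}", BASE_URL, category)
}

/// Build the output file name for a category, e.g. "perfumes_Women.csv".
pub fn output_file(category: &str) -> String {
    format!("perfumes_{}.csv", category)
}

/// Pair every category's URL with the file its results are saved to.
pub fn plan() -> Vec<(String, String)> {
    CATEGORIES
        .iter()
        .map(|c| (category_url(c), output_file(c)))
        .collect()
}
```

Keeping the category list in one place means adding a category only requires one new entry, with the URL and file name derived from it.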
Each CSV file will look like this:
| Name | Brand | URL | Image |
|---|---|---|---|
| Chanel No. 5 | Chanel | https://www.parfumo.com/Perfumes/... | https://imageurl.com/... |
| Sauvage | Dior | https://www.parfumo.com/Perfumes/... | https://imageurl.com/... |
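The table above is a rendered view. Assuming the file on disk is comma-separated, any field that contains a comma or a quote has to be quoted. The project relies on the `csv` crate, which handles this itself. The sketch below is a standard-library stand-in that only illustrates the quoting rule, and the names `escape_field` and `write_csv` are hypothetical.

```rust
use std::io::{self, Write};

/// Quote a field only when it contains a comma, a double quote, or a line
/// break, doubling any embedded quotes (the RFC 4180 convention).
pub fn escape_field(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_owned()
    }
}

/// Write the header and one line per record to any `Write` target,
/// such as a file or an in-memory buffer.
pub fn write_csv<W: Write>(mut out: W, rows: &[[&str; 4]]) -> io::Result<()> {
    writeln!(out, "Name,Brand,URL,Image")?;
    for row in rows {
        let line: Vec<String> = row.iter().map(|f| escape_field(f)).collect();
        writeln!(out, "{}", line.join(","))?;
    }
    Ok(())
}
```

Because `write_csv` accepts any `Write` implementation, the same function can target a `std::fs::File` for real output or a `Vec<u8>` in tests.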
Created by Antonio Djigo.