This project demonstrates the integration of streaming transactional data with master data using the Mesh Join algorithm; the enriched result is then loaded into a Data Warehouse for analytical processing. The project uses ETL (Extract, Transform, Load) processes to enrich transactional data and store it in a normalized data warehouse schema ready for OLAP analysis.
- Mesh Join Integration: Combines real-time transactional data streams with master data for enriched datasets.
- ETL Process: Uses Extract, Transform, and Load processes to prepare and store data.
- Normalized Data Warehouse: Data is processed and stored in a structured schema (`tashi_dw`).
- OLAP-ready SQL Queries: Predefined queries for in-depth analytical insights on sales, revenue, and more.
- Real-time Data Processing: Designed for real-time integration of transactional data with master data.
- Scalable ETL Pipeline: Optimized for handling large datasets and real-time processing.
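To make the Mesh Join idea concrete, here is an illustrative, in-memory sketch of the algorithm's core loop. This is not the project's actual `MeshJoin.java` — class, record, and method names here are assumptions: stream tuples wait in a queue and hash table while partitions of the master relation cycle past them, and a batch of stream tuples is retired once it has been probed against every partition.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory sketch of MESHJOIN (names are assumptions, not the
// project's actual classes). Each iteration: admit a batch of stream tuples,
// probe all waiting tuples against one cycling master-data partition, and
// retire batches that have met all partitions.
public class MeshJoinSketch {
    public record Transaction(int productId, int quantity) {}
    public record Product(int productId, double price) {}
    public record Enriched(int productId, double sales) {}

    private static final class Batch {
        final List<Transaction> tuples = new ArrayList<>();
        int partitionsSeen = 0;
    }

    public static List<Enriched> meshJoin(List<Transaction> stream,
                                          List<List<Product>> partitions,
                                          int batchSize) {
        List<Enriched> results = new ArrayList<>();
        int k = partitions.size();
        Deque<Batch> queue = new ArrayDeque<>();                 // batches awaiting a full cycle
        Map<Integer, List<Transaction>> hash = new HashMap<>();  // productId -> waiting tuples
        int next = 0, partition = 0;

        while (next < stream.size() || !queue.isEmpty()) {
            // Admit up to batchSize new stream tuples into the queue and hash table.
            if (next < stream.size()) {
                Batch b = new Batch();
                for (int i = 0; i < batchSize && next < stream.size(); i++, next++) {
                    Transaction t = stream.get(next);
                    b.tuples.add(t);
                    hash.computeIfAbsent(t.productId(), id -> new ArrayList<>()).add(t);
                }
                queue.addLast(b);
            }

            // Probe every waiting tuple against the current master-data partition.
            for (Product p : partitions.get(partition)) {
                for (Transaction t : hash.getOrDefault(p.productId(), List.of())) {
                    results.add(new Enriched(t.productId(), p.price() * t.quantity()));
                }
            }
            partition = (partition + 1) % k;

            // Retire batches that have now been probed against all k partitions.
            for (Batch b : queue) b.partitionsSeen++;
            while (!queue.isEmpty() && queue.peekFirst().partitionsSeen == k) {
                for (Transaction t : queue.removeFirst().tuples) {
                    hash.get(t.productId()).remove(t);
                }
            }
        }
        return results;
    }
}
```

Because each batch stays in the queue for exactly one full cycle over the master partitions, every stream tuple meets every master tuple exactly once — the property that lets MESHJOIN join an unbounded stream against a disk-resident relation with bounded memory.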
Ensure you have the following software installed:
- Java Development Kit (JDK) (version 8 or higher)
- MySQL Database (Make sure MySQL is installed and running)
- CSV Files: The following CSV files for testing data loading and processing:
  - `transactions.csv`
  - `products_data.csv`
  - `customers_data.csv`
1. Create the master database (`tashi_db`): Run the provided SQL script to set up the `tashi_db` schema.
2. Insert Sample Data: Insert data into the `customers` and `products` tables in `tashi_db`.
3. Create the Data Warehouse (`tashi_dw`): Execute the SQL script for creating the data warehouse schema (`tashi_dw`).
- Open the project in an IDE (e.g., IntelliJ IDEA or Eclipse).
- Update the database connection settings in the Java code:

  ```java
  String dbUrlTashiDb = "jdbc:mysql://localhost:3306/tashi_db";
  String dbUrlTashiDw = "jdbc:mysql://localhost:3306/tashi_dw";
  String user = "root";
  String password = "your_password";
  ```
- Update the path to the transactional data CSV file in the code:

  ```java
  String csvFilePath = "C:/Users/USER/Downloads/transactions.csv";
  ```
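The transactional CSV is read line by line before the join. The sketch below shows one way to load it; the column order (`Transaction_ID`, `productid`, `customer_id`, `Quantity_Ordered`) is an assumption — verify it against the header of your actual `transactions.csv`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of loading the transactional CSV. The assumed column order is
// Transaction_ID, productid, customer_id, Quantity_Ordered -- check your file.
public class TransactionCsv {
    public record Row(int transactionId, int productId, int customerId, int quantity) {}

    static Row parseLine(String line) {
        String[] f = line.split(",");
        return new Row(Integer.parseInt(f[0].trim()),
                       Integer.parseInt(f[1].trim()),
                       Integer.parseInt(f[2].trim()),
                       Integer.parseInt(f[3].trim()));
    }

    public static List<Row> load(Path csv) throws IOException {
        List<Row> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(csv)) {
            String line = reader.readLine();        // skip the header row
            while ((line = reader.readLine()) != null) {
                if (!line.isBlank()) rows.add(parseLine(line));
            }
        }
        return rows;
    }
}
```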
Follow the steps below to run the project and create the Data Warehouse:
1. Create the Database & Warehouse:
   - Run the `createDB.sql` file in MySQL Workbench to set up the database (`tashi_db`), then run the `createDWH.sql` file to set up the data warehouse (`tashi_dw`).
2. Open Eclipse IDE:
   - Launch Eclipse IDE.
3. Import the Project:
   - In Eclipse, go to File > Open Projects from File System.
   - Use the directory option to select and import the project folder.
4. Complete the Project Import:
   - Click Finish. The project will now be loaded into Eclipse.
5. Configure the Build Path:
   - Right-click the project name in the Project Explorer.
   - Select Build Path > Configure Build Path.
6. Add External JAR:
   - In the Libraries section, click Classpath.
   - Click Add External JARs and select the `mysql-connector-java-8.0.27.jar` file.
7. Apply Changes:
   - Click Apply and Close to save the changes.
8. Locate MeshJoin.java:
   - In the src folder, find the `MeshJoin.java` file, then right-click it and select Run As > Java Application.
9. Start the Process:
   - Press Enter to begin the process, then wait for the message "MESHJOIN completed successfully" to confirm completion.
10. Verify Data Warehouse Creation:
    - Once the process completes, the data warehouse, populated via the extended Mesh Join algorithm, has been created successfully.
The `tashi_dw` schema includes the following tables:
- Customers: (`customer_id`, `customer_name`, `gender`)
- Products: (`productid`, `productName`, `productPrice`)
- Store: (`storeID`, `storeName`)
- Supplier: (`supplierID`, `supplierName`)
- Orders: (`Order_ID`, `Order_Date`)
- Transactions (Fact Table): (`Transaction_ID`, `productid`, `customer_id`, `time_id`, `storeID`, `supplierID`, `Order_ID`, `Sales`, `Quantity_Ordered`)
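During transformation, the fact table's `Sales` measure is presumably derived by multiplying the transaction's ordered quantity by the unit price looked up in the products master data. A hedged sketch of that enrichment step (method and map names here are illustrative assumptions, not the project's actual code):

```java
import java.util.Map;

// Illustrative derivation of the fact table's Sales measure: the transaction's
// ordered quantity times the unit price from the products master data.
// (Method and map names are assumptions about the transformation step.)
public class SalesCalc {
    public static double sales(int productId, int quantityOrdered,
                               Map<Integer, Double> priceByProduct) {
        Double price = priceByProduct.get(productId);
        if (price == null) {
            throw new IllegalArgumentException("No master-data price for product " + productId);
        }
        return price * quantityOrdered;
    }
}
```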
The repository includes SQL queries for:
- Analyzing sales trends.
- Calculating revenue contributions by products, stores, and suppliers.
- Identifying seasonal and outlier sales patterns.
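As an illustration of the style of these queries, a monthly revenue rollup over the `tashi_dw` schema might look like the following (a sketch only; the repository's actual predefined queries may differ):

```sql
-- Monthly revenue and units sold per product (illustrative sketch).
SELECT p.productName,
       DATE_FORMAT(o.Order_Date, '%Y-%m') AS month,
       SUM(t.Sales)                       AS total_revenue,
       SUM(t.Quantity_Ordered)            AS units_sold
FROM Transactions t
JOIN Products p ON p.productid = t.productid
JOIN Orders   o ON o.Order_ID  = t.Order_ID
GROUP BY p.productName, DATE_FORMAT(o.Order_Date, '%Y-%m')
ORDER BY total_revenue DESC;
```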
- Data Integration: Real-time enrichment of streaming data.
- ETL Optimization: Efficient data processing using Mesh Join.
- SQL Proficiency: Writing advanced queries for analytics.
- Data Warehousing: Designing and populating normalized schemas.
- Tashfeen Abbasi (abbasitashfeen7@gmail.com)