This workshop simulates a streaming data pipeline using Kafka and Python to serve a regression model. For each record that arrives at the Kafka consumer, a prediction is made.
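The per-record flow described above can be sketched as follows. This is a minimal illustration, not the code in main.py: it assumes records arrive as JSON-encoded bytes, and the field name `x` and the `toy_model` function are hypothetical stand-ins for the real features and the pickled regression model. With a Kafka client, the `messages` iterable would be the values of the consumed messages.

```python
import json
from typing import Callable, Iterable, Iterator


def predict_stream(
    messages: Iterable[bytes],
    predict: Callable[[dict], float],
) -> Iterator[dict]:
    """Decode each JSON message and attach a prediction to it."""
    for raw in messages:
        record = json.loads(raw.decode("utf-8"))
        # One prediction per record, made as soon as the record arrives.
        yield {**record, "prediction": predict(record)}


def toy_model(record: dict) -> float:
    # Hypothetical stand-in for the pickled regression model.
    return 2.0 * record["x"] + 1.0


if __name__ == "__main__":
    # Simulated stream; with Kafka, these bytes would be the message values.
    stream = (json.dumps({"x": x}).encode("utf-8") for x in (1.0, 2.5))
    for scored in predict_stream(stream, toy_model):
        print(scored)
```

Because `predict_stream` is a generator, each record is scored when it is pulled from the stream rather than after the whole batch has been collected, which mirrors how a consumer loop handles incoming messages.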
workshop03_kafka/
├── 0_src/ # Source scripts
│ ├── __init__.py # Initialization file for the module
│ ├── Database.py # Database handling script
│ ├── df_test.py # DataFrame testing script
│ ├── kafka_test.py # Kafka testing script
│ ├── modelo_regresion.pkl # Pickled regression model
│ └── transform.py # Data transformation script
├── 1_edas/ # Exploratory Data Analysis
│ └── eda.ipynb # EDA Jupyter notebook
├── .gitignore # Ignored files for Git
├── docker-compose.yml # Docker Compose configuration
├── main.py # Main execution script
├── README.md # Project description and guide
└── requirements.txt # Project dependencies
Before getting started with this project, make sure you have the following components installed or ready:
- Apache Kafka
- Python
- A database (local or cloud-based; for a local setup, PostgreSQL is recommended)
- Docker
Here are the steps to set up your development environment:
-
Create a virtual environment: Run the following command to create a virtual environment called venv:
python -m venv venv
-
Activate your venv: Run the following command to activate the environment:
source venv/bin/activate
If there is no 'bin' folder (as on Windows), run the activate script in the 'Scripts' folder instead.
-
Install dependencies: Once the venv is active, run the following command to install the necessary dependencies:
pip install -r requirements.txt
-
Create pg_config: Create a JSON file called "pg_config" with the following content, replacing the values with your own database credentials:
{ "user" : "myuser", "passwd" : "mypass", "server" : "XXX.XX.XX.XX", "database" : "demo_db" }
-
Run Docker Compose: From the project folder, start the services and check that the containers are running:
- docker-compose up
- docker ps
Open a terminal and enter the container with:
- docker exec -it kafka-test bash
Create a new topic:
- kafka-topics --bootstrap-server kafka-test:9092 --create --topic kafka_workshop
-
Run main.py: At this point everything is ready and you can run:
python main.py
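As a reference for the pg_config step above, here is one way the file could be read and turned into a PostgreSQL connection URL. This is a sketch under assumptions: the helper names `build_pg_url` and `load_pg_url` are hypothetical, the example values are placeholders, and Database.py may connect differently. The keys match the ones shown in the pg_config example.

```python
import json
from urllib.parse import quote_plus


def build_pg_url(config: dict) -> str:
    """Build a PostgreSQL connection URL from the pg_config keys."""
    user = quote_plus(config["user"])
    # Escape characters such as '@' or '/' that would break the URL.
    passwd = quote_plus(config["passwd"])
    return f"postgresql://{user}:{passwd}@{config['server']}/{config['database']}"


def load_pg_url(path: str) -> str:
    """Read the JSON config at `path` (assumed location) and build the URL."""
    with open(path, encoding="utf-8") as fh:
        return build_pg_url(json.load(fh))


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    example = {
        "user": "myuser",
        "passwd": "my@pass",
        "server": "localhost",
        "database": "demo_db",
    }
    print(build_pg_url(example))
```

Keeping the credentials in a separate file that is listed in .gitignore avoids committing them to the repository.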
If you have any questions or suggestions, feel free to contact me at [lapiceroazul@proton.me].