This README provides a basic description of the project.
Data is copied from a remote server to the local server by `copy_from_remote_server.py`. After copying, we validate that all data was delivered successfully. If the validation passes, the data is loaded to an S3 bucket. We then submit a Spark job (`main.py`) to the EMR cluster; the job calculates the metrics and loads them to the Redshift cluster, and it also writes Parquet files with the sorted data.
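Since Apache Airflow is part of the stack listed below, the flow above could be orchestrated as an Airflow DAG. The following is only a minimal sketch: the DAG id, task ids, the `validate_delivery.py` script name, the S3 bucket path, and the `spark-submit` invocation are illustrative assumptions, not the project's actual code.

```python
# Minimal Airflow DAG sketch of the pipeline described above (illustrative only).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {"owner": "airflow", "start_date": datetime(2020, 1, 1)}

with DAG(
    "remote_to_redshift_pipeline",   # hypothetical DAG id
    default_args=default_args,
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Step 1: copy data from the remote server to the local server
    copy_from_remote = BashOperator(
        task_id="copy_from_remote_server",
        bash_command="python copy_from_remote_server.py",
    )

    # Step 2: validate that all data was delivered successfully
    # (validate_delivery.py is a hypothetical placeholder for the check)
    validate_delivery = BashOperator(
        task_id="validate_delivery",
        bash_command="python validate_delivery.py",
    )

    # Step 3: load the validated data to the S3 bucket (bucket/path are placeholders)
    load_to_s3 = BashOperator(
        task_id="load_to_s3",
        bash_command="aws s3 cp /data/ s3://example-bucket/raw/ --recursive",
    )

    # Step 4: submit the Spark job (main.py) to the EMR cluster; it computes the
    # metrics, loads them to Redshift, and writes sorted Parquet files
    submit_spark_job = BashOperator(
        task_id="submit_spark_job",
        bash_command="spark-submit --master yarn main.py",
    )

    copy_from_remote >> validate_delivery >> load_to_s3 >> submit_spark_job
```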
Work in progress...
- Terraform (0.12.23)
- Ansible
- PySpark (2.4.4)
- Apache Airflow
- AWS Elastic Compute Cloud (EC2)
- AWS Elastic MapReduce (EMR)
- AWS Simple Storage Service (S3)
- AWS Redshift
- AWS Elastic Kubernetes Service (EKS)
- AWS Elastic File System (EFS)
- AWS Relational Database Service (RDS)