Documenting SQL case studies from Danny Ma's 8 Week SQL Challenge for learning and practice purposes.
Install Docker and Docker Compose, then start the containers:
$ docker compose up
SQLPad can be accessed at http://localhost:3000 or at the port specified in the compose.yml file.
Stop and remove the containers with:
$ docker compose down
This project uses uv for dependency management. Sync the environment with:
$ uv sync --frozen
The Athena class can be used to interact with Amazon Athena. To use it, the principal whose credentials are used to access AWS must have the necessary permissions for Athena, plus S3 if a non-default bucket is used to store the query results (see below for more details).
The credentials of such a principal can be encapsulated in a boto3 session instance, which is passed as the first argument to the constructor of the Athena class. The create_session utility function can be used to create the session instance.
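As a rough sketch of how the pieces fit together (the import paths and the create_session signature are assumptions; adjust them to the actual module layout):

from utils import create_session  # hypothetical import path
from athena import Athena         # hypothetical import path

# Build a boto3 session for a principal with the required Athena (and S3) permissions.
session = create_session(profile_name="profile-name")  # signature assumed

# The session is passed as the first argument to the Athena constructor.
athena = Athena(session)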
The Parquet data files for the case studies must be stored in an S3 bucket. All DDL queries are stored in the sql directory under each case study directory; these must be adjusted to point to the correct S3 URLs. The data files can be uploaded to an S3 bucket using the AWS CLI or the console.
# Create a bucket
$ aws s3api create-bucket --bucket sql-case-studies --profile profile-name
# Upload all data files to the bucket
$ aws s3 cp data/ s3://sql-case-studies/ --recursive --profile profile-name
Optionally, query results can be configured to be stored in a non-default S3 bucket (the default is aws-athena-query-results-accountid-region). The query result S3 URL can be stored as an environment variable, e.g. ATHENA_S3_OUTPUT=s3://bucket-name/path/to/output/, which can then be passed as the s3_output argument to the Athena class constructor. The client creates the default bucket if the s3_output argument is not provided.
import os
s3_output = os.getenv('ATHENA_S3_OUTPUT', '')
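The value can then be passed as the s3_output argument, together with the session from the earlier sketch:

if s3_output:
    athena = Athena(session, s3_output=s3_output)
else:
    # Without s3_output, the client creates and uses the default results bucket.
    athena = Athena(session)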
Each case study folder contains a notebooks directory with Jupyter notebooks that can be used to run SQL queries.
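Inside a notebook, running a query might look roughly like the following; the run_query method name and the sales table are purely illustrative and not the actual API:

# Hypothetical usage inside a notebook cell; substitute the real method
# exposed by the Athena class and a table defined in the case study's DDL.
query = """
SELECT customer_id, COUNT(*) AS order_count
FROM sales
GROUP BY customer_id
"""
results = athena.run_query(query)  # method name assumed
print(results)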