Project built with:
- Orchestration: dagster
- Transformation & Testing: dbt
- Processing Engine & Database: DuckDB
- Data Source: Ergast API
[Figure: cutout of the Dagster lineage graph]
Requirements:
- Python >= 3.10 (https://www.python.org/downloads/)
This project uses pyproject.toml to describe package metadata (see PEP 621) and uv to manage dependencies.
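The following commands create and activate a virtual environment:
- The [dev] extra also installs the development tools.
- The --editable flag installs the package in editable (development) mode, which makes the CLI script available and picks up source changes without reinstalling.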
Commands:
- Bash:
  make requirements
  source .venv/bin/activate
- Windows:
  python -m venv .venv
  .venv\Scripts\activate
  python -m pip install --upgrade uv
  uv pip install --editable .[dev]
Dagster loads environment variables from the .env file.
Start the local Dagster server:
dagster dev
Launch a Dagster job without starting the UI:
dagster job execute -m foneplatform -j ergast_job
Development tools:
- Code linting/formatting: ruff
- Type checking: mypy
- SQL linting/formatting: sqlfluff
Ideally, set the environment variable DATA_DIR to the directory that will hold both the DuckDB database and the F1 data (the fallback is "data" within the project directory). Dagster reads this path from .env.
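For ad-hoc scripts that run outside Dagster, the same lookup can be approximated. The sketch below is not the project's code: the helper names read_dotenv and resolve_data_dir are hypothetical, the .env parser only understands plain KEY=VALUE lines and "#" comments (no quoting or variable expansion), and letting an already exported DATA_DIR win over the .env value is an assumption.

```python
import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse a minimal .env file: KEY=VALUE lines and '#' comments only (no quoting or expansion)."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_data_dir(project_dir: Path) -> Path:
    """Return DATA_DIR from the process environment, then from .env, else <project_dir>/data."""
    value = os.environ.get("DATA_DIR") or read_dotenv(project_dir / ".env").get("DATA_DIR")
    # An unset or empty DATA_DIR falls back to the "data" folder inside the project.
    return Path(value) if value else project_dir / "data"
```

Calling resolve_data_dir(Path(".")) from the project root would then return the directory that holds f1.duckdb and the ergast folder.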
The data directory will look like this:
data
├── f1.duckdb
└── ergast
    ├── circuits.parquet
    ├── constructors.parquet
    ├── drivers.parquet
    ...
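The layout can be expressed as a few path helpers, for example to check which files exist before inspecting the data. This is a hypothetical sketch: the function names are illustrative, and ERGAST_TABLES lists only the three tables named in the tree (the "..." stands for further tables not spelled out here).

```python
from pathlib import Path

# Only the tables named in the tree above; the real download contains more.
ERGAST_TABLES = ("circuits", "constructors", "drivers")


def duckdb_path(data_dir: Path) -> Path:
    """Location of the DuckDB database file inside DATA_DIR."""
    return data_dir / "f1.duckdb"


def parquet_path(data_dir: Path, table: str) -> Path:
    """Location of one staged Ergast table, e.g. <DATA_DIR>/ergast/drivers.parquet."""
    return data_dir / "ergast" / f"{table}.parquet"


def missing_files(data_dir: Path, tables: tuple[str, ...] = ERGAST_TABLES) -> list[Path]:
    """Return the expected files that do not exist yet, as a quick sanity check."""
    expected = [duckdb_path(data_dir)] + [parquet_path(data_dir, t) for t in tables]
    return [p for p in expected if not p.exists()]
```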
Staging is done by a Dagster multi-asset (./foneplatform/assets/ergast.py):
- Download the ZIP of CSV files (http://ergast.com/downloads/f1db_csv.zip)
- Read the CSV files using DuckDB and store the asset results as Parquet using the LocalParquetIOManager
- (dbt then creates views on top of the external Parquet files)
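The staging steps above can be approximated with the standard library plus generated DuckDB SQL. This is a sketch under stated assumptions, not the contents of ergast.py: it writes Parquet with a DuckDB COPY statement instead of going through the LocalParquetIOManager, the function names are hypothetical, and the view statement only illustrates the idea of a view over an external Parquet file; the actual dbt models may define their views differently.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

ERGAST_ZIP_URL = "http://ergast.com/downloads/f1db_csv.zip"


def download_zip(url: str = ERGAST_ZIP_URL) -> bytes:
    """Fetch the ZIP archive of CSV files into memory."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def extract_csvs(zip_bytes: bytes, target_dir: Path) -> dict[str, Path]:
    """Write every .csv member of the archive to target_dir; map table name -> CSV path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    tables: dict[str, Path] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            # Keep only the file name so archive folders cannot escape target_dir.
            name = Path(member).name
            if not name.endswith(".csv"):
                continue
            csv_path = target_dir / name
            csv_path.write_bytes(archive.read(member))
            tables[csv_path.stem] = csv_path
    return tables


def csv_to_parquet_sql(csv_path: Path, parquet_path: Path) -> str:
    """DuckDB COPY statement that converts one CSV file into one Parquet file."""
    # Paths are inlined as SQL string literals; this sketch assumes they contain no single quotes.
    return (
        f"COPY (SELECT * FROM read_csv_auto('{csv_path.as_posix()}')) "
        f"TO '{parquet_path.as_posix()}' (FORMAT PARQUET)"
    )


def external_view_sql(table: str, parquet_path: Path) -> str:
    """Illustrative view over an external Parquet file."""
    return (
        f"CREATE OR REPLACE VIEW {table} AS "
        f"SELECT * FROM read_parquet('{parquet_path.as_posix()}')"
    )
```

With the duckdb Python package installed, each generated statement could be executed via duckdb.connect(database_path).execute(sql).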
Install dbt_utils:
dbt deps --project-dir="./dbt" --profiles-dir="./dbt"
Run models:
dbt run --project-dir="./dbt" --profiles-dir="./dbt"
Run tests:
dbt test --project-dir="./dbt" --profiles-dir="./dbt"
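All three dbt commands share the same --project-dir and --profiles-dir flags, so a small wrapper can build them in one place. This is a hypothetical helper (dbt_args and run_dbt are illustrative names) that simply shells out to the dbt CLI; recent dbt-core versions also offer a programmatic dbtRunner, which is not used here.

```python
import subprocess

# Project and profiles live in the same directory, as in the commands above.
DBT_DIR = "./dbt"


def dbt_args(subcommand: str) -> list[str]:
    """Build the argv for a dbt subcommand with the shared directory flags."""
    return ["dbt", subcommand, f"--project-dir={DBT_DIR}", f"--profiles-dir={DBT_DIR}"]


def run_dbt(subcommand: str) -> None:
    """Run one dbt subcommand; raises CalledProcessError if dbt exits non-zero."""
    subprocess.run(dbt_args(subcommand), check=True)
```

Calling run_dbt for "deps", "run" and "test" in that order reproduces the three commands above.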
Run the SQL linter on the dbt models:
NOTE: This may require setting the DATA_DIR environment variable to the data directory containing the DuckDB database.
sqlfluff lint ./dbt/models/core
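To satisfy the note above without exporting DATA_DIR globally, the linter can be started with a modified copy of the environment. A minimal sketch: sqlfluff_env and lint_core_models are hypothetical helper names, and the data directory passed in is whatever location holds f1.duckdb on your machine.

```python
import os
import subprocess
from pathlib import Path


def sqlfluff_env(data_dir: Path) -> dict[str, str]:
    """Copy of the current environment with DATA_DIR pointing at the DuckDB directory."""
    env = dict(os.environ)
    env["DATA_DIR"] = str(data_dir)
    return env


def lint_core_models(data_dir: Path) -> int:
    """Run 'sqlfluff lint ./dbt/models/core'; a non-zero exit code signals violations or an error."""
    result = subprocess.run(
        ["sqlfluff", "lint", "./dbt/models/core"],
        env=sqlfluff_env(data_dir),
    )
    return result.returncode
```

Because only the child process receives the modified environment, the calling shell's DATA_DIR stays untouched.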