v0.14.0-alpha.1
elephantum committed Aug 11, 2024
1 parent 85d500b commit 07f7438
Showing 3 changed files with 71 additions and 1 deletion.
7 changes: 7 additions & 0 deletions docs/source/index.rst
@@ -11,6 +11,13 @@
   sqlite
   extending-cli

.. toctree::
   :caption: Migration
   :maxdepth: 2
   :hidden:

   migration-v013-to-v014

.. toctree::
   :caption: Reference
   :maxdepth: 2
63 changes: 63 additions & 0 deletions docs/source/migration-v013-to-v014.md
@@ -0,0 +1,63 @@
# Migration from v0.13 to v0.14

## DatatableTransform can become BatchTransform

Previously, a transformation over a whole table had to be written as a
`DatatableTransform`. Now you can replace it with a `BatchTransform` whose
`transform_keys` list is empty.

Before:

```python
# Updates global count of input lines

def count(
    ds: DataStore,
    input_dts: List[DataTable],
    output_dts: List[DataTable],
    kwargs: Dict,
    run_config: Optional[RunConfig] = None,
) -> None:
    assert len(input_dts) == 1
    assert len(output_dts) == 1

    input_dt = input_dts[0]
    output_dt = output_dts[0]

    output_dt.store_chunk(
        pd.DataFrame(
            {"result_id": [0], "count": [len(input_dt.meta_table.get_existing_idx())]}
        )
    )

# ...

DatatableTransform(
    count,
    inputs=["input"],
    outputs=["result"],
)
```

After:

```python
# Updates global count of input lines

def count(
    input_df: pd.DataFrame,
) -> pd.DataFrame:
    return pd.DataFrame({"result_id": [0], "count": [len(input_df)]})

# ...

BatchTransform(
    count,
    inputs=["input"],
    outputs=["result"],

    # Important: transform_keys must be an empty list so that the transformation
    # operates on the whole input at once
    transform_keys=[],
)
```
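
The effect of the empty `transform_keys` list can be illustrated with plain pandas. The sketch below is only a conceptual model, not datapipe's implementation: `split_into_batches` is a hypothetical helper introduced here, and the `id` and `text` columns are made-up sample data. It assumes that rows are grouped by the transform keys, so an empty key list yields a single batch that contains the whole input.

```python
from typing import List

import pandas as pd


def count(input_df: pd.DataFrame) -> pd.DataFrame:
    # Same body as the "After" example: one output row holding the row count.
    return pd.DataFrame({"result_id": [0], "count": [len(input_df)]})


def split_into_batches(df: pd.DataFrame, keys: List[str]) -> List[pd.DataFrame]:
    # Hypothetical helper, NOT part of datapipe: models the assumption that
    # rows are grouped by the transform keys and that no keys means one batch.
    if not keys:
        return [df]
    return [group for _, group in df.groupby(keys)]


# Made-up sample input with three rows.
input_df = pd.DataFrame({"id": [1, 2, 3], "text": ["a", "b", "c"]})

# Grouping by "id" calls count() once per row, so no call sees the full table.
per_key = [count(batch) for batch in split_into_batches(input_df, ["id"])]

# With no keys, count() is called once with the whole input.
whole = [count(batch) for batch in split_into_batches(input_df, [])]

print(len(per_key), len(whole))
print(whole[0].to_dict("records"))
```

Under this model, a global aggregate such as a row count is only correct when the function receives the full input in one call, which is why the migrated `BatchTransform` passes `transform_keys=[]`.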
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "datapipe-core"
-version = "0.13.14"
+version = "0.14.0-alpha.1"
description = "`datapipe` is a realtime incremental ETL library for Python application"
readme = "README.md"
repository = "https://github.com/epoch8/datapipe"