feat: Add Mapping/Conversion Logic (#3)
hf-kklein authored May 13, 2024
1 parent b753925 commit c25e2c2
Showing 14 changed files with 625 additions and 49 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/unittests.yml
@@ -9,7 +9,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12"]
os: [ubuntu-latest]
steps:
- uses: actions/checkout@v4
36 changes: 32 additions & 4 deletions README.md
@@ -9,8 +9,8 @@
![Linting status badge](https://github.com/Hochfrequenz/chronomeleon/workflows/Linting/badge.svg)
![Black status badge](https://github.com/Hochfrequenz/chronomeleon/workflows/Formatting/badge.svg)

Chronomeleon is a Python package that converts date and time related objects in migration scenarios.
It's meant to be used when migrate dates, datetimes or time slices/ranges from one system to another.
Chronomeleon is a Python package that converts and maps date and time information from its representation in one system to its representation in another.
It's meant to be used in data migration projects.

## Rationale
While converting a datetime alone is possible with either Python builtin tools or libraries like `pendulum` and `arrow`,
@@ -32,10 +32,16 @@ Chronomeleon has two purposes:
1. It forces you to make assumptions explicit.
2. Once the assumptions are explicit, it helps you do the conversion.

The latter is no rocket science (and neither is any code in chronomeleon), but the former is crucial for a successful migration.
The latter is no rocket science (and neither is any line of code in chronomeleon), but making assumptions explicit is crucial and that's why using it is beneficial.

When you're constantly wondering why other coders seem to randomly
* add or subtract a day, a second, a tick here and there
* pass around naive dates and datetimes and try to convert them to UTC or other timezones with no clear reason

then chronomeleon is for you.
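
To illustrate the second bullet: a naive datetime does not identify an instant until someone assumes a timezone for it. The following is a minimal standard-library sketch, not chronomeleon code; the fixed UTC+1 offset is an assumption that stands in for German winter time.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2024, 1, 1, 0, 0, 0)  # no tzinfo: which midnight is meant?

# Assumption for illustration: a fixed UTC+1 offset stands in for German winter time.
cet = timezone(timedelta(hours=1))

as_utc = naive.replace(tzinfo=timezone.utc)  # reader A assumes UTC
as_cet = naive.replace(tzinfo=cet)  # reader B assumes German local time

# The same wall-clock value now denotes two instants that are one hour apart.
difference = as_utc - as_cet
print(difference)
```

If two systems silently disagree on this assumption, every migrated value is off by the UTC offset, which is exactly the kind of unexplained correction the bullets describe.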

Chronomeleon makes your code more readable and makes your assumptions clear.
This allows you to spot errors in your or your team mates code and explain, why things are done the way they are.
This allows you to spot errors in your own or your teammates' code more easily and explain why things are done the way they are.

## How to use it?
Install it from pypi:
@@ -45,7 +51,29 @@ pip install chronomeleon

Then, in your code: Make assumptions about the source and target system explicit.
To do so, chronomeleon provides you with `ChronoAssumption` and `MappingConfig` objects.

Here's an advanced example that shows the capabilities of Chronomeleon:
```python
from datetime import date, datetime, timedelta

import pytz

from chronomeleon import ChronoAssumption, MappingConfig, adapt_to_target

config = MappingConfig(  # make assumptions explicit
    source=ChronoAssumption(
        implicit_timezone=pytz.timezone("Europe/Berlin"),
        resolution=timedelta(days=1),
        is_inclusive_end=True,
        is_gastag_aware=False,
    ),
    target=ChronoAssumption(resolution=timedelta(milliseconds=1), is_inclusive_end=True, is_gastag_aware=True),
    is_end=True,
    is_gas=True,
)
source_value = date(2021, 12, 31)
result = adapt_to_target(source_value, config) # do the mapping
assert result == datetime(2022, 1, 1, 4, 59, 59, microsecond=999000, tzinfo=pytz.utc)
```
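
The expected value in the example above can be reproduced by hand. The sketch below retraces the arithmetic with the standard library only; it is not the library's code path, and the fixed UTC+1 offset is an assumption that holds for Europe/Berlin around New Year.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: Europe/Berlin is at UTC+1 around New Year, so a fixed offset suffices here.
berlin_winter = timezone(timedelta(hours=1))

source_value = date(2021, 12, 31)  # inclusive end with a resolution of one day

# 1. inclusive end -> exclusive end: midnight (local time) of the following day
exclusive_end = datetime.combine(source_value + timedelta(days=1), time.min, tzinfo=berlin_winter)

# 2. the target is Gastag-aware: its days start at 06:00 German local time
gastag_end = exclusive_end.replace(hour=6)

# 3. exclusive end -> inclusive end in the target resolution (1 ms), expressed in UTC
result = (gastag_end - timedelta(milliseconds=1)).astimezone(timezone.utc)
print(result.isoformat())
```

Each numbered step corresponds to one explicit assumption in the `MappingConfig`: `is_inclusive_end` on the source, `is_gastag_aware` on the target, and `is_inclusive_end` together with `resolution` on the target.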


7 changes: 3 additions & 4 deletions pyproject.toml
@@ -2,7 +2,7 @@
name = "chronomeleon"
description = "A python package to flexibly adapt start and end date(times) to your system background"
license = { text = "MIT" }
requires-python = ">=3.8"
requires-python = ">=3.10"
authors = [{ name = "Hochfrequenz Unternehmensberatung GmbH", email = "info+github@hochfrequenz.de" }]
keywords = ["date", "time", "conversion", "migration", "inclusive", "exclusive", "dateonly"]
classifiers = [
@@ -13,8 +13,7 @@ classifiers = [
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
# no python 3.8 and 3.9 because of missing kw_only arg to dataclass
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
@@ -30,7 +29,7 @@ Homepage = "https://github.com/Hochfrequenz/chronomeleon"

[tool.black]
line-length = 120
target_version = ["py38", "py39", "py310", "py311", "py312"]
target_version = ["py310", "py311", "py312"]

[tool.isort]
line_length = 120
5 changes: 5 additions & 0 deletions src/chronomeleon/__init__.py
@@ -1,3 +1,8 @@
"""
Chronomeleon is a Python package that helps you to migrate datetimes from one system to another.
"""

__all__ = ["ChronoAssumption", "MappingConfig", "adapt_to_target"]

from .mapping import adapt_to_target
from .models import ChronoAssumption, MappingConfig
102 changes: 102 additions & 0 deletions src/chronomeleon/mapping.py
@@ -0,0 +1,102 @@
"""
contains the logic that maps a date or datetime from the source system to the target system
"""

import datetime as dt_module
from datetime import date, datetime, timedelta
from typing import Union

import pytz

from chronomeleon.models.mapping_config import MappingConfig

_berlin = pytz.timezone("Europe/Berlin")


def _convert_source_date_or_datetime_to_aware_datetime(
    source_value: Union[date, datetime], config: MappingConfig
) -> datetime:
    """
    returns a datetime object which is aware of the timezone (i.e. not naive) and is an exclusive end
    regardless of whether the source was configured as an inclusive or exclusive end.
    """
    source_value_datetime: datetime  # a non-naive datetime (exclusive, if end)
    if isinstance(source_value, datetime):
        source_value_datetime = source_value
        if config.is_end and config.source.is_inclusive_end:
            assert config.source.resolution is not None  # ensured by the consistency check
            source_value_datetime += config.source.resolution
    elif isinstance(source_value, date):
        if config.is_end and config.source.is_inclusive_end:
            source_value_datetime = datetime.combine(source_value + timedelta(days=1), datetime.min.time())
        else:
            source_value_datetime = datetime.combine(source_value, datetime.min.time())
    else:
        raise ValueError(f"source_value must be a date or datetime object but is {source_value.__class__.__name__}")
    if source_value_datetime.tzinfo is None:
        if config.source.implicit_timezone is not None:
            source_value_datetime = config.source.implicit_timezone.localize(source_value_datetime)
        else:
            # pylint:disable=line-too-long
            raise ValueError(
                "source_value must be timezone-aware or implicit_timezone must be set in the mapping configuration"
            )
    source_value_datetime = source_value_datetime.astimezone(pytz.utc)
    if config.source.is_gastag_aware and config.is_gas:
        berlin_local_datetime = source_value_datetime.astimezone(_berlin)
        if berlin_local_datetime.time() == dt_module.time(6, 0, 0):
            berlin_local_datetime = berlin_local_datetime.replace(hour=0).replace(tzinfo=None)
            # We need to re-localize the datetime, because the UTC offset might have changed
            # The Gastag does not always start 6h after midnight.
            # It might also be 5h or 7h on DST transition days.
            berlin_local_datetime = _berlin.localize(berlin_local_datetime)
            source_value_datetime = berlin_local_datetime.astimezone(pytz.utc)
    return source_value_datetime


def _convert_aware_datetime_to_target(value: datetime, config: MappingConfig) -> datetime:
    """
    returns a date or datetime object which is compatible with the target system
    """
    if value.tzinfo is None:
        raise ValueError("value must be timezone-aware at this point")
    target_value: datetime = value
    if config.target.is_gastag_aware and config.is_gas:
        _berlin_local_datetime = value.astimezone(_berlin)
        if _berlin_local_datetime.time() == dt_module.time(0, 0, 0):
            _berlin_local_datetime = _berlin_local_datetime.replace(hour=6).replace(tzinfo=None)
            # We need to re-localize the datetime, because the UTC offset might have changed.
            # The Gastag does not always start 6h after midnight.
            # It might also be 5h or 7h on DST transition days.
            _berlin_local_datetime = _berlin.localize(_berlin_local_datetime)
            target_value = _berlin_local_datetime.astimezone(pytz.utc)
    if config.is_end and config.target.is_inclusive_end:
        assert config.target.resolution is not None  # ensured by the consistency check
        target_value = target_value - config.target.resolution  # converts the exclusive end to an inclusive end
        # and e.g. 2024-01-02 00:00:00 to 2024-01-01 23:59:59 if the resolution is timedelta(seconds=1)
        # This works because the original value is - if it is an end - always an exclusive end.
    if config.target.implicit_timezone is not None:
        target_value = target_value.astimezone(config.target.implicit_timezone)
    if config.target.is_date_only:
        target_value = datetime.combine(target_value.date(), datetime.min.time())
    return target_value


def adapt_to_target(source_value: Union[date, datetime], config: MappingConfig) -> datetime:
    """
    maps the source value to a value compatible with the target system by using the given mapping configuration
    """
    if source_value is None:
        raise ValueError("source_value must not be None")
    if config is None:
        raise ValueError("config must not be None")
    if not config.is_self_consistent():
        raise ValueError("config is not self-consistent: " + ", ".join(config.get_consistency_errors()))
    # there are just 2 steps:
    # 1. convert the source from whatever it is to something unified with what we can work
    # 2. convert the unified source to the target (which might be just as obscure as the source)
    source_value_datetime = _convert_source_date_or_datetime_to_aware_datetime(source_value, config)  # step 1
    assert source_value_datetime.tzinfo is not None
    assert source_value_datetime.tzinfo == pytz.utc
    target_value = _convert_aware_datetime_to_target(source_value_datetime, config)  # step 2
    return target_value
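
The comments in the Gastag branches state that the gas day does not always start 6 hours after midnight. A small standard-library sketch can confirm this for the spring DST transition. The two fixed offsets are assumptions standing in for Europe/Berlin on 2024-03-31; the code in this commit uses `pytz` and re-localizes instead.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for Europe/Berlin on 2024-03-31,
# the day on which clocks jump from 02:00 CET to 03:00 CEST.
cet = timezone(timedelta(hours=1))  # offset valid at local midnight
cest = timezone(timedelta(hours=2))  # offset valid at 06:00 local time

midnight = datetime(2024, 3, 31, 0, 0, tzinfo=cet)
gastag_start = datetime(2024, 3, 31, 6, 0, tzinfo=cest)

# Only 5 hours elapse between local midnight and the start of the gas day.
elapsed = gastag_start - midnight
print(elapsed)  # 5:00:00
```

This is why simply adding `timedelta(hours=6)` to a UTC value would be wrong on transition days, and why the wall-clock hour is replaced first and the UTC offset is determined afterwards.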
4 changes: 4 additions & 0 deletions src/chronomeleon/models/__init__.py
@@ -2,4 +2,8 @@
models are the classes used by chronomeleon
"""

__all__ = ["ChronoAssumption", "MappingConfig"]
# fixes: Module "chronomeleon.models" does not explicitly export attribute "ChronoAssumption"

from .chrono_assumption import ChronoAssumption
from .mapping_config import MappingConfig
40 changes: 34 additions & 6 deletions src/chronomeleon/models/chrono_assumption.py
@@ -13,8 +13,9 @@ class ChronoAssumption:
represents assumptions about how a specific system interprets a specific field that holds date or time
"""

resolution: timedelta
resolution: Optional[timedelta] = None
"""
This is only necessary to provide, if the field is an inclusive end date.
The smallest unit of time that this field can represent.
Typically this is something like 1 day, 1 second, 1 microsecond.
Adding one "unit" of the resolution leads to the smallest possible increase in the field.
@@ -32,11 +33,6 @@ class ChronoAssumption:
pytz is a dependency of chronomeleon; If you install chronomeleon, you also get pytz.
"""

is_end: Optional[bool] = None
"""
True if and only if the date or time is the end of a range. None if it doesn't matter.
"""

is_inclusive_end: Optional[bool] = None
"""
Must not be None if is_end is True.
@@ -45,3 +41,35 @@ class ChronoAssumption:
the entire month of January.
If is_inclusive_end is False, then the range 2024-01-01 to 2024-02-01 covers the entire month of January.
"""

is_gastag_aware: bool = False
"""
True if and only if the start of a day is 6:00 am German local time.
If you never heard of the "Gastag", you can ignore this parameter and let it default to False.
"""

is_date_only: bool = False
"""
True if and only if the field in the respective system is a date without a time component (datetime.date).
"""

    def get_consistency_errors(self) -> list[str]:
        """
        returns errors from the self-consistency check; if the returned list is empty, the object is self-consistent
        """
        result: list[str] = []
        if self.is_inclusive_end and self.resolution is None:
            result.append("if is_inclusive_end is True, then resolution must be set")
        if self.resolution is not None and not isinstance(self.resolution, timedelta):
            result.append(f"resolution must be a timedelta object but is {self.resolution.__class__.__name__}")
        if self.implicit_timezone is not None and not isinstance(self.implicit_timezone, BaseTzInfo):
            result.append(
                f"implicit_timezone must be a pytz timezone object but is {self.implicit_timezone.__class__.__name__}"
            )
        return result

    def is_self_consistent(self) -> bool:
        """
        returns True if the object is self-consistent
        """
        return not any(self.get_consistency_errors())
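
The `is_inclusive_end` docstring in this file describes two ways of writing "all of January". The following sketch checks that both describe the same 31 days. `covered_days` is a hypothetical helper written for this illustration and is not part of chronomeleon.

```python
from datetime import date, timedelta


def covered_days(
    start: date, end: date, is_inclusive_end: bool, resolution: timedelta = timedelta(days=1)
) -> int:
    """Hypothetical helper: number of days in the range, whatever the end convention is."""
    # An inclusive end is turned into an exclusive end by adding one unit of the resolution.
    exclusive_end = end + resolution if is_inclusive_end else end
    return (exclusive_end - start).days


inclusive = covered_days(date(2024, 1, 1), date(2024, 1, 31), is_inclusive_end=True)
exclusive = covered_days(date(2024, 1, 1), date(2024, 2, 1), is_inclusive_end=False)
print(inclusive, exclusive)
```

The helper also shows why the consistency check requires `resolution` whenever `is_inclusive_end` is True: without it, the step from the inclusive to the exclusive end is undefined.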
63 changes: 63 additions & 0 deletions src/chronomeleon/models/mapping_config.py
@@ -0,0 +1,63 @@
"""contains the Mapping configuration class"""

from dataclasses import dataclass
from typing import Optional

from .chrono_assumption import ChronoAssumption


@dataclass(frozen=True, kw_only=True)
class MappingConfig:
    """
    represents the mapping rules for one date(time) field from one system to another
    """

    source: ChronoAssumption
    """
    assumptions about the interpretation of the date(time) field in the source system
    """
    target: ChronoAssumption
    """
    assumptions about the interpretation of the date(time) field in the target system
    """

    is_end: Optional[bool] = None
    """
    True if and only if the date or time is the end of a range. None if it doesn't matter.
    """

    is_gas: Optional[bool] = None
    """
    True if the division ("Sparte") is gas.
    Set to true to trigger the gas tag modifications in source, target or both, if necessary. Ignore otherwise.
    """

    def get_consistency_errors(self) -> list[str]:
        """
        returns a list of error messages if the mapping configuration is not self-consistent
        """
        errors: list[str] = []
        if not isinstance(self.source, ChronoAssumption):
            errors.append("source must be a ChronoAssumption object")
        else:
            errors.extend(["source: " + x for x in self.source.get_consistency_errors()])
        if not isinstance(self.target, ChronoAssumption):
            errors.append("target must be a ChronoAssumption object")
        else:
            errors.extend(["target: " + x for x in self.target.get_consistency_errors()])
        if (self.source.is_gastag_aware or self.target.is_gastag_aware) and self.is_gas is None:
            errors.append("if is_gastag_aware is set in either source or target, then is_gas must not be None")
        # The opposite is not the case: I can set is_gas to True without setting is_gastag_aware to True
        if (
            self.source.is_inclusive_end is not None or self.target.is_inclusive_end is not None
        ) and self.is_end is None:
            errors.append("if is_inclusive_end is set in either source or target, then is_end must not be None")
        if self.is_end is True and (self.source.is_inclusive_end is None or self.target.is_inclusive_end is None):
            errors.append("if is_end is True, then is_inclusive_end must not be None in both source and target")
        return errors

    def is_self_consistent(self) -> bool:
        """
        checks if the mapping configuration is self-consistent
        """
        return not any(self.get_consistency_errors())
23 changes: 0 additions & 23 deletions src/chronomeleon/mymodule.py

This file was deleted.

18 changes: 18 additions & 0 deletions unittests/test_chrono_assumption.py
@@ -0,0 +1,18 @@
from datetime import timedelta

import pytest

from chronomeleon import ChronoAssumption


@pytest.mark.parametrize(
    "chrono_assumption, is_self_consistent",
    [
        pytest.param(
            ChronoAssumption(resolution=timedelta(days=1)),
            True,
        ),
    ],
)
def test_self_consistency(chrono_assumption: ChronoAssumption, is_self_consistent: bool):
    assert chrono_assumption.is_self_consistent() == is_self_consistent