-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HRQB 29 - Improve Quickbase upsert reporting #100
Conversation
Why these changes are being introduced: Formerly, each QuickbaseUpsertTask would individually log how metrics about modified records. This made it difficult to quickly get a holistic picture of the run. The methods were also logging oriented vs data oriented, which made working with that data tricky. How this addresses that need: * Utilize luigi Events to pinpoint tasks only from the current run * Shift parsing up upsert results to QBClient * Add method HRQBPipelineTask.aggregate_upsert_results() to aggregate all upsert results, across all Load tasks that fired during the run By utilizing luigi Events, this also paves the way for better task data cleanup, if we want to only pinpoint tasks that ran during the last run (though not yet implemented). Side effects of this change: * All upsert results are written to logs in an aggregate form Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/HRQB-20
for task, target in list(zip(self.deps(), self.input(), strict=True)) | ||
} | ||
|
||
|
||
@HRQBTask.event_handler(luigi.Event.SUCCESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of the Luigi Events/Callbacks. They allow registering on a particular task class, and then subscribing to particular events, in this case a successful run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one suggestion
hrqb/base/task.py
Outdated
@property | ||
def parse_upsert_counts(self) -> dict | None: | ||
if self.target.exists(): | ||
return QBClient.parse_upsert_results(self.target.read()) | ||
return None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A docstring might help to clarify this property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added! Good call.
Purpose and background context
Formerly, each
QuickbaseUpsertTask
(tasks that actually upsert data to Quickbase) would individually log metrics about modified records. This made it difficult to quickly get a holistic picture of the run; how many tables had how many updates. The methods were also logging oriented vs data oriented, which made working with that data in other contexts tricky.How this addresses that need:
QBClient
HRQBPipelineTask.aggregate_upsert_results()
to aggregate all upsert results, across all Load tasks that fired during the runBy utilizing luigi Events, this also paves the way for better task data cleanup, if we want to only pinpoint tasks that ran during the last run (though not yet implemented).
Includes new or updated dependencies?
NO
Changes expectations for external applications?
YES: updated logging structure
What are the relevant tickets?
Developer
Code Reviewer(s)