HRQB 41 - Improve task scoping for runs and data cleanup #106
Purpose and background context
This PR reworks the CLI `--task` flag into two new ones, `--include` and `--exclude`. Using a combination of these, it's now possible to:

These changes dovetail with some improvements to how pipelines run, including moving the `run_pipeline()` function to a method on `HRQBPipelineTask`, where it should only ever run anyhow.

Taken altogether, the output of `HRQBClient` will remain identical. All changes are for local development convenience: when working on a particular task or family of tasks, it is much easier to isolate them for development.
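To make the scoping idea concrete, here is a minimal sketch in plain Python (no Luigi dependency, so it runs on its own). It is not the actual HRQB implementation: the `PipelineTask` class, the `collect_tasks()` and `tasks_in_scope()` helpers, the fixture task names `LoadNumbers` and `ReportNumbers`, and the scoping rules themselves are all assumptions for illustration. Under these assumed rules, `include` keeps the named tasks plus everything they transitively require, and `exclude` then drops the named tasks from scope.

```python
from __future__ import annotations

from collections.abc import Iterable


class PipelineTask:
    """Hypothetical stand-in for an HRQB pipeline task (not the real class)."""

    def __init__(self, name: str, upstream: Iterable[PipelineTask] = ()) -> None:
        self.name = name
        self._upstream = list(upstream)

    def default_requires(self) -> list[PipelineTask]:
        # Each task declares its full, unscoped dependencies here.
        return self._upstream

    def requires(self) -> list[PipelineTask]:
        # Shared entry point that subclasses do not override; in this sketch it
        # only delegates, but it is the one place cross-cutting logic can hook in.
        return self.default_requires()


def collect_tasks(root: PipelineTask) -> dict[str, PipelineTask]:
    """Every task reachable from root via requires(), keyed by name."""
    found: dict[str, PipelineTask] = {}
    stack = [root]
    while stack:
        task = stack.pop()
        if task.name not in found:
            found[task.name] = task
            stack.extend(task.requires())
    return found


def tasks_in_scope(
    root: PipelineTask,
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> set[str]:
    """Names of in-scope tasks under the assumed include/exclude rules."""
    graph = collect_tasks(root)
    included = set(include)
    if included:
        scoped: set[str] = set()
        for name in included:
            # An included task brings along everything it requires.
            scoped.update(collect_tasks(graph[name]))
    else:
        # No includes means the whole pipeline is in scope.
        scoped = set(graph)
    # Excluded tasks are dropped last, e.g. so their data is kept.
    return scoped - set(exclude)


# Hypothetical fixture graph: ReportNumbers -> MultiplyNumbers -> LoadNumbers
load = PipelineTask("LoadNumbers")
multiply = PipelineTask("MultiplyNumbers", [load])
report = PipelineTask("ReportNumbers", [multiply])

print(sorted(tasks_in_scope(report)))
# ['LoadNumbers', 'MultiplyNumbers', 'ReportNumbers']
print(sorted(tasks_in_scope(report, include=["MultiplyNumbers"])))
# ['LoadNumbers', 'MultiplyNumbers']
print(sorted(tasks_in_scope(report, include=["MultiplyNumbers"], exclude=["LoadNumbers"])))
# ['MultiplyNumbers']
```

Under these assumptions, the last call resembles the `MultiplyNumbers` development loop described further down: include the task, exclude what it requires, and only `MultiplyNumbers` is left in scope.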
Some areas of interest:

- Instead of a `requires()` method, `HRQBPipelineTasks` now expect a `default_requires()` method (example)
- `--include-tasks` CLI flag, handled by a new shared `requires()` method and helpers
- `luigi.utils.run_pipeline()` moved to a method on `HRQBPipelineTasks`, and now includes setup + cleanup logic

How can a reviewer manually see the effects of these changes?
Some manual testing is possible with these changes, as we can use test fixtures.
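The walkthrough below can also be modeled in a few lines of plain Python. This is a hypothetical sketch, not HRQB code: `FixtureTask`, `FixturePipeline`, `remove_data()`, the in-memory `store` standing in for task output data, and the `LoadNumbers` task are invented for illustration; tasks are assumed to be listed in dependency order, and setup logic is not modeled. It shows the property the walkthrough relies on: tasks whose data already exists are skipped, and afterward only data for tasks in the cleanup scope is removed.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable

# Hypothetical stand-in for task output data: task name -> computed value.
Store = dict[str, int]


class FixtureTask:
    """Hypothetical task whose 'data' is one entry in a shared store."""

    def __init__(self, name: str, compute: Callable[[Store], int]) -> None:
        self.name = name
        self.compute = compute

    def complete(self, store: Store) -> bool:
        # As with Luigi's default, a task is complete when its output exists.
        return self.name in store


class FixturePipeline:
    """Hypothetical pipeline whose run method also handles cleanup."""

    def __init__(self, tasks: list[FixtureTask]) -> None:
        self.tasks = tasks  # assumed to be in dependency order

    def remove_data(self, store: Store, scope: Iterable[str]) -> None:
        # Delete output only for tasks in scope; everything else is kept.
        for name in scope:
            store.pop(name, None)

    def run_pipeline(self, store: Store, cleanup_scope: Iterable[str] = ()) -> list[str]:
        """Run incomplete tasks, then remove data for tasks in cleanup_scope."""
        ran: list[str] = []
        for task in self.tasks:
            if not task.complete(store):
                store[task.name] = task.compute(store)
                ran.append(task.name)
        self.remove_data(store, cleanup_scope)
        return ran


pipeline = FixturePipeline(
    [
        FixtureTask("LoadNumbers", lambda store: 3),
        FixtureTask("MultiplyNumbers", lambda store: store["LoadNumbers"] * 2),
    ]
)

store: Store = {}
print(pipeline.run_pipeline(store))  # full run: ['LoadNumbers', 'MultiplyNumbers']
pipeline.remove_data(store, ["MultiplyNumbers"])  # LoadNumbers data is kept
for _ in range(2):
    # Development loop: only MultiplyNumbers runs, then its data is removed.
    print(pipeline.run_pipeline(store, cleanup_scope=["MultiplyNumbers"]), store)
```

Because cleanup happens inside the run method, each iteration of the loop recomputes `MultiplyNumbers` from the preserved upstream data and leaves the store in the same state it started in.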
The following should show all tasks as `INCOMPLETE`:

Here are the tasks in this pipeline and their statuses:

If any are not `INCOMPLETE`, run this command to fully clear the data:

Now we can run the full pipeline:
Imagine now that we need to work on the `MultiplyNumbers` task. By using `--include-tasks` and `--exclude-tasks`, we can pinpoint this task and remove its data each time, while keeping the data from other required tasks.

The following shows that `--include` limits scope to `MultiplyNumbers` and its required tasks:

Given the includes/excludes, we can remove all data while keeping the data from the desired tasks:
In our development loop for making and testing changes to `MultiplyNumbers`, we can run this command repeatedly, knowing that it will only run and clean up the data we expect:

The output confirms this, showing the pipeline as fully complete and successful, followed by the data being removed:
Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)