Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HRQB 17 - Prep for functional tasks and pipelines #22

Merged
merged 8 commits into from
May 22, 2024

Conversation

ghukill
Copy link
Collaborator

@ghukill ghukill commented May 16, 2024

Purpose and background context

While beginning on writing actual tasks and pipelines to load data into Quickbase, some additional helpers and functionality were required. This PR breaks that out into a smaller, standalone PR before implementing actual tasks.

To reiterate: this work lays the foundation for "real" tasks, which themselves should focus exclusively on getting data, transforming it, and loading it. Anytime we can offload that work to shared functionality, the better.

The primary are fairly well encapsulated in commits for review:

How can a reviewer manually see the effects of these changes?

This PR is a collection of small updates to support the writing of funtional tasks as a next step. As such, not readily testable in their own right.

Commits have, where applicable, new tests to demonstrate the new behavior.

Includes new or updated dependencies?

NOI

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

ghukill added 8 commits May 16, 2024 10:44
Why these changes are being introduced:

Debugging error will be easier if counts are available for
the total number of records processed by a Task.  This will
reveal tasks that are not returning the expected number of
records for downstream tasks.

How this addresses that need:
* Adds new properties on base Targets to produce a record
count by reading the output Target data

Side effects of this change:
* None
Why these changes are being introduced:

If one HRQBPipelineTask runs another pipeline task, it would be beneficial
to see all pipelines in that lineage in the outputted file.

How this addresses that need:
* Adds parameter 'pipline_parent_name' to HRQBPipelineTask, which is
considered when returning a pipeline name

Side effects of this change:
* Output target files include all calling pipelines in the filename

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/HRQB-17
Why these changes are being introduced:

When a pipeline is run via the luigi runner, it can support multiple parallel
workers.  It is unlikely we will need more than one concurrent worker, which
means we do not need any threading or multiprocessing overhead.  But this
env var allows for that if needed.

How this addresses that need:
* Add new optional env var LUIGI_NUM_WORKERS
  * If not set, luigi.build() is run with 1 worker

Side effects of this change:
* By default, pipelines are run with a single worker

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/HRQB-17
Why these changes are being introduced:

Some tasks may require using data from Quickbase to inform
upserts into other tables.  This will require querying Quickbase
tables using the QBClient.

How this addresses that need:
* Adds methods on QBClient to perform record queries
* Methods are opinionated to use field labels vs ids
* Methods return dataframes like other QBclient methods

Side effects of this change:
* QBClient is able to query records from tables

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/HRQB-17
Why these changes are being introduced:

Quickbase API may surface errors in two ways.  First, there
might be a non 2xx status code returned, which we will throw
and immediate exception.  Second, the request may have been
partially successful, but the response indicates some failures
for certain data upserted.  For these we will log but continue.

How this addresses that need:
* Adds status code error handling in QBClient
* Adds logging to QB tasks run method when errors in response

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/HRQB-17
@ghukill ghukill marked this pull request as ready for review May 16, 2024 18:54
Copy link

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic seems sound, 1 non-blocking question

hrqb/utils/quickbase.py Show resolved Hide resolved
Copy link

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@ghukill ghukill merged commit 1ee91d6 into main May 22, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants