Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mint MD5 hash values for Key field #63

Merged
merged 1 commit into from
Jun 14, 2024
Merged

Conversation

ghukill
Copy link
Collaborator

@ghukill ghukill commented Jun 14, 2024

Purpose and background context

It was determined that columns used from the data warehouse, expected to be enduringly the same and unique, were changing across warehouse runs. These values were used for Quickbase merge fields, but this would not work if the values changed.

Values for tables Employee Appointments, Employee Salary History, and Employee Leave were needed that uniquely identified the record/row, and would not change over time from warehouse runs.

The solution was to follow a pattern already established in Employee Leave to create an MD5 hash of record values that are known to uniquely identify a record.

For example, the following is the code that mints an MD5 hash for Employee Appointments:

emp_appts_df["key"] = emp_appts_df.apply(
            lambda row: md5_hash_from_values(
                [
                    row.mit_id,
                    row.position_id,
                    row.appt_begin_date,
                    row.appt_end_date,
                ]
            ),
            axis=1,
        )

How can a reviewer manually see the effects of these changes?

New tests demonstrate how the utility function md5_hash_from_values() creates an MD5 hash from a list of strings. Each Transform task for these tables utilizes this to set a Key field (naming convention from Quickbase) from a list of values. There are tests (example, example, example) for these transforms that show the value of the Key field is expected given the input row data.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:

It was determined that columns used from the data warehouse, expected to be
enduringly the same and unique, where changing across warehouse runs.  These
values were used for Quickbase merge fields, but this would not work if the
values changed.

A value was needed that uniquely identifies the row for a handful of tables,
across warehouse runs over time.

How this addresses that need:
* Updates multiple tables to mint an MD5 value from an ordered list of
field values that are known to uniquely identify a row

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/HRQB-36
@ghukill ghukill requested a review from ehanson8 June 14, 2024 14:27
Copy link

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great shifting and reuse of existing code!

task_transform_employee_salary_history_complete,
task_shared_extract_qb_employee_appointments_complete,
):
# NOTE: for this test, need to manually get part of the employee appointment data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good explanation of why this test is different from the others!

@ghukill ghukill merged commit f0273e8 into main Jun 14, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants