General Feedback - Temporal Data #83

Open
bwalsh opened this issue Dec 2, 2024 · 9 comments

bwalsh commented Dec 2, 2024

What were you reviewing?

Temporal data: datetime and ageAtEvent fields across FHIR resources

Review Date:

December 2024

Relevant Link:

For example see birthDate and deceasedDateTime in https://nih-ncpi.github.io/ncpi-fhir-ig-2/StructureDefinition-ncpi-participant.html#profile

Feedback:

See below


bwalsh commented Dec 2, 2024

Architecture Inflection Points for Transforming PHI Dates in FHIR Data

Transforming data that includes Protected Health Information (PHI) dates requires careful architectural planning to support regulatory compliance, scalability, and, crucially, continued utility for analysis. Below are key inflection points to guide this process, with a focus on enabling longitudinal studies and temporal trend analysis.


1. Support for Analytical Use Cases

  • Uniform Querying for Longitudinal Studies:
    Ensure that transformed dates (e.g., shifted dates, relative timestamps or ageAtEvent) are stored in a consistent format across all FHIR resources. This uniformity allows analysts to query data without needing to account for varying transformation logic.

  • Temporal Relationships and Trends:
    Maintain the ability to analyze sequences and trends (e.g., time from diagnosis to treatment, or treatment response over time) by preserving relative intervals between events during date transformations.

    • Example: If an Observation occurs 5 days after a Procedure, this relationship must be intact after transformation.
  • Relative Time Queries:
    For scenarios where absolute dates are removed, implement a relativeDateTime field (e.g., days since diagnosis or days since enrollment) or ageAtEvent that supports flexible querying and aggregation for temporal patterns.

  • Custom Query Support for Analysts:
    Extend the FHIR API or implement an analytical query layer to provide researchers with built-in support for common temporal queries such as:

    • "Events that occurred within 30 days after diagnosis."
    • "Temporal trends in blood pressure readings over the course of a treatment plan."
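As a rough illustration of the first query above, here is a minimal sketch in Python. It assumes events have already been converted to a relative form; the record layout and the `days_since_diagnosis` field name are hypothetical and are not taken from the NCPI IG.

```python
# Hypothetical, already de-identified events: each carries only an offset in
# days from the participant's anchor event (here, diagnosis), not a date.
events = [
    {"participant": "p1", "type": "diagnosis", "days_since_diagnosis": 0},
    {"participant": "p1", "type": "procedure", "days_since_diagnosis": 12},
    {"participant": "p1", "type": "observation", "days_since_diagnosis": 17},
    {"participant": "p1", "type": "observation", "days_since_diagnosis": 45},
]


def within_days_after_anchor(records, max_days):
    """Return records that occurred 1..max_days days after the anchor event."""
    return [r for r in records if 0 < r["days_since_diagnosis"] <= max_days]


# "Events that occurred within 30 days after diagnosis."
recent = within_days_after_anchor(events, 30)

# Intervals between events are plain differences of the relative offsets.
procedure_to_observation = (
    events[2]["days_since_diagnosis"] - events[1]["days_since_diagnosis"]
)
```

Because every record uses the same unit and the same anchor, the query needs no knowledge of how the original dates were transformed, and the "Observation 5 days after a Procedure" relationship is still recoverable.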

2. Data Privacy and Compliance Requirements

  • Regulatory Compliance:
    Assess applicable privacy regulations, such as HIPAA, GDPR, or local data privacy laws, to determine acceptable handling of PHI dates. Candidate transformations include:

    • Date Shifting: Introduce deterministic or random offsets to preserve privacy while maintaining data usability.
    • RelativeDateTime: Transform absolute dates into relative timestamps (e.g., days since diagnosis or treatment) to remove direct PHI context.
    • ageAtEvent: Transform absolute dates into a uniform float value describing age in days after applying offsets to remove direct PHI context.
  • Auditability:
    Note: ETL authors must ensure transformations are traceable for compliance audits without making the anonymization reversible. Submitters will assert that only anonymized data is uploaded to the FHIR service.
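The three transformations listed above can be compared side by side in a short Python sketch. The function names, the sample dates, and the offset value are all illustrative assumptions, not part of any existing ETL.

```python
from datetime import date, timedelta


def shift_date(d: date, offset_days: int) -> date:
    """Date shifting: move a date by a fixed per-participant offset."""
    return d + timedelta(days=offset_days)


def relative_days(event: date, anchor: date) -> int:
    """Relative timestamp: whole days from an anchor event to this event."""
    return (event - anchor).days


def age_at_event_days(event: date, birth: date) -> float:
    """ageAtEvent: age in days at the time of the event, as a float."""
    return float((event - birth).days)


# Made-up example values.
birth = date(1980, 3, 15)
diagnosis = date(2020, 1, 10)
treatment = date(2020, 2, 4)
offset = -37  # hypothetical per-participant offset in days

shifted_treatment = shift_date(treatment, offset)
days_since_diagnosis = relative_days(treatment, diagnosis)

# Shifting birth and event by the same offset leaves ageAtEvent unchanged,
# so it can be computed either before or after the offset is applied.
age_original = age_at_event_days(treatment, birth)
age_shifted = age_at_event_days(
    shift_date(treatment, offset), shift_date(birth, offset)
)
```

The relative and age-based forms drop the calendar date entirely, while the shifted form keeps a date-shaped value that standard FHIR datetime fields can hold.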


3. Transformation Logic

  • Consistency:

    • For longitudinal records, ensure consistent date transformations for related entities (e.g., Patient, Encounter, Observation).
    • Use a patient-specific seed for deterministic date shifting, allowing coherent offsets across datasets.
  • Lossless Conversion:

    • When shifting dates or converting to relative formats, preserve chronological relationships (e.g., Event A occurred before Event B).
  • Precision Handling:

    • Preserve fine-grained timestamps (hours, minutes) where they are required, e.g., for early detection and drug response cohorts.
  • Preserve Analytical Context:
    Avoid transformations that disrupt the logical flow of clinical data, ensuring that analysts can derive meaningful insights.

  • Regulatory Compliance for Age:

    • Consider whether age fields require transformation based on specific privacy guidelines (e.g., masking exact ages for older patients to prevent re-identification).
  • Handling Edge Cases:

    • Handle situations where birthDate is missing, invalid, or falls outside reasonable bounds (e.g., extremely old dates).
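One possible way to derive a patient-specific, deterministic offset is a keyed hash of the patient identifier. The sketch below assumes this approach; the shift window, the secret handling, the 130-year plausibility bound, and all function names are assumptions chosen for illustration, not a prescribed method.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import Optional

MAX_SHIFT_DAYS = 365  # assumed shift window of +/- one year


def patient_offset_days(patient_id: str, secret: bytes) -> int:
    """Derive a stable offset in [-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS] per patient."""
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    return int.from_bytes(digest[:8], "big") % span - MAX_SHIFT_DAYS


def shift(d: date, patient_id: str, secret: bytes) -> date:
    """Apply the same offset to every date that belongs to one patient."""
    return d + timedelta(days=patient_offset_days(patient_id, secret))


def parse_birth_date(value: Optional[str], today: date) -> Optional[date]:
    """Return a usable birth date, or None when missing, invalid, or implausible."""
    if not value:
        return None
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        # Invalid strings land here; this sketch also treats partial
        # dates as unusable rather than guessing a day.
        return None
    if parsed > today or parsed.year < today.year - 130:
        return None
    return parsed


secret = b"example-only-secret"  # in practice, kept outside the dataset
procedure = date(2021, 5, 1)
observation = date(2021, 5, 6)
shifted_procedure = shift(procedure, "patient-123", secret)
shifted_observation = shift(observation, "patient-123", secret)
```

Because the offset depends only on the patient identifier and the secret, Patient, Encounter, and Observation dates for the same person move together, and both ordering and intervals survive the shift. The secret has to stay out of the published data, since anyone holding it can recompute the offsets.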

4. Data Storage Considerations

  • Original Data Retention:
    Decide whether to store original PHI dates in a secured, access-controlled archive for specific use cases (e.g., legal requirements).

  • Schema Updates:
    Modify FHIR resource schemas (profiles) to include fields of the relativeDateTime type or an ageAtEvent extension.
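To make the schema change concrete, the following Python sketch builds an Observation whose effective time carries only an age-based extension. The extension URL, the choice of `valueQuantity` in days, and attaching it to `effectiveDateTime` are all assumptions made for illustration; the actual IG may define the extension differently.

```python
import json

# Hypothetical canonical URL; not a published extension definition.
AGE_AT_EVENT_URL = "https://example.org/fhir/StructureDefinition/age-at-event"


def age_at_event_extension(age_days: float) -> dict:
    """Build an extension holding age in days as a UCUM-coded quantity."""
    return {
        "url": AGE_AT_EVENT_URL,
        "valueQuantity": {
            "value": age_days,
            "unit": "days",
            "system": "http://unitsofmeasure.org",
            "code": "d",
        },
    }


observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Systolic blood pressure"},
    "subject": {"reference": "Patient/example"},
    # FHIR JSON puts extensions on a primitive under the underscore-prefixed
    # name; here no effectiveDateTime value is stored at all.
    "_effectiveDateTime": {"extension": [age_at_event_extension(14570.0)]},
}

serialized = json.dumps(observation, indent=2)
```

Keeping the extension on the datetime element itself means a consumer that looks for `effectiveDateTime` finds the age in the same place, but consumers that ignore extensions will see no time information at all.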

5. Testing and Validation

  • Validation for Age Accuracy:

    • Test that age calculations are accurate and consistent across all resources and transformation logic.
  • Edge Case Testing:

    • Validate handling of scenarios such as missing birthDate, extremely old or young ages, and overlapping time periods.
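The checks above could be written as small unit tests. This is a sketch only: `age_days` is a hypothetical helper, and the rules it encodes (return `None` for a missing birth date or an event before birth) are assumptions about how an ETL might choose to behave.

```python
from datetime import date, timedelta
from typing import Optional


def age_days(event: Optional[date], birth: Optional[date]) -> Optional[float]:
    """Age in days at the event, or None if it cannot be computed."""
    if event is None or birth is None:
        return None
    days = (event - birth).days
    return float(days) if days >= 0 else None  # event before birth is invalid


def test_age_is_unchanged_by_shift():
    birth, event = date(2000, 2, 29), date(2024, 3, 1)  # leap-day birth
    for offset in (-365, -1, 0, 1, 365):
        delta = timedelta(days=offset)
        assert age_days(event + delta, birth + delta) == age_days(event, birth)


def test_missing_or_inconsistent_dates():
    assert age_days(date(2024, 1, 1), None) is None
    assert age_days(date(1999, 1, 1), date(2000, 1, 1)) is None
    assert age_days(date(2000, 1, 1), date(2000, 1, 1)) == 0.0  # day of birth


def test_overlapping_periods_still_overlap():
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    first = (date(2024, 1, 1), date(2024, 1, 10))
    second = (date(2024, 1, 5), date(2024, 1, 20))
    delta = timedelta(days=-200)
    shifted_first = tuple(d + delta for d in first)
    shifted_second = tuple(d + delta for d in second)
    assert overlaps(first, second) == overlaps(shifted_first, shifted_second)
```

The functions follow the `test_` naming convention, so a runner such as pytest would collect them, but they can also be called directly.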

Comparison of Effort: Shifting Datetime Fields vs. Introducing an ageAtEvent Extension

Method 1: Shifting Existing Datetime Fields

Advantages

  • Minimal Schema Changes:
    Shifting existing datetime fields requires no new fields or schema extensions, reducing development complexity.
  • Preserves Original Context:
    Dates retain their approximate temporal context (e.g., the year of an event), which can be helpful for certain analyses.
  • Interoperability:
    Maintains compatibility with standard FHIR profiles and external systems that expect datetime fields.

Challenges

  • Consistency Across Related Fields:
    All datetime fields must be shifted deterministically and consistently across the dataset to preserve chronological relationships (e.g., Event A occurred before Event B).
  • Privacy Compliance:
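    Shifting dates still risks re-identification if the shifting logic is deterministic and predictable (for example, if the offset can be inferred from one known event date). Randomized shifts, if applied per event rather than per participant, may distort temporal relationships.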
  • Effort for Testing and Validation:
    Requires thorough validation to ensure that chronological relationships are intact after transformation.
  • Query Complexity:
    Temporal queries may require users to account for shifted dates, potentially complicating downstream analysis.

Level of Effort

  • Implementation:
    Moderate, as shifting logic needs to be implemented, tested, and integrated into the pipeline.
  • Maintenance:
    Low, since the data schema remains unchanged and shifts can be automated.
  • Analytical Support:
    Medium, as analysts may need to adapt their queries to handle shifted dates accurately.

Method 2: Introducing an ageAtEvent Extension

Advantages

  • Simplified Analysis:
    Explicitly calculating and storing ageAtEvent provides a user-friendly way for analysts to query data without needing to interpret dates.
  • Privacy Enhancement:
    Avoids storing PHI dates directly, reducing re-identification risk.
  • Preserves Temporal Relationships:
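    Because every ageAtEvent value for a participant is measured from the same birth date, the interval between two events is simply the difference of their ageAtEvent values, so relative intervals are naturally preserved.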

Challenges

  • Schema Changes:
    Requires the introduction of a new ageAtEvent field or extension, potentially complicating schema management and interoperability.
  • Derived Field Accuracy:
    Accurate calculation of ageAtEvent depends on the availability and correctness of the source datetime fields (e.g., birthDate).
  • Backwards Compatibility:
    Existing queries or systems expecting datetime fields must be updated to utilize the new field.
  • Field Duplication:
    Storing both datetime fields and ageAtEvent may introduce redundancy, requiring additional documentation and governance.

Level of Effort

  • Implementation:
    High, as it involves schema changes, field calculations, and updates to FHIR profiles or extensions.
  • Maintenance:
    Medium, as additional logic is required to maintain consistency between ageAtEvent and datetime fields.
  • Analytical Support:
    Low, since the explicit ageAtEvent field simplifies queries and reduces the cognitive load on analysts.

Comparison Summary

| Aspect | Shifting Datetime Fields | Introducing ageAtEvent Extension |
| --- | --- | --- |
| Implementation Effort | Moderate | High |
| Schema Changes | None | Requires new field/extension |
| Maintenance Effort | Low | Medium |
| Analytical Complexity | Medium (requires handling shifts) | Low (explicit age simplifies queries) |
| Interoperability Impact | Minimal | Potential challenges with existing systems |
| Privacy Compliance | Moderate (risks with deterministic shifts) | High (avoids direct PHI dates) |
| Temporal Relationship | Requires careful validation | Naturally preserved through derivation |
| User-Friendliness | Moderate | High |

Recommendation

  • Use Shifting Datetime Fields if:

    • Minimal schema changes are preferred.
    • The dataset requires interoperability with systems expecting datetime fields.
    • Analysts are comfortable handling shifted dates during queries.
  • Use ageAtEvent Extension if:

    • Simplifying analytical workflows is a priority.
    • Privacy concerns about datetime fields are significant.
    • There is a focus on longitudinal studies where age is a primary analytical metric.

Each approach has trade-offs, but introducing an ageAtEvent extension provides stronger support for user-friendly and privacy-compliant analysis, albeit at a higher initial implementation cost.


bwalsh commented Dec 2, 2024

@JamedFV You were auto-assigned. Sorry, but I can't seem to remove the assignment?

@teslajoy Can you review and comment?

@RobertJCarroll FYI: Still a draft but wanted to follow up from our last call.


JamedFV commented Dec 2, 2024

Hi @bwalsh - Thanks for submitting this! Sorry about that, I've got the issues generated through a template auto-assigned to myself so they don't get missed. Would you like this issue assigned to yourself or to someone else for review?


bwalsh commented Dec 2, 2024

Thanks @JamedFV - if you can assign it back to me. Thanks again

JamedFV assigned bwalsh and unassigned JamedFV on Dec 2, 2024

teslajoy commented Dec 6, 2024

LGTM 👍 - thank you @bwalsh!

As an analyst: when measuring N categories as a function of time, having the time interval on a standard unit and shift is important/crucial.

As a data engineer: a small note on relativeDateTime. Its use case may have changed in FHIR R5.
In the FHIR R4 version, it looks like relativeDateTime can point directly to any FHIR resource via relativeDateTime.target. This allows users to use relativeDateTime directly as the obfuscated datetime.

In the FHIR R5 version, it looks like relativeTime is the closest FHIR datatype to relativeDateTime. relativeTime doesn't seem to allow users to point to a FHIR resource directly.

@RobertJCarroll

This looks great Brian, thanks. I think my straw proposal would be: What if we required both? It's a bit extreme, but considering the value for our users it might be worth it. Plus we could potentially build some tooling together to help the lift on the repository side.

@RobertJCarroll

> In FHIR R5 version, looks like relativeTime is the closest FHIR datatype to relativeDateTime. relativeTime doesn't seem to allow user to point to a fhir resource directly.

I can't find the R5 relative-time extension that's referenced in the deprecated relativeDateTime, but the R6 version you linked does appear to use contextReference to meet the same need as target.

@teslajoy

I see - "context used as a base point" can be interpreted more broadly. Thank you.


bwalsh commented Dec 10, 2024

> This looks great Brian, thanks. I think my straw proposal would be: What if we required both? It's a bit extreme, but considering the value for our users it might be worth it. Plus we could potentially build some tooling together to help the lift on the repository side.

@RobertJCarroll Thanks. Building tooling together sounds like a great idea. I'm thinking through "What if we required both?" Most of my questions are around the User-Friendliness and Temporal Relationship criteria. For example, say I've created a cohort from two different studies from two different submitters. Are both fields populated? If not, how do I manage queries?
