
Modify results functions #999

Open
gowthamrao opened this issue Nov 14, 2022 · 4 comments
@gowthamrao
Member

gowthamrao commented Nov 14, 2022

I would like to request functions that help manipulate CohortDiagnostics output. Several use cases justify these functions:

  1. Change cohort ID: ability to go through the results in the output zip file and replace oldCohortId with newCohortId, i.e. take a data frame with oldCohortId and newCohortId fields.
  2. Change database ID: ability to go through the results in the output zip file and replace oldDatabaseId and oldDatabaseName with newDatabaseId and newDatabaseName.
  3. Filter results: ability to filter a large result set by cohortId and databaseId, i.e. the function should take arrays of cohortIds and databaseIds.
  4. Compare cohort SQL hashes between zip file outputs and report any two cohorts that have the same SQL hash but different cohortIds.
  5. Join two or more results zip files into a single zip file.
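For illustration, items 1 and 3 could be sketched as a small utility that rewrites CSV entries inside a results zip. This is a hypothetical helper, not existing CohortDiagnostics API: the function name `remap_and_filter` and the column name `cohort_id` are assumptions about the results format.

```python
import csv
import io
import zipfile

def remap_and_filter(src, dst, id_map=None, keep_cohort_ids=None):
    """Rewrite CSV entries inside a results zip: replace old cohort IDs
    per id_map ({old: new}) and, when keep_cohort_ids is given, drop rows
    whose (remapped) cohort ID is not in that set."""
    id_map = id_map or {}
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for name in zin.namelist():
            raw = zin.read(name)
            if not name.endswith(".csv"):
                zout.writestr(name, raw)  # copy non-CSV entries untouched
                continue
            reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
            buf = io.StringIO()
            writer = csv.DictWriter(buf, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                cid = row.get("cohort_id")  # assumed column name
                if cid is not None:
                    cid = id_map.get(cid, cid)
                    if keep_cohort_ids is not None and cid not in keep_cohort_ids:
                        continue  # filter: keep only requested cohorts
                    row["cohort_id"] = cid
                writer.writerow(row)
            zout.writestr(name, buf.getvalue())
```

A real implementation would apply the same remap to every table that carries a cohort ID column, and handle databaseId/databaseName the same way.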
@gowthamrao
Member Author

Justification:

  1. Integrate results from multiple sites that have run diagnostics using different definitions.
  2. Split large studies into smaller ones.
  3. Fix issues in labels, e.g. the use of spaces in databaseId.

@azimov
Collaborator

azimov commented Nov 16, 2022

Whilst more validation is good, most of these things should be set at the time the study is designed. Doing (and allowing) ad hoc comparisons to merge bits of data is bad practice. I don't think it's a good idea to allow users to merge random results together - these are things that can lead to massive interpretation errors. It's much better practice to force investigator discipline at the study design step than to have utilities to merge badly collected data.

@gowthamrao
Member Author

gowthamrao commented Nov 17, 2022

Forcing investigator discipline is much harder in network studies, where contributions come from various sources. The use case I am interested in is the following:

  1. A contributor submits to the OHDSI Phenotype Library and the requirements for submission are met. The submission involves executing the cohort as developed in their local instance, with a local atlasId and databaseId.
  2. Once the peer review is complete and it is decided to accept this cohort, we need to integrate the submission into https://github.com/ohdsi-studies/PhenotypeLibraryDiagnostics . To integrate the initial contribution into the existing output of the PhenotypeLibraryDiagnostics study, we need to extract the accepted cohorts (dropping any unapproved cohortIds) and re-id the cohortId and, if needed, the databaseId. I can't ask contributors to re-run just because the OHDSI Phenotype Library has now assigned the submitted cohort a new ID (it is hard enough for data partners to run it once).

This issue is not limited to PhenotypeLibraryDiagnostics study. I have encountered the need to mix and match outputs from other studies.

@gowthamrao
Member Author

  6. Change cohort name

Sorry, missed this one.
