For the FAIR Data Quality check engine (issue #328), consult the ESIP AI-Readiness Checklist for a list of data quality checks that the community considers important when assessing a dataset's readiness for ML tooling. See:
I propose these as a good candidate set of checks: they have already been vetted by ESIP and would be a useful way to vet the data quality engine. Maybe they should be their own suite?
I reformatted the AI-Readiness checklist into a CSV with a column indicating whether each "check" could actually be implemented in an automated way. My values in that column are a best-guess, first-instincts kind of answer. Based on the list, I identified the following checks as already implemented (a sketch of one follows the list):
Is there contact information for subject-matter experts?
Is there a clear data license?
Is the license standardized and machine-readable (e.g. Creative Commons)?
Is it available in at least one open, non-proprietary format?
Is there a comprehensive data dictionary/codebook to describe parameters?
Does it include details on the spatial and temporal extent?
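As a rough illustration of what one of these looks like in code, here is a minimal Python sketch of the two license questions. This is an assumption-laden mock-up, not the engine's actual API: the `metadata` layout, the `"license"` key, and the identifier list are all hypothetical.

```python
# Hypothetical sketch only: the metadata layout ("license" key) and the
# SPDX identifier list are assumptions, not the engine's actual API.
SPDX_CC_IDS = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "CC-BY-NC-4.0"}

def check_license(metadata: dict) -> dict:
    """Is a license present, and is it a standardized,
    machine-readable identifier (e.g. Creative Commons)?"""
    license_id = metadata.get("license")
    if not license_id:
        return {"check": "license", "passed": False,
                "message": "No data license found in the metadata record."}
    if license_id not in SPDX_CC_IDS:
        return {"check": "license", "passed": False,
                "message": f"License '{license_id}' is not a recognized "
                           "machine-readable identifier."}
    return {"check": "license", "passed": True,
            "message": f"Standardized, machine-readable license: {license_id}"}
```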
The following checks could be easily implemented (one is sketched after the list):
Have null values/gaps been filled?
What is the timeliness of the data?
Is there quantitative information about data resolution in space and time?
Is the provenance tracked and documented?
Is the data dictionary standardized?
Do the parameters follow a defined standard?
Are parameters crosswalked in an ontology or common vocabulary (e.g. NIEM)?
What is the file format?
Is it machine-readable?
Has the data been anonymized / de-identified?
The rest are either not applicable or would be difficult or impossible to implement.
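For the "easily implemented" group, a check like the file-format questions could be as simple as inspecting the file extension. Again, this is a hypothetical sketch: the set of open formats below is an illustrative assumption, not a vetted registry, and a real check would likely inspect media types rather than extensions.

```python
from pathlib import Path

# Hypothetical sketch only: this format list is an illustrative
# assumption, not a vetted registry of open formats.
OPEN_MACHINE_READABLE = {".csv", ".json", ".geojson", ".nc", ".parquet"}

def check_file_format(path: str) -> dict:
    """Answer 'What is the file format?' and 'Is it machine-readable?'
    from the file extension alone."""
    suffix = Path(path).suffix.lower()
    passed = suffix in OPEN_MACHINE_READABLE
    return {
        "check": "file_format",
        "format": suffix or "unknown",
        "passed": passed,
        "message": ("Open, machine-readable format" if passed else
                    f"Extension '{suffix or '(none)'}' is not in the open-format list"),
    }
```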