[Proposal] Metadata schema alignment to DCAT #7
more clarity please @matamadio on:
time reference (year): This was enabled in E/H already - has this been duplicated, or just extended to V/L?
Before:
Integrated into the model table; applicability is split into transferability at country level, plus the name of the specific area if the curve is based on local-scale information (e.g. city level). Transferability notes are used instead of a fixed enum to allow better explanation.
It had time_start, time_end and time_span. Added and extended time_year as a reference year for scenarios (e.g. "2020" for current, "2080" for scenario). It is a simpler alternative to time span.
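To illustrate how the two time references might coexist, here is a minimal Python sketch. It assumes that time_year is stored as a four-digit string and that a record without it falls back to time_start / time_end; the helper name `describe_time` and the example values for the historical record are hypothetical and not part of the schema.

```python
def describe_time(record):
    """Summarise the temporal reference of a metadata record.

    Assumption: time_year is a single reference year, and time_start /
    time_end describe a period. The actual schema rules may differ.
    """
    # Prefer the single reference year when it is present.
    if record.get("time_year"):
        return f"reference year {record['time_year']}"
    # Otherwise fall back to an explicit start/end period.
    start, end = record.get("time_start"), record.get("time_end")
    if start and end:
        return f"period {start} to {end}"
    return "no time reference"


# Illustrative records only; the values are made up for this sketch.
current = {"time_year": "2020"}
scenario = {"time_year": "2080"}
historical = {"time_start": "1980", "time_end": "2010"}

for rec in (current, scenario, historical):
    print(describe_time(rec))
```

The point of the sketch is only that a single reference year is easier to fill in and query than a span, while the span fields remain available where a period is what matters.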
Some initial feedback on the proposed MVP mostly from the perspective of more clearly aligning the core conceptual data model with other metadata standards. In the JKAN schema, mapping to DCAT and the proposed datapackage format, the "contribution" table maps to a dataset which has some commonly defined metadata fields. But in the proposed MVP we're using different names for those entities and those commonly defined properties. My suggestion would be to align the naming to clarify mapping between different serialisations and implementations. So:
Some other comments/questions:
Suggested changes above in table format for my own convenience
Thanks @ldodds for the review. The one misalignment I see with the current MVP is that on the same contribution/dataset page we currently put different datasets that share the same general attributes but also have some attributes of their own. I am still not sure whether it is better to keep them on the same page for search/download convenience or to split them for metadata clarity. E.g. http://jkan.riskdatalibrary.org/datasets/exp-ssd-all/ >> 3 different resource downloads, each one from a different source. It is easier to find and download them all together since they are related, but this relies on multiple "source" fields, and the description becomes more generic to cover them all. Borrowing Taku's table:
@matamadio for the South Sudan example I think those should actually be three separate datasets, each with a single distribution. My reasoning here is that they all come from separate original datasets managed by different organisations, which means they have different metadata and probably different licensing. (I'm not sure that the CC-BY licence on the OSM extracts is legitimate, as OSM uses ODbL and requires use of the same licence for any non-trivial extracts?) It looks like the common elements here are the country of coverage and that they might have been produced by the same project/team? A country-based grouping of datasets (e.g. as a custom category page built using the geo coverage) would give one easy way to represent these within the current data model and catalogue. Being able to describe the "Project" that produces some datasets might also be helpful, but that seems like an extension.
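As a rough sketch of that suggestion, the Python below splits one bundled catalogue entry into single-distribution datasets and then builds a country grouping from the geographic coverage. All field names (`spatial`, `resources`, `distributions`), the source names and the licence values are placeholders made up for illustration; they are not the actual JKAN or RDL metadata.

```python
from collections import defaultdict

# Hypothetical bundled entry: one page with three resources from
# different sources. Names and licences are placeholders only.
bundled = {
    "title": "Exposure data for South Sudan",
    "spatial": "SSD",
    "resources": [
        {"name": "Buildings", "source": "Source A", "license": "Licence A"},
        {"name": "Roads", "source": "Source B", "license": "Licence B"},
        {"name": "Population", "source": "Source C", "license": "Licence C"},
    ],
}


def split_dataset(entry):
    """Return one dataset per resource, each with a single distribution."""
    datasets = []
    for res in entry["resources"]:
        datasets.append({
            "title": f"{entry['title']}: {res['name']}",
            "spatial": entry["spatial"],
            # Source and licence now live on the dataset itself,
            # so each dataset can carry its own values.
            "source": res["source"],
            "license": res["license"],
            "distributions": [{"name": res["name"]}],
        })
    return datasets


def group_by_country(datasets):
    """Group dataset titles by their geographic coverage code."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[ds["spatial"]].append(ds["title"])
    return dict(groups)


datasets = split_dataset(bundled)
country_page = group_by_country(datasets)
print(country_page)
```

With this shape, the related downloads stay discoverable together through the country grouping, while source and licence are recorded once per dataset rather than as multiple values on a shared page.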
According to the new plan to have data and metadata hosted on the DDH2 platform, we need to map our schema to the one in use there, which is a modified version of ISO-19115. There is already a good alignment for general dataset attributes. I will look into mapping and matching the current RDL schema version. Example metadata for landslide hazard layers (DDH2)
Alignment is being dealt with in the May 2023 revision of the JSON schema.
As discussed in the last calls, we need to fix the schema to a "stable" starter version (called RDS 0.1), in order to add the showcase data homogeneously and to better understand the gaps and issues in the schema to improve in the next versions.
The schema is simplified to a "core" implementation fitted to the data-out-DB approach (see GFDRR/rdl-data#36), and adapted in some parts to fit the JKAN structure in the schema file: https://github.com/GFDRR/rdl-jkan/blob/gh-pages/_data/schemas/rdl01.yml
This is the sheet I'm using to work on the schema. Changes to the original schema (additions, renames, etc.) are in red:
https://drive.google.com/file/d/1dYRv79i6tlabFFgPG_KncEekEoPUeDhv/view?usp=sharing
(As a reference, here is the original version for the SQL implementation from the schema dev reports.)
The major change compared to the initial schema is how the 4 components "speak" to each other; I've made an effort to match similar attributes and enums across schemas for better dataset linkage, clarity and homogeneity.
Before implementing it in JKAN and the related dataset metadata, it would be great to have a quick review together for a green light.