add new gt metadata yml files #143
base: master
Hello, could you please merge the PR? Thank you.
IMO it would be better if the titles were prefixed with OCR-D, so that they line up in the catalog and are immediately recognizable.
If at all possible, it would probably be better from HTR-United's side if all the gt_structure entries (except gt_structure_text) were combined into one dataset here.
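The prefixing suggested above could be applied mechanically rather than by hand. The following is only a minimal sketch: the helper name `with_catalog_prefix` is hypothetical, and it assumes the titles are available as plain strings (for example, after the metadata files have been loaded by whatever tooling generates them).

```python
def with_catalog_prefix(title: str, prefix: str = "OCR-D") -> str:
    """Return the title with the prefix prepended, unless it is already there."""
    title = title.strip()
    # Leave titles alone that already carry the prefix, so the helper is idempotent.
    if title == prefix or title.startswith(prefix + " "):
        return title
    return f"{prefix} {title}"


# Titles as they appear in this PR; the second one is already prefixed.
titles = ["gt_structure_1_1", "OCR-D gt_structure_1_2", "gt_structure_text"]
prefixed = [with_catalog_prefix(t) for t in titles]
print(prefixed)
```

Because the helper is idempotent, it could run on every CI pass without stacking prefixes.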
@@ -0,0 +1,50 @@
schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: gt_structure_1_1
Suggested change:
- title: gt_structure_1_1
+ title: OCR-D gt_structure_1_1
metric: regions
citation-file-link: https://github.com/OCR-D/gt_structure_1_1/blob/main/CITATION.cff
transcription-guidelines: >-
  OCR-D-GT-Guideline, Part: Structure Ground Truth https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
This was a carriage return instead of a newline.
Suggested change:
- OCR-D-GT-Guideline, Part: Structure Ground Truth https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
+ OCR-D-GT-Guideline, Part: Structure Ground Truth
+ https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
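A stray carriage return like the one flagged here is hard to spot in a diff view, so a small check in the generating pipeline may help. The sketch below is an illustration only: the function names are hypothetical, and the sample string assumes the carriage return sat between the guideline name and the URL, which the review comment suggests but the page does not show byte for byte.

```python
import re

# A carriage return that is NOT part of a Windows-style "\r\n" pair.
BARE_CR = re.compile(r"\r(?!\n)")


def has_bare_cr(text: str) -> bool:
    """Report whether the text contains a lone carriage return."""
    return BARE_CR.search(text) is not None


def normalize_newlines(text: str) -> str:
    """Convert "\r\n" and lone "\r" line endings to "\n"."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Assumed reconstruction of the problematic scalar (the "\r" position is a guess).
sample = (
    "transcription-guidelines: >-\n"
    "  OCR-D-GT-Guideline, Part: Structure Ground Truth\r"
    "  https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html\n"
)
fixed = normalize_newlines(sample)
print(has_bare_cr(sample), has_bare_cr(fixed))
```

When checking real files, open them with `open(path, newline="")` or in binary mode; Python's default text mode translates line endings on read and would hide the problem.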
@@ -0,0 +1,51 @@
schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: gt_structure_1_2
Suggested change:
- title: gt_structure_1_2
+ title: OCR-D gt_structure_1_2
@@ -0,0 +1,50 @@
schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: gt_structure_1_3
Suggested change:
- title: gt_structure_1_3
+ title: OCR-D gt_structure_1_3
metric: regions
citation-file-link: https://github.com/OCR-D/gt_structure_1_3/blob/main/CITATION.cff
transcription-guidelines: >-
  OCR-D-GT-Guideline, Part: Structure Ground Truth https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
Same here.
Suggested change:
- OCR-D-GT-Guideline, Part: Structure Ground Truth https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
+ OCR-D-GT-Guideline, Part: Structure Ground Truth
+ https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
metric: regions
citation-file-link: https://github.com/OCR-D/gt_structure_5_1/blob/main/CITATION.cff
transcription-guidelines: >-
  OCR-D-GT-Guideline, Part: Structure Ground Truth https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
Suggested change:
- OCR-D-GT-Guideline, Part: Structure Ground Truth https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
+ OCR-D-GT-Guideline, Part: Structure Ground Truth
+ https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
@@ -0,0 +1,50 @@
schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: gt_structure_5_2
Suggested change:
- title: gt_structure_5_2
+ title: OCR-D gt_structure_5_2
metric: regions
citation-file-link: https://github.com/OCR-D/gt_structure_5_2/blob/main/CITATION.cff
transcription-guidelines: >-
  OCR-D-GT-Guideline, Part: Structure Ground Truth https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
Suggested change:
- OCR-D-GT-Guideline, Part: Structure Ground Truth https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
+ OCR-D-GT-Guideline, Part: Structure Ground Truth
+ https://ocr-d.de/en/gt-guidelines/trans/structur_gt.html
@@ -0,0 +1,51 @@
schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: gt_structure_5_3
Suggested change:
- title: gt_structure_5_3
+ title: OCR-D gt_structure_5_3
@@ -0,0 +1,54 @@
schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: gt_structure_text
Suggested change:
- title: gt_structure_text
+ title: OCR-D gt_structure_text
Dear @tboenig and @bertsky,
Also: be careful with the Goth script code. It is for the Gothic language (and not runes, as I said: https://en.wikipedia.org/wiki/Gothic_alphabet). I think you mean
typo?
@@ -0,0 +1,51 @@
schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: gt_structure_5_3
url: https://github.com/OCR-D/tboenig/gt_structure_5_3
Suggested change:
- url: https://github.com/OCR-D/tboenig/gt_structure_5_3
+ url: https://github.com/OCR-D/gt_structure_5_3
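A typo of this kind (an extra path segment between the organisation and the repository) can be caught with a structural check on the URL before the metadata file is submitted. This is a rough sketch using only the Python standard library; the function name `is_repo_root_url` is hypothetical, and it assumes every `url` entry should point at a GitHub repository root of the form `https://github.com/<owner>/<repo>`.

```python
from urllib.parse import urlparse


def is_repo_root_url(url: str, host: str = "github.com") -> bool:
    """Check that the URL looks like https://<host>/<owner>/<repo> and nothing more."""
    parsed = urlparse(url)
    # Drop empty segments so a trailing slash does not count as an extra level.
    segments = [s for s in parsed.path.split("/") if s]
    return parsed.scheme == "https" and parsed.netloc == host and len(segments) == 2


# The URL from the diff and the corrected one from the suggestion.
print(is_repo_root_url("https://github.com/OCR-D/tboenig/gt_structure_5_3"))
print(is_repo_root_url("https://github.com/OCR-D/gt_structure_5_3"))
```

The check is purely syntactic: it does not confirm that the repository exists, only that the path has the expected shape.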
What do you mean by duplication? The various entries are subcorpora of https://github.com/OCR-D/gt_structure_all, which is basically a subcorpus of deutschestextarchiv.de split into smaller chunks. I proposed aggregating them into a single dataset here. But since the metadata.yml files are generated via CI on our side (for each repo independently), that might be difficult to achieve...
Hello @PonteIneptique,
Thank you for the rigorous check of the data records. I have changed Goth to Latf.
https://github.com/OCR-D/gt_structure_all is a metarepo that links all datasets. How such metarepos are represented in the catalog might be worth considering for a future version of HTR-United. I suggest that the datasets first be published in the catalog; improvements can always follow in a second or subsequent step. Of course, the metadata and data must be correct. Thank you again for the check. All the best
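The Goth/Latf mix-up discussed in this thread is easy to make because both are valid four-letter ISO 15924 script codes. A lookup like the one below could flag unexpected codes during review. It is only a sketch: the table lists just the four codes relevant to this conversation, and the `expected` set (Latin and Fraktur for these German prints) is an assumption made for illustration, not a rule taken from the HTR-United schema.

```python
# Small excerpt of ISO 15924 script codes with their English names.
ISO_15924_NAMES = {
    "Goth": "Gothic",
    "Latf": "Latin (Fraktur variant)",
    "Latn": "Latin",
    "Runr": "Runic",
}


def unexpected_scripts(codes, expected=frozenset({"Latf", "Latn"})):
    """Return (code, name) pairs for codes outside the expected set.

    Codes missing from the excerpt above are reported with the name "unknown".
    """
    return [
        (code, ISO_15924_NAMES.get(code, "unknown"))
        for code in codes
        if code not in expected
    ]


# "Goth" is a valid code, but it denotes the Gothic alphabet, not Fraktur type.
print(unexpected_scripts(["Goth", "Latn"]))
print(unexpected_scripts(["Latf", "Latn"]))
```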
Dear both,
@PonteIneptique: understood, @tboenig is already working on a solution.
Thank you for your understanding :)