This project is a work in progress.

The repository contains data used to train Kraken text recognition and segmentation models for medieval Greek minuscule manuscripts within the framework of eScriptorium.

All images were retrieved via the IIIF services of the respective holding libraries, which may claim copyright (cf. the IIIF manifests linked in MssList.md). The original transcriptions were taken from Vatican Library Greek Paleography (marked VAT in MssList.md) and from the Patristic Text Archive. All transcriptions were manually corrected and supplemented by Annette von Stockhausen.

The repository currently contains the following folders and files:
- README.md: This file
- MssList.md: List of manuscripts which have already been segmented and annotated
- TranscriptionRules.md: Rules applied in transcription
- SegmentationRules.md: Rules for the segmentation
- data: Will contain the images and transcriptions
- eScriptorium_helpers
  - Greek manuscripts.json: Virtual keyboard for special characters used in transcription
  - ontology_escriptorium_grc.json: Ontology for segmentation (adapted from SegmOnto)