Cross-Linguistic Data Format (CLDF) dataset derived from von Rosenberg's "De Mentawei-Eilanden en Hunne Bewoners" from 1853.

complexico/mentawai-word-list-1853

CLDF dataset derived from von Rosenberg's "De Mentawei-Eilanden en Hunne Bewoners" from 1853

This work is part of the AHRC-funded project on the lexical resources for Enggano, led by the Faculty of Linguistics, Philology and Phonetics at the University of Oxford, UK. Visit the central webpage of the Enggano project.

How to cite

If you use these data, please cite:

  • the original source

    Rosenberg, Carl Benjamin Hermann von. 1853. De Mentawei-Eilanden en Hunne Bewoners. Tijdschrift voor Indische Taal-, Land- en Volkenkunde 1. 403–440.

  • the derived dataset, using the DOI of the particular released version you used

Description

This dataset is licensed under a CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).

The original source is available online at https://www.digitale-sammlungen.de/en/view/bsb10433845?page=450,451

Notes

According to the Rights Statement (shown at the bottom of that page), the digitised journal is provided under a "No Copyright - Non-Commercial Use Only" condition.

Before the CLDF conversion in Python, the materials in this repository (in the data directory) were processed in R as an RStudio project (the R scripts are in the codes directory). The English glosses were generated from the Dutch with the DeepL translator via the deeplr R package.

As a long-time R user, I created this repository as practice for getting started with the cldfbench workflow in Python, which implements the Cross-Linguistic Data Format (CLDF). I would like to apply and extend this workflow to the Enggano lexical resources project that I am part of. A further motivation is to (i) document this legacy data in a computer-readable format, (ii) enrich its content following the CLDF standard, and (iii) contribute to ongoing research on the languages of the Barrier Islands in Sumatra, Indonesia, extending the Enggano language project.

Statistics

Glottolog: 100% | Concepticon: 98% | Source: 100% | BIPA: 100% | CLTS SoundClass: 100%

  • Varieties: 1 (linked to 1 different Glottocodes)
  • Concepts: 267 (linked to 255 different Concepticon concept sets)
  • Lexemes: 271
  • Sources: 1
  • Synonymy: 1.01
  • Invalid lexemes: 0
  • Tokens: 1,575
  • Segments: 31 (0 BIPA errors, 0 CLTS sound class errors, 31 CLTS modified)
  • Inventory size (avg): 31.00
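
As a rough sanity check, two of the derived figures in the list above can be recomputed from the raw counts. The sketch below assumes that synonymy is the number of lexemes per concept averaged over varieties, and that the average inventory size is the number of distinct segments per variety; these definitions are assumptions made for illustration, not a statement of how the statistics were actually generated. The helper `per_unit` and the tokens-per-lexeme figure are hypothetical additions that do not appear in the report.

```python
# Counts copied from the Statistics list above.
VARIETIES = 1
CONCEPTS = 267
LEXEMES = 271
TOKENS = 1575
SEGMENTS = 31


def per_unit(total: int, units: int) -> float:
    """Return total divided by units, rejecting non-positive unit counts."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total / units


# Assumption: synonymy = lexemes per concept, averaged over varieties.
synonymy = per_unit(LEXEMES, CONCEPTS * VARIETIES)
# Assumption: average inventory size = distinct segments per variety.
inventory_avg = per_unit(SEGMENTS, VARIETIES)
# Not reported above; derived here for illustration only.
tokens_per_lexeme = per_unit(TOKENS, LEXEMES)

print(f"Synonymy: {synonymy:.2f}")
print(f"Inventory size (avg): {inventory_avg:.2f}")
print(f"Tokens per lexeme: {tokens_per_lexeme:.2f}")
```

With these counts the script prints 1.01 and 31.00, which agree with the reported synonymy and average inventory size, and about 5.81 segment tokens per lexeme.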

Contributors

Name | GitHub user | Description | Role
--- | --- | --- | ---
Gede Primahadi W. Rajeg | @gederajeg | Digitisation, Code, CLDF conversion, Concepticon mapping, Orthography profiling | Maintainer

CLDF Datasets

The following CLDF datasets are available in cldf: