Harald Hammarström
2020
The DReaM Corpus: A Multilingual Annotated Corpus of Grammars for the World’s Languages
Shafqat Mumtaz Virk | Harald Hammarström | Markus Forsberg | Søren Wichmann
Proceedings of The 12th Language Resources and Evaluation Conference
There are as many as 7000 natural languages in the world, and a huge number of documents describing them have been produced over the years. Most of those documents are in paper format, so any attempt to process them with modern computational techniques and tools requires digitizing them first. In this paper, we report on a multilingual, digitized collection of thousands of such documents, searchable through well-established corpus infrastructures. The corpus is annotated with various metadata-, word-, and text-level attributes to make searching and analysis easier and more useful.
From Linguistic Descriptions to Language Profiles
Shafqat Mumtaz Virk | Harald Hammarström | Lars Borin | Markus Forsberg | Søren Wichmann
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)
Language catalogues and typological databases are two important types of resources containing different kinds of knowledge about the world’s natural languages. The former provide metadata such as number of speakers, location (in prose descriptions and/or GPS coordinates), language code, literacy, etc., while the latter contain information about a set of structural and functional attributes of languages. Because both types of resources are developed and maintained manually, there are practical limits on the number of languages and the number of features that can be surveyed. We introduce the concept of a language profile: a structured representation of various types of knowledge about a natural language, extracted semi-automatically from descriptive documents and stored at a central location. It has three major parts: (1) an introductory part; (2) an attributive part; and (3) a reference part, each containing different types of knowledge about a given natural language. As a case study, we develop and present a language profile of an example language. At this stage, a language profile is an independent entity, but in the future we envision it becoming part of a network of language profiles connected to each other via various types of relations. Such a representation is expected to be suitable for both humans and machines to read and process for deeper linguistic analyses and/or comparisons.
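The three-part structure described in this abstract can be pictured as a small data model. The sketch below is illustrative only and is not the authors' schema: every class name, field name, and value in it (including the placeholder language "Examplish" and the code "xxx") is an assumption made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Introductory:
    """Catalogue-style metadata (hypothetical fields)."""
    name: str
    language_code: str
    location: str = ""


@dataclass
class Attribute:
    """One structural or functional feature, tied to its source document."""
    feature: str
    value: str
    source_key: str  # key into LanguageProfile.references


@dataclass
class LanguageProfile:
    """Introductory + attributive + reference parts, as named in the abstract."""
    introductory: Introductory
    attributive: List[Attribute] = field(default_factory=list)
    references: Dict[str, str] = field(default_factory=dict)
    # Assumed placeholder for the envisioned network: relation -> language codes.
    related: Dict[str, List[str]] = field(default_factory=dict)

    def add_attribute(self, feature: str, value: str,
                      source_key: str, citation: str) -> None:
        """Record a feature value together with the document it came from."""
        self.references[source_key] = citation
        self.attributive.append(Attribute(feature, value, source_key))

    def lookup(self, feature: str) -> List[str]:
        """Return every recorded value for a feature."""
        return [a.value for a in self.attributive if a.feature == feature]


# Usage with placeholder data (not a real language or a real grammar).
profile = LanguageProfile(Introductory(name="Examplish", language_code="xxx"))
profile.add_attribute("word order", "SOV", "doc1", "Placeholder grammar description")
print(profile.lookup("word order"))
```

Keeping a source key on every attribute is one way to let the reference part back each extracted claim; a linked network of profiles could later be represented by filling in the `related` mapping.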