Markus Gärtner


2020

To Boldly Query What No One Has Annotated Before? The Frontiers of Corpus Querying
Markus Gärtner | Kerstin Jung
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Corpus query systems exist to address the multifarious information needs of any person interested in the content of annotated corpora. In this role they play an important part in making those resources usable for a wider audience. Over the past decades, several such query systems and languages have emerged, varying greatly in their expressiveness and technical details. This paper offers a broad overview of the history of corpora and corpus query tools. It focusses strongly on the query side and hints at exciting directions for future development.

The Corpus Query Middleware of Tomorrow – A Proposal for a Hybrid Corpus Query Architecture
Markus Gärtner
Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora

The development of dozens of specialized corpus query systems and languages over the past decades has led to a diverse but also fragmented landscape. Today we are faced with a plethora of query tools that each provide unique features, but which are also not interoperable and often rely on very specific database back-ends or formats for storage. This severely hampers usability both for end users who want to query different corpora and for corpus designers who wish to provide users with an interface for querying and exploration. We propose a hybrid corpus query architecture as a first step to overcoming this issue. It takes the form of a middleware system between user front-ends and optional database or text indexing solutions as back-ends. At its core is a custom query evaluation engine for index-less processing of corpus queries. With a flexible JSON-LD query protocol, the approach allows communication with back-end systems to partially solve queries and offset some of the performance penalties imposed by the custom evaluation engine. This paper outlines the details of our first draft of the aforementioned architecture.