Elke Teich
2020
The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study
Stefan Fischer | Jörg Knappen | Katrin Menzel | Elke Teich
Proceedings of The 12th Language Resources and Evaluation Conference
We present a new, extended version of the Royal Society Corpus (RSC), a diachronic corpus of scientific English now covering 300+ years of scientific writing (1665–1996). The corpus comprises 47 837 texts, primarily scientific articles, and is based on publications of the Royal Society of London, mainly its Philosophical Transactions and Proceedings. The corpus has been built according to the FAIR principles and is freely available under a Creative Commons license, excluding copyrighted parts. We describe how the corpus can be found, which file formats are available for download, and how it can be accessed via a web-based corpus query platform. We present several analytic tools that we have implemented for better usability and provide an example of using the corpus for linguistic analysis, as well as examples of subsequent external uses of earlier releases. We place the RSC against the background of existing English diachronic and scientific corpora, elaborating on its value for linguistic and humanistic study.
How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech
Yuri Bizzoni | Tom S Juzek | Cristina España-Bonet | Koel Dutta Chowdhury | Josef van Genabith | Elke Teich
Proceedings of the 17th International Conference on Spoken Language Translation
Translationese is a phenomenon present in human translations, simultaneous interpreting, and even machine translations. Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear. This study analyzes translationese patterns in translation, interpreting, and machine translation outputs in order to explore possible reasons. In our analysis, we (i) detail two non-invasive ways of detecting translationese and (ii) compare translationese across human and machine translations of text and speech. We find that machine translation shows traces of translationese but does not reproduce the patterns found in human translation, supporting the hypothesis that such patterns are due to the model (human vs. machine) rather than to the data (written vs. spoken).
Co-authors
- Stefan Fischer 1
- Jörg Knappen 1
- Katrin Menzel 1
- Yuri Bizzoni 1
- Tom S Juzek 1