Katja Markert

2020

pdf bib abs
Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction
Raphael Schumann | Lili Mou | Yao Lu | Olga Vechtomova | Katja Markert
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Automatic sentence summarization produces a shorter version of a sentence, while preserving its most important information. A good summary is characterized by language fluency and high information overlap with the source sentence. We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics. We search for a high-scoring summary by discrete optimization. Our proposed method achieves a new state-of-the art for unsupervised sentence summarization according to ROUGE scores. Additionally, we demonstrate that the commonly reported ROUGE F1 metric is sensitive to summary length. Since this is unwillingly exploited in recent work, we emphasize that future evaluation should explicitly group summarization systems by output length brackets.

pdf bib abs
Dataset Reproducibility and IR Methods in Timeline Summarization
Leo Born | Maximilian Bacher | Katja Markert
Proceedings of The 12th Language Resources and Evaluation Conference

Timeline summarization (TLS) generates a dated overview of real-world events based on event-specific corpora. The two standard datasets for this task were collected using Google searches for news reports on given events. Not only is this IR method not reproducible at different search times, it also uses components (such as document popularity) that are not always available for any large news corpus. It is unclear how TLS algorithms fare when provided with event corpora collected with varying IR methods. We therefore construct event-specific corpora from a large static background corpus, the newsroom dataset, using differing, relatively simple IR methods based on raw text alone. We show that the choice of IR method plays a crucial role in the performance of various TLS algorithms. A weak TLS algorithm can even match a stronger one by employing a stronger IR method in the data collection phase. Furthermore, the results of TLS systems are often highly sensitive to additional sentence filtering. We consequently advocate for integrating IR into the development of TLS systems and having a common static background corpus for evaluation of TLS systems.

pdf bib abs
Doctor Who? Framing Through Names and Titles in German
Esther van den Berg | Katharina Korfhage | Josef Ruppenhofer | Michael Wiegand | Katja Markert
Proceedings of The 12th Language Resources and Evaluation Conference

Entity framing is the selection of aspects of an entity to promote a particular viewpoint towards that entity. We investigate entity framing of political figures through the use of names and titles in German online discourse, enhancing current research in entity framing through titling and naming that concentrates on English only. We collect tweets that mention prominent German politicians and annotate them for stance. We find that the formality of naming in these tweets correlates positively with their stance. This confirms sociolinguistic observations that naming and titling can have a status-indicating function and suggests that this function is dominant in German tweets mentioning political figures. We also find that this status-indicating function is much weaker in tweets from users that are politically left-leaning than in tweets by right-leaning users. This is in line with observations from moral psychology that left-leaning and right-leaning users assign different importance to maintaining social hierarchies.

Co-authors

Maximilian Bacher 1

Esther van den Berg 1

Katharina Korfhage 1

Josef Ruppenhofer 1

Michael Wiegand 1

Venues

LREC2
ACL1