Seokhwan Kim


2020

pdf bib
TutorialVQA: Question Answering Dataset for Tutorial Videos
Anthony Colas | Seokhwan Kim | Franck Dernoncourt | Siddhesh Gupte | Zhe Wang | Doo Soon Kim
Proceedings of The 12th Language Resources and Evaluation Conference

Despite the number of currently available datasets on video-question answering, there still remains a need for a dataset involving multi-step and non-factoid answers. Moreover, relying on video transcripts remains an under-explored topic. To adequately address this, we propose a new question answering task on instructional videos, because of their verbose and narrative nature. While previous studies on video question answering have focused on generating a short text as an answer, given a question and video clip, our task aims to identify a span of a video segment as an answer which contains instructional details with various granularities. This work focuses on screencast tutorial videos pertaining to an image editing program. We introduce a dataset, TutorialVQA, consisting of about 6,000 manually collected triples of (video, question, answer span). We also provide experimental results with several baseline algorithms using the video transcripts. The results indicate that the task is challenging and call for the investigation of new algorithms.

pdf bib
Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access
Seokhwan Kim | Mihail Eric | Karthik Gopalakrishnan | Behnam Hedayatnia | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Most prior work on task-oriented dialogue systems are restricted to a limited coverage of domain APIs, while users oftentimes have domain related requests that are not covered by the APIs. In this paper, we propose to expand coverage of task-oriented dialogue systems by incorporating external unstructured knowledge sources. We define three sub-tasks: knowledge-seeking turn detection, knowledge selection, and knowledge-grounded response generation, which can be modeled individually or jointly. We introduce an augmented version of MultiWOZ 2.1, which includes new out-of-API-coverage turns and responses grounded on external knowledge sources. We present baselines for each sub-task using both conventional and neural approaches. Our experimental results demonstrate the need for further research in this direction to enable more informative conversational systems.