Jason Baldridge
2020
Mapping Natural Language Instructions to Mobile UI Action Sequences
Yang Li | Jiacong He | Xin Zhou | Yuan Zhang | Jason Baldridge
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PixelHelp, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in How-To instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PixelHelp.
Proceedings of the First Workshop on Advances in Language and Vision Research
Xin Wang | Jesse Thomason | Ronghang Hu | Xinlei Chen | Peter Anderson | Qi Wu | Asli Celikyilmaz | Jason Baldridge | William Yang Wang
Proceedings of the First Workshop on Advances in Language and Vision Research