Keith Suderman
2020
Infrastructure for Semantic Annotation in the Genomics Domain
Mahmoud El-Haj | Nathan Rutherford | Matthew Coole | Ignatius Ezeani | Sheryl Prentice | Nancy Ide | Jo Knight | Scott Piao | John Mariani | Paul Rayson | Keith Suderman
Proceedings of The 12th Language Resources and Evaluation Conference
We describe a novel super-infrastructure for biomedical text mining which incorporates an end-to-end pipeline for the collection, annotation, storage, retrieval and analysis of biomedical and life sciences literature, combining NLP and corpus linguistics methods. The infrastructure permits extreme-scale research on the open access PubMed Central archive. It combines an updatable Gene Ontology Semantic Tagger (GOST) for entity identification and semantic markup in the literature, with an NLP pipeline scheduler (Buster) to collect and process the corpus, and a bespoke columnar corpus database (LexiDB) for indexing. The corpus database is distributed to permit fast indexing, and provides a simple web front-end with corpus linguistics methods for sub-corpus comparison and retrieval. GOST is also connected as a service in the Language Application (LAPPS) Grid, in which context it is interoperable with other NLP tools and data in the Grid and can be combined with them in more complex workflows. In a literature-based discovery setting, we have created an annotated corpus of 9,776 papers with 5,481,543 words.
Towards Standardization of Web Service Protocols for NLPaaS
Jin-Dong Kim | Nancy Ide | Keith Suderman
Proceedings of the 1st International Workshop on Language Technology Platforms
Several web services for various natural language processing (NLP) tasks (“NLP-as-a-service” or NLPaaS) have recently been made publicly available. However, despite their similar functionality, these services often differ in the protocols they use, thus complicating the development of clients accessing them. A survey of currently available NLPaaS services suggests that it may be possible to identify a minimal application layer protocol that can be shared by NLPaaS services without sacrificing functionality or convenience, while at the same time simplifying the development of clients for these services. In this paper, we hope to raise awareness of the interoperability problems caused by the variety of existing web service protocols, and describe an effort to identify a set of best practices for NLPaaS protocol design. To that end, we survey and compare protocols used by NLPaaS services and suggest how these protocols may be further aligned to reduce variation.
Co-authors
- Nancy Ide 2
- Mahmoud El-Haj 1
- Nathan Rutherford 1
- Matthew Coole 1
- Ignatius Ezeani 1