Stefano Faralli
2020
Multiple Knowledge GraphDB (MKGDB)
Stefano Faralli | Paola Velardi | Farid Yusifli
Proceedings of The 12th Language Resources and Evaluation Conference
We present MKGDB, a large-scale graph database created as a combination of multiple taxonomy backbones extracted from five existing knowledge graphs, namely: ConceptNet, DBpedia, WebIsAGraph, WordNet and the Wikipedia category hierarchy. Thanks to the versatility of the Neo4j graph database manager, MKGDB is intended to support the development of open-domain natural language processing applications that rely on knowledge bases, such as information extraction, hypernymy discovery, topic clustering, and others. Our resource consists of a large hypernymy graph with more than 37 million nodes and more than 81 million hypernymy relations.
Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph
Georgeta Bordea | Stefano Faralli | Fleur Mougin | Paul Buitelaar | Gayo Diallo
Proceedings of The 12th Language Resources and Evaluation Conference
In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver standard taxonomies, which can only be automatically extracted for a small subset of domains rooted in relatively focused nodes placed at an intermediate level in the knowledge graph. We propose an iterative methodology to extract an application-specific gold standard dataset from a knowledge graph, and an evaluation framework to comparatively assess the quality of noisy, automatically extracted taxonomies. We employ an existing state-of-the-art algorithm in an iterative manner and propose several sampling strategies to reduce the amount of manual work needed for evaluation. A first gold standard dataset is released to the research community for this task, along with a companion evaluation framework. This dataset addresses a real-world application from the medical domain, namely the extraction of food-drug and herb-drug interactions.
Co-authors
- Paola Velardi 1
- Farid Yusifli 1
- Georgeta Bordea 1
- Fleur Mougin 1
- Paul Buitelaar 1
Venues
- LREC 2