Embedding Oriented Adaptable Semantic Annotation Framework for Amharic Web Documents

Kidane Woldemariyam, Dr. Fekade Getahun


Abstract
The Web has become a source of information provided by humans for humans, and its growth has increased the need for solutions that intelligently extract valuable knowledge from existing and newly added web documents with little or no supervision. However, because existing data on the Web is unstructured, effective extraction of this knowledge is difficult for both humans and software agents. This work therefore designs a generic, embedding-oriented framework that automatically and semantically annotates Amharic web documents using an ontology. Because the framework can be adapted with minimal modification, it significantly reduces the manual annotation and learning cost of semantic annotation for Amharic web documents. The results also suggest that neural network techniques are promising for semantic annotation, especially for less-resourced languages such as Amharic, compared with language-dependent techniques, which incur a speed cost and are hard to adapt to new domains and languages. We evaluate the feasibility of the proposed approach using Amharic news collected from the WALTA news agency and Amharic Wikipedia. The proposed solution achieves 70.68% precision, 66.89% recall, and 68.53% F-measure in semantic annotation for Amharic, a morphologically complex language, with a limited-size dataset.
Anthology ID:
2020.winlp-1.3
Volume:
Proceedings of the Fourth Widening Natural Language Processing Workshop
Month:
July
Year:
2020
Address:
Seattle, USA
Venues:
ACL | WS | WiNLP
Publisher:
Association for Computational Linguistics
Pages:
7