Gemma Bel-Enguix


2020

CPLM, a Parallel Corpus for Mexican Languages: Development and Interface
Gerardo Sierra Martínez | Cynthia Montaño | Gemma Bel-Enguix | Diego Córdova | Margarita Mota Montoya
Proceedings of The 12th Language Resources and Evaluation Conference

Mexico is a Spanish-speaking country with great linguistic diversity: 68 linguistic groups and 364 varieties. Because these languages lack representation in education, government, public services and media, they are highly endangered. Owing to the scarcity of data for them on social media and the internet, few technologies have been developed for these languages. To analyze different linguistic phenomena in the country, the Language Engineering Group developed the Corpus Paralelo de Lenguas Mexicanas (CPLM) [The Mexican Languages Parallel Corpus], a collaborative parallel corpus for the low-resourced languages of Mexico. The CPLM aligns Spanish with six indigenous languages: Maya, Ch’ol, Mazatec, Mixtec, Otomi, and Nahuatl. First, this paper describes the process of building the CPLM: text searching, digitization and alignment. It also discusses some difficulties posed by dialectal and orthographic variation. Second, we present the interface, the types of search it supports and the use of filters.

Enhancing Job Searches in Mexico City with Language Technologies
Gerardo Sierra Martínez | Gemma Bel-Enguix | Helena Gómez-Adorno | Juan Manuel Torres Moreno | Tonatiuh Hernández-García | Julio V Guadarrama-Olvera | Jesús-Germán Ortiz-Barajas | Ángela María Rojas | Tomas Damerau | Soledad Aragón Martínez
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

In this paper, we present enhancements to the Demanded Skills Diagnosis (DiCoDe: Diagnóstico de Competencias Demandadas), a system developed by Mexico City’s Ministry of Labor and Employment Promotion (STyFE: Secretaría de Trabajo y Fomento del Empleo de la Ciudad de México) that seeks to reduce information asymmetries between job seekers and employers. The project uses web scraping techniques to retrieve, on a daily basis, job vacancies posted on private job portals, in order to inform training and individual case management policies and to support labor market monitoring. To this end, STyFE and the Language Engineering Group (GIL: Grupo de Ingeniería Lingüística) established a collaboration to enhance DiCoDe by applying NLP models and semantic analysis. Through this collaboration, the macro-structure of DiCoDe’s job vacancy system and its geographic referencing at the city hall (municipality) level were improved. More specifically, dictionaries were created to identify demanded competencies, skills and abilities (CSA), and algorithms were developed to classify vacancies dynamically and to identify search terms in free text, improving both the results and the processing time of queries.