Development of a neural machine translation model optimized with BERT for translation from Quechua to Spanish

Authors

  • José Alfredo Sulla Torres, Universidad Católica de Santa María, Perú
  • Beatrice Cueva Medina, Universidad Católica de Santa María, Perú
  • Gabriel Fabrizio Tuco Casquino, Universidad Católica de Santa María, Perú

DOI:

https://doi.org/10.18687/LACCEI2024.1.1.1636

Keywords:

Quechua, Neural Machine Translation, Low-resource Language, BERT, Transformer

Abstract

Quechua, a Native American language spoken by over 3 million people in Peru, plays a significant cultural role but is at risk of decline due to limited resources and the dominance of Spanish. This paper proposes a Quechua-to-Spanish neural machine translation (NMT) model using a Transformer-based architecture and a semi-supervised approach known as LMfusion. The model is trained on parallel datasets, and PRPE morphological segmentation is employed during preprocessing. Initial results show promise, and integrating the QuBERT language model is expected to enhance translation quality. Additionally, a user-friendly web interface has been developed to facilitate Quechua-Spanish translation. This research aims to address the challenges of translating a low-resource language like Quechua and contribute to improved communication between Quechua and Spanish speakers, preserving cultural heritage and facilitating equitable access to information and services.

Published

2024-07-27

How to Cite

Sulla Torres, J. A., Cueva Medina, B., & Tuco Casquino, G. F. (2024). Development of a neural machine translation model optimized with BERT for translation from Quechua to Spanish. LACCEI, 1(10). https://doi.org/10.18687/LACCEI2024.1.1.1636