Hate Speech Identification in Texts Through Phraseological Analysis and TF-IDF Representation of N-Grams

César Espin-Riofrio; Ángela Yanza-Montalván; Rocío Carchi-Encalada; Mayra Magdalena Arias Candelario; Angélica Cruz-Chóez; Juan Montesdeoca-Rodríguez; Marcos Bailón-Guaranda

doi:10.18687/LACCEI2025.1.1.624

Authors

César Espin-Riofrio, Universidad de Guayaquil, Ecuador
Ángela Yanza-Montalván, Universidad de Guayaquil, Ecuador
Rocío Carchi-Encalada, Universidad de Guayaquil, Ecuador
Mayra Magdalena Arias Candelario, Universidad de Guayaquil, Ecuador
Angélica Cruz-Chóez, Universidad de Guayaquil, Ecuador
Juan Montesdeoca-Rodríguez, Universidad de Guayaquil, Ecuador
Marcos Bailón-Guaranda, Universidad de Guayaquil, Ecuador

DOI:

https://doi.org/10.18687/LACCEI2025.1.1.624

Keywords:

Hate speech, Phraseological features, n-grams, TF-IDF, Natural Language Processing

Abstract

Hate speech is widespread on digital platforms and poses particular challenges in Spanish, whose linguistic richness and cultural diversity complicate the automatic identification of such content. The problem is compounded by the ease with which hate messages can be disguised through sarcasm, irony, or specific cultural references. This research focuses on extracting phraseological features and TF-IDF n-grams, and on using traditional statistical classifiers, neural networks, and ensemble methods to improve classification performance. The OffendEs dataset, labeled specifically for hate speech tasks in Spanish, was used. The results show that ensemble models achieve the highest scores while maintaining a good balance between classes, which indicates that they can handle the linguistic complexity of Spanish. In particular, the Voting Classifier achieved a macro F1 score of 0.742261. Our results were compared with predictions from pre-trained models for hate speech detection, such as Piuba and Pysentimiento, and our approach outperformed them. These findings highlight the effectiveness of the methodology and its contribution to the development of more accurate tools for the automatic detection of hate speech in Spanish.
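To make the terms in the abstract concrete, the following is a minimal, standard-library sketch of three of the ideas it names: TF-IDF weighting of word n-grams, hard (majority) voting over several classifiers, and the macro F1 metric. It is an illustration under stated assumptions, not the authors' implementation: the function names, the whitespace tokenizer, the 1-to-2 n-gram range, and the smoothed IDF formula are all choices made here for the example, and the abstract does not say which voting scheme or weighting the paper used.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return word n-grams (space-joined) for every n in [n_min, n_max].

    Assumption: lowercase + whitespace tokenization, chosen for brevity.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf_ngrams(docs, n_min=1, n_max=2):
    """Map each document to a {ngram: tf-idf weight} dictionary.

    Assumption: raw term frequency times the smoothed inverse document
    frequency ln((1 + N) / (1 + df)) + 1, without length normalization.
    """
    counts_per_doc = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(g for counts in counts_per_doc for g in counts)
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
    return [{g: tf * idf[g] for g, tf in counts.items()} for counts in counts_per_doc]


def majority_vote(predictions):
    """Hard voting: `predictions` holds one list of labels per model."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Neutral placeholder texts; not taken from the OffendEs dataset.
    vectors = tfidf_ngrams(["buenos dias a todos", "buenos resultados para todos"])
    print(sorted(vectors[0].items()))

    # Hypothetical predictions from three models on four texts.
    combined = majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]])
    print(combined, macro_f1([1, 1, 0, 0], combined))
```

Macro F1 averages the per-class scores without weighting by class size, so a minority class counts as much as the majority class; this is why it is a common choice for imbalanced tasks such as hate speech detection. In practice, libraries such as scikit-learn provide `TfidfVectorizer` and `VotingClassifier` for the same purposes.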

Published

2025-07-27

Issue

Vol. 1 No. 12 (2025): LACCEI 2025

Section

Articles

Copyright

Copyright 2025 LACCEI

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

LACCEI retains the copyright of all published articles under the terms of its copyright transfer agreement. As the copyright holder, LACCEI distributes the articles to the public under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

How to cite

Espin-Riofrio, C., Yanza-Montalván, Á., Carchi-Encalada, R., Arias Candelario, M. M., Cruz-Chóez, A., Montesdeoca-Rodríguez, J., & Bailón-Guaranda, M. (2025). Hate Speech Identification in Texts Through Phraseological Analysis and TF-IDF Representation of N-Grams. LACCEI, 1(12). https://doi.org/10.18687/LACCEI2025.1.1.624
