News Categorisation Based on Pre-Trained Transformer Models

Authors

  • Espin-Riofrio, César
  • Murillo-Cepeda, Vanessa
  • García-Zambrano, David
  • Mendoza Morán, Verónica
  • Montejo-Ráez, Arturo
  • Zumba Gamboa, Johanna

DOI:

https://doi.org/10.18687/LACCEI2023.1.1.1076

Keywords:

Natural Language Processing, Transformer models, news categorisation.

Abstract

The rise of digital journalism, the growing volume of news, and the number of people continuously accessing this content give third parties on web platforms and social networks the opportunity to persuade readers with content that alters their opinions or behaviour on a topic, making it necessary to classify news using Natural Language Processing (NLP) techniques. This work experiments with pre-trained Transformer models, applying transfer learning and fine-tuning to obtain a model capable of determining whether a news item is satire, opinion, or information. To do so, we use a labelled dataset of English-language news presented for the SemEval 2023 campaign, translating it into Spanish to experiment in that language as well. We apply pre-trained Transformer models to text classification in both languages and compare several models and their predictions using evaluation metrics. The results give indications of the goodness of the models when the news type is subjective, in the case of satire and opinion, and objective in the case of information, thus contributing to future research on text classification, specifically news categorisation.
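The abstract mentions comparing the models' predictions with evaluation metrics, without naming them here. As an illustration only, a minimal sketch of the kind of per-class scoring commonly used for such a three-way task (precision, recall, F1 per label, plus macro-F1), with the class names taken from the paper; the function and sample data are hypothetical:

```python
LABELS = ["satire", "opinion", "information"]  # the three classes from the paper

def per_class_scores(y_true, y_pred, labels=LABELS):
    """Compute precision, recall, and F1 for each class, plus macro-F1."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro-F1 averages the per-class F1 scores, weighting each class equally.
    scores["macro_f1"] = sum(scores[l]["f1"] for l in labels) / len(labels)
    return scores

# Hypothetical gold labels and model predictions, for illustration only.
truth = ["satire", "opinion", "information", "opinion"]
preds = ["satire", "information", "information", "opinion"]
result = per_class_scores(truth, preds)
```

In practice such scores are typically obtained from a metrics library rather than computed by hand; this sketch just makes the comparison explicit.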

Published

2024-04-16

Section

Articles