Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem

Miguel Alexis Solano-Jiménez; Jose Julio Tobar-Cifuentes; Luz Marina Sierra-Martínez; Carlos Alberto Cobos-Lozada

doi:10.19053/01211129.v29.n54.2020.11762

Vol. 29 No. 54 (2020)
Continuous Publication

Papers

https://doi.org/10.19053/01211129.v29.n54.2020.11762

Published 2020-09-18

Miguel Alexis Solano-Jiménez
Universidad del Cauca

Jose Julio Tobar-Cifuentes
Universidad del Cauca

Luz Marina Sierra-Martínez, Ph. D.
Universidad del Cauca

Carlos Alberto Cobos-Lozada, Ph. D.
Universidad del Cauca

How to Cite

Solano-Jiménez, M. A., Tobar-Cifuentes, J. J., Sierra-Martínez, L. M., & Cobos-Lozada, C. A. (2020). Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem. Revista Facultad de Ingeniería, 29(54), e11762. https://doi.org/10.19053/01211129.v29.n54.2020.11762

All articles included in the Revista Facultad de Ingeniería are published under the Creative Commons (BY) license.

Authors must complete, sign, and submit the Review and Publication Authorization Form of the manuscript provided by the Journal; this form should contain all the originality and copyright information of the manuscript.

The authors who publish in this Journal accept the following conditions:

a. The authors retain the copyright and transfer the right of the first publication to the journal, with the work registered under the Creative Commons attribution license, which allows third parties to use what is published as long as they mention the authorship of the work and the first publication in this Journal.

b. Authors can make other independent and additional contractual agreements for the non-exclusive distribution of the version of the article published in this journal (e.g., include it in an institutional repository or publish it in a book), provided they clearly indicate that the work was first published in this Journal.

c. Authors are allowed and recommended to publish their work on the Internet (for example on institutional or personal pages) before and during the review and publication process, as it can lead to productive exchanges and a greater and faster dissemination of published work.

d. The Journal authorizes the total or partial reproduction of the content of the publication, as long as the source is cited, that is, the name of the Journal, name of the author(s), year, volume, publication number, and pages of the article.

e. The ideas and statements issued by the authors are their responsibility and in no case bind the Journal.

Abstract

Part-of-Speech Tagging (POST) is a complex preprocessing task in Natural Language Processing applications. Tagging has been approached with statistical and rule-based methods. More recently, metaheuristic algorithms have gained attention and have been used with good results in a wide variety of knowledge areas. In this research they were therefore applied to the POST problem, assigning the best sequence of tags (roles) to the words of a sentence based on statistical information. The process was carried out in two cycles of four phases each, which allowed the following metaheuristic algorithms to be adapted to the tagging problem: Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm that uses Global-Best Harmony Search (GBHS) as the global optimizer and Hill Climbing as the local optimizer. While consolidating each algorithm, preliminary experiments (using cross-validation) were run to tune its parameters, and the algorithms were then evaluated on the complete tagged corpora: IULA (Spanish), Brown (English), and Nasa Yuwe (Nasa). The results of the proposed taggers were compared, and the Friedman and Wilcoxon statistical tests confirmed that the proposed memetic algorithm, GBHS Tagger, obtained better results in precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas.
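To make the idea of tagging as a search problem concrete, the following is a minimal sketch of a Random-Restart Hill Climbing tagger. It is not the authors' implementation: the abstract does not specify the fitness function or the neighborhood, so this sketch assumes a fitness equal to the sum of log bigram transition and emission probabilities, and a neighborhood that changes one tag at a time. The tag set, the probability tables, and the names `fitness`, `hill_climb`, and `rrhc_tag` are all hypothetical.

```python
import math
import random

# Toy statistics with made-up values (NOT taken from IULA, Brown, or Nasa Yuwe).
TAGS = ["DET", "NOUN", "VERB"]
EMISSION = {  # assumed P(word | tag)
    ("the", "DET"): 0.9,
    ("dog", "NOUN"): 0.8,
    ("dog", "VERB"): 0.1,
    ("barks", "VERB"): 0.7,
    ("barks", "NOUN"): 0.2,
}
TRANSITION = {  # assumed P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "DET"): 0.05,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.2, ("NOUN", "DET"): 0.1,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1,
}
SMOOTH = 1e-6  # probability assigned to unseen (word, tag) or (tag, tag) pairs


def fitness(words, tags):
    """Log-probability of a tag sequence under the assumed bigram model."""
    score = 0.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        score += math.log(TRANSITION.get((prev, tag), SMOOTH))
        score += math.log(EMISSION.get((word, tag), SMOOTH))
        prev = tag
    return score


def hill_climb(words, start, max_sweeps=100):
    """Change one tag at a time, keeping every change that raises the fitness."""
    best = list(start)
    best_score = fitness(words, best)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(words)):
            for tag in TAGS:
                if tag == best[i]:
                    continue
                candidate = best[:i] + [tag] + best[i + 1:]
                score = fitness(words, candidate)
                if score > best_score:
                    best, best_score, improved = candidate, score, True
        if not improved:  # local optimum reached
            break
    return best, best_score


def rrhc_tag(words, restarts=20, seed=0):
    """Run hill climbing from several random tag sequences and keep the best."""
    rng = random.Random(seed)
    best, best_score = None, -math.inf
    for _ in range(restarts):
        start = [rng.choice(TAGS) for _ in words]
        tags, score = hill_climb(words, start)
        if score > best_score:
            best, best_score = tags, score
    return best, best_score


if __name__ == "__main__":
    print(rrhc_tag(["the", "dog", "barks"]))
```

In this toy setting every restart reaches the same sequence, so the restarts add nothing; they only matter when the fitness landscape has several local optima, which is the situation where a global optimizer such as the one in the memetic algorithm would be paired with a local search.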

Keywords

computational intelligence, computational linguistics, evolutionary computing, heuristic algorithms, natural language processing, parts of speech tagging, search methods

Author Biography

Miguel Alexis Solano-Jiménez

Roles: Formal Analysis, Data curation, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing.

Jose Julio Tobar-Cifuentes

Roles: Formal Analysis, Data curation, Investigation, Methodology, Software, Validation, Writing – original draft, Writing – review & editing.

Luz Marina Sierra-Martínez, Ph. D.

Roles: Conceptualization, Methodology, Supervision, Project administration, Writing – original draft, Writing – review & editing.

Carlos Alberto Cobos-Lozada, Ph. D.

Roles: Conceptualization, Methodology, Supervision, Project administration, Writing – original draft, Writing – review & editing.

