Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem

Authors

Miguel Alexis Solano-Jiménez https://orcid.org/0000-0003-1936-3488
Jose Julio Tobar-Cifuentes https://orcid.org/0000-0002-5436-0816
Luz Marina Sierra-Martínez, Ph. D. https://orcid.org/0000-0003-3847-3324
Carlos Alberto Cobos-Lozada, Ph. D. https://orcid.org/0000-0002-6263-1911

Abstract

Part-of-Speech Tagging (POST) is a complex preprocessing task in Natural Language Processing applications. Tagging has been tackled with statistical and rule-based approaches, using a range of methods. More recently, metaheuristic algorithms have gained attention and have been applied with good results in a wide variety of knowledge areas. In this research they were therefore applied to the POST problem, to assign the best sequence of tags (roles) to the words of a sentence based on statistical information. The process was carried out in two cycles, each comprising four phases, which allowed several metaheuristic algorithms to be adapted to the tagging problem: Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm that uses Global-Best Harmony Search as a global optimizer and Hill Climbing as a local optimizer. While each algorithm was being consolidated, preliminary experiments (using cross-validation) were carried out to tune its parameters, and the tuned algorithms were then evaluated on the complete tagged corpora: IULA (Spanish), Brown (English), and Nasa Yuwe (Nasa). The results obtained by the proposed taggers were compared using the Friedman and Wilcoxon statistical tests, which confirmed that the proposed memetic tagger, GBHS Tagger, obtained better precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas.
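
To make the optimization view of tagging concrete, the following is a minimal sketch of a Random-Restart Hill Climbing tagger. It is not the authors' implementation: the abstract does not give the fitness function, so the bigram log-probability used here is an assumption, and the tag set, the probability tables, the `FLOOR` smoothing value, and all function names are hypothetical toy values chosen only for illustration.

```python
import math
import random

# Hypothetical toy statistics (not taken from IULA, Brown, or Nasa Yuwe).
TAGS = ["DET", "NOUN", "VERB"]
EMISSION = {  # P(word | tag), illustrative values only
    ("the", "DET"): 0.9,
    ("dog", "NOUN"): 0.5,
    ("barks", "VERB"): 0.4,
    ("barks", "NOUN"): 0.1,
}
TRANSITION = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "DET"): 0.6,
    ("DET", "NOUN"): 0.8,
    ("NOUN", "VERB"): 0.5,
    ("NOUN", "NOUN"): 0.2,
}
FLOOR = 1e-6  # assumed smoothing value for events missing from the tables


def fitness(words, tags):
    """Log-probability of a tag sequence under the toy bigram statistics."""
    score, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        score += math.log(TRANSITION.get((prev, tag), FLOOR))
        score += math.log(EMISSION.get((word, tag), FLOOR))
        prev = tag
    return score


def hill_climb(words, tags, rng, steps=200):
    """Re-tag one random position per step; keep the change only if fitness improves."""
    best = fitness(words, tags)
    for _ in range(steps):
        i = rng.randrange(len(words))
        candidate = list(tags)
        candidate[i] = rng.choice(TAGS)
        score = fitness(words, candidate)
        if score > best:
            tags, best = candidate, score
    return tags, best


def random_restart_tagger(words, restarts=20, seed=0):
    """Run hill climbing from several random tag sequences and keep the best result."""
    rng = random.Random(seed)
    best_tags, best_score = None, -math.inf
    for _ in range(restarts):
        start = [rng.choice(TAGS) for _ in words]
        tags, score = hill_climb(words, start, rng)
        if score > best_score:
            best_tags, best_score = tags, score
    return best_tags, best_score
```

Calling `random_restart_tagger(["the", "dog", "barks"])` returns the highest-scoring tag sequence found together with its log-probability. In this sketch a candidate solution is simply a list with one tag per word, which is the kind of discrete representation that population-based methods such as Particle Swarm Optimization or Jaya would also need to be adapted to.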

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License.

All articles included in the Revista Facultad de Ingeniería are published under the Creative Commons (BY) license.
