Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
Abstract
Part-of-Speech Tagging (POST) is a complex preprocessing task in Natural Language Processing applications. Tagging has been tackled with statistical and rule-based approaches, using a range of methods. More recently, metaheuristic algorithms have gained attention and have been applied with good results in a wide variety of knowledge areas. In this research, they were therefore applied to the POST problem to assign the best sequence of tags (roles) to the words of a sentence based on statistical information. The work was carried out in two cycles, each comprising four phases, in which the following metaheuristic algorithms were adapted to the tagging problem: Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm that uses Global-Best Harmony Search as a global optimizer and Hill Climbing as a local optimizer. To consolidate each algorithm, preliminary experiments were carried out (using cross-validation) to tune its parameters, after which the algorithms were evaluated on the datasets of the complete tagged corpora: IULA (Spanish), Brown (English), and Nasa Yuwe (Nasa). The results obtained by the proposed taggers were compared using the Friedman and Wilcoxon statistical tests, which confirmed that the proposed memetic algorithm, GBHS Tagger, obtained better results in precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas.
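To illustrate how tagging can be framed as a search problem, the following minimal sketch applies Random-Restart Hill Climbing to a three-word sentence. It is not the authors' implementation: the abstract states only that tag sequences are scored from statistical information, so the fitness function (the sum of log emission and log bigram transition probabilities), the probability tables, the tag set, and all function names are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy statistics; the values are illustrative, not from any corpus.
TAGS = ["DET", "NOUN", "VERB"]
EMISSION = {  # assumed P(word | tag)
    ("the", "DET"): 0.9,
    ("dog", "NOUN"): 0.8, ("dog", "VERB"): 0.1,
    ("barks", "VERB"): 0.7, ("barks", "NOUN"): 0.2,
}
TRANSITION = {  # assumed P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "DET"): 0.05,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.2, ("NOUN", "DET"): 0.1,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1,
}
SMOOTH = 1e-6  # probability assigned to unseen (word, tag) or (tag, tag) pairs


def fitness(words, tags):
    """Log-probability of a tag sequence under the assumed bigram model."""
    score, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        score += math.log(EMISSION.get((word, tag), SMOOTH))
        score += math.log(TRANSITION.get((prev, tag), SMOOTH))
        prev = tag
    return score


def hill_climb(words, start, rng, max_steps=100):
    """Re-tag one random position per step; keep the change only if it improves."""
    current, best = list(start), fitness(words, start)
    for _ in range(max_steps):
        candidate = list(current)
        candidate[rng.randrange(len(words))] = rng.choice(TAGS)
        score = fitness(words, candidate)
        if score > best:
            current, best = candidate, score
    return current, best


def random_restart_tagger(words, restarts=20, seed=0):
    """Run hill climbing from several random tag sequences and keep the best."""
    rng = random.Random(seed)
    best_tags, best_score = None, -math.inf
    for _ in range(restarts):
        start = [rng.choice(TAGS) for _ in words]
        tags, score = hill_climb(words, start, rng)
        if score > best_score:
            best_tags, best_score = tags, score
    return best_tags, best_score
```

Calling `random_restart_tagger(["the", "dog", "barks"])` returns a tag sequence together with its score; under the toy tables above, the highest-scoring sequence is DET, NOUN, VERB. The other algorithms named in the abstract would replace the search loop while keeping a fitness function of this general kind.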
Keywords
computational intelligence, computational linguistics, evolutionary computing, heuristic algorithms, natural language processing, parts of speech tagging, search methods
Author Biography
Miguel Alexis Solano-Jiménez
Roles: Formal Analysis, Data curation, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing.
Jose Julio Tobar-Cifuentes
Roles: Formal Analysis, Data curation, Investigation, Methodology, Software, Validation, Writing – original draft, Writing – review & editing.
Luz Marina Sierra-Martínez, Ph. D.
Roles: Conceptualization, Methodology, Supervision, Project administration, Writing – original draft, Writing – review & editing.
Carlos Alberto Cobos-Lozada, Ph. D.
Roles: Conceptualization, Methodology, Supervision, Project administration, Writing – original draft, Writing – review & editing.