Machine Translation of a Training Set for Semantic Relation Extraction

Abstract

Machine translation (MT) can be used to derive annotated corpora from existing English corpora, which can then be applied to different natural language processing (NLP) tasks. Because more resources and datasets for training NLP models are available in English, this paper explores the use of MT to automate NLP tasks in Spanish. The article describes a dataset for generic relation extraction (reACE) and the construction of a semantic relation extraction (ER) model for Spanish, trained on samples translated from English into Spanish. The results show that the MT task requires a pre-editing step on the English corpus to avoid translation and post-editing errors and to preserve the original corpus annotations. The Spanish ER models achieve precision, recall, and F-score values comparable to those obtained by the English model, which suggests that machine translation is a useful tool for performing NLP tasks in Spanish.
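As a minimal sketch of how the three evaluation measures mentioned in the abstract are typically computed for relation extraction, assuming they correspond to the standard precision, recall, and F1 definitions over exact-match relation triples. The function name, the triple layout, and the example relations and labels below are hypothetical illustrations, not data or code from the article.

```python
from typing import Iterable, Set, Tuple

# Hypothetical triple layout: (head entity, relation label, tail entity).
Relation = Tuple[str, str, str]


def precision_recall_f1(
    gold: Iterable[Relation], predicted: Iterable[Relation]
) -> Tuple[float, float, float]:
    """Score predicted relation triples against gold triples by exact match."""
    gold_set: Set[Relation] = set(gold)
    pred_set: Set[Relation] = set(predicted)

    # A true positive is a predicted triple that also appears in the gold set.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up Spanish examples with illustrative relation labels.
gold = [
    ("María", "EMPLOYMENT", "Universidad del Valle"),
    ("Cali", "LOCATED", "Colombia"),
    ("Juan", "PERSONAL", "Ana"),
]
predicted = [
    ("María", "EMPLOYMENT", "Universidad del Valle"),
    ("Cali", "LOCATED", "Colombia"),
    ("Juan", "EMPLOYMENT", "Ana"),   # wrong label, counts as an error
    ("Ana", "LOCATED", "Cali"),      # spurious relation
]

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Running the same function on an English test set and on its translated Spanish counterpart would give directly comparable figures, which is the kind of comparison the abstract reports.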
