
Machine translation of a training set for semantic relation extraction

Abstract

Machine translation (MT) is used to obtain annotated corpora from English-language corpora, which can then be applied to different natural language processing (NLP) tasks. Given that more resources and datasets for training NLP models exist in English, this article explores the use of MT to automate NLP tasks in Spanish. The article describes a dataset for generic relation extraction (reACE) and the construction of a semantic relation extraction (RE) model for Spanish, trained on the set of samples translated from English into Spanish. The results show that the MT task requires a pre-editing step on the English corpus, in order to avoid translation errors and post-editing and to preserve the annotations of the original corpus. The Spanish RE models reach precision, recall, and F-measure values comparable to those obtained by the English-language model, which suggests that machine translation is a useful tool for carrying out NLP tasks in Spanish.
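The three evaluation measures named above can be illustrated with a minimal sketch. It assumes that a relation counts as correct only when the predicted triple matches a gold triple exactly; the article's own evaluation protocol may differ. The function name `precision_recall_f1` and the example triples are hypothetical and are not taken from reACE.

```python
from typing import Iterable, Tuple

# Assumption: a relation is a (head entity, relation type, tail entity) triple.
Relation = Tuple[str, str, str]


def precision_recall_f1(
    gold: Iterable[Relation], predicted: Iterable[Relation]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring of triples."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and model output for a few Spanish sentences.
gold_relations = [
    ("Ana", "EMPLOYMENT", "Universidad del Valle"),
    ("Cali", "PART-WHOLE", "Colombia"),
    ("Luis", "PERSONAL-SOCIAL", "Marta"),
    ("Marta", "GENERAL-AFFILIATION", "Bogotá"),
]
predicted_relations = [
    ("Ana", "EMPLOYMENT", "Universidad del Valle"),
    ("Cali", "PART-WHOLE", "Colombia"),
    ("Luis", "EMPLOYMENT", "Marta"),  # wrong relation type, so not a match
]

p, r, f = precision_recall_f1(gold_relations, predicted_relations)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Scoring the English model and the Spanish model with the same function on their respective test sets is one way to make the comparison in the abstract concrete.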

Keywords

computational linguistics, machine translation, corpus linguistics, relation extraction


