Machine Translation of a Training Set for Semantic Extraction of Relations

Abstract

Machine translation (MT) is used to obtain annotated corpora from English-language corpora, which can then be applied to a range of natural language processing (NLP) tasks. Because more resources and datasets exist for training NLP models in English than in other languages, this article explores the use of MT to automate NLP tasks in Spanish. It describes a dataset for generic relation extraction (reACE) and the construction of a semantic relation extraction (RE) model for Spanish, trained on samples translated from English into Spanish. The results show that the MT task requires a pre-editing step on the English corpus, both to avoid translation errors that would otherwise need post-editing and to preserve the annotations of the original corpus. The Spanish RE models reach precision, recall, and F-measure values comparable to those obtained by the English-language model, which suggests that machine translation is a useful tool for carrying out NLP tasks in Spanish.
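For readers unfamiliar with the three evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall, and F-measure can be computed for a single relation label. It is not the authors' evaluation code; the function name `precision_recall_f1`, the label names (`ROLE`, `PART`, `NONE`), and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, positive_label):
    """Return (precision, recall, F1) for one relation label.

    `gold` and `predicted` are equal-length sequences of label strings.
    All names and labels here are hypothetical, for illustration only.
    """
    pairs = Counter(zip(gold, predicted))
    # True positives: gold and prediction both carry the target label.
    tp = pairs[(positive_label, positive_label)]
    # False positives: predicted the target label, but gold differs.
    fp = sum(n for (g, p), n in pairs.items()
             if p == positive_label and g != positive_label)
    # False negatives: gold has the target label, but prediction differs.
    fn = sum(n for (g, p), n in pairs.items()
             if g == positive_label and p != positive_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example with invented labels (not taken from reACE).
gold = ["ROLE", "PART", "ROLE", "NONE", "ROLE"]
pred = ["ROLE", "ROLE", "ROLE", "NONE", "NONE"]
p, r, f = precision_recall_f1(gold, pred, "ROLE")
```

Comparing these three values for a model trained on the original English data and for one trained on the translated Spanish data is the kind of comparison the abstract reports.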

Keywords

computational linguistics, machine translation, corpus linguistics, relation extraction
