Modelado de tópicos aplicado al análisis del papel del aprendizaje automático en revisiones sistemáticas

Topic modeling applied to the analysis of the role of machine learning in systematic reviews

Andrés Mauricio Grisales-Aguirre
Carlos Julio Figueroa-Vallejo

Abstract

The aim of this study was to analyze the role of machine learning in systematic literature reviews. Topic modeling, a Natural Language Processing technique, was applied to a set of titles and abstracts collected from the Scopus database. Specifically, Latent Dirichlet Allocation (LDA) was used to discover and interpret the underlying themes in the document collection. The results showed that the technique is useful for exploratory literature review because it allows the retrieved documents to be grouped by theme. They also made it possible to identify the specific areas and activities in which machine learning has been applied most in literature reviews. It is concluded that LDA is an easy-to-use strategy whose results make it possible to approach a large collection of documents systematically and coherently, considerably reducing review time.
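As an illustration of the technique named in the abstract, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is not the authors' implementation: the function name `fit_lda`, the hyperparameter values `alpha` and `beta`, the number of topics, and the four-document toy corpus are all assumptions chosen for illustration, and a real study would normally rely on an established library and a properly preprocessed corpus of titles and abstracts.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (vocab, doc_topic, topic_word),
    where doc_topic[d][k] and topic_word[k][v] are estimated probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Assign every token to a random topic to start.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][word_id[w]] += 1
            topic_totals[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][v] -= 1
                topic_totals[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][v] + beta)
                    / (topic_totals[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][v] += 1
                topic_totals[z] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[k] + vocab_size * beta)
         for c in topic_word_counts[k]]
        for k in range(num_topics)
    ]
    return vocab, doc_topic, topic_word


# Hypothetical toy corpus standing in for tokenized titles and abstracts.
docs = [
    "machine learning screening abstracts systematic review".split(),
    "screening prioritization systematic review machine learning".split(),
    "topic modeling latent dirichlet allocation documents".split(),
    "latent topics documents dirichlet allocation modeling".split(),
]
vocab, doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    top_words = [w for _, w in sorted(zip(dist, vocab), reverse=True)[:4]]
    print(f"topic {k}: {top_words}")
```

Each row of `doc_topic` gives the estimated share of each topic in one document, which is what allows a collection to be grouped by theme; each row of `topic_word` ranks the vocabulary for one topic, which is what a reviewer reads to label it.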

Author biographies

Andrés Mauricio Grisales-Aguirre, Universidad Católica Luis Amigó, Manizales

Mathematician; doctoral student in Sciences (Mathematics)

Carlos Julio Figueroa-Vallejo, Corporación Universitaria Remington, Caucasia

Systems engineer; specialist in Big Data and Business Intelligence

