Topic modeling applied to the analysis of the role of machine learning in systematic reviews
Abstract
The objective of this research was to analyze the role of machine learning in systematic literature reviews. The Natural Language Processing technique known as topic modeling was applied to a set of titles and abstracts retrieved from the Scopus database. Specifically, Latent Dirichlet Allocation (LDA) was used to discover and interpret the underlying themes in the document collection. The results showed that the technique is useful for exploratory literature reviews because it groups the retrieved documents by theme. It was also possible to identify the specific areas and activities, related to literature reviews, in which machine learning has been applied most. It is concluded that LDA is an easy-to-use strategy whose results make it possible to approach a large collection of documents in a systematic and coherent manner, notably reducing review time.
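To make the LDA step concrete, the following is a minimal illustrative sketch, not the authors' pipeline (the abstract does not specify their implementation or settings). It fits LDA with collapsed Gibbs sampling in plain Python, lists the most probable words per topic, and groups documents by their dominant topic, which is the "grouping by theme" described above. The stopword list, the priors `alpha=0.1` and `beta=0.01`, the iteration count, the function names, and the toy abstracts are all assumptions chosen for the example, not Scopus data.

```python
import random
import re

# Hypothetical, deliberately tiny stopword list; real pipelines use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "and", "to", "with", "on", "using", "is", "are"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and words of 1-2 letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs by collapsed Gibbs sampling (symmetric priors).

    Returns (vocab, doc_topic, topic_word) with smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and total tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Random initial topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | rest), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
                 for d, doc in enumerate(docs)]
    topic_word = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
                  for k in range(K)]
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=5):
    """The n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]


def dominant_topic(doc_topic):
    """Index of the most probable topic for each document (grouping by theme)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


if __name__ == "__main__":
    # Invented toy "abstracts", for illustration only.
    abstracts = [
        "machine learning screening of abstracts for systematic reviews",
        "active learning classifier prioritizes citations for abstract screening",
        "automatic risk of bias assessment in clinical trials",
        "bias assessment tool evaluated on randomized clinical trials",
    ]
    docs = [tokenize(a) for a in abstracts]
    vocab, theta, phi = lda_gibbs(docs, num_topics=2)
    print(top_words(vocab, phi, n=4))
    print(dominant_topic(theta))
```

Gibbs sampling was chosen here only because it fits in the standard library; libraries such as gensim or scikit-learn fit LDA with variational inference instead, and on a real corpus the number of topics would normally be selected with a coherence measure rather than fixed by hand. With a seed the run is reproducible, but the topic labels themselves (0 or 1) are arbitrary.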
Keywords
topic modeling; machine learning; systematic reviews; Latent Dirichlet Allocation
Author Biography
Andrés Mauricio Grisales-Aguirre
Mathematician, PhD student in Sciences (Mathematics)
Carlos Julio Figueroa-Vallejo
Systems Engineer, Specialist in Big Data and Business Intelligence