Modeling and management of NoSQL databases: A systematic review
Abstract
NoSQL databases, which emerged this century, were created to overcome the limitations of relational database systems in handling the diverse types of data that have appeared in information processing. In this article, we present the results of a secondary study conducted to find and synthesize the research carried out so far on modeling processes, the characteristics of the data types used, and management tools for NoSQL databases. Four types are currently recognized and classified according to the data model they use: key-value, document-oriented, column-based, and graph-based. The study found that the most frequent NoSQL database model is the document model, because it offers greater flexibility and versatility than the other three models. In terms of data, column and document schemas are the ones whose characteristics are most often described, even though they offer more complex search methods. A trend toward the column-oriented and document-oriented models was also observed in management tools; although all of the tools provide the basic functionality, they differ in how the information is stored and how it can be accessed.
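As a rough illustration of the four data models named above, the following sketch stores one made-up record in each style using plain Python structures. The record, the key format, the column-family names, the `WRITES_ABOUT` edge label, and the `find_by_field` helper are all assumptions introduced here for illustration; none of them come from the study, and no real database API is used.

```python
import json

# Hypothetical record, invented for this example only.
author = {"id": "a1", "name": "Ada", "topics": ["NoSQL", "modeling"]}

# 1. Key-value: one key maps to an opaque value (here a JSON string).
#    The store cannot look inside the value; access is by key only.
key_value_store = {f"author:{author['id']}": json.dumps(author)}

# 2. Document-oriented: a collection of nested, self-describing documents
#    that can be filtered by any field.
document_collection = [author]


def find_by_field(collection, field, value):
    """Return every document whose top-level field equals the given value."""
    return [doc for doc in collection if doc.get(field) == value]


# 3. Column-based: row key -> column family -> column -> value.
#    Each topic becomes its own column inside the "topics" family.
column_store = {
    author["id"]: {
        "profile": {"name": author["name"]},
        "topics": {topic: True for topic in author["topics"]},
    }
}

# 4. Graph-based: entities are nodes, relationships are labeled edges.
nodes = {author["id"]: {"label": "Author", "name": author["name"]}}
for topic in author["topics"]:
    nodes[f"topic:{topic}"] = {"label": "Topic", "name": topic}
edges = [
    (author["id"], "WRITES_ABOUT", f"topic:{topic}")
    for topic in author["topics"]
]

if __name__ == "__main__":
    print(json.loads(key_value_store["author:a1"])["name"])
    print(find_by_field(document_collection, "name", "Ada"))
    print(column_store["a1"]["topics"])
    print(edges)
```

The same information appears in all four structures; what changes is how it is stored and which access paths are cheap, which is the kind of difference the abstract attributes to the management tools.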
Keywords
software engineering, database modeling, NoSQL, systematic literature review
Author biography
Omar Gómez Gómez
Professor at the Higher Polytechnic School of Chimborazo. PhD from the Polytechnic University of Madrid.