NoSQL Database Modeling and Management: A Systematic Literature Review
Abstract
The NoSQL databases that emerged this century were created to overcome the limitations of relational database systems in handling the different types of data that have appeared for information processing. In this paper, we present the results of a secondary study carried out to find and synthesize the research conducted to date on modeling processes, characteristics of the types of data used, and management tools for NoSQL databases. Currently, four types of NoSQL database are recognized and classified according to the data model they use: key-value, document-oriented, column-based, and graph-based. This study identified the document model as the most frequently used type of NoSQL database model, because it offers greater flexibility and versatility than the other three models. Although the document model offers more complex search methods, in terms of data it is the column and document schemas that usually describe the characteristics of the data. We also observed a trend toward the column-oriented and document-oriented models in the management tools; although all the tools provide the basic functionalities, they differ in how the information is stored and how it can be accessed.
Keywords
NoSQL, Database Modeling, Systematic Literature Review, Software Engineering
Author Biography
Omar Gómez Gómez
Professor at the Higher Polytechnic School of Chimborazo. PhD from the Polytechnic University of Madrid.