The value of open data government: a quality assessment approach

Data quality involves a set of characteristics, values, and expressions that are built iteratively. Open Government Data (OGD) share the qualities of Big Data in terms of volume, velocity, accuracy, and value, and stand out for their purpose of achieving reuse by stakeholders. This study uses a literature review to identify the relationships among Big Data, open data, quality, and value. It concludes that OGD, a recent trend, encourages collaboration and citizen participation and enables data reuse in pursuit of public and private innovation. Ensuring data quality makes it possible to obtain value from reuse and to generate Economic Value, Commercial Value, Social Value, and Public Value. Consumers require a solid foundation of trust before using public information, which calls for a data quality assurance process that strengthens and intensifies the inherent value of the data and activates its potential in different contexts.


Introduction
At the heart of data lie their contextual value and their potential to be projected and extended to other domains. The quality of these data is a fundamental component in enabling their use and in realizing the benefits of exploiting them. Open Government Data (OGD), in turn, give people useful information with agility, at the right time, and in accessible language (Congreso de la República de Colombia, 2014).
Whether or not the data have quality problems, a fundamental step is ensuring that they are suitable and adequate for their purpose. It follows that data quality rests on a continuous rather than static process, one that enables and motivates taking a large, perhaps unstructured, set of data and adapting it so that users can discover patterns, new ideas, and trends (BSA The Software Alliance, 2017). In this way, the process contributes to the data value chain: data-based products are generated with the potential to foster digital development in transportation, health, manufacturing, and retail (Attard et al., 2016).
If public value is defined as "the provision of public resources to provide effective and useful responses to social needs" (Consejo Nacional de Política Económica y Social de la República de Colombia, 2018), it follows that all information of public interest should be brought together to create public value. This value comes from aligning with common motivations for sharing government data, such as increasing transparency, stimulating economic growth, and improving government services, processes, and responsiveness (Kucera & Chlapek, 2014), while grounding its development in the principle of information quality, whose pillars are truthfulness, objectivity, completeness, timeliness, availability, accessibility, ease of processing, and reuse (Congreso de la República de Colombia, 2014).
This work presents the relationship between the life cycles of Big Data and OGD and the value generated by evaluating and assuring data quality. The article considers the concepts of Big Data and OGD, highlighting the inherent value and potential of data quality as an enabler of public value in different societal contexts. The methodology is a literature review. In the results and discussion, the life cycles of Big Data and OGD are analyzed and their similarities and differences established. On this basis, we identify how data quality can influence the generation of different types of value and how it enhances the use of data. Finally, the conclusions harmonize the relationships among characteristics, processes, and guidelines as a platform for obtaining public, social, commercial, and economic value.

Background
The conceptual lines addressed in this article correspond to characteristics of Big Data and Open Government Data (Figure 1), and how evaluating and assuring their quality opens a path to generating value according to their use and appropriation.

Gina Maestre-Gongora Adriana Rangel-Carrillo Mariutsi Osorio-Sanabria
For Cai and Zhu (2015), Big Data has distinctive characteristics such as volume, velocity, accuracy, value, and variety, so quality assurance involves a series of challenges. One is facilitating the discovery and promoting the use of data, through a process that enables their value and strengthens trust in their origin, making them easier to understand and discover. Another is overcoming the shortcomings that hinder analysis, such as inaccurate, obsolete, wrong, or incomplete data (i.e., data not fit for use).
The United Nations Economic Commission for Europe (UNECE, 2014) defined the general principles for evaluating the quality of Big Data: fitness for use, flexibility, the ability to extend to other contexts, and applicability to various situations. Finally, it highlights the relationship between the effort required to ensure quality and the value obtained.
To evaluate the quality in use of Big Data, we take as a reference the "3 C" model proposed by Williams and Tang (2020), which addresses: a) Contextual Consistency, the capacity of the data set to be useful in the domain of interest regardless of its format; b) Temporal Consistency, the time interval in which the data can be used; and c) Operational Consistency, the extent to which the data can be processed for analysis.
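The three consistencies can be made concrete as proportions computed over a dataset. The Python sketch below is our own illustration, not a formulation given by Williams and Tang (2020): the `Record` layout, the `required_fields` set, the validity interval, and the `parsers` mapping are all hypothetical choices, and each score is simply the share of records that pass the corresponding check.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


# Hypothetical record layout (an assumption for illustration only).
@dataclass
class Record:
    fields: dict        # raw key/value payload, in whatever format it arrived
    observed_on: date   # date the values refer to


def contextual_consistency(records: list[Record], required_fields: set[str]) -> float:
    """Share of records carrying every field the domain of interest needs."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if required_fields.issubset(r.fields))
    return ok / len(records)


def temporal_consistency(records: list[Record], start: date, end: date) -> float:
    """Share of records whose date falls inside the interval where they are usable."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if start <= r.observed_on <= end)
    return ok / len(records)


def operational_consistency(records: list[Record], parsers: dict[str, Callable]) -> float:
    """Share of records whose fields can all be converted for analysis."""
    if not records:
        return 0.0
    ok = 0
    for r in records:
        try:
            for name, parse in parsers.items():
                parse(r.fields[name])  # a missing field or a bad value fails the record
            ok += 1
        except (KeyError, ValueError, TypeError):
            pass
    return ok / len(records)


# Hypothetical toy data, not taken from any real dataset.
records = [
    Record({"municipality": "A", "population": "1000"}, date(2020, 1, 1)),
    Record({"municipality": "B"}, date(2019, 6, 1)),
    Record({"municipality": "C", "population": "n/a"}, date(2015, 1, 1)),
]
scores = {
    "contextual": contextual_consistency(records, {"municipality", "population"}),
    "temporal": temporal_consistency(records, date(2018, 1, 1), date(2021, 12, 31)),
    "operational": operational_consistency(records, {"population": int}),
}
print(scores)
```

With these toy records, two of the three are contextually and temporally consistent, while only the first can be processed, since the second lacks the population field and the third stores it as "n/a". Keeping the three scores separate, rather than averaging them, preserves the model's distinction between usefulness, timing, and processability.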
In the past, data consumers belonged to the same group as the producers, which allowed a high level of control to guarantee quality. With the era of Big Data, however, Cai and Zhu (2015) noted that data users do not necessarily correspond to data producers. This new reality increases the complexity of quality measurement, since it involves not only the data set itself but also the technology required to manage it and its spatial and operational problems, as Merino et al. (2016) explained.
Mahecha et al. (2018) identified that public institutions produce and manage data very quickly and publish them in predetermined, structured formats to facilitate public access and reuse in different contexts; such data are called Open Government Data (OGD). While Open Linked Data was defined by
The objective of OGD projects and initiatives is to facilitate cooperation among public administration, politics, industry, and citizens, strengthening transparency, democracy, participation, and collaborative work (Kucera, 2015). According to Attard et al. (2015), OGD quality is interdisciplinary, multidimensional, and transversal across the life cycle, as follows: i) the data are created so as to ensure their completeness; ii) the data are selected according to their objectivity, timeliness, and accuracy; iii) the data are harmonized to ensure accessibility, ease of use, and ease of processing; and iv) the data are published so that they are available to the public.
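The four stages described by Attard et al. (2015) can be read as a pipeline in which each stage filters or transforms the data before publication. The following Python sketch is a simplification under our own assumptions: completeness is checked as non-empty required columns, selection is reduced to a timeliness rule (`max_age_days` is a hypothetical parameter; objectivity and accuracy checks are omitted), harmonization only trims text and writes dates in ISO 8601, and publication serializes to CSV.

```python
import csv
import io
from datetime import date


def create(rows, required):
    """Stage i: keep only complete rows (every required column present and non-empty)."""
    return [r for r in rows if all(r.get(c) not in (None, "") for c in required)]


def select(rows, as_of, max_age_days):
    """Stage ii (timeliness only): keep rows updated within max_age_days of as_of."""
    return [r for r in rows if (as_of - r["updated"]).days <= max_age_days]


def harmonize(rows):
    """Stage iii: trim text values and write dates as ISO 8601 strings."""
    out = []
    for r in rows:
        clean = {}
        for k, v in r.items():
            if isinstance(v, str):
                clean[k] = v.strip()
            elif isinstance(v, date):
                clean[k] = v.isoformat()
            else:
                clean[k] = v
        out.append(clean)
    return out


def publish(rows, columns):
    """Stage iv: serialize to CSV, an open machine-readable format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical toy rows for illustration.
raw = [
    {"entity": " Ministry A ", "value": "10", "updated": date(2021, 3, 1)},
    {"entity": "Ministry B", "value": "", "updated": date(2021, 3, 1)},
    {"entity": "Ministry C", "value": "7", "updated": date(2018, 1, 1)},
]
columns = ["entity", "value", "updated"]
rows = create(raw, columns)
rows = select(rows, as_of=date(2021, 6, 30), max_age_days=365)
output = publish(harmonize(rows), columns)
print(output)
```

Here the second row is dropped as incomplete and the third as outdated, so only the first reaches publication. Ordering the stages this way reflects the text's point that quality is addressed before the data are made available.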
Data availability promotes reuse and discovery and facilitates the exploration and exploitation of data, processes through which knowledge is built, innovated, connected, and managed from the information that correlates the data under study.
In turn, the Ministerio de Tecnologías de la Información y las Comunicaciones (MinTIC, 2019a), in the "Guide for the Use and Exploitation of Open Data in Colombia," specified how the principles of OGD chart the path to generating public value: promoting transparency and social control; helping people take a more active role in society; facilitating product innovation and the creation of new business models; improving government management and the efficiency of services; allowing planning and forecasting of future scenarios; and strengthening the exploitation of the potential use of data. Ciancarini et al. (2016) defined open data as a subdomain of Big Data; since data quality problems are exacerbated in Big Data, it becomes fundamental to focus efforts on evaluating and guaranteeing the quality of this representative subdomain.
Given the above, OGD quality must be addressed as a process before the data are opened, so that the data provide value in the creation of new business models in the business sphere and, in turn, facilitate innovation in the public sector based on ideas and best practices from the private sector. The development of data opening strategies therefore generates value by revealing trends and new perspectives on a problem, yielding patterns that support innovation.

Method
The methodology used in this study is a literature review based on the work of Maestre-Gongora and Colmenares-Quintero (2018). Given the object of the study, two research questions are posed: Q1. How is data quality addressed in the life cycles of Big Data and OGD? Q2. What types of value are generated from Big Data and OGD?
The search was carried out in IEEE Xplore Digital Library, SpringerLink, and Scopus, using the keywords Open Government Data Quality, Big Data Quality, Open Government Data Quality Methodology, and Big Data Quality Methodology.
To select documents, the title, keywords, and abstract were checked to ensure that each document would make a decisive contribution to the research and to avoid irrelevant references. The following filters were applied to the preselected documents: year and type of publication (articles, theses, normative documents, technical documents, and reports from multilateral organizations).
Subsequently, we identified how each selected article addressed the following aspects: emphasis, i.e., relationship with the object of study (Big Data, Open Government Data); sector of application (public, business, academic); type of value (public, political, social, commercial, economic, technical, other); affecting factors (benefits, limitations, challenges, and risks); and quality framework (model, policies, principles, process, guide, life cycle, best practices, methodology, among others). Then, the document type (journal and conference articles) and the range of publication dates (2010 to 2021) were reviewed.
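The coding scheme above can be represented as one record per document, which makes the date and type filters and the tallies of value types mechanical. The Python sketch below is a hypothetical illustration: the `CodedDocument` fields mirror the aspects listed in the text, but the class, the example entries, and the counting helper are our own and do not reproduce the study's actual coding.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CodedDocument:
    title: str
    year: int
    doc_type: str                                   # e.g. "journal article", "conference article"
    emphasis: set = field(default_factory=set)      # Big Data and/or Open Government Data
    sector: set = field(default_factory=set)        # public, business, academic
    value_types: set = field(default_factory=set)   # public, political, social, commercial, ...
    factors: set = field(default_factory=set)       # benefits, limitations, challenges, risks
    framework: set = field(default_factory=set)     # model, policies, principles, process, ...


ALLOWED_TYPES = {"journal article", "conference article"}


def passes_filters(doc: CodedDocument, first_year: int = 2010, last_year: int = 2021) -> bool:
    """Document type and publication-date range described in the text."""
    return doc.doc_type in ALLOWED_TYPES and first_year <= doc.year <= last_year


def value_type_counts(docs: list) -> Counter:
    """How many selected documents mention each type of value."""
    counts = Counter()
    for d in docs:
        counts.update(d.value_types)
    return counts


# Hypothetical entries, not the study's data.
docs = [
    CodedDocument("Doc 1", 2016, "journal article", {"Open Government Data"}, {"public"},
                  {"public", "social", "economic"}),
    CodedDocument("Doc 2", 2019, "conference article", {"Big Data"}, {"business"}, {"commercial"}),
    CodedDocument("Doc 3", 2008, "journal article", {"Big Data"}, {"academic"}, {"technical"}),
]
selected = [d for d in docs if passes_filters(d)]
counts = value_type_counts(selected)
print(len(selected), counts)
```

Using sets for each aspect lets one document carry several value types at once, which matches the observation that many studies establish public, social, and economic value simultaneously.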

The final selection included a total of 32 documents, which contribute to the results and discussion of the research questions, as shown in Table 1.
Table 2 shows some authors' perspectives on the Big Data life cycle. They agree on the need for a data discovery and analysis phase, in which information on the Big Data context is collected and grouped and the data are profiled. The assessment of quality and the exploitation of Big Data aim to enable the use of the data. Only one of the authors includes a monitoring phase, through which the suitability and usefulness of the data are verified. The Big Data life cycle shows that quality evaluation precedes exploitation, which reveals the dependence between quality and data as generators of value. A quality assurance process is therefore required to enable the data to be used in their different potential contexts. According to Freitas and Curry (2016), the large amount and availability of data have created opportunities for people to use them as input for decision making, forming a complete, data-supported picture of reality.

Open Government Data life cycle
In the "Guide for the Use and Exploitation of Open Data in Colombia," MinTIC (2019a) describes the open data life cycle, which consists of four stages: a) establish the opening plan; b) structure and publish the data; c) communicate and promote the use of the data (disseminate, link stakeholders, consolidate, and position); and d) monitor quality and use (measure quality, measure use, and measure impact).
Quality evaluation takes place in the documentation and structuring stage, by applying and measuring parameters and indicators. These indicators must correlate use with quality, reflecting the premise that value can only be generated from data that are used, and data are only used if they meet quality standards. Figure 2 presents a parallel between the life cycle phases of Big Data and OGD. In Big Data, the quality assurance component sits within the data evaluation phase; in OGD, within the documentation and structuring phase. This component enables the next phase, the exploitation and use of government data, through which the value embedded in the data is obtained. OGD is emerging as a recent trend that encourages collaboration and citizen participation and enables data reuse in pursuit of public and private innovation. Consumers of these data require a solid foundation of trust before using public information, which can only be built by implementing a data quality assurance process that strengthens and intensifies the inherent value of the data and activates its potential in different contexts.

Types of values generated from Big Data and OGD
In 60% of the documents reviewed, the topic of "value" is addressed, variously from the public, economic, social, and commercial perspectives and, to a lesser extent, from the political and technical ones.
Research covering public value at the social, economic, and commercial levels makes up the most relevant literature. Political value highlights the responsibility of governments and citizens to collaborate in mitigating corruption. Technical value focuses on the technology and on the characteristics and requirements of the platform that supports the data.
Later, when the selected studies were analyzed by the type of public, social, commercial, and economic value generated by guaranteeing the quality of Big Data, public, social, and economic value were found to be established simultaneously in Kalampokis et al. (2011) and Consejo Nacional de Política Económica y Social de la República de Colombia (2018), whereas commercial value has been analyzed independently (Merino et al., 2016; Wahyudi et al., 2018; Wieczorkowski, 2019; Williams & Tang, 2020). Table 3 shows the types of value generated and the sources that refer to them.
Table 3. Types of value generated and the sources that refer to them: Cai & Zhu (2015); Merino et al. (2016); Wahyudi et al. (2018); Wieczorkowski (2019); Williams & Tang (2020); Ciancarini et al. (2016).
In the literature reviewed on OGD, public, social, commercial, and economic value are also identified: more than 80% of the authors establish public value as the value most often identified and affirmed through the use of OGD, while commercial value is the least referenced.
The only study that considers all four types of value is Attard et al. (2016), which also names two additional types (political and technical value), while the document "Quality Requirements for Open Data" produced by MinTIC (2019b) breaks public value down according to its political, strategic, and ideological impacts and the value of legitimacy and respect. In many studies, public, social, and economic values are established simultaneously, as shown in Table 4.
Some public value-oriented perspectives on data highlight public value in reuse, collaboration, and innovation. Regarding public value in reuse, OGD may be of interest to the business sector and the general public. For the data to be reused, their quality level must be guaranteed so as to facilitate the connection between real-world objects and the description the data provide of them. Kalampokis et al. (2011) noted that the challenges to achieving this ideal situation range from a lack of available information to the large amount of data produced by the public sector.
A considerable share of data is held by several entities and may yield different results for the same object or real-world problem, preventing the creation of new value-added services and products. Linking other actors in the data ecosystem promotes the reuse of available information and facilitates feedback on detected inconsistencies, thus ensuring access and generating value from the use of data (Ministerio de Tecnologías de la Información y las Comunicaciones de Colombia, 2019a).
Citizen participation, in turn, is a useful mechanism for building knowledge collaboratively. OGD promote the active involvement and empowerment of citizens and facilitate transparency through accountability and the growth of the nation as a whole, as indicated by Zuiderwijk and Janssen (2015). Attard et al. (2015) argued that OGD relate to decision and policy making rather than to sporadic participation in electoral processes. The value obtained from collaboration acts transversally in the continuous improvement of government services. MinTIC (2016a), in the Open Data Guide, stated that this contributes to transparency, social control by stakeholders, and predictive measurement of the impact of policies at the social, cultural, and economic levels.
Additionally, Yang and Kankanhalli (2013) reported that innovation has evolved from a closed concept, in which ideas and the flow of knowledge occur within an organization, to an open concept, in which development ideas have an external origin and are welcomed to generate new products and services.
The concept of OGD relates more explicitly to the theme of value, as the authors address several types of value simultaneously in their publications, whereas Big Data studies address them individually. The value of OGD rests on ensuring their quality and transforming them into relevant public information that serves citizens, extracting the maximum value from the data through reuse in fields of innovation and service creation. Public value is the most prevalent in the OGD literature, followed by economic value, while commercial value appears to have more impact on Big Data.

Conclusions
The concepts of Big Data and OGD both consider quality and value, which emerge through the stages of their life cycles and depend not only on the data's inherent characteristics but also on the knowledge domain in which the data originate and on the context of use.
The correlation points between the Big Data and OGD life cycles provide guidelines for prioritization and for targeting improvement, as a basis for establishing a flexible and practical data quality assessment process. This process serves as a platform from which the value the data provide to stakeholders can emerge. That value can be obtained through reuse and the active collaboration of different stakeholders, so that it diversifies into public, social, commercial, and economic value.
The capacity of Big Data and OGD to generate value can be realized through data quality assurance mechanisms, which contribute significantly to improving the understanding and reliability of the data and enable their use in discovering patterns, trends, forecasts, and models and in building knowledge.
To create the conditions that unlock the value of data, one must be clear about the value to be generated and move toward a data culture that consolidates a platform maximizing that value. Accordingly, as future work we intend to propose a methodology for a specific data quality phase within the Big Data and OGD life cycles, to facilitate the use and reuse of data as mechanisms for generating public value, mainly through reuse, collaboration, and innovation.