Sun, 03 Sep 2023 in Revista de Investigación, Desarrollo e Innovación
Mapping, evolution, and application trends in co-citation analysis: a scientometric approach
Abstract
This study aims to explore the mapping, evolution, and application trends of co-citation analysis. To accomplish this goal, a comprehensive search was conducted using Scopus and Web of Science, resulting in 1298 relevant studies. Further analysis was conducted on scientific production, country, author, journal, and network data. The Tree of Science algorithm was applied to demonstrate the development of co-citation analysis. The results make three significant contributions to scientometric research: Firstly, a scientific mapping is presented highlighting the scientific output, main journals, and key researchers; secondly, the advancements of co-citation analysis are presented through the Tree of Science metaphor; lastly, the study identifies the three main subtopics within co-citation analysis through citation analysis. These findings will assist researchers and librarians in recognizing the crucial contributions and applications of co-citation analysis.
Main Text
1. Introduction
Co-citation analysis (CA) has received particular attention in recent decades because emerging technologies help to automate this type of analysis. Researchers and librarians now have access to a considerable, and in some ways “indigestible,” amount of academic information. Researchers produce around 3 million academic documents every year (Fire & Guestrin, 2019), and this figure keeps growing. However, the analysis has become more accessible thanks to software tools such as CiteSpace (Chen, 2006), VOSviewer (van-Eck & Waltman, 2010), and Vantage Point (Sepúlveda-López et al., 2021). New proposals in open-source languages such as R and Python have extended data analysis to artificial intelligence scenarios (Robledo et al., 2021). Thus, it is important to trace the evolution and applications of traditional scientometric techniques in order to identify improvements and take advantage of emerging technologies.
Artificial intelligence (AI) is revolutionizing scientific research. For instance, AI tools like ChatGPT can enhance written work by pointing out gaps in research (van-Dis et al., 2023). Elicit (https://elicit.org/) can provide answers to research questions by analyzing academic papers, while Writefull (https://www.writefull.com/) can streamline the manuscript preparation process. AI's impact on the industry is remarkable and far-reaching (Blaizot et al., 2022). Furthermore, web-based tools such as Semantic Scholar (Lo et al., 2020) and Connected Papers (https://www.connectedpapers.com/) work in tandem with AI models to automate various research operations.
The purpose of this research is to investigate the main contributions of CA. Some reviews have pursued similar goals. For example, Gmür (2003) assesses different strategies for CA and explains that the results differ depending on the research question. Also, Boyack and Klavans (2010) compared approaches such as bibliographic coupling, direct citation, and CA to assess the performance of each one. However, previous studies of CA have not described the evolution and the different subtopics of the area. Therefore, it is necessary to identify the latest proposals in CA together with their applications.
The data for this research were collected from Scopus and Web of Science (WoS). We merged both result sets using the bibliometrix (Aria & Cuccurullo, 2017) and tosr packages for R. We split the results into three sections. The first section shows the importance of co-citation analysis by measuring the production of papers, journals, and authors. In the second section, we show the evolution of this topic using the Tree of Science (Valencia-Hernandez et al., 2020). Finally, we present the perspectives or subtopics obtained from a citation network analysis. Presenting the results in this way gives readers a wider overview of the CA topic (Duque et al., 2021).
2. Methodology
2.1 Theoretical framework
CA has long been proposed as a tool to find the intellectual structure of a research topic. CA takes two forms: Document Co-citation Analysis (DCA) and Author Co-citation Analysis (ACA). DCA refers to the graph or network created by connecting all the references of a document (Köseoglu, 2020); for example, if a paper has ten references, DCA creates a fully connected network of those ten references (see Figure 1). These references are then connected with references from other papers. The result is a highly connected, weighted network of the references of all the papers under analysis. If two papers appear together in the reference lists of several documents, we can infer that they are topically similar. In this vein, a group of papers in the co-citation network can point to a common research topic, called the intellectual structure. ACA is a co-citation network generated through a process similar to that of the document co-citation network (see Figure 1). The links in the author co-citation network are created when two authors appear in the same reference list of a paper (Zhao & Strotmann, 2020). The result is a fully connected authorship network.
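The construction of a document co-citation network described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function name `cocitation_network`, the paper identifiers, and the reference labels are all hypothetical, and the edge weight is assumed to be the number of citing papers in which two references appear together.

```python
from collections import Counter
from itertools import combinations


def cocitation_network(reference_lists):
    """Return weighted edges: (ref_a, ref_b) -> number of papers citing both."""
    edges = Counter()
    for refs in reference_lists:
        # Every pair of references in one reference list is co-cited once.
        # Sorting gives each undirected edge a single canonical key.
        for a, b in combinations(sorted(set(refs)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical citing papers and their reference lists.
papers = {
    "P1": ["Small1973", "White1981", "Chen2006"],
    "P2": ["Small1973", "White1981"],
    "P3": ["Small1973", "Chen2006"],
}

network = cocitation_network(papers.values())
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

Under this construction a paper with n references contributes n(n-1)/2 edges, so the ten-reference example in the text yields 45 pairwise links before weights from other papers are added.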
2.2 Method
We split the methodology into three parts. The first part explains in detail the search in Scopus and WoS. The second part shows the process of creating the Tree of Science of the CA topic. Finally, the third part presents the citation network and the algorithm used to identify the clusters or subtopics.
Mapping
Zupic and Čater (2015) propose a workflow to conduct a science mapping with scientometric methods. First, they suggest identifying the paper production through time, the most productive journals, and authors. Also, it is necessary to generate networks like author co-citation, collaborations among authors and countries, and co-occurrence networks. Table 1 shows the elements in the search. This is a quantitative perspective, different from other studies with a qualitative approach (Cardona-Arbeláez et al., 2019; Macías-Rojas et al., 2022; Vargas-Zapata et al., 2022).
The search returned 808 papers in WoS and 1224 in Scopus. We merged these papers using the bibliometrix (Aria & Cuccurullo, 2017) and tosr packages. During this process, we removed duplicate records from the two datasets. The final dataset (1298 papers) shows a 6% overlap between Scopus and WoS, which means that Scopus has most of the papers that appear in WoS.
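The merge-and-deduplicate step can be illustrated with a small Python sketch. The study used R packages for this step; the code below is only a conceptual stand-in. It assumes that records are matched on a normalized DOI, which is one common choice and not necessarily the rule those packages apply, and the record fields, the function names, and the sample data are hypothetical.

```python
def normalize_doi(doi):
    """Lower-case and trim a DOI so trivially different spellings match."""
    return doi.strip().lower()


def merge_records(wos, scopus):
    """Merge two record lists, keeping one entry per normalized DOI."""
    merged = {}
    for source, records in (("WoS", wos), ("Scopus", scopus)):
        for rec in records:
            key = normalize_doi(rec["doi"])
            entry = merged.setdefault(key, {"title": rec["title"], "sources": set()})
            entry["sources"].add(source)  # remember where the record was found
    return merged


# Hypothetical records, not taken from the study's dataset.
wos = [
    {"doi": "10.1000/A1", "title": "Paper A"},
    {"doi": "10.1000/b2", "title": "Paper B"},
]
scopus = [
    {"doi": "10.1000/a1 ", "title": "Paper A"},
    {"doi": "10.1000/c3", "title": "Paper C"},
]

merged = merge_records(wos, scopus)
shared = [doi for doi, rec in merged.items() if len(rec["sources"]) == 2]
print(len(merged), "unique records;", len(shared), "found in both databases")
```

Keeping the set of sources per record makes it straightforward to report how many papers are indexed in both databases and how many are exclusive to one of them.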
Tree of Science (ToS)
Tree of Science (ToS) is a metaphor proposed to visualize academic production in the shape of a tree (Robledo et al., 2022; Zuluaga et al., 2022). The first step is to create a citation network (Zuluaga et al., 2016). Next, the algorithm simulates the flow of raw and processed sap in the tree through the citation network (for a detailed explanation, see Valencia-Hernandez et al. (2020)). This algorithm has been applied to different research topics: economy (Barrera-Rubaceti et al., 2021; Trejos-Salazar et al., 2021), oxidation processes (Macías-Quiroga et al., 2021), corporate social responsibility (Diez et al., 2022; Ramos-Enríquez et al., 2021), food applications (Durán-Aranguren et al., 2021), organizations (Clavijo-Tapia et al., 2021), engineering (Grisales-Aguirre et al., 2023; Robledo et al., 2023), and psychology (Gómez-Tabares, 2021). The adoption process of ToS is explained in Eggers et al. (2022).
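To give an intuition for the tree metaphor, the sketch below labels the nodes of a small citation network as roots, trunk, or leaves. It is deliberately not the SAP algorithm of Valencia-Hernandez et al. (2020), whose details are not reproduced here. It assumes a much simpler degree-based rule: papers that are cited but cite nothing inside the network are roots, papers that cite but are not cited are leaves, and everything else is trunk. The function name and the sample edges are hypothetical.

```python
from collections import defaultdict


def classify_tree(citations):
    """Label nodes from (citing, cited) pairs with a simple degree rule.

    Assumption: this is an illustrative heuristic, not the SAP algorithm.
    """
    indegree = defaultdict(int)   # times a paper is cited inside the network
    outdegree = defaultdict(int)  # references a paper makes inside the network
    for citing, cited in citations:
        outdegree[citing] += 1
        indegree[cited] += 1

    roles = {}
    for node in set(indegree) | set(outdegree):
        if outdegree[node] == 0:
            roles[node] = "root"   # cited, but cites nothing in the network
        elif indegree[node] == 0:
            roles[node] = "leaf"   # cites others, not yet cited itself
        else:
            roles[node] = "trunk"  # both cites and is cited
    return roles


# Hypothetical citation edges (citing paper, cited paper).
edges = [
    ("Leaf2020", "Trunk2008"),
    ("Leaf2020", "Root1973"),
    ("Trunk2008", "Root1973"),
]
roles = classify_tree(edges)
print(roles)
```

A real ToS run ranks papers within each category and keeps only the highest-scoring ones; this sketch only shows how the direction of citations separates older foundational work from recent work.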
Identifying emerging trends through citation analysis
Once the citation network was built, we applied the Blondel et al. (2008) algorithm to identify the clusters or subtopics of CA. The initial network has 9,196 nodes (articles) and 32,518 links (references). After applying the clustering algorithm, we selected the three biggest clusters according to the tipping point proposed by Hurtado-Marín et al. (2021). The final network has 4,608 nodes and 13,619 links. Finally, we selected the main papers in each cluster using traditional social network analysis metrics (indegree and outdegree).
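The algorithm of Blondel et al. (2008) searches for a partition of the network that maximizes modularity. The sketch below does not implement that search; it only computes the modularity of a given partition for an undirected, unweighted graph, which is the quantity the clustering step tries to increase. The function name, the toy graph, and the community labels are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in "abcdef"}

q_split = modularity(edges, two_clusters)
q_single = modularity(edges, one_cluster)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives a higher modularity than keeping all nodes in one cluster, which is why a modularity-maximizing method would separate them.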
3. Results and discussion
3.1 Science mapping of CA
Scientific production
Figure 2 shows the annual scientific production of CA. The annual production reflects the importance of a field (Sud et al., 2021). In 2000, CA had a small number of publications; since 2010, however, production has increased slightly to moderately each year. After 2018, total production increased rapidly, with an average increase of 30.17%, indicating that research in CA has shown excellent potential and development in the last few years.
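An average growth figure of this kind can be computed from yearly publication counts. The text does not state which definition was used, so the sketch below assumes the arithmetic mean of year-over-year percentage changes. The counts are invented for illustration and are not the study's data.

```python
def average_annual_growth(counts_by_year):
    """Mean of year-over-year percentage changes (assumed definition)."""
    years = sorted(counts_by_year)
    rates = [
        100 * (counts_by_year[curr] - counts_by_year[prev]) / counts_by_year[prev]
        for prev, curr in zip(years, years[1:])
    ]
    return sum(rates) / len(rates)


# Hypothetical yearly counts, chosen only to make the arithmetic easy to follow.
example_counts = {2018: 100, 2019: 120, 2020: 150}
growth = average_annual_growth(example_counts)
print(f"{growth:.2f}%")  # 20% then 25% year over year
```

A compound annual growth rate would give a slightly different number for the same counts, so the definition should be reported alongside the figure.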
Country production
The top ten most productive countries are shown in Table 2. These countries represent 61% of the total production. China has the highest number of published papers, with 250 in WoS, 279 in Scopus, and 337 unique publications. This amount represents 26% of the total production. The next country is the United States with 109 papers, 67 in WoS and 92 in Scopus. This country represents only 8% of the total production. These results indicate the importance of China in scientometric research and its role as a global reference.
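The percentages above follow directly from the reported counts, and the same counts imply how many papers from each country are indexed in both databases. The short check below uses only figures stated in the text; the overlap calculation assumes that every unique paper appears in at least one of the two databases and that deduplication only merged true duplicates.

```python
TOTAL_PAPERS = 1298  # size of the merged dataset reported in the text

# Counts per country as reported in the text.
countries = {
    "China": {"wos": 250, "scopus": 279, "unique": 337},
    "United States": {"wos": 67, "scopus": 92, "unique": 109},
}


def share(unique, total=TOTAL_PAPERS):
    """Percentage of the merged dataset contributed by one country."""
    return 100 * unique / total


def in_both(wos, scopus, unique):
    """Papers indexed in both databases, by inclusion-exclusion (assumption above)."""
    return wos + scopus - unique


for name, c in countries.items():
    print(name, round(share(c["unique"])), in_both(c["wos"], c["scopus"], c["unique"]))
```

Under that assumption, 192 of China's papers and 50 of the United States' papers would be present in both WoS and Scopus.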
Author production
Table 3 shows the ten most productive authors. The most productive author in WoS is Dr. Hallinger with 14 papers; however, Dr. Wang is the most productive author in Scopus with 17 papers, and overall with 21. It is interesting to note that the most productive author in WoS ranks third when both datasets are combined, with 18 papers. Also, the second most productive author in WoS (Dr. Koseoglu) does not appear among the top authors of the merged dataset. Thus, the most productive researchers in WoS are not necessarily the most productive in Scopus.
Journal production
Table 4 lists the ten most productive journals. Scientometrics is in first place with 99 papers, representing 8% of the whole dataset. This value is notable because the second journal, the Journal of Informetrics, accounts for only 1%. Additionally, both journals are in the Q1 quartile according to the Scimago ranking.
Network analysis
Figure 3 presents the author co-citation network, author collaboration network, country collaboration network, and co-occurrence network. The author co-citation network shows four main communities, each centered on a topic. Community one represents classical authors like Dr. Small and Dr. Leydesdorff. The second community shows the influence of Dr. Chen in Asia with his software CiteSpace. The third focuses on medical studies, and the last community shows the influence of CA in the management field. The author collaboration network was built by connecting researchers who worked on the same paper. The first community refers to health science production at the Shenzhen Institute of Advanced Technology and the second one refers to the School of Humanities and Social Sciences at the University of Science and Technology of China. The process of creating groups among researchers increases the impact of their research (Robledo et al., 2022).
The country collaboration network presents an interesting result; it shows the United States (USA) as the major country connecting Europe, Asia, and South America. Moreover, the keyword co-occurrence shows two big clusters. The first one presents the relevant production of the USA and China around human topics and the second one is about the importance of mapping and visualization around information analysis.
3.2 Tree of Science
This section presents the papers in tree order: first the papers in the roots, next the papers in the trunk, and last the papers in the leaves. We decided to present the papers that contribute to the advances in ACA and DCA. In this vein, we excluded papers related to CA applications because we focus on them in section 3.3.
Roots
According to the results of the SAP algorithm, the paper that first proposed the DCA method was Small (1973); he proposed that co-citation networks among the references of a document reveal the scientific specialties of a research topic. Later, White and Griffith (1981) applied the same methodology to authors, and they showed that this method improves the understanding of the intellectual structure. Finally, Chen et al. (2010) added network visualization, spectral clustering, and text summarization to facilitate the analysis.
Trunk
Rousseau and Zuccala (2004) pointed out a problem in ACA concerning how author data are captured. For example, two references may share the same authors but list them in a different order. The study proposes including all authors in ACA. Also, Zhao (2006) obtained better results using five authors in the analysis. Zhao and Strotmann (2008) and Schneider et al. (2009) confirmed these results by comparing the outcomes of ACA with only the first author and with all authors. He and Cheung Hui (2002) proposed adding agglomerative hierarchical clustering to display the authors’ networks. Finally, Jeong et al. (2014) added the number of times a reference is cited inside a paper to improve DCA results.
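The difference between first-author and all-author counting can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the procedure of any of the cited studies: the function name and the author names are hypothetical, and co-authors of the same cited reference are counted as co-cited with each other, which is only one of several possible conventions.

```python
from collections import Counter
from itertools import combinations


def author_cocitation(reference_lists, all_authors=True):
    """Count author pairs that appear in the same reference list.

    Each reference is an ordered list of author names. With
    all_authors=False only the first author of each reference is used.
    """
    edges = Counter()
    for refs in reference_lists:
        authors = set()
        for ref_authors in refs:
            authors.update(ref_authors if all_authors else ref_authors[:1])
        for a, b in combinations(sorted(authors), 2):
            edges[(a, b)] += 1
    return edges


# Two hypothetical citing papers. The first reference of each has the
# same two authors, listed in a different order.
citing_papers = [
    [["Ana", "Ben"], ["Cruz"]],
    [["Ben", "Ana"], ["Cruz"]],
]

first_only = author_cocitation(citing_papers, all_authors=False)
everyone = author_cocitation(citing_papers, all_authors=True)
print(dict(first_only))
print(dict(everyone))
```

With first-author counting, the link to "Cruz" is split between "Ana" and "Ben" depending on author order; with all-author counting both receive the full weight, which illustrates the data-capture problem described above.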
Leaves
In the leaves, we identified studies with more sophisticated modifications of ACA and DCA. Sanguri et al. (2020) propose the semantic similarity-adjusted co-citation index. This index includes a similarity metric based on the abstracts of the top-cited documents. Hadj Taieb et al. (2021) also proposed using similarity metrics, but only with titles, and Bu et al. (2020) considered the sequence of all authors. Karaulova et al. (2020) introduce a co-nomination method that combines social network analysis and snowball sampling. Zhao and Strotmann (2020) compared four weighting schemes for ACA.
3.3 Emerging application trends
Figure 4 shows the citation network with the three clusters. Cluster one is clearly separated from the other two clusters. Clusters two and three have similar subtopics according to the figure.
Subtopic 1: mapping knowledge trends applications
This subtopic shows the different thematic applications of co-citation analysis. The word cloud presents the main words in the middle and the different applications around them, for example, information science, health, and environmental studies. In information science, which also covers the study of academic production, Li et al. (2019) present a DCA of this research topic using CiteSpace software. In health studies, we found the paper of Zheng et al. (2021), which shows the trends in intracranial aneurysms magnetic resonance treatments. They used CiteSpace, ACA, and DCA. Finally, in environmental studies, Sun et al. (2020) demonstrate the link between industrial structure and carbon emissions through a scientometric analysis (DCA). Therefore, this cluster or subtopic refers to works that map different scientific topics using ACA, DCA, or both.
Subtopic 2: innovation and management applications
Subtopic 2 represents the papers in management innovation that use ACA and DCA to identify intellectual structures. This area represents a subtopic because scientometric techniques have been applied for a long time in management in general. This premise was highlighted by Simao et al. (2021), who presented the evolution and future directions of innovation management using a DCA. Moreover, López-Rubio et al. (2021) focus only on national innovation systems while Boiko (2021) on firm performance. Thus subtopic 2 shows the well-established field of scientometric analysis in management.
Subtopic 3: author co-citation applications
Subtopic 3 refers to ACA applications. Araújo and Bufrem (2021) studied the academic social structure of Brazilian researchers, Ghane et al. (2019) identified the trends of information retrieval, and González-Valiente (2021) those of information management.
4. Conclusions
This research set out three objectives: science mapping of scientific production in CA, understanding the main contributions that improved CA using ToS, and identifying the main emerging application trends. We reached these objectives based on 1298 papers merged from Scopus and WoS datasets. Results from this research can provide important insights for researchers and librarians interested in scientometric analysis.
This study has shown a rapid growth of scientific production in CA from 2018 to 2020, based on the analysis in section 3.1. In particular, China has a prolific production on this topic. Also, authors like Wang, Zhao, and Hallinger are well recognized in the field. The quality of the production is high, according to the journals’ quartiles.
Moreover, we identified three subtopics of CA: first, knowledge trend applications in several areas such as information science, health, and environmental studies; second, innovation management; and, finally, ACA applications. It is essential to highlight that DCA and ACA have applications in different fields because they are cross-disciplinary. Innovation management has emerged as a subtopic because these techniques have been well received in this discipline. Finally, the applications of ACA have gained great attention, as shown in the third subtopic.
A limitation of this research is that DCA and ACA are cross-disciplinary topics and sometimes overlap different subtopics, such as applications and improvements of the methods. We selected only three clusters according to the tipping point presented in Figure 4; however, a further study could use one of the two types of CA (DCA or ACA) in the other clusters. Despite its exploratory nature, this study offers insights into the several applications of DCA and ACA.
The logical next step in this work is to examine the effect of AI on scientometric analysis. This could involve exploring accurate methods for creating collaboration networks and evaluating their impact on performance, as measured by the number of citations received. This type of research would be beneficial for universities seeking to optimize the return on their investment in science.