From the SelectedWorks of Nader Ale Ebrahim
Equality of Google Scholar with Web of Science Citations: Case of Malaysian Engineering Highly Cited Papers

This study uses citation analysis from two citation tracking databases, Google Scholar (GS) and ISI Web of Science, to test the correlation between them and to examine the effect of the number of paper versions on citations. The data were retrieved from the Essential Science Indicators and Google Scholar for 101 highly cited papers from Malaysia in the field of engineering. An equation for estimating the number of ISI citations from the number of Google Scholar citations is offered. The results show a significant and positive relationship between the number of versions and the citations in both Google Scholar and ISI Web of Science. This relationship is stronger between versions and ISI citations (r = 0.395, p < 0.01) than between versions and Google Scholar citations (r = 0.315, p < 0.01). The free access to data provided by Google Scholar, together with its correlation with the costly ISI citation counts, allows more transparency in tenure reviews, funding decisions and other science policy matters, and lets citations be counted and scholars' performance be analyzed more precisely.


Introduction
A citation index, as a type of bibliometric method, traces the references in a published article. It shows how many times an article has been cited by other articles. Citations are applied to evaluate the academic performance and the importance of information contained in an article (Zhang, 2009). This feature helps researchers get a preliminary idea of the articles and research that make an impact in a field of interest. The avenues for citation tracking have greatly increased in the past years (Kear & Colbert-Lewis, 2011). Citation analysis was monopolized for decades by the system developed by Eugene Garfield at the Institute for Scientific Information (ISI), now owned by Thomson Reuters Scientific (Bensman, 2011). ISI Web of Science is a publication and citation database that has covered all domains of science and social science for many years (Aghaei . In 2004, two competitors emerged, Scopus and Google Scholar (Bakkalbasi, Bauer, Glover, & Wang, 2006). Google Inc. released the beta version of 'Google Scholar' (GS) (http://scholar.google.com) in November 2004 (Pauly & Stergiou, 2005). These three tools, ISI from Thomson Reuters, Google Scholar (GS) from Google Inc. and Scopus from Elsevier, are used by academics to track their citation rates. Access to ISI Web of Science is a subscription-based service, while GS provides a free alternative for retrieving citation counts. Researchers therefore need a way to estimate their ISI citations from their GS citation counts. On the other hand, publishing a research paper in a scholarly journal is necessary but not sufficient for receiving citations in the future (Nader Ale . The paper should be visible to the relevant users and authors in order to get citations. In this study, the visibility of a paper is defined by the number of versions of the paper that are available in the Google Scholar database. The number of citations a paper can receive is limited by the versions of the published article that are available on the web.
The literature has shown that making research outputs available through open access repositories increases visibility, widens access and leads to higher citation impact (Nader Ale Ebrahim et al., 2014; Amancio, Oliveira Jr, & da Fontoura Costa, 2012; Antelman, 2004; Ertürk & Şengül, 2012; Hardy, Oppenheim, Brody, & Hitchcock, 2005). A paper has a greater chance of becoming highly cited when it has more visibility (Nader Ale Egghe, Guns, & Rousseau, 2013).
The objectives of this paper are two-fold. The first objective is to find the correlation between Google Scholar and ISI citation in the highly cited papers. The second objective is to find a relationship between the paper availability and the number of citations.

Google Scholar & Web of Science Citations
The citation facility of Google Scholar is a potential new tool for bibliometrics (Kousha & Thelwall, 2007). Google Scholar, a free-of-charge service from the search engine giant Google, has been suggested as an alternative or complementary resource to commercial citation databases such as Web of Knowledge (ISI/Thomson) or Scopus (Elsevier) (Aguillo, 2011). Google Scholar provides bibliometric information on a wide range of scholarly journals and other published material, such as peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations (Orduña-Malea & Delgado López-Cózar, 2014). GS also introduced two new services in recent years: Google Scholar Author Citation Tracker in 2011 and Google Scholar Metrics for Publications in April 2012 (Jacso, 2012). Some of these documents might not otherwise be indexed by search engines such as Google, so they would be "invisible" to web searchers, and some would clearly be similarly invisible to Web of Science users, since that database is dominated by academic journals (Kousha & Thelwall, 2007). On the other hand, the Thomson Reuters/Institute for Scientific Information (ISI) databases, or Web of Science (the various names of the former ISI are used ambiguously), comprise three databases: Science Citation Index/Science Citation Index Expanded (SCI/SCIE; SCIE is the online version of SCI), Social Sciences Citation Index (SSCI) and Arts and Humanities Citation Index (AHCI) (Larsen & von Ins, 2010). Since 1964, the Science Citation Index (SCI) has been a leading tool in indexing (Garfield, 1972).
Few studies have examined the correlation between GS and WoS citations. Cabezas-Clavijo and Delgado-Lopez-Cozar (2013) found that the average h-index values in Google Scholar are almost 30% higher than those obtained in ISI Web of Science, and about 15% higher than those collected by Scopus. GS citation data differed greatly from the findings based on citations from fee-based databases such as ISI Web of Science (Bornmann et al., 2009). Google Scholar overestimates the number of citable articles (in comparison with formal citation services such as Scopus and Thomson Reuters) because of the automated way it collects data, which includes 'grey' literature such as theses (Hooper, 2012). The first objective of this study is to find the correlation between Google Scholar and ISI citations for highly cited papers.

Visibility and Citation Impact
Based on a case study, Nader Ale Ebrahim et al. (2014) confirmed that article visibility greatly improves citation impact. Journal visibility has an important influence on journal citation impact (Yue & Wilson, 2004). Therefore, greater visibility leads to higher citation impact (Zheng et al., 2012). In contrast, a lack of visibility significantly reduces citation impact (Rotich & Musakali, 2013). Nader Ale  by reviewing the relevant papers extracted 33 different ways of increasing the likelihood of citation. The results show that more visible articles tend to receive more downloads and citations. To improve the visibility of scholars' works and make them relevant on the academic scene, electronic publishing is advisable. It allows readers to search for and locate articles in minimal time within one journal or across multiple journals. This includes publishing articles in journals that are reputable, peer reviewed and listed in various databases (Rotich & Musakali, 2013). Free online availability substantially increases a paper's impact (Lawrence, 2001a). Lawrence (2001a, 2001b) demonstrated a correlation between the likelihood of online availability of the full-text article and the total number of citations. He further showed that the relative citation counts for articles available online are on average 336% higher than those for articles not found online (Craig, Plume, McVeigh, Pringle, & Amin, 2007).
However, few studies explain the relationship between paper availability and the number of citations (Craig et al., 2007; Lawrence, 2001b; McCabe & Snyder, 2013; Solomon, Laakso, & Björk, 2013).
None of them discusses the relationship between the number of versions and citations. The number of "versions" is shown in Google Scholar search results. Figure 1 shows 34 different versions of an article entitled "Virtual Teams: a Literature Review" (Nader Ale Ebrahim, Ahmed, & Taha, 2009) and its number of citations. The second objective of this research is to find a relationship between paper availability and the number of citations.

Methodology
Highly cited papers from Malaysia in the field of engineering were retrieved from the Essential Science Indicators (ESI), which is one of the Web of Knowledge (WoK) databases. ESI provides access to a comprehensive compilation of scientists' performance statistics and science trend data derived from the WoK Thomson Reuters databases. Total citation counts and cites per paper are indicators of the influence and impact of each paper. Highly cited papers are selected against a threshold based on the baseline data in ESI, and this threshold differs from one discipline to another. ESI rankings are determined for the most cited authors, institutions, countries, and journals (The Thomson Corporation, 2013). To be selected, a paper must have been published within the last 10-year plus four-month period (January 1, 2003-April 30, 2013) and must be cited above the threshold level. The Essential Science Indicators data used in this research had been updated as of July 1, 2013. Google Scholar, which is a free online database, was used to obtain the number of citations and versions of the ESI highly cited papers. The data from ESI were collected on 29 July 2013 and the Google Scholar data were collected on 31 July 2013. A total of 101 papers were listed in ESI as highly cited papers from Malaysia in the field of engineering. The list of 101 papers was retrieved from the ESI database and then exported to an Excel sheet. A search tool was developed to retrieve the number of citations and versions from Google Scholar. This tool helped the researchers collect the data more precisely and faster than searching for the papers one by one. The Statistical Package for the Social Sciences (SPSS) was used for analyzing the data. The results are presented in the following section.

Results and Discussion
The numbers of citations derived from the Web of Knowledge platform are hereafter called ISI citations. To study the relationships among the number of citations in Google Scholar, the number of citations in ISI and the number of versions, correlation coefficients were computed. Table 1 shows descriptive statistics of the variables. As the numbers of citations in both Google Scholar and ISI were normally distributed, the Pearson correlation coefficient (r) was used, and the results showed a very high positive and significant association (r = 0.932, p < 0.01) between the number of citations in Google Scholar and ISI for the articles from Malaysia in the field of engineering that were published from 2006 to 2012.
www.ccsenet.org/mas Vol. 8, No. 5
To study the relationship between the citations in each database and the number of versions, Spearman's rho was used because of the non-normal distribution of the versions. The results showed a significant and positive relationship between the number of versions and the citations in both Google Scholar and ISI. This relationship was stronger between versions and ISI citations (r = 0.395, p < 0.01) than between versions and Google Scholar citations (r = 0.315, p < 0.01). Linear regression was also applied to predict the number of citations in ISI from the Google Scholar citations. The results showed a very high predictability (R² = 0.836) for the linear model (see Figure 2), which was significant (F = 511.63, p < 0.01). Therefore, the final equation for estimating ISI citations from Google Scholar citations is:
ISI Citation = 5.961 + 0.460 (Google Scholar citation)
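The two correlation measures and the estimating equation above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study. The function names are illustrative assumptions, the inputs in the usage note are hypothetical, and only the coefficients 5.961 and 0.460 come from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def estimate_isi(gs_citations):
    """Estimate ISI citations from a Google Scholar count (equation in the text)."""
    return 5.961 + 0.460 * gs_citations
```

For example, a hypothetical paper with 100 Google Scholar citations would be estimated at 5.961 + 0.460 × 100 = 51.961 ISI citations. The estimate is only meaningful for papers similar to those in the sample, that is, highly cited engineering papers.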

Figure 2. Scatter diagram between ISI citation and Google Scholar citation
To study the effect of the number of versions on citations in both Google Scholar and ISI, simple linear regression was applied. The results indicated that the number of versions had a significant positive effect on citations in both databases (see Table 2 and Table 3). A comparison between Google Scholar and ISI citations for highly cited papers from Malaysia in the field of engineering (see Figure 3) shows that the citation counts in Google Scholar are always higher than the number of citations in ISI.
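The simple linear regression described above can be sketched as an ordinary least squares fit with one predictor. This is a minimal illustration, not the SPSS output behind Table 2 and Table 3. The function names are assumptions, and the data in the usage note are hypothetical.

```python
def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variance in y explained by the fitted line.
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2


def f_statistic(r2, n):
    """F statistic of a one-predictor regression with n observations (requires r2 < 1)."""
    return r2 / (1 - r2) * (n - 2)
```

With hypothetical version counts `[1, 2, 3, 4]` and citation counts `[3, 5, 7, 9]`, `fit_line` returns a slope of 2, an intercept of 1 and an R² of 1, because the points lie exactly on a line. Real citation data are noisier, so R² falls below 1 and `f_statistic` can then be used to judge significance.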

Conclusion
The number of publications and the number of citations in ISI Web of Science are used to measure researchers' scientific performance and their research impact. However, these numbers are not freely available. Therefore, the offered equation can be used as a reference to convert the number of Google Scholar citations to ISI citations. On the other hand, the number of versions has a significant positive effect on the number of citations in both systems. This finding supports other researchers' findings related to paper visibility (Amancio et al., 2012; Antelman, 2004; Egghe et al., 2013; Ertürk & Şengül, 2012; Hardy et al., 2005). The results of this study indicate that there is a strong correlation between the number of citations in Google Scholar and ISI Web of Science. Therefore, researchers can increase the impact of their research by increasing the visibility of their research papers (or paper versions). Future study is needed to determine the relationship between citation counts in other databases, such as Microsoft Academic Search, Scopus and the CiteSeer index, and ISI, considering both journal articles and conference papers.