A Bibliometric Assessment of Canadian Software Engineering Scholars and Institutions (1996-2006)

This paper summarizes a ranking of Canadian researchers and institutions in the field of software engineering from 1996 to 2006, based on two metrics: impact factor and h-index. The ranking is intended to be an ongoing, annual exercise that identifies the top 50 scholars and top 50 institutions in Canada over a 10-year period. The rankings are calculated from the impact factor and h-index of papers published in 12 selected top software engineering journals and conferences. The top-ranked institution is Carleton University, and the top-ranked scholars are Lionel Briand (formerly with Carleton University) by impact factor and Gail Murphy (UBC) by h-index.


Introduction
Who are the most published scholars in the field of software engineering? Which are the most published institutions?
The series of 12 papers by Glass and Chen (Glass 1995; Glass and Chen 2001; Glass and Chen 2002; Tse, Chen et al. 2006) answers these questions in the worldwide context on an annual basis from 1994 until 2006. A natural next step is to study the same questions in the context of individual countries and regions. Assessing the academic and industrial research institutions of a nation, together with their scholars, can help identify the best organizations and researchers of a country in a given discipline.
Such an assessment can reveal outstanding institutions and scholars, allowing graduate students and researchers to better choose where they want to study or work. It can also help employers recruit the most qualified candidates. Such assessments can also assist internal administrators in making influential decisions, e.g., promotions. They can also help external administrators and funding agencies, e.g., the Natural Sciences and Engineering Research Council of Canada (NSERC), assess the efficiency of researchers relative to the funds they receive as input.
The series of 12 papers by Glass and Chen (for example, Glass 1995; Glass and Chen 2001; Glass and Chen 2002; Tse, Chen et al. 2006) is an ongoing, annual effort that identifies the top 15 scholars and institutions over a five-year period in systems and software engineering. The rankings are based on the number of papers published in the leading journals of the field, and around 4,000 scholars worldwide have been evaluated.
To the best of our knowledge, the studies in (Glass 1995; Glass and Chen 2001; Glass and Chen 2002; Tse, Chen et al. 2006) consider only the quantity of publications, not their citations. This is a major shortcoming, as the well-known Canadian academic David Parnas argues in his paper "Stop the Numbers Game" (Parnas 2007).
The work of Ren and Taylor (Ren and Taylor 2007) and the Java tool they developed (Ren and Taylor Last accessed: Aug. 2009) incorporate the impact factors of publication venues to calculate the scores of institutions and scholars more carefully. We follow their approach in this study.
To the best of our knowledge, no existing study assesses Canadian software engineering institutions and scholars.

Metrics, Data Sources, and Tools Used
It is important to note at the outset that this study focuses on the field of software engineering, and not, for example, on computer science, computer engineering, or information systems.
The two metrics used for performance analysis are: (1) impact factor (Garfield 2005), and (2) h-index (Hirsch 2005). These are the two most widely used metrics for quantifying the output of scientific research (Glass and Chen 2002; Tse, Chen et al. 2006; Ren and Taylor 2007).
In a given year, the impact factor of a journal is the average number of citations received in that year by the papers it published during the two preceding years. For example, the 2003 impact factor of a journal would be calculated as follows:
A = the number of times articles published in 2001 and 2002 were cited by indexed journals during 2003
B = the total number of "citable items" published in 2001 and 2002 ("citable items" are usually articles, reviews, proceedings, or notes; not editorials or letters to the editor)

impact factor = A/B
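As a minimal sketch of the A/B calculation above, the following Python function computes an impact factor from the two counts. The function name and the numbers in the usage line are illustrative assumptions and are not taken from the study's data.

```python
def impact_factor(citations_in_year: int, citable_items: int) -> float:
    """Return A / B for one journal and one year.

    citations_in_year: A, citations received this year by items
                       published in the two preceding years.
    citable_items:     B, number of citable items published in
                       those two preceding years.
    """
    if citable_items <= 0:
        raise ValueError("the number of citable items must be positive")
    return citations_in_year / citable_items


# Hypothetical journal: 300 citations in 2003 to 120 items from 2001-2002.
print(impact_factor(300, 120))
```

With these hypothetical counts, each citable item from the two preceding years was cited 2.5 times on average during the target year.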
The h-index is based on the set of a scientist's most cited papers and the number of citations that they have received in other people's publications. The index can also be applied to the productivity and impact of a group of scientists, such as a department, university, or country. A scholar with an index of h has published h papers, each of which has been cited by others at least h times. Thus, the h-index reflects both the number of publications and the number of citations per publication. The h-index is designed to improve upon simpler measures such as the total number of citations or publications. The index works properly only for comparing scientists working in the same field, since citation conventions differ widely among fields (Hirsch 2005).
The h-index serves as an alternative to the more traditional journal impact factor metrics when evaluating the impact of a particular researcher's work. In simpler terms, the impact factor reflects how often a researcher has published in highly cited venues (without necessarily tracking the citations of his/her particular publications), while the h-index looks closely at a researcher's individual publications and counts only those with a specified minimum number of citations.
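The definition of the h-index can be sketched in a few lines of Python. This is an illustrative implementation of the definition in (Hirsch 2005), not the code used by Scopus or by the ranking tool; the citation counts in the usage line are made up.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # later papers have even fewer citations
    return h


# Hypothetical author with five papers.
print(h_index([10, 8, 5, 4, 3]))
```

In this made-up example the fourth paper has 4 citations but the fifth has only 3, so the h-index is 4.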
The main data source for this study is Scopus (www.scopus.com), one of the leading online academic databases, owned by Elsevier. It indexes 15,400 peer-reviewed journals in the scientific, technical, medical, and social sciences fields. Scopus is a powerful tool and provides both impact factor and h-index measures. To restrict the publications in our analysis to the Canadian context, we included the word "Canada" in the "authors address" field of the search engine. Random verification of a large set of results supported the suitability of this search mechanism.
To validate the selection mechanism, we checked whether it produced any false-negative or false-positive results. After applying the mechanism, we reviewed a large sample to see whether all of the resulting papers were indeed published by Canadian researchers. We also took a set of known Canadian papers in the area and verified that the mechanism selected them as part of the data set. Further, note that Scopus has already converted the short-form country name "CA" to "Canada" in its paper data set, so our selection mechanism did not need to handle this case.
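The validation described above can be pictured as two set comparisons. The sketch below is only an illustration under assumed inputs: the function name, the paper identifiers, and the idea of holding hand-verified papers in Python sets are hypothetical and do not describe the authors' actual procedure.

```python
def selection_errors(selected, known_canadian, known_non_canadian):
    """Compare a selection against hand-verified papers.

    selected:            identifiers returned by the search filter
    known_canadian:      hand-verified Canadian papers
    known_non_canadian:  hand-verified non-Canadian papers
    Returns (false_positives, false_negatives) as sets.
    """
    false_positives = selected & known_non_canadian  # wrongly included
    false_negatives = known_canadian - selected      # wrongly missed
    return false_positives, false_negatives


# Hypothetical identifiers, for illustration only.
fp, fn = selection_errors(
    selected={"p1", "p2", "p3"},
    known_canadian={"p1", "p4"},
    known_non_canadian={"p3"},
)
print(sorted(fp), sorted(fn))
```

An empty result for both sets on the reviewed samples would support the suitability of the filter.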
To better understand the concept of h-index, consider the example in Figure 1, which shows an h-index graph generated by Scopus. The graph visualizes the publications of Lionel Briand in the year 2006. There were 17 papers by this author in that year, and 6 of those 17 were cited at least 6 times. This is shown by a 45° line in the graph: six articles fall on or above this line. The number 6 is therefore the author's h-index for that year.
To perform the actual rankings, we used the Java tool developed by Ren and Taylor at the University of California, Irvine (Ren and Taylor Last accessed: Aug. 2009). The publication data exported from Scopus were fed into this tool to calculate the rankings. All the compiled data generated for and used in the current study are available online (Garousi and Varma 2009).
A snapshot of the Java tool running in the institution-ranking mode is shown in Figure 2. This tool provides a general framework for ranking institutions and authors based purely on their publications. Customizable policies guide the ranking process, for example, by assigning different weights to different publication venues or by determining how a publication's score is distributed among multiple authors. The goal of the tool is to be applicable to all fields of research and to accommodate as many ranking policies as possible.
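To make the idea of a ranking policy concrete, here is a minimal Python sketch of one possible policy: each publication earns the weight of its venue, split equally among its authors. This is an assumption chosen for illustration, not a description of the Ren and Taylor tool, and the venue names, weights, and author names are hypothetical.

```python
from collections import defaultdict


def rank_authors(publications, venue_weights):
    """Score authors by venue weight, split equally among co-authors.

    publications:  list of dicts with keys "venue" and "authors"
    venue_weights: dict mapping venue name to its weight
    Returns (author, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for pub in publications:
        authors = pub["authors"]
        if not authors:
            continue  # nothing to credit
        share = venue_weights.get(pub["venue"], 0.0) / len(authors)
        for author in authors:
            scores[author] += share
    # Sort by descending score, then by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical venues and authors.
weights = {"Journal A": 2.0, "Conference B": 1.0}
pubs = [
    {"venue": "Journal A", "authors": ["X", "Y"]},
    {"venue": "Conference B", "authors": ["X"]},
]
print(rank_authors(pubs, weights))
```

Swapping the equal split for another rule (for example, giving the first author a larger share) would change only the line that computes `share`, which is the kind of customization a policy-driven tool allows.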
The analysis period has been set to 1996-2006. Note that, although the study was conducted in summer 2009, the definition of the impact factor metric (Garfield 2005) requires at least one year to pass after the last publication year. Also, in many online sources such as Scopus, it usually takes a few years until all publication and citation records have been entered into the database. Based on our manual investigation of a few known publications and their citations, 2006 proved to be a suitable end year for our study.
As the publication venues used for ranking scholars and institutions, a list of 12 top software engineering journals and conferences was selected. The selection was based on the venues used for analysis in (Glass 1995; Glass and Chen 2001; Glass and Chen 2002; Tse, Chen et al. 2006) and on discussions with a few colleagues. The venues, their citation and article counts, and their impact factor measures for the period 1996-2006 are shown in Table 1.
Searching the Scopus paper database with the Canadian search keyword (as discussed above) and restricting the search to the list of 12 software engineering journals and conferences yielded 12,839 papers published by Canadian software engineering researchers in the 1996-2006 period. This paper pool was our data set and is available online (Garousi and Varma 2009).
Note that none of the tools or databases used for this study had an option to exclude "self" citations, and performing such an exclusion manually would have been very time consuming. Thus, in the analysis reported in this paper, self citations have not been excluded.
Many researchers believe that "self" citations do not necessarily denote the popularity of one's research publications (Glänzel, Debackere et al. 2006). It is instead widely accepted that citations by other researchers denote research impact and popularity. Thus, not being able to exclude "self" citations may have biased the impact factors and h-indices in our analysis. However, since almost all researchers cite their own work to some degree, the effect of this bias on the comparison can be somewhat neglected.

Top Scholars
The top 15 scholars based on the values of impact factor (IF) and h-index are shown in Figure 3. The two metrics yielded slightly different rankings. The impact factor metric identifies Lionel Briand as the top Canadian software engineering scholar for 1996-2006. Lionel Briand was formerly with Carleton University and is presently with Simula labs, Norway. The h-index, on the other hand, identifies Gail Murphy from UBC. It is interesting to observe that these two researchers simply swap the #1 and #2 positions between the two metrics.
The values of the measures in Figure 3 can be interpreted using the definitions of the two metrics (see the section "Metrics, Data Sources, and Tools Used"), and they denote the extent of the difference between the scholars' research output. For example, in the top curve of Figure 3, the first three researchers have impact factor values of more than 100, while the rest have values between 40 and 70.
There is obviously an overlap between the two ranking schemes. Four researchers appear in both top 15 lists. If we look at the top 50 instead, all researchers in one list appear in the other as well, although at different ranks. To show how the two rankings relate, Table 2 lists the top 15 scholars by each metric against their rank by the other. The list of top 50 scholars based on h-index is provided in Table 3.
Regarding the annual h-index of the top five researchers over 2001-2006 (Figure 4), the h-index values of some researchers vary more than others. Murphy had the lowest h-index variation, while Labiche had the largest. This indicates that Murphy's research output remained quite steady over that period.
The research output of some researchers shows an increasing trend, while we also observe decreasing and mixed trends. For example, Labiche shows an increasing trend in his research output. Miller shows an increasing trend until 2004, after which the trend is decreasing.
Interestingly, the year 2004 appears to be a "tie" point, at which four of the five researchers have the same h-index value of 4.

WORLDWIDE CONTEXT
It is interesting to see how our rankings based on h-index relate to the existing global rankings from (Tse, Chen et al. 2006; Ren and Taylor 2007). Table 4 shows how the top 5 Canadian researchers and institutions are positioned in the worldwide rankings. A blank worldwide ranking denotes that the item did not appear in the top 15 list in (Tse, Chen et al. 2006) or the top 50 list in (Ren and Taylor 2007).

Top Institutions
The top 50 institutions, based on the cumulative h-index values of their software engineering researchers, are shown in Figure 5. Carleton University leads by a wide margin. The University of British Columbia (UBC) and the University of Waterloo rank 2nd and 3rd with close scores. Among government agencies, the National Research Council (NRC) of Canada leads. Among industrial firms, IBM Canada ranks first.

INSTITUTIONS' PERFORMANCE OVER YEARS
Similar to the analysis of researchers' performance over years, we can study how institutions' performance changes over the years. Variations of the cumulative h-index of the five top institutions over the years 1996-2006 are visualized in Figure 6. The corresponding box-plot of the data is shown in Figure 7. The main observations from these variations in h-index values are discussed next.
The h-index values of some institutions vary more than others. The University of British Columbia had the lowest h-index variation, while Carleton University had the largest. Large variations can have various root causes, e.g., (1) changes (arrivals/departures) in research personnel, and (2) major changes in research funding.
Year 1997 is interesting in that it marks the lowest research output for most institutions. The reason is not clear; perhaps a nation-wide challenge led to this situation.
The University of Calgary does not have any cumulative h-index in the years prior to 1999, because it started to hire software engineering academics sometime around 1999.
According to the box-plot of the data (Figure 7), UBC has an "outlier" point (value=64) in its cumulative h-index values, which occurs in the year 1999. This seems to indicate that, for unknown reason(s), the research performance of that institution increased dramatically in that particular year.
The highest single-year research output is by the University of Alberta in the year 2004 (cumulative h-index=68).
Methods that could be used to verify the possible root causes of the above points include (1) a closer look at each institution's research team dynamics, research funding, and arrivals/departures over the years, and (2) interviewing researchers from different institutions and asking for their explanations of the variations in each institution's research output. These could be interesting directions for future work.
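The text does not state which rule the box-plot in Figure 7 uses to flag an "outlier" point. A common convention, assumed here purely for illustration, is Tukey's rule: a value is an outlier if it lies more than 1.5 interquartile ranges below the first quartile or above the third. The yearly values in the sketch are made up and are not the study's data.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical yearly cumulative h-index values for one institution.
yearly_values = [10, 12, 11, 13, 12, 11, 40]
print(tukey_outliers(yearly_values))
```

Note that different plotting tools compute quartiles slightly differently, so a borderline point may be flagged by one tool and not by another.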

BREAK-DOWN BY PROVINCES
Based on the location of each institution, we can also rank the Canadian provinces and territories, as shown in Figure 8. Ontario leads by a large margin over the second rank, denoting that it is the major hub of software engineering research in Canada. British Columbia, Quebec, and Alberta follow, in that order. The other nine provinces and territories have very low or no major software engineering research output.

Provincial Research Efficiency Analysis: Input versus Output
One would expect the research productivity of different provinces and territories to relate to the amount of research grants held by the researchers in those locations, and to their industrial connections. If, for example, the amount of research grants in the area of software engineering can be estimated, the relationship between research grants (as input) and cumulative impact factors (as output) can be analyzed.
One such open data source for the amount of research grants is available in the Canadian context: the awards search engine of the Natural Sciences and Engineering Research Council of Canada (NSERC) (Natural Sciences and Engineering Research Council of Canada (NSERC) Last accessed: Jan. 2010). NSERC is a major Canadian government agency that provides grants for research in the natural sciences and in engineering. In concept, NSERC is comparable to the American National Science Foundation (NSF).
Although Canadian software engineering researchers also receive research funds and grants from sources other than NSERC, we assume that the ratio of all research funds to NSERC grants is almost equal across institutions. Under this assumption, we can conduct a comparative analysis of provinces' research productivity versus NSERC grants.
The scatter plot of provinces' research productivity (output) versus NSERC grants and scholarships (input) for the period 1996-2006 is shown in Figure 9. The provincially aggregated grant values were extracted from NSERC's online search engine (Natural Sciences and Engineering Research Council of Canada (NSERC) Last accessed: Jan. 2010). The query criterion used to extract the values was "selection committee=Computing and Information Sciences". NSERC funds Canadian software engineering researchers through this particular subject area.
The acronyms for the names of the four provinces with the lowest amounts of grants are defined in Figure 8. In our data collection phase, we found no software engineering paper (research output) from any of the Canadian territories. For this reason, the territories are not shown in this scatter plot.
To clarify the notion of NSERC grants and scholarships, note that the NSERC grants are awarded to faculty members while the scholarships are awarded to graduate students and post-doctoral fellows.
The most notable observations based on Figure 9 are discussed next.
Ontario (the most populous Canadian province) received the largest portion of NSERC funding in the software engineering area, an amount greater than the 2nd- and 3rd-ranked amounts (Quebec and British Columbia) added together.
The total research support for software engineering researchers in the bottom five provinces (Manitoba, NF, NB, NS and SK) together is almost equal to the support for British Columbian researchers alone. Interestingly, the total research productivity of those five provinces is also almost equal to that of the British Columbian researchers. This denotes an almost equal efficiency in research publications in those cases.
A regression trend line has been superimposed on the scatter plot, connecting the lower five provinces to the Ontario point. British Columbia also falls almost on this line, which denotes the average efficiency trend in most Canadian provinces. Alberta and Quebec do not clearly follow this trend line. Based on a rough analysis, Albertan researchers seem to have a higher efficiency than the national average, while researchers from Quebec seem to have a lower one. Quebec receives more than twice the NSERC grant amount of British Columbia, yet both provinces have almost equal research productivity (output). A root-cause analysis of this phenomenon would need a careful study of various influential factors; for example, researchers from Quebec may be publishing more in venues other than the 12 we considered, e.g., French-language venues.
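The comparison against the trend line can be sketched as follows: fit a straight line to (grant, output) points and check the sign of each region's residual. This is a minimal illustration using an ordinary least-squares fit, which is an assumption, since the text does not specify how the trend line was computed. The region names and all numbers are placeholders and are not the values in Figure 9.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(points, slope, intercept):
    """Return output minus trend-line prediction for each named region."""
    return {name: y - (slope * x + intercept) for name, (x, y) in points.items()}


# Placeholder (input, output) pairs used to fit the trend line.
slope, intercept = fit_line([1, 2, 3], [3, 5, 7])

# Placeholder regions compared against that line.
deviation = residuals({"P": (4, 12), "Q": (4, 6)}, slope, intercept)
for name, value in deviation.items():
    label = "above" if value > 0 else "below"
    print(name, label, "the trend line by", abs(value))
```

A positive residual corresponds to more output than the trend predicts for the same funding (higher efficiency), and a negative residual to less.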

Conclusions and Future Works
This paper summarized a survey of publications by Canadian researchers in the field of software engineering from 1996 to 2006. The survey is intended to be an ongoing, annual exercise that identifies the top 50 scholars and top 50 institutions over a 10-year period. The rankings were calculated from the impact factor and h-index of papers published in 12 selected top software engineering journals and conferences. The top-ranked institution in the study period is Carleton University, and the top-ranked scholars are Lionel Briand (formerly with Carleton University) by impact factor and Gail Murphy (UBC) by h-index.
While systematic, our study has several limitations due to the assumptions we have made and to its large data set. First, since we automated the analysis using a Java toolset (Ren and Taylor Last accessed: Aug. 2009), our analysis as it stands considers only the in-Canada publications of researchers who moved to or from Canada in the middle of their careers. Another option (based on manual analysis) would have been to track each researcher to see whether she/he moved to or from Canada during the study period; however, this would be extremely complicated given that there are 12,839 papers in the data set (Garousi and Varma 2009). Further, by reviewing the top 15 list, we found only one researcher (Lionel Briand) who moved in or out of Canada during the study period; he is nevertheless ranked #1 due to his excellent publication record.
We are planning to conduct similar studies in the upcoming years and also in other fields such as electrical and computer engineering.
RESEARCHERS' PERFORMANCE OVER YEARS
Variations of the annual h-index of the top 5 researchers over the years 2001-2006 are shown in Figure 4. Due to the large size of the data needed to calculate these variations and the associated time complexity of the analysis, the results are shown only for the 2001-2006 period, rather than for 1996-2006. The main observations from these variations in h-index values are discussed in the Top Scholars section.

Figure 1. An example h-index graph generated by Scopus, visually representing the concept of h-index (image courtesy of Scopus)

Figure 3. Top 15 scholars based on impact factors and h-index

Table 1. The list of top 12 selected software engineering journals and conferences

Table 2. Top 15 scholars based on impact factors vs. h-index

Table 4. How the top 5 Canadian researchers and institutions position in the worldwide rankings