An Evaluation of the Psychometric Properties of an Advising Survey for Medical and Professional Program Students

The purpose of this study was to evaluate the psychometric properties of a newly developed instrument intended to measure faculty competence in their role as advisors, particularly in medical and professional programs. A total of 166 students completed the Faculty Advisor’s Skills and Behaviors Inventory (FASBI). The psychometric properties of the FASBI were evaluated using the Rasch Rating Scale Model. Results indicate the FASBI is a psychometrically sound instrument capable of producing valid and reproducible measures.


Introduction of Problem
Student advising plays a critical role in student development (Light, 2001; Pizzolato, 2008; Reinarz & Ehrlich, 2002), and good advising has consistently been linked to students' satisfaction with an institution (Baker & Griffin, 2010; Elliott & Healy, 2001; Freeman, 2008) and to academic success (Campbell & Nutt, 2008; Museus & Ravello, 2010). Beggs, Bantham, and Taylor (2008) reported that college and university advising often influences students' career decisions and direction as well. Although colleges and universities often profess a commitment to advising, extant research suggests advising is often "uneven in quality and ultimately ineffective" (Hossler, Ziskin, & Gross, 2009, p. 8). Indeed, Hossler and colleagues (2009) and Kramer (2003) note that most colleges and universities in the United States do not assess advising. This is particularly unfortunate because failing to evaluate the quality of advising at an institution signals to faculty that advising is a low priority or is undervalued.

Purpose of the Present Study
Programmatic assessments are an important tool for actively evaluating the successes, shortcomings, and outright failures of a program, and such evaluation is essential for ongoing programmatic improvement. Given that faculty advising is of critical importance to students and their education, faculty advisors should be evaluated routinely. Objective assessments of faculty strengths and weaknesses can inform the development of training programs or other interventions designed to support faculty in this vitally important role. Thus, the purpose of this study was to evaluate the psychometric properties of a newly developed instrument intended to measure faculty competence in their role as advisors, particularly in medical and professional programs (e.g., veterinary medicine, pharmacy, dentistry, physical therapy).

Participants
Survey participants were students at the North Carolina State University College of Veterinary Medicine. Using census sampling, all students enrolled across the four years of the Doctor of Veterinary Medicine (DVM) program were invited to participate in the study. Of the 394 students invited, 167 provided valid responses to the 25 items evaluating the skills and behaviors of faculty advisors. Students' ages ranged from 21 to 49 years, with a mean of 26 (SD = 3.9) and a median of 25. A complete breakdown of student demographic characteristics is presented in Table 1.

Instrumentation
A number of instruments are available for assessing faculty advising. However, most focus on elements that are typically less relevant to faculty in medical and health-related professional programs. For example, most advising instruments ask about the extent to which faculty helped students identify a major and select appropriate courses to ensure degree requirements are fulfilled and students graduate in a timely manner. In medical and health-related programs, students have already selected their major and typically move through the curriculum as a cohort, so there is little need to assess these facets. Similarly, instruments that focus on advising graduate and doctoral students (e.g., the Graduate Advising Survey for Doctoral Students) tend to address many of the same issues, along with perceptions of the department's climate, peer influence, and the advisor's role in training researchers.
Given that medical and health-related programs have somewhat different needs, we developed a new survey based on elements we believed to be particularly relevant to these types of programs (see Barnes et al., 2011; Belcheir, 2000; Zimmerman & Mokma, 2004). This resulted in the Faculty Advisor's Skills and Behaviors Inventory (FASBI).

Validation Framework
Much has been written in the psychometric literature about the limitations of traditional statistical approaches for validation studies. Royal (2010) lists six major weaknesses of traditional approaches: 1) erroneously treating ordinal ratings as interval-level measures, 2) erroneously treating all items as equally important, 3) erroneously assuming error is equal across all measures, 4) sample dependency, 5) the requirement of parametric statistical approaches for normally distributed data, and 6) the serious threat that missing data can pose to score validity. Many measurement scholars have declared Rasch models the "gold standard" for psychometric validation studies because they overcome these limitations and are the only measurement models that have the property of invariance (Royal, 2010; Salzberger, 2002; Wright, 1997).
Rasch models are particularly attractive for analyzing survey data because they separate person measures (e.g., knowledge, ability, skills) from item measures (e.g., difficulty of an item or task) and allow researchers to explore how these two facets interact. In the present study, the latent trait being measured is students' tendency to endorse (agree with) a variety of survey items. Rasch models produce linear measures (in units called "logits") and create a common linear continuum onto which both person and item measures are mapped. Because Rasch models are probabilistic, the likelihood of a student endorsing an item is modeled as a logistic function of the distance between the student's measure and the item's measure on that continuum. Readers interested in learning more about Rasch measurement models are encouraged to consult Engelhard (2014) and Bond and Fox (2007).
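To make the logistic relationship concrete, the sketch below uses the simplest (dichotomous, agree/disagree) Rasch model rather than the rating scale model used in this study. The function name is hypothetical; it shows only that the probability of endorsement depends on the person-item distance in logits.

```python
import math


def endorse_probability(beta: float, delta: float) -> float:
    """Dichotomous Rasch model: probability that a person located at `beta`
    logits endorses an item located at `delta` logits."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


# Only the person-item distance (beta - delta) matters.
print(round(endorse_probability(0.0, 0.0), 3))  # 0.5
print(round(endorse_probability(1.0, 0.0), 3))  # 0.731
print(round(endorse_probability(2.0, 1.0), 3))  # 0.731
```

A student located exactly at an item's position has a 50% chance of endorsing it, and shifting both person and item by the same amount leaves the probability unchanged.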
For the present study, we used the Rasch Rating Scale Model (RRSM; Andrich, 1978) to analyze the survey data because it is well suited to polytomous data. Under the RRSM, the probability of person n responding in category x of item i is given by:

$$P_{nix} = \frac{\exp\left[\sum_{j=1}^{x}\left(\beta_n - \delta_i - \tau_j\right)\right]}{\sum_{k=0}^{m}\exp\left[\sum_{j=1}^{k}\left(\beta_n - \delta_i - \tau_j\right)\right]}, \qquad x = 0, 1, \ldots, m,$$

where $\beta_n$ is the person's position on the variable, $\delta_i$ is the scale value (difficulty to endorse) estimated for each item i, and $\tau_1, \tau_2, \ldots, \tau_m$ are the m response thresholds estimated for the m + 1 rating categories; an empty sum (x = 0 or k = 0) is defined as 0. Winsteps measurement software (Linacre, 2016) was used to perform the data analysis. Parameters were estimated using joint maximum likelihood estimation (Wright & Masters, 1982).
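A minimal sketch of the RRSM category probabilities follows, assuming the thresholds are already estimated (here they are hypothetical values, not this study's estimates). It computes the numerator exponent for each category as a running sum of $\beta_n - \delta_i - \tau_j$ and normalizes; it does not perform the joint maximum likelihood estimation that Winsteps carries out. The function name is illustrative only.

```python
import math
from typing import List, Sequence


def rrsm_probabilities(beta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Category probabilities P(x), x = 0..m, under the Rasch Rating Scale Model.

    beta:  person measure (logits)
    delta: item scale value, i.e., difficulty to endorse (logits)
    taus:  thresholds tau_1..tau_m, shared by all items
    """
    # Exponent for category x is sum_{j=1}^{x} (beta - delta - tau_j); empty sum = 0 for x = 0.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (beta - delta - tau))
    # Subtract the largest exponent before exponentiating to avoid overflow.
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical thresholds for a four-category scale (not the study's estimates).
probs = rrsm_probabilities(beta=0.5, delta=0.0, taus=[-1.5, 0.0, 1.5])
print([round(p, 3) for p in probs])
```

With a single threshold at 0, the model reduces to the dichotomous Rasch model, which is a quick sanity check on any implementation.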

Results
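Before presenting the results of the psychometric validation, it is useful to first examine descriptive statistics. Table 2 presents the mean and standard deviation (SD) for each of the 25 items; for example, item 25 (Is someone I would recommend as an advisor to other students) had a mean of 3.12 (SD = 0.97).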

Dimensionality
Virtually all survey data sets contain multiple dimensions. The question is to what degree various dimensions are present, and whether there is evidence of a single, primary underlying construct. To answer this question, a Rasch-based Principal Components Analysis (PCA) of standardized residual correlations was performed. In total, the Rasch dimension explained 64.4% of the variance, with 20.8% attributable to the items. The largest secondary dimension explained 4.4% of the variance and had an eigenvalue of 3.0, indicating a strength of about 3 items. The ratio of the variance explained by the items (20.8%) to that explained by the largest secondary dimension (4.4%) is about 5:1. Thus, for all practical purposes, the data were sufficiently unidimensional for a Rasch measurement analysis.
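As a check of the arithmetic, the ratio of item variance to the variance of the largest secondary dimension is:

$$\frac{20.8\%}{4.4\%} \approx 4.7,$$

which rounds to the roughly 5:1 ratio reported above.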

Reliability
Reliability was assessed using multiple methods. First, the traditional measure of reliability (Cronbach's alpha) was calculated; then Rasch-based reliability estimates were calculated using "real" and "modeled" estimates. Cronbach's α was .972 for the 25 items. Rasch-based reliability estimates were .94 ("real") and .95 ("modeled"), suggesting true reliability likely lies somewhere in between. All three reliability estimates indicate highly reproducible measures (Royal & Hecker, 2015). Separation, which refers to the number of statistically distinguishable levels within the data, was also assessed. The separation statistic was 4.27, indicating approximately four statistically distinguishable levels in the data.
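The sketch below shows the standard formula for Cronbach's alpha and the usual conversion between a Rasch separation index G and reliability R (R = G² / (1 + G²)). Function names are hypothetical, and the alpha function assumes a complete persons-by-items matrix with no missing responses.

```python
import math
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a complete persons-by-items matrix of ratings."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def reliability_from_separation(g: float) -> float:
    """Rasch reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1 + g * g)


def separation_from_reliability(r: float) -> float:
    """Separation index implied by a Rasch reliability R: G = sqrt(R / (1 - R))."""
    return math.sqrt(r / (1 - r))


print(round(reliability_from_separation(4.27), 3))  # 0.948
print(round(separation_from_reliability(0.94), 2))  # 3.96
print(round(separation_from_reliability(0.95), 2))  # 4.36
```

Assuming the reported separation and reliabilities refer to the same facet, a separation of 4.27 corresponds to a reliability of about .948, between the "real" (.94) and "modeled" (.95) estimates. The alpha function uses population variances throughout; sample variances would give the same value because the n/(n - 1) factors cancel.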

Rating Scale Effectiveness
Rating scale diagnostics were examined to determine the extent to which students were able to interpret and use each rating scale category appropriately (see Table 3). Results indicate students made use of the full rating scale, although fewer students provided ratings of disagreement. Infit and outfit mean square fit statistics were within the appropriate range, and the category measures and structure calibrations each advanced in a stepwise manner as anticipated (Linacre, 2002). Collectively, these results provide evidence that the rating scale functioned very well for this instrument. (Linacre, 2015a).
Similar procedures were used to assess the quality of person measures. Person measures ranged from -4.01 to 6.52 logits, with an average standard error of .52 (SD = .22). These values also indicate excellent variability, with sufficiently small standard errors to ensure stable measurement. Only 9 students (5.4%) had at least one fit statistic exceeding 2.0, indicating possible misfit. Two items appeared to violate the assumption of local item independence (i.e., they showed local item dependence, LID). Specifically, item #2-Is easy to get in touch with and item #3-Responds to my emails/calls in a timely manner shared a standardized residual correlation of .70, indicating high statistical dependence. In practical terms, a student's response to one of these items is likely to correlate highly with their response to the other.
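Two of the checks described above can be sketched as a small function: whether the structure calibrations (thresholds) advance monotonically, and whether each category's outfit mean square stays below 2.0, following Linacre's (2002) guidelines. The function name and the input values are hypothetical; they are not the diagnostics reported in Table 3.

```python
from typing import Dict, Sequence


def check_rating_scale(thresholds: Sequence[float], category_outfit: Sequence[float],
                       max_outfit: float = 2.0) -> Dict[str, bool]:
    """Two rating-scale checks: structure calibrations (thresholds) advance
    monotonically, and each category's outfit mean square stays below max_outfit."""
    ordered = all(a < b for a, b in zip(thresholds, thresholds[1:]))
    fit_ok = all(m < max_outfit for m in category_outfit)
    return {"thresholds_ordered": ordered, "category_fit_ok": fit_ok}


# Hypothetical diagnostics for illustration only (not the values in Table 3).
print(check_rating_scale([-2.1, -0.3, 2.4], [1.10, 0.95, 0.88, 1.02]))
# {'thresholds_ordered': True, 'category_fit_ok': True}
```

Disordered thresholds would suggest that respondents did not use some category as intended, which often motivates collapsing adjacent categories.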
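A local-dependence check of this kind correlates, across persons, the standardized residuals z = (x − E) / √W of two items, where E and W are the model-expected score and its variance. The sketch below assumes the category probabilities for each person-item response are already available (e.g., from a fitted rating scale model). All function names are hypothetical, and the .70 cutoff simply mirrors the value discussed above, not a universal rule.

```python
import math
from typing import List, Sequence, Tuple


def expected_and_variance(probs: Sequence[float]) -> Tuple[float, float]:
    """Model-expected score E and variance W for one response,
    given category probabilities P(0..m)."""
    e = sum(x * p for x, p in enumerate(probs))
    w = sum((x - e) ** 2 * p for x, p in enumerate(probs))
    return e, w


def standardized_residuals(observed: Sequence[int],
                           probs_per_person: Sequence[Sequence[float]]) -> List[float]:
    """z = (x - E) / sqrt(W) for one item, one value per person."""
    z = []
    for x, probs in zip(observed, probs_per_person):
        e, w = expected_and_variance(probs)
        z.append((x - e) / math.sqrt(w))
    return z


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def flag_local_dependence(z_item_a: Sequence[float], z_item_b: Sequence[float],
                          cutoff: float = 0.70) -> bool:
    """Flag an item pair whose residual correlation reaches the cutoff."""
    return pearson(z_item_a, z_item_b) >= cutoff
```

Correlating residuals rather than raw responses removes the shared influence of the primary trait, so a high value points to dependence beyond what the measure itself explains.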
Differential item functioning (DIF) analyses were performed to assess whether the construct remained invariant across relevant subgroups, particularly class year and gender. The iterative-logit (Rasch-Welch) method presented in Linacre (2015b) was used. Because multiple comparisons were made across 25 items, a Bonferroni correction was applied to control the familywise error rate, reducing the conventional significance threshold of .05 to .002 (.05/25). Results indicate no statistically significant differences on any item across subgroups.
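The Bonferroni arithmetic and the core of a DIF contrast can be sketched as follows. The t-statistic below is the generic contrast between one item's difficulty estimated separately in two groups; Winsteps' Rasch-Welch procedure additionally computes Welch-approximated degrees of freedom and a p-value, which this sketch omits. Function names and the example estimates are hypothetical.

```python
import math


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


def dif_contrast_t(d1: float, se1: float, d2: float, se2: float) -> float:
    """t-statistic for the difference between one item's difficulty estimated
    separately in two groups (d1, d2) with standard errors se1, se2."""
    return (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)


print(round(bonferroni_alpha(0.05, 25), 4))  # 0.002
# Hypothetical item difficulties for two groups (illustration only).
t = dif_contrast_t(0.40, 0.25, 0.10, 0.30)
print(round(t, 2))
```

An item would be flagged only if the p-value associated with its contrast fell below the corrected threshold of .002.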

Construct Hierarchy
In psychometrics, the ordering of items along a linear continuum defines the construct. In essence, most substantive results of a Rasch measurement analysis can be found in the construct map (also known as a Wright map), which presents a visual snapshot of the "psychometric ruler" onto which person and item measures have been placed (see Figure 1). In the present study, students appear on the left side of the map, with those near the top representing individuals who had the least difficulty endorsing the items and those near the bottom having the greatest difficulty. Likewise, items appear on the right side of the map, with those most difficult to endorse at the top and those easiest to endorse at the bottom. Here, students had the most difficulty endorsing item Q1-[My advisor] is proactive in reaching out to meet with me, and the least difficulty endorsing items Q12-[My advisor] encourages me to achieve my educational goals and Q8-[My advisor] respects my opinions and feelings.
Samuel Messick's (1989) framework for construct validity provides a useful guide for interpreting validity evidence. According to Messick, validity is the integration of any evidence that bears on the interpretation or meaning of a score. Messick's framework consists of six distinct aspects of validity: substantive, content, generalizability, structural, external, and consequential. We use this framework to appraise the validity evidence present in the foregoing results.
First, a Rasch-based PCA of standardized residual correlations indicated the data were essentially unidimensional, suggesting that a single, primary latent trait was being measured. This evidence speaks to the substantive aspect of validity. Measures of reliability consistently exceeded .90, indicating highly reproducible measures. This evidence speaks to the generalizability aspect of validity. An evaluation of rating scale diagnostics indicated the rating scale functioned appropriately, which speaks to the structural aspect of validity. An evaluation of item and person measures confirmed the measures were psychometrically sound, which speaks to the content aspect of validity. Additional validity evidence came from DIF analyses, which confirmed that measures were invariant across class year and sex subpopulations. This speaks to the systematic aspect of validity and provides additional support for the generalizability aspect. Results of this study have not been correlated with other studies, so we present no evidence that speaks to the external aspect of validity. Finally, because the FASBI has not been used previously, we cannot speak to the consequential aspect of validity, which involves consequences (positive or negative) resulting from the use of the instrument (Royal & Puffer, 2014). In sum, there is an abundance of validity evidence supporting the psychometric quality of the FASBI and its ability to produce high-quality measures.
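The layout of the construct (Wright) map described at the start of this section can be illustrated with a minimal text rendering: persons and items share one logit scale, higher rows mean higher measures, persons are counted on the left and item labels are listed on the right. The function and all measures below are hypothetical, chosen only to show the layout; they are not the estimates plotted in Figure 1.

```python
import math
from typing import Dict, List


def wright_map(persons: List[float], items: Dict[str, float]) -> List[str]:
    """Text construct (Wright) map with one row per whole-logit band, highest band first.
    Left: one 'X' per person in the band. Right: labels of items in the band."""
    values = persons + list(items.values())
    top, bottom = math.floor(max(values)), math.floor(min(values))
    rows = []
    for band in range(top, bottom - 1, -1):
        count = sum(1 for p in persons if math.floor(p) == band)
        labels = " ".join(k for k, v in items.items() if math.floor(v) == band)
        rows.append(f"{band:>3} {'X' * count:>10} | {labels}")
    return rows


# Hypothetical measures for illustration only (not the study's estimates).
students = [-1.2, 0.3, 0.8, 1.5, 2.2, 2.9]
item_measures = {"Q1": 1.4, "Q8": -0.9, "Q12": -1.1}
for row in wright_map(students, item_measures):
    print(row)
```

Reading across a row compares persons with items at the same location: a student positioned above an item is more likely than not to endorse it.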

Implications
The purpose of this study was to evaluate the psychometric properties of a newly developed instrument intended to measure faculty skills and behaviors in their role as advisors, particularly in medical and professional programs (e.g., veterinary medicine, pharmacy, dentistry, physical therapy). A thorough investigation of the FASBI's psychometric properties indicates the instrument is psychometrically sound. Therefore, persons interested in evaluating faculty advising in medical, health, and other professional programs are especially encouraged to use this instrument. The FASBI may also be relevant and appropriate for evaluating advising in other disciplines.
The FASBI may be particularly helpful for evaluators because it addresses a wide variety of topics of concern to most academic programs. Knowing how students feel about each item may be particularly informative for identifying the attributes of more and less successful faculty advisors. Identifying these strengths and weaknesses would, in turn, help in designing training that enables faculty to become more effective advisors.
Finally, another implication of this study is methodological. The techniques presented here are what many measurement experts consider "gold standard" methods for conducting survey validation studies. Further, the use of Messick's framework for evaluating and organizing validity evidence may help other researchers identify how best to present validity evidence.

Limitations
Of course, this study is not without limitations. Although this study intentionally made no effort to address substantive findings, which were beyond the scope of this paper, the self-reported nature of the survey data remains a potential limitation. The extent to which students may have provided socially desirable responses, or to which non-response bias may be an issue, remains unknown. With respect to the sample frame, students were predominantly female, which is typical of veterinary medical programs. Despite the relative underrepresentation of male respondents, tests for DIF were conducted to determine whether students responded differently to the FASBI items based on sex. Results of the DIF analysis indicated that participants' sex did not affect responses to any FASBI item in any systematic way, t(833) = .00, p = 1.000. Finally, veterinary medicine, much like all medical professional programs, faces significant issues with regard to diversity. Future research should also evaluate the functioning of FASBI items across race and ethnicity.

Conclusion
Effective faculty advising remains a problem at many colleges and universities, including medical and health professions programs. To date, relatively few institutions have actively evaluated faculty advising. We suspect this may be due in part to a lack of appropriate instrumentation. Thus, this study investigated the psychometric properties of an instrument intended to assess faculty competence as it pertains to advising. Results of the psychometric validation study provide a considerable amount of validity evidence to support the quality of the FASBI. We encourage others to adopt the FASBI for similar evaluations of faculty advising.

Figure
Figure 1. Construct hierarchy

Table 1. Demographic characteristics of sample

Table 2. Results of traditional statistical analysis
Summary of mean and standard deviation (SD) values for each of the 25 items.

Table 3. Rating scale diagnostics
Item measures were estimated with sufficiently small standard errors to ensure statistically stable measures. Infit and outfit mean square fit statistics were evaluated using the recommendations provided by Wright and Linacre (1994), which note that ideal values range between .60 and 1.40 and that values exceeding 2.00 may indicate a "noisy" (potentially problematic) item. Of the 25 items, 23 fell within the ideal range for fit statistic values. Item #23-[My advisor] is helpful yielded fit statistics slightly below .60, indicating these responses were somewhat predictable. Item #1-[My advisor] is proactive in reaching out to meet with me yielded inflated fit statistics (1.83 and 2.26, respectively), indicating a potentially problematic item. Point-measure correlations were all high positive values, indicating excellent discriminatory ability.
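The two mean square fit statistics can be sketched from observed scores x, model-expected scores E, and model variances W: outfit is the unweighted mean of squared standardized residuals, while infit weights each squared residual by its information. The classification function applies the ranges cited above from Wright and Linacre (1994); both function names are hypothetical, and the sketch assumes E and W come from an already fitted model.

```python
from typing import Sequence, Tuple


def fit_mean_squares(observed: Sequence[float], expected: Sequence[float],
                     variance: Sequence[float]) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item (or person).
    outfit: unweighted mean of squared standardized residuals (x - E)^2 / W
    infit:  information-weighted, sum (x - E)^2 / sum W
    """
    sq = [(x - e) ** 2 for x, e in zip(observed, expected)]
    outfit = sum(s / w for s, w in zip(sq, variance)) / len(sq)
    infit = sum(sq) / sum(variance)
    return infit, outfit


def classify_mnsq(mnsq: float, low: float = 0.60, high: float = 1.40,
                  noisy: float = 2.00) -> str:
    """Label a mean square using the ranges cited from Wright and Linacre (1994)."""
    if mnsq > noisy:
        return "noisy"
    if mnsq > high:
        return "above ideal range"
    if mnsq < low:
        return "overly predictable"
    return "ideal"


print(classify_mnsq(1.83), "/", classify_mnsq(2.26))  # above ideal range / noisy
```

Because outfit is unweighted, it is more sensitive to unexpected responses from persons far from an item's location, which is one reason the two statistics can diverge, as they do for Item #1.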

Table 4. Item quality indicators