Exploring Gender Equivalence and Bias in a Measure of Psychological Hardiness

One of the most pervasive criticisms found in the hardiness literature concerns the question of whether the construct is equally important for men and women. Using a multi-group confirmatory factor analytic approach, this question was explored from a more fundamental perspective by examining measurement equivalence across gender in a measure of hardiness, the 15-item Dispositional Resilience Scale [DRS-15; Bartone, P. T. (1995). A short hardiness scale. Paper presented at the Annual Convention of the American Psychological Society, New York.]. Although results suggested some non-equivalence related to the control subscale, follow-up analyses examining gender bias in the two non-equivalent items showed that the effect of gender was minimal. The gender effects found indicated that women had a greater tendency to endorse these items compared to men. Given the stringent criteria used to test for equivalence and the minimal evidence of bias found, it is concluded that the results largely point to equivalence across gender in the DRS-15.


Introduction
The personality disposition known as hardiness describes a generalized style of functioning characterized by a strong sense of commitment (the ability to see the world as interesting and meaningful), control (the belief in one's own ability to influence the course of events), and challenge (seeing new experiences as exciting opportunities for personal growth; Bartone, 2000). It is conceptualized as a constellation of personality characteristics that function as a resilience resource in the encounter with stressful life events (Kobasa, 1979; Kobasa, Maddi, & Kahn, 1982; Kobasa, Maddi, & Zola, 1983). What started as a longitudinal study of American business executives (cf. Maddi & Kobasa, 1984) has by now grown into an impressive body of research demonstrating the stress-mitigating effects of hardiness (see, e.g., Bernas & Major, 2000; Hystad, Eid, Laberg, Johnsen, & Bartone, 2009; Soderstrom, Dolbier, Leiferman, & Steinhardt, 2000).
The hardiness-stress mechanism most likely involves a combination of cognitive, physiological, and behavioral processes. For example, Maddi and Hightower (1999) have suggested that hardiness encourages a kind of coping process that renders stressful events less harmful, labeled transformational coping. That is, high-hardy individuals are more likely to react to stressful events with increased interaction, effort, and active attempts to find solutions. Part of this transformational coping involves the interpretation or meaning that people attach to events around them (Ouellette, 1993). People characterized by high levels of hardiness believe they can control or influence events, tend to interpret stressful events in positive and constructive ways, and construe these events as challenges and valuable learning opportunities. These adaptive cognitions are in turn believed to result in lower levels of organismic strain in response to potentially threatening events (Kobasa, Maddi, Puccetti, & Zola, 1985). For instance, Dolbier and colleagues (Dolbier et al., 2001) have suggested that these adaptive appraisals protect individuals from the immune-suppressive effects of stress, thereby enabling them to maintain a healthy status.

Gender Differences in Hardiness Research
The accumulating evidence aside, the construct of hardiness has not escaped criticism. One of the most pervasive lines of critique concerns the question of whether hardiness is equally important for men and women. From a social-constructive and feminist standpoint, Riska (2002) has criticized the construct as merely a way to confirm and legitimize traditional masculinity. In Riska's view, hardiness is a product of the socio-political climate of its decade (the 1980s), reflecting the traditional white, middle-class, and masculine values prevalent in the US at the time. She describes a transition from the Type A man that captured the ideological construct of traditional masculinity prevalent in the 1950s, to the hardy man, which served to demedicalize and give new legitimacy to traditional male behavior. In this transition, hardiness became a key to re-evaluating the core features of traditional masculinity; men could again be ambitious, competitive, and in control, while remaining healthy, as opposed to the unhealthy and coronary-prone Type A man.
Many of the concerns regarding gender differences found in the literature stem from the fact that the original hardiness measure was developed based on a sample of exclusively male executives (cf. Maddi & Kobasa, 1984), and that early empirical investigations tended to focus primarily on men. In later studies that have included female participants, inconsistent or equivocal results have been reported. Some have found that hardiness moderates the ill effects of stress on health for men, but not for women (Benishek & Lopez, 1997; Klag & Bradley, 2004; Shepperd & Kashani, 1991), while others have found similar effects for the two sexes (King, King, Fairbank, Keane, & Adams, 1998; Robitschek & Kashubeck, 1999; Rosen, Wright, Marlowe, Bartone, & Gifford, 1999).
Several explanations have been put forward to account for the gender differences. It has been suggested that the coping strategies typically employed by men and women could account for some of the observed differences (Klag & Bradley, 2004; Williams, Wiebe, & Smith, 1992). More precisely, men stereotypically cope with stress by using problem-focused strategies, whereas women are believed to employ avoidance strategies to a larger degree (Tamres, Janicki, & Helgeson, 2002). Moreover, men and women are often believed to differ in regard to how a particular stressor is appraised and in what they consider stressful (Baum & Grunberg, 1991). Combined, the argument is that, compared to hardy men, equally hardy women use less beneficial cognitive and behavioral coping strategies. However, this explanation seems insufficient on several grounds. Firstly, there are reasons to believe that the stereotypical coping patterns of men and women are not as clear-cut as presented above (Tamres et al., 2002). Secondly, research has demonstrated gender differences even when no differences in coping were present (Klag & Bradley, 2004).
A more plausible explanation could be that the stressors that have been investigated in many hardiness studies are predominantly male oriented (Wiebe & Williams, 1992). Several studies have focused on achievement-oriented stressors, whereas social or interpersonal stressors have been researched less often. Many studies may thus contain a masculine bias in the gender relevance of stressors. To illustrate, Wiebe (1991) subjected male and female participants to an experimentally induced evaluative threat (achievement-oriented), and recorded the participants' affective and physiological responses. The few gender differences Wiebe found showed that hardiness was a protective factor for males, but not females, and this could be related to the achievement-oriented stressor used.
Yet studies that have focused on female participants and included interpersonal or social stressors have managed to demonstrate beneficial effects of hardiness. For example, Feinauer, Mitchell, Harper, and Dane (1996) found that among victims of childhood sexual abuse, high-hardy women had significantly fewer distressing symptoms (e.g., depression, sleep disturbance, sexual dysfunctions, and symptoms of post-traumatic stress disorder). Likewise, Foster and Dion (2003) demonstrated that women who scored high on hardiness reported less anxiety after being exposed to both hypothetical and personal encounters with gender discrimination in a laboratory.

The Issue of Measurement Equivalence
While the above arguments all raise important and valid points, they fail to address a more fundamental question: are we actually measuring the same underlying construct in men and women with current measures of hardiness? Great strides have been made within the field of cross-cultural psychology in stressing the importance of instrument equivalence in cross-group comparisons. It is now generally acknowledged that measurement equivalence needs to be established before any meaningful comparisons between different groups can be made (Byrne & Watkins, 2003; van de Vijver & Leung, 1997). However, measurement equivalence, and the closely related term measurement bias, have not featured as prominently in hardiness research.
Equivalence is a generic term describing different aspects of the comparability of a construct or measuring instrument across two or more groups. For instance, measurement equivalence refers to whether the associations between the construct or constructs being measured and the indicators used to measure them are the same across groups (Byrne & Watkins, 2003). In other words, measurement equivalence concerns the degree to which the content of the items comprising the scale is perceived in the same way across groups. Similarly, structural equivalence refers to whether relations among the constructs, as tapped by the instrument's subscales, are the same across groups (Note 1). Structural equivalence thus concerns the underlying theoretical or factorial structure of a given measuring instrument.
In a similar vein, the term bias can be thought of as a generic term reflecting all nuisance factors that can possibly threaten multi-group comparability. For instance, the term construct bias suggests that the construct being measured is understood differently across the groups being studied, for example because its indicators are differentially appropriate across groups. The term item bias (also commonly referred to as differential item functioning or DIF) implies that the content of a particular item elicits a different meaning across groups. The implication of item bias is that the item does not discriminate between people with different standings on the trait in the same way across groups. In other words, for an unbiased item, people with the same standing on the underlying trait should have the same item score irrespective of group membership (van de Vijver & Leung, 1997).
As an extension of the critique forwarded by Riska (2002), one could therefore argue that hardiness measures contain (subtle) gender biases and do not adequately capture or measure the construct in women. For example, the behavior of working hard that is often used as a marker of hardiness may not be an appropriate indicator among women. In Riska's terms, this indicator would reflect traditional masculine values and behaviors. As yet, no study has systematically examined potential gender bias at the measurement level. This is somewhat surprising since measurement equivalence is a prerequisite if any substantive claims about gender differences are to be made. Before we can establish whether differences exist in the hardiness-health relationship, we need to ascertain that the same underlying construct is being measured in men and women.

Research Aim
The aim of the current study was to fill this gap in the literature by examining a commonly used hardiness measure for equivalence across samples of male and female participants. To achieve this aim, a confirmatory factor analytic (CFA) approach was used, wherein a baseline model was compared to successively more restricted models. Next, provided with findings of non-equivalent items, an analysis of variance (ANOVA) procedure suggested by van de Vijver and Leung (1997) was used to further explore the presence of bias related to the items in question.

Participants
In order to obtain a sufficiently large and diverse study sample, participants from four pre-existing surveys were pooled into one sample. Participants in these surveys had all completed the same hardiness measure. The first sample consisted of employees from the Norwegian Armed Forces who had completed hardiness questionnaires as part of an annual personnel survey in 2008 (n = 7522, 1265 females). The second sample included undergraduate students enrolled in introductory psychology courses at the University of Bergen who had completed the hardiness scale as part of a larger study in 2007 (n = 289, 226 females). The third sample comprised participants who had attended leader development programs under the auspices of the Norwegian Armed Forces in the period 1994 to 2007 (n = 157, 88 females). Finally, the fourth sample included candidates applying for admission into different officer candidate schools and military academies in Norway in 2008 (n = 257, 37 females).

The Hardiness Instrument
Hardiness was measured with the 15-item Dispositional Resilience Scale (DRS-15; Bartone, 1995). The DRS-15 consists of five items each to measure the control (e.g., "By working hard you can nearly always achieve your goals"), commitment (e.g., "Most of my life gets spent doing things that are meaningful"), and challenge (e.g., "Changes in routines are interesting to me") dimensions of hardiness. After reversing six negatively keyed items, a total hardiness score can be obtained by summing responses to all items. In addition to a total hardiness score, three subscale scores can be created by summing the relevant five items for each of the commitment, control, and challenge dimensions. All items are scored on a four-point scale, ranging from not at all true to completely true.
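The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the text does not list which six items are negatively keyed or which item numbers belong to which subscale, so the `REVERSED` set and the `SUBSCALES` mapping below are hypothetical placeholders, as is the 0 to 3 coding of the four response options.

```python
# Hypothetical item layout: the paper does not give item numbers per
# subscale or identify the six reverse-keyed items; these are placeholders.
SUBSCALES = {
    "commitment": [1, 4, 7, 10, 13],
    "control": [2, 5, 8, 11, 14],
    "challenge": [3, 6, 9, 12, 15],
}
REVERSED = {3, 4, 8, 11, 13, 14}  # placeholder set of six items
MIN_SCORE, MAX_SCORE = 0, 3  # assumed coding of the four-point scale


def score_drs15(responses):
    """Return subscale and total scores from a dict {item_number: response}."""
    keyed = {}
    for item, value in responses.items():
        if not MIN_SCORE <= value <= MAX_SCORE:
            raise ValueError(f"item {item}: response {value} out of range")
        # Reverse negatively keyed items so that higher always means hardier.
        keyed[item] = (MAX_SCORE + MIN_SCORE - value) if item in REVERSED else value
    scores = {name: sum(keyed[i] for i in items) for name, items in SUBSCALES.items()}
    # The total is the sum of the three five-item subscale scores.
    scores["total"] = sum(scores.values())
    return scores
```

With a real data set, the placeholder sets would be replaced by the keying published with the instrument.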
Previous research has confirmed the DRS-15 to be a valid and useful tool in both military and non-military samples (e.g., Britt, Adler, & Bartone, 2001; Clark, 2002; Vogt, Rizvi, Shipherd, & Resick, 2008). In the present study, an adapted Norwegian version of the DRS-15 was used. This scale has previously been validated for use in the Norwegian population and language (Hystad, Eid, Johnsen, Laberg, & Bartone, 2010). In two recent studies, this scale predicted the likelihood of sickness absence from work (Hystad, Eid, & Brevik, 2011) and was negatively related to risk for alcohol abuse in military personnel (Bartone, Hystad, Eid, & Brevik, 2012).
Cronbach's alphas for the total score in this study were .78 and .76 for men and women, respectively. For men, the Cronbach's alphas for the subscales were .74, .75, and .62 for commitment, control, and challenge, respectively, while for women, the alphas were .73, .73, and .67 for commitment, control, and challenge, respectively. These reliability coefficients are within the range typically reported for the 15-item scale and subscales (usually in the .60 to .70 range; e.g., Bartone, Roland, Picano, & Williams, 2008; Britt et al., 2001).

Statistical Analyses
Hystad et al. (2010) have demonstrated that the DRS can best be represented as a hierarchical structure comprising a general hardiness dimension and three first-order factors corresponding to the sub-dimensions of commitment, control, and challenge. To test the equivalence of this theoretical structure across male and female participants, version 6 of the statistical program EQS was used (Bentler, 2001). Based upon previous suggestions and empirical investigations (Byrne, Shavelson, & Muthén, 1989; Vandenberg & Lance, 2000), the following steps were taken in the present study:
1) A test of configural equivalence, wherein equal factor structures are tested. This is achieved by specifying the same pattern of fixed and free factor loadings in each group, and aims at examining whether the hardiness instrument evokes the same cognitive frame of reference for female and male respondents. This also serves as a baseline model to which subsequent, more restricted models can be compared.
2) A test of measurement equivalence, in which factor loadings for like items are invariant across groups. Accordingly, step 1 was repeated with imposed equality constraints on the factor loadings for like items. In essence, this examines whether the associations between like items and the underlying construct are the same across groups, and thus whether the construct indicators (i.e., the items) are calibrated to the underlying construct in the same manner.
3) A test of structural equivalence, in which the underlying theoretical structure of the instrument is tested for equivalence. This entails that the relations among the constructs, as tapped by the hardiness measure's subscales, are equivalent across groups. In this model, all specified constraints from the previous step are retained while concurrently testing for equivalence of the relations between the latent factors.
In each of these steps, all error variances were allowed to vary freely between the two groups. The requirement that error variances be equal between groups is considered excessively stringent and of little practical value, and thus tests of equivalence generally do not include constraints between errors (Byrne & Watkins, 2003).
Provided with evidence of non-equivalent items, the next step was to scrutinize each item in question further by using the ANOVA approach described by van de Vijver and Leung (1997). In this approach, item score is treated as the dependent variable, while gender and score level are the independent variables. Score levels are composed on the basis of total score on the instrument or its subscales, and ideally, all possible score levels should be scrutinized. Most often, however, it is impossible to separate all score levels due to insufficient data in many levels. Based on van de Vijver and Leung's recommendation of at least 50 persons per score level, nine different levels were created. Next, 2 × 9 two-way ANOVAs, with gender (2 levels) and score level (9 levels) as independent variables and item score as the dependent variable, were performed.
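One way such score levels could be formed is sketched below in Python. The text does not describe how the nine levels were actually constructed, so the greedy merging of adjacent scores is an assumption made purely for illustration, as is the function name `make_score_levels`. Whether the minimum of 50 persons should hold per level overall or per gender within each level is likewise not stated; the sketch applies it to the level as a whole.

```python
from collections import Counter


def make_score_levels(scores, min_n=50):
    """Greedily merge adjacent score values into levels of at least min_n persons.

    Returns a dict mapping each observed score to a 1-based level number.
    (Illustrative assumption, not the procedure documented in the study.)
    """
    counts = Counter(scores)
    bands, current, size = [], [], 0
    for value in sorted(counts):
        current.append(value)
        size += counts[value]
        if size >= min_n:  # band is large enough: close it
            bands.append(current)
            current, size = [], 0
    if current:  # leftover top scores: fold them into the last complete band
        if bands:
            bands[-1].extend(current)
        else:
            bands.append(current)
    return {v: level for level, band in enumerate(bands, start=1) for v in band}
```

Each participant's level can then be looked up from the subscale score and entered, together with gender, as a factor in the two-way ANOVA.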

Baseline Models
Testing for equivalence begins with separate estimations of well-fitting baseline models for each group. Following completion of this task, the multi-group model, comprising the baseline models for both men and women, is tested for equivalence across groups.
In testing for the baseline models for both groups, inspection of the Lagrange Multiplier (LM) test provided by EQS argued for the specification of error covariances between two item pairs for both groups (DRS6 and DRS8; DRS16 and DRS18). Error correlation between item pairs can be justified because it often indicates perceived redundancy in item content or represents non-random error due to method effects (Byrne, Baron, & Campbell, 1993). On these grounds, we considered it theoretically justified to include error correlations between said items because they are negatively keyed (DRS6 and DRS8) and/or belong to the same subscale (DRS6 and DRS8; DRS16 and DRS18). The multi-group model to be tested for equivalence therefore contains two common item pair covariances. This model is illustrated in Figure 1.

Test for Equivalence
Nested models (i.e., models that are hierarchically related to one another in that their parameter sets are subsets of one another) are usually compared by computing the difference in chi-square values (Δχ²) for the two models. This Δχ² value is distributed as χ², with degrees of freedom equal to the difference in degrees of freedom (Δdf), and non-significant values indicate equivalence between models. When the analyses are based on robust estimation, Satorra and Bentler (2001) have shown that the difference in S-Bχ² (ΔS-Bχ²) can be corrected and used in the same way as the Δχ² value to compare models. Results from the model that allowed all parameters to be freely estimated across gender, and the model that constrained factor loadings and the common error covariance to equality, yielded a ΔS-Bχ²(15) of 39.45, with p < .001.
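The model comparison described above can be illustrated with a short Python sketch. It computes the upper-tail probability of a χ² statistic with integer degrees of freedom from standard closed-form series, and the scaled difference statistic in the form commonly attributed to Satorra and Bentler (2001). The function names are hypothetical, and any fit values passed to `sb_scaled_difference` would have to come from the fitted models; only the reported ΔS-Bχ²(15) = 39.45 is taken from the present study. This is not the EQS routine used for the analyses.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    z = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    if df % 2 == 1:
        # Odd df: normal tail plus a finite series in odd powers of chi.
        total, term = 0.0, chi
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)
            total += term
        return math.erfc(chi / math.sqrt(2)) + 2 * z * total
    # Even df: exp(-x/2) times a finite series in powers of x/2.
    total, term = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        total += term
    return math.sqrt(2 * math.pi) * z * total


def sb_scaled_difference(t0, df0, c0, t1, df1, c1):
    """Scaled chi-square difference between a constrained model (0) and a less
    constrained model (1). t = uncorrected ML chi-square, c = scaling
    correction (ML chi-square divided by the S-B chi-square)."""
    ddf = df0 - df1
    cd = (df0 * c0 - df1 * c1) / ddf  # scaling correction for the difference
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    stat = (t0 - t1) / cd
    return stat, ddf, chi2_sf(stat, ddf)


# Tail probability for the difference statistic reported in the text.
p_reported = chi2_sf(39.45, 15)
```

Note that the difference statistic is built from the uncorrected χ² values and their scaling corrections, which is why two S-Bχ² values cannot simply be subtracted.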

The LM test χ² statistics assigned to each constrained parameter indicated that the non-equivalence was related to one error covariance and two factor loadings (DRS2: "By working hard you can nearly always achieve your goals" and DRS8: "I don't think there's much I can do to influence my own future"), both belonging to the control subscale. Faced with these results, the relations between the latent factors were next tested for (structural) equality, while allowing the error covariance and factor loadings associated with DRS2 and DRS8 to vary freely between groups, as suggested by Byrne et al. (1989). The results from this test of partial equivalence indicated non-equivalence related to the second-order loading from control to the general hardiness factor (LM test χ² = 10.67, p = .001). Results from the testing of equivalence are summarized in Table 1.

Tests for Item Bias
The results from the test of structural equivalence echoed the finding from the analysis related to the first-order factor loadings, suggesting that the control dimension and related items might not function equivalently across gender. To further explore the potential non-equivalence of the two control items, tests for evidence of item bias were performed following the procedures proposed by van de Vijver and Leung (1997), as described under Statistical Analyses. According to van de Vijver and Leung, a significant main effect of gender points to uniform bias (i.e., bias that is constant across score levels), while a significant interaction between gender and score level indicates non-uniform bias (i.e., bias that is not constant across score levels). Given the relatively large sample size used in the present study, significant effects are likely to emerge for trivial differences between the genders. Consequently, the extent of bias was evaluated based on inspection of effect sizes (ηp²) for the main and interaction effects, where values of .01, .06, and .14 are considered small, medium, and large, respectively (Cohen, 1988).
Results from the ANOVA showed that item DRS8 demonstrated a significant effect of gender, p = .004, ηp² = .001. Neither the interaction effects between gender and score level nor the main effect of gender for item DRS2 were significant. In reviewing the effect size for DRS8, however, the extent of bias could be characterized as negligible. In other words, the effect of gender accounted for only 0.1% of the variance in item score.
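To make the effect-size reasoning concrete, the following Python sketch computes partial eta squared from sums of squares and labels it against the .01, .06, and .14 benchmarks. The function names and the label "negligible" for values below .01 are assumptions introduced here for illustration.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: effect variance relative to effect plus error."""
    return ss_effect / (ss_effect + ss_error)


def cohen_label(eta_p2):
    """Label an effect size using the .01 / .06 / .14 benchmarks."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"  # below the 'small' benchmark (label assumed here)


# The gender effect reported for DRS8 falls well below the 'small' benchmark.
drs8_label = cohen_label(0.001)
```

A value of .001 is an order of magnitude below the threshold for a small effect, which is why a statistically significant result in a sample of this size need not indicate meaningful bias.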
Figure 2 gives a more illustrative picture of the two non-equivalent items. The horizontal axis of this figure represents the different score levels computed for the control subscale, while the vertical axis represents the difference in mean value obtained by subtracting the item mean score for male participants from the item mean score for female participants, so that positive values indicate higher means for women. A biased item is expected to exhibit a pattern in which the plotted score level points depart from zero in a systematic pattern. For example, if the plot of points remains consistently above or below zero, the item is said to be uniformly biased towards one of the groups, depending on whether the plot is above or below. Turning to Figure 2, it is evident that although the plotted points for DRS8 remain somewhat consistently above zero, this pattern is not strong. The line connecting the mean difference at each score level is close to zero (representing equal item mean scores), and is nearly tangent to the zero-line at the higher score levels. A final point worth mentioning is that, to the extent that the item is biased, it favors female participants. That is, at every score level except level 8, female participants had mean item scores larger than or equal to those of male participants. Also evident in Figure 2, the plotted points for the other non-equivalent item (DRS2) exhibit a more inconsistent pattern, interchangeably positioned above and below zero, as expected from the non-significant ANOVA results.
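The quantity plotted in Figure 2 can be sketched as follows. The sketch follows the figure caption's convention that positive values indicate higher item means for women; the function name, the record layout, and the gender codes "F" and "M" are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_difference_by_level(records):
    """records: iterable of (gender, level, item_score), gender 'F' or 'M'.

    Returns {level: mean_F - mean_M} for levels where both genders occur.
    """
    by_cell = defaultdict(list)
    for gender, level, item_score in records:
        by_cell[(level, gender)].append(item_score)
    levels = sorted({level for level, _ in by_cell})
    return {
        lv: mean(by_cell[(lv, "F")]) - mean(by_cell[(lv, "M")])
        for lv in levels
        if (lv, "F") in by_cell and (lv, "M") in by_cell
    }
```

Differences that stay on one side of zero across levels would suggest uniform bias, whereas differences that change sign or size across levels would suggest non-uniform bias.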

Discussion
This article aimed to fill a gap in the literature by examining equivalence in a commonly used hardiness scale. It was argued that before we have established whether hardiness scales actually measure the same construct across gender, it is impossible to draw any decisive inferences about any differences found in the literature. The results from the analyses of equivalence are indicative of some gender differences. Adequately well-fitting baseline models were established for both genders, suggesting that the DRS evoked the same cognitive frame of reference for both female and male participants. However, two items belonging to the control subscale were found not to be equivalent across gender. This entails that the associations between these items and the underlying construct of control were not equivalent for female and male participants. Echoing this result, the test for structural equivalence showed that the relationship between the control subscale and the general hardiness factor was non-equivalent across gender. Examining the non-equivalent items in more detail, however, revealed that the amount of bias was small at best.
It is interesting to note that the differences between male and female participants all appeared in the control subscale. The control dimension of hardiness involves the perception of your ability to affect the course of events, and is assessed by statements involving hard work and individual effort to achieve goals and affect your surroundings (e.g., "How things go in my life depends on my own actions"). Following Riska (2002), these indicators could be said to reflect traditional masculine values. There is also empirical evidence to suggest that personal control holds different importance to the identity of men and women. For example, it is frequently found that those with traditionally masculine characteristics are generally more affected by success in instrumental activities, as opposed to those with traditionally female characteristics who are generally more affected by interpersonal success (Waelde, Silvern, & Hodges, 1994). Moreover, in a study of adolescents, Margalit and Eysenck (1990) found that boys' development of identity focused on individual achievement and task-oriented behavior, while girls' development focused on issues of relationships and social behavior.
Yet a male emphasis on personal control does not seem to explain the gender differences found in this study. As the ANOVAs showed, the modest signs of bias found in the non-equivalent control items favored the female participants. In other words, given the same score on the underlying control factor, female participants nevertheless had a higher mean item score than male participants. In practical terms, this means that women had a higher propensity to endorse these items compared to men. Also, while not included in the results section, but implied by the results from the ANOVAs, women had significantly higher mean scores on the control subscale (t(8223) = 4.03, p < .001). A likely explanation for these results pertains to the particular samples used. With one exception, the samples were drawn from military populations, or, in one case, candidates applying for officer candidate schools. Perhaps the women in these populations have internalized traditional male values in order to succeed in predominantly male-dominated domains, and to such a degree that these values eventually exceeded those of their male counterparts.
It should also be noted that the analyses conducted in this article represent a conservative test of equivalence. Specifically, the χ² is sensitive to sample size. Because of this, the Δχ² (and ΔS-Bχ²) value is also sensitive to sample size and tends to yield significant values even for trivial differences between groups (Cheung & Rensvold, 2002). For this reason, there is an increasing tendency to rely on two other criteria when evaluating equivalence (Byrne, 2006). Firstly, the more restricted or constrained model is deemed equivalent if it exhibits an adequate fit to the data. Based on this criterion, equivalence across gender could perhaps be said to have been established in the present study. The more restricted models that constrained parameters to equality across groups demonstrated adequate fit on par with the unconstrained baseline model. As Table 1 shows, the *CFI and SRMR values increased somewhat compared with the baseline model; the *RMSEA value, however, was smaller in both constrained models. The second alternative criterion involves inspecting change in fit statistics other than the χ². Cheung and Rensvold (2002) examined 20 different goodness-of-fit statistics and concluded that the ΔCFI was a robust statistic relatively unaffected by sample size. They also suggested that the ΔCFI value should not exceed .01. Again, based on this criterion, equivalence across gender is supported in the present study, as evident in the negligible ΔCFI value of .001 (see Table 1).
In conclusion, the results from the present study support some gender differences at the measurement level of psychological hardiness. As previously argued, earlier studies may have somewhat prematurely concluded that hardiness is more important for men, without establishing whether the same construct is actually measured in both genders. The present study suggests that there might in fact be some gender differences in hardiness, relating to how the associations between the indicators and the underlying construct of control are perceived; and as an extension of this, how the association between the control factor and the general hardiness dimension is perceived.
However, it is important to note that, while these differences might be statistically significant, their substantive or practical value is less certain. Judged by less stringent criteria, for example, the results from the present study largely point to equivalence across gender. In addition, the amount of gender bias according to the ANOVA was negligible, accounting for a minimal amount of variance in item scores. At any rate, it behooves the researcher to take the issue of measurement equivalence into account when exploring gender differences in hardiness.

Figure 1. Hypothesized equivalent multi-group model of the Dispositional Resilience Scale

Figure 2. Non-equivalent items related to the control subscale of the Dispositional Resilience Scale. Positive scores on the vertical axis indicate higher mean scores for women, and negative scores indicate higher mean scores for men.

Table 1. Goodness-of-fit statistics and summary of equivalence testing of the Dispositional Resilience Scale across gender
Note. a Corrected ΔS-Bχ² values are reported. b Error covariance between item pairs DRS6-DRS8 and DRS16-DRS18. c Error covariance between DRS16 and DRS18. * p < .05. *** p < .001.