Is Age-related Decline in Vocal Emotion Identification an Artefact of Labelling Cognitions?

Evidence has emerged that older adults find it more difficult to interpret prosodic emotions than younger adults. However, typical tasks involve labelling-related cognitions over and above emotion perception per se. Accordingly, we aimed to determine whether age-related difficulty in prosodic emotion labelling extended to discrimination, which is more closely related to emotion perception per se. For this purpose, 45 younger adults (mean age 20 years, 2 males/43 females) and 45 older adults (mean age 71 years, 16 males/29 females) were recruited. In one task, participants heard pairs of sentences and indicated whether they were spoken with the same emotional intonation or not. In a second task, they heard sentences with intonation conveying a question or statement, and indicated whether the non-emotional intonation patterns matched or not. Older adults’ performance consistently fell below that of younger adults. Older adults may have a generic prosodic decoding deficit, regardless of the end function of the prosody.


Introduction
Clinical studies highlight that across age groups, emotion recognition is an important mediator of social success and well-being (Brune, Abdel-Hamid, Lehmkamper, & Sonntag, 2007; Jacobs et al., 2008), whilst non-clinical studies document correlations with depression and relationship success in supposedly healthy young adults (Carton, Kessler, & Pape, 1999). Much literature on emotion-related cognition in older adults discusses their superior performance relative to younger adults, e.g., emotion regulation is maintained or improved (Mather & Carstensen, 2005). However, younger adults perform other emotion-related cognitions better than older adults, particularly emotion cue processing; older adults are less able to identify facial emotions, particularly negative ones (Ruffman, Henry, Livingstone, & Phillips, 2008). Most research documents older adults' difficulty with facial emotion identification, but older adults are also less able to identify nonverbal intonation cues, and have particular difficulty with the pragmatic functions of these intonational cues in conveying communicative intents such as speaker emotion (Dupuis & Pichora-Fuller, 2010; Orbelo, Grim, Talbott, & Ross, 2005; Paulmann, Pell, & Kotz, 2007).
Since the recognition that age-related difficulty recognising emotion cues also exists for prosody (Cohen & Brosgole, 1988), an emergent theme in the literature is whether this difficulty is a primary consequence of ageing, or a secondary by-product. Key possible mediators of indirect age-related effects on prosodic emotion recognition include age-related sensorineural hearing loss and cognitive ageing. Prosodic emotions are conveyed by manipulating intonation patterns, which rely on changing pitch, duration and amplitude. However, since information on these prosodic components is concentrated amongst lower speech frequencies (Scott, Green, & Stuart, 2001), the tendency for older listeners' difficulties to relate to the well-known effects of high-frequency hearing loss (Pichora-Fuller & Souza, 2003) may mitigate age-related differences in prosodic comprehension unless age-related hearing loss is controlled for.
Regarding indirect cognitive mediators, research on declining prosodic emotion recognition is akin to that which has shown that age-related differences in facial emotion identification cannot be explained by perceptual or general cognitive decline (Orgeta & Phillips, 2008). In the first notable study, Orbelo et al. inferred that declining prosodic emotion recognition is attributable neither to general cognitive ageing nor to hearing loss (Orbelo, et al., 2005). They administered Ross et al.'s "Aprosodia Battery" (Ross & Monnot, 2008), which requires participants to identify the emotions conveyed by prosody across asyllabic utterances ("aaaaahhhhh"), monosyllables ("ba ba ba ba ba"), and words ("I am going to the other movies"). Alongside the Aprosodia Battery, older and younger adults completed pure-tone threshold and word-recognition assessment of hearing, and neuropsychological assessment with the Stroop colour word task (Stroop, 1935) and the Repeatable Battery for the Assessment of Neuropsychological Status (Randolph, Tierney, Mohr, & Chase, 1998). Neither hearing measure was predictive of performance on any prosodic emotion recognition task. Only three of the thirteen neuropsychological sub-tests predicted any variance on the prosodic tasks (the Stroop, list learning and story memory), and only marginally so. A subsequent study suggested, though, that some aspects of cognitive ageing did worsen the age-related decline (Mitchell, 2007). Alongside the prosodic emotion identification task, background tests of pure-tone average hearing, IQ approximation (Nelson, 1982), depression (Beck & Steer, 1987), and verbal working memory were administered (Daneman & Carpenter, 1980). In the prosodic task itself, the possible influence of the well-known age-related decline in frontal lobe function (Tisserand & Jolles, 2003) was probed by systematically manipulating working memory load in an "N-back" style task (Nyberg, Dahlin, Stigsdotter Neely, & Backman, 2009). Of these probes, both frontal lobe load and verbal IQ significantly increased the age-related impairment in prosodic emotion recognition.
One factor that seriously confounds the primary vs. secondary nature of this age-related decline is the type of task used, because of the different cognitive operations that the various prosodic emotion recognition tasks require. In the prototypical tasks that dominate, participants label emotions by selecting emotion words. However, this involves not just emotion recognition, but also cognitions relating to label identification and retrieval. At present, it is therefore difficult to determine whether age impairs prosodic emotion recognition per se, or the additional cognitions, or both. When the influence of these additional cognitions is tackled, we will be better able to assess the importance of age-related cognitive decline as an indirect mediator of declining prosodic emotion recognition, and thus be better able to assess the primacy of age-related impairment in prosodic emotion recognition.
One way to reduce these additional cognitive demands is to strip away superfluous cognitive operations surrounding the basic task. In visual studies the labelling operation is removed, and participants are asked to judge whether two photographs display the same emotion or not. MacPherson et al. used such a task to suggest that age-related effects were not attributable to declining verbal decision making, since older adults' ability both to label and to discriminate diminished (MacPherson, Phillips, & Della Sala, 2006). In the first task, participants viewed photographs of Caucasian and Japanese faces and were required to label the emotion displayed by choosing from a fixed list of emotional terms. Older adults were particularly impaired at recognising sad facial expressions. In the second task, a visual emotion-matching paradigm was used in which pairs of faces were displayed side by side and participants chose the label "same" or "different" to indicate whether the same emotion was displayed across the pair. Although the decision-making load was reduced on the second task, clear age-related impairment in facial emotion recognition was still observed.
Here, we aimed to determine whether age-related differences in prosodic emotion identification were still observed when labelling was not required. This would make an important contribution to the literature, by more specifically isolating whether these differences truly related to emotion processing or not. If no longer observed, the difficulty might not relate to emotion decoding per se. If observed, combined labelling and discrimination evidence might imply a basic auditory perception dysfunction, or one that specifically related to obtaining emotion cues from the acoustic information comprising prosody. If it also extended to discrimination of non-emotional prosody, declining auditory perception would be an even more likely explanation. The rich body of work by Wingfield et al. suggests that older adults do demonstrate difficulty using prosodic information for non-emotional (linguistic) purposes, e.g., parsing and word recall (relative to younger adults) (Wingfield, Lindfield, & Goodglass, 2000; Wingfield, Wayland & Stine, 1992). However, with respect to the pragmatic functions of non-emotional prosody in conveying communicative intent, the literature is less clear. Emotional and non-emotional prosody may be conveyed by quantitatively different acoustic characteristics (Pell, 2001), which could perhaps create the potential for differential age effects at the level of integrated prosodic ensembles. However, at this level, positive findings of age-related decline (Mitchell, 2007) are balanced against null findings (Raithel & Hielscher-Fastabend, 2004), and the potentially confounding influences of labelling cognitions remain untested.
In summary, the aim of this study was to determine whether age-related decline in vocal emotion identification was an artefact of labelling cognitions and thus a secondary by-product of cognitive ageing, and secondly, to ascertain the generalisability of this supposition to the use of prosody to convey non-emotional types of communicative intent.

Methods
This research was performed according to the Declaration of Helsinki (Rits, 1964), and the British Psychological Society's Ethical Principles. It was independently reviewed in accordance with procedures specified by the University of Reading Ethics and Research Committee.

Participants
Forty-five healthy older adults (target 60–85 years, mean age 71) were recruited from a volunteer database maintained by the School of Psychology. Forty-five healthy younger adults (target 18–35 years, mean age 20) were recruited from psychology undergraduates. Exclusion criteria included first language other than English, hearing problems, psychological or neurological disorders, head injury or long periods of unconsciousness, and alcohol or drug abuse (self-report). Further background data are summarised in Table 1; the corresponding assessments are described below. Worthy of note, the hearing loss demonstrated by both groups was comfortably within the range considered to be clinically normal (<40 dB(HL) loss) (Chew & Yeak, 2010). Unpaired t-tests compared younger and older adults to assess whether the samples were matched. For number of years' education and level of depression, the samples matched, but there were small differences in hearing sensitivity loss and predicted verbal IQ, older adults showing higher IQ and greater hearing loss.

Background Assessments
A Kamplex KS8 audiometer assessed hearing loss in dB(HL) against British Standards BS EN 60645 and BS EN ISO 389, using tones at 500 Hz, 1 kHz and 2 kHz (the sound frequencies important for conveying speech; French & Steinberg, 1947). Tones were presented serially, and their amplitude increased from the normed zero until participants could hear them. "Pure tone averages" (PTA) were derived by calculating hearing loss at each frequency, averaged across the left and right ears. Given that age-related hearing loss begins at higher frequencies and, with increasing age, expands in both magnitude and extent, we separately explored the effects of hearing loss at these three frequencies (Bielefeld, Tanaka, Chen, & Henderson, 2010). To approximate verbal IQ, predicted verbal IQ was derived from the National Adult Reading Test (NART) (Nelson, 1982). This approximation is based on the correspondence between reading ability and general intellectual level. The NART comprises 50 irregularly spelt words, whose pronunciation cannot be predicted from letter-sound correspondences, and its reliability coefficient has been shown to be 0.93 (Nelson, 1982). Finally, the Beck Depression Inventory (BDI; Beck & Steer, 1987) was administered to attenuate confounds relating to the greater incidence of depression in older adults (Fiske, Wetherell, & Gatz, 2009). Its reported reliability coefficient for nonclinical populations is .81 (Beck & Steer, 1987).
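The PTA derivation described above can be sketched in a few lines. This is a hypothetical illustration only; the function name and the example thresholds are ours, not taken from the study:

```python
# Hypothetical sketch of the pure-tone-average (PTA) derivation described
# above: hearing loss (dB HL) is measured at 500 Hz, 1 kHz and 2 kHz, and
# the left- and right-ear thresholds are averaged at each frequency.
def pure_tone_average(left, right):
    """Average left/right-ear hearing-loss thresholds (dB HL) per frequency."""
    return {freq: (left[freq] + right[freq]) / 2 for freq in left}

# Illustrative thresholds only (well within the <40 dB HL "clinically
# normal" range mentioned above).
left_ear = {500: 15, 1000: 20, 2000: 25}
right_ear = {500: 10, 1000: 20, 2000: 30}
pta = pure_tone_average(left_ear, right_ear)
# pta[2000] is the mean loss at 2 kHz across the two ears
```

Keeping the three frequencies separate, rather than collapsing them into a single average, is what allowed the per-frequency covariate analyses reported in the Results.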

Experimental Tasks
One task assessed discrimination of whether prosody conveyed different emotions across stimulus pairs; the other assessed discrimination of stimuli whose non-emotional (linguistic) function differed according to intonation pattern. Stimuli were drawn from 62 sentences such as "the pen was filled with blue ink" or "the girl did her homework", their neutral content attenuating interfering lexico-semantic emotion cues. One male and one female drama student recorded them in four tones: happy, sad, as a declarative statement, and as a question. Using GoldWave v5.23 (http://www.goldwave.com/), 16-bit mono digital recordings were sampled at 11,025 Hz. Tone validity was checked by surveying 25 healthy young adults who rated whether the intonation sounded "happy", "sad", or "neither" for emotion stimuli, and like a "question", "statement", or "neither" for non-emotional stimuli. This age stratum was chosen to validate the recordings since younger adults are commonly considered to represent the prototypical standard to which other groups should be compared (Adolphs, 2002). Recordings were retained if >95% of participants agreed with the intended tone. No emotion stimuli had to be eliminated, but twelve non-emotional stimuli did. Further processing with GoldWave reduced extraneous noise and improved sound quality, using noise gate and noise reduction functions. Lead-in and lead-out silence were manually trimmed. The processing and acoustic characteristics of stimuli are described in more detail in Mitchell et al. (Mitchell, Kingston, & Barbosa Boucas, 2011).
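The >95% agreement criterion above can be expressed as a simple filter. This is our illustration, not the authors' code, and the data layout is assumed:

```python
# Minimal sketch of the stimulus-validation rule described above: a
# recording is retained only if more than 95% of the 25 raters chose the
# intended tone label. Function name and example data are hypothetical.
def retained(ratings, intended, threshold=0.95):
    """True if the proportion of raters agreeing with `intended` exceeds `threshold`."""
    agreement = sum(1 for r in ratings if r == intended) / len(ratings)
    return agreement > threshold

ratings = ["happy"] * 24 + ["neither"]  # 24/25 = 96% agreement -> retained
keep = retained(ratings, "happy")
```

With 25 raters, the criterion effectively requires 25/25 or 24/25 agreement, which is why only stimuli with near-unanimous validation survived.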
The structure of the tasks was the same, and both were administered on a standard PC with E-Prime (Psychology Software Tools, Pittsburgh), using Bose QuietComfort 3 headphones. In each trial, sentence 1 began at t = 0 s and sentence 2 at t = 5.75 s, with the mean duration of the sentence stimuli being 1.97 s (±0.54 s). Participants waited until the end of the second sentence before responding. In both tasks, participants pressed 1 on the keyboard if the intonation was the same, and 9 if it was different. Response keys alternated between participants, although each participant used the same keys for both tasks. Key prompts remained onscreen throughout. Measurement of reaction times (RT) began at the end of sentence 2. Participants were allowed as long as necessary to respond, although they were advised to respond as quickly and accurately as possible. The space bar was pressed to begin the next trial. Both tasks comprised 36 trial pairs in which each sentence was independently pseudo-randomly selected, although intonation patterns matched in half the trials. For non-matched emotional pairs, the first sentence was happy half the time and sad the other half. For non-matched non-emotional pairs, the first sentence was a question half the time and a statement the other half. Speaker gender pairings (male, male; female, female; male, female; female, male) were evenly distributed.
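The counterbalancing constraints above (36 pairs, half matched, gender pairings evenly distributed) can be sketched as a trial-list generator. This is an assumed reconstruction of the design, not the authors' E-Prime script:

```python
import random

# Illustrative sketch of the trial structure described above: 36 pairs per
# task, intonation matched on half the trials, and the four speaker-gender
# pairings (MM, FF, MF, FM) evenly distributed across trials.
def build_trials(n_trials=36, seed=0):
    rng = random.Random(seed)  # seeded for a reproducible pseudo-random order
    gender_pairs = [("M", "M"), ("F", "F"), ("M", "F"), ("F", "M")] * (n_trials // 4)
    matched = [True] * (n_trials // 2) + [False] * (n_trials // 2)
    rng.shuffle(gender_pairs)
    rng.shuffle(matched)
    return list(zip(gender_pairs, matched))

trials = build_trials()
# 36 trials in total: 18 matched, and each gender pairing occurring 9 times
```

Shuffling the two constraint lists independently keeps the marginal counts exact while still randomising trial order, which is the usual way such pseudo-random designs are implemented.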

Results
Descriptive performance statistics appear in Table 2, summarised by mean and standard error. Two three-way ANCOVAs were performed (one for performance accuracy, one for RT), where the between-subjects factor was group (young vs. old), and the repeated factors were task (emotion discrimination vs. intonation discrimination) and agreement (intonation patterns matched vs. differed). Because of between-group differences in hearing sensitivity (at 1000 Hz and 2000 Hz) and IQ approximation, these variables were initially incorporated into the statistical models as potential explanatory covariates. However, in the accuracy ANCOVA, neither hearing sensitivity variable covaried with performance (1000 Hz: F(1, 85) = 0.69, p = .409; 2000 Hz: F(1, 85) = 1.90, p = .172), and for RT, no variables covaried (NART: F(1, 85) = 0.001, p = .981; 1000 Hz: F(1, 85) = 0.30, p = .583; 2000 Hz: F(1, 85) = 0.02, p = .899). Hence all covariates were removed from the RT model, and both hearing sensitivity variables were removed from the accuracy model, leaving only IQ approximation. There were insufficient numbers of males to assess interactive effects of gender, so the analyses were repeated with males excluded to ascertain whether gender might be influencing the results. Excluding males did not change the pattern of results.
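The covariate-screening logic above rests on whether a candidate covariate shows a linear association with the outcome. As a rough, hypothetical illustration of that check (a plain Pearson correlation, not the F-test actually reported, and with invented data values):

```python
# Illustrative sketch of covariate screening: before retaining a variable
# as an ANCOVA covariate, its linear association with the outcome can be
# inspected. Pure-Python Pearson correlation; all data values are invented.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

hearing_loss_1khz = [5, 10, 12, 20, 25]   # hypothetical dB HL values
accuracy_pct      = [92, 90, 93, 89, 91]  # hypothetical % correct
r = pearson_r(hearing_loss_1khz, accuracy_pct)
# a weak |r| is consistent with dropping the covariate from the model
```

In the study itself the decision was made from the covariate F-tests within the ANCOVA, but the underlying question — does the covariate share variance with the outcome? — is the same.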

Discussion
Older adults were consistently less accurate at discriminating prosody patterns than younger adults, regardless of whether the task focussed on emotional or non-emotional prosody, and regardless of additional load from within-pair incongruence. Concerning our main aim, age-related prosodic emotion interpretation differences (Dupuis & Pichora-Fuller, 2010; Mitchell, 2007; Mitchell, et al., 2011) do indeed extend to discrimination judgements. We can reject the hypothesis that age-related differences in prosodic emotion recognition are an artefact of additional unrelated cognitions, because in a discrimination task devoid of labelling cognitions, age-related decline in emotion perception was still observed. The differences may be more fundamental, and relate to common generic operations. That older adults' performance was universally above chance indicates, though, that they retain some ability to interpret prosody. Although Orbelo et al.'s study also examined prosodic emotion discrimination (Orbelo, et al., 2005), the current study improved on its methodology: it comprised more trials (36 vs. 24), used >50 different carrier sentences, and used ecologically representative speech rather than low-pass filtered speech. The current data are therefore more generalisable and more likely to model real-life performance. Age-related differences in comprehension of low-pass filtered speech could reflect older adults' increased reaction to its "unnaturalness", but we have ruled this out. The combined data from our emotional and non-emotional tasks allow us to make a second advance, and suggest that age-related differences in prosodic emotion discrimination may represent a generic difficulty deriving pragmatic functions about communicative intent from prosody. No interactions between participant group and task were observed, but this claim can also be derived from the observed main effect of group (i.e., collapsed across task).
Further research could perhaps include samples of middle-aged adults, as in Brosgole et al.'s early labelling studies (Allen & Brosgole, 1993; Brosgole & Weisman, 1995), to map the gradual decline of these functions between younger and older adulthood, whilst examining discrimination judgements and responses to non-emotional prosody. One interpretation of our data is that older adults may have a generic difficulty deriving overall communicative intent from the prosodic ensemble. However, examination of specific acoustic cues and their subsequent integration, or their contribution to generic difficulties interpreting prosodic ensembles, is also warranted. Whilst there is a report that duration cues (but not intensity or pitch) predict middle-aged adults' (aged 38-50 years) ability to correctly identify prosodic emotions (Paulmann, et al., 2007), comparable data for older adults are not yet available. However, evidence from elsewhere suggests that there may well be some bottom-up perceptual contributions to age-related difficulty interpreting prosodic emotions, since older adults are more likely (than younger adults) to misjudge key components of prosody such as duration (Fitzgibbons & Gordon-Salant, 1995; Ostroff, McDonald, Schneider, & Alain, 2003), pitch (Petrini & Tagliapietra, 2008), timing (Fitzgibbons & Gordon-Salant, 2004), and amplitude (Boettcher, Poth, Mills, & Dubno, 2001). A bottom-up perceptual explanation therefore remains viable.
Beyond reduced accuracy, older adults also took longer to make discrimination judgements. Regarding the cognitive bases underlying age-related difficulty with prosodic emotion perception, this finding may represent generic slowing of the necessary cognitions, but it is difficult to draw definitive conclusions without further study, e.g., to filter out confounding effects of reduced psychomotor speed with age (Salthouse, 2000). In theory, these RT data could be explained by global motor or cognitive decline. However, although age-related changes in accuracy were comparable across the emotional and non-emotional tasks, the group by task interaction showed that age-related slowing was more exaggerated for emotional prosody than for non-emotional prosody. This additional finding means that although an overall age-related impairment may exist relative to younger adults, the impairment is even greater in certain circumstances. The interaction implies that when processing prosody with emotional connotations (vs. linguistic (non-emotional) functions), older adults require additional time to maintain their accuracy across the tasks. One possible explanation is that the additional time older adults need to interpret the prosodic emotion cues reflects the time needed to recruit compensatory brain mechanisms, and it is the recruitment of these mechanisms that maintains older adults' performance on the emotional task at the level of accuracy shown on the non-emotional task. The question would then become: why might older adults find a greater need for compensatory brain mechanisms on the emotional task? Perhaps precisely because of its emotional connotations, given the wealth of evidence supporting age-related reduction in emotion recognition (Ruffman, et al., 2008).
Regarding what is driving the effects we have observed, we recommend that both neural and social causes are worthy of investigation. In conjunction with prior studies of age-related differences in prosodic emotion recognition, our data hint at the underlying neuroanatomical bases of this difficulty. Such predictions can be made from what is known about which parts of the brain mediate the different stages of prosody perception. To explain the normal functional neuroanatomy of prosodic emotion perception in young adults, a three-stage model has been proposed (Schirmer & Kotz, 2006). It begins with sensory processing, in which emotionally relevant acoustic cues are analysed in bilateral temporal lobe auditory regions. It continues with integration of emotionally significant acoustic cues into an ensemble (from the bilateral superior temporal gyri to the right anterior superior temporal sulcus). Finally comes cognitive evaluation of emotional significance (right inferior frontal gyrus and orbitofrontal cortex), in which a verbal label is attached and evaluative judgements are made. Stages 1 and 2 share cognitive operations and neuroanatomy with non-emotional prosody (Arciuli & Slowiaczek, 2007; Pihan, 2006; Wildgruber, Ackermann, Kreifelts, & Ethofer, 2006). The current study also found age-related difficulty discriminating non-emotional prosody. It therefore provides evidence that older adults' difficulty might lie in stage one or two. Further study would be necessary to determine which of these two stages is most difficult for older adults; however, the most likely basis is perhaps the right superior lateral temporal lobe. In addition to biological mediators, future research should also evaluate what happens when older adults evaluate prosodic cues expressed by their peers (Ruffman, et al., 2008). Reduced time spent with young adults could explain older adults' declining prosodic emotion interpretation on current tests, which typically use young adult speakers.
In conclusion, our results showed that the age-related decline in prosodic emotion perception observed in labelling tasks is also observed in discrimination tasks, which suggests that age-related decline in vocal emotion identification is not an artefact of labelling cognitions. We also demonstrated that this extends to different types of prosody. The results of our two tasks imply generic age-related differences in interpretation of prosodic ensembles, but the underlying causes remain unclear.

Table 1 .
Demographic data and background assessments for the older and younger adult participant groups

Table 2 .
Performance of older and younger adults on the emotional and linguistic prosody interpretation tasks. Summarised by mean and standard error. Accuracy = % of correct "same" or "different" judgements made for the pairs of stimuli; RT = mean response speed to correctly discriminated pairs.