The Effects of Intrinsic Acoustic Cues on Categorical Perception in Children with Cochlear Implants

Many previous studies have examined the influence of external cues on speech perception, yet little is known about the role of intrinsic cues in the categorical perception of Mandarin vowels and tones by children with cochlear implants (CIs). This study investigated the effects of intrinsic acoustic cues on categorical perception in children with CIs, compared with normal-hearing (NH) children. A categorical perception paradigm was applied to evaluate their identification and discrimination abilities in perceiving /i/-/u/ with static intrinsic formants and Tone 1 (T1)-Tone 2 (T2) with dynamic intrinsic fundamental frequency (F0) contours. Results for the NH group showed that the /i/-/u/ vowel continuum was perceived less categorically than the T1-T2 continuum, with a significantly wider boundary width and poorer alignment between the discrimination peak and the boundary position. A different pattern emerged for the CI group, which exhibited weak categoricalness in both /i/-/u/ and T1-T2. These results suggest that the effects of intrinsic acoustic cues on categorical perception hold for normal-hearing children but not for hearing-impaired children with cochlear implants. In conclusion, acoustically dynamic cues can facilitate categorical perception of speech in NH children, whereas this effect is inhibited by difficulties in processing spectral F0 information, as in CI users.


Introduction
Categorical perception (CP) of speech sounds refers to the phenomenon in which acoustic stimuli that vary along a physical continuum in equal intervals are perceived as discrete categories, and differences between categories are more discriminable than differences within categories (Harnad, 2003; Reetz & Jongman, 2009). To date, categorical perception research has extended from stop consonants to vowels and tones, and from speech to non-speech. The research began at Haskins Laboratories. Liberman, Harris, Hoffman and Griffith (1957) found that listeners perceived sounds varying along a continuum abruptly rather than gradually. Accordingly, Liberman proposed that the perception of stop consonants was "categorical". The essential indicators of CP were summarized as follows: 1) identification shifts sharply from one category to another, as indexed by the boundary position and boundary width; 2) the discrimination curve shows a peak that aligns with the categorical boundary; 3) between-category discrimination accuracy is higher than within-category accuracy (Liberman et al., 1957). Several conclusions have been drawn: the perception of consonants is widely held to be categorical (Liberman et al., 1957; Miller & Eimas, 1977); contour tones are perceived categorically (e.g., Abramson, 1979; Francis et al., 2003; Xu et al., 2006; Hallé et al., 2004; Peng et al., 2010), while the perception of level tones is continuous (Abramson, 1979; Francis, Ciocca, & Ng, 2003); vowels are chiefly considered less categorical or even continuous (Fry et al., 1962). [...] became categorical by the age of 4 years. In contrast, children with severe hearing impairment have considerable difficulty in learning the tone system.
In contrast to conventional hearing aids, cochlear implants bypass the damaged hair cells and directly stimulate the auditory nerve by converting mechanical sound energy into an electrical stimulus (Lee, van Hasselt, & Tong, 2010). Gu, Yin and Mahshie (2016) investigated the categorical perception of T1-T2 and T1-T4 by children with CIs and found that both groups showed categorical effects. Nevertheless, some scholars have argued that the gain in tone perception is not satisfactory (Wong & Wong, 2004; Aisha, 2000).
Although studies on categorical perception are abundant, most have focused on extrinsic factors such as language experience, speech complexity, signal duration, and aging (e.g., Chen, Peng, Yan, & Wang, 2017; Francis et al., 2003; Hallé et al., 2004; Peng et al., 2010; Pisoni, 1975; Repp, Healy, & Crowder, 1979; Wang et al., 2017; Xu, Gandour, & Francis, 2006). Comparing the categorical perception of vowels with that of tones is thus a useful way to test the effects of intrinsic acoustic cues on categorical perception. Formants and fundamental frequency (F0) both carry spectral (frequency) information. F0 is the primary cue to tones, whereas F1 and F2 are important for vowels. The two cues contrast in their states: static versus dynamic.
Formants are inherently steady-state in monophthongs. In diphthongs, by contrast, the lower formants change direction continuously or even sharply and thus constitute a dynamic cue (Chen, Zhang, Wang, & Peng, 2019). Nearey (1989) hypothesized that the inherent static cue of vowels (i.e., the formants of monophthongs) was crucial for identification and discrimination, and that dynamic changes within the inherent cue, such as the formants of diphthongs, would affect the manner of vowel perception. Differences between the categorical perception patterns of isolated monophthongs and of vowels carried in a CVC construction support this view. When a vowel is carried in a CVC syllable, the originally steady-state formants develop transitions because of the surrounding consonants. Accordingly, Studdert-Kennedy (1976) found stronger categoricalness in the CVC syllable than in the V syllable. Chen et al. (2019) addressed the intrinsic factors in the categorical perception of vowels and suggested that the lack of categoricalness of monophthongs was due to the steady state of their formants. In the same way, F0 contours are treated as a dynamic-state cue (Chen et al., 2019; Wang, 1967). It is therefore understandable that the perception of tones with different F0 contours is categorical (e.g., Wang, 1976; Xu et al., 2006; Peng et al., 2010), while the perception of level tones is continuous (e.g., Abramson, 1979; Francis et al., 2003). The former change dynamically in their intrinsic F0 contour, while the latter remain steady.
Regarding children with cochlear implants, studies have stated that current implant systems do not provide fine spectral or temporal information (Mckay, 2005). For hearing-impaired children, even with the help of CIs, the gain in tone perception remains unsatisfactory (Cheung, Wong, Lam, Lee, & van Hasselt, 2002; Ciocca, Aisha, Francis, & Wong, 2002; Lee, van Hasselt, Chiu, & Cheung, 2002; Aisha, 2000; Wong & Wong, 2004). Lee et al. (2010) explained that the pitch information essential for tonal languages seems not to be explicitly represented in the electrical stimulation delivered by current cochlear implant systems. For CI users, spectral information is degraded (Chatterjee & Peng, 2008), and processing spectral information is difficult (Friesen et al., 2001; Lee et al., 2010; Petersen et al., 2015). Luo, Fu, Wu and Hsu (2009) provided additional evidence: CI users performed better on vowel recognition than on tone recognition, but were still able to score above 60% on average on tone recognition in quiet. In brief, the situation in the CI group is more complex.
In the existing literature, most studies of the CP of tones and vowels have examined extrinsic factors and have given priority to adults. Studies of how intrinsic cues (static and dynamic) affect categorical perception are scarce, and the effects in children, especially hearing-impaired children, are even less explored. Using the traditional CP paradigm, the present study aims to address how the CI group, compared with the NH group, perceives /i/-/u/ and T1-T2, and thus to examine how intrinsic acoustic cues affect categorical perception.

Mandarin Vowels
Acoustically, vowels are specified by intensity, formants and duration. A formant is a concentration of acoustic energy that reflects the way air from the lungs vibrates in the vocal tract as the tract changes shape. It appears as a peak in the frequency spectrum and is closely related to vowel quality. The first three formants, i.e., F1, F2 and F3, are the primary basis for vowels. Here, F1 and F2 are used as two cues to select the most natural vowel token. [...] system, namely, /i/, /y/, /u/, /ə/ and /a/. Alternatively, in Mandarin Chinese, Lin and Wang (2013) [...] [a], and two apical vowels: front /ɿ/ and back /ʅ/. Table 1 gives averaged values of the first three formants from the recordings of 16 female speakers of the Beijing dialect. /i/ and /u/ are two of the most frequent vowel phonemes in the world (Zee & Lee, 2007). Mandarin /i/ is a front unrounded vowel with a high F2 and a low F1. By contrast, /u/ is a back rounded vowel with low F1 and low F2. The two vowels differ primarily in F2.

Mandarin Lexical Tones
Tone is a term used in phonology and phonetics to refer to the distinctive pitch level of a syllable. Differences in tone are caused by pitch variations, which are produced by changes in the tension of the vocal folds that cause variations in fundamental frequency (F0) during voiced intervals of speech. Almost 60%-70% of the world's languages are tone languages (Yip, 2002), and over half of the world's people speak a tone language (Fromkin, 1978). The Mandarin tone system has four lexical tones, carried by monosyllables. In terms of their contours, the four tones are, respectively, a level tone, a rising tone, a dipping tone and a falling tone. According to Chao (1948), the relative pitch values of these four tones can be represented on a 1-5 scale that indicates the relative starting and ending pitch of each tone, where 1 refers to the lowest pitch and 5 to the highest. The four tones correspond to 55, 35, 214 and 51, respectively. To illustrate the relationship between tone pitch and meaning, an example of /ba/ is presented in the following table.
In the present study, perceptual performance on two continua, i.e., /i/-/u/ with a static cue and T1-T2 with a dynamic cue, is compared across two groups of children: normal-hearing listeners and listeners with cochlear implants. In view of previous findings, the following hypotheses are made: 1) the NH group will perceive T1-T2 in a more categorical manner than vowels because of the different states of their intrinsic acoustic cues (i.e., static versus dynamic); 2) the CI group is expected to diverge from this pattern, perceiving both the vowel and the tone continua in a less categorical fashion because of their hearing impairment.

Methodology
Before the formal tests, a pilot study was conducted to detect problems in the experimental design and to make timely revisions. Prior to the formal experiment, an experimenter also prescreened each child to exclude any individual who did not meet the basic cognitive development criteria. To minimize unexpected errors, the experimenter and the teacher gave an extra lesson on /i/, /u/, T1 and T2, together with an illustration session to further the children's understanding of the procedure.

Participants
Two groups of children (i.e., NH and CI) were recruited for this study. The NH group consisted of 10 native Mandarin-speaking children (5M, 5F) aged from 4;0 to 7;10, recruited from a kindergarten or a primary school in Changsha. They had no reported history of speech, hearing or cognitive disorders, nor any music learning experience, according to school and parental reports. The CI group consisted of 10 native Mandarin-speaking children (8M, 2F) aged between 4;9 and 8;8, recruited from a local hearing rehabilitation center in Changsha. Their duration of CI use ranged from 1;8 to 3;7, with a mean of 2;2. They were reported to be prelingually and congenitally impaired with profound hearing loss (> 90 dB HL) in either the right or the left ear, and had no reported history of other cognitive or physical disabilities. All of them were unimodal technology users, with only a cochlear implant in the impaired ear. Moreover, all were implanted before the age of 7 years, and the duration of their CI use was at least 8 months. For more information, see Table 3.
A background questionnaire was used to gather information about the children's use of dialects and musical training experience. The questionnaire was written in Chinese and was filled out by parents or teachers before the children started the test. A parents' notice was designed to ensure that parents knew what the experiment was about and how it would proceed, as well as any other details they were entitled to know. If they had no objection, they signed the notice. Informed consent was thus obtained. Every child was paid for participation.

Materials
Stimuli along the vowel and tone continua in the present study were synthesized from recorded samples produced by a female native Mandarin speaker from northern mainland China.
For the /i/-/u/ continuum, /i/ was first produced with a high level tone, and a set of 9 vowel stimuli was arranged in equal acoustic intervals of F1 (about 12 Hz) and F2 (about 296 Hz) from /i/ to /u/. F1 ranged from 317 Hz to 414 Hz, and F2 from 2901 Hz to 536 Hz. Because the formants were level, the ending frequency was kept the same as the starting frequency. The /i/-/u/ continuum was constructed from the natural samples of /i/. The stimulus duration was set to 350 ms, with amplitude fixed at 70 dB. The third, fourth and fifth formants were fixed at 3957 Hz, 4766 Hz, and 4914 Hz respectively, values derived from the recorded samples. The major steps in synthesizing the stimuli with a formant synthesizer in Praat were as follows: 1) Sound normalization. To minimize the potentially confounding effects of duration, the duration of the stimuli was normalized to 350 ms; 2) Synthesis of the speech continuum with a Praat script. Using /i/ as the basis for manipulation, a 9-step continuum was created by setting the designated values, with /i/ as stimulus Number 1 (the onset of the continuum) and /u/ as stimulus Number 9 (the end of the continuum). Figure 1 shows the schematic diagram of these stimuli.
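The equal-interval arrangement described above can be checked with a short calculation. The following is a minimal sketch that only reproduces the target formant values from the endpoint frequencies reported in the text; it does not perform any synthesis, and the function and variable names are illustrative assumptions rather than part of the original Praat script.

```python
def linear_continuum(start_hz, end_hz, n_steps):
    """Return n_steps equally spaced values from start_hz to end_hz (inclusive)."""
    step = (end_hz - start_hz) / (n_steps - 1)
    return [start_hz + i * step for i in range(n_steps)]


# Endpoint values reported in the text: stimulus 1 is /i/, stimulus 9 is /u/.
F1_VALUES = linear_continuum(317.0, 414.0, 9)
F2_VALUES = linear_continuum(2901.0, 536.0, 9)

# F3-F5 are held constant across the continuum, as stated in the text.
FIXED_FORMANTS_HZ = (3957.0, 4766.0, 4914.0)


def stimulus_table():
    """List the target F1/F2 (Hz) of each stimulus along the continuum."""
    return [
        {"stimulus": i + 1, "F1": round(f1, 1), "F2": round(f2, 1)}
        for i, (f1, f2) in enumerate(zip(F1_VALUES, F2_VALUES))
    ]


if __name__ == "__main__":
    for row in stimulus_table():
        print(row)
```

With these endpoints the exact step sizes are 12.125 Hz for F1 and 295.625 Hz for F2, which agree with the rounded intervals of 12 Hz and 296 Hz given in the text.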
Stimuli along the tone continuum were [...] /bā/ (T1, meaning "eight") and /bá/ (T2, meaning "t[...]") [...] only the best one was selected as the source sound. A set of 9 tone stimuli [...] starting frequency of F0 was from 2[...] continuum was constructed based on [...] was re-synthesized by applying the pit[ch] [...] 1995) implemented in Praat (Boersma [...]) [...] and intensity (70 dB) was the same as [...] pitch contour of Tone 1 to the level [...] Setting the number of pitch points to [...] position, and the third one at the ending [...] formula 210 Hz + 9 Hz * (Stimulus [...] manipulations were done in Praat (Boersma [...]).

Procedure
All the participants were [...] instructed to complete two [...] blocks: one for the /i/-/u/ [...] correctly and quickly as [...] E-prime (Schneider, Esc[...]) [...] response in [...] practice trials [...] tasks, thus [...] were not included [...] the participants [...] the experimenter clicked [...] and "2" for [...] which were rand[omly] [...] yielding a total of [...] presented to [...] minutes to finish.
[...] /i/-/u/ and T1-T2 were [...] by two steps [...] pairs with a [...] two stimuli in [...] separated into [...], and each unit [...] continuum were presented to [...] of two different stimuli (diff[erent pairs: 1-3, …, 4-2, …, 8-6, 9-7), and 9 [...] (…, 9-9) [...] 2016). Each pair was repeated [...] The order of the two blocks [...] targeted participants, reaction [...] testing blocks. In order to [...] they finished a block.
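The pair structure of the discrimination task can be enumerated explicitly. The sketch below assumes, as the surviving text indicates, that "different" pairs consisted of stimuli two steps apart presented in both orders (e.g., 1-3 and 3-1, up to 9-7) and that there were 9 "same" pairs (1-1 to 9-9). The function name is hypothetical, and the number of repetitions per pair is not recoverable from the text, so it is left out.

```python
def discrimination_pairs(n_stimuli=9, step=2):
    """Enumerate AX discrimination pairs for an n-step continuum.

    Returns (different, same), where `different` holds pairs separated by
    `step` stimuli in both presentation orders and `same` holds identical pairs.
    """
    last_start = n_stimuli - step
    forward = [(i, i + step) for i in range(1, last_start + 1)]   # 1-3 ... 7-9
    backward = [(i + step, i) for i in range(1, last_start + 1)]  # 3-1 ... 9-7
    same = [(i, i) for i in range(1, n_stimuli + 1)]              # 1-1 ... 9-9
    return forward + backward, same


if __name__ == "__main__":
    different, same = discrimination_pairs()
    print("different pairs:", different)
    print("same pairs:", same)
```

Under these assumptions each continuum yields 14 different pairs and 9 same pairs before repetition.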

Position
The mean [...] Figure 6. [...]

Discrimination
The overall discrimination [...] are depicted in Figure 7, re[spectively] [...] category for the two groups are demonstrated [...] Figure 8. [...] (/i/-/u/) and 64% (T1-T2) for the NH group [...] T1-T2 continua for the NH and CI groups [...] the differences in their overall discrimination [...] as the within-subject factor and group a[s] [...] correction method was used when appropriate. [...] design ANOVA revealed significant main effect [...] significant main effect of group (F(1,18) [...] group (F(1,18) = 0.047, p [...] accurately in /i/-/u/ than in T[1-T2] [...] boundaries than within the same [...] pairs were further divided into [...] the difference between Pb[c] [...] vowels and tones for the NH group [...] respectively. The mean within-category ac[curacies] [...] for the CI group are 64% and 61%.
[...] detect their difference in the between- an[d within-category] [...] significant difference between Pbc and P[wc] [...] p = 0.009), respectively]. For the CI group, [...] Pwc in /i/-/u/ but only marginally signific[ant] [...] (9) = 2.167, p = 0.058), respectively]. Con[...] and significantly higher than that of th[e] [...] discrimination, except for the CI group in perc[eiving] [...]

Discussion
This study [...] implants, [...] continuum [...] identificati[on] [...] significant [...] line with th[e] [...] between th[e] [...] clear peak [...] curve was [...] significant [...] illustrated [...] also parall[el] [...]
The results of this study show that when the intrinsic acoustic cue, i.e., the formants of monophthongs, is in a relatively steady state, the categoricalness of perception decreases accordingly; conversely, when the intrinsic cue, i.e., the F0 contour of tones, changes dynamically, the categoricalness of perception is significantly enhanced. The current findings echo those of Chen et al. (2019), in which the perception of monophthongs with static formants was significantly less categorical than that of a monophthong-diphthong continuum with dynamic formant changes. Moreover, studies of level tones and contour tones provide supplementary evidence for this effect (e.g., Abramson, 1979; Francis et al., 2003; Xu et al., 2006; Hallé et al., 2004; Peng et al., 2010). Taken together, these studies show that the perception of tones with different contours is categorical (Peng et al., 2010; Xu et al., 2006; Hallé et al., 2004), while the perception of level tones is continuous (Abramson, 1979; Francis et al., 2003).
It is worth noting that boundary positions did not differ significantly between /i/-/u/ and T1-T2 identification in this study, whereas both groups showed significant differences between between-category and within-category accuracies in /i/-/u/ and in T1-T2 discrimination. For the NH group, boundary widths differed significantly between the two continua. These results suggest that intrinsic acoustic cues do affect categorical perception, and that the effect shows up mostly in the boundary width. Boundary width is closely related to the rate at which changes between the stimuli in a pair are detected. The effect therefore influences the rate at which normal-hearing children perceive acoustic changes between sound stimuli. Chen et al. (2019) compared the indicators of categorical perception and concluded that boundary width was one of the decisive indicators for evaluating categoricalness in perception. Accordingly, the significantly narrower width in T1-T2 indicates more categorical perception, which signals the presence of an effect of intrinsic acoustic cues on categorical perception.
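To make the two identification indicators concrete, the sketch below fits a logistic function to identification proportions and derives a boundary position and a boundary width. It assumes one common operationalization in the CP literature (position as the 50% crossover, width as the distance between the 25% and 75% points of the fitted curve); the surviving text does not confirm that these exact definitions were used here, and the function names and sample proportions are hypothetical.

```python
import math


def _sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def fit_logistic(xs, proportions, max_iter=100, tol=1e-10):
    """Fit p = sigmoid(b0 + b1 * x) to identification proportions.

    Uses Newton-Raphson on the binomial log-likelihood (equal trial counts
    per stimulus are assumed, so they cancel out of the update).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, proportions):
            mu = _sigmoid(b0 + b1 * x)
            w = mu * (1.0 - mu)
            r = y - mu
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


def boundary_position(b0, b1):
    """Stimulus value at which the fitted identification curve crosses 50%."""
    return -b0 / b1


def boundary_width(b1):
    """Distance between the 25% and 75% points of the fitted curve."""
    return 2.0 * math.log(3.0) / abs(b1)


if __name__ == "__main__":
    # Hypothetical proportions of "first category" responses for stimuli 1-9.
    stimuli = list(range(1, 10))
    props = [0.97, 0.95, 0.90, 0.75, 0.50, 0.20, 0.08, 0.05, 0.03]
    b0, b1 = fit_logistic(stimuli, props)
    print("position:", boundary_position(b0, b1))
    print("width:", boundary_width(b1))
```

Under this definition a steeper identification slope gives a narrower width, which is why a narrower width is read as more categorical perception.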

Spectral Information Processing
In addition, the pattern in the CI group differed to some extent from that in the NH group. Categoricalness in the perception of vowels and of tones did not differ significantly, and in neither case was it satisfactory. Specifically, the CI group showed significantly wider boundary widths in both vowel and tone identification than the control group showed in T1-T2, and no significant differences in boundary width or position were found between the identification of /i/-/u/ and of T1-T2; the identification boundary and the discrimination peak did not align well in either continuum. Moreover, a highly significant difference in peakedness was observed in discriminating /i/-/u/ but not in T1-T2, and overall discrimination accuracy in /i/-/u/ was significantly higher than in T1-T2. The analyses of boundary widths thus suggest that the CI group perceived the changes in /i/-/u/ and T1-T2 at a significantly lower rate than the NH group, while showing similar boundary widths in /i/-/u/ and T1-T2.
Previous studies have observed improvements in the perception of consonants and vowels by hearing-impaired children (Lee et al., 2005). In this study, the CI group perceived /i/-/u/ in a manner similar to the NH group, which mirrors the results of Munson and Nelson (2005). The current findings on vowel perception by children with CIs attest to the benefit of cochlear implants in processing formant information.
Nevertheless, the coding strategies of CIs have focused on conveying speech envelope information, while the fine structure of sounds (e.g., F0) has not been coded because of technological constraints (Gu et al., 2017). The results of this study are compatible with previous findings that the gain in tone perception tends to be unsatisfactory for hearing-impaired children despite the aid given by CIs (Wong & Wong, 2004; Cheung et al., 2002; Ciocca et al., 2002; Lee et al., 2002; Aisha, 2000). The significantly reduced performance in T1-T2 by the CI group in this study is also similar to the results of Luo et al. (2009), who found that CI users performed better on vowel recognition than on tone recognition. More importantly, spectral information is degraded in CIs (Chatterjee & Peng, 2008), and CI users have a limited number of spectral channels available compared with NH listeners (Friesen et al., 2001). In their study, CI recipients reportedly had difficulty recognizing some prosodic cues, especially those features closely related to fundamental frequency (F0). As noted by Lee et al. (2010), pitch information is not explicitly represented in the electrical stimulation delivered by CIs. In addition, evidence from the MMN response showed that CI users were not sensitive to pitch deviants (Petersen et al., 2015). Therefore, the worse performance in T1-T2 could be ascribed to difficulty in processing fundamental frequency information. It is likely that this difficulty affects the way in which the intrinsic dynamic acoustic cue functions in the categorical perception of T1-T2. In short, the effect is not well manifested in the CI group because of their difficulty in processing spectral information, especially F0.

Acoustic Difference Detection
Intriguingly, /i/-/u/ yielded higher discrimination accuracy for both groups. One reason might be the relatively larger step size in formants within each vowel pair, namely around 296 Hz in F2 and 12 Hz in F1, compared with only 9 Hz in F0 in the T1-T2 continuum. As reported by Liu (2013), the just-noticeable difference (JND) of lexical pitch perception was 4-8 Hz. Chen et al. (2018) employed tonal comparisons with 3-step and 4-step intervals in the categorical perception of lexical tones, because 2-step pairs might be too small to perceive for amusics, who are impaired in musical pitch perception. Petersen et al. (2015) found that CI users showed weaker brain responses and poorer behavioral performance when discriminating small changes in pitch. Moreover, lower-level acoustics underlie higher-level phonological categories (Chen et al., 2018). In the current study, the better discriminability on the vowel continuum might therefore be attributable to the larger step size, whereas T1-T2, with its smaller changes in pitch, yielded poorer discrimination. Besides, it has been reported that vowel perception might be strongly influenced by the pitch properties of lexical tones. The perception of the /i/-/u/ continuum in the current study might therefore have been affected by the original high level tone with which the vowel was recorded. The flattened pitch remains steady in the high level tone, which is congruent with the static formants of the vowels. This overlap in properties could support the perception of vowel stimuli synthesized on a high level tone, which may explain the higher discrimination scores for the vowel continuum than for the tonal continuum. In addition, cochlear implants detect higher-frequency information moderately better than lower-frequency information.
In this study, the F2 values of /i/ and /u/ are much higher than the F0 of T1 and T2, and current CI processing schemes do not convey F0 as well as formants (Mckay, 2005); together, these factors may partly explain the discrepancy between the two continua.
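The step-size argument can be made explicit with a small calculation. The sketch below uses the per-step intervals given in the text and assumes that discrimination pairs were two steps apart; the JND range is the 4-8 Hz value cited from Liu (2013). Whether that JND applies to children with CIs is not established in the text, and the constant and function names are illustrative.

```python
# Per-step intervals reported in the text (Hz).
F0_STEP_HZ = 9.0
F1_STEP_HZ = 12.0
F2_STEP_HZ = 296.0

# Assumption: stimuli in a "different" pair are two continuum steps apart.
PAIR_SEPARATION_STEPS = 2

# JND range for lexical pitch perception cited from Liu (2013), in Hz.
PITCH_JND_HZ = (4.0, 8.0)


def pair_difference_hz(step_hz, separation=PAIR_SEPARATION_STEPS):
    """Acoustic difference between the two members of a discrimination pair."""
    return step_hz * separation


def exceeds_jnd(difference_hz, jnd_range=PITCH_JND_HZ):
    """True if the difference is above the upper end of the cited JND range."""
    return difference_hz > jnd_range[1]


if __name__ == "__main__":
    f0_diff = pair_difference_hz(F0_STEP_HZ)
    print("F0 difference per pair (Hz):", f0_diff)
    print("above cited pitch JND:", exceeds_jnd(f0_diff))
    print("F1 difference per pair (Hz):", pair_difference_hz(F1_STEP_HZ))
    print("F2 difference per pair (Hz):", pair_difference_hz(F2_STEP_HZ))
```

On these assumptions, a tone pair differs by 18 Hz in F0, which is above the cited JND range but far smaller in absolute terms than the 592 Hz F2 difference within a vowel pair.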
Importantly, although discrimination accuracy was higher in /i/-/u/ than in T1-T2, it was still lower than the accuracy reported for adults by Chen et al. (2019). In addition, although the CI group performed similarly in discrimination accuracy and boundary width in /i/-/u/ and T1-T2, the two continua differed in the correspondence between discrimination peaks and boundary positions. Altogether, the categoricalness of /i/-/u/ and T1-T2 perception in the CI group is not as strong as that of T1-T2 perception in the NH group.

Conclusion
This study investigated the effects of intrinsic acoustic cues on categorical perception in children with cochlear implants, compared with normal-hearing children. Results showed that the /i/-/u/ vowel continuum was perceived less categorically than the T1-T2 continuum by the NH group. The CI group, however, perceived T1-T2 and /i/-/u/ in a similarly less categorical way. The effects of intrinsic acoustic cues on categorical perception thus hold for normal-hearing children but not for hearing-impaired children, because of the latter's difficulty in processing spectral information. In short, dynamic acoustic cues can facilitate categorical perception of speech, but this facilitation may be hindered by difficulties in processing F0 information.
The present study is an attempt to investigate the effects of intrinsic acoustic properties on categorical perception. It helps to determine whether categorical perception is affected differently by different states of inherent cues (i.e., static or dynamic) in CI users compared with normal-hearing children, and thereby contributes to a theoretical understanding of the relationship between acoustic properties and perceptual mechanisms. Drawing on two strands of research into speech perception, it also has pedagogical implications for children with special educational needs. Comparing the two groups' performance shows which acoustic property causes greater difficulty for hearing-impaired children with cochlear implants and to what extent this group differs from normal-hearing children. Clinically, the findings can inform the research and development of cochlear implants. In future work, attention should be paid to comparing the categorical perception of monophthongs and diphthongs, which exhibit different formant states. Other influential factors could also be considered, such as the duration of CI use and the complexity of speech signals (i.e., speech versus non-speech). Comparing the categorical perception of spectral information (e.g., vowels or tones) with that of temporal information (e.g., consonants) by children with CIs is another question for future research. Furthermore, event-related potential (ERP) experiments should be conducted to complement behavioral experiments.