Performance of Children and Adolescents on Random Number Generation as a Measure of Executive Functions

This study investigated age-related differences in random number generation (RNG) in children and adolescents aged 7 to 15 years (n=106), divided into three groups (7-9, 10-12 and 13-15 years of age), and compared their responses with computer-generated pseudorandom sequences. The results showed an age effect on four out of seven indices of randomization that are known to tap the inhibition ability and the updating ability (i.e., active manipulation of relevant information in working memory). The participants' responses were significantly different from the pseudorandom sequences (n=106) produced by the RgCalc program, and no gender differences were observed. The RNG task indices that reflect inhibition ability did not correlate with the participants' performance on the Stroop color-word task. The development of executive functions in school-aged children and adolescents is discussed.

performance, school-aged children did not. The authors attributed this effect to potential developmental alterations in number representations (Sosson, Georges, Guillaume, Schuller, & Schiltz, 2018). Less information has been gathered about RNG and EF in school-aged children. In this paper we will address RNG in normally developing children and adolescents.
In an attempt to explore different variables influencing EF in Brazilian children between six and 12 years old, Jacobsen, de Mello, Kochhann, and Fonseca (2017) showed that age was the most prominent predictor of RNG performance, while parental education explained only a small percentage of performance variance. Rabinowitz, Dunlap, Grant, and Campione (1989) were among the first to investigate age effects on RNG; 30 first graders (mean age of 87 months), 30 fifth graders (mean age of 136 months) and 30 first-year college students were tested. In their work, first graders produced random sequences but used more counting strategies (series) than both the fifth graders and the college students. Also, Towse and McLachlan (1999) recruited children aged from four to 11 and showed that children under the age of five manifest significant difficulties in this type of procedure and that performance improved with age and was affected by speed requirements.
Regarding developmental disorders, studies support that children within the Autism Spectrum Disorder (ASD) are poor randomizers (Boucher, 1977; Brugger, 1997; Frith, 1972). Similarly, Rinehart, Bradshaw, Moss, Brereton, and Tonge (2006), in a 10-digit randomization task (i.e., digits from 1 to 10), examined differences in randomization performance in children with autism (mean age=13.4 years) and Asperger's syndrome (mean age=10.6 years) compared to age-matched controls (mean age=13.0 years and 10.6 years, respectively). More number repetitions were found for the children in the autism group and more stereotyped responses for the children in the Asperger's syndrome group, as compared to the control subjects. However, Gilbert, Bird, Brindley, Frith, and Burgess (2008), using a button-press randomization task, did not find behavioral differences between children within ASD and normally developing children.
Neuroimaging studies have demonstrated that EF depend highly upon the pre-frontal lobes (Tsujimoto, 2008). Evidence indicates that the pre-frontal lobes mature last in comparison to other brain structures, during late childhood and early adolescence; that is, the Pre-frontal Cortex (PFC) is subject to neuronal and synaptic changes (Tsujimoto, 2008). It has been noted that these brain loci do not mature completely until a person is in their 30s (Johnson, Blum, & Giedd, 2009; Sowell et al., 2003). In a similar vein, Gauvrit, Zenil, Soler-Toscano, Delahaye, and Brugger (2017), in a series of random generation experiments, demonstrated that performance improves with age and reaches a plateau at 25 years before it starts declining.
The Stroop task (Stroop, 1935) has traditionally been used as a measure of EF, especially for selective attention and inhibition (Kiyonaga & Egner, 2014; Lamers, Roelofs, & Rabeling-Keus, 2010). The Stroop task has also been shown to correlate with some aspects of RNG (Brugger, 1997; Brugger, Pietzsch, Weidmann, Biro, & Alon, 1995). The Greek version of the Stroop task includes three parts lasting 45 seconds each (Zafiri & Kosmidis, 2008). For each part, the number of correct responses constitutes the score. An interference score reflecting the "Stroop conflict" is calculated (Lovallo, Yechiam, Sorocco, Vincent, & Collins, 2006; Perlstein, Carter, Barch, & Baird, 1998). In this work, the Greek version of the Stroop Color Word Test (Zafiri & Kosmidis, 2008) was administered to investigate children's and adolescents' performance and to examine whether it correlates with the randomness indices that indicate stereotypy (e.g., adjacent responses). These results will give insight into the children's and adolescents' ability of inhibition (Towse & Neil, 1998; Miyake et al., 2000).

Randomness Indices
Several indices have been proposed as measures of randomness in RNG tasks. Seven indices that have previously been used in RNG tasks (Proios, Asaridou, & Brugger, 2008), and are generated by the RgCalc program (Towse & Neil, 1998), are Coupon, Mean Repetition Gap (Mean RG), Redundancy (R), Turning Point Index (TPI), Runs, Adjacency combined score (A comb), and Evans's Random Number Generation (RNG). According to the principal component analysis (PCA) of Towse and Neil (1998), these seven indices load on two different factors, termed "equality of response usage" and "prepotent associates", respectively. R, Coupon, and Mean RG comprise the former factor, which shows whether someone uses the response alternatives equally, whereas A comb, TPI, RNG, and Runs comprise the latter factor, which reflects one's ability to avoid stereotypy (Towse, 1998; Towse & Neil, 1998, p. 590). Miyake et al. (2000) replicated this factor structure, adding that the indices included in the "prepotent associates" component tap the EF of inhibition, whereas the indices that form the "equality of response usage" component are indicative of the ability to update and monitor working memory information (Miyake et al., 2000; Miyake, Witzki, & Emerson, 2001).
The Coupon score reflects the number of responses produced before all response alternatives have been given (Towse & Neil, 1998). For instance, in the RNG task, if a participant utters the sequence "2,4,1,3,6,5" the Coupon score would be the lowest possible, and thus optimal. The longer the sequence before all alternatives are produced, the higher the Coupon score and, therefore, the worse the randomization performance (Audiffren, Tomporowski, & Zagrodnik, 2009). The Mean Repetition Gap score (Mean RG), in turn, is the mean gap or distance between successive appearances of the same number. To cite an example, in the sequence "3,5,2,1,3,4,6" there is a gap of four for the number "3". Hence, when someone repeats a number frequently, the Mean RG will be lower (Audiffren et al., 2009).
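The two "equality of response usage" examples above can be reproduced with a short Python sketch. This is an illustration only and not the RgCalc implementation: the function names are hypothetical, and the handling of incomplete sets (ignored here) is an assumption.

```python
from statistics import mean


def coupon_score(seq, alternatives=range(1, 7)):
    """Mean number of responses needed to produce every alternative once.

    Assumption: a trailing, incomplete set at the end of the sequence is ignored.
    """
    needed = set(alternatives)
    seen, count, lengths = set(), 0, []
    for response in seq:
        count += 1
        seen.add(response)
        if seen >= needed:          # every alternative has now appeared
            lengths.append(count)
            seen, count = set(), 0  # start counting a new set
    return mean(lengths) if lengths else None


def mean_repetition_gap(seq):
    """Mean distance between successive occurrences of the same response."""
    last_position, gaps = {}, []
    for i, response in enumerate(seq):
        if response in last_position:
            gaps.append(i - last_position[response])
        last_position[response] = i
    return mean(gaps) if gaps else None


print(coupon_score([2, 4, 1, 3, 6, 5]))            # 6
print(mean_repetition_gap([3, 5, 2, 1, 3, 4, 6]))  # 4
```

The first call returns the minimum possible Coupon score for a six-alternative task, and the second returns the gap of four described for the digit "3".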
Redundancy (R) is a percentage score denoting the deviation from randomness: the more random a sequence, the less redundant it is. Thus, R scores of 10% or 20% signify a more random sequence than an R score of 70% (for the mathematical definition of the R score, see Towse & Neil, 1998). The Turning Point Index (TPI) measures the changes between ascending and descending series (Geisseler et al., 2016). For example, in the sequence "2,3,5,1,4,2" there are three turning points: at the digits "5", "1", and "4". The number of turning points is compared to the theoretical number of turning points in random sequences (Audiffren et al., 2009). A TPI score of 100 is the optimal score concerning randomization, whereas a TPI lower than 100 reflects a counting tendency (i.e., fewer alternations between ascending and descending sequences than expected). Finally, a TPI score higher than 100 shows a strategy of alternating between ascending and descending sequences more often than expected (Audiffren et al., 2009). The Runs score (Runs) is a measure of counting strategies. The higher the Runs score, the longer the ascending and descending sequences. Consequently, a lower Runs score shows better randomization performance (Towse & Neil, 1998). The Adjacency combined score (A comb) is the percentage of successive ascending or descending answers (e.g., "4,5" or "3,2"; Audiffren et al., 2009). High A comb scores indicate stereotypy in responses (Towse & Neil, 1998). Lastly, Evans's Random Number Generation index (RNG) refers to the frequency of response pairs. Evans's RNG ranges from 0 to 1, with a higher RNG index signifying more deviation from theoretical distributions and, therefore, poorer randomization performance (Evans, 1978; Towse & Neil, 1998).
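As a rough illustration of how these indices can be computed, the Python sketch below follows the commonly cited formulas: R as 100 × (1 − H/H_max), TPI as observed turning points relative to the expected 2/3 × (n − 2), A comb as the percentage of adjacent pairs, and Evans's RNG as a ratio of pair-frequency to single-frequency terms. It is not the RgCalc code. The function names are hypothetical, and details such as ignoring immediate repetitions in the TPI and not wrapping the sequence around for pair counts are simplifying assumptions.

```python
from collections import Counter
from math import log2


def redundancy(seq, n_alternatives=6):
    """R (%): 0 when all alternatives are used equally often, 100 when one is used."""
    n = len(seq)
    counts = Counter(seq)
    entropy = log2(n) - sum(c * log2(c) for c in counts.values()) / n
    return 100 * (1 - entropy / log2(n_alternatives))


def turning_point_index(seq):
    """TPI (%): observed turning points relative to the expected 2/3 * (n - 2).

    Assumption: both peaks and troughs count; plateaus (repeats) are ignored.
    """
    n = len(seq)
    if n < 3:
        return None
    observed = sum(
        1 for i in range(1, n - 1)
        if (seq[i] - seq[i - 1]) * (seq[i + 1] - seq[i]) < 0
    )
    return 100 * observed / (2 * (n - 2) / 3)


def adjacency_combined(seq):
    """A comb (%): share of response pairs that differ by exactly one."""
    pairs = list(zip(seq, seq[1:]))
    adjacent = sum(1 for a, b in pairs if abs(b - a) == 1)
    return 100 * adjacent / len(pairs)


def evans_rng(seq):
    """Evans's RNG: 0 if no response pair repeats, 1 if each response fully
    predicts the next. Assumption: the sequence is not wrapped around."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    numerator = sum(c * log2(c) for c in pair_counts.values())
    denominator = sum(c * log2(c) for c in first_counts.values())
    return numerator / denominator if denominator else 0.0
```

For example, `adjacency_combined([4, 5, 3, 2, 6])` counts the pairs "4,5" and "3,2" among four pairs, and a strictly alternating sequence such as "1,2,1,2,1,2,1" yields an Evans's RNG of 1.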

Hypotheses
We hypothesize that the EF underlying performance on an RNG task will be improved in adolescents compared to their younger counterparts. We also expect that performance on the Stroop Test (Stroop, 1935) will correlate with the subjects' performance on the RNG task indices that load on the "prepotent associates" component and tap inhibition (Towse & Neil, 1998; Miyake et al., 2000).

Participants
A group of 106 children and adolescents aged from 7 to 15 (54 female) with a mean age of 132 months (SD=27.1 months) were divided into three groups according to their age: 7-9 (n=41, mean age=105.5 months, SD=8.54 months, age range: 89-119 months), 10-12 (n=40, mean age=135.4 months, SD=9.29 months, age range: 120-154 months), and 13-15 (n=25, mean age=172 months, SD=9.12 months, age range: 156-185 months). None of the participants had a history of learning disabilities or developmental disorder (based on school records and student reports). Children and adolescents were recruited as part of the regular curriculum (2019) in three elementary schools and four high schools in the cities of Thessaloniki, Ioannina and Kalamata in northern, western, and southern Greece, respectively, and participated face-to-face in this study.

Procedures
Participants' next of kin were informed about the study by the school committees. A letter describing the purpose of the study and the procedure, together with a consent form, was sent via mail. Children and adolescents were tested individually on the school premises, and anonymity and confidentiality were maintained throughout the process. All participants were informed that they were free to withdraw from testing at any time. Their wellbeing and confidentiality were protected by the researchers, and testing took place according to the ethical guidelines of the 1964 Declaration of Helsinki (Rickham, 1964).

Measures
The Mental Dice Task (MDT; Brugger, Landis, & Regard, 1990; Brugger, Milicevic, Regard, & Cook, 1993; Brugger et al., 1996) asks subjects to imagine repeatedly throwing a die and to report the number that is rolled. In this study, students were required to utter a random sequence of digits from 1 to 6 (i.e., imitating a die). All participants had a practice run of 10 responses and were then asked to produce 66 responses in time with a metronome set at 1 beat/sec. After the verbal responses were recorded, the data were entered into the RgCalc program (Towse & Neil, 1998) in order to examine randomization performance. For every student, a pseudorandom sequence was also produced by RgCalc, so that the participants' responses could be compared with an equal number of simulated sequences.
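For readers who wish to build a comparable baseline without RgCalc, the following Python sketch generates one simulated die sequence per participant. It assumes a uniform pseudorandom generator from the Python standard library; the generator used by RgCalc is not described here, so the two should not be treated as equivalent. The function name and the seeding scheme are hypothetical.

```python
import random


def simulated_die_sequence(n_responses=66, faces=6, seed=None):
    """Return n_responses pseudorandom integers between 1 and faces (inclusive)."""
    rng = random.Random(seed)  # a local generator keeps each sequence reproducible
    return [rng.randint(1, faces) for _ in range(n_responses)]


# One simulated sequence per participant (106 in this study), each 66 responses long.
simulated = [simulated_die_sequence(seed=i) for i in range(106)]
```

Seeding each sequence separately is a design choice that makes every simulated "participant" reproducible on its own.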

Data Analyses
For the statistical analysis, the Statistical Package for the Social Sciences (SPSS 22.0) was used. For the effect of age on randomness, parametric and non-parametric tests were applied: F-test (ANOVA), Welch, and Kruskal-Wallis. The Welch test was applied when the assumption of homogeneity of variance was not met, while the Kruskal-Wallis test was preferred when the histogram was not symmetrical and the group was small (n<30). Post-hoc analyses included the Bonferroni correction for the F-test, the Tamhane correction for the Welch test, and the Mann-Whitney test for the Kruskal-Wallis test. The level of significance was set at .05 (α=.05).
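To make the non-parametric step concrete, the sketch below computes the Kruskal-Wallis H statistic in plain Python, using mid-ranks and the standard tie correction. It is an illustration only and not the SPSS procedure used in the study; the function names are hypothetical. Because the study compares three age groups, H is referred to a chi-square distribution with 2 degrees of freedom, for which the p value has the closed form exp(−H/2); the helper below is valid for that three-group case only.

```python
from collections import Counter
from math import exp


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H for two or more independent groups."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction if correction else float("nan")


def p_value_three_groups(h):
    """Chi-square (df = 2) upper-tail probability; valid for three groups only."""
    return exp(-h / 2)
```

With three perfectly separated groups, `kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives H = 7.2, and `p_value_three_groups(7.2)` is roughly .027.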

Factorial Structure of the RNG Indices
The seven randomization indices derived from the data of our sample were calculated by the RgCalc program. These seven indices were subjected to Principal Component Analysis (determinant of the correlation matrix > .0001, KMO = .613 > .6, Bartlett's Test of Sphericity p < .05) with an orthogonal (varimax) rotation. From examination of the scree plot (Figure 1), two factors with eigenvalues larger than 1 were extracted, together accounting for 70.86% of the variance. The indices that comprise the two factors and their loadings are shown in Table 1.

Age Effect
Table 2 illustrates the age effect on the indices of randomness. In particular, the Kruskal-Wallis test indicated statistically significant differences on R among the three groups (KW=6.71, p=.03). Pairwise comparisons with the Mann-Whitney test showed that R is greater in the first group (7-9 years of age) compared to the second group (10-12 years of age). For Evans's Random Number Generation (RNG), the F-test (ANOVA) showed a significant effect of age (F=5.36, p=.006). Pairwise comparisons with the Bonferroni correction showed that RNG is higher in Group 1 (7-9 years of age) than in Group 3 (13-15 years of age). Furthermore, the youngest participants (Group 1) had a higher A comb score than their older counterparts (Group 1 > Group 3; KW=10.92, p=.004). Finally, the youngest children (Group 1) had a smaller Mean RG than the adolescents (Group 3; KW=6.92, p=.03).
On the other hand, the F-test did not reveal any significant differences among the three groups on TPI (F=1.70, p=.18) or Runs (F=2.80, p=.06). Lastly, according to the Kruskal-Wallis test, there were no significant differences on Coupon among the three age groups (KW=4.10, p=.12).
Table 3 indicates the gender effect on the different indices of randomness. Independent-samples t-tests and Mann-Whitney U tests revealed that none of these indices differed significantly between the two gender groups.

Comparison of Randomness between Participants and Simulated Data (RgCalc)
Comparisons between the participants' (i.e., children's and adolescents') responses and the 106 pseudorandom sequences produced by the RgCalc program are presented in Table 4. T-tests indicated significant differences between the two groups for all indices of randomness (Table 4). We found R, TPI, and Coupon to be significantly lower for the participant group compared to the RgCalc sequences. On the other hand, RNG, Runs, A comb, and Mean RG were significantly higher for the participant group.

Correlations of the RNG Indices and the Stroop Test
Table 5 shows the correlations (Pearson r) between Stroop performance and the indices of randomness. As can be seen, none of these indices correlated significantly with performance on the Stroop Test (raw Interference Score).

Discussion
The PCA performed in the present study yielded findings consistent with those of Towse and Neil (1998) regarding the indices of randomness. While Towse and Neil (1998) performed a PCA on adult responses in a 10-digit randomization task, we performed it on the responses of school-aged children and adolescents on the MDT. The replication of the two-factor structure in a younger sample and with a six-alternative task extends the literature on randomization tasks.
No gender effect was found on the randomization performance in any of the age groups and our finding is in line with some previous research regarding the effect of gender on EF (Brocki & Bohlin, 2004;Jacobsen et al., 2017). This is contrary to other studies that have reported gender effects on EF (Dias, Menezes, & Seabra, 2013;Wiebe, Espy, & Charak, 2008).
Children's and adolescents' responses were significantly different from pseudorandom sequences produced by the RgCalc program, exhibiting two major characteristics: firstly, a tendency to generate numbers in their natural order (e.g., counting) as indicated by the higher RNG, Runs, and A comb scores and the lower TPI score. Secondly, the students generated number sequences in which all alternatives are used equally and are evenly distributed as indicated by the higher Mean RG score and the lower R and Coupon scores. Children and adolescents might not be fully able to inhibit prepotent or stereotyped responses while at the same time, they might perceive randomness as an even appearance of all response alternatives. Human perception and generation of randomness is characterized by such biases (Audiffren et al., 2009;Jahanshahi et al., 1998;Spatt & Goldenberg, 1993).
The Stroop test results did not correlate with the inhibition indices of randomization performance (i.e., A comb, TPI, Runs, RNG). In contrast, Brugger et al. (1995) found a moderate positive correlation between the Stroop task and the MDT (r=.30, p=.007) in healthy adults. Also, Friedman and Miyake (2004) found that Stroop performance and the indices that indicate stereotypy (i.e., "prepotent associates") correlate significantly in young adults. In line with our findings, however, Maes, Eling, Reelick, and Kessels (2011) indicated that Stroop task performance did not correlate significantly with an RNG task in patients with cognitive decline. On this matter, one may wonder whether there is a developmental effect. One idea is that RNG and Stroop tasks examine diverse aspects of inhibition, engaging adjacent but not identical neural pathways.

Study Limitations
The limitations of the present work include the fact that the Greek version of the Stroop task is standardized on the adult population, and the relatively small sample size of this study. Normative data do not exist for Greek children at the population level. The limits of what is possible in RNG tasks and EF in the Greek language for this age group have not yet been explored. Different versions of the Stroop task and additional neuropsychological assessment tools could be included to examine interactions of RNG tasks with EF. The development of randomization performance was examined by comparing specific age ranges. With a larger sample, age could be analysed as a continuous variable. This study could be extended to include populations from special educational settings for the evaluation of EF in atypically developing children and adolescents.

Conclusion
Our findings were in agreement with previous work (Rabinowitz et al., 1989), and future studies can continue to investigate age-related differences and include larger as well as different groups: pre-school children, school-aged children, and adolescents. Notwithstanding the shortcomings, the findings of our work indicate that there is an age effect on randomization performance, with adolescents demonstrating improved performance as compared to younger children. We did not find a significant gender effect, and this is in accordance with some recent findings: Guerra et al. (2021) reported a gender effect only on some EF tasks, while Slot and von Suchodoletz (2018) found non-significant gender differences in pre-school children. However, a recent meta-analysis found that gender plays a significant role in EF in children (Cortés Pascual et al., 2019). More research is needed to cast light on this phenomenon. Our results suggest that gender effects on EF may be absent in a Greek sample, and more data from different cultures can advance our knowledge.

Recommendations for Future Research
Future research can explore further dimensions of random number generation performance in school-aged children and adolescents. Other populations that demonstrate EF deficits can be investigated, such as children with Attention Deficit Hyperactivity Disorder (ADHD) or children within the ASD (Leno et al., 2018; Pineda-Alhucema, Aristizabal, Escudero-Cabarcas, Acosta-Lopez, & Vélez, 2018). Finally, the inclusion of more neuropsychological tools and cognitive measurements, along with academic achievement, could enhance our understanding of this topic.