Executive Functioning and Adolescents’ Academic Performance on Standardized Exams

Executive functions (EFs) help regulate and direct thoughts, behaviors, and emotions, and they play vital roles in many areas of life. However, few studies address the role EFs play in adolescents' lives, including their academic performance. We investigated the effects of EFs on standardized exams in mathematics, reading, and English language arts. The main findings were that: 1) adolescents' EFs, especially when measured by their current teachers, predict performance on standardized academic assessments throughout the middle and high school grades; 2) this effect existed among a rather diverse sample of students both with and without diagnosed disabilities; 3) the predictiveness of EFs tended to increase across these grades when measured by the teachers, but not when measured by the students themselves; and 4) EFs were somewhat more strongly associated with performance on standardized reading and English language arts exams than on math exams. In addition, students who identified as female tended to show stronger EFs; race/ethnicity showed some significance, but not in easily interpreted ways; and age was reliably associated with performance on these standardized exams, such that older students tended to do better even though the exam scores were standardized by grade level. The results illustrate the contributions of EFs to standardized assessments for students with and without diagnosed disabilities.


Executive Functioning and Adolescents' Academic Performance on Standardized Exams
Executive functions (EFs) can be generally defined as a set of cognitive and behavioral control processes that individuals use to regulate and direct attention, memory, thoughts, emotional reactions, and behaviors so that they may attain both short- and long-term goals (Blair & Raver, 2012; Diamond, 2013; Gioia et al., 2015). These abilities to direct one's attention and behavior towards meeting a goal are necessary to complete most academic tasks. It is not surprising, then, that EFs present an area of growing interest among education researchers. The varied demands of reading (decoding and synthesizing lexical and phonological aspects) require the development and coordination of various executive functions (Ober et al., 2020). Indeed, both Sesma et al. (2009) and Zelazo and Carlson (2012) contend that EFs are fundamental for language development and thus for literacy (itself a foundation for learning) as well as for the processing and organization of received information.
The main purpose of the present study is to examine the relationship between EFs and academic performance on standardized testing among adolescents. The analyses also take into consideration variables such as students' age, gender, disability status, race/ethnicity, and socioeconomic status (SES). Prior research has established the relationship between EFs and assessments of academic performance through teacher grades (Samuels et al., 2016, 2019). Most studies over the last decade have found EFs to be more significant for academic performance than the intelligence quotient, the variable traditionally viewed as the best predictor of academic success (Ren et al., 2015).

Goals and Hypotheses
The primary goals of the current study were to: (1) investigate the predictive validity of the BRIEF by analyzing the contribution of BRIEF Global Executive Composite (GEC) scores to predictions of academic performance on standardized tests among 6th to 12th graders, (2) examine the predictive validity of the BRIEF-SR by analyzing the contribution of BRIEF-SR GEC scores to predictions of these same outcomes, (3) compare the contributions of the BRIEF with those of the BRIEF-SR for their uses as experimental tools and in light of relevant demographic variables (age, gender, race/ethnicity, SES, and disability), and (4) evaluate the results for both components of EF: metacognition and behavioral regulation.
We hypothesized that BRIEF and BRIEF-SR GEC scores would predict performance on a standardized test among adolescent students. However, we anticipated that the valid predictive use of BRIEF-SR GEC scores here may not be as well supported (i.e., will not predict academic performance as well as BRIEF GEC scores) given the equivocal findings on the use of the BRIEF-SR in academic settings outlined earlier.

Participants/Demographics
With informed consent and IRB approval, data were collected in a charter school in New York City. The school was chosen because it provides an inclusive environment for all students: it welcomes all students, including those with disabilities, and integrates them into all classrooms and activities. For further context, it is worth noting that the school's mission statement indicates that full integration of students empowers them to break down barriers through the power of their daily academic and social experience, enabling them to develop the academic skill, emotional fluency, and confidence required to be successful students today and thoughtful, open-minded leaders tomorrow.
Therefore, all students attended the same middle school and then the same high school. The students at these two schools are diverse and represent a general, if under-served, population, but all experience the same overall academic environment. We would have liked to collect data from other schools as well, but the intensity of the yearly data collection and the breadth of the data collected have so far prevented us from doing so.
Scantron © Scaled Scores in mathematics were available for 688 students, in reading for 719 students, and in Language Arts for 717; this group spanned grades 6-12. The mean age for the students with at least one Scaled Score and BRIEF / BRIEF-SR score was 13.57 (SD = 1.91) years. Forty-seven percent identified as female; 40% identified as African-, 4% as Asian-, 12% as European-, and 33% as Latin-American. (Race and ethnicity were recorded as exclusive categories, with only 1% identifying as members of multiple races.) Thirty-four percent were classified as having a disability and had individualized education programs (IEPs). The classifications included in the IEPs were: social/emotional impairment, autism spectrum disorder, learning disability, other health impairment, speech and language impairment, and intellectual disability.

The Behavior Rating Inventory of Executive Function (BRIEF)
The Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al., 2000) is an 86-item instrument developed to assess, via parent and/or teacher reports, EF manifestations in the everyday lives of children and adolescents aged 5-18 years. The BRIEF has been widely used in clinical applications as well as in a variety of research studies involving children and adolescents who are typically and atypically developing (for reviews see Isquith et al., 2013; Roth et al., 2014). Researchers who examined the discriminant validity of the BRIEF reported that it successfully differentiates between children and adolescents with and without ADHD (Reddy et al., 2011; Toplak et al., 2008). It has been widely used to assess outcomes following a variety of interventions (Isquith et al., 2013) and is associated with academic performance (Clark et al., 2010; Langberg et al., 2013; Locascio et al., 2010; Roth et al., 2014; Samuels et al., 2016, 2019). The BRIEF has demonstrated good inter-item and test-retest reliability (Gioia et al., 2000). It has also been found to be a practical tool showing valid uses in school and clinical settings as well as in research; there are over 400 peer-reviewed publications supporting the reliability, clinical utility, and valid uses of the BRIEF, mostly among children. Overall, reviews of the BRIEF have been positive (Baron, 2000; Goldstein, 2001; Strauss et al., 2006). To our knowledge, however, no studies have yet investigated its use to predict performance on standardized exams in schools and among diverse, community-dwelling adolescents.

The Behavior Rating Inventory of Executive Function-Self-Report Version (BRIEF-SR)
The Behavior Rating Inventory of Executive Function-Self-Report Version (BRIEF-SR) offers another method to measure EFs among older children and adolescents. The BRIEF-SR is designed for those aged 11-18 years to self-report the frequency of various EF-related behaviors through 80 items that measure nearly the same domains as the BRIEF (Guy et al., 2004). The use of the BRIEF-SR may therefore allow for investigations of EFs among adolescents while relying on a different source of information, one that may reduce the burden on any one participant while also providing a complementary, or perhaps even an alternate, vehicle for measurement. Guy et al. (2004) provided evidence for the BRIEF-SR's ability to validly measure EFs, including through its relationship with the Behavior Assessment System for Children Parent Rating Scales (BASC-PRS) and Teacher Rating Scales (BASC-TRS), but, importantly, not directly against the BRIEF. Indeed, few studies have compared the BRIEF side-by-side with the BRIEF-SR; the present study undertakes this task. The copies of the BRIEF and BRIEF-SR used here were purchased from PAR, Inc., their publisher.

Structure of the BRIEF and BRIEF-SR
The BRIEF and BRIEF-SR were constructed to measure two general areas of EF: Metacognition and Behavioral Regulation (Gioia et al., 2000; Guy et al., 2004), each of which comprises further subscales that interrelate but nonetheless represent different executive functions (Karr et al., 2022; Keller et al., 2023). Exploratory factor analyses of the eight subscale divisions of the parent and teacher forms of the BRIEF showed the same two-factor solution in both normal controls and specific clinical samples (Gioia et al., 2000). The metacognition and behavioral regulation areas can be combined to create an overall Global Executive Composite (GEC) score.
As operationalized by the BRIEF, metacognition includes the "ability to initiate, plan, organize, and sustain future-oriented problem solving in working memory" (Gioia et al., 2000, p. 20). Behavioral regulation involves the "ability to shift cognitive set and modulate emotions and behavior via appropriate inhibitory control" while allowing "metacognitive processes to successfully guide active, systematic problem solving (and supports) appropriate self-regulation" (p. 20).

General Administration
Both the BRIEF and BRIEF-SR were administered once per academic year (AY), within a few weeks of the end of the AY. All students completed the BRIEF-SR on the same day during the same class (Wellness, a course that roughly corresponds to a combined Health and Civics course). Teachers of this course completed the BRIEF for each of their students within one week of when the students completed the BRIEF-SR.

Assessment of Academic Performance Through Scantron © Scaled Scores
Students were assessed in reading, English language arts, and mathematics using the Scantron © Performance Series, a series of standardized assessments of various content areas, including math, reading, and English language arts. The Performance Series are computer-adaptive diagnostic assessments that each take between 45 and 60 minutes to complete and consist of 52-68 items, depending on how quickly a student's score can be determined.
Since the Performance Series use adaptive testing (the items and their difficulty levels change based on a student's responses), one cannot use classical test theory's definition of reliability (often operationalized as Cronbach's α). Instead, the exam is usually stopped once the standard error of measurement on an exam is less than a pre-determined threshold. Test-retest reliability for these exams tends to be reasonable (rs > .65; Scantron, 2015).
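The stopping rule described above can be sketched as follows. This is a hypothetical illustration of SEM-based termination, not Scantron's actual algorithm; the function name, thresholds, and item counts are assumptions for illustration.

```python
def items_administered(sem_per_item, threshold=0.30, min_items=52, max_items=68):
    """Return how many items an adaptive exam administers before stopping.

    sem_per_item: the standard error of measurement (SEM) after each item.
    The exam stops once the SEM drops below the threshold (after a minimum
    number of items has been given), or when the item cap is reached.
    """
    for n_items, sem in enumerate(sem_per_item[:max_items], start=1):
        if n_items >= min_items and sem < threshold:
            return n_items
    # SEM never fell below the threshold: the item cap ends the exam
    return min(len(sem_per_item), max_items)
```

Because the SEM shrinks at different rates for different students, the number of items varies (here, between 52 and 68), which is why Cronbach's α, defined over a fixed item set, does not apply.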
Evidence of the content-related validity of the Performance Series comes from their alignment with New York State Learning Standards (New York State Education Department, 2015-2016) and with standards proposed by national content pedagogy organizations, including the National Council of Teachers of Mathematics (NCTM) and the National Council of Teachers of English (NCTE). Scores on the Performance Series mathematics exams also tend to correlate well with elementary and middle school students' performance on New York State mathematics tests (rs > .69; Scantron, 2015); correlations for other content areas were not available.
Among the available metrics generated by the Scantron Performance Series exams, we elected to focus on the Scaled Scores. Scaled Scores are standardized transformations of students' raw scores on any given administration of an exam. These scores are standardized using Rasch modeling of the item difficulties (Scantron, 2015). Scaled Scores are grade-level independent, making them useful for comparing changes over time.

Administration
At the schools studied here, Performance Series exams were usually administered twice per year: two different content-area Performance Series exams were given, one during the fall and another during the spring. In other words, students completed Scantron Performance Series exams on two occasions in an AY, but usually completed a given content-area exam at most once per year. There were instances when students took the same Scantron exam more than once in a given year; this occurred when there was a mis-administration, a technical issue, or when the student or teacher felt that the performance on the exam was not indicative of the student's true ability and another chance to assess was given. Scantron indeed allows teachers to "spoil" an exam, halting it to allow students to retake it at a later time when either feels the exam will better measure the student's ability. At this school, it is the administration that decides if an exam should be spoiled. This occurred on 7.9% of the times the tests were administered; on 6.6% (n = 45) of the occasions, a student took an exam one or two times more than prescribed, and on 1.3% (n = 9) of occasions they took a given exam as many as seven times more than the minimum. Since some students retook a given Scantron exam, we averaged a student's Scaled Scores by academic year (AY) to ensure the comparability of the data for each student.
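The averaging step just described can be sketched in a few lines; the column names and score values below are hypothetical, not the study's data.

```python
import pandas as pd

# Minimal sketch of the within-AY averaging: when a student took the same
# content-area exam more than once in an academic year, their Scaled Scores
# are averaged so every student contributes one score per subject per AY.
scores = pd.DataFrame({
    "student_id": [1, 1, 1, 2],
    "ay":         [2016, 2016, 2017, 2016],
    "subject":    ["math"] * 4,
    "scaled":     [2400.0, 2460.0, 2550.0, 2380.0],
})
per_ay = (scores
          .groupby(["student_id", "ay", "subject"], as_index=False)["scaled"]
          .mean())  # student 1's two 2016 math scores collapse to their mean
```

After this step, each (student, AY, subject) combination appears exactly once, which is what makes the yearly records comparable across students who did and did not retake an exam.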
There was an average of 3.43 AYs of data for Math Scaled Scores, 4.38 AYs for Reading Scaled Scores, and 4.16 AYs for Language Arts Scaled Scores. Note, however, that students did not complete a given content area Scantron exam every year; occasionally (~9% of the time), either the BRIEF or the BRIEF-SR was not completed in a given year. Given this, students did not often have both a Scantron © Scaled Score and BRIEF or BRIEF-SR scores for the same AY. In nearly all cases (≥89%), data were available for both a Scaled Score and a BRIEF / BRIEF-SR score for only one AY; otherwise, data were available for both Scantron and BRIEF / BRIEF-SR scores for two AYs, sufficient to perform the planned analyses exploring longitudinal effects.

Analytic Plan
To summarize the data collection procedure, the BRIEF and BRIEF-SR were administered to teachers and students, respectively, within a few weeks of the end of each academic year. The teachers completed the BRIEF for a random subset of students whom they taught that academic year; the students completed the BRIEF-SR about themselves. The BRIEF and BRIEF-SR were collected every year for each student, with different teachers completing the BRIEF for a given student each academic year. Students completed the Scantron Performance Series exams as they were normally administered during the academic year; a given Scantron exam was usually administered once per AY, in either semester.
All instruments collected during the same academic year were coded as being collected during the same AY. Although we knew the exact dates on which the instruments were measured, including this level of exactness did not improve the power of the analyses. The years of data were nested within student so that we could analyze within-student changes in both EFs and performance on the standardized exams.
Our investigation into the relationships between EFs and standardized scores included three general steps to strengthen the analysis. First, we assessed the simple (zero-order) correlations of BRIEF and BRIEF-SR scores and sub-scores with Scantron © Scaled Scores in math, reading, and English language arts; we also examined the relationships of important demographics within the models. Second, we used "predictive modeling" to examine the interactive effects of the variables. Third, we employed multilevel models (also commonly called "hierarchical linear models") to test the joint effects of the variables. All variables were either standardized (for continuous variables) or included as dummy variables (for nominal variables).
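The variable-preparation step mentioned above (standardizing continuous predictors, dummy-coding nominal ones) can be sketched as follows; the column names and values are hypothetical illustrations, not the study's data.

```python
import pandas as pd

# Hedged sketch: z-standardize a continuous predictor and dummy-code a
# nominal predictor against a reference category (here the alphabetically
# first level is dropped, leaving the others as 0/1 indicator columns).
df = pd.DataFrame({
    "gec":  [45.0, 60.0, 75.0],             # continuous (e.g., a GEC T-score)
    "race": ["African", "Asian", "Latin"],  # nominal
})
df["gec_z"] = (df["gec"] - df["gec"].mean()) / df["gec"].std()
df = pd.get_dummies(df, columns=["race"], drop_first=True)  # reference coding
```

Standardizing puts the continuous predictors on a common scale, so their coefficients can be read as the β-weights reported in the Results.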
In these analyses, we first created a "base" model that included the demographic variables that were both found to be of interest through the zero-order correlations and that were also found to be of interest through previous research (e.g., Best et al., 2009;Best et al., 2011;Danielsson et al., 2010;Torske et al., 2017;Vogan et al., 2018). The demographic variables included in the base models were age, gender, race/ethnicity, SES, and IEP status.
These base models served as comparisons for the models that then also included EF-related terms: a main effect of the given EF (metacognition, etc.) and an EF-by-time interaction term. The EF-by-time interaction term tested whether any effect of the EF on that standardized score changed over time, e.g., whether the EF became more impactful as the student matured. This approach allowed us to first account for non-EF factors and then focus specifically on EFs per se.
All time-varying terms (viz., age, EF scores, and standardized exam scores) were nested within students. Students, however, were not nested within their classrooms since classroom composition changed every year and since students could be grouped differently within a year for different courses.
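The modeling strategy described in this plan (years nested within student, a base model, and an EF model adding a main effect plus an EF-by-age interaction) can be sketched on simulated data. Everything below is an illustrative assumption: the variable names, the simulated effect sizes, and the use of statsmodels' MixedLM as one common way to fit such random-intercept models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 60 students observed over 4 years, with a per-student random
# intercept, a small age trend, and a negative EF (gec) effect on scores.
rng = np.random.default_rng(0)
n_students, n_years = 60, 4
student = np.repeat(np.arange(n_students), n_years)
age = np.tile(np.arange(n_years, dtype=float), n_students) - 1.5  # centered
gec = rng.normal(size=student.size)                               # z-scored EF
u = rng.normal(scale=0.5, size=n_students)[student]               # student intercepts
scaled = u + 0.1 * age - 0.4 * gec + rng.normal(scale=0.3, size=student.size)
df = pd.DataFrame({"student": student, "age": age, "gec": gec, "scaled": scaled})

# Base model (no EF terms) vs. EF model (main effect + EF-by-age interaction),
# both with a random intercept per student; ML fits so BICs are comparable.
base = smf.mixedlm("scaled ~ age", df, groups=df["student"]).fit(reml=False)
ef = smf.mixedlm("scaled ~ age + gec + gec:age", df,
                 groups=df["student"]).fit(reml=False)
better = ef.bic < base.bic  # lower BIC indicates the better-fitting model
```

With a genuine EF effect in the simulated data, the EF model's BIC comes out lower, mirroring the model-comparison logic used in the Results.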

Results
This section presents the results of all features of the analytic plan (Tables 1-4) and a summary of the main findings (Table 5). Table 1 presents the correlations of Scantron © Scaled Scores with BRIEF and BRIEF-SR sub-scores and total scores, IEP variables, and demographics. Teacher-reported BRIEF Global Executive Composite (GEC) scores, the overall measure of executive functioning, were strongly and significantly correlated with all three Scaled Scores (rs = -.38 to -.45) (Note 1). Student-reported BRIEF-SR GEC scores correlated weakly with the Scaled Scores; the correlation with Language Arts was moderate (r = -.16) while the correlations with Math and Reading were both low (rs ≈ -.08). BRIEF and BRIEF-SR GEC scores themselves correlated moderately (r = .19). The various individual BRIEF EFs (Emotional Control, etc.) also showed relatively similar correlations with Scaled Scores. The correlations with Language Arts (rs = -.44 to -.48) tended to be stronger than those for either Reading (rs = -.33 to -.41) or Math (rs = -.32 to -.39), but no one EF stood out as especially well correlated.

Correlations
There was somewhat more variation among the individual BRIEF-SR EFs. Task Completion showed relatively stronger correlations than other executive functions with all three Scaled Scores (rs = -.16 to -.23). Inhibit, Monitor, Organization of Materials, and Plan/Organize all correlated more strongly with Language Arts Scaled Scores (rs = -.16 to -.20) than with either Math or Reading Scaled Scores (rs = -.03 to -.10). Emotional Control and Working Memory were not significantly correlated with any of the Scaled Scores (largest r = -.06).
For demographic variables, students' genders correlated significantly with Language Arts and Reading (rs = .13 & .07, respectively) but not Math (r = .01) Scaled Scores, indicating that girls showed a small but significant tendency to out-perform boys on linguistic-related exams. Whether a student was eligible for free / reduced-price school lunch was not significantly correlated with any Scaled Score. Identifying as a member of most races/ethnicities categorized here tended not to correlate significantly with Scaled Scores, except for identifying as Asian-American (rs = .09 to .12) or, for Language Arts scores only, as European-American (r = .06), although even these correlations were small.
In summary, EFs correlated well with performance on these standardized exams. Teacher-reported BRIEF scores correlated more strongly than student self-reported BRIEF-SR scores. The various EFs within the BRIEF and BRIEF-SR tended to show rather similar levels of correlation with standardized scores, although the student-reported BRIEF-SR scores varied more. Gender also correlated with Reading and Language Arts scores, but not Math; other demographic factors were not reliably correlated with the exam scores.
Having an IEP correlated significantly and similarly with all three Scaled Scores (rs = -.34 --.36). The negative correlation indicates that those with IEPs tended to receive lower Scantron © Scaled Scores.
In addition, the various diagnoses within the IEPs showed variable levels of correlation with Scaled Scores. Intellectual impairment correlated about -.23 with all three Scaled Scores, and speech or language impairment correlated strongly with Reading (r = -.28) and Language Arts (r = -.22), and less so with Math (r = -.17). Being diagnosed with a learning disability (rs = -.12 to -.16) and even with social/emotional impairment (rs = -.02 to -.06) correlated significantly with Scaled Scores, but the magnitudes of these correlations were small enough to warrant little attention.
The student's age was not significantly correlated with Scaled Scores (rs = .08 to .22). However, we found an association between age and Scaled Scores when age was nested within student in the multilevel models reported below.
The Scaled Scores themselves were highly intercorrelated (rs = .63 -.70). The entire matrix of correlations between the variables is provided in Appendix 1, available online.

Predictive Modeling: Interactive Effects of the Variables
The zero-order correlations with Scaled Scores suggest that EFs (especially as measured by the teachers) as well as students' IEP status, and perhaps demographics, provide insights into these students' performance on the Scantron exams. However, these correlations only show how each of the variables relates to the Scaled Scores, not how they act together. The matrix of correlations for all of the variables (presented in Appendix 1, available online) indicates that several of the other variables correlate with each other. The individual EFs measured by the BRIEF correlate strongly (mean r = .91), as do those measured by the BRIEF-SR (mean r = .58), and measures between these instruments also correlate mildly (mean r = .14). Special education status also correlates with BRIEF scores (mean r = .25, r with GEC = .26) and gender (r = -.17), but not well with BRIEF-SR GEC scores (r = .09) or sub-scores (mean r = .07), with free / reduced lunch status (r = .05), or with race/ethnicity (mean |r| = .06).

Multilevel (Hierarchical) Linear Models: Joint Effects of the Variables
The picture, however, is not as simple as these correlations suggest. We must investigate the variables' joint effects to see a more complete picture. We did this through a series of multilevel models, first creating a base (null) model without EF-related terms. The base models in Tables 2-4 therefore include all model terms except those related to executive functioning.
We then added to these base models either terms for BRIEF GEC (i.e., total) scores or for BRIEF-SR GEC scores. The BRIEF columns of Tables 2-4 present the changes in model terms after adding the teacher GEC main effect term and the GEC × Age interaction to the base model; the BRIEF-SR columns, after adding those terms for student self-reports. Creating a base model allows us to test whether considering EFs more holistically can improve our understanding of how well students perform on the respective Scaled Score. This improvement can be tested by comparing the Bayesian information criterion (BIC) for the base model against the BIC for the model containing the terms for executive functioning. In all cases, adding the executive functioning terms greatly improved the model fits (smallest ΔBIC = 4916.37; p < .001, for Math Scaled Scores predicted by BRIEF-SR terms).
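As a concrete instance of this comparison, the ΔBIC for a model pair is simply the base-model BIC minus the EF-model BIC; the sketch below uses the Math values reported in the next subsection (small rounding differences aside).

```python
# The model-comparison arithmetic described above: ΔBIC is the base-model
# BIC minus the BIC of the model that adds the executive functioning terms.
# Values are those reported for the Table 2 Math models.
bic_base, bic_brief = 4474.09, 602.19
delta_bic = bic_base - bic_brief  # a large positive drop favors the EF model
```

Because BIC penalizes extra parameters, a large positive ΔBIC means the EF terms improve fit well beyond what their two added parameters would cost.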
Of course, we then also investigated the size and significance of each of the model terms in relation to the scores in math, reading, and language arts.

Math Scaled Scores
The base model in Table 2 shows that students' age, gender, free / reduced lunch status, IEP status, and race/ethnicity are all significant when added together to predict Math Scaled Scores. The effect sizes (β-weights) for each term are also rather large. Using the effect size guidelines given by Kraft (2020) (Note 3) suggests that these are all rather "large" effects for education interventions; even Cohen (1988) would consider them to be between "small-" and "medium-" sized effects. Adding both the main effect for BRIEF GEC scores and the GEC × Age interaction term to the model greatly improved the fit of the model to these data (BIC Base Model = 4474.09, BIC BRIEF Model = 602.19, ΔBIC = 3871.89, p < .001). Adding those two BRIEF terms also changed the significances (and of course the effect sizes) of the other terms. Gender and free / reduced lunch status were no longer significant, reflecting the associations both have with executive functioning that others have often found (e.g., Martoni et al., 2015; Noble et al., 2015).
Teacher-rated overall EFs had a small to medium, significant effect on Math Scaled Scores (β = -0.20, p = .019). The interaction with students' ages, however, was not significant (β = 0.17, p = .179). This lack of an interaction suggests that the magnitude of the effect of EFs on math performance does not appreciably change as adolescents age.
The effect of student self-rated overall EFs showed a similar pattern. The main effect of EFs was significant (β = -0.20, p = .031) while the interaction with age was not. The size of the main effect for student-rated EFs was thus not appreciably different from that for teacher-rated EFs (95% confidence interval for BRIEF GEC, β = -0.36 to -0.03; for BRIEF-SR GEC, β = -0.39 to -0.02).
All demographic terms remained significant in the model with student self-rated executive functioning. This included gender.
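One informal way to see the "not appreciably different" conclusion above is to check whether the two 95% confidence intervals overlap; overlapping intervals are consistent with no detectable difference, though this is only a rough criterion, not the study's formal test. A sketch using the Math intervals reported in the text:

```python
# Rough sketch of the interval comparison: overlapping 95% CIs are
# consistent with no appreciable difference between two coefficients.
def cis_overlap(a, b):
    """True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

brief_ci = (-0.36, -0.03)     # teacher-rated BRIEF GEC, Math
brief_sr_ci = (-0.39, -0.02)  # student-rated BRIEF-SR GEC, Math
overlap = cis_overlap(brief_ci, brief_sr_ci)  # these intervals do overlap
```

Here the intervals overlap almost entirely, matching the conclusion that teacher- and student-rated effects on Math scores were of similar size.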
Reading Scaled Scores
The sizes of the main effects for overall executive functioning's relationships with Reading Scaled Scores (β BRIEF = -0.29, β BRIEF-SR = -0.17) were similar to the sizes of the effects for Math Scaled Scores (β BRIEF = -0.20, β BRIEF-SR = -0.20). However, the effects for Reading scores were more clearly significant, indicating more reliable relationships to reading ability than to math ability.

Language Arts Scaled Scores
Not all race/ethnicity categories significantly predicted Language Arts Scaled Scores in the base model, but the effects for student age, gender, IEP status, and free / reduced lunch status all did (Table 4).
Adding terms for teacher-rated overall executive functioning significantly improved the model (BIC Base Model = 5617.82, BIC BRIEF = 769.58, ΔBIC = 4848.24, p < .001). It led to a large and significant main effect for executive functioning (β = -0.41, p < .001) but no significant interaction with age (β = 0.04, p = .640), and to a loss of significance for the free / reduced lunch term, a change in the significance of race/ethnicities, and generally smaller effect sizes for the demographic terms. This hints towards more complex relationships between demographics, executive functioning, and language arts abilities.
The pattern was similar, but slightly less pronounced, for student self-rated executive functioning. Adding the executive functioning terms also significantly improved the model's fit (BIC BRIEF-SR = 797.91, ΔBIC = 4819.91, p < .001), and generated a significant main effect for executive functioning (β = -0.23, p < .001) but no significant interaction with age (β = -0.06, p = .410). It also led to changes in the size, and sometimes the significance, of the other model terms; free / reduced lunch status was no longer significant (β = 0.21, p = .080), and the pattern of significance among race/ethnicities changed slightly.
The size of the main effect for student-rated executive functioning (95% CI = -0.33 to -0.13) was significantly smaller than that for teacher-rated executive functioning (95% CI = -0.54 to -0.30). The association between executive functioning and language arts ability (when accounting for effects of relevant demographics) is thus weaker when measured by the students themselves than when measured by their teachers.

Effects of Individual Executive Functions
The BRIEF and BRIEF-SR sub-scores showed relatively similar patterns in predicting the three Scaled Scores. The models used to test the effects of individual EFs are available in Appendix 2, available in the supplemental online materials. Those tables show that the BRI and MCI sub-scores were never appreciably different from each other, and this remained true for the individual executive functions that comprise those sub-scores (and the overall GEC score). For Language Arts, for example, the BRIEF BRI and MCI main effects were both comparably large (βs = -0.42 and -0.41, respectively), and the individual executive functioning scores that comprise each were within each other's confidence intervals. The sub-scores for the BRIEF-SR were also very similar (both βs ≈ -0.22), and their individual executive functioning scores also did not significantly differ.
There was more variability of the associations with Scaled Scores among the individual EFs rated by the students themselves through the BRIEF-SR. EFs were again more predictive of Reading and Language Arts scores than Math; in fact, all individual BRIEF-SR EFs were significant predictors of Reading and Language Arts (except Organization of Materials, β = -0.084, t = -1.96, p = .051). However, only BRIEF-SR Task Completion, Plan/Organize, and Monitor significantly predicted Math Scaled Scores.

Comparison of Teacher-Versus Student-Rated Executive Functioning
In addition to investigating whether teacher-rated and student self-rated EFs predicted Scaled Scores individually, we also analyzed whether they predicted the scores relative to each other. The farthest-right set of columns in Tables 2-4 presents the results of these analyses.
Teacher-rated BRIEF GEC scores significantly predicted Math Scaled Scores when both they and student self-rated BRIEF-SR scores were added together in the model. Although the BRIEF-SR GEC main effect had been significant when it was alone in the model, it was no longer significant when BRIEF scores were also added (Table 2). We therefore conclude that teacher and student ratings do not each contribute sufficiently unique information to the prediction of Math scores; the magnitude of the association with BRIEF scores may also have been stronger than that with BRIEF-SR scores.
The pattern of significance among the other predictors (age, gender, etc.) was similar when only BRIEF or BRIEF-SR scores were added to the Math model. Age, IEP status, free / reduced lunch eligibility, and all race/ethnicities were significant, but gender was not.
For the model predicting Reading Scaled Scores, both the main effects for BRIEF and BRIEF-SR scores were significant; teacher and student ratings thus each contributed significantly unique information about that student's reading performance. The BRIEF × Age interaction was also significant; teacher-rated executive functioning became more strongly associated with Reading scores as the students aged. Once again, the pattern of significances of the other predictors resembled those for the models with only BRIEF or BRIEF-SR terms (Table 3).
Finally, for Language Arts Scaled Scores, both the main effect and age interaction terms for the BRIEF remained significant, but neither BRIEF-SR term was. The BRIEF and BRIEF-SR main effects had each been significant when alone in the model; as with Math Scaled Scores, teacher and student ratings do not both contribute sufficiently unique information about the student's performance (Table 4).
The BRIEF × Age interaction had not been significant when BRIEF terms were alone in the model, but it became significant when BRIEF-SR terms were added. It is not immediately clear why the changing effect of teacher-rated executive functioning would become clearer when student self-rated executive functioning (and its non-significant changes over time) was also considered.
The pattern of significances of the other terms was also similar to those when the BRIEF or BRIEF-SR scores were alone in the model, although the effects of self-identifying with any of the races/ethnicities were not significant. Age, gender, and IEP status were still significant, and free/reduced lunch eligibility still was not.

Summary of Results
Table 5 summarizes which terms in the various models were reliably significant predictors of the three Scaled Scores, which were sometimes significant, and which were never significant. Teacher-rated BRIEF GEC scores reliably predicted performance in math, reading, and English language arts across the middle and high school grades. Age was also reliably associated with performance on these standardized exams: older students tended to do better on these exams even though the exam scores are standardized by grade level. BRIEF scores themselves also tended to become more strongly associated with Reading and Language Arts (but not Math) scores as the students aged. Student self-rated BRIEF-SR scores were also significantly associated with performance in all three content areas, but not always when teacher-rated BRIEF scores were also added. BRIEF-SR scores only remained significant, i.e., only continued to provide unique information, when predicting Reading scores. The predictiveness of the BRIEF-SR scores did not change over time (their interaction with age was never significant); whatever information students' self-ratings provided did not change.

Students' gender identification and eligibility for free/reduced lunches reliably predicted performance on the Reading and Language Arts exams, but less so performance on the Math exam. Gender was never a significant predictor of Math scores when any EF terms were considered; free/reduced lunch eligibility was significant whenever BRIEF-SR terms were included. Students' race/ethnicity was sometimes associated with their performance on the three exams; consistent patterns are difficult to detect and would in any case be hard to interpret. Students with IEPs tended to perform more poorly on all three exams; this association remained after either teacher- or student-self-rated executive functioning scores were added.

Relationship Between EFs and Academic Performance on Standardized Tests
This longitudinal study supports the relationship between EFs and academic performance on standardized tests among adolescents in schools that integrate students with and without disabilities. Our findings indicate that the overall levels of adolescent students' EFs, measured both by teachers (via the BRIEF) and by the students themselves (via the BRIEF-SR), significantly predicted those students' performance on standardized reading, math, and English language arts exams. These results support the importance of EFs for academic performance and build upon prior research in which EFs assessed by teachers and by the students themselves predicted GPAs (Samuels et al., 2016, 2019). Best et al. (2011) investigated the relationships between EFs and academic achievement among both children and adolescents and found that EFs were moderately correlated with success in both math and reading achievement, consistent with our findings. Below are some of our specific findings.

Comparison of BRIEF with BRIEF-SR
One of the goals of our study was to compare the contributions of the BRIEF with those of the BRIEF-SR for their uses as experimental tools. We found that BRIEF scores (reported by teachers) outperformed BRIEF-SR scores (self-reported by students) as predictors of academic outcomes. The effect sizes (β-weights) for BRIEF terms showed some tendency to be larger than those for the BRIEF-SR; those BRIEF terms were also more reliably significant, even when BRIEF and BRIEF-SR terms were added to the models together. BRIEF GEC ratings successfully predicted students' standardized reading, math, and English language arts scores across the middle and high school grades; BRIEF-SR scores were predictive but less so. However, when teacher-rated BRIEF scores were also added, BRIEF-SR scores only remained significant, i.e., only continued to provide unique information, when predicting Reading scores. These results suggest that BRIEF and BRIEF-SR scores cannot be used interchangeably to make significant predictions, and they differ somewhat from previous findings (Samuels et al., 2016, 2019) in which the scores of the BRIEF or the BRIEF-SR could be used alone to make significant predictions about how students perform in middle and high school courses. The current study suggests that using both may be unnecessary, although, of course, more research must be conducted to better support that.

The Behavioral Regulation Index (BRI) and the Metacognitive Index (MCI)
It is of considerable significance that both overall EF and its two components, the Metacognitive Index (MCI) and the Behavioral Regulation Index (BRI), correlated with academic performance as measured by standardized testing. This supports prior factor analyses, enhances the overall findings of this study, and can influence academic practice. That is, it suggests that classroom work should include a good deal of emphasis on behavioral regulation, i.e., the "ability to shift cognitive set and modulate emotions and behavior via appropriate inhibitory control" while allowing "metacognitive processes to successfully guide active, systematic problem solving (and supports) appropriate self-regulation" (Gioia et al., 2000, p. 20). The current study's sample comprised students in a school implementing an established, research-based "wellness" curriculum designed for adolescents and focused on social-emotional skills training and problem solving.

Students with Disabilities
Adolescents' EFs, especially when measured by their current teachers, predict performance on standardized academic assessments throughout the middle and high school grades. This effect existed among a rather diverse sample of students both with and without diagnosed disabilities (the latter scoring lower on measures of EF and academic performance, as expected).

Age
Age was reliably associated with performance on these standardized exams: older students tended to do better even though the exam scores were standardized by grade level. BRIEF scores themselves also tended to become more strongly associated with Reading and Language Arts (but not Math) scores as the students aged. EFs typically improve throughout childhood, aligning well with the maturation of the frontal lobes (Anderson, 2002) and with cortical areas that continue to develop throughout adolescence (Faridi et al., 2015; Yakovlev & Lecours, 1967). Cortés Pascual et al. (2018) meta-analyzed a number of studies but could not consistently track the development of EFs within individuals; their meta-analysis can thus be seen as providing a strong, generalizable, but cross-sectional investigation of the moderating effect of age, with most of the studies included therein sampling adolescents near the lower range of those sampled here (Note 4). The current study, however, represents a closer, more controlled investigation of a smaller sample of adolescents who tend to be older and less diverse (Note 5) than those studied by Cortés Pascual et al.
We benefited from having obtained both teacher- and self-reports on EFs every year for the same adolescents. A different teacher rated a given adolescent every year. Although we have not tested this proposition, it may well be that each year's teacher used a different standard to evaluate EFs, with, e.g., a sixth-grade teacher basing their ratings on sixth graders and a twelfth-grade teacher basing theirs on twelfth graders. If so, this would suggest that it is not the improvement in EFs per se that leads to an effect of age (on reading and language arts), but that EFs become increasingly important as adolescents age, whatever the EFs' level of maturation at those ages.
We did not find a significant effect of age on the effects of self-reported EFs on standardized exam performance. We generally found fewer and weaker effects of EFs measured through the students themselves; this is at least partially due to the greater variability within the students' self-reported scores. However, it may also be that adolescents view their own behaviors quite differently than their teachers do (BRIEF GEC and BRIEF-SR GEC scores correlated weakly here, r = .19), and that the criteria used by teachers measure EFs in ways more relevant to academic performance. This conjecture remains untested but gains some support from the fact that the effect of age on teacher-reported EFs was only significant here when both teacher- and student-reported scores were added to the model together; adding both isolates the effects of each from the other (Note 6), making any differences in what those scores measure more acute.
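This partialling logic can be sketched numerically. The following is a purely illustrative simulation (invented effect sizes, not the study's data): two weakly correlated ratings each predict an outcome when entered alone, but the joint model isolates each rating's unique contribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Illustrative simulation: weakly correlated teacher and student EF
# ratings (the observed BRIEF/BRIEF-SR correlation here was r = .19).
teacher = rng.normal(size=n)
student = 0.2 * teacher + rng.normal(size=n)

# The outcome is driven mostly by the teacher rating.
exam = 0.5 * teacher + 0.1 * student + rng.normal(size=n)

def ols_coefs(y, X):
    """Least-squares coefficients for y ~ X (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

ones = np.ones(n)
# Each predictor alone absorbs part of the signal shared with the other...
b_teacher_alone = ols_coefs(exam, np.column_stack([ones, teacher]))[1]
b_student_alone = ols_coefs(exam, np.column_stack([ones, student]))[1]

# ...while the joint model isolates each predictor's unique contribution.
b_joint = ols_coefs(exam, np.column_stack([ones, teacher, student]))[1:]
```

In the joint fit, each coefficient reflects only the variance a rating explains beyond the other, which is why a predictor that is significant on its own can lose significance once the other is added.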

Gender
Identifying as female reliably predicted better performance on the Reading and Language Arts exams, but less so for performance on the Math exam. Gender was never a significant predictor of Math scores when any executive functioning terms were considered, suggesting that the effect of gender is sufficiently mediated by girls' stronger development of EFs here. Our findings do not confirm those reported by Grissom and Reyes (2019), who reviewed several studies and concluded that while individual factors may show a tendency towards gender differences in EFs (e.g., increased impulsive action in males, reduced reaction time in males, avoidance of frequent punishment in females, improved working memory in females), those differences are not overwhelming. Within-gender variability often far exceeds between-gender variability, and in few cases could one look at a given person's data and classify them by their responses as male or female.

Race/Ethnicity
Students' race/ethnicity was sometimes associated with their performance on the standardized exams. Consistent patterns here are difficult to detect, but our findings seem to support results by Reid and Ready (2022), who report on the heterogeneity of EF development. More specifically, they stated that low-SES and Hispanic dual-language-learning children with immigrant parents entered kindergarten with the lowest average EF skills but then made remarkable EF gains. However, low-SES, non-dual-language-learning Black and Hispanic children had similarly low initial EF skills but did not exhibit the same pattern of catch-up, in part due to their reduced likelihood of enjoying positive relationships with their teachers.

Conclusions and Implications: The Importance of EFs in Academic Performance
Adolescents' EFs, especially when measured by their current teachers, predict performance on standardized academic assessments well throughout the middle and high school grades. The current study therefore further supports the body of research underlining the importance of EFs in academics; it also advances that understanding, as follows.
First, we found this effect among a rather diverse sample of students with and without diagnosed disabilities. Although all of these students attended one school, they represent the general population better than most studies on adolescents' EFs, where those with particular disabilities (e.g., ADHD) are the only ones studied. These findings should therefore support the roles of EFs among equally general populations of adolescents.
Second, EFs rated by teachers were more reliably predictive of standardized exam performance than EFs rated by the students themselves. Prior studies (e.g., Samuels et al., 2019) found that adolescents' EFs predicted their GPAs; although such studies lend support to the EF-academics association, the teachers who rated the EFs were among those assigning grades. The current study removes that potential bias, demonstrating the importance of EFs for academic outcomes created and assessed well outside of the school. It is worth noting that we do not believe these results mean that students' self-assessments are either wrong or uninformative. It may be that students self-assess their EFs accurately, but in ways that are less related to their academic performance.
Students' self-assessments also tended to vary more than those made by the teachers. A different teacher assessed the students each year, so the variability in teacher assessments here arises from different sources than the variability in a student assessing themselves over different years. If one's goal is simply to measure EFs in ways more closely associated with academic performance, then asking teachers, and perhaps multiple teachers, is advisable.
Third, we found that the predictiveness of EFs tended to improve across these grades when measured by teachers, but not when measured by the students themselves. EFs were somewhat more strongly associated with performance on Reading and Language Arts exams than on Math exams. Although we cannot say why this is so, it implies that EFs may be even more important among older adolescents, perhaps as they become increasingly entrusted (even expected) to take responsibility for their own academic performance.
Fourth, as expected, students with IEPs tended to display lower EFs scores and to obtain lower scores on these standardized assessments than those without IEPs.
Fifth, students who identified as female tended to show stronger EFs. Nonetheless, gender per se continued to significantly predict exam performance even after EF terms were added to the models (thus partialing out the effects of EFs on the gender-exam association). Therefore, those who identify as female appear to benefit both from generally stronger EFs and from other gender-related factors not directly measured by instruments like the BRIEF and BRIEF-SR.
Sixth, we did not find that any one individual EF was markedly more predictive than any other. This was especially true among those rated by the teachers. It was somewhat less true among those rated by the students themselves, where Task Completion, Plan/Organize, and Monitor were the only individual EFs that significantly predicted Math scores (all individual student-rated EFs significantly predicted Reading and Language Arts scores). It may be that we could not detect real differences between the EFs, but even where they exist, these individual differences appear to matter less than a consideration of EFs more broadly. We therefore recommend measuring at least a wide range of EFs and expecting that general interventions may be more effective than ones tailored to specific EFs.
Finally, an initial analysis of those with a diagnosed disability indicated that a one-year change in age was associated with essentially the same level of academic performance as the prior year when students were placed in the inclusive classrooms of this study. In contrast, Tormanen and Roebers (2018) found that after two years, students with disabilities placed in self-contained classrooms displayed significantly lower scores in academic achievement and EFs. It is clear that more research is needed on this subject.
Together, these findings suggest that EFs offer unique and important insights into adolescents' performance on standardized exams. Researchers can further investigate: 1) The relationship between EF and performance for those with diagnosed disabilities; and 2) the mechanisms through which EFs affect performance-especially on high-stakes exams-and whether interventions can help students further strengthen their EFs. Teachers and administrators can consider a holistic development of their students while knowing that addressing broad competencies like EFs can help students with important, specific tasks like performance on standardized exams.

Limitations
The primary limitation of this study is that the sample, though large, contains students who all attended the same middle and high school. In addition, the study design was a single-group observation in an important setting (students with and without disabilities), and the design did not manipulate EFs to investigate more systematic effects on academic performance. More controlled trials are needed in a systematic series of studies.
Following students at only two schools affects the generalizability of the results (their internal validity should not be affected by this). The schools that these students attend are entirely inclusive, at least in the sense that students with and without disabilities share all classes and extracurricular activities. These rare inclusive environments may affect the roles of EFs in students' performance, although we cannot at this time say how.
The importance of EFs may differ among other student bodies. EFs may also develop differently among other populations, so longitudinal studies conducted elsewhere may find stronger, weaker, or simply different effects of time and age. The mixed results found elsewhere (e.g., Cortés Pascual et al., 2018) also suggest that this is a particularly worthwhile area for further study. These limitations notwithstanding, the results of this study extend the contribution of executive functioning to standardized assessments to adolescents with and without diagnosed disabilities, and to both its cognitive and behavioral components.

Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).