Developmental Relationship Between Metacognitive Monitoring and Reading Comprehension



Introduction
Metacognitive processes are components of self-regulatory abilities that help individuals to cope with everyday life challenges. Knowing about, monitoring, and controlling one's cognitive enterprise is important to effectively solve problems in and outside schools (Flavell, 1979;Schraw & Moshman, 1995). Metacognition is "any knowledge or cognitive activity that takes as its object or regulates, any aspect of any cognitive enterprise" (Flavell & Miller, 2002, p. 150). Paris and Byrnes (1989) differentiated between declarative and procedural metacognition. Declarative metacognition refers to conscious and detailed knowledge about factors in tasks and situational variables that affect cognitive performance and the reasons behind that (Paris, 2002;Edossa et al., 2019). The declarative facet of metacognition is mainly "verbalizable, stable, and late-developing" (Schneider, 2015, p. 258). Procedural metacognition, on the other hand, is the application of metacognitive aspects such as monitoring, controlling, and regulating (Fritz et al., 2010;Nelson & Narens, 1994).
Monitoring is one component of procedural metacognitive processes and refers to a live awareness of task performance such as text comprehension (Schraw & Moshman, 1995). Nelson and Narens (1990) distinguished between prospective monitoring and retrospective monitoring. The three common types of prospective monitoring are ease-of-learning, judgment-of-learning, and feeling-of-knowing (Nelson & Narens, 1994). On the other hand, confidence judgement is one aspect of retrospective monitoring in which individuals reflect on the accuracy of a previous recall response (Nelson & Narens, 1990). The focus of the present study is retrospective monitoring, specifically judgements that follow cognitive performance (commonly known as postdictions), and its developmental interrelationship with reading comprehension.

Metacognitive Monitoring and Academic Achievement
The role of self-regulation, such as emotional and behavioral regulation (Edossa et al., 2017) and metacognition (Schneider & Artelt, 2010; Tibken et al., 2021; Edossa et al., 2019), in academic achievement is established. Specifically, the concept of metacognition seems applicable in the explanation of children's production deficiencies in several cognitive tasks (Schneider, 2015). Production deficiency refers to children's ability to deploy a strategy when it is taught, combined with an inability to execute it spontaneously (for a review, see Schneider & Pressley, 1997). Metacognition has been positively linked with academic achievement in general and with the process of reading in particular (Pintrich, 2002; Schneider & Pressley, 1997; Zargar et al., 2020). Metacognition is assumed to play a fundamental role in reflecting on live reading processes and in choosing and employing relevant strategies (Artelt & Schneider, 2015; Soto et al., 2020). Students who are capable of regulating their learning process are actively engaged in metacognitive activities before and after reading to construct meaning from a text (Paris et al., 1991; Schneider & Pressley, 1997). As with other facets of metacognition, a positive association has been reported between procedural metacognition and academic achievement (Schneider & Pressley, 1997).
Specifically, the monitoring aspect of procedural metacognition plays a vital role in the learning process by providing information about the current epistemic state to regulate the process of learning, effectively (Nelson & Narens, 1994). Metacognitive judgment in the process of monitoring helps us to observe and reflect on our cognitive process (Flavell, 1979). It has a valuable role in self-regulated learning in terms of the allocation of time and employment of appropriate learning strategies (Thiede & Dunlosky, 1999). The decision to continue a study item widely depends on the feedback from the judgement accuracy of students in the monitoring process (Hacker et al., 2000;Thiede & Dunlosky, 1999). Premature termination of study and unnecessarily prolonged study time can be prevented when students acquire the skill to accurately judge their learning process and performance (Hacker et al., 2000). Accurate predictors focus and exert their energy on the right items because they have an insight into the cognitive demand of the task and their ability to perform the given task (Dunning et al., 2003).
However, the relationship between metacognition and academic achievement is not necessarily unidirectional. There is a line of theoretical assumptions that the developmental relationship between metacognition and academic achievement can be reciprocal (Flavell & Wellman, 1977; Schneider & Pressley, 1997). Dunning et al. (2003) assume that the ability required to accurately judge one's performance is similar to the skill required to correctly solve a given cognitive task. Therefore, they argue that if people fail to produce correct responses, they are doubly cursed with the inability to be aware of whether their responses are correct or wrong. For instance, incompetent readers fail to judge how well they have comprehended a text (Dunning et al., 2003). This implies the importance of knowledge for the metacognitive monitoring process. Tibken et al. (2021) investigated the effects of metacognitive competence (declarative and procedural metacognition) on the development of school achievement in gifted and non-gifted adolescents, and the results showed that metacognitive competence predicted school achievement (including reading comprehension) for both groups of participants. The study linked low metacognitive competence to underachievement among gifted children. Roebers et al. (2014) demonstrated a strong and positive effect of metacognitive monitoring on test performance in a study conducted among 9- and 11-year-old children (β = .76 and β = .52, respectively). Similarly, Stankov et al. (2012) found the accuracy of judgement to be the best predictor of mathematics and reading achievement. Roderer and Roebers (2010) found that the highest achieving students were most accurate in performance prediction. Overall, the positive role of metacognitive competence in academic achievement seems to be well established (Bouffard et al., 1998; Tibken et al., 2021; Roebers et al., 2014).
On the other hand, King and McInerney (2016) examined the longitudinal relationship between metacognitive strategy (changing to improve, monitoring and planning) and academic achievement (mathematics and reading comprehension). They employed self-reported measures to assess metacognitive processes and cross-lagged analysis to investigate the developmental interplay between metacognitive strategy and academic achievement. However, they found a contradictory result to the established literature that metacognitive processes had no statistically significant longitudinal effects on academic achievement. The study revealed a uni-directional influence from academic achievement to metacognitive strategy only.
Cai et al. (2019) studied the developmental interplays among goal, metacognitive strategy, and school achievement longitudinally across three time points, and the authors reported a bi-directional developmental interplay between metacognitive strategy and academic achievement from time points 1 to 2. However, the developmental interplay became uni-directional from time points 2 to 3. Only metacognitive strategy had a statistically significant prediction on academic achievement in the later time points. Taken together, the literature on the developmental association between metacognitive monitoring and academic achievement appears inconsistent. Therefore, the main aim of the present study is to examine the developmental relationship between metacognitive monitoring and reading comprehension by taking into account socioeconomic status (SES) since past investigations have shown a positive effect of SES on both metacognition and achievement (Thompson & Foster, 2014;Yerdelen-Damar & Peşman, 2013). Meta-analytic studies reported that SES and academic achievement have a moderate to strong relationship (Sirin, 2005;Liu, Peng, & Luo, 2020). Metacognition was reported to play a mediator role in the relationship between SES and academic achievement (e.g., Yerdelen-Damar et al., 2013). SES might also moderate the relationship between metacognition and academic achievement (e.g., Koyuncu et al., 2021). Therefore, SES is controlled to rule out the confounding effects of SES on the result of the study.

Implications of Under- and Overestimation on Achievement
The role of monitoring accuracy in academic achievement has been discussed from a metacognitive perspective. Low monitoring accuracy might be due to under- or overestimation. Researchers from the perspectives of social psychology, personality, and motivation further debate the implications of under- and overestimation of one's performance for the healthy developmental adaptation of children. Overall, people tend to overestimate their competence in several areas such as reasoning, grammatical ability, and social skills (Kruger & Dunning, 1999; Dunning et al., 2003). Generally, high confidence is observed for correct answers across all age groups, but young children tend to be more overconfident than older children and adults for answers that turn out to be incorrect (Roderer & Roebers, 2010). While the metacognitive research perspective emphasizes the importance of judgement accuracy for effective regulation of learning, the perspectives from social psychology, personality, and motivation studies have mixed assumptions about the adaptive implications of under- and overestimation in and outside school (Bouffard & Narciss, 2011). Drawing on assumptions from social cognitive theory, some argue that overestimation of one's performance is useful for academic competence because it boosts motivation and enhances persistence in challenging contexts (Bouffard et al., 2006; Bouffard & Narciss, 2011). They argue that overestimation can help students solve novel and difficult tasks by making them persistent and motivated, tasks that they would not pursue if they judged their competence accurately (Bjorklund & Bering, 2002; Shin et al., 2007). As a result, consistent underestimation of performance is assumed to be limiting and detrimental to academic achievement in terms of motivation (Bouffard & Narciss, 2011).
Contrary to this assumption, the other line of the literature suggests that overestimation may prevent students from identifying their learning needs, leaving them unable to regulate their learning process effectively and to put the required effort into their study (Dunlosky et al., 2005; Butler & Winne, 1995). Therefore, students who overestimate their performance might not be well prepared and might not ask other people for help (Stone & May, 2002), and thus may achieve less than students who moderately underestimate their achievement. In addition, because low achieving students may not notice their poor performance, they might engage in overestimation (Dunning et al., 2003). As with the theories, the empirical evidence is mixed. For instance, some studies positively associated overestimation with higher academic achievement. In contrast, other studies linked underestimation to high academic achievement (Chiu & Klassen, 2010; Gonida & Leondari, 2011).
Studies conducted from the perspective of metacognition focus on the positive role of monitoring accuracy on academic performance and vice versa. Other researchers assume overestimation or self-enhancement of one's performance could have a positive effect on academic achievement via motivation while underestimation might be detrimental. The difference across different research perspectives is not only in theoretical assumptions but also in terminologies and methods used. Studies conducted in light of social learning theory use social comparison or social consensus as a criterion to determine self-evaluation bias (Bouffard & Narciss, 2011). However, this method is criticized because it is dependent on the attitude of other people (Kwan et al., 2008). To deal with this limitation, metacognition researchers use specific assessment criteria such as test scores to determine the accuracy of judgement of one's competence. There are two popular methods to measure the accuracy of judgement of performance about test scores. The first method calculates a deviation score by subtracting the criterion (actual test score) from self-evaluation/estimation. However, this method has been criticized, especially, in studies that focus on the relationship between judgement and academic competence because the deviation score is confounded with competence (Dufner et al., 2015). To overcome this limitation of deviation score, a second method is recommended which is computed by regressing the self-evaluation (estimation of one's performance) on the criterion (actual test score) and taking the residual score to measure the level of judgment accuracy (Dufner et al., 2015;Gonida & Leondari, 2011). Therefore, in the present study, the residual score was mainly used to measure accuracy of judgement to find out its longitudinal relationship with reading comprehension using large-scale data from the German National Educational Panel Study (NEPS).

Hypothesis
The literature on procedural metacognition, metacognitive monitoring in particular, and its role in academic achievement is mixed. Researchers from the perspective of metacognition have emphasized the importance of monitoring accuracy for effectively employing appropriate strategies and allocating proper time and energy for academic success. In addition, academic achievement is assumed to have effects on the development of metacognitive monitoring accuracy, although longitudinal empirical investigations are scant. Moreover, it is not clear whether the positive role of monitoring accuracy in achievement in general, and reading comprehension in particular, holds true for both under- and overestimating students. Therefore, the study intends to fill this gap by testing the following hypotheses using a residual-based score of judgement, which is less confounded with achievement.
• Metacognitive monitoring has a positive effect on the development of reading comprehension from grades 5 to 7 and grades 7 to 9.
• In turn, reading comprehension has a positive effect on the development of metacognitive monitoring from grades 5 to 7 and grades 7 to 9.
• In addition, we aimed to test whether the above hypotheses hold true for both under- and overestimating students.

Participants and Procedures
The study uses data from the National Educational Panel Study (NEPS), a longitudinal large-scale project on education and development from early childhood to late adulthood in Germany (Blossfeld et al., 2011). The present study focused on the starting cohort of grade 5, which consisted of regular students randomly drawn using a multi-stage stratified cluster sampling technique (starting from the year 2010/2011) to make sure that the sample is representative of students of regular secondary schools in Germany (for more details on the sampling procedure, see Steinhauer et al., 2015). The present study focused on the main sample (n = 5,870) and followed the participants at three time points: grade 5 (10 years old), grade 7 (12 years old), and grade 9 (14 years old). At the later time points, additional refreshment participants were recruited. However, these participants were not included in the present study because no data were collected from them at the initial wave (grade 5). Therefore, the focus of the present study is the main sample, of which approximately 52% were boys. The competence tests were administered in small groups by trained test administrators at the respective schools. The assessment was paper-based at each of the measurement points. More information on the procedures and the test and survey instruments used in the starting cohort of grade 5 of the NEPS is available online (https://www.neps-data.de/Data-Center/Data-and-Documentation/Start-Cohort-Grade-5/Documentation).

Reading Comprehension
The framework for the assessment of reading competence in the NEPS focuses mainly on the functions of texts, the types of text connected to these functions, and how they are related to the cognitive requirements of reading (Gehrer et al., 2013; Weinert et al., 2011). The competence was measured using five text functions: a) informational texts, b) commenting or argumentative texts, c) literary texts, d) instruction texts, and e) advertising texts (Gehrer et al., 2013). The cognitive requirements were finding information in the text, drawing text-related conclusions, and reflecting and assessing. The approximate length of each text was from 200 to 550 words. Most of the tasks were designed in a multiple-choice format. The rest of the tasks were in a decision-making or matching format. In the decision-making task, participants were asked whether a given statement was correct or incorrect, while the matching task involved selecting a title for a corresponding text (Gehrer et al., 2013). The test duration was 28 minutes at each of the measurement points. The test consisted of a total of 32 items in grade 5, 29 items in grade 7, and 32 items in grade 9. The items were reported to support a unidimensional structure. They had good item fit and high reliability scores (.76 to .78), and were measurement invariant across subgroups (Pohl et al., 2012). Except for the first wave (grade 5), the tests were adapted to the ability of the participants, and respondents with low ability were given tests with relatively less difficult items. The test scores were linked across time points to make sure they are comparable over time for longitudinal investigation (Fischer et al., 2016). Weighted maximum likelihood estimate (WLE) scores were used instead of sum and mean scores to take the item difficulty into account.
Lockl, 2013), in which children reflect on the accuracy of a previous recall response (Nelson & Narens, 1994).
Specifically, after completing reading tests, participants were asked to estimate their performance ("how many of the questions did you presumably answer correctly?") (Lockl, 2013). The judgements were global as well as text specific. In the present study, we used text specific judgements. Two types of scores were calculated based on the estimation of children's performance. The first score was the proportion of the estimated correctly solved number of items. This was calculated by dividing the number of items they judged they correctly solved by the total number of items. The second score was a deviation score, which is the difference between the proportions of correctly solved items and the proportion of the estimation of correctly solved items. For a methodological reason, instead of directly using these scores, we calculated different procedural metacognition (judgement accuracy) indexes. First, we predicted the proportion of the estimation of their performance from their actual score in reading competence. Second, we took the absolute value of the residual. We computed this score because previous studies (Dufner et al., 2015) have shown that deviation scores are problematic because they are confounded with the corresponding academic achievement to study their interrelationship. Therefore, the residual based index of procedural metacognition was used as an indicator to model procedural metacognition at a latent level. There were five indicators in grade 5 and three indicators in grades 7 and 9.
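The two steps described above (predicting the estimated performance from the actual score, then taking the absolute value of the residual) can be illustrated with a minimal sketch. It is written in Python rather than R, uses a simple one-predictor least-squares regression, and is not the authors' code; the function name `absolute_residuals` and the toy data are hypothetical, and the actual analysis may have used WLE scores rather than proportions as the predictor.

```python
from statistics import mean

def absolute_residuals(actual, estimated):
    """Regress estimated performance on actual performance (simple OLS)
    and return the absolute residual for each participant.
    Hypothetical helper for illustration only."""
    if len(actual) != len(estimated) or len(actual) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    mean_x, mean_y = mean(actual), mean(estimated)
    sxx = sum((x - mean_x) ** 2 for x in actual)
    if sxx == 0:
        raise ValueError("actual scores must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed estimate minus the estimate predicted from the actual score
    return [abs(y - (intercept + slope * x)) for x, y in zip(actual, estimated)]

# Toy data (assumed, not from the study): actual and judged proportion correct
actual = [0.40, 0.55, 0.70, 0.85]
estimated = [0.70, 0.65, 0.75, 0.80]
accuracy_index = absolute_residuals(actual, estimated)
```

Because the residual is by construction uncorrelated with the predictor in the sample, this index removes the linear dependence of the judgement on the actual score, which is the reason given above for preferring it over the raw deviation score.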

Socioeconomic Status
The socioeconomic status of participants was assessed by ISEI-08 (International Socio-Economic Index of Occupational Status). It is a scaling of occupations in the context of a status attainment process, i.e. how education, occupation, and earnings are obtained (Ganzeboom & Treiman, 2010). ISEI taps the features of occupation that change parents' education into income. ISEI index is assumed to maximize the indirect effect of education on income via occupation and to minimize the direct influence of education on income (Ganzeboom et al., 1992). In other words, occupational status is assumed to convert educational credentials into earnings (Ganzeboom & Treiman, 2010). It is constructed based on the four skill level classifications of the International Standard Classification of Occupation (ISCO-08). For instance, skill level 1 involves the performance of simple and routine physical or manual tasks while skill level 4 involves tasks that require complex problem-solving, decision-making, and creativity such as professional and managerial occupations. We took the highest status in a family.

Analysis
The analysis of the study was performed using R (R Development Core Team, 2015). Structural Equation Modeling (SEM) using the package lavaan (Rosseel et al., 2015) in R was conducted to analyze the measurement and the main models. The comparative fit index (CFI) and the root mean square error of approximation (RMSEA) were used to evaluate the model fit: CFI ≥ .95, RMSEA ≤ .08 (Hu & Bentler, 1999). Specifically, confirmatory factor analysis (CFA) was employed to check the measurement models of the latent variables. In the main analysis, we used cross-lagged panel analysis (Cole & Maxwell, 2003) to test the bidirectional relationship between procedural metacognition and reading competence. We employed measurement invariance testing to make sure that the latent construct, procedural metacognition, is comparable for under- and overestimating participants. A difference in the CFI of > .01 between two consecutive models in invariance testing (e.g., configural and weak measurement invariance models) was regarded as a serious deterioration in model fit (Cheung & Rensvold, 2002).
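The decision rules stated above can be written down as a small sketch. It is in Python for illustration only (the analysis itself was run in R with lavaan); the function names and the example values are hypothetical, and only the cutoffs come from the text.

```python
def acceptable_fit(cfi, rmsea, cfi_min=0.95, rmsea_max=0.08):
    """Model fit rule from the text: CFI >= .95 and RMSEA <= .08."""
    return cfi >= cfi_min and rmsea <= rmsea_max

def invariance_holds(delta_cfi, cutoff=0.01):
    """Invariance rule from the text: a CFI change greater than .01 between
    two consecutive models counts as serious deterioration."""
    return abs(delta_cfi) <= cutoff

# Illustrative values (assumed, not results from the study)
print(acceptable_fit(cfi=0.97, rmsea=0.04))   # both criteria are met
print(invariance_holds(delta_cfi=0.02))       # change exceeds the cutoff
```

Note that a change of exactly .01 does not exceed the cutoff under this rule, so the more constrained model would still be retained.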

Descriptive Statistics
Actual reading competence increased over time for the whole group of participants. The mean of reading comprehension (longitudinally linked WLE scores) was -.02 in grade 5 and increased to .76 in grade 7. In grade 9, the mean score was 1.39. On the other hand, the mean of the participants' estimation of their achievement in reading comprehension decreased slightly from grade 5 to 7 and then remained flat from grade 7 to 9 for the participants as a whole. The means of their estimated scores were .73, .72, and .72 in grades 5, 7, and 9, respectively. This implies that, on average, the participants estimated that they had correctly solved 73% of the questions in grade 5 and 72% of the questions in grades 7 and 9. The absolute values of the deviation between students' estimation of their performance and their actual test score were calculated to show the magnitude of the accuracy of judgment regardless of the direction of the self-bias for the overall group of participants. As can be seen in Table 1, the absolute deviation declined from grade 5 (M = .22) to grade 7 (M = .21) and grade 9 (M = .18). The decline in deviation indicated an improvement in the overall monitoring accuracy over time because it showed a decline in the discrepancy between the estimation of one's performance and the actual score.
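The absolute deviation described in this paragraph can be written as follows. The symbols are introduced here for illustration and are assumptions: E_i is participant i's estimated proportion of correctly solved items, A_i is the actual proportion of correctly solved items, and n is the number of participants.

```latex
\[
  |D_i| = \lvert E_i - A_i \rvert ,
  \qquad
  M_{|D|} = \frac{1}{n} \sum_{i=1}^{n} \lvert E_i - A_i \rvert
\]
```

Because the absolute value is taken, the order of subtraction does not affect this index; it only matters for the signed deviation score used to separate under- from overestimating participants.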
Journal of Educational and Developmental Psychology Vol. 13, No. 1
In addition, the descriptive statistics for the underestimating and overestimating groups are presented in Table 1. Participants who had a deviation score of less than zero (percentage of estimated correct responses in the reading test minus percentage of actual score) were categorized in the underestimating group. On the other hand, participants who had a deviation score of greater than zero were categorized in the overestimating group. A deviation score of zero is assumed to be an indication of perfect judgement accuracy. The grouping was done to show the general trend as well as the difference in reading competence and monitoring accuracy scores between under- and overestimating participants. The longitudinal trend of the mean of reading competence was the same for both the under- and overestimating groups: it increased over time. However, the means of reading competence of the underestimating group were consistently higher than the means of the overestimating group across the three time points. The means of reading competence in the underestimating group were .83, 1.43, and 1.80 in grades 5, 7, and 9, respectively. The means of the overestimating group were -.33, .28, and 1.00 in grades 5, 7, and 9, respectively. The means of the estimated scores of reading comprehension increased over time for the underestimating group while they declined slightly for the overestimating group. As expected, the underestimating group had a lower estimation score than the overestimating group. The mean estimation scores of the underestimating group were .64, .66, and .67 in grades 5, 7, and 9, respectively. The means of the estimation scores of the overestimating group were .77, .77, and .76 at the corresponding time points. As expected, the underestimating group had a negative mean of deviation scores. Compared to the overestimating group, the underestimating group had a narrower deviation in terms of magnitude.
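The grouping rule described above can be sketched as follows. This is an illustrative Python example, not the authors' code; the function names and the toy records are hypothetical. The deviation score is computed as estimation minus actual score, so that negative values indicate underestimation, as stated in the text.

```python
from statistics import mean

def classify(estimated, actual):
    """Return the estimation group for one participant.
    Deviation = estimated proportion minus actual proportion."""
    deviation = estimated - actual
    if deviation < 0:
        return "under"
    if deviation > 0:
        return "over"
    return "accurate"  # zero deviation: perfect judgement accuracy

def mean_deviation_by_group(records):
    """records: list of (estimated, actual) pairs.
    Returns the mean signed deviation for each non-empty group."""
    groups = {}
    for estimated, actual in records:
        groups.setdefault(classify(estimated, actual), []).append(estimated - actual)
    return {name: mean(values) for name, values in groups.items()}

# Toy records (assumed, not from the study)
records = [(0.50, 0.75), (0.75, 0.50), (0.25, 0.50), (1.00, 0.50)]
summary = mean_deviation_by_group(records)
```

With this sign convention, the underestimating group necessarily has a negative mean deviation and the overestimating group a positive one.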
The means of the deviation scores of the underestimating group in grades 5, 7, and 9 were -.10, -.12, and -.11, respectively. On the other hand, the means of the deviation scores of the overestimating group were .19, .19, and .17 in grades 5, 7, and 9, respectively.
Note (Table 1). Overall = all participants; Under = underestimating group; Over = overestimating group; WLE scores were used for reading comprehension; Estimation = participants' estimation of the percentage of items they correctly solved; Deviation = the percentage of the estimation of the number of items they correctly solved minus the actual percentage of the number of items they correctly solved.
In addition, we did a correlational analysis to see the overall relationship between reading competence and procedural metacognition as well as how it differed between the under- and overestimating groups at a cross-sectional level. As can be seen in the correlation matrix for the whole group of participants (Table 2), the correlation between metacognitive monitoring and reading comprehension was moderate at the three time points. For the overall participants, the correlation coefficients between metacognitive monitoring and reading comprehension were .31, .28, and .25 in grades 5, 7, and 9, respectively. For the correlation as well as the cross-lagged panel analysis, we used the absolute residual scores (the calculation procedure is explained in the method section) to tap the monitoring accuracy of participants. The relationship between metacognitive monitoring and reading comprehension seemed stronger for the underestimating than for the overestimating group. As can be seen in Table 3, while the correlation between procedural metacognition and reading competence at grade 5 was r = .58 for the underestimating group, it was r = .33 for the overestimating group.
Similarly, in grade 7 a correlation coefficient of r = .46 was observed between reading comprehension and metacognitive monitoring for the underestimating group. On the other hand, the correlation coefficient was r = .20 for overestimating participants in grade 7. A similar trend was observed in the later grades: the correlation was higher for the underestimating than for the overestimating group of participants.

Main Cross-Lagged Model
Before proceeding to the main analysis, we checked the measurement model of the latent variable, metacognitive monitoring. Accordingly, we conducted a longitudinal CFA in grades 5, 7, and 9. The cross-lagged model was mainly analyzed to see whether metacognitive monitoring and reading competence had a bidirectional relationship from grade 5 to 7 and from grade 7 to 9. To control for previous metacognitive monitoring and reading comprehension, the autoregressive effects were taken into account. While metacognitive monitoring was modeled at a latent level, longitudinally linked WLE scores of reading comprehension were used in the cross-lagged panel analysis. Missing values were handled using full information maximum likelihood (FIML) estimation.
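The structure of the cross-lagged model described above can be summarized in two regression equations per interval. The notation is introduced here for illustration and is an assumption: RC_t and MM_t denote reading comprehension and latent metacognitive monitoring at wave t (grade 5, 7, or 9), and SES enters as a covariate in one plausible way; the exact specification used in the analysis is not given in the text.

```latex
\begin{align}
  RC_{t+1} &= \alpha_{1} + \beta_{RC}\, RC_{t} + \gamma_{MM \to RC}\, MM_{t}
              + \delta_{1}\, SES + \varepsilon_{RC,\,t+1} \\
  MM_{t+1} &= \alpha_{2} + \beta_{MM}\, MM_{t} + \gamma_{RC \to MM}\, RC_{t}
              + \delta_{2}\, SES + \varepsilon_{MM,\,t+1}
\end{align}
```

Here the β coefficients are the autoregressive (stability) effects and the γ coefficients are the cross-lagged effects. A reciprocal relationship corresponds to both γ coefficients differing from zero.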
For the overall participants, we found low developmental stability in metacognitive monitoring from grade 5 to 7 (β = .20, p < .01), but a slight increase was observed in the later grades from 7 to 9 (β = .30, p < .01). On the other hand, moderate developmental stability in reading comprehension was observed from grade 5 to 7 (β = .54, p < .01) and from grade 7 to 9 (β = .56, p < .01), as depicted in Figure 1. The cross-lagged panel analysis revealed that reading comprehension at grade 7 was positively predicted (β = .09, p < .01) by procedural metacognition at grade 5 after controlling for reading comprehension at grade 5 (see Figure 1). Similarly, a slightly smaller but significant cross-lagged effect from metacognitive monitoring to reading competence was shown in the later grades. Metacognitive monitoring at grade 7 positively and significantly predicted reading comprehension at grade 9 (β = .07, p < .01) after controlling for reading comprehension at grade 7. In the opposite direction, stronger effects from reading comprehension to metacognitive monitoring were observed. Reading comprehension in grade 5 had a positive and significant developmental effect on metacognitive monitoring at grade 7 (β = .18, p < .01) after controlling for metacognitive monitoring in grade 5 (see Figure 1). In the later grades, metacognitive monitoring at grade 9 was significantly and positively predicted by reading comprehension at grade 7 (β = .17, p < .01) after controlling for metacognitive monitoring at grade 7, as depicted in Figure 1. Overall, the cross-lagged panel analysis showed the developmental relationship between reading competence and metacognitive monitoring to be reciprocal. However, there was a consistent difference in the strength of the effects: reading comprehension had a stronger effect on metacognitive monitoring than metacognitive monitoring had on reading comprehension.
This difference was statistically confirmed using a χ²-difference test that evaluated the model fit difference between a restricted model (i.e., one in which the reciprocal effects were constrained to be equal) and an unrestricted model. The χ²-difference test showed the effect of reading comprehension to be significantly greater than the effect of metacognitive monitoring from grade 5 to 7 (Δχ²(1, N = 5,754) = 13.75, p < .001) and from grade 7 to 9 (Δχ²(1, N = 5,754) = 5.96, p < .05).
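For a χ²-difference test with one degree of freedom, as reported above, the p-value can be recovered from the test statistic with the standard library alone. The following Python sketch is an illustration, not the authors' procedure; the function name is hypothetical. It relies on the fact that a χ² variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability equals erfc(sqrt(x / 2)).

```python
import math

def chi2_sf_df1(stat):
    """Upper-tail probability (p-value) of a chi-square statistic
    with exactly 1 degree of freedom."""
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))

# Difference statistics reported in the text (df = 1 for each interval)
p_grade_5_to_7 = chi2_sf_df1(13.75)
p_grade_7_to_9 = chi2_sf_df1(5.96)
```

This simple difference applies to models estimated with standard maximum likelihood; with robust (scaled) test statistics a scaled difference test would be needed instead.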

Multi-Group Cross-Lagged Models (Under- vs. Overestimation)
To further understand how the longitudinal relationship between reading comprehension and metacognitive monitoring differs between under- and overestimating participants, a multi-group cross-lagged panel analysis was conducted. We could not compute a multi-group model including all three time points because group membership changed over the period. If the grouping variable had been constant (such as sex), we could have computed the multi-group cross-lagged models for the three time points in tandem. Therefore, we divided the time points into two intervals. First, we conducted the multi-group cross-lagged model for grades 5 and 7 between those participants who under- and overestimated their performance in grade 5. Subsequently, we did the same analysis for grades 7 and 9 between those participants who under- and overestimated their performance in reading in grade 7.
Before computing the multi-group cross-lagged models, we checked the measurement invariance of the latent variable, metacognitive monitoring. For the first grouping (between under- and overestimating groups at grade 5), the model fit indices of the configural measurement invariance model of procedural metacognition were acceptable (χ² = 53.16, df = 38, p < .001; CFI = .99; RMSEA = .01), which suggested a similar factor structure in the two groups (see Table 4). In the next step, weak measurement invariance testing, the factor loadings were constrained to be equal between the under- and overestimating groups, and this did not lead to meaningful model fit deterioration (χ² = 70.45, df = 44, p < .001; ΔCFI = .00), as ΔCFI was not above the cutoff point (.01) suggested by Cheung and Rensvold (2002). In the third step, testing for strong measurement invariance, the factor loadings and intercepts were constrained to be equal across the two groups. The fit indices did not exhibit meaningful model fit deterioration (χ² = 96.54, df = 50, p < .001; ΔCFI = .01) because ΔCFI did not exceed the threshold. However, in the strict invariance model (factor loadings, intercepts, and residuals constrained to be equal), the model fit showed significant deterioration (χ² = 399.99, df = 58, p < .001; ΔCFI = .15) because ΔCFI was above the cutoff point of .01.
Note (Table 4). Configural = similar factor structure; Weak = factor loadings are constrained to equality; Strong = intercepts are constrained to equality in addition to factor loadings; Strict = residuals are constrained to equality in addition to factor loadings and intercepts.
We ran the same measurement invariance tests for metacognitive monitoring in the under- and overestimating groups at grade 7, and the changes in the model fit indices confirmed strong measurement invariance, as presented in Table 4. Therefore, strong measurement invariance was imposed in the subsequent multi-group cross-lagged panel models. This rules out the explanation that differences observed between the two groups merely reflect a difference in the meaning of the measure between under- and overestimating participants.
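The stepwise decision rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis code: the function name `highest_invariance_level` and the list-of-pairs input are hypothetical, and the ΔCFI values are those reported in the text for the grade 5 grouping.

```python
# Cutoff for a meaningful drop in CFI, following Cheung and Rensvold (2002).
DELTA_CFI_CUTOFF = 0.01


def highest_invariance_level(delta_cfi_by_step, cutoff=DELTA_CFI_CUTOFF):
    """Return the most restrictive invariance level that is still supported.

    delta_cfi_by_step: (level_name, delta_cfi) pairs ordered from the least
    to the most restrictive model, where delta_cfi is the drop in CFI
    relative to the preceding, less restrictive model.
    """
    supported = "configural"  # baseline model, assumed to fit acceptably
    for level, delta_cfi in delta_cfi_by_step:
        # A drop above the cutoff means the added constraints do not hold.
        # The small tolerance guards against floating-point noise at .01.
        if delta_cfi > cutoff + 1e-9:
            break
        supported = level
    return supported


# Delta-CFI values reported for the grade 5 grouping.
grade5_steps = [("weak", 0.00), ("strong", 0.01), ("strict", 0.15)]
print(highest_invariance_level(grade5_steps))  # prints "strong"
```

The rule stops at the first step whose ΔCFI exceeds the cutoff, which is why strong rather than strict invariance is carried into the multi-group models.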
The fit indices of the multi-group cross-lagged models (Figures 2 and 3) based on the grouping at grade 5 (χ² = 293.45, df = 88, CFI = .96, RMSEA = .03, SRMR = .04) and grade 7 (χ² = 155.40, df = 50, CFI = .97, RMSEA = .03, SRMR = .04) were acceptable. For the underestimating group, there was a significant and positive reciprocal effect between reading comprehension and metacognitive monitoring from grade 5 to 7, taking the prior relationship into account (see Figure 2). Metacognitive monitoring at grade 5 had a positive and significant effect on the development of reading comprehension at grade 7 (β = .21, p < .01), controlling for reading competence at grade 5. Likewise, reading comprehension in grade 5 had a positive and significant effect (β = .15, p < .05) on the development of metacognitive monitoring in grade 7, controlling for metacognitive monitoring in grade 5. For the overestimating group, there was also a reciprocal developmental relationship between metacognitive monitoring and reading comprehension from grade 5 to 7, although the effects were smaller than in the underestimating group. Metacognitive monitoring at grade 5 had a positive and significant developmental effect (β = .07, p < .05) on reading comprehension in grade 7 after controlling for reading comprehension at grade 5. In the other direction, there was a positive and significant effect (β = .11, p < .05) from reading comprehension at grade 5 to metacognitive monitoring at grade 7 after controlling for metacognitive monitoring at grade 5.
Figure 2. Note. [...] are for the underestimating group of participants; strong measurement invariance was imposed between the groups; the SES of the participants was controlled; *p < .05, **p < .01.
Figure 3. Note. The groups were those who underestimated and overestimated their performance at grade 7; n (under) = 1,680, n (over) = 2,441; χ² = 155.40, df = 40, CFI = .97, RMSEA = .03, SRMR = .04; the parameters on the left side are for the underestimating group of participants; strong measurement invariance was imposed between the groups; the socioeconomic status of participants was controlled; *p < .05, **p < .01.
However, the developmental relationship between metacognitive monitoring and reading comprehension differed between the under- and overestimating groups at the later time points, from grade 7 to 9. The relationship became unidirectional for both groups, but in opposite directions. For the underestimating group, there was a positive and significant effect (β = .12, p < .01) from metacognitive monitoring in grade 7 to reading comprehension in grade 9. For the overestimating group, there was a significant effect in the opposite direction (β = .21, p < .01), from reading comprehension at grade 7 to metacognitive monitoring at grade 9. The effect from reading comprehension at grade 7 to metacognitive monitoring at grade 9 became nonsignificant for the underestimating group, and the effect from metacognitive monitoring at grade 7 to reading comprehension at grade 9 became nonsignificant for the overestimating group, as depicted in Figure 3.
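The logic of a cross-lagged effect, where a later outcome is regressed on its own earlier value plus the earlier value of the other variable, can be illustrated with a simplified two-wave sketch. It departs from the reported models in several ways that should be read as assumptions: it uses observed rather than latent variables, a single group, simulated data with made-up path values, and unstandardized ordinary least squares instead of structural equation modeling. All function and variable names are hypothetical.

```python
import random


def two_predictor_ols(outcome, pred_a, pred_b):
    """Return the OLS slopes of outcome on pred_a and pred_b (with intercept)."""
    n = len(outcome)
    mean_y = sum(outcome) / n
    mean_a = sum(pred_a) / n
    mean_b = sum(pred_b) / n
    # Centered sums of squares and cross-products.
    s_aa = sum((a - mean_a) ** 2 for a in pred_a)
    s_bb = sum((b - mean_b) ** 2 for b in pred_b)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(pred_a, pred_b))
    s_ay = sum((a - mean_a) * (y - mean_y) for a, y in zip(pred_a, outcome))
    s_by = sum((b - mean_b) * (y - mean_y) for b, y in zip(pred_b, outcome))
    # Solve the 2x2 normal equations directly.
    det = s_aa * s_bb - s_ab ** 2
    slope_a = (s_bb * s_ay - s_ab * s_by) / det
    slope_b = (s_aa * s_by - s_ab * s_ay) / det
    return slope_a, slope_b


def simulate_two_waves(n=5000, seed=0):
    """Simulate monitoring and reading at two waves with made-up path values."""
    rng = random.Random(seed)
    mon1, read1, mon2, read2 = [], [], [], []
    for _ in range(n):
        m1 = rng.gauss(0, 1)
        r1 = 0.5 * m1 + rng.gauss(0, 1)              # correlated at wave 1
        r2 = 0.6 * r1 + 0.2 * m1 + rng.gauss(0, 1)   # cross-lag: monitoring -> reading
        m2 = 0.4 * m1 + 0.1 * r1 + rng.gauss(0, 1)   # cross-lag: reading -> monitoring
        mon1.append(m1)
        read1.append(r1)
        mon2.append(m2)
        read2.append(r2)
    return mon1, read1, mon2, read2


mon1, read1, mon2, read2 = simulate_two_waves()
# Each later variable is regressed on its own earlier value (autoregressive
# path) and on the earlier value of the other variable (cross-lagged path).
auto_read, cross_mon_to_read = two_predictor_ols(read2, read1, mon1)
auto_mon, cross_read_to_mon = two_predictor_ols(mon2, mon1, read1)
print(f"monitoring -> reading: {cross_mon_to_read:.2f}")
print(f"reading -> monitoring: {cross_read_to_mon:.2f}")
```

A reciprocal relationship corresponds to both cross-lagged slopes differing from zero; a unidirectional one corresponds to only one of them doing so once the autoregressive paths are controlled.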

Discussion
The study examined whether the relationship between metacognitive monitoring and reading comprehension is reciprocal. Its particular focus was the monitoring aspect of procedural metacognition, measured by the accuracy of self-judgments of performance in reading comprehension. The findings confirmed the hypothesis that metacognitive monitoring and reading comprehension have a reciprocal developmental relationship from grade 5 to 7 and from grade 7 to 9. The observed positive effects of metacognitive monitoring on reading comprehension support the established literature on the importance of procedural metacognition for academic achievement in general and reading competence in particular (Pintrich, 2002; Schneider, 2015). From a metacognition research perspective, monitoring can be assumed to be useful because it provides accurate information about the current epistemic state, which is needed to regulate learning processes (Nelson & Narens, 1994). This information helps students allocate their study time properly and employ appropriate learning strategies (Thiede & Dunlosky, 1999). Accurate monitoring thus provides ongoing feedback on one's learning process and initiates self-regulation, such as the decision to terminate or prolong study time (Hacker et al., 2000; Thiede & Dunlosky, 1999). Consistent with the present study, previous empirical investigations have also shown that metacognitive monitoring accuracy plays a positive role in reading comprehension (Bouffard et al., 1998; Roebers et al., 2014; Stankov et al., 2012).
Overall, students' metacognitive monitoring accuracy had positive and significant developmental effects on reading comprehension from grade 5 to 7 and from grade 7 to 9. However, reading comprehension had stronger reciprocal effects on later metacognitive monitoring accuracy from grade 5 to 7 and from grade 7 to 9. The reciprocal effects were small, but it should be noted that autoregressive effects and SES were controlled. This longitudinal finding is consistent with the general theoretical assumption that metacognition and academic achievement might be mutually interdependent throughout development (Flavell & Wellman, 1977; Schneider, 1985; Schneider & Pressley, 1997). With respect to metacognitive monitoring specifically, the skill required to judge one's reading performance accurately is similar to the skill required to comprehend a given text effectively (Dunning et al., 2003). Low achievers might therefore face a double curse: they fail to judge their performance accurately just as they fail to comprehend a given text (Dunning et al., 2003; Miller & Geraci, 2011). This points to the role of subject-matter knowledge in the later development of monitoring accuracy. Among the few previous studies on the reciprocal relationship between procedural metacognition and reading comprehension, the findings of the present study appear to be partially inconsistent with a recent study of secondary school students by King and McInerney (2016). Instead of a reciprocal relationship, they found a unidirectional effect from academic achievement (mathematics and reading comprehension) to procedural metacognition (monitoring and planning). However, it should be noted that the measures differed between the two studies: King and McInerney used a self-report measure to assess metacognitive ability.
In addition to the overall investigation of the developmental relationship between metacognitive monitoring and reading comprehension, the study examined whether the result observed in the overall sample holds for both the under- and overestimating groups. The multi-group cross-lagged panel analysis indicated that metacognitive monitoring and reading comprehension had a reciprocal developmental relationship from grade 5 to 7. That is, monitoring accuracy and reading comprehension were mutually interdependent in both groups, although the association was more pronounced in the underestimating group. However, the relationship became unidirectional from grade 7 to 9: the effect ran from metacognitive monitoring to reading achievement for the underestimating group, and from reading to metacognitive monitoring for the overestimating group. This could be because the overestimating group had lower reading achievement at the initial time point (as observed in the descriptive statistics in Table 1), so that gains in reading could help these students improve their monitoring accuracy at grade 9. Underestimating students, on the other hand, were already competent in reading comprehension during the initial period, so effectively monitoring their learning process could further contribute to the development of their reading comprehension.
In addition, the association between metacognitive monitoring and reading comprehension was more pronounced for the underestimating group in the cross-sectional correlation analysis at the three time points. When the multi-group cross-lagged panel analysis is combined with the cross-sectional correlation analysis, monitoring accuracy appears to be useful for the later development of reading comprehension. However, judgement accuracy seemed more important for students who underestimated their competence than for those who overestimated it. This was evident even after imposing strong measurement invariance between the two groups in the longitudinal analysis. Moreover, the mean reading comprehension scores of the underestimating group were higher than those of the overestimating group. In contrast, the mean deviation scores of the underestimating group were smaller than those of the overestimating group across the three time points. Therefore, the higher reading comprehension means of the underestimating group are not surprising, given that the mean estimation scores of this group were closer to their actual scores than those of the overestimating group, as accuracy leads to high achievement. This could also mean that excessive overestimation is more hindering than moderate underestimation. A self-handicapping tendency might be another explanation for why the underestimating group had higher reading achievement scores than the overestimating group. Some students might underestimate their achievements to protect their self-esteem (Elliot & Church, 2003). On the other hand, the fact that accuracy and achievement were more strongly associated in the underestimating group than in the overestimating group in the longitudinal analysis might imply that excessive underestimation impairs the development of reading comprehension to a greater extent than excessive overestimation. This could be because excessive underestimation might create negative emotions that limit motivation (Bouffard & Narciss, 2011).
To summarize, the study illustrates the importance of metacognitive monitoring in providing the right ongoing feedback to regulate learning processes as required (Hacker et al., 2000; Thiede & Dunlosky, 1999). Low metacognitive accuracy could result from either under- or overestimation of one's performance. Consistent underestimation of one's performance might have a negative impact on reading comprehension because it affects motivation (Efklides, 2006). From the perspective of metacognition research, overestimation is also assumed to have a negative impact on academic achievement, as it might be an obstacle to identifying one's own learning needs, which might make students less prepared and less interested in seeking help from others (Butler & Winne, 1995; Dunlosky et al., 2005; Stone & May, 2002). While the results of the study suggest that accurate judgement benefits reading comprehension, excessive underestimation tends to be comparatively more hindering than excessive overestimation for later achievement in reading comprehension. The present study has shown this complex longitudinal relationship using large-scale data and a measure that did not depend on social comparison but was based on the residual of a criterion (actual test score), which is less confounded with achievement. Deviation scores between self-evaluation and actual score are often used to assess monitoring accuracy, and this has been reported to be problematic because such scores are highly confounded with achievement (Dufner et al., 2015). To overcome this problem, using the residual after predicting the criterion on self-evaluation is recommended (Dufner et al., 2015; Gonida & Leondari, 2011). The residual-based score strengthens the assumption that the relationship observed between metacognitive monitoring and reading comprehension did not arise simply because the measures were technically related. This is believed to be an important contribution that sheds light on the debate on the developmental relationship between metacognitive monitoring and reading comprehension.
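The contrast between a deviation score and a residual-based score can be made concrete with a small simulation. The sketch below is illustrative only: the data are simulated, the helper names are hypothetical, and it assumes that the self-estimate is regressed on the actual test score, which is one reading of the residual approach and may differ in detail from the scoring used in the study.

```python
import random


def simple_ols(y, x):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def pearson_r(u, v):
    """Return the Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5


def residual_scores(self_estimate, actual):
    """Residual of the self-estimate after regressing it on the actual score."""
    intercept, slope = simple_ols(self_estimate, actual)
    return [s - (intercept + slope * a) for s, a in zip(self_estimate, actual)]


# Simulated data: self-estimates track actual scores only partially.
rng = random.Random(1)
actual = [rng.gauss(0, 1) for _ in range(2000)]
self_estimate = [0.5 * a + rng.gauss(0, 1) for a in actual]

deviation = [s - a for s, a in zip(self_estimate, actual)]
residual = residual_scores(self_estimate, actual)

# The deviation score is correlated with achievement by construction,
# whereas the OLS residual is uncorrelated with the predictor.
print(f"deviation vs. actual: r = {pearson_r(deviation, actual):.2f}")
print(f"residual vs. actual:  r = {pearson_r(residual, actual):.2f}")
```

Because an OLS residual is uncorrelated with its predictor, any association between the residual-based score and later reading comprehension cannot be attributed to a built-in overlap with the concurrent test score.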

Conclusion
The findings of the study contribute to the scientific debate on the longitudinal relationship between metacognitive monitoring and reading comprehension. The study demonstrated the developmental interplay between the monitoring aspect of metacognition and reading comprehension in large-scale data. It further provided longitudinal insights into the implications of under- and overestimation of one's performance for later reading comprehension. These findings have practical implications: encouraging students to monitor their own learning process accurately could have a positive effect on academic achievement in general and reading comprehension in particular. Discouraging under- and overestimation of one's own performance might pave the way to effective regulation of the learning process. Therefore, parents and teachers are encouraged to help children develop a realistic view of their performance. One way to foster students' monitoring ability is to build children's competence in the subject matter, as metacognitive monitoring and achievement are mutually interdependent in the course of development. Despite these theoretical and practical implications, one limitation of this study is worth mentioning: only one aspect of procedural metacognition was investigated. It may therefore be fruitful to investigate the developmental interplay by including other aspects of procedural metacognition, such as the allocation of study time. It may also be worthwhile to examine the relationships with shorter intervals between the measurement points (such as in a microgenetic study) to analyze how monitoring accuracy affects metacognitive control and learning behavior immediately. Moreover, future research might examine whether the observed reciprocal relationship holds in other domains of academic achievement.