Introduction

As a highly complex strategic literacy skill alongside reading, writing is an essential prerequisite for learning and determines students’ educational success (Gogolin, 2018; in print). Writing requires the coordination and integration of cognitive, metacognitive, and linguistic skills (Stavans et al., 2019). Theoretically, the assumed number of components involved in the writing process varies (Kim, 2020). From a developmental perspective, the mastery of writing represents a long-term process. This process may be described as a gradual progression on a continuum, with both proficiency and text quality evolving throughout schooling and beyond (Olinghouse & Wilson, 2013; Silliman et al., 2020). From a social semiotic perspective (Halliday, 1978, 1985), writing represents an active sign-making process. Writers as active sign makers may draw on and use codes as fluid semiotic resources to create meaning in any particular context (Sun et al., 2021). According to this view, language represents one of many semiotic resources involved in text construction (Canagarajah, 2015).

A student’s linguistic repertoire may involve multiple languages in linguistically diverse settings including the majority language (ML), heritage language(s) (HL), and any foreign language(s) (FL) learned at school. In such contexts, depending on their learning opportunities, students may develop writing proficiency in multiple languages and use the semiotic resources of these languages. Valuable theoretical approaches have attempted to grasp the interrelation of languages within multilingual repertoires. Thus, Cummins’ (1979) interdependence hypothesis proposed the interrelation between two languages. According to this hypothesis, well-developed language skills in one language would facilitate well-developed language skills in another language. This relationship between languages is based on a common underlying language proficiency (conceptual knowledge, cognitive, language, and literacy skills; Cummins, 1979). Concerning the interrelation of languages within the context of migration, Cummins proposed that migrants transfer heritage language skills to their receiving context’s majority language. Other studies have elaborated approaches to capturing the interrelation of languages within the multilingual repertoires necessitated by more than two languages. According to the Focus on Multilingualism (FoM) approach (Cenoz & Gorter, 2011), research on multilinguals needs to evaluate multilingual repertoires and investigate the interrelation of languages within such repertoires in their whole complexity rather than individually. The FoM approach draws on the Dynamic Systems Theory in Second Language Acquisition (De Bot et al., 2007), which considers language development a complex, nested, and interconnected dynamic system. While investigating the interrelation of languages, a consideration of the diversity of contexts is necessary. In such contexts, a variety of factors and arrangements may contribute to the heterogeneity of multilingual skills. 
The translingual approach to writing (Canagarajah, 2015) proposes the multidirectionality of influences between languages. Therefore, multilingual writing has been conceptualized as a synthesized competence (Canagarajah, 2015) that includes all languages in a person’s repertoire and evolves continuously and dynamically.

Despite the extensively developed theory on the interrelation of multilingual writing skills, the related empirical research usually focuses only on a certain part of a person’s multilingual repertoire. Studies investigating multilingual writing repertoires as (a) writing proficiency in the majority and foreign language(s), (b) the majority and heritage language(s), and (c) the majority, heritage, and foreign languages have shown that writing proficiency is interrelated among different languages within a multilingual repertoire (Lanauze & Snow, 1989; Riehl, 2020, 2021; Schoonen et al., 2011; Soltero-González et al., 2012; Sparrow et al., 2014; Usanova, 2019; Usanova & Schnoor, 2021a). The relationships between writing proficiencies in different languages may be based on the common linguistic and metacognitive knowledge they share. Thus, writers may use their metacognitive knowledge about writing tasks and strategies in one language to compose a text in another language (Schoonen et al., 2011; Soltero-González et al., 2012; Victori, 1999). The FoM approach has been applied by Usanova and Schnoor (2021a), who have demonstrated the positive interrelation of multilingual writing skills within students’ writing repertoires in three languages: the majority language, the heritage language, and the foreign language learned at school. Although the interrelation of languages and writing skills has been shown in numerous cross-sectional studies, none of these studies could determine whether writing skills in different languages may serve as mutual resources to enrich multilingual proficiency.

Furthermore, the positive interrelation of languages according to the interdependence hypothesis was questioned by Vanhove and Berthele (2018) and Berthele and Vanhove (2020), who argued that correlations between linguistic skills might be “mere epiphenomena of general cognitive effects” (Berthele & Vanhove, 2020, p. 551).

In any case, longitudinal data on a person’s multilingual writing would be needed to empirically clarify the interrelated nature of multilingual writing skills and to identify any possible confounders. In a German panel study, “Multilingual Development: A Longitudinal Perspective” (MEZ), we observed students’ language development in secondary education, i.e., in the age group of 13–18 years old. This longitudinal study comprised four measurement points relevant to secondary education and included a panel of 2,103 students. Students’ receptive and productive literacy skills (reading, writing, general language skills) were tested in German (the language of schooling) and English (the first foreign language). Roughly half of this sample of migrants was also tested in a heritage language, either Turkish or Russian. Moreover, 800 students took tests in their second foreign language, French, and a small group of 70 students was tested in their second foreign language, Russian.

By drawing on these unique longitudinal data on writing development among secondary school students, the current study aims to determine whether multilingual writing proficiencies may serve as mutual resources in the multilingual repertoires of secondary school students. Using a developmental perspective on students’ multilingual writing proficiency, we examine the interrelation between writing proficiencies in the majority language of German, the heritage language of Russian or Turkish, and the foreign language of English among secondary school students in Germany. The unique empirical exploration of the multilingual writing repertoires of adolescents conducted in the current study should clarify the challenges and opportunities connected to a successful acquisition of literacy and uncover adolescent migrants’ potential for multilingual language learning in linguistically diverse contexts.

Theoretical concepts of multilingual writing development

Several theoretical approaches have addressed the relationship between languages in a bilingual repertoire. Canagarajah (2015) applied them to multilingual literacy development and, specifically, to multilingual writing development. He distinguished four types of models: subtractive, additive, recursive, and translingual.

In the subtractive model, languages are considered to interfere in learning in a conflictual relationship. This model suggests that language development in one language impairs development in another language. The additive model proposes that it is possible to build proficiencies in different languages without conflicts, which coalesce in one’s multilingual repertoire. Both models consider writing development in different languages as autonomous processes, with learners having “compartmentalized competencies” in each language (Canagarajah, 2015, p. 422). The recursive model suggests that language development is complex and dynamic, enabling reciprocal relationships in the learning process. Between-language influences do not appear simultaneously but rather consecutively over time, in strict chronological order. One has to develop mature skills in one language system to be able to transfer them into another language system. Thus, the recursive model also treats languages as rather separate entities. In contrast, the translingual model assumes even deeper connections between languages that permit multilingual people to “multitask or parallel process with their languages, not keeping them disconnected when they are learning or using them” (Canagarajah, 2015, p. 423). Proficiencies in individual languages may differ, but they build a synthesized competence. The translingual and recursive models are very similar, differing in only two respects. The translingual model assumes that (a) multidirectional influences between languages are both sequential and simultaneous and (b) multilingual writing proficiency is synthesized rather than summarized. In contrast, the differences between these models and the additive model are fundamental and even more severe than those with the subtractive model.
The translingual model’s prime features are: (1) it does not treat languages as separate; (2) the acquisition of literacy is not a linear but a multidirectional process with multiple possible influences; (3) it describes multilingual literacy as integrated, with all languages building a synthesized competence; and (4) in contrast to the other three models, it considers multilingual proficiency continuously evolving rather than complete (Canagarajah, 2015).

Overall, the valuable theoretical approaches thoroughly conceptualize the relationship between the languages involved in multilingual writing development. We applied this theoretical foundation to guide the empirical analyses of multilingual writing proficiency within the current study.

Research questions

The current study investigates the interrelation of multilingual writing skills among German-Russian and German-Turkish bilingual adolescents in Germany. We draw on the social semiotic understanding of writing proficiency as the use of fluid semiotic resources to create meaning in a particular context and consider writing acquisition an evolving dynamic process. Framing these concepts within the FoM approach (Cenoz & Gorter, 2011) and taking into account the models of the acquisition of multilingual literacy (Canagarajah, 2015), our study aims to answer the following research questions:

  • What are the relationships between multilingual writing proficiencies?

  • Do writing proficiencies in different languages serve as mutual resources in the process of writing development?

Project

The current study is conducted as part of the German junior research group called “Multiliteracies as a Resource for the Labor Market. Social Conditions and Transformability into Economic Capital (MARE)”, which is funded by the German Federal Ministry of Education and Research (2021 to 2026). MARE investigates multiliteracy as a multidimensional construct that involves both multilingual and multimodal literacy skills.

Context

Officially, Germany is considered to be a monolingual country with migration-related multilingualism not being recognized as the normal case. This consideration, however, does not reflect the country’s multilingual reality (for the extensive context, see Gogolin, 2018a). Linguistic diversity has been constantly increasing in Germany in recent decades with the arrival of migrants from 190 countries (for a review of the statistics, see Gogolin, 2018b, p. 14). The nonacceptance of actual multilingual reality at the official level results in monolingual-oriented education policies and practices. Although one’s majority and foreign language(s) may be learned in institutional settings, heritage languages are often learned without any institutional support. Opportunities to learn heritage language(s) by attending heritage classes are provided only for a limited number of languages. Overall, there is no systematic overview of the languages taught in such classes in Germany, and the number of languages varies among German federal states (Lengyel & Neumann, 2017; Usanova & Schnoor, 2021b). There are different forms of heritage language classes such as those organized by federal states (taking place at schools), by consulates, or by private organizers (e.g., private courses in churches, communities, private associations) (Brehmer & Mehlhorn, 2018; Lengyel & Neumann, 2017). The number of schools that provide HL classes is rather low, and the teaching contents, methods, and qualifications of teachers in HL classes vary extensively (Montanari et al., 2018, p. 220). Students who attend HL classes have been reported to have rather disparate proficiency levels, especially regarding their literacy skills (Montanari et al., 2018).

Data

We use data from a German panel study, “Multilingual Development: A Longitudinal Perspective (MEZ)” (Gogolin et al., 2017), which was funded by the German Federal Ministry of Education and Research (BMBF). Following an interdisciplinary perspective on multilingualism, MEZ aimed to provide insights into the individual and contextual conditions that influence the development of multilingual literacy among adolescents. MEZ was a longitudinal cohort-sequence study with two starting cohorts (7th- and 9th-grade students) and four waves of data collection over three years (2016 to 2018). The MEZ panel comprised 2,103 students from the German secondary educational system with Russian, Turkish, and monolingual German language backgrounds. Receptive and productive skills were measured in the majority language (German), the heritage language (Russian or Turkish), and the first (English) and second foreign languages (French and Russian) learned at school. To ensure the comparability of multilingual repertoires, MEZ included only students with school careers in Germany. Thus, the MEZ sample consisted of students who had entered the German education system by the 3rd grade at the latest (Klinger et al., in print). Newly immigrated persons were not part of the study due to comparability considerations.

The sample selection, implementation of ethics approval procedures, and data collection organization were conducted by an external survey institute (IEA Hamburg, 2017a, 2017b, 2018). The students were tested in groups in their respective schools. To avoid interactions between the language data, two test days with an intermediate waiting period were necessary. Writing in German and English was tested on the first day. Writing in the heritage language, Turkish or Russian, and in the second foreign language(s) was tested approximately a week later.

Participants

This study analyzed data from the first three waves of the bilingual panel and both starting cohorts. In the first wave, conducted in spring 2016 (IEA Hamburg, 2017b), the students were in grade 7 (mean age = 13.2 years) and grade 9 (mean age = 15.2 years). In the second wave, conducted in fall 2016 (IEA Hamburg, 2017a), the students were at the beginning of grade 8 (mean age = 14.0 years) and grade 10 (mean age = 16.0 years). In the third wave, conducted in summer 2017 (IEA Hamburg, 2018), the participants were at the end of grades 8 (mean age = 14.6 years) and 10 (mean age = 16.6 years). The longitudinal sample consisted of 965 students (49.7% from the 7th-grade starting cohort and 50.3% from the 9th-grade starting cohort), including 364 German-Russian (63.1% females) and 601 German-Turkish (58.3% females) bilinguals.

Concerning their migration backgrounds, 85.6% of the German-Russian bilinguals and 91.7% of the German-Turkish bilinguals were born in Germany. Regarding their linguistic acculturation, 59.5% of the German-Russian bilinguals and 30.5% of the German-Turkish bilinguals started acquiring German in their first two years of life, and 93.4% and 96.3% began in their first five years of life, respectively. Additionally, 38.3% of the German-Russian bilinguals and 43.4% of the German-Turkish bilinguals spoke mostly German at home. Concerning the institutional support for literacy in heritage languages, 26.7% of German-Russian bilinguals and 50.5% of German-Turkish bilinguals had attended heritage language classes in school, while 8.6% and 3.8% had attended classes outside of school, respectively.

Concerning group differences in language proficiencies, Table 1 gives an overview of the language competence data gathered in the MEZ study (for detailed information about the instruments, see Gogolin et al., 2017; Gogolin et al., in print; Klinger et al., 2019). For simplicity, we report measurements from the first wave to show the initial differences between the German-Russian and German-Turkish bilinguals.

Table 1 Participants’ language proficiency (means, standard deviations, and mean differences’ effect sizes)

Regarding German, there are significant mean differences, with the effect size coefficient Cohen’s d (see Cohen, 1988; d < 0.20 = no effect, d > 0.20 = small effect, d > 0.50 = medium effect, d > 0.80 = large effect) indicating only small between-group differences in reading comprehension (t[773] = 3.91, p < .001, d = 0.29) and writing proficiency (t[767] = 3.64, p < .001, d = 0.27). Concerning Russian and Turkish, no cross-language comparisons are possible; that is, one cannot conclude, e.g., that German-Turkish bilinguals are more proficient in Turkish than German-Russian bilinguals are in Russian. Nevertheless, the descriptive values give no reason to assume substantial between-group differences in heritage language proficiency. Regarding English, there is no significant mean difference in general language ability (t[773] = 1.75, p = .08, d = 0.13) and a significant but small difference in writing proficiency (t[756] = 2.17, p = .03, d = 0.16). However, the effect size coefficient indicates that this difference is negligible. Therefore, there is sufficient comparability between the German-Russian and German-Turkish bilinguals.
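The effect size thresholds cited in this paragraph can be illustrated with a minimal Python sketch. It assumes the pooled-standard-deviation form of Cohen’s d; the function names and the toy scores are hypothetical and are not MEZ data. How a value lying exactly on a threshold is classified is also an assumption, since the thresholds are stated only as strict inequalities.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def effect_label(d):
    """Map |d| to the verbal labels used in the text (Cohen, 1988)."""
    size = abs(d)
    if size > 0.80:
        return "large effect"
    if size > 0.50:
        return "medium effect"
    if size >= 0.20:  # assumption: exactly 0.20 counts as small
        return "small effect"
    return "no effect"


# Hypothetical toy scores for two groups, for illustration only.
toy_d = cohens_d([2, 4, 6], [1, 3, 5])

# Labels for the coefficients reported in the paragraph above.
reported = {d: effect_label(d) for d in (0.29, 0.27, 0.13, 0.16)}
```

Applied to the reported coefficients, the sketch labels d = 0.29 and d = 0.27 as small effects and d = 0.13 and d = 0.16 as no effect, which matches the interpretation given in the text.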

Measures

Multilingual writing skills. We applied the “MEZ writing task for adolescents” (Gogolin et al., in print) to measure students’ multilingual writing skills development. In the three data collection waves, we used six different pictorial stimuli tasks; all were developed on the same principles: they involved nine pictures and aimed to elicit expository text types. We applied expository writing tasks because the aim of our study was to measure the writing skills that are relevant to adolescents’ educational attainment.

The tasks required students to write a trial version of an article for a youth journal. The topics to assess writing skills in German, as well as in Russian and Turkish, were how to make a gingerbread house (first wave), decorative fairy lights (second wave), and a boomerang (third wave). All prompts that were distributed in HLs were pretested for their comparability. Our pretests revealed no group-specific differences with regard to the selected topics. This was an expected result since most students were born in Germany and all of them had attended the German school system beginning no later than the third grade. Furthermore, the selected topic of handicraft also takes into account students’ prior knowledge (Moll, 2019) since handicraft is a widespread activity in the context of family homes.

To assess writing skills in English, curriculum-related topics were selected. The preanalysis of textbook content in the relevant foreign languages was conducted to ensure that all participants had at least some experience with the selected topics (e.g., food, travel). The tasks were preparing breakfast in Germany (first wave), picnicking in a park (second wave), and taking a trip to Hamburg (third wave).

The theory this study used to measure students’ writing skills considers writing multicomponential (Puranik et al., 2008; Wagner et al., 2011). It covers different basic dimensions of language, i.e., textual-pragmatic, lexico-syntactic, and productivity, which are aspects of the general construct of “writing skills” that are common for all investigated languages (Gogolin et al., in print; Table 2). For the measurement of the general construct, in the current study, we used four empirical indicators that refer to these basic dimensions of language: a rating score for task accomplishment (textual-pragmatic dimension), types of verbs and types of conjunctions (lexico-syntactic dimension), and text length by number of words (productivity). All written texts were analyzed by trained research assistants and achieved high levels of reliability in all languages across both the empirical indicators (internal consistency [Cronbach’s α] ranging from α = 0.859 to α = 0.944) and the raters (intraclass correlations [ICC] ranging from κ = 0.865 to κ = 0.988).

Table 2 The assessment of writing skills, applied to all investigated languages

For longitudinal analyses, the students’ raw scores for the indicators were transformed into POMP scores (percentage of maximum possible), i.e., standardized as a percentage of the maximum achievable score [(raw score/maximum achievable score) * 100]. Since only task accomplishment has a natural maximum (27 points), the highest empirical score measured across all waves was used to set the maximum for the other indicators (Cohen et al., 1999; Moeller, 2015).
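The transformation can be sketched in a few lines of Python. This is a minimal illustration of the formula given above, not the project’s actual scoring code; the function names and all example scores are hypothetical, and only the natural maximum of 27 points is taken from the text.

```python
def pomp(raw_score, maximum):
    """Percentage of maximum possible: (raw score / maximum) * 100."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return raw_score / maximum * 100.0


def pomp_by_wave(scores_by_wave, natural_maximum=None):
    """Transform raw scores (one list per wave) into POMP scores.

    If the indicator has no natural maximum, the highest empirical
    score observed across all waves is used as the maximum.
    """
    if natural_maximum is None:
        maximum = max(max(wave) for wave in scores_by_wave)
    else:
        maximum = natural_maximum
    return [[pomp(score, maximum) for score in wave] for wave in scores_by_wave]


# Task accomplishment has a natural maximum of 27 points.
task_pomp = pomp_by_wave([[13.5, 27], [9, 18]], natural_maximum=27)

# Text length has no natural maximum; hypothetical word counts, two waves.
length_pomp = pomp_by_wave([[50, 100], [200, 150]])
```

Because a single maximum per indicator is applied to all waves, the rescaling is linear and leaves the relative differences between students and between waves intact.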

Confirmatory factor analyses revealed that these four indicators constitute a unidimensional latent construct of writing skills in each of the languages tested, indicating that the measurement of the construct meets the criterion of configural invariance (Meredith, 1993). Further analyses of the measurements’ comparability (measurement invariance, MI) showed that the latent construct was measured in the same metric over time (Klinger & Schnoor, 2020; Gogolin et al., in print) and between the different task versions (Schnoor, in print). These preliminary analyses ensured the interpretability of the covariance structure modeled in this study.

Cognitive ability. As a covariate, the nonverbal subtest N2 of the standardized “Cognitive Ability Test for Grades 4–12 + R” (Heller & Perleth, 2000) was used as a proxy for cognitive ability. The task is to recognize figural analogies between pictorial items, i.e., “A” is related to “B” as “C” is related to “?”. Participants could choose from five alternatives. The score is the sum of correct answers over 25 test items.

Sample statistics

Table 3 contains descriptive statistics of the observed data, organized by language and wave. The indicators are scored in the POMP metric (percentage of maximum possible). While the indicators’ variance and covariance are interpretable over time due to metric measurement invariance, this is not true for the indicators’ means because scalar measurement invariance would be necessary (for further information, see Gogolin et al., in print). Therefore, only the longitudinal covariance structure is interpretable. Concerning the symmetry of the indicator distributions, the data are moderately skewed with a leptokurtic tendency. This asymmetry is because we capture a relatively wide range of individual writing proficiencies with a set of indicators, of which only one (task accomplishment) has a scale limitation. However, since we did not observe disturbed parameter estimates, we chose not to eliminate outliers and instead used a maximum likelihood algorithm that produces robust standard errors (MLR) to deal with nonnormality in the observed variables.

To address missing data, we used full information maximum likelihood estimation (FIML; Dempster et al., 1977; C. K. Enders, 2010; C. Enders & Bandalos, 2001; B. O. Muthén et al., 1987; Schafer & Graham, 2002) in Mplus 8.2 software (L. K. Muthén & Muthén, 1998–2017) for model parameter estimation. This approach allowed us to use the complete sample of 965 students.

Table 3 Writing skills indicators’ descriptive statistics

Analytic strategy

We conducted Longitudinal Structural Equation Modeling (LSEM; Little, 2013) to analyze multilingual writing development by relying on Klinger (2015) and Klinger et al. (2019). SEM is a latent variable approach that models the unobserved (latent) processes that produce the observed (manifest) data.

The base layer of an SEM is Confirmatory Factor Analysis (CFA). The CFA model is the measurement model that links the model’s manifest and latent parts. Based on the Common Factor Model (Thurstone, 1947), Confirmatory Factor Analysis (Jöreskog, 1967, 1969) assumes that a person’s scores on a test are manifestations (manifest indicators) of the influence of one (or more) hypothetical construct(s) at a higher level of abstraction (latent factor). In addition to the ability of CFA to separate the construct’s true variance from indicator-specific measurement errors, the explicit modeling of the measurement model’s factorial structure also allows testing for its psychometric properties by imposing restrictions on model parameters (for an introduction to CFA, see Brown, 2015; Kline, 2016; Little, 2013).

Figure 1 shows the 3-wave CFA model that we use to measure students’ multilingual writing skills in their ML, HL, and FL. We estimated separate CFA models for each language. Each CFA model consists of three latent factors (ovals) of writing skills, repeatedly measured by four indicators (rectangles, see also Table 2): task accomplishment (Y1, Y5, Y9), verbs (Y2, Y6, Y10), conjunctions (Y3, Y7, Y11), and text length (Y4, Y8, Y12). As a restriction to the parameter estimation, we constrained the factor loadings to be equal over time to establish metric measurement invariance, which is a prerequisite for interpreting the longitudinal covariance structure (for a detailed discussion of the measurement invariance of the “MEZ writing task for adolescents”, see Klinger & Schnoor, 2020; Gogolin et al., in print; Schnoor, in print). By estimating the same measurement model in all languages, we also ensured configural measurement invariance between the languages. This feature makes the interpretation of reciprocal between-language effects easier since structurally identical constructs are addressed.

Fig. 1

Latent 3-wave-measurement model of writing skills. Eta (η) denotes the repeated measured latent factors of writing skills, and Psi (ψ) denotes the factors’ variances/covariances. Lambda (λ) denotes the factor loadings that represent the common variance of factors and their indicators (Y1 to Y12). Epsilon (ε) denotes indicator-specific and random residual variance, which is separated to keep factors’ variance/covariance free of measurement errors (see Little 2013, p. 95)

At the structural level of the SEM model, we used a longitudinal panel model (Little, 2013; Selig & Little, 2012) for the developmental process of multilingual writing. Based on the latent Markov (LM) model for longitudinal data (Bartolucci et al., 2012, 2014), the longitudinal panel model assumes that the indicator variables’ scores (i.e., the observed data) are manifestations of a latent longitudinal process that causes the observed data. However, unlike the LM model, we did not assume a rigid Markov chain, where the indicator variables are conditionally independent given the latent process (local independence), but allowed the indicator residuals to covary over time to account for method effects (for a detailed discussion, see Little, 2013).

Figure 2 shows an example of a process with two LM chains and three waves, i.e., of the development of skills in language A and language B over three waves of data collection. We will now briefly discuss the key model components that we will refer to in our analysis (see in detail Little, 2013, 180 ff.; Selig & Little, 2012, 256 ff.). The initial relationship between the two constructs in the first wave is called zero-order association (if no covariates are involved). If both factor variances are fixed to one, it is a correlation; otherwise, it is a covariance. It represents the starting point for the monitoring of the developmental process. At later time points, the associations between the latent factors are called residual associations or disturbance factors. They represent the amount of association between the factors that is not explained by the modeled process. Therefore, they are important indicators of whether the key components of the development process have been modeled.

Fig. 2

Latent Markov model of writing proficiency development in two languages (structural model part only). η1, η3, and η5 are three states of writing proficiency in language A, and η2, η4, and η6 are the corresponding states of writing proficiency in language B. Beta (β) denotes regressions (path coefficients) between the factors, and Psi (ψ) denotes the factors’ variance/covariance (in the case of ψ2,1, it is a correlation because the factors’ variances are fixed to 1; see Little, 2013, p. 183)

The model also contains two types of regressive effects. The autoregressive effects represent the stability of individual differences (interindividual variability) over time by capturing the “amount of variance explained by the prior levels of the construct [A], controlling for the individuals’ standings on the other construct [B]” (Little, 2013, p. 182). Therefore, “a small or zero autoregressive coefficient means that there has been a substantial reshuffling of individuals’ standings on the construct over time. In contrast, a sizeable coefficient means that individuals’ relative standings on the construct have changed very little over time” (Selig & Little, 2012, p. 266).

The cross-lagged effects represent a regression of a construct on another construct measured at an earlier time point. The coefficient captures the “amount of change variance explained in the latent construct after controlling for the stability information from the prior measurement occasions” (Little, 2013, p. 182). Thus, significant cross-lagged coefficients mean that the changes in the interindividual differences on a construct can be explained by the interindividual differences on another construct, which indicates relations between constructs in the developmental process. The effect size of cross-lagged coefficients is usually very small compared to cross-sectional designs because they are controlled for the stability information from the autoregressive paths. Therefore, even trivial effects (according to universal guidelines of interpreting the magnitude of effect sizes) are meaningful in this context of data analysis (see Adachi & Willoughby, 2015, pp. 116 f.).
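The logic of autoregressive and cross-lagged paths can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden illustration, not the model estimated in this study: it uses simulated observed scores rather than latent factors, ordinary least squares rather than MLR estimation in Mplus, and only two languages and two waves. All names and the "true" path coefficients (0.7, 0.1, 0.2, 0.6) are hypothetical.

```python
import random


def ols_two_predictors(y, x1, x2):
    """Slopes (b1, b2) from regressing y on x1 and x2 with an intercept."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the centered variables.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(n=5000, seed=1):
    """Simulate two correlated wave-1 scores and their wave-2 successors."""
    rng = random.Random(seed)
    a1, b1, a2, b2 = [], [], [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)       # induces the zero-order association
        xa = shared + rng.gauss(0, 1)  # language A, wave 1
        xb = shared + rng.gauss(0, 1)  # language B, wave 1
        a1.append(xa)
        b1.append(xb)
        # Hypothetical autoregressive (0.7, 0.6) and cross-lagged (0.1, 0.2) paths.
        a2.append(0.7 * xa + 0.1 * xb + rng.gauss(0, 1))
        b2.append(0.2 * xa + 0.6 * xb + rng.gauss(0, 1))
    return a1, b1, a2, b2


a1, b1, a2, b2 = simulate()
auto_a, cross_b_to_a = ols_two_predictors(a2, a1, b1)  # A2 on A1 and B1
cross_a_to_b, auto_b = ols_two_predictors(b2, a1, b1)  # B2 on A1 and B1
```

Each wave-2 score is regressed on both wave-1 scores, so the cross-lagged slope reflects only what the other language adds once the construct’s own stability is controlled. This is why cross-lagged coefficients tend to be much smaller than cross-sectional associations.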

Notably, although our model can provide indications regarding the causal structure of multilingual writing development, these should not be mistaken for “causal inferences” that have been gathered from experimental studies. Empirical findings that rely on observational (nonexperimental) data always have the potential to be biased by unmeasured confounders and reverse causality, even when they stem from panel data (for a discussion and methodological innovations, see Allison et al., 2017; Leszczensky & Wolbring, 2019). However, since we are primarily concerned with whether writing skills in the students’ multilingual repertoires function as mutual resources, the question about ultimate causal inference is outside our scope.

For model evaluation, we relied on RMSEA (root mean square error of approximation; Steiger, 1998; Steiger & Lind, 1980) as a global measure (RMSEA < .01 “great fit”; < .05 “good/close fit”; < .08 “acceptable fit”; < .10 “mediocre fit”; > .10 “poor fit”; Little, 2013). RMSEA tests the degree of models’ approximations to their data, thus accounting for models always being only approximations of the actual processes that produced the data (Little, 2013, 108 ff.). In contrast, the χ2 test, which tests the absolute fit of a model, is too strict for complex models with large samples due to the test power with respect to the null hypothesis of perfect model fit. Thus, even trivial deviations from a perfect fit lead to model rejection (Brown, 2015; Little, 1997; Wang & Wang, 2012). We also used the standardized root mean square residual (SRMR; Bentler, 1995) as a global fit index that quantifies the deviation of the model-implied variance/covariance matrix compared to the empirical variance/covariance matrix in terms of residuals (SRMR < .08 “good fit”; < .10 “acceptable fit”; Hu & Bentler, 1995; Kline, 2016). Concerning relative model fit indices, we applied the CFI (comparative fit index; Bentler, 1995) and TLI (Tucker–Lewis Index; Tucker & Lewis, 1973), which compare the specified model with its null model (> .99 “outstanding fit”; .95 to .99 “very good fit”; .90 to .95 “acceptable fit”; .85 to .90 “mediocre fit”; < .85 “poor fit”; Little, 2013). We used the Satorra-Bentler scaled χ2 difference test to compare nested models (Satorra & Bentler, 1988, 2010).
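To make these indices concrete, the following Python sketch computes RMSEA, CFI, and TLI from χ2 values using their common textbook formulas and applies the cutoffs listed above. It is an illustration under stated assumptions: the χ2 values and degrees of freedom are hypothetical, the formulas are the unscaled versions (robust MLR estimation applies scaling corrections that are not reproduced here), software packages differ in whether N or N − 1 enters the RMSEA denominator, and the treatment of values lying exactly on a cutoff is a choice made for this sketch.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 variant)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index relative to the null (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def tli(chi2_model, df_model, chi2_null, df_null):
    """Tucker-Lewis index relative to the null (baseline) model."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


def label_rmsea(value):
    """Verbal labels for RMSEA following the cutoffs cited from Little (2013)."""
    if value < 0.01:
        return "great fit"
    if value < 0.05:
        return "good/close fit"
    if value < 0.08:
        return "acceptable fit"
    if value < 0.10:
        return "mediocre fit"
    return "poor fit"


def label_cfi_tli(value):
    """Verbal labels for CFI/TLI following the cutoffs cited from Little (2013)."""
    if value > 0.99:
        return "outstanding fit"
    if value >= 0.95:
        return "very good fit"
    if value >= 0.90:
        return "acceptable fit"
    if value >= 0.85:
        return "mediocre fit"
    return "poor fit"


# Hypothetical chi-square values; only the sample size (965) is from the text.
example_rmsea = rmsea(chi2=100.0, df=50, n=965)
example_cfi = cfi(100.0, 50, 2000.0, 66)
example_tli = tli(100.0, 50, 2000.0, 66)
```

With these hypothetical inputs, the model χ2 is twice its degrees of freedom, yet RMSEA stays in the "good/close fit" range because the misfit per degree of freedom is divided by the large sample size.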

Results

We first describe separate models of writing development for each of the investigated languages and then combine them in a joint model of multilingual writing development to investigate between-language effects.

Measuring writing development in different languages

The joint analysis of writing skills in multilingual writing repertoires requires comparability of the measures, i.e., the structural equivalence of the constructs and the interpretability of their covariance over time. Therefore, we applied the same measurement model of writing skills with longitudinal metric measurement invariance to the writing task data in German, the relevant heritage language, and English in each data collection wave (see Fig. 1).

The German model shows a good data fit, and the models for the heritage languages and English provide a lower but still acceptable fit (Table 4). Thus, our data meet the psychometric requirements for the latent modeling of multilingual writing development via an autoregressive panel model with cross-lagged effects. These results also confirm within-language relations of writing skills over time.

Table 4 Model fit indices for the measurement models of writing in German, heritage language (Russian or Turkish), and English (n = 965)

Modeling of multilingual writing development

We applied an LM model with three languages and three waves to model multilingual writing development among secondary students. To find the most parsimonious model, we estimated a series of four alternative models (M1, M2, M3, M4). Table 5 shows the model fit statistics.

Table 5 Autoregressive panel models of multilingual writing development in German, heritage language, and English (n = 965)

M1 serves as a reference (comparison model). It allows languages to correlate freely in the development of multilingual writing skills. M1 fits the data very well, revealing significant between-language effects. M2 and M3 are subject to restrictions that challenge the existence of language interrelation.

In M2, we fixed all path coefficients between languages to 0 (zero-order correlations, residual covariances, and cross-lagged regressions), assuming writing skills in different languages to be independent. M2 shows a poor data fit, and the following comparison with M1 favors the less restrictive M1 (Δχ2[Δdf = 27, n = 965] = 691.89, p < .001). M3 relaxes the assumption of no between-language effects made by M2 by only fixing cross-lagged effects between languages to zero while freely estimating zero-order correlations (wave 1) and residual covariances (wave 2, wave 3), thus accounting for unobserved heterogeneity as the cause for interlanguage correlations (internal validity).

M3 also fits the data well. However, the SRMR exceeds the benchmark, indicating that the model cannot reproduce crucial relationships among the observed data. The comparison of M3 to M1 also favors the less restrictive M1 (Δχ2[Δdf = 18, n = 965] = 129.07, p < .001). Therefore, we assume that substantive between-language effects over time indicate interrelated writing development in multilingual repertoires.
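The reported significance levels of the two model comparisons can be checked by referring each scaled difference statistic to a χ2 distribution with the corresponding difference in degrees of freedom (27 for M2 vs. M1, 18 for M3 vs. M1). The Python sketch below is an illustration under that assumption and is not the software used in the study. The helper name chi2_sf is hypothetical, and it relies on the closed-form finite sums that exist for integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


if __name__ == "__main__":
    # Difference statistics as reported in the text.
    p_m2_vs_m1 = chi2_sf(691.89, 27)
    p_m3_vs_m1 = chi2_sf(129.07, 18)
    print(p_m2_vs_m1 < 0.001, p_m3_vs_m1 < 0.001)
```

Both tail probabilities fall far below .001, in line with the reported p values. The scaling step of the Satorra-Bentler procedure itself is not reproduced here, because it requires the scaling correction factors of the compared models, which the text does not report.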

Finally, with M4, we attempted to validate our model against the objection that we do not measure relationships in writing but merely in cognitive ability. We controlled the initial correlations between the languages for a proxy of general cognitive ability. This variable shows the expected positive regressions for all three languages (criterion-related validity) but does not affect other model parameters, indicating construct separation (discriminant validity). This result shows that our measurement of multilingual writing development is independent of cognitive ability. However, there are other potential influences on the developmental process of multilingual writing skills, as indicated by the residual covariances in the second and third waves. M4 shows a good data fit that is almost identical to M1. Since M1 and M4 are not nested models, we could not apply the χ2 difference test for comparison. Instead, we investigated information criteria (AIC, BIC), which slightly favor M1. However, we proceeded with M4 since we deemed the information on cognitive abilities more valuable than model parsimony.
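For readers less familiar with information criteria, the following Python sketch shows how AIC and BIC trade off model fit against the number of free parameters, with lower values preferred. It uses the standard definitions, AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L. The log-likelihoods and parameter counts are hypothetical placeholders, not values from this study; only the sample size n = 965 is taken from the text.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


if __name__ == "__main__":
    n_obs = 965  # sample size reported in the text
    # Hypothetical (log-likelihood, free parameters) for two non-nested models.
    candidates = {"model_a": (-12450.0, 90), "model_b": (-12447.5, 96)}
    for name, (loglik, k) in candidates.items():
        print(name, round(aic(loglik, k), 2), round(bic(loglik, k, n_obs), 2))
```

In this made-up example, the second model fits slightly better but uses six more parameters, so both criteria favor the first model. BIC penalizes the extra parameters more heavily than AIC because ln(965) is well above 2.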

Figure 3 shows a path diagram of the structural level of the final model, M4. For convenience, it displays only significant path coefficients. In the following paragraphs, we will highlight the key findings. We start with the initial state in the first wave. Then, we discuss within- and between-language effects among the developmental process of multilingual writing skill.

Fig. 3

Latent Markov model of students’ writing development in German, heritage language, and English, with their cognitive ability controlled for at t1 as a single-indicator factor (M8, see Table 1); n = 965; *p < 5%; **p < 1%; ***p < .1%

Controlling for the putative confounding of writing skills with cognitive abilities, the left side of the figure displays the aforementioned significant regressions on English (β = 0.46, p < .001), German (β = 0.24, p < .001), and heritage language (β = 0.13, p = .001). The resulting partialized zero-order correlations at the starting point of the observation period are r = .26 (p < .001) between German and the heritage language, r = .50 (p < .001) between German and English, and r = .38 (p < .001) between the heritage language and English. Unsurprisingly, the two languages taught in school show the strongest interrelation.

Concerning within-language development, we see substantial first- and second-order autoregressive effects that represent the stability of the interindividual differences over time. For the most part, these positive effects are moderate for first-order relationships and weaken for second-order relationships, indicating relatively high stability in interindividual differences over time that increasingly detach from one another. The much stronger stability of the interindividual differences in English and heritage language from the first to the second wave can be explained by a total lack of significant cross-lagged paths on second-wave English and heritage language writing. Therefore, the autoregressive path is not decomposed and contains the entire change variance.

Concerning interlanguage relations, we find four significant cross-lagged effects, all of which are positive. This result supports the notion that multilingual writing skills are mutual resources. The effect sizes, ranging from 0.09 to 0.13, may not seem particularly high. Notably, however, these coefficients are partialized for both the initial correlations between the languages in the first wave and writing development within the languages. Considering these restrictions, finding any significant effects is remarkable.
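The shrinkage that partialization produces can be illustrated with the textbook formula for standardized coefficients in a regression with two correlated predictors. The Python sketch below is a deliberate simplification: it works with observed correlations and a single outcome, whereas the study estimates a latent panel model. The function name is hypothetical, and two of the three correlations are invented for illustration; only the value of .50 echoes the initial German-English correlation reported above.

```python
def standardized_betas(r_y_auto: float, r_y_cross: float, r_auto_cross: float):
    """Standardized coefficients for an outcome regressed on two predictors.

    r_y_auto     : correlation of the outcome with its own earlier score
    r_y_cross    : correlation of the outcome with the other language's earlier score
    r_auto_cross : correlation between the two earlier scores
    """
    denom = 1.0 - r_auto_cross ** 2
    if denom <= 0.0:
        raise ValueError("predictors must not be perfectly correlated")
    beta_auto = (r_y_auto - r_y_cross * r_auto_cross) / denom
    beta_cross = (r_y_cross - r_y_auto * r_auto_cross) / denom
    return beta_auto, beta_cross


if __name__ == "__main__":
    # Outcome: German at wave 2; predictors: German and English at wave 1.
    # 0.60 and 0.45 are hypothetical; 0.50 is the reported wave 1 correlation.
    beta_auto, beta_cross = standardized_betas(0.60, 0.45, 0.50)
    print(round(beta_auto, 2), round(beta_cross, 2))
```

With these inputs, a raw cross-language correlation of .45 corresponds to a cross-lagged coefficient of about .20 once the autoregressive path and the initial correlation are accounted for, which shows why partialized cross-lagged effects tend to look small.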

From the first to the second wave, we see positive effects from the heritage language (β = 0.09, p < .050) and English (β = 0.23, p < .001) on German and, from the second to the third wave, from German on English (β = 0.13, p < .010) and vice versa (β = 0.15, p < .050). Therefore, the reciprocal interrelation between writing proficiencies is closer for German and English, while the heritage language seems to be associated with an interrelation of a lesser degree.

Another notable result concerns the residual correlations between the languages in the second and third waves. They represent the part of the observed indicator covariances that the latent process cannot reproduce. These unexplained interlanguage relations suggest that exogenous variables that influence multilingual development are not contained in the model, although the benchmark of the SRMR is not violated.

Discussion

The current study investigated multilingual writing proficiencies among German-Russian and German-Turkish migrant secondary students in Germany. Using a developmental perspective of multilingual writing, we examined the interrelation between the majority language (German), the heritage language (Russian or Turkish), and the foreign language (English) across three waves of data collection. Based on the FoM approach (Cenoz & Gorter, 2011) and models of multilingual literacy acquisition (Canagarajah, 2015), our study aimed to answer the following research questions: What are the relationships between multilingual writing proficiencies? Do they serve as mutual resources in the process of writing development?

Previous research showed positive interrelations between languages in multilingual writing repertoires and considered these results an indication that languages serve as mutual resources (e.g., Riehl, 2020, 2021; Soltero-González et al., 2012; Sparrow et al., 2014; Usanova & Schnoor, 2021a). However, these findings stemmed from cross-sectional data, whereas panel data are needed to test the resources hypothesis. Other studies using longitudinal writing data to investigate multilingual writing development have not considered heritage languages, thus missing an important part of migrant students’ writing repertoires (e.g., Schoonen et al., 2011; van Gelderen et al., 2003).

In the current study, we relied on the longitudinal data modeling approach of Klinger (2015) and Klinger et al. (2019) to apply an autoregressive panel model with cross-lagged effects that examines within- and between-language effects in the process of multilingual writing development by using longitudinal competence data.

As a first result, we found positive initial correlations between all languages at the beginning of the observation period (wave 1), which we controlled for cognitive ability to ensure that the correlation between writing abilities was not just a reflection of general cognitive effects, as critically discussed in Berthele and Vanhove (2020). Thus, we could reproduce previous research findings in our writing data.

Moreover, we were able to extend the limits of previous research. First, we provided a more comprehensive understanding of the multilingual writing repertoires of migrant students by simultaneously considering three languages in an integrative model of multilingual writing development. Second, we used longitudinal competence data to decompose covariance between languages to isolate those parts of the variances that are truly predictive for change within and between languages. This approach tests the resources hypothesis more rigorously.

By taking advantage of our panel data, we found within- and between-language relations in multilingual writing development. Concerning within-language effects, we found only positive correlations over time. This positive path dependency of within-language writing development indicates moderate to high stability in the interindividual differences in students’ performance rankings over time, with a suitable number of divergent developmental pathways at the intraindividual level.

However, our most important finding is the existence of between-language effects in multilingual writing development. We found four significant cross-lagged effects over time, all of which are positive. From the first to the second wave, we observed positive effects from both the relevant heritage language and English on German. Such a positive effect of a heritage language on German was also previously shown regarding the oral language skills of younger children (Klinger, 2015). We have now been able to detect this positive influence on writing among migrant secondary students. This finding indicates that writing skills in both heritage and foreign languages serve as resources for writing development in the majority language among adolescent writers and supports Cummins’ interdependence hypothesis.

Thus, writing skills in students’ heritage language represent an integrative part of their multiliterate repertoires. As writing is a highly strategic skill, all resources at a person’s disposal can be used to fulfill the writing task. The observed cross-lagged effect that stems from the heritage language indicates that students in our sample actively drew on the resources available in their heritage language to compose their text in German. The effect occurring only once reflects the dynamic nature of the interrelations between the languages in a person’s repertoire, which change over time. Indeed, we did not expect that the effects of the heritage language would occur in each wave. First, the application of resources in writing is always situation dependent. Students draw on their resources as needed. In our analyses, we covered only three particular measurement points. Potentially, further influences may have occurred at any other time point before or after the tested period. Second, the autoregressive model controls for past levels of the outcome in predicting change over time, which may greatly reduce the magnitude of the effect (for the problem of small effects in longitudinal modeling, see Adachi and Willoughby, 2014). Under this model specification, a substantial change in the influence needs to occur at further measurement points to provide a sufficient magnitude for the effect to appear within the model. For such influences to take place, stable institutional support for the development of writing in a heritage language is needed, which is not the case within the current migration situation; learning opportunities in a heritage language tend to be limited (Brehmer & Mehlhorn, 2018; Usanova & Schnoor, 2021b). Given our sample’s specifics, i.e., most students were born in Germany and all spent their school careers in Germany, our results also support promoting the development of literacy skills in a heritage language in the context of migration.

We also found a positive interrelation between German and English, and this interrelation was much denser. Effects from English on German occur at both lags, yet an effect from German on English only occurs from the second to the third wave. This finding is consistent with previous longitudinal research on the interrelationship between writing skills in the majority language and foreign language(s) (Schoonen et al., 2011). This dense connectivity between writing proficiency in German and English indicates that development in both languages is steady and that they can therefore serve as mutual resources over time. This result is expected since students receive comparable input in these two languages in institutional settings.

Overall, our study provides valuable insights into the interrelationship between majority language, heritage language, and foreign language writing skills in students’ writing repertoires. Furthermore, our results clarify the relationship between multilingual writing proficiencies. The following characteristics of this relationship can be elaborated:

1) the relationship is exclusively positive;
2) it is multidirectional and shows the influences at multiple time points;
3) it is dynamic and changes over time as students develop their writing proficiencies.

These results allow us to determine which model of multiliteracy acquisition (Canagarajah, 2015) best represents the observed data. The subtractive model does not fit because it assumes negative relationships between languages, whereas we found positive initial correlations in the first wave. Moreover, the results do not support the additive model: although it would explain the positive correlations of the languages within a wave, it assumes no cross-lagged effects over time. On the other hand, the recursive and translingual models are both supported by the data, the translingual model in particular, due to the observed multidirectionality of influences. However, we cannot answer this question conclusively because our findings do not specify whether writing skills are synthesized or summarized. Here, further empirical research is needed to address whether multilingual writing proficiency is a synthesized competence and, more generally, to clarify what exactly underlies the relationships between multilingual writing skills (Berthele & Vanhove, 2020). The notion of a synthesized competence (Canagarajah, 2015) is one possible explanation for the observed data. Alternatively, multilingual writing strategies (Soltero-González et al., 2012) may underlie the interrelations of writing skills across different languages. The distinction between these two alternatives is between automaticity and intentionality; proficiency is somewhat automatic and effortless, whereas writing strategies are controlled, effortful, and intentional (for the distinction between skills and strategies, see Alves, 2019).

Regardless of whether the translingual or the recursive model underlies the interrelation of languages, our results support the general notion that multilingual writing skills are mutual resources, as suggested by the FoM approach. Thus, we affirm the large number of studies that have shown the application of languages to be fluid resources in writing at the surface level (e.g., García & Kleifgen, 2019; Velasco & García, 2014) by providing unique empirical evidence for the interrelation of writing skills at the unobserved proficiency level, a research area that has thus far lacked empirical evidence. This is the most important result of our study and is particularly relevant for further research and for educational policies and practices.

Our study shows that by applying a resource-oriented perspective to multilingual skills, empirical studies may unravel the valuable strategic potential of languages in a person’s repertoire. Excluding students’ literacy skills in their heritage and foreign languages from a study design and data analyses while focusing exclusively on a majority language neglects the driving force of students’ language development. Resource-oriented empirical research with large datasets is urgently needed to deconstruct the myth that multilingualism hinders language development. This research provides a foundation for promoting the potential of migration-related multilingualism in education policies and practices in the context of increasing linguistic diversity.

Limitations

The current study has several limitations, and here we discuss those that we consider the most important.

The first limitation concerns whether the model of multilingual writing is comprehensive enough. The correlating residuals between the language-specific writing proficiencies in the second and third waves mean that some of their covariances are not explained by the endogenous process we modeled. This residual covariance suggests that in addition to writing proficiencies, other exogenous variables might be involved in multilingual writing development. Prominent candidates are shared resources, such as writing strategies and metacognitive awareness. In addition, language-specific skills, such as depth of semantic knowledge (vocabulary), might impact the complexity of writing. Unfortunately, we do not have data to investigate these issues further.

The second limitation is methodological and concerns autoregressive latent panel models’ lack of an explicit theory of change (for a discussion, see Selig & Little, 2012, 266 f.). Since panel models only process information about interindividual changes (longitudinal covariance structure), they typically do not contain information about intraindividual changes (longitudinal mean structure). Therefore, panel models can only indicate how uniform development is between individuals. They cannot reveal the within-individual developmental trajectories that latent growth models provide, for example. Thus, a largely positive effect in a panel model could mean positive, negative, or even no within-person development over time because the indicators’ variance/covariance matrix does not carry this information. In this study’s case, one would have to add the mean vectors to the equation, but that would also require a higher level of measurement invariance. However, panel models are a recommended starting point for investigating developmental processes where little is known about developmental trajectories because they do not need an explicit theory of change.
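The point that a variance/covariance matrix carries no information about mean-level change can be made concrete with a toy example. The Python sketch below uses invented scores for five students, not study data: the same wave 1 scores are paired with three wave 2 scenarios (uniform growth, uniform decline, no change), and all three yield the same covariance with wave 1.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical writing scores of five students at wave 1.
wave1 = [10.0, 12.0, 15.0, 11.0, 17.0]

# Three hypothetical wave 2 scenarios with identical rank order and spread.
scenarios = {
    "growth": [score + 3.0 for score in wave1],
    "decline": [score - 3.0 for score in wave1],
    "no change": list(wave1),
}

if __name__ == "__main__":
    for label, wave2 in scenarios.items():
        mean_change = mean(wave2) - mean(wave1)
        print(label, covariance(wave1, wave2), mean_change)
```

A model fitted only to the covariance structure cannot distinguish these three scenarios, which is why within-person trajectories require the mean structure as well.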