1 Introduction

Data-driven educational technologies have advanced significantly, leveraging large volumes of learning log data in online and digital learning platforms (Prinsloo & Slade, 2017). As a typical data-driven service, learning analytics (LA) platforms with intelligent recommendation systems can help educators understand student progress, identify areas for improvement, and tailor instructional approaches accordingly (Klašnja-Milićević et al., 2015). The predominant focus of LA research is to analyze data to gain insights, inform decision-making, and personalize learning experiences (Hirsto et al., 2022), thus helping children develop their abilities and thrive in the Society 5.0 era. However, challenges persist in employing data-driven technologies in actual school settings, including the interoperability of data systems and the establishment of meaningful workflows aligned with educational objectives. These gaps hinder the deployment of cutting-edge educational technologies in real educational environments to address the evolving needs of educational practitioners.

Co-design, as a collaborative approach involving multiple stakeholders in the design process, is increasingly recognized as a valuable method to engage stakeholders in technology deployment (Roschelle & Penuel, 2006). In education, co-design has been applied across various settings, from primary to higher education (Iniesto et al., 2022; Mäkelä et al., 2018). It has been used to develop innovative teaching methods (Wong et al., 2014), curriculum designs (Lin & Brummelen, 2021), and technology-enhanced learning environments (Holstein et al., 2019). By involving all stakeholders, co-design promotes ownership, engagement, and a deeper shared understanding of the challenges and opportunities in educational settings. Therefore, in reforming teaching and learning with data-driven educational technology, co-design can play a vital role in narrowing the gap in technology deployment.

Inspired by EDUsummIT, which encourages the cooperation of practice, policy, and research stakeholders, this paper demonstrates the co-design approach for data-driven technology and practice in Japanese contexts through a series of case studies. Drawing on the co-design practice of educational technologies, the paper highlights a six-phase co-design framework: motivate, pilot, implement, refine, evaluate, and maintain. By showing how LA can adapt to the needs of educational practice, the framework suggests the potential to build trust in data among teachers and students and enact the use of data-informed tools in everyday learning (Prinsloo & Slade, 2017).

2 Deployment of Data-Driven Educational Technology: International Perspective and Japanese Context

2.1 Worldwide Practice and Gaps in Data-Driven Infrastructure

Learning analytics (LA) focuses on the measurement, collection, analysis, and reporting of information related to learners and their contexts to enhance understanding and improve learning outcomes (Lang et al., 2017). Many countries around the world have adopted state-of-the-art LA techniques to inform decision-making and provide necessary support to learners (Becerra et al., 2023; Cobos, 2023). For instance, in the US, LA has been explored in higher education for purposes of monitoring, prediction, and intervention with automated feedback for prevention (Caspari-Sadeghi, 2023). Nonetheless, in some regions, such as southern Brazil, the use of LA is still in the preparation phase, where the development of infrastructure and teacher training is essential to meet teachers' expectations (Biancato et al., 2023).

To advance the field of LA, the presence of e-learning infrastructure is vital for providing data-driven support. While the COVID-19 pandemic generated vast amounts of educational data from various online platforms, existing digital tools and communication applications are insufficient to ensure effective education at both institutional and national levels. In the EU, the General Data Protection Regulation (GDPR) imposes substantial limitations on data utilization, and there remains a need for technology that can balance the utility of data with security concerns. In North America, data-sharing initiatives have been launched through the Unizin consortium (https://unizin.org/), primarily focused on higher education. In Australia, despite a burgeoning research community (Colvin et al., 2015), the lack of a common infrastructure for data sharing hinders progress in this field. These regional variations underscore the challenges in deploying technologies in schools and emphasize the need to consider regulatory, sector-specific, and infrastructure factors.

2.2 Japanese Context: Government Initiatives and Teacher Challenges

Compared to the aforementioned countries, the infrastructure and policy in Japan are more supportive, but low acceptance among busy educational practitioners has become a key obstacle to technology deployment.

In Japan, the Ministry of Education, Culture, Sports, Science and Technology (MEXT) spent over 430 billion JPY to introduce the Global and Innovation Gateway for All (GIGA) School Program, which aims to transform learning by providing one device per student. The COVID-19 pandemic caused nationwide school closures in Japan but accelerated the development of the ICT environment. The closure of K-12 schools lasted for about 2 months, and higher education institutions remained closed for up to a year. Even after schools reopened, classes temporarily shifted online whenever there was a surge in infections. This situation increased the ICT literacy of teachers and learners in using devices and conducting hybrid classes.

To promote the utilization of educational data, the Japanese government is advancing the digitalization of education and establishing a data-driven model to acquire diverse data effectively and efficiently. MEXT has also set up an expert conference on educational data utilization, focusing on practical issues such as practices, policies, copyrights, and personal information. In October 2021, the National Institute for Educational Policy Research established the "Educational Data Science Center" as a hub for analyzing and researching Japan's educational data and sharing results.

Despite the effort to promote ICT in education, school teachers still adhere to traditional teaching methods and are reluctant to try digital tools. Japanese compulsory education teachers face heavy workloads, to the point where death from overwork has become a concern. Compared to other countries, the burden of extracurricular activities is significant, and adopting ICT adds to the administrative burden on teachers. Although teachers wish to spend more time interacting with students, this situation makes it difficult to do so on enthusiasm alone.

2.3 Co-design as a Solution

Co-design offers a promising avenue to bridge the gaps above by engaging stakeholders in developing data-driven infrastructures, thereby fostering trustful relationships with data. For example, Bowers and Krumm (2021) illustrate how school leaders and researchers collaborated to visualize and interpret data from a shared LMS used by students across multiple schools and grade levels. Through the co-design process, designers can gain a deeper understanding of other stakeholders, thus increasing the likelihood of technology acceptance and adoption (Hoadley, 2017).

A key challenge of co-design is empowering learners and teachers to actively shape educational tools and practices in their learning contexts (Prieto-Alvarez et al., 2018). A global movement aims to involve a wide range of stakeholders throughout the various co-design phases (Sarmiento et al., 2022), especially non-technical stakeholders (Baumer, 2017). Regions like Norway have taken proactive steps to address digital competence among students and teachers, ensuring they stay abreast of technological advancements, including AI literacy, coding literacy, and advanced information literacy. These movements have been showcased in EDUsummIT discussions, where practice, policy, and research perspectives are incorporated in the call to action of each thematic working group (TWG).

To achieve successful co-design, it is common to involve educators in generating innovative ideas. Chounta et al. (2022) surveyed teachers, revealing their top concerns in LA design to be speed, personalization, and assessment, offering insights into current perspectives. Additionally, testing and evaluating LA tools in development can exemplify co-design practices and inspire further refinements, like the design of LA dashboards (Wise & Jung, 2019). There are also instances where non-technical stakeholders actively participate throughout the entire design process of complex LA systems (Holstein et al., 2019).

3 Technical Infrastructure and Theoretical Framework of Current Research

In this research context in Japan, the LEAF (Learning and Evidence Analytics Framework) platform facilitates educational technology co-design (Ogata et al., 2022). Co-design within LEAF involves systematic stakeholder engagement across six design phases. Stakeholders in the subsequent cases are mainly from an experimental school with basic infrastructure like LMSs, WiFi, and one-tablet-per-student setup, representative of Japan's post-COVID educational landscape.

3.1 Overview of LEAF Infrastructure

As illustrated in Fig. 1, LEAF consists of three components: (1) learning activity sensors and feedback, (2) a learning record store, and (3) an evidence record store. LEAF can be connected to any existing learning management system through the Learning Tools Interoperability (LTI) protocol. Once teachers and learners enter the LEAF platform, their learning and teaching activities are recorded in various sensor applications, such as the digital book reading system BookRoll (Ogata et al., 2015), the self-directed learning support system GOAL (Li et al., 2021), and the learning analytics dashboard LogPalette with multiple-purpose applications (Ogata et al., 2022). These records are transferred and stored as standard xAPI statements, enabling cross-application processing and analysis. They are also processed and returned to learners and teachers through LogPalette for instant interventions. Further, they undergo additional processing and extraction to yield higher-level indicators of learning, referred to as "evidence" (Kuromiya et al., 2020).
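To make the storage format concrete, the following is a minimal sketch of what a single xAPI statement for a reading event might look like. It follows the general actor/verb/object shape of the xAPI specification; the helper name `make_xapi_statement`, the e-mail address, the activity URL, and the choice of verb are illustrative assumptions, not the vocabulary actually used by LEAF.

```python
import json
from datetime import datetime, timezone


def make_xapi_statement(actor_email, verb_id, verb_name, object_id, when=None):
    """Build a minimal xAPI-style statement: who did what to which activity, and when."""
    when = when or datetime.now(timezone.utc)
    return {
        "actor": {"objectType": "Agent", "mbox": f"mailto:{actor_email}"},
        "verb": {"id": verb_id, "display": {"en-US": verb_name}},
        "object": {"objectType": "Activity", "id": object_id},
        "timestamp": when.isoformat(),
    }


# Hypothetical event: a learner views one page of an e-book.
statement = make_xapi_statement(
    actor_email="learner@example.org",
    verb_id="http://adlnet.gov/expapi/verbs/experienced",
    verb_name="experienced",
    object_id="https://example.org/books/math-1/page/12",
    when=datetime(2023, 4, 1, 9, 0, tzinfo=timezone.utc),
)
print(json.dumps(statement, indent=2))
```

Because every sensor application emits the same three-part structure, downstream tools can filter and aggregate records by actor, verb, or activity without knowing which application produced them.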

Fig. 1 Overview of LEAF infrastructure

3.2 Co-design Approach for LEAF

The co-design of LEAF followed a systematic approach that brought together the following five stakeholder groups:

  1. Learners–the users of the LEAF system for individual or cooperative learning.

  2. Practitioners–the teachers at the school or university level who will implement practices with LEAF to facilitate learning in their classroom.

  3. Policymakers–the members of the school management who draw up policies on the utilization of technology and interface with the parents of the learners.

  4. Researchers–the members of an academic research laboratory responsible for conceptualizing the technology.

  5. Developers–the members of the industry who might be developing parts of the product.

The stakeholders contribute across six broad phases:

  • Phase 1. Motivate and define technology core: In the first phase, the motivation and the basic definition of the algorithm or technical approach are conceptualized. This motivation can come from teachers or from observations conducted by researchers to understand teachers’ and learners’ needs. In approaching data-driven services, we start by defining the core technology innovation in the application domain of education (Ogata et al., 2022). This phase primarily involves researchers, teachers, and learners.

  • Phase 2. Pilot: In the pilot phase, researchers implement a prototype technology. Typically, the pilot takes place in an educational environment that offers greater control over the developed prototype, such as the research lab’s own academic activities or classes in which the researcher is part of the teaching staff. Researchers also discuss the workflow and the implementation strategies with the teachers in the school.

  • Phase 3. Implement: The first stable version of the technology is implemented for use in the daily school context. The researchers and the practitioners also work together to prepare the onboarding materials for the learners and other practitioners. They conduct an early evaluation and compile the new requirements.

  • Phase 4. Refine: After the initial implementation and trial, industry partners can update the technology based on socio-technical demands in the implementation phase. In this phase, further focus is on the technical aspects of scaling up.

  • Phase 5. Evaluate: The evaluation phase focuses on the impact on learning. At this phase, the researcher and practitioner decide on empirical study plans and collect data with designed experiments. This phase requires immense coordination as some studies focus on causal inferences with control and experimental group design, which is quite challenging to conduct in regular classroom sessions.

  • Phase 6. Maintain: The maintenance phase integrates the educational technology into the practitioner’s workflow. Practitioners become competent in utilizing the system for daily teaching and learning practices. The industry partner at this phase can maintain the system for the daily teaching–learning context and actively promote it and train practitioners across multiple institutions. Policymakers must look at aligning the system's use with educational objectives to suggest long-term improvements for practitioners and researchers.

4 Case Studies

To illustrate the practical implementation of the LEAF infrastructure and co-design efforts in a Japanese context, this section will provide a collection of case studies. The case studies focus on concrete examples of how the co-design was conducted in different learning contexts, suggesting the broader potential to employ co-design to promote ICT deployment. Table 1 offers a general overview, which spans various co-design phases.

Table 1 The summary of case studies

4.1 Data-Enhanced Active Reading Using Active Reading Dashboard

Active Reading (AR) strategies aim to actively engage learners in the reading process, which is crucial for enhancing reading comprehension in language learning (Pulver, 2020). While previous studies used technology like e-books for AR, there is limited research from an LA perspective. To address the gap, an AR dashboard (AR-D) was developed (see Fig. 2), focusing on visualizing the reading process and providing feedback. AR-D is designed to promote learners' reflection on their reading process to improve their reading performance and motivation by using learning logs visualized on the dashboard. It also scaffolds teachers in preparing for lessons and making decisions about the class and subsequent activities. Graph-based algorithms (TopicRank, TextRank, and MultiPartiteRank for the reading phase; TF-IDF and LexRank for the summary phase) are used to calculate the similarity between documents as a score. During AR activities, lists and text overlays in AR-D can be used to confirm answers to the questions about the text content found in the textbook or created by teachers, and word clouds can be used to confirm word meanings and key concepts.
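As an illustration of how a document-similarity score of this kind can be obtained, the sketch below computes TF-IDF vectors and their cosine similarity in plain Python. It covers only the TF-IDF idea mentioned above; the whitespace tokenizer, the smoothed IDF formula, and the example texts are assumptions made for the sketch and are not taken from the AR-D implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)  # smoothed IDF
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical data: a reading passage and two learner summaries.
passage = "the fox jumps over the lazy dog"
summary_a = "a fox jumps over a dog"
summary_b = "students enjoy reading books"
vecs = tfidf_vectors([passage, summary_a, summary_b])
score_a = cosine(vecs[0], vecs[1])
score_b = cosine(vecs[0], vecs[2])
print(f"summary A: {score_a:.2f}, summary B: {score_b:.2f}")
```

A summary that reuses the passage's distinctive vocabulary scores higher than one that shares no terms with it, which is the property a dashboard can surface to learners and teachers.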

Fig. 2 The interface and key functions of AR-D

The developed AR-D prototype was first implemented in a university flipped-online AR class by the researcher (Toyokawa et al., 2021). We confirmed AR-D's potential to create data-driven learning contexts, prompting teachers to plan the next activities and students to reflect on their learning, both in and out of class. Next, AR-D was introduced into secondary schools and repeatedly updated through learning practices. In an experiment at a high school, we verified the effectiveness of AR in LEAF. We found that in addition to acquiring vocabulary and understanding reading content through AR activities, using AR-D also enabled students to develop a positive attitude towards learning in the target language (Toyokawa et al., 2023). On the other hand, we also received critical comments about the system UI, usability, and accessibility from authentic learners and teachers. Drawing on the comments from users and interviews with a teacher who has used LEAF over the past 2 years, the new AR-D was designed with the integration of a Words Per Minute (WPM) automatic extraction function and a textbook recommendation function based on past markers and memos.

AR also finds applications in Special Needs Education (SNE), where it is employed in a public elementary school's resource room to assist students needing extra support (MEXT, 2016). The data-enhanced AR implementation revealed issues in the reading process, like lack of concentration and adherence to learning functions, that are not visible in traditional paper-based AR. Conducting such experiments requires close collaboration with teachers and parental approval. In a pilot study examining pen-stroke analysis to detect issues from handwritten memos (Toyokawa et al., 2022), teachers first consulted parents before researchers and teachers made decisions, and verification was possible only after parental approval was obtained. Requiring consent from additional stakeholders, such as parents, may hinder desired experiments. Therefore, repeated discussions with resource room teachers were held regarding the creation and selection of learning materials, activities, schedules, and data sharing, enabling students’ strengths and weaknesses to be visualized and shared among teachers. With AR-D dashboards serving as a conduit for sharing information with parents and school officials, this instance exemplifies the expectations for LA, which serves as a bridge to connect stakeholders in co-design communications.

4.2 Pen-Stroke Interactions for Self-Explanation in Mathematics Classes

Traditionally, exercises in mathematics classes require handwritten answers involving detailed working out. During collaboration with K-12 teachers, it was reported that distributing and collating the results of such exercises is a burden, and that BookRoll could serve to distribute the exercises and collect handwritten answers in the form of pen strokes (Yoshitake et al., 2020). Often a teacher will review the exercise answers to find good examples to either explain in front of the class or provide as a worked example for self-explanation. This led to the development, through co-design with classroom teachers, of the pen-stroke and self-explanation modules in LEAF, where students can play back their answers and explain their thought processes. Conventionally, Intelligent Tutoring Systems (ITS) and Intelligent Computer Assisted Instruction (ICAI) systems that diagnose and advise students on how to solve their math problems are primarily rule-based and costly to develop, as they rely heavily on input from domain experts. In contrast, our system is more data-driven in nature, focusing on the answering process described through students' self-explanation. This data-driven approach has the potential to improve as data is collected by the system and used to refine real-time feedback, sample answer generation, and automated scoring. As shown in Fig. 3, the pen-stroke input interface allows the learner to handwrite directly on the page of the exercise, as would normally be done when using physical exercise books or working-out sheets in the classroom.

Fig. 3 Early incarnation in the co-design process of the pen-stroke input function and pen-stroke analysis self-explanation interface (reproduced from Flanagan et al., 2021b; Yoshitake et al., 2020)

Self-explanation is a process where students reflect on their problem-solving approaches, articulate their thoughts, reason, and apply skills (Chi et al., 1989). Advanced techniques, including time series analysis of pen-stroke input, can aid in pinpointing deficiencies in prerequisite knowledge and guide the tailored use of self-explanation (Yoshitake et al., 2020). The self-explanation module in LEAF supports the replay of pen-stroke answers that have been recorded in BookRoll, allowing students to reflect on their answers. Combining the analysis of the pen-stroke information as time series data with the self-explanations as a form of annotation to the pen-stroke data allows the system to identify two things. First, the order of self-explanations in relation to the answering process can be determined by comparing the timestamps of the pen strokes being replayed with those of the self-explanation annotations. Second, the pen-stroke data include information on where students backtracked by erasing some of their working out, which can be used to identify impasses and potential weaknesses. The analysis of self-explanations, which considers the required knowledge and assesses adherence to appropriate scoring rubrics in the process (Nakamoto et al., 2024), introduces an additional level of complexity to the system. The self-explanation rubrics (see Fig. 4) were developed through a co-design process between classroom teachers and the researchers who were developing the system to automatically analyze pen-stroke and self-explanation data and provide real-time feedback to students (Nakamoto et al., 2023a). The rubric used in this system was co-created by teachers as domain experts and researchers to reflect the required knowledge and adherence in the answering process, and it was then used to create scoring models.

Some issues were identified during the co-design phase, such as the limited number of self-explanation samples, which could affect the accuracy of the proposed algorithm. To alleviate this problem, a model trained on the limited self-explanation data set collected from learners was designed to automatically generate sample answers (Nakamoto et al., 2023b).
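The timestamp comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Stroke` and `Explanation` records, the `erased` flag, and the function names are hypothetical and do not reflect the actual BookRoll log schema or the LEAF analysis pipeline.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Stroke:
    start: float          # seconds since the learner began answering
    end: float
    erased: bool = False  # hypothetical flag: this action removed earlier ink


@dataclass
class Explanation:
    time: float  # replay time at which the annotation was attached
    text: str


def explanation_at(explanations, t):
    """Return the latest explanation attached at or before time t, if any."""
    ordered = sorted(explanations, key=lambda e: e.time)
    idx = bisect_right([e.time for e in ordered], t)
    return ordered[idx - 1] if idx else None


def find_backtracks(strokes, explanations):
    """Pair each erasure with the explanation step during which it occurred."""
    return [
        (s.start, explanation_at(explanations, s.start))
        for s in sorted(strokes, key=lambda s: s.start)
        if s.erased
    ]


# Hypothetical answer: the learner erases part of the working at t = 10 s.
strokes = [Stroke(0, 4), Stroke(5, 9), Stroke(10, 11, erased=True), Stroke(12, 20)]
explanations = [
    Explanation(12.0, "Solve for x"),
    Explanation(0.0, "Expand the brackets"),
    Explanation(5.0, "Collect like terms"),
]
backtracks = find_backtracks(strokes, explanations)
for t, step in backtracks:
    print(f"erasure at {t}s during step: {step.text if step else 'unannotated'}")
```

Linking each erasure to a step of the learner's own explanation is what lets an impasse be reported in terms a teacher can act on, rather than as a raw timestamp.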

Fig. 4 An overview of the pen-stroke and self-explanation analysis process (reproduced from Nakamoto et al., 2024)

4.3 Recommendation for Mathematics Exercises with Explainable AI

The LEAF AI recommendation system overcomes a common limitation in traditional learning support systems by providing detailed explanations for recommended exercises (Ogata et al., 2024). It utilizes exercise-answering logs and pre-registered metadata related to knowledge units within the learning materials, which are labeled through collaboration with publishers and teachers. The system continuously improves its recommendations by analyzing students' exercise results and tailoring explanations based on the frequency of successive failures in each learning topic. Figure 5 presents the interface of the recommendation system.
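The idea of tailoring an explanation to successive failures can be illustrated with a small sketch. The log format, the threshold, the exercise labels, and the wording of the explanation are all assumptions for illustration; the published system's actual rules and models are not reproduced here.

```python
from collections import defaultdict


def failure_streaks(log):
    """Current run of consecutive incorrect answers per topic.

    `log` is a time-ordered list of (topic, correct) pairs.
    """
    streaks = defaultdict(int)
    for topic, correct in log:
        streaks[topic] = 0 if correct else streaks[topic] + 1
    return dict(streaks)


def recommend(log, exercises, threshold=2):
    """Recommend an exercise for each topic whose failure streak reaches the threshold."""
    recommendations = []
    streaks = failure_streaks(log)
    for topic, streak in sorted(streaks.items(), key=lambda kv: -kv[1]):
        if streak >= threshold and topic in exercises:
            recommendations.append({
                "exercise": exercises[topic],
                "reason": f"{streak} incorrect answers in a row on '{topic}'",
            })
    return recommendations


# Hypothetical answering log and topic-to-exercise metadata.
log = [
    ("fractions", False), ("fractions", False), ("equations", True),
    ("fractions", False), ("equations", False),
]
exercises = {"fractions": "Ex. 3-2", "equations": "Ex. 5-1"}
recs = recommend(log, exercises)
print(recs)
```

Returning the reason alongside the exercise keeps the recommendation explainable: the learner sees not only what to try next but also which pattern in their own log triggered it.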

Fig. 5 The main interface of the recommendation system

Co-design of the recommendation module involves the motivation and refinement of the technical core and interfaces. When we initially introduced to teachers the idea of recommending mathematical materials to resolve wheel-spinning (active but unproductive effort), they expressed skepticism about its effectiveness. Hence, we used log data to demonstrate instances of wheel-spinning, addressing their concerns by clarifying the system's benefit for stuck students. Weekly meetings with teachers guide system design, incorporating varied feedback to enhance the system. Teachers' feedback on our pilot prototypes, such as reports of students seeking clarity on the numbers shown in the interface, helps to refine the system's balance between detailed explanations and user-friendly clarity.

Close ties and communication with teachers also drive new system functions. As one practicing teacher pointed out in a presentation on the implementation of learning analytics in the LEAF system, designing a series of quizzes for student assessment is an essential task that poses a significant burden on teachers. Previously, teachers estimated average scores and answer times for test questions based on their own experience. With the analysis dashboard tool, even inexperienced teachers can easily do this, reducing the time required for question creation (Takami et al., 2022). Therefore, we proposed a method for automatically generating question sets adjusted to the time and difficulty levels desired by teachers and students, based on Bayesian Knowledge Tracing (BKT) parameters and a quiz answer history database (see Fig. 6). This feature is expected to alleviate the workload associated with quiz preparation for educators and can also serve as a practice test for students before the actual examination.
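For readers unfamiliar with BKT, the sketch below shows the standard BKT update and one possible way to pick questions under a time budget. The update equations are the textbook form of BKT; the greedy selection rule, the item fields, and all numeric values are assumptions made for illustration and should not be read as the optimization method of Takami et al. (2022).

```python
def bkt_update(p_know, correct, p_slip, p_guess, p_transit):
    """One standard BKT step: Bayesian posterior after an answer, then learning."""
    if correct:
        evidence = p_know * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_know) * p_guess)
    else:
        evidence = p_know * p_slip
        posterior = evidence / (evidence + (1 - p_know) * (1 - p_guess))
    return posterior + (1 - posterior) * p_transit


def p_correct(p_know, p_slip, p_guess):
    """Predicted probability of answering an item correctly."""
    return p_know * (1 - p_slip) + (1 - p_know) * p_guess


def build_quiz(items, p_know, target_p, time_budget):
    """Greedy sketch: prefer items whose predicted correctness is nearest target_p."""
    ranked = sorted(
        items,
        key=lambda it: abs(p_correct(p_know[it["skill"]], it["slip"], it["guess"]) - target_p),
    )
    chosen, used = [], 0.0
    for it in ranked:
        if used + it["seconds"] <= time_budget:
            chosen.append(it["id"])
            used += it["seconds"]
    return chosen, used


# Hypothetical item bank and per-skill mastery estimates.
items = [
    {"id": "q1", "skill": "fractions", "slip": 0.10, "guess": 0.20, "seconds": 60},
    {"id": "q2", "skill": "equations", "slip": 0.10, "guess": 0.20, "seconds": 90},
    {"id": "q3", "skill": "fractions", "slip": 0.05, "guess": 0.25, "seconds": 120},
]
p_know = {"fractions": 0.8, "equations": 0.4}
chosen, used = build_quiz(items, p_know, target_p=0.5, time_budget=150)
print(chosen, used)
```

Setting `target_p` lower yields a harder set for the same learner, while `time_budget` caps the expected solving time, which mirrors the two controls (difficulty and time) described above.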

Fig. 6 Test set maker optimizing solving time and BKT parameters value (reproduced from Takami et al., 2022)

Besides the creation phase of co-design, education practitioners also help in implementation. To promote the system’s use in education, collaborative efforts with teachers resulted in the creation of informative posters (as described in Fig. 7). These materials actively encouraged junior high schools to adopt the system, emphasizing its benefits for students of diverse academic levels and personalities.

Fig. 7 A poster promoting the usage of a recommendation system for students with English translation

4.4 Algorithmic Group Formation Using Learning Log Data

Group learning is increasingly adopted, emphasizing social-emotional aspects and interpersonal skills. In the LEAF group learning module, various learner attributes are integrated to automatically form homogeneous or heterogeneous groups. As shown in Fig. 8, this tool empowers educators to choose attributes and group sizes based on their needs, utilizing data from LMSs and reading logs from BookRoll. External attributes like test scores and survey responses can be easily uploaded through the dashboard. These data are aggregated via LEAF LTIs to construct a vector for each student, and students are allocated to groups through an iterative process with a genetic algorithm (Flanagan et al., 2021a).
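A compact sketch of this kind of search is given below. It is a generic genetic algorithm over student orderings, written under several assumptions: attributes are already normalized to comparable scales, fitness is the mean within-group variance (maximized for heterogeneous groups, minimized for homogeneous ones), and the crossover, mutation rate, and population settings are arbitrary choices. It is not the algorithm of Flanagan et al. (2021a).

```python
import random
from statistics import pvariance


def make_groups(order, size):
    """Cut an ordered list of students into consecutive groups of `size`."""
    return [order[i:i + size] for i in range(0, len(order), size)]


def group_spread(group, vectors):
    """Sum of per-attribute variances inside one group."""
    return sum(pvariance(dim) for dim in zip(*(vectors[s] for s in group)))


def fitness(order, vectors, size, heterogeneous):
    """Mean within-group spread; negated when homogeneous groups are wanted."""
    groups = make_groups(order, size)
    spread = sum(group_spread(g, vectors) for g in groups) / len(groups)
    return spread if heterogeneous else -spread


def crossover(a, b, rng):
    """Order crossover: keep a slice of parent a, fill the rest in b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j]
    rest = [s for s in b if s not in middle]
    return rest[:i] + middle + rest[i:]


def mutate(order, rng, rate=0.2):
    """With probability `rate`, swap two students."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def form_groups(vectors, size, heterogeneous=True, generations=100, pop_size=30, seed=0):
    rng = random.Random(seed)
    students = sorted(vectors)
    population = [rng.sample(students, len(students)) for _ in range(pop_size)]

    def score(order):
        return fitness(order, vectors, size, heterogeneous)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return make_groups(max(population, key=score), size)


# Hypothetical normalized attributes: (test score, reading activity).
vectors = {
    "s1": (0.90, 0.80), "s2": (0.85, 0.90), "s3": (0.20, 0.10), "s4": (0.30, 0.20),
    "s5": (0.50, 0.50), "s6": (0.55, 0.40), "s7": (0.10, 0.30), "s8": (0.95, 0.70),
}
groups = form_groups(vectors, size=4, heterogeneous=True)
print(groups)
```

Representing a grouping as a permutation that is cut into consecutive chunks keeps every candidate valid by construction: no student can be dropped or placed in two groups, so the search only has to optimize the mix within groups.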

Fig. 8 Algorithms and data leveraged in the group formation module of LEAF

The initial system version had limited functionality, creating groups based on rankable numeric values processed by genetic algorithms and fitness functions. Liang et al. (2021) applied this system in a primary school math class for jigsaw activities, which required group reconfiguration within a single class session. This insight from the experiences of actual teachers prompted the system developers to understand the unique demands of practical contexts, motivating them to enhance the system by enabling group reconfiguration within existing groups. Concurrently, teachers identified key variables they desired to incorporate into the group formation process, such as communication skills and relationships, which guided further refinements of the system. The reformed system allowed teachers to efficiently create groups for multi-phase activities, leading to increased student engagement and positive affective states. Following these implementations, feedback seminars were conducted, where teachers reflected on their teaching experiences and shared their thoughts and concerns on system usage. During these sessions, representatives from the enterprise that sponsored audio collection and recognition devices visited the study class. These collaborative efforts involving diverse stakeholders play a pivotal role in addressing contemporary educational challenges.

The integration of overlapping annotations provides another illustration. In Liang et al. (2023), an English teacher with over a decade of teaching experience introduced the concept of grouping students based on shared or distinct BookRoll markers. This concept aligns with bibliographic coupling, a concept introduced for detecting academic collaboration, which was suggested by the system developer with an informetrics background and provides a theoretical foundation for this feature. Further, an empirical study was conducted to examine the effect of the marker-based group formation extension in a Japanese middle school, where students collaborated in groups to comprehend and act out an English story. During this experiment, the teacher who initially proposed the idea collaborated with the class organizer to develop quizzes for group formation and evaluation rubrics to appraise group work performance. They also graded the summary writing assignment for the story. The results revealed that group members with varying difficulty markers contributed to enhanced vocabulary learning, as evidenced by their vocabulary quiz scores and summary evaluations. This collaborative preparation allowed the teachers to gain a clearer picture of the system's data pipeline. In this case, the co-design efforts not only yielded a valuable system evaluation but also opened a promising avenue for frontline teachers to embrace the capabilities of data-driven systems in their classrooms, thereby facilitating their future use.
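The analogy with bibliographic coupling can be made concrete: just as two papers are coupled by the references they share, two students can be coupled by the passages or words they both marked. The sketch below counts shared markers per pair of students; the student names, marked words, and function names are hypothetical, and the actual measure used in Liang et al. (2023) may differ.

```python
from itertools import combinations


def coupling_strength(markers_a, markers_b):
    """Number of marked items two students have in common."""
    return len(set(markers_a) & set(markers_b))


def coupling_table(markers):
    """Coupling strength for every pair of students, keyed by (name, name)."""
    return {
        (a, b): coupling_strength(markers[a], markers[b])
        for a, b in combinations(sorted(markers), 2)
    }


# Hypothetical difficulty markers placed on words in an English story.
markers = {
    "aki": {"wander", "meadow", "shiver"},
    "ben": {"wander", "meadow"},
    "chie": {"harvest"},
}
table = coupling_table(markers)
most_similar = max(table, key=table.get)
print(table)
print("most similar pair:", most_similar)
```

Pairs with high coupling are candidates for groups built on shared difficulties, while pairs with zero coupling are candidates for groups in which members can explain different words to one another.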

4.5 Self-Direction Skill Acquisition with Data from Learning & Physical Activities

Self-direction skill (SDS) is crucial for fostering learners' independence and organization for knowledge acquisition in the twenty-first century (Hill et al., 2020; Toh & Kirschner, 2020). By reflecting on what and how they have learned, learners become aware of their learning processes and possible alternative strategies. To provide opportunities for K-12 learners to start practicing SDS, Li et al. (2021) developed the Goal Oriented Active Learner (GOAL) system. The co-design of this project has three aspects: infrastructure building of the GOAL system, modeling with data (involving rubric formulation), and activity implementation (requiring collaboration with teachers).

The motivation for building an infrastructure that aggregates multi-source learner data stemmed from the lack of research supporting SDS acquisition tailored to individual contexts and data (Toh & Kirschner, 2020). Majumdar et al. (2018) designed the DAPER model, a five-phase data-driven approach to SDS execution and acquisition. In the initial GOAL system, integrating health and learning data helped test common training design modules for SDS, taking both health and learning as contexts. A university prototype was then created as a mobile app, and a pilot study was conducted in a graduate seminar course, with monthly meetings and participant feedback guiding the development and plans.

Subsequently, Li et al. (2020) transitioned from the native app to a web-based system. Learning data from the BookRoll reading activity enabled students to plan, monitor, and reflect on their reading activities. Simultaneously, the researchers secured grants to provide students with smartwatches and developed a service API to collect Garmin activity data. With user workflows and data pipelines for the services in place, an experimental campaign of physical activities was launched with school teachers and 119 seventh-graders. The students tracked their sleep and engaged in data collection and analysis tasks through the GOAL system. Figure 9 depicts the learning tasks that engage the students in developing their analysis skills with data from self-directed activities. First, they analyze their own status using the visualization tool, which displays their own steps data alongside the average, maximum, and minimum data from their peers. They can also check the criteria relating to the activity. Second, they report their activity trends over the most recent days and predict their activity status for the next day. Finally, they check the feedback given by the system to promote analysis skills.

Fig. 9 Learner workflow of analysis tasks with data from self-directed activities

The system is currently in continuous use in the school context for extensive reading, weekly tests, and vacation campaigns (Majumdar et al., 2023). With data from over 1300 users, there is potential for extracting learning habits and creating a dashboard to support their development. For instance, from longitudinal study data across 3 years, Hsu et al. (2023) identified regular patterns among the learners, such as learning in the morning to prepare for the weekly math quizzes. They also detected the phase of behavior change in which each learner remained, in order to provide adaptive feedback and support learners in building reading habits via the learning dashboard.

4.6 Evidence Portal for Sharing Practices

The definition of evidence varies depending on the subject matter of interest (Davies, 1999). In data-driven education, evidence refers to the authentic indicators of intervention practice, as opposed to subjective opinions from teachers (Majumdar et al., 2019). To extract evidence from log data, teachers must divide classes into intervention and control groups for comparison. This task demands skills, knowledge, and experience in selecting appropriate groups. Conventionally, evidence extraction relies on statistical experts and can impose time and cost burdens on teachers. The LEAF system enables evidence extraction via an evidence portal, shown in Fig. 10 (Kuromiya et al., 2020). Here, the effectiveness of intervention classes can be measured without support from experts, underpinned by metrics from BookRoll reading logs and LogPalette operation logs. The results obtained can assist teachers in refining their lesson design to better align with students' needs.

Fig. 10
The interface of the evidence portal that supports evidence-based practice
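
As an illustration of the kind of intervention-versus-control comparison described above, the sketch below computes the mean difference and a standardized effect size (Cohen's d with a pooled standard deviation). The text does not specify which statistics the evidence portal reports, so the choice of Cohen's d, the function name, and the scores are assumptions rather than a description of the LEAF implementation.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(intervention, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(intervention), len(control)
    s1, s2 = stdev(intervention), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(intervention) - mean(control)) / pooled


# Hypothetical per-student metric (e.g., a quiz score) for each group.
intervention_scores = [70, 80, 90]
control_scores = [60, 70, 80]

mean_difference = mean(intervention_scores) - mean(control_scores)
effect_size = cohens_d(intervention_scores, control_scores)
```

With these toy values the groups differ by 10 points and both have a standard deviation of 10, giving an effect size of 1.0. Such a comparison is only as trustworthy as the group assignment, since the classes are not randomized.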

The development of the evidence portal began with prototypes designed to incorporate insights from various researchers (Ogata et al., 2018). These prototypes underwent refinement based on feedback from high school and university educators to support evidence-based education from both theoretical and practical angles. By incorporating real-world data from teaching practices in authentic classroom settings, the evidence portal's development is rooted in practical application, enabling it to mirror real-world dynamics.

To introduce the evidence portal into the classroom, Nakanishi et al. (2021) proposed and implemented a workflow (see Fig. 11) in a first-year high school mathematics class, where the reported data were collected from 40 students over one month. The authors evaluated this workflow through a teacher survey, in which teachers answered that it deepened their understanding of their skills. They were also concerned about students' correct answer rates when providing instruction. The classroom implementation suggests the potential of the evidence portal to support teachers' reflection and encourage class improvement by embedding evidence-based education into practice. This case opens a promising avenue for continually developing applications and supporting teachers in implementing evidence-based education.

Fig. 11
Workflow in evidence-based practice (reproduced from Nakanishi et al., 2021)

Incorporating systems to support evidence-based educational practices necessitates an awareness of the inherent limitations of the gathered evidence. These limitations stem from the reliance on classroom practices provided by teachers themselves, introducing subjectivity that can affect the integrity and reliability of the evidence. While such systems can enhance educational practices through evidence-based approaches, they are inherently less reliable than randomized controlled trials, which remain the gold standard for empirical evidence. This highlights the need for future exploration of alternative methods for ensuring evidence reliability that do not rely solely on instructional practice data.

5 Outputs and Insights of Co-design Practice in Japan

This section explores the typical practices within six co-design phases as demonstrated in the aforementioned cases. We also delve into broader concerns about data-driven delivery, including its implications for learners and ethical considerations. Drawing from our practical experiences, we conclude by presenting implications and expectations for each stakeholder, ultimately aiming to achieve successful co-design for educational technology and practice.

5.1 Implementing Co-design for Data-Driven Service in Education

As demonstrated in the case studies, two workflows for initiating the co-design of technologies emerge: define-refine-maintain and pilot-refine-implement. These workflows embody two ways of conducting research (Ogata et al., 2022). The former is practice-driven, initiating design based on teacher input (e.g., using pen strokes to identify sticking points in mathematics problem-solving, as seen in the self-explanation study). The latter is theory-driven, where research applications motivate design. It starts with researchers, followed by confirmation, explanation, and implementation with educational practitioners, involving trials of the basic functions in the initial prototypes (e.g., the GOAL system originated from SDS theory, with a pilot study involving teachers).

It is noteworthy that co-design with practitioners does not always start from scratch. Ideas from practitioners can also influence the delivery of interventions in the refining and maintenance phases. Additionally, teacher-driven design need not cover an entire tool; it can focus on sub-functions within existing infrastructures. This recognizes the challenge of expecting non-technical teachers to conceive entirely new tools beyond existing structures. Nevertheless, through co-design practices with teachers, especially pilot demonstrations and guidance, teachers can become familiar with the concepts behind the educational technologies and, in turn, become willing to employ the tools in their classes, thus narrowing the gaps for non-technical stakeholders and forming a virtuous cycle. Figure 12 highlights the overall phases and the involvement of the stakeholders in the LEAF co-design phases.

Fig. 12
LEAF co-design phases and involvement of stakeholders

5.2 Implications and Expectations for Stakeholders

Our case studies indicate that teachers directly communicate with researchers and contribute practical ideas and demands, such as creating test sets and enabling re-grouping for jigsaw activities. In addition, timely feedback from teachers is essential for refining the technical core. Periodic meetings have proven necessary, and in achieving this connection, the provision for experimental schools in the Japanese educational context serves as a playground for pilot studies in the co-design process. This point is relevant from a policymaker's perspective.

For researchers and system developers, co-design is not confined to communication with teachers, as discussed in the preceding section. Collaboration among different research teams can facilitate studies both theoretically and practically, as exemplified by the co-designed marker-based group formation extension. Such practice broadens the horizons of research with related theories from different areas.

Last but not least, beyond educational practitioners and researchers developing data-driven tools, industrial enterprises and textbook publishers can also play a role in the co-design process, especially in implementation. They can provide technical infrastructure, such as laptops for the GIGA school program in Japan, metadata for e-textbooks, and other data-collection services. Their involvement extends to introducing innovations to more learning scenarios, thereby promoting research outcomes to more schools.

5.3 Reflections on Co-design Projects in Japan

As emphasized by Wise et al. (2021), pedagogical needs play a vital role alongside data in guiding LA research. The cases presented underscore the imperative of bridging the gap between "real-world education" and researchers' perspectives. To tackle this challenge, we are conducting pilot demonstrations and experiments in several schools across Japan. These schools serve as initial testing grounds before broader public implementation. Discussions with teachers from these schools revealed the importance of considering 'real' students and designing systems that benefit them. Therefore, beyond technical breakthroughs, it is crucial to devise user-friendly systems that promote participation, facilitate data accumulation, and foster a symbiotic relationship between research and practical application.

Moreover, ethical considerations are inherent in educational data-driven designs (Ueda et al., 2021). The implementation of recommendation systems faced constraints due to research agreements, which limited interventions for students and complicated comparisons across multiple groups. These concerns are amplified when dealing with special-needs learners (Toyokawa et al., 2022), necessitating close communication with stakeholders and meticulous agreement processes involving parents. Co-design practices aid in clarifying data ownership and privacy concerns by enhancing understanding of educational interventions and data utilization, as illustrated in the group formation case, helping users feel comfortable with their data being used in future research.

In summary, consistent with trends proposed by Masiello et al. (2024), our Japanese experience shows that co-design practices facilitate better integration between data science and learning sciences, allowing the value of data-driven technology to flourish in authentic educational contexts.