1 Introduction

In recent years, chatbots have become increasingly common in various domains, enabled by recent advancements in natural language processing and increased usage of mobile and online messaging platforms. A chatbot can be defined as a computer program designed to simulate conversation especially to provide information or assistance to the user as part of an automated service [1].

In the educational context, chatbots have been used, for example, in admissions [34], elective course selection [12], helping students in their campus life [13], language learning [8] and instructional scaffolding [46]. One prominent use case for chatbots in education is a virtual student advisor who helps students with studies, wellbeing, and other issues. While early prototypes of such systems exist [5, 28], this is mostly unresearched territory.

In the present study, we introduce a setting where a chatbot proactively provides support opportunities to students by asking them if they need help. We investigate whether students trust the chatbot and are willing to disclose their support needs to it. The theoretical background for our work is, on the one hand, in help-seeking behavior in academic and health contexts, and on the other hand in human-computer trust. In the following, we present relevant literature and our detailed research questions.

1.1 Help-Seeking Behavior

Help-seeking has been researched in the contexts of both self-regulated learning [22, 23] and mental health [39]. Depending on the context, help-seeking can be defined as ‘the process of seeking assistance from other individuals or other sources that facilitate accomplishing desired goals’ [22] or as ‘an adaptive coping process that attempts to obtain external assistance to deal with a mental health concern’ [39].

In the self-regulated learning context, help-seeking is seen as a skill and a strategy instead of an act of dependency [22]. Newman [32] describes an adaptive help-seeker as someone who begins by accurately assessing that help is necessary, formulates an appropriate request for help, understands the best resources available, designs strategies for successful requests, and productively processes the help received. However, many students needing help may not seek help due to feeling hopeless or threatened, or due to the lack of adequate help-seeking skills [22]. White and Bembenutty [45] identified three different kinds of help-seekers in their study: 54% of students saw help-seeking as an important self-regulatory strategy, 32% of students showed a tendency to avoid help-seeking yet were able to use adaptive strategies, and 14% felt that seeking help implies inadequacy. Regarding help-seeking for mental health, only a minority of adolescents reporting mental health symptoms seek and receive help from specialist health services [47].

The factors related to avoiding help-seeking are diverse. Early research on help-seeking found low achievement being associated with the reluctance to seek help [33]. Regarding motivational orientations, mastery-oriented students are more likely to seek help, whereas performance-oriented students are more likely to avoid seeking help [10, 21]. Moreover, instructional and emotional support by teachers predict help-seeking behavior [14]. In a mental health context, adolescents themselves see stigma and embarrassment, problems recognizing symptoms, and a preference for self-reliance as the most important barriers to help-seeking [18]. Parents, on the other hand, perceived systemic issues, views and attitudes towards services and treatment, understanding of mental health problems and the help-seeking process, and family circumstances as the main barriers [38].

Technological advancements have resulted in major changes in help-seeking research and practice [24]. Data from information and communication systems expand opportunities to track the student learning process and thus to understand help-seeking more completely [9, 23]. Moreover, help-seeking can now include assistance from sources that do not comprise communication with an actual person [40]. In the context of mental health, the key benefits of online help-seeking include anonymity and privacy, immediacy, ease of access, inclusivity, the ability to connect with others and share experiences, and a greater sense of control over the help-seeking journey [37]. There are some concrete examples. First, Andalibi [6] found that Instagram users used the #depression hashtag to connect with others having similar experiences and to seek support. Second, Frost and Casey [15] found in relation to self-injury that young people who were least likely to seek help overall were most likely to seek help online and that the internet may be a proximal step to face-to-face help-seeking. Third, Glasheen and colleagues [16] found that students experiencing psychological distress had a preference for online counseling.

In summary, avoidance of help-seeking is a complex problem. The nature of the problem has changed with technological advancement: young people are willing to search for help online, but seeking professional help remains an issue. Here, chatbots show promise as a solution combining interactivity with the ease of access and lack of stigma. These will be discussed in more detail in the following section.

1.2 Human-Computer Trust and Chatbots

An established view is that people form trusting relationships with computers and assign them human characteristics [27, 44]. The literature on the differences between human-human interaction and human-computer interaction, especially in sensitive topics, shows some mixed findings. For example, Mou and colleagues [30] found that users tended to be more open and self-disclosing when interacting with humans than with AI. On the other hand, Ta and colleagues [42] found that the social companion chatbot Replika can provide a “safe space” in which users can discuss any topic without the fear of judgment or retaliation.

Also, Zamora [48] considers a lack of judgment as a unique aspect of chatbots. They found that having a conversation and gaining insights on sensitive topics without being judged is valuable among American and Indian participants. However, they also found several participants voicing concerns about mishandling their sensitive data and were afraid of possible leaks [48]. Thus, developing trust with a chatbot will be required for meaningful interactions. In general, trust can be defined as the willingness of a trustor to be vulnerable to a trustee’s actions based on the expectation that the trustee will perform a particular action important to the trustor, irrespective of the ability to monitor or control the trustee [26].

Various factors have been found to affect humans’ trust in chatbots. Høiland and colleagues [19] found that participants were willing to trust a mental health chatbot when they felt that the chatbot cared for them and perceived it as comforting. Toader and colleagues [43] found that participants who interacted with a chatbot anthropomorphized as female reported significantly higher willingness to disclose personal information, suggesting that gender stereotypes may also have an effect. Moreover, Müller and colleagues [31] found that people with different personality profiles significantly vary in their trust in chatbots. Regarding usage context, Aoki [7] found that the public’s initial trust in chatbots was lower in chatbots for parental support than in chatbots for waste separation. Cultural differences may also affect trust: Chien and colleagues found that general trust towards automation varied significantly between the US, Taiwan and Turkey [11].

Views on favorable use cases for chatbots seem to vary. In a medical context, Powell [36] argues that artificial intelligence needs to supplement rather than replace medical professionals. Palanica [35] found that physicians may be comfortable using chatbots to automate simple logistical tasks but do not believe that chatbots are advanced enough to replace complex decision-making tasks requiring expert medical opinions. On the other hand, there is some early evidence that human-chatbot relationships may positively affect well-being [41] and that chatbots can provide actual care in mental health issues [19]. While these scenarios may seem suspicious and risky, positive experiences in interaction with the chatbot seem to increase trust and encourage more self-disclosure, further strengthening the human-chatbot relationship [41].

1.3 Aims of the Study

According to Moore and colleagues [29] there is widespread recognition that student support initiatives should address both academic and non-academic needs. Based on previous research, chatbots show promise as a novel way to provide learning-related [28, 46] and health-related support [19]. In the present study, we take an integrative approach to target all student concerns related to studies, well-being, and other issues.

Usually, getting help requires the student to take initiative. As shown in help-seeking research, this may prevent many young people in need of help from actually getting it because of stigma, feeling threatened, or preference for self-reliance (possibly relying on the information found on the internet) [18, 22, 45]. We hypothesize that when support opportunities are proactively presented to all students by a chatbot, soliciting help becomes cognitively easier, socially more acceptable, and simply more convenient.

Finally, a central issue with new technologies such as chatbots is whether students trust the technology. Both too high and too low levels of trust may cause misuse, abuse, or disuse of the technology [20]. In the present study, the students should trust the chatbot enough to respond and solicit help, but on the other hand, be realistic about its capabilities.

In the present study, we use a rule-based chatbot to proactively offer both academic and nonacademic support to all students, not just those at-risk. This low-threshold opportunity is designed to catch students’ latent concerns and needs. When a need for support is recognized in the conversation, a human professional will take over and provide the requested support. In the following, we present the research questions.

RQ1: Will Students Disclose Their Support Needs to a Chatbot that Provides Support Opportunities? We answer this question by conducting a chatbot pilot with four vocational programs in a large Finnish vocational education and training (VET) organization. We investigate if and how students respond to the chatbot and whether there are any differences between vocational programs or gender.

RQ2: Do Students Trust the Chatbot and How Satisfied Are They with It? After the pilot, we conduct a survey asking the students about their experience with the chatbot. Satisfaction is measured by a five-point customer satisfaction instrument and trust by a multi-dimensional scale by Gulati, Sousa, and Lamas [17].

RQ3: What Are the Connections Between Responding to the Chatbot, Needing Support, Being Satisfied with the Chatbot, and Trusting the Chatbot? We hypothesize that responding, being satisfied, and having trust are all intertwined. Moreover, we hypothesize that not responding is connected with low levels of trust in the chatbot. Correlations between these variables are calculated to answer this question.

2 Methodology

2.1 Context

The study was conducted in a large Finnish vocational education and training (VET) institution. The Finnish VET system aims to increase and maintain the vocational skills of the population, develop commerce and industry, and respond to its competence needs [4]. Around half of the students completing their basic education in Finland continue to VET instead of general upper secondary education. Vocational education and training also enables pupils to continue their studies in higher education [3].

Four vocational programs (information and communications technology, electrical engineering and automation technology, safety and security, and social and health care) participated in a pilot in which a chatbot contacted students offering them support opportunities. At the time of the research, teaching was primarily organized as distance education because of the COVID-19 pandemic. While distance education was the general rule, some small group teaching and workshops by special needs teachers were organized on the school premises, and on-the-job learning was carried out normally whenever possible. The school offered student support services in a hybrid model, where students could choose whether they wanted to meet in person or via teleconferencing. However, the school did not organize social events related to student support during the distance education period.

2.2 Participants

All the students who had started their studies between August 2020 and January 2021 in the target programs (N = 275, see Table 1) were part of the chatbot pilot program, and 49 of them agreed to participate in a research survey after the pilot. The gender distributions were male-dominant in the technology programs and female-dominant in the social and health care program, which is typical for corresponding programs in Finland.

Table 1. Participants

2.3 Intervention

The chatbot intervention was carried out using a chatbot called Annie Advisor. Two weeks into the beginning of spring term 2021, each student in the pilot programs received an SMS message from the chatbot (see Fig. 1). In the chatbot conversation, students were offered the possibility to disclose a need for support. In case the student needed help, the chatbot asked students to specify their requirements further. If the student did not respond in 24 h, the chatbot reminded the student. If the student still did not respond, the student was marked as needing support, and the case was assigned to a designated teacher mentor.
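The conversation flow described above can be sketched as a small decision function. This is only an illustrative sketch, not the actual logic of the chatbot system: the names `Conversation` and `next_step` are hypothetical, and the 72 h escalation deadline is an assumption, since the text states only the 24 h reminder.

```python
from dataclasses import dataclass
from typing import Optional

REMINDER_AFTER_H = 24  # stated in the text
ESCALATE_AFTER_H = 72  # assumption: the text gives no escalation deadline


@dataclass
class Conversation:
    hours_since_first_message: float
    reminded: bool = False
    needs_support: Optional[bool] = None  # None until the student answers


def next_step(conv: Conversation) -> str:
    """Return the next action for one student conversation."""
    if conv.needs_support is True:
        return "ask_for_details"  # student disclosed a need for support
    if conv.needs_support is False:
        return "close"  # student declined support
    # No answer yet.
    if conv.hours_since_first_message >= ESCALATE_AFTER_H:
        return "assign_to_teacher_mentor"  # silent students are flagged
    if conv.hours_since_first_message >= REMINDER_AFTER_H and not conv.reminded:
        return "send_reminder"
    return "wait"
```

For example, `next_step(Conversation(25.0))` returns `"send_reminder"`, while a student who has been reminded and is still silent after the assumed deadline is routed to a teacher mentor.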

Fig. 1. The chatbot from a student’s perspective.

Fig. 2. The chatbot system from a professional’s perspective.

Based on the chatbot conversation and the collaboratively pre-designed classification of support needs and responsible professionals (see Table 2), the system assigned the support cases to the corresponding professional. Professionals then used the chatbot system’s administration view (see Fig. 2) to track their students and carry out the following steps (e.g., setting up meetings).

Table 2. Classification of support needs.

2.4 Measures

The data for this study were collected in two ways. First, we extracted log data of the chatbot system to collect data on students’ responses. Second, one week after the pilot, we sent a survey link to students to collect data on user satisfaction and human-computer trust.

Responses in the Chatbot Pilot. A dummy variable responded takes the value 1 if the student responded to the chatbot within 72 h from the message being sent and otherwise 0. Furthermore, a dummy variable needed support takes the value 1 if the student disclosed the need for support while responding to the chatbot and 0 if the student indicated that no support is required. When support is requested, the needs are classified into three support categories: studies, wellbeing, and other issues. Although more fine-grained data was available in the system, the categories are reported on this aggregated level for privacy reasons.
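As an illustration, the two dummy variables could be coded as follows. The function names are hypothetical. Treating the 72 h limit as inclusive and leaving needed support missing for non-responders are both assumptions, as the text does not specify either.

```python
from typing import Optional

RESPONSE_WINDOW_H = 72  # from the text


def code_responded(reply_delay_h: Optional[float]) -> int:
    """1 if a reply arrived within the window, else 0 (None means no reply)."""
    return int(reply_delay_h is not None and reply_delay_h <= RESPONSE_WINDOW_H)


def code_needed_support(responded: int, asked_for_help: Optional[bool]) -> Optional[int]:
    """1 = disclosed a need, 0 = declined; None (missing) for non-responders."""
    if responded == 0 or asked_for_help is None:
        return None  # assumption: no value is imputed for silent students
    return int(asked_for_help)
```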

User Satisfaction. The student’s general satisfaction with the chatbot was measured by a question, ‘In general, how satisfied were you with Annie?’ rated on a scale ranging from 1 (‘Very unsatisfied’) to 5 (‘Very satisfied’).

Human-Computer Trust. To measure students’ trust in the chatbot, we used the human-computer trust scale by Gulati, Sousa and Lamas [17]. The scale consists of 12 items measuring four dimensions: risk perception (3 items, e.g. ‘I believe that there could be negative consequences when using Annie’), benevolence (3 items, e.g. ‘I believe that Annie will act in my best interest’), competence (3 items, e.g. ‘I think that Annie is competent and effective in offering support’) and general trust (3 items, e.g. ‘If I use Annie, I think I would be able to depend on it completely’). Items are answered on a scale from 1 (‘Disagree’) to 5 (‘Agree’). The scores of risk perception items are inverted so that a higher score indicates less risk perception. Also, we calculated a total trust score using all the trust items.
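A minimal sketch of the scoring is given below. It assumes that dimension scores and the total score are item means on the 1 to 5 scale and that inversion maps a score x to 6 − x; the function names and the example answers are hypothetical.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5
DIMENSIONS = ("risk", "benevolence", "competence", "general_trust")


def invert(score: int) -> int:
    """Reverse-code an item so that a higher value means less perceived risk."""
    return SCALE_MIN + SCALE_MAX - score


def score_trust(answers: dict) -> dict:
    """answers maps each dimension name to its three item scores (1-5)."""
    items = {dim: list(answers[dim]) for dim in DIMENSIONS}
    items["risk"] = [invert(x) for x in items["risk"]]
    scores = {dim: mean(vals) for dim, vals in items.items()}
    # Total trust: mean over all 12 items, after inverting the risk items.
    scores["total"] = mean([x for vals in items.values() for x in vals])
    return scores


# Made-up answers for one respondent, for illustration only.
example = score_trust({
    "risk": [2, 1, 3],
    "benevolence": [4, 4, 5],
    "competence": [3, 4, 4],
    "general_trust": [3, 3, 4],
})
```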

2.5 Analyses

First we combined the chatbot system log data with the survey data. For participants who had not responded to the survey, we marked survey measures as not available. In all of the statistical analyses, we included the largest possible number of participants in each calculation. We carried out chi-squared tests to determine if there were differences in responding or needing support between survey respondents and other pilot participants. We found no significant differences.
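For a comparison such as responded (yes/no) by survey participation (yes/no), the chi-squared test of independence can be sketched in plain Python. This version applies no continuity correction, which may differ from the statistical software actually used, and the counts in the example are made up for illustration.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, not data from the study.
stat, p = chi_squared_2x2([[10, 20], [20, 10]])
```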

Sum variables were created for the dimensions of the human-computer trust scale and checked for internal consistency, which was satisfactory for risk-perception (\(\alpha = 0.67\)) and good for all other trust measures (\(\alpha \ge 0.84\)). We then calculated descriptive statistics for all the measures and ran pairwise Spearman correlations between responded (yes/no), response time, needed support (yes/no), user satisfaction, and trust (i.e., risk perception, benevolence, competence, general trust) to determine the relationships between the measures.
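Internal consistency of this kind is commonly computed as Cronbach's alpha. The sketch below uses the standard formula with sample variances; the function name and the example responses are hypothetical and do not reproduce the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha.

    rows: one list of item scores per respondent, items in the same order.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses of four students to a three-item dimension.
alpha = cronbach_alpha([[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5]])
```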

We used chi-squared tests to test for possible differences between programs or genders on responding to the chatbot or needing support. Regarding user satisfaction, we used Mann-Whitney U-test to test for differences between genders and independent samples Kruskal-Wallis to test for differences between vocational groups. For trust measures, we used independent samples T-test to test for differences between genders and a one-way ANOVA to test for differences between vocational groups.
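As an illustration of the rank-based comparison used for the ordinal satisfaction ratings, the Mann-Whitney U statistic can be computed by counting pairwise wins, with ties counted as half. The sketch returns the U value for each group and computes no p-value; which of the two values a given statistics package reports is not assumed here, and the example ratings are made up.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (a, b), a from x and b from y, where a > b;
    tied pairs contribute 0.5. U_x + U_y always equals len(x) * len(y).
    """
    u_x = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical satisfaction ratings for two small groups.
u_first, u_second = mann_whitney_u([5, 4, 5], [3, 4, 2, 3])
```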

3 Results

3.1 Students’ Responses to the Chatbot

Outcomes of the chatbot conversations are presented in Table 3. The average response rate was \(86 \%\), which can be considered very high. Based on chi-squared tests, there were no significant differences between programs or genders in responding to the chatbot.

Altogether \(19 \%\) (\(N=44\)) of participants disclosed a need for support in the chatbot conversation. Studies-related needs were most common (\(N=26\)), followed by well-being-related (\(N=13\)) and other (\(N=5\)). Based on chi-squared tests, there were no significant differences between programs or genders on needing or not needing support. A small sample size prevented testing differences on support need categories (studies, well-being, other) between programs or genders.

Table 3. Outcomes of the chatbot conversations.

3.2 User Satisfaction and Trust

The means and standard deviations for user satisfaction and trust are presented in Table 4. The mean score for user satisfaction can be considered satisfactory (\(3.82 \pm 0.97\)). We found a significant difference between genders on user satisfaction using independent samples Mann-Whitney U-test (\(U=121.5, p=0.03\)), with female students being more satisfied (\(4.36 \pm 0.81\)) than male students (\(3.68 \pm 0.97\)). Using independent samples Kruskal-Wallis test, we found no significant differences in user satisfaction between different programs.

The total trust score was satisfactory (\(71\%\), \(3.55\, \pm \, 0.72\)). Out of different trust measures, inverted risk perception scored highest (\(3.79\, \pm \, 0.88\)), followed by benevolence (\(3.58\, \pm \, 0.84\)), competence (\(3.53\, \pm \, 0.85\)) and general trust (\(3.33\, \pm \, 0.92\)). We found no significant differences between genders or programs in any of the trust measures, based on independent samples t-test and one-way ANOVA.
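The reported percentage appears to be the mean score expressed as a share of the scale maximum of 5. This reading is an assumption, as the conversion is not stated explicitly:

\[
\frac{3.55}{5} = 0.71 = 71\%
\]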

3.3 Connections Between Variables

Correlation analyses revealed multiple significant correlations between variables (see Table 4). First, as anticipated, all the trust measures were positively correlated with one another, with the lowest correlation between risk perception and competence (\(\rho =0.31\)) and highest between benevolence and the total trust score (\(\rho =0.90\)). Second, all the trust variables positively correlated with user satisfaction, benevolence having the highest correlation (\(\rho =0.49\)) followed by the total trust score (\(\rho =0.47\)), competence (\(\rho =0.43\)), risk perception (\(\rho =0.30\)) and general trust (\(\rho =0.29\)).

Whether students responded to the chatbot or not was positively correlated with the total trust score (\(\rho =0.29\)). This is consistent with our hypothesis that the lower the trust, the less likely a student is to respond to the chatbot. However, correlations with risk perception, benevolence, competence, and general trust were not significant.

Finally, whether students needed support was positively correlated with user satisfaction (\(\rho =0.34\)).
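Spearman's \(\rho\) is the Pearson correlation of the rank-transformed variables, which is why it can also be applied to a binary variable such as responded (yes/no). The following is a minimal sketch with tie-averaged ranks; the function names and the example values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: responded (0/1) against a total trust score.
rho = spearman_rho([0, 0, 1, 1], [2.0, 3.0, 3.5, 4.0])
```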

Table 4. Correlations, descriptive statistics and measures of internal consistency.

4 Discussion

4.1 Responses to the Chatbot Reveal Latent Needs for Support

As shown in the literature, avoiding help-seeking is a problem both in the context of self-regulated learning [22, 45] and mental health [18, 47]. Our first research question was whether students would disclose their support needs to a chatbot. Our results show that students responded to the chatbot with a very high response rate (\(86\%\)). For comparison, Manfreda and colleagues found an \(11\%\) average response rate in a meta-analysis of online surveys [25]. Furthermore, almost one in five respondents disclosed a need for support. To put the figure into context, Zachrisson and colleagues found that 6.9% of adolescents reported seeking help for mental problems during the preceding twelve months in a large Norwegian sample.

We consider these results a success and a signal that there indeed are latent needs for support. A possible explanation for these high numbers is that when support opportunities are offered proactively with a fixed set of choices, soliciting help becomes cognitively easier and socially more acceptable. It is considerably easier, both practically and mentally, for a student to answer an SMS compared to initiating contact with a support professional.

Regarding user satisfaction, students who needed support had higher satisfaction with the chatbot than those who did not need support. This was an expected result, as for students not needing support, the experience was presumably neutral. Even in this case, the initial message conveys that the school cares for its students and that support is available if needed later.

Interestingly, we found a gender difference in satisfaction, showing that female students’ satisfaction with the chatbot was significantly higher than male students’. Regarding other measures in the study, we found no differences between genders. In future studies, this should be investigated in more detail.

4.2 Trust with Chatbots in Different Scenarios

Our results showed satisfactory levels of trust in the chatbot (total trust score 71%). Furthermore, the level of trust was positively correlated with students’ likelihood to respond to the chatbot, indicating that not responding is to some extent related to lack of trust. Based on the literature, the need for trust seems to increase as the content of conversation becomes more sensitive. For example, sharing a concern related to well-being may require more trust than sharing a study-related need. However, within the present study, we did not aim for students to disclose personal details to the chatbot but merely to catch a need for further discussion with a human professional.

Regarding the different dimensions of trust, benevolence (i.e., acting in the user’s best interest) had the highest correlation with user satisfaction and the total trust score. This is also in line with the findings by Høiland and colleagues [19], who found that perceiving the chatbot as ‘caring’ was linked with the willingness to trust the chatbot.

The question of an optimal level of trust is complex and can also vary based on the chatbot’s aim. For example, when a chatbot is used to provide actual care, the requirement for trust is naturally higher. However, there are also situations where too high trust may cause problems. For example, in our study, we did not want students to report details about their health to the chatbot for security reasons.

4.3 Limitations

An obvious limitation of the current study is that the sample is relatively small, male-dominant, and drawn from only four programs in a single institution, which limits the generalizability of the results. Another factor limiting the generalizability is the cultural context, which affects the results in multiple ways. First, general trust towards automation has been shown to vary between countries [11]. Second, the availability and scope of student support vary a lot between educational systems, and the intervention used here might not be viable in another context. Third, young people’s preference for online communication about secrets, inner feelings, and concerns varies between countries and is relatively high in Finland [2].

One possible problem is that it might be difficult for students to distinguish their experience of using the chatbot from receiving help from a professional. Especially for students who received support, the user satisfaction measure may reflect satisfaction with the whole process, from requesting help to receiving it.

Furthermore, our comprehension of nonresponding students is limited. While it is possible that a student does not answer because there is no need for support, not responding even after reminding could also be seen as an alarming sign. Therefore, we included the non-responding students on the chatbot system’s administration view as students potentially needing help. However, analyzing non-respondents more closely is out of the scope of the present study.

4.4 Implications

Although the response rate was very high, the chatbot’s total trust score was only satisfactory, and means to increase trust should be investigated. Randomized controlled trials with alternative designs (e.g., different communication channels and wording, scheduling, and personalization of the messages) should be carried out to determine the optimal design. Moreover, cultural differences in trust should be addressed with a study carried out in multiple different contexts. Also, we find it important to address the nonresponding students and possible false negatives (students reporting everything to be fine although actually needing help) in more detail.

An important aspect to consider is that recognizing many needs on short notice may strain the institutions’ resources. However, pre-emptive care aims to prevent students’ problems from escalating into more severe problems, which also burden the support personnel. We hypothesize that large-scale adoption of this kind of system would initially increase the workload but, once in place, actually save time and target the use of support resources in a more impactful way.

5 Conclusions

In this study, we used a chatbot to proactively provide support opportunities to students by asking them if they need help. Our results showed that students were ready to solicit help from a chatbot and that this kind of approach is applicable for recognizing students’ latent needs for support. Moreover, we show that students’ trust in the chatbot was positively correlated with their general satisfaction with the chatbot and their likelihood to respond to the chatbot. While an adequate level of trust with the chatbot is important, future studies are needed to understand better the formation of trust between the student and the chatbot and cultural differences in trusting chatbots.

Disclosure

Joonas A. Pesonen is employed as Chief Product Officer at Annie Advisor Ltd, receives a salary from the company, and owns stocks of the company.