Introduction

Distance learning has a history spanning almost two centuries. It is normally described as a type of remote instruction supported by communication technology and special institutional organization (Moore and Kearsley 2011). Nowadays, with the maturity and widespread adoption of computer technology, web-based learning systems are becoming increasingly popular. According to data collected by Class Central (Footnote 1), by 2016 around 58 million students worldwide had taken at least one massive open online course (MOOC), and the total number of courses had grown to 6,850. Schedule flexibility and ease of access are two major reasons why students choose to study online (Ally 2004). However, these advantages are accompanied by a concomitant problem of isolation. Students are likely to perceive a low sense of community, where sense of community refers to feelings of connectedness with community members and commonality of learning expectations and goals (Rovai 2002); this leads to high dropout rates and unsatisfactory learning outcomes (Zheng et al. 2015). In order to promote interaction and facilitate students’ learning processes, computer-supported communication tools have become popular in recent online courses (Branon and Essex 2001; Weber and Brusilovsky 2001; Naidu and Järvelä 2006; So and Brush 2008). Specifically, the interaction can occur either asynchronously or synchronously (Hrastinski 2008). For example, a discussion forum is a typical way of supporting students to engage in asynchronous communication, where students can post messages at their own pace (Vonderwell 2003). Email is another widely used asynchronous communication tool, mainly used to enhance the interaction between students and their instructors (Wild and Winniford 1993). On the other hand, some tools serve as synchronous communication channels and are commonly applied to create an interconnected learning environment. For example, students can communicate with their instructors and learning peers synchronously by exchanging messages in a chat room during the online class (Coetzee et al. 2014). In addition, some note-taking facilities, such as “NoteBlogger”, allow students to take handwritten notes on top of the instructors’ slides, and their notes are immediately viewable by peers (Simon et al. 2008).

Meanwhile, developing more personalized communication support tailored to students’ personal characteristics has become a trend in recent years, because personalization is critical to the engagement and retention of online students (Betts 2009; Mandernach 2009; Roll and Wylie 2016). According to reports from Drexel University’s online Master of Science in Higher Education program, students are more likely to stay engaged throughout their courses and remain connected with their learning peers when they study in a more personalized online environment. Personality has been considered a valuable personal factor to incorporate into personalized learning support, as educational psychologists believe that personality can be important for understanding how students engage in online courses and whether they take responsibility for self-direction and discipline (Felder et al. 2002; Kim et al. 2013). Specifically, studies have found that personality is useful for explaining students’ motivations, needs, preferred instructional approaches, relationships with teachers and peers, and academic achievement (Chamorro-Premuzic and Furnham 2008; 2009; Chamorro-Premuzic et al. 2007; Duff et al. 2004; Hanzaki and Epp 2018; Komarraju et al. 2009; Pavalache-Ilie and Cocorada 2014; Solimeno et al. 2008).

However, the problem of how to obtain students’ personality has not been well solved. Existing studies mostly rely on psychological questionnaires to explicitly obtain users’ personality, which inevitably demands user effort. Users may be unwilling to complete such a quiz, either to save effort or to protect their privacy, which limits the applicability of existing personality-based learning support. Although some recent studies have endeavored to infer personality from online learners’ behavior (Chen et al. 2016; Ghorbani and Montazer 2015; Halawa et al. 2015), they mainly consider individual learning activities, such as the number of entries into the system and the time spent watching materials. Little work has empirically explored the effect of students’ personality on their usage of both synchronous and asynchronous communication tools in web-based learning systems, let alone inferred personality from their communication behavior.

Therefore, we aim to answer the following two research questions:

RQ1: How does students’ personality influence their usage of communication tools in web-based learning systems?

RQ2: To what extent can personality be inferred from students’ online communication behavior?

To answer these two questions, we conducted an experiment with 164 college students on eBanshu (Footnote 2), a web-based learning system equipped with both synchronous and asynchronous online communication tools. Concretely, the synchronous communication tools include a chat room for students to exchange messages with both instructors and peers in real time, a hands-up facility for students to ask instructors questions during the online class, and note-taking and note-sharing facilities that allow students to take notes and share their written notes with others. The asynchronous tools include a discussion forum where students can ask questions and/or answer other students’ questions, a materials-sharing facility for students to share their learning materials with others, and an assignment submission facility for students to submit assignments to instructors.

We first performed a multiple linear regression analysis to study how students’ personalities influence their behavior in using online communication tools. In our study, a student’s personality is defined by the Big-Five factor model, which comprises five factors: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (Digman 1990). It is one of the most widely used personality models and has been applied in domains such as industry, education, forensic and clinical practice, and health psychology (Barrick and Mount 1991; Costa 1991; McCrae and John 1992). Note that the behavioral features include not only students’ in-class and after-class activities (such as the numbers of messages posted in the chat room and the discussion forum, respectively), but also qualitative linguistic characteristics of the messages’ textual contents. Our results show that a total of 32 features are significantly affected by students’ five personality traits (p < 0.01 with a Bonferroni-type adjustment). Among them, there are 22 content features (e.g., number of first-person plural pronouns used in each chat message), 9 activity features (e.g., total number of messages posted in the chat room), and 1 demographic feature (i.e., gender).
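To make the Bonferroni-type adjustment concrete, the following Python sketch filters a set of regression p-values. It is illustrative only and assumes that an unadjusted p-value has already been obtained for each (feature, trait) coefficient, that the nominal level of 0.01 is divided by the total number of tests, and that the feature labels and p-values shown are hypothetical; the exact family of tests used in the study is not specified here.

```python
def bonferroni_significant(p_values, alpha=0.01, n_tests=None):
    """Return the tests whose Bonferroni-adjusted p-value is below alpha.

    p_values: mapping from a test label (e.g. a (feature, trait) pair) to its
    unadjusted p-value. n_tests defaults to the number of p-values supplied.
    """
    m = n_tests if n_tests is not None else len(p_values)
    if m <= 0:
        raise ValueError("need at least one test")
    significant = {}
    for label, p in p_values.items():
        adjusted = min(1.0, p * m)  # equivalent to comparing p with alpha / m
        if adjusted < alpha:
            significant[label] = adjusted
    return significant


# Hypothetical p-values for three feature/trait regression coefficients.
example = {
    ("chat_messages", "Extraversion"): 0.0001,
    ("first_person_plural", "Agreeableness"): 0.002,
    ("forum_messages", "Neuroticism"): 0.04,
}
print(bonferroni_significant(example))
```

Multiplying each p-value by the number of tests (capped at 1) is equivalent to lowering the threshold to alpha/m, but it keeps the adjusted values available for reporting.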

We then built an inference model (the Least Absolute Shrinkage and Selection Operator, LASSO) based on these significant features to predict students’ personalities. The results show that our approach achieves better personality prediction accuracy than previous work that inferred personality in similar or other domains, such as social networking websites. Hence, other practitioners could apply the model to implicitly acquire students’ personality and implement personality-based learning support.
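For readers unfamiliar with LASSO, the pure-Python sketch below fits it by cyclic coordinate descent with soft-thresholding, minimizing (1/(2n))·||y − Xβ − β0||² + λ·||β||₁, which is the same objective that scikit-learn's `Lasso` minimizes with its `alpha` parameter playing the role of λ. The solver, the function names, the value of λ, and the toy data are illustrative assumptions, not the implementation or tuning used in the study; in practice the features would typically be standardized and λ chosen by cross-validation.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the LASSO proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=1000, tol=1e-10):
    """Fit LASSO by cyclic coordinate descent; returns (intercept, coefficients).

    X: list of feature rows (one per student), y: list of trait scores.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center the data so that the intercept can be recovered afterwards.
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - X*beta for beta = 0
    col_sq = [sum(row[j] ** 2 for row in xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: keep its coefficient at zero
            # Correlation of feature j with the partial residual that excludes j.
            rho = sum(xc[i][j] * (resid[i] + xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xc[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def lasso_predict(intercept, beta, row):
    """Predict a trait score for one feature row."""
    return intercept + sum(b * x for b, x in zip(beta, row))


# Toy usage: y = 2 * x1 + 1 exactly, and x2 is irrelevant.
X_toy = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y_toy = [3, 5, 7, 9, 11]
b0, b = lasso_fit(X_toy, y_toy, lam=1.0)
print(b0, b)
```

With λ = 1 on this toy data, the irrelevant feature x2 keeps a coefficient of exactly zero and the slope of x1 is shrunk from 2 to 1.5; this built-in feature selection is what makes LASSO attractive when many candidate behavioral features are available for a modest number of students.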

In the following, we first introduce related work. We then describe our experimental setup and analyze our results. Finally, we summarize the experimental findings and discuss the practical implications, limitations, and future directions of our work.

Related Work

The Big-Five Model of Personality

As mentioned previously, the Big-Five model defines five personality factors (Digman 1990). Openness to Experience distinguishes people who are creative/open-minded (high value) from those who are reflective/conventional (low value). Conscientiousness distinguishes people who are self-disciplined/prudent (high value) from those who are careless/impulsive (low value). Extraversion distinguishes people who are sociable/talkative (high value) from those who are reserved/shy (low value). Agreeableness reflects individual differences in concern with cooperation and social harmony: people with a high Agreeableness value tend to be trusting and cooperative, while people with a low value are likely to be aggressive and cold. The last factor, Neuroticism, reflects an individual’s tendency to experience psychological distress: people with a high Neuroticism value become emotionally unstable more easily than those with a low value (Digman 1990).

The Ten-Item Personality Inventory (TIPI) developed by Gosling et al. (2003) is a brief instrument for measuring a person’s personality. It has been regarded as a reasonable proxy for longer Big-Five instruments, as it not only reaches adequate levels of convergent and discriminant validity, but also complies with the trend toward shorter instruments that save users’ time (Gosling et al. 2003; Rammstedt and John 2007). Therefore, in this study, we use the TIPI questionnaire to measure students’ personality.

Personality and Learning

To identify individual differences in students’ learning motivation and performance, some researchers in the education community have studied the role of personality. Komarraju et al. (2009) found that personality can help explain students’ learning motivation. For instance, Conscientiousness and Openness to Experience are positively related to intrinsic motivation, suggesting that self-disciplined or curious students are willing and eager to learn, whereas Extraversion is positively related to extrinsic motivation, suggesting that extraverted students are more likely to be motivated by external rewards, such as a college degree. Moreover, Chamorro-Premuzic and Furnham (2008) and Duff et al. (2004) observed that personality can affect students’ academic performance. For example, Conscientiousness and Neuroticism have been identified as relatively strong predictors of exam grades, indicating that students who are more self-disciplined (high Conscientiousness) or more emotionally stable (low Neuroticism) tend to achieve better academic outcomes. In addition, Hanzaki and Epp (2018) reported that they could predict grades in MOOCs using both students’ personality and their level of collaboration in an online course.

Students’ preferences for teaching styles and learning strategies are also influenced by their personality. For instance, Chamorro-Premuzic et al. (2007) found that more introverted students (i.e., those with low Extraversion values) tend to show a stronger preference for independent study, and that students who are more emotionally stable (i.e., those with low Neuroticism values) tend to enjoy lab classes and clinical teaching. Chamorro-Premuzic and Furnham (2009) showed that Openness to Experience has a significant effect on students’ preferred learning approaches: those who score higher on this trait are more likely to adopt deep learning (i.e., learning with understanding) rather than surface learning (i.e., rote learning). Busato et al. (1998) found that meaning-directed learning styles are preferred more by students with higher Conscientiousness values and lower Neuroticism values, indicating that self-organized or emotionally stable students tend to find out exactly what their study material means and to interrelate what they have learned.

As for the effect of personality on students’ communication behavior in web-based learning systems, introverts have been found to behave more actively in discussion forums (Pavalache-Ilie and Cocorada 2014), whereas extraverts often post messages with more words in chat rooms (Blau and Barak 2012). Extraverts also tend to use more social and cognitive process words in their messages than introverts (Wu et al. 2016). Chen and Caropreso (2004) reported that students with higher Openness to Experience values (i.e., students who are outgoing and intellectual) prefer to engage in two-way communication in which they can share and negotiate ideas with peers through successive and progressive dialogue. Wilson (2000) found that sensing-thinking students sent more messages and used more words per message than intuitive-feeling students. In addition, Ghorbani and Montazer (2015) associated students’ personality with behaviors such as the number of weekly entries into a learning management system (i.e., Moodle), adding posts in a forum, and time dedicated to reading materials. According to their results, Agreeableness and Extraversion are positively correlated with students’ participation in forums, and Neuroticism is positively related to the frequency of delayed assignment submission and negatively correlated with the number of entries into the system per week. Chen et al. (2016) explored the effect of personality on behavior in Massive Open Online Courses (MOOCs). Through correlation analysis, they found that among students with low prior knowledge, those who are more self-disciplined (i.e., high Conscientiousness values) or conventional (i.e., low Openness to Experience values) are inclined to engage more with learning materials and attempt to solve more questions. Among students with high prior knowledge, conscientious students tend to be more active in discussion forums in terms of their numbers of replies, forum posts, and forum interactions, and extraverted students tend to spend less time in the forums than introverted ones. However, little work has empirically compared the effect of personality on students’ communication behavior in a web-based learning system that provides both synchronous and asynchronous communication tools.

Personality Prediction

Although personality has been shown to be helpful in providing personalized learning, until a few years ago personality was mainly acquired through extensive questionnaires such as the 50-item IPIP (Goldberg et al. 2006) and the 240-item NEO-PI-R (Costa and McCrae 1992). Such explicit acquisition inevitably demands considerable user effort, which may limit the prospects of personality-based applications in practice.

Lately, there have been efforts to automatically infer users’ personality from their self-generated data. Most of this research has focused on social networking sites, which provide a unique opportunity for personalized services to capture various aspects of user behavior. For example, Gao et al. (2013) tried to detect users’ personalities from their micro-blog data. They extracted content features from 1,766 Sina micro-blog users and predicted users’ Big-Five personality with acceptable Pearson correlations using several regression models, such as Gaussian Process, M5 Rules, and Pace Regression. Similarly, Golbeck et al. (2011) identified a set of features extracted from users’ Facebook profiles, including personal information, activity features, structural features, and language features. With two machine learning algorithms, M5 Rules and Gaussian Process, they reported that they could predict each of the five personality traits to within 11% to 18% of its actual value. Kosinski et al. (2013) attempted to recognize users’ Big-Five personality mainly from their “like” behavior on Facebook. Specifically, the authors reduced the number of features using Singular Value Decomposition and then applied a logistic regression model to identify personality. Their results showed that Openness to Experience and Extraversion could be accurately recognized. Shen et al. (2015) reported that Neuroticism and Extraversion could be inferred from users’ Facebook data, such as their personal profile (e.g., number of friends), interaction behavior (e.g., average number of comments and likes), and posted content (e.g., number of positive/negative words in posts). Wei et al. (2017) predicted users’ personalities by integrating heterogeneous information on Twitter, including users’ own language usage, avatars, emoticons, and response patterns. They indicated that their model achieves better prediction performance than several widely adopted baseline methods.

Ferwerda et al. (2015) attempted to infer personality traits from the way users take pictures and apply filters to them on Instagram. They found that some features extracted from Instagram pictures, such as brightness, saturation, and pleasure-arousal-dominance, are correlated with personality, especially Openness to Experience. They further trained a predictive model with a radial basis function network and achieved accuracy comparable to previous work on personality prediction from other social media traces. Moreover, Farnadi et al. (2016) performed a comparative analysis of state-of-the-art personality inference methods on a varied set of social media ground-truth data from Facebook, Twitter, and YouTube. They applied univariate and multivariate regression models and observed that the Multi-Target Stacking Corrected model and the Ensemble of Regressor Chains Corrected model performed best. In addition, they found that most of the significant features varied with the dataset and did not transfer well to other domains.

In addition to studies on implicitly acquiring user personality on social networking sites, van Lankveld et al. (2011) analyzed users’ behavior in a video game and extracted features such as conversation behavior, movement behavior, and miscellaneous behavior to infer users’ personality. Chittaranjan et al. (2013) used mobile phone usage information (e.g., call logs, SMS logs, and application usage) as predictors of users’ Big-Five personality traits. They achieved better classification results (via the Support Vector Machine algorithm) than the baseline. Shen et al. (2013) tried to infer users’ personality by analyzing their email-writing behavior. They first extracted high-level aggregated features from email contents, such as bag-of-words features, meta features, word statistics, and writing styles, and then applied these features to three generative models (i.e., a joint model, a sequential model, and a survival model). Their results demonstrated that the survival model performs best among the three in terms of both prediction accuracy and computational efficiency. In addition, Wu and Chen (2015) derived users’ personality from their implicit behavior in the movie domain. Concretely, they first identified a set of features significantly correlated with users’ personality traits, including both domain-specific features (e.g., users’ preference for movie genres and movie diversity, watching duration, and watching motives) and domain-independent features (e.g., users’ rating behavior and age). They then integrated all of these features into a regression model to infer users’ personalities and found that Gaussian Process performs best in terms of inference accuracy.

With the recent popularity of web-based learning systems, some studies have also attempted to infer users’ personalities from their online learning behavior (Ghorbani and Montazer 2015; Halawa et al. 2015; Chen et al. 2016). Specifically, Ghorbani and Montazer (2015) proposed a fuzzy inference system to identify 53 students’ personalities from their online behaviors on a Learning Management System (LMS). They first extracted a total of 13 observable behaviors as the input features of their fuzzy system, such as the number of entries into the system per week, adding posts in forums, and time dedicated to reading materials. They then defined a set of fuzzy rules based on experts’ knowledge and personality definitions. By combining fuzzy variables and fuzzy rules, they obtained the predicted category of each personality trait (i.e., low, medium, or high). Their experimental results revealed that the proposed fuzzy models can predict Extraversion, Openness to Experience, and Agreeableness with acceptable accuracy. Similarly, Halawa et al. (2015) detected 240 students’ Myers-Briggs Type Indicator personality types (Myers et al. 1998) from their behavior on an LMS (Moodle) and a social network. Concretely, they used a unified classification model to combine all 9 behavioral features, such as the number of pages a student visited on the system, the number of comments a student wrote on course pages, and the number of the student’s early or late assignment submissions. Their experimental results revealed that the compared classification models can achieve satisfactory prediction accuracy, with the OneR algorithm performing best. Chen et al. (2016) attempted to predict 763 students’ personality traits from their Massive Open Online Course (MOOC) traces. Specifically, they extracted 20 activity features for each learner from the MOOC platform, such as the number of quiz questions a student attempted to solve, the number of messages posted in a forum, and the total amount of time spent on the platform. Using two regression models, Gaussian Process and Random Forest, they found that personality traits could be predicted only poorly from students’ behavior in MOOC traces.

As for limitations, existing work in the learning domain has mostly built personality inference models from students’ individual activities. Few studies have examined in depth the role of students’ online communication behavior in predicting their personality. In addition, as personality has shown significant correlations with linguistic features (Gill and Oberlander 2002; Mairesse et al. 2007; Golbeck et al. 2011), it would be interesting to extract linguistic features from text contents (such as the messages posted in chat rooms and discussion forums) and study their impact on personality prediction.

We are thus interested in not only exploring the effects of personality on students’ behavior towards different types of communication tools, but also investigating how to infer students’ personality from their communication behavior in a web-based learning system.

Experiment Setup

Materials and Participants

To answer our research questions, we conducted an experiment on the eBanshu web-based learning system, which was released in 2013 and has so far been used by more than 20 universities in China, with over 33,000 students enrolled in 100 courses. On this website, instructors can use video cameras and digitizers (for writing notes) to give real-time lectures. In the online class, students can communicate with instructors and peers through a text chat room, ask or answer questions using the hands-up facility, and take notes and share them (see Fig. 1). After class, they can leave messages in a course-based discussion forum, share learning materials, and submit assignments. Students may use these communication tools freely; their usage does not count toward their final assessment.

Fig. 1

Snapshot of the synchronous instruction interface in eBanshu (www.ebanshu.com)

From March to June 2015, 1,559 undergraduate students from Hebei Normal University in China enrolled in 16 elective courses of 3 subject types: liberal arts (9 courses, e.g., “Comparative Literature”), science (6 courses, e.g., “Discrete Mathematics”), and engineering (1 course, “Microcomputer Principles and Interface Description”). Each student enrolled in one course, and the average enrollment per course was 97.3 students (min = 50, max = 209, SD = 42.2). Each course lasted 12 weeks, with 2 one-hour lessons every week. Students received credit at the end if they passed the assignments and examinations. We sent survey invitations to all of these students before they attended class, and 202 students accepted the invitations. After filtering out incomplete and invalid answers to our survey questions (Footnote 3), we ultimately collected data for 164 students (including 95 females). Their ages ranged from 20 to 25 (M = 22.3, SD = 0.93), and they came from 11 different majors (e.g., English, Physics, Mathematics, Pedagogy).

Procedure and Measurement

Before each course started, we asked the students to fill in a questionnaire about their personality. As mentioned in the previous section, we assessed the students’ personality traits with the TIPI questionnaire (Gosling et al. 2003). Each personality trait score is the average of the scores on two related items, one of which is reverse-scored. For example, one item assessing Extraversion is “I see myself as: extraverted, enthusiastic”, which is rated from 1 (“disagree strongly”) to 7 (“agree strongly”).
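The scoring step can be sketched in Python as follows. The item-to-trait key and the reverse-scoring rule (8 − x on the 1 to 7 scale) follow the published TIPI scoring instructions; because the TIPI itself reports Emotional Stability, deriving Neuroticism as its reverse is our assumption about how the reported Neuroticism scores are obtained, and the function and dictionary names are hypothetical.

```python
# TIPI scoring key (Gosling et al. 2003): each trait averages one directly
# scored item and one reverse-scored item, given as (direct, reverse).
TIPI_KEY = {
    "Extraversion": (1, 6),
    "Agreeableness": (7, 2),
    "Conscientiousness": (3, 8),
    "Emotional Stability": (9, 4),
    "Openness to Experience": (5, 10),
}


def score_tipi(responses):
    """responses: dict mapping item number (1-10) to a rating in 1..7."""
    for item in range(1, 11):
        rating = responses[item]
        if not 1 <= rating <= 7:
            raise ValueError(f"item {item} rating {rating} is outside 1-7")
    scores = {}
    for trait, (direct, reverse) in TIPI_KEY.items():
        # Reverse-scoring on a 1-7 scale maps a rating x to 8 - x.
        scores[trait] = (responses[direct] + (8 - responses[reverse])) / 2
    # Assumption: Neuroticism is reported as the reverse of Emotional Stability.
    scores["Neuroticism"] = 8 - scores["Emotional Stability"]
    return scores


# Usage: a respondent answering 4 ("neither agree nor disagree") everywhere
# except items 1 and 6, which both point toward high Extraversion.
answers = {i: 4 for i in range(1, 11)}
answers[1], answers[6] = 7, 1
print(score_tipi(answers)["Extraversion"])
```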

We also recorded each student’s actual behavior throughout the course, as the system automatically logs his or her behavior in a log file. The log file includes not only the activities he or she carried out in and after class, but also all of his or her text messages posted in the chat room and the discussion forum.

From the log file, we extracted two types of features: activity features and content features (see Table 1). The activity features are further divided into two categories. In-class activity features include students’ attendance rates, frequency of using the hands-up facility, number of messages posted in the chat room, number of notes taken in class, frequency of sharing notes, frequency of using the mouse, and frequency of using the keyboard. After-class activity features include students’ number of messages posted in the discussion forum, frequency of sharing learning materials, and assignment submission behavior, such as submission rate and submission initiative. The assignment submission initiative score measures how far ahead of the deadline an assignment was submitted, relative to the total time available for completing it: \(initiative=\frac{Date_{AssignmentDeadline}-Date_{AssignmentSubmission}}{Date_{AssignmentDeadline}-Date_{AssignmentAssigned}}\). A higher initiative score implies that the student submitted the assignment earlier relative to the scheduled deadline.
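The initiative score can be computed directly from the three timestamps in the formula above. The Python sketch below assumes that the timestamps are available as `datetime` objects and that a student's overall score is the mean over his or her assignments; the function names and the averaging step are illustrative assumptions rather than details taken from the system's logs.

```python
from datetime import datetime
from statistics import mean


def submission_initiative(assigned, deadline, submitted):
    """Fraction of the assignment window still remaining at submission time.

    1.0 means submitted as soon as the assignment was released, 0.0 means
    submitted exactly at the deadline; the formula gives negative values for
    late submissions.
    """
    window = (deadline - assigned).total_seconds()
    if window <= 0:
        raise ValueError("deadline must be later than the assignment time")
    return (deadline - submitted).total_seconds() / window


def student_initiative(submissions):
    """Average initiative over a student's (assigned, deadline, submitted) triples."""
    return mean(submission_initiative(*s) for s in submissions)


# Usage: a 10-day assignment window, submitted halfway through it.
assigned = datetime(2015, 3, 2)
deadline = datetime(2015, 3, 12)
print(submission_initiative(assigned, deadline, datetime(2015, 3, 7)))
```

Under this reading, the average initiative of 53.1% reported later means that students typically submitted with about half of the available time still remaining.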

Table 1 List of students’ behavioral features in the online courses

The content features include the linguistic processes, psychological presence, and task engagement of the messages students posted in the chat room and the discussion forum. Specifically, the linguistic processes mainly reflect students’ writing styles through measurements of their sentence length, their use of personal pronouns and punctuation, and some special words. Psychological presence evaluates whether students’ online communication can foster collaborative and meaningful learning (Oztok et al. 2013) and is defined in two categories: social presence (the degree of awareness of others in an interaction) and cognitive presence (the extent of both reflection and discourse in the construction of meaningful outputs). If a word in a message belongs to the “social process” category (the process of producing social interactions) or the “affective process” category (the process of expressing affective states), it is taken as an indicator of social presence (Oztok et al. 2013). Otherwise, if the word is coded as “cognitive process” (the process of thinking or remembering something), it is taken as an indicator of cognitive presence (Oztok et al. 2013). To code these two kinds of features, we adopted a widely used text analysis tool, the Chinese Linguistic Inquiry and Word Count (CLIWC) dictionary (Pennebaker et al. 2015) (see the coding in Table 2). Task engagement measures whether the posted messages are related to the course content (Chen and Caropreso 2004). The task engagement level of each message sentence was manually determined based on the learning-related words/phrases it contains. If the sentence contains a word/phrase meaning “assignment” or “exam”, it is classified as “fully-engaged”. If it contains a word/phrase meaning “ask for leave” or “technical support”, it is classified as “somewhat-engaged”. Otherwise, if the sentence contains words/phrases that are not relevant to the learning task (such as greetings like “hello” or modal particles like “wow”), it is classified as “disengaged”. The definitions of each engagement level are also given in Table 2.
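The two coding rules above can be illustrated with a small dictionary-based sketch in Python. The category word lists and engagement keyword lists below are hypothetical English stand-ins: the study applied the CLIWC dictionary to Chinese text (which would also require word segmentation), and task engagement was coded manually, so this automated version only approximates the described rules.

```python
import re
from collections import Counter

# Hypothetical English stand-ins for dictionary categories (not CLIWC itself).
TOY_CATEGORIES = {
    "social": {"we", "you", "friend", "talk"},
    "affective": {"happy", "good", "worried", "love"},
    "cognitive": {"think", "because", "know", "understand"},
}

# Hypothetical keyword lists for the three task-engagement levels.
FULLY_ENGAGED = ["assignment", "exam"]
SOMEWHAT_ENGAGED = ["ask for leave", "technical support"]


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def presence_counts(text):
    """Count words signalling social presence vs. cognitive presence."""
    counts = Counter(social_presence=0, cognitive_presence=0)
    for word in tokenize(text):
        if word in TOY_CATEGORIES["social"] or word in TOY_CATEGORIES["affective"]:
            counts["social_presence"] += 1
        elif word in TOY_CATEGORIES["cognitive"]:
            counts["cognitive_presence"] += 1
    return counts


def contains_phrase(text, phrases):
    """True if any phrase occurs in the text as whole words."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases)


def engagement_level(sentence):
    """Assign one of the three task-engagement levels to a sentence."""
    if contains_phrase(sentence, FULLY_ENGAGED):
        return "fully-engaged"
    if contains_phrase(sentence, SOMEWHAT_ENGAGED):
        return "somewhat-engaged"
    return "disengaged"


print(presence_counts("I think we should talk because I am worried"))
print(engagement_level("When is the assignment due?"))
```

The order of the checks mirrors the rules in the text: social and affective words take precedence over cognitive words, and a sentence is tested for fully-engaged keywords before somewhat-engaged ones.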

Table 2 Coding of message content’s linguistic processes, psychological presence, and task engagement level (the categories for linguistic processes and psychological presence are defined by Pennebaker et al. (2015))

Results and Analysis

Data Overview

We are interested in validating whether students’ personality influences their behavior in using different communication tools in a web-based learning system. Before reporting the results, we first describe the collected data. Regarding our participants’ personality, a reliability analysis of the TIPI personality questionnaire shows an internal consistency coefficient (Cronbach’s alpha) of 0.723 (>0.70), suggesting that the questionnaire has satisfactory internal consistency (Nunnally et al. 1967). Furthermore, Fig. 2 presents the mean value of each personality trait (on a 1 to 7 scale): Openness to Experience (M = 4.66, SD = 1.02), Conscientiousness (M = 5.03, SD = 1.06), Extraversion (M = 4.46, SD = 1.16), Agreeableness (M = 5.16, SD = 1.12), and Neuroticism (M = 3.52, SD = 1.19).
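Cronbach's alpha compares the variances of the individual items with the variance of the summed score. A minimal Python sketch of the standard formula is shown below; it assumes a respondents-by-items matrix in which reverse-keyed items have already been recoded, and the section does not state exactly which item set entered the reported coefficient, so the sketch is not meant to reproduce the value of 0.723.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of ratings.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Usage: three respondents answering two perfectly consistent items.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Sample variances are used throughout; because the same denominator appears in every variance term, using population variances instead would give the same alpha.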

Fig. 2

TIPI scores of participants who took part in our user survey

In addition to students’ answers to our survey questions, we also have their automatically recorded behavioral logs. The results of analyzing students’ activities and message contents are given in Table 3. For activity features, we mainly focus on comparing students’ usage of synchronous and asynchronous communication tools. In particular, synchronous communication tools can foster students’ sense of community and promote real-time interaction among peers and instructors, while asynchronous communication tools can encourage in-depth discussion and leave enough time for reflection (Branon and Essex 2001). As different students may have different preferences for, and engagement in, synchronous and asynchronous communication, we expect that these individual differences may reflect their personality. Specifically, the average course attendance rate is 99.3%, indicating that the students attended almost all of the online lessons. Moreover, all of the students had mouse and keyboard activity in class during the course. In class, 99.3% of the students posted at least one message in the chat room, 73.8% took course notes, and 21.3% of these students shared their written notes with others at least once; 40.9% of the students used the hands-up facility. After class, 96.3% of the students submitted assignments through the system, and 81.1% shared their learning materials at least once. By comparison, only 42.1% of the students used the discussion forum. The average numbers of activities among all students show that students posted chat messages far more often than they performed any other activity: a mean of 71.46 chat messages, vs. 17.8 notes taken, 3.54 materials shared, 2.10 messages posted in the discussion forum, 1.49 uses of the hands-up facility, and 1.02 notes shared.
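The paragraph above reports two different statistics per tool: the percentage of students who used it at least once, and the mean number of uses averaged over all students (including those who never used it). The Python sketch below computes both from per-student counts; the input format and the tool names are hypothetical, not the actual eBanshu log schema.

```python
def usage_summary(counts_per_student):
    """For each tool, return (share of students with >= 1 use, mean uses per student).

    counts_per_student: list of dicts mapping a tool name to a usage count;
    a tool missing from a student's dict is treated as zero uses.
    """
    n = len(counts_per_student)
    if n == 0:
        raise ValueError("need at least one student")
    tools = sorted({tool for counts in counts_per_student for tool in counts})
    summary = {}
    for tool in tools:
        values = [counts.get(tool, 0) for counts in counts_per_student]
        adoption = sum(1 for v in values if v > 0) / n
        summary[tool] = (adoption, sum(values) / n)
    return summary


# Usage with three hypothetical students.
logs = [{"chat": 3, "forum": 0}, {"chat": 1}, {"chat": 0, "forum": 2}]
print(usage_summary(logs))
```

Because zeros are included in the mean, a tool used heavily by a few students (such as the forum here) can have a low adoption rate but a non-trivial mean.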

Table 3 Results of analyzing students’ activities and message contents

These results indicate that, during online classes, the students in our study communicated with others far more actively by exchanging messages in the chat room than by asking questions directly through the hands-up facility. Most of them also took course notes during the real-time classes, but relatively few shared their notes with others. After class, most students shared learning materials, but they posted relatively little in the discussion forum. The students’ average assignment submission rate is 87.2%, indicating that they submitted assignments actively. Their average assignment submission initiative value is 53.1%, meaning that, on average, students submitted their assignments ahead of the expected deadline about half of the time.
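The participation percentages and mean counts above are simple per-student aggregates over the activity logs. As a minimal sketch, assuming a hypothetical log format that maps each student ID to a dictionary of per-activity counts (the actual log schema is not described in this section), both quantities could be computed as follows. The function name `usage_summary` and the toy data are illustrative only.

```python
from statistics import mean


def usage_summary(logs, activity):
    """Share of students with at least one occurrence of `activity`, and the mean count per student.

    `logs` maps a (hypothetical) student ID to a dict of per-activity counts;
    a missing activity key is treated as zero occurrences.
    """
    counts = [per_student.get(activity, 0) for per_student in logs.values()]
    share = sum(1 for c in counts if c > 0) / len(counts)
    return share, mean(counts)


# Toy logs for four students (invented for illustration, not the study's data).
logs = {
    "s1": {"chat_messages": 12, "forum_posts": 0},
    "s2": {"chat_messages": 3, "forum_posts": 2},
    "s3": {"chat_messages": 0},
    "s4": {"chat_messages": 5, "forum_posts": 1},
}
print(usage_summary(logs, "forum_posts"))  # (0.5, 0.75)
```

Treating a missing key as zero means that a student who never used a tool still counts in the denominator, which matches how "percentage of students who used X" is phrased above.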

Regarding message content (see Table 3), the average sentence length of chat messages is shorter than that of forum messages (3.74 vs. 4.80). As the data are not normally distributed, we adopted the Mann-Whitney U test and observed a significant difference (Z = -2.611, p = 0.009, d = -0.288), suggesting that students may prefer to write shorter sentences during synchronous communication. Students also used fewer personal pronouns, punctuation marks, and special words in chat messages than in forum messages. In addition, although far fewer messages were posted in the discussion forum than in the chat room, their quality seems better. Concretely, students used more social process words (M = 0.73 vs. 0.52 in chat messages, Z = -2.643, p = 0.008, d = -0.292), affective process words (0.49 vs. 0.40, Z = -0.894, p = 0.371, d = -0.099), and cognitive process words (1.88 vs. 1.60, Z = -2.668, p = 0.007, d = -0.295) in forum messages than in chat messages, although the difference for affective process words is not significant. This indicates that students prefer to show their social presence and cognitive presence via the asynchronous communication tool. Moreover, both types of messages (in the chat room and the discussion forum) contain more cognitive process words than social or affective process words, implying that through sustained communication, students are more inclined to exhibit cognitive presence. In terms of task engagement, the discussion forum contains a high proportion of learning-related messages (52.30% fully engaged and 29.90% somewhat engaged, vs. 17.80% disengaged). In comparison, in the chat room, students posted more disengaged messages (46.90%) than fully engaged (16.60%) or somewhat engaged (36.50%) messages.
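The chat-versus-forum comparisons above rely on the Mann-Whitney U test. The sketch below shows the test with the usual tie-corrected normal approximation, which yields a Z statistic of the kind reported in the text. It is an illustration under stated assumptions, not the authors' code: the text reports an effect size d without defining it, so the sketch instead returns r = Z / sqrt(N), one common effect size for rank tests; that choice is ours. Function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney(x, y):
    """Mann-Whitney U test with the tie-corrected normal approximation.

    Returns (U for sample x, Z, two-sided p, r = Z / sqrt(N)).
    A negative Z means values in x tend to be smaller than values in y.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Each group of t tied values reduces the variance of U by (t^3 - t) / (n(n - 1)).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return u1, z, p, z / math.sqrt(n)
```

Passing chat-message values as `x` and forum-message values as `y` reproduces the sign convention of the negative Z values above. `scipy.stats.mannwhitneyu` provides the same test, although for small samples it may use an exact distribution instead of the normal approximation.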

The Impact of Personality on Students’ Online Communication Behavior

We ran multiple linear regressions (Seber and Lee 2012) to analyze the impact of personality on students’ use of different communication tools in a web-based learning system, with students’ five personality traits as predictors and each behavioral feature as the dependent variable. This method allows us to assess the relative effect of each personality trait. However, performing multiple tests may inflate the Type I error rate (i.e., accepting “spurious” significant results as “real”) (Perrett et al. 2006). To address this issue, we used a Bonferroni-type adjustment (Armstrong 2014), one of the most commonly used methods for adjusting the significance levels of individual tests when multiple tests are performed on the same data. Specifically, each of the k individual tests is conducted at the adjusted significance level, in general α/k (Perrett et al. 2006). Table 4 shows the results of the multiple linear regression analyses, where p < 0.01 (= 0.05/5) indicates that changes in a predictor are significantly associated with changes in the dependent variable. Because there are 120 features in total, we list only the features with significant results due to space limitations.

Table 4 Multiple linear regression of five personality traits on students’ online communication behavior in web-based learning systems (p < 0.01 with Bonferroni-type adjustment refers to the statistically significant relationship between the predictor and the dependent variable)
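One regression of this kind, with the five traits as predictors and one behavioral feature as the dependent variable, can be sketched with ordinary least squares and the Bonferroni-type threshold α/k = 0.05/5 = 0.01. This is a minimal illustration, not the authors' analysis script: the function name is hypothetical, the data below are synthetic stand-ins rather than the study's TIPI scores or logs, and NumPy/SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats


def ols_bonferroni(X, y, alpha=0.05, n_tests=5):
    """OLS of y on the columns of X plus an intercept.

    Returns (slopes, two-sided p-values, flags for p < alpha / n_tests).
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - k - 1
    sigma2 = resid @ resid / dof  # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    t_stat = beta / se
    p = 2 * stats.t.sf(np.abs(t_stat), dof)
    threshold = alpha / n_tests  # 0.05 / 5 = 0.01, the level used for Table 4
    return beta[1:], p[1:], p[1:] < threshold


# Synthetic example: 164 "students", five standardized trait scores, and a
# feature that depends only on the second trait (Conscientiousness).
rng = np.random.default_rng(0)
traits = rng.normal(size=(164, 5))
chat_messages = 2.0 * traits[:, 1] + rng.normal(scale=0.5, size=164)
slopes, pvals, significant = ols_bonferroni(traits, chat_messages)
```

In practice this would be repeated once per dependent variable, keeping only the features whose flags are set, as Table 4 does.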

Personality and Activity Features

Looking at activity features, the results show that all five personality traits significantly affect students’ activities, both in and after class. Specifically, the number of messages students posted in the chat room is significantly (p < 0.01) and positively influenced by Conscientiousness and Extraversion, implying that more self-organized and extraverted students tend to post more synchronous chat messages. This finding is partially consistent with the observations of Shen et al. (2015) and Emerson et al. (2016) that higher Extraversion leads to more active participation in communication-based activities such as synchronous chat, with the purpose of receiving gratification through interaction. Regarding keyboard and mouse usage in synchronous classes, students who are more open-minded (higher Openness to Experience) tend to use the keyboard more frequently, whereas those who are more self-organized (higher Conscientiousness) tend to use the mouse more often. When all in-class activities are combined, we find that more introverted students (lower Extraversion) are likely to engage in more in-class activities, possibly to compensate for difficulties they may experience in the asynchronous environment (Amichai-Hamburger and Vinitzky 2010). In addition, students who score higher on Agreeableness (who prefer to help others) tend to be more active in synchronous classes. However, students’ personality has no significant impact on their use of the hands-up facility, note taking, or note sharing in synchronous classes.

For after-class activities, the personality trait Conscientiousness plays a leading role. Specifically, students with higher Conscientiousness tend to share learning materials with others more frequently. These self-disciplined students are also likely to have higher assignment submission rates and to submit their assignments before the deadline. In addition, more extraverted students tend to post more messages in the discussion forum, which is largely consistent with the findings of Pavalache-Ilie and Cocorada (2014). Finally, the total number of after-class activities is significantly and negatively affected by Neuroticism, indicating that more emotionally stable students tend to engage in more activities after class.

Personality and Content Features

For content features, we observe that personality traits have significant (p < 0.01) impacts on the features of linguistic processes, psychological presence, and task engagement, in the conversational texts of both the chat room and the discussion forum. In the first category, “linguistic processes”, the personality traits Extraversion and Openness to Experience significantly influence students’ use of personal pronouns. For example, more extraverted students tend to use more first-person plural pronouns when posting messages in the chat room. In the discussion forum, however, first-person singular pronouns are preferred by more open-minded students, and first-person plural pronouns by more introverted students. More open-minded students also tend to write longer sentences in forum messages. The frequency of punctuation marks is also significantly affected by students’ personality. For instance, students who score higher on Openness to Experience and Conscientiousness tend to use more punctuation marks, such as commas, in both chat and forum messages, probably because these open-minded and organized students are more willing to make their meaning clearer through punctuation. As for exclamation marks, students who are more emotionally unstable (higher Neuroticism) use them more often in the chat room to express strong feelings. Personality is also reflected in students’ writing style through their use of certain special words. In the chat room, students who are more open-minded (higher Openness to Experience) and more competitive (lower Agreeableness) tend to use more negation words (e.g., “not”), whereas more extraverted students (higher Extraversion) tend to use more non-fluent words (e.g., “umm”). In the discussion forum, filler words (e.g., “you know”) and assent words (e.g., “ok”) are preferred by students who score higher on Openness to Experience.

In the “psychological presence” category, the personality trait Extraversion significantly and positively influences students’ use of social words in chat messages, indicating that extraverted students tend to show more social presence in the chat room. Regarding affective process words, students who are more impulsive (lower Conscientiousness) and emotionally unstable (higher Neuroticism) tend to use more anxiety words in the chat room. Emotionally unstable students also use more sadness words. In addition, the number of negative emotion words in chat messages and the number of positive emotion words in forum messages are both significantly and positively affected by Openness to Experience, implying that more open-minded students tend to use more positive and/or negative emotion words to express their emotions. Moreover, cognitive process words are preferred by those who score higher on Openness to Experience; in particular, more open-minded students tend to use more insight words in their chat messages and more tentative words in their forum messages.

In the “task engagement” category, more open-minded students tend to post more fully engaged sentences in the chat room. In contrast, the number of disengaged sentences posted in the chat room is significantly and positively influenced by Conscientiousness and Agreeableness. In other words, more self-organized and easy-going students are more prone to post content unrelated to learning in the chat room.

Moreover, as related studies show that several demographic properties are associated with users’ personality (Lynn and Martin 1997; Costa et al. 2001; Chausson 2010; Wu and Chen 2015), we also investigated whether students’ gender and age could be used to predict their personality. Independent-samples t-tests show that males scored significantly higher than females on Openness to Experience (t(162) = 2.899, p < 0.05) and Extraversion (t(162) = 3.046, p < 0.05), consistent with the observations of Lynn and Martin (1997) and Costa et al. (2001). However, students’ age is not significantly correlated with their personality (p > 0.05, Kendall’s tau). Additionally, analysis of variance (ANOVA) shows that the differences in the mean score of each personality trait across three subject types (i.e., liberal arts, science, and engineering) are not significant (Openness to Experience: F(2,161) = 1.322, p > 0.05; Conscientiousness: F(2,161) = 1.773, p > 0.05; Extraversion: F(2,161) = 1.359, p > 0.05; Agreeableness: F(2,161) = 0.264, p > 0.05; Neuroticism: F(2,161) = 1.394, p > 0.05).
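The degrees of freedom reported above follow directly from the sample: with 164 students split into two gender groups, a pooled-variance t-test has 164 − 2 = 162 degrees of freedom, and a one-way ANOVA over three subject types has (3 − 1, 164 − 3) = (2, 161). The sketch below computes both statistics from first principles. It assumes the equal-variance (Student's) t-test, which is our inference from the integer df of 162 rather than something the text states; function names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Student's independent-samples t statistic (equal variances) and its df."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (denominator n - 1).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def one_way_anova(groups):
    """One-way ANOVA: F statistic with (between-group, within-group) df."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

`scipy.stats.ttest_ind`, `scipy.stats.f_oneway`, and `scipy.stats.kendalltau` return the corresponding statistics together with p-values.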

In summary, 32 (out of 120) features are empirically found to be significantly influenced by students’ personality. Of these features, gender is domain-independent and can be applied to domains other than web-based learning, whereas the other features are domain-dependent. Content-based features also outnumber activity features (22 vs. 9). In particular, personality traits have a stronger impact on the text content of chat messages than on that of forum messages. Finally, among the five personality traits, Openness to Experience, Conscientiousness, and Extraversion significantly influence a larger number of students’ online communication behaviors.

Personality Prediction

Next, we aim to predict students’ Big-Five personality traits from the activity and content features that the previous section identified as significantly influenced by personality.

Inference Model

We normalized each feature \(f_{i}\) into [0,1] via logarithmic normalization: \(\bar f_{i}=\frac{\log_{10} f_{i}}{\log_{10}\max}\), where \(\max\) denotes the maximum value of the feature across all samples. We then adopted the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996) to train a regression model because of its ability to deal with over-fitting, which is likely to occur in our dataset (a relatively large number of features for a small number of samples). To improve the prediction accuracy and interpretability of the statistical model, LASSO performs both variable selection and L1 regularization, shrinking some coefficients and setting others exactly to zero. Regularization reduces over-fitting by adding a penalty term to the loss and using that term to control model complexity. Formally, LASSO solves the following optimization problem (Tibshirani 1996):

$$ \min_{\beta_{0},\beta}\left( \frac{1}{2N}\sum\limits_{i = 1}^{N} \left(y_{i}-\beta_{0}-x_{i}^{T}\beta\right)^{2}+\lambda \sum\limits_{j = 1}^{p} |\beta_{j}|\right) $$

where N is the number of observations, \(y_{i}\) is the response at observation i, and \(x_{i}\) is the vector of p feature values at observation i. The parameters \(\beta_{0}\) and β are the scalar intercept and the p-vector of LASSO coefficients, respectively. λ ≥ 0 is a tuning parameter that controls the number of nonzero components of β: larger values of λ yield sparser models. To test the significance of each predictor variable as it enters the LASSO model along the solution path, we conducted the covariance test proposed by Lockhart et al. (2014), whose test statistic has a simple asymptotic null distribution, Exp(1).
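The logarithmic normalization and the LASSO objective above can be sketched together. This is a minimal illustration, not the authors' implementation. The normalization assumes every raw value satisfies f_i ≥ 1 and that the maximum exceeds 1; otherwise log10 f_i is negative or undefined and the result falls outside [0, 1] (an offset such as log10(1 + f_i) would be our own assumption, not something the text states). The objective is solved here by cyclic coordinate descent with soft-thresholding, one standard LASSO solver among several; centering X and y handles the unpenalized intercept β0. Function names are hypothetical, and the covariance test is not included.

```python
import math

import numpy as np


def log_normalize(values):
    """f_bar_i = log10(f_i) / log10(max f); lies in [0, 1] only when every f_i >= 1."""
    top = max(values)
    if min(values) < 1 or top <= 1:
        raise ValueError("formula assumes every f_i >= 1 and max(f) > 1")
    return [math.log10(v) / math.log10(top) for v in values]


def soft_threshold(rho, lam):
    """Minimizer of the one-dimensional L1-penalized least-squares problem."""
    return math.copysign(max(abs(rho) - lam, 0.0), rho)


def lasso_coordinate_descent(X, y, lam, max_iter=1000, tol=1e-8):
    """Minimize (1/2N) * ||y - b0 - X b||^2 + lam * ||b||_1; returns (b0, b)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering removes the unpenalized intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()  # residual for the current beta (all zeros at the start)
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: its coefficient stays at zero
            old = beta[j]
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = Xc[:, j] @ resid / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Xc[:, j] * (beta[j] - old)
            max_step = max(max_step, abs(beta[j] - old))
        if max_step < tol:
            break
    beta0 = y_mean - x_mean @ beta
    return beta0, beta
```

For comparison, scikit-learn's `Lasso` minimizes (1 / (2 · n_samples)) · ||y − Xw||² + alpha · ||w||₁, so its `alpha` plays the role of λ in the objective above.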

Procedure

We evaluated the models with 10-fold cross-validation on the 164 students who participated in our user survey: in each fold, a model was trained on 90% of the students and tested on the remaining 10%, so that the results do not depend on a single random split. Accuracy was measured with the commonly used metrics Mean Absolute Error (MAE; the lower, the better), Root Mean Square Error (RMSE; the lower, the better), and Spearman’s rank correlation (the higher, the better) (Willmott et al. 1985; Zar 2005). All significance tests were two-tailed paired t-tests at the p < 0.05 level.
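The evaluation procedure can be sketched as a fold split plus the three metrics. This is a minimal illustration under our own assumptions: the shuffling seed and function names are hypothetical, and Spearman's correlation is computed as the Pearson correlation of average ranks (the standard definition with ties), returning NaN when either input has no variation.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(y_true, y_pred):
    """Pearson correlation of the average ranks; NaN if either input is constant."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


# With 164 students, four folds hold 17 students and six hold 16 (roughly 10% each).
folds = k_fold_indices(164)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(164) if i not in held_out]
    # Fit on train_idx, predict test_idx, then score with mae, rmse, and spearman.
```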

Prediction Results

The evaluation results are shown in Table 5, which presents the prediction performance for each personality trait. LASSO achieves significant improvements over the baseline (which simply uses the average value of the training data as the predicted score for all test samples) for all five personality traits. Specifically, the prediction error of LASSO is significantly lower than that of the baseline in terms of MAE and RMSE (average MAE: 0.664 vs. 0.852, t = -8.8, p < 0.01; average RMSE: 0.842 vs. 0.900, t = -6.3, p < 0.01). For Spearman’s rank correlation, the average value across the five personality traits is 0.594 for LASSO and -0.198 for the baseline. Spearman’s rank correlation measures the monotonic relationship between the predicted values and the ground truth (the closer the value is to 1, the better the prediction). In addition to these values, we also report the improvement percentage (Footnote 4) in MAE and RMSE that each model achieves over the baseline. For instance, LASSO’s RMSE improvement percentage is highest for Neuroticism (29.4%), followed by Conscientiousness (24.5%), Extraversion (22.4%), and Openness to Experience (19.6%). The relative improvement over the baseline is smallest for Agreeableness (12.4%).

Table 5 Least Absolute Shrinkage and Selection Operator (LASSO) regression results for predicting students’ Big-Five personality traits (Note: the value in parentheses indicates the improvement percentage against the baseline approach; p < 0.05 via two-tailed paired t-test)
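The baseline and the comparison statistics described above can be sketched as follows. The mean-of-training-data baseline follows the text directly. The improvement percentage is defined in Footnote 4, which is not reproduced in this section, so the relative-error-reduction formula below is our assumption. The paired t statistic is computed on per-fold error differences; `scipy.stats.ttest_rel` returns the same statistic along with a two-tailed p-value. Function names are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_baseline(train_scores, n_test):
    """Predict the training-set mean for every test sample."""
    m = mean(train_scores)
    return [m] * n_test


def improvement_pct(baseline_error, model_error):
    """Relative error reduction in percent (ASSUMED definition; Footnote 4 not shown)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


def paired_t(errors_a, errors_b):
    """Paired t statistic on per-fold differences, with n - 1 degrees of freedom."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1
```

Pairing by fold compares both methods on exactly the same held-out students, which is why a paired rather than an independent-samples test fits this design.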

Moreover, Table 6 lists the significant predictors of the Big-Five personality traits, where significance (p-value) is determined via the covariance test for LASSO (Lockhart et al. 2014). For Openness to Experience, LASSO enters three predictors at the 0.05 level: the number of question marks in each chat message, the total number of keyboard activities, and the number of tentative words in each forum message. For Conscientiousness, four features are significant, all related to students’ behavior in the synchronous chat room: the total number of chat messages and three content features (i.e., the usage of all punctuation, commas, and anxiety words in chat messages). For Extraversion, only two predictors are significant: the total number of messages students posted in the chat room and their gender. Agreeableness is inferred mainly from two significant features: the total number of aggregated in-class activities and the number of negation words in each chat message. Finally, three predictors enter the LASSO model at the 0.05 level when predicting Neuroticism: the numbers of exclamation marks and sadness words in each message and the total number of after-class activities.

Table 6 Significant predictors for inferring students’ Big-Five personality traits via Least Absolute Shrinkage and Selection Operator (LASSO) (p < 0.05, where p is determined by covariance test)

We further compare our prediction results with those of related studies that also infer Big-Five personality, either in similar domains (e.g., MOOC (Chen et al. 2016) and Moodle (Ghorbani and Montazer 2015)) or in other domains (e.g., Twitter (Quercia et al. 2011; Adali and Golbeck 2012)). To make the comparison more intuitive, we normalize the values of MAE and RMSE and denote them “NMAE” and “NRMSE”, respectively. As Table 7 shows, compared with these related studies, our model, which is derived from online communication behavioral features, achieves relatively lower prediction errors and higher Spearman correlations for all five personality traits. In addition, we observe some consistent patterns in the prediction performance for individual traits across our work and the related studies. For instance, Openness to Experience and Conscientiousness are predicted with lower errors (i.e., lower MAE and RMSE) than the other three personality traits (Quercia et al. 2011; Adali and Golbeck 2012). As for Spearman correlation, both Chen et al. (2016)’s method and ours produce relatively low values for Agreeableness.

Table 7 Comparison between our method and related work in terms of personality prediction

Discussion

Summary of Our Experimental Findings

To sum up, the experimental results show that students’ Big-Five personality traits can significantly influence their use of communication tools in web-based learning systems. For instance, Conscientiousness significantly affects students’ activities both in class and after class, while Openness to Experience is the dominant trait influencing the content features extracted from messages posted in both the chat room and the discussion forum. Another observation is that personality traits affect more content features than activity features, especially features representing linguistic processes in chat messages.

Moreover, our results demonstrate that students’ personality can be effectively inferred from their online communication behavior. The LASSO model used in our work not only outperforms the baseline in terms of both prediction error and Spearman correlation, but also identifies the significant features for predicting students’ Big-Five personality. For example, Extraversion is significantly inferred from the number of messages students posted in the chat room and their gender, whereas Agreeableness is significantly predicted by the total number of aggregated in-class activities and the number of negation words in each chat message.

In our view, this research has several practical implications. First, by elucidating the impact of students’ personality on their online communication behavior in web-based learning systems, the results of this study can help explain individual differences in learning. For example, although communication has been shown to be important in online learning (Hmelo-Silver 2004), some studies have reported that students who prefer to study alone achieve better learning outcomes than those who prefer to study with peers (Reid 1987; Wallace 1992). Our findings may help educational psychologists analyze this phenomenon from the perspective of personality.

Second, the ability to infer students’ personality from their online communication behavior could allow instructors to take individual differences into account when designing teaching strategies and course structures, which may improve instructional design and make students’ learning more effective. Specifically, more personalized communication support could be tailored to students’ spontaneous needs. For instance, a chat room could be incorporated into current classes and recommended to extraverted and conscientious students, who are more likely to participate actively. Even before a course starts, each student could receive suggestions on choosing between online instruction modes (i.e., synchronous or asynchronous) based on his or her predicted personality. For example, agreeable students could be encouraged to join the synchronous class, as they enjoy connecting with peers and value getting along with others. In contrast, emotionally stable students could be encouraged to follow the asynchronous mode of instruction, as they may feel more comfortable learning when they have more time for reflection.

Limitations

In predicting our students’ personality, we mainly relied on features extracted from synchronous and asynchronous communication tools such as the chat room, hands-up facility, note-taking facility, discussion forum, and materials-sharing facility. However, the way these tools are integrated into course activities may affect the results. In addition, other learning platforms may lack some of these tools, especially synchronous communication tools, in which case our prediction model cannot be applied to them directly. Moreover, we used students’ behavior over the whole course to infer their personality, rather than analyzing the log data dynamically. If we knew how long students’ behavior must be collected before accurate personality predictions can be made, or how prediction accuracy varies with the time students spend on the web-based learning system, we could provide more timely and pertinent support during their learning process based on the predicted personality. Another potential limitation is that the participants in this study may not be representative of the target population of all online learners, given differences in language and culture.

Future Work

Our work has several future directions. First, we plan to verify the effects of personality with larger samples of students from more diverse backgrounds (e.g., age, nationality, ethnic background). Second, we will try to further improve our personality prediction model by extracting more semantic features from students’ conversational texts. Third, we will measure the correlations between personality and students’ learning outcomes to determine whether personality accounts for additional variance in learning outcomes that cannot be explained by students’ behavior.