1 Introduction

Personality prediction benefits many fields, including social media, career advising, relationship matching and resume filtering. It helps social platforms engage more users and attract more advertising. Recommendation systems are based on people’s choices and preferences, which personality identification supplies [1]. In career advising, knowing their personality helps young people select a career. In dating applications, understanding personality supports matching compatible people. In human resources, personality awareness helps recruiters pick suitable candidates. Personality prediction is a challenging area of research; the task is difficult even for humans because it demands expert knowledge. It is relevant to machine learning, psychology, natural language processing, behavior analysis and artificial intelligence [2].

The formal way to assess personality is a questionnaire-based test. People are reluctant to take such tests because they are time-consuming and boring [3]. A way to predict personality without a questionnaire is therefore in demand. The huge amount of textual data on social media has made it a rich environment for research. The availability of user profiles with massive amounts of text has encouraged research in this field [4]. Text analysis is itself a complex research area.

Personality defines the individuality of a person’s attributes. Several personality models exist, such as the Big Five Model, the Three Factor Model, the Myers-Briggs Type Indicator (MBTI) and the Enneagram. The Enneagram describes nine different patterns of human behavior and the relationships between them [5]. It is a personality model that explains inner behavior such as motivations, desires, fears and features. It is far more complex than other models. It gives awareness of the advantages and disadvantages of each personality, which supports personal growth [6]. It is composed of nine personalities: Reformer, Helper, Achiever, Individualist, Investigator, Loyalist, Enthusiast, Challenger and Peacemaker.

The Enneagram is taught at many universities in the USA in a variety of fields such as business, education, arts, medicine and psychology [7]. It is a valuable tool for psychiatrists and psychologists. Psychiatrists have employed the Enneagram since 1970 [8]. Knowing the patient’s personality helps them give appropriate psychological aid. Counselors apply the Enneagram to understand the client’s attitude, which aids recovery and development [9]. It is also helpful in human development and education.

Almost all personality detection research is based on machine learning and deep learning approaches. Personality prediction is a recent area of research in machine learning [10]. Machine learning suffers from problems of transparency, consistency and dependability. Deep learning has drawbacks such as large dataset requirements, high computational cost and data dependability. These problems particularly affect natural language and prevent artificial intelligence from approaching human intelligence [11]. Previous research on personality prediction has focused on models such as the Big Five model, MBTI and the Three Factor model. The Enneagram provides insights into a person’s core motives and purpose; it gives a considerably greater level of depth than other personality models. This is the first technique to predict Enneagram personality from text.

The Enneagram prediction system contains four phases: pre-processing, word-based feature representation, word-based feature selection and personality prediction. Pre-processing performs cleaning, normalization, lemmatization and stemming. The second phase converts the text into words that serve as features. Feature selection is accomplished using the Enneagram ontology [12, 13] and an English lexicon [14]. Personality prediction is a statistical method that selects the personality with the highest probability among all personalities [15, 16]. This paper applies Enneagram personality prediction to 180 different Twitter profiles. These wide-ranging profiles are used to evaluate the technique. Different personalities are investigated under multiple parameters.

The paper is organized as follows. Section 2 presents previous work in personality prediction. Section 3 describes the design of Enneagram personality prediction, including its stages and the applied techniques. Section 4 describes the dataset and its sources. Section 5 reports the prediction outcomes for the different personalities, their evaluation, an analytical view of the results and the achievements. Section 6 discusses the analysis of the results and their impact. Section 7 concludes with a summary of the design, the results, the inferences and future directions.

2 Related Work

Personality identification from text relies on two techniques: machine learning and linguistic properties of text [2]. Most current research depends on various machine learning methods. Most personality prediction work uses the Big Five Personality Traits as its personality model. Several future directions have been proposed, such as other pre-processing methods, parsing, other classification algorithms, additional features and other social media platforms.

A dynamic deep graph convolutional network (D-DGCN) was presented for personality prediction. The system used the MBTI model. As future work, the authors noted that creating personality-based pre-training tasks would be interesting [17].

A Big Five Personality Traits prediction system that combined questionnaire-based and text-based approaches was proposed. The system used unsupervised estimation. The authors plan to use advanced text processing models and to apply the approach to applications other than personality [18].

Two attention-based models that embed text and emoji were proposed to identify personality. The applied technique was based on deep learning architectures. The personality model was the Big Five Personality Traits. Planned enhancements are the use of visual features, the implementation of other learning models and testing of the model on different social media platforms [19].

Text classification was presented to identify personality. The applied techniques were a radix tree conversion model, database storage, comma-separated value (CSV) processing, sentence processing, trait estimation and ontology. The personality model was the Big Five Personality Traits. Planned modifications are the use of different parsing algorithms, word counting according to weight classification and the addition of word corpora from different platforms [20,21,22].

The enhancements proposed in previous research are based on better pre-processing, more suitable algorithms for more accurate results and different personality models [23]. There are no previous techniques for predicting the Enneagram, which is far deeper than these models. This paper presents the first method for identifying Enneagram personality from text.

3 Design of Enneagram Personality Prediction

Twitter text is utilized as input to test the design. The design consists of four stages: pre-processing, feature representation, feature selection and personality prediction. The design is illustrated in Fig. 1.

Fig. 1 Design of Enneagram personality prediction

  1. Pre-processing comprises cleaning, normalization, stemming and lemmatization. Cleaning includes steps such as removing punctuation, special characters, stop words and hashtags. Normalization involves Unicode normalization, lowercase folding and spell checking. Stemming and lemmatization are applied to reduce each word to its root form.

  2. The pre-processed text is tokenized into words. The word features are then represented in bag-of-words form: the bag of words contains each word and its count in the text.

  3. Feature selection is used to pick meaningful features. This step uses the Enneagram ontology [12, 13] and an English thesaurus lexicon [14]. The Enneagram ontology represents the required knowledge of the Enneagram; an ontology is used to model domain knowledge [24]. The English lexicon expands the word selection with equivalent words. For every personality, a list of words is formed from the Enneagram ontology together with the words of the same meaning from the lexicon. These words are then pre-processed. The selected features for every personality are the intersection of that personality’s word list and the bag of words, as shown in Fig. 2.

  4. A statistical approach is applied to predict personality. A probability distribution is computed across the nine personalities of the Enneagram. First, the counts of the selected features are summed for each personality, as shown in Eq. 1. The probability distribution is then derived from these per-personality sums of word occurrences.

    $$\begin{aligned} sfp = \sum _{i=1}^{n}c_i \end{aligned}$$
    (1)

    where sfp represents the selected features per personality, \(c_i\) represents the count of the i-th feature in the text that belongs to the personality, and n represents the number of features found in the text that belong to the personality. The probability for a personality equals the sum for this personality divided by the sum over all personalities, as illustrated in Eqs. 2 and 3.

    $$\begin{aligned} tsf = \sum _{p=1}^{9} sfp \end{aligned}$$
    (2)
    $$\begin{aligned} prob(p) = \frac{sfp}{tsf} \end{aligned}$$
    (3)

    where tsf represents the total selected features over all personalities, p indexes the nine personalities of the Enneagram and prob(p) represents the probability of personality p. The personality with the highest probability is the predicted personality [15, 16].

Fig. 2 Selected features per personality
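The first three stages of the design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation: the stop-word list and the per-personality word lists are hypothetical placeholders (the paper derives its lists from the Enneagram ontology and an English lexicon), and stemming, lemmatization and spell checking are left out to keep the sketch dependency-free.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical placeholders: the real lists come from the Enneagram
# ontology and an English lexicon, which are not reproduced here.
STOP_WORDS = {"the", "a", "an", "and", "to", "is", "of", "in", "i"}
PERSONALITY_WORDS = {
    "Reformer": {"critic", "right", "lead", "encourage"},
    "Helper": {"help", "care", "love"},
}


def preprocess(text):
    """Stage 1: clean and normalize the text, then split it into words."""
    text = unicodedata.normalize("NFKC", text).lower()  # Unicode + case folding
    text = re.sub(r"#\w+", " ", text)                   # remove hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # remove punctuation/special chars
    # Stemming, lemmatization and spell checking are omitted in this sketch.
    return [word for word in text.split() if word not in STOP_WORDS]


def bag_of_words(tokens):
    """Stage 2: map each word to its number of occurrences."""
    return Counter(tokens)


def select_features(bag, word_lists):
    """Stage 3: intersect each personality's word list with the bag of words."""
    return {
        personality: {word: bag[word] for word in words if word in bag}
        for personality, words in word_lists.items()
    }


sample = "I help the critic to lead, and I care! #blessed Right is right."
bag = bag_of_words(preprocess(sample))
selected = select_features(bag, PERSONALITY_WORDS)
```

For the sample sentence, `selected` holds the words of each hypothetical list that occur in the text together with their counts; these counts are the inputs to the statistical stage.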

4 Dataset

The Enneagram dataset contains 180 profiles gathered from Twitter accounts. Each record consists of the Twitter user’s identifier and the user’s Enneagram personality type. Users are distributed equally across personalities: every personality accounts for 11.11% of the users (20 of the 180 profiles). The dataset was collected from multiple sources. On the Enneagram’s official website [25,26,27,28,29,30,31,32,33], experts analyze multiple celebrities of different personalities. These celebrities were searched for on Twitter, and their profiles form part of the dataset. In addition, many public profiles took an Enneagram assessment test and posted their results on Twitter; these profiles were also added to the dataset. The dataset link is here.

5 Results

The prediction is the personality whose words have the largest probability in the text. For example, the selected features for the Reformer might be ‘critic’: 15, ‘right’: 10, ‘real’: 2, ‘lead’: 8, ‘encourage’: 15. These selected features are the words from the Reformer’s word list that were found in the bag of words, together with their occurrence counts there; their sum is 50. The same applies to the word lists of all other personalities. Suppose the sums of word counts per personality are Reformer: 50, Helper: 30, Achiever: 40, Individualist: 8, Investigator: 16, Loyalist: 20, Enthusiast: 4, Challenger: 10, Peacemaker: 12; then the sum of word counts over all personalities equals 190. The probability for each personality is its sum of word counts divided by the total sum over all personalities: Reformer: 26.3%, Helper: 15.8%, Achiever: 21.1%, Individualist: 4.2%, Investigator: 8.4%, Loyalist: 10.5%, Enthusiast: 2.1%, Challenger: 5.3%, Peacemaker: 6.3%. The largest probability is 26.3%, which belongs to the Reformer, so the predicted personality is the Reformer. If two personalities have the same percentage, the system gives priority to the wing value (every personality has a wing: the personality before it or the personality after it).
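The calculation in this example follows Eqs. 1 to 3 and can be reproduced with a short Python sketch. The function name `predict_personality` is introduced here for illustration only, and the wing-based tie-breaking rule is not modeled, since the text does not specify it in enough detail; ties simply fall to the first personality listed.

```python
def predict_personality(sums):
    """Return the predicted personality and the probability per personality.

    `sums` maps each personality to the sum of its selected feature
    counts (sfp in Eq. 1).
    """
    total = sum(sums.values())  # tsf in Eq. 2
    if total == 0:
        raise ValueError("no selected features were found in the text")
    probs = {p: s / total for p, s in sums.items()}  # prob(p) in Eq. 3
    # Assumption: ties go to the first personality in dictionary order;
    # the wing-based rule described in the text is not implemented.
    predicted = max(probs, key=probs.get)
    return predicted, probs


# Counts taken from the example above.
reformer_features = {"critic": 15, "right": 10, "real": 2, "lead": 8, "encourage": 15}
sums = {
    "Reformer": sum(reformer_features.values()),
    "Helper": 30,
    "Achiever": 40,
    "Individualist": 8,
    "Investigator": 16,
    "Loyalist": 20,
    "Enthusiast": 4,
    "Challenger": 10,
    "Peacemaker": 12,
}
predicted, probs = predict_personality(sums)
for personality, prob in probs.items():
    print(f"{personality}: {prob:.1%}")
print("Predicted:", predicted)
```

Raising an error when no feature is found is a design choice of this sketch; how the system handles a profile with no matching words is not stated in the text.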

Enneagram personality prediction is applied to different Twitter profiles. The Twitter text consists of the biography text and a varying number of tweets. The number of tweets is a parameter that takes the values 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100. The outcomes vary across personalities and numbers of tweets. For the Reformer, 70% of profiles are correctly classified with biography text and 30 tweets. For the Helper, 50% are correctly identified with biography text and 10 tweets. For the Peacemaker, the result with biography text and 30 tweets is 50%. The correctly predicted outcomes for all personalities under the different parameters are shown in Table 1.

Table 1 Enneagram personality prediction results recall details

The highest precision varies across personalities. The values are Reformer: 0.61, Helper: 0.77, Achiever: 0.62, Individualist: 0.80, Investigator: 0.35, Loyalist: 1.00, Enthusiast: 0.36, Challenger: 1.00, Peacemaker: 0.89, as shown in Table 2. The F1-score also varies; the highest values per personality are Reformer: 0.75, Helper: 0.61, Achiever: 0.46, Individualist: 0.45, Investigator: 0.51, Loyalist: 0.40, Enthusiast: 0.53, Challenger: 0.56, Peacemaker: 0.57, as shown in Table 3. Accuracy likewise varies across personalities; the highest values per personality are Reformer: 0.92, Helper: 0.93, Achiever: 0.89, Individualist: 0.91, Investigator: 0.82, Loyalist: 0.92, Enthusiast: 0.80, Challenger: 0.92, Peacemaker: 0.93, as shown in Table 4.

Table 2 Precision results details
Table 3 F1-score results details
Table 4 Accuracy results details

The results differ from one personality to another. The Helper, the Achiever, the Loyalist and the Challenger have their highest predictions with 10 tweets. The Individualist has its largest outcome with 10 and 20 tweets. The Peacemaker has its best classification result with 30 tweets. The Reformer’s predictions are highest with 60, 70 and 80 tweets. The Investigator’s outcomes are largest with 60, 80 and 100 tweets. The Enthusiast’s outputs are largest with 50, 60, 70, 80, 90 and 100 tweets. Further improvements are required across all personalities’ predictions, especially for the Helper and the Loyalist.

Noise words affect the results because the lexicon retrieves multiple meanings for every word. Pre-processing sometimes fails to extract the root words correctly. The word lists need to be enriched with more words of the same meaning to expand the feature selection. In the future, several modifications are needed to increase the percentage of correctly classified results. Meaning selection is recommended to avoid noise words. Better pre-processing techniques are needed to detect more features. Various lexicons will be used to expand the vocabulary.

Total results vary across personalities, from a highest value of 95% to a lowest value of 4%. Results are evaluated by computing accuracy, precision, recall and F1-score. Total precision ranges from 31% to 88%. Total recall ranges from 4% to 95%. Total F1-score ranges from 7% to 62%. Total accuracy ranges from 78% to 91%. Results are illustrated in Fig. 3 and Table 5.

Fig. 3 Enneagram personalities prediction total result per personality

Table 5 Enneagram personality prediction total results per personality

The evaluation of this system yields different values across personalities. The largest F1-scores are Reformer 62%, Peacemaker 51% and Enthusiast 49%. The smallest are Loyalist 7%, Helper 13% and Achiever 27%.
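The four measures reported per personality can be computed from the true and predicted labels as in the following Python sketch. It assumes a one-vs-rest reading, in which each personality in turn is treated as the positive class; the text does not state how the per-personality accuracy was obtained, so this reading is an assumption. The function name `per_class_metrics` and the toy labels are hypothetical and are not the paper's data.

```python
def per_class_metrics(true_labels, predicted_labels, target):
    """One-vs-rest precision, recall, F1-score and accuracy for `target`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)  # true positives
    fp = sum(t != target and p == target for t, p in pairs)  # false positives
    fn = sum(t == target and p != target for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                            # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels for illustration only.
true_labels = ["Reformer", "Reformer", "Helper", "Helper", "Achiever", "Achiever"]
predicted_labels = ["Reformer", "Helper", "Helper", "Reformer", "Achiever", "Reformer"]
metrics = per_class_metrics(true_labels, predicted_labels, "Reformer")
print(metrics)
```

Under this reading, accuracy counts the true negatives of the other eight personalities as correct, which would be consistent with accuracy staying high for a personality whose recall is low.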

6 Discussion

One of the future directions named in past research is to apply other personality models that understand human behavior more deeply, directing systems toward more intelligent behavior. The Enneagram has greater benefits than these models; it describes desires, fears, motivations and areas of growth.

Previous work was applied to different personality theories, including the Big Five Model, the Three Factor Model and MBTI. These models are based on certain traits being either positive or negative. These traits are independent and do not relate to each other. Most of this research relied on the Big Five Model. This model has five traits: openness, conscientiousness, extraversion, agreeableness and neuroticism. A person’s personality has these five traits with various values. These values are binary, either true or false. Big Five personality detection is therefore composed of five binary classifiers. Table 6 shows a result of previous work on the Big Five Model.

Table 6 Big five personality traits results

This Enneagram text-based personality prediction is the first system to identify Enneagram personality from text. Enneagram personality prediction is harder because it covers nine personalities. It is a multi-class classification problem whose result is exactly one of the nine personalities: Reformer, Helper, Achiever, Individualist, Investigator, Loyalist, Enthusiast, Challenger or Peacemaker. Performance measures are computed to evaluate the output: precision, recall, F1-score and accuracy. Precision ranges from 31% to 88%, recall from 4% to 95%, F1-score from 7% to 62% and accuracy from 78% to 91%, as shown in Fig. 3 and Table 5.

This system calls for more study in this area. Its outcomes serve recommendation systems, dating applications, education, human development, psychiatrists, physicians and psychologists. An Enneagram prediction system helps boost their business because it gives them awareness of a person’s objectives and priorities. Noise words affect the outcomes because the lexicon yields numerous potential meanings for each word. Pre-processing occasionally identifies the root words inaccurately. The number of words with the same meaning still needs to be expanded. No large Enneagram dataset is available. Future enhancements will include meaning selection, improved pre-processing methods and the use of multiple lexicons to maximize the accuracy of personality classification. More Enneagram datasets from various websites and social media platforms also need to be investigated. This opens the way for further research on this topic.

7 Conclusion

An Enneagram personality prediction system is presented. Personality prediction depends on an ontology, a lexicon and a statistical method. Twitter text is the input to the system. The stages are pre-processing, feature representation, feature selection and personality prediction. Pre-processing involves normalization, cleaning, stemming and lemmatization. Feature representation tokenizes sentences into words. Word-based feature selection depends on the Enneagram ontology and the lexicon. The last step, personality prediction, is accomplished by a statistical technique that picks the personality with the highest calculated probability.

The results vary across Enneagram personalities. The Enthusiast personality has the highest recall, 95%. The Reformer and Investigator personalities also have high recall. The Individualist, Achiever and Challenger personalities have medium recall, and the Helper and Loyalist personalities have low recall. This Enneagram text-based personality prediction is the first system to determine Enneagram personality from text, and it provides a good start. In the future, several enhancements will be applied, such as better pre-processing techniques, meaning selection from the lexicon and the use of multiple lexicons. Better pre-processing aims to select more related features, which increases accuracy. Meaning selection aims to pick a specific meaning from the lexicon and will be employed to avoid noise words. Several lexicons should be used to increase the number of related words for each personality.