1 Introduction

Recently, an increasing number of people have been using Question and Answer (Q&A) sites, communities where users can post questions and answers, such as Yahoo! Chiebukuro (Y!C) [1]. These Q&A sites are regarded as databases that hold massive amounts of knowledge for resolving a variety of matters. The basic flow of a Q&A system is as follows: a user posts a question, and others may respond. The questioner chooses the most appropriate answer as the “Best Answer” (BA) and gives the respondent an award as a token of appreciation. The BA is the response the questioner subjectively finds most fulfilling.

As more users register with Q&A sites and more questions are posted, it is becoming harder for respondents to find questions that match their specialty and interests. Hence, a question posted by a user might not be browsed and answered by qualified respondents. In addition, although Q&A sites are becoming a store of collective knowledge for society, inappropriate answers can also accumulate. Thus, the absence of appropriate respondents can lead to mismatches and the following issues:

  • Inappropriate answers may confuse the questioner and spread wrong knowledge.

  • The shortage of necessary knowledge prevents respondents from providing proper answers, leaving the question unsolved.

  • Abusive words, slander, or statements against public order and standards of decency might offend users.

Hence, ensuring that respondents are users who can be expected to provide appropriate answers is essential for accumulating appropriate answer statements. To solve the issues explained above, a number of prior studies on Q&A sites [2,3,4,5,6,7,8,9] employing textual features or link analysis have been reported. Nevertheless, these works have yet to take into consideration the tendencies in the writing styles of users. Moreover, a method for introducing appropriate respondents to a questioner has not yet been established. Thus, the objective of our work is to introduce appropriate respondents to a questioner by gauging the impressions made by statements. The promotion and extension of our work will contribute to the accumulation of more appropriate answer statements and make Q&A sites invaluable for society, resulting in the swift and efficient promotion of social activities.

The aim of our work is thus to pose questions to users qualified to post proper answers to them, leading to curtailment of the problematic issues stated earlier. Through factor analysis applied to the experimental results, nine factors depicting the impression of Q&A statements have been captured [10]. Factor scores have then been estimated through multiple regression analysis from the 77 feature values of statements [11].

However, our method so far has largely depended on the syntactic information (Syn-Info) obtained through morphological analysis (MA). In addition, because quadratic terms were included, the number of explanatory variables (EVs) became so enormous that the multiple regression equations for estimating factor scores were tremendously complicated. Therefore, we have proceeded to estimate factor scores by utilizing the feature values of Syn-Info extracted through N-gram, which, like MA, is one of the syntactic analysis methods [12]. As an initial step, N was set to 2 in our previous work published as a conference proceeding [12].

In the previous analysis utilizing 2-gram instead of MA, multiple regression analysis was performed with the feature values based on 2-gram and those other than the Syn-Info collectively employed as EVs, whereas the factor scores of the respective nine factors were set as response variables [12]. The analysis results indicated that, for all these factors, the estimation accuracy with 2-gram was comparable to or better than that with MA [12]. Additionally, unlike the former method using MA, where quadratic terms were indispensable for high estimation accuracy, monadic terms alone were adequate for estimating factor scores, which leads to fewer EVs and simpler analysis results [12].

As an initial step in applying N-gram, only 2-gram has so far been applied to the feature values of Syn-Info [12]. Therefore, in this paper, 3-gram is applied in place of 2-gram or MA. Similar to the previous analysis using 2-gram, multiple regression analysis is performed with the feature values based on 3-gram and those other than the Syn-Info collectively utilized as EVs, whereas the factor scores are used as response variables. The results of this further analysis show that applying 2-gram or 3-gram gives better estimation accuracy than MA. Comparing estimation accuracy across all nine factors, 2-gram shows the best results. It could also be suggested that, in applying N-gram as Syn-Info, 2-gram or 3-gram alone would be sufficient.

The rest of this paper is organized as follows. Section 2 introduces related works. Section 3 summarizes our previous works on obtaining the factors of statements and estimating their scores. Section 4 explains our previous work on multiple regression analysis utilizing 2-gram. Section 5 then presents multiple regression analysis using 3-gram. Section 6 discusses our analysis results. Finally, Sect. 7 concludes the paper.

2 Related Works

Numerous prior works investigating Q&A sites have been reported as follows: estimating BAs [2, 3]; introducing users to answer statements [4,5,6]; inspecting the quality or tendency of answer statements [7,8,9]; etc.

2.1 Estimation of BAs

Several works have tackled the estimation of BAs. Blooma et al. utilized a respective set of both five textual and non-textual features to predict the BAs [2]. Their analysis results have conveyed that textual features influenced the quality of the answers more than the non-textual ones did.

Calefato et al. assessed twenty-six BA prediction models in the following two steps [3]. Firstly, they studied the performance of the models in predicting BAs in Stack Overflow [13]. Then, they evaluated the performance in a cross-platform setting where the prediction models were trained on Stack Overflow and tested on other Q&A sites. Their analysis results showed that the choice of the classifier and automated parameter tuning play a significant role in predicting the BA. They also showed that their approach to the BA prediction problem is generalizable across technical Q&A sites.

2.2 Introducing Users to Answer Statements

Several research studies have proposed introducing users to answers. Zhang et al. tackled the issue that patients’ current use of clinical data is considerably limited because of the technical nature of clinical reports [4]. As patients increasingly turn to online resources, e.g., Q&A sites, to acquire knowledge by themselves, the authors analyzed Q&A statements posted on a Q&A site in order to shed light on what kinds of support people provide to and receive from the community and what contextual information they provide to obtain relevant answers. The analysis results revealed that users provided both objective and subjective information to the community. This emphasizes the importance of developing mechanisms to address the problem of the quality of online health information.

Haq et al. studied reputation on Q&A sites through Quora, a Q&A platform that integrates elements of social networks into the traditional Q&A model [5]. They examined the impact of anonymity on linguistic patterns, which are considered to play a vital role in the involvement in and grasp of content. They then extended their research on user interaction to demographics and analyzed its effect on topic engagement. They demonstrated that anonymity does not affect polarity, and that anonymous and non-anonymous answers differ drastically in length, subjectivity, and lexical diversity. They also showed that stronger subjectivity contributes to more extreme polarity, partly because of the personal experience discussed in anonymous content.

Through a broad review of the present literature on expert recommendation, Yang et al. identified four challenges [6]. Firstly, extant recommendation methods disregard the users’ willingness to keep contributing within the online knowledge community. Secondly, insufficient information in user profiles hinders the identification of potential experts. Thirdly, recommending experts as a collaborative group rather than looking for familiar individuals could drastically enhance the recommended answer rate. Finally, it is vital to consider the self-evolution of present expert recommendation approaches.

2.3 Quality or Tendency of Answer Statements

Several works have inspected the quality or tendency of answer statements. Bornfeld et al. explored the influence of vote and comment feedback mechanisms on the survival of answer providers after posting their first answer [7]. Their analysis results showed a strong correlation between votes and comments after the first post.

With a view to improving the professionalism of the social Q&A community and leading ordinary users to post high-quality answers, Shi et al. focused on the answer contents, disregarding the differences in the ability of respondents and the evaluations of other users [8]. Through a review of the relevant literature, three dimensions were constructed: text features, rhetorical features, and emotional features of answer content. Nine features that might influence the quality of answer contents were then identified. Their analysis results provided suggestions for users on posting higher-quality answers in terms of content, and for the functional optimization of social Q&A communities from the perspective of user requirements.

Li et al. explored the characteristics of high-quality academic answer statements across different question types to help academic social Q&A sites recommend high-quality answers to users on the basis of question type [9]. Their analysis results revealed that for discussion-seeking questions, users put more weight on the authority of respondents and on whether the answer contains social elements, while for information-seeking questions, users focus more on whether the answer refers to a theoretical basis.

2.4 Summary

Although these prior works have primarily developed their research by employing textual features or link analysis, the tendencies of answer statements have not been adequately taken into consideration. Some users may write in a polite style, while others might prefer to post their responses in a ruder tone. Some commonly prefer abstract words, whereas others are apt to use more concrete ones. In contrast, we focus on using impressions on top of textual features. In addition, although several prior studies introduce users to answer statements as described [4,5,6], a method to introduce appropriate respondents to a questioner has yet to be devised. Therefore, using the impressions of statements, our work aims to introduce appropriate respondents to a questioner.

3 Previous Works

3.1 Factors of Statements

To evaluate impressions of answer statements, an evaluation experiment was performed with the cooperation of 41 evaluators. They were asked to evaluate the style or content of statements and assign five-level labels for each of a list of 50 impression words [10]. The experimental materials were 12 sets of Q&A statements, three sets from each of four categories: Auction, PC, Love, and Politics & social issues. These materials were selected from those actually posted at Y!C [1] in 2005 [10].

Factor analysis was then applied to the experimental results to obtain factors. The factors indicated the nature of a statement, as interpreted through the several impression words allotted to the statement. These factors were named accuracy, displeasure, creativity, ease, persistence, ambiguity, moving, effort, and hotness. The factor scores were also obtained to use in describing the characteristics of Q&A statements.

3.2 Estimation of Factor Scores

3.2.1 Feature Values of Statements

At this point, the factor scores had been calculated only for the sixty experimental materials utilized in the experiment explained in Sect. 3.1. With the aim of estimating the factor scores of arbitrary statements, multiple regression analysis was performed on their 77 feature values [11]. The feature values adopted are shown in Table 1. They are explained and summarized in the following five categories [11]:

  (1) Syntactic Information (Syn-Info)

    First, Syn-Info was utilized as the feature values of statements. It includes statistics of statements, e.g., the number and length of statements and the number and percentage of each Part-of-Speech (e.g., nouns and verbs). Specific marks such as exclamation and question marks were employed as well [11].

  (2) Word Imageability (WI)

    WI was also regarded as the feature values of statements [11]. WI is a subjective attribute implying how diverse imaginations can be recalled from words. The characteristic value of WI ranges from 1 to 7 [11].

  (3) Closing Sentence Expressions (Closings)

    Closings were included in the feature values as well [11]. The fundamental Japanese words adopted were “zo,” “da,” “yo,” “ne,” “ka,” “na,” “shi,” “desu,” “masu,” “tai,” and “nai” [11]. The feature values of Closings consist of the appearances of these words anywhere in a statement and their appearances as closing words. Here, a “closing word” indicates the appearance of the word at the end of a sentence.

    Closings also include the words “desuka,” “naidesu,” “masuka,” and “mashita,” each of which combines two words from among “desu,” “ka,” “nai,” and “masu.”

  (4) Word Familiarity (WF)

    WF is an index indicating how familiar people feel a word to be, either aurally or visually [11]. The score of WF ranges from 1 to 7.

  (5) Notation Validity (NV)

    NV indicates the validity of the notation of a word and is evaluated by an index ranging from 1 to 5 [11]. A word can possess multiple different styles or meanings. Taking the Japanese word “kosho” as an example, it could mean “breakdown,” “lake,” “name,” etc., and could be written in Chinese characters, hiragana or katakana characters, or a mixture thereof.

Table 1 77 Feature values of statements used for estimating factor scores [11]
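The Closings category described above can be illustrated with a short Python sketch. It assumes that each sentence is already split into romanized tokens with the final punctuation removed, and that the feature values are raw counts; both are assumptions for illustration, as are the function name `closing_features`, the `_closing` key suffix, and the example sentences.

```python
# The eleven fundamental closing words listed in the text.
CLOSING_WORDS = ["zo", "da", "yo", "ne", "ka", "na", "shi",
                 "desu", "masu", "tai", "nai"]

def closing_features(sentences):
    """Count each word anywhere in the statement and at the end of a sentence."""
    features = {}
    for word in CLOSING_WORDS:
        # Appearances of the word itself, anywhere in any sentence.
        features[word] = sum(tokens.count(word) for tokens in sentences)
        # Appearances as a "closing word": the last token of a sentence.
        features[word + "_closing"] = sum(
            1 for tokens in sentences if tokens and tokens[-1] == word
        )
    return features

# Hypothetical tokenized statement with three sentences.
sentences = [
    ["kore", "wa", "hon", "desu"],
    ["wakari", "masu", "ka"],
    ["ii", "desu", "ne"],
]
features = closing_features(sentences)
print(features["desu"], features["desu_closing"])
```

In this example “desu” appears twice but closes only one sentence, so the two kinds of feature value differ.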

3.2.2 Estimation Result

Multiple regression analysis was performed on the sixty Q&A statements utilized as the experimental materials in Sect. 3.1. Based on 77 monadic EVs, a total of 281 quadratic terms were set as explanatory variables, while factor scores for the nine factors were used as response variables.

The analysis result has shown that multiple correlation coefficients (MCCs), which indicate the estimation accuracy, were over 0.9 for all the nine factors [11]. Thus, all nine factors showed very good estimation accuracy.
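As a reminder of what the MCC measures, the following Python sketch fits an ordinary least-squares regression with an intercept and computes the MCC as the Pearson correlation between observed and fitted values. It is a minimal illustration with the standard library only, not the software used for the reported analysis; the data and all function names (`solve`, `fit`, `predict`, `mcc`) are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(xs, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [float(v) for v in x] for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, x):
    """Fitted value for one observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))

def mcc(observed, fitted):
    """Multiple correlation coefficient: correlation of observed and fitted values."""
    mo = sum(observed) / len(observed)
    mf = sum(fitted) / len(fitted)
    cov = sum((o - mo) * (f - mf) for o, f in zip(observed, fitted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_f = sum((f - mf) ** 2 for f in fitted)
    return cov / math.sqrt(var_o * var_f)

# Hypothetical data: two EVs and one response variable for six statements.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
ys = [1.5, 3.5, 3.0, 6.5, 5.5, 11.0]

coefs = fit(xs, ys)
fitted = [predict(coefs, x) for x in xs]
r = mcc(ys, fitted)
print(round(r, 3))
```

An MCC of 1 would mean that the fitted values reproduce the observed factor scores exactly; values close to 1 indicate a close in-sample fit.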

4 Multiple Regression Analysis Using 2-gram

4.1 Aim

As summarized in Sect. 3.2, our method so far was largely dependent on the Syn-Info extracted through morphological analysis (MA). Moreover, employing quadratic terms resulted in an enormous number of EVs, leading to considerably complicated multiple regression equations for estimating factor scores. Therefore, this paper aims to estimate factor scores employing the feature values of Syn-Info extracted through N-gram in place of MA. Using N-gram is expected to result in higher estimation accuracy and provide simpler equations for calculating factor scores.

4.2 N-gram

N-gram is another method of syntactic analysis alongside MA. An N-gram is an adjacent sequence of N units such as characters, morphemes, or Part-of-Speeches. Here, N is an arbitrary integer larger than 1 [14]. One question statement out of the sixty Q&A statements explained in Sect. 3.1 is utilized to show an example of Part-of-Speech N-grams. The original Japanese question statements and their English translations are shown in Table 2. As a matter of convenience, the question is denoted as “QA04.”
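The definition above can be made concrete with a small Python sketch that counts Part-of-Speech N-grams in a tag sequence. The tag sequence is hypothetical and is not taken from QA04; the function name `pos_ngrams` and the bracketed label format are chosen here only to mirror the notation used in the tables.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Count every run of n adjacent Part-of-Speech tags."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # zip over n shifted copies of the sequence yields each adjacent run once.
    runs = zip(*(tags[i:] for i in range(n)))
    return Counter("[" + "-".join(run) + "]" for run in runs)

# Hypothetical tag sequence for one short statement.
tags = ["Noun", "Part", "Verb", "Aux", "Sign", "Noun", "Part", "Verb", "Sign"]

bigrams = pos_ngrams(tags, 2)
trigrams = pos_ngrams(tags, 3)
print(bigrams["[Noun-Part]"], trigrams["[Noun-Part-Verb]"])
```

A sequence of nine tags yields eight 2-grams and seven 3-grams, so longer units are necessarily sparser for the same statement.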

Table 2 The original Japanese statements of QA04 and their English translations [12]

As for the 2-grams of QA04, their Part-of-Speeches, examples, and frequencies are shown in Table 3. The column entitled “2-gram” uses both literal notations and abbreviations. The notations “Noun,” “Verb” and “Sign” are used as they are, whereas “Adjective,” “Particle” and “Auxiliary” are abbreviated as “Adj,” “Part” and “Aux,” respectively. Thus, taking the notation [Sign-Adj] shown in the first row as an example, the 2-gram consists of a sign followed by an adjective. One example of each 2-gram extracted from QA04 is given in the column entitled “Example.”

Table 3 2-Gram and frequency for QA04 [12]

4.3 Analysis Method of 2-gram

In our previous analysis, 2-gram of Part-of-Speech was tentatively used instead of MA [12]. Here, 2-gram was applied to the sixty Q&A statements used for the experiment stated in Sect. 3.1 to extract the feature values of 2-gram. The 2-gram was processed using R [15]. In R, the RMeCab library is installed so that N-gram as well as MA can be processed.

Similar to the analysis stated in Sect. 3.2, multiple regression analysis was run with the factor scores of the nine factors as the response variables. Meanwhile, as a part of the EVs, seventeen 2-grams of Part-of-Speeches, chosen through trial and error, were employed as feature values of Syn-Info; they are denoted as g78–g94 and summarized in Table 4. In this analysis, the feature values of Syn-Info based on 2-gram (g78–g94) were used in place of those based on MA (g1–g36). Together with WI, Closings, WF, and NV (g37–g77), a total of 58 feature values (g37–g94) were employed as EVs.

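Table 4 Feature values on 2-gram [12]

  Feature values: Syn-Info (2-gram)
  g78  [Noun-Part]
  g79  [Part-Verb]
  g80  [Part-Noun]
  g81  [Noun-Noun]
  g82  [Sign-Noun]
  g83  [Verb-Aux]
  g84  [Part-Sign]
  g85  [Sign-Part]
  g86  [Aux-Part]
  g87  [Noun-Aux]
  g88  [Aux-Sign]
  g89  [Verb-Noun]
  g90  [Noun-Verb]
  g91  [Aux-Noun]
  g92  [Aux-Aux]
  g93  [Sign-Sign]
  g94  [Part-Part]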
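How the seventeen 2-gram feature values g78–g94 could be turned into a fixed-length vector for one statement is sketched below in Python. The mapping from feature names to 2-grams follows Table 4; the use of raw counts as the feature value is an assumption, since a normalized frequency would be equally plausible, and the tag sequence and the function name `bigram_vector` are hypothetical.

```python
from collections import Counter

# Feature names and 2-grams as listed in Table 4.
BIGRAM_FEATURES = {
    "g78": "[Noun-Part]", "g79": "[Part-Verb]", "g80": "[Part-Noun]",
    "g81": "[Noun-Noun]", "g82": "[Sign-Noun]", "g83": "[Verb-Aux]",
    "g84": "[Part-Sign]", "g85": "[Sign-Part]", "g86": "[Aux-Part]",
    "g87": "[Noun-Aux]", "g88": "[Aux-Sign]", "g89": "[Verb-Noun]",
    "g90": "[Noun-Verb]", "g91": "[Aux-Noun]", "g92": "[Aux-Aux]",
    "g93": "[Sign-Sign]", "g94": "[Part-Part]",
}

def bigram_vector(tags):
    """Map a Part-of-Speech tag sequence to counts of the selected 2-grams."""
    counts = Counter(f"[{a}-{b}]" for a, b in zip(tags, tags[1:]))
    # 2-grams outside the selected set are dropped; absent ones count as 0.
    return {name: counts[gram] for name, gram in BIGRAM_FEATURES.items()}

# Hypothetical tag sequence for one short statement.
tags = ["Noun", "Part", "Verb", "Aux", "Sign", "Noun", "Part", "Verb", "Sign"]
vector = bigram_vector(tags)
print(vector["g78"], vector["g83"], vector["g94"])
```

Each statement then contributes one row of such values, which can be placed next to the remaining feature values (g37–g77) to form the EVs.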

4.4 Estimation Result

The analysis focused on EVs whose standardized partial regression coefficients (SPRCs) had absolute values larger than 1.0. These EVs are summarized in Table 5. Among the EVs satisfying this condition, up to three of the strongest positive EVs and up to three of the strongest negative EVs were extracted for each factor. The column entitled “FV” shows the classification of the feature value corresponding to each entry in the column entitled “EV,” as given in Tables 1 and 4.
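For readers unfamiliar with SPRCs, the Python sketch below shows the usual conversion from an unstandardized partial regression coefficient: the coefficient is multiplied by the standard deviation of its EV and divided by the standard deviation of the response variable. The data and the function name `sprc` are hypothetical, and the sketch is not the software used for the reported analysis.

```python
import statistics

def sprc(coefficient, ev_values, response_values):
    """Standardize a partial regression coefficient by the two standard deviations."""
    return (coefficient
            * statistics.stdev(ev_values)
            / statistics.stdev(response_values))

# Hypothetical EV and response values; the response is exactly 2 * EV.
ev = [1, 2, 3, 4, 5]
response = [2, 4, 6, 8, 10]
original = sprc(2.0, ev, response)

# Rescaling the EV by 100 changes the raw coefficient but not the SPRC.
ev_rescaled = [100 * v for v in ev]
rescaled = sprc(0.02, ev_rescaled, response)
print(round(original, 6), round(rescaled, 6))
```

Because the SPRC does not depend on the unit of each EV, it allows the strength of EVs measured on different scales, such as counts and 1-to-7 ratings, to be compared within one equation.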

Table 5 Explanatory variable (EV) and feature value (FV) with higher standardized partial regression coefficient (SPRC): 2-gram [12]

Similar to the results with MA described in Sect. 3.2, MCCs exceeded 0.9 for all the nine factors. Moreover, MCCs were improved with the application of 2-gram rather than MA. Therefore, the estimation accuracy with 2-gram was almost equivalent or superior to that with MA.

5 Multiple Regression Analysis Using 3-gram

5.1 Aim

In applying N-gram to our method so far, a mere 2-gram was applied to the feature values of Syn-Info. For further analysis, a bigger unit of N-gram than 2-gram must also be applied and analyzed. Therefore, in this paper, 3-gram was applied instead of 2-gram or MA. Similar to the previous analysis using 2-gram, the analysis method using 3-gram was performed through multiple regression analysis. The analysis result using 3-gram was then compared with those using 2-gram or MA to validate the effectiveness of the application of 3-gram.

The feature values based on 3-gram and those other than the Syn-Info were collectively utilized as EVs, whereas the factor scores for the respective nine factors were employed as response variables.

5.2 Analysis Method of 3-gram

Similar to the analysis method using 2-gram stated in Sect. 4.3, multiple regression analysis was performed. Factor scores of the nine factors are utilized as the response variables. In order to compare the analyses with 2-gram and 3-gram easily and directly, the number of 3-grams extracted as feature values of Syn-Info is the same as that of 2-grams, namely seventeen. These feature values are denoted as g95–g111 and shown in Table 6.

Table 6 3-gram and frequency for QA04

In this analysis, the feature values of Syn-Info based on MA (g1–g36) are replaced by those based on 3-gram (g95–g111). Together with WI, Closings, WF, and NV (g37–g77), a total of 58 feature values (g37–g77 and g95–g111) are utilized as EVs. Most of the abbreviations are already explained in Sect. 4.2, except one that did not appear in Tables 3 or 4: “Adv” stands for adverb and is extracted as one component of the 3-gram [Part-Sign-Adv] (g106).

5.3 Estimation Result

Similar to the former method utilizing 2-gram stated in Sect. 4.4, EVs with absolute values of SPRCs larger than 1.0 are summarized in Table 7. Among the EVs meeting this condition, up to three of the strongest positive EVs and up to three of the strongest negative EVs are shown for each factor. However, there are several cases where the absolute values of SPRCs are below 1.0 for all the EVs: the negative SPRCs for the 2nd factor and both the positive and negative SPRCs for the 5th factor. For these cases, only the single EV with the largest positive or negative SPRC is shown. The column entitled “FV” is explained in Sect. 4.4. The analysis result shows that the MCCs for all nine factors exceed 0.9.
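The selection rule described above, including the fallback for factors with no SPRC beyond the threshold, can be sketched in Python as follows. The SPRC values are hypothetical and do not come from Table 7; the function name `select_evs` and its parameters are likewise illustrative.

```python
def select_evs(sprcs, threshold=1.0, top=3):
    """Pick the strongest positive and negative EVs for one factor."""
    def pick(side):
        ranked = sorted(side, key=lambda item: abs(item[1]), reverse=True)
        strong = [item for item in ranked if abs(item[1]) > threshold]
        # Fallback: if no EV passes the threshold, keep only the largest one.
        return strong[:top] if strong else ranked[:1]

    positive = pick([(name, v) for name, v in sprcs.items() if v > 0])
    negative = pick([(name, v) for name, v in sprcs.items() if v < 0])
    return positive, negative

# Hypothetical SPRCs for one factor.
sprcs = {"g95": 1.6, "g101": 1.1, "g40": 2.3, "g52": 1.4,
         "g99": -0.8, "g60": -0.3, "g106": 0.2}

positive, negative = select_evs(sprcs)
print(positive)
print(negative)
```

Here four positive EVs pass the threshold, so only the three strongest are kept, while no negative EV passes it, so the single largest negative EV is reported instead.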

Table 7 Explanatory variable (EV) and feature value (FV) with higher standardized partial regression coefficient (SPRC): 3-gram

6 Considerations

In order to compare MCCs among 3-gram, 2-gram, and MA, the results are summarized in Table 8. In terms of MCCs, 2-gram shows the best results for five factors (2nd, 5th, 6th, 7th, and 8th), followed by 3-gram for three (3rd, 4th, and 9th) and MA for one (1st). From these comparisons, 2-gram is the best among the three cases. Nevertheless, as a whole, MCCs are improved with the application of either 2-gram or 3-gram. Therefore, it could be suggested that analysis using N-gram would outperform analysis using MA alone. It could also be suggested that 2-gram or 3-gram would be sufficient in applying N-gram. In other words, it would be unnecessary to extend the analysis to 4-gram or beyond with this method.

Table 8 Comparison of multiple correlation coefficients (MCCs)

These results could be attributed to the fact that 2-gram and 3-gram convey the collocations among two or three adjacent words, whereas the associations among words are disregarded in the case of MA. From this standpoint, using 2-gram or 3-gram could be more productive for estimating factor scores of Q&A statements.

In addition, in the previous analysis using MA, quadratic terms were required for good estimation accuracy. With the usage of N-gram, by contrast, monadic terms alone would be adequate in estimating factor scores. Thus, N-gram contributes to limiting the process to much fewer EVs, which results in the simplification of multiple regression equations to obtain factor scores.

Nevertheless, the meanings or contents of Q&A statements have not been considered for our analysis so far. Hence, with a view to regarding them, a meaning analysis needs to be applied to our method in the future. Moreover, it is indispensable to investigate if our proposed method utilizing N-gram can be extended to other languages.

7 Conclusions

In this paper, 3-gram was applied instead of 2-gram or MA. Similar to our previous analysis using 2-gram, multiple regression analysis was performed with the feature values based on 3-gram and those other than syntactic information collectively utilized as explanatory variables, while the factor scores for the respective nine factors were set as response variables. As a result of this further analysis, in comparing estimation accuracy for the nine factors among the cases using 2-gram, 3-gram, and MA, 2-gram showed the best results. As a whole, applying 2-gram or 3-gram would improve estimation accuracy more than MA would. In addition, it could also be suggested that 2-gram or 3-gram alone would be sufficient in applying N-gram as syntactic information to our method.

For future work, the meanings and contents of Q&A statements must be taken into account for the analysis. Moreover, with the feature values of syntactic information based on MA, the factor scores obtained were subsequently employed for inspecting the possibility of detecting respondents who could be expected to post the appropriate answer to a newly posted question [16]. Therefore, whether the feature values based on 2-gram could be effective in finding appropriate respondents should be inspected and compared with the case of MA. Because most of the feature values used in this study are based on Japanese language materials, the generalization of these findings to other languages has to be addressed as another topic in our future work as well.