1 Introduction

During the 2017 Labour leadership election in Britain, an analysis of the language used in news articles about the candidates showed gender-related discrepancies in how the candidates were described. The single male candidate was more likely to be discussed in terms of professional employment, politics, and law and order, while the two female candidates were much more likely to be discussed in terms of their families, in particular their fathers.

Language style, word choice and other textual characteristics differ between men and women [3]. This can be viewed from two perspectives: one focuses on the subject of the text (inferring whether the person discussed in the text is male or female), and the other on the author of the text (inferring whether the author is male or female based on their style of writing). Our focus in this paper is the latter: author gender identification.

Previous research in supervised learning for author gender prediction has generally used a closed vocabulary approach [9, 36]. The vocabulary used to represent the text is typically a list of characteristics of the text's structure and content, such as character frequencies, word or sentence counts, vocabulary richness measures, and the frequencies of an extensive predefined set of words and phrases identified through psychological or linguistic studies. In contrast, we show that an open vocabulary approach using feature selection, a data-driven approach that dynamically identifies the words most predictive of author gender, performs significantly better than the closed vocabulary approach.

We evaluate the closed and open approaches on different types of textual content, including (i) user-generated content that reflects a more modern, digital writing style, such as tweets and blogs, and (ii) text that follows a more conventional writing style, using eBooks from the Gutenberg digital repository.

Prediction models are often trained on datasets that reflect human biases and learn those same biases from the examples provided to them [8]. This can lead to models making decisions that reproduce human biases, including gender bias [37]. We show that the open-vocabulary approach displays significantly less gender bias than the closed-vocabulary approach across all the datasets.

We also explore a hybrid closed and open approach, using a significantly smaller set of features which we call the POS (Parts-of-Speech) feature set. Though these POS features reflect a closed-vocabulary approach in that they measure proportions of word usage in text, they can be seen as a step towards a more open-vocabulary approach as they capture how different parts of speech are used. We found that combining the proposed POS feature set with features obtained using the open vocabulary approach increases the capacity to identify author gender without having a significant impact on gender bias.

The rest of the paper is structured as follows. The following section outlines related work in author gender prediction. Section 3 outlines our methodology, Sect. 4 presents our evaluation and results, and we conclude and outline future work in Sect. 5.

2 Related Work

Initial work in the area of attributing text content to author gender used closed vocabularies and statistical methods [5, 7]. The closed vocabularies used extensive lists of stylometric textual characteristics, e.g., word frequencies, word length and sentence count [4]. Since such count-based features were sensitive to the length of the text, lists of vocabulary richness measures such as hapax legomena, Yule's K, etc., which describe the lexical structure of a document independently of its length, were introduced [20, 40]. These vocabulary richness measures were originally defined for authorship attribution tasks [20], but over time were adapted for author gender prediction [11, 21, 25, 40].

In addition to using stylometric features, researchers started exploring whether the use of particular words in text can be attributed to a particular gender [26, 28]. This gave rise to the use of function words, which include articles, pronouns, conjunctions, etc., as closed-vocabulary features [16]. Building on the idea of using a predefined dictionary of words as features, Tausczik et al. [38] used a set of words and phrases introduced by Pennebaker et al. [31] in their study of the psychometric properties of words. These features are known as the LIWC (Linguistic Inquiry and Word Count) features [30].

Gradually, researchers started exploring the application of supervised machine learning techniques to these closed vocabulary features [2, 3, 6, 12]. A variety of classification techniques have been used, including Winnow [12], decision trees [2], SVM [9] and random forests [34]. The limitation of the closed-vocabulary approach is that it requires an extensive, human-curated list of words whose counts or occurrences are used as features. As an example, the popular LIWC2015 dictionary is an extensive list of approximately 6,400 words [30]. Cheng et al. [9] chose to use 545 closed vocabulary features, adding function words on top of stylometric features. Feature selection techniques were then applied to reduce this vocabulary. Koppel et al. [24] attempted to identify the optimal number of features that can effectively predict an author's gender by performing feature reduction using multiplicative update rules, where a weight vector is learned by iterating over each training instance. After the weights for all features are learned, the less prominent features have weights that tend to zero. Using this feature selection method, they observed that the top 64 to 128 features were sufficient to effectively predict an author's gender.

Researchers then started exploring open vocabulary methods to automatically identify content-based features that are indicative of an author's gender. Open vocabulary methods typically use a bag-of-words approach to identify the vocabulary across all training data. This results in a very high dimensional, sparse representation. Hence, topic modelling approaches were used to identify a reduced set of features [23], which were shown to perform better than closed vocabularies on the task [41]. One study found that a subset of 83 closed vocabulary features outperformed content-based features [41]. However, that comparison was against the top 1,000 to 3,000 content words with the highest tf-idf values, which does not necessarily select the content features most useful for distinguishing male and female authors.

The classification techniques used ranged from logistic regression [9], AdaBoost [34] and random forests [29] through to SVMs with a linear kernel [9, 18]. The datasets used varied from proprietary, non-open datasets of Facebook posts [15], blogs [27], news corpora [9] and short-messaging-service (SMS) texts [14], to publicly available data such as the original Enron dataset, which originally included gender information that has since been removed [12].

The PAN CLEF (Conference and Labs of the Evaluation Forum) 2017 challenge involved differentiating human-authored from bot-generated text in Twitter data and included the task of author gender identification. Some of the approaches to this challenge used word embeddings to represent the text [1, 10]; however, the best performing approach used a tf-idf representation with topic modelling in the multi-class task of distinguishing bot-generated, male-authored and female-authored tweets.

The closest work to ours is that of Fatima et al. [15], which concluded that content-based approaches with feature selection can be used for multilingual text. They evaluated a range of classification and feature selection approaches on a single proprietary dataset of Facebook posts and comments. Our focus is on different styles and lengths of English-language content, and we additionally consider gender bias.

3 Approach

We used four different datasets, each representative of a different length of text and writing style (traditional and more modern user-generated content). The characteristics of the datasets used are included in Table 1.

Table 1. Dataset description.

The Twitter dataset is adapted from an original dataset provided by Rangel et al. [33], which was used to differentiate bot-generated tweets from human-authored tweets. We removed the bot-generated tweets and used only those generated by either a male or female human author. The dataset includes 100 tweets for each author and is balanced, with 50% female-authored and 50% male-authored tweets. With the maximum length of a tweet being 140 characters, this dataset is considered short text content.

The Race-gender Blogs dataset was taken from recent work published by Kambhatla et al. [22], where it was used to identify racial stereotypes through identity portrayal. The dataset was compiled from crowd-sourced workers on prolific.com who were asked to provide blogs they had written along with self-identified gender and racial information. The dataset is therefore labelled, as the author gender of each blog text is known.

The Blogger Blogs dataset was adapted from a dataset published by Schler et al. [35], which was scraped from blogs of over 200 words published on blogger.com that included an author-provided indication of gender. We removed blogs that contained words from languages other than English, ending up with 72,789 blogs from 19,230 unique authors, with 57% male-authored and 43% female-authored instances.
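As an illustration only, a minimal sketch of one way such language filtering could be performed is shown below; the paper does not prescribe a specific tool, so the use of the langdetect library and the column names here are assumptions.

```python
# Hypothetical sketch: filter a blog corpus down to English-only posts.
# The choice of langdetect and the DataFrame layout are assumptions,
# not the exact procedure used for the Blogger Blogs dataset.
import pandas as pd
from langdetect import detect

def is_english(text: str) -> bool:
    try:
        return detect(text) == "en"
    except Exception:   # empty or undetectable text is dropped
        return False

blogs = pd.read_csv("blogger_blogs.csv")          # assumed file layout
blogs = blogs[blogs["text"].apply(is_english)]    # keep English posts only
```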

The eBooks dataset is a set of English-language, long-text eBooks freely available in EPUB and Kindle formats from the Gutenberg eBooks project [17]. Since the author gender is not available in the metadata for each eBook, we used the gender.api and genderize APIs to infer the gender of the author from their first name(s). Only books where the gender inferred from both APIs matched were retained. There are significantly more male-authored books in Gutenberg than female-authored books, so we took all female-authored books available to us and randomly selected an equal number of male-authored books for our dataset. The resulting dataset included 18,398 books, equally balanced between male and female authors.
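A minimal sketch of this agreement-based labelling step is given below. The genderize request follows that service's public API; the gender.api lookup is left as a hypothetical stand-in (it requires an API key and is not reproduced here).

```python
# Sketch: query two name-to-gender services and keep a book only when both agree.
import requests

def lookup_genderize(first_name: str):
    """Return 'male', 'female' or None via the public genderize.io API."""
    resp = requests.get("https://api.genderize.io", params={"name": first_name})
    return resp.json().get("gender")            # None when the name is unknown

def lookup_gender_api(first_name: str):
    """Hypothetical stand-in for the gender.api lookup (API key omitted)."""
    return None

def inferred_gender(first_name: str):
    g1, g2 = lookup_genderize(first_name), lookup_gender_api(first_name)
    return g1 if g1 is not None and g1 == g2 else None   # keep agreements only
```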

For our evaluation, each of the four datasets above is split into training and test sets using a 70:30 ratio. Parameter tuning was performed on the training data using cross validation to obtain the optimal set of hyperparameters for the SVM classifier.
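This protocol can be sketched with scikit-learn as follows; the feature matrix X, label vector y and parameter grid shown are illustrative assumptions rather than the exact settings used.

```python
# Sketch of the evaluation protocol: 70:30 train/test split with
# cross-validated hyper-parameter tuning of a linear-kernel SVM.
# X: document feature matrix, y: author-gender labels (assumed inputs).
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import LinearSVC

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

search = GridSearchCV(LinearSVC(), param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)                      # tuning on training data only
print(search.best_params_, search.score(X_test, y_test))
```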

We considered different feature sets to observe the effect that these features have in predicting the gender of the author from text. Our aim was to explore the differences between using the existing closed vocabulary feature sets and more open vocabulary feature sets that are derived from the textual content.

Closed-vocabulary features were derived from the work of Koppel et al. [24] and Cheng et al. [9]. We implemented 66 stylometric character, word and structural features that were commonly identified as significant discriminators of gender in the above works (see Table 2).

Table 2. 66 Stylometric closed-vocabulary features.

In addition, all 373 function word features presented in Cheng et al. [9] were included in our closed-vocabulary features. This gives a closed-vocabulary feature set of 439 features.
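For illustration, the sketch below computes a handful of stylometric measures of the kind listed in Table 2 together with per-document frequencies of a fixed function-word list; the specific measures and the truncated word list are examples, not the exact 439 features used.

```python
# Illustrative closed-vocabulary feature extraction (excerpt only).
import re
import numpy as np

FUNCTION_WORDS = ["the", "a", "an", "and", "but", "of", "in", "she", "he"]  # excerpt

def closed_vocab_features(text: str) -> np.ndarray:
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    stylometric = [
        len(text),                                      # character count
        len(words),                                     # word count
        sum(len(w) for w in words) / n_words,           # average word length
        sum(c.isdigit() for c in text) / n_chars,       # digit proportion
        len(set(words)) / n_words,                      # type-token ratio
    ]
    func = [words.count(w) / n_words for w in FUNCTION_WORDS]  # function-word frequencies
    return np.array(stylometric + func)
```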

Content features are the dynamic, open-vocabulary words obtained directly from the text. We used a tf-idf term weighting representation for our open-vocabulary content features, similar to [10]. This results in a very high dimensional, sparse vector representation for each document. We applied a Chi-squared filter feature selection technique to each dataset and selected the top-ranking 10,000 features as our open-vocabulary representation, which we call the content features. In our evaluation, we explore the impact on performance of different numbers of content features from the open vocabulary set.
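A minimal scikit-learn sketch of this open-vocabulary pipeline is shown below; the variable names for the training texts and labels are assumptions carried over from the earlier sketches.

```python
# Sketch: tf-idf term weighting followed by chi-squared filter feature
# selection keeping the top 10,000 terms as the content features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(train_texts)   # sparse, very high dimensional

selector = SelectKBest(chi2, k=10_000)            # assumes vocabulary size > 10,000
X_content = selector.fit_transform(X_tfidf, y_train)
```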

POS Proportion Features. The function words used in the closed vocabulary approach try to capture differences in gender writing style identified by linguistic and psychological studies [13]. Inspired by this, we used a feature set of 16 features which we call the POS features. They capture the frequency of use of different types of words which are identified by part-of-speech tagging the text content. Table 3 lists these features. While these may appear more like closed vocabulary features, the fact that they focus on different types of speech based on the word’s syntactic function rather than a lexicon of words moves this set towards the open vocabulary approach.

Table 3. POS Features.
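The proportions can be computed from any part-of-speech tagger; the sketch below uses NLTK, and the tag grouping shown is an illustrative excerpt rather than the exact 16 categories of Table 3.

```python
# Sketch of POS proportion features: the share of each coarse POS category.
# Requires the NLTK 'punkt' and 'averaged_perceptron_tagger' resources.
from collections import Counter
import nltk

POS_GROUPS = {                                   # excerpt of possible categories
    "noun": ("NN", "NNS", "NNP", "NNPS"),
    "verb": ("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"),
    "adjective": ("JJ", "JJR", "JJS"),
    "adverb": ("RB", "RBR", "RBS"),
    "pronoun": ("PRP", "PRP$"),
    "preposition": ("IN",),
}

def pos_proportions(text: str):
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text))]
    total = max(len(tags), 1)
    counts = Counter(tags)
    return [sum(counts[t] for t in group) / total for group in POS_GROUPS.values()]
```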

We used an SVM with a linear kernel as the classifier in our experiments. Preliminary results on the performance of a variety of classifiers across both open- and closed-vocabulary features showed that the SVM with a linear kernel performed consistently well. In addition, SVMs are commonly used for text classification tasks [39, 42].

To measure performance on the author gender classification task, we used the average class recall, i.e., accuracy averaged across the male- and female-authored classes. To measure the gender bias of a model that predicts author gender, we used the \(TPR_{gap}\) measure [32], defined in Eq. 1, which measures the difference between the gender-specific true positive rates.

$$\begin{aligned} TPR_{gap} = | TPR_{male} - TPR_{female} | \end{aligned}$$
(1)

This is an equality of opportunity measure, in which predictions should be independent of gender conditional on the ground truth, i.e., the actual outcomes in the data [19]. It differs from a demographic parity measure, which insists on equal predicted outcomes for both genders regardless of prevalence or ground truth.
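Eq. 1 can be computed directly from the per-class recall of the predictions; in the sketch below the label strings "male" and "female" are an assumed encoding of the classes.

```python
# TPR_gap (Eq. 1): absolute difference between the true positive rates
# (per-class recall) for male- and female-authored texts.
from sklearn.metrics import recall_score

def tpr_gap(y_true, y_pred) -> float:
    tpr_male = recall_score(y_true, y_pred, pos_label="male")
    tpr_female = recall_score(y_true, y_pred, pos_label="female")
    return abs(tpr_male - tpr_female)
```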

4 Evaluation

Figure 1a shows the average class accuracy on different feature sets across all the datasets.

The content feature set, which is the open vocabulary approach, significantly outperforms the closed vocabulary features across all datasets. The newly proposed 16 POS features perform better than the closed vocabulary features on the more structured, long-text eBooks dataset but do not work as well as the closed-vocabulary features on the user-generated content in the Twitter and blogs datasets. This may be due to the nature of user-generated digital content such as tweets and blogs, which can have irregular and incomplete sentences and rely more on slang, acronyms and emoticons. As the POS feature set uses parts of speech based on a word's syntactic function, it requires the text to have a certain level of structure. However, with only 16 features, the POS feature set performs very well compared with the significantly larger numbers of features required by the other two feature sets.

Fig. 1. Classification performance on different feature sets across all four datasets.

Figure 1b shows the performance of the classifier when the POS features are combined with the open-vocabulary content features. Adding the 16 POS proportions to the content features increased performance across all four datasets.
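A minimal sketch of this combination, reusing the assumed names from the earlier sketches, is to append the 16 dense POS proportions to the sparse tf-idf content features before training the SVM.

```python
# Sketch: concatenate content features and POS proportions column-wise.
import numpy as np
from scipy.sparse import hstack, csr_matrix

X_pos = np.array([pos_proportions(t) for t in train_texts])   # dense 16-dim block
X_combined = hstack([X_content, csr_matrix(X_pos)])           # content + POS features
```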

Fig. 2. Gender bias for all feature sets across all four datasets.

We also evaluated the feature sets for bias using the \(TPR_{gap}\) gender bias measure shown in Eq. 1. Figure 2 shows the gender bias of the classifier for each of the feature sets; the higher the value, the more gender bias displayed. Bias on the right side of the figure indicates that more male-authored documents are classified correctly than female-authored documents, meaning more female-authored documents are predicted as male than vice versa. We consider this male gender bias. Bias on the left side of the figure indicates female gender bias.

Overall, the content features from the open vocabulary approach display less gender bias than the closed vocabulary approach. Both approaches display mostly male gender bias across all four datasets, with the gender bias of the closed vocabulary features on the eBooks dataset exceedingly high at 66%.

The POS features display significantly less gender bias across all datasets except the blogs from the Blogger dataset. Interestingly, the POS feature set also shifts the bias towards female rather than male bias, particularly for the user-generated content. Though the addition of the POS features to the content features increased the prediction performance for all datasets, it only showed a positive influence in reducing the gender bias for the more traditional eBooks dataset, with the bias for the user-generated content datasets remaining more or less the same.

Given the good performance of the content features, we explored the impact of the number of content features used.
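This experiment can be sketched by varying k in the chi-squared selection step; the loop below reuses the objects assumed in the earlier sketches and assumes the vocabulary of each dataset exceeds the largest k.

```python
# Sketch of the feature-count experiment: vary k and record test accuracy.
for k in [1_000, 5_000, 10_000, 30_000, 100_000]:
    selector = SelectKBest(chi2, k=k)
    X_tr = selector.fit_transform(X_tfidf, y_train)
    X_te = selector.transform(vectorizer.transform(test_texts))
    model = LinearSVC().fit(X_tr, y_train)
    print(k, model.score(X_te, y_test))
```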

Fig. 3. Performance as the number of content features increases.

Figure 3 shows the average class accuracy as the number of features used increases for the eBooks, Blogger Blogs and Twitter datasets.

The graph shows that the performance for the Blogger Blogs and eBooks datasets levels off at around 10,000 features, but the performance steadily increases for the Twitter dataset. In fact, it continues to increase steadily even beyond 30,000 features, reaching a classification performance of 0.8 at 100,000 features. This is not surprising, as the Twitter dataset is considered short text and the lack of text content results in a very sparse representation, reducing the signal in the text.

5 Conclusion

This research presents the impact of closed-vocabulary and open-vocabulary features on author gender identification in terms of accuracy and gender bias. We observed that open vocabulary features perform better than closed-vocabulary features in accurately identifying an author's gender from text. In addition, we propose a much smaller set of 16 POS features that reflect the frequency of usage of different parts of speech in the content, and we suggest that these follow a more open-vocabulary approach. Though these POS features do not outperform the content features, they show much less gender bias as well as an interesting shift to female bias for the user-generated content. The addition of POS features to content features increased the prediction performance across all datasets while not significantly impacting the gender bias of the models.

As shown in Fig. 2, though the POS features display generally lower gender bias than the content features, the addition of POS features to content features does not necessarily reduce the gender bias on user-generated content. Hence, further experimentation is required to explain this behaviour for the user-generated content.

By identifying the features that are highly predictive of an author's gender, we hope to explore methods that effectively recommend linguistic modifications and provide positive reinforcement to authors about their language use, prompting a more gender-neutral writing style.