Electronic Markets, Volume 29, Issue 1, pp 93–106

Cognitive computing for customer profiling: meta classification for gender prediction

  • Robin Hirt
  • Niklas Kühl
  • Gerhard Satzger
Research Paper
Part of the following topical collections:
  1. Special Issue on "Smart Services: The move to customer-orientation"


Analyzing data from micro blogs is an increasingly attractive way for enterprises to learn about customer sentiments, public opinion, or unsatisfied needs. A better understanding of the underlying customer profiles (e.g., gender or age) can substantially enhance the economic value of the customer intimacy provided by this type of analytics. Following a design science approach, we draw on information processing theory and meta machine learning to propose an extendable, cognitive classifier that integrates and combines various isolated base classifiers for profiling purposes. We evaluate its feasibility and performance via a technical experiment, its suitability in a real use case, and its utility via an expert workshop. We thus augment the body of knowledge with a cognitive method that enables the integration of existing as well as emerging customer profiling classifiers for improved overall prediction performance. Specifically, we contribute a concrete classifier to predict the gender of German-speaking Twitter users. This enables enterprises to extract information from micro blog data to develop customer intimacy and to tailor individual offerings for smarter services.
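As a rough illustration of the idea of combining isolated base classifiers through a meta classifier, the following minimal sketch stacks the outputs of several base classifiers and trains a small logistic regression on top of them. The abstract does not specify the meta-learning algorithm, so logistic-regression stacking is an assumption made here for illustration only; the three base classifiers implied by the score columns (name, text, image) and all scores and labels are hypothetical toy values, not data or results from the paper.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class MetaClassifier:
    """Toy stacking layer: logistic regression over base-classifier scores.

    Assumption: each base classifier emits a score in [0, 1] for the
    positive class. Adding a new base classifier only adds one input column.
    """

    def __init__(self, n_base: int, lr: float = 0.5, epochs: int = 2000) -> None:
        self.w: List[float] = [0.0] * n_base
        self.b: float = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, scores: Sequence[float]) -> float:
        z = sum(w * s for w, s in zip(self.w, scores)) + self.b
        return sigmoid(z)

    def predict(self, scores: Sequence[float]) -> int:
        return int(self.predict_proba(scores) >= 0.5)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> None:
        # Plain stochastic gradient descent on the logistic loss.
        for _ in range(self.epochs):
            for scores, target in zip(X, y):
                err = self.predict_proba(scores) - target
                self.w = [w - self.lr * err * s for w, s in zip(self.w, scores)]
                self.b -= self.lr * err


# Hypothetical base-classifier outputs per user: [name, text, image] scores.
X_train = [
    [0.9, 0.7, 0.8],
    [0.8, 0.4, 0.9],
    [0.7, 0.6, 0.6],
    [0.2, 0.3, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.2, 0.4],
]
y_train = [1, 1, 1, 0, 0, 0]  # hypothetical class labels

meta = MetaClassifier(n_base=3)
meta.fit(X_train, y_train)
train_predictions = [meta.predict(x) for x in X_train]
```

In this sketch the meta level only sees base-classifier scores, so an emerging base classifier could be integrated by appending a further score column and refitting, which mirrors the extendability described above without claiming to reproduce the authors' implementation.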


Keywords: Cognitive computing; Micro blog data; Gender detection; Meta machine learning; Meta classifier





Copyright information

© Institute of Applied Informatics at University of Leipzig 2019

Authors and Affiliations

  1. Karlsruhe Institute of Technology, Karlsruhe Service Research Institute, Karlsruhe, Germany
