Journal of Intelligent Information Systems, Volume 51, Issue 1, pp 71–96

Segregation discovery in a social network of companies

  • Alessandro Baroni
  • Salvatore Ruggieri


We introduce a framework for the data-driven analysis of social segregation of minority groups, and challenge it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. We introduce the segregation discovery problem, which consists of searching for sub-groups of the population and minorities for which a segregation index is above a minimum threshold. We devise a search algorithm that solves the segregation discovery problem by computing a multi-dimensional data cube that the analyst can explore. The machinery underlying the search algorithm relies on frequent itemset mining concepts and tools. The framework is challenged on a case study in the context of company networks. We analyse segregation on the grounds of sex and age for directors on the boards of Italian companies. The network includes 2.15M companies and 3.63M directors.
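The problem statement above can be illustrated with a minimal sketch. It assumes the dissimilarity index, a classic segregation index from the social science literature, as the measure; the abstract does not say which indexes are used, so that choice is an assumption. It also replaces the data cube and frequent itemset mining machinery with plain brute-force enumeration, which is only practical on toy data. The function names, record layout, and attribute names (`sex`, `sector`, `company`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dissimilarity(units):
    """Dissimilarity index over units given as (minority_count, total_count).

    Returns a value in [0, 1], or None when the minority or the
    majority is empty and the index is undefined.
    """
    minority = sum(m for m, _ in units)
    majority = sum(t for _, t in units) - minority
    if minority == 0 or majority == 0:
        return None
    return 0.5 * sum(abs(m / minority - (t - m) / majority) for m, t in units)


def discover(records, minority_attr, context_attrs, unit_attr, min_index):
    """Brute-force stand-in for segregation discovery (illustrative only).

    Enumerates every sub-group of the population (a combination of
    context attribute values) and every candidate minority (a value of
    minority_attr), and keeps those whose index is at or above min_index.
    """
    results = []
    minority_values = sorted({r[minority_attr] for r in records})
    for k in range(len(context_attrs) + 1):
        for attrs in combinations(context_attrs, k):
            contexts = sorted({tuple(r[a] for a in attrs) for r in records})
            for ctx in contexts:
                subgroup = [r for r in records
                            if tuple(r[a] for a in attrs) == ctx]
                for value in minority_values:
                    # Per-unit [minority_count, total_count] in this sub-group.
                    counts = defaultdict(lambda: [0, 0])
                    for r in subgroup:
                        counts[r[unit_attr]][1] += 1
                        if r[minority_attr] == value:
                            counts[r[unit_attr]][0] += 1
                    index = dissimilarity([tuple(c) for c in counts.values()])
                    if index is not None and index >= min_index:
                        results.append((dict(zip(attrs, ctx)), value, index))
    return results


# Toy data, invented for illustration: one record per board seat.
records = [
    {"sex": "F", "sector": "edu", "company": "A"},
    {"sex": "F", "sector": "edu", "company": "A"},
    {"sex": "M", "sector": "edu", "company": "B"},
    {"sex": "M", "sector": "edu", "company": "B"},
    {"sex": "F", "sector": "it", "company": "C"},
    {"sex": "M", "sector": "it", "company": "C"},
    {"sex": "F", "sector": "it", "company": "D"},
    {"sex": "M", "sector": "it", "company": "D"},
]

found = discover(records, "sex", ["sector"], "company", min_index=0.9)
```

On this toy data only the `edu` sub-group passes the threshold, because its two boards are each made up of a single sex. The brute-force loop grows exponentially with the number of context attributes, which is the cost a frequent itemset mining approach is meant to contain by pruning small sub-groups.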


Segregation discovery; Segregation indexes; Frequent itemset mining; Network of company board directors



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
