Exploratory Analysis of Big Social Data Using MIC/MINE Statistics

  • Piyawat Lertvittayakumjorn
  • Chao Wu
  • Yue Liu
  • Hong Mi
  • Yike Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10540)

Abstract

A major goal of Exploratory Data Analysis (EDA) is to understand the main characteristics of a dataset, especially the relationships between variables, which help in building predictive models and analysing causality in social science research. This paper introduces the Maximal Information Coefficient (MIC) and its by-product statistics to social science researchers as effective EDA tools for big social data. A case study was conducted on a historical dataset of more than 3,000 country-level indicators. MIC and some of its by-product statistics provided useful information for EDA that complements the traditional Pearson’s correlation. Moreover, they revealed several significant relationships between variables, including nonlinear ones, which are intriguing and suggest directions for further research in the social sciences.
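
To illustrate why an information-based statistic can complement Pearson’s correlation, the sketch below compares the two on a noiseless parabola. It is not the authors' code and not the actual MIC algorithm: the function `approx_mic` is a hypothetical, simplified stand-in that only tries equal-frequency grids, whereas MIC as defined by Reshef et al. also optimises where the grid lines are placed. The grid-size budget of n^0.6 cells follows the default suggested in that paper; all function names and the synthetic data are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def equal_frequency_bins(values, k):
    """Label each value with a bin index 0..k-1 so bins hold near-equal counts.

    Tied values are split by rank, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // n
    return labels


def mutual_information(bx, by):
    """Mutual information, in bits, between two sequences of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def approx_mic(xs, ys, alpha=0.6):
    """Simplified MIC-style score (illustrative assumption, not real MIC).

    Takes the largest normalised mutual information over equal-frequency
    kx-by-ky grids whose cell count kx * ky does not exceed n ** alpha.
    """
    n = len(xs)
    max_cells = int(n ** alpha)
    best = 0.0
    for kx in range(2, max_cells // 2 + 1):
        bx = equal_frequency_bins(xs, kx)
        for ky in range(2, max_cells // kx + 1):
            by = equal_frequency_bins(ys, ky)
            score = mutual_information(bx, by) / math.log2(min(kx, ky))
            best = max(best, score)
    return best


# Synthetic example: y depends on x exactly, but not linearly.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(500)]
ys = [x * x for x in xs]
print("Pearson:", round(pearson(xs, ys), 3))
print("approx MIC:", round(approx_mic(xs, ys), 3))
```

On data like this, Pearson’s correlation is expected to be close to zero because the parabola is symmetric about x = 0, while the grid-based score is expected to be high because the grid cells capture the dependence. For real analyses, a tested MIC implementation should be preferred over this sketch.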

Keywords

  • Exploratory data analysis
  • Big social data
  • Maximal information coefficient
  • Correlation

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Piyawat Lertvittayakumjorn (1)
  • Chao Wu (1)
  • Yue Liu (1, 2)
  • Hong Mi (2)
  • Yike Guo (1)

  1. Data Science Institute, Imperial College London, London, UK
  2. School of Public Affairs, Zhejiang University, Zhejiang, China
