How to Understand Connections Based on Big Data: From Cliques to Flexible Granules

  • Ali Jalal-Kamali
  • M. Shahriar Hossain
  • Vladik Kreinovich
Part of the Studies in Big Data book series (SBD, volume 8)


One of the main objectives of science and engineering is to predict the future state of the world—and to come up with actions which will lead to the most favorable outcome. To be able to do that, we need to have a quantitative model describing how the values of the desired quantities change—and for that, we need to know which factors influence this change. Usually, these factors are selected by using traditional statistical techniques, but with the current drastic increase in the amount of available data—known as the advent of big data—the traditional techniques are no longer feasible. A successful semi-heuristic method has been proposed to detect true connections in the presence of big data. However, this method has its limitations. The first limitation is that this method is heuristic—its main justifications are common sense and the fact that in several practical problems, this method was reasonably successful. The second limitation is that this heuristic method is based on using “crisp” granules (clusters), while in reality, the corresponding granules are flexible (“fuzzy”). In this chapter, we explain how the known semi-heuristic method can be justified in statistical terms, and we also show how the ideas behind this justification enable us to improve the known method by taking granule flexibility into account.
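The abstract contrasts "crisp" granules, where an item either belongs to a cluster or does not, with flexible ("fuzzy") granules, where membership is a matter of degree. The Python sketch below illustrates only that general distinction. It uses two standard fuzzy-set notions: a linear membership ramp, and the sigma-count (the sum of membership degrees) as the size of a fuzzy set. It is not the chapter's method. The function names, the similarity scores, and the parameters `threshold`, `low`, and `high` are hypothetical choices made for illustration.

```python
from typing import Iterable


def crisp_membership(similarity: float, threshold: float) -> float:
    """Crisp granule (hypothetical rule): an item fully belongs (1.0)
    if its similarity reaches the threshold, otherwise it does not (0.0)."""
    return 1.0 if similarity >= threshold else 0.0


def fuzzy_membership(similarity: float, low: float, high: float) -> float:
    """Flexible granule (hypothetical rule): membership rises linearly
    from 0 at `low` to 1 at `high`, so borderline items get partial degrees."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if similarity <= low:
        return 0.0
    if similarity >= high:
        return 1.0
    return (similarity - low) / (high - low)


def granule_size(memberships: Iterable[float]) -> float:
    """Sigma-count: the size of a (fuzzy) granule is the sum of the
    membership degrees; for a crisp granule this is the ordinary count."""
    return sum(memberships)


if __name__ == "__main__":
    # Made-up similarity scores of four items to a granule's "core".
    sims = [0.95, 0.72, 0.68, 0.40]
    crisp = [crisp_membership(s, threshold=0.7) for s in sims]
    fuzzy = [fuzzy_membership(s, low=0.5, high=0.9) for s in sims]
    print("crisp memberships:", crisp, "size:", granule_size(crisp))
    print("fuzzy memberships:", fuzzy, "size:", granule_size(fuzzy))
```

With these made-up numbers, the crisp granule keeps the item at 0.72 and drops the item at 0.68 entirely, although the two are nearly equally similar. The flexible granule gives them comparable partial memberships (about 0.55 and 0.45).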


Keywords: Connections, Big data, Flexible granules, Intelligence analysis, Biomedical publications



This work was supported in part by the National Science Foundation (NSF) grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence), by NSF grant DUE-0926721, and by M. S. Hossain’s startup grant at UTEP. The authors are very grateful to the anonymous referees for their valuable suggestions, and to the editors of this volume, Shyi-Ming Chen and Witold Pedrycz, for their support and encouragement.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ali Jalal-Kamali (1)
  • M. Shahriar Hossain (1)
  • Vladik Kreinovich (1), corresponding author
  1. Department of Computer Science, University of Texas at El Paso, El Paso, USA
