A Methodology for Resolving Heterogeneity and Interdependence in Data Analytics

  • Conference paper
Advanced Data Mining and Applications (ADMA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11888)

Abstract

Big data analytics has been widely applied in many areas because it can uncover hidden patterns, correlations and insights by integrating multiple data sources. However, the interdependence and heterogeneity of these data sources make them difficult to manage in support of “last mile” analytics for decision making and value co-creation, which usually involve multiple perspectives and multiple granularities. In this paper, we propose a unified knowledge representation framework, namely Cyber-Entity (Cyber-E) modeling, to capture and formalize selected behaviors of real entities in both the social and physical worlds in the cyber analytic space. Its special features include not only the stateful intra-properties of a Cyber-E, but also the inter-relationships and dependences among Cyber-Es. A grouping mechanism, called Cyber-G, is also introduced to support flexible granularity adjustment in knowledge management. It supports rapid, on-demand self-service analytics. An illustrative example of applying this approach to the academic research community is given, followed by a case study of two top conferences in the service computing area, ICSOC and ICWS, to illustrate the effectiveness and potential of our approach.

Notes

  1. A CAF can take multiple inputs but gives a single output. More specifically: (a) the input of a CAF can be raw data or the property output of the same or another CAF; (b) different CAFs can share the same input; (c) an algorithm with multiple outputs can be decomposed into multiple single-output algorithms. Correspondingly, a CAF may have one or many input arrows, but only one output arrow.

  2. There are two situations for the output of a potential inter-group CAF: (i) a property of a Cyber-G, or (ii) a property of a Cyber-E that belongs to a certain Cyber-G. Suppose \(GS\ne \emptyset \); the definition for each situation is given in Definition 10.

  3. https://www.microsoft.com/en-us/research/project/academic/

  4. https://dl.acm.org/

  5. https://ieeexplore.ieee.org/Xplore/home.jsp

  6. Due to data limitations, the propagation through the relational properties (i.e., “Published In Venue”, “Cited By Author”, “Cited By Paper”) is broken, as illustrated by lines \(l_1\) and \(l_2\) in Fig. 2.
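
The input/output constraints on a CAF stated in Note 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CAF` class, its fields, the `evaluate` method and the citation counts are all hypothetical names and values chosen for illustration. The sketch shows raw data and CAF outputs used as inputs, two CAFs sharing one input, and a two-output algorithm split into two single-output CAFs. The case of a CAF consuming its own earlier output is not covered.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class CAF:
    """Hypothetical node: one or many inputs, exactly one output."""
    name: str
    func: Callable[..., Any]
    inputs: List[Any]  # each item is raw data or another CAF

    def evaluate(self) -> Any:
        # Resolve every input: evaluate upstream CAFs, pass raw data through.
        args = [src.evaluate() if isinstance(src, CAF) else src
                for src in self.inputs]
        return self.func(*args)


# Made-up raw data: citation counts of four papers.
citations = [10, 4, 0, 7]

# (b) Two different CAFs share the same raw input.
total_citations = CAF("total_citations", sum, [citations])
paper_count = CAF("paper_count", len, [citations])

# (a) A CAF whose inputs are the outputs of other CAFs.
mean_citations = CAF("mean_citations", lambda t, n: t / n,
                     [total_citations, paper_count])


def min_max(values):
    """A two-output algorithm, used only to illustrate decomposition."""
    return min(values), max(values)


# (c) The two-output algorithm is decomposed into two single-output CAFs.
min_citations = CAF("min_citations", lambda v: min_max(v)[0], [citations])
max_citations = CAF("max_citations", lambda v: min_max(v)[1], [citations])

print(mean_citations.evaluate())  # 5.25
print(min_citations.evaluate(), max_citations.evaluate())  # 0 10
```

In this sketch each node holds a list of inputs and returns one value from `evaluate`, which mirrors the "one or many input arrows, one output arrow" constraint.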

Author information

Corresponding author

Correspondence to Can Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Han, H. et al. (2019). A Methodology for Resolving Heterogeneity and Interdependence in Data Analytics. In: Li, J., Wang, S., Qin, S., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2019. Lecture Notes in Computer Science, vol 11888. Springer, Cham. https://doi.org/10.1007/978-3-030-35231-8_2

  • DOI: https://doi.org/10.1007/978-3-030-35231-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35230-1

  • Online ISBN: 978-3-030-35231-8

  • eBook Packages: Computer Science, Computer Science (R0)
