Abstract
Big data analytics is widely applied in many areas because it can uncover hidden patterns, correlations, and insights by integrating multiple data sources. However, the interdependence and heterogeneity of these data sources make them hard to manage in support of “last mile” analytics for decision making and value co-creation, which usually involve multiple perspectives and multiple granularities. In this paper, we propose a unified knowledge representation framework, Cyber-Entity (Cyber-E) modeling, to capture and formalize selected behaviors of real entities in both the social and physical worlds in the cyber analytic space. Its special features include not only the stateful intra-properties of a Cyber-E, but also the inter-relationships and dependencies among Cyber-Es. A grouping mechanism, called Cyber-G, is also introduced to support flexible granularity adjustment in knowledge management. It supports rapid, on-demand, self-service analytics. An illustrative example of applying this approach to the academic research community is given, followed by a case study of two top conferences in the service computing area, ICSOC and ICWS, to illustrate the effectiveness and potential of our approach.
Notes
- 1.
A CAF can take multiple inputs but gives exactly one output. More specifically, (a) the input of a CAF can be raw data or a property output of the same or another CAF, (b) different CAFs can share the same input, and (c) an algorithm with multiple outputs can be decomposed into multiple single-output algorithms. Correspondingly, a CAF has one or more input arrows but exactly one output arrow.
- 2.
There are two situations for the output of a potential inter-group CAF: (i) a property of a Cyber-G, or (ii) a property of a Cyber-E that belongs to a certain Cyber-G. Suppose \(GS\ne \emptyset \); the definition for each situation is given in Definition 10.
- 6.
Due to data limitations, the propagation through the relational properties (i.e., “Published In Venue”, “Cited By Author”, “Cited By Paper”) is broken, as illustrated by lines \(l_1\) and \(l_2\) in Fig. 2.
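The input and output rules that Note 1 states for a CAF can be sketched in Python. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: the `CAF` class, the `decompose` and `evaluate` helpers, and the property names (`total_citations`, `paper_count`, and so on) are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class CAF:
    """Hypothetical single-output CAF: one or more named inputs, exactly one output."""
    output: str              # name of the single property this CAF produces
    inputs: Tuple[str, ...]  # names of raw data items or of other CAFs' outputs
    fn: Callable[..., Any]   # computes the output value from the input values

    def run(self, values: Dict[str, Any]) -> Any:
        return self.fn(*(values[name] for name in self.inputs))


def decompose(inputs: Sequence[str], outputs: Sequence[str],
              fn: Callable[..., tuple]) -> List[CAF]:
    """Rule (c): split a tuple-returning algorithm into one single-output CAF per element."""
    def pick(i: int) -> Callable[..., Any]:
        return lambda *args: fn(*args)[i]
    return [CAF(out, tuple(inputs), pick(i)) for i, out in enumerate(outputs)]


def evaluate(cafs: Sequence[CAF], raw: Dict[str, Any]) -> Dict[str, Any]:
    """Run CAFs in the given order; each output becomes available to later CAFs."""
    values = dict(raw)
    for caf in cafs:
        values[caf.output] = caf.run(values)
    return values


# Rule (b): two CAFs share the same raw input "citations".
total = CAF("total_citations", ("citations",), sum)
count = CAF("paper_count", ("citations",), len)
# Rule (a): a CAF whose inputs are property outputs of other CAFs.
mean = CAF("mean_citations", ("total_citations", "paper_count"), lambda t, n: t / n)
# Rule (c): a two-output algorithm decomposed into two single-output CAFs.
low_high = decompose(["citations"], ["min_citations", "max_citations"],
                     lambda c: (min(c), max(c)))

result = evaluate([total, count, mean, *low_high], {"citations": [3, 10, 5]})
```

In this sketch the caller is assumed to list the CAFs in dependency order; a fuller version would derive that order from the input and output names.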
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Han, H. et al. (2019). A Methodology for Resolving Heterogeneity and Interdependence in Data Analytics. In: Li, J., Wang, S., Qin, S., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2019. Lecture Notes in Computer Science, vol 11888. Springer, Cham. https://doi.org/10.1007/978-3-030-35231-8_2
DOI: https://doi.org/10.1007/978-3-030-35231-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35230-1
Online ISBN: 978-3-030-35231-8
eBook Packages: Computer Science, Computer Science (R0)