A Methodology for Resolving Heterogeneity and Interdependence in Data Analytics

Han, Han; Zhao, Yunwei; Wang, Can; Shu, Min; Peng, Tao; Chi, Chi-Hung; Yu, Yonghong

doi:10.1007/978-3-030-35231-8_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11888)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Big data analytics is widely applied across many areas because it can uncover hidden patterns, correlations, and insights by integrating multiple data sources. However, the interdependence and heterogeneity of these data sources make them difficult to manage in support of “last mile” analytics for decision making and value co-creation, which usually involve multiple perspectives and multiple granularities. In this paper, we propose a unified knowledge representation framework, Cyber-Entity (Cyber-E) modeling, to capture and formalize selected behaviors of real entities in both the social and physical worlds in the cyber analytic space. Its special features include not only the stateful intra-properties of a Cyber-E, but also the inter-relationships and dependences among Cyber-Es. A grouping mechanism, called Cyber-G, is also introduced to support flexible granularity adjustment in knowledge management. It supports rapid, on-demand self-service analytics. An illustrative example of applying this approach to the academic research community is given, followed by a case study of two top conferences in the service computing area, ICSOC and ICWS, to illustrate the effectiveness and potential of our approach.

Notes

1. A CAF can take multiple inputs but gives a single output. More specifically, (a) the input of a CAF can be raw data or a property output of the same or another CAF, (b) different CAFs can share the same input, and (c) an algorithm with multiple outputs can be decomposed into multiple single-output algorithms. Correspondingly, the number of input arrows can be one or many, while the number of output arrows can only be one.
2. There are two situations for the output of a potential inter-group CAF: (i) a property of a Cyber-G, or (ii) a property of a Cyber-E that belongs to a certain Cyber-G. Suppose \(GS\ne \emptyset\); the definition for each situation is given in Definition 10.
3. https://www.microsoft.com/en-us/research/project/academic/
4. https://dl.acm.org/
5. https://ieeexplore.ieee.org/Xplore/home.jsp
6. Due to data limitations, the propagation through the relational properties (i.e., “Published In Venue”, “Cited By Author”, “Cited By Paper”) is broken, as illustrated by lines \(l_1\) and \(l_2\) in Fig. 2.
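
The constraints in Note 1 (many inputs, exactly one output, inputs drawn from raw data or from other CAF outputs, shared inputs, and decomposition of multi-output algorithms) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CAF`, the helpers `evaluate` and `decompose`, the property names, and the citation counts are all hypothetical and chosen only for illustration, and the sketch assumes the dependencies between CAFs are acyclic.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class CAF:
    """Hypothetical CAF node: one or many named inputs, exactly one output."""
    output: str              # name of the single property this CAF produces
    inputs: Tuple[str, ...]  # raw-data keys or output names of other CAFs
    fn: Callable[..., Any]   # computes the output value from the input values


def evaluate(cafs: List[CAF], raw: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate CAFs in dependency order (assumes no cyclic dependencies)."""
    values = dict(raw)
    pending = list(cafs)
    while pending:
        # A CAF is ready once every one of its inputs has a value.
        ready = [c for c in pending if all(i in values for i in c.inputs)]
        if not ready:
            raise ValueError("unresolvable or cyclic CAF inputs")
        for caf in ready:
            values[caf.output] = caf.fn(*(values[i] for i in caf.inputs))
        pending = [c for c in pending if c not in ready]
    return values


def decompose(inputs: Sequence[str], outputs: Sequence[str],
              multi_fn: Callable[..., tuple]) -> List[CAF]:
    """Case (c): split a tuple-returning algorithm into single-output CAFs."""
    return [
        CAF(name, tuple(inputs), lambda *args, k=k: multi_fn(*args)[k])
        for k, name in enumerate(outputs)
    ]


# Made-up raw data: citation counts of four papers.
raw = {"citations": [10, 4, 3, 1]}

cafs = [
    # Cases (a) and (b): both CAFs read the same raw input.
    CAF("paper_count", ("citations",), len),
    CAF("total_citations", ("citations",), sum),
    # Case (a): inputs are property outputs of other CAFs.
    CAF("mean_citations", ("total_citations", "paper_count"),
        lambda total, count: total / count),
]
# Case (c): a two-output algorithm becomes two single-output CAFs.
cafs += decompose(["citations"], ["min_citations", "max_citations"],
                  lambda c: (min(c), max(c)))

result = evaluate(cafs, raw)
```

Keeping each node single-output means every property name has exactly one producer, which is what allows the simple "ready when all inputs are known" evaluation loop above.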

Author information

Authors and Affiliations

CNCERT/CC, Beijing, China
Han Han, Yunwei Zhao & Min Shu
School of ICT, Griffith University, Gold Coast, Australia
Can Wang
Dongguan University of Technology, Dongguan, China
Tao Peng
CSIRO, Hobart, Australia
Chi-Hung Chi
Nanjing University of Posts and Telecommunications, Nanjing, China
Yonghong Yu

Corresponding author

Correspondence to Can Wang.

Editor information

Editors and Affiliations

Deakin University, Burwood, VIC, Australia
Jianxin Li
The University of Queensland, St. Lucia, QLD, Australia
Sen Wang
Flinders University, Bedford Park, SA, Australia
Shaowen Qin
Dalian Neusoft University of Information, Dalian, China
Xue Li
Beijing Institute of Technology, Beijing, China
Shuliang Wang

About this paper

Cite this paper

Han, H. et al. (2019). A Methodology for Resolving Heterogeneity and Interdependence in Data Analytics. In: Li, J., Wang, S., Qin, S., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2019. Lecture Notes in Computer Science, vol 11888. Springer, Cham. https://doi.org/10.1007/978-3-030-35231-8_2

DOI: https://doi.org/10.1007/978-3-030-35231-8_2
Published: 15 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35230-1
Online ISBN: 978-3-030-35231-8
eBook Packages: Computer Science, Computer Science (R0)
