Abstract
Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each source provides a set of values, and different sources often provide conflicting values. To present quality data to users, it is critical to resolve these conflicts and discover the values that reflect the real world; this task is called data fusion. This paper describes a novel approach that finds true values from conflicting information when there are a large number of sources, some of which may copy from others. We present a case study on real-world data showing that the described algorithm can significantly improve the accuracy of truth discovery and scales to a large number of data sources.
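To make the problem concrete, the following is a minimal sketch of a naive majority-vote baseline, not the approach described in the paper. It illustrates why copying between sources matters: when several sources repeat one source's wrong value, a simple vote picks that value. The function name `majority_vote`, the source labels, and the example data are all hypothetical.

```python
from collections import Counter

def majority_vote(claims):
    """Naive fusion baseline: for each item, pick the value claimed by most sources.

    claims maps a source name to a dict of {item: value}.
    This ignores source accuracy and copying; it is an illustration only.
    """
    votes = {}
    for values in claims.values():
        for item, value in values.items():
            votes.setdefault(item, Counter())[value] += 1
    return {item: counter.most_common(1)[0][0] for item, counter in votes.items()}

# Hypothetical claims: S1 and S2 are independent; S4 and S5 copy from S3.
independent = {
    "S1": {"capital_of_australia": "Canberra"},
    "S2": {"capital_of_australia": "Canberra"},
    "S3": {"capital_of_australia": "Sydney"},
}
with_copiers = dict(independent)
with_copiers["S4"] = {"capital_of_australia": "Sydney"}  # copied from S3
with_copiers["S5"] = {"capital_of_australia": "Sydney"}  # copied from S3

fused_independent = majority_vote(independent)
fused_with_copiers = majority_vote(with_copiers)
```

With only the independent sources, the vote returns the correct value; once the two copiers are added, the copied wrong value wins. A copy-aware method would need to discount votes that are not independent, which this baseline does not attempt.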
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dong, X.L., Berti-Equille, L., Srivastava, D. (2013). Data Fusion: Resolving Conflicts from Multiple Sources. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_7
DOI: https://doi.org/10.1007/978-3-642-38562-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)