Abstract
Entity-based searching has been introduced as a way of allowing users and applications to retrieve information about a specific real-world object, such as a person, an event, or a location. Recent advances in crawling, information extraction, and data exchange technologies have brought a new era in data management, typically referred to as Web 2.0. Entity searching over Web 2.0 data facilitates the retrieval of relevant information from the plethora of data available in semantic and social web applications.
Effective entity searching over a variety of sources requires integrating the different pieces of information that refer to the same real-world entity. Entity-based aggregation of Web 2.0 data is an effective mechanism for this purpose. By adopting the suggestions of the Linked Data movement, aggregators can efficiently match and merge data that refer to the same real-world object.
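To make the match-and-merge idea concrete, the following is a minimal sketch and not the chapter's own algorithm. It assumes each source record is a flat dictionary with a `name` attribute, matches records by Jaccard similarity over name tokens, and merges matched records by collecting every attribute value into one entity. The function names (`name_tokens`, `jaccard`, `aggregate`), the `threshold` parameter, and the sample records are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def name_tokens(record):
    """Lower-cased word tokens of the (assumed) 'name' attribute."""
    return set(str(record.get("name", "")).lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def aggregate(records, threshold=0.5):
    """Match records with similar names, then merge each matched group."""
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Matching step: link every pair whose name similarity reaches the threshold.
    token_sets = [name_tokens(r) for r in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(j)] = find(i)

    # Merging step: collect all attribute values of a group into one entity.
    merged = {}
    for index, record in enumerate(records):
        entity = merged.setdefault(find(index), {})
        for attribute, value in record.items():
            entity.setdefault(attribute, set()).add(value)
    return list(merged.values())


# Hypothetical records, as if collected from three different sources.
records = [
    {"name": "Barack Obama", "source": "blog"},
    {"name": "Obama Barack", "city": "Washington"},
    {"name": "Angela Merkel", "source": "wiki"},
]
entities = aggregate(records)
```

With these sample records, the first two are linked because their name tokens coincide, so `entities` holds two merged entities. Comparing every pair of records is quadratic in the number of records; aggregators working at Web scale would need a cheaper candidate-selection step, which this sketch deliberately leaves out.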
Notes
The collections can be found at http://www.softnet.tuc.gr/~ioannou/entityrequests.html.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ioannou, E., Velegrakis, Y. (2016). Searching Web 2.0 Data Through Entity-Based Aggregation. In: Nguyen, N.T., Kowalczyk, R., Rupino da Cunha, P. (eds) Transactions on Computational Collective Intelligence XXI. Lecture Notes in Computer Science, vol 9630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49521-6_7
DOI: https://doi.org/10.1007/978-3-662-49521-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49520-9
Online ISBN: 978-3-662-49521-6
eBook Packages: Computer Science, Computer Science (R0)