Similarity Function Recommender Service Using Incremental User Knowledge Acquisition

Ryu, Seung Hwan; Benatallah, Boualem; Paik, Hye-Young; Kim, Yang Sok; Compton, Paul

doi:10.1007/978-3-642-25535-9_15

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7084)

Included in the following conference series:

International Conference on Service-Oriented Computing

Abstract

Similar entity search is the task of identifying the entities that most closely resemble a given entity (e.g., a person, a document, or an image). Although many techniques for estimating similarity have been proposed, little work has addressed which of these techniques is most suitable for a given similarity analysis task. Choosing the right similarity function matters because the task is highly domain- and data-dependent. In this paper, we propose a recommender service that suggests which similarity functions (e.g., edit distance or Jaccard similarity) should be used to measure the similarity between two entities. We introduce the notion of a “similarity function recommendation rule”, which captures user knowledge about similarity functions and their usage contexts. We also present an incremental knowledge acquisition technique for building and maintaining a set of similarity function recommendation rules.
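
As a rough illustration of the ideas named in the abstract, the following Python sketch implements the two example similarity functions (edit distance and Jaccard similarity) and a toy rule list that maps a usage context to a function name. The `Rule` class, the `recommend` function, the context keys `attribute` and `avg_tokens`, and the "latest matching rule wins" policy are all assumptions made for this example; the abstract does not specify the authors' rule format or their knowledge acquisition algorithm.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercased, whitespace-separated tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty token sets are treated as identical
    return len(sa & sb) / len(sa | sb)


@dataclass
class Rule:
    """Hypothetical rule: if `condition` holds for a context, recommend a function."""
    condition: Callable[[Dict[str, Any]], bool]
    function_name: str


def recommend(rules: List[Rule], context: Dict[str, Any], default: str) -> str:
    """Return the function named by the most recently added rule that fires."""
    for rule in reversed(rules):
        if rule.condition(context):
            return rule.function_name
    return default


# Rules are appended one at a time, mimicking incremental acquisition from a user.
rules: List[Rule] = []
rules.append(Rule(lambda c: c.get("attribute") == "name", "edit_distance"))
rules.append(Rule(
    lambda c: c.get("attribute") == "title" and c.get("avg_tokens", 0) > 3,
    "jaccard_similarity",
))

print(recommend(rules, {"attribute": "name"}, default="exact_match"))
print(edit_distance("kitten", "sitting"))
print(jaccard_similarity("data cleaning survey", "survey of data cleaning"))
```

In this sketch, a short name field is routed to the character-level measure and a multi-word title to the token-level one, which reflects the abstract's point that the appropriate function depends on the domain and the data.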

Author information

Authors and Affiliations

School of Computer Science & Engineering, University of New South Wales, Sydney, NSW, 2051, Australia
Seung Hwan Ryu, Boualem Benatallah, Hye-Young Paik, Yang Sok Kim & Paul Compton

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstraße 9-11, 1010, Vienna, Austria
Gerti Kappel
College of Information Technology, Zayed University, P.O. Box 19282, Dubai, UAE
Zakaria Maamar
HP Labs - Services Research Lab, 1501 Page Mill Road, 94304, Palo Alto, CA, USA
Hamid R. Motahari-Nezhad

About this paper

Cite this paper

Ryu, S.H., Benatallah, B., Paik, H.-Y., Kim, Y.S., Compton, P. (2011). Similarity Function Recommender Service Using Incremental User Knowledge Acquisition. In: Kappel, G., Maamar, Z., Motahari-Nezhad, H.R. (eds) Service-Oriented Computing. ICSOC 2011. Lecture Notes in Computer Science, vol 7084. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25535-9_15

DOI: https://doi.org/10.1007/978-3-642-25535-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25534-2
Online ISBN: 978-3-642-25535-9
eBook Packages: Computer Science, Computer Science (R0)
