Semantic-based Merging of RSS Items

Taddesse, Fekade Getahun; Tekli, Joe; Chbeir, Richard; Viviani, Marco; Yetongnon, Kokou

doi:10.1007/s11280-009-0074-4

Published: 02 December 2009

World Wide Web, volume 13, pages 169–207 (2010)

Abstract

Merging XML documents can be of key importance in several applications. For instance, merging RSS news items from the same or different sources and providers can benefit end users in various scenarios. In this paper, we address this issue and explore a relatedness measure between RSS elements. We show how to define and compute exclusive relations between any two elements, and we provide several predefined merging operators that can be extended and adapted to users' needs. We also report a set of experiments conducted to validate our approach.

Author information

Authors and Affiliations

LE2I Laboratory UMR-CNRS, University of Bourgogne, Engineer’s wing, 9 Savary St., 21078, Dijon Cedex, France
Fekade Getahun Taddesse, Joe Tekli, Richard Chbeir, Marco Viviani & Kokou Yetongnon

Corresponding author

Correspondence to Fekade Getahun Taddesse.

About this article

Cite this article

Taddesse, F.G., Tekli, J., Chbeir, R. et al. Semantic-based Merging of RSS Items. World Wide Web 13, 169–207 (2010). https://doi.org/10.1007/s11280-009-0074-4

Received: 02 April 2009
Revised: 12 October 2009
Accepted: 03 November 2009
Published: 02 December 2009
Issue Date: March 2010
DOI: https://doi.org/10.1007/s11280-009-0074-4
