World Wide Web, Volume 13, Issue 1–2, pp 169–207

Semantic-based Merging of RSS Items

  • Fekade Getahun Taddesse
  • Joe Tekli
  • Richard Chbeir
  • Marco Viviani
  • Kokou Yetongnon

Abstract

Merging XML documents can be of key importance in several applications. For instance, merging RSS news items from the same or different sources and providers can benefit end-users in various scenarios. In this paper, we address this issue and explore a relatedness measure between RSS elements. We show how to define and compute exclusive relations between any two elements, and we provide several predefined merging operators that can be extended and adapted to users' needs. We also report a set of experiments conducted to validate our approach.

Keywords

RSS merging, document relatedness, clustering, merging operators

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Fekade Getahun Taddesse (1)
  • Joe Tekli (1)
  • Richard Chbeir (1)
  • Marco Viviani (1)
  • Kokou Yetongnon (1)

  1. LE2I Laboratory UMR-CNRS, University of Bourgogne, Dijon Cedex, France
