Advertisement

XCC: change control of XML documents

An Efficient and Reliable Framework for XML Diff, Patch, and Merge
  • Sebastian Rönnau
  • Uwe M. Borghoff
Special Issue Paper

Abstract

XML-based documents play a major role in modern information architectures and their corresponding workflows. In this context, the ability to identify and represent differences between two versions of a document is essential, as well as the merging of document versions resulting from parallel editing processes.

Different approaches try to meet these challenges using operational transformation or document annotation. In both approaches, the changes are tracked during editing, which requires corresponding editing applications. In the context of software development, however, a state-based approach is common. Here, versions are compared and merged using external tools, called diff and patch. This allows the users for editing documents without being tightened to editing tools. Approaches exist that are able to compare XML documents, but lack a corresponding merge capability.

In this article, we present a comprehensive framework that allows for comparing and merging of XML documents using a state-based approach. Its design is based on an analysis of XML documents and their modification patterns. The framework is built on top of a context-oriented delta model. We present a diff algorithm that appears to be highly efficient in terms of speed and delta quality. The patch algorithm is able to merge document versions efficiently and reliably. The efficiency and the reliability of our approach are verified using a competitive test scenario.

Keywords

XML Document version control Change management Diff Patch Merge 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Brauer M, Weir R, McRae M (2007) OpenDocument v1.1 specification. URL http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.pdf
  2. 2.
    Paoli J, Valet-Harper I, Farquhar A, Sebestyen I (2006) ECMA-376 office open XML file formats. URL http://www.ecma-international.org/publications/standards/Ecma-376.htm
  3. 3.
    Pemberton S (2002) XHTML™1.0 the extensible hypertext markup language (second edition). http://www.w3.org/TR/2002/REC-xhtml1-20020801
  4. 4.
    Selkow SM (1977) The tree-to-tree editing problem. Inf Process Lett 6(6):184–186 doi: 10.1016/0020-0190(77)90064-3 MathSciNetzbMATHCrossRefGoogle Scholar
  5. 5.
    Rönnau S, Pauli C, Borghoff UM (2008) Merging changes in XML documents using reliable context fingerprints. In: DocEng ’08: Proc of the 8th ACM symp on document engineering. ACM, New York, pp 52–61. doi:http://doi.acm.org/10.1145/1410140.1410151 CrossRefGoogle Scholar
  6. 6.
    Rönnau S, Borghoff UM (2009) Versioning XML-based office documents. Multimed Tools Appl 43(3):253–274. doi:http://dx.doi.org/10.1007/s11042-009-0271-2 CrossRefGoogle Scholar
  7. 7.
    Rönnau S, Philipp G, Borghoff UM (2009) Efficient change control of XML documents. In: DocEng ’09: Proc of the 9th ACM symp on document engineering. ACM, New York, pp 3–12 CrossRefGoogle Scholar
  8. 8.
    Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J ACM 21(1):168–173. doi:http://doi.acm.org/10.1145/321796.321811 MathSciNetzbMATHCrossRefGoogle Scholar
  9. 9.
    Bergroth L, Hakonen H, Raita T (2000) A survey of longest common subsequence algorithms. In: SPIRE ’00: Proc of the 7th int symp on string processing information retrieval. IEEE Computer Society, Washington, p 39 CrossRefGoogle Scholar
  10. 10.
    Tai KC (1979) The tree-to-tree correction problem. J ACM 26(3):422–433. doi:http://doi.acm.org/10.1145/322139.322143 MathSciNetzbMATHCrossRefGoogle Scholar
  11. 11.
    Bille P (2005) A survey on tree edit distance and related problems. Theor Comput Sci 337(1-3):217–239. doi:http://dx.doi.org/10.1016/j.tcs.2004.12.030 MathSciNetzbMATHCrossRefGoogle Scholar
  12. 12.
    Chawathe SS, Garcia-Molina H (1997) Meaningful change detection in structured data. SIGMOD Rec 26(2):26–37. doi:http://doi.acm.org/10.1145/253262.253266 CrossRefGoogle Scholar
  13. 13.
    DeRose S, Clark J XML path language (XPath) version 1.0 (1999). http://www.w3.org/TR/1999/REC-xpath-19991116
  14. 14.
    Chen W (2001) New algorithm for ordered tree-to-tree correction problem. J Algorithms 40(2):135–158. doi: 10.1006/jagm.2001.1170 MathSciNetzbMATHCrossRefGoogle Scholar
  15. 15.
    Chawathe SS, Rajaraman A, Garcia-Molina H, Widom J (1996) Change detection in hierarchically structured information. In: SIGMOD ’96: Proc of the 1996 ACM int conf on management of data. ACM, New York, pp 493–504. doi:http://doi.acm.org/10.1145/233269.233366 CrossRefGoogle Scholar
  16. 16.
    Fetterly D, Manasse M, Najork M, Wiener J (2003) A large-scale study of the evolution of web pages. In: WWW ’03: Proc of the 12th int conf on world wide web. ACM, New York, pp 669–678. doi:http://doi.acm.org/10.1145/775152.775246 Google Scholar
  17. 17.
    Ntoulas A, Cho J, Olston C (2004) What’s new on the web?: the evolution of the web from a search engine perspective. In: WWW ’04: Proc of the 13th int conf on world wide web. ACM, New York, pp 1–12. doi:http://doi.acm.org/10.1145/988672.988674 CrossRefGoogle Scholar
  18. 18.
    Adar E, Teevan J, Dumais ST, Elsas JL (2009) The web changes everything: understanding the dynamics of web content. In: WSDM ’09: Proc of the 2nd ACM int conf on web search and data mining. ACM, New York, pp 282–291. doi:http://doi.acm.org/10.1145/1498759.1498837 CrossRefGoogle Scholar
  19. 19.
    Davison W (1990) Unified context diff tools. vol 14, Issue 70 of comp.sources.misc Google Scholar
  20. 20.
    Mouat A (2002) XML diff and patch utilities. Master’s thesis, Heriot-Watt University, Edinburgh Scotland Google Scholar
  21. 21.
    Myers EW (1986) An o(nd) difference algorithm and its variations. Algorithmica 1:251–266 MathSciNetzbMATHCrossRefGoogle Scholar
  22. 22.
    Zhang K (1996) Efficient parallel algorithms for tree editing problems. In: CPM ’96: Proc of the 7th ann symp on combinatorial pattern matching. Springer, London, pp 361–372 CrossRefGoogle Scholar
  23. 23.
    Khanna S, Kunal K, Pierce BC (2007) A formal investigation of diff3. In: Prasad A (ed) Foundations of software technology and theoretical computer science (FSTTCS) Google Scholar
  24. 24.
    Zhang K, Shasha D (1989) Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput 18(6):1245–1262. doi:http://dx.doi.org/10.1137/0218082 MathSciNetzbMATHCrossRefGoogle Scholar
  25. 25.
    Rönnau S, Scheffczyk J, Borghoff UM (2005) Towards XML version control of office documents. In: DocEng ’05: Proc of the 2005 ACM symp on document engineering. ACM, New York, pp 10–19. doi:http://doi.acm.org/10.1145/1096601.1096606 CrossRefGoogle Scholar
  26. 26.
    Cobéna G, Abiteboul S, Marian A (2002) Detecting changes in XML documents. In: Proc of the 18th int conf on data engineering. IEEE Computer Society, Los Alamitos, pp 41–52 CrossRefGoogle Scholar
  27. 27.
    Lindholm T, Kangasharju J, Tarkoma S (2006) Fast and simple XML tree differencing by sequence alignment. In: DocEng ’06: Proc of the 2006 ACM symp on document engineering. ACM, New York, pp 75–84. doi:http://doi.acm.org/10.1145/1166160.1166183 CrossRefGoogle Scholar
  28. 28.
    Lindholm T (2004) A three-way merge for XML documents. In: DocEng ’04: Proc of the 2004 ACM symp on document engineering. ACM, New York, pp 1–10. doi:http://doi.acm.org/10.1145/1030397.1030399 CrossRefGoogle Scholar
  29. 29.
    Fontaine RL (2002) Merging XML files: a new approach providing intelligent merge of XML data sets. In: Proc of XML Europe 2002 Google Scholar
  30. 30.
    Rosado LA, Márquez AP, Gil JM (2007) Managing branch versioning in versioned/temporal XML documents. In: Barbosa D, Bonifati A, Bellahsene Z, Hunt E, Unland R (eds) XSym. Lecture notes in computer science, vol 4704. Springer, Berlin, pp 107–121 Google Scholar
  31. 31.
    Sun C, Ellis C (1998) Operational transformation in real-time group editors: issues, algorithms, and achievements. In: CSCW ’98: Proc of the 1998 ACM conf on computer supported cooperative work. ACM, New York, pp 59–68. doi:http://doi.acm.org/10.1145/289444.289469 CrossRefGoogle Scholar
  32. 32.
    Molli P, Skaf-Molli H, Oster G, Jourdain S (2002) Sams: synchronous, asynchronous, multi-synchronous environments. pp 80–84. doi: 10.1109/CSCWD.2002.1047653
  33. 33.
    Wang D, Mah A (2009) Google wave operational transformation. http://www.waveprotocol.org/whitepapers/operational-transform
  34. 34.
    Ignat CL, Oster G (2008) Peer-to-peer collaboration over XML documents. In: CDVE ’08: Proc of the 5th int conf on cooperative design, visualization, and engineering. Springer, Berlin/Heidelberg, pp 66–73 CrossRefGoogle Scholar
  35. 35.
    Sun C, Sosič R (1999) Optimal locking integrated with operational transformation in distributed real-time group editors. In: PODC ’99: Proc. of the 18th ann ACM symp on principles of distributed computing. ACM, New York, pp 43–52. doi:http://doi.acm.org/10.1145/301308.301322 CrossRefGoogle Scholar
  36. 36.
    Mens T (2002) A state-of-the-art survey on software merging. IEEE Trans Softw Eng 28(5):449–462 CrossRefGoogle Scholar
  37. 37.
    Gropengießer F, Hose K, Sattler KU (2009) An extended transaction model for cooperative authoring of XML data. Comput Sci Res Dev 24(1–2):85–100 CrossRefGoogle Scholar

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. 1.Universität der Bundeswehr MünchenMunichGermany

Personalised recommendations