Markup Beyond the Trees

Part of the book series: Law, Governance and Technology Series ((LGTS,volume 15))

Abstract

The issue of enabling the coexistence of independent sets of annotations that may overlap on the same textual content has been and still is an important topic to address within the document markup community. In this chapter, I propose a solution to the problem of overlapping markup based on the use of Semantic Web technologies. In particular, I introduce EARMARK, the Extremely Annotational RDF Markup, a markup metalanguage and an OWL ontology that enable one to create documents with single hierarchies (as with XML) and also with multiple overlapping hierarchies whose textual content within the markup items belongs to some hierarchies but not to others. I show possible scenarios of application of EARMARK and, finally, I discuss how to use existing Semantic Web technologies, e.g., OWL DL reasoners, to assess the correctness properties typical of the structural markup, such as validity against a schema.
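The notion of overlapping hierarchies over the same textual content can be illustrated with a minimal stand-off sketch. This is not the EARMARK ontology or any EARMARK API: the `Range` class, the `overlaps` function, the sample text, and the labels are all hypothetical, chosen only to show two markup items that share some characters without either one containing the other.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this is not the EARMARK ontology or API,
# only a stand-off illustration of markup items pointing into one shared text.

@dataclass(frozen=True)
class Range:
    label: str   # name of the markup item, e.g. "line" or "phrase"
    begin: int   # inclusive start offset into the shared text
    end: int     # exclusive end offset into the shared text

def overlaps(a: Range, b: Range) -> bool:
    """True when a and b share characters but neither contains the other."""
    shares_text = a.begin < b.end and b.begin < a.end
    a_in_b = b.begin <= a.begin and a.end <= b.end
    b_in_a = a.begin <= b.begin and b.end <= a.end
    return shares_text and not (a_in_b or b_in_a)

text = "The quick brown fox jumps"

line = Range("line", 0, 15)      # covers "The quick brown"
phrase = Range("phrase", 10, 25) # covers "brown fox jumps"
word = Range("word", 4, 9)       # covers "quick"

print(overlaps(line, phrase))  # the two items share "brown" only partially
print(overlaps(line, word))    # "quick" nests inside the line, so no overlap
```

A single tree, as in XML, can hold `line` and `word` together because one nests inside the other, but it cannot hold `line` and `phrase` without splitting one of them. Keeping the ranges outside the text is one way to let both coexist.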

Notes

  1. The document structure on its own can be seen as a particular kind of semantics. In fact, when we describe a text as structured into paragraphs, sections, chapters, etc., we are associating a semantic role with particular parts of the text.

  2. The element person as defined in TEI (Text Encoding Initiative Consortium 2013): http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-person.html.

  3. The class Person as defined in FOAF (Brickley and Miller 2010): http://xmlns.com/foaf/spec/#term_Person.

  4. Pellet: http://pellet.owldl.com.

  5. EARMARK ontology: http://www.essepuntato.it/2008/12/earmark.

  6. This and the following diagrams comply with the Graphic framework for OWL ontologies (Graffoo), introduced in Sect. 6.4. A legend for all Graffoo diagrams can be found in Fig. 6.13 on page 227.

  7. All our OWL samples are presented using the Manchester Syntax (Horridge and Patel-Schneider 2012) and Turtle (Prud’hommeaux and Carothers 2013), which are two of the standard linearisation syntaxes of OWL. The prefixes rdfs and xsd refer respectively to the RDF Schema and XML Schema namespaces, while the prefix earmark refers to the EARMARK ontology URI plus “#”. Moreover, we use the prefix co to indicate entities taken from an imported ontology made for the SWAN project (Ciccarese et al. 2008), available at http://swan-ontology.googlecode.com/svn/tags/1.2/collections.owl.

  8. This class (and its name) is based on the concept introduced by Ted Nelson in his Xanadu Project (Nelson 1980) to refer to the collection of text fragments that can be interconnected to each other and transcluded into new documents.

  9. http://en.wikipedia.org/wiki/Palindrome#Semordnilaps.

  10. A blog post by Paolo Ciccarese explaining why RDF collections cannot be used in OWL contexts: http://hcklab.blogspot.com/2008/12/moving-towards-swan-collections.html.

  11. In the excerpt, the prefix overlapping refers to “http://www.essepuntato.it/2011/05/overlapping/”.

  12. The EARMARK documents describing these three overlapping scenarios and all the other ones presented in the following sections are available at http://www.essepuntato.it/2011/jasist/examples.

  13. The EARMARK Overlapping Ontology: http://www.essepuntato.it/2011/05/overlapping.

  14. In order to individually address the issues, we edited the original bullets into a numbered list.

  15. The full details about each version and each format are also available at http://www.essepuntato.it/2011/jasist/discussion.

  16. HCalendar: http://microformats.org/wiki/hcalendar.

  17. HCard: http://microformats.org/wiki/hcard.

  18. Huggle: http://en.wikipedia.org/wiki/Wikipedia:Huggle.

  19. Lupin, the Anti-vandal tool: http://en.wikipedia.org/wiki/User:Lupin/Anti-vandal_tool.

  20. Twinkle: http://en.wikipedia.org/wiki/Wikipedia:Twinkle.

  21. MediaWiki: http://www.mediawiki.org.

  22. For the sake of clarity we removed all markup irrelevant to our discussion.

  23. More formally, for XML-based languages, a content model of a markup element is “a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear” (Bray et al. 2006).

  24. The model is available at http://www.essepuntato.it/2011/03/schemaexample.

  25. The prefix pattern refers to “http://www.essepuntato.it/2008/12/pattern#”.

  26. The Pattern Ontology: http://www.essepuntato.it/2008/12/pattern.

  27. http://www.essepuntato.it/2010/04/ParadiseLost.

  28. http://www.essepuntato.it/2010/04/ParadiseLost/test.

  29. JXML2OWL: http://jxml2owl.projects.semwebcentral.org.
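The definition of a content model quoted in the notes above (a simple grammar over the types and order of child elements) can be made concrete with a small sketch. The `section` element, its model, and the `matches_content_model` helper are hypothetical and are not taken from the chapter or from any schema language. The grammar is encoded here as a regular expression over child element names, which is only one possible encoding.

```python
import re
from typing import Sequence

# Hypothetical content model for a made-up "section" element:
# exactly one "title", then one or more "para", then an optional "note".
SECTION_MODEL = re.compile(r"(title )(para )+(note )?")

def matches_content_model(children: Sequence[str], model: re.Pattern) -> bool:
    """Check the ordered child element names against the grammar."""
    # Append a space to each name so that every name is a distinct token.
    sequence = "".join(name + " " for name in children)
    return model.fullmatch(sequence) is not None

print(matches_content_model(["title", "para", "para"], SECTION_MODEL))
print(matches_content_model(["para", "title"], SECTION_MODEL))
```

The first call satisfies the model. The second fails because the order of the children is wrong, even though both element types are allowed.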

References

  • Adida, B., M. Birbeck, S. McCarron, and S. Pemberton. 2013. RDFa Core 1.1 - Second edition: Syntax and processing rules for embedding RDF through attributes. W3C Recommendation 22 August 2013. World Wide Web Consortium. http://www.w3.org/TR/rdfa-syntax/. Accessed 30 July 2013.

  • Alexander, C. 1979. The timeless way of building. New York: Oxford University Press. (ISBN 0195024029)

  • Allsopp, J. 2007. Microformats: Empowering your markup for web 2.0. New York: Friends of ED Press. (ISBN: 1590598146).

  • Barabucci, G., A. Di Iorio, S. Peroni, F. Poggi, and F. Vitali. 2013. Annotations with EARMARK in practice: A fairy tale. Proceedings of the 2013 workshop on collaborative annotations in shared environments: Metadata, vocabularies and techniques in the digital humanities (DH-CASE 2013). New York: ACM. doi:10.1145/2517978.2517990.

  • Barabucci, G., S. Peroni, F. Poggi, and F. Vitali. 2012. Embedding semantic annotations within texts: The FRETTA approach. Proceedings of the 27th Symposium on Applied Computing (SAC 2012): 658–663. New York: ACM. doi:10.1145/2245276.2245403

  • Bański, P. 2010. Why TEI stand-off annotation doesn’t quite work: And why you might want to use it nevertheless. Proceedings of balisage: The markup conference 2010. Rockville: Mulberry Technologies, Inc. http://www.balisage.net/Proceedings/vol5/html/Banski01/BalisageVol5-Banski01.html. Accessed 30 July 2013.

  • Berglund, A., S. Boag, D. Chamberlin, M. F. Fernández, M. Kay, J. Robie, and J. Siméon. 2011. XML Path Language (XPath) 2.0 (Second edition). W3C Recommendation 14 December 2010 (Link errors corrected 3 January 2011). World Wide Web Consortium. http://www.w3.org/TR/xpath20/. Accessed 30 July 2013.

  • Bray, T., J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, and J. Cowan. 2006. Extensible Markup Language (XML) 1.1 (Second edition). W3C Recommendation 16 August 2006, edited in place 29 September 2006. World Wide Web Consortium. http://www.w3.org/TR/xml11/. Accessed 30 July 2013.

  • Brickley, D., and L. Miller. 2010. FOAF vocabulary specification 0.98. Namespace document, 9 August 2010 - Marco Polo Edition. http://xmlns.com/foaf/spec/. Accessed 30 July 2013.

  • Carroll, J., I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson. 2004. Jena: Implementing the semantic web recommendations. In Proceedings of the 13th international conference on World Wide Web - Alternate Track Papers & Posters (WWW 2004), ed. S. I. Feldman, M. Uretsky, M. Najork, and C. E. Wills, 74–83. New York: ACM. doi:10.1145/1013367.1013381.

  • Ciccarese, P., E. Wu, J. Kinoshita, G. Wong, M. Ocana, A. Ruttenberg, and T. Clark. 2008. The SWAN biomedical discourse ontology. Journal of Biomedical Informatics 41 (5): 739–751. doi:10.1016/j.jbi.2008.04.010.

  • Clark, J. 2001. RELAX NG specification. Committee specification 3 December 2001. Organization for the advancement of structured information standards. http://relaxng.org/spec-20011203.html. Accessed 30 July 2013.

  • Clark, J. 2002. RELAX NG Compact syntax. Committee specification 21 November 2002. Organization for the advancement of structured information standards. http://relaxng.org/compact-20021121.html. Accessed 30 July 2013.

  • Dattolo, A., A. Di Iorio, S. Duca, A.A. Feliziani, and F. Vitali. 2007. Structural documents. In Proceedings of the 7th International Conference on Web Engineering 2007 (ICWE 2007), Lecture notes in computer science 4607, ed. L. Baresi, P. Fraternali, and G. Houben, 421–426. Berlin: Springer. doi:10.1007/978-3-540-73597-7_35.

  • De Waard, A. 2010. From proteins to fairytales: Directions in semantic publishing. IEEE Intelligent Systems 25 (2): 83–88. doi:10.1109/MIS.2010.49.

  • Di Iorio, A., D. Gubellini, and F. Vitali. 2005. Design patterns for document substructures. In Proceedings of the extreme markup languages 2005. Rockville: Mulberry Technologies, Inc. http://conferences.idealliance.org/extreme/html/2005/Vitali01/EML2005Vitali01.html. Accessed 30 July 2013.

  • Di Iorio, A., C. Marchetti, M. Schirinzi, and F. Vitali. 2009. Natural and multi-layered approach to detect changes in tree-based textual documents. In Proceedings of the 11th International Conference on Enterprise Information Systems (ICEIS 2009), Lecture notes in business information processing 24, ed. J. Cordeiro and J. Filipe, 90–101. Berlin: Springer. doi:10.1007/978-3-642-01347-8_8.

  • Di Iorio, A., S. Peroni, F. Poggi, and F. Vitali. 2012. A first approach to the automatic recognition of structural patterns in XML documents. Proceedings of the 2012 ACM symposium on document engineering (DocEng 2012). 85–94. New York: ACM. doi:10.1145/2361354.2361374.

  • Di Iorio, A., S. Peroni, F. Poggi, and F. Vitali. 2013. Dealing with structural patterns of XML documents. To appear in Journal of the American Society for Information Science and Technology. doi:10.1002/asi.23088.

  • Di Iorio, A., S. Peroni, and F. Vitali. 2009. Towards markup support for full GODDAGs and beyond: The EARMARK approach. Proceedings of balisage: The markup conference 2009. Rockville: Mulberry Technologies, Inc. http://balisage.net/Proceedings/vol3/html/Peroni01/BalisageVol3-Peroni01. Accessed 30 July 2013.

  • Di Iorio, A., S. Peroni, and F. Vitali. 2010. Handling markup overlaps using OWL. In Proceedings of the 17th international conference on knowledge engineering and knowledge management (EKAW 2010), Lecture notes in computer science 6317, ed. P. Cimiano and H. S. Pinto, 391–400. Berlin: Springer. doi:10.1007/978-3-642-16438-5_29.

  • Di Iorio, A., S. Peroni, and F. Vitali. 2011. A semantic web approach to everyday overlapping markup. Journal of the American Society for Information Science and Technology 62 (9): 1696–1716. doi:10.1002/asi.21591.

  • Di Iorio, A., S. Peroni, and F. Vitali. 2011. Using semantic web technologies for analysis and validation of structural markup. International Journal of Web Engineering and Technologies 6 (4): 375–398. doi:10.1504/IJWET.2011.043439.

  • Di Iorio, A., S. Peroni, F. Vitali, J. Lumley, and T. Wiley. 2009. Towards XML Transclusions. In Proceedings of the 1st workshop on new forms of xanalogical storage and function, CEUR workshop proceedings, ed. F. Vitali, A. Di Iorio, and J. Blustein, vol. 508, 23–28. Aachen: CEUR-WS.org. http://ceur-ws.org/Vol-508/paper5.pdf. Accessed 30 July 2013.

  • Durand, D. G. 1994. Palimpsest, a data model for revision control. Paper presented at the workshop on collaborative editing systems, co-located with the computer supported cooperative work conference (CSCW94). October 22–26, 1994, Chapel Hill.

  • Durand, D. G. 2008. Palimpsest: Change-oriented concurrency control for the support of collaborative applications. Charleston: CreateSpace.

  • Ferdinand, M., C. Zirpins, and D. Trastour. 2004. Lifting XML schema to OWL. In Proceedings of the 4th International Conference on Web Engineering 2004 (ICWE 2004), Lecture notes in computer science 3140, ed. N. Koch, P. Fraternali, and M. Wirsing, 354–358. Berlin: Springer. doi:10.1007/978-3-540-27834-4_44.

  • Gamma, E., R. Helm, R. Johnson, and J. Vlissides. 1994. Design patterns: Elements of reusable object-oriented software. Boston: Addison-Wesley. (ISBN: 0201633610).

  • Gao, S., C. M. Sperberg-McQueen, and H. S. Thompson. 2012. W3C XML schema definition language (XSD) 1.1 Part 1: Structures. W3C Recommendation 5 April 2012. World Wide Web Consortium. http://www.w3.org/TR/xmlschema11-1/. Accessed 30 July 2013.

  • Garlik, S. H., and A. Seaborne. 2013. SPARQL 1.1 Query language. W3C Recommendation 21 March 2013. World Wide Web Consortium. http://www.w3.org/TR/sparql11-query/. Accessed 30 July 2013.

  • Georg, R., O. Schonefeld, T. Trippel, and A. Witt. 2010. Sustainability of linguistic resources revisited. Proceedings of the international symposium on XML for the long haul: Issues in the long-term preservation of XML. Rockville: Mulberry Technologies, Inc. http://www.balisage.net/Proceedings/vol6/html/Witt01/BalisageVol6-Witt01.html. Accessed 30 July 2013.

  • Goldfarb, C. F. 1990. The SGML handbook. New York: Oxford University Press. (ISBN 0198537373).

  • Horridge, M., and P. Patel-Schneider. 2012. OWL 2 web ontology language manchester syntax (Second edition). W3C working group note 11 December 2012. World Wide Web Consortium. http://www.w3.org/TR/owl2-manchester-syntax/. Accessed 30 July 2013.

  • Horrocks, I., P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean. 2004. SWRL: A semantic web rule language combining OWL and RuleML. W3C Member Submission 21 May 2004. World Wide Web Consortium. http://www.w3.org/Submission/SWRL/. Accessed 30 July 2013.

  • JTC1/SC34 WG 4. 2011. ISO/IEC 29500-1:2011—Information technology—document description and processing languages—office open XML file formats—Part 1: Fundamentals and markup language reference. Geneva: International Organization for Standardization. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=59575. Accessed 30 July 2013.

  • JTC1/SC34 WG 6. 2006. ISO/IEC 26300:2006—Information technology—open document format for office applications (OpenDocument) v1.0. Geneva: International Organization for Standardization. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=43485. Accessed 30 July 2013.

  • Motik, B., P. F. Patel-Schneider, and B.C. Grau. 2012. OWL 2 Web ontology language: Direct semantics (Second edition). W3C Recommendation 11 December 2012. World Wide Web Consortium. http://www.w3.org/TR/owl2-direct-semantics/. Accessed 30 July 2013.

  • Nelson, T. 1980. Literary machines: The report on, and of, project Xanadu concerning word processing, electronic publishing, hypertext, thinkertoys, tomorrow’s intellectual... including knowledge, education and freedom. Sausalito: Mindful Press.

  • Peroni, S., A. Gangemi, and F. Vitali. 2011. Dealing with markup semantics. In Proceedings of the 7th International conference on semantic systems (I-SEMANTICS 2011), ed. C. Ghidini, A. Ngonga Ngomo, S. N. Lindstaedt, and T. Pellegrini, 111–118. New York: ACM. doi:10.1145/2063518.2063533.

  • Peroni, S., F. Poggi, and F. Vitali. 2013. Tracking changes through EARMARK: A theoretical perspective and an implementation. In Proceedings of 1st international workshop on (Document) changes: Modeling, detection, storage and visualization (DChanges 2013), ed. G. Barabucci, U. M. Borghoff, A. Di Iorio, and S. Maier, Aachen: CEUR-WS.org. http://ceur-ws.org/Vol-1008/paper6.pdf. Accessed 30 July 2013.

  • Peroni, S., and F. Vitali. 2009. Annotations with EARMARK for arbitrary, overlapping and out-of order markup. In Proceedings of the 2009 ACM Symposium on Document Engineering (DocEng 2009), ed. U. M. Borghoff and B. Chidlovskii, 171–180. New York: ACM. doi:10.1145/1600193.1600232

  • Presutti, V., and A. Gangemi. 2008. Content ontology design patterns as practical building blocks for web ontologies. In Proceedings of the 27th international conference on conceptual modeling (ER 2008), Lecture notes in computer science 5231, ed. Q. Li, S. Spaccapietra, E. S. K. Yu, and A. Olivé, 128–141. Berlin: Springer. doi:10.1007/978-3-540-87877-3_11.

  • Prud’hommeaux, E., and G. Carothers. 2013. Turtle, terse RDF triple language. W3C candidate recommendation 19 February 2013. World Wide Web Consortium. http://www.w3.org/TR/turtle/. Accessed 30 July 2013.

  • Rodrigues, T., P. Rosa, and J. Cardoso. 2006. Mapping Xml to existing owl ontologies. In Proceedings of the IADIS international conference on WWW/Internet 2006, ed. M. B. Nunes, P. Isaías, and I. J. Martínez. Lisbon: IADIS.

  • Sirin, E., B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. 2007. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics: Science, Services and Agents on the World Wide Web 5 (2): 51–53. doi:10.1016/j.websem.2007.03.004.

  • Sperberg-McQueen, C. M. 2006. Rabbit/duck grammars: A validation method for overlapping structures. Proceedings of extreme markup languages conference 2006. Rockville: Mulberry Technologies, Inc. http://conferences.idealliance.org/extreme/html/2006/SperbergMcQueen01/EML2006SperbergMcQueen01.html. Accessed 30 July 2013.

  • Sperberg-McQueen, C. M., and C. Huitfeldt. 2004. GODDAG: A data structure for overlapping hierarchies. In Proceedings of the 5th international workshop on the Principles of Digital Document Processing (PODDP 2000), Lecture notes in computer science 2023, ed. P. R. King and E. V. Munson, 139–160. Berlin: Springer. doi:10.1007/978-3-540-39916-2_12.

  • Tennison, J., and W. Piez. 2002. The Layered Markup and Annotation Language (LMNL). Presented at the extreme markup languages conference 2002. 4–9 August 2002, Montreal.

  • Text Encoding Initiative Consortium 2013. TEI P5: Guidelines for electronic text encoding and interchange. Charlottesville: TEI Consortium. http://www.tei-c.org/Guidelines/P5. Accessed 30 July 2013.

  • Van Rijsbergen, C. J. 1986. A new theoretical framework for information retrieval. In Proceedings of the 9th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR86), ed. V. Raghavan and E. A. Fox, 23–29. New York: ACM. doi:10.1145/24634.24635.

  • Yang, K., R. Steele, and A. Lo. 2007. An ontology for XML schema to ontology mapping representation. In Proceedings of the 9th international conference on information integration and Web-based Applications & Services (iiWAS 2007), ed. G. Kotsis, D. Taniar, E. Pardede, and I. K. Ibrahim, 101–111. Vienna: Austrian Computer Society.

Author information

Corresponding author

Correspondence to Silvio Peroni.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Peroni, S. (2014). Markup Beyond the Trees. In: Semantic Web Technologies and Legal Scholarly Publishing. Law, Governance and Technology Series, vol 15. Springer, Cham. https://doi.org/10.1007/978-3-319-04777-5_3

  • DOI: https://doi.org/10.1007/978-3-319-04777-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04776-8

  • Online ISBN: 978-3-319-04777-5

  • eBook Packages: Computer Science, Computer Science (R0)
