Journal on Data Semantics, Volume 1, Issue 3, pp 187–201

Linked Open Piracy: A Story about e-Science, Linked Data, and Statistics

  • Willem Robert van Hage
  • Marieke van Erp
  • Véronique Malaisé
Open Access
Original Article


There is an abundance of semi-structured reports on events being written and made available on the World Wide Web on a daily basis. These reports are primarily meant for human use. A recent movement is the addition of RDF metadata to make automatic processing by computers easier. A fine example of this movement is the open government data initiative which, by representing data from spreadsheets and textual reports in RDF, strives to speed up the creation of geographical mashups and visual analytic applications. In this paper, we present a newly linked dataset and the method we used to automatically translate semi-structured reports on the Web to an RDF event model. We demonstrate how the semantic representation layer makes it possible to easily analyze and visualize the aggregated reports to answer domain questions through a SPARQL client for the R statistical programming language. We showcase our method on piracy attack reports issued by the International Chamber of Commerce (ICC-CCS). Our pipeline includes conversion of the reports to RDF, linking their parts to external resources from the linked open data cloud and exposing them to the Web.
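As a rough illustration of the conversion step described above, the following minimal sketch maps one semi-structured report to RDF triples in N-Triples syntax. Everything specific in it is an assumption made for the example: the `http://example.org/piracy/` namespace, the property names (`eventType`, `date`, `place`, `lat`, `long`), the record fields, and the sample values are hypothetical and are not the event model or the data used by the authors.

```python
# Hypothetical namespace and vocabulary; not the event model used in the paper.
EX = "http://example.org/piracy/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value, datatype=None):
    """Format a value as an N-Triples literal, escaping backslashes, quotes and newlines."""
    text = (str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n"))
    return f'"{text}"^^<{XSD}{datatype}>' if datatype else f'"{text}"'


def report_to_triples(report):
    """Map one semi-structured report (a dict) to a list of N-Triples lines."""
    event = f"<{EX}event/{report['id']}>"
    place = f"<{EX}place/{report['id']}>"
    return [
        f"{event} <{RDF_TYPE}> <{EX}Event> .",
        f"{event} <{EX}eventType> {literal(report['type'])} .",
        f"{event} <{EX}date> {literal(report['date'], 'date')} .",
        f"{event} <{EX}place> {place} .",
        f"{place} <{EX}lat> {literal(report['lat'], 'decimal')} .",
        f"{place} <{EX}long> {literal(report['lon'], 'decimal')} .",
    ]


# Made-up record for illustration only; not taken from the ICC-CCS reports.
sample = {"id": "0001", "type": "boarded", "date": "2010-01-01",
          "lat": 12.5, "lon": 45.0}
triples = report_to_triples(sample)
print("\n".join(triples))
```

Once reports are in this form, they can be loaded into a triple store and queried with SPARQL, which is the kind of access the analysis in R relies on.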


Information extraction, Metadata enrichment, Linked data



This work has been carried out as a part of the Poseidon project and the Agora project. Work in the Poseidon project was done in cooperation with Thales Nederland, under the responsibilities of the Embedded Systems Institute (ESI). The Poseidon project is partially supported by the Dutch Ministry of Economic Affairs under the BSIK03021 program. The Agora project is funded by NWO in the CATCH programme, grant 640.004.801. We would like to thank Davide Ceolin, Juan Manuel Coleto, and Vincent Osinga for their significant contributions. We thank the ICC-CCS IMB and the NGA for providing the open piracy reports.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.



Copyright information

© The Author(s) 2012

Open Access: This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Authors and Affiliations

  • Willem Robert van Hage (1)
  • Marieke van Erp (1)
  • Véronique Malaisé (2)

  1. Department of Computer Science, VU University Amsterdam, Amsterdam, The Netherlands
  2. Elsevier Content Enrichment Center (CEC), Amsterdam, The Netherlands
