Abstract
Every day, an abundance of semi-structured reports on events is written and published on the World Wide Web. These reports are primarily meant for human readers. A recent movement is to add RDF metadata to such reports so that computers can process them automatically more easily. A fine example of this movement is the open government data initiative which, by representing data from spreadsheets and textual reports in RDF, strives to speed up the creation of geographical mashups and visual analytics applications. In this paper, we present a newly linked dataset and the method we used to automatically translate semi-structured reports on the Web into an RDF event model. We demonstrate how the semantic representation layer makes it easy to analyze and visualize the aggregated reports and to answer domain questions through a SPARQL client for the R statistical programming language. We showcase our method on piracy attack reports issued by the International Chamber of Commerce (ICC-CCS). Our pipeline includes converting the reports to RDF, linking their parts to external resources from the linked open data cloud, and exposing them on the Web.
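The conversion step summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the "Key: value" report layout, the position format, the example.org base URI and the helper names (parse_report, parse_position, literal, report_to_ntriples) are all hypothetical. The event layer uses terms modeled on the Simple Event Model (SEM) namespace and the W3C WGS84 geo vocabulary. The specific SEM property chosen here (sem:hasTimeStamp) is an assumption that should be checked against the SEM specification. The sketch uses only the standard library and writes N-Triples by hand.

```python
import re

# Namespaces. SEM and GEO are assumed to match the published vocabularies;
# EX is a hypothetical placeholder base URI, not the dataset's real one.
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/piracy/"

# A made-up report in "Key: value" form, standing in for one ICC-CCS entry.
REPORT = """\
Attack Number: 2009-001
Date: 2009-01-01
Position: 12:30N 045:15E
Vessel Type: Tanker
Narration: Pirates in a skiff fired upon the tanker underway.
"""

# Hypothetical position format: degrees:minutes plus hemisphere letter.
POS_RE = re.compile(r"(\d+):(\d+)([NS])\s+(\d+):(\d+)([EW])")


def parse_report(text):
    """Split 'Key: value' lines into a dict (only the first ':' separates)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_position(pos):
    """Convert 'DD:MMN DDD:MME' into signed decimal degrees (lat, long)."""
    m = POS_RE.fullmatch(pos.strip())
    if m is None:
        raise ValueError(f"unrecognised position: {pos!r}")
    lat = int(m.group(1)) + int(m.group(2)) / 60
    lon = int(m.group(4)) + int(m.group(5)) / 60
    if m.group(3) == "S":
        lat = -lat
    if m.group(6) == "W":
        lon = -lon
    return round(lat, 4), round(lon, 4)


def literal(value, datatype=None):
    """Serialise a value as an N-Triples literal, optionally xsd-typed."""
    escaped = (str(value).replace("\\", "\\\\")
               .replace('"', '\\"').replace("\n", "\\n"))
    suffix = f"^^<{XSD}{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def report_to_ntriples(fields):
    """Map one parsed report onto SEM-style event, place and actor triples."""
    key = fields["Attack Number"]
    event, place, ship = (f"<{EX}{kind}/{key}>" for kind in ("event", "place", "ship"))
    lat, lon = parse_position(fields["Position"])
    triples = [
        (event, f"<{RDF_TYPE}>", f"<{SEM}Event>"),
        (event, f"<{SEM}hasTimeStamp>", literal(fields["Date"], "date")),
        (event, f"<{SEM}hasPlace>", place),
        (event, f"<{SEM}hasActor>", ship),
        (event, f"<{RDFS}comment>", literal(fields["Narration"])),
        (place, f"<{RDF_TYPE}>", f"<{SEM}Place>"),
        (place, f"<{GEO}lat>", literal(lat, "double")),
        (place, f"<{GEO}long>", literal(lon, "double")),
        (ship, f"<{RDF_TYPE}>", f"<{SEM}Actor>"),
        (ship, f"<{RDFS}label>", literal(fields["Vessel Type"])),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in report_to_ntriples(parse_report(REPORT)):
        print(line)
```

Running the script prints ten N-Triples lines for the sample report. Linking the locally minted place and ship nodes to external resources from the linked open data cloud, and publishing the triples behind a SPARQL endpoint, are separate steps that this sketch does not cover.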
Acknowledgments
This work has been carried out as a part of the Poseidon project and the Agora project. Work in the Poseidon project was done in cooperation with Thales Nederland, under the responsibilities of the Embedded Systems Institute (ESI). The Poseidon project is partially supported by the Dutch Ministry of Economic Affairs under the BSIK03021 program. The Agora project is funded by NWO in the CATCH programme, grant 640.004.801. We would like to thank Davide Ceolin, Juan Manuel Coleto, and Vincent Osinga for their significant contributions. We thank the ICC-CCS IMB and the NGA for providing the open piracy reports.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
van Hage, W.R., van Erp, M. & Malaisé, V. Linked Open Piracy: A Story about e-Science, Linked Data, and Statistics. J Data Semant 1, 187–201 (2012). https://doi.org/10.1007/s13740-012-0009-6