A Framework for Linking RDF Datasets for Thailand Open Government Data Based on Semantic Type Detection

  • Pattama Krataithong
  • Marut Buranarach
  • Nattanont Hongwarittorrn
  • Thepchai Supnithi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10075)

Abstract

Most datasets in open government data portals are published in tabular spreadsheet formats, e.g. CSV and XLS. To increase the value and reusability of these datasets, they should be made available in RDF format, which supports better data querying and data integration. Our previous work proposed a semi-automatic framework for generating RDF datasets from existing datasets in tabular format. In this paper, we extend our framework to support automatic linking of the RDF datasets. One of the important steps is mapping literal values that appear in a dataset to standard URIs. Several previous studies use semantic search APIs such as DBpedia or Sindice for URI mapping. However, this approach is not appropriate for the datasets of the Thailand open data portal (Data.go.th) because these services contain insufficient data for Thai named entities. In addition, a name may match more than one URI, i.e. word ambiguity. For example, the name “Bangkok” may match URIs that refer to a province, a hospital or a university. To resolve these issues, our framework treats finding the semantic type as an essential step in resolving word ambiguity when retrieving the proper URI for a named entity. This paper presents a framework for finding semantic types and mapping named entities to URIs, i.e. URI lookup. A Named Entity Recognition (NER) technique is applied to find the semantic type of a column in a CSV dataset. The results are used to create an ontology and RDF data that include the URI mappings for the named entities. We evaluate the two approaches by comparing the performance of a semantic search API, i.e. Wikipedia, with that of the NER technique on datasets from the Data.go.th website.
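
The idea of using a column's semantic type to disambiguate URI lookup can be illustrated with a minimal sketch. It is not the authors' implementation: a small hand-written gazetteer stands in for the NER component, and the names `GAZETTEER`, `URI_INDEX`, `detect_semantic_type`, `lookup_uri` and `link_column`, as well as the `example.org` URIs, are hypothetical and introduced only for illustration.

```python
from collections import Counter

# Hypothetical gazetteer standing in for a trained NER model:
# each name maps to every semantic type it could denote.
GAZETTEER = {
    "Bangkok": {"Province", "Hospital", "University"},
    "Chiang Mai": {"Province", "University"},
    "Phuket": {"Province"},
}

# Hypothetical URI index keyed by (semantic type, name).
# The example.org URIs are placeholders, not real identifiers.
URI_INDEX = {
    ("Province", "Bangkok"): "http://example.org/province/Bangkok",
    ("Hospital", "Bangkok"): "http://example.org/hospital/Bangkok",
    ("University", "Bangkok"): "http://example.org/university/Bangkok",
    ("Province", "Chiang Mai"): "http://example.org/province/ChiangMai",
    ("Province", "Phuket"): "http://example.org/province/Phuket",
}


def detect_semantic_type(values):
    """Pick the type that the largest number of cell values could belong to."""
    votes = Counter()
    for value in values:
        for sem_type in GAZETTEER.get(value.strip(), ()):
            votes[sem_type] += 1
    if not votes:
        return None
    # Highest vote count wins; ties are broken alphabetically for determinism.
    return min(votes, key=lambda t: (-votes[t], t))


def lookup_uri(name, sem_type):
    """Return the URI registered for this name under the given type, if any."""
    return URI_INDEX.get((sem_type, name.strip()))


def link_column(values):
    """Detect the column's semantic type, then map each value to a URI of that type."""
    sem_type = detect_semantic_type(values)
    links = {value: lookup_uri(value, sem_type) for value in values}
    return sem_type, links


if __name__ == "__main__":
    column = ["Bangkok", "Chiang Mai", "Phuket"]
    print(link_column(column))
```

In this sketch, “Bangkok” alone is ambiguous among three types, but the other values in the column tip the vote toward one type, and the lookup is then restricted to URIs of that type.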

Keywords

  • Finding semantic types
  • Name Entity Recognition (NER)
  • Automatic ontology creation
  • Automatic linked dataset creation

Notes

Acknowledgement

This project was funded by the Electronic Government Agency (EGA) and the National Science and Technology Development Agency (NSTDA), Thailand.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Pattama Krataithong (1, 2)
  • Marut Buranarach (1)
  • Nattanont Hongwarittorrn (2)
  • Thepchai Supnithi (1)

  1. Language and Semantic Technology Laboratory, National Electronics and Computer Technology Center (NECTEC), Pathumthani, Thailand
  2. Department of Computer Science, Faculty of Science and Technology, Thammasat University, Pathumthani, Thailand