Abstract
Trust has been widely discussed and formalized in the literature. In the context of Big Data and the Connected World, it becomes crucial for developing data-driven solutions. Trusted data increase the quality of decision support systems. Recently, companies have been racing towards Linked Open Data (LOD) and Knowledge Bases (KB) to improve their added value, but they ignore their SPARQL query-logs. If well curated, these logs can be an asset for analysts. A naive and direct use of these logs is too risky because their provenance and quality are highly questionable. To use these logs in a trusted way, users have to be assisted with in-depth knowledge of the whole LOD environment and with tools to curate these logs. In this paper, we propose an ontology-based model inspired by recent developments in \(<trust, risk, value>\)-ontology engineering. We then present a trust-aware curation approach composed of enriched ETL-like operators that integrate trust metrics and keep only trustworthy queries. Finally, experiments are conducted to study the effectiveness and efficiency of our proposal.
Notes
- 1.
For simplicity, we use query-logs to refer to LOD query-logs.
- 2.
- 3.
- 4.
- 5.
Complex query: a query with a complex shape, such as Forest or Bouquet, and depth > 7.
- 6.
Simple query: a query with a simple or chain shape and depth < 3.
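The two query categories defined in the notes above can be sketched as a small classifier. This is only an illustration: the function name, the constant names, and the lowercase shape labels are hypothetical, and it assumes that both the shape condition and the depth condition must hold for a query to fall into a category. Queries matching neither definition are left unlabeled.

```python
from typing import Optional

# Shape names and depth thresholds come from the notes above.
# The identifiers below are hypothetical, chosen for illustration only.
COMPLEX_SHAPES = {"forest", "bouquet"}
SIMPLE_SHAPES = {"simple", "chain"}


def classify_query(shape: str, depth: int) -> Optional[str]:
    """Return "complex", "simple", or None for a query shape and depth.

    Assumption: a category applies only when BOTH its shape condition
    and its depth condition are satisfied.
    """
    normalized = shape.strip().lower()
    if normalized in COMPLEX_SHAPES and depth > 7:
        return "complex"
    if normalized in SIMPLE_SHAPES and depth < 3:
        return "simple"
    # Neither definition in the notes covers this query.
    return None
```

For instance, `classify_query("Forest", 8)` yields `"complex"`, while a chain-shaped query of depth 5 matches neither definition and yields `None`.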
Acknowledgements
We would like to thank Prof. Giancarlo Guizzardi for his valuable comments on risk, value, and trust ontologies.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lanasri, D., Khouri, S., Bellatreche, L. (2020). Trust-Aware Curation of Linked Open Data Logs. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds) Conceptual Modeling. ER 2020. Lecture Notes in Computer Science(), vol 12400. Springer, Cham. https://doi.org/10.1007/978-3-030-62522-1_44
DOI: https://doi.org/10.1007/978-3-030-62522-1_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62521-4
Online ISBN: 978-3-030-62522-1
eBook Packages: Computer Science, Computer Science (R0)