Intelligent Knowledge Lakes: The Age of Artificial Intelligence and Big Data

Beheshti, Amin; Benatallah, Boualem; Sheng, Quan Z.; Schiliro, Francesco

doi:10.1007/978-981-15-3281-8_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1155)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Continuous improvements in connectivity, storage and data processing capabilities allow access to a deluge of big data generated on open, private, social and IoT (Internet of Things) data islands. Data Lakes were introduced as storage repositories that organize this raw data in its native format until it is needed. The rationale behind a Data Lake is to store raw data and let the data analyst decide how to curate it later. Previously, we introduced the novel notion of a Knowledge Lake, i.e., a contextualized Data Lake, and proposed algorithms to turn the raw data (stored in Data Lakes) into contextualized data and knowledge using extraction, enrichment, annotation, linking and summarization techniques. In this tutorial, we introduce Intelligent Knowledge Lakes to facilitate linking Artificial Intelligence (AI) and Data Analytics. This will enable AI applications to learn from contextualized data and use it to automate business processes and develop cognitive assistance for facilitating knowledge-intensive processes or generating new rules for future business analytics.
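To make the idea of contextualization concrete, the following is a minimal sketch of two of the steps named above (extraction and linking) applied to one raw item. It is not the authors' algorithm: the names `TOY_KB`, `ContextualizedItem`, `extract_keywords`, `link_keywords` and `contextualize` are hypothetical, the keyword rule is a deliberately naive stand-in, and the tiny in-memory dictionary only plays the role that an external knowledge base would play in a real system.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy knowledge base: keyword -> identifier (illustration only).
TOY_KB = {"boston": "kb:Boston", "marathon": "kb:Marathon"}


@dataclass
class ContextualizedItem:
    raw: str                                      # raw data, kept unchanged
    keywords: list = field(default_factory=list)  # result of extraction
    links: dict = field(default_factory=dict)     # result of linking


def extract_keywords(text):
    """Extraction (naive): lowercase alphabetic tokens longer than
    three characters, de-duplicated in order of first appearance."""
    keywords = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if len(token) > 3 and token not in keywords:
            keywords.append(token)
    return keywords


def link_keywords(keywords, kb):
    """Linking: map each keyword found in the knowledge base to its identifier."""
    return {k: kb[k] for k in keywords if k in kb}


def contextualize(raw, kb=TOY_KB):
    """Attach extracted keywords and knowledge-base links to a raw item."""
    keywords = extract_keywords(raw)
    return ContextualizedItem(raw=raw, keywords=keywords,
                              links=link_keywords(keywords, kb))


item = contextualize("Runners gather for the Boston Marathon")
print(item.keywords)
print(item.links)
```

The raw text is stored alongside the derived context rather than replaced by it, which mirrors the Data Lake rationale of keeping data in its native form and curating it later.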

Notes

1. https://en.wikipedia.org/wiki/Boston_Marathon_bombing
2. https://developers.google.com/knowledge-graph
3. https://www.wikidata.org/
4. An entity E is represented as a data object that exists separately and has a unique identity. Entities are described by a set of attributes.
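
The entity definition in note 4 can be sketched as a small data structure. This is one possible reading, not the paper's representation: the class name `Entity`, the field names, and the use of a random UUID as the unique identity are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entity:
    """A data object with a unique identity, described by a set of attributes."""
    attributes: dict = field(default_factory=dict)
    # Assumption: identity is a generated UUID string, not derived from attributes.
    entity_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __eq__(self, other):
        # Two entities are the same only if they share an identity,
        # regardless of how similar their attributes are.
        return isinstance(other, Entity) and self.entity_id == other.entity_id

    def __hash__(self):
        return hash(self.entity_id)


a = Entity({"name": "Boston", "type": "City"})
b = Entity({"name": "Boston", "type": "City"})
print(a == b)  # identical attributes, but separate identities
```

Comparing by identity rather than by attributes reflects the phrase "exists separately": two objects with the same description remain distinct entities.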

Acknowledgements

We acknowledge the AI-enabled Processes (AIP) Research Centre (https://aip-research-center.github.io/) for funding part of this research.

Author information

Authors and Affiliations

Macquarie University, Sydney, Australia
Amin Beheshti, Quan Z. Sheng & Francesco Schiliro
University of New South Wales, Sydney, Australia
Boualem Benatallah
Australian Federal Police, Canberra, Australia
Francesco Schiliro

Corresponding author

Correspondence to Amin Beheshti.

Editor information

Editors and Affiliations

University of Macau, Macau, China
Leong Hou U
Macquarie University, Sydney, NSW, Australia
Jian Yang
South China University of Technology, Guangzhou, China
Yi Cai
International Institute of Information Technology, Hyderabad, India
Kamalakar Karlapalem
Soochow University, Suzhou, China
An Liu
Hong Kong Baptist University, Hong Kong, China
Xin Huang

About this paper

Cite this paper

Beheshti, A., Benatallah, B., Sheng, Q.Z., Schiliro, F. (2020). Intelligent Knowledge Lakes: The Age of Artificial Intelligence and Big Data. In: U, L., Yang, J., Cai, Y., Karlapalem, K., Liu, A., Huang, X. (eds) Web Information Systems Engineering. WISE 2020. Communications in Computer and Information Science, vol 1155. Springer, Singapore. https://doi.org/10.1007/978-981-15-3281-8_3

DOI: https://doi.org/10.1007/978-981-15-3281-8_3
Published: 06 February 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3280-1
Online ISBN: 978-981-15-3281-8
eBook Packages: Computer Science, Computer Science (R0)
