Abstract
With the widespread use of information technology in advanced analytics, three characteristics of big data have been identified: volume, velocity and variety. The first two, volume and velocity, have received considerable attention, while far less attention has been paid to the variety of data available worldwide. Data variety refers to the nature of the data being stored and processed, which takes one of three forms: structured, semi-structured or unstructured. Current general-purpose solutions for handling data variety are either costlier than customized solutions or less efficient at coping with data heterogeneity. A basic idea, therefore, is to first design data processing systems that provide an abstraction covering a wide range of data types and that support fundamental processing over the underlying heterogeneous data. In this paper, we conceptualize a data management architecture, ‘Big DataSpace’, for big data processing with the capability to combine heterogeneous data from various data sources. Further, we explain how the Big DataSpace architecture can help in processing heterogeneous and distributed data, a fundamental task in data management.
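As an illustration of the kind of abstraction described above, the following minimal Python sketch wraps structured (CSV), semi-structured (JSON) and unstructured (plain text) inputs in one common item type and runs a single keyword search over all of them. It is not taken from the paper: the `Item` class, the `from_csv`, `from_json` and `from_text` adapters, the `keyword_search` function, and the source names and sample data are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Item:
    """Hypothetical uniform wrapper for one piece of data from any source."""
    source: str   # name of the originating data source (illustrative)
    kind: str     # "structured", "semi-structured" or "unstructured"
    fields: dict  # attribute name -> value


def from_csv(source: str, text: str) -> Iterator[Item]:
    # Structured data: every row shares the same columns.
    for row in csv.DictReader(io.StringIO(text)):
        yield Item(source, "structured", dict(row))


def from_json(source: str, text: str) -> Iterator[Item]:
    # Semi-structured data: a JSON array of objects whose keys may vary.
    for obj in json.loads(text):
        yield Item(source, "semi-structured", obj)


def from_text(source: str, text: str) -> Iterator[Item]:
    # Unstructured data: the whole document becomes a single text field.
    yield Item(source, "unstructured", {"text": text})


def keyword_search(items: Iterable[Item], keyword: str) -> List[Item]:
    # Case-insensitive substring match over every field value of every item.
    needle = keyword.lower()
    return [
        item for item in items
        if any(needle in str(value).lower() for value in item.fields.values())
    ]


# Made-up sample data from three hypothetical sources.
items = [
    *from_csv("crm.csv", "name,city\nAsha,Delhi\nRavi,Pune\n"),
    *from_json("events.json", '[{"user": "Asha", "action": "login"}]'),
    *from_text("notes.txt", "Meeting with Asha about the Pune rollout."),
]

for hit in keyword_search(items, "asha"):
    print(hit.source, hit.kind)
```

In this sketch the query code never inspects the original format; each adapter is the only place where format-specific parsing happens, which is one simple way to read the idea of supporting fundamental processing over heterogeneous sources.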
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Sheokand, V., Singh, V. (2016). Modeling Data Heterogeneity Using Big DataSpace Architecture. In: Choudhary, R., Mandal, J., Auluck, N., Nagarajaram, H. (eds) Advanced Computing and Communication Technologies. Advances in Intelligent Systems and Computing, vol 452. Springer, Singapore. https://doi.org/10.1007/978-981-10-1023-1_26
DOI: https://doi.org/10.1007/978-981-10-1023-1_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1021-7
Online ISBN: 978-981-10-1023-1
eBook Packages: Engineering, Engineering (R0)