Inferring Versioned Schemas from NoSQL Databases and Its Applications

  • Conference paper
Conceptual Modeling (ER 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9381)

Abstract

While the concept of database schema plays a central role in relational database systems, most NoSQL systems are schemaless: these databases are created without having to formally define their schema. Instead, the schema is implicit in the stored data. This lack of schema definition offers greater flexibility; more specifically, schemaless databases ease both the recording of non-uniform data and data evolution. However, this comes at the cost of losing some of the benefits provided by schemas. In this article, an MDE-based reverse engineering approach for inferring the schema of aggregate-oriented NoSQL databases is presented. We show how the obtained schemas can be used to build database utilities that tackle some of the problems encountered when using implicit schemas: a schema diagram viewer and a data validator generator are presented.

Work partially supported by the Cátedra SAES of the University of Murcia (http://www.catedrasaes.org), a research lab sponsored by the SAES company (http://www.electronica-submarina.com/).

Author information

Corresponding author

Correspondence to Diego Sevilla Ruiz.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Sevilla Ruiz, D., Morales, S.F., García Molina, J. (2015). Inferring Versioned Schemas from NoSQL Databases and Its Applications. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science, vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_35

  • DOI: https://doi.org/10.1007/978-3-319-25264-3_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25263-6

  • Online ISBN: 978-3-319-25264-3

  • eBook Packages: Computer Science, Computer Science (R0)
