Discovering Implicit Schemas in JSON Data

  • Javier Luis Cánovas Izquierdo
  • Jordi Cabot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7977)

Abstract

JSON has become a very popular lightweight format for data exchange. JSON is human readable and easy for computers to parse and use. However, JSON is schemaless. Though this brings some benefits (e.g., flexibility in the representation of the data), it can become a problem when consuming and integrating data from different JSON services, since developers need to be aware of the structure of the schemaless data. We believe that a mechanism to discover (and visualize) the implicit schema of the JSON data would greatly facilitate the creation and usage of JSON services. For instance, this would help developers understand the links between a set of services belonging to the same domain or API. To this end, we propose a model-based approach to generate the underlying schema of a set of JSON documents.
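
To make the notion of an implicit schema concrete, the following is a minimal sketch that derives a structural description from a handful of JSON documents and merges the descriptions into one. It is not the model-based approach proposed in the paper, only a simple structural merge; the function names (infer_schema, merge, discover_schema), the schema layout, and the sample documents are all hypothetical and chosen for illustration.

```python
import json


def infer_schema(value):
    """Describe the structure of a single parsed JSON value."""
    if isinstance(value, dict):
        return {"type": "object",
                "fields": {key: infer_schema(val) for key, val in value.items()}}
    if isinstance(value, list):
        items = None
        for element in value:
            items = merge(items, infer_schema(element))
        return {"type": "array", "items": items}
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    return {"type": "null"}


def merge(a, b):
    """Combine two structural descriptions into one that covers both."""
    if a is None:
        return b
    if b is None:
        return a
    if a["type"] != b["type"]:
        # Conflicting types are collapsed into a catch-all (an assumption).
        return {"type": "any"}
    if a["type"] == "object":
        keys = sorted(set(a["fields"]) | set(b["fields"]))
        return {"type": "object",
                "fields": {k: merge(a["fields"].get(k), b["fields"].get(k))
                           for k in keys}}
    if a["type"] == "array":
        return {"type": "array", "items": merge(a["items"], b["items"])}
    return a


def discover_schema(documents):
    """Merge the structures of several JSON texts into one description."""
    schema = None
    for text in documents:
        schema = merge(schema, infer_schema(json.loads(text)))
    return schema


# Hypothetical documents, as two services of the same API might return them.
docs = [
    '{"id": 1, "name": "Stop A", "lines": [{"number": 12}]}',
    '{"id": 2, "name": "Stop B", "location": {"lat": 47.2, "lon": -1.5}}',
]
schema = discover_schema(docs)
print(json.dumps(schema, indent=2, sort_keys=True))
```

The merged description contains every field seen in any document, which is the kind of overview a developer would otherwise have to assemble by reading sample responses by hand. A fuller treatment would also record which fields are optional and how objects from different services relate to each other.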

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Javier Luis Cánovas Izquierdo (1)
  • Jordi Cabot (1)
  1. AtlanMod, École des Mines de Nantes - INRIA - LINA, Nantes, France
