LearnPADS++: Incremental Inference of Ad Hoc Data Formats

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 7149)

Abstract

An ad hoc data source is any semi-structured, non-standard data source. The format of such data sources is often evolving and frequently lacking documentation. Consequently, off-the-shelf tools for processing such data often do not exist, forcing analysts to develop their own tools, a costly and time-consuming process. In this paper, we present an incremental algorithm that automatically infers the format of large-scale data sources. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale or streaming data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions.

Keywords

  • Edit Distance
  • Initial Description
  • Dependent Pair
  • Membership Query
  • Grammatical Inference

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, K.Q., Fisher, K., Walker, D. (2012). LearnPADS++: Incremental Inference of Ad Hoc Data Formats. In: Russo, C., Zhou, NF. (eds) Practical Aspects of Declarative Languages. PADL 2012. Lecture Notes in Computer Science, vol 7149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27694-1_13

  • DOI: https://doi.org/10.1007/978-3-642-27694-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27693-4

  • Online ISBN: 978-3-642-27694-1

  • eBook Packages: Computer Science, Computer Science (R0)