Exploratory Analysis System for Semi-structured Engineering Logs

  • Michael Flaster
  • Bruce Hillyer
  • Tin Kam Ho
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)


Engineering diagnosis often involves analyzing complex records of system states printed to large, textual log files. Typically the logs are designed to accommodate the widest debugging needs without a rigorous formatting plan. As a result, critical quantities and flags are mixed with less important messages in a loose structure. Once the system is sealed, the log format cannot be changed, which causes great difficulties for the technicians who need to understand the event correlations. We describe a modular system for analyzing such logs, in which document analysis, report generation, and data exploration tools are factored into generic, reusable components and domain-dependent, isolated plug-ins. The system supports incremental, focused analysis of complicated symptoms with minimal programming effort and software installation. We discuss important concerns that set the analysis of logs apart from understanding natural language text or rigorously structured computer programs. We highlight the research challenges that would guide the development of a deep analysis system for many kinds of semi-structured documents.
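As a rough illustration of the split between generic components and domain-dependent plug-ins described above, the following Python sketch shows one way such a factoring could look. It is not the authors' implementation: the function names (make_regex_plugin, analyze), the log lines, and the patterns are all hypothetical and invented for this example.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional

# A plug-in maps one raw log line to a record, or None if the line is irrelevant to it.
Extractor = Callable[[str], Optional[Dict[str, str]]]


def make_regex_plugin(name: str, pattern: str) -> Extractor:
    """Build a domain-dependent plug-in from a regex with named groups (hypothetical helper)."""
    compiled = re.compile(pattern)

    def extract(line: str) -> Optional[Dict[str, str]]:
        match = compiled.search(line)
        if match is None:
            return None
        record = dict(match.groupdict())
        record["kind"] = name  # tag the record with the plug-in that produced it
        return record

    return extract


def analyze(lines: Iterable[str], plugins: List[Extractor]) -> List[Dict[str, str]]:
    """Generic component: offer each line to the plug-ins in order, keep the first match."""
    records: List[Dict[str, str]] = []
    for line in lines:
        for plugin in plugins:
            record = plugin(line)
            if record is not None:
                records.append(record)
                break  # unmatched lines are simply skipped
    return records


# Hypothetical log excerpt: critical values mixed with less important messages.
SAMPLE_LOG = [
    "12:00:01 boot sequence started",
    "12:00:02 TEMP sensor=3 value=71.5",
    "12:00:03 misc: heartbeat ok",
    "12:00:04 ALARM code=E42 unit=7",
]

# Hypothetical domain plug-ins; only these would change for a different log format.
PLUGINS = [
    make_regex_plugin("temperature", r"TEMP sensor=(?P<sensor>\d+) value=(?P<value>[\d.]+)"),
    make_regex_plugin("alarm", r"ALARM code=(?P<code>\w+) unit=(?P<unit>\d+)"),
]

records = analyze(SAMPLE_LOG, PLUGINS)
for record in records:
    print(record)
```

In this sketch the analyze function knows nothing about the log format, so supporting another kind of log would only mean supplying a different plug-in list.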


Keywords (added by machine, not by the authors): Document Image; Text Line; Optical Character Recognition; Text Block; Reusable Component



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Michael Flaster (1)
  • Bruce Hillyer (1)
  • Tin Kam Ho (1)
  1. Bell Laboratories, Lucent Technologies, Murray Hill, USA
