Analyzer: A Framework for File Analysis

  • Martin Svoboda
  • Jakub Stárka
  • Jan Sochna
  • Jiří Schejbal
  • Irena Mlýnková
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6193)


This paper introduces Analyzer, a complete framework for performing statistical analyses of real-world documents. Exploiting the results of such analyses is a classical way to optimize data processing in many areas. Although this intent is legitimate, ad hoc and dedicated analyses soon become obsolete, are usually built on insufficiently extensive collections, and are difficult to repeat. Analyzer is an easily extensible framework that helps the user gather documents, manage analyses, and browse computed reports. In particular, the paper discusses the proposed analysis model, standard application usage and features, and the basic aspects of the Analyzer architecture and implementation.


File Analysis; Analyzer Architecture; Extensible Framework; Mime Type; Propose Analysis Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Martin Svoboda (1)
  • Jakub Stárka (1)
  • Jan Sochna (1)
  • Jiří Schejbal (1)
  • Irena Mlýnková (1)
  1. Department of Software Engineering, Charles University in Prague, Czech Republic
