Abstract
Principles for constructing systems that generate DTDs from a collection of XML documents are discussed. Methods and algorithms for creating DTDs are developed, and a DTD generation system for a collection of XML documents is built. The system can be used effectively both for solving applied problems and for theoretical studies.
Translated from Programmirovanie, Vol. 31, No. 4, 2005.
Original Russian Text Copyright © 2005 by Leonov, Khusnutdinov.
Leonov, A.V., Khusnutdinov, R.R. Study and Development of the DTD Generation System for XML Documents. Program Comput Soft 31, 197–210 (2005). https://doi.org/10.1007/s11086-005-0032-6