Abstract
At the heart of today's information-explosion problems are issues involving semantics, mutual understanding, concept matching, and interoperability. Ontologies and the Semantic Web are offered as a potential solution, but creating ontologies for real-world knowledge is nontrivial. If we could automate the process, we could significantly improve our chances of making the Semantic Web a reality. While understanding natural language is difficult, tables and other structured information make it easier to interpret new items and relations. In this paper we introduce TANGO (Table ANalysis for Generating Ontologies), an approach to generating ontologies based on table analysis. Building on conceptual-modeling extraction techniques, TANGO attempts to (i) understand a table's structure and conceptual content; (ii) discover the constraints that hold between concepts extracted from the table; (iii) match the recognized concepts with ones from a more general specification of related concepts; and (iv) merge the resulting structure with other similar knowledge representations. TANGO is thus a formalized method of processing the format and content of tables that can serve to incrementally build a relevant, reusable conceptual ontology.
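To make the four steps (i) to (iv) concrete, the following is a minimal Python sketch of such a pipeline on a toy table. It is an illustration only and not the TANGO implementation: the sample table, the `GENERAL` vocabulary, the triple representation, and every function name are assumptions introduced here. Constraint discovery is reduced to checking which single-column functional dependencies hold in the data, and concept matching to a plain label lookup.

```python
from itertools import permutations

# Hypothetical toy table (header row plus data rows); not taken from the paper.
TABLE = [
    ["Country", "Capital", "Continent"],
    ["France", "Paris", "Europe"],
    ["Italy", "Rome", "Europe"],
    ["Japan", "Tokyo", "Asia"],
]

# Hypothetical "more general specification": general concept -> accepted labels.
GENERAL = {
    "Nation": {"country", "nation"},
    "City": {"capital", "city"},
    "Region": {"continent", "region"},
}


def understand_table(table):
    """Step (i): treat header cells as concepts and columns as their values."""
    header, *rows = table
    return {name: [row[i] for row in rows] for i, name in enumerate(header)}


def discover_constraints(columns):
    """Step (ii): list functional dependencies A -> B that hold in the data."""
    deps = []
    for a, b in permutations(columns, 2):
        seen = {}  # value of A -> first value of B observed with it
        if all(seen.setdefault(x, y) == y for x, y in zip(columns[a], columns[b])):
            deps.append((a, b))
    return deps


def match_concepts(concepts, general):
    """Step (iii): map each table concept to a general concept by label lookup."""
    matches = {}
    for concept in concepts:
        for general_name, labels in general.items():
            if concept.lower() in labels:
                matches[concept] = general_name
    return matches


def merge(deps, matches, existing):
    """Step (iv): rename concepts to their matches and union with existing triples."""
    renamed = {(matches.get(a, a), "determines", matches.get(b, b)) for a, b in deps}
    return existing | renamed


columns = understand_table(TABLE)
deps = discover_constraints(columns)
matches = match_concepts(columns, GENERAL)
ontology = merge(deps, matches, {("Nation", "has", "Population")})
```

In this sketch `Country` determines `Capital`, but `Continent` does not determine `Country`, because `Europe` appears with two different countries. A dependency that holds in one small table is only a candidate constraint; a real system would need more evidence before asserting it in an ontology.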
Cite this article
Tijerino, Y.A., Embley, D.W., Lonsdale, D.W. et al. Towards Ontology Generation from Tables. World Wide Web 8, 261–285 (2005). https://doi.org/10.1007/s11280-005-0360-8
DOI: https://doi.org/10.1007/s11280-005-0360-8