Abstract
The similarities between data cubes and multi-dimensional tables have long been noted. OLAP reporting tools routinely produce multi-dimensional tables from data cubes. In this paper, we develop a scheme that performs the reverse transformation automatically, so that charts can be produced directly from multi-dimensional tables using standard OLAP data visualization tools. In the process, we develop several new techniques for table processing: (i) extraction of non-overlapping hierarchies from a table; (ii) extraction of metadata from the table title via natural language processing; and (iii) integration of the tables in a table series, as well as of tables that share common dimensions. Experiments were conducted on some 800 summary tables from Statistics Canada, and our success rate was greater than 90% on the tables tested.
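The abstract describes the table-to-cube transformation only at a high level. As a rough illustration of the general idea, and not the authors' algorithm, the sketch below assumes a hypothetical input format: row headers given as `(indent_level, label)` pairs whose indentation encodes one non-overlapping hierarchy, a flat list of column headers, and a grid of cell values. It turns each row into a root-to-node path, keeps only leaf rows (treating parent rows as subtotals that a cube can recompute by rolling up), and emits `(row_path, column, value)` fact tuples. The names `row_paths`, `table_to_facts`, and `roll_up` are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation (an assumption, not the paper's):
# a row header is (indent_level, label); deeper indentation = child node.
RowHeader = Tuple[int, str]
Path = Tuple[str, ...]
Fact = Tuple[Path, str, float]


def row_paths(headers: Sequence[RowHeader]) -> List[Path]:
    """Convert indented row labels into root-to-node label paths."""
    stack: List[RowHeader] = []
    paths: List[Path] = []
    for indent, label in headers:
        # Pop siblings and deeper nodes until the stack top is this row's parent.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, label))
        paths.append(tuple(lbl for _, lbl in stack))
    return paths


def is_leaf(paths: List[Path], i: int) -> bool:
    """A row is a leaf unless the next row is one of its descendants."""
    if i + 1 >= len(paths):
        return True
    nxt, cur = paths[i + 1], paths[i]
    return not (len(nxt) > len(cur) and nxt[: len(cur)] == cur)


def table_to_facts(
    row_headers: Sequence[RowHeader],
    col_headers: Sequence[str],
    cells: Sequence[Sequence[Optional[float]]],
) -> List[Fact]:
    """Flatten a table with a row hierarchy into cube-style fact tuples."""
    paths = row_paths(row_headers)
    facts: List[Fact] = []
    for i, path in enumerate(paths):
        if not is_leaf(paths, i):
            continue  # subtotal row: assumed recomputable by rolling up children
        for col, value in zip(col_headers, cells[i]):
            if value is not None:  # skip empty cells
                facts.append((path, col, value))
    return facts


def roll_up(facts: List[Fact], prefix: Path, col: str) -> float:
    """Sum the facts under a hierarchy node for one column (a simple roll-up)."""
    return sum(v for p, c, v in facts if c == col and p[: len(prefix)] == prefix)


if __name__ == "__main__":
    rows = [(0, "Canada"), (1, "Ontario"), (1, "Quebec")]
    cols = ["2010", "2011"]
    grid = [[300, 310], [200, 205], [100, 105]]
    facts = table_to_facts(rows, cols, grid)
    for fact in facts:
        print(fact)
    print(roll_up(facts, ("Canada",), "2010"))
```

Dropping subtotal rows keeps the fact table free of double counting; the design assumes the table is summarizable, so that a parent total equals the sum of its children. With the toy data above, rolling up the leaf facts under `("Canada",)` for `"2010"` reproduces the dropped total of 300.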
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alrayes, N., Luk, WS. (2012). Automatic Transformation of Multi-dimensional Web Tables into Data Cubes. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2012. Lecture Notes in Computer Science, vol 7448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32584-7_7
DOI: https://doi.org/10.1007/978-3-642-32584-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32583-0
Online ISBN: 978-3-642-32584-7
eBook Packages: Computer Science, Computer Science (R0)