Abstract
As the volume of textual information in business systems and on the Internet grows rapidly, there is increasing demand for analyzing both structured data and unstructured text data. Online Analytical Processing (OLAP) is effective for analyzing and mining structured data, but it cannot handle unstructured data directly. While working on several information integration and data analysis applications, we recognized this limitation of OLAP for text data analysis and developed techniques to address it. In this paper, we propose a semi-supervised algorithm that extracts dimensions and their members from textual information so that large collections of text can be analyzed. We use straightforward measures to express the analysis results. Experimental results show that the extraction algorithm is valid and that our approach is highly scalable and flexible.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, C., Wang, X., Peng, Z. (2011). Extracting Dimensions for OLAP on Multidimensional Text Databases. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_34
DOI: https://doi.org/10.1007/978-3-642-23982-3_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer Science (R0)