Model for Automatic Textual Data Clustering in Relational Databases Schema

Yafooz, Wael M.S.; Abidin, Siti Z.Z.; Omar, Nasiroh; Halim, Rosenah A.

doi:10.1007/978-981-4585-18-7_4

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 285)

Abstract

In the last two decades, unstructured information has become a major challenge in information management. This challenge is caused by the massive and growing amount of information that results from the conversion of almost all daily tasks into digital format. Tools and applications are needed to organize unstructured information, which can also be found within structured data, such as in relational database management systems (RDBMS). An RDBMS has robust and powerful structures for managing, organizing, and retrieving data. However, structured data still contains unstructured information. In this paper, the methods used for managing unstructured data in RDBMS are investigated. In addition, an incremental and dynamic repository for managing unstructured data in relational databases is introduced. The proposed technique organizes unstructured information through semantics-based linkages among textual data. Furthermore, it gives users a clear picture of the unstructured information. The proposed technique can rapidly and easily obtain useful data, and thus it can be applied in numerous domains, particularly those that deal with textual data, such as news articles.

Acknowledgments

The authors wish to thank Universiti Teknologi MARA (UiTM) for the financial support. This work was supported in part by grant number 600-RMI-/DANA 5/3/RIF (498/2012).

Author information

Authors and Affiliations

Faculty of Computer and Mathematical Sciences, UiTM, Shah Alam, Selangor, Malaysia
Wael M.S. Yafooz, Siti Z.Z. Abidin, Nasiroh Omar & Rosenah A. Halim

Corresponding author

Correspondence to Wael M.S. Yafooz.

Editor information

Editors and Affiliations

Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Tutut Herawan
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Mustafa Mat Deris
School of Information Technology, Deakin University, Burwood, Victoria, Australia
Jemal Abawajy

About this paper

Cite this paper

Yafooz, W.M., Abidin, S.Z., Omar, N., Halim, R.A. (2014). Model for Automatic Textual Data Clustering in Relational Databases Schema. In: Herawan, T., Deris, M., Abawajy, J. (eds) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). Lecture Notes in Electrical Engineering, vol 285. Springer, Singapore. https://doi.org/10.1007/978-981-4585-18-7_4

DOI: https://doi.org/10.1007/978-981-4585-18-7_4
Published: 15 December 2013
Publisher Name: Springer, Singapore
Print ISBN: 978-981-4585-17-0
Online ISBN: 978-981-4585-18-7
eBook Packages: Engineering, Engineering (R0)
