Model for Automatic Textual Data Clustering in Relational Databases Schema

  • Wael M.S. Yafooz
  • Siti Z.Z. Abidin
  • Nasiroh Omar
  • Rosenah A. Halim
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 285)


Over the last two decades, unstructured information has become a major challenge in information management. The challenge stems from the massive and growing volume of information produced as almost all daily tasks have moved to digital form. Tools and applications are needed to organize this unstructured information, which also appears inside structured data stores such as relational database management systems (RDBMS). An RDBMS provides robust and powerful structures for managing, organizing, and retrieving data; however, the structured data it holds still contains unstructured information, such as free-text columns. In this paper, the methods used for managing unstructured data in RDBMS are investigated. In addition, an incremental and dynamic repository for managing unstructured data in relational databases is introduced. The proposed technique organizes unstructured information by linking textual data according to its semantics, giving users a clear overview of that information. It allows useful data to be obtained quickly and easily, and can therefore be applied in numerous domains, particularly those that deal with textual data, such as news articles.


Keywords: Relational databases, Unstructured data, Document clustering, Query efficiency, Textual data



The authors wish to thank Universiti Teknologi MARA (UiTM) for the financial support. This work was supported in part by grant number 600-RMI-/DANA 5/3/RIF (498/2012).



Copyright information

© Springer Science+Business Media Singapore 2014

Authors and Affiliations

  • Wael M.S. Yafooz (1)
  • Siti Z.Z. Abidin (1)
  • Nasiroh Omar (1)
  • Rosenah A. Halim (1)
  1. Faculty of Computer and Mathematical Sciences, UiTM, Selangor, Malaysia
