Trends in Database Research
Database research has played a vital role in the development of information system technology, and its results have been highly successful, creating great potential for industry. The database industry generates billions of dollars of business annually. It has been estimated that the database industry generated $42 billion in revenue in 2000 and that this market is growing at 11% annually. Within the software industry, it is second only to operating system software.
The evolution of database systems started in the late 1960s, when the hierarchical and network data models were developed, but these models did not gain much momentum because they were not suited to complex applications. In the early 1970s, Ted Codd proposed the relational data model (for which he received the Turing Award), which became the backbone of database application development. Although this model was criticized by the COBOL/CODASYL group, it became very popular because of its simplicity. In the late 1970s and early 1980s, most research focused on developing the fundamentals of relational database theory, query languages, transaction management, and query optimization. In the late 1980s and early 1990s, the database research community grew rapidly and made several breakthroughs in areas such as object-oriented, active, deductive, and parallel and distributed databases. The research ideas in these areas have been implemented successfully in the products of various database vendors. In the early 1990s, most businesses realized that their markets were becoming very competitive and that they needed sophisticated tools to analyze their business data, customer profiles, and product information in order to improve their marketing strategy and organizational management. Data mining and data warehousing technology was developed to satisfy such needs.
Data warehousing is a recent technology that allows information to be accessed easily and efficiently for decision-making activities. Knowledge discovery in data warehouses focuses on the extraction of interesting and previously unknown knowledge. Researchers and application developers have designed knowledge discovery systems for a number of application domains, including finance, health, telecommunications, and marketing. The data warehouse stores information of interest to the enterprise from multiple data sources and presents it to the end user in an integrated manner. This eager, or in-advance, approach to data integration and query processing over distributed data sources pays rich dividends when it translates into calculated decisions backed by sound analysis. The information stored in the data warehouse facilitates decision-making activities. On-Line Analytical Processing (OLAP) tools provide an environment for decision making and business modeling by supporting ad hoc queries. The multidimensional data model has proved to be the most suitable for OLAP applications. One issue that is not yet fully addressed is how to define a set of constraints on this model that can be exploited in every area of warehouse implementation.
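To make the multidimensional model concrete, the following minimal sketch shows a roll-up, the basic OLAP aggregation that sums a measure over the dimensions a query does not ask for. The dimension names, the fact values, and the `roll_up` helper are all illustrative assumptions, not part of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table: each cell is addressed by one value per dimension
# (product, region, quarter) and holds a single measure (sales amount).
# All names and numbers here are made up for illustration.
DIMENSIONS = ("product", "region", "quarter")
FACTS = {
    ("book", "east", "Q1"): 120,
    ("book", "west", "Q1"): 80,
    ("book", "east", "Q2"): 150,
    ("cd", "east", "Q1"): 60,
    ("cd", "west", "Q2"): 90,
}


def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coordinates, measure in facts.items():
        group = tuple(coordinates[i] for i in positions)
        totals[group] += measure
    return dict(totals)


by_product = roll_up(FACTS, ["product"])
by_region_quarter = roll_up(FACTS, ["region", "quarter"])
grand_total = roll_up(FACTS, [])

print(by_product)   # {('book',): 350, ('cd',): 150}
print(grand_total)  # {(): 500}
```

An ad hoc query in this setting amounts to choosing which dimensions to keep; a real warehouse would precompute or index many of these aggregates rather than scanning the facts each time.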
There are many other areas of active research in data warehousing and knowledge discovery, such as integrating active rules in warehouse data, view selection and maintenance, multiple query optimization using views, update filtering, on-line view maintenance, fragmentation of multidimensional databases, parallel processing, the summarizability problem, data expiry, data indexing, instance-based data mining, finding emerging patterns in data cubes, and security based on data mining. Some of these issues have been addressed adequately in the literature.
The growth of the Internet has dramatically changed the way in which information is managed and accessed. The WWW is a distributed global information resource that contains a large amount of uncontrolled data (HTML or XML documents) relevant to essentially all domains of human activity. To manage and access the data available on the web, there is a pressing need for effective and efficient tools for information consumers. Users must be able to easily locate the information they require on the web, ranging from unstructured documents and pictures to structured, record-oriented data. Even when the information is found, it is generally scattered in a piecemeal fashion. An early effort to build a virtual warehouse of book data, giving information consumers a single point of contact, was made by Amazon (www.amazon.com), which now has multi-million dollar revenue from the sale of books on the Internet. It maintains a warehouse of data about the books available on the Internet and gives users one integrated source for finding books of interest.
Several web database system architectures have been proposed: LORE, WHOWEDA, FLORID, and W3QS. These systems retrieve and manipulate semistructured data by supporting web query languages. For instance, in W3QS a user can specify content and structure queries on the web and can maintain the results of queries as database views of the web. These systems allow users to access any part of web data and manipulate it.
The enormous flexibility provided by this ubiquitous data reservoir, however, has created a number of problems related to information modeling, efficient search, indexing, query language support, document clustering, information sharing, web mining (content, structure, and log/usage files), and security. Some of these problems, especially information organization, security, and the credibility of data on the WWW, become more complicated because information with varied processing requirements is continuously added to the web. With the increased usage of the WWW, criteria such as accuracy, objectivity, coverage, authority, and ownership are becoming useful for evaluating a web page, depending on the purpose.
Another important problem that has not yet been addressed is how to exploit the relationships among web pages. The web pages retrieved by a search engine on a given topic are related to one another. A few examples of such relationships are Next-to, Previous-to, Similar-to, Example-of, Derived-from, Same-as, and Part-of. Determining these relationships among static web pages is very important, so that users can find information in a more organized way and follow links according to the type of relationship.
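As a rough illustration of what following links by relationship type could look like once such relationships are known, the sketch below stores typed links and walks one relationship from a starting page. It does not determine the relationships, which is the open problem described above; the page names, the `LINKS` data, and the `build_index` and `follow` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical typed links between pages: (source, relationship, target).
# The relationship labels come from the examples in the text; the pages are made up.
LINKS = [
    ("intro.html", "Next-to", "joins.html"),
    ("joins.html", "Next-to", "views.html"),
    ("joins.html", "Example-of", "intro.html"),
    ("views.html", "Part-of", "tutorial.html"),
]


def build_index(links):
    """Group link targets by (source page, relationship type)."""
    index = defaultdict(list)
    for source, relation, target in links:
        index[(source, relation)].append(target)
    return index


def follow(index, start, relation):
    """Return the pages reached from `start` by repeatedly following `relation`."""
    path = []
    seen = {start}
    current = start
    while True:
        candidates = index.get((current, relation), [])
        # Take the first unvisited target so that cycles cannot loop forever.
        next_page = next((page for page in candidates if page not in seen), None)
        if next_page is None:
            return path
        path.append(next_page)
        seen.add(next_page)
        current = next_page


index = build_index(LINKS)
reading_order = follow(index, "intro.html", "Next-to")
print(reading_order)  # ['joins.html', 'views.html']
```

With links typed this way, a user could ask for the reading order of a topic (Next-to) separately from its containing collection (Part-of), instead of treating every hyperlink alike.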
Agent technology is very useful for solving many web-related problems. Several web agents have been designed for specific purposes, such as a link validation and repair agent, a negotiation and bargaining agent, a web page quality control agent, an unauthorized access agent, and an aggregate result formulator agent. Recently, web researchers have begun to focus on developing web caching schemes. Caching is important in the World Wide Web because the data and the number of users on the web are increasing exponentially, far outpacing the increase in network bandwidth. The caching schemes developed so far primarily use past data usage (e.g., how often the data has been accessed) to predict future access patterns. Some caching schemes support content-awareness to improve cache efficiency as well as information sharing among users.
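The usage-based idea can be sketched with a small cache that counts past accesses and, when full, evicts the entry that has been accessed least often. This is a generic least-frequently-used policy written for illustration, not the scheme of any system cited here; the `FrequencyCache` class and the example URLs are assumptions.

```python
class FrequencyCache:
    """Tiny cache that evicts the least frequently accessed key when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.values = {}  # key -> cached object
        self.hits = {}    # key -> number of past accesses (puts and gets)

    def get(self, key):
        """Return the cached value, or None on a miss; a hit counts as an access."""
        if key not in self.values:
            return None
        self.hits[key] += 1
        return self.values[key]

    def put(self, key, value):
        """Store a value, evicting the least frequently accessed key if needed."""
        if key not in self.values and len(self.values) >= self.capacity:
            # Ties go to the earliest-inserted key, because dicts keep
            # insertion order and min() returns the first minimum it sees.
            victim = min(self.hits, key=self.hits.get)
            del self.values[victim]
            del self.hits[victim]
        self.values[key] = value
        self.hits[key] = self.hits.get(key, 0) + 1


cache = FrequencyCache(capacity=2)
cache.put("/a", "page A")
cache.put("/b", "page B")
cache.get("/a")            # "/a" now has two accesses, "/b" has one
cache.put("/c", "page C")  # cache is full, so "/b" is evicted
```

A content-aware scheme would add further signals to the eviction decision, such as what a document is about, rather than relying on access counts alone.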
One of the greatest obstacles to the widespread adoption of e-commerce is concern about system security. With the growing use of the web, securing web-based systems is now an important business decision. Many projects are developing techniques (web materialized views, mobile agents, data mining techniques) to deal with this problem.
We have studied trends in database research by analyzing the tracks of papers accepted at the DEXA conference over the last 10 years. The DEXA (Database and Expert Systems Applications) conference started in 1990 with the aim of bringing together researchers and practitioners from the database and expert system areas to discuss research issues and experience in developing and deploying advanced database systems. Since then, it has become a forum for exchanging ideas and publishing research papers from these areas. As time passes, new technologies emerge, and DEXA has recognized a few prime areas of database research, such as Data Warehousing, Data Mining, E-Commerce, and Web Databases. Two separate conferences were started in these areas (DaWaK: Data Warehousing and Knowledge Discovery; EC-Web: E-Commerce and Web Technologies), with their roots in databases and knowledge bases.
From the graph, we observe that the number of papers in the fields of OODB, Spatial, Parallel, and Temporal DB has remained steady, whereas the number of papers in Distributed, Heterogeneous, Multimedia, and Video DB has increased steadily.
Keywords: Data Warehousing, Cache Scheme, Semistructured Data, Relational Data Model, Distribute Data Source
- 1. Microsoft Data Mining and Knowledge Discovery. http://www.research.microsoft.com/datamine/.
- 2. Konopnicki D. and Shmueli O. W3QS: A query system for the World Wide Web. In Proc. of International Conference on Very Large Databases, pages 54–65, 1995.
- 3. U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery: An overview. In U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, editors, Advances in Knowledge Discovery and Data Mining, pages 1–34. AAAI Press, Menlo Park, California, 1996.
- 4. W.H. Inmon. Building the Data Warehouse. QED Publishing Group, Boston, Massachusetts, 1992.
- 5. Cheng Kai, Kambayashi Yahiko, and Mohania Mukesh. Using database technology to improve performance of web proxy servers. In Proceedings of the Fourth International Workshop on the Web and Databases (WebDB), pages 73–78, 2001.
- 6. Bertram L., Rainer H., Georg L., Wolfgang M., and Christian S. Managing semistructured data with FLORID: A deductive object-oriented perspective. Information Systems, 23(8), 1998.
- 7. Mohania Mukesh, Samtani S., Roddick J.F., and Y. Kambayashi. Advances and research directions in data warehousing technology. Australian Journal of Information Systems, 1999.
- 8. McHugh J. and Abiteboul S., Goldman R. and Dallan Q., and Widom J. Lore: A Database Management System for Semistructured Data. http://www-db.stanford.edu/lore, 1994.
- 9. Michael Stonebraker and Joseph M. Hellerstein, editors. Readings in Database Systems. Morgan Kaufmann Publishers, 1998.
- 10. Thalhammer Thomas, Schrefl Michael, and Mohania Mukesh. Active data warehouses: Complementing OLAP with active rules. Journal of Data and Knowledge Engineering, to appear.
- 11. Ng Wee-Keong, Lim Ee-Peng, Bhowmick S., and Madria S. Web warehousing: Design and issues. In Proc. of International Workshop on Data Warehousing and Data Mining, LNCS 1552, 1998.
- 12. Jennifer Widom. Research problems in data warehousing. In Proc. Fourth Intl. Conference on Information and Knowledge Management, 1995.