Employing Inductive Databases in Concrete Applications
In this paper we present the application of the inductive database approach to two practical analytical case studies: Web usage mining on Web logs and the analysis of financial data. For the Web domain, we considered enriched XML Web logs, which we call conceptual logs, produced by specific Web applications. These applications were built using a conceptual model, WebML, and its accompanying CASE tool, WebRatio. The conceptual logs integrate the usual information about user requests with meta-data concerning the Web site structure. For the financial domain, we considered the Dow Jones stock exchange index and studied its component stocks from 1997 to 2002 using so-called technical analysis. Technical analysis consists in identifying the relevant (graphical) patterns that occur in the plot of a stock quote as it evolves over time, often at different time granularities. The plots highlight correlations between distinctive variables of the stock quote, such as the quote trend, the percentage variation, and the volume of stocks exchanged. In particular, we adopted candle-sticks, a figurative pattern that condenses into a single diagram the evolution of a stock quote during one trading day. Practitioners of technical analysis have frequently used candle-sticks to predict the trend of stock quotes in the market.
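As a rough illustration of what a candle-stick condenses, the following Python sketch derives the usual geometric features from one day's opening, highest, lowest, and closing quotes. The class name, the field names, and the "white"/"black"/"doji" labels are assumptions chosen for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candlestick:
    """One trading day of a stock quote (hypothetical field names)."""
    open: float
    high: float
    low: float
    close: float

    @property
    def body(self) -> float:
        # Height of the candle body: distance between open and close.
        return abs(self.close - self.open)

    @property
    def upper_shadow(self) -> float:
        # Line above the body, up to the day's highest quote.
        return self.high - max(self.open, self.close)

    @property
    def lower_shadow(self) -> float:
        # Line below the body, down to the day's lowest quote.
        return min(self.open, self.close) - self.low

    @property
    def colour(self) -> str:
        # Assumed labelling: white for a rising day, black for a
        # falling day, doji when open and close coincide.
        if self.close > self.open:
            return "white"
        if self.close < self.open:
            return "black"
        return "doji"


day = Candlestick(open=10.0, high=12.0, low=9.5, close=11.0)
print(day.colour, day.body, day.upper_shadow, day.lower_shadow)
```

A candle-stick typology, as used in the analysis, could then be defined on top of these features, for instance by comparing the body with the shadows; the thresholds for doing so are not specified here.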
We then apply a data mining language, namely MINE RULE, to these data in order to identify different types of patterns. For Web data, the most important patterns include recurrent navigation paths, the most frequently visited page contents, and anomalies such as intrusion attempts or harmful usage of resources. For the financial domain, we searched for sets of stocks that frequently exhibited a positive daily exchange on the same days, which can serve as a collection of quotes for building a customer's portfolio; for the candle-sticks frequently associated with certain stocks; and finally for the most similar stocks, meaning those that mostly presented the same type of candle-stick on the same dates, that is, the same behaviour over time.
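The first financial pattern, sets of stocks that rise on the same days, can be sketched in plain Python as a frequent-itemset count. This is not MINE RULE syntax and not the paper's implementation: it is a brute-force enumeration under assumed names (`frequent_rising_sets`, `min_support`, `max_size`) and with invented ticker symbols and values.

```python
from collections import Counter
from itertools import combinations


def frequent_rising_sets(daily_changes, min_support, max_size=3):
    """Return groups of stocks that rose together on enough days.

    daily_changes: dict mapping a date to a dict of
                   ticker -> percentage variation for that day.
    min_support:   minimum fraction of days on which the whole
                   group must show a positive variation.
    max_size:      largest group size to enumerate (assumption,
                   keeps the brute-force search small).
    """
    n_days = len(daily_changes)
    counts = Counter()
    for changes in daily_changes.values():
        # Stocks with a positive daily exchange on this date.
        rising = sorted(t for t, pct in changes.items() if pct > 0)
        for size in range(2, max_size + 1):
            for group in combinations(rising, size):
                counts[group] += 1
    return {
        group: count / n_days
        for group, count in counts.items()
        if count / n_days >= min_support
    }


# Hypothetical data: three invented tickers over four days.
days = {
    "d1": {"AAA": 1.2, "BBB": 0.4, "CCC": -0.3},
    "d2": {"AAA": 0.5, "BBB": 0.1, "CCC": 0.9},
    "d3": {"AAA": -0.2, "BBB": 0.3, "CCC": 0.7},
    "d4": {"AAA": 0.8, "BBB": 0.6, "CCC": -1.0},
}
print(frequent_rising_sets(days, min_support=0.5))
```

Enumerating every combination is only reasonable for a small universe such as the components of a single index; a declarative mining query would leave the choice of search strategy to the underlying engine.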
The purpose of this paper is to show that exploiting the nuggets of information embedded in the data, together with the specialised mining constructs provided by the query languages, enables rapid customization of the mining procedures according to the users' needs. Based on our experience, we also claim that the use of queries in advanced languages, as opposed to ad-hoc heuristics, eases the specification and discovery of a large spectrum of patterns.