Employing Inductive Databases in Concrete Applications

  • Rosa Meo
  • Pier Luca Lanzi
  • Maristella Matera
  • Danilo Careggio
  • Roberto Esposito
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3848)

Abstract

In this paper we present the application of the inductive database approach to two practical analytical case studies: Web usage mining on Web logs and the analysis of financial data. For the Web domain, we considered enriched XML Web logs, which we call conceptual logs, produced by specific Web applications. These applications were built using a conceptual model, namely WebML, and its accompanying CASE tool, WebRatio. The conceptual logs integrate the usual information about user requests with meta-data concerning the Web site structure. For the financial domain, we considered the Dow Jones stock exchange index and studied its component stocks from 1997 to 2002 using so-called technical analysis. Technical analysis consists in identifying the relevant (graphical) patterns that occur in the plot of a stock quote over time, often at different time granularities. The plots highlight correlations between distinctive variables of the stock quote, such as the quote trend, the percentage variation and the volume of stocks exchanged. In particular, we adopted candle-sticks, a figurative pattern that condenses into a single diagram the evolution of a stock quote during a trading day. Practitioners of technical analysis have frequently used candle-sticks to predict the trend of stock quotes in the market.
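As an illustration of how a candle-stick condenses one trading day, the following minimal Python sketch derives the body and shadows from the daily open, high, low and close quotes. It is not taken from the paper: the `Candle` class, the `classify` function, the labels "white", "black" and "doji", and the `doji_ratio` threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candle:
    """One trading day of a stock quote (hypothetical structure)."""
    open: float
    high: float
    low: float
    close: float

    @property
    def body(self) -> float:
        # Distance between opening and closing quote.
        return abs(self.close - self.open)

    @property
    def upper_shadow(self) -> float:
        # Distance from the top of the body to the daily maximum.
        return self.high - max(self.open, self.close)

    @property
    def lower_shadow(self) -> float:
        # Distance from the daily minimum to the bottom of the body.
        return min(self.open, self.close) - self.low


def classify(c: Candle, doji_ratio: float = 0.1) -> str:
    """Assign a coarse candle type; the threshold is an assumed value."""
    full_range = c.high - c.low
    # A very small body relative to the daily range is labelled "doji".
    if full_range == 0 or c.body <= doji_ratio * full_range:
        return "doji"
    # "white" when the quote closed above its opening, "black" otherwise.
    return "white" if c.close > c.open else "black"
```

A coarse labelling like this turns each (stock, date) pair into a categorical value, which is the kind of attribute an association rule query can then work on.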

We then apply a data mining language, namely MINE RULE, to these data in order to identify different types of patterns. For the Web data, the most important patterns include recurrent navigation paths, the most frequently visited page contents, and anomalies such as intrusion attempts or harmful usage of resources. For the financial domain, we searched for sets of stocks that frequently exhibited a positive daily exchange on the same days, so as to form a collection of quotes for building the customers’ portfolio; for the candle-sticks frequently associated with certain stocks; and finally for the most similar stocks, meaning those that mostly presented the same type of candle-stick on the same dates, that is, the same behaviour over time.
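Two of the financial patterns described above can be sketched in plain Python. This is only a brute-force illustration under stated assumptions, not the MINE RULE operator or its query syntax: the function names, the input layout (a mapping from date to per-stock percentage variation, and a mapping from date to candle type) and the `min_support` and `max_size` parameters are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_positive_sets(daily_changes, min_support, max_size=3):
    """Return stock sets that were jointly positive on enough days.

    daily_changes: dict mapping a date to a dict {ticker: variation}.
    min_support: minimum fraction of days on which the whole set was positive.
    """
    n_days = len(daily_changes)
    if n_days == 0:
        return {}
    counts = Counter()
    for changes in daily_changes.values():
        # Stocks with a positive variation on this day, in a stable order.
        positive = sorted(t for t, v in changes.items() if v > 0)
        # Count every subset of 2..max_size stocks (exhaustive enumeration).
        for size in range(2, max_size + 1):
            counts.update(combinations(positive, size))
    return {
        stocks: count / n_days
        for stocks, count in counts.items()
        if count / n_days >= min_support
    }


def candle_agreement(types_a, types_b):
    """Fraction of shared dates on which two stocks show the same candle type.

    types_a, types_b: dicts mapping a date to a candle type label.
    """
    common = types_a.keys() & types_b.keys()
    if not common:
        return 0.0
    same = sum(types_a[d] == types_b[d] for d in common)
    return same / len(common)
```

The exhaustive enumeration is acceptable for an index with a few dozen component stocks and small set sizes; a dedicated mining engine would prune candidates by support instead of counting all of them.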

The purpose of this paper is to show that exploiting the nuggets of information embedded in the data, together with the specialised mining constructs provided by the query languages, enables the rapid customization of the mining procedures according to the users’ needs. Based on our experience, we also claim that the use of queries in advanced languages, as opposed to ad-hoc heuristics, eases the specification and the discovery of a large spectrum of patterns.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Rosa Meo (1)
  • Pier Luca Lanzi (2)
  • Maristella Matera (2)
  • Danilo Careggio (1)
  • Roberto Esposito (1)

  1. Dipartimento di Informatica, Università di Torino, Torino, Italy
  2. Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
