Abstract
Web data provide information and knowledge that can be used to improve a web site's content and structure. Indeed, they may contain knowledge that suggests changes that make a web site more efficient and effective at attracting and retaining visitors. Using a Data Webhouse or a web analytics solution, it is possible to store statistical information about the behaviour of users in a web site. Likewise, by applying web mining algorithms, interesting patterns can be discovered, interpreted and transformed into useful knowledge. On the other hand, web data include large quantities of irrelevant data, and complex preprocessing must be applied in order to model and understand visitor browsing behaviour. Moreover, there are many ways to preprocess web data and model browsing behaviour, so different patterns can be obtained depending on which model is used. In this sense, a knowledge representation is necessary to store and manipulate web patterns. Generally, different patterns are discovered by applying distinct web mining techniques to web data that have been treated in dissimilar ways. Consequently, pattern meta-data are relevant for manipulating the discovered knowledge. This chapter covers topics such as feature selection, web mining techniques, model characterisation and pattern management in order to build a repository that stores patterns’ meta-data: specifically, a Pattern Webhouse that facilitates knowledge management in the web environment.
Keywords
- Feature Selection
- Mean Square Error
- Association Rule
- Data Mining Model
- Minimum Description Length Principle
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rebolledo, V.L., L’Huillier, G., Velásquez, J.D. (2010). Web Pattern Extraction and Storage. In: Velásquez, J.D., Jain, L.C. (eds) Advanced Techniques in Web Intelligence - I. Studies in Computational Intelligence, vol 311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14461-5_3
DOI: https://doi.org/10.1007/978-3-642-14461-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14460-8
Online ISBN: 978-3-642-14461-5
eBook Packages: Engineering, Engineering (R0)