Machine Learning, Volume 57, Issue 1, pp 83–113

Lessons and Challenges from Mining Retail E-Commerce Data


  • Ron Kohavi
  • Llew Mason
    • Blue Martini Software
  • Rajesh Parekh
    • Blue Martini Software
  • Zijian Zheng
    • Microsoft Corporation

DOI: 10.1023/B:MACH.0000035473.11134.83

Cite this article as:
Kohavi, R., Mason, L., Parekh, R. et al. Machine Learning (2004) 57: 83. doi:10.1023/B:MACH.0000035473.11134.83


The architecture of Blue Martini Software's e-commerce suite has supported data collection, data transformation, and data mining since its inception. With clickstreams being collected at the application-server layer, high-level events being logged, and data automatically transformed into a data warehouse using meta-data, common problems plaguing data mining using weblogs (e.g., sessionization and conflating multi-sourced data) were obviated, thus allowing us to concentrate on actual data mining goals. The paper briefly reviews the architecture and discusses many lessons learned over the last four years and the challenges that still need to be addressed. The lessons and challenges are presented across two dimensions: business-level vs. technical, and throughout the data mining lifecycle stages of data collection, data warehouse construction, business intelligence, and deployment. The lessons and challenges are also widely applicable to data mining domains outside retail e-commerce.

Keywords

data mining, data analysis, business intelligence, web analytics, web mining, OLAP, visualization, reporting, data transformations, retail, e-commerce, Simpson's paradox, sessionization, bot detection, clickstreams, application server, web logs, data cleansing, hierarchical attributes, business reporting, data warehousing

Copyright information

© Kluwer Academic Publishers 2004