The Orange Customer Analysis Platform

  • Raphaël Féraud
  • Marc Boullé
  • Fabrice Clérot
  • Françoise Fessant
  • Vincent Lemaire
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6171)


In itself, the continuous exponential growth of data warehouse sizes does not necessarily yield richer, finer-grained information, because processing capabilities do not increase at the same rate. Current state-of-the-art technologies require the user to strike a delicate balance between processing cost and information quality. We describe an industrial approach that leverages recent advances in processing automation and in the selection and indexing of relevant data and instances, so as to dramatically improve our ability to turn huge volumes of raw data into useful information.


Customer Relationship Management; Marketing Campaign; Indexation Table; Informative Variable; Machine Learn Research
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Raphaël Féraud (1)
  • Marc Boullé (1)
  • Fabrice Clérot (1)
  • Françoise Fessant (1)
  • Vincent Lemaire (1)

  1. Orange Labs, Lannion
