A Short Tour of the Predictive Modeling Process

Abstract

To begin Part I of this work, we present a simple example that illustrates the broad concepts of model building. Section 2.1 provides an overview of a fuel economy data set for which the objective is to predict vehicles' fuel economy based on standard vehicle predictors such as engine displacement, number of cylinders, type of transmission, and manufacturer. In the context of this example, we explain the concepts of “spending” data, estimating model performance, building candidate models, and selecting the optimal model (Section 2.2).
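The workflow outlined above, "spending" data on a training/test split, estimating performance on the held-out samples, fitting candidate models, and selecting the one with the best estimated performance, can be illustrated with a minimal, self-contained sketch. The simulated one-predictor data set and the two candidate models below are hypothetical stand-ins for illustration only, not the chapter's actual fuel economy data or models:

```python
import random

random.seed(42)

# Simulate a small fuel-economy-like data set: one predictor (say, engine
# displacement) and a noisy linear response (fuel economy).
n = 100
x = [random.uniform(1.0, 6.0) for _ in range(n)]
y = [50.0 - 5.0 * xi + random.gauss(0, 2.0) for xi in x]

# "Spend" the data: hold out 20% as a test set so performance estimates
# come from samples the models never saw during fitting.
split = int(0.8 * n)
x_tr, y_tr = x[:split], y[:split]
x_te, y_te = x[split:], y[split:]

def rmse(pred, actual):
    """Root mean squared error between predictions and observed values."""
    return (sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)) ** 0.5

# Candidate model 1: intercept-only baseline (predict the training mean).
mean_y = sum(y_tr) / len(y_tr)
rmse_mean = rmse([mean_y] * len(y_te), y_te)

# Candidate model 2: simple least-squares line fit on the training set.
mx = sum(x_tr) / len(x_tr)
my = sum(y_tr) / len(y_tr)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x_tr, y_tr))
         / sum((xi - mx) ** 2 for xi in x_tr))
intercept = my - slope * mx
rmse_line = rmse([intercept + slope * xi for xi in x_te], y_te)

# Model selection: keep the candidate with the lower held-out error.
best = "linear" if rmse_line < rmse_mean else "mean"
print(best, round(rmse_mean, 2), round(rmse_line, 2))
```

In practice one would use resampling (e.g., cross-validation) rather than a single split to estimate performance, which is exactly the kind of refinement the chapter goes on to discuss.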


Notes

  1.

    One of our graduate professors once said “the only way to be comfortable with your data is to never look at it.”

References

  • Abdi H, Williams L (2010). “Principal Component Analysis.” Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.

    Article  Google Scholar 

  • Agresti A (2002). Categorical Data Analysis. Wiley–Interscience.

    Google Scholar 

  • Ahdesmaki M, Strimmer K (2010). “Feature Selection in Omics Prediction Problems Using CAT Scores and False Nondiscovery Rate Control.” The Annals of Applied Statistics, 4(1), 503–519.

    Article  MathSciNet  MATH  Google Scholar 

  • Alin A (2009). “Comparison of PLS Algorithms when Number of Objects is Much Larger than Number of Variables.” Statistical Papers, 50, 711–720.

    Article  MathSciNet  MATH  Google Scholar 

  • Altman D, Bland J (1994). “Diagnostic Tests 3: Receiver Operating Characteristic Plots.” British Medical Journal, 309(6948), 188.

    Article  Google Scholar 

  • Ambroise C, McLachlan G (2002). “Selection Bias in Gene Extraction on the Basis of Microarray Gene–Expression Data.” Proceedings of the National Academy of Sciences, 99(10), 6562–6566.

    Article  MATH  Google Scholar 

  • Amit Y, Geman D (1997). “Shape Quantization and Recognition with Randomized Trees.” Neural Computation, 9, 1545–1588.

    Article  Google Scholar 

  • Armitage P, Berry G (1994). Statistical Methods in Medical Research. Blackwell Scientific Publications, Oxford, 3rd edition.

    Google Scholar 

  • Artis M, Ayuso M, Guillen M (2002). “Detection of Automobile Insurance Fraud with Discrete Choice Models and Misclassified Claims.” The Journal of Risk and Insurance, 69(3), 325–340.

    Article  Google Scholar 

  • Austin P, Brunner L (2004). “Inflation of the Type I Error Rate When a Continuous Confounding Variable Is Categorized in Logistic Regression Analyses.” Statistics in Medicine, 23(7), 1159–1178.

    Article  Google Scholar 

  • Ayres I (2007). Super Crunchers: Why Thinking–By–Numbers Is The New Way To Be Smart. Bantam.

    Google Scholar 

  • Barker M, Rayens W (2003). “Partial Least Squares for Discrimination.” Journal of Chemometrics, 17(3), 166–173.

    Article  Google Scholar 

  • Batista G, Prati R, Monard M (2004). “A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data.” ACM SIGKDD Explorations Newsletter, 6(1), 20–29.

    Article  Google Scholar 

  • Bauer E, Kohavi R (1999). “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants.” Machine Learning, 36, 105–142.

    Article  Google Scholar 

  • Becton Dickinson and Company (1991). ProbeTec ET Chlamydia trachomatis and Neisseria gonorrhoeae Amplified DNA Assays (Package Insert).

    Google Scholar 

  • Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z (2000). “Tissue Classification with Gene Expression Profiles.” Journal of Computational Biology, 7(3), 559–583.

    Article  Google Scholar 

  • Bentley J (1975). “Multidimensional Binary Search Trees Used for Associative Searching.” Communications of the ACM, 18(9), 509–517.

    Article  MathSciNet  MATH  Google Scholar 

  • Berglund A, Kettaneh N, Uppgård L, Wold S, DR NB, Cameron (2001). “The GIFI Approach to Non–Linear PLS Modeling.” Journal of Chemometrics, 15, 321–336.

    Google Scholar 

  • Berglund A, Wold S (1997). “INLR, Implicit Non–Linear Latent Variable Regression.” Journal of Chemometrics, 11, 141–156.

    Article  Google Scholar 

  • Bergmeir C, Benitez JM (2012). “Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS.” Journal of Statistical Software, 46(7), 1–26.

    Article  Google Scholar 

  • Bergstra J, Casagrande N, Erhan D, Eck D, Kégl B (2006). “Aggregate Features and AdaBoost for Music Classification.” Machine Learning, 65, 473–484.

    Article  Google Scholar 

  • Berntsson P, Wold S (1986). “Comparison Between X-ray Crystallographic Data and Physiochemical Parameters with Respect to Their Information About the Calcium Channel Antagonist Activity of 4-Phenyl-1,4-Dihydropyridines.” Quantitative Structure-Activity Relationships, 5, 45–50.

    Article  Google Scholar 

  • Bhanu B, Lin Y (2003). “Genetic Algorithm Based Feature Selection for Target Detection in SAR Images.” Image and Vision Computing, 21, 591–608.

    Article  Google Scholar 

  • Bishop C (1995). Neural Networks for Pattern Recognition. Oxford University Press, Oxford.

    MATH  Google Scholar 

  • Bishop C (2006). Pattern Recognition and Machine Learning. Springer.

    Google Scholar 

  • Bland J, Altman D (1995). “Statistics Notes: Multiple Significance Tests: The Bonferroni Method.” British Medical Journal, 310(6973), 170–170.

    Article  Google Scholar 

  • Bland J, Altman D (2000). “The Odds Ratio.” British Medical Journal, 320(7247), 1468.

    Article  Google Scholar 

  • Bohachevsky I, Johnson M, Stein M (1986). “Generalized Simulated Annealing for Function Optimization.” Technometrics, 28(3), 209–217.

    Article  MATH  Google Scholar 

  • Bone R, Balk R, Cerra F, Dellinger R, Fein A, Knaus W, Schein R, Sibbald W (1992). “Definitions for Sepsis and Organ Failure and Guidelines for the Use of Innovative Therapies in Sepsis.” Chest, 101(6), 1644–1655.

    Article  Google Scholar 

  • Boser B, Guyon I, Vapnik V (1992). “A Training Algorithm for Optimal Margin Classifiers.” In “Proceedings of the Fifth Annual Workshop on Computational Learning Theory,” pp. 144–152.

    Google Scholar 

  • Boulesteix A, Strobl C (2009). “Optimal Classifier Selection and Negative Bias in Error Rate Estimation: An Empirical Study on High–Dimensional Prediction.” BMC Medical Research Methodology, 9(1), 85.

    Article  Google Scholar 

  • Box G, Cox D (1964). “An Analysis of Transformations.” Journal of the Royal Statistical Society. Series B (Methodological), pp. 211–252.

    Google Scholar 

  • Box G, Hunter W, Hunter J (1978). Statistics for Experimenters. Wiley, New York.

    MATH  Google Scholar 

  • Box G, Tidwell P (1962). “Transformation of the Independent Variables.” Technometrics, 4(4), 531–550.

    Article  MathSciNet  MATH  Google Scholar 

  • Breiman L (1996a). “Bagging Predictors.” Machine Learning, 24(2), 123–140.

    MathSciNet  MATH  Google Scholar 

  • Breiman L (1996b). “Heuristics of Instability and Stabilization in Model Selection.” The Annals of Statistics, 24(6), 2350–2383.

    Article  MathSciNet  MATH  Google Scholar 

  • Breiman L (1996c). “Technical Note: Some Properties of Splitting Criteria.” Machine Learning, 24(1), 41–47.

    MathSciNet  MATH  Google Scholar 

  • Breiman L (1998). “Arcing Classifiers.” The Annals of Statistics, 26, 123–140.

    MathSciNet  MATH  Google Scholar 

  • Breiman L (2000). “Randomizing Outputs to Increase Prediction Accuracy.” Mach. Learn., 40, 229–242. ISSN 0885-6125.

    Google Scholar 

  • Breiman L (2001). “Random Forests.” Machine Learning, 45, 5–32.

    Article  MATH  Google Scholar 

  • Breiman L, Friedman J, Olshen R, Stone C (1984). Classification and Regression Trees. Chapman and Hall, New York.

    MATH  Google Scholar 

  • Bridle J (1990). “Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition.” In “Neurocomputing: Algorithms, Architectures and Applications,” pp. 227–236. Springer–Verlag.

    Google Scholar 

  • Brillinger D (2004). “Some Data Analyses Using Mutual Information.” Brazilian Journal of Probability and Statistics, 18(6), 163–183.

    MathSciNet  MATH  Google Scholar 

  • Brodnjak-Vonina D, Kodba Z, Novi M (2005). “Multivariate Data Analysis in Classification of Vegetable Oils Characterized by the Content of Fatty Acids.” Chemometrics and Intelligent Laboratory Systems, 75(1), 31–43.

    Article  Google Scholar 

  • Brown C, Davis H (2006). “Receiver Operating Characteristics Curves and Related Decision Measures: A Tutorial.” Chemometrics and Intelligent Laboratory Systems, 80(1), 24–38.

    Article  Google Scholar 

  • Bu G (2009). “Apolipoprotein E and Its Receptors in Alzheimer’s Disease: Pathways, Pathogenesis and Therapy.” Nature Reviews Neuroscience, 10(5), 333–344.

    Article  Google Scholar 

  • Buckheit J, Donoho DL (1995). “WaveLab and Reproducible Research.” In A Antoniadis, G Oppenheim (eds.), “Wavelets in Statistics,” pp. 55–82. Springer-Verlag, New York.

    Google Scholar 

  • Burez J, Van den Poel D (2009). “Handling Class Imbalance In Customer Churn Prediction.” Expert Systems with Applications, 36(3), 4626–4636.

    Google Scholar 

  • Cancedda N, Gaussier E, Goutte C, Renders J (2003). “Word–Sequence Kernels.” The Journal of Machine Learning Research, 3, 1059–1082.

    MathSciNet  MATH  Google Scholar 

  • Caputo B, Sim K, Furesjo F, Smola A (2002). “Appearance–Based Object Recognition Using SVMs: Which Kernel Should I Use?” In “Proceedings of NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision,”.

    Google Scholar 

  • Carolin C, Boulesteix A, Augustin T (2007). “Unbiased Split Selection for Classification Trees Based on the Gini Index.” Computational Statistics & Data Analysis, 52(1), 483–501.

    Article  MathSciNet  MATH  Google Scholar 

  • Castaldi P, Dahabreh I, Ioannidis J (2011). “An Empirical Assessment of Validation Practices for Molecular Classifiers.” Briefings in Bioinformatics, 12(3), 189–202.

    Article  Google Scholar 

  • Chambers J (2008). Software for Data Analysis: Programming with R. Springer.

    Google Scholar 

  • Chan K, Loh W (2004). “LOTUS: An Algorithm for Building Accurate and Comprehensible Logistic Regression Trees.” Journal of Computational and Graphical Statistics, 13(4), 826–852.

    Article  MathSciNet  Google Scholar 

  • Chang CC, Lin CJ (2011). “LIBSVM: A Library for Support Vector Machines.” ACM Transactions on Intelligent Systems and Technology, 2, 27: 1–27:27.

    Google Scholar 

  • Chawla N, Bowyer K, Hall L, Kegelmeyer W (2002). “SMOTE: Synthetic Minority Over–Sampling Technique.” Journal of Artificial Intelligence Research, 16(1), 321–357.

    MATH  Google Scholar 

  • Chun H, Keleş S (2010). “Sparse Partial Least Squares Regression for Simultaneous Dimension Reduction and Variable Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(1), 3–25.

    Article  MathSciNet  Google Scholar 

  • Chung D, Keles S (2010). “Sparse Partial Least Squares Classification for High Dimensional Data.” Statistical Applications in Genetics and Molecular Biology, 9(1), 17.

    Article  MathSciNet  MATH  Google Scholar 

  • Clark R (1997). “OptiSim: An Extended Dissimilarity Selection Method for Finding Diverse Representative Subsets.” Journal of Chemical Information and Computer Sciences, 37(6), 1181–1188.

    Article  Google Scholar 

  • Clark T (2004). “Can Out–of–Sample Forecast Comparisons Help Prevent Overfitting?” Journal of Forecasting, 23(2), 115–139.

    Article  Google Scholar 

  • Clemmensen L, Hastie T, Witten D, Ersboll B (2011). “Sparse Discriminant Analysis.” Technometrics, 53(4), 406–413.

    Article  MathSciNet  Google Scholar 

  • Cleveland W (1979). “Robust Locally Weighted Regression and Smoothing Scatterplots.” Journal of the American Statistical Association, 74(368), 829–836.

    Article  MathSciNet  MATH  Google Scholar 

  • Cleveland W, Devlin S (1988). “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting.” Journal of the American Statistical Association, pp. 596–610.

    Google Scholar 

  • Cohen G, Hilario M, Pellegrini C, Geissbuhler A (2005). “SVM Modeling via a Hybrid Genetic Strategy. A Health Care Application.” In R Engelbrecht, AGC Lovis (eds.), “Connecting Medical Informatics and Bio–Informatics,” pp. 193–198. IOS Press.

    Google Scholar 

  • Cohen J (1960). “A Coefficient of Agreement for Nominal Data.” Educational and Psychological Measurement, 20, 37–46.

    Article  Google Scholar 

  • Cohn D, Atlas L, Ladner R (1994). “Improving Generalization with Active Learning.” Machine Learning, 15(2), 201–221.

    Google Scholar 

  • Cornell J (2002). Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data. Wiley, New York, NY.

    Book  MATH  Google Scholar 

  • Cortes C, Vapnik V (1995). “Support–Vector Networks.” Machine Learning, 20(3), 273–297.

    MATH  Google Scholar 

  • Costa N, Lourenco J, Pereira Z (2011). “Desirability Function Approach: A Review and Performance Evaluation in Adverse Conditions.” Chemometrics and Intelligent Lab Systems, 107(2), 234–244.

    Google Scholar 

  • Cover TM, Thomas JA (2006). Elements of Information Theory. Wiley–Interscience.

    Google Scholar 

  • Craig-Schapiro R, Kuhn M, Xiong C, Pickering E, Liu J, Misko TP, Perrin R, Bales K, Soares H, Fagan A, Holtzman D (2011). “Multiplexed Immunoassay Panel Identifies Novel CSF Biomarkers for Alzheimer’s Disease Diagnosis and Prognosis.” PLoS ONE, 6(4), e18850.

    Article  Google Scholar 

  • Cruz-Monteagudo M, Borges F, Cordeiro MND (2011). “Jointly Handling Potency and Toxicity of Antimicrobial Peptidomimetics by Simple Rules from Desirability Theory and Chemoinformatics.” Journal of Chemical Information and Modeling, 51(12), 3060–3077.

    Article  Google Scholar 

  • Davison M (1983). Multidimensional Scaling. John Wiley and Sons, Inc.

    MATH  Google Scholar 

  • Dayal B, MacGregor J (1997). “Improved PLS Algorithms.” Journal of Chemometrics, 11, 73–85.

    Article  Google Scholar 

  • de Jong S (1993). “SIMPLS: An Alternative Approach to Partial Least Squares Regression.” Chemometrics and Intelligent Laboratory Systems, 18, 251–263.

    Google Scholar 

  • de Jong S, Ter Braak C (1994). “Short Communication: Comments on the PLS Kernel Algorithm.” Journal of Chemometrics, 8, 169–174.

    Google Scholar 

  • de Leon M, Klunk W (2006). “Biomarkers for the Early Diagnosis of Alzheimer’s Disease.” The Lancet Neurology, 5(3), 198–199.

    Google Scholar 

  • Defernez M, Kemsley E (1997). “The Use and Misuse of Chemometrics for Treating Classification Problems.” TrAC Trends in Analytical Chemistry, 16(4), 216–221.

    Article  Google Scholar 

  • DeLong E, DeLong D, Clarke-Pearson D (1988). “Comparing the Areas Under Two Or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach.” Biometrics, 44(3), 837–45.

    Google Scholar 

  • Derksen S, Keselman H (1992). “Backward, Forward and Stepwise Automated Subset Selection Algorithms: Frequency of Obtaining Authentic and Noise Variables.” British Journal of Mathematical and Statistical Psychology, 45(2), 265–282.

    Article  Google Scholar 

  • Derringer G, Suich R (1980). “Simultaneous Optimization of Several Response Variables.” Journal of Quality Technology, 12(4), 214–219.

    Google Scholar 

  • Dietterich T (2000). “An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization.” Machine Learning, 40, 139–158.

    Article  Google Scholar 

  • Dillon W, Goldstein M (1984). Multivariate Analysis: Methods and Applications. Wiley, New York.

    MATH  Google Scholar 

  • Dobson A (2002). An Introduction to Generalized Linear Models. Chapman & Hall/CRC.

    Google Scholar 

  • Drucker H, Burges C, Kaufman L, Smola A, Vapnik V (1997). “Support Vector Regression Machines.” Advances in Neural Information Processing Systems, pp. 155–161.

    Google Scholar 

  • Drummond C, Holte R (2000). “Explicitly Representing Expected Cost: An Alternative to ROC Representation.” In “Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,” pp. 198–207.

    Chapter  Google Scholar 

  • Duan K, Keerthi S (2005). “Which is the Best Multiclass SVM Method? An Empirical Study.” Multiple Classifier Systems, pp. 278–285.

    Google Scholar 

  • Dudoit S, Fridlyand J, Speed T (2002). “Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data.” Journal of the American Statistical Association, 97(457), 77–87.

    Article  MathSciNet  MATH  Google Scholar 

  • Duhigg C (2012). “How Companies Learn Your Secrets.” The New York Times. URL http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html.

  • Dunn W, Wold S (1990). “Pattern Recognition Techniques in Drug Design.” In C Hansch, P Sammes, J Taylor (eds.), “Comprehensive Medicinal Chemistry,” pp. 691–714. Pergamon Press, Oxford.

    Google Scholar 

  • Dwyer D (2005). “Examples of Overfitting Encountered When Building Private Firm Default Prediction Models.” Technical report, Moody’s KMV.

    Google Scholar 

  • Efron B (1983). “Estimating the Error Rate of a Prediction Rule: Improvement on Cross–Validation.” Journal of the American Statistical Association, pp. 316–331.

    Google Scholar 

  • Efron B, Hastie T, Johnstone I, Tibshirani R (2004). “Least Angle Regression.” The Annals of Statistics, 32(2), 407–499.

    Article  MathSciNet  MATH  Google Scholar 

  • Efron B, Tibshirani R (1986). “Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy.” Statistical Science, pp. 54–75.

    Google Scholar 

  • Efron B, Tibshirani R (1997). “Improvements on Cross–Validation: The 632+ Bootstrap Method.” Journal of the American Statistical Association, 92(438), 548–560.

    MathSciNet  MATH  Google Scholar 

  • Eilers P, Boer J, van Ommen G, van Houwelingen H (2001). “Classification of Microarray Data with Penalized Logistic Regression.” In “Proceedings of SPIE,” volume 4266, p. 187.

    Google Scholar 

  • Eugster M, Hothorn T, Leisch F (2008). “Exploratory and Inferential Analysis of Benchmark Experiments.” Ludwigs-Maximilians-Universität München, Department of Statistics, Tech. Rep, 30.

    Google Scholar 

  • Everitt B, Landau S, Leese M, Stahl D (2011). Cluster Analysis. Wiley.

    Google Scholar 

  • Ewald B (2006). “Post Hoc Choice of Cut Points Introduced Bias to Diagnostic Research.” Journal of clinical epidemiology, 59(8), 798–801.

    Article  Google Scholar 

  • Fanning K, Cogger K (1998). “Neural Network Detection of Management Fraud Using Published Financial Data.” International Journal of Intelligent Systems in Accounting, Finance & Management, 7(1), 21–41.

    Article  Google Scholar 

  • Faraway J (2005). Linear Models with R. Chapman & Hall/CRC, Boca Raton.

    MATH  Google Scholar 

  • Fawcett T (2006). “An Introduction to ROC Analysis.” Pattern Recognition Letters, 27(8), 861–874.

    Article  MathSciNet  Google Scholar 

  • Fisher R (1936). “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, 7(2), 179–188.

    Article  Google Scholar 

  • Forina M, Casale M, Oliveri P, Lanteri S (2009). “CAIMAN brothers: A Family of Powerful Classification and Class Modeling Techniques.” Chemometrics and Intelligent Laboratory Systems, 96(2), 239–245.

    Article  Google Scholar 

  • Frank E, Wang Y, Inglis S, Holmes G (1998). “Using Model Trees for Classification.” Machine Learning.

    Google Scholar 

  • Frank E, Witten I (1998). “Generating Accurate Rule Sets Without Global Optimization.” Proceedings of the Fifteenth International Conference on Machine Learning, pp. 144–151.

    Google Scholar 

  • Free Software Foundation (June 2007). GNU General Public License.

    Google Scholar 

  • Freund Y (1995). “Boosting a Weak Learning Algorithm by Majority.” Information and Computation, 121, 256–285.

    Article  MathSciNet  MATH  Google Scholar 

  • Freund Y, Schapire R (1996). “Experiments with a New Boosting Algorithm.” Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148–156.

    Google Scholar 

  • Friedman J (1989). “Regularized Discriminant Analysis.” Journal of the American Statistical Association, 84(405), 165–175.

    Article  MathSciNet  Google Scholar 

  • Friedman J (1991). “Multivariate Adaptive Regression Splines.” The Annals of Statistics, 19(1), 1–141.

    Article  MathSciNet  MATH  Google Scholar 

  • Friedman J (2001). “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 29(5), 1189–1232.

    Article  MathSciNet  MATH  Google Scholar 

  • Friedman J (2002). “Stochastic Gradient Boosting.” Computational Statistics and Data Analysis, 38(4), 367–378.

    Article  MathSciNet  MATH  Google Scholar 

  • Friedman J, Hastie T, Tibshirani R (2000). “Additive Logistic Regression: A Statistical View of Boosting.” Annals of Statistics, 38, 337–374.

    Article  MathSciNet  MATH  Google Scholar 

  • Friedman J, Hastie T, Tibshirani R (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1–22.

    Article  Google Scholar 

  • Geisser S (1993). Predictive Inference: An Introduction. Chapman and Hall.

    Google Scholar 

  • Geladi P, Kowalski B (1986). “Partial Least-Squares Regression: A Tutorial.” Analytica Chimica Acta, 185, 1–17.

    Article  Google Scholar 

  • Geladi P, Manley M, Lestander T (2003). “Scatter Plotting in Multivariate Data Analysis.” Journal of Chemometrics, 17(8–9), 503–511.

    Article  Google Scholar 

  • Gentleman R (2008). R Programming for Bioinformatics. CRC Press.

    Google Scholar 

  • Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber M, Iacus S, Irizarry R, Leisch F, Li C, Mächler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5(10), R80.

    Article  Google Scholar 

  • Giuliano K, DeBiasio R, Dunlay R, Gough A, Volosky J, Zock J, Pavlakis G, Taylor D (1997). “High–Content Screening: A New Approach to Easing Key Bottlenecks in the Drug Discovery Process.” Journal of Biomolecular Screening, 2(4), 249–259.

    Article  Google Scholar 

  • Goldberg D (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison–Wesley, Boston.

    MATH  Google Scholar 

  • Golub G, Heath M, Wahba G (1979). “Generalized Cross–Validation as a Method for Choosing a Good Ridge Parameter.” Technometrics, 21(2), 215–223.

    Article  MathSciNet  MATH  Google Scholar 

  • Good P (2000). Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer.

    Google Scholar 

  • Gowen A, Downey G, Esquerre C, O’Donnell C (2010). “Preventing Over–Fitting in PLS Calibration Models of Near-Infrared (NIR) Spectroscopy Data Using Regression Coefficients.” Journal of Chemometrics, 25, 375–381.

    Article  Google Scholar 

  • Graybill F (1976). Theory and Application of the Linear Model. Wadsworth & Brooks, Pacific Grove, CA.

    MATH  Google Scholar 

  • Guo Y, Hastie T, Tibshirani R (2007). “Regularized Linear Discriminant Analysis and its Application in Microarrays.” Biostatistics, 8(1), 86–100.

    Article  MATH  Google Scholar 

  • Gupta S, Hanssens D, Hardie B, Kahn W, Kumar V, Lin N, Ravishanker N, Sriram S (2006). “Modeling Customer Lifetime Value.” Journal of Service Research, 9(2), 139–155.

    Article  Google Scholar 

  • Guyon I, Elisseeff A (2003). “An Introduction to Variable and Feature Selection.” The Journal of Machine Learning Research, 3, 1157–1182.

    MATH  Google Scholar 

  • Guyon I, Weston J, Barnhill S, Vapnik V (2002). “Gene Selection for Cancer Classification Using Support Vector Machines.” Machine Learning, 46(1), 389–422.

    Article  MATH  Google Scholar 

  • Hall M, Smith L (1997). “Feature Subset Selection: A Correlation Based Filter Approach.” International Conference on Neural Information Processing and Intelligent Information Systems, pp. 855–858.

    Google Scholar 

  • Hall P, Hyndman R, Fan Y (2004). “Nonparametric Confidence Intervals for Receiver Operating Characteristic Curves.” Biometrika, 91, 743–750.

    Article  MathSciNet  MATH  Google Scholar 

  • Hampel H, Frank R, Broich K, Teipel S, Katz R, Hardy J, Herholz K, Bokde A, Jessen F, Hoessler Y (2010). “Biomarkers for Alzheimer’s Disease: Academic, Industry and Regulatory Perspectives.” Nature Reviews Drug Discovery, 9(7), 560–574.

    Article  Google Scholar 

  • Hand D, Till R (2001). “A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems.” Machine Learning, 45(2), 171–186.

    Article  MATH  Google Scholar 

  • Hanley J, McNeil B (1982). “The Meaning and Use of the Area under a Receiver Operating (ROC) Curvel Characteristic.” Radiology, 143(1), 29–36.

    Article  Google Scholar 

  • Hardle W, Werwatz A, Müller M, Sperlich S, Hardle W, Werwatz A, Müller M, Sperlich S (2004). “Nonparametric Density Estimation.” In “Nonparametric and Semiparametric Models,” pp. 39–83. Springer Berlin Heidelberg.

    Google Scholar 

  • Harrell F (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, New York.

    Book  MATH  Google Scholar 

  • Hastie T, Pregibon D (1990). “Shrinking Trees.” Technical report, AT&T Bell Laboratories Technical Report.

    Google Scholar 

  • Hastie T, Tibshirani R (1990). Generalized Additive Models. Chapman & Hall/CRC.

    Google Scholar 

  • Hastie T, Tibshirani R (1996). “Discriminant Analysis by Gaussian Mixtures.” Journal of the Royal Statistical Society. Series B, pp. 155–176.

    Google Scholar 

  • Hastie T, Tibshirani R, Buja A (1994). “Flexible Discriminant Analysis by Optimal Scoring.” Journal of the American Statistical Association, 89(428), 1255–1270.

    Article  MathSciNet  MATH  Google Scholar 

  • Hastie T, Tibshirani R, Friedman J (2008). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2 edition.

    Google Scholar 

  • Hawkins D (2004). “The Problem of Overfitting.” Journal of Chemical Information and Computer Sciences, 44(1), 1–12.

    Article  Google Scholar 

  • Hawkins D, Basak S, Mills D (2003). “Assessing Model Fit by Cross–Validation.” Journal of Chemical Information and Computer Sciences, 43(2), 579–586.

    Article  Google Scholar 

  • Henderson H, Velleman P (1981). “Building Multiple Regression Models Interactively.” Biometrics, pp. 391–411.

    Google Scholar 

  • Hesterberg T, Choi N, Meier L, Fraley C (2008). “Least Angle and L 1 Penalized Regression: A Review.” Statistics Surveys, 2, 61–93.

    Article  MathSciNet  MATH  Google Scholar 

  • Heyman R, Slep A (2001). “The Hazards of Predicting Divorce Without Cross-validation.” Journal of Marriage and the Family, 63(2), 473.

    Article  Google Scholar 

  • Hill A, LaPan P, Li Y, Haney S (2007). “Impact of Image Segmentation on High–Content Screening Data Quality for SK–BR-3 Cells.” BMC Bioinformatics, 8(1), 340.

    Article  Google Scholar 

  • Ho T (1998). “The Random Subspace Method for Constructing Decision Forests.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 340–354.

    Google Scholar 

  • Hoerl A (1970). “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics, 12(1), 55–67.

    Article  MathSciNet  MATH  Google Scholar 

  • Holland J (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.

    Google Scholar 

  • Holland J (1992). Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA.

    Google Scholar 

  • Holmes G, Hall M, Frank E (1993). “Generating Rule Sets from Model Trees.” In “Australian Joint Conference on Artificial Intelligence,”.

    Google Scholar 

  • Hothorn T, Hornik K, Zeileis A (2006). “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics, 15(3), 651–674.

    Article  MathSciNet  Google Scholar 

  • Hothorn T, Leisch F, Zeileis A, Hornik K (2005). “The Design and Analysis of Benchmark Experiments.” Journal of Computational and Graphical Statistics, 14(3), 675–699.

    Article  MathSciNet  Google Scholar 

  • Hsieh W, Tang B (1998). “Applying Neural Network Models to Prediction and Data Analysis in Meteorology and Oceanography.” Bulletin of the American Meteorological Society, 79(9), 1855–1870.

    Article  Google Scholar 

  • Hsu C, Lin C (2002). “A Comparison of Methods for Multiclass Support Vector Machines.” IEEE Transactions on Neural Networks, 13(2), 415–425.

    Article  Google Scholar 

  • Huang C, Chang B, Cheng D, Chang C (2012). “Feature Selection and Parameter Optimization of a Fuzzy-Based Stock Selection Model Using Genetic Algorithms.” International Journal of Fuzzy Systems, 14(1), 65–75.

    MathSciNet  Google Scholar 

  • Huuskonen J (2000). “Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology.” Journal of Chemical Information and Computer Sciences, 40(3), 773–777.

    Article  Google Scholar 

  • Ihaka R, Gentleman R (1996). “R: A Language for Data Analysis and Graphics.” Journal of Computational and Graphical Statistics, 5(3), 299–314.

    Google Scholar 

  • Jeatrakul P, Wong K, Fung C (2010). “Classification of Imbalanced Data By Combining the Complementary Neural Network and SMOTE Algorithm.” Neural Information Processing. Models and Applications, pp. 152–159.

    Google Scholar 

  • Jerez J, Molina I, Garcia-Laencina P, Alba R, Ribelles N, Martin M, Franco L (2010). “Missing Data Imputation Using Statistical and Machine Learning Methods in a Real Breast Cancer Problem.” Artificial Intelligence in Medicine, 50, 105–115.

    Article  Google Scholar 

  • John G, Kohavi R, Pfleger K (1994). “Irrelevant Features and the Subset Selection Problem.” Proceedings of the Eleventh International Conference on Machine Learning, 129, 121–129.

    Google Scholar 

  • Johnson K, Rayens W (2007). “Modern Classification Methods for Drug Discovery.” In A Dmitrienko, C Chuang-Stein, R D’Agostino (eds.), “Pharmaceutical Statistics Using SAS: A Practical Guide,” pp. 7–43. Cary, NC: SAS Institute Inc.

    Google Scholar 

  • Johnson R, Wichern D (2001). Applied Multivariate Statistical Analysis. Prentice Hall.

    Google Scholar 

  • Jolliffe I, Trendafilov N, Uddin M (2003). “A Modified Principal Component Technique Based on the lasso.” Journal of Computational and Graphical Statistics, 12(3), 531–547.

  • Kansy M, Senner F, Gubernator K (1998). “Physiochemical High Throughput Screening: Parallel Artificial Membrane Permeation Assay in the Description of Passive Absorption Processes.” Journal of Medicinal Chemistry, 41, 1007–1010.

  • Karatzoglou A, Smola A, Hornik K, Zeileis A (2004). “kernlab - An S4 Package for Kernel Methods in R.” Journal of Statistical Software, 11(9), 1–20.

  • Kearns M, Valiant L (1989). “Cryptographic Limitations on Learning Boolean Formulae and Finite Automata.” In “Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing.”

  • Kim J, Basak J, Holtzman D (2009). “The Role of Apolipoprotein E in Alzheimer’s Disease.” Neuron, 63(3), 287–303.

  • Kim JH (2009). “Estimating Classification Error Rate: Repeated Cross–Validation, Repeated Hold–Out and Bootstrap.” Computational Statistics & Data Analysis, 53(11), 3735–3745.

  • Kimball A (1957). “Errors of the Third Kind in Statistical Consulting.” Journal of the American Statistical Association, 52, 133–142.

  • Kira K, Rendell L (1992). “The Feature Selection Problem: Traditional Methods and a New Algorithm.” Proceedings of the National Conference on Artificial Intelligence, pp. 129–134.

  • Kline DM, Berardi VL (2005). “Revisiting Squared–Error and Cross–Entropy Functions for Training Neural Network Classifiers.” Neural Computing and Applications, 14(4), 310–318.

  • Kohavi R (1995). “A Study of Cross–Validation and Bootstrap for Accuracy Estimation and Model Selection.” International Joint Conference on Artificial Intelligence, 14, 1137–1145.

  • Kohavi R (1996). “Scaling Up the Accuracy of Naive–Bayes Classifiers: A Decision–Tree Hybrid.” In “Proceedings of the Second International Conference on Knowledge Discovery and Data Mining,” volume 7.

  • Kohonen T (1995). Self–Organizing Maps. Springer.

  • Kononenko I (1994). “Estimating Attributes: Analysis and Extensions of Relief.” In F Bergadano, L De Raedt (eds.), “Machine Learning: ECML–94,” volume 784, pp. 171–182. Springer Berlin / Heidelberg.

  • Kuhn M (2008). “Building Predictive Models in R Using the caret Package.” Journal of Statistical Software, 28(5).

  • Kuhn M (2010). “The caret Package Homepage.” URL http://caret.r-forge.r-project.org/.

  • Kuiper S (2008). “Introduction to Multiple Regression: How Much Is Your Car Worth?” Journal of Statistics Education, 16(3).

  • Kvålseth T (1985). “Cautionary Note About R².” American Statistician, 39(4), 279–285.

  • Lachiche N, Flach P (2003). “Improving Accuracy and Cost of Two–Class and Multi–Class Probabilistic Classifiers using ROC Curves.” In “Proceedings of the Twentieth International Conference on Machine Learning,” volume 20, pp. 416–424.

  • Larose D (2006). Data Mining Methods and Models. Wiley.

  • Lavine B, Davidson C, Moores A (2002). “Innovative Genetic Algorithms for Chemoinformatics.” Chemometrics and Intelligent Laboratory Systems, 60(1), 161–171.

  • Leach A, Gillet V (2003). An Introduction to Chemoinformatics. Springer.

  • Leisch F (2002a). “Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis.” In W Härdle, B Rönz (eds.), “Compstat 2002 — Proceedings in Computational Statistics,” pp. 575–580. Physica Verlag, Heidelberg.

  • Leisch F (2002b). “Sweave, Part I: Mixing R and LaTeX.” R News, 2(3), 28–31.

  • Levy S (2010). “The AI Revolution is On.” Wired.

  • Li J, Fine JP (2008). “ROC Analysis with Multiple Classes and Multiple Tests: Methodology and Its Application in Microarray Studies.” Biostatistics, 9(3), 566–576.

  • Lindgren F, Geladi P, Wold S (1993). “The Kernel Algorithm for PLS.” Journal of Chemometrics, 7, 45–59.

  • Ling C, Li C (1998). “Data Mining for Direct Marketing: Problems and Solutions.” In “Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining,” pp. 73–79.

  • Lipinski C, Lombardo F, Dominy B, Feeney P (1997). “Experimental and Computational Approaches To Estimate Solubility and Permeability In Drug Discovery and Development Settings.” Advanced Drug Delivery Reviews, 23, 3–25.

  • Liu B (2007). Web Data Mining. Springer Berlin / Heidelberg.

  • Liu Y, Rayens W (2007). “PLS and Dimension Reduction for Classification.” Computational Statistics, pp. 189–208.

  • Lo V (2002). “The True Lift Model: A Novel Data Mining Approach To Response Modeling in Database Marketing.” ACM SIGKDD Explorations Newsletter, 4(2), 78–86.

  • Lodhi H, Saunders C, Shawe-Taylor J, Cristianini N, Watkins C (2002). “Text Classification Using String Kernels.” The Journal of Machine Learning Research, 2, 419–444.

  • Loh WY (2002). “Regression Trees With Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, 12, 361–386.

  • Loh WY (2010). “Tree–Structured Classifiers.” Wiley Interdisciplinary Reviews: Computational Statistics, 2, 364–369.

  • Loh WY, Shih YS (1997). “Split Selection Methods for Classification Trees.” Statistica Sinica, 7, 815–840.

  • Mahé P, Ueda N, Akutsu T, Perret J, Vert J (2005). “Graph Kernels for Molecular Structure–Activity Relationship Analysis with Support Vector Machines.” Journal of Chemical Information and Modeling, 45(4), 939–951.

  • Mahé P, Vert J (2009). “Graph Kernels Based on Tree Patterns for Molecules.” Machine Learning, 75(1), 3–35.

  • Maindonald J, Braun J (2007). Data Analysis and Graphics Using R. Cambridge University Press, 2nd edition.

  • Mandal A, Johnson K, Wu C, Bornemeier D (2007). “Identifying Promising Compounds in Drug Discovery: Genetic Algorithms and Some New Statistical Techniques.” Journal of Chemical Information and Modeling, 47(3), 981–988.

  • Mandal A, Wu C, Johnson K (2006). “SELC: Sequential Elimination of Level Combinations by Means of Modified Genetic Algorithms.” Technometrics, 48(2), 273–283.

  • Martin J, Hirschberg D (1996). “Small Sample Statistics for Classification Error Rates I: Error Rate Measurements.” Department of Informatics and Computer Science Technical Report.

  • Martin T, Harten P, Young D, Muratov E, Golbraikh A, Zhu H, Tropsha A (2012). “Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling?” Journal of Chemical Information and Modeling, 52(10), 2570–2578.

  • Massy W (1965). “Principal Components Regression in Exploratory Statistical Research.” Journal of the American Statistical Association, 60, 234–246.

  • McCarren P, Springer C, Whitehead L (2011). “An Investigation into Pharmaceutically Relevant Mutagenicity Data and the Influence on Ames Predictive Potential.” Journal of Cheminformatics, 3(51).

  • McClish D (1989). “Analyzing a Portion of the ROC Curve.” Medical Decision Making, 9, 190–195.

  • Melssen W, Wehrens R, Buydens L (2006). “Supervised Kohonen Networks for Classification Problems.” Chemometrics and Intelligent Laboratory Systems, 83(2), 99–113.

  • Mente S, Lombardo F (2005). “A Recursive–Partitioning Model for Blood–Brain Barrier Permeation.” Journal of Computer–Aided Molecular Design, 19(7), 465–481.

  • Menze B, Kelm B, Splitthoff D, Koethe U, Hamprecht F (2011). “On Oblique Random Forests.” Machine Learning and Knowledge Discovery in Databases, pp. 453–469.

  • Mevik B, Wehrens R (2007). “The pls Package: Principal Component and Partial Least Squares Regression in R.” Journal of Statistical Software, 18(2), 1–24.

  • Michailidis G, de Leeuw J (1998). “The Gifi System Of Descriptive Multivariate Analysis.” Statistical Science, 13, 307–336.

  • Milborrow S (2012). Notes On the earth Package. URL http://cran.r-project.org/package=earth.

  • Min S, Lee J, Han I (2006). “Hybrid Genetic Algorithms and Support Vector Machines for Bankruptcy Prediction.” Expert Systems with Applications, 31(3), 652–660.

  • Mitchell M (1998). An Introduction to Genetic Algorithms. MIT Press.

  • Molinaro A (2005). “Prediction Error Estimation: A Comparison of Resampling Methods.” Bioinformatics, 21(15), 3301–3307.

  • Molinaro A, Lostritto K, Van Der Laan M (2010). “partDSA: Deletion/Substitution/Addition Algorithm for Partitioning the Covariate Space in Prediction.” Bioinformatics, 26(10), 1357–1363.

  • Montgomery D, Runger G (1993). “Gauge Capability and Designed Experiments. Part I: Basic Methods.” Quality Engineering, 6(1), 115–135.

  • Muenchen R (2009). R for SAS and SPSS Users. Springer.

  • Myers R (1994). Classical and Modern Regression with Applications. PWS-KENT Publishing Company, Boston, MA, second edition.

  • Myers R, Montgomery D (2009). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Wiley, New York, NY.

  • Neal R (1996). Bayesian Learning for Neural Networks. Springer-Verlag.

  • Nelder J, Mead R (1965). “A Simplex Method for Function Minimization.” The Computer Journal, 7(4), 308–313.

  • Netzeva T, Worth A, Aldenberg T, Benigni R, Cronin M, Gramatica P, Jaworska J, Kahn S, Klopman G, Marchant C (2005). “Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure–Activity Relationships.” In “The Report and Recommendations of European Centre for the Validation of Alternative Methods Workshop 52,” volume 33, pp. 1–19.

  • Niblett T (1987). “Constructing Decision Trees in Noisy Domains.” In I Bratko, N Lavrač (eds.), “Progress in Machine Learning: Proceedings of EWSL–87,” pp. 67–78. Sigma Press, Bled, Yugoslavia.

  • Olden J, Jackson D (2000). “Torturing Data for the Sake of Generality: How Valid Are Our Regression Models?” Ecoscience, 7(4), 501–510.

  • Olsson D, Nelson L (1975). “The Nelder–Mead Simplex Procedure for Function Minimization.” Technometrics, 17(1), 45–51.

  • Osuna E, Freund R, Girosi F (1997). “Support Vector Machines: Training and Applications.” Technical report, MIT Artificial Intelligence Laboratory.

  • Ozuysal M, Calonder M, Lepetit V, Fua P (2010). “Fast Keypoint Recognition Using Random Ferns.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448–461.

  • Park M, Hastie T (2008). “Penalized Logistic Regression for Detecting Gene Interactions.” Biostatistics, 9(1), 30.

  • Pepe MS, Longton G, Janes H (2009). “Estimation and Comparison of Receiver Operating Characteristic Curves.” Stata Journal, 9(1), 1–16.

  • Perrone M, Cooper L (1993). “When Networks Disagree: Ensemble Methods for Hybrid Neural Networks.” In RJ Mammone (ed.), “Artificial Neural Networks for Speech and Vision,” pp. 126–142. Chapman & Hall, London.

  • Piersma A, Genschow E, Verhoef A, Spanjersberg M, Brown N, Brady M, Burns A, Clemann N, Seiler A, Spielmann H (2004). “Validation of the Postimplantation Rat Whole-embryo Culture Test in the International ECVAM Validation Study on Three In Vitro Embryotoxicity Tests.” Alternatives to Laboratory Animals, 32, 275–307.

  • Platt J (2000). “Probabilistic Outputs for Support Vector Machines and Comparison to Regularized Likelihood Methods.” In B Bartlett, B Schölkopf, D Schuurmans, A Smola (eds.), “Advances in Kernel Methods Support Vector Learning,” pp. 61–74. Cambridge, MA: MIT Press.

  • Provost F, Domingos P (2003). “Tree Induction for Probability–Based Ranking.” Machine Learning, 52(3), 199–215.

  • Provost F, Fawcett T, Kohavi R (1998). “The Case Against Accuracy Estimation for Comparing Induction Algorithms.” Proceedings of the Fifteenth International Conference on Machine Learning, pp. 445–453.

  • Quinlan R (1987). “Simplifying Decision Trees.” International Journal of Man–Machine Studies, 27(3), 221–234.

  • Quinlan R (1992). “Learning with Continuous Classes.” Proceedings of the 5th Australian Joint Conference On Artificial Intelligence, pp. 343–348.

  • Quinlan R (1993a). “Combining Instance–Based and Model–Based Learning.” Proceedings of the Tenth International Conference on Machine Learning, pp. 236–243.

  • Quinlan R (1993b). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.

  • Quinlan R (1996a). “Bagging, Boosting, and C4.5.” In “Proceedings of the Thirteenth National Conference on Artificial Intelligence.”

  • Quinlan R (1996b). “Improved Use of Continuous Attributes in C4.5.” Journal of Artificial Intelligence Research, 4, 77–90.

  • Quinlan R, Rivest R (1989). “Inferring Decision Trees Using the Minimum Description Length Principle.” Information and Computation, 80(3), 227–248.

  • Radcliffe N, Surry P (2011). “Real–World Uplift Modelling With Significance–Based Uplift Trees.” Technical report, Stochastic Solutions.

  • Rännar S, Lindgren F, Geladi P, Wold S (1994). “A PLS Kernel Algorithm for Data Sets with Many Variables and Fewer Objects. Part 1: Theory and Algorithm.” Journal of Chemometrics, 8, 111–125.

  • R Development Core Team (2008). R: Regulatory Compliance and Validation Issues: A Guidance Document for the Use of R in Regulated Clinical Trial Environments. R Foundation for Statistical Computing, Vienna, Austria.

  • R Development Core Team (2010). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

  • Reshef D, Reshef Y, Finucane H, Grossman S, McVean G, Turnbaugh P, Lander E, Mitzenmacher M, Sabeti P (2011). “Detecting Novel Associations in Large Data Sets.” Science, 334(6062), 1518–1524.

  • Richardson M, Dominowska E, Ragno R (2007). “Predicting Clicks: Estimating the Click–Through Rate for New Ads.” In “Proceedings of the 16th International Conference on the World Wide Web,” pp. 521–530.

  • Ridgeway G (2007). “Generalized Boosted Models: A Guide to the gbm Package.” URL http://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf.

  • Ripley B (1995). “Statistical Ideas for Selecting Network Architectures.” Neural Networks: Artificial Intelligence and Industrial Applications, pp. 183–190.

  • Ripley B (1996). Pattern Recognition and Neural Networks. Cambridge University Press.

  • Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Müller M (2011). “pROC: An Open-Source Package for R and S+ to Analyze and Compare ROC Curves.” BMC Bioinformatics, 12(1), 77.

  • Robnik-Sikonja M, Kononenko I (1997). “An Adaptation of Relief for Attribute Estimation in Regression.” Proceedings of the Fourteenth International Conference on Machine Learning, pp. 296–304.

  • Rodriguez M (2011). “The Failure of Predictive Modeling and Why We Follow the Herd.” Technical report, Concepcion, Martinez & Bellido.

  • Ruczinski I, Kooperberg C, Leblanc M (2003). “Logic Regression.” Journal of Computational and Graphical Statistics, 12(3), 475–511.

  • Rumelhart D, Hinton G, Williams R (1986). “Learning Internal Representations by Error Propagation.” In “Parallel Distributed Processing: Explorations in the Microstructure of Cognition,” The MIT Press.

  • Rzepakowski P, Jaroszewicz S (2012). “Uplift Modeling in Direct Marketing.” Journal of Telecommunications and Information Technology, 2, 43–50.

  • Saar-Tsechansky M, Provost F (2007a). “Decision–Centric Active Learning of Binary–Outcome Models.” Information Systems Research, 18(1), 4–22.

  • Saar-Tsechansky M, Provost F (2007b). “Handling Missing Values When Applying Classification Models.” Journal of Machine Learning Research, 8, 1625–1657.

  • Saeys Y, Inza I, Larranaga P (2007). “A Review of Feature Selection Techniques in Bioinformatics.” Bioinformatics, 23(19), 2507–2517.

  • Schapire R (1990). “The Strength of Weak Learnability.” Machine Learning, 45, 197–227.

  • Freund Y, Schapire R (1999). “Adaptive Game Playing Using Multiplicative Weights.” Games and Economic Behavior, 29, 79–103.

  • Schmidberger M, Morgan M, Eddelbuettel D, Yu H, Tierney L, Mansmann U (2009). “State–of–the–Art in Parallel Computing with R.” Journal of Statistical Software, 31(1).

  • Serneels S, Nolf ED, Espen PV (2006). “Spatial Sign Pre-processing: A Simple Way to Impart Moderate Robustness to Multivariate Estimators.” Journal of Chemical Information and Modeling, 46(3), 1402–1409.

  • Shachtman N (2011). “Pentagon’s Prediction Software Didn’t Spot Egypt Unrest.” Wired.

  • Shannon C (1948). “A Mathematical Theory of Communication.” The Bell System Technical Journal, 27(3), 379–423.

  • Siegel E (2011). “Uplift Modeling: Predictive Analytics Can’t Optimize Marketing Decisions Without It.” Technical report, Prediction Impact Inc.

  • Simon R, Radmacher M, Dobbin K, McShane L (2003). “Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification.” Journal of the National Cancer Institute, 95(1), 14–18.

  • Smola A (1996). “Regression Estimation with Support Vector Learning Machines.” Master’s thesis, Technische Universität München.

  • Spector P (2008). Data Manipulation with R. Springer.

  • Steyerberg E (2010). Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer, softcover reprint of the original 2009 edition.

  • Stone M, Brooks R (1990). “Continuum Regression: Cross-validated Sequentially Constructed Prediction Embracing Ordinary Least Squares, Partial Least Squares, and Principal Component Regression.” Journal of the Royal Statistical Society, Series B, 52, 237–269.

  • Strobl C, Boulesteix A, Zeileis A, Hothorn T (2007). “Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution.” BMC Bioinformatics, 8(1), 25.

  • Suykens J, Vandewalle J (1999). “Least Squares Support Vector Machine Classifiers.” Neural Processing Letters, 9(3), 293–300.

  • Tetko I, Tanchuk V, Kasheva T, Villa A (2001). “Estimation of Aqueous Solubility of Chemical Compounds Using E–State Indices.” Journal of Chemical Information and Computer Sciences, 41(6), 1488–1493.

  • Tibshirani R (1996). “Regression Shrinkage and Selection via the lasso.” Journal of the Royal Statistical Society Series B (Methodological), 58(1), 267–288.

  • Tibshirani R, Hastie T, Narasimhan B, Chu G (2002). “Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression.” Proceedings of the National Academy of Sciences, 99(10), 6567–6572.

  • Tibshirani R, Hastie T, Narasimhan B, Chu G (2003). “Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays.” Statistical Science, 18(1), 104–117.

  • Ting K (2002). “An Instance–Weighting Method to Induce Cost–Sensitive Trees.” IEEE Transactions on Knowledge and Data Engineering, 14(3), 659–665.

  • Tipping M (2001). “Sparse Bayesian Learning and the Relevance Vector Machine.” Journal of Machine Learning Research, 1, 211–244.

  • Titterington M (2010). “Neural Networks.” Wiley Interdisciplinary Reviews: Computational Statistics, 2(1), 1–8.

  • Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman R (2001). “Missing Value Estimation Methods for DNA Microarrays.” Bioinformatics, 17(6), 520–525.

  • Tumer K, Ghosh J (1996). “Analysis of Decision Boundaries in Linearly Combined Neural Classifiers.” Pattern Recognition, 29(2), 341–348.

  • US Commodity Futures Trading Commission and US Securities & Exchange Commission (2010). Findings Regarding the Market Events of May 6, 2010.

  • Valiant L (1984). “A Theory of the Learnable.” Communications of the ACM, 27, 1134–1142.

  • Van Der Putten P, Van Someren M (2004). “A Bias–Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000.” Machine Learning, 57(1), 177–195.

  • Van Hulse J, Khoshgoftaar T, Napolitano A (2007). “Experimental Perspectives On Learning From Imbalanced Data.” In “Proceedings of the 24th International Conference On Machine learning,” pp. 935–942.

  • Vapnik V (2010). The Nature of Statistical Learning Theory. Springer.

  • Varma S, Simon R (2006). “Bias in Error Estimation When Using Cross–Validation for Model Selection.” BMC Bioinformatics, 7(1), 91.

  • Varmuza K, He P, Fang K (2003). “Boosting Applied to Classification of Mass Spectral Data.” Journal of Data Science, 1, 391–404.

  • Venables W, Ripley B (2002). Modern Applied Statistics with S. Springer.

  • Venables W, Smith D, the R Development Core Team (2003). An Introduction to R. R Foundation for Statistical Computing, Vienna, Austria, version 1.6.2 edition. ISBN 3-901167-55-2, URL http://www.R-project.org.

  • Venkatraman E (2000). “A Permutation Test to Compare Receiver Operating Characteristic Curves.” Biometrics, 56(4), 1134–1138.

  • Veropoulos K, Campbell C, Cristianini N (1999). “Controlling the Sensitivity of Support Vector Machines.” Proceedings of the International Joint Conference on Artificial Intelligence, 1999, 55–60.

  • Verzani J (2002). “simpleR – Using R for Introductory Statistics.” URL http://www.math.csi.cuny.edu/Statistics/R/simpleR.

  • Wager TT, Hou X, Verhoest PR, Villalobos A (2010). “Moving Beyond Rules: The Development of a Central Nervous System Multiparameter Optimization (CNS MPO) Approach To Enable Alignment of Druglike Properties.” ACS Chemical Neuroscience, 1(6), 435–449.

  • Wallace C (2005). Statistical and Inductive Inference by Minimum Message Length. Springer–Verlag.

  • Wang C, Venkatesh S (1994). “Optimal Stopping and Effective Machine Complexity in Learning.” Advances in Neural Information Processing Systems, pp. 303–310.

  • Wang Y, Witten I (1997). “Inducing Model Trees for Continuous Classes.” Proceedings of the Ninth European Conference on Machine Learning, pp. 128–137.

  • Weiss G, Provost F (2001a). “The Effect of Class Distribution on Classifier Learning: An Empirical Study.” Department of Computer Science, Rutgers University.

  • Weiss G, Provost F (2001b). “The Effect of Class Distribution On Classifier Learning: An Empirical Study.” Technical Report ML-TR-44, Department of Computer Science, Rutgers University.

  • Welch B (1939). “Note on Discriminant Functions.” Biometrika, 31, 218–220.

  • Westfall P, Young S (1993). Resampling–Based Multiple Testing: Examples and Methods for P–Value Adjustment. Wiley.

  • Westphal C (2008). Data Mining for Intelligence, Fraud & Criminal Detection: Advanced Analytics & Information Sharing Technologies. CRC Press.

  • Whittingham M, Stephens P, Bradbury R, Freckleton R (2006). “Why Do We Still Use Stepwise Modelling in Ecology and Behaviour?” Journal of Animal Ecology, 75(5), 1182–1189.

  • Willett P (1999). “Dissimilarity–Based Algorithms for Selecting Structurally Diverse Sets of Compounds.” Journal of Computational Biology, 6(3), 447–457.

  • Williams G (2011). Data Mining with Rattle and R : The Art of Excavating Data for Knowledge Discovery. Springer.

  • Witten D, Tibshirani R (2009). “Covariance–Regularized Regression and Classification For High Dimensional Problems.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 71(3), 615–636.

  • Witten D, Tibshirani R (2011). “Penalized Classification Using Fisher’s Linear Discriminant.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 73(5), 753–772.

  • Wold H (1966). “Estimation of Principal Components and Related Models by Iterative Least Squares.” In P Krishnaiah (ed.), “Multivariate Analyses,” pp. 391–420. Academic Press, New York.

  • Wold H (1982). “Soft Modeling: The Basic Design and Some Extensions.” In K Joreskog, H Wold (eds.), “Systems Under Indirect Observation: Causality, Structure, Prediction,” pt. 2, pp. 1–54. North–Holland, Amsterdam.

  • Wold S (1995). “PLS for Multivariate Linear Modeling.” In H van de Waterbeemd (ed.), “Chemometric Methods in Molecular Design,” pp. 195–218. VCH, Weinheim.

  • Wold S, Johansson M, Cocchi M (1993). “PLS–Partial Least-Squares Projections to Latent Structures.” In H Kubinyi (ed.), “3D QSAR in Drug Design,” volume 1, pp. 523–550. Kluwer Academic Publishers, The Netherlands.

  • Wold S, Martens H, Wold H (1983). “The Multivariate Calibration Problem in Chemistry Solved by the PLS Method.” In “Proceedings from the Conference on Matrix Pencils,” Springer–Verlag, Heidelberg.

  • Wolpert D (1996). “The Lack of a priori Distinctions Between Learning Algorithms.” Neural Computation, 8(7), 1341–1390.

  • Yeh I (1998). “Modeling of Strength of High-Performance Concrete Using Artificial Neural Networks.” Cement and Concrete Research, 28(12), 1797–1808.

  • Yeh I (2006). “Analysis of Strength of Concrete Using Design of Experiments and Neural Networks.” Journal of Materials in Civil Engineering, 18, 597–604.

  • Youden W (1950). “Index for Rating Diagnostic Tests.” Cancer, 3(1), 32–35.

  • Zadrozny B, Elkan C (2001). “Obtaining Calibrated Probability Estimates from Decision Trees and Naive Bayesian Classifiers.” In “Proceedings of the 18th International Conference on Machine Learning,” pp. 609–616. Morgan Kaufmann.

  • Zeileis A, Hothorn T, Hornik K (2008). “Model–Based Recursive Partitioning.” Journal of Computational and Graphical Statistics, 17(2), 492–514.

  • Zhu J, Hastie T (2005). “Kernel Logistic Regression and the Import Vector Machine.” Journal of Computational and Graphical Statistics, 14(1), 185–205.

  • Zou H, Hastie T (2005). “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society, Series B, 67(2), 301–320.

  • Zou H, Hastie T, Tibshirani R (2006). “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics, 15(2), 265–286.

Copyright information

© 2013 Springer Science+Business Media New York

Cite this chapter

Kuhn, M., Johnson, K. (2013). A Short Tour of the Predictive Modeling Process. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_2
