Data Pre-Processing and Modeling Factors

Cohen, Maxime C.; Gras, Paul-Emile; Pentecoste, Arthur; Zhang, Renyu

doi:10.1007/978-3-030-85855-1_2

Part of the book series: Springer Series in Supply Chain Management (SSSCM, volume 14)

Abstract

This chapter covers several important pre-processing steps. Before implementing a demand prediction method, it is crucial to process the raw data in order to extract as much predictive power as possible from the different features available in the data. We discuss how to deal with missing data and how to test for outliers in the context of demand prediction. We then cover various concepts related to feature engineering for demand prediction, such as accounting for time effects and constructing lag-price variables. We end this chapter by discussing the practice of scaling features, and how to sort and export the resulting processed dataset. Each step is illustrated using the accompanying dataset.
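
The steps listed in the abstract can be sketched in a few lines of pandas. This is only an illustrative sketch under stated assumptions, not the chapter's implementation: the toy data, the column names (`week`, `sku`, `price`, `weekly_sales`), the per-SKU median imputation, and the 3-standard-deviation outlier rule are all hypothetical choices made here. The scikit-learn classes cited in the notes (`SimpleImputer`, `KNNImputer`, `StandardScaler`, `MinMaxScaler`) could be substituted for the manual imputation and scaling lines.

```python
import pandas as pd

# Hypothetical toy data: weekly sales of two SKUs (not the chapter's dataset).
raw = pd.DataFrame({
    "week": pd.to_datetime(
        ["2021-01-04", "2021-01-11", "2021-01-18", "2021-01-25"] * 2
    ),
    "sku": ["A"] * 4 + ["B"] * 4,
    "price": [10.0, None, 9.0, 10.0, 20.0, 20.0, 18.0, None],
    "weekly_sales": [100, 110, 150, 95, 40, 42, 60, 41],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Sort first so that lagged features line up with calendar order.
    out = df.sort_values(["sku", "week"]).reset_index(drop=True)

    # Missing data (assumed rule): fill with the SKU's own median price.
    out["price"] = out.groupby("sku")["price"].transform(
        lambda s: s.fillna(s.median())
    )

    # Outliers (assumed rule): flag sales more than 3 sample standard
    # deviations away from the SKU's mean.
    grp = out.groupby("sku")["weekly_sales"]
    z = (out["weekly_sales"] - grp.transform("mean")) / grp.transform("std")
    out["sales_outlier"] = z.abs() > 3

    # Time effects: one-hot encode the calendar month.
    out["month"] = out["week"].dt.month
    out = pd.get_dummies(out, columns=["month"], prefix="month")

    # Lag-price variable: last week's price for the same SKU.
    out["price_lag_1"] = out.groupby("sku")["price"].shift(1)

    # Scaling: standardize price to zero mean and unit variance.
    out["price_scaled"] = (
        (out["price"] - out["price"].mean()) / out["price"].std(ddof=0)
    )
    return out


processed = preprocess(raw)

# Export: with no path argument, to_csv returns the CSV text as a string.
csv_text = processed.to_csv(index=False)
```

The first week of each SKU has no lagged price, so `price_lag_1` is missing there; such rows would need to be dropped or imputed before fitting a model.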

Notes

1. https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer.fit_transform
2. https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html
3. https://link.springer.com/referenceworkentry/10.1007/978-0-387-32833-1_401
4. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html
5. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html
6. See Srinivasan et al. (2005).
7. See, e.g., Pindyck and Rubinfeld (2018), Cohen and Perakis (2020).
8. https://sklearn.org/modules/generated/sklearn.preprocessing.StandardScaler.html
9. https://sklearn.org/modules/generated/sklearn.preprocessing.MinMaxScaler.html

References

Cohen MC, Perakis G (2020) Optimizing promotions for multiple items in supermarkets. Channel Strategies and Marketing Mix in a Connected World, 71–97 (Springer).
Pindyck RS, Rubinfeld DL (2018) Microeconomics.
Srinivasan SR, Ramakrishnan S, Grasman S (2005) Incorporating cannibalization models into demand forecasting. Marketing Intelligence & Planning.

Author information

Authors and Affiliations

Desautels Faculty of Management, McGill University, Montreal, QC, Canada
Maxime C. Cohen
Virtuo Technologies, Paris, France
Paul-Emile Gras
Boston Consulting Group GAMMA, New York, NY, USA
Arthur Pentecoste
New York University Shanghai, Shanghai, China
Renyu Zhang

About this chapter

Cite this chapter

Cohen, M.C., Gras, PE., Pentecoste, A., Zhang, R. (2022). Data Pre-Processing and Modeling Factors. In: Demand Prediction in Retail. Springer Series in Supply Chain Management, vol 14. Springer, Cham. https://doi.org/10.1007/978-3-030-85855-1_2

DOI: https://doi.org/10.1007/978-3-030-85855-1_2
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85854-4
Online ISBN: 978-3-030-85855-1
eBook Packages: Business and Management, Business and Management (R0)
