Hedonic pricing modelling with unstructured predictors: an application to Italian Fashion Industry

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

This study compares hedonic pricing models whose attributes are obtained by featurizing text. We collected the prices of items sold on the websites of five well-known fashion producers in order to estimate hedonic pricing models that exploit the information contained in product descriptions. We mapped the product descriptions to a high-dimensional feature space and compared the predictive accuracy and variable selection properties of several statistical estimators based on sparse modelling, topic modelling and aggregated predictors, to test whether better predictive accuracy comes with an empirically consistent selection of attributes. We call this approach Hedonic Text-Regression modelling. Its novelty is that, because the attributes are obtained by text mining the product descriptions, the model yields an estimate of the implicit price of the words they contain. Empirically, all the proposed models outperformed the traditional hedonic pricing model in predictive accuracy while also providing consistent variable selection.
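The central idea, regressing log price on word features so that each coefficient acts as the implicit price of a word, can be illustrated with a short sketch. This is not the paper's estimator: it assumes a generic semi-log specification log p_i = a + sum_j b_j x_ij + e_i with binary word indicators x_ij, fits the plain lasso (Tibshirani 1996) by cyclic coordinate descent, and uses made-up descriptions and prices. The function names `featurize`, `soft_threshold` and `lasso_cd` and the penalty value are hypothetical choices for illustration only.

```python
import math

def featurize(descriptions):
    """Binary bag-of-words: X[i][j] = 1.0 if word j occurs in description i."""
    vocab = sorted({w for d in descriptions for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = []
    for d in descriptions:
        row = [0.0] * len(vocab)
        for w in d.lower().split():
            row[index[w]] = 1.0
        X.append(row)
    return X, vocab

def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma (the lasso proximal operator)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=500):
    """Minimise (1/2n)*||y - a - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    The intercept a is not penalised. Returns (a, b)."""
    n, p = len(X), len(X[0])
    a = sum(y) / n
    b = [0.0] * p
    r = [yi - a for yi in y]  # residuals at b = 0
    for _ in range(n_iter):
        for j in range(p):
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue
            # correlation of feature j with the partial residual (its own term added back)
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
        # refit the unpenalised intercept so that the residuals have mean zero
        shift = sum(r) / n
        a += shift
        r = [ri - shift for ri in r]
    return a, b

# Hypothetical toy data: made-up Italian descriptions and prices in euros.
descriptions = ["abito seta nero", "abito cotone rosso",
                "gonna seta rosso", "gonna cotone nero"]
prices = [900.0, 300.0, 600.0, 200.0]
X, vocab = featurize(descriptions)
a, b = lasso_cd(X, [math.log(p) for p in prices], lam=0.01)
for word, coef in zip(vocab, b):
    print(f"{word:8s} {coef:+.3f}")
```

In this toy design each pair of words (abito/gonna, seta/cotone, nero/rosso) is complementary, so only the difference between the two coefficients of a pair is identified; in a semi-log model, exp of that difference minus one is the relative price premium of one word over the other, shrunk by the lasso penalty.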
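Topic modelling offers a different route from sparse word indicators: each description is summarised by a small vector of topic proportions that can then enter a hedonic regression. The sketch below is a minimal collapsed Gibbs sampler for latent Dirichlet allocation in the style of Griffiths and Steyvers (2004). The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of iterations and the toy descriptions are all assumptions for illustration; the paper's actual topic-model estimator and settings are not shown here.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA.

    docs: list of token lists. Returns (theta, vocab) where theta[d][k] is the
    estimated proportion of topic k in document d."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    widx = {w: v for v, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in topics]           # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of each token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][widx[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = widx[w], z[d][i]
                # remove the current token from the counts
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # full conditional p(z = t | all other assignments), up to a constant
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    return theta, vocab

# Hypothetical toy descriptions, already tokenized.
descriptions = ["abito seta nero", "abito seta rosso",
                "sneaker pelle bianco", "sneaker pelle nero"]
docs = [d.split() for d in descriptions]
theta, vocab = lda_gibbs(docs, n_topics=2, n_iter=100)
```

Because each row of theta sums to one, one topic column would have to be dropped when the proportions are used as regressors alongside an intercept.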

Figs. 1–7 (images not reproduced)

Notes

  1. Some sections of the websites may contain regulatory rather than marketing information about the products. If such text were scraped and mined, the significant attributes obtained would more likely reflect obvious causal relationships than actual price determinants.

  2. In Italy, depending on the region, this usually starts in mid-January.

  3. Stopwords are words that carry little or no semantic content, such as articles and prepositions. The descriptions we use are in Italian.

  4. We use the term superwords to refer to groups of words. The notion resembles that of supergenes introduced by Park et al. (2007).

  5. We excluded the dresses priced over 1,000 euros for brand EF (see Fig. 1ii).

  6. The competition awarded prizes of 60k, 30k and 10k dollars, respectively, to the three best pricing algorithms ranked by root-mean-squared logarithmic error.
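Note 3 refers to stopword removal on Italian descriptions. The sketch below shows the step on a single description; the stopword set is a tiny hypothetical subset chosen for illustration (a real pipeline would use a complete Italian list), and the names `ITALIAN_STOPWORDS`, `tokenize` and `remove_stopwords` are assumptions, not the paper's code.

```python
import re

# Tiny illustrative subset of Italian stopwords (articles, prepositions,
# conjunctions); a complete list would be used in practice.
ITALIAN_STOPWORDS = {"il", "lo", "la", "i", "gli", "le", "un", "uno", "una",
                     "di", "a", "da", "in", "con", "su", "per", "tra", "fra",
                     "e", "o", "del", "della", "dei", "delle", "al", "alla"}

def tokenize(description):
    """Lower-case the text and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", description.lower())

def remove_stopwords(description, stopwords=ITALIAN_STOPWORDS):
    """Return the tokens of a description that are not stopwords."""
    return [t for t in tokenize(description) if t not in stopwords]

tokens = remove_stopwords("Abito in seta con scollo a V e maniche lunghe")
```

Here "in", "con", "a" and "e" are dropped, leaving the content words that can later become regressors.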
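Note 4 compares superwords to the supergenes of Park et al. (2007), which average the expressions of genes grouped by clustering. By analogy, a superword can be built by grouping strongly correlated word columns and averaging them. In the sketch below the grouping rule is an assumption: words are linked when their Pearson correlation reaches a hypothetical `threshold`, and groups are the connected components of that graph (equivalent to single-linkage clustering cut at that level). The paper's own grouping criterion may differ, and `superwords` and `correlation` are illustrative names.

```python
import math

def correlation(u, v):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)

def superwords(X, vocab, threshold=0.9):
    """Group word columns with correlation >= threshold and average each group."""
    p = len(vocab)
    cols = [[row[j] for row in X] for j in range(p)]
    parent = list(range(p))  # union-find forest over word indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for j in range(p):
        for k in range(j + 1, p):
            if correlation(cols[j], cols[k]) >= threshold:
                parent[find(j)] = find(k)
    groups = {}
    for j in range(p):
        groups.setdefault(find(j), []).append(j)
    names, Z = [], [[] for _ in X]
    for members in groups.values():
        names.append("+".join(vocab[j] for j in members))
        for i, row in enumerate(X):
            Z[i].append(sum(row[j] for j in members) / len(members))
    return Z, names

# Hypothetical word-indicator matrix: "seta" and "pura" always co-occur.
X = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1]]
Z, names = superwords(X, ["seta", "pura", "cotone", "nero"])
```

Only positively correlated words are merged, since averaging words that exclude each other (here "seta" and "cotone") would cancel their information.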
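Note 6 ranks algorithms by root-mean-squared logarithmic error. A minimal sketch follows, assuming the usual form sqrt(mean((log(1 + predicted) - log(1 + actual))^2)); the function name `rmsle` is illustrative.

```python
import math

def rmsle(y_true, y_pred):
    """Root-mean-squared logarithmic error with the log(1 + x) convention."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))
```

Because the error is computed on a log scale, it penalises relative rather than absolute deviations, so a model fitted by least squares on log(1 + price) targets this metric directly.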

References

  • Aggarwal, C.C.: Machine learning for text. Springer, New York (2018)

  • Archak, N., Ghose, A., Ipeirotis, P.G.: Deriving the pricing power of product features by mining consumer reviews. Manag. Sci. 57(8), 1485–1509 (2011)

  • Baltas, G., Saridakis, C.: Measuring brand equity in the car market: a hedonic price analysis. J. Oper. Res. Soc. 61(2), 284–293 (2010)

  • Belloni, A., Chernozhukov, V., Wang, L.: Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)

  • Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Stat. Soc.: Ser. B 57(1), 289–300 (1995)

  • Berry, M., Kogan, J.: Text mining: applications and theory. Wiley, New Jersey (2010)

  • Berry, M.W., Castellanos, M.: Survey of text mining. Comput. Rev. 45(9), 548 (2004)

  • Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  • Bogdan, M., Van Den Berg, E., Sabatti, C., Su, W., Candès, E.J.: SLOPE–adaptive variable selection via convex optimization. Ann. Appl. Stat. 9(3), 1103 (2015)

  • Cachon, G.P., Swinney, R.: The value of fast fashion: quick response, enhanced design, and strategic consumer behavior. Manag. Sci. 57(4), 778–795 (2011)

  • Cassel, E., Mendelsohn, R.: The choice of functional forms for hedonic price equations: comment. J. Urban Econ. 18(2), 135–142 (1985)

  • Cavallo, A.: Are online and offline prices similar? Evidence from large multi-channel retailers. Am. Econ. Rev. 107(1), 283–303 (2017)

  • Cavallo, A.: Scraped data and sticky prices. Rev. Econ. Stat. 100(1), 105–119 (2018)

  • Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)

  • Einav, L., Levin, J.: Economics in the age of big data. Science 346(6210), 1243089 (2014)

  • Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Natl. Acad. Sci. 101(suppl 1), 5228–5235 (2004)

  • Nowak, A., Smith, P.: Textual analysis in real estate. J. Appl. Economet. 32(4), 896–918 (2017)

  • Park, M.Y., Hastie, T., Tibshirani, R.: Averaged gene expressions for regression. Biostatistics 8(2), 212–227 (2007)

  • Steyvers, M., Griffiths, T.: Probabilistic topic models. Handb. Latent Semant. Anal. 427(7), 424–440 (2007)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc.: Ser. B 58(1), 267–288 (1996)

  • Tibshirani, R.J., Taylor, J., et al.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)

  • Wainwright, M.J., Jordan, M.I., et al.: Graphical models, exponential families, and variational inference. Found. Trends® Mach. Learn. 1(1–2), 1–305 (2008)

  • Yan, X., Bien, J.: Rare feature selection in high dimensions. J. Am. Stat. Assoc., 1–14 (2020)

Author information

Corresponding author

Correspondence to Federico Crescenzi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Crescenzi, F. Hedonic pricing modelling with unstructured predictors: an application to Italian Fashion Industry. AStA Adv Stat Anal 107, 733–753 (2023). https://doi.org/10.1007/s10182-022-00465-5

  • DOI: https://doi.org/10.1007/s10182-022-00465-5