Abstract
This study compares hedonic pricing models that use attributes obtained by featurizing text. We collected prices of items sold on the websites of five famous fashion producers in order to estimate hedonic pricing models that leverage the information contained in product descriptions. We mapped product descriptions to a high-dimensional feature space and compared the predictive accuracy and variable selection properties of several statistical estimators that leverage sparse modelling, topic modelling and aggregated predictors, to test whether better predictive accuracy comes with an empirically consistent selection of attributes. We call this approach Hedonic Text-Regression modelling. Its novelty is that, by using attributes obtained by text-mining product descriptions, we obtain an estimate of the implicit price of the words contained therein. Empirically, all the proposed models outperformed the traditional hedonic pricing model in terms of predictive accuracy, while also providing consistent variable selection.
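As a rough illustration of the idea of pricing words, the following minimal sketch regresses log prices on binary word-presence features with a lasso penalty, so that each retained coefficient can be read as an implicit (log) price of a word. Everything in it is an assumption made for illustration: the toy descriptions and prices, the one-word stopword list, the penalty value and the plain coordinate-descent solver are hypothetical and are not the data or estimators used in the study.

```python
import math

# Hypothetical toy data: (product description, price in euros). Not from the study.
ITEMS = [
    ("silk dress with lace", 900.0),
    ("cotton dress", 150.0),
    ("silk shirt", 400.0),
    ("cotton shirt", 80.0),
    ("cotton dress with lace", 300.0),
    ("silk dress", 700.0),
]
STOPWORDS = {"with"}  # illustrative stopword list


def featurize(descriptions, stopwords):
    """Map each description to a binary word-presence vector."""
    tokenized = [
        [w for w in d.lower().split() if w not in stopwords] for d in descriptions
    ]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    X = [[1.0 if w in tokens else 0.0 for w in vocab] for tokens in tokenized]
    return vocab, X


def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become zero."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - partial)
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho / n, lam) / scale if scale > 0 else 0.0
        # The intercept is not penalized: set it to the mean residual.
        intercept = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
    return intercept, beta


vocab, X = featurize([d for d, _ in ITEMS], STOPWORDS)
y = [math.log(price) for _, price in ITEMS]  # hedonic models often use log prices
intercept, beta = lasso(X, y, lam=0.01)
implicit_price = dict(zip(vocab, beta))  # word -> estimated effect on log price

for word, coef in implicit_price.items():
    print(f"{word:>8s}: {coef:+.3f}")
```

Words whose coefficient is shrunk exactly to zero are dropped from the model, which is the sense in which a sparse estimator performs variable selection; a larger penalty removes more words.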
Notes
There may be sections of websites with regulation rather than marketing information on products. By scraping and mining this data, the set of significant attributes that we obtain is more likely a consequence of obvious causality rather than actual price determinants.
In Italy, depending on the region, it usually starts in mid-January.
Stopwords are words that carry little or no semantic meaning. The descriptions that we use are in Italian.
We use the notion of superwords to refer to groups of words. It resembles the notion of supergenes introduced by Park et al. (2007).
We excluded the dresses priced over 1000 euros for brand EF, as shown in Figure 1ii.
The competition rewarded the best three pricing algorithms in terms of root-mean-squared logarithmic error with a monetary prize of 60 k, 30 k and 10 k dollars, respectively.
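For readers unfamiliar with the metric mentioned in the last note, the sketch below computes a root-mean-squared logarithmic error. It assumes the common definition based on log(1 + x), which keeps the metric defined for zero prices; the exact variant used by the competition is not stated here, and the function name `rmsle` is chosen only for illustration.

```python
import math


def rmsle(actual, predicted):
    """Root-mean-squared logarithmic error, using the log(1 + x) convention."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because the error is measured on the log scale, it penalizes relative rather than absolute deviations, so mispricing a 100-euro item by 10 euros counts far more than mispricing a 1000-euro item by the same amount.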
References
Aggarwal, C.C.: Machine learning for text. Springer, New York (2018)
Archak, N., Ghose, A., Ipeirotis, P.G.: Deriving the pricing power of product features by mining consumer reviews. Manag. Sci. 57(8), 1485–1509 (2011)
Baltas, G., Saridakis, C.: Measuring brand equity in the car market: a hedonic price analysis. J. Oper. Res. Soc. 61(2), 284–293 (2010)
Belloni, A., Chernozhukov, V., Wang, L.: Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)
Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Stat. Soc.: Ser. B 57(1), 289–300 (1995)
Berry, M., Kogan, J.: Text mining: applications and theory. Wiley, New Jersey (2010)
Berry, M.W., Castellanos, M.: Survey of text mining. Comput. Rev. 45(9), 548 (2004)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
Bogdan, M., Van Den Berg, E., Sabatti, C., Su, W., Candès, E.J.: SLOPE–adaptive variable selection via convex optimization. Ann. Appl. Stat. 9(3), 1103 (2015)
Cachon, G.P., Swinney, R.: The value of fast fashion: quick response, enhanced design, and strategic consumer behavior. Manag. Sci. 57(4), 778–795 (2011)
Cassel, E., Mendelsohn, R.: The choice of functional forms for hedonic price equations: comment. J. Urban Econ. 18(2), 135–142 (1985)
Cavallo, A.: Are online and offline prices similar? Evidence from large multi-channel retailers. Am. Econ. Rev. 107(1), 283–303 (2017)
Cavallo, A.: Scraped data and sticky prices. Rev. Econ. Stat. 100(1), 105–119 (2018)
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
Einav, L., Levin, J.: Economics in the age of big data. Science 346(6210), 1243089 (2014)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. National Acad. Sci. 101(suppl 1), 5228–5235 (2004)
Nowak, A., Smith, P.: Textual analysis in real estate. J. Appl. Economet. 32(4), 896–918 (2017)
Park, M.Y., Hastie, T., Tibshirani, R.: Averaged gene expressions for regression. Biostatistics 8(2), 212–227 (2007)
Steyvers, M., Griffiths, T.: Probabilistic topic models. Handb. Latent Semant. Anal. 427(7), 424–440 (2007)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc.: Ser. B 58(1), 267–288 (1996)
Tibshirani, R.J., Taylor, J., et al.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)
Wainwright, M.J., Jordan, M.I., et al.: Graphical models, exponential families, and variational inference. Found. Trends® Mach. Learn. 1(1–2), 1–305 (2008)
Yan, X., Bien, J.: Rare feature selection in high dimensions. J. Am. Stat. Assoc., pp. 1–14 (2020)
About this article
Cite this article
Crescenzi, F. Hedonic pricing modelling with unstructured predictors: an application to Italian Fashion Industry. AStA Adv Stat Anal 107, 733–753 (2023). https://doi.org/10.1007/s10182-022-00465-5
DOI: https://doi.org/10.1007/s10182-022-00465-5