Complexities of leveraging user-generated book reviews for scholarly research: transiency, power dynamics, and cultural dependency

International Journal on Digital Libraries

Abstract

In the past two decades, digital libraries (DL) have increasingly supported computational studies of digitized books (Jett et al. The hathitrust research center extracted features dataset (2.0), 2020; Underwood, Distant horizons: digital evidence and literary change, University of Chicago Press, Chicago, 2019; Organisciak et al. J Assoc Inf Sci Technol 73:317–332, 2022; Michel et al. Science 331:176–182, 2011). Nonetheless, there remains a dearth of DL data provisions or infrastructures for research on book reception, and user-generated book reviews have opened up unprecedented research opportunities in this area. However, insufficient attention has been paid to real-world complexities and limitations of using these datasets in scholarly research, which may cause analytical oversights (Crawford and Finn, Geo J 80:491–502, 2015), methodological pitfalls (Olteanu et al. Front Big Data 2:13, 2019), and ethical concerns (Hu et al. Research with user-generated book review data: legal and ethical pitfalls and contextualized mitigations, Springer, Berlin, 2023; Diesner and Chin, Gratis, libre, or something else? regulations and misassumptions related to working with publicly available text data, 2016). In this paper, we present three case studies that contextually and empirically investigate book reviews for their temporal, cultural, and socio-participatory complexities: (1) a longitudinal analysis of a ranked book list across ten years and over one month; (2) a text classification of 20,000 sponsored and 20,000 non-sponsored book reviews; and (3) a comparative analysis of 537 book ratings from Anglophone and non-Anglophone readerships. Our work reflects on both (1) data curation challenges that researchers may encounter (e.g., platform providers’ lack of bibliographic control) when studying book reviews and (2) mitigations that researchers might adopt to address these challenges (e.g., how to align data from various platforms).
Taken together, our findings illustrate some of the sociotechnical complexities of working with user-generated book reviews by revealing the transiency, power dynamics, and cultural dependency in these datasets. This paper explores some of the limitations and challenges of using user-generated book reviews for scholarship and calls for critical and contextualized usage of user-generated book reviews in future scholarly research.

Notes

  1. In this paper, “Amazon Books” refers only to the online book-selling department of Amazon.com [68], not the Amazon Books retail bookstores [69].

  2. In the rest of this paper, Douban data refer to the data collected from Douban Books.

  3. According to a paper associated with the dataset [77], the location information released in this particular dataset was detected by a combination of rule-based approaches. The researchers were able to detect a total of 96.2% of user locations with this method. However, the accuracy of the inferred location information was not verified.

  4. API stands for “application programming interface,” which is software that allows computer programs to send and access data from each other.

  5. For instance, the book ratings collected and used for our studies were (1) generated algorithmically based on all ratings the books received and (2) not associated with any individual users or user accounts.

  6. We referred to Section 4 (“Limitations on Rights”), Article 22, of the Copyright Law of the People’s Republic of China [93, 94].

  7. The currently released datasets, as well as any future updates and scripts, are available in this repository: https://github.com/Yuerong2/JCDL2022ResearchPaperData.

  8. Non-consumptive use means that computational analysis is performed on digitized texts without researchers reading or displaying substantial portions of in-copyright or rights-restricted data; transformative use refers to a type of fair use that relies on a fundamentally transformed version of a copyrighted work for a purpose different from the original and thus does not infringe its holder’s copyright [91, 96].

  9. The shapes and relative positions of the colored ovals were chosen for better visual presentation and do not provide additional information about the book lists they represent.

  10. The parameters of the power-law distribution are alpha = 3.14 and sigma = 0.33. The total numbers of ratings we used were collected from 548 of the 552 books, as four of the books’ pages were no longer publicly accessible on Douban as of 2022.

  11. These statistics were based on the four snapshots we collected over this month; historical data in the 2011–2021 dataset were not involved.

  12. One complication was that this dataset contained textual reviews with 0-to-5-star ratings, instead of 1-to-5-star. As Goodreads did not actually allow reviewers to give 0-star ratings, the 0-star reviews mostly indicated that there were no 1-to-5-star numeric ratings collected along with the texts. We kept all the 0-star reviews in our data analysis since the review texts were valid according to our preliminary investigation.

  13. “True positive sponsored reviews” means that the reviews were indeed sponsored, as indicated by explicit sponsorship claims or disclosures, and were correctly categorized as sponsored reviews by the text classifiers.

  14. Each sample of reviews was selected randomly, and a review might be selected more than once.

  15. Standard deviation does not apply when there is only one book in a particular language, which led to all “nan” values in this table.

  16. Standard deviation does not apply when there is only one book in a particular language, which led to all “nan” values in this table.

  17. Standard deviation does not apply when there is only one book in a particular language, which led to all “nan” values in this table.
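
Note 10 reports a power-law exponent (alpha) together with a sigma value. As a minimal sketch of how such a pair can be estimated, the following Python snippet applies the standard continuous maximum-likelihood estimator to the tail of a list of rating counts and reports sigma as the asymptotic standard error of alpha. The note does not state which estimator or software was used, so the estimator, the reading of sigma as a standard error, the function name fit_power_law_tail, the xmin cutoff, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def fit_power_law_tail(values, xmin):
    """Continuous MLE for alpha in p(x) ~ x^(-alpha), using only x >= xmin.

    Hypothetical helper for illustration; not taken from the paper.
    Returns (alpha, sigma, n), where sigma is the asymptotic standard
    error (alpha - 1) / sqrt(n) and n is the number of tail observations.
    """
    tail = [v for v in values if v >= xmin]
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two observations at or above xmin")
    log_sum = sum(math.log(v / xmin) for v in tail)
    if log_sum <= 0:
        raise ValueError("all tail values equal xmin; alpha is undefined")
    alpha = 1.0 + n / log_sum
    sigma = (alpha - 1.0) / math.sqrt(n)
    return alpha, sigma, n


# Synthetic stand-in for per-book rating totals (assumption: no real data here).
# Inverse-transform sampling from a continuous power law with a known exponent.
rng = random.Random(0)
true_alpha, xmin = 3.0, 100.0
synthetic = [
    xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
    for _ in range(5000)
]

alpha, sigma, n = fit_power_law_tail(synthetic, xmin)
print(f"alpha = {alpha:.2f}, sigma = {sigma:.3f}, tail size = {n}")
```

With real data, the synthetic list would be replaced by the total number of ratings per book, and xmin would need to be selected from the data (for example, by comparing goodness of fit across candidate cutoffs) rather than fixed in advance.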

References

  1. Jett, J. et al.: The hathitrust research center extracted features dataset (2.0) (2020). https://doi.org/10.13012/R2TE-C227

  2. Underwood, T.: Distant horizons: digital evidence and literary change. University of Chicago Press, Chicago (2019)

  3. Organisciak, P., Schmidt, B.M., Downie, J.S.: Giving shape to large digital libraries through exploratory data analysis. J. Assoc. Inf. Sci. Technol. 73(2), 317–332 (2022)

  4. Michel, J.-B., et al.: Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182 (2011)

  5. Milligan, I.: The problem of history in the age of abundance (2016)

  6. Walsh, M.: Where is all the book data (2022). https://www.publicbooks.org/where-is-all-the-book-data/

  7. Kotin, J. et al.: Shakespeare and company project dataset: lending library events. https://doi.org/10.34770/39sq-bm51 (2021)

  8. So, R.J., Wezerek, G.: Just how white is the book industry? (2020). https://www.nytimes.com/interactive/2020/12/11/opinion/culture/diversity-publishing-industry.html

  9. Boot, P.: The desirability of a corpus of online book responses. In: Proceedings of the Workshop on Computational Linguistics for Literature, pp. 32–40 (2013)

  10. English, J. F.: A future for empirical reader studies (2021). https://culturalanalytics.org/post/1208-a-future-for-empirical-reader-studies

  11. Dai, L.: From history of the book to history of reading: theories and methods for historical studies of reading, Xinxing, (2017)

  12. Walsh, M., Antoniak, M.: The goodreads "classics": a computational study of readers, amazon, and crowdsourced amateur criticism. J. Cult. Anal. 4, 243–287 (2021)

  13. Koolen, M., Boot, P., van Zundert, J.J.: Online book reviews and the computational modelling of reading impact. In: CEUR Workshop Proceedings, http://ceur-ws.org, ISSN 1613-0073 (2020)

  14. Rebora, S., et al.: Digital humanities and digital social reading. Digit. Scholarsh. Humanit. 36, ii230–ii250 (2021)

  15. Bartley, P.: Book tagging on librarything: How, why, and what are in the tags? Proc. Am. Soc. Inf. Sci. Technol. 46(1), 1–22 (2009)

  16. Lu, C., Park, J., Hu, X.: User tags versus expert-assigned subject terms: a comparison of librarything tags and library of congress subject headings. J. Inf. Sci. 36(6), 763–779 (2010)

  17. Worrall, A.: “Like a real friendship”: translation, coherence, and convergence of information values in librarything and goodreads. In: iConference 2015 Proceedings (2015)

  18. Bourrier, K., Thelwall, M.: The social lives of books: Reading victorian literature on goodreads. J. Cult. Anal. 1(1), 12049 (2020)

  19. Antoniak, M., Walsh, M., Mimno, D.: Tags, borders, and catalogs: Social re-working of genre on librarything. Proc. ACM Hum. Comput. Interact. 5(CSCW1), 1–29 (2021)

  20. Gilbert, E., Karahalios, K.: Understanding deja reviewers, pp 225–228 (2010)

  21. Maity, S. K., Panigrahi, A., Mukherjee, A.: Book reading behavior on goodreads can predict the amazon best sellers, pp 451–454 (2017)

  22. Nakamura, L.: “Words with friends”: socially networked reading on goodreads. PMLA/Publ. Mod. Lang. Assoc. Am. 128(1), 238–243 (2013)

  23. Shahsavari, S. et al.: An automated pipeline for character and relationship extraction from readers’ literary book reviews on goodreads.com, pp. 277–286 (2020)

  24. Wan, M., McAuley, J.J.: Item recommendation on monotonic behavior chains. In: Pera, S., Ekstrand, M.D., Amatriain, X., O’Donovan, J. (eds.) Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, October 2–7, 2018, pp. 86–94. ACM (2018). https://doi.org/10.1145/3240323.3240369

  25. Wan, M., Misra, R., Nakashole, N., McAuley, J.J.: Fine-grained spoiler detection from large-scale review corpora. In: Korhonen, A., Traum, D.R., Màrquez, L. (eds.) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, vol. 1, pp. 2605–2610. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/p19-1248

  26. Howsam, L.: Old books and new histories: an orientation to studies in book and print culture. University of Toronto Press, Toronto (2006)

  27. Pianzola, F. et al.: Books’ impact in digital social reading: Towards a conceptual and methodological framework. In: Digital Humanities 2022 Conference Abstracts, pp. 94–98 (2022)

  28. Hu, N., Bose, I., Koh, N.S., Liu, L.: Manipulation of online reviews: an analysis of ratings, readability, and sentiments. Decis. Support Syst. 52(3), 674–684 (2012)

  29. Hu, Y., Layne-Worthey, G., Martaus, A., Downie, J.S., Diesner, J.: Research with user-generated book review data: legal and ethical pitfalls and contextualized mitigations, pp. 163–186. Springer, Berlin (2023)

  30. Willemsen, L.M., Neijens, P.C., Bronner, F., De Ridder, J.A.: “Highly recommended!” the content characteristics and perceived usefulness of online consumer reviews. J. Comput. Mediat. Commun. 17(1), 19–38 (2011)

  31. Liu, Z., Park, S.: What makes a useful online review? Implication for travel product websites. Tour. Manage. 47, 140–151 (2015)

  32. Kambara, T., Okamoto, S., Teramoto, Y., Kusu, K., Hatano, K.: Evaluating usefulness of reviews based on evaluation standpoints of consumers, pp. 110–117 (2018)

  33. Lopes, A.I., Dens, N., De Pelsmacker, P., De Keyzer, F.: Which cues influence the perceived usefulness and credibility of an online review? A conjoint analysis. Online Inf. Rev. 45(1), 1–20 (2020)

  34. Li, H., Wang, X., Wang, S., Zhou, W., Yang, Z.: The power of numbers: an examination of the relationship between numerical cues in online review comments and perceived review helpfulness. J. Res. Interact. Mark. (2022). https://doi.org/10.1108/JRIM-09-2021-0239

  35. Jiang, M., Diesner, J.: Issue-focused documentaries versus other films: Rating and type prediction based on user-authored reviews, pp. 225–230 (2016)

  36. Wang, J., Ghose, A., Ipeirotis, P.: Bonus, disclosure, and choice: What motivates the creation of high-quality paid reviews? Citeseer, (2012)

  37. McCluskey, M.: How extortion scams and review bombing trolls turned goodreads into many authors’ worst nightmare (2021). https://time.com/6078993/goodreads-review-bombing/

  38. Fornaciari, T., Poesio, M.: Identifying fake amazon reviews as learning from crowds, Association for Computational Linguistics, pp. 279–287 (2014)

  39. Luca, M., Zervas, G.: Fake it till you make it: reputation, competition, and yelp review fraud. Manage. Sci. 62(12), 3412–3427 (2016)

  40. Wu, Y., Ngai, E.W., Wu, P., Wu, C.: Fake online reviews: literature review, synthesis, and directions for future research. Decis. Support Syst. 132, 113280 (2020)

  41. Newell, E.D., Dimitrov, S., Piper, A., Ruths, D.: To buy or to read: how a platform shapes reviewing behavior (2016)

  42. Schuckert, M., Liu, X., Law, R.: Insights into suspicious online ratings: direct evidence from tripadvisor. Asia Pac. J. Tour. Res. 21(3), 259–272 (2016)

  43. Lappas, T., Sabnis, G., Valkanas, G.: The impact of fake reviews on online visibility: a vulnerability assessment of the hotel industry. Inf. Syst. Res. 27(4), 940–961 (2016)

  44. Murray, S.: Secret agents: algorithmic culture, goodreads and datafication of the contemporary book world. Eur. J. Cult. Stud. 24(4), 970–989 (2021)

  45. Antoniak, M., Walsh, M.: The crowdsourced “classics” and the revealing limits of goodreads data (2020). https://doi.org/10.17613/7k61-eg23

  46. Lappas, T.: Fake reviews: the malicious perspective, pp. 23–34. Springer, Berlin (2012)

  47. Streitfeld, D.: The best book reviews money can buy (2012). https://www.nytimes.com/2012/08/26/business/book-reviewers-for-hire-meet-a-demand-for-online-raves.html

  48. Kirkus. Get reviewed. get discovered. (2022). https://www.kirkusreviews.com/indie-reviews/

  49. Olivia. How to get arc and review copies of books-all you need to know (2017). https://booksandreaderssite.wordpress.com/2017/07/10/how-to-get-arc-and-review-copies-of-books-all-you-need-to-know/

  50. Murphy, D.: Are advanced reader copies (arcs) for book reviews illegal (against amazon’s terms?) (2016). https://www.creativindie.com/are-advanced-reader-copies-arcs-for-book-reviews-illegal-against-amazons-terms/

  51. Holur, P., Shahsavari, S., Ebrahimzadeh, E., Tangherlini, T.R., Roychowdhury, V.: Modelling social readers: novel tools for addressing reception from online book reviews. R. Soc. open Sci. 8(12), 210797 (2021)

  52. Mendelman, L., Mukamal, A.: The generative dissensus of reading the feminist novel, 1995–2020: a computational analysis of interpretive communities. J. Cult. Anal. 6(3), 30009 (2021)

  53. Salgaro, M.: Literary value in the era of big data. Operationalizing critical distance in professional and non-professional reviews. J. Cult. Anal. 7(2), 36446 (2022)

  54. Moravec, M., Chang, K.K.: Feminist bestsellers: a digital history of 1970s feminism. J. Cult. Anal. (2021). https://doi.org/10.22148/001c.22333

  55. Murray, S.: The digital literary sphere: reading, writing, and selling books in the internet era. Johns Hopkins University Press, Baltimore (2018)

  56. Crawford, K., Finn, M.: The limits of crisis data: analytical and ethical challenges of using social and mobile data to understand disasters. Geo. J. 80, 491–502 (2015)

  57. Olteanu, A., Castillo, C., Diaz, F., Kıcıman, E.: Social data: biases, methodological pitfalls, and ethical boundaries. Front. Big Data 2, 13 (2019)

  58. Bruns, A., Weller, K.: Twitter as a first draft of the present: and the challenges of preserving it for the future, pp. 183–189 (2016)

  59. Wang, Y., Wang, Z., Zhang, D., Zhang, R.: Discovering cultural differences in online consumer product reviews. J. Electron. Commer. Res. 20(3), 169–183 (2019)

  60. Stamolampros, P., Korfiatis, N., Kourouthanassis, P., Symitsi, E.: Flying to quality: cultural influences on online reviews. J. Travel Res. 58(3), 496–511 (2019)

  61. Manshel, A., McGrath, L. B., Porter, J. D.: The rise of must-read tv-how your netflix habit is changing contemporary fiction (2021). https://www.theatlantic.com/culture/archive/2021/07/tv-adaptations-fiction/619442/

  62. Lewis, H.: How j. k. Rowling became voldemort (2020). https://www.theatlantic.com/international/archive/2020/07/why-millennial-harry-potter-fans-reject-jk-rowling/613870/

  63. Goodreads. About goodreads (2021). https://www.goodreads.com/about/us

  64. Douban. About douban (2021). https://www.douban.com/about

  65. Bao, T., Chang, T.-L.S.: Why amazon uses both the new york times best seller list and customer reviews: an empirical study of multiplier effects on product sales from multiple earned media. Decis. Support Syst. 67, 1–8 (2014)

  66. Ptuabhof, L., Da, P.R.: Goodreads ratings and reviews analysis of booker prize titles, pp. 363–371. Segment Publication, Daryaganj (2018)

  67. Lin, E., Fang, S., Wang, J.: Mining online book reviews for sentimental clustering, pp. 179–184, IEEE, (2013)

  68. Wikipedia contributors. Amazon (company) (2021). https://en.wikipedia.org/wiki/Amazon_(company)

  69. Wikipedia contributors. Amazon books (2021). https://en.wikipedia.org/wiki/Amazon_Books

  70. Wikipedia contributors. Goodreads (2021). https://en.wikipedia.org/wiki/Goodreads

  71. Wikipedia contributors. Librarything (2021). https://en.wikipedia.org/wiki/LibraryThing

  72. LibraryThing. About librarything (2021). https://www.librarything.com/about

  73. Wikipedia contributors. Douban (2021). https://en.wikipedia.org/wiki/Douban

  74. Xie, R.: Investigation into douban water army: 15 RMB for a short review; votes and thumb-ups available as well (in Chinese) (2021). http://www.xinhuanet.com/fortune/2021-02/25/c_1127136296.htm

  75. Kiu-Chor, H., et al.: A case study of douban: social network communities. Masaryk Univ. J. Law Technol. 1(2), 43–56 (2007)

  76. Sabri, N., Weber, I.: Users data (2021). https://figshare.com/articles/dataset/Users_Data/15067509. https://doi.org/10.6084/m9.figshare.15067509.v1

  77. Sabri, N., Weber, I.: A global book reading dataset. Data 6(8), 83 (2021)

  78. Diesner, J., Chin, C.-L.: Usable ethics: practical considerations for responsibly conducting research with social trace data. Big Data Ethics (2015)

  79. Diesner, J., Chin, C.-L.: Seeing the forest for the trees: understanding and implementing regulations for the collection and analysis of human centered data (2016)

  80. Zimmer, M.: “but the data is already public”: on the ethics of research in facebook. Ethics Inf. Technol. 12, 313–325 (2010)

  81. Fiesler, C., Lampe, C., Bruckman, A. S.: Reality and perception of copyright terms of service for online content creation, pp. 1450–1461 (2016)

  82. Zimmer, M.: Addressing conceptual gaps in big data research ethics: an application of contextual integrity. Soc. Media Soc. 4(2), 2056305118768300 (2018)

  83. Fiesler, C., Beard, N., Keegan, B. C.: No robots, spiders, or scrapers: legal and ethical regulation of data collection methods in social media terms of service, Vol. 14, pp. 187–196, (2020)

  84. Dong Fang Kuai Che (Taiyuan). Douban top 250 books (2011). https://www.douban.com/doulist/513669/

  85. Rui. Douban top 250 books old version 2013.06 (2013). https://www.douban.com/note/536479320/

  86. Shuyang. douban.com top 250 movies and books (2016). https://github.com/Shuyang/douban_top250/tree/master

  87. Zhou, J.: douban.com top 250 movies and books (2018). https://doi.org/10.18170/DVN/X20PS1

  88. Douban Books. How many douban top 250 books have you read? (2019). https://mp.weixin.qq.com/s?__biz=MzAwNzYyNDMyMA==&mid=2651117440&idx=1&sn=86f24dcbc54b18c40978ce325fbefb08

  89. Zebulon2020. Douban read top250 crawler (2020). https://github.com/zebulon2020/DoubanReadTop250Crawler

  90. Douban Books. Big changes to douban top 250 books: 107 new books on the list for the first time (2020). https://mp.weixin.qq.com/s/iYCf7lGdLkgNurzv_HNa-Q

  91. U.S. Copyright Office. U.S. Copyright Office fair use index (2023). https://www.copyright.gov/fair-use/

  92. Douban Books. Douban book tags (2022). https://book.douban.com/tag/?view=type

  93. The National People’s Congress of The People’s Republic of China. Copyright law of the people’s republic of china (chinese version) (2022). http://www.npc.gov.cn/npc/c30834/202011/848e73f58d4e4c5b82f69d25d46048c6.shtml

  94. The National People’s Congress of The People’s Republic of China. Copyright law of the people’s republic of china (english translation) (2022). http://www.china.org.cn/english/government/207484.htm

  95. Goodreads. Privacy policy (2022). https://www.goodreads.com/about/privacy

  96. HathiTrust Research Center. HathiTrust Research Center non-consumptive use policy (2017). https://www.hathitrust.org/htrc_ncup

  97. Mauch, M., MacCallum, R.M., Levy, M., Leroi, A.M.: The evolution of popular music: USA 1960–2010. R. Soc. Open Sci. 2(5), 150081 (2015)

  98. Gekoski, R.: Tolkien’s gown: and other stories of great authors and rare books, Constable, (2004)

  99. Sharma, R.: Black and lgbtq+ authors say they’re being harassed on goodreads and trolled with one-star book reviews (2021). https://inews.co.uk/culture/books/goodreads-book-reviews-black-lgbtq-authors-harrassed-trolled-949179

  100. Zhu G. S.: (an author from Zuoshu2013 the Wechat Official Account). Big changes to the douban books top 250 list, the kite runner is no longer ranked no. 1 (2020). https://post.smzdm.com/p/a830r7gq/

  101. Federal Trade Commission. Federal trade commission 16 cfr part 255 guides concerning the use of endorsements and testimonials in advertising (2023). https://www.ftc.gov/sites/default/files/attachments/press-releases/ftc-publishes-final-guides-governing-endorsements-testimonials/091005revisedendorsementguides.pdf

  102. Jiang, M., Diesner, J.: Says who...? Identification of expert versus layman critics’ reviews of documentary films, pp. 2122–2132 (2016)

  103. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  104. Ramos, J. et al.: Using tf-idf to determine word relevance in document queries, Vol. 242, pp. 29–48, Citeseer, (2003)

  105. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)

  106. Chang, K.K., DeDeo, S.: Divergence and the complexity of difference in text and culture. J. Cult. Anal. (2020). https://doi.org/10.22148/001c.17585

  107. Camargo, C.Q., John, P., Margetts, H.Z., Hale, S.A.: Measuring the volatility of the political agenda in public opinion and news media. Public Opin. Q. 85(2), 493–516 (2021)

  108. Sorensen, A. T., Rasmussen, S. J.: Is any publicity good publicity? a note on the impact of book reviews. NBER Working Paper, Stanford University (2004)

  109. Berger, J., Sorensen, A.T., Rasmussen, S.J.: Positive effects of negative publicity: when negative reviews increase sales. Mark. Sci. 29(5), 815–827 (2010)

  110. Spence, P.J., Brandao, R.: Towards language sensitivity and diversity in the digital humanities. Digit. Stud. Champ Numér. (2021). https://doi.org/10.16995/dscn.8098

  111. Gil, A., Ortega, É.: in Global outlooks in digital humanities: multilingual practices and minimal computing, pp. 58–70. Routledge, England (2016)

  112. Liu, A.: Culture is wide, deep, and different. J. Cult. Anal. (2021)

  113. Maryl, M.: Virtual communities–real readers: new data in empirical studies of literature (2008)

  114. Ehrmann, T., Schmale, H.: The hitchhiker’s guide to the long tail: The influence of online-reviews and product recommendations on book sales-evidence from german online retailing. In: ICIS 2008 Proceedings 157 (2008)

  115. Hong, H., Xu, D., Xu, D., Wang, G.A., Fan, W.: An empirical study on the impact of online word-of-mouth sources on retail sales. Inf. Discov. Deliv. 45(1), 30–5 (2017)

  116. Zhang, C., Tong, T., Bu, Y.: Examining differences among book reviews from various online platforms. Online Inf. Rev. 43(7), 1169–87 (2019)

  117. Long, H.: Culture at global scale (2021). https://culturalanalytics.org/post/1160-culture-at-global-scale

  118. Dimitrov, S., Zamal, F., Piper, A., Ruths, D.: Goodreads versus amazon: the effect of decoupling book reviewing and book selling (2015)

  119. Kovács, B., Sharkey, A.J.: The paradox of publicity: how awards can negatively affect the evaluation of quality. Adm. Sci. Q. 59(1), 1–33 (2014)

  120. Wikipedia contributors. Milan kundera (2021). https://en.wikipedia.org/wiki/Milan_Kundera

  121. Douban. Lolita (webpage for the book) (2022). https://book.douban.com/subject/1465324/

  122. Goodreads. Lolita (webpage for the book) (2022). https://www.goodreads.com/book/show/7604.Lolita

  123. Chik, S., Taboada, M.: Generic structure and rhetorical relations of online book reviews in English, Japanese and Chinese. Contrastive Pragmat. 1(2), 143–179 (2020)

  124. Garthwaite, C.L.: Demand spillovers, combative advertising, and celebrity endorsements. Am. Econ. J. Appl. Econ. 6(2), 76–104 (2014)

  125. McKinnon, J.G.: Adoption of e-book platform by historical new york times best-sellers: an examination of the "long tail" theory in action. Publ. Res. Q. 31(3), 201–214 (2015)

  126. King, R.A., Racherla, P., Bush, V.D.: What we know and don’t know about online word-of-mouth: a review and synthesis of the literature. J. Interact. Mark. 28(3), 167–183 (2014)

  127. Pianzola, F., Acerbi, A., Rebora, S.: Cultural accumulation and improvement in online fan fiction. In: CEUR Workshop Proceedings (2020)

  128. Diesner, J., Chin, C.-L.: Gratis, libre, or something else? regulations and misassumptions related to working with publicly available text data (2016)

Acknowledgements

This research was partially funded by the HathiTrust Research Center, which we thank for its support. In addition, we are grateful to the Douban and Goodreads book reviewers who shared their ratings and reviews. This research would be impossible without their contributions. We would also like to thank our editors and reviewers of both the International Journal on Digital Libraries and ACM/IEEE Joint Conference on Digital Libraries 2022 (JCDL 2022) for their critical feedback. At the same time, we want to extend our gratitude to our colleagues from the iConference 2022, the 85th Annual Meeting of the Association for Information Science and Technology (ASIS&T 2022), and the Cultural Data Analytics Open Lab at Tallinn University for their insightful suggestions and critical questions. In particular, we want to thank Dr. Federico Pianzola, Dr. Peter Boot, Dr. Maximilian Schich, Dr. Chico Q. Camargo, Dr. Gary Burnett, Dr. Maria Antoniak, Dr. Melanie Walsh, Dr. Melissa Ocepek, Dr. Sara Rachel Benson, Dr. Alaine Martaus, and Dr. Andrew Adrian Pua for their much-appreciated feedback and advice.

Author information

Corresponding author

Correspondence to Yuerong Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, Y., LeBlanc, Z., Diesner, J. et al. Complexities of leveraging user-generated book reviews for scholarly research: transiency, power dynamics, and cultural dependency. Int J Digit Libr (2023). https://doi.org/10.1007/s00799-023-00376-z

  • DOI: https://doi.org/10.1007/s00799-023-00376-z
