Analyzing Social Book Reading Behavior on Goodreads and How It Predicts Amazon Best Sellers

  • Suman Kalyan Maity
  • Abhishek Panigrahi
  • Animesh Mukherjee
Part of the Lecture Notes in Social Networks book series (LNSN)


A book’s success or popularity depends on many parameters, both extrinsic and intrinsic. In this paper, we study how book reading characteristics might influence the popularity of a book. To this end, we perform a cross-platform study of Goodreads entities and attempt to establish the connection between these entities and popular books (“Amazon best sellers”). We analyze collective reading behavior on the Goodreads platform and quantify various characteristic features of Goodreads entities to identify differences between Amazon best sellers (ABS) and other, non-best-selling books. We then develop a prediction model that uses these characteristic features to predict whether a book will become a best seller 1 month (15 days) after its publication. On a balanced set, where the competing class consists of books randomly selected from the Goodreads dataset, we achieve an average accuracy of 88.72% (85.66%). Our method, based primarily on features derived from user posts and genre-related characteristic properties, achieves an improvement of 16.4% over baseline methods that rely on traditional popularity factors (ratings, reviews). We also evaluate our model against two more competitive sets of books: (a) books that are both highly rated and have received a large number of reviews but are not best sellers (HRHR), and (b) Goodreads Choice Awards nominated books that are not best sellers (GCAN). For ABS vs GCAN, we achieve an average accuracy of 87.1% as well as a high ROC. For ABS vs HRHR, our model yields an average accuracy of 86.22%.
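As a rough illustration of the evaluation setup described in the abstract (a balanced two-class set scored by accuracy and ROC), the following minimal Python sketch may help. It is not the authors' pipeline: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the ROC figure is computed here as the area under the curve.

```python
import random

def balanced_sample(best_sellers, others, seed=0):
    """Undersample the larger class so both classes have equal size."""
    rng = random.Random(seed)
    n = min(len(best_sellers), len(others))
    return rng.sample(best_sellers, n), rng.sample(others, n)

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = best seller, 0 = other book.
# The scores stand in for the output of some classifier.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"ROC AUC  = {roc_auc(y_true, scores):.3f}")
```

On a balanced set, a random guess gives about 50% accuracy, which is why accuracy is a reasonable headline metric in this setting. The ROC area additionally reflects how well the scores rank best sellers above other books, independent of any single threshold.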



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Suman Kalyan Maity (1)
  • Abhishek Panigrahi (2)
  • Animesh Mukherjee (3)
  1. Kellogg School of Management and Northwestern Institute on Complex Systems, Northwestern University, Evanston, USA
  2. Microsoft Research India, Bengaluru, India
  3. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
