Analyzing Social Book Reading Behavior on Goodreads and How It Predicts Amazon Best Sellers

Influence and Behavior Analysis in Social Networks and Social Media (ASONAM 2018)

Abstract

A book’s success or popularity depends on various extrinsic and intrinsic parameters. In this paper, we study how book reading characteristics might influence the popularity of a book. Towards this objective, we perform a cross-platform study of Goodreads entities and attempt to establish the connection between various Goodreads entities and popular books (“Amazon best sellers”). We analyze the collective reading behavior on the Goodreads platform and quantify various characteristic features of the Goodreads entities to identify differences between these Amazon best sellers (ABS) and other, non-best-selling books. We then develop a prediction model that uses these characteristic features to predict whether a book will become a best seller 1 month (15 days) after its publication. On a balanced set in which the competing class contains books randomly selected from the Goodreads dataset, we achieve a high average accuracy of 88.72% (85.66%). Our method, based primarily on features derived from user posts and on genre-related characteristics, achieves an improvement of 16.4% over baseline methods built on traditional popularity factors (ratings and reviews). We also evaluate our model against two more competitive sets of books: (a) books that are both highly rated and have received a large number of reviews but are not best sellers (HRHR), and (b) non-best-selling books nominated for the Goodreads Choice Awards (GCAN). For ABS vs GCAN, we achieve a high average accuracy of 87.1% as well as a high ROC area; for ABS vs HRHR, our model yields an average accuracy of 86.22%.

This research was performed while all the authors were at IIT Kharagpur, India.
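
The balanced evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: it assumes each book is a dictionary of numeric features, that the competing class is sampled down to the size of the best-seller class, and that "average accuracy" means accuracy averaged over k cross-validation folds. The helpers `make_balanced_set`, `accuracy`, and `cross_validated_accuracy` are hypothetical names, and the folds are unstratified for brevity.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: each book is a mapping from feature name to value.
Book = Dict[str, float]
Example = Tuple[Book, int]  # (features, label); label 1 = best seller, 0 = competitor
Predictor = Callable[[Book], int]


def make_balanced_set(best_sellers: Sequence[Book], others: Sequence[Book],
                      seed: int = 0) -> List[Example]:
    """Sample as many competing books as there are best sellers, then shuffle."""
    if len(others) < len(best_sellers):
        raise ValueError("not enough competing books to balance the classes")
    rng = random.Random(seed)
    negatives = rng.sample(list(others), len(best_sellers))
    data = [(b, 1) for b in best_sellers] + [(b, 0) for b in negatives]
    rng.shuffle(data)
    return data


def accuracy(predict: Predictor, data: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    if not data:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(predict(x) == y for x, y in data) / len(data)


def cross_validated_accuracy(train: Callable[[List[Example]], Predictor],
                             data: Sequence[Example], k: int = 10) -> float:
    """Plain (unstratified) k-fold cross-validation: mean of per-fold accuracies."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of examples")
    folds = [list(data[i::k]) for i in range(k)]
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the held-out one.
        train_rows = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train(train_rows)
        scores.append(accuracy(model, test_fold))
    return sum(scores) / k
```

With real data, `best_sellers` would hold the ABS books and `others` the randomly drawn Goodreads books (or the HRHR or GCAN books), while `train` would wrap whichever classifier is used. Because the two classes are equal in size, a classifier that always predicts the same label scores 50%, which is the reference point for the accuracies reported above.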

Notes

  1. In Goodreads, a bookshelf is a list to which a user can add or remove books to organize their reading, much like a real-life bookshelf on which one keeps books.

  2. https://hunterswritings.com/2012/10/12/elements-of-the-psychological-thriller-mystery-suspense-andor-crime-fiction-genres/.

  3. http://www.writersdigest.com/wp-content/uploads/Essential_Elements.pdf.

  4. This research extends our earlier work [7] published at ASONAM 2017, reporting a much more detailed analysis of various aspects of social book reading and a detailed comparison of the best sellers with other kinds of competitors.

  5. http://www.amazon.com/gp/bestsellers/1995/books.

  6. https://github.com/aneesha/RAKE

  7. https://www.goodreads.com/choiceawards/.

References

  1. E. Baumer, M. Sueyoshi, B. Tomlinson, Exploring the role of the reader in the activity of blogging, in CHI (2008), pp. 1111–1120

  2. E.P. Baumer, M. Sueyoshi, B. Tomlinson, Bloggers and readers blogging together: collaborative co-creation of political blogs. Comput. Supported Coop. Work 20(1–2), 1–36 (2011)

  3. S. Follmer, R.T. Ballagas, H. Raffle, M. Spasojevic, H. Ishii, People in books: using a flashcam to become part of an interactive book for connected reading, in CSCW (2012), pp. 685–694

  4. B.A. Nardi, D.J. Schiano, M. Gumbrecht, Blogging as social activity, or, would you let 900 million people read your diary? in CSCW (2004), pp. 222–231

  5. H. Raffle, R. Ballagas, G. Revelle, H. Horii, S. Follmer, J. Go, E. Reardon, K. Mori, J. Kaye, M. Spasojevic, Family story play: reading with young children (and Elmo) over a distance, in CHI (2010), pp. 1583–1592

  6. J.W. Hall, Hit Lit: Cracking the Code of the Twentieth Century’s Biggest Bestsellers (Random House, New York, 2012)

  7. S.K. Maity, A. Panigrahi, A. Mukherjee, Book reading behavior on Goodreads can predict the Amazon best sellers, in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ASONAM ’17 (2017), pp. 451–454

  8. A. Ellegård, A Statistical Method for Determining Authorship: The Junius Letters, vol. 13 (Acta Universitatis Gothoburgensis, Göteborg, 1962), pp. 1769–1772

  9. J. Harvey, The content characteristics of best-selling novels. Public Opin. Q. 17(1), 91–114 (1953)

  10. J.J. McGann, The Poetics of Sensibility: A Revolution in Literary Style (Oxford University Press, Oxford, 1998)

  11. C.J. Yun, Performance evaluation of intelligent prediction models on the popularity of motion pictures, in 2011 4th International Conference on Interaction Sciences (ICIS) (IEEE, New York, 2011), pp. 118–123

  12. V.G. Ashok, S. Feng, Y. Choi, Success with style: using writing style to predict the success of novels, in Proceedings of EMNLP (2013), pp. 1753–1764

  13. R. Gunning, The Technique of Clear Writing (McGraw-Hill, New York, 1952)

  14. G.H. McLaughlin, SMOG grading: a new readability formula. J. Read. 12(8), 639–646 (1969)

  15. J.P. Kincaid, R.P. Fishburne Jr, R.L. Rogers, B.S. Chissom, Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for navy enlisted personnel. Technical report, DTIC Document (1975)

  16. A. Stenner, I. Horabin, D.R. Smith, M. Smith, The Lexile Framework (MetaMetrics, Durham, 1988)

  17. E. Fry, A readability formula for short passages. J. Read. 33(8), 594–597 (1990)

  18. J.S. Chall, E. Dale, Readability Revisited: The New Dale-Chall Readability Formula (Brookline Books, Brookline, 1995)

  19. A. Louis, Automatic metrics for genre-specific text quality, in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, Association for Computational Linguistics (2012), pp. 54–59

  20. R.J. Kate, X. Luo, S. Patwardhan, M. Franz, R. Florian, R.J. Mooney, S. Roukos, C. Welty, Learning to predict readability using diverse linguistic features, in Proceedings of the 23rd International Conference on Computational Linguistics, Association for Computational Linguistics (2010), pp. 546–554

  21. S.E. Schwarm, M. Ostendorf, Reading level assessment using support vector machines and statistical language models, in Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics (2005), pp. 523–530

  22. M. Heilman, M. Eskenazi, Language learning: challenges for intelligent tutoring systems, in Proceedings of the Workshop of Intelligent Tutoring Systems for Ill-Defined Domains. Eighth International Conference on Intelligent Tutoring Systems (2006), pp. 20–28

  23. K. Collins-Thompson, J.P. Callan, A language modeling approach to predicting reading difficulty, in HLT-NAACL (2004), pp. 193–200

  24. E. Pitler, A. Nenkova, Revisiting readability: a unified framework for predicting text quality, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics (2008), pp. 186–195

  25. S. Raghavan, A. Kovashka, R. Mooney, Authorship attribution using probabilistic context-free grammars, in Proceedings of the ACL 2010 Conference Short Papers, Association for Computational Linguistics (2010), pp. 38–42

  26. S. Feng, R. Banerjee, Y. Choi, Characterizing stylistic elements in syntactic structure, in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Association for Computational Linguistics (2012), pp. 1522–1533

  27. F. Peng, D. Schuurmans, S. Wang, V. Keselj, Language independent authorship attribution using character level language models, in Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics, Volume 1, Association for Computational Linguistics (2003), pp. 267–274

  28. H.J. Escalante, T. Solorio, M. Montes-y Gómez, Local histograms of character n-grams for authorship attribution, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics (2011), pp. 288–298

  29. E. Stamatatos, N. Fakotakis, G. Kokkinakis, Automatic authorship attribution, in Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics (1999), pp. 158–164

  30. H. Baayen, H. Van Halteren, F. Tweedie, Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Lit. Linguist. Comput. 11(3), 121–132 (1996)

  31. V.J. Rideout, E.A. Vandewater, E.A. Wartella, Zero to six: electronic media in the lives of infants, toddlers and preschoolers (2003)

  32. H. Chen, X. Li, Z. Huang, Link prediction approach to collaborative filtering, in Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’05 (IEEE, New York, 2005), pp. 141–142

  33. J. Kamps, The impact of author ranking in a library catalogue, in Proceedings of the 4th ACM Workshop on Online Books, Complementary Social Media and Crowdsourcing (ACM, New York, 2011), pp. 35–40

  34. P.C. Vaz, D. Martins de Matos, B. Martins, P. Calado, Improving a hybrid literary book recommendation system through author ranking, in Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries (ACM, New York, 2012), pp. 387–388

  35. Z. Zhu, J.Y. Wang, Book recommendation service by improved association rule mining algorithm, in 2007 International Conference on Machine Learning and Cybernetics, vol. 7 (IEEE, New York, 2007), pp. 3864–3869

  36. P.C. Vaz, D. Martins de Matos, B. Martins, Stylometric relevance-feedback towards a hybrid book recommendation algorithm, in Proceedings of the Fifth ACM Workshop on Research Advances in Large Digital Book Repositories and Complementary Media (ACM, New York, 2012), pp. 13–16

  37. X. Yang, H. Zeng, Y. Huang, ARTMAP-based data mining approach and its application to library book recommendation, in 2009 International Symposium on Intelligent Ubiquitous Computing and Education (IEEE, New York, 2009), pp. 26–29

  38. S. Givon, V. Lavrenko, Predicting social-tags for cold start book recommendations, in Proceedings of the Third ACM Conference on Recommender Systems (ACM, New York, 2009), pp. 333–336

  39. M. Zhou, Book recommendation based on web social network, in International Conference on Artificial Intelligence and Education (ICAIE) (IEEE, New York, 2010), pp. 136–139

  40. M.S. Pera, Y.K. Ng, What to read next?: making personalized book recommendations for K-12 users, in Proceedings of the 7th ACM Conference on Recommender Systems (ACM, New York, 2013), pp. 113–120

  41. M.S. Pera, Y.K. Ng, Automating readers’ advisory to make book recommendations for K-12 readers, in Proceedings of the 8th ACM Conference on Recommender Systems (ACM, New York, 2014), pp. 9–16

  42. M.S. Pera, Y.K. Ng, Analyzing book-related features to recommend books for emergent readers, in Proceedings of the 26th ACM Conference on Hypertext & Social Media (ACM, New York, 2015), pp. 221–230

  43. S. Dimitrov, F. Zamal, A. Piper, D. Ruths, Goodreads vs Amazon: the effect of decoupling book reviewing and book selling, in Proceedings of ICWSM ’15 (2015)

  44. A. Worrall, “Back onto the tracks”: convergent community boundaries in LibraryThing and Goodreads, in 9th Annual Social Informatics Research Symposium (2013)

  45. M. Thelwall, K. Kousha, Goodreads: a social network site for book readers. J. Assoc. Inf. Sci. Technol. 68(4), 972–983 (2017)

  46. M. Thelwall, Book genre and author gender: Romance > paranormal-romance to autobiography > memoir. J. Assoc. Inf. Sci. Technol. 68(5), 1212–1223 (2017)

  47. S. Rose, D. Engel, N. Cramer, W. Cowley, Automatic keyword extraction from individual documents, in Text Mining (2010), pp. 1–20

  48. L. Deng, J. Wiebe, MPQA 3.0: an entity/event-level sentiment corpus, in Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2015)

  49. J.W. Pennebaker, M.E. Francis, R.J. Booth, Linguistic Inquiry and Word Count (Lawrence Erlbaum Associates, Mahwah, 2001)

  50. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

Author information

Corresponding author

Correspondence to Suman Kalyan Maity.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Maity, S.K., Panigrahi, A., Mukherjee, A. (2019). Analyzing Social Book Reading Behavior on Goodreads and How It Predicts Amazon Best Sellers. In: Kaya, M., Alhajj, R. (eds) Influence and Behavior Analysis in Social Networks and Social Media. ASONAM 2018. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-02592-2_11

  • DOI: https://doi.org/10.1007/978-3-030-02592-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02591-5

  • Online ISBN: 978-3-030-02592-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
