
Automatic Classification of Web Search Results: Product Review vs. Non-review Documents

  • Conference paper
Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers (ICADL 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4822)


Abstract

This study seeks to develop an automatic method for identifying product review documents on the Web using the snippets (summary information that includes the URL, title, and summary text) returned by a Web search engine. The aim is to let users extend topical search with genre-based filtering or categorization. First, we applied a common machine-learning technique, SVM (Support Vector Machine), to investigate which snippet features are useful for classification. The best results were obtained using only the title and URL (domain and folder names) of the snippets as phrase terms (n-grams). We then developed a heuristic approach that uses semi-automatically constructed domain knowledge, and found that it performs comparably well, with only a small drop in accuracy. A hybrid approach that combines the machine-learning and heuristic approaches performs slightly better than the machine-learning approach alone.
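The machine-learning setup described in the abstract (snippet title plus URL domain and folder names, turned into n-gram terms and fed to an SVM) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the n-gram length (up to 2), the binary feature weights, the `title:`/`domain:`/`folder:` feature prefixes, the Pegasos-style linear SVM solver with `lam=0.1`, and the `example.com` training snippets are all hypothetical choices made for the sketch. The heuristic and hybrid approaches are not shown, since the abstract does not describe their rules.

```python
import re
from urllib.parse import urlparse


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(words, n_max=2):
    """Contiguous n-grams of length 1..n_max (assumed maximum), joined by spaces."""
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def snippet_features(title, url, n_max=2):
    """Binary features from a search-result snippet: title n-grams plus the
    URL's domain and folder names. The summary text is deliberately unused."""
    parsed = urlparse(url)
    feats = {"bias": 1.0}  # constant feature acts as the SVM bias term
    feats.update({"title:" + g: 1.0 for g in ngrams(tokens(title), n_max)})
    feats.update({"domain:" + t: 1.0 for t in tokens(parsed.netloc)})
    # Folder names: every path segment except the last one (the file name).
    for folder in filter(None, parsed.path.split("/")[:-1]):
        feats.update({"folder:" + g: 1.0 for g in ngrams(tokens(folder), n_max)})
    return feats


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(data, lam=0.1, epochs=50):
    """Linear SVM trained by Pegasos-style sub-gradient descent on the
    L2-regularized hinge loss. `data` holds (features, label), label in {+1, -1}.
    Examples are visited in a fixed cyclic order so the sketch is deterministic."""
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            violated = y * dot(w, x) < 1.0     # margin check uses current weights
            # Regularizer step: shrink every weight by (1 - eta * lam).
            w = {f: v * (1.0 - eta * lam) for f, v in w.items()}
            if violated:                       # hinge-loss sub-gradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def classify(w, title, url):
    """Positive score means 'review', otherwise 'non-review'."""
    return "review" if dot(w, snippet_features(title, url)) > 0 else "non-review"


# Hypothetical toy training snippets: (title, URL, label).
TRAIN = [
    ("Canon EOS 400D review", "http://www.example.com/reviews/cameras/canon-400d.html", +1),
    ("Nokia N95 review and rating", "http://www.example.com/reviews/phones/n95.html", +1),
    ("Buy Sony Walkman online", "http://shop.example.net/audio/walkman.html", -1),
    ("Dell XPS price and specifications", "http://shop.example.net/laptops/xps.html", -1),
]
w = train_linear_svm([(snippet_features(t, u), y) for t, u, y in TRAIN])
print(classify(w, "Apple iPod nano review", "http://www.example.com/reviews/players/ipod.html"))
```

Keeping URL folders as separate features reflects the abstract's observation that the domain and folder names carry useful genre signal; a production system would more likely use an off-the-shelf SVM library and a real labelled snippet collection.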




Editor information

Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, Edie Rasmussen


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Thet, T.T., Na, JC., Khoo, C.S.G. (2007). Automatic Classification of Web Search Results: Product Review vs. Non-review Documents. In: Goh, D.HL., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. ICADL 2007. Lecture Notes in Computer Science, vol 4822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77094-7_13


  • DOI: https://doi.org/10.1007/978-3-540-77094-7_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77093-0

  • Online ISBN: 978-3-540-77094-7

  • eBook Packages: Computer Science, Computer Science (R0)
