Extracting and Summarizing Hot Item Features Across Different Auction Web Sites

  • Tak-Lam Wong
  • Wai Lam
  • Shing-Kit Chan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)


Online auction Web sites are fast-changing and highly dynamic, and the vast, poorly organized information they contain is difficult to digest. We develop a unified framework that automatically extracts product features and summarizes the hot item features across different auction Web sites. One challenge of this problem is extracting useful information from the product descriptions written by sellers, whose layout formats vary widely. We formulate the problem as a single graph labeling problem using conditional random fields, which can model the relationships among neighbouring tokens within a Web page, among tokens from different pages, and with other information such as the hot item features across different auction sites. We have conducted extensive experiments on several real-world auction Web sites to demonstrate the effectiveness of our framework.
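The idea of casting extraction as a single graph labeling problem can be illustrated with a toy example. The sketch below is not the authors' model. The keyword list, the weights, the two labels `O`/`FEAT`, and the helper names (`unary_score`, `pairwise_score`, `build_graph`, `map_labeling`) are all hypothetical. It builds one graph whose nodes are tokens from several pages, links neighbouring tokens within a page and identical tokens across pages, and scores a labeling as a CRF-style sum of local and pairwise potentials. For simplicity it finds the highest-scoring (MAP) labeling by exhaustive enumeration, which is exact but only feasible for tiny graphs; a real system would learn the weights and use an approximate inference method instead.

```python
from itertools import combinations, product

# Hypothetical label set: a token is either part of a product feature or not.
LABELS = ("O", "FEAT")

# Hypothetical lexicon standing in for learned token features.
FEATURE_WORDS = {"mp3", "bluetooth", "2gb", "camera"}


def unary_score(token, label):
    """Hand-set local potential (assumption); a trained CRF would learn these weights."""
    if label == "FEAT":
        return 1.5 if token.lower() in FEATURE_WORDS else -1.0
    return 0.0


def pairwise_score(label_a, label_b, weight):
    """Pairwise potential: reward two linked tokens for taking the same label."""
    return weight if label_a == label_b else 0.0


def build_graph(pages):
    """Nodes are (page, position) pairs. Edges link neighbouring tokens within a
    page (weight 0.4) and identical tokens on different pages (weight 1.0)."""
    nodes = [(p, i) for p, toks in enumerate(pages) for i in range(len(toks))]
    edges = []
    for p, toks in enumerate(pages):
        for i in range(len(toks) - 1):
            edges.append(((p, i), (p, i + 1), 0.4))  # within-page neighbour edge
    for a, b in combinations(nodes, 2):
        same_token = pages[a[0]][a[1]].lower() == pages[b[0]][b[1]].lower()
        if a[0] != b[0] and same_token:
            edges.append((a, b, 1.0))  # cross-page edge between matching tokens
    return nodes, edges


def map_labeling(pages):
    """Exhaustive MAP search over all 2**n joint labelings (tiny graphs only)."""
    nodes, edges = build_graph(pages)
    best, best_score = None, float("-inf")
    for assignment in product(LABELS, repeat=len(nodes)):
        y = dict(zip(nodes, assignment))
        score = sum(unary_score(pages[p][i], y[(p, i)]) for p, i in nodes)
        score += sum(pairwise_score(y[a], y[b], w) for a, b, w in edges)
        if score > best_score:
            best, best_score = y, score
    return best


if __name__ == "__main__":
    pages = [["Brand", "new", "MP3", "player"], ["MP3", "with", "Bluetooth"]]
    labeling = map_labeling(pages)
    print([pages[p][i] for (p, i), lab in labeling.items() if lab == "FEAT"])
    # prints ['MP3', 'MP3', 'Bluetooth']
```

Because both occurrences of "MP3" are joined by a cross-page edge, evidence from one page supports the label on the other. This is the kind of interaction a per-page sequence labeler cannot express.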







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tak-Lam Wong (1)
  • Wai Lam (1)
  • Shing-Kit Chan (1)
  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong
