Extraction of Top-k List by Using Web Mining Technique

Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 10)


Today, finding relevant information quickly is crucial. However, only a small proportion of the data on the Internet is interpretable and meaningful, and extracting it takes a great deal of time. This paper addresses the problem by extracting information from top-k web pages, which list the top k instances of a subject, for example “top 5 football teams in the world”. Compared with other structured information such as web tables, top-k lists contain high-quality information, which can be used to enrich open-domain knowledge bases that support search or fact-answering applications. The proposed system extracts top-k lists using a title classifier, a parser, a candidate picker, a ranker, and a content processor.
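The abstract names the pipeline stages but not their internals. As a rough illustration only, the first stages (title classifier, parser, candidate picker) might look like the minimal Python sketch below. The regular expression, the number-word table, the names `classify_title`, `ListCollector` and `pick_candidates`, and the rule that a candidate list must contain exactly k items are all hypothetical assumptions, not the authors' method. The sketch also assumes flat, well-formed `<ul>`/`<ol>` markup with explicit `</li>` tags; the ranker and content processor are not shown.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical number-word table; a real classifier would need far more coverage.
NUMBER_WORDS = {"three": 3, "five": 5, "ten": 10, "twenty": 20}

# Hypothetical title pattern: "top <k> <subject>", e.g. "top 5 football teams in the world".
TITLE_RE = re.compile(
    r"\btop\s+(\d+|" + "|".join(NUMBER_WORDS) + r")\s+(.+)",
    re.IGNORECASE,
)


def classify_title(title: str) -> Optional[Tuple[int, str]]:
    """Title classifier (sketch): return (k, subject) for a top-k title, else None."""
    match = TITLE_RE.search(title)
    if match is None:
        return None
    raw_k, subject = match.group(1).lower(), match.group(2).strip()
    k = int(raw_k) if raw_k.isdigit() else NUMBER_WORDS[raw_k]
    return k, subject


class ListCollector(HTMLParser):
    """Parser (sketch): collect <li> text grouped by enclosing <ul>/<ol>.

    Assumes list items are not nested inside other list items.
    """

    def __init__(self) -> None:
        super().__init__()
        self.lists: List[List[str]] = []        # finished lists, in closing order
        self._stack: List[List[str]] = []       # lists currently open
        self._item: Optional[List[str]] = None  # text chunks of the open <li>

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append([])
        elif tag == "li" and self._stack:
            self._item = []

    def handle_endtag(self, tag):
        if tag == "li" and self._item is not None and self._stack:
            self._stack[-1].append(" ".join(self._item))
            self._item = None
        elif tag in ("ul", "ol") and self._stack:
            self.lists.append(self._stack.pop())

    def handle_data(self, data):
        # Keep only non-blank text that appears inside an open <li>.
        if self._item is not None and data.strip():
            self._item.append(data.strip())


def pick_candidates(html: str, k: int) -> List[List[str]]:
    """Candidate picker (sketch): keep lists with exactly k items (assumed rule)."""
    collector = ListCollector()
    collector.feed(html)
    collector.close()
    return [items for items in collector.lists if len(items) == k]


if __name__ == "__main__":
    page = """
    <html><head><title>Top 3 Football Teams in the World</title></head>
    <body>
      <ul><li>Home</li><li>About</li></ul>
      <ol><li>Team A</li><li>Team B</li><li>Team C</li></ol>
    </body></html>
    """
    k, subject = classify_title("Top 3 Football Teams in the World")
    print(k, subject)                # 3 Football Teams in the World
    print(pick_candidates(page, k))  # [['Team A', 'Team B', 'Team C']]
```

Filtering on the item count is the simplest way to use the k obtained from the title; the navigation list (`Home`, `About`) is discarded because its length does not match. Choosing among several surviving candidates is left to the ranker, which this sketch does not attempt.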


Keywords: Top-k list, Information extraction, Top-k web pages, Structured information



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

Department of Computer Engineering, Dr. D.Y. Patil Institute of Engineering and Technology (DYPIET), Pimpri, Pune, India
