Ranking Techniques for Finding Correlated Webpages
In general, when users search for information, they often have difficulty expressing what they need as an exact query. As a result, they spend considerable time finding useful webpages. Previous techniques have not solved this problem effectively. In this paper, we propose an algorithm, RCW (Ranking technique for finding Correlated Webpages), that improves on previous ranking techniques. Our method retrieves not only the webpages that directly match a query but also webpages correlated with them. RCW can therefore help users find meaningful information without having to formulate exact queries. To find correlated webpages, the algorithm applies a novel technique for computing correlations among webpages. In the performance evaluation, we measure the precision, recall, and NDCG (normalized discounted cumulative gain) of RCW and compare them with those of another popular system. The results show that RCW finds more correlated webpages than the other method and achieves high precision, recall, and NDCG.
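The three evaluation measures named above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function names, the page identifiers, and the graded relevance values are hypothetical, and NDCG is computed here with the common log2 discount, taking the ideal ordering from the returned list only, which is a simplifying assumption.

```python
import math


def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved list against a relevant set."""
    relevant = set(relevant)
    hits = sum(1 for page in retrieved if page in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_gains, k=None):
    """DCG of the top-k results divided by the DCG of the ideal ordering.

    Assumption: the ideal ordering is the same gains sorted in descending
    order, so relevant pages that were never returned are not counted.
    """
    k = len(ranked_gains) if k is None else k
    ideal_dcg = dcg(sorted(ranked_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: four returned pages, three truly relevant pages.
p, r = precision_recall(["a", "b", "c", "d"], {"a", "c", "e"})
score = ndcg([1, 3])  # the more relevant page is ranked second
```

In this example two of the four returned pages are relevant, so precision is 2/4 and recall is 2/3. The NDCG value falls below 1 because the page with the higher relevance grade is ranked second.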
Keywords: Webpage analysis, Correlation searching, Ranking technique, Information retrieval
This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 2012-0003740 and 2012-0000478).