Provenance Based Web Search

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 182)


During web search, we often end up with untrusted, duplicate, and near-duplicate results, which dilute the focus of the search query. We refer to the factors that may influence the trust of web search results as 'provenance'. Provenance is essentially information about the history of data. In this paper, we propose a provenance model that uses both content-based and trust-based factors to identify trusted search results. The novelty of our idea lies in constructing a provenance matrix that encompasses six factors (who, where, when, what, why, how) related to the search results. Inferences performed over the provenance matrix lead to a trust score, which is then used to remove near-duplicates and retrieve trusted search results.
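The abstract does not specify how the provenance matrix is populated or how the inference is carried out, so the following is only a minimal sketch of the general idea. It assumes each search result carries one score in [0, 1] per provenance factor, that the trust score is a weighted mean of those six scores, and that near-duplicates are detected by Jaccard similarity over word shingles. The names `Result`, `trust_score`, and `filter_results`, the equal weights, and both thresholds are hypothetical choices made for illustration, not the authors' method.

```python
from dataclasses import dataclass

# The six provenance factors named in the abstract.
FACTORS = ("who", "where", "when", "what", "why", "how")


@dataclass
class Result:
    url: str
    text: str
    provenance: dict  # assumed: factor name -> score in [0, 1]


def trust_score(result, weights=None):
    """Weighted mean of the six factor scores (an assumed inference rule)."""
    weights = weights or {f: 1.0 for f in FACTORS}
    total = sum(weights[f] for f in FACTORS)
    return sum(weights[f] * result.provenance.get(f, 0.0) for f in FACTORS) / total


def shingles(text, k=3):
    """Set of k-word shingles used as a simple content fingerprint."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_results(results, sim_threshold=0.8, trust_threshold=0.5):
    """Keep trusted results; among near-duplicates keep the most trusted one."""
    kept = []
    for r in sorted(results, key=trust_score, reverse=True):
        if trust_score(r) < trust_threshold:
            continue  # untrusted result
        fingerprint = shingles(r.text)
        if any(jaccard(fingerprint, shingles(k.text)) >= sim_threshold for k in kept):
            continue  # near-duplicate of a more trusted result
        kept.append(r)
    return kept
```

Because results are visited in descending order of trust, whenever two results are near-duplicates the one with the higher trust score is the one that survives. Each row of the provenance matrix corresponds here to one `provenance` dictionary.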


Web search; Provenance; Mining; Provenance Matrix; Near-Duplicates; Trust; Semantics; Document Clustering; Ontology





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Information Science & Technology, College of Engineering Guindy, Anna University, Chennai, India
