Semantic Recognition of Web Structure to Retrieve Relevant Documents from Google by Formulating Index Term

  • Jinat Ara
  • Hanif Bhuiyan
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 924)

Abstract

Among the available search mechanisms, Google is one of the best information retrieval systems for returning useful results for a user query. Its search process rests on two stages, crawling and indexing. Google indexes documents by considering various features of web documents (title, meta tags, keywords, etc.), which helps it fetch complementary results by exactly matching the given query against the index terms in line with user interest. Appropriate indexing is difficult but essential, because whether a relevant document is retrieved depends, completely or partially, on how relevant the index term is to the document. However, the indexing process is sometimes influenced by an assorted number of web document features, which produces varied results that are either completely or partially irrelevant to the search and seem unexpected. To reduce this problem, we analyze web documents, considering their unstructured data (web content and link features), in terms of efficiency, quality, and relevancy to the user search query, and present a keyword-based approach that formulates appropriate index terms through semantic analysis using NLP concepts. This approach helps to understand the current web structure (effectiveness, quality, and relevancy) and to mitigate the current inconsistency problem by generating appropriate, efficient, and relevant index terms, which might improve search quality, since satisfactory search results are the major concern of the Google search engine. The analysis helps to generate appropriate index terms for Google web documents, which might be useful for retrieving appropriate and relevant web documents more systematically than existing approaches (lexicon-based and web-structure-based approaches). The experimental results demonstrate that the proposed approach is an effective and efficient methodology for predicting the competence of web documents and for finding appropriate and relevant web documents.
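
As a rough illustration of what formulating index terms from document features and matching them against a query can look like, the following minimal sketch may help. It is not the method proposed in the paper: it swaps in plain weighted term frequency for the semantic analysis, and the stop-word list, the field weights, and the function names `tokenize`, `index_terms`, and `relevance` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list and field weights, chosen only for illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with", "by"}
FIELD_WEIGHTS = {"title": 3.0, "meta": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def index_terms(doc, top_k=5):
    """Return the top_k weighted terms of a document given as {field: text}."""
    scores = Counter()
    for field, text in doc.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)  # unknown fields count as body text
        for token in tokenize(text):
            scores[token] += weight
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def relevance(query, terms):
    """Fraction of query tokens that appear among the index terms."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return sum(1 for t in query_tokens if t in terms) / len(query_tokens)
```

Calling `index_terms` on a dictionary with `title`, `meta`, and `body` strings yields a short ranked term list, and `relevance` then gives a score between 0 and 1 for a query against that list. Weighting the title above the body is one plausible design choice, not a documented property of any search engine.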

Keywords

Google; Search engine; Information retrieval; Page ranking algorithm; Index term and feature

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Jahangirnagar University, Dhaka, Bangladesh
  2. University of Asia Pacific, Dhaka, Bangladesh
