Abstract
In the early days of the web, only a handful of websites were hosted, and users could easily maintain a log file of information such as web page URLs or domain names. As the number of hosted websites gradually increased, maintaining such log files became difficult. This created the need for tools that help users find information on websites easily, now known as “search engines”. Their main limitation is that users must supply suitable search keywords in order to retrieve relevant information, and in many cases users still obtain irrelevant results. Meanwhile, the number of websites has grown drastically, each holding many web pages. These web pages are published in a structured or semi-structured manner and comprise various multimedia contents [(Nethra et al. in J Soft Comput 4:692–696, 2014) 1; (Kardan et al. in A novel approach for Keyword extraction in learning objects using text mining & WordNet. pp. 788–792, 2011) 2; (Menaka and Radha in Int J Adv Res Comput Sci Softw Eng 352:24–28, 2013) 3], so the chance of fetching wrong information has also increased; hence, there is a need to automatically categorize web pages into predetermined sections. The key aim of this research is to recognize news feeds and allocate them to fixed news sections such as business and sports, which makes relevant news more accessible to readers, who can browse the category of their choice. This is done by adopting a hybrid technique that combines URL analysis with content context analysis.
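As a rough illustration of the URL-analysis half of such a hybrid technique, the following Python sketch looks for category words in the host name and path segments of a news URL. The `CATEGORY_HINTS` vocabulary, the function name `category_from_url`, and the example URLs are assumptions made for illustration; the abstract does not specify how the URL is analysed.

```python
from urllib.parse import urlparse

# Hypothetical hint words per category (an assumption for this sketch);
# the abstract names business, sports and health only as example sections.
CATEGORY_HINTS = {
    "business": {"business", "economy", "markets"},
    "sports": {"sports", "sport", "cricket", "football"},
    "health": {"health", "fitness"},
}


def category_from_url(url):
    """Return the first category hinted at by a URL token, or None."""
    parsed = urlparse(url)
    # Path segments first, then the labels of the host name.
    tokens = [seg.lower() for seg in parsed.path.split("/") if seg]
    tokens += (parsed.hostname or "").lower().split(".")
    for token in tokens:
        for category, hints in CATEGORY_HINTS.items():
            if token in hints:
                return category
    return None


if __name__ == "__main__":
    print(category_from_url("https://example.com/sports/cricket/some-story"))
    print(category_from_url("https://example.com/india/some-story"))
```

A URL that carries no recognisable hint yields `None`, which is the case where content analysis would have to decide on its own.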
The paper proposes a model for classifying news feeds into fields such as sports and health. The model starts with web crawling of URLs and scraping of news contents, followed by analysis that generates keywords and calculates their weights, and finally identifies the relevant category on the basis of the contents fetched from various news web portals, including Indian ones, such as “The Times of India”, “Hindustan Times”, and “The Guardian” [(Nethra et al. in J Soft Comput 4:692–696, 2014) 1; (Jena and Kamila in Int J Appl Innov Eng Manage 2, 2013) 4; (Ercan and Cicekli in Int J Inf Process Manage 43:1705–1714, 2007) 5].
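The content-analysis steps (keyword generation, weight calculation, category identification) might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method: the stop-word list, the `CATEGORY_KEYWORDS` vocabulary, the use of normalised term frequency as the weight, and the `url_bonus` parameter are all hypothetical, since the abstract does not give the weighting formula. Crawling and scraping are omitted; the function takes already-scraped text.

```python
import re
from collections import Counter

# Hypothetical stop words and category vocabularies (assumptions).
STOPWORDS = {"the", "and", "for", "after", "with", "that", "this", "from"}
CATEGORY_KEYWORDS = {
    "business": {"market", "shares", "profit", "company", "bank"},
    "sports": {"match", "team", "goal", "tournament", "player"},
    "health": {"hospital", "doctor", "disease", "vaccine", "diet"},
}


def extract_keywords(text):
    """Count lower-cased words, dropping stop words and very short tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS and len(w) > 2)


def classify(text, url_hint=None, url_bonus=1.0):
    """Score each category by the share of keywords it matches.

    url_hint is an optional category suggested by URL analysis; it adds
    url_bonus to that category's score (an assumed way to combine both).
    """
    counts = extract_keywords(text)
    total = sum(counts.values()) or 1  # avoid division by zero
    scores = {
        category: sum(counts[k] for k in keywords) / total
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    if url_hint in scores:
        scores[url_hint] += url_bonus
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(classify("The team won the match after the captain scored a late goal."))
```

Passing a category through `url_hint` lets the URL signal outweigh a weak content signal; how the two signals are actually balanced in the paper is not stated in the abstract.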
Abbreviations
- WPC: Web Page Classification
References
1. Nethra, K., et al.: Web content extraction using hybrid approach. ICTACT J. Soft Comput. 4(2), 692–696 (2014). ISSN 2229-6956
2. Kardan, A.A., et al.: A novel approach for Keyword extraction in learning objects using text mining & WordNet. In: Proceedings of 2nd World Conference on Information Technology, pp. 788–792 (2011)
3. Menaka, S., Radha, N.: Text classification using keyword extraction technique. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3(12) (2013). ISSN 2277-128X
4. Jena, L., Kamila, N.: Data extraction & web page categorization using text mining. Int. J. Appl. Innov. Eng. Manage. 2(6) (2013). ISSN 2319-4847
5. Ercan, G., Cicekli, I.: Using lexical chains for keyword extraction. Int. J. Inf. Process. Manage. 43(6), 1705–1714 (2007)
6. Mehtaa, P., et al.: Web personalization using web mining concept and research issue. Int. J. Inf. Educ. Technol. 2(5), 510 (2012). ISSN 2010-3689
7. Patil, G., Patil, A.: Web information extraction classification using vector space model algorithm. Int. J. Emerg. Technol. Adv. Eng. 1(2) (2011). ISSN 2250-2459
8. Chauhan, A., et al.: Cleaning web pages for relevant text extraction & text categorization. Int. J. Eng. Res. Technol. (IJERT) 2(1) (2013). ISSN 2278-0181
9. Singh, A.: Web content extraction to facilitate web mining. Int. J. Electr. Comput. Sci. Eng. 1(3) (2012). ISSN 2277-1956
10. Peng, X., Chaoi, B.: Document classification based on word semantic hierarchies. In: Proceedings of ACM International Conference on Artificial Intelligence & Application, pp. 362–367 (2005)
11. Antonis, M.-L., Zaiane, O.R.: Text document categorization by term association. In: Proceedings of IEEE International Conference on Data Mining, pp. 19–26 (2002)
Acknowledgements
I extend my gratitude to my research guide, Dr. Yogesh Kumar Sharma, for his wonderful support in carrying out my research, and to my wife and my parents for encouraging me in all dimensions of my life.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Patel, A.D., Sharma, Y.K. (2019). Web Page Classification on News Feeds Using Hybrid Technique for Extraction. In: Satapathy, S., Joshi, A. (eds) Information and Communication Technology for Intelligent Systems. Smart Innovation, Systems and Technologies, vol 107. Springer, Singapore. https://doi.org/10.1007/978-981-13-1747-7_38
DOI: https://doi.org/10.1007/978-981-13-1747-7_38
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1746-0
Online ISBN: 978-981-13-1747-7
eBook Packages: Intelligent Technologies and Robotics (R0)