Naive Bayes for URL Classification Using Kid’s Computer Data
The vast size of the World Wide Web (WWW) makes it the largest database that has ever existed, and one of the most important functions of the Internet is information retrieval. This research explores a new data source called personal (kids’) browsing data (PBD). The purpose of this study is to assist information retrieval on the Internet by applying data mining techniques; the use of data mining in this domain can be seen as the application of a new technology to an acknowledged problem. Data mining offers several techniques: association rules, classification, clustering, sequential pattern analysis, and time series analysis. Whichever technique is used, the ultimate purpose of Web usage mining (WUM) is to discover useful knowledge from Web users’ interaction data. In this paper we focus on the classification task and show that the category of a web page can be identified using only its URL. The advantage of classifying by URL alone is its high speed, because the page itself does not have to be fetched or parsed.
Keywords: Classification, WUM, Naïve Bayes classification, URL fragmentation
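
The idea of classifying a page from its URL alone can be illustrated with a minimal sketch. It assumes that URL fragmentation means splitting the URL on non-alphanumeric characters, and it uses a multinomial Naïve Bayes model with Laplace smoothing. The helper `fragment_url`, the class `UrlNaiveBayes`, the category labels, and the training URLs are all hypothetical and chosen for illustration; they are not taken from the paper and do not reproduce its method or data.

```python
import math
import re
from collections import Counter, defaultdict

# Scheme and host prefixes that carry no category information (an assumption).
IGNORED = {"http", "https", "www"}


def fragment_url(url):
    """Split a URL into lowercase alphanumeric fragments (assumed tokenization)."""
    parts = re.split(r"[^a-z0-9]+", url.lower())
    return [p for p in parts if p and p not in IGNORED]


class UrlNaiveBayes:
    """Multinomial Naive Bayes over URL fragments with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # smoothing constant
        self.class_counts = Counter()             # training URLs per category
        self.token_counts = defaultdict(Counter)  # fragment counts per category
        self.vocab = set()                        # all fragments seen in training

    def fit(self, urls, labels):
        for url, label in zip(urls, labels):
            self.class_counts[label] += 1
            for tok in fragment_url(url):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, url):
        tokens = fragment_url(url)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            denom = (sum(self.token_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokens:
                count = self.token_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, for illustration only.
train_urls = [
    "http://www.coolmathgames.com/puzzle-games",
    "http://kids.example.org/games/puzzle",
    "http://www.example.com/news/world/politics",
    "http://news.example.net/world/economy",
]
train_labels = ["games", "games", "news", "news"]

model = UrlNaiveBayes().fit(train_urls, train_labels)
print(model.predict("http://example.org/games/new-puzzle"))
```

Because prediction only tokenizes a string and sums a few log probabilities, no page content has to be downloaded, which is where the speed advantage mentioned in the abstract would come from in a sketch like this.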