Abstract
With vertical search engines, it is possible to search web pages in a specific domain, such as products, restaurants, or academic papers, and present users with only the information they are interested in. Gathering and integrating such objects from multiple web pages into a single system provides a useful facility for users. Placing the objects extracted from multiple data sources into a single hierarchical structure is a challenging classification problem, especially when the objects have a limited number of attributes. In this work, we propose a confidence-based incremental Naïve Bayesian approach for categorization, focusing on the product domain. The incremental approach is based on extending the training set and retraining the classifier as new objects are assigned to a category with high confidence. The ordering of the product data is taken into account as well. The proposed approach is applied to a vertical search engine that collects product data from several online stores.
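The incremental idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each product is represented only by the tokens of its title, that a plain multinomial Naïve Bayes classifier with Laplace smoothing is used, and that "high confidence" means the normalized posterior of the best category reaches a fixed threshold. The function names (`train`, `posterior`, `classify_incrementally`) and the threshold value are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """Collect multinomial Naive Bayes counts from (tokens, category) pairs."""
    class_docs = Counter()              # number of training objects per category
    word_counts = defaultdict(Counter)  # token frequencies per category
    vocab = set()
    for tokens, category in labeled:
        class_docs[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def posterior(model, tokens):
    """Return normalized posterior probabilities per category for one object."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for category, n_docs in class_docs.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            # Laplace (add-one) smoothing; unseen tokens get a small probability.
            count = word_counts[category][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[category] = score
    # Normalize in a numerically stable way.
    peak = max(log_scores.values())
    unnorm = {c: math.exp(s - peak) for c, s in log_scores.items()}
    norm = sum(unnorm.values())
    return {c: v / norm for c, v in unnorm.items()}


def classify_incrementally(seed, unlabeled, threshold=0.75):
    """Classify objects in the given order, retraining after confident decisions.

    `threshold` is an assumed confidence cut-off, not a value from the paper.
    """
    labeled = list(seed)
    model = train(labeled)
    results = []
    for tokens in unlabeled:
        probs = posterior(model, tokens)
        best = max(probs, key=probs.get)
        results.append((tokens, best, probs[best]))
        if probs[best] >= threshold:
            # Confident assignment: extend the training set and retrain.
            labeled.append((tokens, best))
            model = train(labeled)
    return results


# Toy usage with made-up product titles.
seed = [
    (["laptop", "intel", "ssd"], "computers"),
    (["shampoo", "hair", "dry"], "cosmetics"),
]
stream = [["laptop", "ssd", "ultrabook"], ["ultrabook", "slim"]]
for tokens, category, confidence in classify_incrementally(seed, stream):
    print(tokens, category, round(confidence, 3))
```

In this toy run the first title is assigned confidently, so its tokens (including the previously unseen "ultrabook") join the training set and inform the decision on the second title. Because the model changes as the stream is processed, the order in which objects arrive affects the outcome, which is why the processing order matters in such a scheme.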
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ozdikis, O., Senkul, P., Sinir, S. (2012). Confidence-Based Incremental Classification for Objects with Limited Attributes in Vertical Search. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds) Advanced Research in Applied Artificial Intelligence. IEA/AIE 2012. Lecture Notes in Computer Science, vol 7345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31087-4_2
DOI: https://doi.org/10.1007/978-3-642-31087-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31086-7
Online ISBN: 978-3-642-31087-4
eBook Packages: Computer Science, Computer Science (R0)