
Confidence-Based Incremental Classification for Objects with Limited Attributes in Vertical Search

  • Conference paper
Advanced Research in Applied Artificial Intelligence (IEA/AIE 2012)

Abstract

With vertical search engines, it is possible to search web pages in a specific domain, such as products, restaurants, or academic papers, and present users with only the information they are interested in. Gathering and integrating such objects from multiple web pages into a single system provides a useful facility for users. Placing objects extracted from multiple data sources into a single hierarchical structure is a challenging classification problem, especially when the objects have few attributes. In this work, we propose a confidence-based incremental Naïve Bayesian approach for categorization, focusing on the product domain. The incremental approach extends the training set and retrains the classifier as new objects are assigned to a category with high confidence. The order in which product data is processed is also taken into account. The proposed approach is applied to a vertical search engine that collects product data from several online stores.
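The incremental idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the confidence measure, the threshold, or the features, so the sketch assumes a multinomial Naive Bayes over whitespace-separated title tokens with Laplace smoothing, uses the normalised posterior of the top category as the confidence, and picks an arbitrary threshold of 0.7. The names `NaiveBayes`, `seed_model`, and `classify_incrementally`, as well as the sample product titles, are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word tokens with Laplace smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # examples seen per category
        self.word_counts = defaultdict(Counter)  # token frequencies per category
        self.vocab = set()

    def add(self, text, label):
        """Add one labelled example; the model is just counts, so this is the retraining step."""
        tokens = text.lower().split()
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def posterior(self, text):
        """Return {category: P(category | text)}, normalised to sum to 1."""
        tokens = text.lower().split()
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total)
            for tok in tokens:
                count = self.word_counts[label][tok]  # 0 for unseen tokens
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            log_scores[label] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}


def seed_model():
    """Hypothetical hand-labelled seed set (assumed data, not from the paper)."""
    model = NaiveBayes()
    model.add("samsung galaxy smartphone 64gb", "phones")
    model.add("apple iphone smartphone", "phones")
    model.add("lenovo thinkpad laptop 14 inch", "laptops")
    model.add("dell inspiron laptop", "laptops")
    return model


def classify_incrementally(model, items, threshold=0.7):
    """Process items in order; confident predictions extend the training set."""
    assigned, deferred = [], []
    for text in items:
        probs = model.posterior(text)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            model.add(text, label)  # later items benefit from this assignment
            assigned.append((text, label))
        else:
            deferred.append(text)   # low confidence: leave unassigned
    return assigned, deferred


assigned, deferred = classify_incrementally(
    seed_model(), ["nokia smartphone", "nokia lumia", "asus zenbook"]
)
```

In this sketch the outcome depends on the processing order. "nokia lumia" is assigned to "phones" only because "nokia smartphone" was absorbed into the training set first; presented to the seed model on its own, it falls below the threshold and is deferred. How the paper itself accounts for ordering is not described in the abstract.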




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ozdikis, O., Senkul, P., Sinir, S. (2012). Confidence-Based Incremental Classification for Objects with Limited Attributes in Vertical Search. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds) Advanced Research in Applied Artificial Intelligence. IEA/AIE 2012. Lecture Notes in Computer Science, vol 7345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31087-4_2


  • DOI: https://doi.org/10.1007/978-3-642-31087-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31086-7

  • Online ISBN: 978-3-642-31087-4

  • eBook Packages: Computer Science (R0)
