
Multi-represented kNN-Classification for Large Class Sets

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 3453)

Abstract

The amount of information stored in modern database applications has increased tremendously in recent years. Beyond their sheer number, the stored data objects are also becoming increasingly complex. Classifying these complex objects is therefore an important data mining task that poses several new challenges. In many applications, data objects come in multiple representations; for example, proteins can be described by text, amino acid sequences, or 3D structures. Additionally, many real-world applications need to distinguish thousands of classes. Last but not least, many complex objects cannot be expressed directly as feature vectors. To meet all these requirements, we introduce a novel approach to the classification of multi-represented objects that is capable of distinguishing large numbers of classes. Our method is based on k-nearest-neighbor classification and employs density-based clustering as a new way to reduce the number of training instances for instance-based classification. To predict the most likely class, our classifier uses a new method for combining several object representations into accurate class predictions. The introduced method is evaluated by classifying proteins according to the classes of Gene Ontology, one of the most established class systems for biomolecules, which comprises several thousand classes.
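
As a rough illustration of the general idea of kNN classification over several representations, the following minimal sketch runs a separate k-nearest-neighbor vote in each representation and adds up the normalized votes. It is not the method of the paper: the abstract does not specify how the representations are combined or how density-based clustering reduces the training set, so the simple sum rule, the Euclidean distance, the function names `knn_votes` and `classify_multi`, and the toy data and labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import dist


def knn_votes(query, train, k):
    """Return the share of each class among the k nearest training vectors."""
    nearest = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / len(nearest) for label, n in counts.items()}


def classify_multi(query_reps, train_reps, k=3):
    """Sum the per-representation kNN vote shares and pick the best class.

    Assumption: a plain sum rule stands in for the paper's combination method.
    """
    combined = defaultdict(float)
    for name, query in query_reps.items():
        for label, share in knn_votes(query, train_reps[name], k).items():
            combined[label] += share
    # Iterate classes in sorted order so ties resolve to the smallest name.
    return max(sorted(combined), key=combined.get)


# Hypothetical toy data: two representations of the same four training objects.
train_reps = {
    "sequence": [((0.0, 0.0), "kinase"), ((0.1, 0.2), "kinase"),
                 ((1.0, 1.0), "transporter"), ((0.9, 1.1), "transporter")],
    "text": [((0.0,), "kinase"), ((0.2,), "kinase"),
             ((1.0,), "transporter"), ((0.8,), "transporter")],
}
query = {"sequence": (0.2, 0.1), "text": (0.1,)}

predicted = classify_multi(query, train_reps, k=3)
print(predicted)
```

The toy example uses feature vectors only for brevity; for objects that are not expressible as vectors, `dist` could be replaced by any distance function suited to the representation.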

Keywords

  • Multi-represented objects
  • classification
  • instance based learning
  • k nearest neighbor classifier

Supported by the German Ministry for Education, Science, Research and Technology (BMBF) under grant no. 031U212 within the BFAM (Bioinformatics for the Functional Analysis of Mammalian Genomes) project, which is part of the German Genome Analysis Network (NGFN).

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kriegel, HP., Pryakhin, A., Schubert, M. (2005). Multi-represented kNN-Classification for Large Class Sets. In: Zhou, L., Ooi, B.C., Meng, X. (eds) Database Systems for Advanced Applications. DASFAA 2005. Lecture Notes in Computer Science, vol 3453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408079_45

  • DOI: https://doi.org/10.1007/11408079_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25334-1

  • Online ISBN: 978-3-540-32005-0

  • eBook Packages: Computer Science, Computer Science (R0)
