
Component Selection to Optimize Distance Function Learning in Complex Scientific Data Sets

  • Conference paper
Database and Expert Systems Applications (DEXA 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5181)


Abstract

Analyzing complex scientific data, e.g., graphs and images, often requires comparing features such as regions on graphs, visual aspects of images, and related metadata, some of which are more important than others. Similarity for comparison is typically expressed as a distance between data objects, which can in turn be expressed in terms of distances between their features. We refer to the distance based on each feature as a component. The weights of components, which represent the relative importance of features, can be learned using distance function learning algorithms. However, it is seldom known which components optimize learning under criteria such as accuracy, efficiency and simplicity. This is the problem we address. We propose and theoretically compare four component selection approaches: Maximal Path Traversal, Minimal Path Traversal, Maximal Path Traversal with Pruning and Minimal Path Traversal with Pruning. Experimental evaluation is conducted using real data from Materials Science, Nanotechnology and Bioinformatics. A trademarked software tool has been developed as a highlight of this work.
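The setting described in the abstract can be illustrated with a minimal sketch. It assumes that each object is a mapping from feature names to numeric vectors, that each component is a Euclidean distance on one feature, and that components are combined by a weighted sum; the abstract specifies none of these. The selection routine is a generic exhaustive baseline that scores every subset of components with a caller-supplied criterion. It is not one of the four traversal approaches proposed in the paper, and all names (`component`, `weighted_distance`, `select_components`, the feature labels) are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical representation: an object maps each feature name to a numeric vector.
Obj = Dict[str, Sequence[float]]


def component(p: Obj, q: Obj, feature: str) -> float:
    """Distance based on a single feature (one component); Euclidean is an assumption."""
    return dist(p[feature], q[feature])


def weighted_distance(p: Obj, q: Obj, weights: Dict[str, float]) -> float:
    """Overall distance as a weighted sum of components (an assumed combination rule)."""
    return sum(w * component(p, q, f) for f, w in weights.items())


def select_components(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Exhaustively score every non-empty subset of components.

    `score` is a hypothetical caller-supplied criterion, e.g. the accuracy of a
    distance function learned on that subset. Ties go to the smaller subset,
    which favours simplicity.
    """
    best: FrozenSet[str] = frozenset()
    best_key = None
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = frozenset(subset)
            key = (score(s), -len(s))  # higher score first, then fewer components
            if best_key is None or key > best_key:
                best, best_key = s, key
    return best


if __name__ == "__main__":
    a = {"graph": [0.0, 0.0], "image": [1.0]}
    b = {"graph": [3.0, 4.0], "image": [3.0]}
    print(weighted_distance(a, b, {"graph": 0.5, "image": 2.0}))
    # Toy criterion: only the "graph" component is informative.
    print(select_components(["graph", "image"], lambda s: 1.0 if "graph" in s else 0.0))
```

Exhaustive search scores all 2^n − 1 non-empty subsets, so it is only practical for a small number of components.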




Editor information

Sourav S. Bhowmick, Josef Küng, Roland Wagner


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Varde, A. et al. (2008). Component Selection to Optimize Distance Function Learning in Complex Scientific Data Sets. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2008. Lecture Notes in Computer Science, vol 5181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85654-2_27


  • DOI: https://doi.org/10.1007/978-3-540-85654-2_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85653-5

  • Online ISBN: 978-3-540-85654-2

  • eBook Packages: Computer Science, Computer Science (R0)
