Component Selection to Optimize Distance Function Learning in Complex Scientific Data Sets

Varde, Aparna; Bique, Stephen; Rundensteiner, Elke; Brown, David; Liang, Jianyu; Sisson, Richard; Sheybani, Ehsan; Sayre, Brian

doi:10.1007/978-3-540-85654-2_27

Aparna Varde¹,
Stephen Bique²,
Elke Rundensteiner³,
David Brown³,⁴,
Jianyu Liang⁴,
Richard Sisson⁴,⁵,
Ehsan Sheybani⁶ &
Brian Sayre⁷

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5181)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Analyzing complex scientific data, such as graphs and images, often requires comparing features: regions on graphs, visual aspects of images and related metadata, some of which are more important than others. Similarity for such comparison is typically expressed as the distance between data objects, which can in turn be expressed in terms of distances between their features. We refer to the distance based on each feature as a component. The weights of components, which represent the relative importance of features, can be learned using distance function learning algorithms. However, it is seldom known which components optimize learning with respect to criteria such as accuracy, efficiency and simplicity. This is the problem we address. We propose and theoretically compare four component selection approaches: Maximal Path Traversal, Minimal Path Traversal, Maximal Path Traversal with Pruning and Minimal Path Traversal with Pruning. We evaluate them experimentally on real data from Materials Science, Nanotechnology and Bioinformatics. A trademarked software tool has been developed as a highlight of this work.
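
The abstract describes the distance between two objects as being built from per-feature components whose weights are learned. It also names four selection approaches without defining them. The Python sketch below is illustrative only. It assumes that the learned distance is a weighted sum of components. It also reads "maximal" and "minimal" traversal, purely as our assumption and not as the authors' algorithms, as generic greedy backward elimination (starting from all components) and forward selection (starting from none) over the lattice of component subsets. The names `abs_diff`, `weighted_distance`, `top_down` and `bottom_up` are hypothetical. So is the `score` callback, which stands in for whatever combination of accuracy, efficiency and simplicity is used. The pruning variants are not modeled.

```python
from typing import Callable, Dict, FrozenSet

# A component: the distance between two objects computed on one feature.
Component = Callable[[dict, dict], float]
# Hypothetical scorer: higher is better (e.g. accuracy traded off against size).
Scorer = Callable[[FrozenSet[str]], float]


def abs_diff(feature: str) -> Component:
    """Hypothetical component: absolute difference of one numeric feature."""
    return lambda a, b: abs(a[feature] - b[feature])


def weighted_distance(a: dict, b: dict, components: Dict[str, Component],
                      weights: Dict[str, float], selected: FrozenSet[str]) -> float:
    """Assumed form: weighted sum of component distances over selected components."""
    return sum(weights[name] * components[name](a, b) for name in selected)


def top_down(all_names: FrozenSet[str], score: Scorer) -> FrozenSet[str]:
    """Greedy backward elimination from the full component set.

    Assumed reading of a 'maximal' path, not the authors' algorithm.
    Drops one component at a time while the score does not decrease,
    so ties favor the simpler (smaller) subset.
    """
    current, best = all_names, score(all_names)
    while len(current) > 1:
        # Evaluate every subset obtained by removing exactly one component.
        candidate = max((current - {n} for n in sorted(current)), key=score)
        candidate_score = score(candidate)
        if candidate_score < best:
            break
        current, best = candidate, candidate_score
    return current


def bottom_up(all_names: FrozenSet[str], score: Scorer) -> FrozenSet[str]:
    """Greedy forward selection from the empty set.

    Assumed reading of a 'minimal' path, not the authors' algorithm.
    Adds one component at a time while the score strictly increases.
    """
    current: FrozenSet[str] = frozenset()
    best = float("-inf")
    remaining = set(all_names)
    while remaining:
        # Pick the component whose addition gives the highest score.
        name = max(sorted(remaining), key=lambda n: score(current | {n}))
        candidate = current | {name}
        candidate_score = score(candidate)
        if candidate_score <= best:
            break
        current, best = candidate, candidate_score
        remaining.remove(name)
    return current
```

Consider a toy `score` that rewards an assumed target subset {a, b} and penalizes extra components. With that scorer, both traversals over {a, b, c} return {a, b}. The two functions treat ties differently: `top_down` accepts a removal when the score stays equal, while `bottom_up` requires a strict gain before adding a component. Both rules bias the result toward smaller subsets, which is one simple way to reflect the simplicity criterion.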

Author information

Authors and Affiliations

Department of Math and Computer Science, Virginia State University, Petersburg, VA
Aparna Varde
Naval Research Laboratory, Washington, DC
Stephen Bique
Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA
Elke Rundensteiner & David Brown
Department of Mechanical Engineering, Worcester Polytechnic Institute, Worcester, MA
David Brown, Jianyu Liang & Richard Sisson
Center for Heat Treating Excellence, Metal Processing Institute, Worcester, MA
Richard Sisson
Department of Engineering and Technology, Virginia State University, Petersburg, VA
Ehsan Sheybani
Department of Biology, Virginia State University, Petersburg, VA
Brian Sayre

Editor information

Sourav S. Bhowmick, Josef Küng, Roland Wagner

About this paper

Cite this paper

Varde, A. et al. (2008). Component Selection to Optimize Distance Function Learning in Complex Scientific Data Sets. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2008. Lecture Notes in Computer Science, vol 5181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85654-2_27

DOI: https://doi.org/10.1007/978-3-540-85654-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85653-5
Online ISBN: 978-3-540-85654-2
eBook Packages: Computer Science, Computer Science (R0)