An Efficient Similarity Search Approach to Incremental Multidimensional Data in Presence of Obstacles
Similarity search has always been a crucial task in data mining. A similarity search finds the data points in a data set that match a given query sequence exactly or differ from it only slightly, and it can be performed as whole-sequence matching or partial-sequence matching. The presence of obstacle information in a data set greatly affects both the efficiency and the effectiveness of similarity search. In this paper we present an efficient approach to similarity search on incremental multidimensional data sets in the presence of obstacles; the approach dynamically selects input features (attributes) to improve running time and accuracy. The results show that the performance of similarity search is highly dependent on data size. Our approach can improve data analysis for financial markets, engineering and scientific databases, and the telecom industry, and can provide better performance in classification, clustering, machine learning, and medical diagnosis.
Keywords: Similarity search, Obstacles, Multidimensional data, Efficiency, Accuracy
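To make the notion of matching a query "exactly or with slight differences" over a chosen set of attributes concrete, the following is a minimal sketch of a brute-force similarity search. It is not the method of this paper: the function name `similarity_search`, the parameters `dims` and `tolerance`, the use of Euclidean distance, and the sample points are all assumptions made for illustration, and obstacles and incremental updates are not modeled.

```python
import math
from typing import List, Sequence


def similarity_search(
    data: Sequence[Sequence[float]],
    query: Sequence[float],
    dims: Sequence[int],
    tolerance: float,
) -> List[int]:
    """Return indices of points whose distance to `query`, measured only
    over the selected dimensions `dims`, is at most `tolerance`.

    Hypothetical helper for illustration; a linear scan, not an index.
    """
    matches = []
    for i, point in enumerate(data):
        # Euclidean distance restricted to the selected attributes.
        dist = math.sqrt(sum((point[d] - query[d]) ** 2 for d in dims))
        if dist <= tolerance:
            matches.append(i)
    return matches


# Made-up sample data: three points with three attributes each.
points = [(1.0, 2.0, 9.0), (1.1, 2.1, 0.0), (5.0, 5.0, 9.0)]
query = (1.0, 2.0, 9.0)

# Exact match over all attributes (tolerance 0).
exact = similarity_search(points, query, dims=[0, 1, 2], tolerance=0.0)

# Approximate match using only the first two attributes.
near = similarity_search(points, query, dims=[0, 1], tolerance=0.5)
```

In this sketch, restricting `dims` changes which points count as similar: the second point differs strongly from the query in the third attribute but is close in the first two, so it matches only when the third attribute is left out.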