XM-tree: data driven computational model by using metric extended nodes with non-overlapping in high-dimensional metric spaces

  • S.I. : CMKBO
Computational and Mathematical Organization Theory

Abstract

Finding objects similar to a query under a given distance remains a fundamental problem for many applications. The general challenge in similarity search is to restrict the search to as few elements as possible while still finding the answer. Index structures divide the target dataset into subsets. With large amounts of data, the volumes of the subspaces grow exponentially, which degrades the search algorithms. This problem is caused by inherent deficiencies of space partitioning and by the overlap between regions. These methods have proven unreliable, and it becomes hard to store, manage, and analyze such quantities of data. The search tends to degenerate into a complete scan of the dataset. In this paper, we propose a new indexing technique called XM-tree, which partitions the space using spheres. The idea is to combine two structures, arborescent and sequential, in order to limit the volume of the outer regions of the spheres, by creating extended regions and inserting them into linked lists, and by excluding the empty sets (separable partitions) that contain no objects. The goal is to eliminate some objects without computing their distances to the query object. We also propose a parallel version of the structure, run on a set of real machines. We discuss the efficiency of the construction and querying phases, and assess the quality of our index by comparing it with recent techniques.
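The pruning goal stated in the abstract can be illustrated with a minimal Python sketch of generic sphere (ball) partitioning. It is not the XM-tree itself: it has no tree, no extended regions and no linked lists, and the names `Ball`, `build_balls` and `range_query`, the Euclidean distance, and the hand-picked centers are all assumptions made for illustration. It shows only how the triangle inequality lets a range query discard a whole sphere after one distance computation, and how empty regions can be dropped at build time.

```python
import math
from dataclasses import dataclass, field


def euclidean(a, b):
    """Euclidean distance; any metric obeying the triangle inequality works."""
    return math.dist(a, b)


@dataclass
class Ball:
    """Hypothetical spherical region: a center, a covering radius, its objects."""
    center: tuple
    radius: float = 0.0
    members: list = field(default_factory=list)


def build_balls(points, centers, dist=euclidean):
    """Assign each point to its nearest center and drop regions left empty."""
    balls = [Ball(c) for c in centers]
    for p in points:
        nearest = min(balls, key=lambda b: dist(p, b.center))
        nearest.members.append(p)
        # The covering radius is the largest center-to-member distance.
        nearest.radius = max(nearest.radius, dist(p, nearest.center))
    return [b for b in balls if b.members]


def range_query(balls, q, r, dist=euclidean):
    """Return (objects within r of q, number of distance computations)."""
    hits, computed = [], 0
    for b in balls:
        d_qc = dist(q, b.center)
        computed += 1
        # For every member p: d(q, p) >= d(q, c) - d(p, c) >= d(q, c) - radius.
        # If that lower bound exceeds r, no member can match, so skip them all.
        if d_qc - b.radius > r:
            continue
        for p in b.members:
            computed += 1
            if dist(q, p) <= r:
                hits.append(p)
    return hits, computed


points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
balls = build_balls(points, centers=[(0, 0), (10, 10), (50, 50)])
hits, computed = range_query(balls, q=(0.5, 0), r=1.0)
print(hits, computed)
```

On this toy input the region around `(50, 50)` receives no objects and is discarded, and the far sphere is rejected after a single distance computation to its center. The saving here is small; it becomes significant when many spheres hold many objects and overlap little.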

Notes

  1. The first dataset is available from the CoPhIR collection at http://cophir.isti.cnr.it, whereas the second one can be found at http://kdd.ics.uci.edu.

  2. GBDI Arboretum is a C++ library that implements different metric access methods (MAM) (cf. http://www.gbdi.icmc.usp.br/old/arboretum).

Author information

Corresponding author

Correspondence to Adeel Anjum.

About this article

Cite this article

Kouahla, Z., Anjum, A., Akram, S. et al. XM-tree: data driven computational model by using metric extended nodes with non-overlapping in high-dimensional metric spaces. Comput Math Organ Theory 25, 196–223 (2019). https://doi.org/10.1007/s10588-018-9272-x

  • DOI: https://doi.org/10.1007/s10588-018-9272-x
