Introduction

Identifying comparable objects in an image or a video sequence is known as object recognition in computer vision [1, 2]. Numerous factors, such as variations in pose, occlusion, scale, poor illumination, and rotation, make shape-based object detection challenging [3, 4]. Many strategies have been developed to simplify shape-based object recognition and visual tracking and to improve their accuracy [5,6,7]. Matching is a crucial system component and one of the main problems in object identification [8,9,10]. Its typical objectives are the measurement, comparison, and verification of image data for precise recognition [11, 12].

This study presents an extensive investigation of the shape representation methods that have emerged over the years, allowing the reader to appreciate the historical development of the field [13]. A critical analysis of these methods highlights the issues and challenges that the present study aims to address, which sets the context for its objectives and contributions.

To keep the review focused and credible, it concentrates on articles published in highly rated peer-reviewed journals. Prioritizing reputable sources is intended to give a reliable assessment of the most relevant and promising shape representation techniques [14] and to ground the study in well-established methodologies and recent advances in the field.

The main contributions of this study are a review of several shape representation methods used in shape-based object recognition, and a critical analysis of current methodologies for shape-based feature representation, including their benefits and drawbacks.

The remainder of this paper is organized as follows. The “Literature review” section presents the literature review. The “Shape representation methods” section discusses shape representation methods. Results and discussion are presented in the “Results and discussion” section. Finally, the study concludes in the “Conclusion” section.

Literature review

Latif et al. [15] presented a review of content-based image retrieval and feature extraction techniques. It covers a wide range of methods and approaches used in these fields and traces the advancements made over time. The review encompasses various content-based image retrieval techniques, focusing on the feature extraction methodologies that play a crucial role in identifying and matching image content [16]. By summarizing the state-of-the-art approaches, the paper serves as a resource for researchers and practitioners in the field of image retrieval.

Ren et al. [17] provided a comprehensive overview of the state of the art in defect detection based on machine vision. The paper reviews and analyzes the latest methodologies and advancements in the field, covering the techniques used for identifying defects in various applications and discussing the strengths and limitations of each method. By summarizing the cutting-edge research in this area, the paper gives researchers and practitioners a clear understanding of the existing techniques and of potential avenues for further improvements in defect detection using machine vision.

The authors in [18] provided a concise summary of the current methods and pipelines used for image-based quantitation of nuclear shape and nuclear envelope abnormalities. It offers an overview of the various techniques and approaches employed in the analysis of nuclear morphology, focusing on image-based quantification methods. The paper discusses the strengths and limitations of the existing pipelines, covering a wide range of applications in biomedical research, pathology, and cell biology. By presenting a comprehensive assessment of the state-of-the-art methods, this paper serves as a valuable resource for researchers and practitioners seeking to understand the advancements in quantifying nuclear shape and envelope abnormalities using image analysis techniques.

The authors in [19] presented a comprehensive survey of object detection techniques over a span of 20 years. It covers the evolution and advancements in the field, providing an overview of the various approaches used for object detection in diverse applications. The survey includes a discussion of traditional methods as well as the recent progress made with deep learning-based techniques. By summarizing two decades of research, the paper offers valuable insights into the historical development and current state of object detection, serving as a valuable resource for researchers and practitioners to understand the evolution and trends in this dynamic field.

Shape representation methods

Shape representation usually refers to the numerical descriptors used to characterize a given shape. The original shape generally cannot be fully reconstructed from its descriptors, but the descriptors of different shapes must be distinctive enough to allow for shape recognition [3]. Figure 1 shows the classification of shape representation and description methods. Depending on how the shape is represented, the techniques are divided into global and structural ones [4].

Fig. 1 Classification of shape representation and description techniques [20]

Global contour-based shape descriptors

Global contour-based shape descriptors are mathematical algorithms used to represent the overall shape of an object. These descriptors are based on the contours or outlines of the object, and they can be used to compare and classify different shapes.

One example of a global contour-based shape descriptor is the Fourier descriptor. This method uses the Fourier transform to decompose the shape of an object into a series of frequencies, which can be used to generate a shape signature that captures the object’s overall shape. Other methods include the centroidal profile descriptor, shape context descriptor, and shape distribution descriptor [20, 21].

Shape signature

Shape signature is a method for shape detection that involves representing a shape using a sequence of values that capture its unique features. The process involves selecting a set of points on the shape and measuring the curvature at each point [6]. These measurements are then normalized to create a sequence of values representing the shape’s signature. By comparing the signatures of different shapes, it is possible to detect similarities and differences between them. Shape signature is a powerful tool for shape detection and has applications in computer vision, robotics, and biomedical imaging [7, 8].
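
As an illustration of the idea, the sketch below computes one possible curvature-style signature for a closed polygonal boundary: the signed turning angle at each vertex. This is a minimal sketch under stated assumptions, not the specific signature used in the cited works. The function name `turning_angle_signature` is hypothetical, the boundary is assumed to be an ordered list of (x, y) vertices, and the turning angle is used as a simple discrete stand-in for curvature.

```python
import math

def turning_angle_signature(points):
    """Signed turning angle (radians) at each vertex of a closed polygon.

    Assumes `points` is an ordered list of (x, y) boundary vertices with
    no repeated consecutive points. Angles are wrapped to [-pi, pi).
    """
    n = len(points)
    signature = []
    for i in range(n):
        x0, y0 = points[i - 1]          # previous vertex (wraps around)
        x1, y1 = points[i]              # current vertex
        x2, y2 = points[(i + 1) % n]    # next vertex (wraps around)
        angle_in = math.atan2(y1 - y0, x1 - x0)
        angle_out = math.atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi)
        turn = (angle_out - angle_in + math.pi) % (2 * math.pi) - math.pi
        signature.append(turn)
    return signature

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(turning_angle_signature(square))
```

Because only directions between consecutive vertices are used, this particular signature does not change when the shape is translated or uniformly scaled, so two signatures can be compared directly once a common starting vertex is chosen.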

Fourier descriptor (FD)

The FD has been used effectively in numerous applications involving shape representation, particularly character recognition. It is popular because of its appealing properties, such as straightforward derivation, simple normalization, and resilience to noise [22]. The FD is produced by applying the Fourier transform to a complex vector created from the shape boundary coordinates (xn, yn), where n = 0, 1, …, N − 1 [9].

$$\overline{U} = \begin{pmatrix} x_{0} - x_{c} + i(y_{0} - y_{c}) \\ x_{1} - x_{c} + i(y_{1} - y_{c}) \\ \vdots \\ x_{N-1} - x_{c} + i(y_{N-1} - y_{c}) \end{pmatrix}$$
(1)

where

$$x_{c} = \frac{1}{N}\sum_{n = 0}^{N - 1} x_{n},\qquad y_{c} = \frac{1}{N}\sum_{n = 0}^{N - 1} y_{n}$$
(2)
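
The construction in Eqs. (1) and (2) can be sketched in a few lines of code. The example below is illustrative only: it builds the centroid-centred complex vector and applies a plain discrete Fourier transform written with the standard library. The function names and the normalization by the magnitude of the first coefficient are assumptions added for illustration (a commonly used choice), not something prescribed by the text.

```python
import cmath

def fourier_descriptor(boundary):
    """DFT coefficients of the centroid-centred complex boundary vector.

    `boundary` is an ordered list of (x, y) points; see Eqs. (1) and (2).
    """
    n_pts = len(boundary)
    xc = sum(x for x, _ in boundary) / n_pts   # Eq. (2)
    yc = sum(y for _, y in boundary) / n_pts
    u = [complex(x - xc, y - yc) for x, y in boundary]   # Eq. (1)
    # Plain O(N^2) discrete Fourier transform of the complex vector
    return [
        sum(u[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
            for n in range(n_pts)) / n_pts
        for k in range(n_pts)
    ]

def normalized_magnitudes(coeffs):
    """|a_k| / |a_1| for k >= 2 (assumed normalization, needs a_1 != 0)."""
    ref = abs(coeffs[1])
    return [abs(c) / ref for c in coeffs[2:]]

if __name__ == "__main__":
    rect = [(0, 0), (2, 0), (4, 0), (4, 1), (4, 2), (2, 2), (0, 2), (0, 1)]
    print(normalized_magnitudes(fourier_descriptor(rect)))
```

Subtracting the centroid removes the dependence on position, taking magnitudes discards the phase (which carries rotation and starting-point information), and dividing by one coefficient magnitude removes the dependence on size.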

Wavelet descriptor (WD)

The WD uses the wavelet transform to describe planar closed curves. This technique decomposes the boundary at multiple resolutions into components of various sizes. The coarse-resolution components carry global shape information, whereas the fine-resolution components carry detailed local information [4, 9]. Wavelet descriptors are reported to be insensitive to noise [23], unique, and stable against boundary fluctuations.

Curvature scale space (CSS)

Curvature scale space (CSS) is a method for detecting and characterizing shapes in images based on their curvature information [10]. In CSS, the boundary contour is smoothed with Gaussian kernels of increasing width, and the curvature zero-crossings (inflection points) that survive at each scale are recorded. Analyzing the resulting scale-space representation makes it possible to extract information about the shape, size, and orientation of objects in the image.

CSS has several advantages over other shape detection methods. First, it is scale-invariant, which means it can detect objects of different sizes without prior knowledge or parameter tuning. Second, it is robust to noise and other image distortions [14], as the curvature operators are designed to be insensitive to these effects. Finally, it can handle complex shapes with multiple components or concavities, which are challenging for other methods [11].

Shape context descriptor (SCD)

The SCD is used to assess shape similarity. The fundamental goal of shape context is to find the correspondence between two shapes and to quantify how different they are. N points are sampled from the shape’s contour, which is obtained with an edge detection algorithm, and a fixed reference point is used to determine the correspondence between the two shapes. The next step is to construct the set of vectors connecting every sampled point to the reference point [24].

Structural contour-based shape description methods

The structural shape representation is an additional member of the family of shape analysis. Shapes are divided into boundary pieces known as primitives using the structural method. Various structural techniques use different primitives and organize them differently to represent shapes. Boundary decomposition techniques frequently use polygonal approximation, curvature decomposition, curve fitting, and chain coding [20].

Polyline approximations

Merging pixels

According to Morse [25], pixels are merged into a line segment by adding points one at a time until a given tolerance, such as the maximum distance or the sum of squared errors, is exceeded. Figure 2 shows the concept of merging pixels [26].

Fig. 2 Tolerance interval (courtesy of [25])

As shown in Fig. 2, the concept of tolerance interval pertains to the merging of pixels that lie within a certain predefined range. When representing a shape as a polyline, a continuous curve is approximated using a series of line segments. However, due to imperfections in data acquisition or digitization processes, the pixel coordinates may contain minor variations or noise. The tolerance interval provides a range around each pixel coordinate within which nearby pixels can be merged together to form a smoother and more simplified polyline representation. By allowing a certain degree of tolerance, the shape can be represented with fewer vertices, reducing computational complexity and storage requirements while preserving the overall shape’s essential features. This technique is particularly useful in applications where an accurate yet concise representation of shapes is needed, such as in computer graphics, image processing [27, 28], and shape recognition tasks.
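
The merging procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the tolerance is taken to be the maximum perpendicular distance from the merged points to the current chord, the boundary is treated as an open, ordered list of points, and the function names `point_line_distance` and `merge_polyline` are hypothetical.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def merge_polyline(points, tol):
    """Grow each segment point by point until the tolerance is exceeded."""
    if len(points) < 3:
        return list(points)
    vertices = [points[0]]
    start = 0
    for end in range(2, len(points)):
        # Largest deviation of the merged points from the chord start -> end
        worst = max(
            point_line_distance(points[k], points[start], points[end])
            for k in range(start + 1, end)
        )
        if worst > tol:
            # Close the segment at the last point that still fitted
            vertices.append(points[end - 1])
            start = end - 1
    vertices.append(points[-1])
    return vertices

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    print(merge_polyline(pts, tol=0.1))
```

A larger tolerance merges more points into each segment and yields fewer vertices, at the cost of a coarser approximation.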

Fit and split

Following Morse, the fit-and-split approach approximates the boundary by recursive boundary splitting, as illustrated in Fig. 3 [25].

Fig. 3 Recursive boundary splitting (courtesy of [25])

As shown in Fig. 3, in this approach, a shape’s boundary is recursively split into smaller segments to achieve a more refined and accurate representation. The process starts by fitting a simple curve, such as a straight line, to the initial boundary. If the fit is within an acceptable tolerance, the boundary is considered adequately represented, and the process stops. However, if the fit deviates beyond the defined tolerance, the boundary is split into two sub-segments, and the fitting process is repeated for each sub-segment. This recursive splitting and fitting continue until the tolerance criterion is met for all segments, yielding a detailed and compact representation of the shape. The Fit and Split method is widely used in various applications, including image processing, pattern recognition [29, 30], and computer-aided design, where accurate and efficient shape representation is crucial for analysis and manipulation tasks.
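
A compact sketch of this recursion is given below. It assumes that the fitted curve is the straight chord between the two end points of the current segment, that the fit error is the maximum perpendicular distance to that chord, and that the split is made at the worst-fitting point. With these choices the procedure coincides with the well-known Ramer-Douglas-Peucker scheme. The function names are hypothetical.

```python
import math

def _distance_to_chord(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length

def fit_and_split(points, tol):
    """Recursively split an open boundary until every chord fits within tol."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    dists = [_distance_to_chord(p, first, last) for p in points[1:-1]]
    worst = max(dists)
    if worst <= tol:
        # The chord is an acceptable fit: keep only the two end points
        return [first, last]
    split = dists.index(worst) + 1           # index of the worst-fitting point
    left = fit_and_split(points[:split + 1], tol)
    right = fit_and_split(points[split:], tol)
    return left[:-1] + right                 # drop the duplicated split point

if __name__ == "__main__":
    pts = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 4)]
    print(fit_and_split(pts, tol=0.1))
```

For a closed boundary, one common approach is to first cut it at two far-apart points and apply the recursion to each half separately.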

Polygon decomposition

In [2, 3], the shape boundary is approximated by a polygon and divided into line segments. The vertices of the polygon are used as primitives. The feature of each primitive is written as a four-element string that contains its internal angle, its distance to the next vertex, and its x and y coordinates. The edit distance between two feature strings determines how similar the two shapes are.

Chain code

The chain code method is a technique for shape detection that involves encoding the boundary of a shape as a sequence of codes. The method starts by selecting a starting point in the shape and then tracing its boundary by moving to the next point on the boundary. The direction of the movement is encoded as a code, such as 0 for the east, 1 for the northeast, 2 for the north, and so on. The sequence of codes represents the shape’s boundary and can be used to compare it with other shapes. The chain code method is a simple yet effective technique for shape detection and has applications in fields such as handwriting recognition, object recognition, and medical image analysis [25].
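
The encoding can be sketched as follows for an 8-connected boundary. The sketch assumes integer pixel coordinates with the y axis pointing up, so that code 2 means north as in the description above, and consecutive boundary points that are 8-neighbours. The first-difference step is an addition for illustration: it is a standard way to make the code independent of the starting direction. The function names are hypothetical.

```python
# Direction codes, counter-clockwise from east (y axis assumed to point up)
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def chain_code(boundary, closed=True):
    """Encode an ordered list of 8-connected boundary pixels as direction codes."""
    n = len(boundary)
    steps = n if closed else n - 1
    codes = []
    for i in range(steps):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {i} and {(i + 1) % n} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes

def first_difference(codes):
    """Cyclic difference between consecutive codes, modulo 8."""
    n = len(codes)
    return [(codes[(i + 1) % n] - codes[i]) % 8 for i in range(n)]

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(chain_code(square))                     # [0, 2, 4, 6]
    print(first_difference(chain_code(square)))   # [2, 2, 2, 2]
```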

Smooth curve decomposition (SCDe)

In SCDe, the primitives (tokens) are obtained by splitting a Gaussian-smoothed boundary at its curvature zero-crossing points. Each token is characterized by its maximum curvature and its orientation, and the similarity of two tokens is measured by a weighted Euclidean distance. Because the feature includes the token orientation, the representation is not rotation invariant. An M-tree is used to index the tokens in the feature database. Retrieving shapes similar to a query shape from the database involves two steps, the first of which is token retrieval [20].

Global region-based shape descriptors

Region-based approaches consider all the pixels in the shape region when creating the shape representation, instead of using only boundary information. Moment descriptors are commonly used for this purpose. Region-based methods can likewise be categorized as global or structural. The structural technique entails segmentation, as already explained. Examples of the structural approach include the medial axis and the convex hull, while the global approach includes the shape matrix and the grid technique [20].

Simple shape descriptor

Simple or basic shape descriptors are used alone or in conjunction with other shape descriptors to filter out false positives [20, 31]. The area is the number of pixels in the shape and can be computed from quadtree or chain-code representations. The perimeter is the number of pixels on the shape’s boundary. Compactness, also known as circularity, is the ratio of the shape’s area to its squared perimeter and indicates how compact the shape is.
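
These definitions translate directly into code. The sketch below assumes the shape is given as a binary mask (a list of rows containing 0 or 1) and treats a shape pixel as a boundary pixel when at least one of its four neighbours is background or lies outside the image. That boundary rule and the function names are assumptions made for illustration.

```python
def area(mask):
    """Number of shape pixels in a 0/1 mask."""
    return sum(sum(row) for row in mask)

def perimeter(mask):
    """Number of shape pixels with at least one 4-neighbour outside the shape."""
    height, width = len(mask), len(mask[0])
    count = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or not mask[rr][cc]:
                    count += 1
                    break
    return count

def compactness(mask):
    """Area divided by squared perimeter, as defined in the text."""
    return area(mask) / perimeter(mask) ** 2

if __name__ == "__main__":
    block = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(area(block), perimeter(block), compactness(block))
```

A common variant multiplies the ratio by 4π so that an ideal continuous disk scores 1. On a pixel grid the value depends on how the perimeter is counted, so scores should only be compared when computed with the same convention.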

Zernike moments

Zernike moments are the projections of a shape onto a set of complex orthogonal polynomials defined over the unit disk, and they represent the shape’s geometry compactly and efficiently. They are widely used as shape descriptors in computer vision [32], image processing, and pattern recognition applications. Each moment is a weighted sum of the image intensity values, with weights that vary with the radius and the polar angle. The resulting coefficients provide a signature of the shape’s geometry. The magnitudes of Zernike moments are rotation-invariant, and scale invariance is obtained by normalizing the shape to the unit disk, which makes them well suited to shape-matching and recognition tasks. Zernike moments are also reported to be computationally efficient and numerically stable [10].

Image moments (IM)

The IM has been demonstrated to be useful for various recognition tasks. The selected image moments are invariant under translation, rotation, and scaling of the object, and also under general affine transformation [4]. The affine moment invariants are derived from the theory of algebraic invariants. According to the proponents of the moment-based method, either all six or just the first four invariant moments can be used to describe objects [12].
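
To make the idea of moment invariants concrete, the sketch below computes the first two of Hu's classical moment invariants from normalized central moments. Hu's invariants are swapped in here purely for illustration because they are short to state. They are invariant to translation, rotation, and scale, and they are a different set from the affine moment invariants mentioned above. The function names are hypothetical, and the image is assumed to be a list of rows of non-negative intensities.

```python
def raw_moment(img, p, q):
    """Raw moment m_pq = sum over pixels of x^p * y^q * intensity."""
    return sum(
        (x ** p) * (y ** q) * v
        for y, row in enumerate(img)
        for x, v in enumerate(row)
    )

def hu_first_two(img):
    """First two Hu invariants (phi1, phi2) of a grey-level or binary image."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00   # centroid gives translation invariance
    yc = raw_moment(img, 0, 1) / m00

    def eta(p, q):
        # Central moment, normalized by m00^(1 + (p+q)/2) for scale invariance
        mu = sum(
            ((x - xc) ** p) * ((y - yc) ** q) * v
            for y, row in enumerate(img)
            for x, v in enumerate(row)
        )
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

if __name__ == "__main__":
    img = [[0] * 6 for _ in range(6)]
    for r in (1, 2):
        for c in (1, 2, 3):
            img[r][c] = 1
    print(hu_first_two(img))
```

On a pixel grid the scale invariance holds only approximately, because resampling a shape changes its discretization.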

Angular radial transform (ART)

The ART is a shape descriptor that represents a shape by measuring the distribution of its intensity values in polar coordinates. ART encodes the shape’s boundary by dividing it into angular and radial intervals, and then computing the mean and standard deviation of intensity values within each interval. This process generates a set of features that represent the shape’s texture and curvature. ART is rotation and scale-invariant, making it suitable for shape detection and recognition tasks. ART has been used in various applications, such as object recognition, face recognition, and medical image analysis [24].

Structural region-based shape descriptor

Convex hull

A convex hull is the smallest convex polygon encompassing a given set of points in a two-dimensional space. In other words, it is the tightest convex boundary that contains all the points. Convex hulls are used in shape detection and recognition because they can help to simplify complex shapes and identify the essential features of an object. For example, in image processing, the convex hull of a set of pixels can be used to detect the boundaries of an object and distinguish it from its background.

Convex hulls have several properties that make them useful for shape detection. They are unique, meaning that for any given set of points, there is only one possible convex hull. They are also efficient to compute, with several algorithms available that calculate the convex hull of n points in O(n log n) time [25].
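
One widely used way to compute a planar convex hull is Andrew's monotone chain algorithm, sketched below. It is shown as one possible choice rather than the method used in the cited work. The function names are hypothetical, and the points are assumed to be (x, y) tuples. The initial sort dominates the cost, giving O(n log n) time for n points.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counter-clockwise order, collinear points removed."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop points that would create a clockwise or straight turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

if __name__ == "__main__":
    pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
    print(convex_hull(pts))   # [(0, 0), (2, 0), (2, 2), (0, 2)]
```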

Medial axis

Like the convex hull, a region skeleton can be used to represent and describe a shape. A skeleton can be regarded as a connected network of medial lines. For thick hand-drawn characters, the skeleton can be assumed to represent the actual path taken by the pen. The fundamental idea of the skeleton is to strip away irrelevant data while keeping only the topological details of the object’s structure that might aid identification. The medial axis is the locus of the centers of the maximal disks contained inside the shape. However, the medial axis tends to be extremely sensitive to boundary noise and changes.

Results and discussion

This section discusses the global and structural-based shape descriptors. Moreover, the existing challenges for the descriptors are discussed in the following context.

Global contour-based shape descriptors, such as Fourier descriptors and Wavelet descriptors, represent a shape using its boundary contour. These descriptors are sensitive to boundary noise and variations in contour orientation, but they are computationally efficient and suitable for simple shapes. The challenges with these descriptors are that they may not capture the internal shape structure, and may not be robust to shape deformations and occlusions.

Structural contour-based shape descriptors, such as chain code, polygon decomposition, and smooth curve decomposition, capture the shape’s structure by encoding its boundary primitives and the spatial relationships between its parts. These descriptors are rotation and scale-invariant and robust to shape variations, but they may require more computational resources than global contour-based descriptors.

Global region-based shape descriptors, such as Hu moments and Zernike moments, represent a shape as a whole by computing features from its intensity distribution or distance transform. These descriptors are robust to shape deformations and occlusions but may not capture the shape’s internal structure and may be sensitive to noise [33].

Structural region-based shape descriptors, such as the convex hull and the medial axis, capture the shape’s internal structure by computing features from its pixel connectivity, topology, and geometric properties. These descriptors are rotation and scale-invariant, robust to shape variations, and suitable for complex shapes, but they may require more computational resources than global region-based descriptors.

Therefore, shape descriptors’ main challenge is to balance discriminative power, robustness, and computational efficiency. The choice of descriptor depends on the application requirements, the complexity of the shapes, and the available computational resources. Combining different descriptors may enhance shape representation and improve recognition accuracy.

Conclusion

This study presents a thorough analysis of shape representation techniques together with a classification of shape representation and description methods. It reviews current achievements, highlights the strengths and weaknesses of existing methods, and addresses current research issues and challenging tasks in shape representation. The discussion of existing methods provides insight into their drawbacks and effectiveness and contributes to a better understanding of state-of-the-art techniques. However, this study has some limitations. It may have focused on a specific subset of shape representation techniques, potentially leaving out other relevant approaches. Additionally, the evaluation of performance might be limited to certain datasets and scenarios, which could affect the generalizability of the findings. To address these limitations and pave the way for future research, two directions can be considered. Firstly, applying shape representation to popular applications such as image retrieval and sign detection, based on the features obtained from shape-based descriptors, could provide practical insights. Secondly, investigating a combination of shape-based representation and deep learning approaches for robust object recognition offers a promising avenue for enhancing the accuracy and efficiency of object recognition systems. By exploring these directions, researchers can further advance the field of shape representation and broaden its applicability in various domains.