1 Introduction

In recent years, DNNs have played an increasingly important role in 3D analysis. DNNs are capable of processing many types of 3D data, including multi-view images (Su et al. 2015; Qi et al. 2016; Yu et al. 2018), voxels (Maturana and Scherer 2015; Zhou and Tuzel 2018), point clouds (Qi et al. 2017a; Wang et al. 2019b; Fei et al. 2022), and particles (Schütt et al. 2017; Thomas et al. 2018; Satorras et al. 2021b). They have outperformed traditional methods and shown strong generalizability across a range of tasks, such as classification (Su et al. 2015; Qi et al. 2017a; Wang et al. 2019b), segmentation (Landrieu and Simonovsky 2018; Meng et al. 2019; Furuya et al. 2020), detection (Zhou and Tuzel 2018; Shi et al. 2019; Wang et al. 2023b), property prediction (Schütt et al. 2017; Satorras et al. 2021b), and generation (Hoogeboom et al. 2022; Guan et al. 2023).

Nonetheless, significant gaps exist between experiments and applications, restricting the actual deployment of DNNs. For example, most experiments are conducted under ideal settings with little noise, known data distributions, and canonical poses, conditions that cannot be fully met in practical applications. Among them, canonical poses are widely adopted in 3D research, where 3D data is first aligned manually and then processed by DNNs. However, such a setting leads to two main problems. First, these models may suffer severe performance drops when evaluated on non-aligned 3D data, as shown in previous works (Esteves et al. 2018a; Sun et al. 2019b; Zhao et al. 2022a). Zhao et al. (2020b) explore the fragility of 3D DNNs and achieve an over 95% success rate in black-box adversarial attacks by slightly rotating the evaluation 3D data. Second, these DNNs cannot be applied to tasks requiring output consistency. For example, the atomization energies of molecules are independent of their absolute positions and orientations (Blum and Reymond 2009; Rupp et al. 2012). If DNNs are trained with aligned molecules, they inevitably learn a nonexistent relationship between absolute coordinates and molecular properties and may overfit the training data. Such models are unreliable, as they cannot give the same prediction for arbitrarily rotated inputs. Many approaches have been proposed to address these problems; we summarize them as rotation invariant and equivariant methods in this survey.

Rotation invariance has been investigated in traditional 3D descriptors. Before the emergence of DNNs, most methods could only capture low-level geometric features based on transformation invariance. FPFH (Rusu et al. 2009) combines coordinates and estimated surface normals to define Darboux frames. Then it uses several angular variations to represent the surface properties. SHOT (Tombari et al. 2010) designs unique and unambiguous local reference frames (LRFs) to construct robust and expressive 3D descriptors. Drost et al. (2010) create a global description with point pair features (PPFs) composed of relative distances and angles. These descriptors can effectively handle tasks like pose estimation and registration. Recently, Horwitz and Hoshen (2023) revisit the importance of traditional descriptors for 3D anomaly detection. DNNs can learn high-level semantic features and accomplish complicated tasks, but they usually ignore rotation invariance and equivariance, making them unreliable for real-world applications. Existing works deal with this problem from different perspectives. T-Net (Qi et al. 2017a) directly regresses transformation matrices from raw point clouds to transform poses and features. ClusterNet (Chen et al. 2019b) constructs k nearest neighbors (kNN) graphs and computes several invariant distances and angles, which are fed into hierarchical networks for complicated downstream tasks. Tensor field networks (TFNs) (Thomas et al. 2018; Thomas 2019) are equivariant neural networks based on the irreducible representations of SO(3). They have a solid mathematical foundation and perform well over various tasks, including shape classification and RNA structure scoring.

Fig. 1

Overview of our survey. After the mathematical background is stated, rotation invariant and equivariant methods are introduced, respectively. Then we give a comprehensive overview of applications and datasets and point out future directions based on open problems. Best viewed in color

Many distinctive approaches have been developed for rotation invariance and equivariance. However, a comprehensive review of these methods is absent, making it challenging to keep pace with recent progress and to select appropriate methods for specific tasks. Therefore, we are motivated to write this survey and fill the gap. Our contributions can be summarized in three aspects. First, this survey systematically overviews existing works related to rotation invariance and equivariance, which are further divided into several subcategories based on their structures and mathematical foundations. Second, we unify the notations of different methods, providing an intuitive perspective for analysis and comparison. Third, we point out some open problems and propose future research directions based on them.

This paper is organized as shown in Fig. 1. In Sect. 2, we introduce the mathematical background of rotation invariance and equivariance, including the definition, commonly-used rotation groups, and evaluation metrics. Rotation invariant and equivariant methods are comprehensively overviewed and discussed in Sect. 3 and Sect. 4, respectively. Applications and datasets are reviewed in Sect. 5. In Sect. 6, we point out several future research directions based on unsolved problems. Notations are listed in Table 1 for better readability.

Table 1 Notations adopted in this survey

2 Background

This section introduces the background knowledge required to understand rotation invariance and equivariance. Basic concepts of group theory are beneficial for better comprehension. Readers may refer to textbooks such as Group Theory in Physics: An Introduction (Cornwell 1997) and Algebra (Artin 2013) for more details.

Invariance and equivariance have been formulated in much related work (Cohen and Welling 2016; Thomas et al. 2018; Cohen et al. 2018a, 2019a; Thomas 2019). However, those definitions cannot cover some methods in this survey. Thus, we deliberately adopt a broader definition to include them. Both strong and weak invariance and equivariance are stated in Definition 1. Compared with previous definitions, we introduce weak invariance and equivariance through the G-variant error so as to cover methods that do not satisfy Eq. 1. Note that determining C as an exact value is unnecessary, since any function is C-weakly equivariant if C is large enough (\(+\infty\)); C is therefore generally omitted in this survey. If a method is weakly equivariant, its G-variant error is relatively small or is reduced after appropriate training.

Definition 1

Suppose that G acts on \(\mathcal {X},\mathcal {Y}\), and \(f:\mathcal {X}\rightarrow \mathcal {Y},\ d:\mathcal {Y}\times \mathcal {Y}\rightarrow \mathbb {R}_{\ge 0}\) is a metric on \(\mathcal {Y}\).

f is strongly equivariant with respect to G, if

$$\begin{aligned} f\left( g\cdot x\right) =g\cdot f\left( x\right) ,\forall x\in \mathcal {X},g\in G. \end{aligned}$$
(1)

Meanwhile, f is C-weakly equivariant with respect to G, if

$$\begin{aligned} \int _{\mathcal {X}}\int _{G}d\left( f\left( g\cdot x\right) ,g\cdot f\left( x\right) \right) \text {d}\mu \left( g\right) \text {d}x<C. \end{aligned}$$
(2)

Specifically, if the group action of G on \(\mathcal {Y}\) is trivial, i.e., \(\forall g\in G,\forall y\in \mathcal {Y},g\cdot y=y\), then f is C-weakly/strongly invariant with respect to G. For discrete \(\mathcal {X}\) or G, the integration on the left side of Eq. 2 is substituted with summation. The integral is named the G-variant error, denoted by \(\mathcal {E}\left( f\right)\).

SO(3), O(3), SE(3), E(3), and their proper subgroups are the commonly-used groups that describe 3D rotation, reflection, and translation. Their differences are listed in Table 2. Unless otherwise specified, we focus on rotation in the 3D Euclidean space, and G is a subgroup of SO(3).

Table 2 The differences among SO(3), O(3), SE(3), and E(3)

Rotation invariant and equivariant methods require specific evaluation metrics to reflect the performances on certain tasks and the invariance/equivariance. Let us take a supervised learning task with N training samples \(\left\{ \left( x_i,y_i\right) \right\} _{i=1}^N\) as an example. \(f:\mathcal {X}\rightarrow \mathcal {Y}\) is the deep model and \(L:\mathcal {Y}\times \mathcal {Y}\rightarrow \mathbb {R}\) is the evaluation function. If there is no requirement on equivariance, the metric is computed as \(\mathcal {L}=\sum _{i}L\left( f\left( x_i\right) ,y_i\right)\). However, if equivariance is considered, the model f should consider \(L\left( f\left( g\cdot x_i\right) ,g\cdot y_i\right)\) for all \(g\in G\) instead of only \(L\left( f\left( x_i\right) ,y_i\right)\). Accordingly, the metric \(\mathcal {L}_G\) is given as

$$\begin{aligned} \mathcal {L}_G=\sum _{i}\int _{G}L\left( f\left( g\cdot x_i\right) ,g\cdot y_i\right) \text {d}\mu \left( g\right) . \end{aligned}$$
(3)

If f is strongly equivariant and L is strongly invariant, then \(\mathcal {L}_G\) degenerates into \(\mathcal {L}\). As the integration is computationally inefficient, most previous works approximate the metric with randomly-rotated samples.
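For illustration, the following minimal Python sketch approximates \(\mathcal {L}_G\) by Monte Carlo sampling of rotations, assuming an invariant task (so that \(g\cdot y=y\)); the names `f`, `L`, and `samples` are hypothetical stand-ins for the model, the evaluation function, and the dataset.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def estimate_metric(f, L, samples, n_rotations=32):
    """Monte Carlo estimate of Eq. 3 for an invariant task (g . y = y)."""
    total = 0.0
    for x, y in samples:                                # x: (N, 3) points, y: label
        for R in Rotation.random(n_rotations).as_matrix():
            total += L(f(x @ R.T), y)                   # rotate the input, then evaluate
    return total / n_rotations
```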

3 Rotation invariant methods

Invariance is a particular and straightforward case of equivariance. Rotation invariant methods aim to produce the same or similar results for inputs with different poses. We present the essential ideas of these methods and discuss their advantages and drawbacks. Several milestone methods are shown in Fig. 2.

Fig. 2

Milestones of rotation invariant methods. Best viewed in color

3.1 Data augmentation methods

Data augmentation methods only change the loss function and leave the model structure untouched. They use sampling to approximate the integral in Eq. 3. Thus, the loss \(\mathcal {L}_G\) is constructed as

$$\begin{aligned} \mathcal {L}_G=\sum _{i,\hat{g}}L\left( f\left( \hat{g}\cdot x_i\right) ,y_i\right) , \end{aligned}$$
(4)

where \(\hat{g}\) is sampled from G.
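A minimal sketch of this loss for batched point clouds is shown below; `model` and `loss_fn` are hypothetical stand-ins for f and L, and the rotations are drawn uniformly from SO(3).

```python
import torch
from scipy.spatial.transform import Rotation


def augmented_loss(model, loss_fn, points, labels, n_aug=4):
    """Eq. 4 with sampled rotations; points: (B, N, 3), labels: (B, ...)."""
    total = 0.0
    for _ in range(n_aug):
        # One random rotation matrix per sample in the batch, shape (B, 3, 3).
        R = torch.as_tensor(Rotation.random(len(points)).as_matrix(),
                            dtype=points.dtype)
        total = total + loss_fn(model(points @ R.transpose(1, 2)), labels)
    return total / n_aug
```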

Many methods use data augmentation, and only some representative ones are listed here. Kajita et al. (2017) exploit rotated replicas to increase the classification accuracy. Zhuang et al. (2019); Zhu et al. (2020) leverage a Rubik’s cube recovery task with permutation and random rotation to learn invariant features from medical images. Choy et al. (2019); Bai et al. (2020) observe that fully convolutional neural networks (CNNs) can gain rotation invariance through data augmentation. Zhou et al. (2022a) utilize random rotations to learn invariant representations for point cloud generation. Bergmann and Sattlegger (2023) apply rotation augmentation on anomaly-free training samples for 3D anomaly detection.

Although data augmentation methods can enhance rotation robustness (Kajita et al. 2017; Choy et al. 2019; Bai et al. 2020), they have severe limitations. Data augmentation generally introduces a heavy training burden; for example, Kajita et al. (2017) use 30 times as much rotated data to achieve a significant improvement on rotation invariant descriptors. Besides, data augmentation methods cannot guarantee invariance under arbitrary rotations, because Eq. 4 cannot minimize the loss on unseen rotations. In practice, data augmentation is often integrated into other rotation invariant methods as an auxiliary component.

3.2 Multi-view methods

Unlike data augmentation methods, multi-view methods attain rotation invariance by modifying the model instead of the loss function. In multi-view methods, the model \(f:\mathcal {X}\rightarrow \mathcal {Y}\) is built as

$$\begin{aligned} f\left( x\right) =\sum _{\hat{g}_j\in \widehat{G}}w_jf_\text {b}\left( \hat{g}_j\cdot x\right) , \end{aligned}$$
(5)

where \(\widehat{G}\) is a finite subset of G, \(f_\text {b}:\mathcal {X}\rightarrow \mathcal {Y}\) is the base model, and \(w_j>0,\sum _jw_j=1\). The metric d is generally convex, so f has a G-variant error no higher than that of \(f_\text {b}\), as Eq. 6 shows, meaning f is more invariant. A simple yet effective approach is to choose \(\widehat{G}\) as a finite subgroup of G with \(w_j=\frac{1}{\left| \widehat{G}\right| }\); then f is strongly invariant with respect to \(\widehat{G}\).

$$\begin{aligned} \mathcal {E}\left( f\right) =&\int _{\mathcal {X}}\int _{G}d\left( \sum _{j}w_jf_\text {b}\left( \hat{g}_jg\cdot x\right) ,\sum _{j}w_jf_\text {b}\left( \hat{g}_j\cdot x\right) \right) \text {d}\mu \left( g\right) \text {d}x\nonumber \\ \le&\int _{\mathcal {X}}\int _{G}\sum _{j}w_jd\left( f_\text {b}\left( \hat{g}_jg\cdot x\right) ,f_\text {b}\left( \hat{g}_j\cdot x\right) \right) \text {d}\mu \left( g\right) \text {d}x\nonumber \\ =&\sum _{j}w_j\int _{\mathcal {X}}\int _{G}d\left( f_\text {b}\left( \hat{g}_jg\hat{g}_j^{-1}\cdot x\right) ,f_\text {b}\left( x\right) \right) \text {d}\mu \left( g\right) \text {d}x\nonumber \\ =&\int _{\mathcal {X}}\int _{G}d\left( f_\text {b}\left( g\cdot x\right) ,f_\text {b}\left( x\right) \right) \text {d}\mu \left( g\right) \text {d}x=\mathcal {E}\left( f_\text {b}\right) . \end{aligned}$$
(6)
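As a concrete illustration of Eq. 5, the following sketch averages a hypothetical base model over the cyclic group \(C_4\) around the z-axis with uniform weights; the composite model is then strongly invariant with respect to \(C_4\) (though not to arbitrary rotations).

```python
import numpy as np
from scipy.spatial.transform import Rotation

# C_4: rotations by 0, 90, 180, 270 degrees around the z-axis.
G_hat = [Rotation.from_euler("z", a, degrees=True).as_matrix()
         for a in (0, 90, 180, 270)]


def averaged_model(base_model, x):
    """f(x) = (1/|G_hat|) * sum_j f_b(g_j . x), Eq. 5 with uniform weights."""
    return np.mean([base_model(x @ R.T) for R in G_hat], axis=0)
```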

As CNNs have become a powerful tool for images (Krizhevsky et al. 2012; Simonyan and Zisserman 2014; Szegedy et al. 2015; He et al. 2016), researchers exploit multi-view images to extract features from 3D shapes. Most multi-view methods take images as input, while some later methods also process point clouds and voxels. Although these methods are primarily designed as 3D feature extractors, they can improve rotation invariance and are chosen as baselines by related work (Esteves et al. 2018a; Rao et al. 2019; Zhang et al. 2019a) (Fig. 3). MVCNN (Su et al. 2015) is a pioneering method showing that a fixed set of rendered views is highly informative for 3D shape recognition. VoxNet (Maturana and Scherer 2015) pools multi-view voxel features and achieves 2D rotation invariance around the z-axis. Qi et al. (2016) introduce multi-resolution filtering for multi-view CNNs and improve the classification accuracy. Cao et al. (2017) propose spherical projections to collect depth variations and contour information of different views for better performances. Zhang et al. (2018) apply a PointNet-like (Qi et al. 2017a) method on multi-view 2.5D point clouds to fuse information from all views. View-GCN++ (Wei et al. 2022) exploits rotation robust view-sampling to deal with rotation sensitivity. Besides, some methods replace the weighted average in Eq. 5 with pooling/fusion modules to enhance effectiveness and efficiency (Wang et al. 2017; Roveri et al. 2018; Yu et al. 2018; Wei et al. 2020; Li et al. 2020; Chen and Chen 2022). These modifications do not necessarily improve the invariance, so we omit them here.

Fig. 3

A pipeline of multi-view methods. The 3D input is first rendered/sampled into multi-view data, processed by non-invariant DNNs, and finally pooled for downstream tasks

Most multi-view methods take images as the input, so they can handle 3D rotation invariance with powerful 2D models (Su et al. 2015; Qi et al. 2016; Cao et al. 2017). Nonetheless, they incur a heavy computational burden, making training and inference inefficient. As Eq. 5 shows, the computational burden of f is at least \(\left| \widehat{G}\right|\) times that of \(f_\text {b}\); for instance, \(\left| \widehat{G}\right|\) is 12 or 80 in MVCNN (Su et al. 2015). In addition, most existing multi-view methods are only weakly rotation invariant: their base models \(f_\text {b}\), such as 2D CNNs (Su et al. 2015; Qi et al. 2016; Wei et al. 2022) and non-invariant 3D networks (Zhang et al. 2018), are not strongly rotation invariant, so the composite models f do not possess strong invariance.

3.3 Ringlike and cylindrical methods

Under some circumstances, it is straightforward to identify a principal axis, so the 3D rotation invariance degenerates into a 2D one. These methods organize data in rings or cylinders for further processing.

The principal axis is either selected from the x, y, z axes or computed using specific algorithms. DeepPano (Shi et al. 2015) takes the z-axis as the principal axis and creates a panoramic view through a cylinder projection. A max-pooling layer is appended for invariance. Moon et al. (2018) extend 2D CNNs working on panoramic views to 3D CNNs working on cylindrical occupancy grids and obtain better performances. Cylindrical Transformer Networks (Esteves et al. 2018b) transform raw coordinates to cylindrical ones using the predicted z-axis. As the 3D convolutions acting on cylindrical coordinates are translation invariant, the final representations are rotation invariant. Many methods take this pipeline with slight modifications (Sun et al. 2019a; Ao et al. 2021; Fan et al. 2021; Xu et al. 2021c; Li et al. 2022b; Zhao et al. 2022b; Ao et al. 2023a).

Ringlike and cylindrical methods are compelling in applications like place recognition (Sun et al. 2019a; Li et al. 2022b) and registration (Ao et al. 2021; Zhao et al. 2022b). Nevertheless, their application scope is limited. They can only handle problems where the principal axes can be identified, or the inputs can fit into rings and cylinders.

3.4 Transformation methods

Transformation methods address rotation invariance through a transformation function \(t:\mathcal {X}\rightarrow \text {Aut}\left( \mathcal {X}\right)\), where \(\text {Aut}\left( \mathcal {X}\right)\) is the automorphism group of \(\mathcal {X}\). In transformation methods, the model \(f:\mathcal {X}\rightarrow \mathcal {Y}\) is given as

$$\begin{aligned} f\left( x\right) =f_\text {b}\left( t\left( x\right) \cdot x\right) , \end{aligned}$$
(7)

where \(f_\text {b}:\mathcal {X}\rightarrow \mathcal {Y}\) is the base model. If t satisfies the invariance condition, i.e.,

$$\begin{aligned} \forall x\in \mathcal {X},\forall g\in G,t\left( x\right) =t\left( g\cdot x\right) g, \end{aligned}$$
(8)

then f is strongly rotation invariant. However, t does not satisfy this condition in most methods, so f is only weakly rotation invariant. These methods are usually designed for coordinate inputs like point clouds.

Spatial Transformer Networks (STNs) (Jaderberg et al. 2015) are widely used for spatial invariance in image processing. In the 3D domain, PointNet (Qi et al. 2017a) proposes joint alignment networks, i.e., T-Net, for rotation robustness, as shown in Fig. 4. T-Net is a mini-PointNet regressing the transformation matrix directly. To make the matrix \(\varvec{R}\in \mathbb {R}^{3\times 3}\) orthogonal, a regularization term \(L_\text {reg}=\left\| \varvec{I}-\varvec{RR}^T\right\|\) is appended. There is no clear disparity between STNs and T-Nets in the 3D domain, so they are not distinguished in this survey. T-Net is widely adopted with the spread of PointNet-like methods (Qi et al. 2017a, b; Wang et al. 2019b). SHPR-Net (Chen et al. 2018b) employs two T-Nets to connect poses in the original and canonical spaces. PVNet (You et al. 2018) applies the EdgeConv (Wang et al. 2019b) as the basic blocks of T-Net to better capture local information. Zhang et al. (2018) put raw point clouds and multi-view features into T-Net to robustify the model. In addition, many other methods also include T-Net in their models for the effectiveness and stability in different downstream tasks (Joseph-Rivlin et al. 2019; Chen et al. 2019a; Liu et al. 2019c; Zhang et al. 2020a; Yu et al. 2020b; Wang et al. 2021; Poiesi and Boscaini 2021; Hegde and Gangisetty 2021; Liu et al. 2022c; Zhu et al. 2022a).

Fig. 4
figure 4

T-Net (Qi et al. 2017a) directly regresses rotation matrices from coordinates. B and N refer to the batch size and the number of points, respectively. The numbers after MLP are internal layer sizes
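The following PyTorch sketch illustrates a T-Net-style module and the regularization term \(L_\text {reg}\); the layer sizes follow the common PointNet configuration but are otherwise illustrative, and this is a sketch rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn


class TNet(nn.Module):
    """T-Net-style transformation network for (B, N, 3) point clouds."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU(),
                                 nn.Linear(128, 1024), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(),
                                  nn.Linear(512, 256), nn.ReLU(),
                                  nn.Linear(256, 9))

    def forward(self, x):                        # x: (B, N, 3)
        feat = self.mlp(x).max(dim=1).values     # shared MLP + max pooling
        R = self.head(feat).view(-1, 3, 3)
        R = R + torch.eye(3, device=x.device)    # bias the output towards identity
        return x @ R.transpose(1, 2), R


def reg_loss(R):
    """L_reg = ||I - R R^T||, encouraging (not guaranteeing) orthogonality."""
    I = torch.eye(3, device=R.device).expand_as(R)
    return torch.linalg.norm(I - R @ R.transpose(1, 2), dim=(1, 2)).mean()
```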

Besides rotation matrices, some methods utilize other rotation representations. IT-Net (Yuan et al. 2018) simultaneously canonicalizes rotation and translation through the quaternion representation. PCPNet (Guerrero et al. 2018) and SCT (Liu et al. 2022a) regress unit quaternions for pose canonicalization and point cloud recognition, respectively. Poiesi and Boscaini (2023) learn a quaternion transformation network to refine the estimated LRF. RotPredictor (Fang et al. 2020) applies PointConv (Wu et al. 2019) to regress Euler angles, and RTN (Deng et al. 2021b) predicts discrete Euler angles. C3DPO (Novotny et al. 2019) divides the shape into view-specific pose parameters and a view-invariant shape basis. PaRot (Zhang et al. 2023a) also disentangles invariant features from equivariant poses via an equivariance loss. Wang et al. (2022c) formulate the rotation invariant learning problem as the minimization of an energy function, solved with an iterative strategy. Some methods are embedded in self-supervised learning frameworks: Zhou et al. (2022b) and Mei et al. (2023) enforce the consistency of canonical poses with a rotation equivariance loss; Sun et al. (2021) utilize Capsule Networks (Hinton et al. 2011) with a canonicalization loss for object-centric reasoning; Kim et al. (2022) introduce a self-supervised learning framework to predict canonical axes of point clouds using the icosahedral group. Currently, only a few methods are strongly rotation invariant. LGANet (Gu et al. 2021b) and ELGANet (Gu et al. 2022) exploit graph convolutional networks (GCNs) to process rotation invariant distances and angles, where the outputs are orthogonalized into rotation matrices. Katzir et al. (2022) employ equivariant networks to learn canonical poses of point clouds. RIP-NeRF (Wang et al. 2023c) transforms raw coordinates into invariant ones for fine-grained editing. EIPs (Fei and Deng 2024) disentangle rotation invariance from point cloud processing with efficient invariant poses.

Benefiting from their straightforward idea, transformation methods are extensively used in many applications (Liu et al. 2019c; Guerrero et al. 2018; Zhu et al. 2022a). Notwithstanding, the invariance condition is often ignored, especially by works using T-Nets (Qi et al. 2017a; Joseph-Rivlin et al. 2019; Poiesi and Boscaini 2021); in that case, the transformation functions do not contribute to rotation invariance. Besides, some methods cannot output proper rotation representations. For example, T-Net (Qi et al. 2017a) cannot guarantee proper rotation matrices, even with the regularization term; 3D shapes are then inevitably distorted, and some structural information may be lost. Moreover, heavy data augmentation is sometimes required for good performance. Le (2021) shows that T-Net needs a large amount of data augmentation to learn a stable transformation policy.

3.5 Invariant value methods

Invariant value methods achieve rotation invariance through constructing invariant values from coordinate inputs. Here, invariant values include distances, inner products, and angles:

$$\begin{aligned} \left\| \varvec{u}_i\right\| \ \left( \text {distance}\right) ,\ \varvec{u}_i\cdot \varvec{u}_j\ \left( \text {inner product}\right) ,\ \angle \left( \varvec{u}_i,\varvec{u}_j\right) \ \left( \text {angle}\right) , \end{aligned}$$
(9)

where \(\left\{ \varvec{u}_i\right\} \subset \mathbb {R}^3\) is a nonzero geometric vector set. Based on these invariant values, the model \(f:\mathcal {X}\rightarrow \mathcal {Y}\) is generally set up as

$$\begin{aligned} f\left( x\right) =f_\text {b}\left( f_\text {i}\left( x\right) \right) , \end{aligned}$$
(10)

where \(f_\text {i}:\mathcal {X}\rightarrow \mathcal {Z}\) uses handcrafted rules to compute invariant values, and \(f_\text {b}:\mathcal {Z}\rightarrow \mathcal {Y}\) is the base model. Clearly, f is strongly rotation invariant. In the following discussions, \(\left\{ \varvec{x}_i\right\}\) represents a point cloud. \(\varvec{x}_{ij},\varvec{n}_{ij},\left( j=1,\cdots ,k\right)\) denote the positional and normal vectors of \(\varvec{x}_i\)’s kNN, respectively. \(\varvec{m}_i\) is the barycenter of \(\mathcal {N}\left( \varvec{x}_i\right)\). We use several operators to simplify the notation: normalize (N), orthogonalize (O), and orthonormalize (NO).

$$\begin{aligned} \text {N}\left( \varvec{x}\right) =\frac{\varvec{x}}{\left\| \varvec{x}\right\| },\text {O}\left( \varvec{x},\varvec{y}\right) =\varvec{y}-\left( \varvec{y}\cdot \text {N}\left( \varvec{x}\right) \right) \text {N}\left( \varvec{x}\right) ,\text {NO}\left( \varvec{x},\varvec{y}\right) =\text {N}\left( \text {O}\left( \varvec{x},\varvec{y}\right) \right) . \end{aligned}$$
(11)
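These operators can be sketched in a few lines of numpy, assuming nonzero and non-collinear inputs (the singular cases are discussed in Sect. 3.5.6).

```python
import numpy as np


def normalize(x):                  # N(x)
    return x / np.linalg.norm(x)


def orthogonalize(x, y):           # O(x, y): remove the component of y along x
    n = normalize(x)
    return y - (y @ n) * n


def orthonormalize(x, y):          # NO(x, y) = N(O(x, y))
    return normalize(orthogonalize(x, y))
```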

As \(f_\text {b}\) is usually a deep point cloud model with slight modification, the handcrafted rules in \(f_\text {i}\) are the core of invariant value methods. We divide existing methods into several groups according to the form of invariant values.

Fig. 5

Representative invariant values from a local values, b LRF-based values, c PPF-based values, and d global values. The solid lines are invariant values or necessary components of invariant values

3.5.1 Local values

Table 3 Some representative local values

Many methods generate invariant values in the local neighborhoods. ClusterNet (Chen et al. 2019b) introduces rigorously rotation invariant (RRI) mappings based on a kNN graph as

$$\begin{aligned} \text {RRI}\left( \varvec{x}_i,\left\{ \varvec{x}_{ij}\right\} _{j=1}^k\right) =&\left[ \left\| \varvec{x}_i\right\| ,\left\{ \left( \left\| \varvec{x}_{ij}\right\| ,\angle \left( \varvec{x}_i,\varvec{x}_{ij}\right) ,\phi _{ij}\right) \right\} _{j=1}^k\right] ,\\ \text {where }\phi _{ij}=&\min \left\{ \text {atan2}\left( a_{ijt},b_{ijt}\right) \vert 1\le t\le k,t\ne j,a_{ijt}\ge 0\right\} ,\nonumber \\ a_{ijt}=&\left( \text {NO}\left( \varvec{x}_i,\varvec{x}_{ij}\right) \times \text {NO}\left( \varvec{x}_i,\varvec{x}_{it}\right) \right) \cdot \text {N}\left( \varvec{x}_i\right) ,\nonumber \\ b_{ijt}=&\ \text {NO}\left( \varvec{x}_i,\varvec{x}_{ij}\right) \cdot \text {NO}\left( \varvec{x}_i,\varvec{x}_{it}\right) .\nonumber \end{aligned}$$
(12)

ClusterNet applies a hierarchical structure to aggregate features. Although all geometric information is retained, it mainly considers global information, weakening its capability to describe local structures. RIConv (Zhang et al. 2019b) addresses this issue by extracting local rotation invariant features (RIFs, Fig. 5a) via relative distances and angles as

$$\begin{aligned} \text {RIF}\left( \varvec{x}_{ij}\right) =\left[ \left\| \varvec{d}_{ij}^{\left( 0\right) }\right\| ,\left\| \varvec{d}_{ij}^{\left( 1\right) }\right\| ,\angle \left( \varvec{d}_{ij}^{\left( 0\right) },\varvec{d}_i^{\left( 0\right) }\right) ,\angle \left( \varvec{d}_{ij}^{\left( 1\right) },-\varvec{d}_i^{\left( 0\right) }\right) \right] , \end{aligned}$$
(13)

where \(\varvec{d}_{ij}^{\left( 0\right) }=\varvec{x}_{ij}-\varvec{x}_i,\varvec{d}_{ij}^{\left( 1\right) }=\varvec{x}_{ij}-\varvec{m}_i,\varvec{d}_i^{\left( 0\right) }=\varvec{m}_i-\varvec{x}_i\). It applies a multi-layer perceptron (MLP) to generate final features. RIF has been widely adopted by many works (Chou et al. 2021; Zhang et al. 2022; Wang and Rosen 2023; Fan et al. 2023).
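For illustration, a minimal numpy sketch of Eq. 13 for a single neighbor could read as follows; the angle helper follows Eq. 9, and degenerate cases such as \(\varvec{d}_i^{\left( 0\right) }=\varvec{0}\) are not handled.

```python
import numpy as np


def angle(u, v):
    """Angle between two nonzero vectors, clipped for numerical safety."""
    return np.arccos(np.clip(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)),
                             -1.0, 1.0))


def rif(x_i, x_ij, m_i):
    """RIF of Eq. 13 for reference point x_i, neighbor x_ij, barycenter m_i."""
    d0_ij = x_ij - x_i                 # d^(0)_ij
    d1_ij = x_ij - m_i                 # d^(1)_ij
    d0_i = m_i - x_i                   # d^(0)_i
    return np.array([np.linalg.norm(d0_ij), np.linalg.norm(d1_ij),
                     angle(d0_ij, d0_i), angle(d1_ij, -d0_i)])
```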

Later work mainly adds more reference points and invariant values to improve performances. Some representative invariant values are collected in Table 3. Readers may refer to the original papers for details.

3.5.2 LRF-based values

LRF-based values are special cases of local values. Specifically, if three orthogonal axes \(\varvec{e}_1,\varvec{e}_2,\varvec{e}_3\) can be determined in \(\mathcal {N}\left( \varvec{x}_i\right)\), then \(\varvec{x}_{ij}\cdot \varvec{e}_1,\varvec{x}_{ij}\cdot \varvec{e}_2,\varvec{x}_{ij}\cdot \varvec{e}_3\) are relative coordinates in this LRF. LRFs are adopted in many handcrafted 3D descriptors, like FPFH (Rusu et al. 2009), SHOT (Tombari et al. 2010), and RoPS (Guo et al. 2013). It should be noted that methods only using principal component analysis (PCA) to define LRFs are discussed separately in Sect. 3.6 instead of here. We divide these methods according to the number of LRFs in each neighborhood.

Some methods define a unique LRF in each neighborhood. Usually, the normal vector is selected as one axis, a normalized weighted average vector is selected as another, and their cross product is chosen as the final axis. We summarize these methods in Table 4.

Table 4 Different LRFs adopted by LRF-based values with one LRF

Besides, there are also methods with multiple LRFs in each neighborhood. A common choice of LRF is the Darboux frame defined as

$$\begin{aligned} \varvec{e}_x=\varvec{n}_i,\varvec{e}_y\left( \varvec{x}_{ij}\right) =\text {N}\left( \varvec{d}_{ij}^{\left( 0\right) }\times \varvec{e}_x\right) ,\varvec{e}_z\left( \varvec{x}_{ij}\right) =\varvec{e}_x\times \varvec{e}_y\left( \varvec{x}_{ij}\right) , \end{aligned}$$
(14)

where \(\varvec{e}_y\) and \(\varvec{e}_z\) depend on not only \(\varvec{x}_i\) but also \(\varvec{x}_{ij}\). CRIN (Lou et al. 2023) proposes another LRF by considering the original space basis. Some representative invariant values are listed in Table 5.

Table 5 Some representative invariant values with multiple LRFs

3.5.3 PPF-based values

PPFs (Drost et al. 2010) were initially proposed for 3D object recognition; they describe the relative information between two points \(\varvec{x}_1,\varvec{x}_2\) as

$$\begin{aligned} \text {PPF}\left( \varvec{x}_1,\varvec{x}_2\right) =\left[ \left\| \varvec{d}_{12}\right\| ,\angle \left( \varvec{n}_1,\varvec{d}_{12}\right) ,\angle \left( \varvec{n}_2,\varvec{d}_{12}\right) ,\angle \left( \varvec{n}_1,\varvec{n}_2\right) \right] , \end{aligned}$$
(15)

where \(\varvec{d}_{ij}=\varvec{x}_i-\varvec{x}_j\), as Fig. 5c shows. PPFs are strongly rotation invariant, making them suitable for invariant feature extraction.
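A minimal numpy sketch of Eq. 15 for two oriented points might look as follows.

```python
import numpy as np


def _angle(u, v):
    return np.arccos(np.clip(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)),
                             -1.0, 1.0))


def ppf(x1, n1, x2, n2):
    """PPF of Eq. 15 for points x1, x2 with unit normals n1, n2."""
    d12 = x1 - x2
    return np.array([np.linalg.norm(d12),
                     _angle(n1, d12), _angle(n2, d12), _angle(n1, n2)])
```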

PPFNet (Deng et al. 2018b) concatenates PPFs with coordinates and normals to improve the robustness of 3D point matching. PPF-FoldNet (Deng et al. 2018a) combines PPFNet with FoldingNet (Yang et al. 2018) to learn invariant descriptors, using only PPFs as input features. Bobkov et al. (2018) slightly modify and apply the PPFs to classification and retrieval. GMCNet (Pan et al. 2021) combines RRI (Chen et al. 2019b) and PPFs for rigorous partial point cloud registration. Using hypergraphs, Triangle-Net (Xiao and Wachs 2021) extends PPFs to three points (triangles). PaRI-Conv (Chen and Cong 2022) augments PPFs with two azimuth angles and uses them to synthesize pose-aware dynamic kernels. PPFs have been widely employed in rotation invariant point cloud matching and registration (Zhao et al. 2021; Yu et al. 2023; Zhang et al. 2023c).

3.5.4 Global values

Some methods do not require local neighborhoods to evaluate invariant values. SRINet (Sun et al. 2019b) defines point projection mapping (PPM, Fig. 5d) through projecting \(\varvec{x}_i\) on three axes \(\varvec{a}_1,\varvec{a}_2,\varvec{a}_3\) as

$$\begin{aligned} \text {PPM}\left( \varvec{x}_i\right) =\Big [\cos \angle \left( \varvec{a}_1,\varvec{x}_i\right) ,\cos \angle \left( \varvec{a}_2,\varvec{x}_i\right) ,\cos \angle \left( \varvec{a}_3,\varvec{x}_i\right) ,\left\| \varvec{x}_i\right\| \Big ], \end{aligned}$$
(16)

where \(\varvec{a}_1=\mathop {\arg \max }_{\varvec{x}\in \left\{ \varvec{x}_i\right\} }\left\| \varvec{x}\right\| , \varvec{a}_2=\mathop {\arg \min }_{\varvec{x}\in \left\{ \varvec{x}_i\right\} }\left\| \varvec{x}\right\| , \varvec{a}_3=\varvec{a}_1\times \varvec{a}_2\). Based on SRINet, Tao et al. (2021) add attention modules, and SCT (Liu et al. 2022a) adds a quaternion T-Net for better performances. Sun et al. (2023) apply SRINet on non-rigid point clouds. Some works (Xu et al. 2021b; Qin et al. 2023a) employ the sorted Gram matrix as invariant values. The Gram matrix for \(\left\{ \varvec{x}_i\right\} _1^N\) is computed as \(\left( \varvec{x}_i\cdot \varvec{x}_j\right) _{N\times N}\), each row of which is then sorted and fed into point-based networks for permutation and rotation invariance.
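A sketch of the sorted Gram matrix construction is given below (`sorted_gram` is a hypothetical helper name): inner products are rotation invariant, and sorting each row removes its dependence on the ordering of the other points; permutation invariance across rows is left to the subsequent point-based network.

```python
import numpy as np


def sorted_gram(points):           # points: (N, 3)
    """Row-wise sorted Gram matrix (x_i . x_j)_{N x N}."""
    return np.sort(points @ points.T, axis=1)
```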

3.5.5 Others

In addition to the above invariant values, other values that are hard to classify are listed here. SchNet (Schütt et al. 2017, 2018) gains rotation invariance through interatomic distances. SkeletonNet (Ke et al. 2017) uses angles and ratios between distances as invariant features for human skeletons. Liu et al. (2018) leverage relative distances for global point cloud registration. 3DTI-Net (Pan et al. 2019) utilizes a translation invariant graph filter kernel and employs the norms as invariant features. 3DMol-Net (Li et al. 2021a) extends it to molecular applications. RISA-Net (Fu et al. 2020) employs edge lengths and dihedral angles for 3D retrieval tasks. RMGNet (Furuya et al. 2020) feeds several handcrafted descriptors into GCNs for point cloud segmentation. GS-Net (Xu et al. 2020) uses eigenvalue decomposition on local distance graphs and exploits the eigenvalues as invariant features. SN-Graph (Zhang et al. 2021b) leverages 15 cosine values, 7 distances, and 7 radii as invariant values. TinvNN (Zhang et al. 2021c) performs eigenvalue decomposition on zero-centered distance matrices to obtain invariant features. ComENet (Wang et al. 2022b) exploits several rotation angles for global completeness. DuEqNet (Wang et al. 2023b) builds equivariant networks through relative distances for object detection. SGPCR (Salihu and Steinbach 2023) explores the rotation invariant convolution between two spherical Gaussians for object registration and retrieval. RadarGNN (Fent et al. 2023) employs rotation invariant bounding boxes and representations for radar-based perception. GeoTransformer (Qin et al. 2023b) further applies sinusoidal embeddings of distances and angles for robust registration.

3.5.6 Discussion

Unlike the methods above, invariant value methods are strongly rotation invariant, and their superiority has been demonstrated with many experiments (Xu et al. 2021b; Chen and Cong 2022; Sahin et al. 2022; Wang et al. 2023b). Nevertheless, there are still several concerns.

Singularity Almost every method has singularities that make invariant values meaningless, including coincident points (e.g., \(\varvec{x}_i=\varvec{m}_i\Rightarrow \varvec{d}_i^{\left( 0\right) }=\varvec{0}\) leads to undefined angles in RIConv (Zhang et al. 2019b)), collinear vectors (e.g., if the cross products in Cao et al. (2021); Chen and Cong (2022); Sahin et al. (2022) give zero output, their LRFs are not properly defined), and nonunique candidate values (e.g., if two or more points attain \(\mathop {\arg \max }_{\varvec{x}\in \left\{ \varvec{x}_i\right\} }\left\| \varvec{x}\right\|\), then \(\varvec{a}_1\) in SRINet (Sun et al. 2019b) is not determined).

Irreversibility For \(f_\text {i}:\mathcal {X}\rightarrow \mathcal {Z}\), if there exists \(f_\text {ri}:\mathcal {Z}\rightarrow \mathcal {X}\) satisfying

$$\begin{aligned} \forall x\in \mathcal {X},\exists \ g_x\in G,f_\text {ri}\left( f_\text {i}\left( x\right) \right) =g_x\cdot x, \end{aligned}$$
(17)

then \(f_\text {i}\) is reversible. Some irreversible invariant values may lose certain structural information, harming downstream task performances (Zhang et al. 2019b; Sun et al. 2019b).

Discontinuity The base model \(f_\text {b}\) is generally a continuous deep model. So if \(f_\text {i}\) is discontinuous at \(x_0\), then the model f may also be discontinuous at \(x_0\), making it hard to train with gradient-based optimization algorithms. For example, \(f_\text {i}\) in SRINet (Sun et al. 2019b) is discontinuous on point clouds whose two longest vectors are close, since it needs them to define axes.

Reflection Distances, inner products, and angles are invariant to rotations and reflections. Thus, almost all methods without cross products cannot distinguish rotations from reflections (Drost et al. 2010; Zhang et al. 2019b; Xu et al. 2021b).

3.6 PCA-based methods

PCA-based methods construct the model similarly to transformation methods, but the transformation function is an unlearnable PCA alignment, as Algorithm 1 shows. \(\varvec{X}\) is usually zero-centered to mitigate the influence of translations, and \(\varvec{\Sigma }\) is called the covariance matrix. PCA alignment can guarantee rotation invariance. For \(\varvec{X}_R=\varvec{XR}\ \left( \varvec{RR}^T=\varvec{I}\right)\), if

$$\begin{aligned} \varvec{\Sigma }_R=\varvec{X}_R^T\varvec{X}_R=\varvec{R}^T\varvec{\Sigma R}=\left( \varvec{R}^T\varvec{V}\right) \varvec{\Lambda }\left( \varvec{R}^T\varvec{V}\right) ^T\Rightarrow \varvec{V}_R=\varvec{R}^T\varvec{V}, \end{aligned}$$
(18)

then \(\varvec{Z}_R=\varvec{X}_R\varvec{V}_R=\varvec{XV}=\varvec{Z}\). There are two conditions for Eq. 18. First, the eigenvalues must be distinct, i.e., \(\lambda _1>\lambda _2>\lambda _3\). As it is rare for two or three eigenvalues to be equal, almost all methods assume this to be true. Second, the signs of all columns of \(\varvec{V}\) must be identified uniquely. If \(\varvec{V}=\left[ \varvec{v}_1,\varvec{v}_2,\varvec{v}_3\right]\) satisfies the eigendecomposition \(\varvec{\Sigma }=\varvec{V\Lambda V}^T\), then \(\varvec{V}\text {diag}\left( \varvec{s}\right) =\left[ s_1\varvec{v}_1,s_2\varvec{v}_2,s_3\varvec{v}_3\right] \left( \varvec{s}=\left[ s_1,s_2,s_3\right] ^T\in \left\{ -1,1\right\} ^3\right)\) also satisfies it. Some works substitute PCA with eigenvalue decomposition or singular value decomposition (SVD), but there is no substantial difference; in SVD, another matrix \(\varvec{U}=\varvec{XV}\sqrt{\varvec{\Lambda }}^{-1}\in \mathbb {R}^{N\times 3}\) is introduced. In this section, PCA-based methods are classified according to how the ambiguity of signs is handled (Fig. 6).

Algorithm 1

PCA Alignment
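Since the algorithm is compact, we include a minimal numpy sketch; the sign rule used here (flipping \(\varvec{v}_k\) so that most projections are nonnegative) is only one illustrative instance of the handcrafted rules collected in Table 6.

```python
import numpy as np


def pca_align(X):
    """PCA alignment of a zero-centered point cloud X: (N, 3); returns Z = XV."""
    Sigma = X.T @ X                            # covariance matrix (up to scale)
    _, V = np.linalg.eigh(Sigma)               # eigenvalues in ascending order
    V = V[:, ::-1]                             # reorder so lambda_1 > lambda_2 > lambda_3
    for k in range(3):                         # disambiguate the sign of each axis
        if np.sum(X @ V[:, k] >= 0) < len(X) / 2:
            V[:, k] = -V[:, k]
    return X @ V                               # aligned (canonical-pose) point cloud
```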

Most methods disambiguate signs through handcrafted rules, which generally involve dot products between \(\varvec{v}_k\) and other vectors. If \(\varvec{v}_k\rightarrow -\varvec{v}_k\Rightarrow s_k\rightarrow -s_k\), then \(s_k\varvec{v}_k\) remains the same. Some representative rules are listed in Table 6.

Table 6 Different disambiguation rules adopted by PCA-based methods. \(k=1,2,3\) unless otherwise specified

Some methods consider combinations of signs instead of just choosing one. Xiao et al. (2020) fuse all combinations through a self-attention module. OrthographicNet (Kasaei 2021) transforms raw points into canonical poses and generates several projection views for 3D object recognition. MolNet-3D (Liu et al. 2022c) averages the results from 4 poses to predict molecular properties. Puny et al. (2022) convert the group averaging operation to subset averaging with frames, where 4 and 8 frames are exploited for SO(3) and O(3), respectively. Li et al. (2023a) apply this approach to 3D planar reflective symmetry detection.

Some works utilize pose selectors to produce one pose from multiple candidates. PR-invNet (Yu et al. 2020a) augments 8 poses with discrete rotation groups and utilizes the pose selector to choose the final pose. Li et al. (2021b) investigate the inherent ambiguity of PCA alignment. They argue that the order of \(\varvec{e}_x,\varvec{e}_y,\varvec{e}_z\) is also ambiguous, so there are \(4\ \left( \text {sign}\right) \times 6\ \left( \text {order}\right) =24\) ambiguities in total. All poses are fused through a pose selector to create an optimal one. Besides coordinates, some works apply PCA on network weights (Xie et al. 2023) and the convex hull (Pop et al. 2023).

Fig. 6

A pipeline of PCA-based methods. Several pose candidates are first generated from the 3D input, then they are either disambiguated using handcrafted rules/pose selectors or fused together

PCA-based methods are effective and possess intrinsic strong rotation invariance. Furthermore, they are often combined with invariant value methods for better performance (Yu et al. 2020a; Zhao et al. 2022a; Chen and Cong 2022). However, sign disambiguation may introduce problems like the singularity and discontinuity discussed in Sect. 3.5.6 (Zhang et al. 2020c; Fan et al. 2020; Gandikota et al. 2021), while considering all combinations increases the computational burden (Xiao et al. 2020; Kasaei 2021). Besides, PCA-based methods are fragile to inputs with close eigenvalues, since the corresponding eigenvectors are numerically unstable, an inherent problem of eigenvalue decomposition.

3.7 Summary

In summary, different methods obtain rotation invariance in distinctive ways. Most rotation invariant methods are applied to general 3D understanding. We compare their differences in Table 7. Considering this, we summarize several characteristics of existing rotation invariant methods.

  • Data augmentation is always integrated with other methods, especially weakly rotation invariant ones (Fang et al. 2020; Deng et al. 2021b; Le 2021), to improve their invariance.

  • Multi-view methods work best with image inputs and offer no advantage on coordinate inputs, since they are weakly invariant and usually introduce heavy computational burdens (Su et al. 2015; Qi et al. 2016; Zhang et al. 2018).

  • Ringlike and cylindrical methods are the best choices in tasks like place recognition (Sun et al. 2019a; Li et al. 2022b), as achieving 2D invariance is simpler than 3D.

  • Weakly rotation invariant transformation methods are less recommended. They can be replaced by PCA-based methods that have strong invariance and excellent performances.

  • To date, strong invariance is only achieved by applying invariant value methods and PCA-based methods to coordinate inputs like point clouds and meshes.

Table 7 Comparisons of different rotation invariant methods

4 Rotation equivariant methods

Most of the rotation equivariant methods are equivariant networks on rotation groups. There are already surveys on geometrically equivariant graph neural networks (Han et al. 2022; Zhang et al. 2023b), categorizing them according to the way of message passing and aggregation. We devise a slightly different taxonomy to cover more related methods. Some milestone methods are listed in Fig. 7.

Fig. 7

Milestones of rotation equivariant methods. Best viewed in color

4.1 G-CNNs

Group equivariant convolutional neural networks (G-CNNs) were first proposed to address 2D rotations in images (Cohen and Welling 2016), and they can be extended to 3D rotations directly. The group convolution for \(\psi ,f:\mathcal {X}\rightarrow \mathbb {R}\) is defined as

$$\begin{aligned} \left[ \psi \star f\right] \left( g\right) =\int _\mathcal {X}\left[ L_g\psi \right] \left( x\right) f\left( x\right) \text {d}x, \end{aligned}$$
(19)

where \(\left[ L_g\psi \right] \left( x\right) =\psi \left( g^{-1}\cdot x\right)\). The output signal is always defined on the rotation group, so \(\mathcal {X}=G\) in all convolutional layers except the first one. Group convolutions are strongly rotation equivariant, i.e., \(\psi \star \left[ L_gf\right] =L_g\left[ \psi \star f\right]\).

It is difficult to evaluate the integral directly, so many methods investigate group convolutions on finite groups. CubeNet (Worrall and Brostow 2018) focuses on convolutions on finite groups and reduces rotation equivariance to permutation equivariance. The group convolution for \(\psi ,f:\widehat{G}\rightarrow \mathbb {R}\) satisfies

$$\begin{aligned} \left[ \psi \star f\right] \left( \hat{g}_j\right) =L_{\hat{g}_i}\left[ \psi \star f\right] \left( \hat{g}_{k\left( i,j\right) }\right) =\left[ \psi \star L_{\hat{g}_i}f\right] \left( \hat{g}_{k\left( i,j\right) }\right) , \end{aligned}$$
(20)

where \(\widehat{G}\) is a finite rotation group and \(\hat{g}_{k\left( i,j\right) }=\hat{g}_i\hat{g}_j\). Therefore, rotation \(f\rightarrow L_{\hat{g}_i}f\) is equivalent to permutation \(j\rightarrow k\left( i,j\right)\) in the group convolution, as Fig. 8 shows. Esteves et al. (2019b) put multi-view features on vertices of the icosahedron and introduce localized filters in discrete G-CNNs for efficiency. EPN (Chen et al. 2021) combines point convolutions with group convolutions for SE(3) equivariance, and has been applied on object detection (Yu et al. 2022) and place recognition (Lin et al. 2022a, 2023a). G-CNNs are employed in many tasks, like medical image analysis (Winkels and Cohen 2018, 2019; Andrearczyk and Depeursinge 2018), point cloud segmentation (Meng et al. 2019; Zhu et al. 2023), pose estimation (Li et al. 2021d), and registration (Wang et al. 2022a, 2023a; Xu et al. 2023a).

Fig. 8

The Cayley table of the tetrahedral group that satisfies \(\hat{g}_{k\left( i,j\right) }=\hat{g}_i\hat{g}_j\). Each different color represents a different rotation element. With the help of the Cayley table, it is straightforward to transform discrete rotation equivariance into permutation equivariance. Best viewed in color
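To make Eq. 20 concrete, the following sketch implements a group convolution on a finite group represented by its Cayley table and inverse lookup; the \(C_4\) example at the end is a hypothetical minimal test that checks the equivariance property.

```python
import numpy as np


def group_conv(psi, f, cayley, inv):
    """[psi * f](g_j) = sum_x psi(g_j^{-1} g_x) f(g_x), i.e., Eq. 19 with X = G-hat.

    psi, f: (n,) signals on the group; cayley[i, j] = index of g_i g_j;
    inv[i] = index of g_i^{-1}.
    """
    n = len(f)
    out = np.zeros(n)
    for j in range(n):                      # output element g_j
        for x in range(n):                  # sum over the group
            out[j] += psi[cayley[inv[j], x]] * f[x]
    return out


# Hypothetical minimal test on the cyclic group C_4: cayley[i, j] = (i + j) % 4.
idx = np.arange(4)
cayley, inv = (idx[:, None] + idx[None, :]) % 4, (-idx) % 4
psi, f = np.random.rand(4), np.random.rand(4)
shift = lambda h, i: h[cayley[inv[i]]]      # [L_{g_i} h](x) = h(g_i^{-1} x)
# Rotating the input permutes the output, matching Eq. 20.
assert np.allclose(group_conv(psi, shift(f, 1), cayley, inv),
                   shift(group_conv(psi, f, cayley, inv), 1))
```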

Some methods utilize Lie groups to construct equivariant models. LieConv (Finzi et al. 2020) lifts raw inputs \(x\in \mathcal {X}\) to group elements \(g\in G\) and orbits \(q\in \mathcal {X}/G\) that \(g\cdot o_q=x\), where \(o_q\) is the origin of q. Thus, the convolution is defined as

$$\begin{aligned} \left[ \psi \star f\right] \left( g,q\right) =\int _G\int _{\mathcal {X}/G}\psi \left( g^{-1}g',q,q'\right) f\left( g',q'\right) \text {d}q'\text {d}\mu \left( g'\right) , \end{aligned}$$
(21)

where \(\psi :G\times \mathcal {X}/G\times \mathcal {X}/G\rightarrow \mathbb {R},f:G\times \mathcal {X}/G\rightarrow \mathbb {R}\). LieTransformer (Hutchinson et al. 2021) adds attention mechanisms to LieConv. After lifting, it computes content attention and location attention, both of which are normalized for feature transformation.

G-CNNs are effective tools for handling equivariance for voxels and point clouds (Worrall and Brostow 2018; Finzi et al. 2020; Chen et al. 2021). Nonetheless, it is difficult to balance the computational burden and the approximation error when using sampling to approximate the integral. Moreover, a finite subgroup of SO(3) is one of the following groups: the cyclic group \(C_k\left( \left| C_k\right| =k\right)\), the dihedral group \(D_k\left( \left| D_k\right| =2k\right)\), the tetrahedral group \(T\left( \left| T\right| =12\right)\), the octahedral group \(O\left( \left| O\right| =24\right)\), and the icosahedral group \(I\left( \left| I\right| =60\right)\) (Artin 2013). \(C_k,D_k\) can be arbitrarily large but are unsuitable for arbitrary 3D rotations, while T, O, and I are applicable but have bounded orders. Therefore, it is hard for methods that depend on finite subgroups to extend to arbitrary rotations, like CubeNet (Worrall and Brostow 2018).

4.2 Spherical CNNs

Spherical CNNs are special cases of G-CNNs where the inputs are spherical and SO(3) signals. In this survey, existing spherical CNNs are divided into three categories: those following Cohen et al. (2018a), those following Esteves et al. (2018a), and the others (Fig. 9).

Fig. 9

A pipeline of Spherical CNNs. Most works (Cohen et al. 2018a; Esteves et al. 2018a) employ tensor products to compute spherical/SO(3) convolutions in the spectral domain, while others directly apply spherical convolutions in the spatial domain

4.2.1 Cohen et al. (2018a)

Cohen et al. (2018a) directly employ group convolutions in Eq. 19, where \(\mathcal {X}\) is either \(S^2\) or SO(3). They use the generalized Fourier transform (GFT) to convert convolutions into matrix multiplications. GFT and its inverse are computed as

$$\begin{aligned} \tilde{f}_m^l&=\int _{S^2}f\left( x\right) \overline{Y_m^l\left( x\right) }\text {d}x,\quad f\left( x\right) =\sum _{l=0}^\infty \sum _{m=-l}^l\tilde{f}_m^lY_m^l\left( x\right) , \end{aligned}$$
(22)
$$\begin{aligned} \tilde{f}_{mn}^l&=\int _{\text {SO}\left( 3\right) }f\left( g\right) \overline{D_{mn}^l\left( g\right) }\text {d}\mu \left( g\right) ,\quad f\left( g\right) =\sum _{l=0}^\infty \sum _{m,n=-l}^l\tilde{f}_{mn}^lD_{mn}^l\left( g\right) , \end{aligned}$$
(23)

where \(l\ge 0, -l\le m,n\le l\). It can be proved that \(\widetilde{\varvec{\psi }\star \varvec{f}}^l=\tilde{\varvec{f}}^l\left( \tilde{\varvec{\psi }}^{l}\right) ^H\), where \(\tilde{\varvec{f}}^l,\tilde{\varvec{\psi }}^{l}\in \mathbb {C}^{2l+1}\) for spherical signals, \(\tilde{\varvec{f}}^l,\tilde{\varvec{\psi }}^{l}\in \mathbb {C}^{\left( 2l+1\right) \times \left( 2l+1\right) }\) for SO(3) signals, and \(\widetilde{\varvec{\psi }\star \varvec{f}}^l\in \mathbb {C}^{\left( 2l+1\right) \times \left( 2l+1\right) }\). The computation can be further accelerated with the generalized fast Fourier transform. Clebsch-Gordan Nets (Kondor et al. 2018) exploit the tensor product nonlinearity to avoid repeated transform, thus improving the efficiency. The tensor product between two steerable vectors \(\tilde{\varvec{u}}^{l_1}\in \mathbb {C}^{2l_1+1},\tilde{\varvec{v}}^{l_2}\in \mathbb {C}^{2l_2+1}\) is defined as

$$\begin{aligned} \left( \tilde{\varvec{u}}^{l_1}\otimes \tilde{\varvec{v}}^{l_2}\right) ^l_m=\sum _{m_1,m_2}C^{l,m}_{l_1,m_1,l_2,m_2}\tilde{u}^{l_1}_{m_1}\tilde{v}^{l_2}_{m_2}, \end{aligned}$$
(24)

where \(C^{l,m}_{l_1,m_1,l_2,m_2}\) is the Clebsch-Gordan coefficient, \(\left| l_1-l_2\right| \le l\le l_1+l_2,-l\le m\le l\). Tensor product is strongly rotation equivariant, i.e., \(\left( \varvec{D}^{l_1}\left( g\right) \tilde{\varvec{u}}^{l_1}\otimes \varvec{D}^{l_2}\left( g\right) \tilde{\varvec{v}}^{l_2}\right) ^l=\varvec{D}^l\left( g\right) \left( \tilde{\varvec{u}}^{l_1}\otimes \tilde{\varvec{v}}^{l_2}\right) ^l.\)
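For illustration, the degree-l component of the tensor product in Eq. 24 can be computed directly from Clebsch-Gordan coefficients, here via sympy; this is a readable but slow reference sketch, whereas practical implementations precompute the coefficients.

```python
import numpy as np
from sympy.physics.quantum.cg import CG


def tensor_product(u, v, l1, l2, l):
    """Degree-l part of u^{l1} (x) v^{l2} (Eq. 24); requires |l1-l2| <= l <= l1+l2.

    u, v are steerable vectors indexed so that u[m1 + l1] is the order-m1 entry.
    """
    out = np.zeros(2 * l + 1, dtype=complex)
    for m in range(-l, l + 1):
        for m1 in range(-l1, l1 + 1):
            m2 = m - m1                       # the coefficient vanishes unless m = m1 + m2
            if -l2 <= m2 <= l2:
                c = float(CG(l1, m1, l2, m2, l, m).doit())
                out[m + l] += c * u[m1 + l1] * v[m2 + l2]
    return out
```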

Many methods are based on spherical convolutions (Cohen et al. 2018a). \(a^3\)SCNN (Liu et al. 2019a) proposes the alt-az anisotropic spherical convolution (\(a^3\)SConv), whose outputs are spherical rather than SO(3) signals. \(a^3\)SConv (\(\star _1\)) is defined as \(\left[ \psi \star _1 f\right] \left( x\right) =\left[ \psi \star f\right] \left( \zeta \left( x,0\right) \right)\), where \(\zeta :S^2\times \left[ 0,2\pi \right) \rightarrow \text {SO}\left( 3\right)\). As \(\zeta \left( x,0\right)\) cannot represent all SO(3) elements, \(a^3\)SConv is only equivariant to specific rotations. Esteves et al. (2020b) introduce spin weights and propose the spin-weighted spherical CNN. PRIN (You et al. 2020, 2021) proposes spherical voxel convolution (SVC) for signals on the unit ball \(B^3\). SVC (\(\star _2\)) is defined as \(\left[ \psi \star _2 f\right] \left( x\right) =\left[ \psi \star f\right] \left( \iota \left( x\right) \right)\), where \(\iota :B^3\rightarrow \text {SO}\left( 3\right)\). SPRIN (You et al. 2021) abandons the dense grids in PRIN by directly converting point clouds \(\left\{ x_i\right\}\) into a distribution function \(f\left( x\right) =\frac{1}{N}\sum _{i}\delta \left( x-x_i\right)\), where \(\delta\) is the Dirac delta function. Then SVC can be efficiently approximated with an unbiased estimator. Chen et al. (2023) combine spherical CNNs with Capsule Networks (Hinton et al. 2011) for unknown pose recognition.

Most methods use ray casting to generate spherical signals from 3D shapes, but other approaches are also applicable. Yang et al. (2019); Yang and Chakraborty (2020) generate spherical signals by collecting responses from point clouds. Spherical-GMM (Zhang 2021) represents point clouds with Gaussian mixture models. Besides classification and segmentation, spherical CNNs are widely used in many tasks, including omnidirectional localization (Zhang et al. 2021a), place recognition (Yin et al. 2020, 2021, 2022), and self-supervised representation learning (Spezialetti et al. 2019; Marcon et al. 2021; Lohit and Trivedi 2020; Spezialetti et al. 2020).

4.2.2 Esteves et al. (2018a)

Esteves et al. (2018a, 2020a) propose another spherical convolution that only processes spherical signals. The spherical convolution for \(\psi ,f:S^2\rightarrow \mathbb {R}\) is defined as

$$\begin{aligned} \left[ \psi *f\right] \left( x\right) =\int _{G}L_{g}\psi \left( x\right) L_{g^{-1}}f\left( \eta \right) \text {d}\mu \left( g\right) , \end{aligned}$$
(25)

where \(\eta\) is the north pole. Such spherical convolutions are strongly rotation equivariant, i.e., \(\psi *\left[ L_gf\right] =L_g\left[ \psi *f\right]\), which can be converted to multiplications with GFT as \(\widetilde{\psi *f}^l_m=2\pi \sqrt{\frac{4\pi }{2l+1}}\tilde{\psi }_0^l\tilde{f}^l_m\). As only \(\tilde{\psi }_0^l\) is involved, the only useful part is the zonal component of the filter \(\psi\).

Esteves et al. (2019a) utilize pre-trained spherical CNNs as supervision and learn equivariant representations for 2D images. Mukhaimar et al. (2022) apply them on concentric spherical voxels for robust point cloud classification. Esteves et al. (2023) scale up spherical CNNs and achieve outstanding performances on molecular benchmarks and weather forecasting tasks.

4.2.3 Others

Some spherical CNNs retain the GFT and parts of the spherical convolution. Zhang et al. (2019a) replace the SO(3) convolutional layers with PointNet-like (Qi et al. 2017a) networks. Almakady et al. (2020) use the GFT to decompose spherical signals, then exploit the norms of individual components as invariant features for volumetric texture classification. Lin et al. (2021b) combine these norms with other invariant features to boost the classification performance.

Some spherical CNNs perform convolutions in the spatial domain. SFCNN (Rao et al. 2019) applies symmetric convolutions to each point and its neighbors on spherical lattices. Yang et al. (2020) propose the geodesic icosahedral pixelization to address the irregularity problem. Fox et al. (2022) transform point clouds into concentric spherical signals and append convolutions along the radial dimension. Shakerinava and Ravanbakhsh (2021) investigate pixelizations of platonic solids for spheres and introduce equivariant maps on them. Xu et al. (2022) exploit global–local attention-based convolutions for spherical data.

4.2.4 Discussion

Spherical CNNs are effective for spherical signals. They have a solid mathematical foundation and nice properties on equivariance. Notwithstanding, preprocessing is sometimes problematic. The ray casting technique is commonly adopted to convert 3D shapes into spherical signals (Cohen et al. 2018a; Esteves et al. 2018a). However, Esteves et al. (2018a) argue that it is only suitable for star-shaped objects, from whose interior point the whole boundary is visible. Besides, projection on spheres would unavoidably distort shapes, and finer grids lead to less error but a heavier computational burden (Cohen et al. 2018a; Esteves et al. 2018a).

4.3 Irreducible representation methods

Irreducible representation methods utilize irreducible representations of SO(3), i.e., Wigner-D matrices \(\varvec{D}^l,l=0,1,\cdots\), to achieve rotation equivariance. A degree-l steerable feature \(\tilde{\varvec{u}}^l\) transforms into \(\varvec{D}^l\left( g\right) \tilde{\varvec{u}}^l\) under \(g\in \text {SO}\left( 3\right)\). In these methods, the degree-l filter \(\varvec{F}^l:\mathbb {R}^3\rightarrow \mathbb {C}^{2l+1}\) is constructed as

$$\begin{aligned} \varvec{F}^l\left( \varvec{x}\right) =\varphi ^l\left( \left\| \varvec{x}\right\| \right) \varvec{Y}^l\left( \frac{\varvec{x}}{\left\| \varvec{x}\right\| }\right) ,\quad \varvec{x}\ne \varvec{0}, \end{aligned}$$
(26)

where \(\varphi ^l:\mathbb {R}_{\ge 0}\rightarrow \mathbb {R}\) and \(\varvec{Y}^l\) is the spherical harmonic. To guarantee the continuity, \(\varvec{F}^l\left( \varvec{0}\right)\) is determined by \(\lim _{\varvec{x}\rightarrow \varvec{0}}F^l\left( \varvec{x}\right)\), which is nonzero only when \(l=0\). \(\varvec{F}^l\) is strongly rotation equivariant, i.e., \(\varvec{F}^l\left( \varvec{R}\left( g\right) \varvec{x}\right) =\varvec{D}^l\left( g\right) \varvec{F}^l\left( \varvec{x}\right)\).
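A minimal sketch of Eq. 26 using scipy's spherical harmonics (its azimuth-first `sph_harm` convention) is given below; the Gaussian radial profile is a hypothetical choice, and \(\varvec{x}=\varvec{0}\) is not handled.

```python
import numpy as np
from scipy.special import sph_harm


def filter_l(x, l, radial=lambda r: np.exp(-r ** 2)):
    """F^l(x) in C^{2l+1} (Eq. 26) for a single nonzero point x in R^3."""
    r = np.linalg.norm(x)
    azimuth = np.arctan2(x[1], x[0])
    colat = np.arccos(x[2] / r)
    # Stack the orders m = -l, ..., l of the degree-l spherical harmonic.
    Y = np.array([sph_harm(m, l, azimuth, colat) for m in range(-l, l + 1)])
    return radial(r) * Y
```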

Fig. 10

TFN (Thomas et al. 2018; Thomas 2019) layer. Each point \(\varvec{x}_i\) is associated with a tensor field \(\varvec{V}_i\). The output tensor field \(\varvec{V}'_i\) is aggregated from the tensor product between the filter features \(\varvec{F}(\varvec{x}_i-\varvec{x}_j)\) and the input tensor field \(\varvec{V}_j\). Some superscripts and subscripts are omitted for simplicity

Irreducible representation methods are mostly applied to coordinate inputs like point clouds. Tensor field networks (TFNs) (Thomas et al. 2018; Thomas 2019) are the pioneering methods using irreducible representations. All inputs and outputs of the TFN layer are tensor fields \(\widetilde{\varvec{V}}^l\in \mathbb {R}^{N\times C_l\times \left( 2\,l+1\right) }\), where N is the number of points, \(C_l\) is the feature dimension, and \(l=0,\cdots ,L\) is the rotation degree. They exploit TFN filters to generate steerable features from coordinates. Then the tensor product between these features and input tensor fields is computed as the output tensor fields, as shown in Fig. 10. TFNs and Clebsch-Gordan Nets (Kondor et al. 2018) have many similarities, including steerable features and tensor products. However, TFNs bind steerable features to points, while Clebsch-Gordan Nets exploit steerable features to describe spherical signals. N-body Networks (Kondor 2018), designed for many-body physical systems, are also based on the irreducible representations of SO(3). Cormorant (Anderson et al. 2019) modifies the nonlinearity in Clebsch-Gordan Nets (Kondor et al. 2018) to avoid the blow-up of channels. SE(3)-Transformer (Fuchs et al. 2020) decomposes the TFN layer into self-interaction and message-passing, where attention is added to the second part. TF-Onet (Chatzipantazis et al. 2023) also uses equivariant attention modules for shape reconstruction. Poulenard and Guibas (2021) propose a new nonlinearity for steerable features to improve the expressivity and reduce the computational burden. TFNs are leveraged in many applications, including 3D shape analysis (Poulenard et al. 2019), protein structure prediction (Fuchs et al. 2021), molecular dynamics simulation (Batzner et al. 2022), and self-supervised canonicalization (Sajnani et al. 2022).

Besides point clouds, irreducible representation methods are also applied to voxels. 3D Steerable CNNs (Weiler et al. 2018) reduce rotation equivariant linear maps between irreducible features to convolutions with steerable kernels \(\varvec{W}^{ll'}:\mathbb{R}^3\rightarrow \mathbb{R}^{\left(2l+1\right)\times \left(2l'+1\right)}\) that satisfy

$$\begin{aligned} \varvec{W}^{ll'}\left( \varvec{R}\left( g\right) \varvec{x}\right) =\varvec{D}^l\left( g\right) \varvec{W}^{ll'}\left( \varvec{x}\right) \varvec{D}^{l'}\left( g\right) ^{-1}. \end{aligned}$$
(27)

Eq. 27 can be solved analytically, and the solution takes the form of a TFN-type matrix function. 3D Steerable CNNs are employed in several applications, including 3D texture analysis (Andrearczyk et al. 2019), partial point cloud classification (Xu et al. 2023b), and multiphase flow demonstration (Siddani et al. 2021; Lin et al. 2021a). PDO-s3DCNNs (Shen et al. 2022) derive general steerable 3D CNNs with partial differential operators.
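Eq. 27 is easy to verify numerically. For \(l=l'=1\), \(\varvec{D}^1\left( g\right)\) coincides with the rotation matrix \(\varvec{R}\left( g\right)\) in the Cartesian basis (equivalent to the real degree-1 spherical-harmonic basis up to a permutation), and \(\varvec{W}\left( \varvec{x}\right) =\varphi \left( \left\| \varvec{x}\right\| \right) \varvec{x}\varvec{x}^\top /\left\| \varvec{x}\right\| ^2\) is one admissible kernel. A quick check under an assumed Gaussian profile:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def W(x):
    """Example degree-(1,1) steerable kernel: phi(r) * x x^T / r^2."""
    r2 = x @ x
    return np.exp(-r2) * np.outer(x, x) / r2

R = Rotation.random(random_state=0).as_matrix()
x = np.array([0.3, -1.2, 0.7])
# Eq. 27 with D^1 = R: W(R x) == R W(x) R^{-1}
print(np.allclose(W(R @ x), R @ W(x) @ R.T))  # True
```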

Irreducible representation methods have intrinsic strong rotation equivariance. Nonetheless, their theory is complex enough to limit the audience (Weiler et al. 2018; Thomas et al. 2018; Thomas 2019). Besides, tensor products may increase the number of rotation degrees and harm efficiency (Thomas et al. 2018; Thomas 2019; Kondor et al. 2018).

4.4 Equivariant value methods

Equivariant value methods are networks constructed from equivariant values, i.e., scalars and vectors. They are similar to the invariant value methods in Sect. 3.5. However, invariant values are only primitive features, while equivariant values form the basic blocks of equivariant networks.

EGNNs (Satorras et al. 2021b) add relative distances to graph convolutional layers. Then the coordinate \(\varvec{x}_i\) and feature \(\varvec{f}_i\) are updated as

$$\begin{aligned} \varvec{m}_{ij}=\phi _e&\left( \varvec{f}_i,\varvec{f}_j,\left\| \varvec{x}_i-\varvec{x}_j\right\| ^2,\varvec{a}_{ij}\right) , \end{aligned}$$
(28)
$$\begin{aligned} \varvec{x}_i+\frac{1}{N-1}\sum _{j\ne i}\left( \varvec{x}_i-\varvec{x}_j\right) \phi _x&\left( \varvec{m}_{ij}\right) \rightarrow \varvec{x}_i,\ \phi _f\left( \varvec{f}_i,\sum _{j\in \mathcal {N}\left( \varvec{x}_i\right) }\varvec{m}_{ij}\right) \rightarrow \varvec{f}_i, \end{aligned}$$
(29)

where \(\varvec{a}_{ij}\) is the edge information, and \(\phi _e,\phi _x,\phi _f\) are update functions for edges, coordinates, and node features, respectively. Clearly, the coordinates are strongly rotation equivariant, while the features are strongly rotation invariant. E-NFs (Satorras et al. 2021a) combine EGNNs with continuous-time normalizing flows (Chen et al. 2018a) to construct equivariant generative models. EquiDock (Ganea et al. 2022) and EquiBind (Stärk et al. 2022) apply graph matching networks (Li et al. 2019b) and EGNNs to rigid body protein-protein docking and drug binding structure prediction, respectively. Some methods (Hoogeboom et al. 2022; Schneuing et al. 2022; Igashov et al. 2022; Lin et al. 2022b; Guan et al. 2023) incorporate diffusion models with EGNNs for molecule generation. SEGNNs (Brandstetter et al. 2022) extend EGNNs with steerable features.
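A minimal PyTorch sketch may clarify Eqs. 28-29; it assumes a fully connected graph, omits the edge attributes \(\varvec{a}_{ij}\), and uses hypothetical MLP sizes:

```python
import torch
import torch.nn as nn

class EGNNLayer(nn.Module):
    """Minimal EGNN layer (Eqs. 28-29); a sketch, not the reference code."""
    def __init__(self, dim):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(),
                                   nn.Linear(dim, dim))
        self.phi_x = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(),
                                   nn.Linear(dim, 1))
        self.phi_f = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(),
                                   nn.Linear(dim, dim))

    def forward(self, x, f):
        # x: (N, 3) equivariant coordinates, f: (N, C) invariant features
        N = x.shape[0]
        diff = x[:, None, :] - x[None, :, :]           # (N, N, 3)
        dist2 = (diff ** 2).sum(-1, keepdim=True)      # (N, N, 1), invariant
        fi = f[:, None, :].expand(N, N, -1)
        fj = f[None, :, :].expand(N, N, -1)
        m = self.phi_e(torch.cat([fi, fj, dist2], -1))     # messages, Eq. 28
        mask = 1.0 - torch.eye(N, device=x.device).unsqueeze(-1)  # drop j == i
        x = x + (diff * self.phi_x(m) * mask).sum(1) / (N - 1)    # Eq. 29, left
        f = self.phi_f(torch.cat([f, (m * mask).sum(1)], -1))     # Eq. 29, right
        return x, f
```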

Vector Neurons (VNs) (Deng et al. 2021a) endow networks with equivariance by replacing scalars with vectors. Take the linear layer as an example: \(\varvec{v}\in \mathbb{R}^C\) is transformed into \(\varvec{Wv}+\varvec{b}\in \mathbb{R}^{C'}\) in classic networks, while \(\varvec{V}\in \mathbb{R}^{C\times 3}\) is transformed into \(\varvec{WV}\in \mathbb{R}^{C'\times 3}\) in VNs, where \(\varvec{W}\in \mathbb{R}^{C'\times C},\varvec{b}\in \mathbb{R}^{C'}\) (Fig. 11). Other layers are modified analogously. VN-Transformer (Assaad et al. 2022) derives equivariant attention mechanisms to enhance effectiveness and efficiency based on VNs. VNs are strongly rotation equivariant and have been applied to object manipulation (Simeonov et al. 2022), molecule generation (Huang et al. 2022b), point cloud registration (Zhu et al. 2022b; Lin et al. 2023b; Ao et al. 2023b), point cloud completion (Wu and Miao 2022), unsupervised point cloud segmentation (Lei et al. 2023), and point cloud canonicalization (Katzir et al. 2022; Kaba et al. 2023). Geometric vector perceptrons (GVPs) (Jing et al. 2021b) similarly operate on geometric vectors. Jing et al. (2021a) apply GVPs to structural biology tasks and reach several state-of-the-art results. PaiNN (Schütt et al. 2021) builds efficient equivariant layers to predict molecular properties. SE(3)-DDM (Liu et al. 2022b) applies PaiNN to the coordinate denoising task. TorchMD-NET (Thölke and Fabritiis 2022) designs attention-based update rules for features of different types. Directed weight neural networks (Li et al. 2022a) generalize VNs and GVPs with more operators, which can be integrated with existing GNN frameworks. Chen et al. (2022) build graph implicit functions with equivariant layers to capture geometric details. Le et al. (2022b) exploit cross products to generate new vectors in the message function.

Fig. 11 The comparison between linear layers in typical networks (left) and VNs (right) (Deng et al. 2021a). Each solid line represents a weight value. As the vectors are transformed consistently, VNs can achieve strong rotation equivariance
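The VN linear layer is simple enough to state in a few lines. The sketch below (a minimal reproduction of the idea, not the official code) mixes channels while leaving the three spatial components untouched, so rotations commute with the layer:

```python
import torch
import torch.nn as nn

class VNLinear(nn.Module):
    """Vector Neuron linear layer: V in R^{C x 3} -> W V in R^{C' x 3}."""
    def __init__(self, c_in, c_out):
        super().__init__()
        # no bias: a fixed bias vector would break rotation equivariance
        self.weight = nn.Parameter(torch.randn(c_out, c_in) / c_in ** 0.5)

    def forward(self, V):  # V: (..., C, 3)
        return torch.einsum('oc,...ci->...oi', self.weight, V)

# equivariance check with a random orthogonal matrix
V = torch.randn(8, 16, 3)
R = torch.linalg.qr(torch.randn(3, 3)).Q
layer = VNLinear(16, 32)
print(torch.allclose(layer(V @ R.T), layer(V) @ R.T, atol=1e-5))  # True
```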

Villar et al. (2021) utilize several theorems to construct equivariant functions on groups including O(n) and SO(n). GMN (Huang et al. 2022a) constructs equivariant networks similarly and proves their universal approximation. IsoGCNs (Horie et al. 2021) achieve equivariance by operating on rank-p tensors \(H^p\in \mathbb{R}^{\left| \mathcal{V}\right| \times C\times d^p}\). Using a similar approach, Finkelshtein et al. (2022) define ascending and descending layers for geometric dimension expansion and contraction, respectively. Suk et al. (2021, 2022) leverage equivariant neural networks in computational fluid dynamics. EQGAT (Le et al. 2022a) processes coordinates with attention mechanisms for better performance. Luo et al. (2022) extend message passing networks with learned orientations. DeepDFT (Jørgensen and Bhowmik 2022) employs message passing networks for fast electron density estimation.

Compared to previous methods, equivariant value methods do not introduce approximation errors, and their theories are relatively simple (Satorras et al. 2021b; Deng et al. 2021a). Although they emerged only recently, they have shown great potential in many applications (Deng et al. 2021a; Ganea et al. 2022; Stärk et al. 2022; Schütt et al. 2021).

4.5 Others

Some equivariant networks use quaternions to represent 3D rotations. REQNN (Shen et al. 2020) employs quaternions to revise classic layers into equivariant ones. Zhao et al. (2020a) propose quaternion equivariant capsule networks to disentangle geometry from poses. Quaternion CNNs (Jing et al. 2021c) utilize convolutions on quaternion arrays for gait identification. Qin et al. (2022) present quaternion product units to address rotation equivariance.
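These methods build on the quaternion rotation primitive: a unit quaternion q rotates a point p via the sandwich product \(q\,p\,q^*\), and equivariance follows from composing quaternion products. A minimal sketch (Hamilton (w, x, y, z) convention assumed), checked against the corresponding rotation matrix:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def quat_mul(a, b):
    """Hamilton product in the (w, x, y, z) convention."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rotate(q, p):
    """Rotate point p by unit quaternion q: vector part of q p q*."""
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, np.concatenate([[0.0], p])), q_conj)[1:]

rot = Rotation.random(random_state=0)
q = np.roll(rot.as_quat(), 1)  # scipy stores (x, y, z, w); roll to (w, x, y, z)
p = np.array([0.3, -1.2, 0.7])
print(np.allclose(rotate(q, p), rot.as_matrix() @ p))  # True
```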

Some methods turn to gauges for rotation equivariance. Gauge equivariant CNNs (Cohen et al. 2019a) propose gauge equivariant convolutions based on gauge theory. Haan et al. (2021) adapt this structure to mesh inputs. The gauge equivariant transformer (He et al. 2021) adds attention mechanisms to gauge equivariant CNNs for better performance.

Finzi et al. (2021) derive an equivariance condition similar to that of 3D Steerable CNNs (Weiler et al. 2018) with Lie algebra representations. EqDDM (Azari and Erdogmus 2022) leverages these constraints to build an equivariant deep dynamical model for motion prediction. Melnyk et al. (2022) establish steerability constraints for spherical neurons to construct equivariant layers.

Li et al. (2019a) take a similar approach to CubeNet (Worrall and Brostow 2018) but without group convolutions, where invariance is achieved by eliminating the permutation. XEdgeConv (Weihsbach et al. 2022) directly explores symmetric kernels for discrete rotation equivariance. Park et al. (2022) design equivariant networks for domains where it is hard to describe the transformation of inputs explicitly.

4.6 Summary

Rotation equivariant methods have a broader application range than rotation invariant ones. The differences among various rotation equivariant methods are listed in Table 8. We summarize several characteristics of existing rotation equivariant methods below.

  • The approximation errors of G-CNNs (Finzi et al. 2020; Chen et al. 2021) and spherical CNNs (Cohen et al. 2018a; Esteves et al. 2018a) are inevitable and can only be reduced through fine discretization and cumbersome computation. Therefore, they are less reliable than strongly rotation equivariant methods.

  • Albeit strongly rotation equivariant, irreducible representation methods (Thomas et al. 2018; Weiler et al. 2018; Thomas 2019) have a complex theory, which poses great challenges for newcomers.

  • Equivariant value methods (Satorras et al. 2021b; Deng et al. 2021a) strike a good balance between theoretical properties and experimental performance.

Table 8 Comparisons of different rotation equivariant methods

5 Application and dataset

Rotation invariance and equivariance are seldom separate problems; they depend on the task requirements of specific settings. We give a general overview of the applications and datasets involved in related works and divide them into 3D semantic understanding and molecule-related applications.

5.1 3D semantic understanding

3D semantic understanding tasks, like classification, segmentation, and detection, evaluate the capability of DNNs on 3D shapes and scenes. Here we focus on tasks requiring rotation invariance and equivariance. We summarize these tasks and related datasets in Table 9. For aligned datasets, rotation augmentation is required to pose a sufficient challenge to rotation invariant and equivariant methods. In the following discussions, A/B and \(\frac{A}{B}\) both denote training with A augmentation and evaluating with B augmentation. We use z and SO(3) to represent azimuthal and random rotation augmentation, respectively.
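For concreteness, the two augmentation protocols can be implemented in a few lines; the sketch below is an assumed typical implementation, not taken from any particular codebase:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotation_augment(points, mode, rng):
    """Apply z (azimuthal) or SO(3) (uniformly random) rotation augmentation
    to an (N, 3) point cloud."""
    if mode == 'z':
        R = Rotation.from_euler('z', rng.uniform(0, 2 * np.pi)).as_matrix()
    elif mode == 'SO(3)':
        R = Rotation.random(random_state=rng).as_matrix()
    else:
        raise ValueError(mode)
    return points @ R.T

rng = np.random.default_rng(0)
pts = rng.standard_normal((1024, 3))
train_sample = rotation_augment(pts, 'z', rng)     # e.g., training in z/SO(3)
eval_sample = rotation_augment(pts, 'SO(3)', rng)  # evaluation in z/SO(3)
```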

Table 9 Tasks and datasets in general 3D understanding
Table 10 ModelNet40 (Wu et al. 2015) classification results of representative rotation invariant/equivariant methods
Table 11 ScanObjectNN (Uy et al. 2019) classification results of representative rotation invariant/equivariant methods
Table 12 ShapeNetPart (Yi et al. 2016) segmentation results of representative rotation invariant/equivariant methods

Classification. Classification is the most well-studied task in this field. ModelNet (Wu et al. 2015) is a commonly-used 3D CAD model dataset with two versions, i.e., ModelNet10 with 10 categories and ModelNet40 with all 40 categories. We list the experimental results of ModelNet40 classification in Table 10. As the table shows, there is an input type shift from images and meshes to point clouds, which can be attributed to the fact that point clouds provide the precise coordinates essential for strong rotation invariance. Besides, rotation invariant methods generally perform better than rotation equivariant ones; they are more suitable for tasks that only require the prediction of invariant targets. ShapeNetCore (Chang et al. 2015) is another popular 3D shape dataset with 55 categories. Unlike previous datasets, ScanObjectNN (Uy et al. 2019) is a real-world point cloud dataset, adding more challenges to classification. ScanObjectNN has three popular variants, i.e., OBJ_ONLY, OBJ_BG, and PB_T50_RS. Many researchers also evaluate their methods on these datasets, whose experimental results are summarized in Table 11. Fewer works explore ScanObjectNN compared to ModelNet40 (Wu et al. 2015). Besides, there is no consensus on which variant to evaluate on, and results still leave much room for improvement. Other datasets are used less frequently, like RGB-D Object (Lai et al. 2011), S3DIS (Armeni et al. 2016), and ScanNet (Dai et al. 2017). Some methods, especially those processing spherical signals, use Spherical MNIST (Cohen et al. 2018a) to evaluate their performance. Yang et al. (2020) create Spherical CIFAR-10 to experiment on photorealistic images. Andrearczyk and Depeursinge (2018); Almakady et al. (2020) exploit RFAI (Paulhac et al. 2009) for 3D texture classification. Yang and Chakraborty (2020) employ OASIS (Fotenos et al. 2005) for medical image classification.

Segmentation. Segmentation is another popular task, aiming to make fine-grained predictions. In part segmentation for small-scale objects, ShapeNetPart (Yi et al. 2016) is widely applied as the evaluation dataset, with two common metrics, i.e., instance mean IoU (ins.) and class mean IoU (cls.). As shown in Table 12, RIConv++ (Zhang et al. 2022) and PaRI-Conv (Chen and Cong 2022) set the state-of-the-art results in class mean IoU and instance mean IoU, respectively. However, we also notice that differences in evaluation metrics make direct comparisons of various methods confusing and unfair, which should be avoided in future works. Performance gaps still exist between rotation invariant methods and rotation equivariant ones, since part segmentation only requires point-wise invariant prediction. Hegde and Gangisetty (2021) employ PartNet (Mo et al. 2019) for a more thorough evaluation. Besides, Zhuang et al. (2019); Zhu et al. (2020) investigate BraTS-2018 (Menze et al. 2015) for brain tumor segmentation. In semantic segmentation for large-scale scenes, S3DIS (Armeni et al. 2016), ScanNet (Dai et al. 2017), Semantic3D (Hackel et al. 2017), and 2D-3D-S (Armeni et al. 2017) are commonly used.

Detection. Detection is a basic task but remains underexplored with respect to rotation invariance and equivariance. Some works (Yu et al. 2022; Wang et al. 2023b) incorporate equivariant networks into 3D object detectors. These methods are applied to indoor datasets like ScanNetV2 (Dai et al. 2017) and SUN RGB-D (Song et al. 2015) and outdoor datasets like KITTI (Geiger et al. 2012) and nuScenes (Caesar et al. 2020). Besides, Winkels and Cohen (2018); Andrearczyk et al. (2019, 2020) investigate the pulmonary nodule detection task with LIDC/IDRI (McNitt-Gray et al. 2007) and NLST (Team 2011).

Pose Estimation. The targets for pose estimation are pose parameters. Many aligned datasets can be adjusted for pose estimation, including ModelNet (Wu et al. 2015), ShapeNet (Chang et al. 2015), and ObjectNet3D (Xiang et al. 2016). Besides general shapes, some works focus on the pose estimation of specific objects. Xu et al. (2021a) employ Human3.6M (Ionescu et al. 2014) and MPI-INF-3DHP (Mehta et al. 2017) for human pose estimation. Chen et al. (2018b) regress hand poses on ICVL (Tang et al. 2014), NYU (Tompson et al. 2014), and MSRA (Sun et al. 2015).

Shape Registration. Registration is the task of matching among multiple inputs. 3DMatch (Zeng et al. 2017) is a well-known registration benchmark composed of 7Scenes (Shotton et al. 2013) and SUN3D (Xiao et al. 2013). Liu et al. (2018); Melzi et al. (2019) investigate registration on the Stanford 3D Scanning Repository (Curless and Levoy 1996). Melzi et al. (2019) exploit TOSCA (Bronstein et al. 2008), FAUST (Bogo et al. 2014), and TOPKIDS (Lähner et al. 2016) for deformable shape registration.

Place Recognition. Place recognition is a special case of registration in which inputs are matched against maps. KITTI (Geiger et al. 2012) includes a series of autonomous driving benchmarks, where the odometry benchmark is generally adopted to evaluate place recognition performance. Many other datasets are also leveraged for a comprehensive evaluation, including ETH (Pomerleau et al. 2012), NCLT (Carlevaris-Bianco et al. 2016), SceneCity (Zhang et al. 2016), Oxford RobotCar (Maddern et al. 2017), MulRan (Kim et al. 2020a), and KITTI-360 (Liao et al. 2022).

Reconstruction. Reconstruction is a pre-training task adopted by many self-supervised methods. Many works (Shen et al. 2020; Deng et al. 2021a; Sun et al. 2021; Zhou et al. 2022b) carry out reconstruction experiments on ShapeNetCore (Chang et al. 2015). In addition, Yu et al. (2020b) utilize ModelNet40 (Wu et al. 2015) for point cloud inpainting and completion.

Retrieval. Retrieval is the task of finding objects similar to a query object. SHREC'17 (Savva et al. 2017) is a famous retrieval challenge based on ShapeNetCore (Chang et al. 2015). Some methods (Su et al. 2015; Esteves et al. 2019b; Wei et al. 2020) also experiment on ModelNet (Wu et al. 2015).

Others. Ke et al. (2017) use NTU RGB+D (Shahroudy et al. 2016), SBU Kinect Interaction (Yun et al. 2012), and the CMU dataset (CMU 2002) for skeleton action recognition. Qin et al. (2022) apply FPHA (Garcia-Hernando et al. 2018) to hand action recognition. Besides, some methods (Liu et al. 2019b; Zhang et al. 2020c; Yang et al. 2021) exploit ModelNet40 (Wu et al. 2015) for normal estimation. Esteves et al. (2023) employ WeatherBench (Rasp et al. 2020) to evaluate large spherical CNNs (Esteves et al. 2018a) on weather forecasting.

5.2 Molecule-related application

Recently, the number of papers that employ rotation equivariant networks on molecular data has grown rapidly. Physical and chemical laws determine the relative, but not the absolute, positions of atoms. Therefore, rotation invariance and equivariance are inherently needed in molecule-related applications. As research progresses, many new tasks have been investigated; we summarize only some representative ones. Tasks and datasets are listed in Table 13.

Table 13 Tasks and datasets in molecule-related applications
Table 14 QM9 (Ramakrishnan et al. 2014) prediction mean absolute error of representative rotation invariant/equivariant methods

Prediction. The goal of prediction is to estimate molecular properties given molecular structures. QM7 (Blum and Reymond 2009; Rupp et al. 2012) is a small, pioneering dataset used by some works (Liu et al. 2022c; Kondor et al. 2018). QM9 (Ramakrishnan et al. 2014) is a commonly-used dataset including 134k molecules with geometric, energetic, electronic, and thermodynamic properties. As shown in Table 14, there are more rotation equivariant methods than rotation invariant ones in this prediction task. As research deepens, novel methods with powerful and sophisticated structures show great potential in decreasing the mean absolute error of molecular property prediction. ATOM3D (Townshend et al. 2021) is a set of benchmarks covering various tasks. Other datasets, including MD17 (Chmiela et al. 2017), ISO17 (Schütt et al. 2017), ESOL (Delaney 2004), BACE (Subramanian et al. 2016), PDB (Berman et al. 2003), and OC20 (Zitnick et al. 2020), are also applied in different prediction tasks.

Generation. In generation, the model is required to generate molecules according to certain requirements. Thomas et al. (2018) employ random deletion on QM9 (Ramakrishnan et al. 2014) and validate their model with an inpainting task. Jing et al. (2021b); Li et al. (2022a) exploit CATH 4.2 (Ingraham et al. 2019) and TS50 (Li et al. 2014) for computational protein design. Du et al. (2021) employ subsets of GEOM (Axelrod and Gómez-Bombarelli 2022) for conformation generation tasks. Satorras et al. (2021a) utilize LJ-13 (Köhler et al. 2020) for 3D state generation.

Others. Jing et al. (2021b); Li et al. (2022a) apply CASP (Cheng et al. 2019) to model quality assessment. Poulenard et al. (2019) leverage PDB (Berman et al. 2000) for RNA segmentation. Ganea et al. (2022) exploit DB5.5 (Vreven et al. 2015) and DIPS (Townshend et al. 2019) for rigid protein docking.

6 Future direction

Here we point out several future research directions inspired by unsolved problems in current methods and tasks.

6.1 Method

The pros and cons of existing methods have been summarized in Sects. 3 and 4. Future methods should perform better and avoid previous drawbacks by possessing the following properties.

  • Strong rotation invariance and equivariance. This survey is the first to include weakly invariant and equivariant methods in the discussion of rotation invariance and equivariance. Nonetheless, we argue that these methods should be used only when necessary. They introduce unnecessary uncertainty and cannot deliver consistent results for the same input under different poses.

  • Concise mathematical background. The theories of many existing methods are too verbose and complicated. They should be simplified, especially when they have little connection with the implementation. Novel methods should avoid exploring general but unrelated theories.

  • High computational efficiency. Due to high latency, many well-performing methods cannot be employed in practical applications. As research progresses toward large-scale and complex data, new work should consider such application scenarios and be as efficient as possible.

  • Reliable integrability. Many successful DNNs have been developed for numerous applications where rotation invariance and equivariance are not considered; therefore, they are only suitable for aligned data. If newly developed methods can be integrated with these models straightforwardly, the composite models would benefit from both.

6.2 Theoretical analysis

Most of the existing theoretical analysis addresses strong invariance and equivariance. Some methods propose mathematical frameworks to construct equivariant networks (Kondor and Trivedi 2018; Cohen et al. 2018b, 2019b; Esteves 2020; Aronsson 2021; Gerken et al. 2021; Winter et al. 2022). However, the discussion on universal approximation is quite limited (Dym and Maron 2021), and most equivariant networks do not have solid mathematical foundations.

6.3 Benchmark

The research on rotation invariance and equivariance is still immature and lacks reliable and comprehensive benchmarks. Except for some well-studied tasks, most applications have yet to be intensively investigated. The evaluation metric (Eq. 3) has yet to be commonly adopted, especially for weakly rotation invariant and equivariant methods. Existing metrics cannot reflect the strength of invariance and equivariance.

7 Conclusion

In this survey, we give a comprehensive overview of rotation invariant and equivariant methods in 3D deep learning. We first discuss the limitations of DNNs trained with canonical poses, which motivate the research on rotation invariant and equivariant methods. Then, we define weak/strong invariance and equivariance and provide a unified theoretical framework for analysis. Overall, existing methods are divided into rotation invariant and rotation equivariant ones, which are further subclassified according to their principles. At each level, representative works are reviewed and discussed, and the relevant applications and datasets are sorted out. Finally, we pose some open problems and future research directions based on the challenges and difficulties in current research. We hope this survey can serve as an effective tool for future research on rotation invariant and equivariant methods.