Training system overview
The proposed training system automatically detects the target object from the image and uses the image information together with the brain signal information so that the robotic arm can approach the target object more easily. During training with the system, the shared control strategy can keep the user motivated by maintaining the success rate at a certain level. A Kinect is used for image acquisition, and pixel color separation combined with hierarchical clustering is applied to detect multiple target objects. For shared control, the conventional artificial potential approach is modified for application to BMI-based robotic arm control. As the dependency on image information increases relative to that on brain signal information, the difficulty of controlling the robotic arm decreases. When the information from the brain signal is completely suppressed, the robotic arm motion no longer reflects the user's intention, and the process becomes observation-based training. Further details are described in the sections below.
Detecting target objects using the Kinect
The positions of the target objects must be accurately defined to compensate the hand trajectory predicted from neural signals. Before their positions can be estimated, the targets must be detected. Although various target detection algorithms have been reported, elaborate algorithms are not required for the BMI training system because it operates in a relatively well-arranged space with a clean background. Green balls (diameter: 7 cm) serve as targets in this study, so binary images (green: 1; else: 0) were generated from RGB images obtained by the Kinect (Kinect for XBOX 360, Microsoft, Redmond, WA, US). Using the RGB values of each pixel, green pixels were separated as shown in Eq. (1).
$$ \frac{G}{R + G + B} > 0.5 $$
(1)
The images were filtered to remove noise via the process described in Eqs. (2) and (3).
$$ \mathrm{Average}(i, j) = \frac{1}{9}\sum_{k = -1}^{1} \sum_{l = -1}^{1} \mathrm{index}(i + k,\, j + l) $$
(2)
$$ \mathrm{Filtered\ pixel}(i, j) = \begin{cases} 1 & \text{if } \mathrm{Average}(i, j) = 1 \\ 0 & \text{otherwise} \end{cases} $$
(3)
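As an illustration, the following MATLAB sketch applies the green-ratio threshold of Eq. (1) and the 3 × 3 filter of Eqs. (2) and (3) to a single RGB frame. This is a minimal sketch rather than the original implementation; the file name and variable names are our own assumptions.

```matlab
% Minimal sketch of the green-pixel detection of Eqs. (1)-(3).
% 'scene.png' stands in for a 480 x 640 RGB frame captured by the Kinect.
rgb   = double(imread('scene.png'));
ratio = rgb(:, :, 2) ./ max(sum(rgb, 3), eps);   % G / (R + G + B), avoiding 0/0
index = ratio > 0.5;                             % Eq. (1): green pixels -> 1

% Eq. (2): 3 x 3 neighborhood average of the binary index image.
avg = conv2(double(index), ones(3) / 9, 'same');

% Eq. (3): keep a pixel only if all nine neighbors are green (average equals 1).
filtered = abs(avg - 1) < 1e-6;
```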
Noise-filtered images can contain more than one target object, so pixels designated as “1” must be clustered to their corresponding target objects. Hierarchical clustering distinguishes multiple target objects simultaneously. Conventional clustering algorithms require the number of clusters to be predetermined for centroid generation; however, a BMI system would be highly limited if the number of target objects had to be predetermined. Thus, a divisive hierarchical clustering approach was chosen instead. This approach initially assumes that there is one target object. When the standard deviation of the pixels along the x- or y-axis is equal to or larger than 20, the cluster is split into two clusters with two centroids. This procedure is repeated until every cluster has a pixel distribution whose standard deviation is less than 20 along both the x- and y-axes. The target detection procedure is summarized in Fig. 2.
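The divisive clustering step can be sketched as a short recursive routine. The function below is our own illustration of the procedure (the two-centroid seeding and the k-means-style refinement are assumptions), with the 20-pixel standard-deviation criterion taken from the text.

```matlab
function clusters = split_cluster(pts)
% Divisive hierarchical clustering of green pixels.
% pts: N x 2 matrix of [x y] pixel coordinates belonging to one cluster.
% Returns a cell array of clusters, one per detected target object.
if all(std(pts) < 20)          % std below 20 px on both axes: single target
    clusters = {pts};
    return
end
% Split into two clusters: seed centroids at the extremes of the wider axis
% and refine with a few k-means-style iterations (illustrative choice).
[~, dim]  = max(std(pts));
[~, iMin] = min(pts(:, dim));
[~, iMax] = max(pts(:, dim));
c = [pts(iMin, :); pts(iMax, :)];
for it = 1:10
    d1 = sqrt(sum((pts - c(1, :)).^2, 2));
    d2 = sqrt(sum((pts - c(2, :)).^2, 2));
    [~, label] = min([d1, d2], [], 2);
    c = [mean(pts(label == 1, :), 1); mean(pts(label == 2, :), 1)];
end
% Recurse on each half until every cluster satisfies the criterion.
clusters = [split_cluster(pts(label == 1, :)), split_cluster(pts(label == 2, :))];
end
```

Starting from the coordinates of all pixels labeled “1” in the filtered image, the routine returns one pixel cluster per target; the centroid of each cluster can then serve as that target's image location.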
To validate the algorithm, an environment containing several green target objects was prepared and photographed with the Kinect. The number of target objects ranged from two to four and 20 images were acquired for each number of target objects (for a total of 60 images). The distance between target objects was controlled within the range from 15 to 35 cm. For each number of target objects, the accuracy of detecting all existing targets was measured.
Estimation of 3D position (camera calibration)
The 3D position of each detected target object must be estimated from the information acquired in the images. The Kinect provides two types of images: RGB and depth images. RGB images provide three channels of data at 480 × 640 resolution. Depth images provide one channel at 480 × 640 resolution; each pixel represents a depth index related to the distance to objects in the image. Using three types of camera calibration, the 3D position of each target object can be estimated.
The first calibration compensates for the distortion between the RGB and depth images. The same object appears at different pixels in the RGB and depth images, so a calibration that matches the two images is necessary for position estimation (Fig. 3a). Linear regression was applied to obtain the transformation matrix for this mapping. Images containing several balls were photographed with the RGB and depth cameras, respectively (Fig. 3b), and the pixel coordinates of the balls were measured. This process was repeated to obtain a larger dataset. The coordinates of the balls from the RGB images were stored in matrix A, with a size of 165 × 2, which contains the pixel coordinates of 165 balls. Matrix D, for the depth images, was obtained similarly. Then, matrix B, which maps pixels from the RGB image to the depth image, can be obtained with the linear regression shown below.
$$ D = \left[ A \;\; 1 \right] \times B $$
(4)
$$ B = \left( X^{T} X \right)^{-1} X^{T} D, \quad X = \left[ A \;\; 1 \right] $$
(5)
Thus, B was obtained as shown below. The R² values for the x- and y-axes were nearly 1.0 (> 0.999), and their root mean square errors (RMSEs) were 2.97 and 2.40 pixels, respectively. This result indicates that the proposed linear regression produces less than 3 pixels of error, which is negligible relative to the total image size (480 × 640 pixels).
$$ B = \begin{bmatrix} 1.17332 & 0.000815 \\ -0.01963 & 1.12255 \\ -34.6797 & -20.0941 \end{bmatrix} $$
(6)
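For reference, the regression of Eqs. (4)-(6) is ordinary least squares on an augmented coordinate matrix. The MATLAB sketch below reproduces it on synthetic data: the 165 measured ball centers are replaced by random points generated with the coefficients of Eq. (6), and the noise level is our own assumption.

```matlab
% Sketch of the RGB-to-depth pixel mapping of Eqs. (4)-(6) on synthetic data.
A = rand(165, 2) .* [640, 480];               % stand-in RGB-image ball centers
B_true = [1.17332, 0.000815; -0.01963, 1.12255; -34.6797, -20.0941];
D = [A, ones(165, 1)] * B_true + 2 * randn(165, 2);   % noisy depth-image centers

X = [A, ones(size(A, 1), 1)];                 % augmented matrix [A 1], Eq. (4)
B = (X' * X) \ (X' * D);                      % least squares, Eq. (5); same as X \ D
rmse = sqrt(mean((X * B - D).^2));            % per-axis RMSE in pixels (cf. 2.97, 2.40)
```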
The second calibration relates the depth index to the real distance. We applied linear curve fitting using the Matlab Curve Fitting Toolbox (Matlab R2016b, Mathworks Inc., Natick, MA, US); the result is shown in Eq. (7).
$$ \mathrm{Distance\ (m)} = 61.5 \times \mathrm{Depth\ index} + 0.1046, \quad \left( R^{2} = 0.9987 \right) $$
(7)
Finally, the 2D position was estimated from the x and y pixel coordinates of the RGB image using the same curve-fitting approach; the results are shown in Eqs. (8) and (9).
$$ X\ \mathrm{(m)} = \left( 0.001937 \times \mathrm{Distance\ (m)} + 0.0001662 \right) \times \mathrm{Pixel}, \quad \left( R^{2} = 0.9939 \right) $$
(8)
$$ Y\ \mathrm{(m)} = \left( 0.002072 \times \mathrm{Distance\ (m)} - 0.000227 \right) \times \mathrm{Pixel}, \quad \left( R^{2} = 0.9784 \right) $$
(9)
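Putting the three calibrations together, a detected target's 3D position can be computed as in the sketch below. The function name and the assumption that the pixel coordinates are measured from the image center are ours; only the calibration coefficients come from Eqs. (7)-(9).

```matlab
function p = kinect_pixel_to_3d(px, py, depth_index)
% Sketch of the 3D position estimation of Eqs. (7)-(9).
% px, py: target pixel coordinates in the RGB image (assumed to be measured
%         from the image center); depth_index: Kinect depth value at the
%         corresponding (calibrated) depth-image pixel.
dist = 61.5 * depth_index + 0.1046;           % Eq. (7): depth index -> distance (m)
x = (0.001937 * dist + 0.0001662) * px;       % Eq. (8)
y = (0.002072 * dist - 0.000227)  * py;       % Eq. (9)
p = [x; y; dist];                             % 3D position in the Kinect frame (m)
end
```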
To validate the implemented algorithm, we compared it with an optic tracker (PST Base, PS-Tech, Amsterdam, Netherlands) that estimates the 3D position of pre-attached stickers with high accuracy (position RMSE < 0.5 mm, orientation RMSE < 1°). Two target objects with pre-attached stickers were prepared and placed randomly in the experimental setup, as depicted in Fig. 4. The distance between the two target objects was measured using both the optic tracker and the Kinect. This procedure was repeated 10 times, and the difference between the two approaches was analyzed.
Vision-aided robotic system hardware
The vision-aided robotic system consists of three components: a 6-DOF anthropomorphic robotic arm (JACO, Kinova, Boisbriand, QC, Canada), the Kinect, and the targets. The dimensions of the aluminum profiles that fix the three components were designed by considering the workspace of the robotic arm, as shown in Fig. 5a. The arm was fixed on one side of the aluminum profiles and the Kinect on the other. The green target objects were fixed between them using flexible supports. The Kinect detects the target objects, and the estimated positions are delivered to the robotic arm for waypoint generation. The coordinate frames of the Kinect and the robotic arm differ; thus, the homogeneous transformation matrix in Eqs. (10) and (11) must be applied before the target position information is used for waypoint generation.
$$ \begin{bmatrix} \vec{B} \\ 1 \end{bmatrix} = T \begin{bmatrix} \vec{A} \\ 1 \end{bmatrix} $$
(10)
$$ T = \begin{bmatrix} R & \begin{matrix} 1.24 \\ 0 \\ 0.05 \end{matrix} \\ \begin{matrix} 0 & 0 & 0 \end{matrix} & 1 \end{bmatrix}, \quad \left( R = \begin{bmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \end{bmatrix} \right) $$
(11)
Vectors \( \vec{A} \) and \( \vec{B} \) are the positions of the target object measured from the Kinect and the robot gripper, respectively. The origin of the robotic arm was defined as a point offset by 0.2 m along the x- and z-axes from the origin of frame 0 (the center of joint 1, see Fig. 5b).
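A minimal sketch of this frame change is given below; the example target position is arbitrary.

```matlab
% Transform a target position from the Kinect frame (vector A) to the robot
% frame (vector B) with the homogeneous transformation of Eqs. (10)-(11).
R = [0,  0, -1;
     1,  0,  0;
     0, -1,  0];
T = [R, [1.24; 0; 0.05];
     0, 0, 0, 1];
A_kinect = [0.6; 0.1; 1.2];          % example target position in the Kinect frame (m)
B_homog  = T * [A_kinect; 1];        % Eq. (10)
B_robot  = B_homog(1:3);             % target position in the robot frame (m)
```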
Shared control using artificial potential
To generate waypoints for robot operation, both the neural signals and the target object locations must be considered. Previous studies have reported shared control that blends a velocity vector predicted from neural signals with the position of a target object; however, to the best of our knowledge, none considers the highly practical situation in which more than one target object exists in the workspace. Thus, a novel shared-control strategy that remains practical with multiple targets is needed for BMI systems. To address this issue, our research utilized the artificial potential method, a conventional motion-planning approach that allows robots to avoid obstacles and reach a destination. In this study, the conventional approach was modified into a form suitable for the BMI training system: the potential attracts the robot end-effector toward the most probable target object, allowing it to approach the user-intended target. The robotic arm motion is compensated by blending the predicted hand velocity with the ideal vector toward the intended target.
The detailed algorithm for shared control is described below. First, artificial potential was formed by considering the intended target object using Eqs. (12)–(14).
$$ e_{i} = q_{g,i} - q \quad \left( i = 1, 2, 3, \ldots, \text{number of target objects} \right) $$
(12)
$$ U_{a,i} = \frac{1}{2} k_{a,i}\, e_{i}^{T}(q)\, e_{i}(q) $$
(13)
$$ f_{t} = \sum_{i = 1}^{4} f_{a,i}, \quad \left( f_{a,i} = k_{a,i}\, e_{i}(q) \right) $$
(14)
\( q \) is the configuration of the joints and \( q_{g, i} \) is the goal configuration for the \( i \)th target object. \( U_{a, i} \) is the artificial potential provided by the \( i \)th target object. The stiffness \( k_{a, i} \) is set to 1 for the target object that the BMI user most intends to reach, and to 0 for the unselected target objects. The intended target object is determined from the velocity currently predicted from the neural signals: the angle between the predicted velocity vector and the vector from the current position to each target is calculated, and the target with the smallest angle is selected. Thus, \( f_{t} \) is the attractive force exerted by the intended target. The intended target object is updated at each stage of waypoint generation, so the BMI user can change the preferred target object while controlling the robotic arm. Then, the ideal vector toward the intended target object is generated from the current waypoint of the end-effector and the attractive force \( f_{t} \), as shown in Eq. (15).
$$ \Delta x_{i,k} = x_{e}\left( q_{k} + f_{t}(q_{k}) \right) - x_{e}(q_{k}) $$
(15)
\( x_{e} \left( {q_{k} } \right) \) is the \( k \)th waypoint of the robot arm end-effector, and the vector \( \Delta x_{i, k} \) points in the ideal direction to approach the intended target. When the predicted hand velocity vector moves away from the origin and its angle to the most intended target object is less than 90°, the next waypoint is generated via Eqs. (16) and (17).
$$ \Delta x_{k} = 1.5\left\{ \alpha \left( \Delta x_{i,k} \frac{\left\| \Delta x_{n,k} \right\|}{\left\| \Delta x_{i,k} \right\|} \right) + \left( 1 - \alpha \right) \Delta x_{n,k} \right\} $$
(16)
$$ x_{e}\left( q_{k + 1} \right) = x_{e}\left( q_{k} \right) + \beta\, \Delta x_{k + 1} + \left( 1 - \beta \right) \Delta x_{k} $$
(17)
\( \Delta x_{k} \) is the compensation vector for the \( k \)th waypoint, and \( \Delta x_{n, k} \) is the hand velocity vector predicted from neural signals. The vector \( \Delta x_{i, k} \), which points in the ideal direction toward the intended target, is scaled to the magnitude of \( \Delta x_{n, k} \), and the two are blended with proportions \( \alpha \) and \( 1 - \alpha \). A larger \( \alpha \) increases the compensation from the Kinect and decreases the contribution of the hand velocity originally predicted from neural signals. Additionally, inertia is introduced via parameter \( \beta \) to suppress unintended sudden movements of the robotic arm; similar BMI studies of shared control have also addressed the reduction of acceleration by applying a smoothing approach to motion planning [31]. In Eq. (17), the compensation for the \( (k+1) \)th waypoint is obtained by blending in the \( k \)th compensation vector with proportion \( 1 - \beta \). Both blending parameters \( \alpha \) and \( \beta \) range from 0 to 1.
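To make the procedure concrete, the sketch below implements one shared-control update in task space. It is a simplified illustration, not the actual controller: in the paper the ideal vector \( \Delta x_{i, k} \) is obtained through the attractive force on the joint configuration and the arm kinematics, whereas here it is approximated directly by the vector from the current waypoint to the selected target, and all names are our own.

```matlab
function [x_next, dx_k] = shared_control_step(x_e, dx_n, dx_prev, targets, alpha, beta)
% One shared-control waypoint update (illustrative, task-space version of
% Eqs. (12)-(17)). The paper additionally gates this update on the predicted
% velocity moving outward with an angle to the target below 90 degrees.
%   x_e     : current end-effector waypoint (3 x 1)
%   dx_n    : hand velocity vector predicted from neural signals (3 x 1)
%   dx_prev : compensation vector of the previous waypoint (3 x 1)
%   targets : 3 x N matrix of detected target positions

% Intended target: smallest angle between dx_n and the vector to each target,
% i.e. k_{a,i} = 1 for that target and 0 for the others.
to_t   = targets - x_e;
cosang = (dx_n' * to_t) ./ (norm(dx_n) * sqrt(sum(to_t.^2, 1)) + eps);
[~, i] = max(cosang);

dx_i   = to_t(:, i);                                   % ideal direction (cf. Eq. (15))
dx_k   = 1.5 * (alpha * dx_i * norm(dx_n) / norm(dx_i) ...
         + (1 - alpha) * dx_n);                        % Eq. (16)
x_next = x_e + beta * dx_k + (1 - beta) * dx_prev;     % Eq. (17)
end
```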
To evaluate the artificial potential equations, the forward and inverse kinematics of the robotic arm are required; the conventional kinematics of a 3-DOF anthropomorphic arm [32] were utilized in this study. The homogeneous transformation matrix for the forward kinematics is given in Eq. (18), and the Denavit–Hartenberg parameters used to derive it are listed in Table 1.
Table 1 Denavit–Hartenberg parameters for the robotic arm
$$ T_{forward}(q) = \begin{bmatrix} c_{1} c_{23} & -c_{1} s_{23} & s_{1} & c_{1}\left( a_{2} c_{2} + a_{3} c_{23} \right) \\ s_{1} c_{23} & -s_{1} s_{23} & -c_{1} & s_{1}\left( a_{2} c_{2} + a_{3} c_{23} \right) \\ s_{23} & c_{23} & 0 & a_{2} s_{2} + a_{3} s_{23} \\ 0 & 0 & 0 & 1 \end{bmatrix} $$
(18)
The relevant nomenclature is shown in Fig. 4b. \( c_{1} \), \( c_{2} \), and \( c_{3} \) denote \( \cos\theta_{1} \), \( \cos\theta_{2} \), and \( \cos\theta_{3} \), and \( s_{1} \), \( s_{2} \), and \( s_{3} \) denote \( \sin\theta_{1} \), \( \sin\theta_{2} \), and \( \sin\theta_{3} \). Furthermore, \( c_{23} \) and \( s_{23} \) denote \( \cos\left( \theta_{2} + \theta_{3} \right) \) and \( \sin\left( \theta_{2} + \theta_{3} \right) \), respectively. The inverse kinematics are given in Eqs. (19)–(21).
$$ \theta_{3} = \operatorname{atan2}\left( s_{3}, c_{3} \right) $$
(19)
$$ \theta_{2} = \operatorname{atan2}\left( \left( a_{2} + a_{3} c_{3} \right) pW_{z} - a_{3} s_{3} \sqrt{pW_{x}^{2} + pW_{y}^{2}},\; \left( a_{2} + a_{3} c_{3} \right) \sqrt{pW_{x}^{2} + pW_{y}^{2}} + a_{3} s_{3}\, pW_{z} \right) $$
(20)
$$ \theta_{1} = \operatorname{atan2}\left( pW_{y}, pW_{x} \right) $$
(21)
\( pW_{x} \), \( pW_{y} \), and \( pW_{z} \) indicate the x-, y-, and z-coordinates of the end-effector position measured in frame 0.
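For completeness, a MATLAB sketch of the forward and inverse kinematics of Eqs. (18)-(21) is given below. The closed-form expression for \( c_{3} \) follows the standard 3-DOF anthropomorphic arm solution [32]; the sign chosen for \( s_{3} \) selects one of the two elbow configurations, and the link lengths a2 and a3 must be taken from Table 1, so these details are assumptions of the sketch rather than part of the original code.

```matlab
function T = fk_3dof(theta, a2, a3)
% Forward kinematics of the 3-DOF anthropomorphic arm, Eq. (18).
c1 = cos(theta(1));  s1 = sin(theta(1));
c2 = cos(theta(2));  s2 = sin(theta(2));
c23 = cos(theta(2) + theta(3));  s23 = sin(theta(2) + theta(3));
T = [c1*c23, -c1*s23,  s1, c1*(a2*c2 + a3*c23);
     s1*c23, -s1*s23, -c1, s1*(a2*c2 + a3*c23);
     s23,     c23,      0, a2*s2 + a3*s23;
     0,       0,        0, 1];
end

function theta = ik_3dof(pW, a2, a3)
% Inverse kinematics, Eqs. (19)-(21). pW = [pWx; pWy; pWz] is the end-effector
% position measured in frame 0.
c3 = (pW(1)^2 + pW(2)^2 + pW(3)^2 - a2^2 - a3^2) / (2 * a2 * a3);
s3 = -sqrt(max(1 - c3^2, 0));     % elbow-configuration choice (assumption)
r  = sqrt(pW(1)^2 + pW(2)^2);
theta3 = atan2(s3, c3);                                        % Eq. (19)
theta2 = atan2((a2 + a3*c3) * pW(3) - a3*s3*r, ...
               (a2 + a3*c3) * r + a3*s3*pW(3));                % Eq. (20)
theta1 = atan2(pW(2), pW(1));                                  % Eq. (21)
theta  = [theta1; theta2; theta3];
end
```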
The proposed algorithm was validated by applying it to hand trajectories predicted from noninvasive EEG signals. Predicted hand trajectories obtained in our previous research [25] were used; the details of the decoding process are described in that paper. The dataset contains 120 hand trajectories predicted from a healthy participant, consisting of four directional reaching movements (30 trials per direction). The blending parameters \( \alpha \) and \( \beta \) affect the performance of the algorithm and were each varied from 0.05 to 1.00 in steps of 0.05 for optimization.
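The parameter optimization amounts to a simple grid sweep, as sketched below. The function run_shared_control_trials is hypothetical: it stands for replaying the 120 predicted trajectories through the shared controller with the given parameters and returning the resulting success rate.

```matlab
% Grid search over the blending parameters alpha and beta (0.05 to 1.00, step 0.05).
params  = 0.05:0.05:1.00;
success = zeros(numel(params));                 % success(ia, ib) for each pair
for ia = 1:numel(params)
    for ib = 1:numel(params)
        % Hypothetical evaluation: replay all 120 predicted hand trajectories
        % through the shared controller and return the fraction of successful reaches.
        success(ia, ib) = run_shared_control_trials(params(ia), params(ib));
    end
end
[best, idx] = max(success(:));
[ia, ib] = ind2sub(size(success), idx);
fprintf('Best success rate %.2f at alpha = %.2f, beta = %.2f\n', ...
        best, params(ia), params(ib));
```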
Clinical application in participants with upper limb impairment
We established a clinical BMI training system consisting of an EEG acquisition system (Synamps 2, Compumedics Neuroscan, Texas, USA) and the vision-aided robotic system. The two systems are connected to a personal computer and operate cooperatively: the EEG is processed in Matlab, and the robotic arm is controlled through a C#-based SDK (Microsoft, Redmond, WA, USA). Two volunteers with severe upper limb impairment due to cervical spinal cord injury participated in this study. The inclusion criteria were: (1) severely disabled patients with unilateral or bilateral upper limb paralysis, (2) no significant cognitive impairment (MMSE > 26), and (3) the ability to fully understand the study and provide informed consent. Participant #1 is a 31-year-old male with American Spinal Injury Association Impairment Scale (AIS) C at level C4 (motor C7/C4 and sensory C5/C5), and participant #2 is a 47-year-old male with AIS B at level C4 (motor C4/C4 and sensory C5/C5). Both participants are completely unable to control their own arms.

The participants were given a total of 10 BMI training sessions. The first 5 sessions were designed to help the participants become accustomed to motor imagery using targets and virtual reality video files (Fig. 6a); in the next 5 sessions, the developed BMI training system was utilized (Fig. 6b). In each session, the users were instructed to imagine controlling the robotic arm to reach one of two instructed targets. Two types of training were applied in each session. First, observation-based training was performed, and the parameters of the decoder were determined using multiple linear regression between the robotic arm motion and the EEG signals. Because the decoder had not yet been verified in patients with upper limb paralysis, the blending parameters were set to 1 and the instructed targets were preprogrammed in the controller. After the observation-based training of each session, shared-control-based robotic arm control was attempted with blending parameters α = β = 0.6. In each trial of the shared control session, users were instructed to choose one of the two targets, and the success rate was measured over approximately 40 trials. fMRI during motor imagery tasks was acquired before the 6th training session and after the 10th training session. The overall plan of the clinical application is shown in Fig. 7. This clinical study was approved by the Institutional Review Board of Seoul National University Hospital (IRB No. 1605-136-765).
fMRI evaluation
Functional imaging consisted of motor imagery tasks in three directions: upper, lower-left, and lower-right. A block design was used in all tasks; during each task, participants were instructed to repeatedly imagine reaching and grasping movements in the selected direction. For each task, 8 active blocks and 7 rest blocks (each 20 s) were interleaved. The fMRI images were acquired with a Siemens MAGNETOM Trio, A Tim Syngo scanner using echo planar imaging (EPI, TE = 30 ms, TR = 3000 ms), angulated parallel to the anterior commissure-posterior commissure line. A T1-weighted image was also obtained for anatomical reference. The fMRI data were preprocessed using Statistical Parametric Mapping 12 (SPM12, Wellcome Trust Centre for Neuroimaging, London, UK; http://www.fil.ion.ucl.ac.uk/spm/) executed in MATLAB 2015b (Mathworks Inc., Natick, MA).