1 Learning Objectives

This chapter explores a framework and some of the main building blocks in developing robots. You will learn about:

  • The Sense, Think, Act loop.

  • Different types of sensors that let robots ‘feel’ the world, and how to find suitable sensors for specific scenarios.

  • Algorithms that make robots ‘intelligent’.

  • Actuators that make robots move.

  • Commonly used computer vision algorithms that make robots ‘see’.

2 Introduction

In Chap. 4, we discussed that programming could be thought of as input, process, and output. Sense, Think, Act is a similar paradigm used in robotics. The way a robot operates could be thought of, rudimentarily, as analogous to how a human or an animal responds to environmental stimuli. For example, we humans perceive the environment through the five senses (e.g. sight). We might then ‘decide’ on the next action based on these incoming signals and, finally, execute the action through our limbs. For example, you might see (sense) a familiar face in the crowd and think it would be good to grab their attention and then act on this thought by waving your hand.

Similarly, a robot may have several sensors through which it could sense the environment. An algorithm could then be used to interpret and decide on an action based on the incoming sensory information. This computational process could be thought of as analogous to the thinking process in humans. Finally, the algorithm sends out a set of instructions to the robot’s actuators to carry out the actions based on the sensor information and goals (Fig. 7.1).

Fig. 7.1

Sense, Think, Act loop in robotics

The current configuration of the robot is called the robot’s state. The robot’s state space is the set of all possible states the robot could be in. Observable states are those fully visible to the robot, while other states might be hidden or only partly visible to it; such states are called partially observable states. Some states are discrete (e.g. motor on or off), and others could be continuous (e.g. rotational speed of the motor). In the above paradigm, the Sense element observes the state, and the Act element proceeds to alter it.

An Industry Perspective

Vitaliy Khomko, Vision Application Developer

Kinova Inc.

My journey into robotics started when I joined JCA Technologies, Manitoba, in 2015. At that time, the maturity of sensors and controllers technology allowed innovators to create smart agricultural and construction equipment capable of performing many complex operations autonomously with very little input from a machine operator. Frankly speaking, I did not get to choose robotics. I simply got sucked into the technological vortex because the industry was screaming for innovation as well as researchers and developers to drive it. In 2018, I was welcomed by the team of very passionate roboticists at Kinova, Quebec, in the position of Vision Technology Developer. The creative atmosphere fuelled by Kinova’s employees kept driving me for many months. I worked hard during the day trying my best to fit in and kept learning new stuff in the evening to fill in the blanks. All this hard work paid off well in the end. I must admit, now, I can wield magic with a vision-enabled robot.

Continuous learning and keeping up with all the industry trends is by far the most challenging and time-consuming. The theoretical knowledge alone, though, can only help with being on the right track. In reality, when working on delivering a real consumer product, an enormous effort goes into research and evaluation, work planning, development/coding, and testing. Maintaining a good relationship with your coworkers is essential. At the end of the day, it is your teammates who give you a hand when you get stuck, who share your passion and excitement, who appreciate your effort, and who let you feel connected. Nothing really compares with the satisfaction of joint accomplishment when you can pop a beer by the end of a long day with your colleagues after delivering the next milestone, watching the robot finally doing its thing over and over again.

With regard to evolution, I certainly noticed a shift from simple automated equipment controlled by human experts to very efficient autonomous machines capable of making decisions. Sensors have been around for a long time. By strategically placing them into a machine, one can achieve an unprecedented amount of feedback from a machine to allow better control and operation precision. The amount of information and real-time constraints, though, can be too much even for an expert human operator to process. What really made a difference now is the availability of algorithms and computational devices to enable a certain degree of machine autonomy. For example, camera technology is widely available these days. But it is not the camera alone that enables vision-guided robotics: it is the robot–camera calibration, 2D/3D object matching and localisation, grasping clearance validation, etc. Some can argue that recent advancements in artificial intelligence mainly contributed to that evolution. I think AI is just another tool. And by no means an ultimate solution to every problem.

3 Sense: Sensing the World with Sensors

Everything changes in the real world. Some changes are notable while others are subtle; some are induced, and some are provoked. However, these changes always reveal information hidden from the initial perception. Sensing changes in the environment is particularly meaningful and allows perception of, and interaction with, the surrounding world. For example, humans perceive the displacements of colour patterns on the retina and process those displacements to understand the changes. Some animals, such as bats, use echolocation to estimate changes in their environment and localise themselves. Unlike humans or animals, robots do not have naturally occurring senses. Therefore, robots need sensors to help them perceive the environment and algorithms to process and understand the resulting information. For example, a typical sensor such as a video camera can be considered the robot’s ‘eyes’, while a sonar sensor could be thought of as the equivalent of a bat’s echolocation. By integrating different sensors, robots can achieve a variety of tasks, much as humans do.

3.1 Typical Sensor Characteristics

Sensors could be characterised in various ways. Let us look at some of the common characteristics and their definitions first.

3.1.1 Proprioceptive and Exteroceptive

Just as humans perceive aches and pains internal to their bodies, so can a robot sense its various internal states, such as the speed of its wheels/motors or the current drawn by its internal power circuitry. Such sensors are called proprioceptive sensors. On the other hand, sensors that provide information about the robot’s external environment are called exteroceptive sensors.

3.1.2 Passive and Active Sensors

A sensor that only has a detector to observe or measure the physical properties of the environment is categorised as a passive sensor. A light sensor is an example. In contrast, active sensors emit their own signal or energy to the environment and employ a detector to observe the reaction resulting from the emitted signal. A sonar sensor is a typical example.

3.1.3 Sensor Errors and Noise

However well made a sensor is, it is susceptible to various manufacturing errors and environmental noise. Some of these errors can be anticipated and understood. Such errors, which are deterministic and reproducible, are called systematic errors. Systematic errors can be modelled and integrated as part of the sensor characteristics. Other errors are difficult to pinpoint; these could be due to environmental effects or other random processes and are called random errors. Understanding these errors is crucial to deploying a successful robotics system. When this information is not readily available for the sensor selected, you will need to conduct a thorough error analysis to isolate and quantify the systematic errors and figure out how to capture the random errors.

3.1.4 Other Common Sensor Characteristics

You may encounter the following terms describing various other characteristics of a sensor. It is important to understand what they mean in a given context to use the appropriate sensor for the job.

Resolution The minimum difference between two values that the sensor can measure.

Accuracy The uncertainty in a sensor measurement with respect to an absolute standard.

Sensitivity The ratio of the change in the sensor’s output to the change in its input; i.e. how strongly the output responds to a small change in the measured quantity.

Linearity Whether the output produced by a sensor depends linearly on the input.

Precision The reproducibility of the sensor measurement.

Bandwidth The speed at which a sensor can provide measurements, usually expressed in hertz (Hz), i.e. readings per second.

Dynamic range The ratio between the upper and lower limits of the sensor input under normal operation. This is usually expressed in decibels (dB):

$${\text{Dynamic}}\,{\text{Range}} = 10\log_{10} \left( {\frac{{{\text{upper}}\,{\text{limit}}}}{{{\text{lower}}\,{\text{limit}}}}} \right)$$
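For example (using assumed, illustrative limits), a range sensor that reliably measures distances from 2 cm to 400 cm has a dynamic range of

$$10\log_{10} \left( {\frac{400}{2}} \right) \approx 23\,{\text{dB}}$$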

3.2 Common Sensors in Robotics

3.2.1 Light Sensors

Light sensors detect light and convert it into a voltage signal that is fed back to the robot’s system. The two light sensors widely used in the field of robotics are photoresistors and photovoltaic cells. In a photoresistor, a change in incident light intensity changes its resistance: more light leads to less resistance, and vice versa. Photovoltaic cells, on the other hand, convert solar radiation into electricity. This is especially helpful when planning a solar-powered robot. While the photovoltaic cell is usually considered an energy source, a smart implementation combined with transistors and capacitors can turn it into a sensor. Other light sensors, such as phototransistors, phototubes, and charge-coupled devices (CCD), are also available (Fig. 7.2).

Fig. 7.2

A common light sensor (a photoresistor)

3.2.2 Sonar (Ultrasonic) Sensors

Sonar sensors (also called ultrasonic sensors) utilise acoustic energy to detect objects and measure the distance from the sensor to the target object. Sonar sensors are composed of two main parts: a transmitter and a receiver.

The transmitter sends a short ultrasonic pulse, and the receiver picks up the portion of the signal that is reflected from the surfaces of nearby objects. The sensor measures the time from signal transmission to reception, namely the time-of-flight (TOF).

Knowing the propagation speed of the ultrasonic signal (the speed of sound), the distance to the target that reflects the signal can be calculated using the following equation.

$${\text{Distance}} = \left( {{\text{Time}} \times {\text{Speed}}\,{\text{Of}}\,{\text{Sound}}} \right)/2$$

where the division by 2 accounts for the sound having to travel to the target and back.
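As a minimal sketch of this calculation (assuming the speed of sound is roughly 343 m/s at 20 °C and that the sensor reports the round-trip time in seconds; the function name is illustrative):

```python
SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees Celsius (assumed)

def sonar_distance(time_of_flight_s: float) -> float:
    """Return the distance to the target in metres.

    time_of_flight_s: round-trip time measured by the sensor, in seconds.
    The division by 2 accounts for the pulse travelling out and back.
    """
    return (time_of_flight_s * SPEED_OF_SOUND) / 2.0

# Example: a 5.8 ms round trip corresponds to roughly 1 m to the target.
print(sonar_distance(0.0058))  # ~0.99 m
```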

Sonar sensors can be used for mobile robot localisation through model matching or triangulation by computing the pose change between the inputs acquired at two different poses (Jiménez & Seco, 2005). Sonar sensors could also be used in detecting obstacles (see Fig. 7.3).

Fig. 7.3

Four sonar sensors are embedded in the chest of this NAO robot to help detect any obstacles in front of it. A tactile sensor is embedded on its head

One of the challenges of using these sensors is that they are sensitive to noise from the surroundings and from other sonar sensors operating at the same frequency. Moreover, they are highly dependent on the material and orientation of the object’s surface, as these sensors rely on the reflection of the signal waves (Kreczmer, 2010). New techniques such as compressed high-intensity radar pulse (CHIRP) have been developed to improve sonar performance.

Sonar signals have a characteristic 3D beam pattern. This makes them suitable for detecting obstacles in a wide area when the exact geometric location is not needed. However, laser sensors provide a better solution for situations where precise geometry needs to be inferred.

3.2.3 Laser and LIDAR

Laser sensors can be utilised in several applications related to positioning. Laser ranging is a remote sensing technology for distance measurement that involves transmitting a laser beam towards the target and analysing the reflected light. Laser-based range measurements rely on either TOF or phase-shift techniques. As with the sonar sensor, in a TOF system a short laser pulse is sent out and the time until it returns is measured. A low-cost laser range finder popular in robotics is shown in Fig. 7.4. Also see Fig. 7.10.

Fig. 7.4

Hokuyo URG-04LX range finder

Light Detection And Ranging (LIDAR) has found many applications in robotics, including object detection, obstacle avoidance, mapping, and 3D motion capture. LIDAR can be integrated with GPS and an inertial navigation system (INS) to enhance the performance and accuracy of outdoor positioning applications (Aboelmagd et al., 2013).

One of the disadvantages of using LIDAR is that it requires considerable computational power to process the data, which may affect the real-time performance of mobile robot applications. Moreover, scanning can fail when the object’s material is transparent, such as glass, as reflections from these surfaces can produce misleading and unreliable data (Takahashi, 2007).

3.2.4 Visual Sensors

Compared with the proximity sensors mentioned above, optical cameras are low-cost sensors that provide a large amount of meaningful information.

The images captured by a camera can provide rich information about the robot’s environment once processed using appropriate image processing algorithms. Some examples include localisation, visual odometry, object detection, and identification. There are different types of cameras, such as stereo, monocular, omnidirectional, and fisheye, that suit all manner of robotic applications.

Monocular cameras (Fig. 7.5) are especially suitable for applications where compactness and minimum weight are critical. Moreover, low cost and easy deployment are the primary motivations for using monocular cameras on mobile robots. However, monocular cameras only capture appearance information and cannot directly measure depth. A stereo camera, on the other hand, is a pair of identical monocular cameras mounted on a rig. It provides everything a single camera can offer, plus the extra information that comes from having two views. Based on the parallax principle, a stereo camera can estimate a depth map (a 2D image that depicts the depth relationship between the objects in the scene and the camera’s viewpoint) by utilising two slightly shifted views of the same scene. Fisheye cameras are a variant of monocular cameras that provide wide viewing angles and are attractive for obstacle avoidance in complex environments, such as narrow and cluttered spaces.

Fig. 7.5

A popular monocular camera
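As an illustration of the parallax principle mentioned above, the minimal sketch below recovers the depth of a point from its disparity between a rectified left–right image pair. The focal length, baseline, and disparity values are assumed for illustration; a real stereo pipeline also requires calibration and rectification.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (metres) of a point seen by a rectified stereo camera pair.

    focal_px: focal length in pixels.
    baseline_m: distance between the two cameras in metres.
    disparity_px: horizontal shift of the point between the left and right images, in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("Disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Example (assumed values): f = 700 px, baseline = 0.12 m, disparity = 35 px -> 2.4 m
print(depth_from_disparity(700.0, 0.12, 35.0))
```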

3.2.5 RGB-D Sensors

RGB-D sensors are unconventional visual sensors that can simultaneously obtain a visible (RGB) image and a depth map of the same scene. They have been very popular in the robotics community for real-time image processing, robot localisation, and obstacle avoidance. However, due to their limited range and sensitivity to noise, they are mostly used in indoor environments.

The Kinect sensor, one of the most well-known RGB-D sensors (yes, the same sensor you use when playing video games on the Xbox), was introduced to the market in November 2010 and has gained great popularity since then. The computer vision community quickly discovered that this depth-sensing technology could be used for other purposes while costing much less than some traditional three-dimensional (3D) cameras, such as time-of-flight-based ones. In June 2011, Microsoft released an SDK allowing the Kinect to be used as a tool for non-commercial products (Fig. 7.6).

Fig. 7.6

A newer version of the Microsoft® Kinect sensor

The basic principle behind the Kinect depth sensor is the emission of an IR speckle pattern (invisible to the naked eye) and the simultaneous capture of an IR image by a CMOS camera fitted with an IR-pass filter. An image processing algorithm embedded inside the Kinect uses the relative positions of the dots in the speckle pattern (see Fig. 7.7) to calculate the depth displacement at each pixel position in the image; the technique is called structured light. Hence, the depth sensor can provide the x-, y-, and z-coordinates of the surface of 3D objects.

Fig. 7.7

A view from an RGB-D camera (from left to right—RGB image, depth image, IR image showing the projected pattern)

The Kinect sensor consists of an IR laser emitter, an IR camera, and an RGB camera. It simultaneously captures depth and colour images at frame rates of up to 30 Hz. The RGB colour camera delivers 640 × 480 pixel, 24-bit images at the highest frame rate, while the 640 × 480 pixel, 11-bit IR camera provides 2048 levels of sensitivity with a field-of-view of 50° horizontal and 45° vertical. The operational range of the Kinect sensor is from 50 to 400 cm.
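As a minimal sketch (not the sensor’s actual factory calibration), the snippet below converts a depth image into x-, y-, z-coordinates using a simple pinhole model, with focal lengths derived from the field-of-view figures quoted above:

```python
import math
import numpy as np

def depth_to_points(depth_mm: np.ndarray, h_fov_deg: float = 50.0, v_fov_deg: float = 45.0) -> np.ndarray:
    """Convert an (H, W) depth image in millimetres to an (H, W, 3) array of
    x, y, z coordinates in metres, assuming a simple pinhole camera model."""
    h, w = depth_mm.shape
    # Approximate focal lengths (pixels) from the horizontal/vertical field of view.
    fx = (w / 2.0) / math.tan(math.radians(h_fov_deg) / 2.0)
    fy = (h / 2.0) / math.tan(math.radians(v_fov_deg) / 2.0)
    cx, cy = w / 2.0, h / 2.0

    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm / 1000.0                 # depth in metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack((x, y, z))

# Example with a synthetic, flat depth image 1.5 m away.
points = depth_to_points(np.full((480, 640), 1500.0))
print(points.shape)  # (480, 640, 3)
```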

3.2.6 Inertial Measurement Units

An inertial measurement unit (IMU) utilises gyroscopes and accelerometers (and optionally magnetometers and barometers) to sense motion and orientation. An accelerometer is a device for measuring acceleration and tilt. Two types of forces affect an accelerometer. The first is gravity, which helps determine how much the robot tilts; this measurement helps balance the robot or determine whether it is driving on a flat or uphill surface. The other is the dynamic force, i.e. the acceleration required to move the robot. These sensors are useful for inferring incremental changes in motion and orientation. However, they suffer from bias, drift, and noise, which requires regular calibration of the system before use or sophisticated sensor fusion and filtering techniques (such as the EKF described in Chap. 9). You will often see IMUs used with computer vision systems or combined with Global Navigation Satellite System (GNSS) information. Such systems are commonly called INS/GNSS systems (Inertial Navigation System/GNSS).
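To illustrate the tilt measurement mentioned above, here is a minimal sketch that estimates roll and pitch from a static accelerometer reading. The axis conventions and example values are assumptions, and the estimate is only valid when the robot is not accelerating:

```python
import math

def tilt_from_accel(ax: float, ay: float, az: float):
    """Estimate roll and pitch (in degrees) from accelerometer readings in g.

    Valid only when the sensor is quasi-static, i.e. gravity dominates
    the measured acceleration.
    """
    roll = math.degrees(math.atan2(ay, az))
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    return roll, pitch

# Example: sensor lying almost flat, slightly tilted forward.
print(tilt_from_accel(0.10, 0.02, 0.99))  # roll ~1.2 deg, pitch ~ -5.8 deg
```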

3.2.7 Encoders

Simply put, encoders record movement metrics in some form. There are three types of encoders: linear encoders, rotary encoders, and angle encoders.

Linear encoders measure straight-line motion. Sensor heads attached to the moving piece of machinery run along guideways, and these sensors are linked to a scale inside the encoder that sends digital or analog signals to the control system. Rotary encoders measure rotational movement. They typically surround a rotating shaft, sensing and communicating changes in its angular motion. Traditionally, rotary encoders are classified as having accuracies above ±10″ (arcseconds). Rotary encoders are also available with functional safety capabilities. Similar to their rotary counterparts, angle encoders measure rotation; these, however, are most often used in applications where precise measurement is required.

Mobile robots often use encoders to calculate their odometry. Odometry is the use of motion sensors to determine the robot’s change in position over time relative to some known starting position. A simple example of using a rotary incremental encoder to calculate the robot’s travel distance is illustrated in Fig. 7.8. A light is shone through a slotted disc (usually made of metal or glass). As the disc rotates, the light passing through the slots is picked up by a light sensor mounted on the other side of the disc. This signal can be converted into a sinusoidal or square wave using electronic circuitry. If this encoder is attached to the axle of the robot’s wheel, we can use the output signal to calculate the velocity at which the robot is moving.

Fig. 7.8
figure 8

A simplified rotary incremental encoder with 16 slots

To calculate the length travelled L (cm) using the output from an incremental encoder, we start by calculating the number of pulses per cm (PPCM):

$${\text{PPCM}} = \frac{{{\text{PPR}}}}{2\pi r}$$

where PPR is the pulses per revolution (16 in the example in Fig. 7.8) and r is the wheel radius in cm.

Then the length L is given by:

$$L = \frac{{{\text{Pulses}}}}{{{\text{PPCM}}}}$$

The speed (S) is then calculated as:

$$S = \frac{L}{{{\text{Time}}\,{\text{Taken}}}}$$
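A minimal sketch combining these three equations, using the 16-slot encoder of Fig. 7.8 and an assumed wheel radius of 3 cm, might look as follows:

```python
import math

def encoder_speed(pulses: int, time_s: float, ppr: int = 16, wheel_radius_cm: float = 3.0) -> float:
    """Return the robot speed in cm/s from incremental-encoder pulse counts.

    ppr: pulses per revolution (16 slots as in Fig. 7.8).
    wheel_radius_cm: wheel radius r in cm (assumed value for illustration).
    """
    ppcm = ppr / (2 * math.pi * wheel_radius_cm)   # PPCM = PPR / (2 * pi * r)
    distance_cm = pulses / ppcm                    # L = Pulses / PPCM
    return distance_cm / time_s                    # S = L / Time Taken

# Example: 160 pulses counted over 2 s (10 wheel revolutions) -> ~94.2 cm/s
print(encoder_speed(160, 2.0))
```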

It is worth noting that the need to have these sensors close to the motors often results in them being subject to electromagnetic noise. Therefore, to improve the encoder’s performance as well as to determine the direction of rotation, a second light–sensor pair is included with a 90° phase shift (Fig. 7.9).

Fig. 7.9

A popular hobby rotary incremental encoder with two outputs (quadrature encoder)

Fig. 7.10

A modern vacuum cleaning robot integrates many sensors. On the top is a time-of-flight laser scanner. The front bumper includes several tactile sensors to detect any frontal collisions. What other sensors do you think this robot may have?

3.2.8 Force and Tactile Sensors

Both these types of sensors measure physical interactions between the robot and the external environment. A typical force sensor is usually used to measure external mechanical force input, such as in the form of a load, pressure, or tension. Sensors such as strain gauges and pressure gauges fall into this category. On the other hand, tactile sensors are generally used to mimic the sense of touch. Usually, tactile sensors are expected to measure small variations in force or pressure with high sensitivity. Robots designed to be interactive integrate many tactile sensors so they can respond to touch (e.g. Fig. 7.3). Sophisticated sensors are emerging that could mimic skin-like sensitivity to touch. A more primitive version could be seen in most vacuum cleaning robots, where the front bumper acts as a collision detector (Fig. 7.10).

3.2.9 Other Common Sensors in Robotics

Many other sensors are used in robotics, and new ones are developed in various research laboratories and commercialised regularly. These include microphones (auditory signals), compasses, temperature sensors (thermal and infrared sensors), chemical sensors, and many more. Therefore, it is prudent to research suitable sensors for your next project as new and more capable sensors may better suit your needs. Can you think of all the sensors that may be used in the robot shown in Fig. 7.10?

4 Think: Algorithms

A critical component of a robotic system is its ability to make control decisions based on the available sensory information and so realise the tasks and goals allocated to it. If the brains of a robot are the computers embedded in the robotic system, the algorithms are the software components that enable the robot to ‘think’ and make decisions. Algorithms interpret what is happening in the environment based on sensory input and decide, given the allocated tasks, what needs to be done at any given time.

In the most general sense, an algorithm is a finite list of instructions used to solve a problem or perform a task. To get a feel for the concept, think about baking a sponge cake. How would you write down your whole process for making a sponge cake for a person who does not know how to bake at all? Answering this question in a detailed and ordered way produces an algorithm. One attribute of an algorithm is that its steps occur in a specific order, and the wrong order can make a big difference. For example, if we changed the order of steps in making a sponge cake and put the eggs and flour in the oven for half an hour before preheating it, the result would not make any sense!

For a robotic system, algorithms are the specific recipes that help it ‘think’. They are precise sequences of instructions implemented using programming languages. The essential elements of an algorithm are input, sequence, selection, iteration, and output (a minimal sketch combining these elements follows the list below).

  • Input—Data, information, or signals collected from the sensors or a command from a human operator.

  • Sequence—The order in which behaviours and commands are combined to produce the desired result.

  • Selection—The use of conditional statements, such as [If then] or [If then else], to choose between different paths in the process.

  • Iteration—Algorithms can use repetition to execute steps a certain number of times or until a specific condition is reached. It is also known as ‘looping’.

  • Output—Desired result or expected outcome, such as the robot reaching the target location or avoiding a collision with certain obstacles.
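Putting these elements together, the following minimal sketch shows a hypothetical obstacle-avoidance loop; the sensor and motor functions are illustrative placeholders rather than a real robot API:

```python
import random

def read_distance_cm() -> float:
    """Placeholder for a real sonar/laser reading (input)."""
    return random.uniform(5.0, 200.0)

def drive_forward() -> None:
    print("driving forward")       # placeholder for a real actuator command

def turn_left() -> None:
    print("turning left")          # placeholder for a real actuator command

SAFE_DISTANCE_CM = 30.0

for _ in range(10):                  # iteration: repeat the loop a fixed number of times
    distance = read_distance_cm()    # input: sensor reading
    if distance < SAFE_DISTANCE_CM:  # selection: conditional statement
        turn_left()
    else:
        drive_forward()
# output: the sequence of motion commands issued above
```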

Robotics is rife with all kinds of algorithms, from simple obstacle avoidance to complex scene understanding using multiple sensors. Among these, computer vision algorithms play a significant role owing to their ability to infer rich information from the various optical camera systems discussed earlier. Some common vision algorithms found in robotics are therefore discussed later in this chapter.

5 Act: Moving About with Actuators

We identify robots as things that move around or have moving parts. In the Sense, Think, Act paradigm, Act refers to this dynamic aspect of robots. The robot acts on the environment by manipulating it using various appendages called manipulators (arm-type robots) or by traversing it (mobile robots). In order to act, a robot needs actuators. An actuator is a device that takes energy (electric, hydraulic, or pneumatic) and an external signal input and converts them into a form of motion that can be controlled as desired.

5.1 Common Actuators in Robotics

5.1.1 Motors

The electric motor is a typical example of an electrically driven actuator. As they can be made in different sizes, types, and capacities, they are suitable for use in a wide range of robotic applications. There are various electric motors, such as servo motors, stepper motors, and linear motors.

Servo motors

A servo motor is controlled with an electric signal, either analog or digital, which determines the amount of movement. It provides control of position, speed, and torque. Servo motors are classified into different types based on their application, such as the AC servo motor and DC servo motor.

The speed of a DC motor is directly proportional to the supply voltage for a constant load, whereas the speed of an AC motor is determined by the frequency of the applied voltage and the number of magnetic poles. AC motors are commonly used in servo applications in robotics, in-line manufacturing, and other industrial applications where high repetition and high precision are required.

DC servo motors are commutated mechanically with brushes, using a commutator, or electronically without brushes. Brushed motors are generally less expensive and simpler to operate, while brushless motors are more reliable, have higher efficiency, and are less noisy (Fig. 7.11).

Fig. 7.11

Hobby DC servo motors (left) and a high-end actuator (right) used in an industrial robot arm (courtesy of Kinova Robotics)

Stepper motors

A stepper motor is a brushless synchronous DC motor that features precise, discrete angular motions. It is designed to break up a single complete rotation into a number of much smaller, essentially equal partial rotations. For practical purposes, these can be used to instruct the stepper motor to move through set degrees or angles of rotation. The end result is that a stepper motor can be used to transfer accurate movements to mechanical parts that require a high degree of precision. Stepper motors are versatile, reliable, and cost-effective, and they provide precise motor movements, allowing users to increase the dexterity and efficiency of programmed movements across a huge variety of applications and industries. Most 3D printers, for example, use multiple stepper motors to precisely control the 3D print head.

Linear motors

A linear motor operates on the same principle as a rotary electric motor but provides linear motion: unlike a rotary machine, it moves the object in a straight line or along a curved track. Linear motors can reach very high accelerations, up to 6 g, and travel speeds of up to 13 m/s. Due to these characteristics, they are especially suitable for machine tools, positioning and handling systems, and machining centres.

5.1.2 Hydraulic Actuators

Hydraulic actuators are driven by the pressure of a hydraulic fluid. A hydraulic actuator consists of a cylinder, piston, spring, hydraulic supply and return lines, and a stem. They can deliver large amounts of power and, as such, are used in construction machinery and other heavy-duty equipment.

There are several advantages to using hydraulic actuators. Because fluids are incompressible, a hydraulic actuator can hold force and torque constant without the pump supplying more fluid or pressure. Hydraulic actuators can also have their pumps and motors located a considerable distance away with minimal loss of power. Compared with a pneumatic cylinder of equal size, the forces generated by a hydraulic actuator are about 25 times greater, so they operate well in heavy-duty settings. One of the disadvantages of hydraulic actuators is that they may leak fluid, leading to reduced efficiency and, in extreme cases, damage to nearby equipment due to spillage. Hydraulic actuators also require many complementary parts, including a fluid reservoir, motor, pump, release valves, and heat exchangers, along with noise reduction equipment.

5.1.3 Pneumatic Actuators

Pneumatic actuators are known for being highly reliable, efficient, and safe sources of motion control. They are driven by pressurised air, converting the energy of compressed air into linear or rotary mechanical motion. They feature both a simple mechanical design and flexible operation, and are widely used in automobile combustion engines, railway applications, and aviation. Most of the benefits of choosing pneumatic actuators over alternatives, such as electric ones, boil down to their reliability and safety. Pneumatic actuators are also highly durable, requiring little maintenance and offering long operating cycles.

5.1.4 Modern Actuators

Many new actuation methods and actuators have emerged in recent times. These include pneumatic tendons (Fig. 7.12) and other biologically inspired actuators, modelled, for example, on fish fins or octopus tentacles. Soft robotics is an emerging field that explores some of these developments. However, the compliance requirements and morphology of soft robots prevent the use of many conventional sensors seen in hard robots. As a result, there has been active research into stretchable electronic sensors; elastomer sensors, for example, have minimal impact on the actuation of the robot.

Fig. 7.12

Pneumatic rubber muscles used in animating this giant robotic structure during a performance by the artist, Stelarc (Reclining StickMan, 2020 Adelaide Biennial of Australian Art: Monster Theatres, Photographer—Saul Steed, Stelarc)

6 Computer Vision in Robotics

Computer vision techniques have been the subject of heightened interest and rigorous research for decades now as a way of sensing the world in all its complexity. Computer vision attempts to understand the scene and the objects in the environment. Furthermore, increasing computational power and progress in computer vision methods have made making robots ‘see’ a popular trend. As computer vision combines both sensors and algorithms, it deserves its own section within this chapter.

Computer vision in robotics refers to the capability of a robot to visually perceive and interact with the environment. Typical tasks are to recognise objects, detect ground planes, traverse to a given target location without colliding with obstacles, interact with dynamic objects, and respond to human intents.

Vision has been used in various robotic applications for more than three decades. Examples include applications in industrial, service, medical, and underwater robotics, to name a few. The following sections introduce some classic computer vision algorithms widely used in robotics, such as plane detection, optical flow, and visual odometry.

6.1 Plane Detection

For an autonomous mobile robot, detecting the dominant plane is a fundamental task for obstacle avoidance and trajectory finding. The dominant plane can be considered the planar area occupying the largest region of the ground towards which the robot is moving. It provides useful information about the environment; in particular, objects above the detected dominant plane and along the direction of the robot’s movement can be viewed as obstacles. A ground mobile robot or micro-aerial vehicle operating in an unknown environment must identify its surroundings before it can conduct its mission. These vehicles should recognise obstacles within their operating area and avoid them or travel over them where possible. There are various plane detection techniques, such as RANSAC and the region growth method.

6.1.1 RANSAC

The random sample consensus (RANSAC) method (Fischler & Bolles, 1981) is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers. It is a very useful tool for finding planes; its principle is to search for the best-fitting plane among three-dimensional (3D) point clouds, and it remains computationally efficient even when the number of points is vast. Plane detection using RANSAC starts by randomly selecting three points from the point cloud and calculating the parameters of the corresponding plane. The next step detects all the points of the original cloud belonging to the calculated plane, based on a given threshold. This procedure is repeated for N rounds; each time, the obtained result is compared with the last saved one and, if the new one is better, it replaces the saved one (see Algorithm 1).

The four types of data needed as input for this algorithm are:

  • a 3D point cloud which is a matrix of the three coordinate columns X, Y, and Z;

  • a tolerance threshold of distance t between the chosen plane and other points;

  • a probability (α) which lies typically between 0.9 and 0.99 and is the minimum probability of finding at least one good set of observations in N rounds; and

  • the maximum probable number of points belonging to the same plane.

Algorithm 1: Plane detection using RANSAC
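The following is a minimal NumPy sketch of the RANSAC plane-detection procedure described above (the threshold, number of rounds, and synthetic data are illustrative assumptions; in practice you would typically use a library implementation):

```python
import numpy as np

def ransac_plane(points: np.ndarray, t: float = 0.02, n_rounds: int = 200, seed: int = 0):
    """Fit the dominant plane in an (N, 3) point cloud with RANSAC.

    Returns (normal, d, inlier_mask) for the plane n . p + d = 0.
    t: inlier distance threshold; n_rounds: number of random samples.
    Assumes at least one non-degenerate sample is drawn.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_rounds):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample, skip
            continue
        normal /= norm
        d = -normal.dot(p1)
        dist = np.abs(points @ normal + d)   # point-to-plane distances
        inliers = dist < t
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane[0], best_plane[1], best_inliers

# Synthetic test: a noisy horizontal plane plus random outliers.
rng = np.random.default_rng(1)
plane_pts = np.column_stack([rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500),
                             rng.normal(0.0, 0.005, 500)])
outliers = rng.uniform(-1, 1, (100, 3))
normal, d, mask = ransac_plane(np.vstack([plane_pts, outliers]))
print(normal, d, mask.sum())   # normal ~ (0, 0, +/-1), ~500 inliers
```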

As one of the most well-known methods for plane detection, RANSAC has been shown to be capable of detecting planes in both 2D and 3D. For example, Fig. 7.13 shows two groups of noisy 3D points (blue and red) with two planes successfully detected using the RANSAC method.

Fig. 7.13

Two groups of 3D points representing two planes detected using the RANSAC method

6.1.2 Region Growth Method

The region growth method for plane detection was first introduced by Hähnel et al. (2003) with the goal of creating a low-complexity model that can run in real time. It starts from a seed chosen randomly from the point cloud that contains sufficient information to fit a plane, for example three points, or a single point with an associated surface normal. Neighbouring points that are consistent with the plane are then added and considered part of it. This procedure is repeated until no more points can be found; the algorithm then stops and keeps the plane if it contains enough points. Finally, the points are removed from the point set, and a new seed is selected. A brief outline of this algorithm is presented in Algorithm 2.

Algorithm 2: Plane detection using the region growth method
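Below is a simplified, illustrative sketch of the region growth idea. It uses a brute-force neighbour search and a fixed plane fitted from the seed for clarity; a practical implementation would use a k-d tree, point normals, and plane refitting as the region grows:

```python
import numpy as np

def grow_plane_region(points: np.ndarray, seed_idx: int, dist_t: float = 0.02,
                      neigh_r: float = 0.1) -> np.ndarray:
    """Grow a planar region from a seed point; returns the indices of the region.

    dist_t: maximum point-to-plane distance for a point to be accepted.
    neigh_r: neighbourhood radius used when looking for candidates near the region.
    Assumes the seed and its two nearest neighbours are not collinear.
    """
    # Fit an initial plane from the seed and its two nearest neighbours.
    d2 = np.sum((points - points[seed_idx]) ** 2, axis=1)
    region = list(np.argsort(d2)[:3])
    normal = np.cross(points[region[1]] - points[region[0]],
                      points[region[2]] - points[region[0]])
    normal /= np.linalg.norm(normal)
    d = -normal.dot(points[region[0]])

    in_region = np.zeros(len(points), dtype=bool)
    in_region[region] = True
    changed = True
    while changed:                      # keep growing until no point can be added
        changed = False
        for i in np.flatnonzero(~in_region):
            near = np.min(np.sum((points[in_region] - points[i]) ** 2, axis=1)) < neigh_r ** 2
            on_plane = abs(normal.dot(points[i]) + d) < dist_t
            if near and on_plane:
                in_region[i] = True
                changed = True
    return np.flatnonzero(in_region)
```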

6.2 Optical Flow

Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. It is believed that insects and birds frequently use optical flow for short-range navigation and obstacle avoidance. For example, biologists have reported that birds use optical flow to avoid obstacles and to manoeuvre during landing. In addition, many mammals possibly use optical flow to detect the motion of objects. All these discoveries regarding optical flow provide new ideas for roboticists developing vision-based robots with the capability to navigate safely and quickly in unknown environments.

Optical flow can be treated as the apparent motion of objects, brightness patterns, or feature points observed by eyes or cameras. Based on this definition, it can be computed from the difference between two consecutive frames in an image sequence, which is usually expressed as:

\(\left[ {\dot{u},\dot{v}} \right]^{T} = f\left( {u,v} \right)\), where the units are pixels per second (pix/s) or pixels per frame (pix/frame).

Optical flow can also be defined as the projection of the relative 3D motion between an observer and the scene onto the image plane. Since an image consists of many pixels with unique coordinates, optical flow can be described as a two-dimensional (2D) vector field over the image sequence. The motion field model can therefore be described as:

$${\text{OF}} = \frac{V}{d}$$

where OF is the optical flow field (normally expressed in rad/s or °/s), V is the observer’s velocity vector, and d is the distance between the observer and the observed scene. The two definitions mentioned above are equivalent in the ideal case after a coordinate transformation.
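As a quick worked example with assumed values: an observer translating at \(V = 1\,{\text{m/s}}\) past a surface \(d = 2\,{\text{m}}\) away experiences an optical flow of

$${\text{OF}} = \frac{{1\,{\text{m/s}}}}{{2\,{\text{m}}}} = 0.5\,{\text{rad/s}} \approx 28.6^\circ /{\text{s}}$$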

For a short duration, the intensity structures of local time-varying image regions are approximately constant. Based on this assumption, if \(I\left( {x,t} \right)\) is the image intensity function, we have:

$$I\left( {\mathbf{x},t} \right) = I\left( {\mathbf{x} + \delta_{x} ,t + \delta_{t} } \right),$$

where \(\delta_{x}\) is the displacement of the local image region at \(\left( {\mathbf{x},t} \right)\) after time \(\delta_{t}\). Expanding this equation in a Taylor series yields:

$$I\left( {\mathbf{x} + \delta_{x} ,t + \delta_{t} } \right) = I\left( {\mathbf{x},t} \right) + \nabla I \cdot \delta_{x} + \delta_{t} I_{t} + O^{2}$$

where \(\nabla I = \left( {I_{x} ,I_{y} } \right)\) and \(I_{t}\) are the first-order partial derivatives of \(I\left( {\mathbf{x},t} \right)\), and \(O^{2}\) denotes the second- and higher-order terms, which are negligible. Subtracting \(I\left( {\mathbf{x},t} \right)\) from both sides, dividing by \(\delta_{t}\), and ignoring \(O^{2}\), the previous equation can be rewritten as:

$$\nabla I \cdot V + I_{t} = 0$$

where \(\nabla I = \left( {I_{x} ,I_{y} } \right)\) is the spatial intensity gradient and \(V = \left( {u,v} \right)\) is the image velocity. This is known as the optical flow constraint equation, which defines a single local constraint on image motion (Fig. 7.14).

Fig. 7.14

Detected optical flow indicated by red arrows; the longer the arrow, the faster the movement of the pixel patch (translation on the left, rotation on the right)

Many methods have been proposed for detecting the optical flow. Some techniques are briefly discussed next.

6.2.1 Lucas–Kanade Method and Horn–Schunck Method

The Lucas–Kanade method (Lucas & Kanade, 1981) and the Horn–Schunck method (Horn & Schunck, 1981) are widely used classical differential methods for optical flow estimation. The Lucas–Kanade method assumes that the flow is constant in a local neighbourhood of the pixel under consideration and solves the basic optical flow equation for all the pixels in that neighbourhood using the least-squares criterion. By combining information from several nearby pixels, the Lucas–Kanade method can often resolve the inherent ambiguity of the optical flow equation. It is also less sensitive to image noise than other methods.

The Horn–Schunck method is another classical optical flow estimation algorithm. It assumes smoothness in the flow over the whole (global) image; thus, it tries to minimise distortions in the flow and prefers solutions that are smoother. As a result, it is more sensitive to noise than the Lucas–Kanade method. Many current optical flow algorithms are built upon these two frameworks.
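As an illustration, the sketch below uses OpenCV’s pyramidal Lucas–Kanade implementation to track corner features between two consecutive frames (the file names are placeholders, and error handling is omitted):

```python
import cv2
import numpy as np

# Load two consecutive greyscale frames (placeholder file names).
prev_img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
next_img = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect good features (corners) to track in the first frame.
prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=200,
                                   qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade: estimate where each feature moved in the next frame.
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_img, next_img, prev_pts, None,
                                                 winSize=(21, 21), maxLevel=3)

# Keep only successfully tracked points; their displacement is the optical flow.
flow = (next_pts - prev_pts)[status.ravel() == 1]
print("mean flow (pixels/frame):", flow.reshape(-1, 2).mean(axis=0))
```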

6.2.2 Energy-Based Methods

Energy-based optical flow methods are also called frequency-based methods because they use the energy output of velocity-tuned filters. Under certain conditions, these methods can be shown to be mathematically equivalent to the differential methods mentioned previously. However, energy-based methods handle sparse patterns of moving dots better than differential and correlation methods do.

6.2.3 Phase-Based Methods

Phase-based techniques are classical methods that calculate the optical flow using the phase behaviour of band-pass filter outputs. First introduced by Fleet and Jepson (1990), they have been shown to be more accurate than other local methods, mainly because phase information is robust to changes in contrast, scale, orientation, and speed (Fleet & Jepson, 1990). However, the main drawback of phase-based techniques is the high computational load associated with their filtering operations.

6.2.4 Correlation-Based Methods

Correlation-based methods find matching image patches by maximising some similarity measure between them under the assumption that the image patches have not been overly distorted over a local region. Such methods may work in cases of high noise and low temporal support where numerical differentiation methods are not as practical. These methods are typically used for finding stereo matches for the task of recovering depth.

6.3 Visual Odometry

Visual odometry (VO) is a method for estimating the position and orientation of a mobile robot, such as a ground robot or flying platform, using input from one or more cameras attached to it (Scaramuzza & Fraundorfer, 2011). It estimates the position by integrating the displacements obtained from consecutive images observed by onboard vision systems. It is vital in environments in which GPS is not available for absolute positioning (Weiss et al., 2011).

Many conventional odometry solutions produce unpredictable errors in the measurements delivered by gyroscopes, accelerometers, and wheel encoders. It has been found that, for the Mars Exploration Rovers experiencing small translations over sandy ground, large rocks, or steep slopes, the visual odometry needs to be corrected for errors arising from motion and wheel slip (Maimone et al., 2007). A vehicle’s position can be estimated by either stereo or monocular cameras using feature matching or tracking techniques. In Garratt and Chahl (2008), the translation and rotation are estimated using an image interpolation algorithm with a downward-facing camera. Methods for computing ego-motion directly from image intensities have also been suggested (Hanna, 1991; Heeger & Jepson, 1992). The issue with using just one camera is that only the direction of motion, not the absolute scale of the velocity, can be determined; this is known as the scale factor problem. Using an omnidirectional camera can address this problem; for example, safe corridor navigation for a micro air vehicle (MAV) using an optical flow method is achieved in Conroy et al. (2009), but this operation requires a great deal of computational time.
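A minimal sketch of a single step of feature-based monocular VO using OpenCV is shown below. The camera matrix and image file names are placeholder assumptions and, as noted above, the recovered translation is only known up to scale:

```python
import cv2
import numpy as np

# Placeholder camera intrinsics and two consecutive frames.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect and match ORB features between the two frames.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate the relative camera motion (rotation R, unit-length translation t).
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print("Rotation:\n", R, "\nTranslation direction (up to scale):\n", t.ravel())
```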

7 Review Questions

  • What is the difference between an AC motor and a DC motor?

  • What is the difference between a camera and an RGB-D sensor?

  • A typical rotary encoder used in a wheeled mobile robot to measure the distance it travels has 40 slots. The robot’s wheel to which this sensor is mounted has a diameter of 7 cm. If the sensor gives out a steady 7 Hz square pulse, what is the robot’s speed in cm/s?

8 Further Reading

Although a little dated, Sensors for Mobile Robots by Everett and Robot Sensors and Transducers by Ruocco provide comprehensive coverage of classical sensors used in robotics. Computer Vision: Algorithms and Applications by Szeliski is an excellent introductory book on computer vision in general. For more robotics-related concepts in computer vision, as well as for those interested in more advanced topics in robotics, Corke’s Robotics, Vision and Control is highly recommended; the book includes many code samples and associated toolboxes in Matlab®. Programming Computer Vision with Python: Tools and algorithms for analysing images by Solem provides many Python-based examples of vision algorithm implementations. Algorithms by Sedgewick and Wayne is one of the best books on the topic.