Binocular Stereo Vision

A chapter in the book 3-D Computer Vision

Abstract

The human visual system is a natural stereoscopic vision system that acquires 3-D information through binocular imaging.

In computer vision, stereo vision studies how to use (multi-image) imaging technology to obtain the distance (depth) information of objects in a scene from multiple images. This chapter introduces the workflow of stereo vision and analyzes, one by one, the six functional modules involved in the process. It then discusses region-based matching of binocular images: the principle of template (mask) matching is introduced first, followed by a detailed analysis of the various constraints used in stereo matching. Next, feature-based matching of binocular images is discussed: after the basic steps and methods are presented, the widely used Scale-Invariant Feature Transform (SIFT) is described in detail, and dynamic programming based on the ordering constraint is also covered. Finally, the chapter introduces a method for detecting and correcting errors in the parallax (disparity) map, which is notable for being both general and fast.
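
The six modules named in this workflow, and exercised in the self-test questions below, form a processing chain: camera calibration, image acquisition, feature extraction, stereo matching, 3-D information recovery, and post-processing. The following Python stub is a minimal sketch of that chain; every function name and placeholder value is hypothetical, chosen only to show the order of the modules and the data flow between them.

```python
# Hypothetical stubs: only the order and data flow follow the chapter's workflow.

def camera_calibration():
    """Module 1: relate image coordinates to space coordinates."""
    return {"intrinsics": None, "extrinsics": None}

def image_acquisition(params):
    """Module 2: capture the binocular image pair."""
    return None, None

def feature_extraction(image):
    """Module 3: extract the pixel sets/features used for matching."""
    return []

def stereo_matching(feats_left, feats_right):
    """Module 4: establish correspondences, yielding a parallax (disparity) map."""
    return {}

def recover_3d_information(disparity, params):
    """Module 5: turn disparity into distance (depth) via the imaging geometry."""
    return {}

def post_processing(depth):
    """Module 6: interpolate depth and detect/correct remaining errors."""
    return depth

def stereo_vision(scene=None):
    params = camera_calibration()
    left, right = image_acquisition(params)
    disparity = stereo_matching(feature_extraction(left), feature_extraction(right))
    return post_processing(recover_3d_information(disparity, params))
```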




Self-Test Questions

The following questions include both single-choice and multiple-choice questions, so every option must be judged individually.

6.1 Stereo Vision Process and Modules

6.1.1 In the stereo vision process shown in Fig. 6.1, (·).

(a) Image acquisition should be carried out on the basis of camera calibration.
(b) The function of the feature extraction module is to extract the features of the pixel sets used for matching.
(c) The depth interpolation in post-processing is intended to help stereo matching.
(d) Post-processing is needed because the 3-D information obtained is often incomplete or contains certain errors.

[Hint] Consider the sequence of the stereo vision process.

6.1.2 Consider the various modules in the stereo vision process given in Fig. 6.1, (·).

(a) The stereo matching module is used only when 3-D imaging can be performed directly.
(b) The feature extraction module can directly extract the gray values of pixel sets as features.
(c) The image acquisition module can directly acquire 3-D images to achieve 3-D information recovery.
(d) The function of the 3-D information recovery module is to establish the relationship between the image points of the same space point in different images.

[Hint] Consider the respective functions of the modules and the connections between them.

6.1.3 Which of the following description(s) is/are incorrect? (·).

(a) Although the positioning accuracy of large-scale features is poor, they contain a large amount of information and can be matched quickly.
(b) If only a single camera is used for image acquisition, there is no need for calibration.
(c) The gray values of pixels in a small region are closely correlated, so such regions are suitable for grayscale correlation matching.
(d) If the camera baseline is relatively short, the difference between the captured images will be relatively large.

[Hint] Analyze the meaning of each description carefully.

6.2 Region-Based Stereo Matching

6.2.1 In template matching, (·).

(a) The template used must be square.
(b) The size of the template used must be smaller than the size of the image to be matched.
(c) The matching positions determined by the correlation function and by the minimum mean square error function are consistent.
(d) The matching position calculated with the correlation coefficient does not change with the gray values of the template and the matched image.

[Hint] Matching consists in determining the most correlated position.
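
As a concrete illustration of the region-based matching this question targets, below is a minimal NumPy sketch of template matching with the zero-mean normalized cross-correlation coefficient. The score is invariant to linear changes of gray level, which is the property behind option (d); the function names are ours, not the book's.

```python
import numpy as np

def ncc(window, template):
    """Zero-mean normalized cross-correlation of two equal-size gray windows."""
    w = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((w * w).sum() * (t * t).sum())
    return (w * t).sum() / denom if denom > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image; return the best offset and its score."""
    th, tw = template.shape
    ih, iw = image.shape
    best_score, best_pos = -1.0, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            score = ncc(image[y:y + th, x:x + tw], template)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```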

6.2.2 Among the various constraints used for matching, (·).

(a) The epipolar line constraint restricts the position of the pixel.
(b) The uniqueness constraint restricts the attributes of pixels.
(c) The continuity constraint restricts the position of pixels.
(d) The compatibility constraint restricts the attributes of pixels.

[Hint] The attribute of the pixel corresponds to f, while the position corresponds to (x, y).

6.2.3 Among the following descriptions of the epipolar line constraint, (·).

(a) The epipolar line constraint can help reduce the amount of calculation in the matching search process by half.
(b) The epipolar line in one imaging plane and the epipole in the other imaging plane correspond to each other.
(c) The epipolar line pattern can provide information about the relative position and orientation of the two cameras.
(d) For any point on imaging plane 1, all points corresponding to it on imaging plane 2 lie on the same straight line.

[Hint] Refer to Example 6.2–Example 6.4.
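
Algebraically, the epipolar line constraint says that a correspondence x1 ↔ x2 satisfies x2ᵀ F x1 = 0, so all candidate matches of x1 lie on the line F x1 in the second image, reducing the matching search from 2-D to 1-D. A minimal NumPy sketch (helper names are ours):

```python
import numpy as np

def epipolar_line(F, x1):
    """Line coefficients (a, b, c) of a*u + b*v + c = 0 in image 2 on which
    every match of point x1 = (u1, v1) from image 1 must lie."""
    return F @ np.array([x1[0], x1[1], 1.0])

def epipolar_residual(F, x1, x2):
    """x2^T F x1; (near) zero iff the pair satisfies the epipolar constraint."""
    h1 = np.array([x1[0], x1[1], 1.0])
    h2 = np.array([x2[0], x2[1], 1.0])
    return float(h2 @ F @ h1)
```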

6.2.4 Comparing the essential matrix and the fundamental matrix, (·).

(a) The essential matrix has more degrees of freedom than the fundamental matrix.
(b) The role or function of the fundamental matrix and that of the essential matrix are similar.
(c) The essential matrix is derived for uncalibrated cameras.
(d) The fundamental matrix reflects the relationship between the projection coordinates of the same space point on the two images.

[Hint] Consider the different conditions under which the two matrices are derived.
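
A small sketch of the relationship the hint points to: the fundamental matrix F is defined for uncalibrated cameras (7 degrees of freedom), while the essential matrix E assumes known intrinsics (5 degrees of freedom); with intrinsic matrices K1 and K2, the two are related by E = K2ᵀ F K1. The helper names below are ours:

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """E = K2^T @ F @ K1: once the intrinsics are known, the fundamental
    matrix of the uncalibrated pair becomes the essential matrix."""
    return K2.T @ F @ K1

def project_to_essential(E):
    """A valid essential matrix has two equal singular values and one zero
    singular value; enforce this via SVD."""
    U, s, Vt = np.linalg.svd(E)
    sigma = (s[0] + s[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt
```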

6.3 Feature-Based Stereo Matching

6.3.1 For feature-based stereo matching techniques, (·).

(a) They are not very sensitive to the surface structure of the scene or to light reflection.
(b) The feature point pairs used are points determined according to local properties of the image.
(c) Every point in the stereo image pair can be used in turn as a feature point for matching.
(d) The matching result is not yet a dense parallax field.

[Hint] Consider the particularity of features.

6.3.2 The Scale-Invariant Feature Transform (SIFT) (·).

(a) Needs to use a multi-scale representation of the image.
(b) Needs to search for extreme values in a 3-D space.
(c) Here, the 3-D space includes position, scale, and direction.
(d) The difference-of-Gaussian operator it uses is a smoothing operator.

[Hint] Analyze the meaning of each computation step of the scale-invariant feature transform.
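
For reference, SIFT detection and matching with Lowe's ratio test can be run directly with OpenCV (opencv-python 4.4 or later, after the SIFT patent expired); this is a usage sketch, not the chapter's implementation:

```python
import cv2

def sift_match(img_left, img_right, ratio=0.75):
    """Detect SIFT keypoints in two grayscale images and keep the matches
    that pass Lowe's ratio test; returns matched point-coordinate pairs."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_left, None)
    kp2, des2 = sift.detectAndCompute(img_right, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < ratio * n.distance:  # ratio test rejects ambiguous matches
            good.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
    return good
```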

6.3.3 For the ordering constraint, (·).

(a) It indicates that feature points on the visible surface of an object are in the same order as their projection points on the two images.
(b) It can be used to design a stereo matching algorithm based on dynamic programming.
(c) It may not hold when there is occlusion between objects.
(d) When the graph representation used in the dynamic programming method is constructed, the interval between some feature points may degenerate into a single point, and the order determined by the constraint becomes invalid.

[Hint] Analyze the conditions under which the ordering constraint holds.
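
A minimal sketch of why the ordering constraint enables dynamic programming: if matched pixels appear in the same left-to-right order on both scanlines, then matching a scanline pair is an alignment problem, and each pixel is either matched in order or marked occluded. The cost model below (squared gray-level difference plus a fixed occlusion penalty) is an assumption for illustration, not the book's algorithm:

```python
import numpy as np

def dp_scanline_match(left_row, right_row, occlusion_cost=10.0):
    """Align two scanlines by dynamic programming under the ordering
    constraint; returns the matched column pairs (i, j) in order."""
    n, m = len(left_row), len(right_row)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, :] = occlusion_cost * np.arange(m + 1)   # leading occlusions
    cost[:, 0] = occlusion_cost * np.arange(n + 1)
    back = np.zeros((n + 1, m + 1), dtype=int)       # 0=match, 1=skip left, 2=skip right
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (float(left_row[i - 1]) - float(right_row[j - 1])) ** 2
            choices = (cost[i - 1, j - 1] + d,            # match pixel i-1 with j-1
                       cost[i - 1, j] + occlusion_cost,   # left pixel occluded
                       cost[i, j - 1] + occlusion_cost)   # right pixel occluded
            back[i, j] = int(np.argmin(choices))
            cost[i, j] = choices[back[i, j]]
    matches, i, j = [], n, m                         # backtrack the optimal path
    while i > 0 and j > 0:
        if back[i, j] == 0:
            matches.append((i - 1, j - 1)); i -= 1; j -= 1
        elif back[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return matches[::-1]
```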

6.4 Error Detection and Correction of the Parallax Map

6.4.1 In the method for parallax map error detection and correction, (·).

(a) Only regions where the crossing number is not zero need to be considered.
(b) The crossing number in a region is proportional to the size of the region.
(c) To calculate the total crossing number, two summations are performed.
(d) The crossing number in a region is proportional to the length of the region.

[Hint] Consider the definitions of, and the connection between, the crossing number and the total crossing number.
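
The following is a hedged reconstruction of the crossing-number idea, not the book's exact formulation: on one scanline, two matched points "cross" when their matches violate the ordering constraint; each point's crossing number counts how many points it crosses, and the total crossing number Ntc sums these over all points (hence the two summations the question asks about):

```python
def crossing_numbers(d):
    """d[x] is the parallax at column x of the right image on one scanline,
    so x matches column x + d[x] in the left image. Count, for each x, how
    many other points' matches cross its own."""
    n = len(d)
    c = [0] * n
    for x1 in range(n):
        for x2 in range(x1 + 1, n):
            # x1 < x2 but their matches violate the ordering constraint: a crossing.
            if x1 + d[x1] >= x2 + d[x2]:
                c[x1] += 1
                c[x2] += 1
    return c

def total_crossing_number(d):
    """Ntc: the per-point crossing numbers summed over the scanline."""
    return sum(crossing_numbers(d))
```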

6.4.2 Analyze the following statements; which is/are correct? (·).

(a) In a crossing region, the crossing values of adjacent points differ by 1.
(b) The zero-crossing correction algorithm must make Ntc = 0; hence its name.
(c) The sequential matching constraint refers to the ordering constraint, so it indicates that the order of the space points is the reverse of the order of their imaging points.
(d) The zero-crossing correction algorithm is iterative; after each iteration, the total crossing number always decreases.

[Hint] Analyze the meaning of each step of the zero-crossing correction algorithm.

6.4.3 In Example 6.8, Ntc = 28 before correction. Find a new matching point fL(187, j) that corresponds to fR(160, j) and can reduce Ntc; it corrects the parallax value d(160, j) corresponding to fR(160, j) to d(160, j) = X[fL(187, j)] − X[fR(160, j)] = 27. At this time, (·).

(a) Ntc = 16
(b) Ntc = 20
(c) Ntc = 24
(d) Ntc = 28

[Hint] The crossing number on the left side of the correction point fR(160, j) will decrease, but on the right side it may increase; specific calculations are needed.

6.4.4 On the basis of 6.4.3, find the point fR(161, j) with the largest crossing number, and determine the new matching point corresponding to fR(161, j) that can reduce Ntc. This correction can make the total crossing number Ntc drop to (·).

(a) 20
(b) 15
(c) 10
(d) 5

[Hint] The new matching point corresponding to fR(161, j) that can reduce Ntc is fL(188, j).


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Zhang, YJ. (2023). Binocular Stereo Vision. In: 3-D Computer Vision. Springer, Singapore. https://doi.org/10.1007/978-981-19-7580-6_6

  • DOI: https://doi.org/10.1007/978-981-19-7580-6_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7579-0

  • Online ISBN: 978-981-19-7580-6
