An integrated model of autonomous topological spatial cognition

Abstract

This paper focuses on endowing a mobile robot with topological spatial cognition. We propose an integrated model in which a ‘place’ is defined as a collection of appearances or locations sharing common perceptual signatures or physical boundaries. In this model, as the robot navigates, places are detected systematically by monitoring the coherency of the incoming visual data while pruning out uninformative or scanty data. Detected places are then either recognized or learned, along with mapping as necessary. The novelties of the model are twofold. First, it explicitly incorporates a long-term spatial memory in which the knowledge of learned places and their spatial relations is retained in place and map memories, respectively. Second, the processing modules operate together so that the robot builds its spatial memory in an organized, incremental and unsupervised manner. Thus, the robot’s long-term spatial memory evolves completely on its own, while the learned knowledge is organized by appearance-related similarities in a manner amenable to higher-level semantic reasoning. As such, the proposed model constitutes a step towards robots that are capable of interacting with their environments autonomously.

Notes

  1. Note that this definition differs from appearance-based or topological SLAM methods in which each location is considered separately as a place (Cummins and Newman 2008; Newman et al. 2009a; Konolige et al. 2010) or a representative location (key-place) is selected after grouping visual data from different locations (Murphy and Sibley 2014). In the former, key-frames do not necessarily represent distinct ‘places’ since their selection is typically arbitrary, while in the latter, key-places may not encode all the place-related knowledge since they are defined by the midpoint frames of the associated clusters.

  2. As there are no externally provided labels expressed in natural language, such as “kitchen” or “Saar building”, such explicit label assignments cannot be expected (Walter et al. 2014).

  3. For the interested reader, they are explained briefly in Appendix “Bubble space”.

  4. Note that in related work (Murphy and Sibley 2014), 28 key-places are detected with the same dataset.

References

  • Beeson, P., Modayil, J., & Kuipers, B. (2010). Factoring the mapping problem: Mobile robot map-building in the hybrid spatial semantic hierarchy. The International Journal of Robotics Research, 29(4), 428–459.

  • Casati, R. (2002). Topology and cognition. New York: Macmillan.

  • Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27.

  • Chella, A., Macaluso, I., & Riano, L. (2007). Automatic place detection and localization in autonomous robotics. In International conference on intelligent robots and systems (pp. 741–746).

  • Cummins, M., & Newman, P. (2008). FAB-MAP: Probabilistic localization and mapping in the space of appearance. International Journal of Robotics Research, 27, 647–665.

  • Cummins, M., & Newman, P. (2011). Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research, 30(9), 1100–1123.

  • Denis, M., & Loomis, J. M. (2007). Perspectives on human spatial cognition: Memory, navigation, and environmental learning. Psychological Research, 71(3), 235–239.

  • Dolins, F. L., & Mitchell, R. W. (2010). Linking spatial perception and spatial cognition. Cambridge: Cambridge University Press.

  • Erkent, O., & Bozma, H. I. (2012). Place representation in topological maps based on bubble space. In Proceedings of international conference on robotics and automation (pp. 3497–3502).

  • Erkent, O., & Bozma, H. I. (2013). Bubble space and place representation in topological maps. The International Journal of Robotics Research, 32(6), 671–688.

  • Erkent, O., & Bozma, H. I. (2015). Long-term topological place learning. In IEEE international conference on robotics and automation.

  • Galindo, C., Saffiotti, A., Coradeschi, S., Buschka, P., Fernandez-Madrigal, J., & Gonzalez, J. (2005). Multi-hierarchical semantic maps for mobile robotics. In IEEE/RSJ international conference on intelligent robots and systems, 2005. (IROS 2005) (pp. 2278–2283).

  • Glover, A., Maddern, W., Warren, M., Reid, S., Milford, M., & Wyeth, G. (2012). OpenFABMAP: An open source toolbox for appearance-based loop closure detection. In IEEE international conference on robotics and automation (ICRA), 2012 (pp. 4730–4735). IEEE.

  • Ho, K. L., & Newman, P. (2007). Detecting loop closure with scene sequences. International Journal of Computer Vision, 74(3), 261–286.

  • Karaoguz, H., & Bozma, H. I. (2014). Reliable topological place detection in bubble space. In Proceedings of international conference on robotics and automation (pp. 697–702).

  • Karaoguz, H., & Bozma, H. I. (2015). Topological place recognition based on long-term memory retrieval. In Proceedings of international conference on advanced robotics (ICAR), 2015 (pp. 218–223).

  • Konolige, K., Bowman, J., Chen, J., Mihelich, P., Calonder, M., Lepetit, V., et al. (2010). View-based maps. The International Journal of Robotics Research, 29(8), 941–957.

  • Kuipers, B. (2000). The spatial semantic hierarchy. Artificial Intelligence, 119(1–2), 191–233.

  • Lim, J., Frahm, J. M., & Pollefeys, M. (2012). Online environment mapping using metric-topological maps. International Journal of Robotics Research, 31(12), 1394–1408.

  • Liu, M., & Siegwart, R. (2012). DP-FACT: Towards topological mapping and scene recognition with color for omnidirectional camera. In Proceedings of international conference on robotics and automation (pp. 3503–3508).

  • Martinez-Gomez, J., & Caputo, B. (2011). Towards semi-supervised learning of semantic spatial concepts. In Proceedings of international conference on robotics and automation (pp. 1936–1943).

  • Mozos, O. M., Jensfelt, P., Zender, H., Kruijff, G. J. M., & Burgard, W. (2007a). From labels to semantics: An integrated system for conceptual spatial representations of indoor environments for mobile robots. In Proceedings of IEEE/RSJ IROS workshop: From sensors to human spatial concepts.

  • Mozos, O. M., Triebel, R., Jensfelt, P., Rottmann, A., & Burgard, W. (2007b). Supervised semantic labeling of places using information extracted from laser and vision sensor data. Robotics and Autonomous Systems, 55(5), 391–402.

  • Murphy, L., & Sibley, G. (2014). Incremental unsupervised topological place discovery. In Proceedings of international conference on robotics and automation (pp. 1312–1318).

  • Newman, P., Sibley, G., Smith, M., Cummins, M., Harrison, A., Mei, C., et al. (2009a). Navigating, recognizing and describing urban spaces with vision and lasers. The International Journal of Robotics Research, 28(11–12), 1406–1433.

  • Posner, I., Schroeter, D., & Newman, P. (2008). Using scene similarity for place labelling. In O. Khatib, V. Kumar, & D. Rus (Eds.), Experimental robotics, springer tracts in advanced robotics (Vol. 39, pp. 85–98). Berlin/Heidelberg: Springer.

  • Pronobis, A., & Caputo, B. (2009). COLD: COsy localization database. International Journal of Robotics Research, 28(5), 588–594.

  • Pronobis, A., & Jensfelt, P. (2012). Large-scale semantic mapping and reasoning with heterogeneous modalities. In Proceedings of international conference on robotics and automation (pp. 3515–3522).

  • Pronobis, A., Sjöö, K., Aydemir, A., Bishop, A.N., & Jensfelt, P. (2010). Representing spatial knowledge in mobile cognitive systems. In 11th international conference on intelligent autonomous systems (IAS-11). Ottawa.

  • Ranganathan, A. (2010). PLISS: Detecting and labeling places using online change-point detection. In Proceedings of robotics: Science and systems.

  • Ranganathan, A. (2012). PLISS: Labeling places using online changepoint detection. Autonomous Robots, 32(4), 351–368.

  • Remolina, E., & Kuipers, B. (2004). Towards a general theory of topological maps. Artificial Intelligence, 152(1), 47–104.

  • Robert, L. (1997). Spatial cognition: Geographic environments. Berlin: Kluwer Academic Publishers.

  • Shi, L., Kodagoda, S., & Dissanayake, G. (2012). Application of semi-supervised learning with voronoi graph for place classification. In Proceedings of international conference on intelligent robots and systems (pp. 2991–2996).

  • Sibson, R. (1973). SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), 30–34.

  • Smith, M., Baldwin, I., Churchill, W., Paul, R., & Newman, P. (2009). The new college vision and laser data set. International Journal of Robotics Research, 28(5), 595–599.

  • Tapus, A., & Siegwart, R. (2005). Incremental robot mapping with fingerprints of places. In Proceedings of international conference on intelligent robots and systems (pp. 2429–2434). IEEE.

  • Tversky, B. (1993). Cognitive maps, cognitive collages, and spatial mental models. In Spatial information theory: A theoretical basis for GIS (pp. 14–24). Springer.

  • Tversky, B. (2005). Functional significance of visuospatial representations. In P. Shah & A. Miyake (Eds.), Handbook of higher-level visuospatial thinking (pp. 1–34). Cambridge: Cambridge University Press.

  • Tversky, B., & Hemenway, K. (1983). Categories of scenes. Cognitive Psychology, 15, 121–149.

  • Ursic, P., Kristan, M., Skocaj, D., & Leonardis, A. (2012). Room classification using a hierarchical representation of space. In Proceedings of international conference on intelligent robots and systems (pp. 1371–1378).

  • Vasudevan, S., & Siegwart, R. (2008). Bayesian space conceptualization and place classification for semantic maps in mobile robotics. Robotics and Autonomous Systems, 56(6), 522–537.

  • Vasudevan, S., Gachter, S., Nguyen, V., & Siegwart, R. (2007). Cognitive maps for mobile robots-an object based approach. Robotics and Autonomous Systems, 55(5), 359–371.

  • Walter, M. R., Hemachandra, S., Homberg, B., Tellex, S., & Teller, S. (2014). A framework for learning semantic maps from grounded natural language descriptions. The International Journal of Robotics Research, 33(9), 1167–1190.

  • Williams, B., Cummins, M., Neira, J., Newman, P., Reid, I., & Tardos, J. (2009). A comparison of loop closing techniques in monocular SLAM. Robotics and Autonomous Systems, 57(12), 1188–1197.

  • Yeh, T., & Darrell, T. (2008). Dynamic visual category learning. In Proceedings of computer vision and pattern recognition (pp. 1–8).

  • Zivkovic, Z., Bakker, B., & Krose, B. (2005). Hierarchical map building using visual landmarks and geometric constraints. In Proceedings of international conference on intelligent robots and systems (pp. 2480–2485).

  • Zivkovic, Z., Booij, O., & Kröse, B. (2007). From images to rooms. Robotics and Autonomous Systems, 55(5), 411–418.

Acknowledgments

This work has been supported in part by Bogazici University BAP Project 9164 and Tubitak Project EEAG 111E285. The first author is supported by Turkish State Planning Organization (DPT) under the TAM Project number 2007K120610.

Author information

Correspondence to Hakan Karaoğuz.

Appendices

Appendix

For convenience, Table 5 summarizes the most commonly used symbols in the paper, their definitions and the sections where they are defined.

Table 5 List of symbols
Fig. 15 Representation of visual data from sample bases in the Fr, Sa, Lj and NC sites. a Visual data from sample bases in the Fr, Lj, Sa and NC sites. b Corresponding bubble surfaces for each of the (color, Cartesian, non-Cartesian and intensity) features

Bubble space

This section presents a brief summary of the bubble space representation for completeness. The interested reader is referred to Erkent and Bozma (2013) for further details. The bubble space \({\mathcal {B}} = {\mathcal {X}} \times {\mathcal {F}}\) is an abstract representation of the robot’s base along with its viewing directions (pan and tilt) \({\mathcal {F}} \subset S^2\), with \(b \in {\mathcal {B}}\) defined as \(b = \left[ x \, f \right] ^T\) where \(x \in {\mathcal {X}}\) and \(f \in {\mathcal {F}}\). Bubble surfaces \(B_i(x,t): Im(h(x)) \times R^{\ge 0} \rightarrow R^{\ge 0}\) are hypothetical spherical surfaces surrounding the robot, defined as:

$$\begin{aligned} B_i(x,t) = \left\{ \left[ \begin{array}{l} f \\ \rho _i(b,t) \end{array} \right] \mid \forall f \in {\mathcal {F}} \,\, \text{ and } b=\left[ x \,f\right] ^T \right\} \end{aligned}$$
(12)

where the image of a section h—namely Im(h(x))—is the set of viewing directions from a given base x with the section \(h : {\mathcal {X}} \rightarrow {\mathcal {B}}\) defined as a continuous map such that \(\forall x \in {\mathcal {X}}\), \(\pi (h(x))=x\) and \(\pi : {\mathcal {B}} \rightarrow {\mathcal {X}}\) defined as the projection of b onto \({\mathcal {X}}\) as \(\pi (b)=x\). Finally, the function \(\rho _i: {\mathcal {B}} \times R^{\ge 0} \rightarrow R^{\ge 0}\) is a Riemannian metric that encodes the observed values of \({ v }_i^{th}\) sensory feature. For simplification of notation, the second argument is omitted whenever time dependency is clear. Each bubble surface is initialized to be a \(S^2\) sphere with radius \(\rho _0 \in R^{\ge 0}\)—namely \(\rho _i(b,0)=\rho _0\). As the robot looks around, for each viewing direction \(f \in {\mathcal {F}}\), it computes each feature value \(q_{i}(b,t) \ge 0\). Next, each bubble surface \(B_i(x,t)\) is deformed at the viewing direction f by an amount that depends on the associated sensory feature value \(q_i(b,t)\) as:

$$\begin{aligned} \rho _i\big (b,t^+\big ) = q_i\big (b,t\big ) \end{aligned}$$
(13)

where the superscript \(t^+\) denotes time just after t. As this is done for each feature \({ v }_i \in {\mathcal {V}}\) where \(\left| {\mathcal {V}} \right| = N_v\), a set of \(N_v\) bubble surfaces is generated. In the experiments, the robot computes seven bubble surfaces corresponding to seven visual features (hue, Cartesian, non-Cartesian and intensity). For the sample scenes shown in Fig. 15a, the bubble surfaces are as shown in Fig. 15b. The intensity bubble surface is used for checking the reliability of sensory data in place detection.
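
The deformation rule of Eq. (13) lends itself to a simple grid-based implementation. The following Python sketch illustrates one possible discretization; the grid resolution, the function names and the use of NumPy are illustrative assumptions and not part of the original system.

```python
import numpy as np

def init_bubble_surface(n_pan=64, n_tilt=32, rho_0=1.0):
    """Hypothetical discretization: one bubble surface stored as an
    (n_pan x n_tilt) grid of radii, initialized to a sphere of radius rho_0."""
    return np.full((n_pan, n_tilt), rho_0)

def deform_bubble_surface(rho_i, pan_idx, tilt_idx, q_i):
    """Deform the surface at the current viewing direction f = (f1, f2),
    indexed by (pan_idx, tilt_idx), by the observed feature value q_i,
    following Eq. (13): rho_i(b, t+) = q_i(b, t)."""
    rho_i[pan_idx, tilt_idx] = q_i
    return rho_i
```

In such a sketch, one grid would be maintained per feature \({v}_i \in {\mathcal {V}}\), giving \(N_v\) surfaces per base.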

Bubble descriptors are holistic (vector) representations of bubble surfaces. They are constructed using the double Fourier series representation of bubble surfaces as:

$$\begin{aligned} \rho _i\big (b,t\big ) = \sum ^{H_1}_{h_1=0} \sum ^{H_2}_{h_2=0}\lambda _{h_1h_2} z_{xi,h_1h_2}^T (t) e_{h_1h_2}(f) \end{aligned}$$

If \(f \in {\mathcal {F}}\) is defined as \(f =\left[ f_1 \, f_2\right] ^T\), for each \((h_1,h_2)\), the vector \(e_{h_1h_2}(f) \in R^4\) consists of an orthonormal set of trigonometric basis functions as:

$$\begin{aligned} e_{h_1h_2}(f) = \left[ \begin{array}{l} cos\left( h_1 f_1\right) cos\left( h_2 f_2\right) \\ sin\left( h_1 f_1\right) cos\left( h_2 f_2\right) \\ cos\left( h_1 f_1\right) sin\left( h_2 f_2\right) \\ sin\left( h_1 f_1\right) sin\left( h_2 f_2\right) \end{array}\right] \end{aligned}$$
(14)

The corresponding vector \(z_{xi,h_1h_2}(t) \in R^4 \) is defined as:

$$\begin{aligned} { z_{xi,h_1h_2}(t) = \frac{1}{\pi ^{2}} \left[ \begin{array}{l} \displaystyle \smallint _{0}^{2\pi }\smallint _{0}^{\pi } {\rho _i(b,t) cos(h_1 f_1)cos(h_2 f_2)df_1 df_2} \\ \displaystyle \smallint _{0}^{2\pi }\smallint _{0}^{\pi } {\rho _i(b,t) sin(h_1 f_1)cos(h_2 f_2)df_1 df_2} \\ \displaystyle \smallint _{0}^{2\pi }\smallint _{0}^{\pi } {\rho _i(b,t)cos(h_1 f_1)sin(h_2 f_2)df_1 df_2} \\ \displaystyle \smallint _{0}^{2\pi }\smallint _{0}^{\pi } {\rho _i(b,t) sin(h_1 f_1)sin(h_2 f_2) df_1 df_2} \end{array} \right] } \end{aligned}$$
(15)

The parameters \(\lambda _{h_1h_2}\) are defined as:

$$\begin{aligned} \lambda _{h_1h_2} = \left\{ \begin{array}{ll} \frac{1}{4} &{}\quad \text {if } h_1 = 0, h_2 = 0\\ \frac{1}{2} &{}\quad \text {if } h_1> 0, h_2 = 0 \text { or } h_1 = 0, h_2 > 0\\ 1 &{}\quad \text {if } h_1> 0, h_2 > 0 \end{array} \right. \end{aligned}$$
(16)

A bubble descriptor \(I(x,t) \in R^{N_I}\) is a \(N_I-\)dimensional vector with \(N_I = N_v(H_1+1)(H_2+1)\) defined as:

$$\begin{aligned} I(x,t) = \Big [I_{1,00}(x,t), \ldots , I_{N_v,H_1H_2}(x,t) \Big ]^T \end{aligned}$$
(17)

where

$$\begin{aligned} I_{i, h_1h_2}(x,t) = z_{xi,h_1h_2}^T(t) z_{xi,h_1h_2}(t) \end{aligned}$$
(18)

Bubble descriptors have been shown to be rotationally invariant with respect to heading changes while being computable in an incremental manner as new observations are made. Furthermore, they are flexible in integrating visual features since their dimensionality is independent of the number of observations. Finally, no data association (Williams et al. 2009) is required for finding correspondences among observations taken at different times.

In the experiments, the bubble descriptors are constructed using the first six features with the number of harmonics \(H_1=H_2=9\). The intensity bubble surface is not used, in order to reduce sensitivity to the illumination level. Thus, the length of each bubble descriptor is \(N_I=600\).
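
For concreteness, a minimal sketch of how a descriptor could be computed from one discretized bubble surface is given below, following Eqs. (15) and (18). The Riemann-sum integration on a regular grid and the function name are assumptions; the actual system may instead update the coefficients incrementally as new observations arrive.

```python
import numpy as np

def bubble_descriptor(rho, H1=9, H2=9):
    """Rotation-invariant descriptor of one bubble surface rho, stored as
    an (n_pan x n_tilt) grid over f1 in [0, 2*pi) and f2 in [0, pi)."""
    n1, n2 = rho.shape
    f1 = np.linspace(0.0, 2.0 * np.pi, n1, endpoint=False)
    f2 = np.linspace(0.0, np.pi, n2, endpoint=False)
    df1, df2 = 2.0 * np.pi / n1, np.pi / n2
    F1, F2 = np.meshgrid(f1, f2, indexing="ij")

    entries = []
    for h1 in range(H1 + 1):
        for h2 in range(H2 + 1):
            basis = np.stack([
                np.cos(h1 * F1) * np.cos(h2 * F2),
                np.sin(h1 * F1) * np.cos(h2 * F2),
                np.cos(h1 * F1) * np.sin(h2 * F2),
                np.sin(h1 * F1) * np.sin(h2 * F2),
            ])
            # Fourier coefficients z_{h1 h2} via Riemann-sum integration (Eq. 15)
            z = (basis * rho).sum(axis=(1, 2)) * df1 * df2 / np.pi ** 2
            # Descriptor entry: squared norm of z, invariant to heading (Eq. 18)
            entries.append(z @ z)
    return np.asarray(entries)  # (H1+1)*(H2+1) entries per feature
```

Stacking such vectors for the six features used in the experiments would give the \(N_I=600\)-dimensional descriptor of Eq. (17).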

Sensory data reliability

This section presents a brief discussion of how reliability is measured. Reliability depends on the informativeness, coherency and plenitude of the sensory data. Since sensory data are internally represented using descriptors, these properties can be measured by processing the descriptors appropriately; in our case, the bubble descriptors are used. The interested reader is referred to Karaoguz and Bozma (2014) for further details.

Informativeness measures whether incoming sensory data are semantically rich. For example, under low illumination, everything in the image looks dark. Similarly, if the robot’s field of view is filled by an extended object such as a door, there is again little variation in the visual or depth images. Both cases may be detected by computing the average deformation \(\mu _i(x_k)\) or the variance \(\sigma _i(x_k)\) of the associated (intensity or depth) bubble surfaces \(B_i(x_k,t_k)\):

$$\begin{aligned} \mu _i\big ({x_k}\big )= & {} \frac{1}{\pi ^{2}}\int _{0}^{2\pi }\int _{0}^{\pi } {\rho _i\big (b,t_k\big ) df_1 df_2} \\ \sigma _i\big ({x_k}\big )= & {} \int _{0}^{2\pi }\int _{0}^{\pi } \big ({\rho _i(b,t_k)} - \mu _i(x_k)\big )^2 df_1 df_2 \end{aligned}$$

Low values indicate minimal surface deformation, which implies that the data are not informative. Hence, the informativeness decision is based on a binary-valued function \(\varsigma : k\rightarrow \left\{ 0,1\right\} \):

$$\begin{aligned} \varsigma (x_k) =\left\{ \begin{array}{ll} 1 &{}\quad \mu _i({x_k}) \le \tau _{\eta } \\ 1 &{} \quad \sigma _i({x_k}) \le \tau _{\sigma } \\ 0 &{} \quad \text{ otherwise } \end{array} \right. \end{aligned}$$

where \(\tau _{\eta }\) and \(\tau _{\sigma }\) are a priori selected threshold parameters. Sensory data from a particular base point \(x_k\) are used if and only if \(\varsigma (x_k) = 0\). In this work, the bubble surface associated with the intensity feature (\(i=7\)) is used.
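
A minimal sketch of the informativeness test on a discretized intensity bubble surface is given below; the Riemann-sum approximation of the two integrals and the function name are assumptions made for illustration.

```python
import numpy as np

def is_uninformative(rho_intensity, tau_eta, tau_sigma):
    """varsigma(x_k): returns 1 when the intensity bubble surface shows
    too little deformation (low mean or low variance), 0 otherwise."""
    n1, n2 = rho_intensity.shape
    df1, df2 = 2.0 * np.pi / n1, np.pi / n2
    mu = rho_intensity.sum() * df1 * df2 / np.pi ** 2      # mean deformation
    sigma = ((rho_intensity - mu) ** 2).sum() * df1 * df2  # deformation variance
    return 1 if (mu <= tau_eta or sigma <= tau_sigma) else 0
```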

The coherency of data from two consecutive base points \(x_k\) and \(x_{k-1}\) is measured by comparing the similarity of their respective bubble descriptors \(I(x_{k})\) and \(I(x_{k-1})\) using the \(\chi ^2\)-distance. For example, in case of jerky robot head or body motion, sensory data from consecutive bases will be largely unrelated. Thus, the incoherency decision is based on a binary-valued function \(\kappa : k\rightarrow \left\{ 0,1\right\} \):

$$\begin{aligned} \kappa (x_k) = \left\{ \begin{array}{ll} 0 &{} \quad \left\| I(x_k),I(x_{k-1}) \right\| _{\chi ^2} \le \tau _\kappa \\ 1 &{}\quad \text{ otherwise } \end{array} \right. \end{aligned}$$

A \(\chi ^2\)-distance exceeding the incoherency threshold \(\tau _{\kappa }\), that is, a low similarity between consecutive descriptors, is an indicator of incoherency.
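
The sketch below illustrates the incoherency test. The exact form of the \(\chi ^2\)-distance is not specified here, so the symmetric variant used in the sketch is an assumption; only the comparison against \(\tau _\kappa \) follows the text.

```python
import numpy as np

def chi2_distance(d1, d2, eps=1e-12):
    """Symmetric chi-square distance between two bubble descriptors
    (assumed form; eps avoids division by zero)."""
    return 0.5 * np.sum((d1 - d2) ** 2 / (d1 + d2 + eps))

def is_incoherent(desc_curr, desc_prev, tau_kappa):
    """kappa(x_k): 0 when consecutive descriptors are close enough
    (coherent), 1 otherwise."""
    return 0 if chi2_distance(desc_curr, desc_prev) <= tau_kappa else 1
```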

Finally, the pool of data associated with each place should be sufficiently large. For example, sensory data from just a few base points, even if informative and coherent, will not in general be indicative of a particular place. The plenitude decision is based on the extent of the detected places \(D_m\): those with an extent less than the plenitude threshold \(\tau _p\) are considered to have an insufficient amount of data. The values of the informativeness thresholds \(\tau _{\eta }\) and \(\tau _{\sigma }\), the incoherency threshold \(\tau _{\kappa }\) and the plenitude threshold \(\tau _p\) all affect the place detection performance. In this work, they are adjusted manually based on the camera type and the nature of the incoming sensory data.
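
Finally, a sketch of the plenitude test; measuring the extent of a detected place by the number of contributing base points is an assumption made purely for illustration.

```python
def has_plenitude(place_bases, tau_p):
    """Plenitude check for a detected place D_m: keep the place only if
    its extent (here, the number of contributing base points) is at
    least the plenitude threshold tau_p."""
    return len(place_bases) >= tau_p
```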

Cite this article

Karaoğuz, H., Bozma, H.I. An integrated model of autonomous topological spatial cognition. Auton Robot 40, 1379–1402 (2016). https://doi.org/10.1007/s10514-015-9514-4

Keywords

Navigation