An integrated model of autonomous topological spatial cognition

Abstract

This paper focuses on endowing a mobile robot with topological spatial cognition. We propose an integrated model in which a ‘place’ is defined as a collection of appearances or locations sharing common perceptual signatures or physical boundaries. In this model, as the robot navigates, places are detected systematically by monitoring the coherency of the incoming visual data while pruning out uninformative or scanty data. Detected places are then either recognized or learned, along with mapping as necessary. The novelties of the model are twofold. First, it explicitly incorporates a long-term spatial memory in which the knowledge of learned places and their spatial relations is retained in place and map memories, respectively. Second, the processing modules operate together so that the robot builds its spatial memory in an organized, incremental and unsupervised manner. Thus, the robot’s long-term spatial memory evolves completely on its own, while the learned knowledge is organized by appearance-related similarities in a manner amenable to higher-level semantic reasoning. As such, the proposed model constitutes a step towards robots that are capable of interacting with their environments autonomously.

Notes

  1. Note that this definition differs from appearance-based or topological SLAM methods in which each location is considered separately as a place (Cummins and Newman 2008; Newman et al. 2009a; Konolige et al. 2010) or a representative location (key-place) is selected after grouping visual data from different locations (Murphy and Sibley 2014). In the former, key-frames do not necessarily represent distinct ‘places’ since their selection is typically arbitrary, while in the latter, key-places may not encode all the place-related knowledge since they are defined by the midpoint frames of the associated clusters.

  2. As there are no externally provided labels expressed in natural language, such as “kitchen” or “Saar building”, such explicit label assignments cannot be expected (Walter et al. 2014).

  3. For the interested reader, they are explained briefly in Appendix “Bubble space”.

  4. Note that in related work (Murphy and Sibley 2014), 28 key-places are detected with the same dataset.

References

  • Beeson, P., Modayil, J., & Kuipers, B. (2010). Factoring the mapping problem: Mobile robot map-building in the hybrid spatial semantic hierarchy. The International Journal of Robotics Research, 29(4), 428–459.

  • Casati, R. (2002). Topology and cognition. New York: Macmillan.

  • Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27.

  • Chella, A., Macaluso, I., & Riano, L. (2007). Automatic place detection and localization in autonomous robotics. In International conference on intelligent robots and systems (pp. 741–746).

  • Cummins, M., & Newman, P. (2008). FAB-MAP: Probabilistic localization and mapping in the space of appearance. International Journal of Robotics Research, 27, 647–665.

  • Cummins, M., & Newman, P. (2011). Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research, 30(9), 1100–1123.

  • Denis, M., & Loomis, J. M. (2007). Perspectives on human spatial cognition: Memory, navigation, and environmental learning. Psychological Research, 71(3), 235–239.

  • Dolins, F. L., & Mitchell, R. W. (2010). Linking spatial perception and spatial cognition. Cambridge: Cambridge University Press.

  • Erkent, O., & Bozma, H. I. (2012). Place representation in topological maps based on bubble space. In Proceedings of international conference on robotics and automation (pp. 3497–3502).

  • Erkent, O., & Bozma, H. I. (2013). Bubble space and place representation in topological maps. The International Journal of Robotics Research, 32(6), 671–688.

  • Erkent, O., & Bozma, H. I. (2015). Long-term topological place learning. In IEEE international conference on robotics and automation.

  • Galindo, C., Saffiotti, A., Coradeschi, S., Buschka, P., Fernandez-Madrigal, J., & Gonzalez, J. (2005). Multi-hierarchical semantic maps for mobile robotics. In IEEE/RSJ international conference on intelligent robots and systems, 2005. (IROS 2005) (pp. 2278–2283).

  • Glover, A., Maddern, W., Warren, M., Reid, S., Milford, M., & Wyeth, G. (2012). OpenFABMAP: An open source toolbox for appearance-based loop closure detection. In IEEE international conference on robotics and automation (ICRA), 2012 (pp. 4730–4735). IEEE.

  • Ho, K. L., & Newman, P. (2007). Detecting loop closure with scene sequences. International Journal of Computer Vision, 74(3), 261–286.

  • Karaoguz, H., & Bozma, H. I. (2014). Reliable topological place detection in bubble space. In Proceedings of international conference on robotics and automation (pp. 697–702).

  • Karaoguz, H., & Bozma, H. I. (2015). Topological place recognition based on long-term memory retrieval. In Proceedings of international conference on advanced robotics (ICAR), 2015 (pp. 218–223).

  • Konolige, K., Bowman, J., Chen, J., Mihelich, P., Calonder, M., Lepetit, V., et al. (2010). View-based maps. The International Journal of Robotics Research, 29(8), 941–957.

  • Kuipers, B. (2000). The spatial semantic hierarchy. Artificial Intelligence, 119(1–2), 191–233.

  • Lim, J., Frahm, J. M., & Pollefeys, M. (2012). Online environment mapping using metric-topological maps. International Journal of Robotics Research, 31(12), 1394–1408.

  • Liu, M., & Siegwart, R. (2012). DP-FACT: Towards topological mapping and scene recognition with color for omnidirectional camera. In Proceedings of international conference on robotics and automation (pp. 3503–3508).

  • Martinez-Gomez, J., & Caputo, B. (2011). Towards semi-supervised learning of semantic spatial concepts. In Proceedings of international conference on robotics and automation (pp. 1936–1943).

  • Mozos, O. M., Jensfelt, P., Zender, H., Kruijff, G. J. M., & Burgard, W. (2007a). From labels to semantics: An integrated system for conceptual spatial representations of indoor environments for mobile robots. In Proceedings of IEEE/RSJ IROS workshop: From sensors to human spatial concepts.

  • Mozos, O. M., Triebel, R., Jensfelt, P., Rottmann, A., & Burgard, W. (2007b). Supervised semantic labeling of places using information extracted from laser and vision sensor data. Robotics and Autonomous Systems, 55(5), 391–402.

  • Murphy, L., & Sibley, G. (2014). Incremental unsupervised topological place discovery. In Proceedings of international conference on robotics and automation (pp. 1312–1318).

  • Newman, P., Sibley, G., Smith, M., Cummins, M., Harrison, A., Mei, C., et al. (2009a). Navigating, recognizing and describing urban spaces with vision and lasers. The International Journal of Robotics Research, 28(11–12), 1406–1433.

  • Posner, I., Schroeter, D., & Newman, P. (2008). Using scene similarity for place labelling. In O. Khatib, V. Kumar, & D. Rus (Eds.), Experimental robotics, springer tracts in advanced robotics (Vol. 39, pp. 85–98). Berlin/Heidelberg: Springer.

  • Pronobis, A., & Caputo, B. (2009). COLD: COsy localization database. International Journal of Robotics Research, 28(5), 588–594.

  • Pronobis, A., & Jensfelt, P. (2012). Large-scale semantic mapping and reasoning with heterogeneous modalities. In Proceedings of international conference on robotics and automation (pp. 3515–3522).

  • Pronobis, A., Sjöö, K., Aydemir, A., Bishop, A.N., & Jensfelt, P. (2010). Representing spatial knowledge in mobile cognitive systems. In 11th international conference on intelligent autonomous systems (IAS-11). Ottawa.

  • Ranganathan, A. (2010). PLISS: Detecting and labeling places using online change-point detection. In Proceedings of robotics: Science and systems.

  • Ranganathan, A. (2012). PLISS: Labeling places using online changepoint detection. Autonomous Robots, 32(4), 351–368.

  • Remolina, E., & Kuipers, B. (2004). Towards a general theory of topological maps. Artificial Intelligence, 152(1), 47–104.

  • Robert, L. (1997). Spatial cognition: Geographic environments. Berlin: Kluwer Academic Publishers.

  • Shi, L., Kodagoda, S., & Dissanayake, G. (2012). Application of semi-supervised learning with voronoi graph for place classification. In Proceedings of international conference on intelligent robots and systems (pp. 2991–2996).

  • Sibson, R. (1973). SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), 30–34.

  • Smith, M., Baldwin, I., Churchill, W., Paul, R., & Newman, P. (2009). The new college vision and laser data set. International Journal of Robotics Research, 28(5), 595–599.

  • Tapus, A., & Siegwart, R. (2005). Incremental robot mapping with fingerprints of places. In Proceedings of international conference on intelligent robots and systems (pp. 2429–2434). IEEE.

  • Tversky, B. (1993). Cognitive maps, cognitive collages, and spatial mental models. In Spatial information theory: A theoretical basis for GIS (pp. 14–24). Springer.

  • Tversky, B. (2005). Functional significance of visuospatial representations. In P. Shah & A. Miyake (Eds.), Handbook of higher-level visuospatial thinking (pp. 1–34). Cambridge: Cambridge University Press.

  • Tversky, B., & Hemenway, K. (1983). Categories of scenes. Cognitive Psychology, 15, 121–149.

  • Ursic, P., Kristan, M., Skocaj, D., & Leonardis, A. (2012). Room classification using a hierarchical representation of space. In Proceedings of international conference on intelligent robots and systems (pp. 1371–1378).

  • Vasudevan, S., & Siegwart, R. (2008). Bayesian space conceptualization and place classification for semantic maps in mobile robotics. Robotics and Autonomous Systems, 56(6), 522–537.

  • Vasudevan, S., Gachter, S., Nguyen, V., & Siegwart, R. (2007). Cognitive maps for mobile robots-an object based approach. Robotics and Autonomous Systems, 55(5), 359–371.

  • Walter, M. R., Hemachandra, S., Homberg, B., Tellex, S., & Teller, S. (2014). A framework for learning semantic maps from grounded natural language descriptions. The International Journal of Robotics Research, 33(9), 1167–1190.

  • Williams, B., Cummins, M., Neira, J., Newman, P., Reid, I., & Tardos, J. (2009). A comparison of loop closing techniques in monocular SLAM. Robotics and Autonomous Systems, 57(12), 1188–1197.

  • Yeh, T., & Darrell, T. (2008). Dynamic visual category learning. In Proceedings of computer vision and pattern recognition (pp. 1–8).

  • Zivkovic, Z., Bakker, B., & Krose, B. (2005). Hierarchical map building using visual landmarks and geometric constraints. In Proceedings of international conference on intelligent robots and systems (pp. 2480–2485).

  • Zivkovic, Z., Booij, O., & Kröse, B. (2007). From images to rooms. Robotics and Autonomous Systems, 55(5), 411–418.

Acknowledgments

This work has been supported in part by Bogazici University BAP Project 9164 and Tubitak Project EEAG 111E285. The first author is supported by Turkish State Planning Organization (DPT) under the TAM Project number 2007K120610.

Author information

Correspondence to Hakan Karaoğuz.

Appendices

Appendix

For convenience, Table 5 summarizes the most commonly used symbols in the paper, their definitions and the sections where they are defined.

Table 5 List of symbols
Fig. 15 Representation of visual data from sample bases in the Fr, Sa, Lj and NC sites. a Visual data from sample bases in the Fr, Lj, Sa and NC sites. b Corresponding bubble surfaces for each of the (color, Cartesian, non-Cartesian and intensity) features

Bubble space

This section presents a brief summary of the bubble space representation for completeness. The interested reader is referred to Erkent and Bozma (2013) for further details. The bubble space \({\mathcal {B}} = {\mathcal {X}} \times {\mathcal {F}}\) is an abstract representation of the robot’s base along with its viewing directions (pan and tilt) \({\mathcal {F}} \subset S^2\), with \(b \in {\mathcal {B}}\) defined as \(b = \left[ x \, f \right] ^T\) where \(x \in {\mathcal {X}}\) and \(f \in {\mathcal {F}}\). Bubble surfaces \(B_i(x,t): Im(h(x)) \times R^{\ge 0} \rightarrow R^{\ge 0}\) are hypothetical spherical surfaces surrounding the robot, defined as:

$$\begin{aligned} B_i(x,t) = \left\{ \left[ \begin{array}{l} f \\ \rho _i(b,t) \end{array} \right] \mid \forall f \in {\mathcal {F}} \,\, \text{ and } b=\left[ x \,f\right] ^T \right\} \end{aligned}$$
(12)

where the image of a section h—namely Im(h(x))—is the set of viewing directions from a given base x with the section \(h : {\mathcal {X}} \rightarrow {\mathcal {B}}\) defined as a continuous map such that \(\forall x \in {\mathcal {X}}\), \(\pi (h(x))=x\) and \(\pi : {\mathcal {B}} \rightarrow {\mathcal {X}}\) defined as the projection of b onto \({\mathcal {X}}\) as \(\pi (b)=x\). Finally, the function \(\rho _i: {\mathcal {B}} \times R^{\ge 0} \rightarrow R^{\ge 0}\) is a Riemannian metric that encodes the observed values of \({ v }_i^{th}\) sensory feature. For simplification of notation, the second argument is omitted whenever time dependency is clear. Each bubble surface is initialized to be a \(S^2\) sphere with radius \(\rho _0 \in R^{\ge 0}\)—namely \(\rho _i(b,0)=\rho _0\). As the robot looks around, for each viewing direction \(f \in {\mathcal {F}}\), it computes each feature value \(q_{i}(b,t) \ge 0\). Next, each bubble surface \(B_i(x,t)\) is deformed at the viewing direction f by an amount that depends on the associated sensory feature value \(q_i(b,t)\) as:

$$\begin{aligned} \rho _i\big (b,t^+\big ) = q_i\big (b,t\big ) \end{aligned}$$
(13)

where the superscript \(t^+\) denotes time just after t. As this is done for each feature \({ v }_i \in {\mathcal {V}}\) where \(\left| {\mathcal {V}} \right| = N_v\), a set of \(N_v\) bubble surfaces is generated. In the experiments, the robot computes seven bubble surfaces corresponding to seven visual features (hue, Cartesian, non-Cartesian and intensity). For the sample scenes shown in Fig. 15a, the bubble surfaces are as shown in Fig. 15b. The intensity bubble surface is used for checking the reliability of sensory data in place detection.
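
The deformation rule of Eq. (13) lends itself to a simple grid-based implementation. The following Python sketch illustrates one possible discretization; the grid resolution, the function names and the use of NumPy are illustrative assumptions and not part of the original system.

```python
import numpy as np

def init_bubble_surface(n_pan=64, n_tilt=32, rho_0=1.0):
    """Hypothetical discretization: one bubble surface stored as an
    (n_pan x n_tilt) grid of radii, initialized to a sphere of radius rho_0."""
    return np.full((n_pan, n_tilt), rho_0)

def deform_bubble_surface(rho_i, pan_idx, tilt_idx, q_i):
    """Deform the surface at the current viewing direction f = (f1, f2),
    indexed by (pan_idx, tilt_idx), by the observed feature value q_i,
    following Eq. (13): rho_i(b, t+) = q_i(b, t)."""
    rho_i[pan_idx, tilt_idx] = q_i
    return rho_i
```

In such a sketch, one grid would be maintained per feature \({v}_i \in {\mathcal {V}}\), giving \(N_v\) surfaces per base.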

Bubble descriptors are holistic (vector) representations of bubble surfaces. They are constructed using the double Fourier series representation of bubble surfaces as:

$$\begin{aligned} \rho _i\big (b,t\big ) = \sum ^{H_1}_{h_1=0} \sum ^{H_2}_{h_2=0}\lambda _{h_1h_2} z_{xi,h_1h_2}^T (t) e_{h_1h_2}(f) \end{aligned}$$

If \(f \in {\mathcal {F}}\) is defined as \(f =\left[ f_1 \, f_2\right] ^T\), for each \((h_1,h_2)\), the vector \(e_{h_1h_2}(f) \in R^4\) consists of an orthonormal set of trigonometric basis functions as:

$$\begin{aligned} e_{h_1h_2}(f) = \left[ \begin{array}{l} cos\left( h_1 f_1\right) cos\left( h_2 f_2\right) \\ sin\left( h_1 f_1\right) cos\left( h_2 f_2\right) \\ cos\left( h_1 f_1\right) sin\left( h_2 f_2\right) \\ sin\left( h_1 f_1\right) sin\left( h_2 f_2\right) \end{array}\right] \end{aligned}$$
(14)

The corresponding vector \(z_{xi,h_1h_2}(t) \in R^4 \) is defined as:

$$\begin{aligned} { z_{xi,h_1h_2}(t) = \frac{1}{\pi ^{2}} \left[ \begin{array}{l} \displaystyle \smallint _{0}^{2\pi }\smallint _{0}^{\pi } {\rho _i(b,t) cos(h_1 f_1)cos(h_2 f_2)df_1 df_2} \\ \displaystyle \smallint _{0}^{2\pi }\smallint _{0}^{\pi } {\rho _i(b,t) sin(h_1 f_1)cos(h_2 f_2)df_1 df_2} \\ \displaystyle \smallint _{0}^{2\pi }\smallint _{0}^{\pi } {\rho _i(b,t)cos(h_1 f_1)sin(h_2 f_2)df_1 df_2} \\ \displaystyle \smallint _{0}^{2\pi }\smallint _{0}^{\pi } {\rho _i(b,t) sin(h_1 f_1)sin(h_2 f_2) df_1 df_2} \end{array} \right] } \end{aligned}$$
(15)

The parameters \(\lambda _{h_1h_2}\) are defined as:

$$\begin{aligned} \lambda _{h_1h_2} = \left\{ \begin{array}{ll} \frac{1}{4} &{}\quad \text {if } h_1 = 0, h_2 = 0\\ \frac{1}{2} &{}\quad \text {if } h_1> 0, h_2 = 0 \text { or } h_1 = 0, h_2 > 0\\ 1 &{}\quad \text {if } h_1> 0, h_2 > 0 \end{array} \right. \end{aligned}$$
(16)

A bubble descriptor \(I(x,t) \in R^{N_I}\) is a \(N_I-\)dimensional vector with \(N_I = N_v(H_1+1)(H_2+1)\) defined as:

$$\begin{aligned} I(x,t) = \Big [I_{1,00}(x,t), \ldots , I_{N_v,H_1H_2}(x,t) \Big ]^T \end{aligned}$$
(17)

where

$$\begin{aligned} I_{i, h_1h_2}(x,t) = z_{xi,h_1h_2}^T(t) z_{xi,h_1h_2}(t) \end{aligned}$$
(18)

Bubble descriptors have been shown to be rotationally invariant with respect to heading changes while being computable in an incremental manner as new observations are made. Furthermore, they are flexible in integrating visual features since their dimensionality is independent of the number of observations. Finally, no data association (Williams et al. 2009) is required for finding correspondences among observations taken at different times.

In the experiments, the bubble descriptors are constructed using the first six features with the number of harmonics \(H_1=H_2=9\). The intensity bubble surface is not used, in order to reduce sensitivity to the illumination level. Thus, the length of each bubble descriptor is \(N_I=600\).
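
For concreteness, a minimal sketch of how a descriptor could be computed from one discretized bubble surface is given below, following Eqs. (15) and (18). The Riemann-sum integration on a regular grid and the function name are assumptions; the actual system may instead update the coefficients incrementally as new observations arrive.

```python
import numpy as np

def bubble_descriptor(rho, H1=9, H2=9):
    """Rotation-invariant descriptor of one bubble surface rho, stored as
    an (n_pan x n_tilt) grid over f1 in [0, 2*pi) and f2 in [0, pi)."""
    n1, n2 = rho.shape
    f1 = np.linspace(0.0, 2.0 * np.pi, n1, endpoint=False)
    f2 = np.linspace(0.0, np.pi, n2, endpoint=False)
    df1, df2 = 2.0 * np.pi / n1, np.pi / n2
    F1, F2 = np.meshgrid(f1, f2, indexing="ij")

    entries = []
    for h1 in range(H1 + 1):
        for h2 in range(H2 + 1):
            basis = np.stack([
                np.cos(h1 * F1) * np.cos(h2 * F2),
                np.sin(h1 * F1) * np.cos(h2 * F2),
                np.cos(h1 * F1) * np.sin(h2 * F2),
                np.sin(h1 * F1) * np.sin(h2 * F2),
            ])
            # Fourier coefficients z_{h1 h2} via Riemann-sum integration (Eq. 15)
            z = (basis * rho).sum(axis=(1, 2)) * df1 * df2 / np.pi ** 2
            # Descriptor entry: squared norm of z, invariant to heading (Eq. 18)
            entries.append(z @ z)
    return np.asarray(entries)  # (H1+1)*(H2+1) entries per feature
```

Stacking such vectors for the six features used in the experiments would give the \(N_I=600\)-dimensional descriptor of Eq. (17).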

Sensory data reliability

This section presents a brief discussion of how reliability is measured. Reliability depends on the informativeness, coherency and plenitude of the sensory data. Since sensory data are internally represented using descriptors, these properties can be measured by processing the descriptors appropriately; in our case, the bubble descriptors are used. The interested reader is referred to Karaoguz and Bozma (2014) for further details.

Informativeness measures whether incoming sensory data are semantically rich. For example, under low illumination, everything in the image looks dark. Similarly, if the robot’s field of view is filled by an extended object such as a door, there is again little variation in the visual or depth images. Both cases may be detected by computing the average deformation \(\mu _i(x_k)\) or the variance \(\sigma _i(x_k)\) of the associated (intensity or depth) bubble surfaces \(B_i(x_k,t_k)\):

$$\begin{aligned} \mu _i\big ({x_k}\big )= & {} \frac{1}{\pi ^{2}}\int _{0}^{2\pi }\int _{0}^{\pi } {\rho _i\big (b,t_k\big ) df_1 df_2} \\ \sigma _i\big ({x_k}\big )= & {} \int _{0}^{2\pi }\int _{0}^{\pi } \big ({\rho _i(b,t_k)} - \mu _i(x_k)\big )^2 df_1 df_2 \end{aligned}$$

Low values indicate minimal surface deformation, which implies that the data are not informative. Hence, the informativeness decision is based on a binary-valued function \(\varsigma : k\rightarrow \left\{ 0,1\right\} \):

$$\begin{aligned} \varsigma (x_k) =\left\{ \begin{array}{ll} 1 &{}\quad \mu _i({x_k}) \le \tau _{\eta } \\ 1 &{} \quad \sigma _i({x_k}) \le \tau _{\sigma } \\ 0 &{} \quad \text{ otherwise } \end{array} \right. \end{aligned}$$

where \(\tau _{\eta }\) and \(\tau _{\sigma }\) are a priori selected threshold parameters. Sensory data from a particular base point \(x_k\) are used if and only if \(\varsigma (x_k) = 0\). In this work, the bubble surface associated with the intensity feature (\(i=7\)) is used.
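
A minimal sketch of the informativeness test on a discretized intensity bubble surface is given below; the Riemann-sum approximation of the two integrals and the function name are assumptions made for illustration.

```python
import numpy as np

def is_uninformative(rho_intensity, tau_eta, tau_sigma):
    """varsigma(x_k): returns 1 when the intensity bubble surface shows
    too little deformation (low mean or low variance), 0 otherwise."""
    n1, n2 = rho_intensity.shape
    df1, df2 = 2.0 * np.pi / n1, np.pi / n2
    mu = rho_intensity.sum() * df1 * df2 / np.pi ** 2      # mean deformation
    sigma = ((rho_intensity - mu) ** 2).sum() * df1 * df2  # deformation variance
    return 1 if (mu <= tau_eta or sigma <= tau_sigma) else 0
```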

The coherency of data from two consecutive base points \(x_k\) and \(x_{k-1}\) is measured by comparing the similarity of their respective bubble descriptors \(I(x_{k})\) and \(I(x_{k-1})\) using the \(\chi ^2\)-distance. For example, in case of jerky robot head or body motion, sensory data from consecutive bases will be largely unrelated. Thus, the incoherency decision is based on a binary-valued function \(\kappa : k\rightarrow \left\{ 0,1\right\} \):

$$\begin{aligned} \kappa (x_k) = \left\{ \begin{array}{ll} 0 &{} \quad \left\| I(x_k),I(x_{k-1}) \right\| _{\chi ^2} \le \tau _\kappa \\ 1 &{}\quad \text{ otherwise } \end{array} \right. \end{aligned}$$

A \(\chi ^2\)-distance exceeding the incoherency threshold \(\tau _{\kappa }\), that is, a low similarity between consecutive descriptors, is an indicator of incoherency.
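
The sketch below illustrates the incoherency test. The exact form of the \(\chi ^2\)-distance is not specified here, so the symmetric variant used in the sketch is an assumption; only the comparison against \(\tau _\kappa \) follows the text.

```python
import numpy as np

def chi2_distance(d1, d2, eps=1e-12):
    """Symmetric chi-square distance between two bubble descriptors
    (assumed form; eps avoids division by zero)."""
    return 0.5 * np.sum((d1 - d2) ** 2 / (d1 + d2 + eps))

def is_incoherent(desc_curr, desc_prev, tau_kappa):
    """kappa(x_k): 0 when consecutive descriptors are close enough
    (coherent), 1 otherwise."""
    return 0 if chi2_distance(desc_curr, desc_prev) <= tau_kappa else 1
```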

Finally, the pool of data associated with each place should be sufficiently large. For example, sensory data from just a few base points, even if informative and coherent, will not in general be indicative of a particular place. The plenitude decision is based on the extent of the detected places \(D_m\): those with an extent less than the plenitude threshold \(\tau _p\) are considered to have an insufficient amount of data. The values of the informativeness thresholds \(\tau _{\eta }\) and \(\tau _{\sigma }\), the incoherency threshold \(\tau _{\kappa }\) and the plenitude threshold \(\tau _p\) all affect the place detection performance. In this work, they are adjusted manually based on the camera type and the nature of the incoming sensory data.
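
Finally, a sketch of the plenitude test; measuring the extent of a detected place by the number of contributing base points is an assumption made purely for illustration.

```python
def has_plenitude(place_bases, tau_p):
    """Plenitude check for a detected place D_m: keep the place only if
    its extent (here, the number of contributing base points) is at
    least the plenitude threshold tau_p."""
    return len(place_bases) >= tau_p
```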

Cite this article

Karaoğuz, H., Bozma, H.I. An integrated model of autonomous topological spatial cognition. Auton Robot 40, 1379–1402 (2016). https://doi.org/10.1007/s10514-015-9514-4

Keywords

Navigation