Abstract
Autonomous manipulation in unstructured environments will enable a large variety of exciting and important applications. Despite its promise, autonomous manipulation remains largely unsolved. Even rudimentary manipulation tasks—such as removing objects from a pile—remain challenging for robots. We identify three major challenges that must be addressed to enable autonomous manipulation: object segmentation, action selection, and motion generation. These challenges become more pronounced when unknown man-made or natural objects are cluttered together in a pile. We present a system capable of manipulating unknown objects in such an environment. Our robot is tasked with clearing a table by removing objects from a pile and placing them into a bin. To that end, we address the three aforementioned challenges. Our robot perceives the environment with an RGB-D sensor, segmenting the pile into object hypotheses using non-parametric surface models. Our system then computes the affordances of each object and selects the best affordance and its associated action to execute. Finally, our robot instantiates the proper compliant motion primitive to safely execute the desired action. For efficient and reliable action selection, we developed a framework for supervised learning of manipulation expertise. To verify the performance of our system, we conducted dozens of trials and report on several hours of experiments involving more than 1,500 interactions. The results show that our learning-based approach for pile manipulation outperforms a common sense heuristic as well as a random strategy, and is on par with human action selection.
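The abstract describes a perceive-select-act loop: segment the scene into object hypotheses, score the affordances of each hypothesis with a learned model, and execute the best-scoring action with a compliant primitive. The following is a minimal Python sketch of that control flow only; every class, method, and action name is hypothetical and does not reflect the authors' actual interfaces.

```python
# Hypothetical sketch of the perceive-select-act loop from the abstract.
# All names (sensor, segmenter, affordance_model, robot, ...) are
# illustrative placeholders, not the authors' API.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # e.g. "grasp", "push", "pull"
    segment_id: int  # object hypothesis the action applies to
    score: float     # learned affordance score

def clear_table(sensor, segmenter, affordance_model, robot, bin_pose):
    """Repeatedly remove the most promising object from the pile."""
    while True:
        cloud = sensor.capture_rgbd()          # perceive with RGB-D
        segments = segmenter.segment(cloud)    # object hypotheses
        if not segments:
            return                             # table is clear
        # Score every (segment, action) pair with the learned model,
        # then execute the single best-scoring action compliantly.
        candidates = [
            Action(kind, s.id, affordance_model.score(s, kind))
            for s in segments
            for kind in ("grasp", "push", "pull")
        ]
        best = max(candidates, key=lambda a: a.score)
        robot.execute_compliant_primitive(best, bin_pose)
```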
Notes
We scale each feature \(f_i\) using its mean \(E(f_i)\) and variance \(V(f_i)\): \(f_i^{scaled} = (f_i-E(f_i))/\sqrt{V(f_i)}\). If a scaled feature is more than two standard deviations away from the mean, we cap \(f_i^{scaled}\) at either \(-2\) or \(2\). Finally, we divide \(f_i^{scaled}\) by \(2\) to guarantee that all features lie in the range \([-1,1]\).
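As a concrete illustration of this scaling rule, a minimal NumPy sketch follows; the function name and example values are ours, not from the paper.

```python
import numpy as np

def scale_features(f, mean, var):
    """Standardize, clip at two standard deviations, and map to [-1, 1],
    following the scaling rule in the note above."""
    scaled = (f - mean) / np.sqrt(var)   # zero mean, unit variance
    scaled = np.clip(scaled, -2.0, 2.0)  # cap at +/- 2 std deviations
    return scaled / 2.0                  # final range: [-1, 1]

# Example (made-up values; in practice the mean and variance would be
# estimated from the training set):
features = np.array([0.3, 7.5, -1.2])
print(scale_features(features,
                     mean=np.array([0.0, 5.0, 0.0]),
                     var=np.array([1.0, 4.0, 0.25])))
# -> [ 0.15   0.625 -1.   ]  (the third feature is clipped at -2)
```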
Acknowledgments
This work was conducted in part through collaborative participation in the Robotics Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement W911NF-10-2-0016, and in part by Intel (Intel Science and Technology Center for Embedded Computing). The authors also gratefully acknowledge funding under the DARPA Autonomous Robotic Manipulation Software Track (ARM-S) program.
Cite this article
Katz, D., Venkatraman, A., Kazemi, M. et al. Perceiving, learning, and exploiting object affordances for autonomous pile manipulation. Auton Robot 37, 369–382 (2014). https://doi.org/10.1007/s10514-014-9407-y