Abstract
In recent years, computer vision applications in robotics have improved toward human-like visual perception and scene/context understanding. Following this aspiration, in this study we explore whether object manipulation performance can be improved by connecting the visual recognition of objects to their physical attributes, such as weight and center of gravity (CoG). To develop and test this idea, we built an object manipulation platform comprising a robotic arm, a depth camera fixed at the top center of the workspace, encoders embedded in the robotic arm mechanism, and microcontrollers for position and force control. Since both the visual recognition and the force estimation algorithms use deep learning principles, the test set-up was named Deep-Table. The objects in the manipulation tests are selected from everyday life and are commonly found on modern office desks. Visual object localization and recognition are performed in two distinct branches by deep convolutional neural network architectures. In the experiments, we present five of the possible cases, each with a different level of information availability on the object weight and CoG. The results confirm that, using our algorithm, the robotic arm can move different types of objects, ranging from several grams (empty bottle) to around 250 g (ceramic cup), without failure or tipping. The results also show that connecting object recognition with load estimation and contact point further improves performance, characterized by smoother motion.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Bayraktar, E., Yigit, C.B. & Boyraz, P. Object manipulation with a variable-stiffness robotic mechanism using deep neural networks for visual semantics and load estimation. Neural Comput & Applic 32, 9029–9045 (2020). https://doi.org/10.1007/s00521-019-04412-5
DOI: https://doi.org/10.1007/s00521-019-04412-5