Soft Computing, Volume 22, Issue 14, pp 4763–4778

Advanced pattern recognition from complex environments: a classification-based approach

  • Alfredo Cuzzocrea
  • Enzo Mumolo
  • Giorgio Mario Grasso
Methodologies and Application


This paper describes an algorithm for building 3D maps of objects detected in a visual scene acquired in an indoor environment. One feature of the algorithm is that it works with a standard webcam equipped with a simple device that automatically estimates the camera orientation and its distance from the floor. Another feature is its low computational complexity. The algorithm first extracts from the acquired images the regions of interest (ROIs) that may contain an object. The 3D position of each ROI is then estimated, and a map of the environment is generated. ROI extraction is performed with a Haar-like approach. ROIs are represented with edge-based features. The edge representation is filtered with a novel fuzzy-based technique that removes edges introduced by noise. Object classification is performed with a pseudo-2D hidden Markov model (pseudo-2D HMM) algorithm. We demonstrate the reliability of our method by discussing some critical applications in the context of human–robot interaction and robot–robot interaction. Finally, we complete our contributions by describing a case study in the robotic field and by providing comprehensive experimental results that show the benefits of our approach.
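The abstract does not spell out how the 3D position of an ROI is derived from the camera orientation and its distance from the floor. As an illustration only, the sketch below shows one common way such an estimate can be made: back-projecting a pixel (for example, the bottom centre of an ROI) onto the floor plane. It assumes a pinhole camera with zero roll, a flat floor, and an object that touches the floor at that pixel. The function name `pixel_to_floor` and all parameters (`height_m`, `pitch_rad`, `focal_px`, `cx`, `cy`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Optional, Tuple


def pixel_to_floor(
    u: float,
    v: float,
    height_m: float,
    pitch_rad: float,
    focal_px: float,
    cx: float,
    cy: float,
) -> Optional[Tuple[float, float]]:
    """Back-project pixel (u, v) onto the floor plane.

    Assumptions (not from the paper): pinhole camera, zero roll,
    flat floor, camera `height_m` above the floor and tilted
    downwards by `pitch_rad`. Image y grows downwards.

    Returns (lateral, forward) in metres relative to the point on the
    floor directly below the camera, or None if the viewing ray does
    not hit the floor (pixel at or above the horizon).
    """
    # Normalised image coordinates of the viewing ray.
    xn = (u - cx) / focal_px
    yn = (v - cy) / focal_px

    s, c = math.sin(pitch_rad), math.cos(pitch_rad)

    # Rate at which the ray descends towards the floor.
    descent = yn * c + s
    if descent <= 1e-9:
        return None

    # Scale factor that brings the ray from the camera down to the floor.
    t = height_m / descent
    lateral = t * xn
    forward = t * (c - yn * s)
    return (lateral, forward)
```

For instance, with a camera 1 m above the floor and pitched down by 45 degrees, `pixel_to_floor(320, 240, 1.0, math.pi / 4, 500.0, 320.0, 240.0)` returns a point approximately 1 m straight ahead of the camera, as expected from the geometry of a 45-degree ray.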


  • Intelligent computer vision applications
  • Object classification
  • Complex methodologies in soft computing


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. University of Trieste and ICAR-CNR, Trieste, Italy
  2. University of Trieste, Trieste, Italy
  3. University of Messina, Messina, Italy
