Classification approach for automatic laparoscopic video database organization

Twinanda, Andru Putra; Marescaux, Jacques; de Mathelin, Michel; Padoy, Nicolas

doi:10.1007/s11548-015-1183-4

Original Article
Published: 07 April 2015

Volume 10, pages 1449–1460 (2015)

International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

One of the advantages of minimally invasive surgery (MIS) is that the underlying digitization provides invaluable information on how procedures are executed under various patient-specific conditions. However, this information can only be accessed conveniently if the laparoscopic video database comes with semantic annotations, which are typically provided manually by experts. Given the growing popularity of MIS, manual annotation is becoming a laborious and costly task. In this paper, we tackle the problem of laparoscopic video classification, which consists of automatically identifying the type of abdominal surgery performed in a video. In addition to classifying the full recordings of the procedures, we also perform sub-video and video clip classifications. These experiments investigate how many frames of a video are needed to obtain good classification performance and which parts of the procedures contain the most discriminative features.

Method

Our classification pipeline is as follows. First, we reject irrelevant frames from the videos using the color properties of the frames. Second, we extract visual features from the relevant frames. Third, we encode the features using several feature encoding methods, namely vector quantization, sparse coding (SC), and Fisher encoding. Fourth, we perform the classification using support vector machines (SVMs). Sub-video classification is carried out by uniformly downsampling the video frames, whereas video clip classification is carried out by taking three parts of each video (the beginning, middle, and end) and running the classification pipeline separately on each part. Finally, we build the overall classification model by combining the features using a multiple kernel learning (MKL) approach.
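As a rough illustration of the encoding step, the sketch below implements the simplest of the three methods, vector quantization (a bag-of-visual-words histogram), in plain Python. It assumes that a codebook has already been learned (for example with k-means) and that local descriptors have been extracted from the relevant frames; the function names, the toy data, and the L1 normalization are illustrative choices, not details taken from the paper. Sparse coding and Fisher encoding replace the hard nearest-codeword vote with sparse reconstruction coefficients and Gaussian-mixture gradient statistics, respectively.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Bag-of-visual-words histogram: hard-assign each descriptor to its nearest
    codeword, count the assignments, and L1-normalize the counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:  # no descriptors at all: return an all-zero histogram
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Hypothetical 2-D descriptors and a two-word codebook, for illustration only
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[0.5, -0.2], [9.0, 11.0], [10.2, 9.9], [0.1, 0.3]]
print(vq_encode(descriptors, codebook))  # [0.5, 0.5]
```

Normalizing the histogram keeps videos of different lengths comparable before they are passed to the SVM.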

Results

To carry out the experiments, we use a dataset containing 208 videos of eight different types of surgery performed by 10 different surgeons. The results show that SC with K-singular value decomposition (K-SVD) yields the best classification accuracy. They also show that the classification accuracy decreases by just 3 % when only 60 % of the video frames are used. Furthermore, the end part of the procedures proves to be the most discriminative part of the surgery: using only the last 20 % of the video frames, a classification accuracy above 70 % can be achieved. Finally, combining all features yields the best performance, with an accuracy of 90.38 %.
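The sub-video and clip experiments amount to choosing which frame indices enter the pipeline. The following is a minimal sketch, assuming frames are indexed from 0 to N - 1, that uniform downsampling means evenly spaced frames, and that each clip is a contiguous block covering a fixed fraction of the video. `subvideo_indices` and `clip_indices` are hypothetical helpers, and centering the middle clip is an assumption rather than the paper's stated definition.

```python
from typing import List


def subvideo_indices(num_frames: int, fraction: float) -> List[int]:
    """Uniformly downsample a video: keep about `fraction` of its frames,
    evenly spaced over the whole recording."""
    if num_frames <= 0 or not 0.0 < fraction <= 1.0:
        raise ValueError("need num_frames > 0 and fraction in (0, 1]")
    keep = max(1, round(num_frames * fraction))
    # Integer arithmetic avoids floating-point drift; the step is >= 1 frame,
    # so the returned indices are distinct and increasing.
    return [(i * num_frames) // keep for i in range(keep)]


def clip_indices(num_frames: int, part: str, fraction: float) -> List[int]:
    """Contiguous clip covering about `fraction` of the video, taken from the
    beginning, the middle (centered, an assumption), or the end."""
    if num_frames <= 0 or not 0.0 < fraction <= 1.0:
        raise ValueError("need num_frames > 0 and fraction in (0, 1]")
    length = max(1, round(num_frames * fraction))
    starts = {
        "beginning": 0,
        "middle": (num_frames - length) // 2,
        "end": num_frames - length,
    }
    if part not in starts:
        raise ValueError(f"unknown part: {part!r}")
    return list(range(starts[part], starts[part] + length))


print(subvideo_indices(10, 0.6))     # [0, 1, 3, 5, 6, 8]
print(clip_indices(10, "end", 0.2))  # [8, 9]
```

With these helpers, the "60 % of the frames" setting corresponds to `subvideo_indices(n, 0.6)` and the "last 20 %" setting to `clip_indices(n, "end", 0.2)`; the selected frames would then go through the same feature, encoding, and SVM steps.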

Conclusions

SC with K-SVD provides the best representation of our videos, yielding the best accuracy for every feature. In terms of information content, the end part of the laparoscopic videos is the most discriminative compared with the other parts. In addition to performing well individually, the features yield even better classification results when all of them are combined using the MKL approach.
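MKL combines one kernel per feature type into a single kernel K = Σ_m d_m K_m with non-negative weights d_m. Because a non-negative weighted sum of positive semidefinite kernels is itself positive semidefinite, the result is still a valid SVM kernel. The sketch below shows only this combination step with fixed, hand-picked weights; in MKL the weights are learned jointly with the SVM, which this sketch does not attempt. The linear per-feature kernel, the feature names, and the numbers are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def linear_kernel(features: Sequence[Sequence[float]]) -> Matrix:
    """Gram matrix K[i][j] = <x_i, x_j> for one feature type (one row per video)."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in features] for xi in features]


def combine_kernels(kernels: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Weighted sum K = sum_m d_m * K_m of per-feature Gram matrices.
    The weights d_m are fixed inputs here; MKL would learn them with the SVM."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("kernel weights must be non-negative")
    n = len(next(iter(kernels.values())))
    combined = [[0.0] * n for _ in range(n)]
    for name, k in kernels.items():
        w = weights.get(name, 0.0)  # a feature with no weight contributes nothing
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * k[i][j]
    return combined


# Two videos, two hypothetical feature types with made-up values
color = linear_kernel([[1.0, 0.0], [0.0, 1.0]])
shape = linear_kernel([[1.0, 1.0], [2.0, 0.0]])
print(combine_kernels({"color": color, "shape": shape}, {"color": 0.75, "shape": 0.25}))
# [[1.25, 0.5], [0.5, 1.75]]
```

The combined Gram matrix could then be passed to any SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.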

Notes

http://www.websurg.com/.
IRCAD stands for Institut de Recherche contre les Cancers de l’Appareil Digestif (Research Institute against Digestive Cancer).

Acknowledgments

This work was supported by French state funds managed by the ANR within the Investissements d’Avenir program under references ANR-11-LABX-0004 (Labex CAMI), ANR-10-IDEX-0002-02 (IdEx Unistra), and ANR-10-IAHU-02 (IHU Strasbourg). The authors would like to thank the IRCAD audiovisual team for their help in generating the dataset.

Conflict of interest

Andru P. Twinanda, Jacques Marescaux, Michel de Mathelin, and Nicolas Padoy declare that they have no conflict of interest.

Author information

Authors and Affiliations

ICube Laboratory, University of Strasbourg, CNRS, IHU, Strasbourg, France
Andru Putra Twinanda, Michel de Mathelin & Nicolas Padoy
IRCAD, University Hospital of Strasbourg, Strasbourg, France
Jacques Marescaux

Corresponding author

Correspondence to Andru Putra Twinanda.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 58397 KB)

About this article

Cite this article

Twinanda, A.P., Marescaux, J., de Mathelin, M. et al. Classification approach for automatic laparoscopic video database organization. Int J CARS 10, 1449–1460 (2015). https://doi.org/10.1007/s11548-015-1183-4

Received: 12 December 2014
Accepted: 18 March 2015
Published: 07 April 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s11548-015-1183-4
