Efficiently Annotating Object Images with Absolute Size Information Using Mobile Devices
The projection of a real-world scene onto a planar image sensor entails the loss of information about the 3D structure as well as the absolute dimensions of the scene. For image analysis and object classification tasks, however, absolute size information can make results more accurate. Today, creating size-annotated image datasets is labor-intensive and typically requires measurement equipment not available to public image contributors. In this paper, we propose an effective annotation method that uses the camera of a smart mobile device to capture the missing size information along with the image. The approach builds on the fact that a camera calibrated to a specific object distance can measure lengths in the object’s plane. We use the camera’s minimum focus distance as the calibration distance and propose an adaptive feature matching process that precisely computes the scale change between two images, facilitating measurements at larger object distances. Finally, the measured object is segmented and annotated with its size information for later analysis. A user study showed that humans are able to retrieve the calibration distance with low variance. The proposed approach achieves a measurement accuracy comparable to manual measurement with a ruler and outperforms state-of-the-art methods in terms of accuracy and repeatability. Consequently, the proposed method allows in-situ size annotation of objects in images without the need for additional equipment or an artificial reference object in the scene.
Keywords: Size annotation, Size measurement, In-situ size annotation, Minimum focus distance, Absolute size, Mobile device
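The measurement principle summarized in the abstract can be illustrated with a minimal sketch under a simple pinhole camera assumption. It is not the paper's implementation: the function names, the median-of-ratios scale estimate (a plain stand-in for the adaptive feature matching process), and all numeric values (an 80 mm calibration distance, a focal length of 3000 pixels) are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations
from statistics import median


def scale_change(near_pts, far_pts):
    """Estimate how much larger the object appears in the near image.

    near_pts and far_pts are matched (x, y) pixel coordinates of the same
    object points in the two images. The estimate is the median ratio of
    pairwise point distances (near / far), a simple stand-in for a real
    feature matching pipeline.
    """
    ratios = []
    for i, j in combinations(range(len(near_pts)), 2):
        d_far = math.dist(far_pts[i], far_pts[j])
        if d_far > 0:
            ratios.append(math.dist(near_pts[i], near_pts[j]) / d_far)
    if not ratios:
        raise ValueError("need at least two distinct matched points")
    return median(ratios)


def length_mm(length_px, calib_distance_mm, focal_length_px, scale=1.0):
    """Convert a pixel length to millimetres in the object's plane.

    Pinhole model: a length L at distance d spans L * f / d pixels, so
    L = length_px * d / f. For an image taken farther away, `scale`
    rescales the pixel length to what it would be at the calibration
    distance.
    """
    return length_px * scale * calib_distance_mm / focal_length_px


# Hypothetical matches: the object appears twice as large in the near image.
near = [(0, 0), (400, 0), (0, 300)]
far = [(100, 100), (300, 100), (100, 250)]
s = scale_change(near, far)

# An object edge spanning 150 px in the far image, with assumed calibration values.
size = length_mm(150, calib_distance_mm=80.0, focal_length_px=3000.0, scale=s)
print(f"scale change: {s:.2f}, estimated length: {size:.1f} mm")
```

In this sketch the near image is assumed to be taken at the calibration distance, so only the far image's pixel lengths need rescaling.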
We would like to thank all participants of our user experiment for supporting our work. We are funded through a scholarship of the Friedrich Naumann Stiftung; the German Ministry of Education and Research (BMBF) Grants: 01LC1319A and 01LC1319B; the German Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety (BMUB) Grant: 3514 685C19; and the Stiftung Naturschutz Thüringen (SNT) Grant: SNT-082-248-03/2014.