Computational metadata generation methods for biological specimen image collections

Karnani, Kevin; Pepper, Joel; Bakiş, Yasin; Wang, Xiaojun; Bart Jr., Henry; Breen, David E.; Greenberg, Jane

doi:10.1007/s00799-022-00342-1

Published: 23 November 2022

International Journal on Digital Libraries

Abstract

Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens that can be found online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research conducted with the Illinois Natural History Survey (INHS) collection (7244 specimen images) that uses computational approaches to analyze image quality and then automatically generates 22 metadata properties representing the image quality and morphological features of the specimens. In the research reported here, we demonstrate the extension of our initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). Further, we enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these new methods reduced our overall error rate from 4.6% to 1.1%. These enhancements also allowed us to compute an additional set of 17 image-based metadata properties. The new properties provide supplemental features and information that may also be used to analyze and classify the fish specimens; examples include convex area, eccentricity, perimeter, and skew. The refined process also outperforms humans in time, labor cost, and accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
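The abstract lists convex area, eccentricity, and perimeter among the 17 new image-based properties. It also names upscaling of small objects as one of the enhancements. The sketch below shows one plausible way to compute those three shape descriptors from a binary segmentation mask of a single specimen, in plain Python.

It is not the authors' implementation. The following are all assumptions made for illustration:

- the list-of-lists mask format;
- the function names (`shape_metadata`, `upscale_nearest`, `convex_area`, and so on);
- the edge-count definition of perimeter;
- the pixel-corner convex hull;
- nearest-neighbour enlargement as a stand-in for the upscaling step.

```python
import math
from typing import Dict, List, Tuple

# Assumed input: a binary mask of one segmented specimen (1 = fish, 0 = background).
Mask = List[List[int]]
Point = Tuple[int, int]


def foreground(mask: Mask) -> List[Point]:
    """(row, col) coordinates of every foreground pixel."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def upscale_nearest(mask: Mask, k: int) -> Mask:
    """Hypothetical stand-in for the upscaling step: replicate each pixel into a k x k block."""
    return [[v for v in row for _ in range(k)] for row in mask for _ in range(k)]


def perimeter(mask: Mask) -> int:
    """Count pixel edges that separate the region from background or the image border."""
    h, w = len(mask), len(mask[0])
    edges = 0
    for r, c in foreground(mask):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                edges += 1
    return edges


def _cross(o: Point, a: Point, b: Point) -> int:
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone-chain hull; returns vertices in order, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convex_area(mask: Mask) -> float:
    """Area of the convex hull around the unit squares of all foreground pixels."""
    corners = [(r + dr, c + dc) for r, c in foreground(mask) for dr in (0, 1) for dc in (0, 1)]
    hull = convex_hull(corners)
    # Shoelace formula over consecutive hull vertices (wrapping around to the start).
    twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice) / 2


def eccentricity(mask: Mask) -> float:
    """Eccentricity of the ellipse with the same second central moments as the region."""
    pts = foreground(mask)
    n = len(pts)
    if n == 0:
        return 0.0
    mr = sum(r for r, _ in pts) / n
    mc = sum(c for _, c in pts) / n
    mu_rr = sum((r - mr) ** 2 for r, _ in pts) / n
    mu_cc = sum((c - mc) ** 2 for _, c in pts) / n
    mu_rc = sum((r - mr) * (c - mc) for r, c in pts) / n
    # Eigenvalues of the covariance matrix [[mu_rr, mu_rc], [mu_rc, mu_cc]].
    half_trace = (mu_rr + mu_cc) / 2
    root = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    major, minor = half_trace + root, half_trace - root
    return math.sqrt(1 - minor / major) if major > 0 else 0.0


def shape_metadata(mask: Mask) -> Dict[str, float]:
    """Hypothetical helper bundling the shape properties sketched above."""
    return {
        "area": len(foreground(mask)),
        "convex_area": convex_area(mask),
        "perimeter": perimeter(mask),
        "eccentricity": round(eccentricity(mask), 3),
    }


if __name__ == "__main__":
    # Toy L-shaped region: its convex hull covers more area than its pixels do.
    toy = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 1]]
    print(shape_metadata(toy))
    # → {'area': 5, 'convex_area': 7.0, 'perimeter': 12, 'eccentricity': 0.849}
    print(shape_metadata(upscale_nearest(toy, 2)))
```

Upscaling the toy mask by a factor of 2 has the following effects:

- the edge-count perimeter doubles, to 24;
- both areas quadruple, to 20 and 28.0;
- the moment-based eccentricity shifts to about 0.823, because coarse pixels carry relatively more of a small region's shape.

Plain Python keeps the sketch dependency-free. In practice, a library such as scikit-image (`skimage.measure.regionprops`) provides comparable region properties, although its exact definitions differ in places.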

Similar content being viewed by others

Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species

Development of a system for the automated identification of herbarium specimens with high accuracy (Article, open access, 16 May 2022)

Image Representation for Image Mining: A Study Focusing on Mining Satellite Images for Census Data Collection

Code and data availability

Raw data for INHS is available here. Raw data for UWZM is available here. Reproducible code is available here.

Acknowledgements

We thank the full BGNN team for support, Chris A. Taylor, Curator of Fishes and Crustaceans at the Illinois Natural History Survey (INHS), and John Lyons, Curator of Fish at the University of Wisconsin Zoological Museum (UWZM). INHS and UWZM are two of six fish collections participating in the Great Lakes Invasives Network (GLIN).

Funding

This research was supported by the NSF Office of Advanced Cyberinfrastructure (OAC) under awards #1940233 and #1940322.

Author information

Authors and Affiliations

Computer Science Department, Drexel University, Philadelphia, PA, USA
Kevin Karnani, Joel Pepper & David E. Breen
Biodiversity Research Institute, Tulane University, New Orleans, LA, USA
Yasin Bakiş, Xiaojun Wang & Henry Bart Jr.
Information Science Department, Drexel University, Philadelphia, PA, USA
Jane Greenberg

Corresponding author

Correspondence to Joel Pepper.

Ethics declarations

Conflict of interest

The authors have declared that no conflict of interest exists.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Karnani, K., Pepper, J., Bakiş, Y. et al. Computational metadata generation methods for biological specimen image collections. Int J Digit Libr (2022). https://doi.org/10.1007/s00799-022-00342-1

Received: 30 March 2022
Revised: 30 October 2022
Accepted: 02 November 2022
DOI: https://doi.org/10.1007/s00799-022-00342-1
