Computational metadata generation methods for biological specimen image collections

International Journal on Digital Libraries

Abstract

Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens found online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research conducted with the Illinois Natural History Survey (INHS) collection (7244 specimen images) that uses computational approaches to analyze image quality and then automatically generates 22 metadata properties representing the image quality and morphological features of the specimens. In the research reported here, we demonstrate the extension of our initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). Further, we enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these new methods improved our overall error rate from 4.6% to 1.1%. These enhancements also allowed us to compute an additional set of 17 image-based metadata properties, which provide supplemental features and information that may also be used to analyze and classify the fish specimens; examples include convex area, eccentricity, perimeter, and skew. The newly refined process further outperforms humans in terms of time and labor cost, as well as accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
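To make the pipeline summarized above more concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of how contrast enhancement, upscaling of small objects, segmentation, and shape-based metadata extraction could be chained with scikit-image. The function name, the CLAHE clip limit, the 512-pixel upscaling threshold, and the assumption of a dark specimen on a light background are illustrative choices, not values reported in the paper.

```python
"""Hypothetical sketch: derive shape-based metadata from a specimen image."""
import numpy as np
from skimage import color, exposure, filters, measure, transform


def specimen_shape_metadata(rgb_image: np.ndarray, min_side: int = 512) -> dict:
    # Contrast enhancement: CLAHE on the grayscale image.
    gray = color.rgb2gray(rgb_image)
    enhanced = exposure.equalize_adapthist(gray, clip_limit=0.03)

    # Upscale small crops (bicubic) so thin structures such as fins
    # survive thresholding; 512 px is an assumed minimum side length.
    scale = max(1.0, min_side / min(enhanced.shape))
    if scale > 1.0:
        enhanced = transform.rescale(enhanced, scale, order=3, anti_aliasing=True)

    # Otsu threshold; assumes the specimen is darker than the background.
    mask = enhanced < filters.threshold_otsu(enhanced)

    # Keep the largest connected region and read its shape descriptors.
    regions = measure.regionprops(measure.label(mask))
    if not regions:
        raise ValueError("no foreground region found")
    fish = max(regions, key=lambda r: r.area)

    return {
        "area": int(fish.area),
        "convex_area": int(fish.convex_area),
        "eccentricity": float(fish.eccentricity),
        "perimeter": float(fish.perimeter),
        "orientation_radians": float(fish.orientation),
        "bounding_box": tuple(int(v) for v in fish.bbox),
    }
```

The region descriptors returned here (convex area, eccentricity, perimeter, orientation) are examples of the kind of image-based morphological properties the abstract describes as supplemental metadata.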

Code and data availability

Raw data for INHS is available here. Raw data for UWZM is available here. Reproducible code is available here.

Acknowledgements

We thank the full BGNN team for their support, as well as Chris A. Taylor, Curator of Fishes and Crustaceans at the Illinois Natural History Survey (INHS), and John Lyons, Curator of Fish at the University of Wisconsin Zoological Museum (UWZM). INHS and UWZM are two of six fish collections participating in the Great Lakes Invasives Network (GLIN).

Funding

This research was supported by NSF Office of Advanced Cyberinfrastructure (OAC) awards #1940233 and #1940322.

Author information

Corresponding author

Correspondence to Joel Pepper.

Ethics declarations

Conflict of interest

The authors have declared that no conflict of interest exists.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Karnani, K., Pepper, J., Bakiş, Y. et al. Computational metadata generation methods for biological specimen image collections. Int J Digit Libr (2022). https://doi.org/10.1007/s00799-022-00342-1
