Abstract
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens available online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends our previous research on the Illinois Natural History Survey (INHS) collection (7244 specimen images), which used computational approaches to analyze image quality and then automatically generate 22 metadata properties representing the image quality and morphological features of the specimens. In the research reported here, we extend this initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). We also enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these enhancements reduced our overall error rate from 4.6% to 1.1%. They also allowed us to compute an additional 17 image-based metadata properties, which provide supplemental features that may also be used to analyze and classify the fish specimens. Examples of these new properties include convex area, eccentricity, perimeter, and skew. The refined process also outperforms humans in time, labor cost, and accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates that computational methods can enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
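The abstract names convex area, eccentricity, and perimeter among the new image-based properties but does not say how they are computed. The following is a minimal, standard-library-only Python sketch of one common way to derive such properties from a binary segmentation mask of a single specimen. The function names, the edge-count perimeter, and the continuous convex-hull area are illustrative assumptions, not the authors' implementation. Skew is omitted because its exact definition is not given here.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: returns the hull vertices (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def _polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def shape_properties(mask: List[List[int]]) -> Dict[str, float]:
    """Hypothetical helper: shape properties of the foreground (non-zero) pixels."""
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    area = len(pixels)

    def is_fg(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    # Perimeter: count of pixel edges that touch background or the image border.
    perimeter = sum(
        1
        for r, c in pixels
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
        if not is_fg(r + dr, c + dc)
    )

    # Convex area: continuous area of the convex hull of all pixel corners.
    corners = [(r + dr, c + dc) for r, c in pixels for dr in (0, 1) for dc in (0, 1)]
    convex_area = _polygon_area(_convex_hull(corners))

    # Eccentricity of the ellipse with the same second central moments:
    # sqrt(1 - lambda_min / lambda_max) of the pixel-coordinate covariance.
    mr = sum(r for r, _ in pixels) / area
    mc = sum(c for _, c in pixels) / area
    mu_rr = sum((r - mr) ** 2 for r, _ in pixels) / area
    mu_cc = sum((c - mc) ** 2 for _, c in pixels) / area
    mu_rc = sum((r - mr) * (c - mc) for r, c in pixels) / area
    half_trace = (mu_rr + mu_cc) / 2
    root = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    lam_max, lam_min = half_trace + root, half_trace - root
    eccentricity = math.sqrt(1 - max(lam_min, 0.0) / lam_max) if lam_max > 0 else 0.0

    return {
        "area": area,
        "perimeter": perimeter,
        "convex_area": convex_area,
        "eccentricity": eccentricity,
    }
```

For example, a 2 × 4 block of foreground pixels yields area 8, perimeter 12, convex area 8.0, and eccentricity √0.8 ≈ 0.894. Libraries such as scikit-image (`skimage.measure.regionprops`) expose similarly named properties, although their perimeter and convex-area definitions differ somewhat from the simple ones used in this sketch.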
Code and data availability
Raw data for INHS is available here. Raw data for UWZM is available here. Reproducible code is available here.
Acknowledgements
We thank the full BGNN team for their support, as well as Chris A. Taylor, Curator of Fishes and Crustaceans at the Illinois Natural History Survey (INHS), and John Lyons, Curator of Fish at the University of Wisconsin Zoological Museum (UWZM). INHS and UWZM are two of the six fish collections participating in the Great Lakes Invasives Network (GLIN).
Funding
This research was supported by the NSF Office of Advanced Cyberinfrastructure (OAC) under awards #1940233 and #1940322.
Ethics declarations
Conflict of interest
The authors have declared that no conflict of interest exists.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Karnani, K., Pepper, J., Bakiş, Y. et al. Computational metadata generation methods for biological specimen image collections. Int J Digit Libr (2022). https://doi.org/10.1007/s00799-022-00342-1
DOI: https://doi.org/10.1007/s00799-022-00342-1