Abstract
Image pattern recognition for big data has attracted growing attention from researchers and practitioners across many domains of science and technology. This paper focuses on the use of image pattern recognition in big data applications. In this context, a taxonomy of image pattern recognition and big data is presented. Applications of image pattern recognition for big data, including multimedia, biometrics, and biology and biomedicine, are also highlighted. Moreover, the significance of pattern-based feature reduction in big data is discussed, and machine-learning techniques used in pattern recognition applications are presented. The approaches are compared by their objectives to support the taxonomy. This paper offers a review of image recognition approaches for big data that can inform future research.
Change history
20 September 2022
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s11042-022-13989-4
Acknowledgments
This paper is supported by the Malaysian Ministry of Education under the University of Malaya.
About this article
Cite this article
Zerdoumi, S., Sabri, A.Q.M., Kamsin, A. et al. RETRACTED ARTICLE: Image pattern recognition in big data: taxonomy and open challenges: survey. Multimed Tools Appl 77, 10091–10121 (2018). https://doi.org/10.1007/s11042-017-5045-7
DOI: https://doi.org/10.1007/s11042-017-5045-7