Efforts to model the human visual system have led to tremendous advances in image processing technologies [1]. Digital image processing has consequently grown rapidly, driving continuous development of signal processing technology and enabling remarkable progress in many fields, from video surveillance systems to autonomous vehicles and virtual reality. In particular, image enhancement and object recognition technologies have produced breakthroughs in medicine, such as cancer detection and the diagnosis of lesions that are difficult to see with the naked eye [2].

Traditional skin diagnosis relies on a patient's medical history, clinical symptoms, skin examination images, biopsies, or combinations of these. However, training to become a skilled dermatologist is time-consuming, and skin diseases are numerous and often look very similar to the human eye, which makes accurate and efficient diagnosis difficult. Recent advances in artificial intelligence (AI), especially deep learning algorithms based on convolutional neural networks (CNNs), have made it possible to analyze the key features of skin diseases effectively, and some AI models have shown diagnostic performance that matches or even surpasses that of dermatologists [3]. Deep learning has thus become an indispensable technology for recognizing and classifying lesions in the medical field [4].

Meanwhile, interest in scalp and skin health continues to grow. For scalp-related symptoms such as dandruff, erythema, and itchiness in particular, early detection and management help prevent severe conditions such as hair loss, chronic scalp inflammation, and skin cancer [5]. However, the window for early diagnosis is easily missed because of a shortage of specialists, difficulty visiting medical institutions, and patients' limited knowledge [6]. Demand is therefore growing rapidly for a system that quickly and easily determines the condition of the scalp so that patients can take immediate action.

A number of studies have used artificial intelligence and deep learning to analyze and diagnose various lesions, especially skin conditions. One study [5] applies an existing deep learning model through transfer learning to analyze scalp images effectively, which greatly benefits scalp diagnosis and related services, but it requires so many pre-processing steps that it leads to unnecessary overfitting.

Other effective deep learning studies [6, 7] use the user's self-examination as their main input data. Because this input is directly affected by the user's subjective judgment, its reliability and consistency are difficult to guarantee.

Another study [8] uses deep learning to classify 134 skin and scalp lesions effectively, but it only determines whether a lesion is present or absent. This limits its medical use, because the appropriate response can vary with the severity of the symptoms.

In addition, various skin analysis systems [9, 10] based on transfer learning [3, 11, 12, 13] have been proposed. These systems apply the VGGNet [14] and VGG-F [15] deep learning models, respectively, to scalp analysis, but their accuracy falls short of a breakthrough level.

In this paper, 76,000 scalp images are used to train and validate the proposed algorithm. The dataset is taken from Scalp Images by Type [7], published by the Korea Intelligent Information Society Agency [16]. The images are acquired with authorized scalp imaging devices, classified by the type and severity of the designated lesions, and confirmed twice by three dermatologists at Seoul National University Bundang Hospital.

After comparing and analyzing various scalp lesions, six lesions that are the most significant and the most measurable from images are selected: microkeratin, dandruff, hair loss, oily scalp, erythema, and pustule. For each lesion, the images are classified into four stages according to severity. All images are JPEG-encoded at a resolution of 640 × 480 pixels, with file sizes of 225–263 kB. Examples of the images are shown in Fig. 1.
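The labelling scheme can be pictured as one four-class severity problem per lesion. The sketch below is a minimal illustration under several assumptions. It uses integer labels 0 to 3 in the order of the Fig. 1 severity names. It reuses a shared pool of normal images as class 0 for every lesion, since the text notes that the normal image set is applied to each lesion for training. The names `LESIONS`, `SEVERITIES`, `one_hot`, and `build_lesion_dataset` are hypothetical and are not the authors' code.

```python
# Hypothetical label scheme: one 4-class severity problem per lesion.
LESIONS = ("microkeratin", "dandruff", "hair loss", "oily scalp", "erythema", "pustule")
SEVERITIES = ("normal", "mild", "moderate", "severe")  # class indices 0..3


def one_hot(label, num_classes=len(SEVERITIES)):
    """Return a one-hot vector for an integer severity label."""
    return [1 if i == label else 0 for i in range(num_classes)]


def build_lesion_dataset(normal_images, lesion_images):
    """Pair images with severity labels for a single lesion.

    normal_images: healthy-scalp images, reused for every lesion (class 0).
    lesion_images: dict mapping "mild"/"moderate"/"severe" to image lists.
    Returns a list of (image, label) tuples.
    """
    samples = [(img, 0) for img in normal_images]
    for label, severity in enumerate(SEVERITIES[1:], start=1):
        samples += [(img, label) for img in lesion_images.get(severity, [])]
    return samples
```

Building all six per-lesion datasets is then a dictionary comprehension over `LESIONS`, with each lesion's mild, moderate, and severe images and the same normal images.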

Fig. 1

Example of scalp images classified by the severity of erythema: (a) normal, (b) mild, (c) moderate, (d) severe

Data augmentation is also used to increase the number of images and thereby prevent performance degradation. When training data are insufficient or differ greatly in size across classes, deep learning training may suffer from problems such as overfitting, which seriously degrade performance [11]. The augmentation methods include shifting and flipping in the vertical and horizontal directions, rotation, and contrast adjustment [11], and these methods are randomly selected and applied. After augmentation, the number of images is intentionally made equal for every class. Table 1 lists the number of images for each lesion and severity level before and after augmentation. Note that the same set of normal images is used in training for every lesion.
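The augmentation-and-balancing step can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation. It makes several assumptions. Images are modelled as 2-D lists of grey levels. Rotation is restricted to 180° so that the 640 × 480 shape is preserved. The shift amount and contrast factor are arbitrary. All function names are hypothetical. Each class is topped up with randomly augmented copies of its own originals until it reaches a common target count.

```python
import random

# Minimal, dependency-free sketch: a greyscale image is a list of rows of
# 0-255 ints. Shift size, contrast factor, and using a 180-degree rotation
# (which keeps the 640 x 480 shape) are illustrative assumptions.


def shift_right(img, k=1):
    """Shift every row k pixels to the right, zero-filling on the left."""
    return [[0] * k + row[:len(row) - k] for row in img]


def shift_down(img, k=1):
    """Shift the image k rows down, zero-filling the top rows."""
    width = len(img[0])
    return [[0] * width for _ in range(k)] + [list(r) for r in img[:len(img) - k]]


def flip_horizontal(img):
    return [row[::-1] for row in img]


def flip_vertical(img):
    return [list(row) for row in reversed(img)]


def rotate_180(img):
    return [row[::-1] for row in reversed(img)]


def adjust_contrast(img, factor=1.2):
    """Scale each pixel's distance from the mean grey level, clipped to 0-255."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[min(255, max(0, round(mean + factor * (p - mean)))) for p in row]
            for row in img]


AUGMENTATIONS = (shift_right, shift_down, flip_horizontal,
                 flip_vertical, rotate_180, adjust_contrast)


def balance_classes(images_by_class, target=None, seed=0):
    """Top up each class with randomly augmented copies until it has `target` images.

    `target` defaults to the size of the largest class; classes already at or
    above the target are left unchanged.
    """
    rng = random.Random(seed)
    if target is None:
        target = max(len(imgs) for imgs in images_by_class.values())
    balanced = {}
    for label, imgs in images_by_class.items():
        if not imgs:
            raise ValueError(f"class {label!r} has no images to augment")
        out = list(imgs)
        while len(out) < target:
            op = rng.choice(AUGMENTATIONS)      # randomly selected method
            out.append(op(rng.choice(imgs)))    # applied to a random original
        balanced[label] = out
    return balanced
```

In a TensorFlow pipeline the same operations would normally be applied to decoded image tensors, for example with `tf.image.flip_left_right` or `tf.image.adjust_contrast`. The list-based version only shows the selection and balancing logic.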

Table 1 Number of images before and after augmentation

Close examination of the obtained data and analysis of the various scalp lesions show that the condition of the scalp is distinguished not by a specific object but by the texture of the scalp surface: the roughness, color, size, type, and periodicity of the skin's patterns. Therefore, this paper proposes a novel deep learning network with a relatively large number of feature maps in a small number of layers, a structure suited to the analysis of surface images.

In addition, the proposed algorithm uses neither the location of the imaged scalp region nor metadata such as users' questionnaire responses, which inevitably introduce inconsistency and instability. This secures the usability and scalability of the algorithm.

The architecture of the proposed deep learning network is shown in Fig. 2. It consists of six convolution layers, five pooling layers, six dropout layers, and three fully connected layers.
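The text gives only the layer counts, so the sketch below traces tensor shapes through one plausible arrangement of those layers. Everything beyond the counts is an assumption:

- the block order: five convolution, pooling, and dropout blocks, followed by a sixth convolution and dropout
- "same"-padded convolutions
- 2 × 2 max pooling with stride 2
- the hypothetical filter, dropout, and unit values
- a 480 × 640 × 3 input at the native image resolution
- a four-way output for the four severity classes

```python
import math

# Hypothetical arrangement of the stated layer counts (6 conv, 5 pool,
# 6 dropout, 3 fully connected). Filter counts, dropout rates, dense sizes,
# and ordering are assumptions, not the published configuration.
LAYER_PLAN = [
    ("conv", 64), ("pool", 2), ("dropout", 0.25),
    ("conv", 128), ("pool", 2), ("dropout", 0.25),
    ("conv", 128), ("pool", 2), ("dropout", 0.25),
    ("conv", 256), ("pool", 2), ("dropout", 0.25),
    ("conv", 256), ("pool", 2), ("dropout", 0.25),
    ("conv", 512), ("dropout", 0.25),
    ("flatten", None),
    ("dense", 256), ("dense", 128), ("dense", 4),  # 4 severity classes
]


def trace_shapes(input_shape, plan):
    """Return (layer kind, output shape) after each layer in the plan."""
    shape = tuple(input_shape)
    shapes = []
    for kind, arg in plan:
        if kind == "conv":        # 'same' padding keeps H and W, sets channels
            h, w, _ = shape
            shape = (h, w, arg)
        elif kind == "pool":      # non-overlapping pooling: window = stride = arg
            h, w, c = shape
            shape = (h // arg, w // arg, c)
        elif kind == "flatten":
            shape = (math.prod(shape),)
        elif kind == "dense":
            shape = (arg,)
        # dropout leaves the shape unchanged
        shapes.append((kind, shape))
    return shapes


for kind, shape in trace_shapes((480, 640, 3), LAYER_PLAN):
    print(f"{kind:8s} {shape}")
```

In Keras, each entry would map onto `tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")`, `MaxPooling2D(2)`, `Dropout(rate)`, `Flatten()`, or `Dense(units)`. The trace also shows the cost of the "many feature maps, few layers" design. With these hypothetical sizes, the five poolings reduce 480 × 640 to 15 × 20. The flattened vector entering the first fully connected layer therefore has 15 × 20 × 512 = 153,600 elements.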

Fig. 2

Architecture of the proposed algorithm

The proposed algorithm is implemented in Python 3.8 using the TensorFlow library. The experiment uses the deep learning network shown in Fig. 2 with a batch size of 16 for 10 epochs. Training and validation are run on an Intel Core i7-9700 3.00 GHz CPU, 64 GB of Samsung DDR4 RAM, and an NVIDIA GeForce GTX 1660 GPU, and take approximately 34 h in total.
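The training configuration can be made concrete with a small sketch of a 20% validation hold-out and the resulting number of mini-batches per epoch at a batch size of 16. Three things in it are assumptions rather than details from the paper: stratifying the split by class, the fixed seed, and the per-class counts in the example. The actual per-class counts are those in Table 1.

```python
import math
import random


def stratified_split(samples_by_class, val_fraction=0.2, seed=0):
    """Hold out val_fraction of every class for validation.

    Returns (train, val) lists of (sample, label) pairs. Stratifying by
    class is an assumption; the text only states that 20% of the data is
    used for validation.
    """
    rng = random.Random(seed)
    train, val = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_val = round(len(shuffled) * val_fraction)
        val.extend((s, label) for s in shuffled[:n_val])
        train.extend((s, label) for s in shuffled[n_val:])
    return train, val


def steps_per_epoch(num_samples, batch_size=16):
    """Number of mini-batches needed to see every training sample once."""
    return math.ceil(num_samples / batch_size)


# Hypothetical example: 1,000 images in each of 4 severity classes.
data = {label: [f"img_{label}_{i}" for i in range(1000)] for label in range(4)}
train, val = stratified_split(data)
steps = steps_per_epoch(len(train))
print(len(train), len(val), steps, steps * 10)  # 3200 800 200 2000
```

With 10 epochs, this hypothetical split gives 2,000 optimizer steps in total. The same arithmetic applies to the real per-lesion counts.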

With 20% of the total data used for validation, the highest accuracy, 98.78%, is obtained for erythema. The accuracies for the individual lesions are microkeratin 94.91%, dandruff 98.00%, hair loss 98.02%, oily scalp 94.15%, erythema 98.78%, and pustule 92.90%, with an overall average of 96.13%. (Before data augmentation, the average accuracy of the proposed algorithm is below 70%.) As shown in Fig. 3, the average accuracy is up to 6.62% higher than that of the existing algorithms. In Fig. 3, the number in parentheses on each bar indicates the improvement of the proposed algorithm over the corresponding existing algorithm.
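The reported overall average is consistent with an unweighted mean of the six per-lesion accuracies, which can be checked directly. Equal weighting is inferred from the numbers rather than stated in the text. The accuracies sum to 576.76, and 576.76 / 6 ≈ 96.13.

```python
# Per-lesion validation accuracies (%) as reported in the text.
accuracies = {
    "microkeratin": 94.91,
    "dandruff": 98.00,
    "hair loss": 98.02,
    "oily scalp": 94.15,
    "erythema": 98.78,
    "pustule": 92.90,
}

mean_accuracy = sum(accuracies.values()) / len(accuracies)  # unweighted mean
best_lesion = max(accuracies, key=accuracies.get)
worst_lesion = min(accuracies, key=accuracies.get)

print(f"average: {mean_accuracy:.2f}%")  # average: 96.13%
print(f"best: {best_lesion} ({accuracies[best_lesion]:.2f}%)")  # best: erythema (98.78%)
print(f"worst: {worst_lesion} ({accuracies[worst_lesion]:.2f}%)")  # worst: pustule (92.90%)
```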

Fig. 3

Accuracy of the proposed algorithm compared with the existing algorithms

In conclusion, the proposed algorithm is more accurate than existing methods and can easily determine the condition of the scalp without a prior interview or knowledge of the scalp location. The results of this study are expected to extend to other skin areas, such as the face, and to the diagnosis of other diseases, including skin cancer, thereby contributing to advances in diagnostic technology and in the medical field.