1 Introduction

1.1 Basic clinical information about lymphoma

Malignant lymphoma is a tumor that originates in the lymph nodes and extranodal lymphoid tissues, accounting for 3.37% of all malignant tumors worldwide (International Agency for Research on Cancer, http://gco.iarc.fr/). Annually, approximately 280,000 individuals worldwide receive a diagnosis of lymphoid malignancies, which comprise at least 38 different subtypes according to the World Health Organization (WHO) classification (Rubinstein et al. 2022). Pathologists rely on histopathological examination of tissue sections at various magnification levels for the diagnosis and classification of lymphoma. Traditional lymphoma diagnosis relies on a surgical biopsy of suspicious lesions, followed by hematoxylin and eosin (H&E) staining. This staining technique allows pathologists to observe the growth patterns and morphological characteristics of abnormal cells. The morphological analysis conducted by pathologists, which includes assessing tissue structure, cell shape, and staining characteristics, remains an essential criterion for the routine histopathological staging and grading of lymph nodes (Isabelle et al. 2010; Pannu et al. 2000).

To make the above content more intuitive, we can observe the acquisition process of the histopathological image in Fig. 1. First, a biological sample is taken from a living body. Then, the biopsy is fixed to avoid chemical changes in the tissue (Hewitson and Darby 2010). After that, the tissue is cut into sections and placed on a glass slide for staining. The stained tissue is then covered with a thin piece of plastic or glass (a coverslip), which protects the tissue and facilitates observation under a microscope. Finally, the slide is observed and digitized with a microscope.

Fig. 1

The process of histopathology image acquisition. a–f are taken from http://library.med.utah.edu/WebPath/HISTHTML/HISTOTCH/HISTOTCH.html. g corresponds to Fig. 1(a) in Holten-Rossing (2018)

In this paragraph, we introduce the morphological features of some typical lymphomas. Because every type of lymphoma has a characteristic morphology, many lymphomas can be accurately subcategorized based on morphological characteristics (Xu et al. 2002). In non-Hodgkin lymphoma (NHL), follicular lymphoma (FL) is a common type of B-cell lymphoma (as shown in Fig. 2a). FL cells usually show a distinct follicle-like growth pattern, with follicles similar in size and shape and with unclear boundaries. Diffuse large B-cell lymphoma (DLBCL) is a diffusely hyperplastic large B-cell malignant tumor. The cell morphology of DLBCL is diverse: the nucleus is round or oval, with single or multiple nucleoli (as shown in Fig. 2b). In Hodgkin lymphoma (HL), nodular lymphocyte-predominant Hodgkin lymphoma (NLPHL) presents a vaguely large nodular shape, and the background structure is an extensive spherical network composed of follicular dendritic cells (as shown in Fig. 2c).

Fig. 2

Pathological pictures of three types of lymphomas (H&E stain). a corresponds to Fig. 6.67(b) in Feller and Diebold (2003). b corresponds to Fig. 6.61 in Weinstein et al. (2007). c corresponds to Fig. 3.1 in Engert and Vassilakopoulos (2016)

1.2 The development of machine vision in the diagnosis and treatment of lymphoma

Many diseases can be evaluated and diagnosed by analyzing cells and tissues. Computer-aided Diagnosis (CAD) systems have therefore become a significant research subject (Bergeron 2017). A CAD system generally consists of three parts (Samsi 2012): first, it segments the object from the background tissue; second, it extracts features relevant to the classification; finally, a classifier produces a diagnosis based on the extracted features. In recent years, CAD has become a significant research direction in medical image analysis. Usually, the medical images that require computer assistance are histopathological slides. Compared with manual review by pathologists, analysis by CAD systems is fast, and usually only one sample image is needed to obtain accurate results (Di Ruberto et al. 2015). With CAD systems, pathological images can be classified automatically, enabling doctors to diagnose diseases more efficiently and making the diagnosis results more accurate and objective (Zhang and Metaxas 2016; Zhang et al. 2015). Today, CAD is used in many fields of medicine. For example, CAD systems play a critical role in the early detection of breast cancer (Rangayyan et al. 2007), lung cancer diagnosis (Reeves and Kostis 2000), and arrhythmia detection (Oweis and Hijazi 2006). In Kourou et al. (2015), a CAD system is mainly used in cancer diagnosis and prognosis. Kong et al. (2008) describe a CAD system that can classify stromal development and neuroblastoma grades. Such a CAD system is composed of image preprocessing, segmentation, feature extraction, feature reduction, detection and classification, and post-processing modules.

In CAD systems, machine vision (MV) is an important research direction. For example, in Lee et al. (2018), a CAD system based on deep learning is developed to locate and differentiate metastatic lymph nodes of thyroid cancer using ultrasound images, a compelling example of MV as a high-sensitivity screening tool. At present, MV has been used in agriculture (Patrício and Rieder 2018), medicine (Obermeyer and Emanuel 2016), military fields (Budiharto et al. 2020) and various industrial applications. As shown in Fig. 3, the applications of MV in the diagnosis and treatment of lymphoma mainly include segmentation, classification, and detection. In addition, the range of applications and tasks has grown, including quantitative analysis (Shi et al. 2016), analysis of the characteristics of lymphoma (Martins et al. 2019), and so on.

However, CAD has the following disadvantages. First, CAD requires a large-scale dataset to train the neural network or machine learning algorithm; however, medical images are difficult to acquire, and medical image datasets are quite small, which degrades recognition performance. In addition, since staining techniques vary among pathologists, there are large variations between data from different batches, making the training process for CAD more difficult. Finally, since current image classification models require training and inference on large computing devices, deploying CAD on terminal devices remains a research difficulty (Prakash et al. 2023). The diagnosis and classification of lymphoma are crucial in clinical treatment and prognosis, and accurate diagnosis relies heavily on the expertise of pathologists. When dealing with follicular hyperplasia, pathologists must accurately differentiate between follicular lymphoma (FL) and follicular hyperplasia (FH), as these lesions can sometimes have very similar characteristics. Histopathological screening of different lymphoma types often poses challenges for pathologists, primarily due to the subtle variations in histological appearance and the complexity of visually differentiating between various types of lymphomas. This process is prone to inconsistencies and variations in interpretation between observers and laboratories. Studies have indicated that there is a 20% variation in lymphoma diagnoses by pathologists (Pathologique 2017; Bowen et al. 2014; Matasar et al. 2012).

Therefore, this review addresses the following problems. First, it identifies the problems in lymphoma histopathology image analysis that urgently need to be solved and the methods available to solve them. Second, it aims to assist pathologists in accurately identifying the focal areas of lymphatic histopathology images.

Fig. 3

The development trend of the applications of MV in the diagnosis and treatment of lymphoma. The horizontal axis represents the year, and the vertical axis represents the number of studies

1.3 Motivation of this review

This paper comprehensively reviews the MV methods for lymphatic histopathology image analysis. The motivation of this study is to survey the popular technologies and trends of MV and to explore the future potential of MV in the diagnosis and treatment of lymphoma. In the course of our work, we found some survey papers related to the applications of MV in this area; a comparison of the number of lymphatic-related papers against the total papers in other reviews is shown in Table 1.

Table 1 Comparison of number of lymph papers and total papers in other reviews

From Table 1, we can identify the following drawbacks. First, some survey papers focus only on cancer diagnosis and detection by medical image analysis and lack descriptions of detailed methods (Loukas and Linney 2004; Cai et al. 2008; Gurcan et al. 2009; He et al. 2012; Kumar et al. 2013; Irshad et al. 2013; Arevalo et al. 2014; Madabhushi and Lee 2016; Jothi and Rajam 2017; Litjens et al. 2017; Ker et al. 2017; Robertson et al. 2018; Bera et al. 2019; Zhou et al. 2020). Second, some survey papers review medical images for a single method, without presenting all MV methods (Cruz and Wishart 2006; Belsare and Mushrif 2012; Greenspan et al. 2016). Finally, none of the survey papers provides a comprehensive summary of the applications of MV in lymph histopathology image analysis (LHIA). To address these drawbacks, we investigate lymphatic histopathology image datasets, MV methods, and LHIA tasks. Referring to the methodology of reviews in other fields (Jahanbakhshi et al. 2021a, b, c), this study summarizes more than 170 works from 1999 to 2020 to fill the gap of MV in this field. The works summarized in this paper are collected from arXiv, IEEE, Springer, Elsevier, and other sources. The screening process is shown in Fig. 4. Twenty-six keywords are used to search for papers of interest, and a total of 2175 papers are collected. After two rounds of screening, we retain 171 papers, including 135 papers on lymphoma histopathology image analysis and 36 survey papers.

Fig. 4

The flowchart of the paper searching and screening process

Compared to previous studies, this study contains the following innovations. First, in contrast to previous studies, which tend to cover medical images of various diseases, this study focuses on lymph histopathology image analysis; the comparison is shown in Table 1. In addition, unlike previous studies that reviewed a single task in lymph histopathology image analysis, this study provides a more complete and detailed analysis of all processes and tasks in lymph histopathology images. Furthermore, this study provides a more detailed summary of the advantages and disadvantages of the new techniques than previous studies. Finally, this study also gives predictions for the future direction of the technology.

1.4 Structure of this review

The content introduced in each section of this paper is shown in Fig. 5: Sect. 2 introduces commonly used datasets and evaluation methods. Section 3 introduces commonly used preprocessing methods, and Sect. 4 introduces commonly used image segmentation methods. Sections 5, 6 and 7 introduce the use of CAD technology for feature extraction, classification, and detection, and finally Sect. 8 summarizes this article and offers prospects for future work.

Fig. 5

The structure of this paper

2 Datasets and evaluation methods

For the applications of MV in analysing histopathological images of lymphoma, we discuss some frequently used publicly available datasets. In addition, we list the evaluation metrics used in segmentation, classification, and detection tasks.

2.1 Publicly available datasets

Five publicly available datasets are discussed in this section. Four of them are Camelyon16 (Camelyon16 2016), Camelyon17 (Camelyon16 2016), IICBU-2008 (Shamir et al. 2008) and DLBCL-Morph (Vrabac et al. 2021); the fifth dataset is collected from the National Cancer Institute (NCI) (http://www.cancer.gov/) and the National Institute on Aging (NIA) (https://www.nia.nih.gov/). Table 2 shows the basic information of the datasets.

Table 2 Available lymph histopathology image datasets

The Camelyon16 challenge was organized by the International Symposium on Biomedical Imaging (ISBI) in 2016. The purpose of Camelyon16 is to advance algorithms for the automated detection of cancer metastases in H&E stained lymph node images. This competition has high clinical significance in reducing the workload of pathologists and the subjectivity of diagnosis. The data used in Camelyon16 consist of 400 whole-slide images (WSIs) of sentinel lymph nodes (SLN), collected at Radboud University Medical Center (Radboudumc) (Nijmegen, The Netherlands) and the University Medical Center Utrecht (Utrecht, The Netherlands). The training dataset includes 270 WSIs, and the test dataset includes 130 WSIs. Figure 6a–c shows WSIs at low, medium, and high resolution, respectively (Camelyon16 2016).

Fig. 6

The WSIs in Camelyon16 dataset

Camelyon17 is the second challenge launched by the Diagnostic Image Analysis Group (DIAG) and the Department of Pathology at Radboudumc. The purpose of Camelyon17 is to advance algorithms for the automated detection and classification of cancer metastases in H&E stained lymph node images. Like Camelyon16, Camelyon17 also has high clinical significance. The training dataset includes 500 WSIs, and the test dataset includes 500 WSIs (as shown in Fig. 6).

The IICBU-2008 dataset was proposed by Shamir et al. (2008) in 2008 to provide computer scientists with free access to biological image datasets; it includes nine small datasets containing different organs or cells. The lymphoma subset contains three types of malignant lymphoma: 113 chronic lymphocytic leukemia (CLL) images, 139 FL images and 122 mantle cell lymphoma (MCL) images (as shown in Fig. 7). The images are stained with H&E and usually used for classification, as in Meng et al. (2010) and Bai et al. (2019).

Fig. 7

The histopathological images in IICBU-2008 dataset

The dataset used in Roberto et al. (2017) and Ribeiro et al. (2018) is from the NCI and NIA and contains 173 histological NHL images, comprising 12 CLL images, 62 FL images and 99 MCL images (as shown in Fig. 8).

Fig. 8

The histopathological images in the NCI and NIA dataset

DLBCL-Morph includes a total of 209 H&E-stained patches of diffuse large B-cell lymphoma, with ground-truth regions annotated for each image.

2.2 Evaluation method

This section introduces some commonly used evaluation metrics in the classification, segmentation, and detection tasks of lymphoma histopathology images.

2.2.1 Some basic metrics

Here, we present several commonly used evaluation metrics. Suppose we classify samples into two classes, recorded as positive and negative. Four outcomes are then possible: true positive (TP), false positive (FP), false negative (FN) and true negative (TN). The values on the diagonal of the confusion matrix are the numbers of correct classifications for each category.

2.2.2 Evaluation criteria for classification models

In classification tasks, when a classifier is trained with enough training samples, the next step is to provide test samples to check whether they are correctly classified. A method to evaluate and quantify the classification results is therefore necessary. Accuracy (ACC), Precision, Sensitivity (SEN), Recall, F1-score, the Receiver Operating Characteristic curve (ROC curve) and the Area Under the Curve (AUC) are commonly used to evaluate classification performance. In addition, metrics such as positive predictive value (PPV) and negative predictive value (NPV) are used in Bollschweiler et al. (2004). PPV represents the proportion of true positive samples among all samples classified as positive and is given as TP/(TP+FP). NPV represents the proportion of true negative samples among all samples classified as negative and is given as TN/(TN+FN). Finally, SEN and Specificity (SPE) are also called the true positive rate (TPR) and true negative rate (TNR), respectively. Table 3 shows the commonly used evaluation metrics in classification tasks.

Table 3 Evaluation metrics for classification tasks
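To make these definitions concrete, the metrics above can be computed directly from the four confusion-matrix counts. The following is a minimal Python sketch; the example counts are arbitrary, not taken from any cited study.

```python
# A minimal sketch of the classification metrics above, computed
# from the four confusion-matrix counts (TP, FP, FN, TN).

def classification_metrics(tp, fp, fn, tn):
    """Return the commonly used metrics as a dict."""
    acc = (tp + tn) / (tp + fp + fn + tn)        # accuracy
    precision = tp / (tp + fp)                   # = PPV
    recall = tp / (tp + fn)                      # = sensitivity (SEN) / TPR
    specificity = tn / (tn + fp)                 # = SPE / TNR
    npv = tn / (tn + fn)                         # negative predictive value
    f1 = 2 * precision * recall / (precision + recall)
    return {"ACC": acc, "PPV": precision, "SEN": recall,
            "SPE": specificity, "NPV": npv, "F1": f1}

print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

With the example counts above (a balanced classifier that is right 80% of the time on each class), every metric evaluates to 0.8.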

2.2.3 Evaluation criteria for segmentation methods

Segmentation can be used to detect the region of interest (ROI) in histopathological images. In histopathology image analysis tasks, ROIs include tissue components such as lymphocytes and cell nuclei. Segmentation is the process of dividing ROIs from the tissue. Since segmentation generally prepares for subsequent detection and classification, it is necessary to evaluate the segmentation result with appropriate metrics.

Commonly used segmentation metrics include the Dice coefficient (DICE), Hausdorff distance (HD), mean absolute distance (MAD) (Cheng et al. 2010), Jaccard index (JAC) (Fatakdawala et al. 2010), and so on. The formulas for these metrics are given below, where \(S_g\) is the set of points (\(g_1, g_2,..., g_n\)) constituting the automatically segmented contour and \(S_t\) is the set of points (\(t_1, t_2,..., t_n\)) constituting the ground truth contour.

  1. DICE represents the ratio of the area where the two regions intersect to the total area and is given by

     $$\begin{aligned} \text{DICE}=\frac{2 \vert S_g \cap S_t \vert }{\vert S_g \vert + \vert S_t \vert } \end{aligned}$$
     (1)

  2. HD between \(S_g\) and \(S_t\) is the maximum among the minimum distances computed from each point of one contour to the other contour. HD is defined by

     $$\begin{aligned} \text{HD}= & {} \text{max}(h(S_g, S_t), h(S_t, S_g)) \end{aligned}$$
     (2)
     $$\begin{aligned} h(S_g, S_t)= & {} \mathop {\text{max}}_{g \in S_g}\mathop {\text{min}}_{t \in S_t}\parallel g-t \parallel \end{aligned}$$
     (3)

  3. MAD between \(S_g\) and \(S_t\) is the mean of the minimum distances computed from each point of \(S_g\) to \(S_t\). MAD is defined by

     $$\begin{aligned} \text{MAD}=\frac{1}{n}\mathrm{\Sigma }_{g \in S_g}\, \mathop {\text{min}}_{t \in S_t}\parallel g-t \parallel \end{aligned}$$
     (4)

  4. JAC represents the intersection of two regions divided by their union and is defined by

     $$\begin{aligned} \text{JAC}=\frac{\vert S_g \cap S_t \vert }{\vert S_g \cup S_t \vert } \end{aligned}$$
     (5)
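To make the formulas concrete, the sketch below computes DICE, JAC, HD, and MAD for contours represented as sets of 2-D points. This is a simplified illustration; production code would typically operate on binary masks with library routines.

```python
# A sketch of the segmentation metrics above for two contours given as
# point sets Sg (automatic segmentation) and St (ground truth).
import math

def _d(p, q):
    return math.dist(p, q)  # Euclidean distance between two points

def hausdorff(Sg, St):
    # HD = max of the two directed distances h(Sg, St) and h(St, Sg)
    h = lambda A, B: max(min(_d(a, b) for b in B) for a in A)
    return max(h(Sg, St), h(St, Sg))

def mad(Sg, St):
    # Mean of the minimum distances from each point of Sg to St
    return sum(min(_d(g, t) for t in St) for g in Sg) / len(Sg)

def dice(Sg, St):
    # DICE on point sets: 2|Sg ∩ St| / (|Sg| + |St|)
    inter = len(set(Sg) & set(St))
    return 2 * inter / (len(Sg) + len(St))

def jaccard(Sg, St):
    # JAC: |Sg ∩ St| / |Sg ∪ St|
    Sg, St = set(Sg), set(St)
    return len(Sg & St) / len(Sg | St)
```

For example, for Sg = {(0,0), (1,0)} and St = {(0,0), (2,0)}, DICE is 0.5, JAC is 1/3, HD is 1, and MAD is 0.5.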

2.2.4 Evaluation criteria for detection methods

Because most detection tasks are completed by classification, many metrics from classification tasks can also be used to evaluate detection results, such as ACC, the confusion matrix, Precision, Recall, the ROC curve, AUC, SEN, and SPE in Sect. 2.2.2. In addition, the free-response operating characteristic (FROC) is used to evaluate the results in Lin et al. (2018), the Dice similarity coefficient (DSC) (as shown in Eq. 6) is used in Senaras et al. (2019), and an AI (Artificial Intelligence) score is used to evaluate the probability of lymph node metastasis in Harmon et al. (2020).

$$\begin{aligned} \text{DSC}=\frac{2\text{TP}}{2\text{TP}+\text{FN}+\text{FP}} \end{aligned}$$
(6)

2.3 Summary

In summary, we introduced the datasets commonly used in LHIA tasks, among which the most commonly used are Camelyon16, Camelyon17 and IICBU-2008. Next, we summarized the evaluation metrics commonly used in classification, segmentation and detection tasks. We find that the commonly used metrics in classification and detection tasks are ACC, SEN, SPE and AUC, while DICE and HD are the commonly used metrics in segmentation tasks.

3 Image preprocessing

To obtain the expected results in LHIA tasks, the images must be of good quality. In this section, we introduce the preprocessing methods commonly used in LHIA tasks, which are based on color, filters, thresholds, morphology, histograms and other methods.

3.1 Color-based preprocessing techniques

At present, some CAD methods in LHIA tasks convert images from the RGB color space to other color spaces. The commonly used color-based preprocessing methods in LHIA are shown in Table 4.

In Angulo et al. (2006), Sertel et al. (2008, 2009), Basavanhally et al. (2009), Belkacem-Boussaid et al. (2009, 2010a, b, c), Orlov et al. (2010), Akakin and Gurcan (2012), Oztan et al. (2012), Saxena et al. (2013), Di Ruberto et al. (2015), Tosta et al. (2017a), Zhu et al. (2019), Bai et al. (2019) and Martins et al. (2019, 2021), the \(L^*a^*b^*\) color space is used in LHIA because it makes color differences compatible with differences in visual perception. In Chen et al. (2005), Sertel et al. (2010b), Di Ruberto et al. (2015) and Tosta et al. (2017a), the \(L^*u^*v^*\) color space is used because it is perceptually uniform. In Basavanhally et al. (2008), Sertel et al. (2008a), Samsi et al. (2010, 2012), Belkacem-Boussaid et al. (2010b), Han et al. (2010), Akakin and Gurcan (2012), Ishikawa et al. (2014), Kuo et al. (2014), Zarella et al. (2015), Fauzi et al. (2015), Di Ruberto et al. (2015), Wang et al. (2016), Chen et al. (2016), Tosta et al. (2017a), Zhu et al. (2019) and Huang et al. (2021), the HSV color space is used because it separates intensity from color information. In Belkacem-Boussaid et al. (2010a), the HSI color space, a hue-based color space similar to HSV, is used. In Di Ruberto et al. (2015) and Tosta et al. (2017a), the YCbCr color space is used; in Tosta et al. (2017a), the YIQ color space is used; and in Zhu et al. (2019), the YUV color space is used. YCbCr, YIQ and YUV are luminance-based color spaces. In Kong et al. (2011a), a new color space is defined, called MDC (the most discriminant color space). MDC is a linear combination of the RGB channels, and the texture features extracted from MDC can separate classes optimally. In Orlov et al. (2010), Cheikh et al. (2017) and Bianconi et al. (2020), color deconvolution is used to separate the overlapped H and E stains into individual channels. Figure 9 shows an example of color deconvolution. In Samsi et al. (2012), Acar et al. (2013), Michail et al. (2014b), Dimitropoulos et al. (2014), Jiang et al. (2018), Zhu et al. (2019) and Bándi et al. (2019), RGB images are converted into grayscale images because variations in color and intensity may hamper classification performance. In Codella et al. (2016) and Hashimoto et al. (2020), the color saturation of the original image increases after color equalization and enhancement, so color augmentation is used to reduce the effect of outlying colors. In Tosta et al. (2017a), Ribeiro et al. (2018), Hashimoto et al. (2020) and Bianconi et al. (2020), color normalization is used to improve the quality of images. In Tosta et al. (2017b, 2018) and Azevedo Tosta et al. (2021), the R channel is extracted from the RGB color space because it has the greatest contrast with the image background.
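As a simple illustration of such conversions, the per-pixel sketch below converts RGB values to grayscale and to HSV using only the Python standard library; real pipelines would typically apply library routines (e.g., from OpenCV or scikit-image) to whole images at once.

```python
# A minimal per-pixel sketch of two conversions mentioned above,
# using only the standard library.
import colorsys

def rgb_to_gray(r, g, b):
    # Luminance-weighted grayscale (ITU-R BT.601 weights)
    return 0.299 * r + 0.587 * g + 0.114 * b

def rgb_to_hsv(r, g, b):
    # HSV separates intensity (V) from color information (H, S);
    # inputs and outputs are in [0, 1]
    return colorsys.rgb_to_hsv(r, g, b)

# Pure red maps to hue 0.0, full saturation, full value
print(rgb_to_hsv(1.0, 0.0, 0.0))
```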

Table 4 Color-based preprocessing techniques
Fig. 9

a MCL lymphoma sample stained with H&E. b H channel. c E channel. This figure corresponds to Fig. 4 in Orlov et al. (2010)

3.2 Filter-based preprocessing techniques

Filtering can smooth images and remove uninteresting parts of the images such as noise, artifacts, and so on. The commonly used filter-based preprocessing methods are shown in Table 5.

In Samsi et al. (2010), Oger et al. (2012), Chen et al. (2016), Shi et al. (2016), Wollmann and Rohr (2017), Wollmann et al. (2018) and Bándi et al. (2019), a median filter is used to smooth texture variations, remove artifacts and noise, and homogenize color intensity. An example using a median filter is shown in Fig. 10. In Han et al. (2010), Schmitz et al. (2012), Schäfer et al. (2013), Michail et al. (2014b), Dimitropoulos et al. (2014), Shi et al. (2016), Es Negm et al. (2017), Tosta et al. (2017a, 2017b, 2018) and Azevedo Tosta et al. (2021), a Gaussian filter is used to remove noise. In Belkacem-Boussaid et al. (2010c), a matching filter is used to remove noise, flatten the background and enhance the contours of the follicle region to homogenize images. In Codella et al. (2016), a region filter is used to remove areas whose surface area is less than 10 pixels. In do Nascimento et al. (2018), a 2D order-statistics filter is used to remove noise and discrete points.
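As an illustration, a 3×3 median filter can be sketched as below for an image stored as a list of rows of gray values; in practice, library routines such as `scipy.ndimage.median_filter` are used instead.

```python
# A sketch of a 3x3 median filter for noise removal. Each interior pixel
# is replaced by the median of its 3x3 neighborhood, which suppresses
# salt-and-pepper outliers; border pixels are left unchanged here.
from statistics import median

def median_filter_3x3(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # copy; borders keep original values
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out
```

For example, an isolated bright pixel of value 255 surrounded by zeros is replaced by 0, the median of its neighborhood.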

Fig. 10

The median filter in preprocessing. a and b correspond to Fig. 1(a) and (d), respectively, in Chen et al. (2016)

Table 5 Filter-based preprocessing techniques

3.3 Threshold-based preprocessing techniques

Thresholding can separate the area of interest in an image from the background. In RGB images, the intensities of the R, G and B channels are commonly used as thresholds. Table 6 shows the references using threshold-based preprocessing techniques.

In Zorman et al. (2007, 2011), the mean brightness of the R component is selected as the threshold for binarization: pixels whose values are less than the threshold are set to 1 and the rest are set to 0. In Belkacem-Boussaid et al. (2010a), Wang et al. (2016), Chen et al. (2016), Codella et al. (2016), Xiao et al. (2017), Li and Ping (2018), Lin et al. (2018) and Bándi et al. (2019), the Otsu threshold method is used to calculate the optimal threshold of each channel in a color space, binarize images, segment color channels, and remove the background regions of images. In Schmitz et al. (2012), Schäfer et al. (2013), Ishikawa et al. (2014), Fauzi et al. (2015), Es Negm et al. (2017) and BenTaieb and Hamarneh (2018), a threshold is selected to separate the background area from the tissue area and highlight the tissue. In Michail et al. (2014a), an intensity threshold is used to remove non-candidate regions, since nucleoli are dark whereas non-candidate regions are bright.
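For reference, Otsu's method selects the threshold that maximizes the between-class variance of background and foreground pixels. A minimal sketch over a flat list of 8-bit gray values might look like this (libraries such as OpenCV and scikit-image provide ready-made implementations):

```python
# A sketch of Otsu's method: pick the threshold t that maximizes the
# between-class variance w_b * w_f * (m_b - m_f)^2, where w and m are
# the pixel counts and mean gray values of the two classes.

def otsu_threshold(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w_b, sum_b = 0, -1.0, 0, 0.0
    for t in range(levels):
        w_b += hist[t]                   # background pixel count
        if w_b == 0 or w_b == total:
            continue
        sum_b += t * hist[t]
        m_b = sum_b / w_b                            # background mean
        m_f = (sum_all - sum_b) / (total - w_b)      # foreground mean
        var = w_b * (total - w_b) * (m_b - m_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

For a clearly bimodal input, e.g. fifty pixels of value 10 and fifty of value 200, the returned threshold separates the two modes.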

Table 6 Threshold-based preprocessing techniques

3.4 Morphology-based preprocessing techniques

Morphological operations can smooth the image contour, suppress noise, fill in small holes and connect broken areas. There are four basic operations: dilation, erosion, opening and closing. Table 7 shows the references using morphology-based preprocessing techniques.

In Belkacem-Boussaid et al. (2010a), Cheng et al. (2010), Sandhya et al. (2013) and Bándi et al. (2019), opening and closing operations are used to recover the shape of the tissue and fill holes in the segmented binary images. In Sandhya et al. (2013), Michail et al. (2014a), Codella et al. (2016) and Xiao et al. (2017), dilation and erosion operators are used to connect edges and remove noise and black holes. In Sandhya et al. (2013), the preprocessing process includes removing noise, smoothing images with a disc smoothing operator, making edges continuous with a Canny edge detector, applying opening and closing operations and the dilation operator, filling connected components with a hole-filling method, and normalizing the maximum area of the connected regions in images.
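The basic operations can be illustrated for binary images as follows. This sketch assumes a 3×3 square structuring element with out-of-bounds pixels treated as background; a closing (dilation followed by erosion) fills holes smaller than the structuring element.

```python
# A sketch of binary dilation, erosion, and closing with a 3x3 square
# structuring element (out-of-bounds pixels count as 0).

def _window(img, y, x):
    h, w = len(img), len(img[0])
    return [img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def dilate(img):
    # A pixel becomes 1 if any neighbor is 1
    return [[int(any(_window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]

def erode(img):
    # A pixel stays 1 only if all neighbors are 1
    return [[int(all(_window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]

def closing(img):
    # Dilation then erosion: fills small holes
    return erode(dilate(img))
```

For example, closing a 3×3 ring of ones fills the hole at its center while leaving the outside background unchanged.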

Table 7 Morphology-based preprocessing techniques

3.5 Histogram-based preprocessing techniques

The histogram shows the total number of pixels in each gray level in the images. Histogram equalization, histogram normalization and histogram stretching are commonly used to enhance image contrast. Table 8 shows the references using histogram-based preprocessing techniques.

In Samsi et al. (2010), Kong et al. (2011a), Oger et al. (2012), Michail et al. (2014b), Dimitropoulos et al. (2014), Codella et al. (2016), Tosta et al. (2017a, b, 2018), Somaratne et al. (2019), Bianconi et al. (2020) and Azevedo Tosta et al. (2021), histogram equalization is used to increase the contrast of images, better identify the tissue regions, enhance the differences between background and tissue, and normalize the color distribution of slides under different staining and lighting conditions to obtain a uniformly distributed histogram, as shown in Fig. 11. In Akakin and Gurcan (2012), the co-occurrence histogram is normalized to extract texture features. In Cheikh et al. (2017), histogram stretching is used to address changes in hematoxylin concentration after color deconvolution.
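As an illustration, histogram equalization for an 8-bit grayscale image maps each gray level through the normalized cumulative histogram. A minimal sketch over a flat pixel list (library routines such as OpenCV's `equalizeHist` do this on whole images):

```python
# A sketch of histogram equalization for 8-bit gray values: build the
# cumulative distribution function (CDF) and remap each gray level so
# the output histogram is approximately uniform.

def equalize(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)   # first non-zero CDF value
    n = len(pixels)
    scale = (levels - 1) / (n - cdf_min) if n > cdf_min else 1
    lut = [round((c - cdf_min) * scale) for c in cdf]  # lookup table
    return [lut[p] for p in pixels]
```

For example, a low-contrast image using only levels 50 and 200 is stretched to the full range 0 to 255.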

Fig. 11

First row: the original images of FL. Second row: the images after histogram equalization. The figure corresponds to Fig. 1 in Kong et al. (2011a)

Table 8 Histogram-based preprocessing techniques

3.6 Other preprocessing techniques

In addition to the above-mentioned preprocessing methods, some papers design different image preprocessing methods in LHIA.

In Fatakdawala et al. (2010), an Expectation-Maximization (EM) based method is used to segment and detect lymphocytes in HER2+ breast cancer (BC) histopathology images; Fig. 12 shows the results obtained by EM. In Meng et al. (2010, 2013), a 25-block framework is designed for histology image classification. In Han et al. (2010), Ishikawa et al. (2014), Xiao et al. (2017), Jamaluddin et al. (2017), Jiang et al. (2018), BenTaieb and Hamarneh (2018), Alom et al. (2019), Senaras et al. (2019), Mohlman et al. (2020), Syrykh et al. (2020), Miyoshi et al. (2020), Dif and Elberrichi (2020) and Hashimoto et al. (2020), the original images are cropped into smaller patches. In Jamaluddin et al. (2017), 40,000 patches are extracted from the training set, with black borders, noise and white background areas eliminated to avoid unnecessary processing time. In BenTaieb and Hamarneh (2018), all sides around the tissue are cropped to reduce the computational load. In Miyoshi et al. (2020), 64 × 64 pixel patches are cropped from 2048 × 2018 pixel patches at magnifications of ×20 and ×40 during the training and test phases. In Neuman et al. (2010), color separation and rescaling are used in the preprocessing step, as shown in Fig. 13.
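Patch cropping as described above can be sketched as a sliding window over the image grid; the patch size and stride below are illustrative defaults, not values taken from any specific cited work.

```python
# A sketch of cropping a large image (list of rows) into fixed-size,
# non-overlapping patches; incomplete border patches are discarded.

def crop_patches(img, patch=64, stride=64):
    h, w = len(img), len(img[0])
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append([row[x:x + patch] for row in img[y:y + patch]])
    return patches
```

For example, a 128 × 128 image yields four 64 × 64 patches with the defaults above; overlapping patches can be produced by choosing a stride smaller than the patch size.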

Fig. 12

a Original HER2+ BC histopathology image with corresponding b–e class binarized scenes. The figure corresponds to Fig. 3 in Fatakdawala et al. (2010)

Fig. 13

The preprocessing steps. The figure corresponds to Fig. 3 in Neuman et al. (2010)

Since the contours of the images may be irregular, Fourier shape descriptors are used to smooth and better delineate the contours in Belkacem-Boussaid et al. (2010c), Kong et al. (2011a, b) and Samsi et al. (2012). In Belkacem-Boussaid et al. (2010c), Schäfer et al. (2013) and Es Negm et al. (2017), Gaussian filtering is first used to convolve the images. Figure 14 shows the effects of the preprocessing.

Fig. 14

In Box 1, the background pixels are removed. In Box 2, tissue fragments are eliminated. The figure corresponds to Fig. 2 in Belkacem-Boussaid et al. (2010c)

In Michail et al. (2014a) and Kuo et al. (2014), the preprocessing steps include a series of transform methods such as the gray-level run-length algorithm, circular Hough transform and Hasan hole-filling algorithm. Figure 15 shows the effect of preprocessing.

Fig. 15

The effect of preprocessing in Kuo et al. (2014). The figure corresponds to Fig. 3 in Kuo et al. (2014)

In Linder et al. (2019), Li et al. (2020), Brancati et al. (2019), Somaratne et al. (2019), Senaras et al. (2019), Kandel and Castelli (2020), Thorat (2020), Dif and Elberrichi (2020) and Hashimoto et al. (2020), data augmentation methods are used to avoid bias and overfitting. The methods include left-right flipping, rotation, color perturbation, jitter, scaling, shear, horizontal and vertical flipping, and mirroring. In Dif and Elberrichi (2020), the patches are rotated by 0° and 90° after cropping. In Hashimoto et al. (2020), the patches are rotated to augment the data when the number of patches in a WSI is less than 3000.
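The flip and rotation augmentations listed above can be sketched for an image stored as a list of rows; in practice, frameworks such as TensorFlow or PyTorch provide these transforms, often applied randomly at training time.

```python
# A sketch of simple geometric augmentations on an image stored as a
# list of rows (each row a list of pixel values).

def flip_horizontal(img):
    # mirror left-right
    return [row[::-1] for row in img]

def flip_vertical(img):
    # mirror top-bottom
    return img[::-1]

def rotate_90(img):
    # clockwise 90-degree rotation: reverse rows, then transpose
    return [list(row) for row in zip(*img[::-1])]
```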

3.7 Summary

The advantages and limitations of the above image preprocessing techniques are as follows. Color-based preprocessing techniques can effectively quantify color characteristics through color-space conversion, color equalization and color enhancement; however, conversion to other color spaces can introduce color distortion, leading to poor results in subsequent tasks. Filter-based preprocessing techniques can effectively smooth the image to achieve denoising; however, in tasks where small objects need to be identified, filtering can distort those objects and cause incorrect identification. Threshold-based preprocessing techniques can extract the texture of an image quickly and efficiently, but they mainly apply to image segmentation tasks, and their use in other tasks requires further exploration. Morphology-based preprocessing techniques enable quick extraction of regions of interest, but their disadvantages are similar to those of threshold-based techniques: they are largely limited to image segmentation tasks. Histogram-based preprocessing techniques can largely resolve color differences in images and thus better represent texture features; however, they often do not perform well on small medical image datasets.

4 Image segmentation

In this section, we summarize the segmentation tasks in LHIA. Segmentation is a crucial step in image processing applications that identifies ROIs such as nuclei, glands, and lymphocytes. In lymphoma images, the ROIs include follicular regions, lymphocytes, CBs, centrocytes, and neoplastic cells. Accurate identification of ROIs is essential for correlating them with pathologies (Ong et al. 1996).

4.1 Threshold-based segmentation methods

Image thresholding is a widely used segmentation technique that generates binary images by selecting a suitable threshold; the workflow is shown in Fig. 16. Table 9 lists the references using threshold-based segmentation methods. In Sertel et al. (2008b, 2009), Michail et al. (2014b) and Dimitropoulos et al. (2014), the threshold value is selected manually. In the work of Zorman et al. (2007), Belkacem-Boussaid et al. (2009), Codella et al. (2016), Neuman et al. (2010) and Mandyartha et al. (2020), it is selected by an adaptive selection algorithm.
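One widely used adaptive selection criterion is Otsu's method, which picks the threshold maximizing the between-class variance of the resulting foreground/background split. The following pure-Python sketch illustrates the idea on a flat list of 8-bit gray levels; real pipelines would call `skimage.filters.threshold_otsu` or `cv2.threshold` instead.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level maximizing between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0.0
    for t in range(levels):
        w_bg += hist[t]                 # background weight
        if w_bg == 0:
            continue
        w_fg = total - w_bg             # foreground weight
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(pixels, t):
    return [1 if p > t else 0 for p in pixels]

# Two well-separated intensity populations (background vs. nuclei).
pixels = [10, 12, 11, 13, 200, 210, 205, 198]
t = otsu_threshold(pixels)
mask = binarize(pixels, t)
```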

Fig. 16
figure 16

The workflow of threshold segmentation

Table 9 Threshold-based methods for segmentation

4.2 Clustering-based segmentation methods

Clustering algorithms group similar items together. They usually do not learn from labeled training data, which is called unsupervised learning in ML. Contrary to supervised segmentation methods, unsupervised segmentation does not train a classifier; instead, it groups the pixels of an image into several clusters according to shared properties such as color, texture, or intensity value. The most commonly used clustering-based segmentation method is k-means clustering (Sertel et al. 2008a, b, 2009; Oztan et al. 2012; Samsi et al. 2010, 2012; Han et al. 2010; Shi et al. 2016; Arora 2013). Table 10 lists the references using clustering-based segmentation methods, and the workflow of clustering segmentation is shown in Fig. 17.
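The k-means procedure alternates an assignment step (each pixel joins its nearest centroid) and an update step (each centroid moves to the mean of its members). A minimal 1-D sketch on pixel intensities is shown below; real systems cluster full color vectors, e.g. with `sklearn.cluster.KMeans`.

```python
def kmeans_1d(values, k=2, iters=20):
    """Toy 1-D k-means on pixel intensities (no labels needed)."""
    # Initialize centroids spread across the value range.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

# Dark background pixels vs. bright stained pixels.
pixels = [10, 12, 11, 250, 245, 248]
labels, centroids = kmeans_1d(pixels, k=2)
```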

Fig. 17
figure 17

The workflow of clustering segmentation

Table 10 Clustering-based methods for segmentation

4.3 Region-based segmentation methods

The watershed algorithm is a region-based segmentation method that segments ROIs by finding watershed lines. In the watershed algorithm, a grey-level image is considered a topographic surface: the grey level of a pixel corresponds to its elevation, so high grey levels correspond to mountains and low grey levels to valleys. As the water level rises, water eventually overflows from one valley into the next; a dam can be built on the watershed to prevent the water of two valleys from merging. The image is thus divided into two pixel sets: the valley pixels submerged by water and the watershed-line pixels. In the end, the lines formed by these dams partition the entire image, yielding the segmentation result (Fig. 18).
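The rising-water idea can be illustrated with a toy 1-D version using the equivalent steepest-descent formulation: each pixel "drains" downhill to a local minimum (a valley), pixels draining to different minima form different basins, and the boundaries between basins are the watershed lines. Practical 2-D watershed segmentation is provided by e.g. `skimage.segmentation.watershed`.

```python
def watershed_1d(elev):
    """Toy steepest-descent watershed on a 1-D gray-level profile."""
    n = len(elev)

    def descend(i):
        # Follow the steepest downhill neighbor until a local minimum.
        while True:
            best = i
            for j in (i - 1, i + 1):
                if 0 <= j < n and elev[j] < elev[best]:
                    best = j
            if best == i:
                return i  # local minimum = basin identifier
            i = best

    basins = [descend(i) for i in range(n)]
    # Watershed lines sit where adjacent pixels drain to different minima.
    lines = [i for i in range(1, n) if basins[i] != basins[i - 1]]
    return basins, lines

# Two valleys (minima at indices 2 and 7) separated by a ridge at index 4.
elev = [5, 3, 1, 4, 7, 6, 2, 0, 3, 6]
basins, lines = watershed_1d(elev)
```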

Fig. 18
figure 18

The workflow of region-based segmentation

In the work of Neuman et al. (2010), a watershed is applied to segment nuclei as the second step of segmentation. In addition, multiple follicles may merge into one region during k-means clustering; therefore, an iterative watershed algorithm is used to split the overlapping regions until the segmentation converges (Samsi et al. 2010). Furthermore, in Zarella et al. (2015), a method for predicting axillary lymph node status is proposed, which employs a watershed transform to segment candidate nuclei and an SVM for detection.

4.4 Deep learning based segmentation methods

In addition to some traditional segmentation methods, Swiderska-Chadaj et al. (2019), Bándi et al. (2019), Senaras et al. (2019), Gunesli et al. (2022) and Wu et al. (2022) use deep learning segmentation techniques in LHIA and the workflow is shown in Fig. 19.

Fig. 19
figure 19

The workflow of deep learning segmentation

In the work of Swiderska-Chadaj et al. (2019), four deep learning methods are used to automatically detect lymphocytes in histopathology images. In Bándi et al. (2019), a fully convolutional neural network (FCNN) is applied to differentiate tissue from background regions, avoiding inefficient scanning of empty background areas. A method that automatically detects follicles with U-Net in CD8-stained FL images is developed in Senaras et al. (2019). In the work of Gunesli et al. (2022), the DLBCL-Morph dataset is segmented using a Morph-Net model based on Hover-Net. In Wu et al. (2022), a genetic algorithm is applied to address the problem of overlapping lymphocytes.

4.5 Other segmentation methods

In addition to the above segmentation methods, some other segmentation methods also achieve good results.

In the work of Belkacem-Boussaid et al. (2010c), a technique is developed to segment the follicular regions in H&E stained FL images; it is based on an active contour model and is initialized by users manually selecting seed points in follicular regions. A vertex graph algorithm is proposed to separate touching cells (Yang et al. 2008). In the work of Basavanhally et al. (2008), a Bayesian classifier and a template matching segmentation method are proposed to automatically detect lymphocytes. Figure 20 shows the result of the proposed segmentation method. In Sertel et al. (2010b), an image analysis system using a mean-shift algorithm and morphological operations is designed to quantitatively evaluate digitized FL tissue slides. Figure 21 shows the segmentation process. In the work of Cheng et al. (2010), a multi-phase level set framework that avoids the vacuum and overlap problems is used for the initial segmentation of cells. In this method, n = 4 phase segmentation is selected to separate the ROIs because there are four regions with different intensity levels in the original images. In Sertel et al. (2010a), a computer-aided detection system that automatically detects CB cells in FL images is proposed. In the work of Kong et al. (2011a), an integrated framework is proposed, which comprises a supervised cell-image segmentation algorithm and a touching-cell splitting method. In Kong et al. (2011b), a new algorithm is designed for splitting touching/overlapping cells. In the work of Oger et al. (2012), a framework is proposed for segmenting follicular regions before histological grading. Furthermore, a system that automatically differentiates the categories of diffuse lymphoma cells is designed using a minimum variance quantization approach (Es Negm et al. 2017).

Fig. 20
figure 20

a Original image. b Voronoi diagram. c Delaunay triangulation. d Minimum spanning tree showed in segmented lymphocytes. The figure corresponds to Fig. 3 in Basavanhally et al. (2008)

Fig. 21
figure 21

a Original image. b The segmentation result of mean-shift. c The final result. The figure corresponds to Figs. 1 and 2 in Sertel et al. (2010b)

Unsupervised learning methods tend to demonstrate excellent performance on small-scale datasets. In the work of Tosta et al. (2017a), an unsupervised method for segmenting nuclear components is proposed. Additionally, Tosta et al. (2018) introduce an unsupervised genetic algorithm to obtain objective diagnoses. This algorithm segments nuclei by evaluating the fitness functions of various genetic algorithms. An SVM is utilized for classification, using the features extracted from the segmented regions. In Cheikh et al. (2017), a fast superpixel segmentation algorithm is applied to prevent overlap. Dimitropoulos et al. (2017) develop a framework for detecting and classifying CBs in FL images stained with H&E and PAX5.

4.6 Summary

In summary, the advantages and limitations of the above image segmentation techniques are as follows. Threshold-based, clustering-based and region-based segmentation methods are unsupervised methods that can quickly segment the contours of the target region, but they are not effective at identifying small regions and are susceptible to noise. Deep learning based segmentation methods are supervised and achieve good results on datasets with pixel-level labels, but labeling images pixel by pixel requires a great deal of time. From the above summary of segmentation in LHIA, threshold-based and k-means clustering-based methods are the most commonly used traditional methods. Among deep learning methods, U-Net is an efficient segmentation model, and FCNN can achieve similar performance to U-Net at low computational cost. Beyond the traditional and deep learning segmentation methods, some other segmentation methods also achieve good performance.

5 Feature extraction

In image processing tasks, features are typically extracted from the output of image segmentation. Feature extraction is a crucial step in histopathology image analysis: the process of extracting features of the object from segmented images. Table 11 shows the commonly used feature extraction methods, including traditional and deep learning-based methods.

5.1 Visual features extraction

The visual features of an image include color, shape, and texture.

5.1.1 Color features extraction

Color features are internal features of an image that describe its surface properties. They are widely used in CAD systems.

In the field of feature extraction, different color spaces are used to extract features from images. In Zorman et al. (2007, 2011), Akakin and Gurcan (2012), Acar et al. (2013), Shi et al. (2016) and Bianconi et al. (2020), the RGB color space is used to extract various features such as color intensity, histograms, and statistical measures. In addition, the HSV color space is used for clustering and feature extraction in Sertel et al. (2008a), Samsi et al. (2010, 2012), Han et al. (2010), Oger et al. (2012), Akakin and Gurcan (2012) and Zarella et al. (2015), with special attention to the H channel. The \(L^*u^*v^*\) color space in Sertel et al. (2010b), Belkacem-Boussaid et al. (2010b) and Akakin and Gurcan (2012) helps to construct the feature space and extract morphological features. Finally, in Meng et al. (2010, 2013), Kong et al. (2011a) and Fauzi et al. (2015), images are segmented into chunks to extract color histogram features.
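As a concrete sketch of such color features, the following example converts RGB pixels to HSV with the standard library and builds a coarse hue histogram, similar in spirit to the color-histogram features above. The choice of 8 hue bins is an illustrative assumption, not taken from any of the cited works.

```python
import colorsys

def hue_histogram(rgb_pixels, bins=8):
    """Coarse hue histogram over a list of (R, G, B) pixels (0-255)."""
    hist = [0] * bins
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(h * bins), bins - 1)] += 1
    return hist

# Purplish H&E-like pixels plus one white background pixel.
pixels = [(150, 60, 160), (140, 50, 150), (255, 255, 255)]
hist = hue_histogram(pixels)
```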

5.1.2 Texture features extraction

Various texture feature extraction methods have been developed. Techniques such as Self-Organizing Feature Maps (SOFMs) (Sertel et al. 2008b, 2009), Gray-Level Co-occurrence Matrices (GLCM) (Sertel et al. 2008a; Samsi et al. 2010, 2012; Oger et al. 2012; Oztan et al. 2012; Bianconi et al. 2020), Gabor features (Cheng et al. 2010; Meng et al. 2013; Acar et al. 2013; Sandhya et al. 2013) and other methods (Sertel et al. 2010a; Michail et al. 2014a; Shi et al. 2016; Di Ruberto et al. 2015; Fatakdawala et al. 2010; Kong et al. 2011a) have been applied for texture analysis and color quantization. SOFMs are used for nonlinear color quantization, while the GLCM helps extract texture features from the quantized color space. In addition, principal component analysis (PCA) is applied in different color spaces (RGB, HSI, \(L^*a^*b^*\)) to extract color texture features (Belkacem-Boussaid et al. 2009, 2010a, b). These methods play a key role in feature extraction for automatic cell recognition, tissue analysis, and the construction of feature sets for classification with classifiers such as SVM and QDA. The effectiveness of these techniques is validated by the high classification accuracy reported in various studies (Fig. 22).
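A GLCM counts how often pairs of gray levels co-occur at a fixed spatial offset; Haralick statistics such as contrast and energy are then derived from the normalized matrix. The sketch below handles a single offset (one pixel to the right) in pure Python; real pipelines would use `skimage.feature.graycomatrix` and `graycoprops`.

```python
def glcm(img, levels):
    """Co-occurrence counts for the offset (0, +1), i.e. right neighbor."""
    m = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            m[a][b] += 1
    return m

def glcm_stats(m):
    """Contrast and energy of a (unnormalized) GLCM."""
    total = sum(sum(row) for row in m)
    contrast = energy = 0.0
    for i, row in enumerate(m):
        for j, c in enumerate(row):
            p = c / total
            contrast += (i - j) ** 2 * p
            energy += p * p
    return contrast, energy

img = [[0, 0, 1],
       [0, 1, 1],
       [1, 1, 0]]
m = glcm(img, levels=2)
contrast, energy = glcm_stats(m)
```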

Fig. 22
figure 22

The key points retrieved by SIFT (color online). a Key points without morphological refining. b Key points after morphological refining. The figure corresponds to Fig. 4 in Kuo et al. (2014)

5.1.3 Shape feature extraction

In the work of Yang et al. (2008), the boundary contour of touching cells is extracted by the following steps: first, L\(_2\)E robust estimation is applied to produce a rough estimate of the boundaries; second, a robust gradient vector flow extracts the contour from the background.

In Sertel et al. (2008a) and Cooper et al. (2009), the size and eccentricity of each cell are extracted for feature construction. In Belkacem-Boussaid et al. (2009), the area and perimeter of cells are extracted. In Sertel et al. (2010b), the area and the nuclear-to-cytoplasm ratio are included in the feature vector. In Belkacem-Boussaid et al. (2010b), the largest area is extracted through Otsu thresholding, opening, closing, labeling and area classification. In Oztan et al. (2012), the extracted morphological features include the lengths of the major and minor axes and the area of the fitted ellipses. In Arora (2013), two features are extracted from segmented images: the curvature and the center of the nuclei. In Sandhya et al. (2013), the extracted shape features include area, bounding box, perimeter, convex area, solidity, and major and minor axes. In Zarella et al. (2015), six features are extracted: perimeter, area, aspect ratio, circularity and two measurements quantifying the shape of nuclei. In Wang et al. (2016), a total of 28 geometrical and morphological features are extracted, including the percentage of the tumor region over the whole tissue region, the area ratio, the mean, and the longest axis. In Chen et al. (2016), the extracted geometrical and morphological features include the max, mean, skewness, variance and kurtosis, etc.

In Michail et al. (2014a), individual nuclei and their sizes constitute the first feature, obtained through connected component labeling. Then, the perimeter of each nucleus is extracted. Next, the aspect ratio and ellipse residual are extracted after estimating the best-fitting ellipse through Orthogonal Distance Regression (ODR).
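Two of the simplest shape features above, area and perimeter, can be sketched for a binary nucleus mask as follows (perimeter here is a 4-connected boundary-edge count, one of several common definitions). In practice `skimage.measure.regionprops` supplies these along with eccentricity, solidity, and axis lengths.

```python
def area(mask):
    """Number of foreground pixels."""
    return sum(sum(row) for row in mask)

def perimeter(mask):
    """Count exposed 4-neighbor edges of foreground pixels."""
    h, w = len(mask), len(mask[0])
    p = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < h and 0 <= nj < w) or not mask[ni][nj]:
                        p += 1
    return p

mask = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
a, p = area(mask), perimeter(mask)   # a == 4, p == 8
```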

5.2 Statistical feature extraction

In the work of Sertel et al. (2008a), the skewness, energy, mean intensity, kurtosis and entropy are extracted to capture the intensity characteristics.
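These intensity statistics can be computed directly from raw pixel values, as the stdlib-only sketch below shows (energy and entropy are taken over the normalized intensity histogram); `scipy.stats` offers tested equivalents.

```python
import math

def intensity_stats(pixels):
    """Mean, skewness, kurtosis, histogram energy and entropy."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    skew = sum(((p - mean) / std) ** 3 for p in pixels) / n if std else 0.0
    kurt = sum(((p - mean) / std) ** 4 for p in pixels) / n if std else 0.0
    # Energy and entropy from the normalized intensity histogram.
    hist = {}
    for p in pixels:
        hist[p] = hist.get(p, 0) + 1
    probs = [c / n for c in hist.values()]
    energy = sum(q * q for q in probs)
    entropy = -sum(q * math.log2(q) for q in probs)
    return {"mean": mean, "skewness": skew, "kurtosis": kurt,
            "energy": energy, "entropy": entropy}

stats = intensity_stats([10, 10, 12, 200])
```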

5.3 Graph-based feature extraction

Graph-based feature extraction methods are increasingly used in lymphatic histopathology image analysis (Basavanhally et al. 2008, 2009; Oztan et al. 2012; Ishikawa et al. 2014). Voronoi diagrams, Delaunay triangulations and minimum spanning trees are constructed to capture the spatial arrangement of lymphocyte nuclei, because lymphocyte detection alone is not enough to describe abnormal lymphocytic infiltration. A Voronoi diagram is constructed from the segmented objects to produce a tessellation of the image. A Delaunay triangulation is constructed by connecting the centroids of adjacent cells of the Voronoi diagram. Given a connected graph G, there may be multiple spanning trees, each containing all the vertices of G; the spanning tree with the least total edge cost is the minimum spanning tree (Kuo et al. 2014).
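One of these graph features, the minimum spanning tree over nucleus centroids, can be sketched with Prim's algorithm on Euclidean distances (an O(n³) toy version; Voronoi diagrams and Delaunay triangulations would typically come from `scipy.spatial`). The total MST edge length is itself a common arrangement feature.

```python
import math

def mst_edges(points):
    """Prim's algorithm: grow the tree one cheapest edge at a time."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        edges.append(best)
        in_tree.add(best[2])
    return edges

# Toy nucleus centroids: two clusters of two nuclei each.
centroids = [(0, 0), (1, 0), (5, 0), (5, 1)]
edges = mst_edges(centroids)
total_length = sum(d for d, _, _ in edges)
```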

5.4 Deep learning feature extraction

In addition to the traditional low-level feature extraction methods described above, advanced deep learning methods have been widely used in LHIA in recent years (Litjens et al. 2016; Sheng et al. 2020; Hashimoto et al. 2020; Roberto et al. 2021; Dif et al. 2021). Figure 23 shows a CNN model. Deep neural networks do not require manual feature engineering; they automatically extract better features, including visual features that may be overlooked by human visual inspection. The convolutions in the network filter the large number of features contained in an image.
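To illustrate how a convolution filters image features, the toy example below slides a hand-crafted 3×3 vertical-edge kernel over a small gray image. In a CNN the kernel weights are learned from data rather than designed, and the operation is implemented by frameworks such as PyTorch or TensorFlow.

```python
def conv2d(img, kernel):
    """Valid (no-padding) 2-D correlation of img with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = sum(img[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

# Hand-crafted vertical-edge detector.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
fmap = conv2d(img, kernel)   # strong response at the 0->9 edge
```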

Fig. 23
figure 23

The figure corresponds to Fig. 1 in Litjens et al. (2016)

5.5 Other features extraction

In the work of Cooper et al. (2009), a method of automatic non-rigid registration is proposed for histopathology images with different stains. This method extracts matching high-level features from down-sampled versions of the original images. In addition, a set of global features is calculated at the inner level and then fused into a feature vector (Orlov et al. 2010). In the work of Michail et al. (2014b), a method is designed for automatically detecting malignant cells in microscopic images of FL. The cell nuclei are first detected after preprocessing and segmentation, and then the expectation-maximization algorithm is used to split touching cells.

5.6 Summary

This section has described feature extraction methods, including traditional and deep learning-based methods. In traditional methods, the extracted features are generally color, texture and shape features. With the development of MV, more advanced algorithms have been researched and applied in LHIA. Deep learning-based methods, which automatically extract diverse features, are now being studied by more researchers and have shown good performance. With the advancement of deep learning technology, feature extraction is no longer limited to manual extraction, because deep neural networks can automatically learn better features (LeCun et al. 2015; Janiesch et al. 2021).

The advantages and limitations of the above feature extraction techniques are as follows. Visual and statistical features are simple and intuitive and can clearly represent the distribution of information such as the color and texture of an image, but they may be too simple for subsequent tasks. Graph-based features can clearly represent the adjacency relationships between feature points in an image, but the computation has a large time complexity, which limits its use on large images. Deep learning features are effective at extracting high-dimensional information through network models, but they are poor at capturing long-range dependencies, and the location information of the image is easily destroyed (LeCun et al. 2015; Zhan et al. 2023).

Table 11 Feature extraction methods

6 Classification methods

In this section, we introduce the classification tasks in LHIA. After feature extraction, different classifiers are selected to classify the histopathological images. In LHIA, common classification tasks include staging and grading malignant lymphomas (such as FL), distinguishing among several malignant lymphomas (such as CLL, MCL and FL), and classifying CBs versus non-CBs. The classification methods include traditional ML methods and deep learning-based methods.

6.1 Traditional machine learning based classification methods

In a series of studies, diverse machine learning classifiers have been applied to classify and grade lymphatic histopathological tissue samples. The SVM classifier is widely used to differentiate between high and low lymphocytic infiltration samples, to classify FL and neuroblastoma, and to classify three types of lymphoma (CLL, FL, and MCL); the radial basis function (RBF) kernel is chosen in some of the studies (Basavanhally et al. 2009; Akakin and Gurcan 2012; Michail et al. 2014a; Sandhya et al. 2013; Di Ruberto et al. 2015; Shi et al. 2016; Song et al. 2017; Acar et al. 2013). Bayesian classifiers (Sertel et al. 2008b, 2009; Oztan et al. 2012) and kNN classifiers (Sertel et al. 2009; Oztan et al. 2012; Fauzi et al. 2015) are used to classify FL, and QDA (Belkacem-Boussaid et al. 2009, 2010a, b) is used to classify CBs and non-CBs. Meanwhile, other studies explore the classification performance of Weighted-Neighbor Distance (WND) and Bayesian belief network (BBN) classifiers after feature extraction (Orlov et al. 2010). The C-RSPM model improves classification accuracy by chunking the images and using a multimodal fusion algorithm (Meng et al. 2010, 2013). In addition, some studies use symbol-based methods, minimum distance classifiers, and ensemble learning algorithms such as random forests and XGBoost for FL classification (Zorman et al. 2007, 2011; Schmitz et al. 2012; Ribeiro et al. 2018; Fanizzi et al. 2021; Yu et al. 2021) (Table 12).
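As a minimal sketch of one of these traditional classifiers, the following stdlib-only kNN predicts a label for a new feature vector by majority vote among its k nearest training samples. The feature values and grade labels are invented toy data, not taken from the cited studies; in practice `sklearn.neighbors.KNeighborsClassifier` (or `sklearn.svm.SVC` for the SVM) would be used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples nearest to x."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors (e.g. mean nucleus area, CB density).
train_X = [(1.0, 0.2), (1.1, 0.3), (0.9, 0.1),
           (4.0, 2.0), (4.2, 2.1), (3.8, 1.9)]
train_y = ["grade I"] * 3 + ["grade III"] * 3
pred = knn_predict(train_X, train_y, (4.1, 2.0))
```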

Table 12 Traditional classification methods

6.2 Deep learning based classification methods

In recent years, deep learning classification methods have been widely used in lymphatic histopathology analysis. First, in Jamaluddin et al. (2017), El Achi et al. (2019), Zhu et al. (2019), Bai et al. (2019), Brancati et al. (2019), Li et al. (2020), Sheng et al. (2020), Steinbuss et al. (2021) and Huang et al. (2021), standard vision models such as VGG, ResNet and EfficientNet are applied to lymphoma tasks with excellent results. In addition, in Arora (2013), Miyoshi et al. (2020), Chen et al. (2022), Ahmad et al. (2022) and Basu et al. (2022), models specialized for lymphoma tasks are designed (Table 13).

Table 13 Deep learning methods for classification

6.3 Summary

Features are obtained from histology images through techniques including multidimensional fractal models and convolutional descriptors defined with different CNN models. This work suggests that associating features extracted by different methods is a promising direction for further research (Roberto et al. 2021).

From the reviewed works, SVM and kNN are the most commonly used traditional classification methods in LHIA. Deep learning classification methods are now the mainstream and achieve good results, with CNNs being the most commonly used. Machine learning methods are fast and effective, but extracting valid features manually is time-consuming, and SVM and kNN are difficult to scale to large datasets. Compared with machine learning methods, deep learning methods can learn features automatically, but they often require large-scale datasets to be effective (Huang et al. 2023).

7 Detection methods

In this section, we introduce the detection tasks in LHIA. The detection tasks of LHIA include the detection of malignant cells, CB, follicles and lymphocytic infiltration. In many cases, the detection tasks are carried out through classification methods.

7.1 Traditional machine learning based detection methods

In the work of Basavanhally et al. (2008), Ishikawa et al. (2014), Zarella et al. (2015), Kuo et al. (2014) and Cheng et al. (2010), SVM is used to identify lymphocytes, and in Michail et al. (2014b) and Basavanhally et al. (2009), an LDA classifier and a Markov random field (MRF) are used in lymphatic histopathology image analysis (Table 14).

Table 14 Traditional detection methods

7.2 Deep learning based detection methods

Nowadays, object detection models based on deep learning are increasingly applied to lymphoma histopathology analysis (Chen and Srinivas 2016; Wang et al. 2016; Chen et al. 2016; Xiao et al. 2017; Liu et al. 2017; Li and Ping 2018; Somaratne et al. 2019). These methods use a pre-designed feature extraction network as a backbone, and the extracted features are then combined with a conditional random field (CRF) or a region proposal network (RPN) for tasks such as detecting lymphocytes and predicting lymphatic metastasis. Figure 24 shows the architecture of Wang et al. (2016) (Table 15).

Fig. 24
figure 24

The architecture of NCRF. The figure corresponds to Fig. 1 in Li and Ping (2018)

Table 15 Deep learning methods for detection

7.3 Summary

The kNN algorithm is relatively simple, easy to understand and implement, and does not require parameter estimation or training. kNN is suitable for classifying rare events, and for multi-class problems it can perform better than SVM. However, when the classes are unbalanced, it may cause classification errors.

Most current image classification and detection tasks use deep learning methods, and the CNN is the most widely used algorithm. Compared with general neural networks, the most prominent feature of the CNN is the addition of convolutional and pooling layers. It does not need specific hand-crafted features for a particular image dataset or classification method; instead, it simulates the brain's visual processing mechanism to abstract the image hierarchically and automatically select features, thereby achieving classification. In recent years, CNNs have also achieved good results in the classification and detection of lymphoma histopathology images. For example, a CNN is used to classify normal and tumor slides in the Camelyon16 images, obtaining an AUC of 0.94, which surpasses the result of the winner of the Camelyon16 Challenge (Jamaluddin et al. 2017). A residual CNN is used to classify lymphoma subtypes, obtaining an ACC of 97.67%, higher than U-Net and ResNet (Brancati et al. 2019). 92.4% of tumors are detected by a CNN architecture called Inception V3 in Liu et al. (2017). CNNs obtain good results not only on histopathological images of lymphoma but also on other pathological images, such as breast cancer (Titoriya and Sachdeva 2019; Kumar et al. 2017), prostate cancer (Duran-Lopez et al. 2020; Lucas et al. 2019), and colon cancer (Yildirim and Cinar 2021; Tasnim et al. 2021).

8 Conclusions and future work

This paper summarizes the LHIA methods based on MV. The datasets, evaluation methods and indicators, image preprocessing, segmentation, feature extraction, classification, and detection are comprehensively analyzed and summarized.

This paper provides a baseline and direction for subsequent LHIA research. In recent years, with the application of vision transformers, few-shot learning, diffusion models and other methods in the traditional vision field, subsequent research can focus on these directions. In the future, deep learning based methods for diagnosing and treating lymphoma have extensive application prospects. Especially since the COVID-19 outbreak in 2019, medical image analysis technology has received more and more attention. At present, medical image analysis is still an emerging area. Therefore, it is essential to develop systems that require little computation and memory and that are interpretable. In addition, the MV methods discussed in this review can be applied not only to LHIA but also to other image analysis fields, such as MV-assisted identification of the lung adenocarcinoma category and high-risk tumor areas from CT images (Chen et al. 2022a), and a deep learning based CT image analytics protocol for the same task (Chen et al. 2022b). Whether from the aspects of image preprocessing, feature extraction and selection, segmentation, and classification, or from the aspects of MV method design and the proposed frameworks, the MV methods summarized in this review can bring a new perspective to research in other fields.