Introduction

The regenerative medicine industry was accelerated by the development of embryonic stem (ES) cells and induced pluripotent stem (iPS) cells, which are useful for tissue/organ regeneration as well as for drug discovery based on stem-cell modeling of human disease [1, 2]. Regenerated tissues/organs could replace the damaged tissues/organs of patients. To realize regenerative medicine, industrialization of tissue/organ regeneration is an important topic, because industrialization is expected to reduce human error and labor cost, providing a safe, secure and affordable supply of tissues/organs. Tissue/organ regeneration requires large numbers of cells (typically, 1 cm^3 of tissue contains about 10^9 cells), so automatic cell-culturing systems have been developed [3, 4] and are already commercially available. The next target must be a method to evaluate cellular conditions during culture for further tissue/organ engineering. To date, cellular conditions are still mostly checked manually, depending on the experience of scientists. Large numbers of cells must be handled correctly, rapidly and automatically by mechanical approaches. Furthermore, for tissue/organ regeneration intended for transplantation, a non-invasive method is required; invasive methods such as PCR, ELISA and fluorescent imaging are not applicable. Recognizing cellular conditions non-invasively using bright-field microscopy has remained challenging.

Machine learning is one of the promising approaches to survey cellular conditions from phase-contrast images of cells. For example, cellular morphology classification [5], prediction of osteogenic differentiation potential [6], cellular orientation analysis [7], and non-invasive evaluation of human induced pluripotent stem cells (iPSCs) [8] have been reported. Because cells have complicated and non-uniform morphology, extracting feature parameters that yield highly accurate classification is still a developing field. Deep learning (DL), one family of machine learning methods, has outstanding algorithms for feature extraction, which makes it more efficient than conventional machine learning, and it is now prominent in the object recognition field [9,10,11,12,13,14]. In 2012, DL attracted wide attention by achieving the highest recognition rate in a competition [the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)], with a large lead over the second-placed group [10]. In ILSVRC 2015, image recognition accuracy reached over 95% [14], comparable to or exceeding human accuracy, although the task covers only 1000 image classes. Recently, DL has begun to be applied to recognition of cellular conditions, such as cellular mitosis, toward cancer detection based on a high cell-division ratio [15, 16]. The convolutional neural network (CNN), one of the DL methods used in mitosis recognition, shows high performance in pattern and image recognition and could provide an automatic, non-invasive and rapid method for cell analysis. The purpose of this research is to recognize cellular states from phase-contrast images of cellular morphology using a CNN.

In this paper, we develop a CNN to recognize cellular differentiation of cultured C2C12 cells using only phase-contrast images. C2C12 is a mouse skeletal myoblast cell line whose phenotype changes during differentiation from a somewhat round shape to an elongated tubular shape due to fusion of the cells. C2C12 cells before and after differentiation are imaged by phase-contrast microscopy and the images are input to the CNN. Training is conducted by supervised learning, and the accuracy is then checked using a test dataset.

Materials and methods

Cell culture and differentiation

The mouse myoblast cell line C2C12 (Dainippon Sumitomo Pharma, Osaka, Japan) was used and seeded on a fibronectin-coated glass-based culture dish. The cells are cultured in Dulbecco's Modified Eagle's Medium/high glucose (DMEM, D6429) (Sigma-Aldrich, St. Louis, MO, USA) with 10 v/v% fetal bovine serum (FBS, Lot# 171012) (Nichirei Bioscience, Tokyo, Japan) and 1 v/v% penicillin–streptomycin solution (100×) (168-23191) (Wako Pure Chemicals, Osaka, Japan) in a 5% CO2 humidified atmosphere at 37 °C. After the cultured cells reach confluence, differentiation is induced by changing the medium to DMEM (D6429) (Sigma-Aldrich, St. Louis, MO, USA) with penicillin–streptomycin solution (100×) (168-23191) (Wako Pure Chemicals, Osaka, Japan) and 1 v/v% insulin–transferrin–selenium (ITS, 41400-045) (Gibco, Grand Island, NY, USA). The day when differentiation starts is defined as Day 0.

Fibronectin coating

A 1 mg/ml fibronectin solution (Bovine Fibronectin, F1141-1MG, Sigma-Aldrich, St. Louis, MO, USA) is diluted to 100 μg/ml with PBS(−). 100 µl of the diluted solution is poured into a glass-base dish and kept in an incubator (5% CO2, humidified atmosphere, 37 °C) for 1 day. Before seeding cells, the dish is washed twice with PBS(−).

Staining for fluorescent imaging

The cells are washed three times with PBS(−) and fixed with 4% paraformaldehyde (paraformaldehyde phosphate buffer solution, 163-20145) (Wako Pure Chemicals, Osaka, Japan) for 1 h at 4 °C. To permeabilize the cell membrane, the cells are treated with 1 ml of 0.5% Triton X-100 in PBS(−) for 5 min. After washing three times with PBS(−), myosin heavy chains (MHC) are stained with 2.5 μg/ml anti-MHC antibody (Anti-Myosin Heavy Chain Alexa Fluor 488, 53-6503) (eBioscience, San Diego, CA, USA) for 1 h under light-shielded conditions. Before microscopy observation, the cells are washed three times with PBS(−).

Image acquisition

Phase-contrast images of C2C12 cells are obtained with a microscope (IX83, Olympus), an objective lens (UPlanFLN, 10×, Ph1, Olympus) and a CCD camera (DP70, 4080 × 3072 pixels, Olympus). As training datasets for the CNN, 1400 cellular images each were constructed at sizes of 200 × 200, 300 × 300, and 400 × 400 pixels. One such dataset was prepared for each of Day 0, Day 3 and Day 6. All images were converted to gray scale before input to the CNN.
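The patch preparation above can be sketched as follows. The paper does not specify the gray-scale conversion formula or whether the crops overlap, so this sketch assumes standard ITU-R BT.601 luminance weights and non-overlapping crops; both function names are illustrative:

```python
import numpy as np

def to_grayscale(rgb):
    # BT.601 luminance weights -- one common choice; the exact
    # conversion used in the paper is not stated.
    return rgb @ np.array([0.299, 0.587, 0.114])

def crop_patches(gray, size):
    # Cut non-overlapping size x size patches from a full-frame image.
    H, W = gray.shape
    return [gray[y:y + size, x:x + size]
            for y in range(0, H - size + 1, size)
            for x in range(0, W - size + 1, size)]
```

For example, a 400 × 400 frame yields four non-overlapping 200 × 200 patches under these assumptions.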

Convolutional neural network (CNN)

A CNN is a biologically inspired multi-layer feed-forward architecture. The CNN used in this study has a typical structure consisting of convolution layers, max-pooling layers, and fully connected layers. The structure, shown in Fig. 1, is built from two successive pairs of convolution and max-pooling layers followed by two fully connected layers. The kernels are represented as blue (convolution) and yellow (pooling) rectangles in Fig. 1. Kernel sizes of 3 × 3, 5 × 5 and 7 × 7 are used in the first convolution layer, and 3 × 3, 6 × 6 and 10 × 10 in the second; a total of nine combinations of these kernel sizes are examined to explore a better CNN structure. The stride in both convolution layers is 1 pixel. The number of convolution filters is 32 in the first and 64 in the second convolution layer. The kernel size in both pooling layers is 2 × 2, with a stride of 2 pixels. Feature maps are derived from the convolution of the kernels with the input data. The number of data dimensions after the second pooling is 160,000. The number of nodes in the first fully connected layer is 1024. The output layer is composed of three nodes (Fig. 1) to recognize the three conditions of cells (Day 0, Day 3, and Day 6). The rectified linear unit (ReLU) is used as the activation function, except in the output layer, where the softmax function converts the output node values into probabilities for each class. Training proceeds by the backpropagation algorithm with the Adam optimizer and is continued until the cross-entropy loss reaches approximately zero; as the number of epochs increases toward 30,000, the loss decreases to approximately zero. The loss is checked every 100 epochs on both the training and test image data to detect overfitting, in which the training loss decreases while the test loss increases.
The accuracy is defined as the number of correctly classified images divided by the number of input images; for example, if 100 test images are input and the network outputs the correct class for 90 of them, the accuracy is 90%.
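The paper does not state the convolution padding scheme, but the 160,000-dimension figure after the second pooling is consistent with 'same' zero-padding and a 200 × 200 pixel input. A minimal sketch of that size bookkeeping, under this assumption and with illustrative function names:

```python
def pool_out(size, kernel=2, stride=2):
    # Output width of a max-pooling layer (no padding).
    return (size - kernel) // stride + 1

def flattened_dim(input_size, filters=64):
    # Two convolution layers with 'same' padding (spatial size
    # unchanged), each followed by 2x2 max pooling with stride 2.
    size = pool_out(input_size)   # after first conv + pool
    size = pool_out(size)         # after second conv + pool
    return size * size * filters

# For a 200 x 200 input: 200 -> 100 -> 50, and 50 * 50 * 64 = 160,000,
# matching the value stated in the text.
```

Under 'valid' (no) padding the flattened dimension would depend on the kernel sizes and would not reproduce 160,000, which is why 'same' padding is assumed here.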

Fig. 1

Structure of the convolutional neural network used in this study. The network is constructed from two sets of convolution and pooling layers followed by two fully connected layers. The kernel of the convolution layer is shown in blue-green and the kernel of the pooling layer in yellow. The kernel sizes are M × M (M = 3, 5 or 7), N × N (N = 3, 6 or 10), and 2 × 2

Computer and software

For operating the CNN, we use a custom-made personal computer with a central processing unit (CPU) (Core i7-6700K, Intel) and a gaming graphics processing unit (GPU) (GTX 1080, 8 GB memory, NVIDIA). This class of PC is commonly available, even for home use. The CNN program was written using the deep learning framework TensorFlow version 0.11, provided by Google.

Results

Phase-contrast and fluorescent images of cultured C2C12 cells

Mouse skeletal myoblast cells (C2C12) are used to recognize the cellular state during differentiation. Differentiation is started after the myoblasts are cultivated to confluence, and this state is defined as Day 0. Figure 2a shows a schematic of cellular differentiation and fusion into myotubes. Cellular morphology changes from a round shape to an elongated tubular shape due to fusion of cells during differentiation. Figure 2b shows phase-contrast images of C2C12 cells on Day 0, 3 days after induction of differentiation (Day 3), and 6 days after induction of differentiation (Day 6). Cell differentiation classification using the CNN is performed on data of different image sizes. The yellow, cyan, and magenta rectangles in each image indicate 200 × 200, 300 × 300, and 400 × 400 pixel regions; the length of one side of the rectangle is about 80, 120, and 160 µm, respectively. In the differentiation process, cells become tubular in shape and align with neighboring cells. Differentiation of C2C12 cells is evaluated by expression of myosin heavy chain (MHC). Immunofluorescent staining against MHC indicates that the cells on Day 0 and Day 3 rarely express MHC (Fig. 3). The cells on Day 6 show clear MHC fluorescence in myotubes, which means differentiation of C2C12 cells occurs 6 days after induction of differentiation. The cells on Day 3 are considered to be differentiating.

Fig. 2

a Schematic image of differentiation and b phase-contrast images of C2C12 cells

Fig. 3

a Fluorescent images of C2C12, and b merged image with the phase-contrast images

Classification of C2C12 cells depending on differentiation with the CNN

For each day (Day 0, Day 3, Day 6), 1400 cell images were prepared at each image size (200 × 200, 300 × 300, 400 × 400 pixels). Of the 1400 images, 1260 are used as training data and the remaining 140 as test data. Classification accuracy is obtained by 10-fold cross-validation. Because the accuracy depends on the architecture of the CNN, the kernel sizes in the first and second convolution layers are varied.
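The 10-fold split above can be sketched as follows. This is a minimal illustration with an illustrative function name; in practice the fold assignment would be randomized and the per-fold accuracies averaged:

```python
def kfold_splits(n_images=1400, k=10):
    # Partition image indices into k equal folds; each fold in turn
    # serves as the test set while the rest are used for training.
    indices = list(range(n_images))
    fold_size = n_images // k          # 140 images per fold
    splits = []
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        splits.append((train, test))
    return splits
```

Each of the ten splits contains 1260 training and 140 test images, matching the numbers stated above.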

Table 1 shows the accuracy under each condition. The leftmost column gives the image size, and the header row gives the kernel sizes; for example, 3–6 indicates that the kernel size of the first convolution layer is 3 × 3 and that of the second is 6 × 6. A total of nine combinations of kernel sizes are implemented. In all cases, 3–10 showed the highest accuracy: 73.9, 78.1 and 82.4% for the three image sizes, respectively. The data show that the larger the image size, the higher the accuracy. This is thought to be because it is difficult to capture the shape of a whole cell in a small image, as shown by the yellow rectangles in Fig. 2. Although this appears unproblematic for Day 0, on Day 3 and Day 6 a cell does not fit within a 200 × 200 pixel image; at this size the Day 3 and Day 6 images look similar, which makes them difficult to distinguish. On the other hand, increasing the image size is not a simple remedy: with larger images it becomes impossible to discriminate finely at the single-cell level.

Table 1 Results of classification accuracy depending on the image size and the kernel size of the CNN

Data augmentation

Generally, in deep learning, accuracy can be raised by increasing the amount of data. Even if an image of C2C12 cells is rotated or flipped, the essential visual information of the cells does not change; the dataset is therefore multiplied 4 times by rotation and a further 2 times by flipping, yielding 8 times the number of images. Since there are 1400 original images, we retrained the CNN using 11,200 images. Training was done as before, using the 3–10 model, which had given the highest accuracy. The results are shown in Table 2. At each image size the accuracy increased by about 10% with data augmentation, to 84.9, 88.4 and 91.8%.
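The eight-fold augmentation (four 90° rotations, each kept as-is and left-right flipped) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def augment_eightfold(image):
    # Four 90-degree rotations, each paired with its left-right
    # flip, give the 8 dihedral variants of one image.
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants
```

Applied to all 1400 original images per class, this yields the 11,200 training images used above.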

Table 2 Results of classification accuracy after data augmentation

Colour mapping of the classification results

Cell differentiation mapping on one test image (3072 × 4080 pixels) is performed using the learned model with the highest accuracy (image size 400 × 400, model 3–10, 11,200 training images). A 400 × 400 pixel region is cut out of the test image and input to the CNN for classification. According to whether the region is judged as Day 0, Day 3, or Day 6, one pixel at the center of the region is colored blue, green or red, respectively. The cropped region is shifted by one pixel at a time until the entire test image has been scanned. The result of the color mapping is shown in Fig. 4. The cells in each test image are almost exactly classified (Fig. 4d–f). The few mistakes occur where abnormal cell shapes can be observed by eye: in particular, the Day 0 cells discriminated as Day 3 (green), and the cells discriminated as Day 6 (red) in the upper-left part of the Day 3 image, are abnormally shaped. These mistakes could be reduced by learning more image patterns.
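The sliding-window mapping can be sketched as below, with a placeholder `classify` function standing in for the trained CNN; color assignments follow the text (Day 0 blue, Day 3 green, Day 6 red), and all names are illustrative:

```python
import numpy as np

# Illustrative class-to-color lookup (RGB): Day 0 -> blue,
# Day 3 -> green, Day 6 -> red.
COLORS = {0: (0, 0, 255), 1: (0, 255, 0), 2: (255, 0, 0)}

def color_map(image, classify, patch=400, stride=1):
    # Slide a patch-sized window over the image; color the center
    # pixel of each window according to the predicted class.
    H, W = image.shape
    out = np.zeros((H, W, 3), dtype=np.uint8)
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            cls = classify(image[y:y + patch, x:x + patch])
            out[y + patch // 2, x + patch // 2] = COLORS[cls]
    return out
```

With stride 1 on a 3072 × 4080 frame this evaluates millions of patches, so in practice the CNN inference would be batched on the GPU.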

Fig. 4

Phase-contrast images of a Day 0, b Day 3, and c Day 6. d–f Classification results of the CNN. The colors blue, green, and red indicate classification as Day 0, Day 3, and Day 6, respectively

Discussion

The differentiation of C2C12 cells is recognized with high accuracy using the CNN. It should be noted that there is room for higher accuracy through the hyperparameter problem: it is still unknown how to optimize parameters such as the numbers of convolution, pooling, and fully connected layers, the kernel sizes, and the initial weights. The recent trend shows that a larger number of layers provides higher accuracy [10, 14, 17]: GoogLeNet has 22 layers and a CNN constructed by a team from Microsoft Research Asia has 152 layers, and these CNNs achieved the highest accuracy in ILSVRC 2014 and 2015, respectively [10, 14]. Hyperparameter setting is a major step in improving the accuracy of a CNN. Although 1400 images, or 11,200 after data augmentation, are used in this paper, further increasing the training dataset is one reliable way to raise the accuracy.

CNNs have approached the human recognition level, and under regulated conditions (e.g. ILSVRC) current CNNs can exceed human performance [14]. Even though cellular morphologies are similar to one another and complicated, DL is expected in the future to substitute for the experienced human observers who check cellular quality/condition, and may provide homogeneous quality by avoiding errors attributable to individual differences. A distinctive feature of DL, unlike human observation, is that it provides quantitative parameters of cellular information; that is, DL expresses the human observer's judgment by converting cellular species and conditions into quantitative values.

We utilized phase-contrast imaging because it is non-invasive, highly prevalent, low-cost and fast; these characteristics are suitable for recognition of differentiation in regenerative medicine. To realize higher accuracy of image recognition, other non- or less-invasive imaging methods could be combined, such as autofluorescence imaging [18], Raman imaging [19] and imaging flow cytometry [20]. Beyond a single imaging technique, the combination of several imaging techniques with deep learning will provide deeper information on single cells. It will become possible to learn the correlation between molecular information and cellular functions, because fluorescence and Raman data carry molecular information about cells. It is noteworthy that, using the many cellular images taken by high-throughput imaging devices, DL has the potential to draw out hidden information related to cellular functions.

In conclusion, we applied a CNN, a deep learning method, to recognition of the cellular morphology change due to differentiation. The differentiation of C2C12 cells is recognized with 91.8% accuracy with automatic feature extraction. Label-free, morphology-dependent cellular recognition is successfully demonstrated, which means the method could be applied to other cell types in regenerative medicine. We believe this method will facilitate evaluation of cellular quality/condition, drug response/screening, and cancer diagnosis.