1 Weather Classification

The most popular method for the automatic recognition of current weather conditions is the use of weather stations. These are systems equipped with specialized hardware (e.g. light intensity sensors, rain detectors, temperature sensors, humidity sensors). Such systems collect very detailed data, which is reflected in their relatively high cost. A cheaper alternative is to use machine learning to build a classifier that analyzes a provided photo and determines the general weather condition. This paper evaluates models built on top of three different neural network architectures.

2 Transfer Learning

Transfer learning is a machine learning technique of reusing a previously prepared model to train a new one for a related problem. Instead of starting the learning process from scratch, it starts with the patterns learned while solving the related task. It is widely used in image recognition because neural networks first learn to detect edges and shapes, and only then higher-level features. With transfer learning it is possible to reuse the edge- and shape-detection layers of a pre-trained model and train only the custom feature layers. This is much faster than training a model from scratch, uses less computing power, and allows a model to be trained with a relatively small amount of data. The overall concept of transfer learning is shown in Fig. 1.

Fig. 1. Transfer learning concept

3 Architectures

Inception, ResNet, and MobileNet are convolutional neural network architectures commonly used for image classification tasks. Although they address similar problems, they are based on different architectures, so some differences can be expected in the results of specific tasks such as weather classification.

3.1 Inception

The Inception architecture is based on two concepts: 1\(\,\times \,\)1 convolution and the Inception module. Deep neural networks are expensive in terms of computation. Thanks to 1\(\,\times \,\)1 convolutions it is possible to decrease the number of computations by reducing the number of input channels, which allows the depth and width of the neural network to be increased. The Inception module performs the computations of several convolution layers in parallel and then combines their results.
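An illustrative calculation (not specific to the models in this paper) shows the scale of the savings: mapping a 28\(\,\times \,\)28\(\,\times \,\)192 input to 32 channels with a 5\(\,\times \,\)5 convolution costs about
$$\begin{aligned} 28 \cdot 28 \cdot 5 \cdot 5 \cdot 192 \cdot 32 \approx 120 \cdot 10^6 \end{aligned}$$
multiplications, while first reducing the input to 16 channels with a 1\(\,\times \,\)1 convolution lowers the cost to roughly
$$\begin{aligned} 28 \cdot 28 \cdot 192 \cdot 16 + 28 \cdot 28 \cdot 5 \cdot 5 \cdot 16 \cdot 32 \approx 12.4 \cdot 10^6, \end{aligned}$$
i.e. close to a tenfold reduction.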

InceptionV3 is a convolutional neural network that is 48 layers deep.

The network has an image input size of 299\(\,\times \,\)299.

3.2 MobileNet

MobileNet targets mobile and embedded systems. This architecture is based on an inverted residual structure, in which the shortcut connections are between the bottleneck layers. It uses lightweight depthwise convolutions to filter features.

This architecture allows lightweight models to be built which do not need much computing power.
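The savings can be quantified with the standard cost analysis of depthwise separable convolutions (a general result from the MobileNet literature, not specific to the models built here): for a \(D_K \times D_K\) kernel, \(M\) input channels, \(N\) output channels and a \(D_F \times D_F\) feature map, the ratio of the cost of a depthwise separable convolution to that of a standard convolution is
$$\begin{aligned} \dfrac{D_K^2 \cdot M \cdot D_F^2 + M \cdot N \cdot D_F^2}{D_K^2 \cdot M \cdot N \cdot D_F^2} = \dfrac{1}{N} + \dfrac{1}{D_K^2}, \end{aligned}$$
which for 3\(\,\times \,\)3 kernels gives roughly an 8–9 times smaller computational cost.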

MobileNetV2 is a convolutional neural network that is 53 layers deep.

The network has an image input size of 224\(\,\times \,\)224.

3.3 ResNet

ResNet (Residual Networks) uses the concept of identity shortcut connections that allow some layers to be skipped. It partially solves the vanishing gradient problem and mitigates the accuracy saturation problem. The identity shortcuts simplify the network and speed up the learning process.
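A residual block with an identity shortcut is commonly written as
$$\begin{aligned} y = \mathcal {F}(x, \{W_i\}) + x \end{aligned}$$
where \(x\) is the block input, \(\mathcal {F}\) is the residual mapping learned by the stacked layers and the addition is the shortcut connection. Because the identity term passes the gradient through unchanged during backpropagation, deep networks built from such blocks are easier to train.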

ResNet50 is a convolutional neural network that is 50 layers deep.

The network has an image input size of 224\(\,\times \,\)224.

4 Metrics

4.1 Precision, Recall, Accuracy, F1

These metrics are widely used in binary classification, where only two categories are taken into consideration. In multi-class classification they might be calculated in multiple ways, but the most popular approach is to average each metric across all classes. Precision represents the proportion of predicted positives that are truly positive. Values closer to 1 mean high precision and show that there is a small number of false positives.

$$\begin{aligned} Precision = \dfrac{TruePositives}{TruePositives+FalsePositives} \end{aligned}$$

Recall is calculated as the proportion of actual positives that have been classified correctly. Values closer to 1 mean high recall and show that there is a small number of false negatives.

$$\begin{aligned} Recall = \dfrac{TruePositives}{TruePositives+FalseNegatives} \end{aligned}$$

Accuracy measures the proportion of correct predictions to the total number of samples. It helps to detect the over-fitting problem (models that overfit usually achieve an accuracy close to 1 on the data they were trained on).

$$\begin{aligned} Accuracy = \dfrac{CorrectPredictions}{TotalPredictions} \end{aligned}$$

F1 Score combines precision and recall metrics by calculating their harmonic mean.

$$\begin{aligned} F1 = 2*\dfrac{Precision*Recall}{Precision+Recall} \end{aligned}$$
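As a minimal sketch (independent of the code referenced later in this paper), these four metrics can be computed from raw prediction counts, here in C# to match the rest of the solution:

```csharp
// Minimal sketch: classification metrics computed from raw counts of
// true/false positives and negatives.
static (double Precision, double Recall, double Accuracy, double F1)
    ComputeMetrics(int tp, int fp, int tn, int fn)
{
    double precision = (double)tp / (tp + fp);
    double recall = (double)tp / (tp + fn);
    double accuracy = (double)(tp + tn) / (tp + fp + tn + fn);
    double f1 = 2 * precision * recall / (precision + recall);
    return (precision, recall, accuracy, f1);
}
```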

4.2 Log-Loss, Log-Loss Reduction

Logarithmic loss quantifies the accuracy of a classifier by penalizing incorrect classifications. This value shows the uncertainty of a prediction using probability estimates for each class in the dataset. Log-loss increases as the predicted probability diverges from the actual label. Maximizing the accuracy of the classifier minimizes this function.
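For \(N\) samples and \(C\) classes, with \(y_{ic} = 1\) when sample \(i\) belongs to class \(c\) (and 0 otherwise) and \(p_{ic}\) the predicted probability of that class, log-loss has the standard form
$$\begin{aligned} LogLoss = -\dfrac{1}{N}\sum _{i=1}^{N}\sum _{c=1}^{C} y_{ic}\log (p_{ic}) \end{aligned}$$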

Logarithmic loss reduction (also called reduction in information gain, RIG) measures how much a classifier improves on a model that makes random predictions. Values closer to 1 indicate a better model.
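Following the usual definition (where the prior model predicts every class with its empirical frequency), log-loss reduction can be written as
$$\begin{aligned} RIG = \dfrac{LogLoss_{prior} - LogLoss_{classifier}}{LogLoss_{prior}} \end{aligned}$$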

4.3 Confusion Matrix, Micro-averages, Macro-averages

A confusion matrix tabulates predicted versus actual classes, which gives precision and recall for each class in a multi-class classification problem.

A macro-average computes the metric independently for each class and then takes the average (treating all classes equally).

A micro-average aggregates the contributions of all classes to compute the average metric.

Micro- and macro-averages may be applied to every metric.

In a multi-class classification problem, the micro-average is preferred when there might be class imbalance (a significant difference between the numbers of examples per class).
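A minimal sketch with hypothetical counts (not the paper's data) shows how the two averages can diverge under class imbalance:

```csharp
using System.Linq;

// Per-class true positives and false positives for two classes (hypothetical counts);
// class 0 is far more frequent than class 1.
int[] tp = { 95, 3 };
int[] fp = { 5, 7 };

// Macro-average: precision per class, then the mean (classes weighted equally).
double macro = Enumerable.Range(0, tp.Length)
    .Average(i => (double)tp[i] / (tp[i] + fp[i]));        // (0.95 + 0.30) / 2 = 0.625

// Micro-average: aggregate the counts first, then compute precision once.
double micro = (double)tp.Sum() / (tp.Sum() + fp.Sum());   // 98 / 110 ≈ 0.891
```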

5 Dataset

The models were built on a custom six-class weather image dataset. Images were scraped from the web. Although the trainer used does not require data normalization [9], the images were normalized to a fixed aspect ratio (1:1) and size (512\(\,\times \,\)512 pixels); a sketch of this normalization step is shown after Table 1. This image size was chosen so as not to favor any architecture (InceptionV3 prefers 299\(\,\times \,\)299, MobileNetV2 and ResNet50 prefer 224\(\,\times \,\)224). Training set details are described below.

Total: 1577 images

Image format: JPEG

Image size: 512\(\,\times \,\)512 pixels

Color space: sRGB

Categories:

  • Clouds

  • Fog

  • Rain

  • Shine

  • Storm

  • Sunrise

Table 1 presents the number of images in each category and their share of the total.

Table 1. Categories
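A minimal sketch of the aspect-ratio and size normalization described above (center-crop to 1:1, then resize to 512\(\,\times \,\)512 pixels); the helper name and the use of System.Drawing are illustrative and not taken from the paper's code:

```csharp
using System;
using System.Drawing;

// Illustrative helper: center-crop the image to a square, then resize to 512x512 pixels.
static Bitmap NormalizeImage(string path)
{
    using var source = new Bitmap(path);
    int side = Math.Min(source.Width, source.Height);
    var cropArea = new Rectangle((source.Width - side) / 2,
                                 (source.Height - side) / 2, side, side);
    using var square = source.Clone(cropArea, source.PixelFormat);
    return new Bitmap(square, new Size(512, 512));
}
```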

6 Models

Three different models have been built to classify weather conditions. They are based on the InceptionV3, MobileNetV2 and ResNet50 architectures, and all of them were trained with the same dataset, described in the section above. As the solution uses transfer learning, the models were trained on top of feature vectors provided by TensorFlow Hub [10,11,12]. All feature vectors were originally trained on the ImageNet (ILSVRC-2012-CLS) dataset. The Microsoft ML .NET library was used to train and evaluate the models. The pipeline uses cross-validation with 10 folds. In machine learning, cross-validation is a technique to measure the variability of a dataset as well as the reliability of any model trained on that data. The cross-validation algorithm randomly divides the dataset into subsets (folds); in each round one fold is used for validation and the others are used for training, so a model is built for each fold and a set of accuracy metrics is returned for each of them.
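One possible way to express such a pipeline with the ML .NET image classification trainer is sketched below; the column and type names are illustrative, and the actual implementation is available in the referenced repository [14]:

```csharp
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical input type: the path to an image file and its weather label.
public class ImageData
{
    public string ImagePath { get; set; }
    public string Label { get; set; }
}

public static class WeatherTraining
{
    // "images" is assumed to be prepared elsewhere, e.g. by scanning the dataset folder.
    public static void TrainAndCrossValidate(IEnumerable<ImageData> images, string imageFolder)
    {
        var mlContext = new MLContext(seed: 1);
        IDataView data = mlContext.Data.LoadFromEnumerable(images);

        // Map string labels to keys, load raw image bytes and train on top of a
        // pre-trained feature extractor (transfer learning).
        var pipeline = mlContext.Transforms.Conversion
                .MapValueToKey(outputColumnName: "LabelKey", inputColumnName: "Label")
            .Append(mlContext.Transforms.LoadRawImageBytes(
                outputColumnName: "Image", imageFolder: imageFolder, inputColumnName: "ImagePath"))
            .Append(mlContext.MulticlassClassification.Trainers.ImageClassification(
                labelColumnName: "LabelKey", featureColumnName: "Image"))
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

        // 10-fold cross-validation: each fold is held out for validation once,
        // the remaining folds are used for training.
        var folds = mlContext.MulticlassClassification.CrossValidate(
            data, pipeline, numberOfFolds: 10, labelColumnName: "LabelKey");
    }
}
```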

The hyperparameter values, taken from the ML .NET trainer documentation [13], are presented in Table 2.

Table 2. Hyperparameters

These hyperparameters are defined as follows.

The batch size is the number of samples processed before the model is updated.

The number of epochs is the number of complete passes through the training dataset.

The learning rate determines the step size at each iteration while moving toward a minimum of the loss function.
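In the ML .NET trainer these hyperparameters are exposed through the trainer options. A minimal sketch with placeholder values (for the values actually used, see Table 2) could look as follows:

```csharp
using Microsoft.ML.Vision;

// Placeholder values for illustration only; mlContext is the MLContext from the previous sketch.
var options = new ImageClassificationTrainer.Options
{
    FeatureColumnName = "Image",
    LabelColumnName = "LabelKey",
    Arch = ImageClassificationTrainer.Architecture.ResnetV250,  // or InceptionV3 / MobilenetV2
    BatchSize = 10,
    Epoch = 200,
    LearningRate = 0.01f
};

var trainer = mlContext.MulticlassClassification.Trainers.ImageClassification(options);
```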

The source code that has been used for model creation, training, and testing is publicly available in a GitHub repository [14].

The most important training metrics are shown in Table 3.

Table 3. Models evaluation metrics

Additionally, each model passed performance and accuracy tests measuring total classification time for two groups of images of different sizes and aspect ratios. In this case no image normalization was applied.

The first experimental set contained 37 manually selected pictures of all six trained classes (clouds, fog, rain, shine, sunrise, storm) found on the web. Among these data were many problematic items whose poor quality made them difficult to classify even for a human. The second set of data was “Multi-Class Images for Weather Classification”, found on Kaggle [15]. It had images of only four classes (clouds, rain, shine, sunrise), but was much larger than the first experimental set, containing 1125 images. Both experimental datasets included images that were not used to create any of the models (Fig. 2).

Fig. 2. Example of potentially problematic images
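A minimal sketch of how total classification time can be measured for such a group of images, assuming a trained model and the ImageData type from the earlier sketch (ImagePrediction is a hypothetical output type holding the predicted label):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ML;

// Hypothetical output type: the label predicted by the model.
public class ImagePrediction
{
    public string PredictedLabel { get; set; }
}

public static class ClassificationTiming
{
    // Measures total classification time for a group of images (no normalization applied).
    public static TimeSpan Measure(MLContext mlContext, ITransformer model, IEnumerable<ImageData> images)
    {
        var engine = mlContext.Model.CreatePredictionEngine<ImageData, ImagePrediction>(model);
        var stopwatch = Stopwatch.StartNew();
        foreach (var image in images)
        {
            var prediction = engine.Predict(image);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```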

The classifier trained on top of MobileNetV2 was almost four times faster than the models based on the other architectures. This confirms that the MobileNetV2 architecture would be the best choice for mobile and embedded systems. InceptionV3 and ResNet50 classified more images correctly than MobileNetV2. Table 4 and Table 5 show time performance and accuracy for both experimental datasets.

Table 4. First experimental group - performance and accuracy
Table 5. Second experimental group - performance and accuracy

7 Summary

Thanks to transfer learning it is possible to train custom classifiers without a large dataset or large computing power. Overall efficiency and accuracy may depend on the neural network architecture. All created models were able to classify weather with an accuracy of 70–73% (poor-quality dataset of six classes) and 95–97% (good-quality dataset of four classes). The InceptionV3 and ResNet50 architectures had similar classification times and accuracy. MobileNetV2 had the shortest classification time and achieved competitive results. The ResNet50 model achieved a slightly higher average accuracy (based on model evaluation) than existing image-based weather classification models [1].

The selected neural networks were compared because of the significant differences in their architectures and the availability of feature vectors used for knowledge transfer. All feature vectors were trained on the same dataset, which allows the conclusion that the differences in the achieved results stem mostly from the architectural differences between the compared neural networks.

Weather classification is complicated due to the difficulty of extracting the characteristics of weather phenomena. Some weather conditions are extremely difficult to classify, e.g. heavy rain may look like fog, especially if the input image resolution is low.

Another difficulty is the possibility of mixed atmospheric phenomena. It happens quite often that it rains during a storm or the sun shines during rain. Such situations cannot be easily resolved by simple image classification because of its inherent limitation: the necessity to select only one option (class).

An integral part of this work is the source code written using the C# programming language and the ML .NET framework [14]. Despite many searches, it was not possible to find a program dealing with similar issues and built with the mentioned technologies. The provided source code allows the whole logic used to build, evaluate and use the models to be followed line by line. The use of non-mainstream technology adds to the originality of the solution.