1 Introduction

According to a recent survey in Japan (The Association of Real Estate Agents in Japan 2016), more than 80% of home purchasers collected real estate information using the Internet. Among the various kinds of information posted on real estate information sites, property photographs are particularly important. In a questionnaire for users of real estate information sites in Japan (Real Estate Information Site Business Liaison Council 2016), respondents were asked which points they consider important when choosing real estate agents (multiple answers allowed): 80.7% (first place) chose “many photographs are posted,” and 27.5% (fifth place) chose “posted photographs are good” (Fig. 23.1). The result suggests that property photographs are very important for the user experience. The growing importance of property photographs can be observed not only in Japan but also in other major countries: Zillow (United States), Rightmove (United Kingdom), SouFun (China), and other major portal sites post many high-quality photographs. In recent years, higher-value image data such as panoramic photographs and videos have also been posted. However, the quality of property photographs posted on real estate information sites varies considerably, because taking photographs is left to each owner or broker.

Fig. 23.1
A bar chart of the important points and the most important points when choosing agents. “Many photos posted” ranks highest in both: 80.7% as an important point and 43.4% as the most important point.

Points which users of real estate information sites use when choosing agents (cited from p. 7 of the reference Real Estate Information Site Business Liaison Council (2016))

A notable feature of real estate property information in Japan is the wide availability of floorplan images. On the real estate information site LIFULL HOME’S, more than 90% of property listings include floorplans. Further utilization of unique content such as floorplans will also be important in revitalizing the Japanese real estate markets.

As described above, image information such as property photographs and floorplans is very important in the real estate markets, and there is an urgent need for innovation that increases the value of image information. In particular, research and development on how to incorporate image processing techniques such as deep learning, which has been developing rapidly in recent years, is becoming active all over the world.

Section 23.2 of this article briefly describes the revolution that deep learning, which is said to be the biggest innovation in recent artificial intelligence research, has brought to the image processing field. Section 23.3 introduces research activities on applying deep learning to real estate property photographs, including applications to actual services. Section 23.4 focuses on an attempt to generate more innovation by providing a large amount of real estate property photographs and floorplan image data to the informatics and computer science research communities. Finally, Sect. 23.5 describes future prospects.

2 Revolution of Image Processing Technology by Deep Learning

In recent years, interest in artificial intelligence has been increasing in society. The present is said to be the third artificial intelligence boom, following those of the 1960s and 1980s, and “deep learning” is regarded as its key technology. This section reviews the significant impact that deep learning has had on image processing research.

Deep learning is a type of machine learning and an evolution of neural networks. Research on neural networks started as an attempt to imitate the neural circuits of the human brain, and its origin dates back to the 1940s (McCulloch and Pitts 1943).

The first boom in neural network research began in 1958 with the perceptron published by Frank Rosenblatt (Rosenblatt 1958). Although this perceptron (the simple perceptron) has a simple structure with only two layers, an input layer and an output layer, as shown in Fig. 23.2, it attracted much attention at the time because it can learn and predict. However, Marvin Minsky, a famous artificial intelligence researcher, pointed out in 1969 that the simple perceptron cannot solve even a simple problem such as the exclusive OR (XOR) operation (Minsky and Papert 1969), and the boom came to an end.

Fig. 23.2
A diagram of the simple perceptron: the inputs enter an input layer x of three units, which is connected through weights w to an output layer y that produces the output.

The structure of the simple perceptron
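The limitation pointed out by Minsky and Papert can be reproduced with a minimal sketch of a simple perceptron. The function names, the learning rate, and the number of epochs below are illustrative assumptions and are not taken from the cited works. The same learning rule that fits the linearly separable AND function cannot fit XOR, because no single weighted sum separates the XOR outputs.

```python
# Minimal sketch of a simple perceptron (input layer -> output layer only).
# All names and hyperparameters here are illustrative assumptions.

def predict(weights, bias, x):
    """Step activation applied to the weighted sum of the inputs."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron learning rule: the update is zero for correctly classified samples."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def accuracy(samples, weights, bias):
    """Fraction of samples classified correctly."""
    hits = sum(predict(weights, bias, x) == t for x, t in samples)
    return hits / len(samples)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_w, and_b = train_perceptron(AND_DATA)
xor_w, xor_b = train_perceptron(XOR_DATA)
print("AND accuracy:", accuracy(AND_DATA, and_w, and_b))
print("XOR accuracy:", accuracy(XOR_DATA, xor_w, xor_b))
```

Training on AND reaches an accuracy of 1.0, whereas training on XOR always leaves at least one of the four inputs misclassified, regardless of how long training continues.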

Subsequent studies showed that the XOR problem can be solved by inserting a hidden layer into a simple perceptron, as shown in Fig. 23.3, to create a multilayer perceptron. Backpropagation, an efficient learning method for multilayer perceptrons, was proposed in 1986 by the American cognitive psychologist David Rumelhart and others (Rumelhart et al. 1986), and the boom in neural network research began again. For example, a 1998 study using the MNIST database, which is used in handwritten digit recognition tasks for evaluating machine learning algorithms, achieved an error rate of less than 2.5% with a three-layer perceptron (LeCun et al. 1998).

Fig. 23.3
A diagram of the multilayer perceptron: the inputs enter an input layer x, every unit of which is connected through weights w1 to every unit of a hidden layer z. The hidden layer is connected through weights w2 to an output layer y, which produces the output.

The structure of the multilayer perceptron
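The effect of the hidden layer can be shown with a small sketch. The weights below are chosen by hand for illustration, which is an assumption of this example; they are not learned by backpropagation and are not taken from the cited studies. One hidden unit behaves like OR, the other like AND, and the output unit combines them into XOR.

```python
# Sketch of a multilayer perceptron with one hidden layer that computes XOR.
# The weights are hand-picked for illustration, not learned by backpropagation.

def step(s):
    """Step activation: 1 if the weighted sum is positive, otherwise 0."""
    return 1 if s > 0 else 0

def mlp_xor(x1, x2):
    # Hidden layer z: two units that both see the same two inputs.
    z_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    z_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output layer y: "OR but not AND" is exactly XOR.
    return step(z_or - z_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mlp_xor(a, b))
```

In practice the weights would be learned from data, for example by backpropagation, rather than set by hand.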

Is it possible to use neural networks to recognize images that are much more complex than handwritten digits, such as real estate property photographs? Increasing the number of layers was expected to increase the learning ability of a neural network and allow it to recognize complex images, but backpropagation does not work well as the number of layers increases, and the results were inferior to those of other methods using human-designed image features. However, at ILSVRC 2012, an image recognition competition held in 2012, the University of Toronto’s system SuperVision, which adopted a method developed from neural networks, achieved an accuracy that exceeded that of the other teams (Krizhevsky et al. 2012). This result had a huge impact on the image processing and artificial intelligence research communities. The method used in SuperVision is deep learning, developed mainly by Professor Geoffrey Hinton at the University of Toronto.

The key point of deep learning is that it enables the training of multilayered (deep) neural networks with tens to hundreds of layers by incorporating a kind of “information compressor” called an autoencoder into the neural network. The autoencoder plays the role of “compressing information,” i.e., “extracting only essential features.” What makes deep learning epoch-making is that, by stacking autoencoders in layers, it acquires a high learning ability to capture essential features from images.
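The notion of an autoencoder as an “information compressor” can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the two-dimensional toy data, the single-number code, and the hand-picked encoder and decoder weights, which a real autoencoder would learn from data. The encoder squeezes each input into one number, and the decoder reconstructs the input from that number.

```python
# Toy linear autoencoder: 2 inputs -> 1 code value -> 2 reconstructed outputs.
# Weights are hand-picked for data lying on the line x2 = 2 * x1 (an assumption
# for illustration); a real autoencoder would learn them from data.

def encode(x):
    """Compress a 2-D point into a single number (the bottleneck)."""
    return (x[0] + 2 * x[1]) / 5

def decode(z):
    """Reconstruct a 2-D point from the single code value."""
    return (z, 2 * z)

def reconstruction_error(x):
    """Squared distance between the input and its reconstruction."""
    x_hat = decode(encode(x))
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2

on_line = [(1, 2), (2, 4), (3, 6)]   # share one essential feature
off_line = (2, 1)                    # does not follow the same pattern

for point in on_line:
    print(point, "-> code", encode(point), "-> error", reconstruction_error(point))
print(off_line, "-> error", reconstruction_error(off_line))
```

Points that share the underlying pattern are reconstructed with essentially no error from a single number, whereas a point that does not follow the pattern is reconstructed poorly. This is the sense in which the code value captures only the essential feature.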

In practice, deep learning must learn an enormous number of weight parameters from an enormous amount of data, which requires a large amount of computing power. The use of GPGPU (General-Purpose computing on Graphics Processing Units) is practically essential.

Deep learning methods are being applied to various fields such as speech recognition, machine translation, robot control, and automated driving, but the most advanced applications and methods are still in the field of image processing.

3 Application of Deep Learning to Real Estate Property Photographs

Almost 5 years have passed since the effectiveness of deep learning became widely known, and easy-to-use open-source software libraries have been developed. Research and development that applies deep learning to real estate property photographs is also increasing. This section introduces some recent examples.

3.1 Photograph Classification for Quality Improvements of Posted Photographs

As stated at the beginning, variation in quality is a major issue for property photographs, which are highly valued by users looking for real estate properties. In some cases, photographs that violate the regulations for real estate information are posted. Each company that operates a real estate information site strives to improve information quality through manual checks and similar measures, but manpower is limited when millions of photographs or more are submitted daily. Therefore, efforts are being made to use state-of-the-art image processing technologies such as deep learning.

Kikuta et al. (2016) reported an example of deep learning applied to the task of detecting anomalous photographs that violate regulations at the real estate information site SUUMO. For the task of detecting “photographs with people reflected,” they used a convolutional neural network (CNN), a deep learning method suitable for image processing. They reported that the probability of missing an anomalous photograph is less than 5%.

Ishida and Kiyota (2016) used the LIFULL HOME’S data set (described later) to evaluate the accuracy of automatic classification of 13 types of photographs by deep learning. They reported that an error rate of 14.3% was achieved by training a CNN on 130,000 photographs (10,000 samples randomly selected for each type). As shown on the left of Fig. 23.4, the accuracy is low for types such as “living,” where human judgment also tends to fluctuate, whereas “kitchen” and “bath” achieve extremely high accuracy. Many of the errors involve photographs that could reasonably be assigned to multiple types. The right side of Fig. 23.4 shows subtle errors, such as classifying a “bathroom washbasin” (the correct answer is “washbasin”) as “bath.”

Fig. 23.4
A set of seven photographs. Four photographs show correct answers with the accuracy for each type: kitchen 97.3%, living 52%, floorplan 91%, and bath 100%. Three photographs show incorrect answers whose correct types are living, washbasin, and storage.

Classification of property photograph types using deep learning
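The two figures of merit used above, the overall error rate and the accuracy for each photograph type, can be computed as in the following sketch. The labels and predictions are hypothetical toy values invented for illustration; they are not taken from the LIFULL HOME’S data set or from the experiment of Ishida and Kiyota (2016).

```python
from collections import Counter

def error_rate(true_labels, predicted_labels):
    """Fraction of photographs whose predicted type differs from the true type."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)

def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each true photograph type."""
    total = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / total[label] for label in total}

# Hypothetical toy labels (assumption for illustration only).
true_labels = ["kitchen", "kitchen", "bath", "bath",
               "living", "living", "washbasin", "washbasin"]
predicted = ["kitchen", "kitchen", "bath", "bath",
             "living", "kitchen", "bath", "washbasin"]

print("error rate:", error_rate(true_labels, predicted))
print("per-type accuracy:", per_class_accuracy(true_labels, predicted))
```

Reporting the accuracy for each type alongside the overall error rate makes it visible when a low overall error hides a weak type, as with “living” in Fig. 23.4.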

As mentioned above, real estate property photograph classification by deep learning has now achieved the same level of accuracy as human beings, and applications in business are being reported. The author’s company has been operating a system for detecting inconsistencies in the categories of real estate property photographs submitted by real estate companies since December 2016 (LIFULL Co. Ltd. 2016). To provide more useful information to users, LIFULL HOME’S gives priority in its search results to properties with more room photographs registered. However, inconsistencies occur in the registered photographs; for example, photographs other than indoor photographs are registered as an indoor type. Therefore, deep learning is used to calculate the consistency rate automatically, as shown in Fig. 23.5, and for photographs that are inconsistent with their registered type, the real estate company that registered them is encouraged to correct the type.

Fig. 23.5
A chart describes the three photographs with specified types, and their scores and results.

Inconsistency detection of property photograph classification using deep learning
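The inconsistency check described above can be sketched as follows. The source does not describe the actual logic of the production system, so the data layout, the score values, the confidence threshold, and the function names are all assumptions made for illustration. The sketch compares the type registered by the real estate company with the type that a classifier scores highest.

```python
# Sketch of inconsistency detection between registered and predicted photo types.
# Data layout, scores, and threshold are hypothetical assumptions.

def predicted_type(photo):
    """Type with the highest classifier score for this photograph."""
    return max(photo["scores"], key=photo["scores"].get)

def consistency_rate(photos):
    """Fraction of photographs whose predicted type matches the registered type."""
    matches = sum(predicted_type(p) == p["registered_type"] for p in photos)
    return matches / len(photos)

def find_inconsistencies(photos, threshold=0.8):
    """Photographs whose registered type disagrees with a confident prediction."""
    flagged = []
    for photo in photos:
        best = predicted_type(photo)
        if best != photo["registered_type"] and photo["scores"][best] >= threshold:
            flagged.append((photo["id"], photo["registered_type"], best))
    return flagged

photos = [
    {"id": "p1", "registered_type": "living",
     "scores": {"living": 0.91, "exterior": 0.05, "kitchen": 0.04}},
    {"id": "p2", "registered_type": "living",
     "scores": {"living": 0.03, "exterior": 0.95, "kitchen": 0.02}},
    {"id": "p3", "registered_type": "kitchen",
     "scores": {"living": 0.45, "exterior": 0.05, "kitchen": 0.50}},
]

print("consistency rate:", consistency_rate(photos))
print("to be corrected:", find_inconsistencies(photos))
```

Flagging only confident disagreements is one possible design choice; it avoids asking a real estate company to correct photographs for which the classifier itself is unsure.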

3.2 Photograph Analyses for Promoting Values of Property Information

In response to the diversification of users’ needs when searching for real estate properties, real estate information sites have been adding various search conditions such as “counter kitchen,” “broadband connection,” and “convenience store nearby.” However, since so many factors relate to how easy a property is to live in, the maintenance of the database has not kept up with the diversification of needs. Therefore, attempts have been made to improve the value of real estate property information by extracting indices related to living comfort from property photographs. Ishida and Kiyota (2016) focused on the “comfortability of use of the kitchen,” which greatly affects the ease of living, and conducted an experiment to classify two types of indicators, “Kitchen type” and “Workspace,” using deep learning. For the former, a data set was created (1000 photographs of each type, a total of 5000 photographs) classified into five types: “system kitchen,” “simplified system kitchen,” “non-system kitchen,” “kitchen part,” and “others.” By training a CNN on this data set, an error rate of 11.6% was achieved. For the latter, a data set was created (Fig. 23.6, a total of 5500 photographs) classified into six types: five levels from “very narrow” to “very wide,” plus “others.” Although the error rate of category classification is not so good at 36.2%, the confusion matrix (lower left of Fig. 23.6) shows that the size can be identified to some extent. When a breadth score is assigned to each category and the correlation coefficient is calculated, it is 0.717 (lower right of Fig. 23.6), and practical accuracy can be expected by expanding the data set.

Fig. 23.6
A set of five kitchen photographs and a table. Photographs describe the classes as very narrow, narrow, normal, wide, and very wide. Very narrow 20, narrow 40, normal 60, wide 80, and very wide 100.

Detection of workspace width of kitchens
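The gap between a modest category accuracy and a reasonably high correlation can be illustrated with a short sketch. The breadth scores 20 to 100 follow the values listed with Fig. 23.6; treating them as the scores used in the study, excluding the “others” category, and the toy labels and predictions are all assumptions made for illustration. The correlation is the ordinary Pearson coefficient.

```python
import math

# Breadth score per category, following the values listed with Fig. 23.6
# (using them this way is an assumption of this sketch).
BREADTH_SCORE = {"very narrow": 20, "narrow": 40, "normal": 60,
                 "wide": 80, "very wide": 100}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy labels and predictions (not data from the study).
true_labels = ["very narrow", "narrow", "normal", "wide", "very wide", "normal"]
predicted = ["narrow", "narrow", "normal", "very wide", "wide", "normal"]

accuracy = sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
correlation = pearson([BREADTH_SCORE[t] for t in true_labels],
                      [BREADTH_SCORE[p] for p in predicted])

print("category accuracy:", accuracy)
print("correlation of breadth scores:", round(correlation, 3))
```

In this toy case only half of the categories are predicted exactly, yet every error lands on a neighboring category, so the breadth scores remain strongly correlated. The same effect explains how an error rate of 36.2% can coexist with a correlation coefficient of 0.717.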

4 Promotion of Open Innovations in the Real Estate Industries Through Provision of Data Sets for Academic Communities

As mentioned above, applications of deep learning to real estate property photographs are becoming active in business. However, efforts to draw out the further potential of advanced image processing technologies such as deep learning and to create new innovations are hampered by an overwhelming shortage of people who can implement deep learning. Because people who are familiar with deep learning are rare, it is not realistic for a single company to create innovation alone.

Therefore, our company set out to stimulate research related to real estate by providing, for academic research purposes, a data set that includes image data such as the property photographs and floorplans it holds. With the cooperation of the National Institute of Informatics of Japan (NII), we started providing the “LIFULL HOME’S Data set” (National Institute of Informatics 2015) (Fig. 23.7) in November 2015. The LIFULL HOME’S data set includes information on all properties for rent (approximately 5.33 million) that were listed on LIFULL HOME’S as of September 2015, the property photographs associated with them (approximately 83 million items), and floorplan images (approximately 5.15 million items). It is currently provided to more than 80 university laboratories and research institutions in Japan and overseas. More than 3 years have passed since the launch, and very interesting research is being published.

Fig. 23.7
A webpage of the Informatics Research Data Repository. It presents the LIFULL HOME’S data set and outlines the data, which include information on all properties for rent, high-resolution floorplan image data, and monthly data.

LIFULL HOME’S data set

I would like to briefly introduce one very interesting research case that uses the property photographs and floorplan image data included in the LIFULL HOME’S data set. A study group at Simon Fraser University in Canada has shown that new applications can be created by using deep learning to solve the task of associating floorplans with indoor photographs (Liu et al. 2016). Consider the “quiz for selecting the correct bathroom photograph corresponding to the floorplan” shown in Fig. 23.8 (the correct answer is (A)). This quiz is very difficult for humans: in an experiment with workers on a crowdsourcing service (Amazon Mechanical Turk), the correct answer rate was only 43%, and solving one task took 30 seconds or more on average. However, the deep neural network shown in Fig. 23.9 achieved a correct answer rate of 72%, far exceeding that of human beings, and can solve more than 20 problems in one second.

Fig. 23.8
A floorplan diagram and a set of four candidate photographs labeled A, B, C, and D.

Which of the four photographs corresponds to the left floorplan?

Fig. 23.9
A diagram of convolutional layers, fully connected layers, and feature vectors. The floorplan and images 1 to K each pass through convolutional layers, and the resulting feature vectors are passed to fully connected layers for classification.

The architecture of the deep neural network for answering the quiz in Fig. 23.8
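How a four-choice quiz of this kind is scored can be sketched as follows. The matching scores are hypothetical numbers standing in for the output of a trained network, and the function names are assumptions made for illustration; the sketch does not reproduce the network of Liu et al. (2016). It shows only the final step: choosing the candidate with the highest score and comparing the correct answer rate with the rate expected from random guessing.

```python
# Sketch of scoring a four-choice floorplan/photograph quiz.
# The scores are hypothetical stand-ins for a trained network's outputs.

def answer_quiz(candidate_scores):
    """Pick the candidate photograph with the highest matching score."""
    return max(candidate_scores, key=candidate_scores.get)

def correct_answer_rate(quizzes):
    """Fraction of quizzes for which the selected candidate is the correct one."""
    hits = sum(answer_quiz(q["scores"]) == q["answer"] for q in quizzes)
    return hits / len(quizzes)

CHANCE_RATE = 1 / 4  # random guessing among four candidates

quizzes = [
    {"scores": {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1}, "answer": "A"},
    {"scores": {"A": 0.2, "B": 0.5, "C": 0.2, "D": 0.1}, "answer": "B"},
    {"scores": {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}, "answer": "C"},
    {"scores": {"A": 0.1, "B": 0.2, "C": 0.1, "D": 0.6}, "answer": "D"},
]

print("correct answer rate:", correct_answer_rate(quizzes))
print("chance rate:", CHANCE_RATE)
```

With four candidates, random guessing yields 25%, which gives a baseline against which the reported 43% for human workers and 72% for the network can be read.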

A deep neural network trained as described above may also make it possible to estimate the position on the floorplan that corresponds to an indoor photograph. When a visualization method is applied, as shown in Fig. 23.10, the position on the floorplan corresponding to the indoor photograph (center) is correctly indicated by the red spot on the right side of the figure. This result suggests the possibility of realizing new floorplan-based navigation on real estate information sites.

Fig. 23.10
A set of eight diagrams and four photographs. Four diagrams describe the floor plan, the floor, and describe the receptive field.

Visualization for the receptive field of the prediction by the neural network

5 Conclusion

In this article, we introduced the outline and application examples of image processing technologies, especially deep learning, to further enhance the quality and value of property photographs and floorplans that are very important in real estate property information. Image processing technology is still developing rapidly, and it is expected that even greater innovations will be generated one after another.

On the other hand, researchers and engineers who are familiar with image processing technologies such as deep learning are extremely rare worldwide, and the competition for such people takes place not only between companies but also between industrial fields. Because research and development is advancing in various industrial fields such as advertising, finance, automobiles, and robots, creating new innovations in the real estate field requires a mechanism that encourages people familiar with such technologies to engage in the real estate field. In order to attract such people, it is indispensable to present challenging tasks and to develop data sets and a research community as infrastructure for research and development. I would like to make further contributions to the creation of such R&D infrastructure in the real estate field.