Frontiers of Computer Vision Technologies on Real Estate Property Photographs and Floorplans

This article describes frontier efforts to apply deep learning, widely regarded as the greatest recent innovation in artificial intelligence and computer vision research, to image data such as real estate property photographs and floorplans. Specifically, it covers attempts to detect property photographs that violate regulations or are misclassified, and to extract information from property photographs that can serve as new recommendation features. In addition, the article introduces an initiative to create innovation by providing data sets to academic communities.

lot of high-quality photographs. In recent years, higher-value image data such as panoramic photographs and videos have also been posted. However, the quality of property photographs posted on real estate information sites varies considerably, because taking photographs is left to each owner or broker.
A notable feature of real estate property information in Japan is the richness of floorplan images. On the real estate information site LIFULL HOME'S, more than 90% of property listings include floorplans. Making further use of distinctive content such as floorplans will also be important in revitalizing the Japanese real estate markets.
As described above, image information such as property photographs and floorplans is very important in the real estate markets, and there is an urgent need for innovation that increases the value of this information. In particular, research and development on how to incorporate image processing techniques such as deep learning, which has developed rapidly in recent years, is becoming active all over the world.
Section 23.2 of this article briefly describes the revolution that deep learning, which is said to be the biggest innovation in recent artificial intelligence research, has brought to the image processing field. Section 23.3 then introduces research activities on applying deep learning to real estate property photographs, including applications to actual services. Section 23.4 focuses on an attempt to generate more innovation by providing a large amount of real estate property photographs and floorplan image data to the informatics and computer science research communities. Finally, Sect. 23.5 describes future prospects.

Revolution of Image Processing Technology by Deep Learning
In recent years, interest in artificial intelligence has been growing in society. The current period is said to be the third artificial intelligence boom, following those of the 1960s and 1980s, and "deep learning" is regarded as its key technology. This section outlines the significant impact that deep learning has had on image processing research. Deep learning is a type of machine learning and an evolution of neural networks. Research on neural networks started from imitating the neural circuits of the human brain, and its origin dates back to the 1940s (McCulloch and Pitts 1943).
The first boom in neural network research began in 1958 with the perceptron published by Frank Rosenblatt (Rosenblatt 1958). Although this perceptron (simple perceptron) has a simple structure with only two layers, an input layer and an output layer, as shown in Fig. 23.2, it attracted much attention at the time because it can learn from examples and make predictions. However, in 1969 Marvin Minsky, a famous artificial intelligence researcher, pointed out that a simple perceptron cannot solve even a simple problem based on the exclusive OR (XOR) operation (Minsky and Papert 1969), and the boom came to an end.
Subsequent studies showed that the XOR problem can be solved by inserting a hidden layer into a simple perceptron, as shown in Fig. 23.3, to create a multilayer perceptron. Backpropagation, an efficient learning method for multilayer perceptrons, was proposed in 1986 by the American cognitive psychologist David Rumelhart and others (Rumelhart et al. 1986), and the boom in neural network research began again. For example, a 1998 study on the MNIST database (footnote 1), which is widely used as a handwritten digit recognition benchmark for evaluating machine learning algorithms, reported an error rate of less than 2.5% with a three-layer perceptron (LeCun et al. 1998).
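The XOR limitation and its resolution with a hidden layer can be checked with a few lines of code. The following is a minimal sketch, not taken from the cited works: it assumes a step activation, the classic perceptron learning rule, and hand-chosen hidden-layer weights (the function names and weight values are illustrative assumptions). The simple perceptron learns AND but can never classify all four XOR cases, whereas a network with one hidden layer computes XOR exactly.

```python
def step(value):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if value > 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Train a two-input simple perceptron with the perceptron learning rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = step(w[0] * x1 + w[1] * x2 + b)
            if output != target:
                errors += 1
                w[0] += lr * (target - output) * x1
                w[1] += lr * (target - output) * x2
                b += lr * (target - output)
        if errors == 0:  # every sample is classified correctly
            break
    return w, b


def accuracy(w, b, samples):
    """Fraction of samples that the perceptron (w, b) classifies correctly."""
    hits = sum(step(w[0] * x1 + w[1] * x2 + b) == t for (x1, x2), t in samples)
    return hits / len(samples)


def xor_mlp(x1, x2):
    """XOR with one hidden layer; weights are set by hand (illustrative)."""
    h_or = step(x1 + x2 - 0.5)    # hidden unit 1 fires for "x1 OR x2"
    h_and = step(x1 + x2 - 1.5)   # hidden unit 2 fires for "x1 AND x2"
    return step(h_or - h_and - 0.5)  # OR but not AND


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(*train_perceptron(AND), AND))
print("XOR accuracy:", accuracy(*train_perceptron(XOR), XOR))
print("XOR via hidden layer:", [xor_mlp(x1, x2) for (x1, x2), _ in XOR])
```

Because XOR is not linearly separable, no choice of weights for the simple perceptron can classify more than three of the four cases correctly, regardless of how long it is trained.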
Is it possible, then, to use neural networks to recognize images that are much more complex than handwritten digits, such as real estate property photographs? It was thought that increasing the number of layers would increase the learning ability of a neural network and allow it to recognize complex images, but as the number of layers increases, backpropagation stops working well. As a result, the performance of neural networks was inferior to that of other methods using human-designed image features. However, at ILSVRC 2012 (footnote 2), an image recognition competition held in 2012, the University of Toronto's system SuperVision, which adopted a method developed from neural networks, achieved an accuracy that exceeded that of the other teams (Krizhevsky et al. 2012). This result had a huge impact on the image processing and artificial intelligence research communities. The method used in SuperVision is deep learning, developed mainly by Professor Geoffrey Hinton at the University of Toronto.
Footnote 1: A data set consisting of handwritten digit images and their correct numeric labels, based on data from the National Institute of Standards and Technology (NIST).
Footnote 2: ImageNet Large Scale Visual Recognition Challenge. The task requires a computer to answer what objects (yachts, dogs, cats, flowers, etc.) appear in images. ImageNet is an image database maintained to promote research on object recognition in images. It contains more than 14 million images associated with more than 20,000 synonym sets (synsets) of WordNet, an English concept dictionary.
The key point of deep learning is that it enables the training of multilayered (deep) neural networks with tens to hundreds of layers by incorporating a kind of "information compressor" called an autoencoder into the neural network. The autoencoder plays the role of "compressing information," i.e., "extracting only essential features." The epoch-making aspect of deep learning is that, by stacking autoencoders in layers, it acquires a high ability to learn essential features from images.
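The idea of an autoencoder as an "information compressor" can be illustrated with a deliberately tiny example. The sketch below is an assumption-laden toy, not the architecture used in SuperVision or in any cited work: it is a single linear autoencoder that compresses three-dimensional inputs into one number and reconstructs them, trained by plain gradient descent on synthetic data that lie along one direction. The data, learning rate, and initial weights are all illustrative choices.

```python
def encode(enc, x):
    """Compress the input vector x into a single hidden value."""
    return sum(w * xi for w, xi in zip(enc, x))


def decode(dec, h):
    """Reconstruct an input vector from the hidden value h."""
    return [w * h for w in dec]


def mean_squared_error(enc, dec, data):
    """Average squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(dec, encode(enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train(data, steps=2000, lr=0.1):
    """Fit encoder and decoder weights by batch gradient descent."""
    dim = len(data[0])
    enc = [0.1] * dim  # fixed initial weights keep the run deterministic
    dec = [0.1] * dim
    for _ in range(steps):
        grad_enc = [0.0] * dim
        grad_dec = [0.0] * dim
        for x in data:
            h = encode(enc, x)
            err = [d * h - xi for d, xi in zip(dec, x)]  # reconstruction - input
            dloss_dh = 2.0 * sum(e * d for e, d in zip(err, dec))
            for i in range(dim):
                grad_dec[i] += 2.0 * err[i] * h / len(data)
                grad_enc[i] += dloss_dh * x[i] / len(data)
        enc = [w - lr * g for w, g in zip(enc, grad_enc)]
        dec = [w - lr * g for w, g in zip(dec, grad_dec)]
    return enc, dec


# Synthetic 3-D points that all lie along one direction, so a single
# hidden value is enough to describe each point (the "essential feature").
direction = (1.0, 0.5, -0.5)
data = [[t * d for d in direction] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

print("error before training:", mean_squared_error([0.1] * 3, [0.1] * 3, data))
enc, dec = train(data)
print("error after training: ", mean_squared_error(enc, dec, data))
```

In this toy setting the hidden value ends up encoding the position of each point along the shared direction, which is the one-dimensional analogue of "extracting only essential features." Practical deep networks use nonlinear units and many more dimensions.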
When deep learning is used in practice, an enormous number of weight parameters must be learned from an enormous amount of data. Deep learning methods are being applied to various fields such as speech recognition, machine translation, robot control, and automated driving, but the most advanced applications and methods are still found in the field of image processing.

Application of Deep Learning to Real Estate Property Photographs
Almost 5 years have passed since the effectiveness of deep learning became widely known, and easy-to-use open-source software libraries have been developed. Research and development that applies deep learning to real estate property photographs is also increasing. This section introduces some recent examples.

Photograph Classification for Quality Improvements of Posted Photographs
As stated at the beginning, variation in quality is a major issue for property photographs, which are highly valued by users looking for real estate properties. In some cases, photographs that violate the regulations for real estate information are posted. Each company that operates a real estate information site strives to improve information quality through manual checks and similar measures, but manpower is limited when millions of photographs are submitted daily. Efforts are therefore being made to use state-of-the-art image processing technologies such as deep learning.
Kikuta et al. (2016) reported an example of deep learning applied to the task of detecting anomalous photographs that violate regulations at the real estate information site SUUMO. For the task of detecting "photographs in which people appear," they used a convolutional neural network (CNN), a deep learning method suited to image processing, and reported that the probability of missing an anomalous photograph is less than 5%.
Ishida and Kiyota (2016) used the LIFULL HOME'S data set (described later) to evaluate how accurately deep learning can automatically discriminate among 13 types of photographs. They reported that an error rate of 14.3% was achieved by training a CNN on 130,000 photographs (10,000 samples randomly selected for each type). As shown on the left of Fig. 23.4, accuracy is low for types such as "living," where human judgment also tends to fluctuate, whereas "kitchen" and "bath" achieve extremely high accuracy. Many of the misclassified examples could reasonably be assigned to more than one type. The right side of Fig. 23.4 shows subtle errors, such as a "bathroom washbasin" photograph (the correct answer is "washbasin") being classified as "bath."
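The two figures discussed above, the overall error rate and the accuracy for each photograph type, can be computed from a list of correct and predicted labels. The following is a minimal sketch; the function names and the eight toy labels are hypothetical and are not data from the cited study.

```python
from collections import defaultdict


def error_rate(true_labels, predicted_labels):
    """Fraction of photographs whose predicted type differs from the correct one."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each correct photograph type."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical toy labels, for illustration only.
true_labels = ["kitchen", "kitchen", "bath", "bath",
               "living", "living", "washbasin", "washbasin"]
predicted = ["kitchen", "kitchen", "bath", "bath",
             "living", "kitchen", "bath", "washbasin"]

print("overall error rate:", error_rate(true_labels, predicted))
print("per-type accuracy:", per_class_accuracy(true_labels, predicted))
```

Reporting accuracy per type, rather than only the overall error rate, is what reveals patterns such as "kitchen" and "bath" being easy while "living" is hard.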
As mentioned above, classification of real estate property photographs by deep learning has now reached the same level of accuracy as human beings, and applications in business are being reported. The author's company has been operating a system for detecting inconsistencies in the categories of property photographs submitted by real estate companies since December 2016 (LIFULL Co. Ltd. 2016). To provide more useful information to users, LIFULL HOME'S gives priority in its search results to properties with more room photographs registered. However, inconsistencies occur, such as photographs other than indoor photographs being registered under an indoor type. Therefore, deep learning is used to calculate the consistency rate automatically, as shown in Fig. 23.5, and for photographs that are inconsistent with their registered type, the real estate company that registered them is encouraged to correct the type.
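The consistency check described above can be sketched as a comparison between the type registered by the real estate company and the type predicted by a classifier. This is an illustrative assumption about the data layout, not the production system: the field names `photo_id`, `registered`, and `predicted` and the sample records are hypothetical.

```python
def consistency_report(photos):
    """Return the consistency rate and the IDs of photographs to be corrected.

    Each photo is a dict holding the type registered by the company and
    the type predicted by a classifier (field names are hypothetical).
    """
    inconsistent = [p["photo_id"] for p in photos
                    if p["registered"] != p["predicted"]]
    rate = 1 - len(inconsistent) / len(photos)
    return rate, inconsistent


# Hypothetical records for one property listing.
photos = [
    {"photo_id": "p1", "registered": "living", "predicted": "living"},
    {"photo_id": "p2", "registered": "kitchen", "predicted": "kitchen"},
    {"photo_id": "p3", "registered": "bath", "predicted": "bath"},
    {"photo_id": "p4", "registered": "living", "predicted": "exterior"},
]

rate, to_correct = consistency_report(photos)
print("consistency rate:", rate)
print("photographs to review:", to_correct)
```

In practice a classifier's prediction is itself uncertain, so a real system would likely prompt a human review rather than overwrite the registered type automatically.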

Photograph Analyses for Promoting Values of Property Information
In response to the diversification of users' needs when looking for real estate properties, real estate information sites have added various search conditions such as "counter kitchen," "broadband connection," and "convenience store nearby." However, because so many factors affect how comfortable a property is to live in, maintenance of the database has not kept up with the diversification of needs. Therefore, attempts have been made to increase the value of real estate property information by extracting indices related to living comfort from property photographs. Ishida and Kiyota (2016) focused on the "usability of the kitchen," which greatly affects living comfort, and conducted an experiment to discriminate two types of indicators, "Kitchen type" and "Workspace," using deep learning. For the former, a data set was created (1000 photographs of each type, 5000 photographs in total) classified into five types: "system kitchen," "simplified system kitchen," "non-system kitchen," "kitchen part," and "others." Training a CNN on this data set achieved a low error rate of 11.6%. For the latter, a data set of 5500 photographs in total (Fig. 23.6) was created and categorized into six types: five levels from "very narrow" to "very wide," plus "others." Although the error rate of category discrimination is not so good at 36.2%, the confusion matrix (lower left of Fig. 23.6) shows that the size of the workspace can be identified to some extent. When the correlation coefficient is calculated by assigning a breadth score to each category, it is 0.717 (Fig. 23.6), and practical accuracy can be expected to be achieved by expanding the data set.
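The correlation coefficient mentioned above can be computed by mapping each workspace category to a numeric breadth score and then correlating the scores of the correct and predicted categories. The sketch below makes several assumptions: the source names only "very narrow," "very wide," and "others," so the three intermediate labels, the scores 1 to 5, the exclusion of "others," and the toy labels are all hypothetical.

```python
from math import sqrt

# Hypothetical breadth scores; only the two end labels appear in the text.
BREADTH_SCORE = {
    "very narrow": 1, "narrow": 2, "medium": 3, "wide": 4, "very wide": 5,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def breadth_correlation(true_labels, predicted_labels):
    """Correlate breadth scores, skipping pairs that involve "others"."""
    pairs = [(BREADTH_SCORE[t], BREADTH_SCORE[p])
             for t, p in zip(true_labels, predicted_labels)
             if t in BREADTH_SCORE and p in BREADTH_SCORE]
    true_scores, predicted_scores = zip(*pairs)
    return pearson(true_scores, predicted_scores)


# Hypothetical toy labels: two of the five predictions are off by one level.
true_labels = ["very narrow", "narrow", "medium", "wide", "very wide"]
predicted = ["very narrow", "medium", "medium", "wide", "wide"]

print("breadth correlation:", breadth_correlation(true_labels, predicted))
```

This shows why the correlation can be fairly high even when the category error rate is not: a prediction that misses by one level counts as a full error for classification but only slightly lowers the correlation.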

Promotion of Open Innovations in the Real Estate Industries Through Provision of Data Sets for Academic Communities
As mentioned above, applications of deep learning to real estate property photographs are becoming active in business. However, to further draw out the potential of advanced image processing technologies such as deep learning and to create new innovations, there is an overwhelming shortage of people who can implement deep learning. In particular, because people familiar with deep learning are rare, it is not realistic for a single company to create innovation on its own. Therefore, our company set out to stimulate research related to real estate by providing, for academic research purposes, a data set that includes image data such as property photographs and floorplans held by our company. With the cooperation of the National Institute of Informatics of Japan (NII), we started providing the "LIFULL HOME'S Data set" (National Institute of Informatics 2015) (Fig. 23.7) in November 2015. The LIFULL HOME'S data set includes information on all properties for rent (approximately 5.33 million) that were listed on LIFULL HOME'S as of September 2015, the property photographs associated with them (approximately 83 million items), and floorplan images (approximately 5.15 million items). It is currently provided to more than 80 university laboratories and research institutions in Japan and overseas. More than 3 years have passed since the launch, and very interesting research results are being published.
I would like to briefly introduce one very interesting research case that uses the property photographs and floorplan image data included in the LIFULL HOME'S data set. A study group at Simon Fraser University in Canada has shown that new applications can be created by using deep learning to solve the task of matching floorplans with indoor photographs (Liu et al. 2016). Consider the "quiz for selecting the correct bathroom photograph corresponding to the floorplan" shown in Fig. 23.8 (the correct answer is (A)). This quiz is very difficult for humans: even in an experiment with workers on a crowdsourcing service (Amazon Mechanical Turk), the correct answer rate was only 43%, and it took 30 seconds or more on average to solve one task. However, by using a deep neural network as shown in Fig. 23.9, a correct answer rate of 72%, far exceeding that of human beings, has been achieved, and more than 20 problems can be solved in one second.
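One generic way to frame such a quiz in code is to represent the floorplan and each candidate photograph as feature vectors and select the photograph most similar to the floorplan. The sketch below is only an illustration of that framing and is not the network of Liu et al. (2016): the use of cosine similarity, the three-dimensional vectors, and the four candidates are all assumptions. In a real system the vectors would be produced by trained neural networks.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_photo(floorplan_vec, candidates):
    """Return the label of the candidate photograph most similar to the floorplan."""
    return max(candidates,
               key=lambda label: cosine_similarity(floorplan_vec, candidates[label]))


# Hypothetical feature vectors, for illustration only.
floorplan_vec = [1.0, 0.0, 1.0]
candidates = {
    "A": [0.9, 0.1, 0.8],
    "B": [0.0, 1.0, 0.1],
    "C": [0.2, 0.9, 0.0],
    "D": [-1.0, 0.0, 0.5],
}

print("selected photograph:", pick_photo(floorplan_vec, candidates))
```

Once the vectors are available, selecting an answer costs only a handful of arithmetic operations per candidate, which is consistent with a trained network being able to answer many such quizzes per second.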
A deep neural network trained as described above can potentially also be used to estimate the position on the floorplan that corresponds to an indoor photograph. When a visualization method is applied, as shown in Fig. 23.10, the position on the floorplan corresponding to the indoor photograph (center) is correctly indicated by the red spot on the right side of the figure.

Conclusion
In this article, we introduced the outline and application examples of image processing technologies, especially deep learning, to further enhance the quality and value of property photographs and floorplans that are very important in real estate property information. Image processing technology is still developing rapidly, and it is expected that even greater innovations will be generated one after another.
On the other hand, researchers and engineers who are familiar with image processing technologies such as deep learning are extremely rare worldwide, and the competition for them takes place not only between companies but also between industrial fields. With research and development under way in various industrial fields such as advertising, finance, automobiles, and robots, creating new innovations in the real estate field requires a mechanism that encourages people familiar with such technologies to engage in this field. To attract such people, it is indispensable to present challenging tasks and to develop data sets and a research community as infrastructure for research and development. I would like to make further contributions to the creation of such R&D infrastructure in the real estate field.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.