How to use Google street view for a time-lapse data collection methodology: potential uses for retailing

Finding the optimal location is an important strategic decision for retailers. The classic theories of retail location offer complementary perspectives, and later models add new variables to address their methodological problems; however, these methodologies are static in time. Google Street View (GSV) makes it possible to extend the analysis of predictive models to different fields through time-lapse data collection, offering new research opportunities and providing dynamic information. The main contribution of this research is the development of a customized methodology that incorporates the time-lapse technique for practical applications, a topic on which there is almost no prior research.


Introduction
This article presents a new data collection methodology based on the time-lapse function of Google Street View (GSV), which supports planning and decision-making through direct observation of a given environment, generally urban. To illustrate the methodology and its potential uses, we apply it to the location of retail stores.
To establish the state of the art, and whether a similar methodology existed before this article, we carried out a systematic literature review based on bibliometric techniques, mainly using the Web of Science (WoS) Core Collection database, with the Scopus database as a complementary source to verify the validity of the search.
"Google Street View" was used as the search term, since it also captures similar formulations such as Streetview or Street-view. After this initial step, the data were cleaned. The databases contain different types of documents that include the three words in the title, abstract, author keywords, or Keywords Plus.
We then narrowed down the publications by excluding certain document types, either because they are not complete investigations or because they are reviews that do not develop new methodologies. Thus, the following document types were excluded: proceedings papers (142), literature reviews (7), editorials (5), and poetry (1). No restrictions were applied to research areas, categories, languages, or other search criteria, resulting in 303 documents. Scopus returned essentially the same results, with 301 documents.
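The keyword matching and document-type screening described above can be sketched over exported bibliographic records. This is a minimal illustration, not the authors' procedure (which used the WoS and Scopus refine tools): the `Record` fields, the regular expression for spelling variants, the document-type labels, and the example records are all hypothetical assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List

# "Google Street View" plus spelling variants such as "StreetView" or "Street-view"
GSV_PATTERN = re.compile(r"google\s+street[\s-]?view", re.IGNORECASE)

# Document types removed in the screening step (labels assumed, WoS-style)
EXCLUDED_TYPES = {"proceedings paper", "review", "editorial material", "poetry"}


@dataclass
class Record:
    title: str
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)  # author keywords and Keywords Plus
    doc_type: str = "Article"


def mentions_gsv(rec: Record) -> bool:
    """True if the search term appears in the title, abstract or keywords."""
    text = " ".join([rec.title, rec.abstract, *rec.keywords])
    return GSV_PATTERN.search(text) is not None


def screen(records: List[Record]) -> List[Record]:
    """Keep records that mention GSV and are not of an excluded document type."""
    return [r for r in records
            if mentions_gsv(r) and r.doc_type.lower() not in EXCLUDED_TYPES]


# Hypothetical records, for illustration only
records = [
    Record("Auditing sidewalks with Google Street View"),
    Record("A review of Google StreetView studies", doc_type="Review"),
    Record("Measuring urban form", keywords=["Google Street-view", "urban form"]),
    Record("Street audits in Madrid"),
]
kept = screen(records)
print([r.title for r in kept])
# ['Auditing sidewalks with Google Street View', 'Measuring urban form']
```

The review is dropped by document type, and the last record is dropped because it never mentions Google Street View.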
By analyzing the documents through their abstracts and/or full texts, we can highlight the following characteristics:
• In all cases, the GSV results are treated statically over time, and temporal correlation is not used.
• In most cases, GSV has been used as an alternative to, or for comparison with, traditional physical exploration.
• No document treats GSV as a methodology in its own right; it always appears as part of previously established methodologies.
Following the traditional structure of bibliometric analysis, we examine the evolution of publications over time, the most prolific authors and journals, and the main categories and areas of knowledge, in order to obtain an overview of the state of the art.
Regarding their temporal evolution, the number of articles that include GSV increased tenfold from 2010 to 2019, evidence that the tool's usefulness has spread throughout academia.
Regarding scientific production by author, Xiaojiang Li and Carlo Ratti, of the Massachusetts Institute of Technology (MIT), stand out: with 17 registered documents, they are the researchers who have most focused their studies on GSV as a methodology. Their work includes a methodology that uses GSV panoramic images to estimate and predict the occurrence of sun glare.
However, the article with the greatest number of citations (207) is "Where, When, Why, and For Whom Do Residential Contexts Matter? Moving Away from the Dichotomous Understanding of Neighborhood Effects", by Sharkey and Faber (2014). Although this article does not address the use of GSV directly, GSV-based studies feature widely in its bibliography. Among the articles that do address GSV directly, the most cited is "Using Google Street View to Audit Neighborhood Environments" (Rundle et al. 2011), with 203 citations.
The five main areas of knowledge, according to the WoS classification system, are Public, Environmental and Occupational Health (58), Geography (42), Urban Studies (35), Environmental Studies (30), and Physical Geography (26), which practically coincide with the research areas of the documents studied. Similar results are obtained when the same searches are run in Scopus.
To confirm the previous analysis carried out using WoS and Scopus, the search was expanded to Google Scholar, finding 13,300 results. This tool does not allow effective selection and refinement of results. However, the search for documents that contained "Google Street View" in the document title yielded 244 results, consistent with the results obtained with the other databases, thus confirming their validity.
Because the new methodology is designed primarily for urban elements, we analyzed the 35 documents in the Urban Studies knowledge area and the 7 in Economics (Table 1).
To find out whether similar methodologies exist, the following criteria were considered:
- The object of study: investigations that focus on commercial premises or building typologies, or that are related to the practical application developed for the new methodology.
- Whether the GSV images were used statically (photographs from a single reference year) or dynamically (using the temporal functionality).
- Whether the use of GSV is part of a new technique or the development of a methodology.
Ewing and Clemente (2013) present an interesting analysis of the intangible elements of cities, using New York as a case study. Similarly, Monteiro and Turczyn (2018) analyze Google Earth and GSV images, complemented by street-level observations and photographs, and adapt them to the categories used for pattern identification in metropolitan territories.
Lee and Talen (2014), in turn, contribute to the literature on walkability measurement by proposing a hybrid auditing method that combines a GIS-based approach with GSV. Hipp et al. (2017) propose a dynamic application similar to that of Grubesic et al. (2018). The former does not use GSV images; instead, it uses images taken every 30 min by surveillance cameras to look for patterns of physical activity. The latter identifies image frequency as an added difficulty, since temporality comes from two different tools, Google and Microsoft (Bing), whose imagery shows temporal gaps of several years.
In 2019, two methodological studies were published: Zhang et al. rely on open data to measure the quality of life and health of neighborhoods in Atlanta (USA), using GSV as a data source, while Middel et al. use GSV imagery to assess the urban form and composition of cities from a human-centric perspective.
As Table 1 shows, none of the references addresses the three criteria considered here at the same time. That is, none of them combines, in a single study, urban planning or buildings as the object of study, the development of a new methodology, and a temporally dynamic analysis.
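The screening logic behind this conclusion can be made explicit by coding each study with one flag per criterion; a study matches the proposed approach only if all three flags hold. A minimal sketch in which the class name and the study codings are hypothetical and are not the values in Table 1:

```python
from typing import NamedTuple


class StudyCoding(NamedTuple):
    study: str
    urban_object: bool      # criterion 1: commercial premises or building typologies
    dynamic_use: bool       # criterion 2: GSV's temporal (historical) imagery is used
    new_methodology: bool   # criterion 3: GSV is part of a new technique or methodology

    def criteria_met(self) -> int:
        return sum([self.urban_object, self.dynamic_use, self.new_methodology])


# Hypothetical codings for illustration; these are NOT the values of Table 1
coded = [
    StudyCoding("Study A", True, False, True),
    StudyCoding("Study B", False, True, True),
    StudyCoding("Study C", True, False, False),
]
print({s.study: s.criteria_met() for s in coded})       # {'Study A': 2, 'Study B': 2, 'Study C': 1}
print([s.study for s in coded if s.criteria_met() == 3])  # []
```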

Retailing location choice theories
The location of a store is one of the most relevant strategic decisions for retailers. Kuo et al. (2002) claim that choosing a location is one of the most critical decisions for a small retail establishment. Moreover, location may be a determining factor in the success or failure of a retailer (Scarborough and Zimmerer 2004). Jaravaza and Chitando (2013) study store location as a factor influencing customers' store choice, and Ilbahar and Kahraman (2018) state that retail store selection is an important decision for both customers and retailers, since it is directly linked to customer satisfaction and retailers' profit. Given the extensive and multidisciplinary literature on store site selection (Nwogugu 2006), there are a variety of ways to measure the "ideal location" for commercial establishments.
Table 1 (partial rows recovered from extraction; column headings lost): Guo (2013b), Parking, Tool; Gordon and Janzen (2013), Suburbs; Guo (2013a), Parking, Economy; Hanson et al. (2013), Transport; Lee and Talen (2014), Walkable.
The three classic theories of retail location propose different ways of measuring the potential of commercial locations. The Principle of minimum differentiation (Hotelling 1929) argues that the most important factor is the relative proximity to other stores that offer similar goods or services, i.e., the tendency of businesses or products to cluster. Thus, proximity to competitors is considered more critical than proximity to customers. The other two classic theories focus on locational centrality. Spatial interaction theory (Reilly 1929) assumes that customers weigh differences in store-specific products and services against the appeal of the place of purchase. This is the case, for example, of small convenience stores, which offer fewer standardized goods and services (Jones et al. 2003). Finally, Central place theory (Christaller 1933) focuses on the relationships between establishments of different sizes and relates their economic activities to the population. It holds that the main function of a central place is to supply goods and services to the surrounding population. Geographical distance and transportation costs play a key role in the analysis, since demand for a good or service declines with distance from the source of supply. Consequently, locations closest to the center of customer demand enjoy a better position than those located further away. This theory is considered to have significant predictive power, primarily for single-purpose shopping trips.
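Spatial interaction theory is commonly formalized as Reilly's law of retail gravitation: the ratio of trade that two centers A and B draw from an intermediate town is proportional to the ratio of their populations and inversely proportional to the square of their distances from it, B_a / B_b = (P_a / P_b)(D_b / D_a)^2. The sketch below uses that commonly cited form (population exponent 1, distance exponent 2); the function names and the example towns are hypothetical. Setting the ratio to 1 gives the "breaking point" on the road between A and B where both centers attract equally.

```python
import math


def reilly_ratio(pop_a, pop_b, dist_a, dist_b, pop_exp=1.0, dist_exp=2.0):
    """B_a / B_b: relative trade that centers A and B draw from an intermediate town.

    Attraction grows with population (pop_exp) and decays with distance (dist_exp).
    """
    return (pop_a / pop_b) ** pop_exp * (dist_b / dist_a) ** dist_exp


def breaking_point(pop_a, pop_b, distance_ab):
    """Distance from A at which A and B attract equally (ratio = 1, exponents 1 and 2).

    Solves P_a / D_a**2 = P_b / D_b**2 together with D_a + D_b = distance_ab.
    """
    return distance_ab / (1 + math.sqrt(pop_b / pop_a))


# Hypothetical towns: A has four times B's population, 30 km apart
print(reilly_ratio(400_000, 100_000, 20, 10))  # 1.0
print(breaking_point(400_000, 100_000, 30))    # 20.0
```

The larger town's area of attraction extends two-thirds of the way toward the smaller one, a compact illustration of the distance-decay reasoning shared by the centrality-based theories.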
Since GSV appeared in 2007, new methodological opportunities have emerged for research on this topic. Wilson et al. (2012) show precise and consistent agreement between field-observation audits and image-based interpretation using GSV. Both academic researchers and companies have found that GSV offers an excellent opportunity for practical implementation and for solving numerous problems.
However, GSV is not limited to retailing; it offers many other possibilities for analysis and for developing predictive models. Thus, Odgers et al. (2012) find it a reliable and cost-effective tool for measuring both negative and positive features of local neighborhoods. Rundle et al. (2011) develop an exploratory study and find that GSV can be used to audit neighborhood environments. Wood and Reynolds (2012) study how retailers can take advantage of location research to better leverage geographical insights and to support appropriate customer propositions and marketing strategies. Griew et al. (2013) develop a street audit tool using GSV to measure how supportive the environment is of physical activity. Hara et al. (2013) combine crowdsourcing and GSV to identify street-level accessibility problems in a city. Using GSV is a reliable method for assessing characteristics of the built environment.
All these methodologies share a limitation: they are static in time. In other words, they represent a "snapshot" of reality at a certain moment. The time-lapse technique can therefore be an excellent opportunity for a more thorough analysis of retail store location. Time-lapse is a technique in which images are captured in sequence with a photo (or video) camera, which makes it possible to obtain dynamic information about reality through its evolution over time. To the best of our knowledge, there is almost no research incorporating the time-lapse technique for practical applications, and no study examines the retail sector. One of the few exceptions that includes dynamic analysis using GSV is Ilic et al. (2019), who present a Siamese convolutional neural network that automatically detects gentrification-like visual changes in temporal sequences of GSV images. Cohen et al. (2020) also use GSV images and their evolution over time to provide current and historical food retail data from 2007 to the present, showing how GSV can be used to analyze changing food environments that affect health.
The main contributions of this paper are as follows:
(1) The methodology allows researchers to obtain data from a primary source, based on direct observation of a given environment.
(2) Our dynamic time-lapse methodology will help to predict the potential popularity of locations over time, facilitating decision-making on retail store location.
(3) The methodology may also help guide urban planners in designing commercial zones and transportation networks, analyzing the spatial concentration of retailers, and studying the factors affecting the rental value of residential property.

Development of the methodology for data collection and practical application
The methodology for collecting data on commercial premises in a given area begins with the selection of the area of influence. A radius of 150 m is used for the analysis. This distance is based on the work of Tan and Tan (1995), Nikilaos et al. (2011), and Ray (2017), who apply the same distance criterion based on visual, transport, and visual-impact considerations. For previously delimited areas (when the intention is to study a street, district, or singular area), it is recommended to create grids of approximately 900 m².
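The 150 m area of influence can be applied by filtering candidate premises by great-circle (haversine) distance from the central property, and a delimited area can be tiled into cells of about 900 m², taken here as 30 m × 30 m squares. A minimal sketch, assuming premises are available with latitude/longitude coordinates; the function names and the coordinates are made up, and the grid uses a local equirectangular approximation that is adequate at street scale:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
RADIUS_M = 150              # area of influence used in the text
CELL_SIDE_M = 30            # 30 m x 30 m cells = 900 m^2 (assumed square cells)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_influence(center, premises, radius_m=RADIUS_M):
    """IDs of premises whose distance to the central property is at most radius_m."""
    lat0, lon0 = center
    return [pid for pid, (lat, lon) in premises.items()
            if haversine_m(lat0, lon0, lat, lon) <= radius_m]


def grid_cell(origin, point, side_m=CELL_SIDE_M):
    """(row, col) of the grid cell containing point, counted from a south-west origin.

    Uses a local equirectangular projection, adequate over a few hundred metres.
    """
    lat0, lon0 = origin
    lat, lon = point
    dy = math.radians(lat - lat0) * EARTH_RADIUS_M
    dx = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    return int(dy // side_m), int(dx // side_m)


# Made-up coordinates around a central property
center = (40.4168, -3.7038)
premises = {
    "P1": (40.4168 + 0.0009, -3.7038),  # about 100 m north
    "P2": (40.4168 + 0.0018, -3.7038),  # about 200 m north
    "P3": (40.4168, -3.7038 + 0.001),   # about 85 m east
}
print(within_influence(center, premises))  # ['P1', 'P3']
print(grid_cell(center, premises["P1"]))   # (3, 0)
```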
Likewise, it is advisable to create a table listing all the affected streets and the premises or buildings within that radius, in which the data will later be recorded, as shown in the diagram (Fig. 1).
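The inventory table can be kept as one row per premise, grouped by street, so that each row can later receive the yearly observations. A minimal sketch; the function name, the street names, and the premise identifiers are hypothetical:

```python
from collections import defaultdict


def build_inventory(premises):
    """Group (street, number, premise_id) tuples by street.

    Returns {street: [(number, premise_id), ...]} with streets in alphabetical
    order and premises sorted by street number, giving a stable order in which
    to record the yearly observations later.
    """
    by_street = defaultdict(list)
    for street, number, pid in premises:
        by_street[street].append((number, pid))
    return {street: sorted(rows) for street, rows in sorted(by_street.items())}


# Hypothetical streets and premises inside the 150 m radius
inventory = build_inventory([
    ("Street B", 12, "P3"),
    ("Street A", 5, "P1"),
    ("Street B", 4, "P2"),
])
for street, rows in inventory.items():
    print(street, rows)
# Street A [(5, 'P1')]
# Street B [(4, 'P2'), (12, 'P3')]
```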
It should be noted that GSV imagery provides access to spatial and temporal data only in those places and at those times where the technology was deployed. In most cases, these data are available from 2008 onwards, with an increasing number of images and data available for research.
The data collection scheme was designed following the standardized graphical notation BPMN (Business Process Model and Notation). BPMN is aimed at process modeling and is one of the best organization

Data considerations and treatment
The possible scenarios regarding the availability (or not) of GSV images for the year being studied are the following:
(1) An image is available: the data of the variables for that year are recorded.
(2) No image is available for the year being studied:
(a) There is an image within 2 years of the year being studied: the data of the closest image in time are used. If images are available both before and after the year being studied, the data from the earlier image are used.
(b) The available images are all more than 2 years away: the data from the earlier year are used. If there is no earlier image, the data record states that there is no valid information.
(3) No image is available for any year: the data record states "Property information not available".
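The decision rules above can be written as a small function that, given the year being studied and the years for which GSV imagery exists at a location, returns the year whose image should be coded or a status message. This is a sketch under our own reading of the rules: the two-year window is taken as inclusive, case (a) prefers the nearest earlier image whenever images exist on both sides, and case (b) takes the most recent earlier image. The function name and status strings are hypothetical.

```python
from typing import Sequence, Union

NO_VALID_INFO = "No valid information"                # case (2b) without an earlier image
NOT_AVAILABLE = "Property information not available"  # case (3)
WINDOW_YEARS = 2                                      # inclusive window, assumed


def select_image_year(study_year: int, available: Sequence[int]) -> Union[int, str]:
    years = sorted(set(available))
    # (3) No image for any year
    if not years:
        return NOT_AVAILABLE
    # (1) Image available for the year being studied
    if study_year in years:
        return study_year
    earlier = [y for y in years if y < study_year]
    later = [y for y in years if y > study_year]
    # (2a) Image within the window; the earlier side wins when both sides exist
    near_earlier = [y for y in earlier if study_year - y <= WINDOW_YEARS]
    near_later = [y for y in later if y - study_year <= WINDOW_YEARS]
    if near_earlier:
        return max(near_earlier)
    if near_later:
        return min(near_later)
    # (2b) Only images more than two years away: most recent earlier year, if any
    if earlier:
        return max(earlier)
    return NO_VALID_INFO


print(select_image_year(2010, [2008, 2012, 2014]))  # 2008
print(select_image_year(2010, [2011, 2015]))        # 2011
print(select_image_year(2013, [2009, 2016]))        # 2009
print(select_image_year(2009, [2014]))              # No valid information
print(select_image_year(2010, []))                  # Property information not available
```

Encoding the rules this way makes the chosen interpretation explicit and lets it be applied identically to every premise and year in the inventory.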
One of the main advantages of the data collection methodology is that the researcher is free to select the variables to be recorded, as well as how they are classified.
In the study of commercial premises, for example, variables of interest and data that can be obtained about the property include the economic activity carried out, external elements of the establishment, the facade of the property, and the competitive environment. Information about the urban environment can also be collected.
As an example, we analyze two commercial areas located in the centers of Madrid (Spain) and London (UK) (Fig. 2). From the starting position, the steps and instructions laid out in the diagram are followed to shift position or to decide how to handle years for which no information is available.
The starting point offers a timeline with sufficient information (Fig. 3), although, as will be seen, no images were available for some years. In this example, we analyze the selected commercial property as the epicenter of the area from 2008 to 2019 using four variables: commercial activity (type of business), commercial signage, the space surrounding the entrance, and the total surface area of the sidewalk around the property (Figs. 4 and 5).
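Each coded observation can be stored as one record per property and study year holding the four example variables. A minimal sketch, assuming CSV as the storage format; the class and field names, the property ID, and both years are hypothetical, while the activity, signage, and area values echo those reported for the Madrid example:

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Observation:
    property_id: str
    study_year: int
    image_year: Optional[int]          # year of the GSV image actually coded (None if none)
    activity: Optional[str]            # commercial activity (type of business)
    signage: Optional[str]             # commercial signage (text and colors)
    entrance_area_m2: Optional[float]  # space surrounding the entrance
    sidewalk_area_m2: Optional[float]  # sidewalk surface around the property


def to_csv(observations) -> str:
    """Serialize observations to CSV text, one row per property and study year."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Observation)])
    writer.writeheader()
    for obs in observations:
        writer.writerow(asdict(obs))  # None values are written as empty cells
    return buf.getvalue()


obs = Observation("MAD-01", 2010, 2008, "Food retail",
                  "Autoservicio GAMA (orange and white; black background)", 35.0, 104.0)
print(to_csv([obs]))
```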
Data such as those provided here for specific locations in London and Madrid, offered simply as an example of the quantitative and qualitative information this methodology can provide, give a very complete perspective of the location and of the changes that occur among retailers and their surroundings over time. It is not simply a matter of having information at a specific moment: the methodology also makes it possible to observe temporal evolution and to analyze possible interactions between the observed variables.
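Because each variable is recorded year by year, the temporal evolution of a property can be summarized by listing the years in which a coded variable changes. A minimal sketch over a hypothetical timeline for one property; the function name and all values are invented for illustration:

```python
def changes_over_time(timeline):
    """List (year, variable, old, new) for each change between consecutive observed years.

    timeline maps year -> {variable: value}; years without a usable image are
    simply omitted. Only variables recorded in the later year are compared.
    """
    events = []
    years = sorted(timeline)
    for prev, curr in zip(years, years[1:]):
        for var, new in timeline[curr].items():
            old = timeline[prev].get(var)
            if old != new:
                events.append((curr, var, old, new))
    return events


# Hypothetical timeline for one property
timeline = {
    2008: {"activity": "Food retail", "sidewalk_area_m2": 104},
    2011: {"activity": "Food retail", "sidewalk_area_m2": 104},
    2014: {"activity": "Clothing", "sidewalk_area_m2": 104},
    2017: {"activity": "Clothing", "sidewalk_area_m2": 130},
}
for event in changes_over_time(timeline):
    print(event)
# (2014, 'activity', 'Food retail', 'Clothing')
# (2017, 'sidewalk_area_m2', 104, 130)
```

Aligning such change events across variables (for example, a sidewalk widening followed by a change of activity) is one simple way to look for the interactions mentioned above.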
Figs. 4 and 5 (panels labeled MADRID; content recovered from extraction): Activity: Food Retail; Commercial Signage: Autoservicio GAMA (orange and white; background: black); Area surrounding entrance: 35 m²; Area of sidewalk around the property: 104 m². Panel notes: with the closest images for 2008 and 2012, the 2008 data were used; with the closest images for 2008 and 2011, the 2008 data were used; with the closest images for 2012, the 2012 data were used; with the closest images within a two-year period for 2011, the 2011 data were used.
These data can be assessed from different perspectives. For example, they offer useful information for each of the three classic theories of retail location, which propose different ways of measuring the potential of commercial locations. From the perspective of the Principle of minimum differentiation, for instance, it is very important to know about the openings and/or closings of retailers around the central point of analysis. The density of competitors is key, and knowing whether there are retailers that offer similar goods or services can be decisive in the decision process.
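For the Principle of minimum differentiation, the yearly records of the premises in the area of influence can be turned into counts of competitors with the same activity and of their openings and closings between observed years. A minimal sketch, assuming each premise has a coded activity per observed year (None meaning vacant); the function name, premise IDs, and activities are hypothetical:

```python
def competitor_dynamics(area_records, activity):
    """Per-year competitor count, openings and closings for one activity.

    area_records: {year: {premise_id: activity or None}} for premises in the area.
    Returns {year: (count, opened, closed)}; opened/closed are None for the first
    observed year, which has no earlier year to compare with.
    """
    result = {}
    previous = None
    for year in sorted(area_records):
        current = {pid for pid, act in area_records[year].items() if act == activity}
        if previous is None:
            result[year] = (len(current), None, None)
        else:
            result[year] = (len(current), len(current - previous), len(previous - current))
        previous = current
    return result


# Hypothetical premises around the central property
area_records = {
    2008: {"P1": "Food retail", "P2": "Food retail", "P3": "Pharmacy"},
    2012: {"P1": "Food retail", "P2": None, "P3": "Pharmacy", "P4": "Food retail"},
    2016: {"P1": "Food retail", "P2": "Food retail", "P3": "Food retail", "P4": None},
}
print(competitor_dynamics(area_records, "Food retail"))
# {2008: (2, None, None), 2012: (2, 1, 1), 2016: (3, 2, 1)}
```

A change of activity (P3 switching from pharmacy to food retail) counts here as an opening of a new competitor, which is one possible convention; others may prefer to track conversions separately.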

Conclusions and potential applications
Store location is one of the most relevant strategic decisions for retailers, since it may be a determining factor for success and is even directly linked to customer satisfaction. However, identifying the best locations is a complex decision for businesses, and there are different ways to evaluate store location. The three classic theories of retail location (the Principle of minimum differentiation, Spatial interaction theory, and Central place theory) offer different and complementary perspectives. However, as described in this paper, many authors add variables to these models to address some of their inherent methodological problems (e.g., site characteristics are scarcely considered; distance is overemphasized; factors such as site-specific operating costs, competing stores, or the economic value of customers' time are not taken into account in the location decision). In this way, different retail store location models have been developed, with some research even focusing on specific retailers. In addition, since the development of GSV, new methodological opportunities have emerged for research, which are not limited to retail stores but extend the analysis and development of predictive models to different fields. Nevertheless, these methodologies are static in time, representing the situation at a very specific moment. Time-lapse techniques therefore offer new research opportunities, enabling deeper analysis and providing dynamic information, so that researchers can observe the evolution over time of variables they customize and adapt to their analysis. The main contribution of this research is the development of a customized methodology that incorporates the time-lapse technique for practical applications, since, to the best of our knowledge, there is almost no research on this topic and no study that examines the retail sector.
One of the most common problems researchers face is the difficulty of finding available data and, on many occasions, the high cost of such data. In this article, we propose a methodology for obtaining spatio-temporal data through GSV, applied to commercial properties. These new data, when analyzed in combination with other methodologies, should broaden research horizons.
Data on the commercial establishments within a given zone support decision-making on the choice of optimal location, competition studies, commercial gentrification, and real estate investment, among others.
One advantage of the methodology applied to commercial premises is that it allows readily identifiable economic data on the establishments (price of the land, rentals, turnover…) to be combined with qualitative data (facade color, proximity to other establishments, or sidewalk width).
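In practice, this combination amounts to joining the economic data and the GSV-coded observations on property and year. A minimal sketch using plain Python rows; the function name, column names, and values are all hypothetical:

```python
def join_by_property_year(economic, observed):
    """Inner join of two lists of dict rows on (property_id, year).

    Economic rows without a matching GSV observation are dropped.
    """
    index = {(r["property_id"], r["year"]): r for r in observed}
    joined = []
    for row in economic:
        match = index.get((row["property_id"], row["year"]))
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical economic and GSV-coded rows
economic = [
    {"property_id": "P1", "year": 2012, "monthly_rent_eur": 3200},
    {"property_id": "P1", "year": 2016, "monthly_rent_eur": 3900},
    {"property_id": "P2", "year": 2016, "monthly_rent_eur": 2500},
]
observed = [
    {"property_id": "P1", "year": 2012, "facade_color": "white", "sidewalk_width_m": 3.5},
    {"property_id": "P1", "year": 2016, "facade_color": "black", "sidewalk_width_m": 5.0},
]
for row in join_by_property_year(economic, observed):
    print(row)
```

An inner join keeps only property-years for which both kinds of data exist; a researcher who wants to see gaps in the GSV record could keep unmatched economic rows instead.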
The primary data result from the researcher's direct observation of the environment over different periods, which is unusual in research methodology.
The methodology allows researchers to choose the desired time periods freely and provides spatial autonomy, since GSV covers almost all urban centers.
GSV is a free tool; consequently, data preparation is free of charge and remains in the hands of the researchers, who can adapt their workload accordingly.
In addition to the obvious lines of research related to urban planning and construction, the research possibilities are vast. As a guide, and without claiming to be exhaustive, possible uses in different areas of knowledge include: archaeology, to analyze the deterioration of buildings or the exploitation of heritage; health sciences, to study different diseases and their relationship with neighborhoods or residential areas; depopulation and its effects on rural municipalities; the consequences of climate change; and even comparisons of some effects of the most recent pandemics in certain geographical areas.
As limitations of this tool, we must point out that GSV shows images of reality at given moments, and shorter intervals between images than are currently available would sometimes be advisable. Moreover, image quality may not be optimal, and obstacles sometimes prevent an adequate analysis.