Recognition of built-up and non-built-up areas from road scenes

In many cases the road design does not make clear whether a given scene lies within or outside the posted built-up area. The purpose of this paper is to evaluate how far road scenes can be considered built-up or non-built-up in nature, and to identify road scenes which are ambiguous and therefore less safe. Two methods were used to assess the degree of ambiguity of road scenes. In the first approach, a survey of requested speeds at various road scenes was performed with 500 respondents. Here clearly non-built-up and built-up sites, as well as unclear sites, were compared. In the second method, the recognition process of drivers was simulated by image classification software. The classifier was trained with 100 clearly built-up and 100 non-built-up pictures. Four test runs followed, each using 200 pictures from different roads. The speed choice study showed that in unclear situations (e.g. transitions between built-up and non-built-up areas) the standard deviation of chosen speeds is higher than in unambiguous situations. In the image classification study the trained classifier worked well for road scenes which are definitely built-up or non-built-up in nature. Furthermore, as expected, the classifier gave uncertain classifications for unclear situations. Each of the two methods produces an output indicator: the standard deviation of speeds and the certainty score, respectively. Both indicators can serve to identify road scenes leading to uncertain and therefore risky situations.


Introduction
Safe speeds, and also the general speed limits, are quite different outside and within built-up areas. However, the general definition of a built-up area is rather vague. According to the Vienna Convention on Road Signs and Signals [1], "built-up area" means an area with entries and exits specially signposted as such, or otherwise defined in domestic legislation. The sign to indicate the beginning of a built-up area shall bear the name of the built-up area or the symbol showing the silhouette of a built-up area or the two combined.
The concept of self-explaining roads implies that drivers choose an appropriate speed according to the road layout, without the help of speed limit signs. Typical road layouts within and outside built-up areas usually differ sharply from each other, so drivers can easily recognize them. However, in transition areas between non-built-up and built-up areas the layout might not be so clear, and drivers are therefore not certain which speed they should drive at. This uncertainty is reflected in larger differences between their speeds, which is itself a risk factor. This paper presents two methods to assess the differences between built-up and non-built-up areas: a questionnaire survey and a computer-based image classification procedure.

Earlier research
The term "self-explaining road" has been used more frequently in the literature since the 1990s. According to Theeuwes and Godthelp [2], traffic systems having self-explaining properties are designed in such a way that they are in line with the expectations of the road user. The so-called "Self-Explaining Road" (SER) is a traffic environment which elicits safe behavior simply by its design.
Although different authors use different terms, all agree that internal mental representations (such as schemata, scripts, routines, prototypical representations and mental models) help to increase efficiency in human decisions. According to Theeuwes and Godthelp [2], abstract representations of the world are stored in memory. These prototypical representations develop through experience. In order to ensure unity in the way people structure their world, it is required that there is a large consistency in the physical appearance of an object or environment and a large consistency with respect to the behaviour displayed in relation to that object or environment.
In the SPACE project initiated by ERA-NET ROAD, Sjörgen et al. [3] refer to Mazet and Dubois [4] and to Mazet, Dubois and Fleury [5], considering two terms in driving behaviour: 'mental categories of roads' and 'road readability'. The SPACE project uses a definition of SER that recognizes the role of categorization described in these earlier papers, but suggests that practitioners now generally understand self-explaining roads to include other psychological concepts such as intuitive and understandable design, consistency, readability and psychological traffic calming.
In their paper about behaviourally relevant road categorisation, Weller et al. [6] argue that "unsafe situations are likely to occur if the perceived message conveyed by cues or affordances does not match the normative behavioural expectations of the official road category. In order to avoid such mismatch it is important to know how drivers categorise (rural) roads and which elements are used for this subjective and behaviourally relevant road categorisation." They therefore conducted a study in a laboratory setting in which subjects were asked to rate a variety of rural road pictures. The study revealed that drivers distinguish between three different rural road categories, which can be told apart with comparatively few objective criteria.
Discussing the cognitive psychological background of driving, Montel et al. [7] explain that "drivers refer to categories of roads when they analyse the roads and environments they are driving on. They also associate to such categories of roads certain specific expectancies related to the events they may encounter on such roads. … One challenge for the engineers is to take drivers' categories into account when designing roads in such a way that drivers' information processing and decision making will be more appropriate to the situations encountered." Montel's paper shows results from a survey related to urban streets. The goal of the survey was to identify drivers' categories of urban streets based on 65 photographs of various urban streets. Drivers were asked to classify streets and then to describe the events they expected to meet in the different classes of streets.
Referring to a research program on road legibility, Fleury [8] describes a set of experiments using photographs, TV screens and drawings of various road scenes. Their aims were to assess the cognitive categorial knowledge of the "common driver", to find the sets of properties of the environment that appear to be relevant for the categorial organisation, and finally to identify the clues (or patterns of clues) in the environment which are associated, as predictors, with different types of problems or patterns of behaviour.
Road scene photographs were also used in further studies about the selection of the speed by drivers depending on the layout and conditions of the environment of the road section (e.g. Garrick [9], Goldenbeld and van Schagen [10], Lahausse [11]).
Charlton et al. [12] describe a project undertaken to establish a self-explaining roads (SER) design programme on existing streets in an urban area. The SER design for local roads included increased landscaping and community islands to limit forward visibility, and removal of road markings to create a visually distinct road environment. In comparison, roads categorised as collectors received increased delineation, addition of cycle lanes, and improved amenity for pedestrians. The objective speed data, combined with residents' speed choice ratings, indicated that the project was successful in creating two discriminably different road categories.
Dealing with road categorisation and the design of self-explaining roads in a broader sense, Matena et al. [13] showed specific good and bad practices for the layout of transitions between rural and urban road segments.

Questionnaire survey
The goal of this survey was to assess how well road users can distinguish built-up and non-built-up areas, with general speed limits of 50 and 90 km/h respectively, and especially how they perceive transition zones.
Pictures of clearly built-up, clearly non-built-up and transition sites were shown on a computer screen to respondents, who had to give their chosen speed at each location. For each of these three types, five pictures were shown in randomly mixed order. The sites were chosen from 2x1 lane national main roads in the north-western part of Hungary (flat terrain, tangent sections), with the built-up sections of the same roads lying in villages or small towns. Participants were not informed about the actual speed limit. The images showed road scenes with very little or no traffic, so that the free-flow speed at those locations could be inferred. Fig. 1 shows two typical pictures: the first is a clearly built-up site with houses, a sidewalk and public lighting poles on both sides of the road, while the second is an unclear site with a built-up character on the right side but a rural look on the left side.
Nearly 500 respondents filled in this on-line questionnaire at home on their own computers. The survey started with about 100 students and was later extended to other persons on available mailing lists. The average age of the respondents was 31 years, the maximum 61 years. The male/female ratio was 72/28 %. This sample is certainly not representative of the total driving population; however, it can be assumed to be appropriate for finding the differences between built-up, non-built-up and transition sites.
For each picture, the average preferred speed, the v85 speed (the 85th-percentile speed), the standard deviation and the relative standard deviation of speeds were calculated. The results for the three categories are shown in Table 1.
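The four indicators named above can be computed per picture with a few lines of code. The following is a minimal sketch, not the authors' procedure: the function name `speed_indicators` is hypothetical, the sample standard deviation is assumed, and the relative standard deviation is taken to be the standard deviation divided by the mean. The percentile interpolation rule is also an assumption, since the paper does not state one.

```python
import statistics


def speed_indicators(speeds):
    """Summarise the speeds chosen by respondents for one picture (km/h)."""
    mean = statistics.mean(speeds)
    # Sample standard deviation (assumption: the paper does not say which form was used).
    std = statistics.stdev(speeds)
    # With n=100 there are 99 cut points; index 84 is the 85th percentile.
    v85 = statistics.quantiles(speeds, n=100, method="inclusive")[84]
    return {"mean": mean, "v85": v85, "std": std, "rel_std": std / mean}


# Hypothetical answers for a clear and an unclear site (illustrative values only).
clear_site = speed_indicators([48, 50, 52])
unclear_site = speed_indicators([40, 60, 80])
print(clear_site["rel_std"] < unclear_site["rel_std"])
```

A wider spread of answers for the same picture shows up directly in `std` and `rel_std`, which is the indicator the survey relies on.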
The average speeds in the three categories reflect the differences well: 47.8, 63.1 and 86.1 km/h for built-up, transition and non-built-up sections respectively. The fact that the mean speed in "transition areas" lies between the mean speed for built-up areas and the mean speed for non-built-up areas is not surprising: drivers take into account the reality of the road environment and the related risks, and not the official dichotomous categories (built-up/non-built-up).
Other results of this survey show that both the standard deviation and the relative standard deviation of speeds at not clearly identified sites are considerably higher than at clearly built-up or clearly non-built-up sites. This reflects the uncertainty of drivers about speed choice at such locations. An interpretation consistent with the self-explaining road notion is that it is less easy for them to categorize these sites as built-up roads (implying a lower legal speed limit) or non-built-up roads.

Image classification
In the next phase, image recognition software was used to identify built-up and non-built-up areas. The aim of this part of the research was to verify that such a tool is able to account for the human classification activity, which is important for potential applications. The classification was carried out with VLFeat, running within the MATLAB framework. The algorithm was created by Zisserman and Vedaldi [14].
The algorithm combines several building blocks [15]. Descriptive information is extracted from each picture, and these descriptive data, the so-called "visual words", are collected in a dictionary. According to the density of the visual words, histograms are made for each picture and for each grid cell within the picture.

Training of the classifier with training image dataset
For this experiment a large number of road scene photographs were needed. They were taken from on-board camera video recordings made on the same roads as in the previous section (2x1 lane main roads in the north-western part of Hungary, flat terrain, tangent sections, with the built-up sections lying in villages or small towns). The images depict the field of view in front of the driver while driving. The photographs of the database have to be classified, and each class needs a series of training images and a series of test images.
In the training phase, pictures of clearly built-up and clearly non-built-up road scenes were given as input. Given a sufficient number of training pictures, the program is able to allocate new pictures to either the built-up or the non-built-up category. Each picture is given a numerical value indicating the degree to which it belongs to one or the other category. Using this evaluation method, unclear sites can be identified and preventive measures can be taken, thereby increasing safety.
The classifier builds a dictionary using histograms from a series of "visual words" extracted from the training image dataset. The program recognizes the visual words which best describe the category and also the least typical ones. This helps to build up the model. For the training we use two sets of images: the positive training images, which depict the category we want to recognize, and the negative image group, a series of images that do not contain the category to be recognized. From these training images two models, a positive and a negative one, are prepared. Each picture is given a "certainty" score by the program. The closer the picture is to the images in the model, the higher its certainty score is (in absolute value). The score is negative for images that do not contain the object to be recognized.
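The idea of a visual-word histogram and a signed certainty score can be illustrated with a toy example. This is a sketch under stated assumptions, not the VLFeat API and not the authors' implementation: descriptors are plain tuples, the dictionary is given rather than learned, and the score is assumed to be a linear function of the histogram. All function names, weights and values are hypothetical.

```python
def nearest_word(descriptor, dictionary):
    """Index of the dictionary entry closest to the descriptor (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(dictionary)), key=lambda i: sqdist(descriptor, dictionary[i]))


def word_histogram(descriptors, dictionary):
    """Normalised count of how often each visual word occurs in one image."""
    counts = [0] * len(dictionary)
    for d in descriptors:
        counts[nearest_word(d, dictionary)] += 1
    total = sum(counts) or 1  # avoid division by zero for an image without descriptors
    return [c / total for c in counts]


def certainty_score(histogram, weights, bias=0.0):
    """Signed score: positive leans towards the positive class, negative towards the other."""
    return sum(w * h for w, h in zip(weights, histogram)) + bias


# Hypothetical two-word dictionary and weights (illustrative values only).
dictionary = [(0.0, 0.0), (10.0, 10.0)]
weights = [-1.0, 1.0]
image_descriptors = [(9.0, 9.0), (10.0, 11.0), (1.0, 0.0), (8.0, 10.0)]
histogram = word_histogram(image_descriptors, dictionary)
print(histogram, certainty_score(histogram, weights))
```

An image whose descriptors split evenly between words typical of the two classes ends up with a score near zero, which is how ambiguous scenes would reveal themselves in such a scheme.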
In our experiment the built-up road category was taken as positive and the non-built-up road category as negative. One hundred built-up and one hundred non-built-up training images were given to the program. Looking at the rural training database, we can find pictures with dense vegetation near the road and also cross sections where vegetation is sparse or entirely absent. Road marking conditions in the training database also vary, including locations with visible pavement markings and locations with no markings at all. In Fig. 2 the 100 + 100 training images are listed on the horizontal axis, with their ratings on the vertical axis. Positive scores indicate built-up scenes, negative scores non-built-up ones.

Classifying test images
The trained classifier is used for classifying test images. As with the training image dataset, two groups of test images are used, a positive and a negative set. Using the model, each test picture gets a certainty score.
In the first experiment, the database of test images contained only clearly distinguishable built-up and non-built-up road images. According to the results, the classifier was able to recognize these two categories: only 12 pictures were not ranked into the correct category, so 94 % of the pictures were classified correctly (Table 2). In Table 2 the 100 + 100 test images are listed on the horizontal axis with their ratings on the vertical axis. Positive scores indicate scenes recognized as built-up, while negative scores indicate scenes identified as non-built-up.
In the second experiment the training database was kept unchanged, while the test data were completely changed. The test images were chosen from two specific road sections: the clear urban images from the small town of Herend, and the rural road scenes from main road No. 8. In the authors' opinion nearly all of the images were clearly definable urban or rural scenes. The test was expected to give a detection rate as good as in the first experiment, which used images from various roads. This expectation was confirmed: 91 % of the images were correctly classified (Table 3).
In the third experiment the training image dataset was again kept constant. Test images were chosen from another road section, on road No. 1 between the towns of Komarom and Tata. The built-up or non-built-up nature of the pictures was defined by the city limit signs, which indicate the speed limit of 50 km/h in built-up areas. In a number of cases it was difficult or even impossible to decide from the picture itself whether the scene is situated inside or outside a built-up area. This is because one side of the road looks like a built-up environment, while the other side suggests a non-built-up environment.
The detection rate dropped significantly in this case, since during the training process the classifier met only images that clearly belong to one or the other class. Thus, only 65 % of the images were classified correctly. Of the built-up images only 42 % were recognized correctly (Table 4). Most of the built-up scenes were classified as non-built-up.
It has to be mentioned that the road scenes themselves are not clear, which can also cause difficulties for real drivers on this road section.
For the fourth experiment, the test database was changed again. This time the images were taken on road No. 81, in and around the town of Mor. The results were similar to those of the third experiment, with a rather low recognition rate. The classifier was able to recognize and correctly classify only 60 % of the test images. Of the built-up images only 20 % were correctly recognized (Table 5). As in the previous case, drivers on this road section might find it difficult to decide where they are and what speed they should choose.
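Recognition rates of the kind reported for the four experiments follow directly from the certainty scores and the official category of each test image. The sketch below assumes that a score above zero is read as built-up, in line with the sign convention described earlier; the function name `recognition_rates` and the example scores are hypothetical.

```python
def recognition_rates(scores, is_built_up):
    """Share of correctly classified images, overall and per official category.

    scores      -- certainty scores; a positive score is read as built-up (assumption)
    is_built_up -- official category of each image (True = built-up)
    """
    if len(scores) != len(is_built_up):
        raise ValueError("scores and labels must have the same length")
    correct = [(s > 0) == label for s, label in zip(scores, is_built_up)]

    def share(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    built = [c for c, label in zip(correct, is_built_up) if label]
    non_built = [c for c, label in zip(correct, is_built_up) if not label]
    return {
        "overall": share(correct),
        "built_up": share(built),
        "non_built_up": share(non_built),
    }


# Hypothetical scores: one built-up scene is mistaken for a non-built-up one.
rates = recognition_rates([1.2, -0.3, -1.0, -0.8], [True, True, False, False])
print(rates)
```

Reporting the rate per category, as the paper does, makes it visible when errors are concentrated in one class, for example built-up scenes that look rural.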

Discussion of results
If we look at the certainty scores given by the classifier (Table 6), it is clear that in the first two experiments more than 70 % of built-up road scenes were ranked correctly (72 and 76 pictures out of 100 having scores over 0.5 or over 1), while the precision in non-built-up scene recognition was 100 %. The low recognition rates in experiments 3 and 4 do not imply that the program is inefficient: humans also fail to classify these unclear sites correctly, as suggested by the results of the questionnaire survey. The reason for the low performance of the program in these environments is therefore probably to be found in the road layout. The results of experiments 1 and 2 support this: for scenes which are definitely built-up or non-built-up in nature, the classifier works reasonably well. Fig. 4 shows two examples of misclassified images. The left-side picture in Fig. 4 definitely looks like a non-built-up site, with solid lines marking the pavement edge, green shoulders, and no curbs or sidewalks, while in reality it lies within the city limit signs, with a speed limit of 50 km/h.
The right-side picture in Fig. 4 is somewhat less obvious. On the left side there are some buildings and a sidewalk with a curb, while on the right side there is no curb and no sidewalk, and the scene looks more non-built-up. Looking at this picture more carefully, a building can be recognized on the right, but it is covered by trees and is not clearly visible to drivers.
In about 50 % of the cases in experiments 3 and 4 the certainty scores were between −0.5 and 0.5. If we look for the reasons for the uncertain classification, we can identify cases such as unknown objects in the pictures (e.g. bridges, New Jersey concrete barrier elements) and sometimes simply pictures that are too dark.
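The share of scenes falling into this low-certainty band can be extracted mechanically, which is also how candidate sites for closer inspection could be listed. A minimal sketch, assuming the band limits are inclusive; the function name `uncertain_scenes` and the example scores are hypothetical.

```python
def uncertain_scenes(scores, band=0.5):
    """Return the indices of scores within [-band, band] and their share of all scores."""
    flagged = [i for i, s in enumerate(scores) if abs(s) <= band]
    share = len(flagged) / len(scores) if scores else 0.0
    return flagged, share


# Hypothetical certainty scores for four test images (illustrative values only).
flagged, share = uncertain_scenes([1.2, -0.3, 0.1, -2.0])
print(flagged, share)
```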
The training process described above used only clearly built-up and clearly non-built-up pictures. Thus, as expected, the program was not able to classify unclear sites. In a later phase of the research the training process could be applied to all kinds of environments. The question is to what extent one can recognize the two "official" categories (officially inside a built-up area, with a 50 km/h legal speed limit, versus officially outside built-up areas, with a legal speed limit of 90 km/h). The program could then be trained on all sites, based on the two official categories, and therefore including the transition sites. Thus, the positive training images would be the officially built-up environments (including both clear and unclear sites), and the negative training images would be the officially non-built-up environments (including both clear and unclear sites). The expected result would be that, like human drivers, the program will, even after training, probably recognise the transition sites (as officially built-up or non-built-up sites) less easily than clearly built-up or clearly non-built-up sites.

Machine-human comparison
This chapter attempts to compare machine and human classifications. In the image classification experiments a total of 200 + 4 × 200 = 1000 pictures were used. For each picture the certainty score given by the program was known. As this number is too large for human tests, 50 pictures were chosen so that they cover the whole range of the certainty scores. These pictures were shown to 86 persons, who were asked about their preferred speeds. In Fig. 5 the average speed for each picture is plotted against the certainty score given by the classifier, and a linear regression line is fitted. Although the fit yields a moderate R² of 0.72, the relationship is visibly not linear: respondents classified the pictures into two groups and referred to the speed limits of 90 and 50 km/h at non-built-up and built-up sites respectively. However, in the range of low certainty scores (between about −1.5 and +1) there are deviations from these limits. In general, there is a reasonable coherence between the human and machine classification.
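The regression line and its R² can be obtained with ordinary least squares. The sketch below is only an illustration of that calculation: the function name `linear_fit` is hypothetical, and the example data are invented to mimic a two-level pattern around 90 and 50 km/h, not taken from Fig. 5.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x and its R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Invented step-like data: certainty scores versus mean preferred speeds (km/h).
scores = [-2.0, -1.0, 1.0, 2.0]
mean_speeds = [90.0, 90.0, 50.0, 50.0]
slope, intercept, r_squared = linear_fit(scores, mean_speeds)
print(slope, intercept, r_squared)
```

Step-like data of this kind still yield a fairly high R² for a straight line, so the coefficient alone does not show whether respondents answered in two groups; inspecting the scatter plot remains necessary.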

Conclusions
It is widely known that road users choose their speed based on their visual impression of the road scene rather than on speed limit signs. Unclear road design can cause uncertainty for drivers. If the road design does not make clear whether a given scene is within or outside a built-up area, drivers are not properly informed about the appropriate speed.
This paper shows two approaches to assess the degree of uncertainty of the drivers. In the first approach, a survey of requested speeds at various road scenes has shown that in unclear situations the standard deviation of chosen speeds is higher than in unambiguous situations, and the inhomogeneous distribution of driving speeds can increase the risk of accidents.
In the second method, the recognition process of drivers was simulated by image classification software. For road scenes which are definitely built-up or non-built-up in nature, the trained classifier works reasonably well. However, as expected, for unclear situations the classifier gives an uncertain classification. Each of the two methods presented in this paper produces an output indicator: the standard deviation of speeds and the certainty score, respectively. Both indicators can serve as tools to assess the degree of uncertainty in road users, so that road scenes and road elements leading to uncertain and therefore risky situations can be identified. The output of the proposed methodology can be used to support road safety inspections.
Having identified uncertain transition sites, road engineers should help drivers to select the right speed. There are several possible solutions for this purpose. According to the SER principle, a clear distinction should be created, e.g. by adding a "village gate" consisting of a middle island with an appropriate deviation in the vehicles' path. If this is not possible, non-SER solutions could also help, e.g. using or repeating speed limit signs or pavement markings.
The experiments of this paper were restricted to 2x1 lane national main roads within and around villages and small towns in off-peak hours. Further research is envisaged to add other cases, like more urbanised areas and more sophisticated traffic conditions (e.g. higher traffic volumes, bicycles, pedestrians).
The human road scene assessment method described in Chapter 3 fits into a series of similar experiments mentioned in Chapter 2. The authors think that the image classification in Chapter 4 adds a new element using a relatively simple tool. The focus here was to identify road scenes and certain elements in the environment influencing human decisions. More advanced techniques have recently become available (e.g. Foucher et al. [16]), using sequences of pictures taken every 5 m and more sophisticated analysis algorithms that try to minimize false classifications. However, if there are ambiguous road scenes or sequences of them, false classifications will remain.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.