1 Introduction

According to Eurostat (2021), a growing share of EU citizens live in apartments located in multi-family buildings, reaching 46.3% in 2020. The structure of this market may be considered diversified, as buyers and sellers use it to satisfy not only residential but also investment needs. However, access to the full micro-level information needed to understand and supervise the market is difficult, if possible at all, so the market may not be considered fully efficient (Cheng et al., 2015, 2020). The problem may be rooted in information asymmetry (Akerlof, 1970): the sell side of a housing market transaction (a real estate broker or an apartment owner) has an information advantage over the buy side (a buyer or a tenant) regarding the true quality of the offered product.

The importance of quality in the housing context has been studied with hedonic models (Lancaster, 1966; Rosen, 1974), in which quality-related variables have repeatedly proven to be key determinants of apartment prices (de Wit & van der Klaauw, 2013; Olszewski et al., 2017; Trojanek et al., 2021; Trojanek & Głuszak, 2022). Moreover, based on micro-level hedonic models, researchers have been able to calculate hedonic price indices, which are invariant to changes in the quality structure of the analytical samples between the periods of analysis. The European Commission, Eurostat, Organisation for Economic Co-operation and Development and World Bank (2013) recommend that state institutions compute such indices to measure changes in real estate prices. Finally, quality has been indicated as the most important attribute contributing to housing attractiveness (Renigier-Biłozor et al., 2022). However, consumers’ approach to apartment quality has recently been changing as a result of the COVID-19 pandemic (Marona & Tomal, 2020). Therefore, the issue of apartment quality should be considered important from both the individual and the institutional perspective.

To reduce information asymmetry, sellers send signals (Spence, 1973, 2002) that reveal information on the true quality to buyers. In this context, a signal may be defined as a way of communicating a product’s observable but alterable characteristics. Recently, the quality of apartments for sale or rent has most often been signalled via listings posted on online platforms. A typical listing contains:

  • A title,

  • Information about the price or rent,

  • A table grouping structural characteristics of the apartment,

  • A declaration of the offered apartment’s quality, chosen by the seller from a drop-down list of pre-specified quality labels, which may be referred to as a direct textual signal of quality,

  • A textual description of the apartment: a descriptive, indirect textual signal of quality (the quality is signalled by the multitude of expressions used to describe the apartment),

  • Photos covering the interior and exterior of the apartment and of the building in which it is located: an indirect visual signal of quality.

It has been shown that real estate sellers and buyers understand the textual signals included in listings in the same way (Luchtenberg et al., 2019). In their pioneering papers, Grossman (1981) and Milgrom (1981) showed that sellers have to provide buyers with information on the product’s quality, as its non-disclosure would signal low quality. However, the literature suggests that some configurations of consumer preferences may incentivise sellers to disclose quality only partially (Hotz & Xiao, 2013). Moreover, adding more photos to housing listings has been found to have both positive and negative effects (Benefield et al., 2011) and, for some segments of the market, disclosing less information in listings might be beneficial (Bian et al., 2021). It may be argued that as the information load of a listing increases, so does the probability of including information that would discourage a potential buyer from visiting the apartment in person (Goodwin et al., 2014).

It would be possible to identify the true quality of the listed apartments, but the cost of the procedure would be high. An ideal quality assessment would demand a large amount of time for in-person visits to each apartment and would require specialist knowledge. Thus, buyers on the housing market are prone to using heuristics and are subject to anchoring effects (Bucchianeri & Minson, 2013; Tversky & Kahneman, 1974). Luchtenberg et al. (2019) examined the importance of pictures and words used as tools to signal the quality of homes listed for sale and concluded that words mean more than pictures in terms of attracting potential customers. In contrast, Seiler et al. (2012) showed that, when searching for a home on online listing platforms, buyers first view the apartment photos and only later the quantitative descriptions and sellers’ remarks sections. Moreover, Renigier-Biłozor et al. (2022) have shown that, among all the apartment attributes represented in apartment photos, apartment quality has been responsible for the largest fluctuations of detected emotions. On that basis, it may be suspected that both textual and visual signals of housing quality play a role in the anchoring process. Thus, in order to sell (or rent) at higher prices, apartment owners (or landlords) may be interested in overstating the quality of the listed apartments.

Now that search processes in the housing market are dominated by online listing platforms, the discussed phenomena may have implications in two dimensions. First, incomplete or supposedly overstated information on housing quality provided by sellers in listings may lead to far-from-realistic anchoring and may create a platform for the abuse of ill-informed buyers. Then, because arranging in-person visits to multiple apartments carries a significant time cost, either a problem of adverse selection may arise or the market clearing processes may be impaired. Moreover, the imperfect knowledge of buyers, driven by information asymmetry, might affect price levels in the housing sector, as pointed out by Łaszek et al. (2016) using the example of the Polish primary market. In addition, Qiu et al. (2020) have shown that, as a result of poor information, certain socio-economic groups of buyers pay higher prices for similar housing than better-informed ones.

Secondly, scarce, incomplete or defective information is reflected in the housing market data needed to perform analyses (Renigier-Biłozor et al., 2019). Transactional data of acceptable quality is usually available with a significant time lag, which hinders a prompt reaction of administrative and supervisory authorities to a rapidly changing market situation. The weaknesses and challenges of the real estate market in the context of data have been presented by Brzezicka (2021) using the example of Poland. Her study highlights that the gathered data is often fragmented because of the shortage of publicly available databases of housing transactions. Additionally, access to micro data from the housing market is severely restricted, as they are a source of competitive advantage (in the case of private entities) or are covered by statistical confidentiality (in the case of public entities). In this market environment, the importance of listings-originated data has recently been growing, even though the process of searching for an apartment on online listing platforms has multiple weaknesses that later also affect the quality of datasets. Among those, Nasreen and Ruming (2022) found misrepresentation of property condition, and Beręsewicz (2015, 2019) pointed out that micro-data on the Polish housing market cannot be regarded as fully representative. However, listings data have shown an ability to provide an up-to-date view of the market situation, hence they may be used as a proxy for transactional data or as their supplement (Boeing et al., 2020). What is more, studies indicate the high accuracy of listings-based price indices and their beneficial anticipatory nature (Anenberg & Laufer, 2017; Ardila et al., 2021; Wang, 2021). Finally, listings data is easily accessible with the use of web-scraping algorithms.

This paper inspects the quality-related information disclosure strategies of sellers (owners, landlords or brokers) who list individual apartments for sale or rent on online listing platforms. It aims to establish the scale of incomplete information disclosure and to empirically verify the hypothesis that housing sellers tend to overstate their direct declarations of the listed housing quality. Both issues may contribute to the creation of an unwanted information gap. The paper then checks whether the visual signals sent by sellers are consistent with the textual ones, so that the information gap may be narrowed by combining previously unused textual and visual information. This is done with a novel use of Wordscores, a supervised, dictionary-based machine learning algorithm (Laver et al., 2003). To verify the correctness of the method, a unique dataset of housing listings has been prepared. It contains over 7000 Polish-language sales and rental listings of apartments located in Poznań, Poland, gathered between 06/2019 and 03/2021 and labelled in terms of the listed apartment quality. The whole analysis has been conducted separately for the two most popular types of housing tenure in Poland: buying an apartment or renting it on the private market for the long term (contracts for 6 months or more).

The research contributes to the literature in four ways that have implications for the development of the field. First, it extends previous studies on information disclosure in the housing market (Benefield et al., 2011; Bian et al., 2021; Hotz & Xiao, 2013) by documenting the size of the information gap in sellers’ direct textual signals of apartment quality in listings. Signals of this kind may be used to filter thousands of listings when searching for an apartment to buy or rent, thus their absence may significantly prolong the market search process and impair market clearing. Moreover, researchers who construct hedonic models with micro-level, listings-originated data on housing market transactions have to date most often used only the observations for which the information on quality has been provided in the form of a direct textual quality signal. The paper warns that this approach may introduce bias into the analyses.

Secondly, to counter the implications of the information gap, the study tests the utility of the Wordscores algorithm for automatically extracting signals of apartments’ quality from the textual descriptions included in listings. The results of the method may be used both to facilitate housing search and to enhance the analytical properties of datasets. In this manner, the study extends the results of Seo et al. (2020), who studied the impact of the presence of quality-related information in listings on house prices, by showing a way to calculate an independent measure of apartment quality.

Thirdly, using the calculated Wordscores, the paper verifies the consistency of textual and visual quality signals sent via sales and rental listings by apartment owners and real estate brokers. The study aims to answer whether the visual and textual signals carry the same information load and can therefore be used interchangeably. This would noticeably ease analytical processes that are currently resource-intensive and technically demanding because they rely on visual data (e.g. the approach proposed by Poursaeed et al. (2018)). In this regard, it contributes to the signalling literature and extends the results of Luchtenberg et al. (2019).

Fourthly, to the best of my knowledge, this is the first study to compare the performance of textual models of the sales and rental markets and to inspect the similarities and differences in the listing strategies used on them. The two submarkets interact to a great extent, as their supply originates in the same housing stock. Thus, they are suspected to be subject to similar phenomena related to housing quality. The contribution to the development of efficient, comparable tools for their analysis constitutes an additional value added of the research.

The structure of the paper is as follows. Section 2 discusses the ways of perceiving and signalling housing quality, together with a review of methods for extracting quality-related signals from apartment listings. Section 3 presents the data and methods used for measuring the consistency of textual and visual quality signals; in this regard, the steps of the Wordscores algorithm and the data used are introduced. Section 4 describes the empirical results and the robustness checks performed. Finally, Sect. 5 concludes and outlines the scope of further analyses.

2 Literature review

2.1 Perception of housing quality

Researchers consider that the quality of individual housing consists of four main components: apartment building quality, neighborhood quality, social and economic surrounding quality, and apartment unit quality (Brkanić, 2017). The first three can be expressed relatively well with objective factors and with information that may be considered hard information in the sense of Liberti and Petersen (2019). In this case, publicly available information may be used, e.g. the building’s age, its construction technology or its location in relation to public utility facilities. However, the issue of apartment unit quality is more problematic. Some researchers find that it is reflected by the apartment’s structural characteristics, which can be presented objectively (Elsinga & Hoekstra, 2005; James, 2007; Tomal, 2022), e.g. the number of rooms, the presence of a balcony, the presence of AC or the type of heating. Others also focus on information that is more challenging to quantify, such as apartment finishing, condition, design or adaptability (Kain & Quigley, 1970; Kim et al., 2005; Le et al., 2016; Renigier-Biłozor et al., 2022). Obtaining it would require access to very precise data for each apartment or personal inspection of apartments by specialized agents. Thus, objective judgements of apartment unit quality need to be substituted with more subjective ones, based on a more general look at the apartments (e.g. at their photos or descriptions). In this regard, quality should be considered soft information (Liberti & Petersen, 2019), and this understanding of quality may be referred to as soft quality. It means that the knowledge of the person providing the assessment, their motivations, and the context of the process are essential factors in understanding the meaning of soft quality assessments.

Official, universal guidance on how housing quality should be measured is rarely provided. Exceptions include the systems applied nationally in France (Qualitel, 2021), Switzerland (OFL, 2015) and Japan (Housing Performance Indication System and Long-life quality certification (Fujihira, 2017)). The more common approach is to prepare general guidelines for a given purpose, and this approach has been taken in Poland by the National Bank of Poland. As an institution responsible for monitoring housing market phenomena that may affect the country’s financial stability, it gathers (from private and public entities) micro-level data on transactions on the secondary residential real estate market. To ensure the compatibility of data, it has released an instruction (NBP, 2022) in which it defines the desired data structure. One part of it specifies how soft housing quality should be measured. The main attributes of an apartment that need to be taken into consideration are windows, floors, installations, doors, walls, and kitchen and bathroom equipment. The high-quality label should be assigned to apartments that are functionally arranged, finished with good-quality materials, and show a low degree of wear. The medium-quality apartments are those whose technical condition allows moving in without significant additional investments. Lastly, the low-quality ones are those that are eligible for renovation.

2.2 Targeting apartment quality in housing listings

Signals concerning quality in its hard sense are most often included in listings in the structured form of a table with the offered apartment’s characteristics. Thus, they are relatively clear and not very prone to manipulation. Soft quality signals sent by sellers may be divided into direct and indirect ones. A direct signal, i.e. a declaration of quality selected from a pre-specified drop-down list, may be interpreted without difficulty. A bigger problem arises with indirect signals, which are sent in the form of the apartment’s photos or its textual description. Signals of this kind are much more detailed but, because of their structure, also harder to interpret. However, they are often the only sources of information on soft quality provided by sellers.

The most precise approach to extracting information on soft quality from indirect signals would be to personally inspect each listing’s photos and description and to assign qualitative or ordinal labels to each of them. To make the assessment as objective as possible, the person providing the evaluation should follow strict rules set by experts in the field and, ideally, be an expert too. However, the cost of obtaining information this way is very high, as it requires both a qualified workforce and a massive amount of time. For this reason, attention should be paid to automatic, machine learning (ML) methods of extracting information on housing quality. They may limit the resources needed and reduce the unwanted bias and inconsistency of subjective human assessments. ML methods may be divided into unsupervised and supervised ones (Bonaccorso, 2018). Unsupervised ML models are used mostly for clustering or dimensionality reduction of big data. No labelled data is required to train models of this kind, which, however, makes it difficult to verify their outcomes precisely. In the case of supervised ML methods, which are designed for classification or regression purposes, it is easier to validate the results, but the process of obtaining training sets with labelled data may be time-consuming, costly and prone to bias (Algaba et al., 2020).

One approach to automatically extracting information on the quality of apartments is to use ML-supported visual quality assessment methods based on the photos of apartments. Gastaldo et al. (2013) argued that mimicking human perception of quality might be more effective than trying to understand the sophisticated human process of assessing quality. Thus, it is better to focus on extracting patterns from photos than on modelling non-linear human decisions. In this case, unsupervised ML methods may be used to effectively cluster similar images. However, they have huge data requirements and a tendency to overfit. Moreover, labelling or grading the clusters in terms of quality would still be subject to a subjective expert decision. Therefore, semi-supervised ML methods, which require a minor share of labelled observations combined with a majority of unlabelled ones, may be seen as a compromise. In this manner, Poursaeed et al. (2018) estimated the luxury level of apartment photos, and the obtained metrics may be considered a representation of soft quality. With crowdsourcing, the researchers labelled the training data by subjectively comparing pairs of photos (representing similar objects) in terms of quality. Finally, one of eight levels of luxury was assigned to each type of room (kitchen, bathroom, bedroom etc.) in each offered apartment. Although computationally demanding and resource-costly, the method has proven effective in reducing the median error of the mass appraisal model.

The other approach entails utilizing the information included in the listings’ textual descriptions. Algaba et al. (2020) pointed out that supplementing conventionally used hard data with soft information extracted from textual content is becoming standard in economics and finance. In this regard, the measure of sentiment (a latent variable indicating the semantic orientation of a text) has often been used as a parameter in economic modelling. Goodwin et al. (2014) studied the language used in real estate listings and its impact on the selling price, time on market and the probability of being sold. They concluded that the performance of housing listings might be enhanced by careful preparation of their textual descriptions. To better understand the true meaning of apartment descriptions, Goodwin et al. (2018) surveyed a nationwide sample of US respondents to construct dictionaries of words with positive, negative or neutral meaning in the reality of the housing market. However, Haag et al. (2000) found that some sellers’ remarks on listing platforms may be classified as hype, because certain strictly positive comments were associated with lower sales prices. Thus, using lexicons of word sentiment based on declarations (obtained through surveys) may lead to false conclusions.

In some textual studies, the housing quality has been targeted indirectly. Shen and Ross (2021) used texts to construct an unsupervised ML algorithm (based on natural language processing) to measure the apartment’s uniqueness, which may be interpreted twofold—as the unobserved quality of an apartment and as the market power of an apartment in relation to the properties in its neighborhood. The usage of uniqueness as a variable in the hedonic model absorbed a significant part of the model’s residual variance. Nowak and Smith (2017), Liu et al. (2020) and Nowak et al. (2021) argued that real estate hedonic models that do not include quality-related explanatory variables might not only be incomplete, but also biased. To mitigate the problem, they transformed listings texts into tokens that reflect the presence or absence of each word in each listing. With the LASSO algorithm, they formed static and dynamic dictionaries of words to proxy for the omitted, quality-reflecting variables in hedonic models of housing prices. They proved that the addition of the selected tokens as explanatory variables in the hedonic models contributes to the improvement of their statistical properties.

As for the researchers who targeted the issue of quality directly, Seo et al. (2020) studied descriptions of housing to form seven quality-related groups of phrases (in the form of bigrams, i.e. pairs of words). These include, among others, physical housing quality, housing size, accessibility and building condition. The phrases were computationally extracted from the US Multiple Listing Services platform and manually assigned to the pre-specified groups to form tokens. In the study, the researchers focused on the presence or absence of specific (only positive) information on quality in listings rather than on calculating or estimating the quality level of apartments. As a result, a statistically significant impact of the presence of quality-related information on the selling prices of housing has been shown.

Even though, to the best of my knowledge, fully supervised ML methods have not yet been used in housing quality-related research, the already developed methods offer a wide range of practical applications. One of those is the Wordscores algorithm (Laver et al., 2003), which was developed for extracting textual sentiment from political statements. The main idea behind it is to construct a dictionary of words with assigned scores, based on already classified texts. The dictionary is then used for automatic reading of virgin, non-classified texts; as a result, each of them is provided with a sentiment score. Contrary to regression-based techniques, Wordscores builds on the bag-of-words approach, which utilizes the content of whole texts without selecting only statistically significant words. Moreover, the method does not pre-specify the sign of the correlation of the words used with the target value. This is important in light of the results obtained by Haag et al. (2000), from which it may be inferred that some words considered strictly positive may be linked with low apartment quality. Furthermore, the method allows the phenomenon represented in the classified texts to be targeted directly. This stands in contrast to the methods used in the quality-related studies, which focused primarily on reducing the errors of hedonic models. Finally, the method has proven to be effective in sentiment analysis, yet simple compared with the already developed and empirically tested approaches to textual and visual analysis. Therefore, I hypothesize its high utility for extracting information on quality for the needs of housing market analyses.
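The reading of virgin texts described above can be illustrated with a minimal Python sketch. It assumes the simplest dictionary-based rule in the spirit of Laver et al. (2003): a text’s score is the frequency-weighted mean of the scores of those of its words that are found in the dictionary. The function name `score_text` and the toy word scores are hypothetical and serve illustration only; the scoring rule applied in this paper may differ in detail.

```python
from collections import Counter

def score_text(tokens, dictionary):
    """Frequency-weighted mean score of the tokens found in the dictionary."""
    counts = Counter(t for t in tokens if t in dictionary)
    total = sum(counts.values())
    if total == 0:
        return None  # the text contains no scored word
    return sum(n * dictionary[w] for w, n in counts.items()) / total

# Hypothetical word scores and a hypothetical virgin text (illustration only)
toy_dictionary = {"prestigious": 0.8, "renovation": -0.6, "balcony": 0.1}
virgin_text = ["prestigious", "balcony", "balcony", "garage"]
score = score_text(virgin_text, toy_dictionary)
```

Words absent from the dictionary (here “garage”) do not affect the score, and no sign is imposed on any word in advance: the scores come entirely from the already classified texts.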

3 Methodology

3.1 Consistency of signals: direct textual vs indirect visual signals of quality

To document the scale of the information gap and to verify the hypothesis that housing sellers’ direct textual signals of quality tend to be overstated, a unique dataset of individual listings has been used. It contains 1441 listings of apartments for rent from the period 09/2020–06/2021 and 1567 listings of secondary market apartments for sale from the period 09/2020–06/2021, sourced from the two most popular online housing listing platforms in Poland: www.otodom.pl and www.gratka.pl.

As a first step, one soft quality class (low, medium or high) has been assigned to each apartment listing, based exclusively on the attached photos. This measure, obtained according to the NBP (2022) guidelines (discussed in Sect. 2.1), is later referred to as the visual signal of apartment quality. For sales listings, all observations that could not be categorized into one of the three quality classes were excluded. The apartments labelled as “for renovation” or “for refreshment” have been automatically regarded as low-quality ones, and in their case the consistency of signals has not been measured. For rental listings, it has been assumed that an offered apartment should be ready for a tenant to move in without prior renovation. Sellers signalled quality directly by selecting the most appropriate quality label from the following lists. For sales listings: “high quality”, “good quality”, “freshly renovated”, “to enter”, “for renovation”, “for refreshment”; for rental listings: “high quality”, “good quality”, “freshly renovated”. Finally, the visual signals of quality have been compared with the sellers’ direct textual signals of quality to verify their consistency.
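The comparison of direct textual and visual signals can be sketched in Python as follows. The sketch assumes that each seller’s label has already been mapped by the analyst onto the low/medium/high scale (that mapping is not reproduced here) and that a missing declaration is coded as `None`. The function name `compare_signals` and the toy pairs are hypothetical.

```python
from collections import Counter

RANK = {"low": 0, "medium": 1, "high": 2}

def compare_signals(pairs):
    """pairs: iterable of (declared_class or None, visual_class)."""
    outcome = Counter()
    for declared, visual in pairs:
        if declared is None:
            outcome["not disclosed"] += 1   # contributes to the information gap
        elif RANK[declared] > RANK[visual]:
            outcome["overstated"] += 1
        elif RANK[declared] < RANK[visual]:
            outcome["understated"] += 1
        else:
            outcome["consistent"] += 1
    total = sum(outcome.values())
    return {k: v / total for k, v in outcome.items()}

# Toy data, illustration only
toy_pairs = [("high", "medium"), ("high", "high"), (None, "low"), ("medium", "medium")]
shares = compare_signals(toy_pairs)
```

The share of “not disclosed” listings corresponds to the scale of incomplete disclosure, while the share of “overstated” listings relates to the overstatement hypothesis.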

3.2 Consistency of signals: indirect textual (descriptive) vs indirect visual signals of quality—Wordscores algorithm

The second part of the analysis has aimed at measuring the consistency of indirect textual and indirect visual quality signals. To extract the indirect textual quality signals from texts (later referred to as the apartment’s descriptive quality), the Wordscores algorithm has been used. The analytical dataset has covered 7872 Polish-language listings of secondary market apartments for sale and rent located in Poznań, Poland, gathered quarterly between 06/2019 and 03/2021 from the two most popular online listing platforms in Poland. As with the observations discussed in Sect. 3.1, each listing has been labelled with the quality class corresponding to the visual quality signal sent by the seller (later referred to as the apartment’s visual quality). Moreover, for 6796 listings it has been indicated whether the listing was posted by a real estate broker (84% of the tagged observations) or by the owner of the apartment (16% of the tagged observations).

In the preliminary stage, listings without photos that would allow a visual quality label to be assigned, listings without textual descriptions, duplicates and, among the remaining ones, the 5% of shortest observations have been excluded to ensure that all the descriptions in the analytical sample are unique and valid. Finally, the analytical sample has been divided into a training set and a test set. In the baseline scenario, the test set has been formed by the observations from the last quarter. The initial structure of the analytical samples is presented in Table 1.

Table 1 The initial structure of the analytical sample: baseline scenario.
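The preliminary filtering and the baseline split can be sketched in Python as below. The sketch rests on several assumptions that are not stated in the text: listings are represented as dictionaries with the hypothetical keys `description`, `visual_quality` and `quarter`, duplicates are detected as identical descriptions, and description length is measured in characters.

```python
def prepare_sample(listings, last_quarter, drop_share=0.05):
    # Keep listings that have both a description and a visual quality label
    valid = [x for x in listings if x["description"] and x["visual_quality"]]
    # Drop exact duplicates of a description (assumed duplicate criterion)
    seen, unique = set(), []
    for x in valid:
        if x["description"] not in seen:
            seen.add(x["description"])
            unique.append(x)
    # Drop the shortest 5% of the remaining descriptions (length in characters)
    unique.sort(key=lambda x: len(x["description"]))
    kept = unique[int(len(unique) * drop_share):]
    # Baseline scenario: the last quarter forms the test set
    train = [x for x in kept if x["quarter"] != last_quarter]
    test = [x for x in kept if x["quarter"] == last_quarter]
    return train, test

# Toy data, illustration only
toy = [{"description": "a" * (10 + i), "visual_quality": "medium",
        "quarter": "2021Q1" if i >= 15 else "2020Q4"} for i in range(20)]
toy.append(dict(toy[3]))  # an exact duplicate
toy.append({"description": "", "visual_quality": "high", "quarter": "2021Q1"})
toy.append({"description": "b" * 30, "visual_quality": None, "quarter": "2021Q1"})
train, test = prepare_sample(toy, last_quarter="2021Q1")
```

Splitting by time rather than at random means that the dictionary is always tested on listings posted later than those it was built from.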

3.2.1 Building dictionaries

The steps of the Wordscores algorithm have been the same for rental and sales listings, although the calculations have been conducted independently. For clarity, they are illustrated with a sample listing’s textual description:

PL: “Mam do zaoferowania na wynajem 2-pokojowy apartament z balkonem o powierzchni 40 m2 zlokalizowany w prestiżowej lokalizacji.” (ENG: “I am offering for rent a 2-room apartment with a balcony, of [the area] 40 m2, located in a prestigious location.”).

  1. Numbers, special characters, and words no longer than four characters have been removed from each listing’s description. The exclusion of the shortest words has aimed at reducing the noise in the analysis, as in Polish this group consists mainly of pronouns and conjunctions, so the words excluded this way should carry no significant quality signal. Moreover, the Polish listing platforms impose high word limits, thus the use of abbreviations is not common.

  2. Using the morphological dictionary of the Polish language (Miłkowski, 2016), all words in all listings (training and test sets) have been replaced with their lemmas. The lemmatization process converts an inflected grammatical form of a word to its basic form (lemma). In this manner, the words “[he] renovates”, “[she] renovated” or “[they are] renovating” would be replaced with their infinitive form “[to] renovate”. All the words that have not been found in the morphological dictionary have been excluded from further analysis. The sample listing’s description then takes the following form:

PL: zaoferować / wynajem / pokojowy / apartament / balkon / powierzchnia / zlokalizować / prestiżowy / lokalizacja (ENG: to offer / rent / room / apartment / balcony / area / locate / prestigious / location).

  3. The occurrences of each lemma in descriptions of apartments of a given visual quality have been counted:

    $$n_{j} = h_{j} + m_{j} + l_{j} ,$$
    (1)

where \(n_{j}\) is the number of occurrences of the \(j\)-th lemma in all training set observations, \(h_{j}\), \(m_{j}\) and \(l_{j}\) refer to the number of occurrences of the \(j\)-th lemma in the descriptions of apartments of respectively high-, medium- and low-visual-quality.

  4. Each occurrence of a lemma in the description of a low-visual-quality apartment has been treated as a negative sentiment of the lemma. In contrast, each occurrence of a lemma in the description of a high-visual-quality apartment has been treated as a positive sentiment of the lemma. The occurrences in the medium-visual-quality apartments’ descriptions have been treated as neutral. Then, the \(WORD\_SIGNAL\) score, i.e. the quality sentiment of a single (\(j\)-th) lemma, may be calculated as:

    $$WORD\_SIGNAL_{j} = \frac{{h_{j} *\left( { + 1} \right) + m_{j} *0 + l_{j} *\left( { - 1} \right)}}{{n_{j} }}$$
    (2)
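The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the small `LEMMAS` map is a hypothetical stand-in for the full morphological dictionary, and the quality labels in `toy_training` (including the “high” label given to the sample description) are invented for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini lemma map standing in for a full morphological dictionary
LEMMAS = {
    "zaoferowania": "zaoferować", "wynajem": "wynajem", "pokojowy": "pokojowy",
    "apartament": "apartament", "balkonem": "balkon", "powierzchni": "powierzchnia",
    "zlokalizowany": "zlokalizować", "prestiżowej": "prestiżowy",
    "lokalizacji": "lokalizacja",
}

SAMPLE = ("Mam do zaoferowania na wynajem 2-pokojowy apartament z balkonem "
          "o powierzchni 40 m2 zlokalizowany w prestiżowej lokalizacji.")

def to_lemmas(text, lemma_map=LEMMAS):
    # Step 1: keep runs of letters only (drops digits and special characters),
    # then drop words of four characters or fewer
    words = [w for w in re.findall(r"[^\W\d_]+", text.lower()) if len(w) > 4]
    # Step 2: replace each word with its lemma; unknown words are discarded
    return [lemma_map[w] for w in words if w in lemma_map]

def build_dictionary(training):
    """training: iterable of (list_of_lemmas, visual_quality) pairs."""
    counts = defaultdict(Counter)
    for lemmas, quality in training:           # Step 3: count occurrences
        for lemma in lemmas:
            counts[lemma][quality] += 1
    dictionary = {}
    for lemma, c in counts.items():
        h, m, l = c["high"], c["medium"], c["low"]
        n = h + m + l                          # Eq. (1)
        dictionary[lemma] = (h - l) / n        # Step 4, Eq. (2)
    return dictionary

sample_lemmas = to_lemmas(SAMPLE)
# Toy training set with invented labels, illustration only
toy_training = [
    (sample_lemmas, "high"),
    (["balkon", "apartament"], "low"),
    (["balkon"], "medium"),
]
word_signal = build_dictionary(toy_training)
```

In this toy example “prestiżowy” occurs only in a high-quality description and scores 1, whereas “balkon” occurs once in each class and scores 0.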

As a result, a list of lemma-\(WORD\_SIGNAL\) pairs has been obtained, which may be referred to as a dictionary. The base version of the dictionary, DICT_0%, has covered all the lemmas that appeared at least once in the training set listings’ descriptions. Later, the base version has been restricted to keep only the lemmas that appeared at least:

\(D*3127\) times for the dictionary of the rental listings,

\(D*2993\) times for the dictionary of the sales listings.

Finally, eight variants of the dictionary, for:

$$D = \left\{ {0\% ; 0.25\% ; 0.5\% ; 1\% ; 2.5\% ; 5\% ; 10\% ; 20\% } \right\},$$

labelled as DICT_D, have been obtained. The higher the \(D\)-value, the more restrictive the dictionary, so that it has included only the more frequently used lemmas. The neutral \(WORD\_SIGNAL\) score for the rental listings’ dictionary (for a lemma that would theoretically appear exactly once in every listing, regardless of the listed apartment quality) equals:

$$WORD\_SIGNAL_{neutral, rental} = \frac{{420*\left( { - 1} \right) + 1950*0 + 757*\left( { + 1} \right)}}{3127} = 0.108$$
(3)

and is dependent on the training set’s quality structure. It means that all lemmas with a \(WORD\_SIGNAL\) score close to 0.108 may be treated as neutral, and the more the \(WORD\_SIGNAL\) score diverges from this value, the stronger the positive or negative quality signal represented by the lemma. Table 2 presents some selected lemmas from the DICT_0% dictionary for rental listings.
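The dictionary restriction and the neutral score can be sketched in a few lines of Python. The helper names and the toy dictionary are hypothetical; only the counts 420, 1950 and 757 and the total of 3,127 rental listings come from the text above.

```python
def restrict_dictionary(signals, occurrences, n_listings, d):
    """Keep only lemmas that occur at least d * n_listings times (DICT_D)."""
    min_count = d * n_listings
    return {lemma: score for lemma, score in signals.items()
            if occurrences[lemma] >= min_count}

def neutral_signal(n_low, n_medium, n_high):
    """WORD_SIGNAL of a hypothetical lemma used once in every listing (Eq. 3)."""
    return (n_high - n_low) / (n_low + n_medium + n_high)

# The eight D-values used in the study, expressed as fractions.
D_VALUES = [0.0, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.10, 0.20]

# Rental training-set structure reported in Eq. 3.
neutral_rental = neutral_signal(420, 1950, 757)
print(round(neutral_rental, 3))  # 0.108

# Made-up dictionary: with D = 1% of 3,127 listings the minimum count is 31.27.
toy_signals = {"balkon": 0.2, "antresola": 0.9}
toy_occurrences = {"balkon": 900, "antresola": 5}
dict_1pct = restrict_dictionary(toy_signals, toy_occurrences, 3127, 0.01)
print(dict_1pct)  # {'balkon': 0.2}
```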

Table 2 Some selected lemmas included in the rental market DICT_0% dictionary.

3.2.2 Calculation of wordscores

The listing length has been defined as the absolute number of lemmas (not necessarily unique) that appeared in the listing and have been included in the DICT_1% dictionary. Although all of the listings from the initial training sets have contributed to the dictionaries, it has been assumed that using the shortest listings for the models’ calibration and testing may lead to false conclusions. Thus, all observations shorter than the average listing length minus one standard deviation have been excluded. For the rental observations the cut-off point has been set at 53 lemmas, and for the sales observations at 81 lemmas. After this exclusion, the training set for the rental market analysis has included 2717 observations (12.0% of low, 62.2% of medium and 25.8% of high visual quality), while the training set for the sales market analysis has included 2633 observations (25.3% of low, 44.2% of medium and 30.5% of high visual quality). Figures 1 and 2 present histograms of listing lengths for the training sets.
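The length filter can be sketched as follows. The function names and the toy data are illustrative assumptions, and the use of the sample (rather than population) standard deviation is also an assumption, since the text does not specify which one has been used.

```python
import statistics

def listing_length(lemmas, dict_1pct):
    """Number of lemmas (with repetitions) that are present in DICT_1%."""
    return sum(1 for lemma in lemmas if lemma in dict_1pct)

def filter_short(listings, dict_1pct):
    """Drop listings shorter than mean length minus one standard deviation."""
    lengths = [listing_length(lemmas, dict_1pct) for lemmas in listings]
    # Assumption: sample standard deviation (statistics.stdev).
    cutoff = statistics.mean(lengths) - statistics.stdev(lengths)
    kept = [lemmas for lemmas, n in zip(listings, lengths) if n >= cutoff]
    return kept, cutoff

# Made-up example: "x" is not in the dictionary, so it does not count.
toy_dict = {"a"}
toy_listings = [["a", "x"], ["a"] * 5, ["a"] * 5, ["a"] * 5, ["a"] * 9]
kept, cutoff = filter_short(toy_listings, toy_dict)
print(len(kept), round(cutoff, 2))  # 4 2.17
```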

Fig. 1

Source: own calculations

Histogram of the length of the rental listing (measured as a number of lemmas included).

Fig. 2

Source: own calculations

Histogram of the length of the sales listing (measured as a number of lemmas included).

As a next step, the \(WORD\_SIGNAL\) score from a dictionary has been assigned to each lemma in each observation from the training set. Based on the scores of all lemmas included in the corresponding version of the dictionary, the Wordscore (\(DESC\_SIGNAL\)) has been calculated to represent the strength of the listing’s descriptive quality signal. Six variants, each a different measure of the distribution of \(WORD\_SIGNAL\) scores within an individual listing, have been chosen to establish which of them best signals the apartment’s quality. Thus, each \(DESC\_SIGNAL\) score has been calculated as the 1st decile (labelled as DE1), 1st quartile (Q1), median (MED), average (AVG), 3rd quartile (Q3) and 9th decile (DE9) of all \(WORD\_SIGNAL\) scores of lemmas included in the observation. In total, the \(DESC\_SIGNAL\) score for each listing from the training set has been calculated in 48 variants, one for each combination of the 8 dictionary variants and 6 distributional variants. They reflect different approaches to measuring the overall quality signalled via whole descriptions of apartments. The calculation scheme has been presented in Fig. 3.
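The six distributional variants for a single listing and a single dictionary can be sketched with the Python standard library. The function name `desc_signal` and the toy dictionary are hypothetical, and the interpolation rule for deciles and quartiles (`method="inclusive"`) is an assumption, as the text does not state which quantile definition has been applied.

```python
import statistics

def desc_signal(lemmas, dictionary):
    """Return the six DESC_SIGNAL variants for one listing and one dictionary."""
    # Only lemmas present in the chosen dictionary variant contribute.
    scores = [dictionary[lemma] for lemma in lemmas if lemma in dictionary]
    deciles = statistics.quantiles(scores, n=10, method="inclusive")
    quartiles = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "DE1": deciles[0],
        "Q1": quartiles[0],
        "MED": statistics.median(scores),
        "AVG": statistics.fmean(scores),
        "Q3": quartiles[2],
        "DE9": deciles[8],
    }

# Made-up dictionary with scores 0.0, 0.1, ..., 1.0.
toy_dictionary = {f"w{i}": i / 10 for i in range(11)}
toy_listing = [f"w{i}" for i in range(11)] + ["unknown"]
variants = desc_signal(toy_listing, toy_dictionary)
```

Repeating the call for each of the eight dictionary variants would give the 48 scores per listing described above.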

Fig. 3

Source: own calculations

The scheme for calculating the overall DESC_SIGNAL quality scores, based on the sample listing’s textual description.

All observations from the training set for which \(DESC\_SIGNAL\_TR_{i, d, v}\) has been calculated according to the rules of a given variant are jointly referred to as a model, labelled as \(DESC\_SIGNAL\_TR\_MODEL_{d,v}\). \(TR\) refers to the training set, \(i\) indexes the training set observations, \(d\) refers to the variant of the dictionary and \(v\) refers to the distributional variant of the calculation.

3.2.3 Models’ calibration phase

The models’ calibration (based on the training set observations) has aimed at obtaining the threshold values \(\gamma\) and \(\delta\) needed to change the scale of the obtained measures of descriptive quality from continuous to ordinal (low, medium and high quality). The threshold values would be later used in the model’s testing and validation phase. The phase has proceeded as follows:

$$\hat{q}_{i,d,v} = \begin{cases} \text{high} & \text{if } DESC\_SIGNAL\_TR_{i,d,v} > \gamma_{d,v} \\ \text{low} & \text{if } DESC\_SIGNAL\_TR_{i,d,v} < \delta_{d,v} \\ \text{medium} & \text{if } \delta_{d,v} \le DESC\_SIGNAL\_TR_{i,d,v} \le \gamma_{d,v} \end{cases},$$
(4)
$$A_{i,d,v} = \begin{cases} 1 & \text{if } \hat{q}_{i,d,v} = q_{i} \\ 0 & \text{if } \hat{q}_{i,d,v} \ne q_{i} \end{cases},$$
(5)

where \(q_{i}\) is the visual quality of the listed apartment, \(\hat{q}_{i, d,v}\) is the model’s measure of the apartment’s descriptive quality in the ordinal form and \(A_{i,d,v}\) is the test parameter checking whether the descriptive quality matches the visual quality of the apartment. The following optimization task has been set for each \(DESC\_SIGNAL\_TR\_MODEL_{d,v}\) in order to retrieve the optimal values of parameters \(\gamma_{d, v}\) and \(\delta_{d, v}\), for which the model-estimated signals of descriptive quality would match the visual quality signals to the highest degree:

$$\max_{\gamma_{d,v},\, \delta_{d,v}} \sum_{i=1}^{I} A_{i,d,v}.$$
(6)

The optimization procedure has been conducted with the use of an evolutionary algorithm (Powell & Batt, 2008). Figure 4 presents the visualisation of the calibration phase procedure.

Fig. 4

Source: own elaboration

Visualisation of the training set’s calibration procedure.
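The calibration logic of Eqs. 4 and 5 can be sketched as below. Note that this sketch replaces the evolutionary algorithm used in the study with a plain exhaustive grid search, purely to keep the example short; the function names, the grid and the toy data are hypothetical.

```python
def classify(score, delta, gamma):
    """Eq. 4: map a continuous DESC_SIGNAL score to an ordinal label."""
    if score > gamma:
        return "high"
    if score < delta:
        return "low"
    return "medium"

def accuracy(scores, labels, delta, gamma):
    """Share of observations with A = 1 (Eq. 5)."""
    hits = sum(classify(s, delta, gamma) == q for s, q in zip(scores, labels))
    return hits / len(scores)

def calibrate(scores, labels, grid):
    """Grid search over delta <= gamma (stand-in for the evolutionary algorithm)."""
    candidates = ((d, g) for d in grid for g in grid if d <= g)
    return max(candidates, key=lambda dg: accuracy(scores, labels, *dg))

# Made-up training scores and visual quality labels.
toy_scores = [-0.3, -0.1, 0.05, 0.1, 0.15, 0.3, 0.4]
toy_labels = ["low", "low", "medium", "medium", "medium", "high", "high"]
toy_grid = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
delta_hat, gamma_hat = calibrate(toy_scores, toy_labels, toy_grid)
print(delta_hat, gamma_hat)  # 0.0 0.2
```

A grid search is only practical for a coarse grid; a stochastic optimizer such as the one used in the study avoids fixing the candidate thresholds in advance.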

3.2.4 Models’ testing phase

First, the test set observations have been subject to the procedure described in Sect. 3.2.2 for the training sets. The test set for the rental market analysis has included 767 observations (14.3% of low, 56.2% of medium and 29.5% of high visual quality), while the test set for the sales market analysis has included 719 observations (27.4% of low, 33.1% of medium and 39.5% of high visual quality). The overall \(DESC\_SIGNAL\_TE_{i,d,v}\) of each test set (\(TE\)) observation has been calculated in 48 model variants, which are referred to as \(DESC\_SIGNAL\_TE\_MODEL_{d,v}\). Then, each listing’s estimated descriptive quality signal measure has been converted from a continuous into an ordinal scale based on the threshold parameters \(\hat{\gamma }_{d, v}\) and \(\hat{\delta }_{d, v}\) obtained for the corresponding training set model variant (as presented in Fig. 5). Finally, the visual quality signals and the estimated descriptive quality signals have been compared using contingency matrices.

Fig. 5

Source: own elaboration

Visualization of the conversion of the test set’s DESC_SIGNAL quality measures from continuous into ordinal scale.

The testing phase has aimed at narrowing the search for the best model variants. As a result, the combinations of dictionary variant and distributional variant that have shown the highest consistency of signals have been selected. The model variants with the smallest number of cardinal inconsistencies have been of particular value. A cardinal inconsistency has been defined as assigning the descriptive quality label “high” to a low-visual-quality apartment or the label “low” to a high-visual-quality apartment.
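A contingency matrix and the count of cardinal inconsistencies can be sketched as follows; the function names and the six toy observations are made up for illustration.

```python
from collections import Counter

LEVELS = ("low", "medium", "high")

def contingency(visual, descriptive):
    """Rows: visual quality; columns: estimated descriptive quality."""
    pairs = Counter(zip(visual, descriptive))
    return {v: {d: pairs[(v, d)] for d in LEVELS} for v in LEVELS}

def cardinal_inconsistencies(matrix):
    """'high' label on a low-visual-quality apartment, or 'low' on a high one."""
    return matrix["low"]["high"] + matrix["high"]["low"]

def consistency(matrix):
    """Share of observations on the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    return sum(matrix[level][level] for level in LEVELS) / total

toy_visual = ["low", "low", "medium", "high", "high", "high"]
toy_descriptive = ["low", "medium", "medium", "high", "low", "high"]
matrix = contingency(toy_visual, toy_descriptive)
print(cardinal_inconsistencies(matrix))  # 1
```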

3.2.5 Models’ validation phase

In the baseline scenario, the last available quarter has been selected as the test set, while the observations from the remaining five quarters have constituted the training set. To provide robust results, the whole algorithm has been rerun six times, each time selecting a different quarter as the test set and the remaining five quarters as the training set (sixfold validation). Then, contingency matrices summarizing the results of all six models have been constructed to select the single model variant for which the quality signals have been most consistent.
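The rotation of quarters can be sketched as a simple generator. The quarter labels below are placeholders, not the actual periods covered by the dataset.

```python
def quarter_folds(quarters):
    """Yield (training quarters, test quarter) for each leave-one-quarter-out run."""
    for test_quarter in quarters:
        training = [q for q in quarters if q != test_quarter]
        yield training, test_quarter

# Placeholder labels for six consecutive quarters.
quarters = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
folds = list(quarter_folds(quarters))

# The last fold corresponds to the baseline scenario (last quarter as test set).
print(folds[-1])  # (['Q1', 'Q2', 'Q3', 'Q4', 'Q5'], 'Q6')
```

In each fold the dictionary, the thresholds and the test classification would be recomputed from scratch, so that no test-quarter listing contributes to its own dictionary.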

It may be suspected that signals’ inconsistency would be higher for listings with \(DESC\_SIGNAL\) scores close to the threshold values \(\hat{\gamma }_{d, v}\) and \(\hat{\delta }_{d, v}\), which may be considered less clear signals, than for listings with scores more distant from them. To find out whether the consistency of visual and descriptive quality signals differs depending on the descriptive signal’s strength, ranges of inconclusiveness have been introduced. The parameter \(\alpha_{p,d,v}\) (\(\alpha > 0\)) restricts the assignment of the ordinal descriptive quality measure to the observations whose scores lie far enough from the thresholds. Then:

$$\hat{q}_{i,d,v} = \begin{cases} \text{high} & \text{if } DESC\_SIGNAL\_TE_{i,d,v} > \hat{\gamma}_{d,v} + \alpha_{p,d,v} \\ \text{no decision} & \text{if } \hat{\gamma}_{d,v} - \alpha_{p,d,v} \le DESC\_SIGNAL\_TE_{i,d,v} \le \hat{\gamma}_{d,v} + \alpha_{p,d,v} \\ \text{medium} & \text{if } \hat{\delta}_{d,v} + \alpha_{p,d,v} < DESC\_SIGNAL\_TE_{i,d,v} < \hat{\gamma}_{d,v} - \alpha_{p,d,v} \\ \text{no decision} & \text{if } \hat{\delta}_{d,v} - \alpha_{p,d,v} \le DESC\_SIGNAL\_TE_{i,d,v} \le \hat{\delta}_{d,v} + \alpha_{p,d,v} \\ \text{low} & \text{if } DESC\_SIGNAL\_TE_{i,d,v} < \hat{\delta}_{d,v} - \alpha_{p,d,v} \end{cases}$$
(7)
$$p = 1 - \frac{\sum_{i = 1}^{I} ND_{i,d,v}}{I},$$
(8)
$$ND_{i,d,v} = \begin{cases} 0 & \text{if } \hat{q}_{i,d,v} \in \left\{ \text{high}, \text{medium}, \text{low} \right\} \\ 1 & \text{if } \hat{q}_{i,d,v} = \text{no decision} \end{cases},$$
(9)

where \(p\) refers to the share of all observations for which the visual and descriptive quality signals have been compared (for the remaining observations the model has returned “no decision” instead of a descriptive quality measure and the consistency of signals has not been considered), \(I\) is the number of observations and \(ND_{i,d,v}\) is the test parameter checking whether the descriptive quality label has been assigned or not. The parameter \(\alpha_{p,d,v}\) has been estimated using the evolutionary algorithm for \(p \in \left\{ {0.9 ;0.8 ;0.7 ;0.6 ;0.5 ;0.4 ;0.3 ;0.2 ;0.1} \right\}\), while for \(p = 1\) the ordinal descriptive quality labels have been assigned to all observations (the situation is then identical to the standard scenario). The scheme of the restricting procedure has been presented in Fig. 6.

Fig. 6

Source: own elaboration

Visualization of the models’ restricting procedure after introduction of ranges of inconclusiveness.
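The classification with ranges of inconclusiveness can be sketched for a given \(\alpha\) as follows. The sketch only evaluates the share \(p\) of labelled observations for a fixed \(\alpha\); the search for the \(\alpha\) that yields a target \(p\) (done with the evolutionary algorithm in the study) is not reproduced. Function names, thresholds and scores are made up.

```python
def classify_with_range(score, delta, gamma, alpha):
    """Eq. 7: ordinal label, or "no decision" within alpha of a threshold."""
    if score > gamma + alpha:
        return "high"
    if score >= gamma - alpha:
        return "no decision"
    if score > delta + alpha:
        return "medium"
    if score >= delta - alpha:
        return "no decision"
    return "low"

def share_labelled(labels):
    """p: share of observations that received an ordinal label."""
    undecided = sum(label == "no decision" for label in labels)
    return 1 - undecided / len(labels)

# Made-up thresholds, alpha and test scores.
delta_hat, gamma_hat, alpha = 0.0, 0.2, 0.05
toy_scores = [-0.2, -0.02, 0.1, 0.18, 0.4]
toy_labels = [classify_with_range(s, delta_hat, gamma_hat, alpha) for s in toy_scores]
print(toy_labels)
# ['low', 'no decision', 'medium', 'no decision', 'high']
print(share_labelled(toy_labels))  # 0.6
```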

3.2.6 Models for agents: quality signals sent by brokers vs quality signals sent by owners

The last part of the study has aimed to answer whether the consistency of signals sent by real estate brokers differs from the consistency of signals sent by owners of apartments. Knowing the best dictionary (\(d\)) and distributional (\(v\)) variants of the models for sales and rental listings, one can extend Eq. 5 to calculate the number of consistent signals for specific apartment quality groups (\(s\)) and for listings posted by specific agents (\(b\)). Then:

(10)

where \(B = \left\{ {broker ; private ;not \; indicated} \right\}\) and \(S = \left\{ {low ;medium ;high} \right\}\).

Subsequently, the number of listings for which the descriptive and visual quality have proven to be consistent may be defined as:

(11)

and \(T_{d,v,s,b}\) may be defined as the number of all listings of quality standard \(s\), posted by agent \(b\) and calculated according to the rules of dictionary variant \(d\) and distributional variant \(v\) of the model. In this regard, the consistency of the model for each agent, \(G\), may be calculated as:

(12)

However, the consistency of the model for agents calculated this way would depend on the quality structure of listings posted by each type of agent. To account for these differences and to ensure the comparability of the results, the \(H\)-value has been introduced. It represents the consistency of the model for agents weighted by the quality structure of the full dataset (used in the validation phase). Its final form may be presented as:

(13)
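The difference between the raw and the quality-weighted consistency can be illustrated with a short sketch. It is one possible reading of the verbal definitions above, not a reproduction of Eqs. 10 to 13: it assumes that \(G\) is the number of consistent listings divided by the number of all listings of a given agent, and that \(H\) is the average of per-quality consistency rates weighted by the quality shares of the full dataset. All names and numbers are hypothetical.

```python
def g_value(consistent, total):
    """Assumed form of G: consistent listings over all listings of one agent."""
    return sum(consistent.values()) / sum(total.values())

def h_value(consistent, total, weights):
    """Assumed form of H: per-quality consistency weighted by full-dataset shares."""
    return sum(weights[s] * consistent[s] / total[s] for s in weights)

# Made-up counts for one agent type, by visual quality group s.
consistent_broker = {"low": 10, "medium": 60, "high": 30}
total_broker = {"low": 40, "medium": 80, "high": 40}
# Made-up quality structure of the full dataset (shares sum to 1).
full_dataset_shares = {"low": 0.2, "medium": 0.5, "high": 0.3}

g = g_value(consistent_broker, total_broker)
h = h_value(consistent_broker, total_broker, full_dataset_shares)
print(round(g, 3), round(h, 3))  # 0.625 0.65
```

Because every agent type is weighted with the same quality shares, differences in \(H\) between agents cannot be driven by one group simply listing more apartments of an easier-to-classify quality.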

4 Findings

4.1 Consistency of signals: direct textual vs indirect visual signals of quality

In Tables 3 and 4 the direct textual signals of apartments’ quality have been confronted with the visual quality signals. In rental listings, the textually signalled high quality has rarely corresponded with high visual quality. The same has applied to the textual signals of good quality, which have corresponded mostly with medium or low visual quality. This may mean that the direct textual signals are overstated, but it may equally well result from sellers adopting a different (much more optimistic) scale than that of NBP (2022). A more serious problem is that the Polish listing platforms offer no option to signal low quality directly in text, so low-quality apartments are mostly left unlabelled or mislabelled as “good quality” ones.

Table 3 Comparison of direct textual and indirect visual signals of quality of apartments listed for rent.
Table 4 Comparison of direct textual and indirect visual signals of quality of apartments listed for sale.

In sales listings, the textually signalled high quality has most often corresponded with the visual signal of high quality, and textually signalled good quality has corresponded with low or medium visual quality. The number of apartments declared textually as “freshly renovated” has been negligible. Finally, the textual quality signal “to enter” should intuitively refer only to medium- and high-visual-quality apartments; however, it has also been used for some low-quality ones. Nevertheless, the consistency of signals has been much higher than for the apartments listed for rent.

Table 5 compares the quality structure of listings for which a direct textual quality signal has been sent with the quality structure of observations whose quality has not been signalled textually. The information gap in the direct textual quality signal has been substantial, covering more than half of all observations. Although for the rental listings the quality structure of the observations with signalled and non-signalled quality has been found to be similar, for the sales listings the difference has been considerable. This may introduce bias to the frequently performed analyses that are based only on the observations with directly, textually signalled quality.

Table 5 Comparison of the quality structure of listings, for which the direct textual quality signal has been sent with the quality structure of observations with non-signalled quality.

4.2 Consistency of signals: indirect textual (descriptive) vs indirect visual signals of quality—Wordscores algorithm

The testing phase results presented in Table 6 have pointed to the highest consistency of signals in the Wordscores models based on the less restricted variants of the dictionary (\(D \le 2.5\%\)). For sales listings, the most consistent models have been based on the average, 3rd quartile or 9th decile of \(WORD\_SIGNAL\) scores of words included in listings, and for rental listings on the median, average or 3rd quartile of \(WORD\_SIGNAL\) scores.

Table 6 The consistency of indirect textual (descriptive) and visual quality signals based on the Wordscores models—testing phase results.

Table 7 presents the results of the validation phase for the models of the highest consistency obtained in the testing phase. For sales listings, the highest consistency has equalled 64% for the \(DESC\_SIGNAL\_TE\_MODEL_{0\% ,DE9}\), which means that visual quality signals are most consistent with the textual signals conveyed by the 10% of words with the highest quality-sentiment score. However, the model based on the average sentiment of words (\(DESC\_SIGNAL\_TE\_MODEL_{0\% ,AVG}\)) has given similar results. As for the dictionary variants, all analysed variants have given comparable results, but the highest consistency of signals has been obtained for the unrestricted version of the dictionary (DICT_0%). For the rental analysis, the AVG distributional variant has been dominant and the highest consistency, amounting to 71%, has been achieved for \(DESC\_SIGNAL\_TE\_MODEL_{0.25\% ,AVG}\).

Table 7 The consistency of indirect textual (descriptive) and visual quality signals based on the Wordscores models—validation phase results.

Tables 8 and 9 present the results of the analysis with the introduced ranges of inconclusiveness. Only the variants for which the highest consistency has been detected in the first stage of the validation phase have been presented. In Fig. 7 the results have been presented on a graph.

Table 8 Contingency matrices of the Wordscores models that have shown the highest consistency of signals—results of the validation phase with ranges of inconclusiveness.
Table 9 The consistency of indirect textual (descriptive) and indirect visual quality signals based on the Wordscores models that have shown the highest consistency of signals—results of the validation phase with ranges of inconclusiveness.
Fig. 7

Source: own calculations

The consistency of indirect textual (descriptive) and indirect visual quality signals based on the Wordscores models that have shown the highest consistency of signals—results of the validation phase with ranges of inconclusiveness.

For rental listings, the consistency of indirect textual and visual quality signals for medium- and high-visual-quality apartments has been high and has increased as the share \(p\) of signals compared has been narrowed to the clearest textual signals. However, for low-visual-quality apartments the consistency has been low. This may be rooted in the specificity of rental listings, in which sellers may have incentives to exaggerate and to pretend that the quality is higher than indicated by the attached photos in order to achieve the highest possible price. This is in line with the results presented in Sect. 4.1 concerning sellers’ over-optimism. Moreover, the low quality of apartments for rent often manifests itself in a high degree of wear and tear or in an obsolete design, which are harder to reflect in texts. Finally, the rental model in the standard scenario has detected only 4 cardinal inconsistencies (out of 3482 signals compared, i.e. around 0.1%). This number has dropped even further after introducing the ranges of inconclusiveness.

For sales listings, the consistency of signals for medium-quality apartments has been high, whereas for high-quality ones it has been very high. The consistency of signals for low-quality apartments has been relatively low, but higher than for the rental listings. This may be rooted in the fact that in the sales market some customers may see low quality as an advantage. They presumably do not want to pay extra for an apartment of medium quality, because they intend to renovate it prior to moving in either way. Therefore, sellers do not refrain from describing the quality as low in the textual descriptions or from signalling the low quality using specific vocabulary (e.g. “apartment with potential” used as a synonym for “apartment for renovation”). Finally, the two variants of the sales model have achieved similar results in all but one category. The \(DESC\_SIGNAL\_TE\_MODEL_{0\% ,DE9}\) has detected 45 cardinal inconsistencies out of 3712 signals compared (1.2%), whereas the \(DESC\_SIGNAL\_TE\_MODEL_{0\% ,AVG}\) has detected only 30 out of 3718 (0.8%), which is considerably fewer. Therefore, although the \(DESC\_SIGNAL\_TE\_MODEL_{0\% ,AVG}\) variant has shown slightly lower consistency, it has been considered the best variant of the sales model.

Overall, after introducing the ranges of inconclusiveness, the models have shown consistency of signals of up to 83% for the rental listings and 90% for the sales ones. The increase in consistency that follows the decrease in the share \(p\) of signals compared is particularly important: it confirms not only the high consistency of indirect textual and visual signals, but also that the consistency increases with the clarity of the signals.

4.3 Consistency of signals: models for agents

Despite the relatively small number of listings posted by owners of the apartments (only 764 rental and 325 sales listings) compared with the number of listings posted by brokers, the results of the models for agents reveal interesting patterns. For rental listings, the consistency of quality signals (in the form of the quality-corrected \(H\)-value) has proven to be slightly higher for private listings (72%) than for the listings posted by brokers (70%). This may be rooted in the fact that private landlords are more interested in finding a tenant who would not only generate financial flows, but also take care of the apartment and stay in it for longer. Thus, private landlords would have more incentives to represent reality as it is and to indicate a consistent quality standard through both descriptive and visual signals. Brokers, in contrast, want to conclude the transaction quickly, as their commission does not depend on the time needed to find a suitable tenant. For this reason, they may use advertising tricks, e.g. describing the apartment as being of higher quality than it really is, which results in lower consistency of descriptive and visual signals.

As for sales listings, the results have been the opposite: higher consistency of signals has been found for listings posted by brokers (63%) than for the listings posted by the owners of the apartments (60%). In this case the brokerage agency might be more interested in providing reliable descriptions than in tricking the potential customer. In the era of information spreading rapidly on the Internet, brokers would like not only to find a buyer quickly but also to gain a good reputation. In contrast, private sellers of apartments do not need to take care of their reputation as sellers, because they conclude only a single transaction. Thus, they may be more inclined to send conflicting quality signals, hoping that this will allow them to achieve a higher sales price.

5 Conclusions

In the research conducted on online sales and rental listings of apartments, the information disclosure strategies of sellers have been inspected and the compliance of three types of soft apartment quality signals has been verified: direct textual signals, indirect textual (descriptive) signals and indirect visual signals. In the first part of the empirical study, it has been documented that there exists some discrepancy between the direct textual signals of housing quality provided by apartments’ sellers and the visual quality signals. It has been found that the direct textual signals sent in rental listings have diverged from the visual signals, which could have been caused by the adoption of a different scale, much more optimistic than the NBP (2022) scale used as a reference point. Thus, in the case of apartments for rent, direct textual quality signals may be considered overstated by sellers. The direct textual signals of quality for the sales listings have been mainly in line with the visual quality assessments, contrary to the initial assumptions. However, for sales listings, it has been found that there exists a considerable difference between the quality structure of apartments for which sellers declared quality directly and that of the ones left without a declaration. All the mentioned phenomena, combined with the very high percentage of observations without directly, textually signalled quality (for both rental and sales listings), contribute to the creation of the information gap. Its presence may dampen the efficiency of the market search processes and introduce bias to the increasingly popular analyses based on sales and rental market listings data. Therefore, to obtain results that are as representative and unbiased as possible, one needs both to correct the over-optimistic direct textual quality signals and to address the issue of quality of the observations for which no direct textual signal has been provided.

The second part of the study has focused on checking whether the indirect textual quality signals, processed with the simple supervised ML algorithm Wordscores, are consistent with the visual quality signals. It has been documented that the sales listings’ descriptive and visual quality signals have agreed in 63–90% of cases, depending on the model’s variant. For the rental market, the descriptive quality signals have matched the visual ones in 71–83% of cases. As a result, it may be concluded that both types of signals show considerable consistency. What is more, the consistency of quality signals for low-quality apartments may reveal the use of different listing strategies. The wording used by sellers in rental listings aims at hiding the negative aspects of apartments, which is done by refraining from using words with negative quality sentiment. Therefore, for apartments for rent the low-visual-quality signals have most often corresponded with the medium-quality descriptive signals. At the same time, in the sales market, low quality may sometimes be considered an advantage and sellers would refrain less from describing apartments using wording with negative sentiment. Thus, for apartments for sale the low-visual-quality signals have much more often corresponded with low-quality descriptive signals. This may also be a reason why the descriptive signals regarded as the clearest have been more consistent with the visual signals in the sales listings’ analysis than in the rental one.

Moreover, it has been shown that for rental listings the consistency of signals has been higher for listings posted by private landlords than for the ones posted by brokers. For sales listings it has been the opposite. The obtained results may be rooted in the different motivations that drive both types of agents when concluding a sales or rental agreement. Although the differences in consistency of signals sent by brokers and by the owners of the apartments have been relatively small and the numbers of observations have been limited, the results show some interesting patterns, which may constitute a basis for further research.

Given the high compliance of textual and visual quality signals, one may use them interchangeably to enhance the search processes and improve the quality of the analytical datasets. The descriptive quality measure obtained with the Wordscores method (\(DESC\_SIGNAL\)) may then be used as a variable in market-related studies. It has been shown that to obtain the measure most consistent with the visual quality signal, one should utilize information from all or almost all words used in the listed apartment’s description. This means that, when extracting quality signals, the Wordscores model may be treated as a true “bag-of-words” model, in which literally all the words included in the description matter.

Though very promising, the proposed approach to extracting textual quality signals from housing listings and the achieved results on signals’ consistency raise some issues that should be addressed in further studies. While the analyzed visual quality assessments have been made according to the official guidelines set by the National Bank of Poland, they also include some unavoidable subjectivity. It may affect the visual quality assessments on which the Wordscores models have been trained and may disturb the models’ results. However, it may be suspected that this additional variance would rather lower the measured consistency of signals, so the true consistency may be even higher than shown in this paper.

Moreover, Liu et al. (2020) and Loughran and McDonald (2011) showed that dictionaries of words are highly localized, and Goodwin et al. (2018) proved that the interpretation of real estate wording depends on multiple social and economic factors. As a result, the outcomes of the Wordscores models obtained with the dictionary built on Poznań’s data may differ in other local markets because of their specificity, and demand further testing. The same applies to the influence of market conditions on the wording used in listings, which has not been tested in this paper, as it requires access to data covering a longer time horizon.

Finally, it is advisable both to further extend the knowledge on the quality signals sent via listings and to develop methods of utilizing the textual information on quality. Combined with promoting the creation of central databases of publicly available information on real estate market transactions, this would increase market transparency and enable more accurate and effective market monitoring.