Indicators for Sanitation Quality in Low-Income Urban Settlements: Evidence from Kenya, Ghana, and Bangladesh

In recent years, shared facilities have contributed substantially to increased access to sanitation in urban areas. While shared sanitation is often the only viable option in densely populated, low-income urban areas, it is currently considered a "limited" solution by the international community. In this paper, we analyze the conditions under which shared sanitation could be considered of adequate quality and propose a set of indicators associated with sanitation quality to be included in national household surveys. We conducted a survey with 3600 households and 2026 observational spot-checks of shared and individual household toilets in Kisumu (Kenya), Kumasi (Ghana), and Dhaka (Bangladesh). We develop a composite sanitation quality outcome measure based on observational data. Using regression analysis, we identify self-reported indicators that correlate with the spot-checked composite measure and are, therefore, robust with regard to reporting bias. Results show that (pour-) flush toilets are a highly informative indicator for sanitation quality compared to other toilet technologies. In contrast to previous arguments and depending on the context, sharing a toilet has a comparatively lower correlation with sanitation quality. Toilets still show good quality if shared among only 2–3 households. Toilet location and lighting, as well as the presence of a lockable door, are equally strong indicators for sanitation quality and could serve as alternative indicators. The findings suggest that the sanitation service levels defined by the WHO and UNICEF might be reconsidered to better capture the quality of sanitation facilities in low-income urban settlements.


Introduction
Sustainable Development Goal (SDG) 6.2 calls for "adequate and equitable sanitation and hygiene for all" and to "eradicate open defecation" (UN-DESA, 2020). However, little progress has been made in the provision of adequate sanitation in many low- and middle-income countries. In 2017, more than an estimated two billion people (roughly 25% of the global population) did not have access to "adequate" sanitation services, of whom 627 million relied on shared toilets instead of a private toilet facility (JMP, 2021). "Adequate" or high-quality sanitation is typically defined using the "sanitation service ladder" and the corresponding set of indicators created by the WHO's and UNICEF's Joint Monitoring Programme (JMP) (JMP, 2019).
The JMP sanitation service ladder consists of five levels: open defecation, unimproved, limited, basic, and safely managed sanitation. According to the WHO/UNICEF, only the latter two levels are considered "adequate" (JMP, 2021). A toilet is only deemed basic or safely managed if the technology is improved, i.e., designed to hygienically separate excreta from human contact, and is used exclusively by a single household. Correspondingly, a toilet is deemed limited, even if it meets high technological standards, if used by two or more households. Toilets that do not hygienically separate excreta from human contact, i.e., are technologically unimproved, such as pit latrines without a cement slab, are categorized as unimproved, irrespective of the number of users. Figure 1 shows the evolution of sanitation coverage between 2000 and 2017 for the whole world and sub-Saharan Africa (SSA) based on data from the JMP (2021). The first panel shows that the coverage of basic and safely managed services has increased significantly across the world, from 56 to 74% in the last two decades. Simultaneously, the share of the world's population practicing open defecation (OD) and using unimproved technologies decreased from 38 to 18%. However, limited sanitation (i.e., facilities with improved technologies, but shared by two or more households) actually increased from 5 to 8% over the same period and remained particularly high in urban areas of low-income regions: in 2017, the share of limited sanitation was 31% in urban SSA and 19% in urban South Asia.
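The ladder's classification rules described above can be written as a short decision function. The following is a minimal sketch based only on the rules as paraphrased in this section; the names `ServiceLevel`, `classify_jmp`, and `is_adequate` are hypothetical, and the distinction between basic and safely managed services is not modeled because it depends on information not covered here.

```python
from enum import Enum


class ServiceLevel(Enum):
    OPEN_DEFECATION = "open defecation"
    UNIMPROVED = "unimproved"
    LIMITED = "limited"
    BASIC_OR_SAFELY_MANAGED = "basic or safely managed"


def classify_jmp(uses_toilet: bool, improved_technology: bool,
                 households_sharing: int) -> ServiceLevel:
    """Assign a service level following the rules paraphrased in the text.

    households_sharing counts all households using the toilet, including
    the respondent's own (1 = private toilet). This is an assumed encoding.
    """
    if not uses_toilet:
        return ServiceLevel.OPEN_DEFECATION
    if not improved_technology:
        # Unimproved technology stays "unimproved" irrespective of users.
        return ServiceLevel.UNIMPROVED
    if households_sharing >= 2:
        # Improved technology, but shared by two or more households.
        return ServiceLevel.LIMITED
    return ServiceLevel.BASIC_OR_SAFELY_MANAGED


def is_adequate(level: ServiceLevel) -> bool:
    """Only the top two rungs of the ladder count as "adequate"."""
    return level is ServiceLevel.BASIC_OR_SAFELY_MANAGED


# A flush toilet shared by three households is "limited", hence not adequate.
print(classify_jmp(True, True, 3).value)
```

Under these rules, the number of sharing households only matters once the technology is improved, which is why a high-standard shared toilet can never rank above "limited".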
These statistics show that, first, preventing open defecation is no longer the main issue in urban areas of low- and middle-income countries. Second, a large and increasing share of the urban population already has access to improved sanitation technologies, but shares it with other households. These two observations gain even more relevance considering that SSA and South Asia are often considered the world's fastest urbanizing regions. In 2017, 472 million people lived in urban areas in SSA, a figure projected to double over the next 25 years (Lall et al., 2017). In South Asia, the urban population is poised to rise by almost 250 million (or 50%) to approximately 750 million by 2030 (Ellis & Roberts, 2015; The World Bank, 2020). Hence, it is likely that many more toilets will be shared by multiple households in the future. Given the available technology, shared sanitation could be the only viable option for improving sanitation in urban low-income settlements (Schouten & Mathenge, 2010).
Therefore, the important question is whether the international community should continue to label this sanitation solution as "limited" or whether it can be labeled "adequate," at least under certain conditions. An ill-defined sanitation service ladder that incorrectly categorizes shared sanitation as inadequate could have a dampening effect on policymakers' incentives to allocate funds for shared sanitation, leading to a misallocation of investments (Evans et al., 2017). In addition, ill-defined indicators could lead to misguided development objectives. Indicators for better education, for example, faced similar scrutiny. For a long time, enrollment rates were defined as indicators for better education (e.g., by the Millennium Development Goals), which led to a policy focus on achieving higher enrollment rates at the cost of teaching quality. As a result, learning in schools stagnated or deteriorated in many countries (World Bank, 2018).
The JMP sanitation service ladder has already faced scrutiny in the literature, and whether or not its indicators are suited to measure "adequate" sanitation effectively has been contested (Evans et al., 2017). The main components of the current JMP sanitation indicators, toilet technology and the number of toilet users, have been repeatedly challenged (Mara, 2016; Evans et al., 2017). Moreover, while observational studies tend to find large and robust effects of improved technologies on most child health variables (Fink et al., 2011; Heijnen et al., 2014; Andrés et al., 2017; Headey & Palloni, 2019), experimental studies show no effects on stunting and mixed effects on the incidence of diarrhea (Patil et al., 2015; Luby et al., 2018; Humphrey et al., 2019).
The number of sharing users has not drawn as much scholarly attention. Mosler (2014a, 2014b) and Shiras et al. (2018) suggest that shared sanitation facilities are less likely to be clean than individual household toilets because of barriers to collective action to clean and maintain the toilet. Some empirical studies support this hypothesis and show that cleanliness deteriorates with an increasing number of users in Uganda and Kenya (Günther et al., 2012; Simiyu et al., 2017b). For the case of Tanzania, on the other hand, Exley et al. (2015) find no correlation between shared toilets and pathogen contamination when compared to individual household toilets. It is also critical to consider factors beyond lack of hygiene and its associated health risks, such as privacy and safety, in order to understand when shared sanitation is adequate for all users, especially for women, children, and the elderly (Giné-Garriga et al., 2017; Sclar et al., 2018; Kwiringira et al., 2014; Tidwell et al., 2018; Simiyu et al., 2017b; Schelbert et al., 2020).
One could hypothetically try to elicit the hygiene, privacy, and safety of a toilet using a household survey. In practice, and as we show in this paper, the low reliability of self-reported data on sanitation quality outcomes prevents us from obtaining this information directly from households. At the same time, conducting observational spot-checks of toilet facilities is time-consuming and expensive, and thus infeasible for large-scale surveys across countries. Consequently, international large-scale surveys such as UNICEF's Multiple Indicator Cluster Surveys or USAID's Demographic and Health Surveys mostly rely on self-reported data. The objective of this paper is to check the reliability of the leading indicators currently used in international surveys and to obtain improved indicators for sanitation quality. To this end, we first identified candidate indicators in a qualitative formative study (Schelbert et al., 2020). Based on these candidate indicators, we build a Sanitation Quality Index (SQI) that centers around the needs and preferences of the users. Because self-reported data on sanitation quality suffers from considerable reporting biases, we construct the SQI with spot-checked data. This paper then analyzes the correlation between this index (based on observed variables) and various self-reported characteristics of toilets to identify "robust" reported indicators that can be elicited in large-scale surveys and can be applied to determine whether a toilet facility is "adequate". In particular, we first apply multivariate regression analysis to evaluate whether the current JMP indicators (toilet technology and the number of users) are highly correlated with the SQI. In a second step, we test the correlation between sanitation quality and additional self-reported indicators that are unlikely to suffer from reporting bias. Examples are the toilet's location, whether the toilet has a light, a bin, a lockable door, and whether there is a water source or a landlord living on the plot. We reconsider the current WHO/UNICEF Joint Monitoring Programme (JMP) framework and propose ways to improve it to measure access to adequate sanitation.
The remainder of this paper proceeds as follows. Section 2 describes the study setting, sampling, data collection procedure, and the empirical strategy. Section 3 first reports the descriptive results on the household sample and toilet facilities, and then provides a comparison of observed and reported sanitation quality indicators. It continues with the results of the regression analysis of the correlation between the observed SQI and various self-reported indicators. Section 3 closes with a discussion of the results within the current JMP framework. Section 4 concludes.

Setting
We conducted a cross-sectional study in low-income settlements between May and July 2019 in three cities: Kisumu, Kumasi, and Dhaka. Kisumu is the third-largest city in Kenya, with a population of around 500,000 people (Kenya National Bureau of Statistics, 2019). Forty-seven percent of the population lives in low-income settlements (NCPD, 2013). Kumasi is Ghana's second-largest city, with a population of approximately 2.5 million people, an increasing share of whom are living in low-income settlements (Amoako & Cobbinah, 2011). Dhaka is the capital of Bangladesh and the largest city in the country, with an estimated population of 20 million. Over a quarter of its residents live in low-income settlements (Bangladesh Bureau of Statistics, 2019). In all three cities, housing in low-income settlements is often organized in compounds, comprising several single-unit houses occupied by different households, most of whom are tenants. Households often share toilet facilities, which can have one or multiple cubicles. The operation and maintenance of these facilities is usually in the hands of the landlords or organized by the tenants, rather than outsourced to a paid cleaner (Alam et al., 2017; Simiyu et al., 2017a; Antwi-Agyei et al., 2020).

Sampling
The sampling strategy for the data collection consisted of four steps. First, between four and ten study areas were selected in each of the three cities based on income levels and the supposed prevalence of shared sanitation facilities. Only low-income areas with the prevalent use of shared sanitation facilities were eligible. Moreover, selected areas had to be distributed across the city. In a second step, up to four random geo-points were sampled in each study area using the geographic information software QGIS. The geo-points served as starting positions for the household sampling. Whenever possible, the (four) enumerators spread out in four different directions from the starting point. If there were fewer than four possible paths leading away from the point, the enumerators would walk in the same direction and split at the next opportunity (e.g., the next junction). In a third step, we applied a skipping pattern. The field assistants would start with the closest compound in the respective walking direction from the starting point, skipping the next two compounds and entering the third. Fourth, upon entering a compound, field assistants would interview two households if the respondents used a shared toilet and one household if it was a private toilet. Each household within a compound was assigned a number and randomly selected by drawing a number on a mobile phone application. The second household was identified by repeating the same procedure with the restriction that the second respondent had to use the same toilet cubicle as the first respondent.
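The third and fourth sampling steps can be illustrated in a few lines. This is only a sketch under stated assumptions: the functions `compounds_to_visit` and `select_households`, the dictionary keys `id` and `cubicle`, and the use of Python's `random` module in place of the mobile phone application are all hypothetical.

```python
import random


def compounds_to_visit(compounds_along_path, skip=2):
    """Enter the closest compound, then skip `skip` compounds before each
    further entry (enter, skip two, enter the third, and so on)."""
    return compounds_along_path[:: skip + 1]


def select_households(households, toilet_is_shared, rng):
    """Draw one household (private toilet) or two (shared toilet) at random.

    `households` is a list of dicts with the assumed keys "id" and "cubicle".
    For shared toilets, the second household must use the same cubicle as
    the first one.
    """
    first = rng.choice(households)
    if not toilet_is_shared:
        return [first]
    same_cubicle = [h for h in households
                    if h["cubicle"] == first["cubicle"] and h["id"] != first["id"]]
    if not same_cubicle:
        return [first]  # no second household shares this cubicle
    return [first, rng.choice(same_cubicle)]


rng = random.Random(2019)  # fixed seed so the illustration is reproducible
print(compounds_to_visit(["A", "B", "C", "D", "E", "F", "G"]))  # ['A', 'D', 'G']
compound = [{"id": 1, "cubicle": "x"}, {"id": 2, "cubicle": "x"},
            {"id": 3, "cubicle": "y"}]
print(select_households(compound, toilet_is_shared=True, rng=rng))
```

The eligibility checks for respondents and the alternation between male and female respondents are left out of this sketch.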
Each respondent had to be at least 18 years of age, a resident of the compound (i.e., living on the premises for at least three months), a regular user of a shared/private/public toilet facility within walking distance, and had to voluntarily consent to participate in the study. If none of the potential respondents in a household met these criteria, the field assistants moved to the next available household on the compound. In most instances, enumerators interviewed the household head or the most knowledgeable person. If the respondent of the first household in the compound was male, the enumerators sought a female respondent for the second interview, and vice versa.
Even though this study focuses on shared toilets, public and private toilets were still of interest for comparative purposes. We capped the proportion of private and public toilets at 20% within a given study area, ensuring a minimum of 80% shared toilets in the total sample. Thus, the share of households using a private or public toilet is not representative of the chosen cities. Using the sampling method described above, we interviewed too few households with individual household toilets in Kenya and Bangladesh to allow for meaningful analysis, and thus had to resort to purposive sampling for individual household toilets.
Even though households were randomly selected within a given settlement based on the systematic sampling applied, the selection of settlements was purposive. We deliberately focused on settlements where the chance of encountering shared facilities was higher, following expert knowledge from our local partners. We tried to ensure a certain degree of geographic dispersion across cities, but middle- and high-income areas were deliberately excluded. Therefore, the distribution of sanitation outcomes is not representative of the overall situation in each city but rather for low-income settlements only. Moreover, middle- and high-income areas tend to have higher shares of households with private toilets. Thus, the private toilets we encounter in low-income areas might not necessarily have the same characteristics as those encountered in middle- and high-income areas.

Data Collection
To analyze the relevance of self-reported, sanitation-related indicators in predicting observed sanitation quality outcomes, we relied on two primary data sources: a household survey questionnaire and spot-check observations of the toilet cubicles used by the interviewed households. Both the potential quality outcomes and the explanatory variables (candidate indicators) were identified through a combination of formative qualitative research and the WHO guidelines on sanitation (Schelbert et al., 2020; WHO, 2018).
The questionnaire was administered in person by trained field assistants. It was conducted in multiple local languages and piloted extensively in the presence of the authors to ensure consistency across the three study sites. A detailed explanation for each variable used in this analysis is listed in Table 8 in Appendix A. In addition, the field assistants conducted spot-checks of toilets that respondents had indicated were their primary sanitation facility. The questionnaire and spot-check protocol can be found in Online Resource 1.
Sanitation quality dimensions were identified in focus group discussions as part of a formative qualitative study (Schelbert et al., 2020), which was conducted in the same cities and contexts as this study. The three sanitation quality dimensions are: hygiene, safety (and security), and privacy. For these three dimensions, the corresponding outcome characteristics were selected based on their relevance to the three quality dimensions and the feasibility of observing them during spot-checks. Due to concerns about the validity and reliability of self-reported data, we focus exclusively on observable proxies for the quality outcomes (see Sect. 3.2). In contrast to the WHO (2018) guidelines on sanitation and health, we do not include affordability and accessibility as quality dimensions. What is affordable depends on each individual's budget constraints and willingness to pay and is therefore not a suitable quality feature. Further, accessibility was excluded because it turned out to be covered by the other three dimensions and was not explicitly mentioned by participants during the focus group discussions. The quality dimensions and linked indicators are:
• Solid roof (without holes): The roof protects the user from external (environmental) factors such as rain.
• Solid floor (without cracks/holes): The floor separates the user from excreta and is, therefore, a gatekeeper for health hazards through both direct contact and indirect contact, e.g., insects. A solid floor also prevents users, particularly children, from falling into the pit, should there be one.
• Privacy:
• Solid wall: The wall must be made out of solid material and have no holes that would allow a person to peek through.
To develop a single sanitation quality outcome measure, we aggregated the eight quality characteristics into a single index (see Sect. 2.4).

Empirical Strategy
The empirical approach of this paper follows four steps. First, we calculate the Sanitation Quality Index (SQI) based on the three dimensions and eight observed characteristics described in the previous section. Second, we use regression analysis to study the relationship between the SQI, as a proxy for toilet quality, and currently used self-reported sanitation indicators, namely technology and sharing. Third, we include additional self-reported candidate indicators in the regression analysis that were identified as user quality priorities in Schelbert et al. (2020). Fourth, we incorporate the findings into the current JMP framework to analyze the implications of new quality indicators for the sanitation service ladder. Aggregating the eight observed sanitation quality indicators into one single measure simplifies the analysis to a single outcome variable. Simiyu et al. (2017b) and Tidwell et al. (2018) provide similar examples of aggregated sanitation quality indices. Simiyu et al. (2017b) calculated a score with equal weights summed over 18 binary quality characteristics, each of which is assigned to one of three quality dimensions (hygiene, privacy, and toilet design). Tidwell et al. (2018) are guided by five quality dimensions (hygiene, sustainability, use, desirability, and accessibility) and assign weights according to the number of characteristics within each quality category. Both methods have one caveat. The method applied by Simiyu et al. (2017b) implicitly gives more weight to dimensions that include more characteristics. Consequently, privacy, which includes eight characteristics, ends up having twice the weight of hygiene, which is only made up of four characteristics. The method applied by Tidwell et al. (2018) gives equal weight to each dimension, which is somewhat arbitrary.
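The weighting caveat can be made concrete with a little arithmetic. The sketch below uses only the counts quoted above (18 characteristics in total, 8 for privacy, 4 for hygiene, so 6 remain for toilet design); the function names are hypothetical, and the second function is merely one reading of what "equal weight to each dimension" implies for individual items.

```python
from fractions import Fraction

# Characteristics per dimension in the equal-weight score discussed above.
# The count for toilet design is the remainder: 18 - 8 - 4 = 6.
counts = {"hygiene": 4, "privacy": 8, "toilet design": 18 - 4 - 8}


def implicit_dimension_weights(counts):
    """Dimension weights implied by giving every characteristic equal weight."""
    total = sum(counts.values())
    return {dim: Fraction(n, total) for dim, n in counts.items()}


def equal_dimension_item_weights(counts):
    """Per-item weight when every dimension receives the same total weight."""
    share = Fraction(1, len(counts))
    return {dim: share / n for dim, n in counts.items()}


by_item = implicit_dimension_weights(counts)
print(by_item["hygiene"], by_item["privacy"])   # 2/9 4/9

by_dim = equal_dimension_item_weights(counts)
print(by_dim["hygiene"], by_dim["privacy"])     # 1/12 1/24
```

With equal item weights, privacy counts twice as much as hygiene; with equal dimension weights, a single hygiene item counts twice as much as a single privacy item. Either choice is fixed by construction rather than by the data.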
Here, we propose to assign weights to the characteristics using Multiple Correspondence Analysis (MCA). MCA, a special case of Principal Component Analysis (PCA), allows for the analysis of patterns in the relationships between categorical characteristics (Abdi & Valentin, 2007). It accounts for the fact that one characteristic might contribute more variation than others. This feature is generally desirable when constructing a measure that is supposed to capture differences between households. For example, PCA is used extensively to aggregate characteristics from questionnaires to develop wealth and socioeconomic status indices based on household assets (Filmer & Pritchett, 2001;McKenzie, 2005;Vyas & Kumaranayake, 2006).
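To illustrate how data-driven weights of this kind can be obtained, the following sketch runs an MCA on a matrix of binary characteristics with NumPy and turns the first dimension into a 0-100 score. It is not the construction given in Appendix B: the function names, the synthetic data, the sign orientation, and the min-max rescaling to 0-100 are all assumptions made for illustration.

```python
import numpy as np


def mca_first_dimension(X):
    """First MCA dimension of a binary matrix X (rows = toilets,
    columns = characteristics). Assumes every characteristic varies.

    Returns (row_scores, column_coordinates); the columns refer to the
    indicator matrix [x_1, 1 - x_1, x_2, 1 - x_2, ...].
    """
    X = np.asarray(X, dtype=float)
    # Indicator (complete disjunctive) matrix: one column per category.
    Z = np.column_stack([col for j in range(X.shape[1])
                         for col in (X[:, j], 1.0 - X[:, j])])
    P = Z / Z.sum()               # correspondence matrix
    r = P.sum(axis=1)             # row masses
    c = P.sum(axis=0)             # column masses
    # Standardized residuals; their SVD yields the principal axes.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    row_scores = U[:, 0] * sing[0] / np.sqrt(r)   # principal row coordinates
    col_coords = Vt[0] / np.sqrt(c)               # standard column coordinates
    return row_scores, col_coords


def quality_index(X):
    """Score on the first MCA dimension, rescaled to 0-100 (assumed scaling)."""
    scores, _ = mca_first_dimension(X)
    # The sign of an SVD axis is arbitrary: orient it so that toilets with
    # more favourable characteristics score higher.
    if np.corrcoef(scores, np.asarray(X).sum(axis=1))[0, 1] < 0:
        scores = -scores
    return 100.0 * (scores - scores.min()) / (scores.max() - scores.min())


# Synthetic example: 200 toilets, 8 binary characteristics driven by one
# latent quality variable (made-up data, not the study's).
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = (latent[:, None] + rng.normal(size=(200, 8)) > 0).astype(int)
index = quality_index(X)
print(index.min(), index.max())
```

Because each row score is an average of the column coordinates of the categories that apply to that toilet, the column coordinates play the role of the weights discussed in the text.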
MCA derives several orthogonal (i.e., uncorrelated) principal components, equal in number to the variables used for the analysis. From the first principal component, "factor loadings" are obtained, serving as statistical weights assigned to each variable. We aggregate the eight observed quality characteristics into a weighted average that provides us with a single quality score for each toilet. In the context of sanitation, the technique is based on the central assumption that the eight sanitation characteristics observed via spot-checks reflect an underlying variable, namely sanitation quality. The main advantage of this method is that the underlying variable accounts for the largest share of the variance and covariance in the data. Additionally, the statistical weights derived from MCA solve the problem of choosing arbitrary weights (see Appendix B for a formal description of how the SQI is constructed).
We subsequently use the SQI as a dependent variable in multivariate regressions to estimate the correlation between households' self-reported toilet indicators and the SQI, which is based on observed toilet characteristics. The regressions are modeled using three different specifications. In the first model specification, toilet quality is regressed on the self-reported indicators that currently determine the JMP sanitation service ladder, improved technology and shared cubicle:

$$SQI_{ifc} = \alpha + \beta_1\,\text{improved\_technology}_{ifc} + \beta_2\,\text{shared\_cubicle}_{ifc} + \lambda_c + \varepsilon_{ifc} \tag{1}$$

where $i$ represents a household, using facility $f$, in country $c$. The coefficient $\beta_1$ represents the difference in SQI scores between technologically improved and unimproved toilets, holding the sharing status constant. Similarly, $\beta_2$ represents the difference in SQI scores between shared and private toilets, holding the toilet technology constant. The term $\lambda_c$ denotes a vector of country fixed effects, controlling for unobserved but constant differences in SQI scores across countries. The error term, $\varepsilon_{ifc}$, is clustered at the facility level to account for correlated outcomes between two respondents using the same toilet.
In the second specification, the two indicators, improved_technology and shared_cubicle, are decomposed into the categorical variables technology, outflow, and sharingHHs. Technology enters as a set of dummy variables for the technology categories $J$: flush (reference category), pit latrine (with slab), pit latrine (no slab)/other. Similarly, outflow enters as a set of dummy variables for the categories $K$: piped sewer/septic tank (reference category), pit, and elsewhere. SharingHHs denotes the number of households, $L$, sharing a cubicle, coded as a categorical variable. We estimate the following regression model:

$$SQI_{ifc} = \alpha + \sum_{j \in J} \beta_j\,\text{technology}_{j,ifc} + \sum_{k \in K} \gamma_k\,\text{outflow}_{k,ifc} + \sum_{l \in L} \delta_l\,\text{sharingHHs}_{l,ifc} + \lambda_c + \varepsilon_{ifc} \tag{2}$$

where the sums run over the non-reference categories and $\beta_j$, $\gamma_k$, and $\delta_l$ now represent regression coefficients corresponding to the categories $J$, $K$, and $L$ of technology, outflow, and sharingHHs, respectively. Therefore, $\beta_j$, $\gamma_k$, and $\delta_l$ give us the difference in SQI scores between a given category of technology, outflow, or sharingHHs and the omitted reference category.
Finally, we add a list of additional (again self-reported) indicators $X_{if}$ to the regression model besides technology, outflow, and sharingHHs, which were identified as user priorities in the formative study described in Schelbert et al. (2020) (footnote 7):

$$SQI_{ifc} = \alpha + \sum_{j \in J} \beta_j\,\text{technology}_{j,ifc} + \sum_{k \in K} \gamma_k\,\text{outflow}_{k,ifc} + \sum_{l \in L} \delta_l\,\text{sharingHHs}_{l,ifc} + \sum_{m} \theta_m X_{ifm} + \lambda_c + \varepsilon_{ifc} \tag{3}$$

where $\theta_m$ provides us with the marginal difference in SQI scores in the presence of a toilet characteristic $m$, holding technology, outflow, sharingHHs, and all other variables constant. Equations 1, 2, and 3 are estimated for the pooled sample as well as separately for each country (in this case, $\lambda_c$ is dropped from the regressions).
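As an illustration of the first specification, the sketch below estimates an OLS regression with country dummies and standard errors clustered at the facility level, using NumPy only. Everything in it is an assumption made for the example: the helper `ols_cluster`, the synthetic data, and the coefficient values used to simulate them are made up and are not the study's estimates. No small-sample correction is applied to the cluster-robust variance.

```python
import numpy as np


def ols_cluster(y, X, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    clusters = np.asarray(clusters)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        mask = clusters == g
        score = X[mask].T @ resid[mask]   # sum of x_i * e_i within a cluster
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Synthetic data: 500 facilities with two respondents each (made-up values).
rng = np.random.default_rng(0)
n_fac = 500
facility = np.repeat(np.arange(n_fac), 2)
improved = np.repeat(rng.integers(0, 2, n_fac), 2)
shared = np.repeat(rng.integers(0, 2, n_fac), 2)
country = np.repeat(rng.integers(0, 3, n_fac), 2)
facility_shock = np.repeat(rng.normal(0, 3, n_fac), 2)   # shared within facility
sqi = (50 + 15 * improved - 5 * shared
       + np.array([0.0, 4.0, 8.0])[country]
       + facility_shock + rng.normal(0, 3, 2 * n_fac))

# Design matrix: constant, the two indicators, and two country dummies
# (country 0 is the omitted reference category).
X = np.column_stack([np.ones(2 * n_fac), improved, shared,
                     country == 1, country == 2])
beta, se = ols_cluster(sqi, X, facility)
for name, b, s in zip(["const", "improved", "shared", "country1", "country2"],
                      beta, se):
    print(f"{name:10s} {b:7.2f} ({s:.2f})")
```

The second and third specifications would follow the same pattern, with the two binary indicators replaced by category dummies and further columns added for the additional indicators.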
In a last step, these results are checked against the current JMP framework to analyze whether the indicators currently used for sanitation service levels are supported by the data on observed sanitation quality. We assess the indicators' performance in separating high-quality toilets from low-quality toilets (as measured by the SQI). We compare different alternative sanitation service level specifications, where we manipulate the decisive criteria that classify sanitation facilities as basic, limited, or unimproved (footnote 8).
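One way to picture such a comparison is to reclassify the same toilets under different criteria and look at the average SQI per service level. The sketch below varies a single criterion, the maximum number of households allowed for the basic level; the function names, the threshold parameter, and the five toy observations are hypothetical and serve only to show the mechanics.

```python
from statistics import mean


def service_level(improved, households, max_households_for_basic=1):
    """Classify a toilet; the sharing threshold is the criterion being varied."""
    if not improved:
        return "unimproved"
    return "basic" if households <= max_households_for_basic else "limited"


def mean_sqi_by_level(toilets, max_households_for_basic=1):
    """Average SQI score within each service level under a given threshold."""
    groups = {}
    for t in toilets:
        level = service_level(t["improved"], t["households"],
                              max_households_for_basic)
        groups.setdefault(level, []).append(t["sqi"])
    return {level: mean(values) for level, values in groups.items()}


# Toy observations (invented for illustration, not taken from the survey).
toy = [
    {"improved": True, "households": 1, "sqi": 90},
    {"improved": True, "households": 2, "sqi": 85},
    {"improved": True, "households": 3, "sqi": 80},
    {"improved": True, "households": 8, "sqi": 55},
    {"improved": False, "households": 1, "sqi": 40},
]

print(mean_sqi_by_level(toy, max_households_for_basic=1))
print(mean_sqi_by_level(toy, max_households_for_basic=3))
```

A specification separates quality levels well when the average SQI differs clearly between the resulting groups while each group remains fairly homogeneous.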

Descriptive Statistics
Demographic characteristics. The sampling procedure yielded a sample size of 3,600 households, as reported in Table 1. The vast majority of respondents are female, possibly because women were more likely to be at home during the day when data were collected. Even though all study areas are low-income urban settlements, household characteristics differ considerably across the three cities. Ghana, for example, has a considerably larger average household size than Kenya or Bangladesh (footnote 9). Meanwhile, Bangladesh has the highest number of household members per room: on average, a room is shared by 4.3 people in Bangladesh and 4.0 people in Ghana, while in Kenya, it is shared by only 3.3 people. Bangladesh has the highest share of household heads without formal primary education (47%), followed by Ghana (25%), and Kenya (12%). Home ownership is also distributed unevenly across the three countries. In Kenya and Bangladesh, most respondents are informal tenants (76% in Kenya and 84% in Bangladesh), whereas in Ghana, a large share of respondents own the dwelling unit that they live in (49%). This imbalance might be due to the large share of traditional compound housing in Kumasi (Ghana), where it is possible to have multiple housing unit owners per compound (Tipple, 2011).
Footnote 7: The additional indicators include: location, water on premises, handwashing facility with soap, lighting, lockable door, tiling, gender-separated cubicles, cleaning arrangement, user relationship, age of toilet, landlord on plot, and bin inside cubicle.
Footnote 8: Safe management of sanitation facilities could not be determined using a spot-check and household survey and is therefore excluded from this analysis. Some toilets that would otherwise qualify as safely managed are considered basic as part of this study.
Footnote 9: For easier reading, the study sites are referred to by the name of the country. Results from Kisumu will be referred to as results from Kenya, Kumasi as Ghana, and Dhaka as Bangladesh.
Toilet characteristics (reported). Results in Table 2 indicate that toilet technologies are remarkably heterogeneous across the three study sites. In Kenya, the sampling procedure resulted in mostly pit latrines (with slab) (83%), followed by flush to sewer/septic tank/pit (13%). In Ghana, the sample is more equally distributed between flush to sewer/septic tank/pit (55%) and pit latrines (with slab) (41%). In Bangladesh, 90% of all toilets flush to "elsewhere". For 52% of those, the outflow is unknown to the respondent. We further find that in Kenya (6.9) and Bangladesh (6.6), shared toilets have a considerably higher average number of households per toilet cubicle than in Ghana (5.98). The toilets' location is relatively equally distributed across country samples. Most toilets are located outside of the individual dwelling but within the compound (90%). In total, 68% of respondents report having access to an improved water source on their premises. Only a fraction of the respondents report having lighting inside the toilet cubicle in Kenya (5%), as opposed to Ghana (56%) and Bangladesh (61%). Most cubicles are lockable either from the inside or the outside; 67% are lockable from both sides. In Kenya and Ghana, more toilets exhibit an outside lock (Kenya 74%; Ghana 85%) than an inside lock (Kenya 63%; Ghana 81%), whereas in Bangladesh, almost all toilets have an inside lock (96%) and fewer have an outside lock (68%). In terms of floor tiling, Ghana stands out with 54% compared to 6% in Kenya and 4% in Bangladesh. The share of gender-separated toilets is below 5% throughout. Compared to the other two countries, few respondents in Kenya report having a cleaning rota (Kenya 14%; Ghana 45%; Bangladesh 69%). The term "user relationship" describes the social proximity between the respondent's household and the other users. The majority of toilets are used only by relatives and close neighbors, except for Ghana, where 19% report that the toilet is also shared among individuals who are not next-door neighbors and people from outside the compound. In Ghana, toilets are older than in the other two countries: 59% reported that the toilet was built ten or more years ago (footnote 12). There is also an exceptionally high share of landlords or caretakers in Ghana that live in the same compound as the respondent (88% compared to 37% in Kenya and 36% in Bangladesh). In Ghana, bins for solid waste are frequently found inside the toilet cubicle (59%) (footnote 13). In contrast, in Kenya and Bangladesh, the share is below 1% and 2%, respectively.
Hence, the toilet technology, as well as other toilet characteristics, seems to vary widely across countries, which might lead to differences in sanitation quality. We were therefore not able to analyze challenges that are associated with a specific type of toilet technology across all three countries. For example, emptying arrangements are most certainly an essential factor for the quality of pit latrines but could not be studied in this cross-country context.
Table notes: (c) We assume toilets draining elsewhere involve an unsafe conveyance system. (d) The share of households using private toilets is not representative for the samples as these households (and toilets) were purposively sampled.
Footnote 12: In case the respondent did not know, the length of time the respondent lived on the plot was applied.
Footnote 13: The share of bins outside the cubicle is very low in all three countries.
Sanitation Quality Index, SQI (observed). Figure 2 shows the distribution of SQI scores for each city. The first panel shows the SQI scores resulting from a pooled Multiple Correspondence Analysis (MCA), where observations from all three countries (cities) are included in calculating the index weights. The second panel relies on an MCA computed separately for each country (city). All observations are binned according to their SQI score in equal bins of 20 points on the SQI scale. Overall, we find that most of the observations end up having a score between 60 and 100, indicating a skewed distribution of scores. On average, the toilets in Kenya have the lowest SQI scores out of all three countries. In the first panel, 40% of toilets in Kenya have an SQI score of 80 or below. In Ghana, less than 6% of toilets have SQI scores of 60 or below, and in Bangladesh, approximately 12% fall below a score of 60. Compared to the pooled SQI, the SQI score by country tends to have more variation. This difference is particularly striking in Ghana. Table 3 shows the observed toilet quality characteristics that were used to construct the SQI. A more detailed description of the characteristics can be found in Appendix A. We see that all dimensions of the SQI, i.e., cleanliness, safety, and privacy, drive the uneven SQI distribution (Kenyan toilets consistently score lower than toilets in Bangladesh and Ghana). Only 21% of shared urban toilets in Kenya are clean, as measured by the lack of visible insects, feces, or solid waste, compared to 43% in Bangladesh and 61% in Ghana. There is also a high divergence in full and clogged toilets. In Kenya, more than a third of the observed toilets were clogged or had a full pit, in contrast to just over 8% in Ghana and 6% in Bangladesh. Handwashing facilities with soap are absent in all but 11% of the toilet facilities. The share is lowest in Kenya (2%), followed by Ghana (10%), and Bangladesh (22%).

Observed Versus Self-Reported Sanitation Quality
As discussed in Sect. 2.3, we rely solely on observed toilet characteristics to construct the SQI because we find that asking households about their toilet's sanitation quality leads to high measurement error, probably because of social desirability bias. Table 4 documents this point by showing the coefficients of simple bivariate regressions between self-reported and observed characteristics that were used to construct the SQI. Even though all correlations are statistically significant at any conventional level, the magnitude differs considerably depending on the characteristic. While the correlation between observed and self-reported construction materials is high, toilet cleanliness and the availability of handwashing facilities are often reported differently by enumerators and respondents. Figure 3 further illustrates this point by showing the cleanliness assessment (one of the three dimensions of the SQI) from three data sources: respondent-reported ratings, ratings recorded by enumerators during spot-checks, and a remote coding of photographs taken during the spot-checks. The reported data represent the respondents' subjective assessment of the toilets' general cleanliness on a five-point Likert scale. The observed data were collected by enumerators after the interviews, also using a five-point Likert scale. Enumerators paid special attention to visible feces, insects, solid waste, as well as spilled urine and other bodily substances. For the remote coding, enumerators took photos of the toilets (from the inside and outside) during the spot-checks, which were later rated by research assistants who were otherwise not involved in the study. The left panel shows the comparison between reported and observed data, while the right panel shows the comparison between reported and remotely-coded data.
Each bar shows the distribution of cleanliness ratings that were observed (by enumerators) or remotely-coded (by research assistants) for a self-reported cleanliness level. For example, the first bar contains all toilet cubicles that were considered very dirty by their users. About 60% of these cubicles were also considered to be very dirty by the enumerators. The graph shows that even if there is a positive correlation between reported, observed, and remote assessments, there is considerable disagreement between households and the other two types of observers. In general, households report their toilets to be cleaner than external observers do. Whereas households only reported 11% of toilets to be dirty or very dirty, the data based on observations suggest that 26% of the shared toilets are dirty or very dirty. Hence, we conclude that self-reported sanitation quality is not a reliable indicator of observed sanitation quality, in particular for the dimension of hygiene.
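The comparison shown in Figure 3 amounts to a row-normalized cross-tabulation of two ratings of the same toilet. The following sketch illustrates the idea under stated assumptions: ratings are coded 1 (very dirty) to 5 (very clean), and the pairs below are invented for illustration rather than drawn from the survey.

```python
from collections import Counter, defaultdict


def row_distributions(pairs):
    """For each self-reported rating, return the distribution of observed ratings."""
    table = defaultdict(Counter)
    for reported, observed in pairs:
        table[reported][observed] += 1
    return {
        reported: {obs: n / sum(row.values()) for obs, n in sorted(row.items())}
        for reported, row in sorted(table.items())
    }


def share_dirty(ratings, cutoff=2):
    """Share of ratings that are 'dirty' or 'very dirty' (assumed codes 1 and 2)."""
    return sum(r <= cutoff for r in ratings) / len(ratings)


# Hypothetical (self-reported, enumerator-observed) rating pairs.
pairs = [(1, 1), (1, 1), (1, 2), (2, 1), (3, 2),
         (4, 2), (4, 4), (5, 3), (5, 5), (5, 4)]
dist = row_distributions(pairs)
reported_dirty = share_dirty([r for r, _ in pairs])
observed_dirty = share_dirty([o for _, o in pairs])
print(dist[1], reported_dirty, observed_dirty)
```

Each entry of `dist` corresponds to one bar of the figure, and the gap between `reported_dirty` and `observed_dirty` is the kind of discrepancy summarized by the 11% versus 26% comparison in the text.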

Regression Results: Technology and Number of Households
To test whether self-reported technology, the number of households, and potential alternative indicators are good predictors of sanitation quality (as measured by the SQI), we compare different models of multivariate linear regressions. In all subsequent regression tables, we report robust standard errors clustered at the compound level because of the sampling strategy (see Sect. 2.2). While some of the variables are household-level data from the questionnaire, other variables are compound-level data, in particular those based on the spot-check observation. Using clustered standard errors, we acknowledge that the residuals may be correlated for respondents dwelling on the same compound. Additionally, all regressions on the pooled sample include country fixed effects to control for constant but unobserved differences in SQI scores between countries. Table 5 reports the regression results for the correlation between the SQI, technology, and the number of sharing households. The results in columns 1-4 correspond to equation 1 in Sect. 2.4. The explanatory variables, improved technology and whether a toilet is shared by two or more households, are the two indicators currently used by WHO/UNICEF for the sanitation service level assessment. On average, the SQI scores of technologically improved toilets (defined by WHO/UNICEF as pit latrines with a cement slab or flush/pour-flush to piped sewers/septic tanks/pits) are not statistically different from those of unimproved toilets. In none of the country-specific regressions do we find a significant difference in SQI scores between "improved" and "unimproved" technology as defined by WHO/UNICEF. For Bangladesh, this result is driven by toilets being classified as unimproved due to the outflow and not the interface. In Kenya and Ghana, the difference is sizable but insignificant due to the small share of toilets that classify as unimproved according to WHO/UNICEF (see Table 2).
The coefficient for a shared toilet cubicle is negative and statistically significant for the pooled sample (Table 5, Col.1). On average, the SQI of shared cubicles is 12 points below private toilets, which have an SQI of 74.1 on average. The negative coefficient for toilet sharing is particularly evident for the Kenyan sample. On average, shared facilities are 26 SQI points below facilities that are used by only one household. Shared toilets score 9 points lower than private toilets in Ghana, and only 4 points lower in Bangladesh. These results suggest that improved sanitation technology (as currently defined) is generally not associated with toilet quality, and for shared cubicles, the magnitude of the effect is highly context-sensitive.
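To make the role of clustered standard errors concrete, the sketch below estimates a deliberately simplified version of such a regression: SQI on a constant and a single shared/private dummy, with a cluster-robust (sandwich) variance. It is not the specification behind Table 5, which is multivariate and includes country fixed effects. The function name, the data, and the omission of a finite-sample correction are all assumptions made for illustration.

```python
from collections import defaultdict


def ols_cluster(y, x, clusters):
    """Bivariate OLS of y on [1, x] with cluster-robust standard errors.

    Simplifying assumption: no finite-sample correction is applied.
    """
    n = len(y)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x must vary across observations")
    # Inverse of X'X for the design matrix X = [1, x].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    # Sum the score vector x_i * u_i within each cluster (e.g. compound).
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, g in zip(y, x, clusters):
        u = yi - b0 - b1 * xi
        scores[g][0] += u
        scores[g][1] += xi * u
    # "Meat" of the sandwich: sum over clusters of s_g s_g'.
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)]
            for i in range(2)]
    tmp = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    var = [[sum(tmp[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return (b0, b1), (var[0][0] ** 0.5, var[1][1] ** 0.5)


# Hypothetical data: SQI, shared dummy (1 = shared), compound identifier.
sqi = [80.0, 72.0, 60.0, 50.0, 55.0, 73.0]
shared = [0, 0, 1, 1, 1, 0]
compound = ["a", "a", "b", "b", "c", "c"]
(b0, b1), (se0, se1) = ols_cluster(sqi, shared, compound)
print(b0, b1, se0, se1)
```

With a single dummy regressor, the intercept is the mean SQI of private toilets and the slope is the mean difference between shared and private toilets, which mirrors how the shared-toilet coefficient is read in the text. Clustering changes only the standard errors, not the point estimates.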
The results in Table 5, Cols.5-8 show the estimates when technology, outflow, and the number of sharing households enter as categorical variables (see equation 2). Results indicate that for the pooled sample, SQI scores of improved pit latrines (with slab) are more than 14 points lower than those of flush toilets (even though both are categorized as improved by the current WHO/UNICEF definition). The SQI for pit latrines with no slab and other technologies is, on average, 24 points lower than for flush toilets.¹⁴ This result is mainly driven by the Kenyan sample. In Ghana, the difference in toilet types is much more driven by the outflow (piped sewer/septic tank vs. pit vs. elsewhere) than by the interface technology. In Bangladesh, the sample consists of mostly pour-flush toilets, making a comparison to other technologies unreliable.
The number of sharing households is negatively correlated with toilet quality, but the results suggest that the relationship between the number of sharing households and the SQI score is not linear. Based on the pooled sample, there appears to be a large gap between one and two households and no significant difference between two and three households. There is again a moderate difference between three and four households, whereas between four and nine households, average SQI scores stay relatively constant and only decline again with ten households or more. These results are again mainly driven by Kenyan households, while the differences are less pronounced in Ghana, where the SQI is similar for private toilets and toilets shared by two households. Interestingly, the number of sharing households does not conclusively predict toilet quality in Bangladesh: only toilets that are shared by more than ten households tend to be of lower quality.
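The non-linear pattern described above can be explored by treating the number of sharing households as a set of categories and comparing each category with private toilets. In a regression that contains only these category dummies and no further controls, the coefficients equal such mean differences. The grouping into "1", "2", "3", "4-9", and "10+" and the data below are assumptions for illustration and may differ from the categories used in Table 5.

```python
from collections import defaultdict
from statistics import mean


def sharing_category(households):
    """Map a household count to a (hypothetical) sharing category."""
    if households < 1:
        raise ValueError("a toilet must be used by at least one household")
    if households <= 3:
        return str(households)
    return "4-9" if households <= 9 else "10+"


def gaps_to_private(records):
    """Mean SQI of each sharing category minus the mean SQI of private toilets."""
    groups = defaultdict(list)
    for households, sqi in records:
        groups[sharing_category(households)].append(sqi)
    base = mean(groups["1"])  # reference category: one household
    return {cat: mean(vals) - base for cat, vals in groups.items() if cat != "1"}


# Hypothetical (number of households, SQI) observations.
records = [(1, 80.0), (1, 76.0), (2, 66.0), (2, 70.0),
           (3, 67.0), (5, 60.0), (8, 62.0), (12, 50.0)]
gaps = gaps_to_private(records)
print(gaps)
```

A roughly flat sequence of gaps across adjacent categories, followed by a jump, is the kind of step pattern that a single linear term for the number of households would miss.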

Regression Results: Additional Variables
In Table 6, various additional candidate indicators for sanitation quality are included (see equation 3). The overall results show that the magnitude of the coefficients for technology and sharing households decreases compared to Table 5. In Kenya, this has the consequence that the coefficients for 2 and 3 households are no longer significant. In Ghana and Bangladesh, the consequence is that the number of households is no longer a significant predictor of toilet quality for any number of households. This means that some of the added variables are correlated with the number of sharing households and the SQI.
Table note: Standard errors are clustered on the compound level. All pooled regressions include country fixed effects. *** p < 0.001; ** p < 0.01; * p < 0.05.

Apart from the technology and the number of sharing households, the results suggest that location is associated with SQI scores. Toilets located outside of the compound have, on average, an SQI score that is 11 points lower than toilets inside the compound. Moreover, the results suggest that a lockable door is positively associated with SQI scores. Having the option to lock the door from the inside and the outside is associated with an SQI score that is 16 points higher. Having the option to lock the door only from the inside or only from the outside is associated with an SQI score that is more than 10 points higher. Other features that show a moderate positive correlation with toilet quality are lighting, floor tiling, and cleaning rotas.
The availability of water on the premises, gender-separated cubicles, the relationship between the users, the age of the toilet, whether there is a landlord residing on the same plot, and the presence of a bin inside the cubicle are not significantly associated with SQI scores.
In addition to the regressions in Tables 5 and 6, where the SQI is the dependent variable, Cols.2-5 in Table 10 in Appendix D report the results for a robustness check using an additive sanitation quality measure. In this specification, the eight toilet quality features that constitute the SQI are weighted equally, and the score takes values between one and eight. The results remain qualitatively unchanged due to the high correlation between the SQI and the alternative additive quality score (see Fig. 7 in Appendix D). Furthermore, we check whether the day a spot-check was conducted might be correlated with the SQI. It is possible that toilets would be dirtier during weekends because the users spend more time at home compared to weekdays. The coefficient for the Weekend-dummy in Col.1 of Table 10 indicates that conducting the spot-check on the weekend is not significantly associated with a lower SQI.
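The additive robustness measure can be sketched as a simple count of quality features, compared with the SQI through a Pearson correlation. The eight feature names below are hypothetical placeholders (the actual features are those listed in Table 3 and Appendix A), the SQI values are invented, and in this sketch a toilet with none of the features would receive a count of zero.

```python
# Hypothetical placeholder names for the eight equally weighted features.
FEATURES = [
    "clean", "not_clogged_or_full", "handwashing_with_soap", "solid_floor",
    "solid_walls", "roof", "door", "lock",
]


def additive_score(toilet):
    """Count how many of the eight features are present (equal weights)."""
    return sum(1 for feature in FEATURES if toilet.get(feature, False))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Hypothetical toilets with 2, 4, 6, and 8 features present.
toilets = [{f: True for f in FEATURES[:k]} for k in (2, 4, 6, 8)]
additive = [additive_score(t) for t in toilets]
sqi = [35.0, 55.0, 70.0, 95.0]  # invented SQI scores for the same toilets
r = pearson(additive, sqi)
print(additive, round(r, 3))
```

A correlation close to one between the two measures is what makes it plausible for regression results to remain qualitatively unchanged when the weighted index is replaced by the unweighted count.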
Comparing the R² statistics for the three regressions in Tables 5 and 6 shows that considering technology and the number of households as detailed, categorical variables increases the adjusted R² by nine percentage points in the pooled regressions and by 9-18 percentage points in the country-specific regressions. This implies that considering the specific kind of toilet technology (instead of simply distinguishing between improved and unimproved) and the exact number of households (instead of simply distinguishing between shared and private) increases the share of total variation explained by the regression considerably. Adding the other potential indicators in Table 6 increases the adjusted R² by an additional nine percentage points. Generally, the R² statistics are remarkably high once we move beyond simple binary variables along two dimensions.
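Because the richer specifications add regressors, the comparison relies on the adjusted R², which penalizes the number of explanatory variables. The standard formula is 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k regressors excluding the constant. The sketch below applies it to invented values; the sample size and regressor counts are hypothetical and are not those of Tables 5 and 6.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (excluding the constant)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical comparison: a sparse model versus a richer model on the same sample.
sparse = adjusted_r2(r2=0.30, n=1001, k=2)
rich = adjusted_r2(r2=0.40, n=1001, k=20)
print(round(sparse, 4), round(rich, 4), round(rich - sparse, 4))
```

An increase in the adjusted statistic, as opposed to the raw R², indicates that the additional categories explain more variation than would be expected from merely adding parameters.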

Quality Indicators Within the JMP Framework
The results suggest that toilet technology predicts toilet quality across countries, but with a different threshold than currently proposed by the JMP framework: pit latrines (without or with slab) are consistently associated with lower SQI scores than flush toilets. Shared toilets are, as suggested by the JMP framework, of lower quality on average than private toilets, but the effect is generally low in Bangladesh and low in Ghana if we control for other sanitation quality indicators. We also observe an additional improvement in toilet quality when moving from four to only two or three sharing households. Of the additional variables, the location, lighting, and a lockable door are all positively associated with toilet quality in two of the three countries.
From these observations, we suggest options for adjusting the current JMP framework to achieve a higher correlation between the indicators collected in household surveys and toilet quality, and to obtain indicators that make it possible to distinguish between adequate (defined as clean, safe, and private) and inadequate shared sanitation. These adjustments could increase the informative and explanatory power of the sanitation service ladder with regard to urban sanitation quality while retaining the three service levels proposed by JMP: basic (i.e., adequate), limited, and unimproved.¹⁵

First, we analyze which specification performs best in separating high-quality from low-quality toilets when only self-reported indicators are used. Ideally, an informative indicator of sanitation quality produces increasing SQI scores with increasing sanitation service levels, i.e., basic should exhibit higher average SQI scores than limited, and limited should have higher scores than unimproved. Second, we analyze how alternative sanitation ladders would affect the share of households counted as having access to basic, limited, and unimproved sanitation. Even if one specification of the sanitation service ladder outperforms others with regard to differences in SQI between sanitation levels, this specification might still not be preferable from a policy point of view if it means that the costs to achieve adequate sanitation under this specification would be very high.

Table 7 presents the specifications of alternative sanitation service levels. Taking the current JMP framework ("JMP indicators" & "Standard Technology", upper left in Table 7) as a starting point, we vary the specification of the levels along two dimensions. First ("Standard Technology" vs. "Alternative Technology"), we alter the criteria for whether a toilet technology falls under the unimproved category.
According to the current specification, flush toilets (draining to piped sewer systems, septic tanks, or pits) and pit latrines with slabs are considered to be improved technology (and can be considered as basic or limited, depending on whether they are private or shared). Pit latrines without slabs are considered unimproved technology (JMP, 2019a). However, our results indicate that pit latrines (with or without a slab) have considerably lower SQI scores than flush toilets. Thus, we modify the improved/unimproved facility type classification, categorizing any flush option as improved technology while categorizing all pit latrines (with/without slab) as unimproved.¹⁶ Second, we vary when a toilet is considered basic rather than limited. To this end, we change the threshold for a sanitation facility to be classified as basic (and not only limited) from one to three households: "Expanded JMP (3HH)." As an alternative, we dismiss any information on the number of sharing households but add the restriction "Location + Lock + Lighting" (LLL). For a basic service level, the toilet facility must be located on or next to the compound, have an outside/inside lock, and have lighting. To be categorized as "limited," the toilet technology must be improved, but the toilet is either located elsewhere or lacks a lock or lighting. A toilet is "unimproved" if its technology is unimproved. Figure 4 shows the mean SQI score of the toilets that qualify as basic, limited, or unimproved according to the current and alternative specifications of the sanitation service level.

Fig. 4 SQI distribution by service level specification

¹⁶ The outflow was not definable in many cases due to the dense structure of low-income urban settlements. Thus, only considering the interface, irrespective of the outflow, is justified in the context of a quality indicator for urban sanitation, which does not mean that outflow is an obsolete indicator.
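The alternative service level specifications can be written down as a small classification rule. The sketch below is one possible reading of the definitions in the text, not the authors' implementation. The field names are hypothetical, `lockable` stands in for the "outside/inside lock" criterion, interfaces other than those named are treated as unimproved, and the `safe_outflow` flag reflects the assumption that the standard classification also considers the outflow while the alternative one looks at the interface only.

```python
from dataclasses import dataclass


@dataclass
class Toilet:
    interface: str                 # "flush", "pit_with_slab", "pit_without_slab", "other"
    safe_outflow: bool             # hypothetical flag: False if draining "elsewhere"
    households: int                # number of households using the toilet
    on_or_next_to_compound: bool
    lockable: bool                 # assumed reading of the "outside/inside lock" criterion
    lighting: bool


def is_improved(toilet, technology="standard"):
    """Improved/unimproved technology under the two technology definitions."""
    if technology == "standard":
        # Assumption: the standard definition also requires a safe outflow.
        return toilet.interface in {"flush", "pit_with_slab"} and toilet.safe_outflow
    if technology == "alternative":
        # Interface only: any flush option is improved, all pit latrines are not.
        return toilet.interface == "flush"
    raise ValueError(f"unknown technology definition: {technology}")


def service_level(toilet, rule="jmp", technology="standard"):
    """Return 'basic', 'limited', or 'unimproved' for a given specification."""
    if not is_improved(toilet, technology):
        return "unimproved"
    if rule == "jmp":
        basic = toilet.households <= 1
    elif rule == "expanded_3hh":
        basic = toilet.households <= 3
    elif rule == "lll":
        basic = toilet.on_or_next_to_compound and toilet.lockable and toilet.lighting
    else:
        raise ValueError(f"unknown rule: {rule}")
    return "basic" if basic else "limited"


# Hypothetical example: a pit latrine with slab shared by two households.
pit = Toilet("pit_with_slab", True, 2, True, True, True)
for rule in ("jmp", "expanded_3hh", "lll"):
    for technology in ("standard", "alternative"):
        print(rule, technology, service_level(pit, rule, technology))
```

Looping over the three rules and two technology definitions yields the six cells of the specification grid, which makes it easy to see how the same facility moves between levels as the definition changes.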
Only considering the upper three panels ("Standard Technology"), we find that overall, in determining what qualifies as basic sanitation, altering the number of households from one to three households does not change the correlation between the sanitation levels and SQI much. Hence, toilets that are shared by up to three households could still be considered as basic and not only as limited. Applying the LLL specification improves classification in Kenya slightly with regard to the SQI, while it slightly worsens classification in Ghana and Bangladesh. One could, of course, keep the number of sharing households as a defining factor and build a measure that combines toilet technology, sharing households, and LLL. We show the results for this additional specification in Table 11 in Appendix E. However, there are only marginal differences compared to the LLL measure, which demonstrates that the LLL specification substitutes for the number of households to a certain degree.
Considering the second dimension ("Standard technology" vs. "Alternative technology"), we find that the SQI classification is substantially improved if the cut-off for basic and limited sanitation is flush toilets and not pit latrines with slabs (as in the JMP classification). Interestingly, basic and limited sanitation levels now show about the same SQI across countries, which is not the case for the standard JMP classification. Such an alignment of classifications across countries is also highly desirable from a measurement perspective. However, the unimproved sanitation level still corresponds to a higher SQI in Ghana than in Kenya and Bangladesh. Again, switching from 1 to 3 households for a cut-off between basic and limited does not change the average SQI for the different sanitation levels significantly, nor does a switch from the number of households to lock/lights/location to distinguish between basic and limited sanitation.

Figure 5 reports the changes in the relative frequency for each specification. It shows that relaxing the threshold from one to three households ("JMP indicators" vs. "Expanded JMP") while leaving the technology definition unaltered ("Standard technology") moves many shared facilities to the basic level, as expected. In Ghana and Bangladesh, applying the LLL specification (that does not take into consideration the number of households at all) increases the number of facilities classified as basic even more. In contrast, in Kenya, the LLL specification would decrease the share of sanitation facilities considered as basic since few facilities are close to the household, have a light, and have a lock.
Changing the definition of the technological requirements for a facility to be considered basic ("alternative technology") has a very large impact on the frequency of facilities classified as unimproved, and the consequences would strongly diverge for the three countries (see Fig. 5, lower panel). In Kenya, the share of unimproved toilets would increase from 4 to 85%, in Ghana from 4 to 45%, and in Bangladesh, decrease from 80 to 1%. This result also shows that the technology threshold used currently (or any other) has huge implications for whether we assume low-income urban areas have mostly adequate sanitation or not.
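The two evaluation criteria used in this section, the ordering of mean SQI across service levels (Figure 4) and the share of facilities per level (Figure 5), can be computed from a list of classified toilets. The sketch below assumes each toilet has already been assigned a level under one specification; the records are invented for illustration.

```python
from collections import defaultdict

LEVELS = ("unimproved", "limited", "basic")  # ordered from lowest to highest


def summarize(records):
    """Mean SQI and relative frequency for each service level."""
    groups = defaultdict(list)
    for level, sqi in records:
        groups[level].append(sqi)
    total = sum(len(values) for values in groups.values())
    return {
        level: {"mean_sqi": sum(values) / len(values), "share": len(values) / total}
        for level, values in groups.items()
    }


def is_monotone(summary):
    """True if mean SQI strictly increases from unimproved to limited to basic."""
    means = [summary[level]["mean_sqi"] for level in LEVELS if level in summary]
    return all(low < high for low, high in zip(means, means[1:]))


# Hypothetical (service level, SQI) pairs under one specification.
records = [("basic", 90.0), ("basic", 80.0), ("limited", 70.0),
           ("limited", 60.0), ("limited", 65.0), ("unimproved", 40.0)]
summary = summarize(records)
print(summary, is_monotone(summary))
```

Repeating this summary for every specification and country allows a side-by-side view of how well a ladder separates quality levels and how many facilities it would place in each level.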

Conclusion
In this paper, we developed a Sanitation Quality Index (SQI), a composite index measuring the observed cleanliness, safety, and privacy of sanitation facilities, and analyzed its correlation with self-reported indicators of households' toilets in cities in Kenya, Bangladesh, and Ghana. We first show that self-reported sanitation quality (and especially hygiene) is only weakly correlated with observed sanitation quality. Our results further demonstrate if and under what conditions shared sanitation facilities can be considered to be "adequate." Our results also support the modification of the widely-used JMP sanitation service ladder to better reflect differences in sanitation quality. To do so, we suggest collecting additional information to assess the sanitation progress across countries through internationally applied household surveys, such as the Demographic and Health Surveys (DHS) and the Multiple Indicator Cluster Surveys.
Based on our results, the user interface technology, the number of sharing households, the toilet's location, presence of a door that is lockable from the inside and outside, and lighting are predictive indicators of sanitation quality. A cleaning rota and floor tiling are also weakly associated with higher sanitation quality. In contrast, a water source on the premises, gender-separated cubicles, the users' relationship, toilet age, a landlord living on the same plot, and a bin inside the cubicle are not correlated with sanitation quality. Second, we find that even though private toilets generally show a higher sanitation quality than shared toilets, the magnitude of the relationship varies considerably across countries. Toilets that are only shared by two and three households are mostly cleaner, safer, and more private than toilets shared by four or more households. Third, and most importantly, our results indicate that pit latrines with a slab show a considerably lower sanitation quality than toilets with flush or pour-flush technology. This is in contrast to JMP's classification, which makes no distinction between (pour-)flush facilities and pit latrines with a slab.
The JMP sanitation service levels constitute a classification system exclusively based on two dichotomous indicators, improved technology and shared facility, which are only partly informative as sanitation quality indicators in the urban low-income context. Classifying pit latrines as unimproved sanitation (with/without slab) leads to a considerable improvement in sanitation quality prediction relative to the current JMP sanitation service level specification. However, under this new specification, many more toilets would be classified as unimproved in low-income urban areas. In contrast, increasing the number of households from one to three as a decisive criterion for basic sanitation or ignoring the number of households altogether and instead focusing on different indicators (location, lighting, lock) does not substantially affect the indicators' predictive performance with regard to sanitation quality, but it strongly increases the share of toilets classified as basic. Such an adjustment, therefore, helps to focus scarce resources on the remaining "limited" category for better targeting of future investments.
Our results also show the large heterogeneity across low-income urban settlements, even though all were part of the (second) largest city in a middle-income country. The most commonly found sanitation technology varied considerably, and the correlation of various indicators with sanitation quality was different across contexts. Toilets in Kumasi (Ghana) showed, on average, a higher sanitation quality than toilets in Kisumu (Kenya) and Dhaka (Bangladesh), even when controlling for all toilet characteristics. Hence, the context seems to be particularly relevant for urban sanitation, and research that analyzes these country-specific differences in more detail would shed more light on contextual factors. Finally, exploring the causal impact of the identified indicators on toilet quality would help to better inform future policy decisions.