Conceptual framework and item selection
The content of the measure was designed with significant input from patients and community members. We sought to design a conceptual framework for the measure based on their views and experiences. We conducted 32 semi-structured interviews with community members as well as with individuals who had sought help for sexual function problems. Maximum variation sampling was used to ensure a wide range of experience of sexual difficulties. Individuals were recruited from: a sexual problems clinic (n = 6; clinical sample); the diabetes and depression patient lists of a General Practice (n = 13; community members at higher risk of difficulties); an HIV charity (n = 3; community members at higher risk of difficulties); and the waiting room of a General Practice (n = 10; community sample). As is usual in qualitative research, the sample was kept small to allow in-depth exploration of the data. The interviews explored the range of criteria used by participants in assessing their sex lives and what was, and was not, seen as problematic. Interview transcripts were coded to identify potential criteria for a functional sex life. Based on the qualitative data and academic literature, and following a set of decision rules, extraneous criteria were excluded. The rules were:
If two criteria overlap, exclude the criterion for which the evidence is weakest.
Exclude any criterion that interview respondents regarded as desirable rather than essential.
Exclude criteria that are associated with sexual function, rather than part of the construct itself.
The second rule, stipulating a focus on the essential, reflected our design imperatives of brevity and public health utility. The last rule involved differentiating correlates of sexual function from the criteria representing the construct itself. We defined as correlates any criteria that could be construed as antecedent to, or an outcome of, a functioning sex life, or criteria that were “a degree or so removed from explicit sexual behaviour” [15, p 293]. The methodology for this qualitative stage of the study is described in detail elsewhere.
The measure was designed as a computer-based instrument (for completion by respondent or interviewer). The rationale for this was threefold: firstly, the measure is primarily designed for use in Natsal 3, which is a computerized survey; secondly, in future the measure is most likely to be used in large-scale health surveys, which increasingly use computers; and thirdly, a computer-based design allowed more complex filtering, providing the flexibility to cater for wide variation in individual sexual experience. The selected criteria were translated into draft items. Some items (Q9 and several items under Q1) were similar to items in the previous Natsal survey but the others were newly created, following a review of items in existing measures. The items were pre-tested to investigate: acceptability; comprehension; correspondence between respondents’ actual experience (as reported in interview) and their questionnaire responses; and efficiency of routing and question order.
At the piloting stage, 12 interviews were conducted with individuals sampled from a general practice waiting room (a proxy for the general population); and four were conducted with individuals from a sexual problems clinic (clinical sample), both situated in North London. After completing the measure, participants reviewed their answers with an interviewer. Cognitive techniques (for example, thinking aloud; rephrasing in the respondent’s own words) were used to elicit participant views on the measure. The methodology and results of the pre-test are described in further detail elsewhere (Mitchell and Datta, unpublished study report).
Measure formation and validation
We conducted a survey to test the draft items and select those with the strongest psychometric properties for inclusion in the final measure, and to test the reliability and validity of the final measure.
The survey involved a general population sample (n = 1,262) and a clinical sample (n = 100).
The general population sample was obtained via an internet panel administered by one of the UK’s leading market research companies. The panel has around 420,000 members living in Britain who collect reward points for participation. Data quality is maintained by validating new members, and by close monitoring of ‘survey behaviour’ to eliminate panellists who give inconsistent responses or who display low engagement (for example, completing surveys too quickly). Panellists for this study were selected randomly within nationally representative quotas on age, gender and region. The survey link was sent to 13,489 members aged 18–74 and data from the first 1,262 completed surveys to fill the quotas were analysed. Of these respondents, 144 completed the measure again 2 weeks later (in order to assess test–retest reliability).
The clinical sample (n = 100) was recruited via four NHS sexual problems clinics in London. Following their consultation, new clinic patients were introduced to the study by their clinician, who gave them an invitation letter and an information sheet with instructions on how to access the web-based survey. The majority of patients completed the survey at home after their clinic appointment. In one clinic some respondents opted to complete the survey on a computer in a private room in the hospital. Respondents were given £10 worth of shop vouchers as thanks for their contribution to the study.
Comparison measures and variables
The online questionnaire included all the items from our new measure, plus several items for comparison (variables that, in theory, should correlate with sexual function; see Table 1). We also included two existing measures of sexual function.
As outlined above, there are no universally agreed standard instruments for measuring sexual function in the community. From the array of reliable and valid measures we chose, for comparison, two whose dimensions were broadly similar to our own. The female comparison measure, the Female Sexual Function Index (FSFI), is well known and has been used extensively. We used the FSFI-6, a validated item-reduced version of this measure, in order to minimise questionnaire length and respondent burden. The chosen male comparison measure, the Brief Sexual Function Questionnaire (BSFQ) for men, has an emphasis on psychological aetiologies and probes the relational aspect of sexual function without assuming that the respondent has a sexual partner.
Both of the selected measures (the FSFI-6 and BSFQ) ask about sexual function in the last month. In order to provide a fairer comparison with our measure (in which the reporting period is the past year), we extended the reporting period for each measure to the last 3 months; a compromise between comparability and staying close to the original timeframes of the FSFI-6 and BSFQ. We modified the 21-item BSFQ to reduce respondent burden, omitting 9 items. The omitted items were those asked elsewhere in the questionnaire (e.g. frequency of sexual activity), items deemed unessential for comparison purposes (e.g. sexual orientation) and items providing detail not required for comparison purposes (e.g. length of intercourse after insertion of penis and before ejaculation).
Our latent variable measurement models were based on a multivariate probit analysis with latent variables, through a 2-parameter normal ogive item response model and its extension to polytomous/ordinal data. In such models, the factor loading reflects the strength of the association between the observed item and the latent construct. The threshold parameter reflects the point on the latent construct that needs to be reached for a particular response option to be endorsed. Within this measurement modelling framework it is possible to plot an individual’s estimated score on the Natsal-SF against its standard error of measurement. This plot is a scale information function (SIF) or scale characteristic curve (SCC). The SIF indicates the range of estimated scores over which an item, item response, or scale measures a person’s level of, in this instance, sexual functioning most precisely. The information is Fisher information, i.e. statistical information: the standard error of an estimated score (the posterior mean) is the reciprocal of the square root of the information at that score. It is the same information that is used to construct a confidence interval for an estimated score, under the assumption of normally distributed scores. From a SIF we can identify where the standard error is of constant width, and at what point on the measurement continuum standard errors start to increase, indicating less precise measurement. Psychometric results such as these enable a more informed statement to be made about the measurement range of an instrument when applied in a population. For example, they enable the researcher to define the centile range over which estimated scores have a sufficiently small standard error (precision) to be considered reliable.
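The relationship between item parameters, Fisher information, and the standard error of measurement can be sketched as follows. Note that this is an illustration only: it uses the 2-parameter logistic approximation to the normal ogive model, for binary items, and the item parameters are invented for the example rather than taken from the Natsal-SF.

```python
import math

def p_endorse(theta, a, b):
    """2-parameter logistic: probability of endorsing a binary item,
    given latent score theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information contributed by one binary 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def scale_information(theta, items):
    """Scale information function (SIF): the sum of item informations."""
    return sum(item_information(theta, a, b) for a, b in items)

def standard_error(theta, items):
    """The standard error of an estimated score is the reciprocal of
    the square root of the information at that score."""
    return 1.0 / math.sqrt(scale_information(theta, items))

# Hypothetical item parameters (a, b) for a three-item scale.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7)]

# Evaluating standard_error over a grid of theta values traces out
# where measurement is precise and where the SE starts to widen.
grid = [t / 2.0 for t in range(-8, 9)]
ses = [standard_error(t, items) for t in grid]
```

Information peaks where item difficulties cluster, so the standard error is smallest near the items’ locations and grows at the extremes of the continuum, which is exactly the behaviour the SIF is used to diagnose.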
In the second stage of the analysis, the selected measurement model was combined with a set of observed covariates as well as external validation criteria in order to jointly estimate the external validity of the scale in a full structural model, thus extending the measurement model to a Multiple Indicators Multiple Causes (MIMIC) model. All models were estimated in the Mplus 6.1 software. Model fit was assessed with the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI) and the Root Mean Square Error of Approximation (RMSEA) following the recommendations of Yu on their interpretation (Evaluation of model fit indices for latent variable models with categorical and continuous outcomes. Unpublished dissertation, 2002; see Mplus website http://www.statmodel.com/download/Yudissertation.pdf).
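The three fit indices have standard closed-form definitions in terms of the fitted model’s chi-square statistic and that of a baseline (null) model. The sketch below uses those textbook formulas; note that for categorical outcomes Mplus reports adjusted test statistics, so this is illustrative only, and the input values are invented.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation: per-degree-of-freedom
    misfit, rescaled by sample size; 0 when chi2 <= df."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_base, df_base):
    """Comparative Fit Index: relative improvement in noncentrality
    over the baseline model, truncated to [0, 1]."""
    d = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d)
    return 1.0 - d / d_base if d_base > 0 else 1.0

def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis Index: compares chi-square-per-df ratios of the
    fitted and baseline models (can exceed 1 for very good fit)."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)

# Hypothetical statistics: fitted model chi2 = 120 on 60 df,
# baseline chi2 = 2000 on 78 df, n = 500 respondents.
fit = (rmsea(120.0, 60, 500), cfi(120.0, 60, 2000.0, 78), tli(120.0, 60, 2000.0, 78))
```

Under conventional guidance such as Yu’s, values around CFI/TLI >= 0.95 and RMSEA <= 0.05-0.06 are read as good fit for categorical-outcome models.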
For missing data, we employed the Full Information Maximum Likelihood (FIML) method, which is naturally incorporated into structural equation models. In this full-likelihood context, model parameters and standard errors are estimated directly from the available data, and the selection mechanism is ignorable under the Missing at Random (MAR) assumption [21, 22]. The basic goal of the FIML method of handling missing data is to identify the population parameter values that are most likely to have produced a particular sample of data, and the discrepancy between the data and the estimated parameters is quantified by this likelihood. In this context the MAR assumption implies that all systematic selection effects depend on variables which are included in the models.
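The case-wise likelihood idea behind FIML can be illustrated with a toy bivariate normal model: each case contributes the density of only the variables actually observed for it, so incomplete cases still inform the estimates through their marginal distributions. This is a minimal sketch of the principle, not the actual Mplus implementation, and the data values are invented.

```python
import math

def norm_logpdf(x, mu, sigma):
    """Log-density of a univariate normal."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def binorm_logpdf(x, y, mx, my, sx, sy, rho):
    """Log-density of a bivariate normal with correlation rho."""
    zx = (x - mx) / sx
    zy = (y - my) / sy
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    return -0.5 * q - math.log(2 * math.pi * sx * sy) - 0.5 * math.log(1 - rho * rho)

def fiml_loglik(data, mx, my, sx, sy, rho):
    """FIML log-likelihood: complete cases use the joint density,
    incomplete cases use the marginal of whichever variable is observed.
    No case is discarded, unlike listwise deletion."""
    total = 0.0
    for x, y in data:
        if x is not None and y is not None:
            total += binorm_logpdf(x, y, mx, my, sx, sy, rho)
        elif x is not None:
            total += norm_logpdf(x, mx, sx)  # marginal of x
        elif y is not None:
            total += norm_logpdf(y, my, sy)  # marginal of y
    return total

# Invented data: one complete case and two cases with a missing value.
data = [(1.0, 2.0), (0.5, None), (None, -1.0)]
ll = fiml_loglik(data, 0.0, 0.0, 1.0, 1.0, 0.0)
```

Maximising this log-likelihood over the parameters yields estimates that are consistent under MAR, precisely because the missing values never enter the likelihood directly.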
Ethical approval for the study was granted by Oxford A Research Ethics Committee. Governance approval was secured from all the participating NHS trusts.