Background

Rapid tests (RTs) for HIV antibody in blood and saliva are being promoted for HIV diagnosis in resource limited settings in order to improve the quality of service delivery, acceptability and uptake of HIV testing and counselling (HTC) [1, 2]. The advantage of RTs compared with enzyme immunoassays (EIA) is the simplicity in use and the quick turn-around time of results leading to improvements in service delivery including prevention, care and treatment. Unlike EIAs, RTs do not require much use of instrumentation and with good training can be used by non-laboratory staff in remote centres. In the past few years, there has also been an increasing interest in the use of HIV self-testing (HIVST) to help increase testing coverage and a number of countries already have HIVST policies [3]. HIVST is where an individual, who wants to know his or her HIV status collects his/her specimen, tests it and interprets the results, often in private. Most HIVST is by the use of oral tests that use saliva and currently one HIVST kit, OraQuick In-Home HIV Test is approved by the USA Food and Drug Administration (FDA) [4]. Furthermore, there are now 4th generation RTs that allow detection of infection in the earlier phase of infection due to the ability to detect HIV antigen which appears earlier than antibodies after infection. The challenge is that some of these 4th generation RTs are still lagging behind, largely due to their inadequate performance so far especially in detecting acute infections [5, 6], and only the Determine HIV 1/2 Ag/Ab Combo is currently approved by the FDA.

The Joint United Nations Programme on HIV/AIDS (UNAIDS) 90–90-90 targets aim at 90% of people knowing their HIV status, 90% of those positive receiving ARVs and 90% of those receiving ARVs suppressing viral load to below detection by 2020 [7]. The attainment of this will require unprecedented roll out of HTC. Nations will need to ensure HTC expansion to meet these targets but also to provide quality HIV testing as a priority.

In Africa, various RT kits are currently on market and different algorithms are being used, but evaluation of these tests especially in field settings of the general populations has been limited [8]. In addition, most HIV test kits have been evaluated as individual kits and not testing algorithms. The global practice is that each country must select and standardize their testing algorithms based on local situations e.g. costs, storage issues, human resource, infrastructure, service demand etc. [9].

In Uganda, HIV rapid testing began in 1990 using Capillus HIV-1/HIV-2 (Cambridge Diagnostics) and since then various kits have been used [10, 11], it was in 2006 that a national testing algorithm of Determine HIV 1/2, HIV-1/2 Statpak and Uni-Gold HIV was introduced but without extensive evaluation. Other algorithms have been proposed [12] but their evaluations were based on limited kits and without following the recommended World Health Organization (WHO) and US Center for Disease Control and Prevention (CDC) guidelines [8, 13]. One other challenge with the current national algorithm is related to the interpretation of results that show weak positive (WP) bands which have led to low specificity and positive predictive value (PPV) [14]. With more kits on market, there is therefore a need to evaluate them, come up with better algorithms for the country and make recommendations on the observed WP bands. The objectives of this study therefore were to determine the appropriate HIV RT and recommend 2–3 algorithms to be used in Uganda.

Specific objectives were 1) To assess the performance of eleven blood and three oral fluid based rapid tests and come up with best algorithms. 2) To determine the proportion of HIV positive RTs with WP bands but which eventually turn out negative on confirmation using EIA. 3) To ascertain the inter-reader agreement in HIV diagnosis using RTs. 4) To assess the correlation of finger stick blood HIV results and those obtained by use of plasma.

Method

The study followed the CDC/WHO guidelines for RT evaluations [13]. It was a two phase study where phase I was a laboratory-based evaluation to determine test performance and to identify 3 algorithms that were later evaluated in phase II at the point of care (POC). Phase II used a finger prick by laboratory and non-laboratory staff and quality control (QC) at the National Reference Laboratory (NRL) performed on 10% of the negative and 100% of positive samples using the algorithms used at the POC. For oral based kits, both laboratory and field testing took place in phase I.

A total of eleven blood based kits were evaluated namely: Determine HIV1/2, HIVSav 1&2 rapid serotest, Acon HIV 1/2/0 Triline, Uni-gold HIV, SD Bioline HIV 1/2 3.0, First Response HIV 1–2. O, Carestart HIV 1.2.O, Doublecheck Gold HIV 1&2, HIV 1/2 StatPak, Medinostics HIV 1/2 Gold rapid screen and Insti HIV antibody. The three oral transudate kits evaluated were OraQuick HIV-1/2, Calypte Aware HIV-1/2 OMT and Chembio DPP HIV1/2.

These were selected from the United States Agency for International Development (USAID) list of approved HIV kits. These kits had undergone evaluation and approval by either FDA or another stringent regulatory approval for Canada, European Union Japan or USAID. From this list, kits that did not meet the study’s selection criteria of storage (2–300 C), specimen type to use serum, plasma and whole blood, ease of use (that included number of steps, ease of reading the test, nature of specimen collection and whether no special equipment was required), and shelf life (at least 12 months) were excluded. Other reasons for kits exclusion were limited information and not for our Ugandan market.

Before the study, the study team was trained on the standard operating procedures (SOPs) and on the protocol. Site visits were made in preparation for and during the study to monitor progress. Ten milliters (mls) of venous blood was collected in ethylenediaminetetraacetic acid (EDTA) (5 ml), the anticoagulant preferred by most test kit manufacturers and serum separating tubes (SST 5 ml). The blood based kits were evaluated using EDTA whole blood while oral kits were evaluated using oral transudate by swabbing of participants gums to obtain sample. Each kit was tested following the instructions in its insert. The serum from the SST tubes was used for confirmatory testing using 2 EIAs i.e. Vironostika HIV Uni-form II plus O and Murex HIV-1.2.O run in parallel as the gold standard. In-house cut off values used by CDC Uganda were used (UVRI NRL QC/quality assurance (QA) report 2000). These cut offs had been established after validation of these EIAs using local samples in order to increase their specificity without compromising sensitivity. The EIAs were done at Medical Research Council/UVRI Uganda at Entebbe and at Rakai Health Sciences Program (RHSP) in Rakai with technicians blinded to the RT results.

Testing of the blood based kits was done by laboratory technicians at the laboratories at each site while oral transudate kits were evaluated by nurse counsellors at the respective sites. In order to address objective number two, results were scored as weakly reactive or WP when the intensity of the test band was weaker than the control band but in the calculation of kit performance, all test bands were interpreted as positive regardless of intensity as required by the manufacturer. All results were read independently by two operators at three sites in order to assess inter-observer agreement.

The sensitivity (Se), specificity (Sp), negative predictive value (NPV), PPV false positive rate (FPR) and false negative rate (FNR) of each kit was calculated using the EIAs as gold standard. The performance criteria for the desirable algorithms were then established. A minimum sensitivity of 99.0% was set for the screening kits and a minimum specificity of 98.0% was set for the confirmatory and tie breaker kits [2]. Combinations of all kits that met the above criteria were then made and the performance (Se, Sp, PPV, NPV, FPR, FNR) and cost of the different algorithms calculated.

The cost of the kits was determined using the HIV Test kits listed in the USAID Source and Origin Waiver [15]. We also compared this with the WHO procurement prices for 2013. The cost of each algorithm was estimated using the formula described in Appendix A using the average national prevalence of 7.3% [16]). The technical team of the ministry proposed a new algorithm had to be cheaper or within 20% price of the current algorithm. The price of the algorithm being used at the time was about 1.50 US$ per test. Results were double entered and consistency checks done at UVRI.

Study sites and participant selection

Seven sites, covering the four different regions of Uganda, participated in the phase I evaluation namely; − AIDS Information Centre (AIC) Kampala and MRC/UVRI Kyamulibwa (Central), Rakai Health Sciences Program (RHSP) and Mbarara Regional Referral Hospital (Western), St. Mary’s Lacor Hospital, Gulu (Northern), The AIDS Support Organization (TASO) Jinja and TASO Soroti (Eastern), see site locations in Fig. 1. Participants already known to be HIV positive were recruited after collection of the 200 HIV negative samples at each site. Four sites AIC, TASO Entebbe, TASO Masaka and Lacor Hospital participated in phase II.

Fig. 1
figure 1

Map of Uganda showing the four regions and site locations where the phase I testing was performed. Source: http://d-maps.com/carte.php?num_car=4025&lang=en but with the study sites added

Statistical methods

Sample size

The study followed the CDC/WHO guidelines for HIV test evaluation [13] which recommend that a test evaluation exercise should have a minimum of approximately 200 HIV-positive and 200 HIV negative specimens to provide 95% confidence intervals of less than ±2% for both the estimated sensitivity and specificity. This number was considered at each of the 7 study sites (Rakai, Kyamulibwa, AIC, Tororo, Jinja, Mbarara and Gulu) to give a targeted enrolment of 2800 individuals (1400 HIV negatives and 1400 HIV positives). Consideration of this sample size per region was decided to give an opportunity to investigate whether there are environmental or other local factors that could affect test performance, especially the reported weak positive bands.

Samples included and those excluded in the analysis

A total of 200 HIV positive and 200 HIV negative individuals from each of the sites involved were eligible. However, the analysis only considered those individuals whose samples were in quantities that would allow for all the tests under evaluation together with EIA tests to be performed. Those that did not have enough samples were excluded from the analysis. The analysis also excluded those that had discordant EIA tests.

Dates of sample collection for phase I

The samples were drawn from October 2010 to December 2012. More specifically, for Lacor, this was 4th October 2010 – 11th February 2011. MRC Kyamulibwa, 7th October 2010 - 5th October 2011; Mbarara: 14th October 2010 - 17th December 2010; RHSP: 12th October 2010 - 16th February 2011. AIC-Kampala: 23rd November 2010 - 15th April 2011; Jinja: 4th January 2011 - 13th December 2012 and Soroti: 13th September 2011 - 9th May 2012.

Dates of sample collection for phase II

TASO Masaka was 5th February 2015 to 11th March 2015; TASO Entebbe 25th February 2015 to 30th March 2015; Lacor Hospital 18th February 2015 to 20th April 2015 and AIC was 13th April 2015 to 29th June 2015.

To assess specificity, sensitivity, negative and positive predictive values

STATA 12 was used to calculate the sensitivity, specificity PPV and NPV. These values were calculated by first taking the WP results as positive and then removing the WP results from the calculation to determine the influence of WPs on false positive rates.

Receiver operating curves (ROC)

Though our major interest was Se, Sp, NPV and PPV, we used the STATA command roccomp to compare the ROC generated using the three best algorithms to the current national algorithm in use (Determine + Statpak + Unigold).

Developing an algorithm

Data were analyzed to determine the performance of individual tests. Kits that passed the above sensitivity criteria were considered as screening tests while kits that passed the specificity criteria were considered as confirmatory tests or as a tie breaker when evaluating an algorithm. For the algorithm evaluation, the study considered a serial testing model whereby specimens were considered as true negative if they reacted negatively on the first test. For specimens reactive on the first test a confirmatory test would be considered. If specimens were concordantly positive by the two assays, they were considered as true-positives. For discordantly reactive sera, the third test kit or tie breaker results in the algorithm would be considered as definitive.

To determine the proportion of HIV positive RTs with WP bands but which eventually turn out negative on EIA confirmation

In order to assess whether the WP scores were equally distributed in the four geographical regions for each RT, cross tabulation of the blood RT results against the EIA tests were generated indicating the percentage of WPs that returned positive or negative EIA results.

To ascertain the inter-reader variability using RTs

In order to determine the inter-reader variability of the RTs, two persons independently interpreted each test result, and the reader variability, expressed as percentage of sera for which different readers interpret test results differently was calculated for each RT.

Quality assurance

This study was conducted under a rigorous QA program which included among others, training of personnel, development of SOPs, coordination of sample transport and storage, equipment maintenance and proficiency testing. Samples were given unique identifiers in all the laboratories.

Phase II

In the second phase, the three algorithms selected in phase I were used in the field, at the POC for purposes of QC. Finger stick whole blood testing was done at three sites where two different algorithms per site were used. Matching plasma from all HIV positive specimens and 10% of HIV negative specimens were retested at the NRL at UVRI using the same testing algorithms as at the POC.

Before initiation of phase II, there was a change in the manufacture of Determine HIV1/2 from Abbott to Alere Medical Co Ltd., Japan. Although the kit composition did not change, we compared the performance of both kits and showed these two kits were not different in their performance (results not included).

Results

Figure 2 shows the logic framework for phase I and II.

Fig. 2
figure 2

The logic framework for phase I, the laboratory-based evaluation and phase II at the point of care

Phase I

In phase I, a total of 2800 participants were enrolled, with 400 samples tested at each of the 7 centers. However, the kits were evaluated using samples from 2746 participants i.e. 392 from AIC; Mbarara 387; Kyamulibwa 392; Rakai 399; Lacor 384; Jinja 395 and Soroti 397 participants. Those missing had either EIA test results that were discordant or samples collected were insufficient for the Gold standard test.

First Response and Carestart had more WP results; they scored 36.7% and 36% respectively. These were followed by Acon Triline (27.5%) and HIVSav (25.7%). StatPak (3.0%) followed by Determine (6.2%), Unigold (6.4%), Insti (11.4%) and SD Bioline (12.5%) had the least WP scores. All kits had a WP score in all participating centers (Table 1). Eastern (58.8%) followed by Central Uganda (53.8%) had significantly higher proportion of samples testing WP on any kit compared to Western (43.9%) and Northern Uganda (47.7%), table not shown. For all the kits evaluated, the exclusion of samples with WP bands improved the specificity and PPVs.

Table 1 Rapid test results by testing center

Table 2 shows the performance of the blood based kits with WP taken as positive and after removal of the WP (No WP). Determine had the highest sensitivity of 99.3% and 99.2% with and without WP respectively. Insti had the second best sensitivity of 99.2% and 98.9% with and without WP respectively. Medinostics had the best specificity of 99.1 and 99.8 with and without WP respectively. Unigold had the best specificity when WP were taken as positive and 9 RTs had specificity of ≥90 after removal of WPs.

Table 2 Summary of the kit performance for the blood based kits

Table 3 shows the performance of the oral kits. Aware had the highest sensitivity of 97% when WP were included and 96.4 after removal of WP. On the other hand, Oraquick had the highest sensitivity of 96.5% when WPs were excluded which increased to 96.8% when WP were included. Oraquick and Aware had the highest specificity of 98.5% when WPs were included and the highest specificity of 99.2% when WPs were excluded.

Table 3 Summary of performance of the Oral kits

After sensitivity and specificity analyses, it was observed that the best test kits that could be used as screening tests were Determine and Insti (with best sensitivity before and after removing results with WP bands). However, it was also observed that Medinostics had the “best” PPV of 99.1% before and 99.6% after the removal of results with a WP band. Medinostics also had the highest specificity of 99.1% before and 99.8% after removal of results with a WP band.

It was therefore decided to compare algorithms where Determine and Insti were the screening tests. Combinations of all kits that met the above criteria were then made and the performance (sensitivity, specificity, PPV, NPV, FPR, FNR) and cost of the different algorithms calculated. None of the 3 oral mucosal transudate kits were suitable for inclusion in an algorithm evaluation because of their low sensitivities.

Even though HIVsav passed the criteria for a screening kit, all algorithms in which it was included were eliminated because their sensitivities were lower than the current algorithm.

Based on the results in Tables 4, 18 algorithms had a performance better than the current national algorithm. The three algorithms indicated in Tables 4 and 5, 1) Determine + SD Bioline + Statpak; 2) Determine + Statpak + SD Bioline and 3) Determine + Statpak + Insti which performed better than the current algorithm and met the cost requirements are recommended as the new national algorithms. Algorithms containing Medinostics were all removed because the kit could not be procured by the study team.

Table 4 Summary of possible algorithms with a performance as good as or better than the currenta
Table 5 Summary of possible algorithms with a performance as good as or better than the current algorithm ranked in order of costa

Receiver operating curves

The ROC observed showed no statistically significant differences between the three selected algorithms and the currently used one. Determine + Statpak + SDBio to Determine + Statpak + Unigold, (p = 0.31); Determine + SDBio + Statpak to Determine + Statpak + Unigold, (p = 0.31); Determine + Statpak + Insti to Determine + Statpak + Unigold (p = 0.25). The ROC area for the three new algorithms was marginally wider when compared to that generated for the current national algorithm.

Inter-reader variability

Analyses done at three sites, Rakai, Jinja and Soroti showed that the inter-reader variability between two technicians for all the tests was low, in most cases below 1%, meaning that there were similarities in the interpretation of results (Table 6), exceptions were Jinja where for Carestart this was 1.26 and 2.01 for Double check and at Soroti where for Determine this was 2.42% and for Insti 3.63%. The majority of differences happened when one reader interpreted a result as having a weak positive band while another interpreted it as being truly positive. There were also cases where one reader interpreted result as having WP band while the other recorded as negative result.

Table 6 Inter reader variability

Antigens and technology used

Since it is advisable to use in combinations tests that use different antigens and technology, this was also examined for the kits used in the proposed three algorithms as shown in Table 7. One manufacturer did not provide this information.

Table 7 Antigens and technology used in the selected kits

Phase II

In phase II, sites tested a total of 2398 samples. A total of 901 site-specific results were included in the comparison and were compared to the NRL plasma EIA results. Results from AIC, TASO Entebbe and Lacor Hospital were 100% concordant with the NRL results. Concordance between results at the TASO Masaka site and NRL was 96.7% with 5 sample results being discordant with EIA results, 2 for one algorithm (Determine, Statpak, SD Bioline) and 3 for the other (Determine, Statpak and Insti). Two of the samples resulted in false negative EIA results while 3 resulted in false positive EIA results. The discordant results were all from one operator who joined later in the study and were attributed to inadequate training.

Discussion

This evaluation was necessary because there were indications that the current national algorithm which has been in use for quite some time may need to be changed. In addition, several new HIV RTs kits had been developed that may be better in terms of cost or ease of use if used as part of the national algorithm. There was also a demand to provide alternative algorithms for better service delivery. To achieve this, we used a more rigorous CDC/WHO recommended evaluation method compared to that which was used earlier.

In our study, the performance of the individual RTs varied. While the sensitivity of all kits except for four was within that recommended by WHO i.e. (≥ 99%) [17], the specificity of six kits was lower than the recommended ≥98%, if the WP bands are scored as positive leading to low PPV. A similar observation was reported before with some of these kits i.e. Determine and Unigold [14, 18]. There is a recent study that has also shown poor performance of some RTs in sub-Saharan Africa than the WHO evaluations [19], and this was not attributed to weak bands, which were rarely reported in their study. There are varied reasons for the performance differences, for instance, it was reported that the sensitivity of Insti could be enhanced by its ability to detect HIV gp41 IgM antibodies [20].

The performance of the oral based kits was lower than the blood based ones, especially their sensitivity and NPV which fell below the recommended WHO performance targets. We are aware that a few other studies in Africa have shown good performance of these oral based kits [21,22,23] and their use in self-testing is being promoted. Such a policy is also being developed in Uganda and our recommendation is to emphasize the need to perform other confirmatory tests based on blood assays after oral self-testing. Since all those involved in testing were well trained we do not think this poor performance of the oral based kits was due to the testing being performed by nurse counsellors as opposed to laboratory technician who performed the blood based RTs.

We have identified three algorithms whose cost is either lower or within 20% the price of the current one and with better performance. We recommend these algorithms to be adopted by the Ministry of Health (MOH) as the national algorithms for the country. The above approach gives providers a choice of algorithms that can be used especially if there are procurement challenges such as those we observed when one of the kits was temporarily withdrawn. We however recognize the challenges of moving from using one algorithm to three. There will be a need to have training and understanding new concepts including the possibility of interchanging the confirmatory and the tie breaker kits since two of the algorithms can use these interchangeably i.e. Determine + SD Bioline + Statpak or Determine + Statpak + SD Bioline. If the MOH eventually decides to reduce to two algorithms these two would be preferred since it simplifies training and procurement. On the other hand, if it is recommended to stick to only one algorithm as currently, and from the initial discussions this appears to be the preferred option, then the algorithm to recommend would be Determine + Statpak + SD Bioline, the reason being this would be very easy to introduce since there is only one change from the current algorithm, replacing Unigold as the tie breaker with SD Bioline. Our study therefore provides the policy makers with a number of options.

Though the ROC for the three selected algorithms did not significantly differ from the currently used one, the above performance in terms of Se, Sp, NVP and PPV met the pre-set performance requirements to guide the selection.

We also recommend that all algorithms that performed better than the current one are made available as alternatives for the private sector where choice is largely based on performance rather than cost. Currently different test algorithms are being used in the private sector guided by availability and unfortunately, some of these algorithms have not been evaluated locally. Our recommendation will provide the private sector with a number of alternatives; some of these have very high performance only limited by cost which may not be an issue for some clients. The challenge is that these are too many algorithms and training and QC programmes could be difficult to implement in such situations yet they are important components of HCT and RT scaling up [24].

From our study, we propose that kits being recommended should be used following the manufacturers criteria, scoring both strong and WP bands as positive. When serial testing as being proposed is followed, the performance of these kits would be improved and the intensity of the bands would be less of an issue. We however recognize the fact that reading/interpretation of RT strips is subjective and depends on a number of factors not least, the available light and the reader’s visual acuity. There are now efforts by manufacturers to provide automated readers that would address this subjectivity.

Our study did not identify significant geographical regional differences in the frequency of WP bands in Uganda possibly ruling out a local environment factor as responsible for these WP bands.

The screening tests recommended are very sensitive but with low specificity. Due to this, service providers should avoid giving final positive test results based on the screening test results alone as often reported, which could be false positive.

The use of highly specific confirmatory test ensures that a confirmed positive sample is likely to be a true positive. With a highly specific tie-breaker any concordance positive result with the confirmatory test would provide a definitive result. On the other hand, a rare discordance between a tie breaker and an equally specific confirmatory test would warrant a client to have a repeat test after 14 days, a strategy recommended by WHO [17] in high prevalence settings which we propose to adopt under the new national Algorithm. If the rate of HIV-inconclusive results is high, additional efforts to assure quality should be made, and the selection of assays might be reconsidered.

HIV misdiagnosis has recently been reported [25], raising concerns about RT algorithms. Our study however on the contrary has shown that with carefully selected RTs, we can achieve the WHO recommended testing algorithms performance. By further adopting the above additional WHO recommendations for high HIV prevalence settings, we expect increased test results accuracy. None of our best performing algorithms was tested in the Kosack study.

The rapid tests recommended had acceptable inter-reader variability within the WHO recommended < 5% [2], the minor differences were largely as a result of recording the intensity of the bands as either WP or being truly positive an issue which can be minimized if we follow the manufacturer’s guidelines and record all bands as positive. The finger stick testing at the POC was performed at acceptable standards except for one center in Masaka and this was by one individual, as reported above, this was attributed to poor training of this person who joined later in the study. We therefore recommend an active training and QC programme if we are to have an efficient HCT programme.

Since the oral based kits individually did not perform as well as the blood based ones, they are not recommended to be part of the national algorithm.

This HIV RT evaluation, has allowed our NRL to collect and store specimens from different parts of Uganda that will in future be available for QA/QC of new tests that come on market.

We recognize that the kits evaluated are all 3rd generation assays and currently new 4th generation tests have come on the market [6]. The advantage of these new kits is the early detection of recent seroconverters due to the inclusion of the p24 antigen detection. The performance of these new tests for the detection of acute infection are however still questionable [5, 6]. In this evaluation, we also explored the performance of one such kit i.e. the Alere Determine HIV-1/2 Ag/Ab Combo (data not shown). It had a good sensitivity equivalent to the 3rd generation kit, and a considerably better specificity. We were however unable to evaluate its accuracy in the detection of early seroconverters and therefore we could not make meaningful recommendations on its use.

Our study has a number of strengths; it is the largest rapid test evaluation in Uganda in terms of subject numbers and test kits. It is the first that has followed both the CDC/WHO recommended phase I and II approaches which in addition to laboratory, include POC field evaluations by non-laboratory workers. The study evaluated both the blood based and three commercially available oral based kits. Furthermore, the study followed very high QA/QC checks supervised by a highly experienced team with some of the laboratories used having various accreditations such as ISO 15189 and College of American Pathologist (CAP). The other strength of our study was that it was carried out with continuous consultations with the MOH and the C17 committee which advises on issues of HTC. Their recommendations were therefore taken into consideration.

The study had some challenges. There were difficulties in the procurement of some kits. As mentioned earlier, one very good kit Medinostics HIV1/2 Gold Rapid Screen Test was dropped because it could not be procured. The SD Bioline rapid HIV test was temporarily withdrawn by The U.S. President’s Emergency Plan for AIDS Relief (PEPFAR) and removed from the USAID list of approved rapid HIV test kits. During the period of the evaluation and as mentioned earlier Determine changed manufacture from Abbott to Alere Determine, which came on market. We had to evaluate this new test before moving to phase II. The study had planned to perform inter-lab evaluation but this was not possible due to logistical challenges. Finally, there were also some pressures from some manufacturers and suppliers to include their tests in the evaluation.

Like a few other studies in Africa that followed the CDC/WHO recommendations, it was expensive and took long to complete due to different factors including those mentioned above.

The limitations of this study included the fact that the criteria for a fresh blood sample may not have met the manufacture’s recommendations for some kits investigated in phase I of the study due to some delays in transfer of samples from the field to the laboratory. The evaluation of test kits using venous blood sample in phase I may not mimic that using finger stick in the field due to the time difference between collection and testing and also due to the possible interference of the anticoagulant. Whereas it was preferable to use samples from volunteers whose HIV status was unknown and avoid selection bias, the total sample size required to collect all the HIV positives using this approach was going to make the study very costly and hence samples were collected from both known and unknown HIV positive and negative study participants. We decided to use the CDC/WHO guidelines for the sample size as using area specific HIV prevalence resulted in a big sample that could not be used because of budget limitations. It was not logistically possible to evaluate all the 18 algorithms that performed better than the current algorithm in phase II, hence the performance that we report for fifteen of these is limited to the laboratory evaluation (phase I). Finally, the inter observer agreement using finger stick whole blood was not possible between the field site.

Conclusion

In this first evaluation of a national HIV RT algorithm in Uganda using the CDC/WHO recommended method, we have come up with three algorithms we recommend for public or Government procurement based on accuracy and cost, i.e. Determine + SD Bioline + Statpak; Determine + Statpak + SD Bioline and Determine + Statpak + Insti. If it is recommended to stick to only one algorithm as currently, then the algorithm to recommend would be Determine + Statpak + SD Bioline, replacing Unigold as the tie breaker with SD Bioline. We further recommend that all the 18 algorithms that have shown better performance than the current one are all made available to the private sector especially where cost is not a limiting factor.