Data Based Modeling

  • James R. Thompson
Part of the Studies in Computational Intelligence book series (SCI, volume 605)


Statisticians spend a great deal of time coming up with tests that are frequently useless in practice and then proving the asymptotic optimality of the tests under the assumption of conditions that do not actually exist. There is another approach: we can use the data to build models. This is the goal of Tukey’s “Exploratory Data Analysis.” In this paper I will be examining alternatives to the Neoclassical Analysis of the stock market, the dominant view still in schools of business, data notwithstanding. Then I will give a brief analysis of the breakdown of America’s public health service in stopping the progression of the AIDS epidemic and demonstrate that simply closing down the gay bathhouses would have prevented AIDS from progressing from an endemic to a full blown epidemic which has already killed more Americans than died in the two world wars.


Keywords: Capital market line · Simugram · MaxMedian · AIDS

1 The Introduction

There is a tendency for mathematicians and statisticians (including “applied” ones) to believe that in the vast literature of theoretical tools, there must be one appropriate to the problem at hand. This is generally not the case. This fact has been emphasized by Marian Rejewski, who cracked the Enigma code used by the German armed forces, and most sophisticatedly by the German Navy. Dr. Rejewski was not just a theoretical mathematician, but one who had four years of statistical training at Göttingen. Given the task in 1931, he tried the rich panoply of techniques he had learnt to no effect. Bydgoszcz, where he attended high school, was part of the German chunk of partitioned Poland. So as a cadet in high school he learned much about the eccentricities used in military and naval German reports. For example, memos started with a line beginning with “Von” followed by a second line starting with “Zu”. Then turning linguist and cultural sociologist, Rejewski built up a template of forms that must be used in military discourse. At the end of the day, he had reduced the number of feasible combinations in Enigma from \(10^{92}\) to a manageable 100,000. Every time the Germans changed the code, the Rejewski team, using a few dozen cypher clerks, could come up with the settings used in the new format in a week or so. (It should be noted in passing that the submarine codes could only be changed when submarines were docking at German occupied ports and that the SS never departed from the original settings of 1932.)

The British have always minimized the fact that it was the Poles who cracked Enigma. However, Rejewski and his crew saved the British from a starvation-induced peace with the Nazis. Rejewski’s filtering “bombe” was the first digital computer and the coding is correctly viewed as proto-Unix. It is usually the case that real world problems require stepping outside the standard tool boxes of mathematics and statistics.

2 If Only the Market Were a Martingale (But It Is Not)

One way to express the Weak Form of the Efficient Market Hypothesis is to require that stocks have martingale structure, i.e., for a stock S(t), the expected value at any future time \(t+r\) is S(t). In other words, a stock which has been going up for the last 10 sessions is no more worthy an investment than a stock which has gone down for the last 10 sessions. This is counterintuitive, but has been the basis of several Nobel Prizes in Economics. One of these belongs to William Sharpe for his Capital Market Theory [1, 2].
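Stated as a formula, in what is a standard way of writing the martingale property (the explicit conditioning on the price history up to time \(t\) is an assumption about the intended meaning, not notation taken from the text):
$$\begin{aligned} E\left[ S(t+r) \mid S(u),\ u \le t \right] = S(t), \qquad r > 0. \end{aligned}$$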

If we may assume that investors behave in a manner consistent with the Efficient Market Hypothesis, then certain statements may be made about the nature of capital markets as a whole. Before a complete statement of capital market theory may be advanced, however, certain additional assumptions must be presented:
  1. The \(\mu \) and \(\sigma \) of a portfolio adequately describe it for the purpose of investor decision making [\(U = f( \sigma , \mu ) \)].

  2. Investors can borrow and lend as much as they want at the riskless rate of interest.

  3. All investors have the same expectations regarding the future, the same portfolios available to them, and the same time horizon.

  4. Taxes, transactions costs, inflation, and changes in interest rates may be ignored.

Under the above assumptions, all investors will have identical opportunity sets, borrowing and lending rates (\(r_L = r_B\)) and, thus, identical optimal borrowing-lending portfolios, say X (see Fig. 1).
Fig. 1 The capital market line

The borrowing-lending line for the market as a whole is called the Capital Market Line (CML). The “market portfolio” (X) employed is the total universe of available securities, each weighted by its total market value relative to that of all the stocks in the market universe, by the reasoning given above. The CML is linear and it represents the combination of a risky portfolio (X) and a riskless security (a T-Bill). One use made of the CML is that its slope provides the so-called market price of risk, that is, the amount of increased return required by market conditions to justify the acceptance of an increment to risk:
$$\begin{aligned} \text{ slope } = \frac{\mu (X) -r}{\sigma (X)} . \end{aligned}$$
The simple difference \(\mu (X)-r\) is called the equity premium, or the expected return differential for investing in risky equities rather than riskless debt.
This very elegant result of Sharpe indicates that one simply cannot do better than invest along the Sharpe Superefficient Frontier (CML). Of course, if one wishes to invest on “autopilot” there are ways to do so. John Bogle has effectively and non-mathematically argued [3] that the advice of investment counsellors is, in general, not worth their fees. Many years ago, he founded the Vanguard S&P 500 fund (among others), which maintains a portfolio balanced according to the market cap values of each of the members of the Standard and Poor’s selected basket of top 500 stocks. Thus the weight of investment in the \(i\)th stock would be
$$\begin{aligned} w_i = \frac{V_i}{\sum _j V_j} \end{aligned}$$
where \(V_i\) is the total market value of all the stocks in company \(i\). Interestingly, Bogle’s strategy is actually very close to the “total market index fund” suggested by Nobel laureate William Sharpe. However, Thompson et al. [4] looked back at 50,000 randomly selected portfolios from the 1,000 largest market cap stocks over a period of 40 years. They discovered that over half lie above the CML. How it has been that Efficient Market Hypothesis (EMH) enthusiasts apparently failed to crunch the numbers is a matter of conjecture. Nor is this result surprising, since the Standard and Poor Index fund over this period has averaged an annual return of somewhat in excess of 10 % while Buffett’s Berkshire-Hathaway has delivered over 20 % (Fig. 2).
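As a minimal illustrative sketch of the weighting formula above, the following Python computes market cap weights \(w_i = V_i/\sum _j V_j\) and, for contrast, equal weights. The function names, tickers, and market values are hypothetical and are not taken from any real index.

```python
def market_cap_weights(market_values):
    """Return w_i = V_i / sum_j V_j for a dict {ticker: market value V_i}."""
    total = sum(market_values.values())
    if total <= 0:
        raise ValueError("total market value must be positive")
    return {ticker: value / total for ticker, value in market_values.items()}


def equal_weights(tickers):
    """Return the equal-weight alternative: 1/n for each of n tickers."""
    tickers = list(tickers)
    return {ticker: 1.0 / len(tickers) for ticker in tickers}


# Hypothetical market values (in billions of dollars), for illustration only.
values = {"AAA": 300.0, "BBB": 100.0}
cap = market_cap_weights(values)   # the larger company dominates
equal = equal_weights(values)      # every company gets the same share
```

In a cap-weighted fund the largest companies dominate the portfolio, whereas equal weighting gives every member the same share regardless of size.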
Fig. 2 Randomly selected portfolios beating the super efficient frontier portfolios

Fig. 3 Market cap weight versus equal weight

When we see that randomly selected portfolios frequently lie above the Capital Market Line, we are tempted to see what happens when we make a selection based on equal weights of, for example, the Standard and Poor 100. We shall also demonstrate the results of Thompson’s patented Simugram portfolio selection algorithm [5]. Space does not permit a discussion of this algorithm. Suffice it to say that though quite different from the fundamental analysis of Buffett, it achieves roughly the same results. During the economic shocks caused by the market collapse of 2008–2009, both the Simugram and the analysis of Buffett proved themselves nonrobust against massive intervention of the Federal Reserve Bank to save the large banks. (Now that QE3 has ended, Simugram appears to be working again.)

We show such a comparison in Fig. 3.

3 “Everyman’s” MaxMedian Rule for Portfolio Management

If index funds, such as Vanguard’s S&P 500, are popular (and with some justification they are), this is partly due to the fact that the market cap weighted portfolio of stocks in the S&P 500 of John Bogle (which is slightly different from a total market fund) has had small operating fees over several decades, currently less than 0.1 %, compared to fund management rates elsewhere that are typically around 40 times that of Vanguard. And, with dividends thrown in, it produces around a 10 % return. Many people prefer large cap index funds like those of Vanguard and Fidelity. The results of managed funds have not been encouraging overall, although those dealing with people like Peter Lynch and Warren Buffett have done generally well. John Bogle probably did not build his Vanguard funds because of any great faith in fatwahs coming down from the EMH professors at the University of Chicago. Rather, he was arguing that investors were paying too much for the “wisdom” of the fund managers. There is little question that John Bogle has greatly benefited the middle class investor community.

That being said, we have shown earlier that market cap weighted funds do no better (actually worse) than those selected by random choice. It might, then, be argued that there are nonrandom strategies which the individual investor could use to his/her advantage. For example, if one had invested in the stocks of the S&P 100 with equal weight over the last 40 years rather than by weighting according to market cap, one would have experienced a significantly higher annual growth (our backtest revealed as much as a 5 % per year difference in favor of the equal weighted portfolio). We remind the reader that the S&P 100 universe has been selected from the S&P 500 according to fundamental analysis and balance. Moreover, the downside losses in bad years would have been less than with a market cap weighted fund. It would be nice if we could come up with a strategy which kept only 20 stocks in the portfolio. If one is into managing one’s own portfolio, it would appear that Baggett and Thompson [6], using a portfolio size of only 20 stocks, did about as well with their MaxMedian Rule as the equal weight of the S&P 100. I am harking back to the old morality play of “Everyman” where the poor average citizen moving through life is largely abandoned by friends and advisors except for Knowledge who assures him “Everyman, I will accompany you and be your guide.”

The MaxMedian Rule [6] of Baggett and Thompson, given below, is easy to use and appears to beat the Index, on the average, by up to an annual multiplier of 1.05, an amount which is additionally enhanced by the power of compound interest. Note that \((1.15/1.10)^{45} = 7.4\), a handy bonus to one who retires after 45 years. A purpose of the MaxMedian Rule was to provide individual investors with a tool which they could use and modify without the necessity of massive computing. Students in my classes have developed their own paradigms, such as the MaxMean Rule. In order to use such rules, one need only purchase for a very modest one time fee the Yahoo base hquotes program from (The author owns no portion of the hquotes company).

The MaxMedian Rule
  1. For the 500 stocks in the S&P 500, look back at the daily prices \(S(j,t)\) for the preceding year.

  2. Compute the day to day ratios \(r(j,t) = S(j,t)/S(j,t-1)\).

  3. Sort these for the year’s trading days.

  4. Discard all r values equal to one.

  5. Look at the 500 medians of the ratios.

  6. Invest equally in the 20 stocks with the largest medians.

  7. Hold for one year, then liquidate.
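The steps of the rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors’ own implementation: the function name, the input layout (a dictionary of daily closing prices per ticker, oldest first, all positive), and the toy data are assumptions made here.

```python
import statistics


def max_median_portfolio(prices, n_stocks=20):
    """Pick the n_stocks tickers with the largest median day-to-day price ratio.

    prices: dict {ticker: list of daily closing prices for the preceding
            year, oldest first}. Returns {ticker: weight} with equal weights.
    """
    medians = {}
    for ticker, series in prices.items():
        # Step 2: day-to-day ratios r(j,t) = S(j,t) / S(j,t-1).
        ratios = [today / yesterday for yesterday, today in zip(series, series[1:])]
        # Step 4: discard ratios equal to one (days with no price change).
        ratios = [r for r in ratios if r != 1.0]
        if ratios:
            # Steps 3 and 5: statistics.median sorts the ratios internally.
            medians[ticker] = statistics.median(ratios)
    # Step 6: invest equally in the stocks with the largest medians.
    chosen = sorted(medians, key=medians.get, reverse=True)[:n_stocks]
    if not chosen:
        return {}
    return {ticker: 1.0 / len(chosen) for ticker in chosen}


# Hypothetical three-stock universe, for illustration only.
toy_prices = {
    "UP": [100.0, 101.0, 102.0, 103.5],
    "FLAT": [50.0, 50.0, 50.0, 50.1],
    "DOWN": [80.0, 79.0, 78.5, 77.0],
}
portfolio = max_median_portfolio(toy_prices, n_stocks=2)
```

With real data one would pass a year of prices for all 500 stocks and keep the default of 20; step 7 (hold for one year, then liquidate) is a trading decision and is not modeled here.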

Fig. 4 A comparison of three investment strategies

In Fig. 4 we examine the results of putting one present value dollar into play in three different investments: 5 % yielding T-Bill, S&P 500 Index Fund, MaxMedian Rule. First, we shall do the investment simply without avoiding the intermediate taxing structure. The assumptions are that interest income is taxed at 35 %; capital gains and dividends are taxed at 15 %; and inflation is 2 %. As we see, the T-Bill invested dollar is barely holding its one dollar value over time. The consequences of such an investment strategy are disastrous as a vehicle for retirement. On the other hand, after 40 years, the S&P 500 Index Fund dollar has grown to 11 present value dollars. The MaxMedian Rule dollar has grown to 55 present value dollars. Our investigations indicate that the MaxMedian Rule performs about as well as an equal weighted S&P 100 portfolio, though the latter has somewhat less downside in bad years. Of course, it is difficult for the individual investor to buy into a no load equal weight S&P 100 index fund. So far as the author knows, none currently exist, though equal weighted S&P 500 index funds do (the management fees seem to be in the 0.50 % range). For reasons not yet clear to the author, the advantage of the equal weight S&P 500 index fund is only 2 % greater than that of the market cap weight S&P 500. Even so, when one looks at the compounded advantage over 40 years, it appears to be roughly a factor of two. It is interesting to note that the bogus Ponzi scheme of Madoff claimed returns which appear to be legally attainable either by the MaxMedian Rule or the equal weight S&P 100 rule. This leads the author to the conclusion that most of the moguls of finance and the Federal Reserve Bank have very limited data analytical skills or even motivation to look at market data.
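The arithmetic behind such a comparison can be sketched as follows. This is a simplified reading, not necessarily the calculation behind Fig. 4: it assumes the whole of each year’s return is taxed as it is earned, and it takes nominal annual returns of 10 % for the index fund and 15 % for the MaxMedian Rule, the approximate figures quoted earlier. The function name is hypothetical.

```python
def real_growth(nominal_return, tax_rate, inflation, years):
    """Present-value growth of one dollar, taxing each year's return as earned."""
    after_tax = 1.0 + nominal_return * (1.0 - tax_rate)
    real_factor = after_tax / (1.0 + inflation)
    return real_factor ** years


# Assumed nominal returns: 5 % for the T-Bill (from the text), and 10 % and
# 15 % for the index fund and the MaxMedian Rule (approximate figures quoted
# earlier). Tax rates and the 2 % inflation rate are those stated in the text.
scenarios = {
    "T-Bill": real_growth(0.05, 0.35, 0.02, 40),
    "S&P 500 Index Fund": real_growth(0.10, 0.15, 0.02, 40),
    "MaxMedian Rule": real_growth(0.15, 0.15, 0.02, 40),
}
for name, value in scenarios.items():
    print(f"{name}: {value:.1f} present value dollars")
```

Under these assumptions the index fund and MaxMedian Rule dollars should come out at roughly 12 and 55 present value dollars, of the same order as the 11 and 55 quoted above, while the T-Bill dollar grows only to somewhat under 2.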

3.1 Investing in a 401-k

Money invested in a 401-k plan avoids all taxes until the money is withdrawn, at which time it is taxed at the current level of tax on ordinary income. In Table 1, we demonstrate the results of making an annual inflation adjusted $5,000 contribution to a 401-k for 40 years, using different assumptions of annual inflation rates. ($5,000 is very modest but that sum can be easily adjusted.) All values are in current value dollars.
Table 1 40 year end results of three 401-k strategies (columns: annual inflation rates of 2 %, 3 %, 5 %, and 8 %; surviving row label: S&P Index)


We recall that when these dollars are withdrawn, taxes must be paid. So, in computing the annual cost of living, one should figure in the tax burden. Let us suppose the cost of living including taxes for a family of two is $70,000 beyond Social Security retirement checks. We realize that the 401-k portion which has not been withdrawn will continue to grow (though the additions from salary will have ceased upon retirement). Even for the unrealistically low inflation rate of 2 % the situation is not encouraging for the investor in T-bills. Both the S&P Index holder and the MaxMedian holder will be in reasonable shape. For the inflation rate of 5 %, the T-bill holder is in real trouble. The situation for the Index Fund holder is also risky. The holder of the MaxMedian Rule portfolio appears to be in reasonable shape. Now, by historical standards, 5 % inflation is high for the USA. On the other hand, we observe that the decline of the dollar against the Euro during the Bush Administration was as high as 8 % per year.

Hence, realistically, 8 % could be a possible inflation rate for the future in the United States. In such a case, of the three strategies considered, only the return available from the MaxMedian Rule leaves the family in reasonable shape. Currently, even the Euro is inflation-stressed due to the social welfare excesses of some of the Eurozone members. From a societal standpoint, it is not necessary that an individual investor achieve spectacular returns. What is required is effectiveness, robustness, transparency, and simplicity of use so that the returns will be commensurate with the normal goals of families: education of children, comfortable retirement, etc. Furthermore, it is within the power of the federal government to bring the economy to such a pass where even the prudent cannot make do. The history of modern societies shows that high rates of inflation cannot be sustained without some sort of revolution, such as that which occurred at the end of the Weimar Republic. Unscrupulous bankers encourage indebtedness on the unwary, taking their profits at the front end and leaving society as a whole to pick up the bill. Naturally, as a scientist, I would hope that the empirical rules such as the MaxMedian approach of Baggett and Thompson will lead to fundamental insights about the market and the economy more generally. Caveat: The MaxMedian Rule is freeware, not quality assured or extensively tested. If you use it, remember what you paid for it. The goal of the MaxMedian Rule is to enable the individual investor to develop his or her own portfolios without the assistance of generally overpriced and underachieving investment fund managers. The investor gets to use all sorts of readily available information in public libraries, e.g., Investor’s Business Daily. Indeed, many private investors will subscribe to IBD as well as to other periodicals.
Obviously, even if a stock is recommended by the MaxMedian rule (or any rule) and there is valuable knowledge, such as that the company represented by the stock is under significant legal attack for patent infringement, oil spills, etc., exclusion of the stock from the portfolio might be indicated. The bargain brokerage Fidelity provides abundant free information for its clients and generally charges less than 8 dollars per trade.

Obviously, one might choose rather a MaxMean rule or a Max 60 Percentile rule or an equal weight Index rule. The MaxMedian was selected to minimize the optimism caused by the long right hand tails of the log normal curves of stock progression. MaxMean is therefore more risky. There are many such rules which might be tested by a forty year backtest. My goal is not to push the MaxMedian Rule or the MaxMean Rule or the equal weight S&P 100 rule or any rule, but rather to allow the intelligent investor to invest without paying vast sums to overpriced and frequently clueless MBAs. If, at the end of the day, the investor chooses to invest in market cap based index funds, that is suboptimal but not ridiculous. What is ridiculous is not to work hard to understand as much as practicable about investment. This chapter is a very good start. It has to be observed that at this time in history, investment in US Treasury Bills or bank CDs would appear to be close to suicidal. Both the Federal Reserve and the investment banks are doing the American middle class no good service. 0.1 % return on Treasury Bills is akin to theft, and what some of the investment banks do is akin to robbery. By lowering the interest rate to nearly zero, the Federal Reserve has damaged the savings of the average citizen and laid the groundwork for future high inflation. The prudent investor is wise to invest in stocks rather than in bonds.

I have no magic riskless formula for getting rich. Rather, I shall offer some opinions about alternatives to things such as buying T-Bills. Investing in market cap index funds is certainly suboptimal. However, it is robustness and transparency rather than optimality which should be the goal of the prudent investor. It should be remembered that most investment funds do charge the investor a fair amount of his/her basic investment whatever be the results. The EMH is untrue and does not justify investment in a market cap weighted index fund. However, the fact is that, with the exception of such gurus as Warren Buffett and Peter Lynch, the wisdom of the professional market forecaster seldom justifies the premium of the guru’s charge. There are very special momentum based programs (on one of which the author holds a patent), in which the investor might do well. However, if one simply manages one’s own account, using MaxMedian or MaxMean within an IRA, it would seem to be better than trusting in gurus who have failed again and again. Berkshire-Hathaway has proved to be over the years a vehicle which produces better than 20 % return. For any strategy that the investor is considering, backtesting for, say, 40 years, is a very good idea. That is not easy to achieve with equal weight funds, since they have not been around very long. Baggett and Thompson had to go back using raw S&P 100 data to assess the potential of an S&P 100 equal weight fund. If Bernie Madoff had set up such a fund, he might well have been able to give his investors the 15 % return he promised but did not deliver.

The United States government’s forcing of commercial banks to grant mortgage loans to persons unlikely to be able to repay them, together with its willingness to allow commercial banks to engage in speculative derivative sales, is the driving force behind the market collapse of the late Bush Administration and the Obama Administration. Just the war cost part of the current crisis, due to what Nobel Laureate Joseph Stiglitz has described as something beyond a three trillion dollar war in the Middle East, has damaged both Berkshire-Hathaway’s and other investment strategies. To survive in the current market situation, one must be agile indeed. Stiglitz keeps upping his estimates of the cost of America’s war in the Middle East. Anecdotally, I have seen estimates as high as six trillion dollars. If we realize that the cost of running the entire US Federal government is around three trillion dollars per year, then we can see what a large effect Bush’s war of choice has had on our country’s aggregate debt. This fact alone would indicate that a future damaging inflation is all but certain. To some extent, investing in the stock market could be viewed as a hedge against inflation.

In the next section, we will examine another cause of degradation and instability in the economy, the failure of the Centers for Disease Control to prevent the AIDS endemic from becoming an AIDS epidemic.

4 AIDS: A New Epidemic for America

In 1983, I was investigating the common practice of using stochastic models in dealing with various aspects of diseases. Rather than considering a branching process model for the progression of a contagious disease, it is better to use differential equation models of the mean trace of susceptibles and infectives. At this time AIDS had infected only a few hundred in the United States and was still sometimes referred to as GRID (Gay-Related Immune Deficiency). The more politically correct name of AIDS soon replaced it.

Even at the very early stage of an observed United States AIDS epidemic, several matters appeared clear to me:
  • The disease favored the homosexual male community and outbreaks seemed most noticeable in areas with sociologically identifiable gay communities.

  • The disease was also killing (generally rather quickly) people with acute hemophilia.

  • Given the virologist’s maxim that there are no new diseases, AIDS in the United States had been identified starting around 1980 because of some sociological change. A disease endemic under earlier norms, it had blossomed into an epidemic due to a change in society.

At the time, which was before the HIV virus had been isolated and identified, there was a great deal of commentary both in the popular press and in the medical literature (including that of the Centers for Disease Control) to the effect that AIDS was a new disease. Those statements were not only false but were also potentially harmful. First of all, from a practical virological standpoint, a new disease might have as a practical implication genetic engineering by a hostile foreign power. This was a time of high tension in the Cold War, and such an allegation had the potential for causing serious ramifications at the level of national defense.

Secondly, treating an unknown disease as a new disease essentially removes the possibility of stopping the epidemic sociologically by simply seeking out and removing (or lessening) the cause(s) that resulted in the endemic being driven over the epidemiological threshold.

For example, if somehow a disease (say, the Lunar Pox) has been introduced from the moon via the bringing in of moon rocks by American astronauts, that is an entirely different matter from, say, a mysterious outbreak of dysentery in St. Louis. For dysentery in St. Louis, we check food and water supplies, and quickly look for “the usual suspects”: unrefrigerated meat, leakage of toxins into the water supply, and so on. Given proper resources, eliminating the epidemic should be straightforward.

For the Lunar Pox, there are no usual suspects. We cannot, by reverting to some sociological status quo ante, solve our problem. We can only look for a bacterium or virus and try for a cure or vaccine. The age-old way of eliminating an epidemic by sociological means is difficult—perhaps impossible.

In 1982, it was already clear that the United States public health establishment was essentially treating AIDS as though it were the Lunar Pox. The epidemic was at levels hardly worthy of the name in Western Europe, but it was growing. Each of the European countries was following classical sociological protocols for dealing with a venereal disease. These all involved some measure of defacilitating contacts between infectives and susceptibles. The French demanded bright lighting in gay “make-out” areas. Periodic arrests of transvestite prostitutes in the Bois de Boulogne were widely publicized. The Swedes took much more draconian steps, mild in comparison with those of the Cubans. The Americans took no significant sociological steps at all.

However, as though following the Lunar Pox strategy, the Americans outdid the rest of the world in money thrown at research related to AIDS. Some of this was spent on isolating the unknown virus. However, it was the French, spending pennies to the Americans’ dollars, at the Pasteur Institute who first isolated HIV. In the intervening 30 years since isolation of the virus, no effective vaccine or cure has been produced.

4.1 Why Was the AIDS Epidemic so Much More Prevalent in America Than in Other First World Countries?

Although the popular press in the early 1980s talked of AIDS as being a new disease, prudence and experience indicated that it was not. Just as new species of animals have not been noted during human history, the odds for a sudden appearance (absent genetic engineering) of a new virus are not good. My own discussions with pathologists with some years of experience gave anecdotal cases of young Anglo males who had presented with Kaposi’s sarcoma at times going back to early days in the pathologists’ careers. This pathology, previously seldom seen in persons of Northern European extraction, now widely associated with AIDS, was at the time simply noted as isolated and unexplained. Indeed, a few years after the discovery of the HIV virus, HIV was discovered in decades old refrigerated human blood samples from both Africa and America.

Although it was clear that AIDS was not a new disease, as an epidemic it had never been recorded as such. Because some early cases were from the Congo, there was an assumption by many that the disease might have its origins there. Record keeping in the Congo was not and is not very good. But Belgian colonial troops had been located in that region for many years. Any venereal disease acquired in the Congo should have been vectored into Europe in the 19th century. But no AIDS-like disease had been noted. It would appear, then, that AIDS was not contracted easily as is the case, say, with syphilis. Somehow, the appearance of AIDS as an epidemic in the 1980s, and not previously, might be connected with higher rates of promiscuous sexual activity made possible by the relative affluence of the times.

Then there was the matter of the selective appearance of AIDS in the American homosexual community. If the disease required virus in some quantity for effective transmission (the swift progression of the disease in hemophiliacs plus the lack of notice of AIDS in earlier times gave clues that such might be the case), then the profiles in Figs. 5 and 6 give some idea of why the epidemic seemed to be centered in the American homosexual community. If passive to active transmission is much less likely than active to passive, then clearly the homosexual transmission patterns facilitate the disease more than the heterosexual ones.
Fig. 5 Heterosexual transmission of AIDS

Fig. 6 Homosexual transmission of AIDS

One important consideration that seemed to have escaped attention was the appearance of the epidemic in 1980 instead of 10 years earlier. Gay lifestyles had begun to be tolerated by law enforcement authorities in the major urban centers of America by the late 1960s. If homosexuality was the facilitating behavior of the epidemic, then why no epidemic before 1980? Of course, believers in the “new disease” theory could simply claim that the causative agent was not present until around 1980. In the popular history of the early American AIDS epidemic, And the Band Played On, Randy Shilts points at a gay flight attendant from Quebec as a candidate for “patient zero.” But this “Lunar Pox” theory was not a position that any responsible epidemiologist could take (and, indeed, as pointed out, later investigations revealed HIV samples in human blood going back into the 1940s).

What accounts for the significant time differential between civil tolerance of homosexual behavior prior to 1970 and the appearance of the AIDS epidemic in the 1980s? Were there some other sociological changes that had taken place in the late 1970s that might have driven the endemic over the epidemiological threshold?

It should be noted that in 1983, data were skimpy and incomplete. As is frequently the case with epidemics, decisions need to be made at the early stages when one needs to work on the basis of skimpy data, analogy with other historical epidemics, and a model constructed on the best information available.

I remember in 1983 thinking back to the earlier American polio epidemic that had produced little in the way of sociological intervention and less in the way of models to explain the progress of the disease. Although polio epidemics had been noted for some years (the first noticed epidemic occurred around the time of World War I in Stockholm), the American public health service had indeed treated it like the “Lunar Pox.” That is, they discarded sociological intervention based on past experience of transmission pathways and relied on the appearance of vaccines at any moment. They had been somewhat lucky, since Dr. Jonas Salk started testing his vaccine in 1952 (certainly they were luckier than the thousands who had died and the tens of thousands who had been permanently crippled). But basing policy on hope and virological research was a dangerous policy (how dangerous we are still learning as we face the reality of 650,000 Americans dead by 2011 from AIDS). I am unable to find the official CDC death count in America as of the end of 2014, but a senior statistician colleague from CDC reckons that 700,000 is not unreasonable.

Although some evangelical clergymen inveighed against the epidemic as divine retribution on homosexuals, the function of epidemiologists is to use their God-given wits to stop epidemics. In 1983, virtually nothing was being done except to wait for virological miracles.

One possible candidate was the turning of a blind eye by authorities to the gay bathhouses that started in the late 1970s. These were places where gays could engage in high frequency anonymous sexual contact. By the late 1970s they were allowed to operate without regulation in the major metropolitan centers of America. My initial intuition was that the key was the total average contact rate among the target population. Was the marginal increase in the contact rate facilitated by the bathhouses sufficient to drive the endemic across the epidemiological threshold? It did not seem likely. Reports were that most gays seldom (many, never) frequented the bathhouses.

In the matter of the present AIDS epidemic in the United States, a great deal of money is being spent. However, practically nothing in the way of steps for stopping the transmission of the disease is being done (beyond education in the use of condoms). Indeed, powerful voices in the Congress speak against any sort of government intervention. On April 13, 1982, Congressman Henry Waxman [7] stated in a meeting of his Subcommittee on Health and the Environment, “I intend to fight any effort by anyone at any level to make public health policy regarding Kaposi’s sarcoma or any other disease on the basis of his or her personal prejudices regarding other people’s sexual preferences or life styles.” (It is significant that Representative Waxman has been one of the most strident voices in the fight to stop smoking and global warming, considering rigorous measures acceptable to end these threats to human health.)

In light of Congressman Waxman’s warnings, it would have taken brave public health officials to close the gay bathhouses. We recall how Louis Pasteur had been threatened with the guillotine if he insisted on proceeding with his rabies vaccine and people died as a result. He proceeded with the testing, starting on himself. There were no Louis Pasteurs at the CDC. The Centers for Disease Control has broad discretionary powers, and its members wear military uniforms to indicate their authority. They have no tenure, however. The Director of the CDC could have closed the bathhouses, but that would have been an act of courage which could have ended his career. Of all the players in the United States AIDS epidemic, Congressman Waxman may be more responsible than any other for what has turned out to be a death tally exceeding that of any of America’s wars, including its most lethal, the American War Between the States (aka the Civil War).

5 The Effect of the Gay Bathhouses

But perhaps my intuitions were wrong. Perhaps it was not only the total average contact rate that was important, but a skewing of contact rates, with the presence of a high activity subpopulation (the bathhouse customers) somehow driving the epidemic. It was worth a modeling try.

The model developed in [8] considered the situation in which there are two subpopulations: the majority, less sexually active, and a minority with greater activity than that of the majority. We use the subscript “1” to denote the majority portion of the target (gay) population, and the subscript “2” to denote the minority portion. The latter subpopulation, constituting fraction p of the target population, will be taken to have a contact rate \(\tau \) times the rate k of the majority subpopulation. The following differential equations model the growth of the number of susceptibles \(X_i\) and infectives \(Y_i\) in subpopulation i (\(i = 1,2\)).
$$\begin{aligned} \frac{dY_{1}}{dt}= & {} \frac{k \alpha X_{1} (Y_{1} + \tau Y_{2} )}{ X_{1} + Y_{1} + \tau (Y_{2} + X_{2} ) } - ( \gamma + \mu ) Y_{1}, \nonumber \\ \frac{dY_{2}}{dt}= & {} \frac{k \alpha \tau X_{2} (Y_{1} + \tau Y_{2} )}{ X_{1} + Y_{1} + \tau (Y_{2} + X_{2} ) } - ( \gamma + \mu ) Y_{2},\\ \frac{dX_{1}}{dt}= & {} - \frac{k \alpha X_{1} (Y_{1} + \tau Y_{2} )}{ X_{1} + Y_{1} + \tau (Y_{2} + X_{2} ) }+(1-p) \lambda - \mu X_{1},\nonumber \\ \frac{dX_{2}}{dt}= & {} - \frac{k \alpha \tau X_{2} (Y_{1} + \tau Y_{2} )}{ X_{1} + Y_{1} + \tau (Y_{2} + X_{2} ) } + p \lambda - \mu X_{2}. \nonumber \end{aligned}$$

k = number of contacts per month,

\(\alpha \) = probability of contact causing AIDS,

\(\lambda \) = immigration rate into the population,

\(\mu \) = emigration rate from the population,

\(\gamma \) = marginal emigration rate from the population due to sickness and death.

In Thompson [8], it was noted that if we started with 1,000 infectives in a target population with \(k \alpha = 0.05\), \(\tau = 1\), a susceptible population of 3,000,000 and the best guesses then available (\(\mu = 1/(15 \times 12)= 0.00556\), \(\gamma = 0.1\), \(\lambda = 16{,}666\)) for the other parameters, the disease advanced as shown in Table 2.
Table 2 Extrapolated AIDS cases: \(k \alpha =0.05\), \(\tau = 1\) (columns: cumulative deaths, fraction infective)

Next, a situation was considered in which the overall contact rate was the same as in Table 2, but skewed: the more sexually active subpopulation 2 (10% of the target population) had contact rates 16 times those of the less active subpopulation.
Table 3 Extrapolated AIDS cases: \(k \alpha =0.02\), \(\tau = 16\), \(p=0.10\) (columns: cumulative deaths, fraction infective)

Even though the overall average contact rate in Tables 2 and 3 is the same, \((k \alpha )_\mathrm{overall} = 0.05\) (for Table 3, \(0.02 \times (0.9 + 16 \times 0.1) = 0.05\)), the situation is dramatically different in the two cases. Here, it seemed, was a prima facie explanation as to how AIDS was pushed over the threshold to a full-blown epidemic in the United States: a small but sexually very active subpopulation.
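The comparison between the two scenarios can be explored numerically. The following is a minimal sketch, not the code behind Tables 2 and 3: it integrates the four model equations with a hand-written fourth-order Runge-Kutta step, using the parameter values quoted above and time measured in months. Several choices are assumptions made only for illustration: the initial 1,000 infectives and 3,000,000 susceptibles are split between the two subpopulations in proportion to their sizes, cumulative deaths are taken to be the integral of \(\gamma (Y_1 + Y_2)\), and the step size and the names `derivatives`, `rk4_step` and `simulate` are hypothetical.

```python
def derivatives(state, k_alpha, tau, p, lam, mu, gamma):
    """Right-hand sides of the two-subpopulation model (time unit: months)."""
    x1, x2, y1, y2, deaths = state
    # Term shared by all four equations: k*alpha*(Y1 + tau*Y2) / weighted population.
    force = k_alpha * (y1 + tau * y2) / (x1 + y1 + tau * (x2 + y2))
    new1 = force * x1          # new infections in the majority
    new2 = force * tau * x2    # new infections in the high-activity minority
    return (
        -new1 + (1.0 - p) * lam - mu * x1,
        -new2 + p * lam - mu * x2,
        new1 - (gamma + mu) * y1,
        new2 - (gamma + mu) * y2,
        gamma * (y1 + y2),     # assumption: removals at rate gamma counted as deaths
    )


def rk4_step(state, dt, **params):
    """Advance the state by one classical Runge-Kutta step of length dt."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = derivatives(state, **params)
    k2 = derivatives(shifted(state, k1, dt / 2), **params)
    k3 = derivatives(shifted(state, k2, dt / 2), **params)
    k4 = derivatives(shifted(state, k3, dt), **params)
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(k_alpha, tau, p, years=10, dt=0.25,
             susceptibles=3_000_000.0, infectives=1_000.0,
             lam=16_666.0, mu=1.0 / 180.0, gamma=0.1):
    """Return a list of (cumulative deaths, fraction infective), one per year."""
    params = dict(k_alpha=k_alpha, tau=tau, p=p, lam=lam, mu=mu, gamma=gamma)
    # Assumption: initial susceptibles and infectives are split by subpopulation size.
    state = ((1 - p) * susceptibles, p * susceptibles,
             (1 - p) * infectives, p * infectives, 0.0)
    steps_per_year = round(12 / dt)
    rows = []
    for _ in range(years):
        for _ in range(steps_per_year):
            state = rk4_step(state, dt, **params)
        x1, x2, y1, y2, deaths = state
        rows.append((deaths, (y1 + y2) / (x1 + x2 + y1 + y2)))
    return rows


# Same overall contact rate (0.05) in both runs; only the skewing differs.
homogeneous = simulate(k_alpha=0.05, tau=1, p=0.10)
skewed = simulate(k_alpha=0.02, tau=16, p=0.10)
for year, (hom, het) in enumerate(zip(homogeneous, skewed), start=1):
    print(f"year {year:2d}  tau=1: deaths {hom[0]:10.0f}, infective {hom[1]:.5f}"
          f"   tau=16: deaths {het[0]:10.0f}, infective {het[1]:.5f}")
```

Under these assumptions the homogeneous run dies away, because \(k \alpha = 0.05\) is below \(\gamma + \mu \approx 0.106\), while the skewed run grows. The exact figures depend on the illustrative initial split and should not be read as a reconstruction of the published tables.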

This was the way things stood in 1984 when I presented my AIDS paper at the summer meetings of the Society for Computer Simulation in Vancouver. It hardly created a stir among the mainly pharmacokinetic audience who attended the talk. And, frankly, at the time I did not think too much about it because I supposed that probably even as the paper was being written, the “powers that be” were shutting down the bathhouses. The deaths at the time were numbered in the hundreds, and I did not suppose that things would be allowed to proceed much longer without sociological intervention. Unfortunately, I was mistaken.

In November 1986, the First International Conference on Population Dynamics took place at the University of Mississippi where there were some of the best biomathematical modelers from Europe and the United States. I presented my AIDS results [9], somewhat updated, at a plenary session. By this time, I was already alarmed by the progress of the disease (over 40,000 cases diagnosed and the bathhouses still open). The bottom line of the talk had become more shrill: namely, every month delayed in shutting down the bathhouses in the United States would result in thousands of deaths. The reaction of the audience this time was concern, partly because the prognosis seemed rather chilling, partly because the argument was simple to follow and seemed to lack holes, and partly because it was clear that something was pretty much the matter if things had gone so far off track.

After the talk, the well-known Polish probabilist Robert Bartoszyński, with whom I had carried out a lengthy modeling investigation of breast cancer and melanoma (at the Curie-Sklodowska Institute in Poland and at Rice), took me aside and asked whether I did not feel unsafe making such claims. “Who,” I asked, “will these claims make unhappy”? “The homosexuals,” said Bartoszyński. “No, Robert,” I said, “I am trying to save their lives. It will be the public health establishment who will be offended.”

And so it has been in the intervening years. I have given AIDS talks before audiences with significant gay attendance in San Francisco, Houston, Washington, and other locales without any gay person expressing offense. Indeed, in his 1997 book [10], Gabriel Rotello, one of the leaders of the American gay community, not only acknowledges the validity of my model but also constructs a survival plan for gay society in which the bathhouses have no place.

5.1 A More Detailed Look at the Model

A threshold investigation of the two-activity population model (2) is appropriate here. Even today, let alone in the mid-1980s, there is no chance of obtaining reliable estimates for all the parameters k, \(\alpha \), \(\gamma \), \(\mu \), \(\lambda \), p, \(\tau \). Happily, one of the techniques sometimes available to the modeler is to express the problem in such a form that most of the parameters cancel out. For the present case, we will attempt to determine the \(k \alpha \) value necessary to sustain the epidemic when the number of infectives is very small. For this epidemic in its early stages one can get a picture of the bathhouse effect using only two parameters: the proportion p of the target population which is sexually very active and the activity multiplier \(\tau \).

For \(Y_{1}\) = \(Y_{2}\) = 0 the equilibrium values for \(X_{1}\) and \(X_{2}\) are \( (1-p)( \lambda / \mu ) \) and \(p ( \lambda / \mu )\), respectively. Expanding the right-hand sides of (2) in a Maclaurin series, we have (using lower case symbols for the perturbations from 0)
$$\begin{aligned} \frac{dy_{1}}{dt}= & {} \left[ \frac{k \alpha (1-p)}{1-p+ \tau p} - ( \gamma + \mu ) \right] y_{1} + \frac{k \alpha (1-p)\tau }{1-p+ \tau p }\, y_{2}\nonumber \\ \frac{dy_{2}}{dt}= & {} \frac{k \alpha \tau p}{1-p+ \tau p }\, y_{1} + \left[ \frac{k \alpha \tau ^ {2} p }{1-p+ \tau p } - ( \gamma + \mu ) \right] y_{2} .\nonumber \end{aligned}$$
Summing then gives
$$ \frac{dy_{1}}{dt} + \frac{dy_{2}}{dt} = \left[ k \alpha - ( \gamma + \mu ) \right] y_1 + \left[ k \alpha \tau - ( \gamma + \mu ) \right] y_2. $$
In the early stages of the epidemic,
$$\begin{aligned} \frac{dy_1 /dt}{dy_2 /dt} = \frac{(1-p)}{p \tau } . \end{aligned}$$
That is to say, the new infectives will be generated proportionately to their relative numerosity in the initial susceptible pool times their relative activity levels. So, assuming a negligible number of initial infectives, we have
$$\begin{aligned} y_1 = \frac{(1-p)}{p \tau } y_2 . \end{aligned}$$
Substituting in the expression for \(dy_1 /dt + dy_2 /dt \) gives
$$ \frac{dy_{1}}{dt} + \frac{dy_{2}}{dt} = \left[ k \alpha \, \frac{1-p + p {\tau }^2}{p \tau } - ( \gamma + \mu )\, \frac{1-p + \tau p}{p \tau } \right] y_2 , $$
so that for the epidemic to be sustained, we must have
$$\begin{aligned} k \alpha > \frac{1-p + \tau p}{1 -p + p {\tau }^2} (\gamma + \mu ). \end{aligned}$$
Accordingly we define the heterogeneous threshold via
$$ k_\mathrm{het} \alpha =\frac{1-p + \tau p}{1 -p + p {\tau }^2} (\gamma + \mu ). $$
Now, in the homogeneous contact case (i.e., \({\tau = 1}\)), we note that for the epidemic not to be sustained, the condition in Eq. (4) must hold.
$$\begin{aligned} k \alpha < ( \gamma + \mu ). \end{aligned}$$
Accordingly we define the homogeneous threshold by
$$ k_\mathrm{hom} \alpha = ( \gamma + \mu ). $$
For the heterogeneous contact case with \(k_\mathrm{het}\), the average contact rate is given by
$$ k_\mathrm{ave} \alpha = p \tau (k_\mathrm{het} \alpha ) + (1-p ) ( k_\mathrm{het} \alpha ) = \frac{(1-p + \tau p)^2}{1 -p + p {\tau }^2} (\gamma + \mu ). $$
Dividing the sustaining value \( k_\mathrm{hom} \alpha \) by the sustaining value \( k_\mathrm{ave}\alpha \) for the heterogeneous contact case then produces
$$ Q = \frac{ 1 - p + { \tau }^2 p }{ (1 - p + \tau p )^2}. $$
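As a numerical illustration of the enhancement factor, the sketch below evaluates Q for a few values of \(\tau \) and, as a cross-check, the early growth rate of the linearized system for \(y_1\) and \(y_2\). The growth-rate expression is the dominant eigenvalue of that linear system, which works out to Q times the overall contact rate \(k \alpha (1-p+\tau p)\), less \(\gamma + \mu \). This eigenvalue route is an alternative check rather than the substitution argument used in the text, and the function names are hypothetical.

```python
def enhancement_factor(p, tau):
    """Q = (1 - p + tau^2 p) / (1 - p + tau p)^2."""
    return (1.0 - p + tau ** 2 * p) / (1.0 - p + tau * p) ** 2


def early_growth_rate(k_alpha, p, tau, gamma, mu):
    """Dominant eigenvalue (per month) of the linearized (y1, y2) system.

    The infection part of the linearized matrix has rank one, so its only
    nonzero eigenvalue is Q times the overall contact rate.
    """
    overall = k_alpha * (1.0 - p + tau * p)
    return overall * enhancement_factor(p, tau) - (gamma + mu)


p = 0.10
for tau in (1, 2, 4, 8, 16):
    print(f"tau = {tau:2d}   Q = {enhancement_factor(p, tau):.3f}")

# Parameter values quoted for Tables 2 and 3 (gamma = 0.1, mu = 1/180 per month).
for label, k_alpha, tau in (("tau = 1 ", 0.05, 1), ("tau = 16", 0.02, 16)):
    rate = early_growth_rate(k_alpha, p, tau, gamma=0.1, mu=1.0 / 180.0)
    print(f"{label}: early growth rate {rate:+.4f} per month")
```

For \(p = 0.10\) and \(\tau = 16\) the formula gives \(Q = 26.5/6.25 = 4.24\). The overall contact rate of 0.05 therefore acts like an effective rate of about 0.212, which exceeds \(\gamma + \mu \approx 0.106\), whereas the same overall rate with \(\tau = 1\) does not.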
Notice that we have been able here to reduce the parameters necessary for consideration from seven to two. This is fairly typical for model-based approaches: the dimensionality of the parameter space may be reducible in answering specific questions. Figure 7 shows a plot of this “enhancement factor” Q as a function of \(\tau \). Note that the addition of heterogeneity to the transmission picture has roughly the same effect as if all members of the target population had more than doubled their contact rate. Remember that the picture has been corrected to discount any increase in the overall contact rate which occurred as a result of adding heterogeneity. In other words, the enhancement factor is totally a result of heterogeneity. It is this heterogeneity effect which I have maintained (since 1984) to be the cause of AIDS getting over the threshold of sustainability in the United States.

Data from the CDC on AIDS have been other than easy to find. Concerning the first fifteen years of the epidemic, Dr. Rachel MacKenzie of the WHO was kind enough to give me the data. Grateful though I was for that data, I know there was some displeasure from the WHO that she had done so, and after 1995 the data appeared on the internet very irregularly, with two and three year gaps between data postings. Since the United States was contributing most of the money for AIDS conferences, grants and other activities, I can understand the reluctance of the WHO to give out information which showed how badly the Americans were doing compared to the rest of the First World. Transparency is generally assumed in scientific research, but that assumption is unfortunately wrong in some of the most important situations. Suffice it to say that during the 15 years of WHO data I was given, the United States had 10 times the AIDS rate per 100,000 of the UK, 8 times that of the Netherlands, 7 times that of Denmark, 4 times that of Canada, and 3.5 times that of France. One can understand the embarrassment of the American CDC. I regret to say that AIDS goes largely unmentioned and unnoticed by the American media and such agencies as the NIH, the PHS, and the NCI. Benjamin Franklin once said: “Experience keeps a hard school and a fool will learn by none other.” What about those who continue failed policies ad infinitum? I believe Albert Einstein called them insane.

Sometimes establishment inertia trumps facts. When I started my crusade against the bathhouses, there were two in Houston. Now, within 5 miles of the Texas Medical Center, there are 17. One of these adjoins the hotel Rice frequently uses to house its visitors. Vancouver, which had no bathhouses when I gave my first AIDS lecture there, now has 3. As some may remember if they attended the recent national meetings of the ASA held in Vancouver, the Gay Pride Parade there has floats from the major Canadian banks and from the University of British Columbia School of Medicine. Gay bathhouses are popping up in several European cities as well. The American AIDS establishment maintains the pretence of having drugs which can make an AIDS sufferer as treatable as a diabetic. That these drugs are dangerous and over time frequently produce pain so severe that users eventually opt for cessation of treatment is not much spoken about.
Fig. 7 Effect of a high activity subpopulation

6 Conclusions

Data analysis to a purpose is generally messy. If I think back on the very many consulting jobs I have done over the years, very few were solvable unless one went outside the box of classical statistical tools into other disciplines and murky waters. Indeed, the honoree of this Festschrift, Jacek Koronacki, is a good example to us all of not taking the easy way out. During martial law, I offered him a tenured post at Rice. I cautioned him that in the unlikely event the Red Army ever left Poland, the next administration would be full of unsavory holdovers from the junior ranks of the Party posing as Jeffersonian reformers. Jacek left Rice, nevertheless, with his wife, daughter and unborn son. He said he could not think of abandoning Poland and his colleagues. It would be ignoble to do so. He would return to Poland with his family and hope God would provide. I have to say that though I was correct in my prophecy, Jacek chose the right path.


References

  1. Sharpe WF (1964) Capital asset prices: a theory of market equilibrium under conditions of risk. J Finance 19:425–442
  2. Sharpe WF (2000) Portfolio theory and capital markets. McGraw Hill, New York
  3. Bogle JC (1999) Common sense and mutual funds: new imperatives for the intelligent investor. Wiley, New York
  4. Thompson JR, Baggett LS, Wojciechowski WC, Williams EE (2006) Nobels for nonsense. J Post Keynesian Econ Fall 3–18
  5. Thompson JR (2010) Methods and apparatus for determining a return distribution for an investment portfolio. US Patent 7,720,738 B2, 18 May 2010
  6. Baggett LS, Thompson JR (2007) Every man’s maxmedian rule for portfolio management. In: Proceedings of the 13th army conference on applied statistics
  7. Shilts R (1987) And the band played on: politics, people, and the AIDS epidemic. St. Martin’s Press, New York, p 144
  8. Thompson JR (1984) Deterministic versus stochastic modeling in neoplasia. In: Proceedings of the 1984 computer simulation conference, Society for Computer Simulation, San Diego, 1984, pp 822–825
  9. Thompson JR (1998) The United States AIDS epidemic in First World context. In: Arino O, Axelrod D, Kimmel M (eds) Advances in mathematical population dynamics: molecules, cells and man. World Scientific Publishing Company, Singapore, pp 345–354
  10. Rotello G (1997) Sexual ecology: AIDS and the destiny of gay men. Dutton, New York, pp 85–89

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Statistics, Rice University, Houston, USA
