Closed Worlds and Bayesian Inference

Bayesian inference has at times been criticized as relying on the closed-world assumption: that exactly one of the hypotheses we are considering is true. That is, Bayes’ rule

$$
P(H_i \mid D, X) = \frac{P(H_i \mid X) P(D \mid H_i, X)}{
\sum_j P(H_j \mid X) P(D \mid H_j, X)}
$$

(where \(X\) is our background knowledge, \(D\) is the data, and the \(H_1,\ldots,H_n\) are our hypotheses) is valid only when the background knowledge \(X\) establishes that exactly one of the hypotheses \(H_i\) must be true. The use of Bayes’ rule would then seem to presuppose that one has actually identified and considered all possible hypotheses — a practically impossible task in most (all?) real-world situations.

To illustrate the problem, let’s take a look at a real-world example of considerable practical interest to some people. In December 2002 an article appeared on the Wizard of Odds website claiming that the Casino Bar online blackjack game was not dealing fairly — that it rigged the card draws in certain circumstances to favor the dealer. Here are the specifics:

  • The article states, “Previously somebody approached me with what he claimed was a section of computer code he said was taken from the Casino Bar blackjack game. My interpretation of that code is that if the player has a total of 16-21 and the dealer must take a third card, if that hit card will cause the dealer to bust, then it will be rejected and the dealer will get a second chance card. This second chance card is final, whether or not it will bust the dealer. […] This is what would be known in a real casino as dealing seconds.”
  • Under hypothesis \(H_0\) (fair dealing), here are the probabilities of the dealer going bust on the third card depending on the two-card total:
    • total = 12: probability = 0.3077
    • total = 13: probability = 0.3846
    • total = 14: probability = 0.4615
    • total = 15: probability = 0.5385
    • total = 16: probability = 0.6154
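  • Under hypothesis \(H_1\) (dealing seconds), here are the probabilities of the dealer going bust under the same conditions (each is the square of the corresponding \(H_0\) value, since the dealer busts only if both the rejected card and the second-chance card are bust cards):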
    • total = 12: probability = 0.0947
    • total = 13: probability = 0.1479
    • total = 14: probability = 0.2130
    • total = 15: probability = 0.2899
    • total = 16: probability = 0.3787
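  • The author ran an experiment to test the blackjack game and, out of 332 hands, got these results (for each dealer two-card total: the number of times it occurred and the number of times the dealer went bust on the next card):
    • total = 12: occurred 84 times, dealer bust 11 times
    • total = 13: occurred 61 times, dealer bust 13 times
    • total = 14: occurred 67 times, dealer bust 18 times
    • total = 15: occurred 61 times, dealer bust 21 times
    • total = 16: occurred 59 times, dealer bust 26 times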
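
The bust probabilities listed above are consistent with a simple model, which we assume here because the article does not spell it out: each draw is equally likely to be any of the 13 ranks (the depletion of cards already dealt is ignored), and under \(H_1\) the triggering condition in the quoted code applies, so a busting card is rejected once and replaced by a second-chance card. A minimal Python sketch of that model (the function names are ours, not from the article):

```python
from fractions import Fraction

# Assumed model: every draw is equally likely to be any of the 13 ranks
# (ignoring cards already dealt). Ace counts as 1 here: from a hard total of
# 12-16 it can never bust the dealer anyway.
RANK_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]  # A, 2-9, 10, J, Q, K


def bust_prob_fair(total: int) -> Fraction:
    """H0: chance that a single card pushes the dealer's total past 21."""
    busting = sum(1 for v in RANK_VALUES if total + v > 21)
    return Fraction(busting, len(RANK_VALUES))


def bust_prob_seconds(total: int) -> Fraction:
    """H1: a busting card is replaced once, so the dealer busts only if
    both the first card and the second-chance card would bust."""
    p = bust_prob_fair(total)
    return p * p


for total in range(12, 17):
    print(total,
          f"{float(bust_prob_fair(total)):.4f}",
          f"{float(bust_prob_seconds(total)):.4f}")
```

Run as-is, it prints one line per total from 12 to 16, and the two columns match the \(H_0\) and \(H_1\) values listed above (for example, 4/13 ≈ 0.3077 and (4/13)² ≈ 0.0947 for a total of 12).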

Now for our analysis. \(H_0\) and \(H_1\) are the two hypotheses we are especially interested in, but there are a large number of different ways in which the dealing could deviate from a fair deal. Perhaps the first, second, or both cards are not drawn in the required manner (draws from a uniform distribution without replacement). Perhaps the casino deals seconds only every other hand; perhaps the casino occasionally deals seconds for the player (!); and so on. Accounting for the myriad possibilities in our analysis is a daunting task.

What is a good Bayesian to do? We sidestep the problem by considering ratios of posterior probabilities instead of the posterior probabilities themselves. That is, instead of computing \(P(H_0 \mid D, X)\) and \(P(H_1 \mid D, X)\) we will compute

$$\frac{P(H_1 \mid D, X)}{P(H_0 \mid D, X)}.$$

Plugging Bayes’ rule into this fraction, we see that the denominators cancel (both are the same sum \(\sum_j P(H_j \mid X) P(D \mid H_j, X)\) over all hypotheses), and we end up with

$$\frac{P(H_1 \mid D, X)}{P(H_0 \mid D, X)} =
\frac{P(H_1 \mid X)}{P(H_0 \mid X)} \cdot
\frac{P(D \mid H_1, X)}{P(D \mid H_0, X)}.$$

In words: to get the ratio of the posterior probabilities of two hypotheses, multiply the ratio of their prior probabilities by the ratio of their likelihoods. And — this is the important point — this formula holds regardless of how many other hypotheses \(H_2, H_3, \ldots\) there may be to consider.
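
A small numerical check of this point, using made-up priors and likelihoods (purely hypothetical numbers): Bayes’ rule is applied once over four hypotheses and once over only \(H_0\) and \(H_1\), as if \(H_2\) and \(H_3\) had never been thought of, with the two remaining priors rescaled so that their ratio is unchanged. The individual posteriors differ, but the posterior ratio is the same in both cases and equals the prior ratio times the likelihood ratio.

```python
from fractions import Fraction as F


def posterior(priors, likelihoods):
    """Bayes' rule over an explicitly listed, exhaustive set of hypotheses."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    z = sum(joint)  # the denominator: one shared sum over all H_j
    return [j / z for j in joint]


# Hypothetical numbers for H0, H1 and two "other" hypotheses H2, H3.
priors = [F(1, 5), F(1, 10), F(3, 10), F(2, 5)]
likes = [F(1, 100), F(1, 20), F(1, 2), F(1, 8)]

open_world = posterior(priors, likes)
# Same H0 and H1, but H2 and H3 never considered: priors rescaled to sum
# to 1 while keeping P(H1|X)/P(H0|X) = 1/2.
closed_world = posterior([F(2, 3), F(1, 3)], likes[:2])

odds_formula = (priors[1] / priors[0]) * (likes[1] / likes[0])
print(open_world[1], closed_world[1])  # the posteriors themselves differ
print(open_world[1] / open_world[0], closed_world[1] / closed_world[0], odds_formula)
```

The first line printed is `5/207 5/7`, showing how much the individual posterior for \(H_1\) depends on which other hypotheses are included; the second is `5/2 5/2 5/2`, the same ratio every time.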

Now let’s apply this formula to the Casino Bar question. First, the prior probabilities. The prior probability must be split among \(H_0\), \(H_1\), and myriad other possibilities; however, our prior information \(X\) includes the fact that the article’s author was shown what was purported to be source code for the online casino’s blackjack game, implementing the scheme postulated by \(H_1\). This prior information elevates \(H_1\) to a position of prominence that it would otherwise lack. Given this prior information, it is hard to argue that the prior odds ratio \(P(H_1 \mid X)/P(H_0 \mid X)\) should be less than 1/100; likewise, it would be hard to argue that the author should have had such a high degree of confidence in his source as to justify a prior odds ratio greater than 100/1.

The likelihood for \(H_0\) is

$$\begin{array}{rcl}
P(D \mid H_0, X) & = & {0.3077}^{11} \times (1 - 0.3077)^{73} \times \\
& & {0.3846}^{13} \times (1 - 0.3846)^{48} \times \\
& & {0.4615}^{18} \times (1 - 0.4615)^{49} \times \\
& & {0.5385}^{21} \times (1 - 0.5385)^{40} \times \\
& & {0.6154}^{26} \times (1 - 0.6154)^{33} \\
& = & 5.296 \times 10^{-91}.
\end{array}$$

The likelihood for \(H_1\) is

$$\begin{array}{rcl}
P(D \mid H_1, X) & = & {0.0947}^{11} \times (1 - 0.0947)^{73} \times \\
& & {0.1479}^{13} \times (1 - 0.1479)^{48} \times \\
& & {0.2130}^{18} \times (1 - 0.2130)^{49} \times \\
& & {0.2899}^{21} \times (1 - 0.2899)^{40} \times \\
& & {0.3787}^{26} \times (1 - 0.3787)^{33} \\
& = & 1.766 \times 10^{-81}.
\end{array}$$

The likelihood ratio is then

$$\frac{P(D \mid H_1, X)}{P(D \mid H_0, X)} = \frac{1.766\times 10^{-81}}{5.296\times 10^{-91}} = 3.335\times 10^9.$$
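
These products can be reproduced with a short script. The sketch below uses the rounded probabilities from the lists above and reads each data row as (total, times occurred, times dealer bust), the reading that matches the exponents in the products. It sums base-10 logarithms rather than multiplying directly; that is not strictly needed here (numbers around \(10^{-91}\) are still representable as doubles), but it avoids underflow with larger data sets. As in the text, binomial coefficients are omitted; they would cancel in the ratio anyway. The variable and function names are illustrative.

```python
import math

# (dealer two-card total, times occurred, times dealer bust on next card)
DATA = [(12, 84, 11), (13, 61, 13), (14, 67, 18), (15, 61, 21), (16, 59, 26)]

# Probability that the dealer busts on the next card, as listed in the article.
P_FAIR = {12: 0.3077, 13: 0.3846, 14: 0.4615, 15: 0.5385, 16: 0.6154}     # H0
P_SECONDS = {12: 0.0947, 13: 0.1479, 14: 0.2130, 15: 0.2899, 16: 0.3787}  # H1


def log10_likelihood(bust_prob):
    """log10 P(D | H, X): each total contributes p^busts * (1 - p)^(non-busts)."""
    return sum(
        busts * math.log10(bust_prob[total])
        + (n - busts) * math.log10(1 - bust_prob[total])
        for total, n, busts in DATA
    )


ll0 = log10_likelihood(P_FAIR)
ll1 = log10_likelihood(P_SECONDS)
print(f"P(D|H0,X) = {10 ** ll0:.4g}")
print(f"P(D|H1,X) = {10 ** ll1:.4g}")
print(f"likelihood ratio = {10 ** (ll1 - ll0):.4g}")
```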

That is, the data are overwhelmingly more likely for \(H_1\) than they are for \(H_0\). Combining this with a prior ratio anywhere from 1/100 to 100/1, we get

$$3.335\times 10^7 \leq \frac{P(H_1 \mid D, X)}{P(H_0 \mid D, X)} \leq 3.335\times 10^{11}.$$

That is, the posterior probability of \(H_1\) (cheating by dealing seconds) is overwhelmingly greater than the posterior probability of \(H_0\) (fair dealing).

Does this mean that we can be nearly certain that the casino is cheating by dealing seconds? Not based on this analysis, because we’ve only looked at two possible hypotheses. But — and here is the other important point — we have decisively ruled out \(H_0\). We know that \( P(H_1 \mid D, X) < 1\), so even using the prior ratio most favorable to \(H_0\) we have

$$P(H_0 \mid D, X) = \frac{P(H_1 \mid D, X)}{3.335\times 10^7} < 2.999 \times 10^{-8}.$$
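
The last two steps are simple arithmetic. As a sketch, taking the likelihood ratio of \(3.335\times 10^9\) from the text and the prior-odds range of 1/100 to 100/1 argued for earlier (the constant names are ours):

```python
LIKELIHOOD_RATIO = 3.335e9  # P(D|H1,X) / P(D|H0,X), taken from the text
PRIOR_ODDS_LOW, PRIOR_ODDS_HIGH = 1 / 100, 100 / 1  # range for P(H1|X)/P(H0|X)

# Posterior odds = prior odds * likelihood ratio, at both ends of the range.
post_odds_low = PRIOR_ODDS_LOW * LIKELIHOOD_RATIO
post_odds_high = PRIOR_ODDS_HIGH * LIKELIHOOD_RATIO

# P(H0|D,X) = P(H1|D,X) / posterior odds, and P(H1|D,X) < 1, so the smallest
# posterior odds gives an upper bound on P(H0|D,X). No other hypothesis has
# to be enumerated for this bound to hold.
p_h0_bound = 1 / post_odds_low

print(f"{post_odds_low:.4g} <= posterior odds <= {post_odds_high:.4g}")
print(f"P(H0|D,X) < {p_h0_bound:.4g}")
```

The key design point mirrors the argument: the bound uses only the ratio of \(H_1\) to \(H_0\) plus the fact that a probability cannot exceed 1, so it is unaffected by however many unexamined alternatives remain.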

Thus we see that, in spite of the closed-world limitation, Bayesian inference is quite capable of disproving a hypothesis to a high degree of certainty.