## 1.2. Joint and Conditional Probability

Thus far, we have defined the terms used in studying probability and considered single events in isolation. Having set this foundation, we now turn our attention to the interesting issues that arise when studying **sequences of events**. In doing so, it is very important to keep track of the sample space in which the events are defined: A common mistake is to ignore the fact that two events in a sequence may be defined on different sample spaces.

### 1.2.1. Joint Probability

Consider two processes with sample spaces *S*_{1} and *S*_{2} that occur one after the other. The two processes can be viewed as a single **joint process** whose outcomes are the tuples chosen from the **product space** *S*_{1} × *S*_{2}. We refer to the subsets of the product space as **joint events**. Just as before, we can associate probabilities with outcomes and events in the product space. To keep things straight, in this section, we denote the sample space associated with a probability as a subscript, so that *P*_{S1}(*E*) denotes the probability of event *E* defined over sample space *S*_{1}, and *P*_{S1 × S2}(*E*) denotes the probability of event *E* defined over the product space *S*_{1} × *S*_{2}.
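
As a concrete illustration (not from the text), here is a minimal Python sketch that enumerates the product space for a coin flip followed by a die roll and computes the probability of a joint event, assuming all outcomes are equiprobable:

```python
from itertools import product
from fractions import Fraction

# Two processes occur one after the other: a coin flip (sample space S1)
# and a die roll (sample space S2). The joint process has outcomes that
# are tuples drawn from the product space S1 x S2.
S1 = ["H", "T"]
S2 = [1, 2, 3, 4, 5, 6]
joint_space = list(product(S1, S2))  # 12 equally likely tuples

# A joint event is a subset of the product space, e.g. "heads followed
# by an even number".
event = [(c, d) for (c, d) in joint_space if c == "H" and d % 2 == 0]

# With equiprobable outcomes, P(event) = |event| / |product space|.
print(Fraction(len(event), len(joint_space)))  # 1/4
```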

We will return to the topic of joint processes in Section 1.8. We now turn our attention to the concept of conditional probability.

### 1.2.2. Conditional Probability

Common experience tells us that if the sky is sunny, there is little chance of rain in the immediate future, but if the sky is cloudy, it may or may not rain soon. Knowing that the sky is cloudy, therefore, increases the chance that it may rain soon, compared with the situation when it is sunny. How can we formalize this intuition?

To keep things simple, first consider the case when two events *E* and *F* share a common sample space *S* and occur one after the other. Suppose that the probability of *E* is *P*_{S}(*E*) and the probability of *F* is *P*_{S}(*F*). Now, suppose that we are informed that event *E* actually occurred. By definition, the **conditional probability** of the event *F* conditioned on the occurrence of event *E* is denoted *P*_{S × S}(*F*|*E*) (read “the probability of *F* given *E*”) and computed as

$$P_{S \times S}(F|E) = \frac{P_{S \times S}(EF)}{P_{S}(E)}$$

where the joint event *EF* denotes “*E* followed by *F*.”

If knowing that *E* occurred does not affect the probability of *F*, *E* and *F* are said to be **independent** and

$$P_{S \times S}(EF) = P_{S}(E)P_{S}(F)$$
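
To make the defining ratio concrete, here is a small Python sketch (an illustration, not from the text) that computes a conditional probability by enumeration for two events on a single die roll and checks whether they are independent:

```python
from fractions import Fraction

# Sample space: one roll of a fair die.
S = set(range(1, 7))
E = {x for x in S if x > 3}        # event E: the roll is greater than 3
F = {x for x in S if x % 2 == 0}   # event F: the roll is even

def prob(event):
    """P(event) with equiprobable outcomes: |event| / |S|."""
    return Fraction(len(event), len(S))

# P(F|E) = P(EF) / P(E): restrict attention to outcomes in which E occurred.
p_f_given_e = prob(E & F) / prob(E)
print(p_f_given_e)             # 2/3: of {4, 5, 6}, the even outcomes are {4, 6}
print(p_f_given_e == prob(F))  # False: knowing E changes P(F), so E and F
                               # are not independent
```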

The notion of conditional probability generalizes to the case in which events are defined on more than one sample space. Consider a sequence of two processes with sample spaces *S*_{1} and *S*_{2} that occur one after the other. (This could be the condition of the sky now, for instance, and whether it rains 2 hours later.) Let event *E* be a subset of *S*_{1} and event *F* a subset of *S*_{2}. Suppose that the probability of *E* is *P*_{S1}(*E*) and the probability of *F* is *P*_{S2}(*F*). Now, suppose that we are informed that event *E* occurred. We define the **conditional probability** of the event *F* conditioned on the occurrence of *E* as

$$P_{S_1 \times S_2}(F|E) = \frac{P_{S_1 \times S_2}(EF)}{P_{S_1}(E)} \qquad (1.4)$$

If knowing that *E* occurred does not affect the probability of *F*, *E* and *F* are said to be **independent** and

$$P_{S_1 \times S_2}(EF) = P_{S_1}(E)P_{S_2}(F)$$

It is important not to confuse *P*(*F|E*) and *P*(*F*). The conditional probability is defined in the product space *S*_{1} × *S*_{2} and the unconditional probability in the space *S*_{2}. Explicitly keeping track of the underlying sample space can help avoid apparent contradictions such as the one discussed in Example 1.14.
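
For a quick numeric illustration (the numbers are invented): suppose *P*_{S1}(cloudy) = 0.2 and the joint probability *P*_{S1 × S2}(cloudy followed by rain) = 0.15. Then, by Equation 1.4, *P*_{S1 × S2}(rain | cloudy) = 0.15/0.2 = 0.75, which may differ sharply from the unconditional probability *P*_{S2}(rain).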

### 1.2.3. Bayes’s Rule

One of the most widely used rules in the theory of probability is due to an English country minister, Thomas Bayes. Its significance is that it allows us to infer “backwards” from effects to causes rather than from causes to effects. The derivation of his rule is straightforward, though its implications are profound.

We begin with the definition of conditional probability (Equation 1.4):

$$P_{S_1 \times S_2}(F|E) = \frac{P_{S_1 \times S_2}(EF)}{P_{S_1}(E)}$$

If the underlying sample spaces can be assumed to be implicitly known, we can drop the subscripts and rewrite this as

$$P(EF) = P(F|E)P(E) \qquad (1.7)$$

We interpret this to mean that the probability that both *E* and *F* occur is the product of the probabilities of two events: first, that *E* occurs; second, that conditional on *E*, *F* occurs.
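
For instance (with invented numbers), if the probability that the sky is cloudy is *P*(*E*) = 0.2 and the probability of rain given cloudy skies is *P*(*F*|*E*) = 0.75, then the probability of cloudy skies followed by rain is *P*(*EF*) = 0.75 × 0.2 = 0.15.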

Recall that *P*(*F*|*E*) is defined in terms of the event *F* following event *E*. Now, consider the converse: *F* is known to have occurred. What is the probability that *E* occurred? This is similar to the problem: If there is a fire, there is smoke; but if we see smoke, what is the probability that it was due to a fire? The probability we want is *P*(*E*|*F*). Using the definition of conditional probability, it is given by

$$P(E|F) = \frac{P(EF)}{P(F)}$$

Substituting for *P*(*EF*) from Equation 1.7, we get

$$P(E|F) = \frac{P(F|E)P(E)}{P(F)}$$

which is **Bayes’s rule**. One way of interpreting this is that it allows us to compute the degree to which some effect, or **posterior** *F*, can be attributed to some cause, or **prior** *E*.
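
A small Python sketch of this rule, using invented numbers for the smoke-and-fire question above:

```python
# Invented numbers for the smoke/fire question: given that we see smoke
# (the posterior F), what is the probability of a fire (the prior E)?
p_fire = 0.01             # P(E): prior probability of a fire
p_smoke_given_fire = 0.9  # P(F|E): a fire almost always produces smoke
p_smoke = 0.05            # P(F): overall probability of seeing smoke

# Bayes's rule: P(E|F) = P(F|E) P(E) / P(F)
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
print(p_fire_given_smoke)  # ~0.18: seeing smoke raises the probability
                           # of fire from 1% to about 18%
```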

We can generalize Bayes’s rule when a posterior can be attributed to more than one prior. Consider a posterior *F* that is due to some set of *n* priors *E*_{i} such that the priors are mutually exclusive and exhaustive: That is, at least one of them occurs, and only one of them can occur. This implies that $\sum_{i=1}^{n} P(E_i) = 1$. Then,

$$P(F) = \sum_{i=1}^{n} P(F|E_i)P(E_i) \qquad (1.10)$$

This is also called the **law of total probability**.
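
The following Python sketch applies Equation 1.10 with invented numbers, treating the state of the sky as a set of mutually exclusive, exhaustive priors:

```python
# P(E_i): invented prior probabilities for the state of the sky; they are
# mutually exclusive and exhaustive, so they sum to 1.
priors = {"sunny": 0.6, "cloudy": 0.3, "overcast": 0.1}

# P(F|E_i): invented probability of rain conditional on each sky state.
p_rain_given = {"sunny": 0.0, "cloudy": 0.2, "overcast": 0.7}

# Law of total probability: P(F) = sum_i P(F|E_i) P(E_i)
p_rain = sum(p_rain_given[sky] * priors[sky] for sky in priors)
print(p_rain)  # ~0.13
```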

The law of total probability allows one further generalization of Bayes’s rule to obtain **Bayes’s theorem**. From the definition of conditional probability, we have

$$P(E_i|F) = \frac{P(E_i F)}{P(F)}$$

From Equation 1.7, we have

$$P(E_i F) = P(F|E_i)P(E_i)$$

Substituting this, along with Equation 1.10 for *P*(*F*), we get

$$P(E_i|F) = \frac{P(F|E_i)P(E_i)}{\sum_{j=1}^{n} P(F|E_j)P(E_j)}$$

This is called the **generalized Bayes’s rule**, or Bayes’s theorem. It allows us to compute the probability of any one of the priors *E*_{i}, conditional on the occurrence of the posterior *F*. This is often interpreted as follows: We have some set of mutually exclusive and exhaustive hypotheses *E*_{i}. We conduct an experiment, whose outcome is *F*. We can then use Bayes’s formula to compute the revised estimate for each hypothesis.
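
Continuing the sky example with the same invented numbers, this Python sketch uses the generalized rule to revise the estimate for each sky hypothesis after rain is observed:

```python
# P(E_i): invented prior probabilities of the sky hypotheses.
priors = {"sunny": 0.6, "cloudy": 0.3, "overcast": 0.1}
# P(F|E_i): invented probability of the observed outcome (rain) under each.
likelihood = {"sunny": 0.0, "cloudy": 0.2, "overcast": 0.7}

# Denominator of Bayes's theorem: P(F) by the law of total probability.
p_rain = sum(likelihood[h] * priors[h] for h in priors)

# P(E_i|F) = P(F|E_i) P(E_i) / sum_j P(F|E_j) P(E_j)
posterior = {h: likelihood[h] * priors[h] / p_rain for h in priors}
print(posterior)  # sunny: 0.0, cloudy: ~0.46, overcast: ~0.54
```

Observing rain shifts probability mass away from the “sunny” hypothesis (which cannot produce rain under these numbers) toward the hypotheses that make rain likely.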