3.4 Doubly-Stochastic EM
This section presents an EM-based algorithm for problems that possess partial data with multiple clusters. The algorithm is referred to as doubly-stochastic EM. To facilitate the derivation, adopt the following notation:

X = {x_{t} ∊ R^{D}; t = 1, ..., T} is a sequence of partial data.

Z = {z_{t} ∊ C; t = 1, ..., T} is the set of hidden states.

C = {C^{(1)}, ..., C^{(J)}}, where J is the number of hidden states.

Γ = {γ^{(1)}, ..., γ^{(K)}} is the set of values that x_{t} can attain, where K is the number of possible values for x_{t}.
Also define two sets of indicator variables as

δ_{t}^{(j)} = 1 if z_{t} = C^{(j)}, and 0 otherwise,

and

λ_{t}^{(k)} = 1 if x_{t} = γ^{(k)}, and 0 otherwise.
Using these notations and those defined in Section 3.2, Q(θ|θ_{n}) can be written as

Q(θ|θ_{n}) = Σ_{t=1}^{T} Σ_{j=1}^{J} Σ_{k=1}^{K} h_{t}^{(j,k)} log[π^{(j)} p(x_{t} = γ^{(k)}|C^{(j)}, θ)],    (3.4.1)

where

h_{t}^{(j,k)} = P(z_{t} = C^{(j)}, x_{t} = γ^{(k)}|X, θ_{n})

is the joint posterior probability that x_{t} attains the value γ^{(k)} and is generated by cluster C^{(j)}.
If θ defines a GMM, that is, θ = {π^{(j)}, μ^{(j)}, Σ^{(j)}; j = 1, ..., J}, then

p(x_{t} = γ^{(k)}|C^{(j)}, θ) = N(γ^{(k)}; μ^{(j)}, Σ^{(j)}).
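In code, Q(θ|θ_{n}) is simply a posterior-weighted sum of complete-data log-likelihood terms. The sketch below assumes a univariate GMM and takes the joint posterior weights as precomputed input from the E-step; all function and variable names here are illustrative, not from the text.

```python
import math

def log_gaussian(x, mu, var):
    """Log of the univariate Gaussian density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def q_function(weights, candidates, pi, mu, var):
    """Q(theta | theta_n) for a univariate GMM with partial data.

    weights[t][j][k] -- posterior P(z_t = C^(j), x_t = gamma^(k) | X, theta_n),
                        assumed precomputed in the E-step
    candidates[t]    -- list of values gamma^(k) attainable by x_t
                        (a singleton list when x_t is fully observed)
    pi, mu, var      -- GMM parameters, indexed by cluster j
    """
    q = 0.0
    for t, cand in enumerate(candidates):
        for j in range(len(pi)):
            for k, g in enumerate(cand):
                q += weights[t][j][k] * (math.log(pi[j])
                                         + log_gaussian(g, mu[j], var[j]))
    return q
```

Because the weights are fixed at θ_{n}, Q is a weighted sum of concave log terms in θ, which is what makes the M-step maximization tractable.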
3.4.1 Singly-Stochastic Single-Cluster with Partial Data
This section demonstrates how the general formulation in Eq. 3.4.1 can be applied to problems with a single cluster and partially observable data. Referring to Example 2 shown in Figure 3.3(b), let X = {x_{1}, x_{2}, x_{3}, x_{4}, y} = {1, 2, 3, 4, {5 or 6}} be the observed data, where y = {5 or 6} is the observation with missing information. The information is missing because the exact value of y is unknown. Also let z ∊ Γ, where Γ = {γ^{(1)}, γ^{(2)}} = {5, 6}, be the missing information. Since there is only one cluster and x_{1} to x_{4} are certain, define θ ≡ {μ, σ^{2}}, set π^{(1)} = 1.0, and write Eq. 3.4.1 as

Q(θ|θ_{n}) = Σ_{t=1}^{4} log p(x_{t}|θ) + Σ_{k=1}^{2} P(y = γ^{(k)}|y ∊ Γ, θ_{n}) log p(y = γ^{(k)}|y ∊ Γ, θ).    (3.4.2)
Note that the discrete density P(y = γ^{(k)}|y ∊ Γ, θ) can be interpreted as the product of the density p(y = γ^{(k)}|y ∊ Γ) and the functional value of p(y|θ) at y = γ^{(k)}, as shown in Figure 3.7.
Figure 3.7 The relationship between p(y|θ_{0}), p(y|y ∊ Γ), p(y|y ∊ Γ, θ_{0}), and P(y = γ^{(k)}|y ∊ Γ, θ_{0}), where Γ = {5, 6}.
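Numerically, the interpretation in Figure 3.7 amounts to evaluating the Gaussian density at each candidate value in Γ and renormalizing so the results sum to one. A minimal sketch, using Γ = {5, 6} from this example and, purely for illustration, an initial θ = (μ, σ²) = (0, 1):

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_over_candidates(candidates, mu, var):
    """P(y = gamma_k | y in Gamma, theta): the Gaussian density evaluated
    at each candidate value, renormalized so the values sum to one."""
    dens = [gaussian_pdf(g, mu, var) for g in candidates]
    total = sum(dens)
    return [d / total for d in dens]

# Gamma = {5, 6}; (mu, sigma^2) = (0, 1) as an illustrative parameter setting.
probs = posterior_over_candidates([5.0, 6.0], mu=0.0, var=1.0)
```

With μ = 0, the candidate 5 lies closer to the mean than 6, so almost all of the posterior mass falls on γ^{(1)} = 5.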
Assume that at the start of the iterations, n = 0 and θ_{0} = {μ_{0}, σ_{0}^{2}} = {0, 1}. Then, Eq. 3.4.2 becomes

Q(θ|θ_{0}) = Σ_{t=1}^{4} log p(x_{t}|θ) + 0.996 log p(y = 5|y ∊ Γ, θ) + 0.004 log p(y = 6|y ∊ Γ, θ).    (3.4.3)
In the M-step, compute θ_{1} according to

θ_{1} = arg max_{θ} Q(θ|θ_{0}).
The next iteration replaces θ_{0} in Eq. 3.4.3 with θ_{1} to compute Q(θ|θ_{1}). The procedure continues until convergence. Table 3.4 shows the values of μ and σ^{2} in the course of the EM iterations when their initial values are μ_{0} = 0 and σ_{0}^{2} = 1. Figure 3.8 depicts the movement of the Gaussian density function specified by μ and σ^{2} during the EM iterations.
Table 3.4. Values of μ and σ^{2} in the course of EM iterations. Data shown in Figure 3.3(b) were used for the EM iterations.

Iteration (n)    Q(θ|θ_{n})    μ       σ^{2}
0                −∞            0.00    1.00
1                29.12         3.00    7.02
2                4.57          3.08    8.62
3                4.64          3.09    8.69
4                4.64          3.09    8.69
5                4.64          3.09    8.69
Figure 3.8 Movement of a Gaussian density function during the EM iterations. The density function is fitted to the data containing a single cluster with partially observable data.
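The complete E-step/M-step loop for this single-cluster example can be sketched as follows, using the standard maximum-likelihood Gaussian updates in the M-step. The data X = {1, 2, 3, 4} and y ∊ {5, 6} are taken from the text, but since the exact data and update used to produce Table 3.4 are not reproduced here, the numbers this sketch converges to are illustrative rather than a re-derivation of the table.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_single_cluster(x_certain, candidates, mu, var, n_iter=20):
    """EM for a single Gaussian fitted to certain observations plus one
    partially observed point y known only to lie in `candidates`."""
    n = len(x_certain) + 1  # total number of observations, including y
    for _ in range(n_iter):
        # E-step: posterior P(y = gamma_k | y in Gamma, theta_n)
        dens = [gaussian_pdf(g, mu, var) for g in candidates]
        total = sum(dens)
        post = [d / total for d in dens]
        # M-step: maximum-likelihood updates, with y entering through its
        # posterior-weighted candidate values
        mu = (sum(x_certain)
              + sum(p * g for p, g in zip(post, candidates))) / n
        var = (sum((x - mu) ** 2 for x in x_certain)
               + sum(p * (g - mu) ** 2 for p, g in zip(post, candidates))) / n
    return mu, var

# X = {1, 2, 3, 4} and y in {5, 6}, initialized at (mu_0, sigma_0^2) = (0, 1)
mu, var = em_single_cluster([1.0, 2.0, 3.0, 4.0], [5.0, 6.0], mu=0.0, var=1.0)
```

Each pass recomputes the posterior over {5, 6} under the current parameters and then re-estimates μ and σ² with y replaced by its posterior-weighted candidates, exactly the alternation described above.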
3.4.2 Doubly-Stochastic (Partial-Data and Hidden-State) Problem
Here, the single-dimension example shown in Figure 3.9 is used to illustrate the application of Eq. 3.4.1 to problems with partial data and hidden states. Review the following definitions:

X = {x_{1}, x_{2}, ..., x_{6}, y_{1}, y_{2}} is the available data, with certain observations {x_{1}, ..., x_{6}} and uncertain observations {y_{1}, y_{2}}.

Γ_{1} = {γ_{1}^{(1)}, γ_{1}^{(2)}} and Γ_{2} = {γ_{2}^{(1)}, γ_{2}^{(2)}} such that y_{1} ∊ Γ_{1} and y_{2} ∊ Γ_{2} are the sets of values attainable by y_{1} and y_{2}, respectively.

J = 2 and K = 2.
Figure 3.9 Single-dimension example illustrating the idea of hidden states and partial data.
Using the preceding notations results in

Q(θ|θ_{n}) = Σ_{t=1}^{6} Σ_{j=1}^{2} P(C^{(j)}|x_{t}, θ_{n}) log[π^{(j)} p(x_{t}|C^{(j)}, θ)] + Σ_{t′=1}^{2} Σ_{j=1}^{2} Σ_{k=1}^{2} P(C^{(j)}, y_{t′} = γ_{t′}^{(k)}|y_{t′} ∊ Γ_{t′}, θ_{n}) log[π^{(j)} p(y_{t′} = γ_{t′}^{(k)}|C^{(j)}, θ)],    (3.4.4)
where P(C^{(j)}, y_{t′} = γ_{t′}^{(k)}|y_{t′} ∊ Γ_{t′}, θ_{n}) is the posterior probability that y_{t′} is equal to γ_{t′}^{(k)} given that y_{t′} is generated by cluster C^{(j)}. Note that when the values of y_{1} and y_{2} are certain (e.g., it is known that y_{1} = 5, and γ_{2}^{(1)} and γ_{2}^{(2)} become so close that we can consider y_{2} = 9), then K = 1, Γ_{1} = {γ_{1}} = {5}, and Γ_{2} = {γ_{2}} = {9}. In such cases, the second term of Eq. 3.4.4 becomes

Σ_{t′=1}^{2} Σ_{j=1}^{2} P(C^{(j)}|y_{t′}, θ_{n}) log[π^{(j)} p(y_{t′}|C^{(j)}, θ)].    (3.4.5)
Replacing the second term of Eq. 3.4.4 by Eq. 3.4.5 and setting x_{7} = y_{1} and x_{8} = y_{2} results in

Q(θ|θ_{n}) = Σ_{t=1}^{8} Σ_{j=1}^{2} P(C^{(j)}|x_{t}, θ_{n}) log[π^{(j)} p(x_{t}|C^{(j)}, θ)],
which is the Q-function of a GMM without partially unknown data, i.e., with all observable data being certain.
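The full doubly-stochastic procedure, alternating between a joint E-step over cluster labels and candidate values and posterior-weighted M-step updates, can be sketched as below. The data values are hypothetical placeholders standing in for Figure 3.9 (which is not reproduced here); only the structure of the algorithm follows the text.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_doubly_stochastic(x_certain, candidate_sets, pi, mu, var, n_iter=50):
    """Doubly-stochastic EM for a univariate GMM: hidden cluster labels
    plus partially observed points known only to lie in finite sets."""
    # Treat certain points as singleton candidate sets so every observation
    # is handled by the same doubly-stochastic E-step.
    data = [[x] for x in x_certain] + [list(c) for c in candidate_sets]
    J, n = len(pi), len(data)
    for _ in range(n_iter):
        # E-step: joint posterior h[t][j][k] over cluster j and candidate k
        h = []
        for cand in data:
            joint = [[pi[j] * gaussian_pdf(g, mu[j], var[j]) for g in cand]
                     for j in range(J)]
            total = sum(sum(row) for row in joint)
            h.append([[w / total for w in row] for row in joint])
        # M-step: posterior-weighted maximum-likelihood updates
        for j in range(J):
            wsum = sum(sum(h[t][j]) for t in range(n))
            pi[j] = wsum / n
            mu[j] = sum(h[t][j][k] * g
                        for t, cand in enumerate(data)
                        for k, g in enumerate(cand)) / wsum
            var[j] = sum(h[t][j][k] * (g - mu[j]) ** 2
                         for t, cand in enumerate(data)
                         for k, g in enumerate(cand)) / wsum
    return pi, mu, var

# Hypothetical placeholder data standing in for Figure 3.9: two groups of
# certain points, plus y1 in {4, 5} and y2 in {9, 10}.
pi, mu, var = em_doubly_stochastic(
    [1.0, 2.0, 3.0, 8.0, 9.0, 10.0],
    [[4.0, 5.0], [9.0, 10.0]],
    pi=[0.5, 0.5], mu=[0.0, 6.0], var=[1.0, 1.0])
```

Treating certain observations as singleton candidate sets is exactly the reduction used in Eq. 3.4.5: when K = 1 the inner sum over candidates collapses and the update degenerates to ordinary GMM EM.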