Computational tricks worth noting

Iterative closed form of the covariance matrix

Note that when we solve for the iterative closed form of the covariance matrix, we need to split the cross term into two terms. For example, consider the objective function \[l(\Sigma)=-\frac{1}{2} \ln |\Sigma| - \frac{1}{2} tr\{\Sigma^{-1} (\sigma^2 I_p + zz^T)\} + \mu^T \Sigma^{-1}z - \frac{1}{2} \mu^T \Sigma^{-1} \mu.\] We are used to writing \(\frac{1}{2}\mu^T\Sigma^{-1} z+ \frac{1}{2}z^T\Sigma^{-1} \mu\) as \(\mu^T \Sigma^{-1}z\), since the transpose of a \(1\times 1\) matrix is itself. However, this causes a problem when we solve for \(\Sigma\): we obtain an asymmetric matrix estimate. Taking the derivative with respect to \(\Sigma^{-1}\) and setting it to zero, we have \[\frac{1}{2} \Sigma - \frac{1}{2}(\sigma^2 I_p + zz^T) +\mu z^T - \frac{1}{2} \mu\mu^T = 0,\] which implies \[\Sigma = \sigma^2I_p + zz^T + \mu\mu^T - 2\mu z^T \notin S^{p\times p},\] where \(S^{p\times p}\) is the space of symmetric \(p \times p\) matrices. Note that \(2\mu z^T \neq \mu z^T + z\mu^T\) in general.
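For reference, the derivative above follows from the standard matrix-calculus identities, writing \(\Lambda = \Sigma^{-1}\) (so that \(-\frac{1}{2}\ln|\Sigma| = \frac{1}{2}\ln|\Lambda|\)) and ignoring the symmetry constraint on \(\Lambda\): \[\frac{\partial \ln|\Lambda|}{\partial \Lambda} = \Lambda^{-T} = \Sigma^T, \qquad \frac{\partial\, tr\{\Lambda A\}}{\partial \Lambda} = A^T, \qquad \frac{\partial\, \mu^T \Lambda z}{\partial \Lambda} = \mu z^T.\]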

Thus, we rewrite \(\mu^T \Sigma^{-1}z\) as \(\frac{1}{2}\mu^T\Sigma^{-1} z + \frac{1}{2}z^T\Sigma^{-1} \mu\) before differentiating, and then obtain \[\Sigma = \sigma^2I_p + zz^T + \mu\mu^T - \mu z^T - z\mu^T \in S^{p\times p}.\] This is the symmetric estimate we want.
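As a quick numerical sanity check, here is a minimal R sketch (with arbitrary toy values for \(z\), \(\mu\), and \(\sigma^2\)) comparing the two updates:

```r
set.seed(1)
p <- 3
z  <- rnorm(p)
mu <- rnorm(p)
sigma2 <- 0.5

# Update from the unsplit cross term: asymmetric in general
Sigma_naive <- sigma2 * diag(p) + tcrossprod(z) + tcrossprod(mu) -
  2 * tcrossprod(mu, z)
isSymmetric(Sigma_naive)  # FALSE

# Update from the symmetrized cross term: symmetric by construction
Sigma_sym <- sigma2 * diag(p) + tcrossprod(z) + tcrossprod(mu) -
  tcrossprod(mu, z) - tcrossprod(z, mu)
isSymmetric(Sigma_sym)    # TRUE
```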

Observed-data log-likelihood of a high-dimensional normal mixture

Usually, the observed-data log-likelihood has the following form: \[l(\theta)= \sum_i \ln \sum_k \pi_k P(x_i|y_i = 1_k, \mu_k, \Sigma_k)=\sum_i \ln \sum_k \pi_k a_{ik}.\] When \(x_i\) is high-dimensional, \(a_{ik}\) often underflows to zero in floating-point arithmetic before the logarithm is taken.

Thus, writing \(b_{ik} = \ln a_{ik}\) and \(c_i = \max_k b_{ik}\), we use the equivalent form \(\ln a_{ik} - \max_k \ln a_{ik} = b_{ik} - c_i\), which leads to \[a_{ik}= \exp(b_{ik} - c_i) \exp(c_i).\] Furthermore, since \(\exp(c_i)\) factors out of the inner sum, \[l(\theta) = \sum_i \left[\ln \left(\sum_k \pi_k \exp(b_{ik}-c_i)\right) + c_i\right].\] The inner sum now contains at least the term \(\pi_{k^*}\) for the maximizing component \(k^*\), so it no longer underflows.
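Here is a minimal R sketch of this computation; the function name `loglik_lse` and the inputs (a matrix `log_a` of the \(b_{ik}\) values and a weight vector `pi_k`) are illustrative, and the toy log densities are made extreme on purpose so that the naive evaluation underflows:

```r
# Stable observed-data log-likelihood via the log-sum-exp trick.
# log_a: n x K matrix of b_ik = log a_ik (e.g., Gaussian log densities);
# pi_k : length-K vector of mixture weights.
loglik_lse <- function(log_a, pi_k) {
  c_i <- apply(log_a, 1, max)                   # c_i = max_k b_ik, row by row
  s_i <- as.vector(exp(log_a - c_i) %*% pi_k)   # sum_k pi_k exp(b_ik - c_i)
  sum(log(s_i) + c_i)
}

# Toy example with b_ik around -2000, so exp(b_ik) underflows to 0:
log_a <- matrix(c(-2000, -2005, -1990, -1995), nrow = 2, byrow = TRUE)
pi_k  <- c(0.4, 0.6)
sum(log(as.vector(exp(log_a) %*% pi_k)))  # naive: -Inf
loglik_lse(log_a, pi_k)                   # stable: about -3991.81
```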

Logarithm of a non-negative matrix

If a non-negative matrix \(\Phi\) includes some zero entries, taking the logarithm yields \(-\infty\). Sometimes we want to compute \(\Phi .* \log(\Phi)\) (element-wise, in MATLAB-style notation). If \(\phi_{ik}=0\), the expected result is \(0\), but we obtain \(NaN = 0 * Inf\). Thus, we use the following trick to avoid this: \[\Phi .* \log(\Phi + (\Phi==0)).\] Adding \(1\) exactly where \(\Phi\) is zero changes each \(0*Inf\) into \(0 * \log(1) = 0 * 0\).
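In R, where element-wise multiplication is `*` (rather than MATLAB's `.*`) and the logical matrix `Phi == 0` is coerced to 0/1 upon addition, the same trick reads:

```r
Phi <- matrix(c(0.2, 0, 0.8, 1), nrow = 2)  # contains one zero entry

Phi * log(Phi)                # zero entry gives 0 * (-Inf) = NaN
Phi * log(Phi + (Phi == 0))   # zero entry gives 0 * log(1) = 0
```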

Reproducible results

There are many traps to be aware of in real data analysis; one of the most important is setting a random seed. When we use an existing algorithm to obtain initial values for our proposed algorithm, randomness can creep in, such as Mclust or k-means selecting initial clusters at random. If we don't set a random seed, the results may not be reproducible.
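For example, a minimal R sketch, using base `kmeans` and the `mclust` package on the built-in `iris` data purely for illustration:

```r
set.seed(2024)  # fix the RNG before any randomized initialization

# Base k-means: the random initial centers depend on the seed
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# Mclust (mclust package): seed it as well, in case its initialization
# involves randomness (e.g., subsampling on large data sets)
library(mclust)
fit <- Mclust(iris[, 1:4], G = 3)
```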