Methods¶

This section describes our mathematical models of differences in functional connectivity and the methods we use to estimate anomalous connections and regions in unhealthy patients and groups. All these models are defined in fcdiff.model.

Nomenclature¶

A brief list of letter and symbols used for mathematical notation.

\(N\) : number of regions.
\(H\) : number of healthy subjects.
\(U\) : number of unhealthy patients.
\(R\) : anomalous region indicator.
\(T\) : anomalous connection indicator.
\(F\) : functional connectivity of a subject.
\(\tilde{F}\) : functional connectivity of a patient.
\(B\) : correlation of a subject.
\(\tilde{B}\) : correlation of a patient.

Individual Anomalous Regions¶

First, we describe the individual anomalous region (IAR) model, in which the anomalous regions are not shared across the group of patients, though their parameters or effects are. This model is defined in fcdiff.model.UnsharedRegionModel.

Generative Model¶

If you are familiar with graphical models, then look at Fig. 1 for a summary of the generative model.

Fig. 1 The graph that represents the conditonal dependences between hidden random variables (unshaded circles), observed random variables (shaded circles) and unknown fixed parameters (rounded rectangles) in our model. The sharp rectangles are plates that represent the number of times a variable or parameter is repeated. The right side of the figure shows the relationship between \(R\) and \(T\).

Now we go into more detail on the distributions of these random variables and their parameters.

Let \(R_{nu}\) be a Bernoulli random variable indicating that region \(n\) of patient \(u\) is anomalous. \(R_{nu}\) is drawn from the distribution

\[p(r_{nu}; \pi) = (1 - \pi)^{1 - r_{nu}} \pi^{r_{nu}}\]

where \(\pi \in (0, 1)\) is the parameter of a Bernoulli distribution. You can sample from this distribution with UnsharedRegionModel.sample_R().

Let \(T_{nmu}\) be a Bernoulli random variable indicating that the connection between regions \(n\) and \(m\) of patient \(u\) is anomalous. \(T_{nmu}\) is dependent on the anomalous state of the regions at either end of the connection, and is drawn from the distribution

\[\begin{split}p(t_{nmu} | r_{nu}, r_{mu}; \eta) \begin{cases} \delta(t_{nmu}) & \mathrm{if} \, r_{nu} = r_{mu} = 0 \\ \delta(1 - t_{nmu}) & \mathrm{if} \, r_{nu} = r_{mu} = 1 \\ (1 - \eta)^{1 - t_{nmu}} \eta^{t_{nmu}} & \mathrm{if} \, r_{nu} \neq r_{mu} \end{cases}\end{split}\]

where \(\delta\) is the Dirac delta function and \(\eta \in (0, 1)\) is the parameter of a Bernoulli distribution. \(T_{nmu}\) is deterministic if the anomalous state of regions \(n\) and \(m\) in patient \(u\) is the same, and is a Bernoulli random variable with parameter \(\eta\) if they are different. This distribution encourages anomalous networks containing cliques of anomalous nodes, where larger values of \(\eta\) allow more edges outside of cliques to be affected. You can sample from this distribution with UnsharedRegionModel.sample_T().

Let \(F_{nm}\) be a multinomial random variable indicating the state of healthy connectivity between regions \(n\) and \(m\). We use three states of connectivity:

\(f_{nm-1} = 1\) denotes negative connectivity;
\(f_{nm0} = 1\) denotes no connectivity;
\(f_{nm1} = 1\) denotes positive connectivity.

Exactly one component of \(f_{nm}\) must be equal to one. \(F_{nm}\) is drawn from the distribution

\[p(f_{nm}; \gamma) = \prod_{k = -1}^1 \gamma_k^{f_{nmk}}\]

where \(\gamma = (\gamma_{-1}, \gamma_0, \gamma_{1})\) is the parameter vector of a multinomial distribution such that \(\gamma_k \in (0, 1) \, \forall k\) and \(\sum_{k = -1}^1 \gamma_k = 1\). You can sample from this distribution with UnsharedRegionModel.sample_F().

Let \(\tilde{F}_{nmu}\) be a multinomial random variable indicating the state of connectivity between regions \(n\) and \(m\) of patient \(u\). \(\tilde{F}_{nmu}\) is dependent on \(T_{nmu}\), the anomalous state of the connection between regions \(n\) and \(m\) of patient \(u\), and on \(F_{nm}\), the healthy connectivity state between regions \(n\) and \(m\). \(\tilde{F}_{nmu}\) is drawn from the distribution

\[\begin{split}p(\tilde{f}_{nmu} | f_{nm}, t_{nmu}; \epsilon) = \begin{cases} (1 - \epsilon)^{f_{nm}^\top \tilde{f}_{nmu}} \left( \frac{\epsilon}{2} \right)^{1 - f_{nm}^\top \tilde{f}_{nmu}} & \mathrm{if} \, t_{nmu} = 0 \\ \epsilon^{f_{nm}^\top \tilde{f}_{nmu}} \left( \frac{1 - \epsilon}{2} \right)^{1 - f_{nm}^\top \tilde{f}_{nmu}} & \mathrm{if} \, t_{nmu} = 1 \end{cases}\end{split}\]

where \(\epsilon \in (0, 1)\) is the parameter of a Bernoulli distribution. If the connection between regions \(n\) and \(m\) of patient \(u\) is anomalous, the connectivity state is perturbed from the healthy template with high probability \(1 - \epsilon\). Conversely, if the connection is typical, the connectivity state is perturbed with small probability \(\epsilon\). You can sample from this distribution with UnsharedRegionModel.sample_F_tilde().

Let \(B_{nmh}\) be the random Pearson correlation coefficient between the fMRI BOLD contrast time series from regions \(n\) and \(m\) of healthy subject \(h\). This is dependent on the healthy connectivity state, and is drawn from a mixture of Normal distributions

\[p(b_{nmh} | f_{nm}; \mu, \sigma) = \prod_{k = -1}^1 \mathcal{N}(b_{nmh}; \mu_k, \sigma_k^2)^{f_{nmk}}\]

where \(\mu = (\mu_{-1}, \mu_0, \mu_{1})\), \(\sigma = (\sigma_{-1}, \sigma_0, \sigma_{1})\) and \(\mathcal{N}(\cdot; \mu_k, \sigma_k^2)\) is a Normal distribution with mean \(\mu_k\) and variance \(\sigma_k^2\). You can sample from this distribution with UnsharedRegionModel.sample_B().

Similarly, let \(\tilde{B}_{nmu}\) denote the random Pearson correlation coefficient between the fMRI BOLD contrast time series from regions \(n\) and \(m\) of patient \(u\). This is dependent on the connectivity state of the patient, and is drawn from the same mixture of Normal distributions as the healthy correlations

\[p(\tilde{b}_{nmh} | \tilde{f}_{nm}; \mu, \sigma) = \prod_{k = -1}^1 \mathcal{N}(\tilde{b}_{nmh}; \mu_k, \sigma_k^2)^{\tilde{f}_{nmk}}\]

except that it is conditional on \(\tilde{F}\) instead of \(F\). You can sample from this distribution with UnsharedRegionModel.sample_B_tilde().

We assume independence between all healthy subjects and patients, independence between healthy connections and independence between regions, and thus obtain the full joint distribution

\[\begin{split}p&(f, b, r, t, \tilde{f}, \tilde{b}; \theta) = p(f; \gamma) p(b | f; \mu, \sigma) p(r; \pi) p(t | r; \eta) p(\tilde{f} | f, t; \epsilon) p(\tilde{b} | \tilde{f}; \mu, \sigma) \\ = &\left( \prod_{n = 1}^N \prod_{m > n} p(f_{nm}; \gamma) \prod_{h = 1}^H p(b_{nmh} | f_{nm}; \mu, \sigma) \right) \\ &\left( \prod_{n = 1}^N \prod_{u = 1}^U p(r_{nu}; \pi) \prod_{m > n} p(t_{nmu} | r_{nu}, r_{mu}; \eta) p(\tilde{f}_{nmu} | f_{nm}, t_{nmu}; \epsilon) p(\tilde{b} | \tilde{f}_{nmu}; \mu, \sigma) \right)\end{split}\]

where \(\theta = (\pi, \gamma, \eta, \epsilon, \mu, \sigma)\).

Exact Inference¶

In [2] three inference algorithms are discussed. Here, we only present the algorithm that gave the best estimation performance on synthetic data experiments and computational efficiency. Similarly, only this algorithm is available in the toolbox.

Our goal is to infer the posterior probability distribution \(p(r_{nu} | \tilde{b}, b; \theta)\) for all regions \(n \in \{1, ..., N\}\) and in all patients \(u \in \{1, ..., U\}\). This requires marginalizing out all latent random variables in the model to compute the partition function \(p(\tilde{b}, b; \theta)\).

First, note that we can easily sum over \(T_{nmu}\)

\[\begin{split}p(\tilde{f}_{nmu} | f_{nm}, r_{nu}, r_{mu}; \eta, \epsilon) &= \sum_{t_{nmu}} p(t_{nmu} | r_{nu}, r_{mu}; \eta) p(\tilde{f}_{nmu} | f_{nm}, t_{nmu}; \epsilon) \\ &= \begin{cases} (1 - \epsilon)^{f_{nm}^\top \tilde{f}_{nmu}} \left( \frac{\epsilon}{2} \right)^{1 - f_{nm}^\top \tilde{f}_{nmu}} & \mathrm{if} \, r_{nu} = r_{mu} = 0 \\ \epsilon^{f_{nm}^\top \tilde{f}_{nmu}} \left( \frac{1 - \epsilon}{2} \right)^{1 - f_{nm}^\top \tilde{f}_{nmu}} & \mathrm{if} \, r_{nu} = r_{mu} = 1 \\ \tilde{\epsilon}^{f_{nm}^\top \tilde{f}_{nmu}} \left( \frac{1 - \tilde{\epsilon}}{2} \right)^{1 - f_{nm}^\top \tilde{f}_{nmu}} & \mathrm{if} \, r_{nu} \neq r_{mu} \end{cases}\end{split}\]

where \(\tilde{\epsilon} = \eta \epsilon + (1 - \eta)(1 - \epsilon)\).

Note

The marginalization of \(T\) comes at the expense of coupling \(\epsilon\) with \(\eta\).

Next, we marginalize out \(\tilde{F}_{nmu}\)

\[\begin{split}p(\tilde{b}_{nmu} | f_{nm}, r_{nu}, r_{mu}; \theta) &= \sum_{\tilde{f}_{nmu}} p(\tilde{f}_{nmu} | f_{nm}, r_{nu}, r_{mu}; \eta, \epsilon) p(\tilde{b}_{nmu} | \tilde{f}_{nmu}; \mu, \sigma) \\ &= \sum_{k=-1}^{1} \mathcal{N}(\tilde{b}_{nmu}; \mu_{k}, \sigma_{k}^2) \\ &\qquad \left( (1 - \epsilon)^{f_{nmk}} \left( \frac{\epsilon}{2} \right)^{1 - f_{nmk}} \right)^{(1 - r_{nu}) (1 - r_{mu})} \\ &\qquad \left( \epsilon^{f_{nmk}} \left( \frac{1 - \epsilon}{2} \right)^{1 - f_{nmk}} \right)^{r_{nu} r_{mu}} \\ &\qquad \left( \tilde{\epsilon}^{f_{nmk}} \left( \frac{1 - \tilde{\epsilon}}{2} \right)^{1 - f_{nmk}} \right)^{r_{nu} (1 - r_{mu}) + (1 - r_{nu}) r_{mu}} \\ &= \prod_{k = -1}^{1} \bigg( \mathcal{M}_{k0}(\tilde{b}_{nmu}; \theta)^{r_{nu} r_{mu}} \\ &\qquad \mathcal{M}_{k1}(\tilde{b}_{nmu}; \theta)^{(1 - r_{nu}) (1 - r_{mu})} \\ &\qquad \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta)^{(r_{nu} (1 - r_{mu}) + (1 - r_{nu}) r_{mu})} \bigg)^{f_{nmk}}\end{split}\]

where

\[\begin{split}\mathcal{M}_{k0}(\tilde{b}_{nmu}; \theta) &= (1 - \epsilon) \mathcal{N}(\tilde{b}; \mu_k, \sigma_k^2) + \frac{\epsilon}{2} \sum_{l \neq k} \mathcal{N}(\tilde{b}_{nmu}; \mu_l, \sigma_l^2) \\ \mathcal{M}_{k1}(\tilde{b}_{nmu}; \theta) &= \epsilon \mathcal{N}(\tilde{b}; \mu_k, \sigma_k^2) + \frac{1 - \epsilon}{2} \sum_{l \neq k} \mathcal{N}(\tilde{b}_{nmu}; \mu_l, \sigma_l^2) \\ \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) &= \tilde{\epsilon} \mathcal{N}(\tilde{b}; \mu_k, \sigma_k^2) + \frac{1 - \tilde{\epsilon}}{2} \sum_{l \neq k} \mathcal{N}(\tilde{b}_{nmu}; \mu_l, \sigma_l^2).\end{split}\]

Note

The marginalization of \(T\) comes at the expense of coupling \(\epsilon\) and \(\eta\) with \(\mu\) and \(\sigma\).

As \(\epsilon\) is assumed to be small \(\mathcal{M}_{k0}\) is dominated by the likelihood of the correlation being drawn from the \(k^\mathrm{th}\) Normal distribution, whereas \(\mathcal{M}_{k1}\) is dominated by the likelihoods of the correlation being drawn from the other Normal distributions. Finally, \(\mathcal{M}_{k \neq}\) is an interpolation between the other two terms that tends to \(\mathcal{M}_{k0}\) as \(\eta \to 0\) and tends to \(\mathcal{M}_{k1}\) as \(\eta \to 1\).

Next, we could marginalize out \(F_{nm}\), but this would complicate the form enough to make inference difficult and would also couple \(\gamma\) with \((\epsilon, \eta, \mu, \sigma)\).

Finally, note that there is no way to analytically marginalize over \(R\), because of the pair wise conditional dependence between \(\tilde{F}\) and \(R\). Furthermore, it is very computationally expensive to perform the brute force summation which would require a sum over \(2^N\) terms - one for each value the binary \(N\) vector can take on.

Now we can construct the joint probability over all the remaining random variables

\[\begin{split}p(f, b, r, \tilde{b}; \theta) &= p(f; \gamma) p(b | f; \mu, \sigma) p(r; \pi) p(\tilde{b} | f, r; \theta) \\ &= \left( \prod_{n=1}^N \prod_{m > n} p(f_{nm}; \gamma) \prod_{h=1}^H p(b_{nmh} | f_{nm}; \mu, \sigma) \right) \\ &\quad \left( \prod_{u=1}^U \prod_{n=1}^N p(r_{nu}; \pi) \prod_{m > n} p(\tilde{b}_{nmu} | f_{nm}, r_{nu}, r_{mu}; \eta, \epsilon, \mu, \sigma) \right).\end{split}\]

Variational Inference¶

To perform inference without marginalizing out the remaining hidden random variables, \(F\) and \(\tilde{F}\), we use a variational mean field approximation of their posterior. The factorization takes the form

\[\begin{split}p(f, r | b, \tilde{b}) &\approx q(f, r) = q_F(f) q_R(r) \\ &= \left( \prod_{n = 1}^{N} \prod_{m > n} \prod_{k = -1}^{1} q_{F_{nmk}}^{f_{nmk}} \right) \left( \prod_{n = 1}^N \prod_{u = 1}^U q_{R_{nu0}}^{1 - r_{nu}} q_{R_{nu1}}^{r_{nu}} \right).\end{split}\]

Computing the posterior \(p(r | b, \tilde{b}; \theta)\) also requires an estimate of \(\theta\).

To estimate both \(\theta\) and the variational factors, we minimize the variational free energy

\[\begin{split}\mathcal{E}(q, \theta; b, \tilde{b}) &= - \mathbb{E}_q \left[ \log p(f, b, r, \tilde{b}; \theta) \right] + \mathbb{E}_q \left[ \log q(f, r) \right] \\ &= - \mathbb{E}_{q_F} \left[ \log p(f; \gamma) \right] - \mathbb{E}_{q_F} \left[ \log p(b | f; \mu, \sigma) \right] - \mathbb{E}_{q_R} \left[ \log p(r; \pi) \right] \\ &\qquad - \mathbb{E}_{q_F q_R} \left[ \log p(\tilde{b} | f, r; \epsilon, \eta, \mu, \sigma) \right] + \mathbb{E}_{q_R} \left[ \log q_R(r) \right] + \mathbb{E}_{q_F} \left[ \log q_F(f) \right]\end{split}\]

where

\[\begin{split}\mathbb{E}_{q_F} \left[ \log p(f; \gamma) \right] &= \sum_{n = 1}^N \sum_{m > n} \sum_{k = -1}^{1} q_{F_{nmk}} \log \gamma_k \\ \mathbb{E}_{q_R} \left[ \log p(r; \pi) \right] &= \sum_{n = 1}^N \sum_{u = 1}^U q_{R_{nu0}} \log (1 - \pi) + q_{R_{nu1}} \log \pi \\ \mathbb{E}_{q_F} \left[ \log p(b | f; \mu, \sigma) \right] &= \sum_{n = 1}^N \sum_{m > n} \sum_{k = -1}^{1} q_{F_{nmk}} \sum_{h = 1}^H \log \mathcal{N}(b_{nmh}; \mu_k, \sigma_k^2) \\ \mathbb{E}_{q_F q_R} \left[ \log p(\tilde{b} | f, r; \epsilon, \eta, \mu, \sigma) \right] &= \sum_{n = 1}^N \sum_{m > n} \sum_{k = -1}^{1} q_{F_{nmk}} \sum_{u = 1}^U \bigg( q_{R_{nu0}} q_{R_{mu0}} \log \mathcal{M}_{k0}(\tilde{b}_{nmu}; \theta) \\ &\qquad + q_{R_{nu1}} q_{R_{mu1}} \log \mathcal{M}_{k1}(\tilde{b}_{nmu}; \theta) \\ &\qquad + \left( q_{R_{nu0}} q_{R_{mu1}} + q_{R_{nu1}} q_{R_{mu0}} \right) \log \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) \bigg) \\ \mathbb{E}_{q_F} \left[ \log q(f) \right] &= \sum_{n = 1}^N \sum_{m > n} \sum_{k = -1}^{1} q_{F_{nmk}} \log q_{F_{nmk}} \\ \mathbb{E}_{q_R} \left[ \log q(r) \right] &= \sum_{n = 1}^N \sum_{u = 1}^U q_{R_{nu0}} \log q_{R_{nu0}} + q_{R_{nu1}} \log q_{R_{nu1}}.\end{split}\]

The algorithm for performing this minimization is

\[\begin{split}\begin{array}{lll} \text{Line} & \text{Operation} \\ \hline 1 & e \gets \mathcal{E}(q, \theta, b, \tilde{b}) \\ 2 & \text{for } s = 1 \dots S \\ 3 & \quad q_F^* \gets \mathcal{U}_{q_F}(q_R, \theta, b, \tilde{b}) \\ 4 & \quad q_R^* \gets \mathcal{U}_{q_R}(q_R,q_F^*, \theta, b, \tilde{b}) \\ 5 & \quad \theta^* \gets \mathcal{U}_{\theta}(q^*, \theta, b, \tilde{b}) \\ 6 & \quad e^* \gets \mathcal{E}(q^*, \theta^*, b, \tilde{b}) \\ 7 & \quad \text{if } (e - e^*) / e < \xi \\ 8 & \quad \quad \text{break} \\ 9 & \quad q \gets q^*, \, \theta \gets \theta^*, \, e \gets e^* \end{array}\end{split}\]

where \(\xi\) is the relative tolerance used to detect convergence before the maximum number of iteration steps \(S\).

The \(\mathcal{U}_{q_F}\) function is determined by the following update equation

\[\begin{split}\log q_{F_{nmk}}^* &= \log \gamma_k + \sum_{h=1}^H \log \mathcal{N}(b_{nmh}; \mu_k, \sigma_k) \\ &\quad + \sum_{u=1}^U q_{R_{nu0}} q_{R_{mu0}} \log \mathcal{M}_{k0}(\tilde{b}_{nmu}; \theta) + q_{R_{nu1}} q_{R_{mu1}} \log \mathcal{M}_{k1}(\tilde{b}_{nmu}; \theta) \\ &\qquad + \left( q_{R_{nu1}} q_{R_{mu0}} + q_{R_{nu0}} q_{R_{mu1}} \right) \log \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) + \text{const.}\end{split}\]

where \(\text{const.}\) is used to imply that \(\log q_{F_{nmk}}^*\) is further normalized to ensure that \(\sum_{k=-1}^1 q_{F_{nmk}}^* = 1\).

Note

We optimize w.r.t. \(\log q\) to ensure positivity of \(q\).

The \(\mathcal{U}_{q_R}\) function is determined by the following update equations

\[\begin{split}\log q_{R_{nu0}}^* &= \log(1 - \pi) \\ &\, + \sum_{m \neq n} \sum_{k=-1}^{1} q_{F_{nmk}} \left( q_{R_{mu0}} \log \mathcal{M}_{k0}(\tilde{b}_{nmu}; \theta) + q_{R_{mu1}} \log \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) \right) + \text{const.}\end{split}\]

\[\begin{split}\log q_{R_{nu1}}^* &= \log \pi \\ &\, + \sum_{m \neq n} \sum_{k=-1}^{1} q_{F_{nmk}} \left( q_{R_{mu1}} \log \mathcal{M}_{k1}(\tilde{b}_{nmu}; \theta) + q_{R_{mu0}} \log \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) \right) + \text{const.}\end{split}\]

where \(\text{const.}\) is used to imply that \(\log q_{R_{nu0}}^*\) and \(\log q_{R_{nu1}}^*\) are further normalized to ensure that \(q_{R_{nu0}}^* + q_{R_{nu1}}^* = 1\).

Note

Here, we estimate each component of the variational parameters and then project the solution onto the solution space of probability distributions. Instead, we could incorporate the equality constraints into the linear system of equations. However, note that means we would have to optimize w.r.t. \(q\) instead of \(\log q\) and thus would need to explicitly handle the positivity constraints.

The \(\mathcal{U}_{\theta}\) function is determined by the following update equations

\[\pi^* = \frac{1}{UN} \sum_{u=1}^U \sum_{n=1}^N q_{R_{nu1}}\]

\[\gamma_k^* = \sum_{n=1}^N \sum_{m > n} q_{F_{nmk}} + \text{const.}\]

as well as the multivariable minimization of \(\mathcal{E}\) with respect to \((\mu, \sigma, \epsilon, \eta)\). This minimization can be performed using a variety of iterative descent methods. By default, a trust region reflective Gauss-Newton method is used. In order to use a descent method, we need the following derivatives.

The derivative of the energy w.r.t. \(\mu_j\) is

\[\begin{split}\frac{\partial \mathcal{E}}{\partial \mu_j} &= - \frac{\partial}{\partial \mu_j} \left( \mathbb{E}_{q_F} \left[ \log p(b | f; \mu, \sigma) \right] + \mathbb{E}_{q_F q_R} \left[ \log p(\tilde{b} | f, r; \theta) \right] \right) \\ &= - \sum_{n=1}^N \sum_{m > n} \bigg( q_{F_{nmj}} \sum_{h=1}^H \frac{\partial}{\partial \mu_j} \left( \log \mathcal{N}(b_{nmh}; \mu_j, \sigma_j^2) \right) \\ &\quad + \sum_{k=-1}^1 q_{F_{nmk}} \bigg( \sum_{u=1}^U q_{R_{nu0}} q_{R_{mu0}} \frac{\partial}{\partial \mu_j} \left( \log \mathcal{M}_{k0}(\tilde{b}_{nmu}; \theta) \right) \\ &\qquad + q_{R_{nu1}} q_{R_{mu1}} \frac{\partial}{\partial \mu_j} \left( \log \mathcal{M}_{k1}(\tilde{b}_{nmu}; \theta) \right) \\ &\qquad + \left( q_{R_{nu0}} q_{R_{mu1}} + q_{R_{nu1}} q_{R_{mu0}} \right) \frac{\partial}{\partial \mu_j} \left( \log \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) \right) \bigg) \bigg)\end{split}\]

where

\[\frac{\partial}{\partial \mu_j} \left( \log \mathcal{M}_{k0}(b; \theta) \right) = \mathcal{M}_{k0}(b; \theta)^{-1} (1 - \epsilon)^{\delta(j, k)} \left( \frac{\epsilon}{2} \right)^{1 - \delta(j, k)} \frac{\partial}{\partial \mu_j} \left( \mathcal{N}(b; \mu_j, \sigma_j^2) \right)\]

\[\frac{\partial}{\partial \mu_j} \left( \log \mathcal{M}_{k1}(b; \theta) \right) = \mathcal{M}_{k1}(b; \theta)^{-1} \epsilon^{\delta(j, k)} \left( \frac{1 - \epsilon}{2} \right)^{1 - \delta(j, k)} \frac{\partial}{\partial \mu_j} \left( \mathcal{N}(b; \mu_j, \sigma_j^2) \right)\]

\[\frac{\partial}{\partial \mu_j} \left( \log \mathcal{M}_{k \neq}(b; \theta) \right) = \mathcal{M}_{k \neq}(b; \theta)^{-1} \tilde{\epsilon}^{\delta(j, k)} \left( \frac{1 - \tilde{\epsilon}}{2} \right)^{1 - \delta(j, k)} \frac{\partial}{\partial \mu_j} \left( \mathcal{N}(b; \mu_j, \sigma_j^2) \right)\]

and

\[\frac{\partial}{\partial \mu_j} \left( \mathcal{N}(b; \mu_j, \sigma_j^2) \right) = \mathcal{N}(b; \mu_j, \sigma_j^2) \frac{\partial}{\partial \mu_j} \left( \log \mathcal{N}(b; \mu_j, \sigma_j^2) \right)\]

\[\frac{\partial}{\partial \mu_j} \left( \log \mathcal{N}(b; \mu_j, \sigma_j^2) \right) = \sigma_j^{-2} (b - \mu_j)\]

The derivative of the energy w.r.t. \(\sigma_j^2\) is

\[\begin{split}\frac{\partial \mathcal{E}}{\partial \sigma_j^2} &= - \frac{\partial}{\partial \sigma_j^2} \left( \mathbb{E}_{q_F} \left[ \log p(b | f; \mu, \sigma) \right] + \mathbb{E}_{q_F q_R} \left[ \log p(\tilde{b} | f, r; \theta) \right] \right) \\ &= - \sum_{n=1}^N \sum_{m > n} \bigg( q_{F_{nmj}} \sum_{h=1}^H \frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{N}(b_{nmh}; \mu_j^2, \sigma_j^2) \right) \\ &\quad + \sum_{k=-1}^1 q_{F_{nmk}} \bigg( \sum_{u=1}^U q_{R_{nu0}} q_{R_{mu0}} \frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{M}_{k0}(\tilde{b}_{nmu}; \theta) \right) \\ &\qquad + q_{R_{nu1}} q_{R_{mu1}} \frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{M}_{k1}(\tilde{b}_{nmu}; \theta) \right) \\ &\qquad + \left( q_{R_{nu0}} q_{R_{mu1}} + q_{R_{nu1}} q_{R_{mu0}} \right) \frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) \right) \bigg) \bigg)\end{split}\]

where

\[\frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{M}_{k0}(b; \theta) \right) = \mathcal{M}_{k0}(b; \theta)^{-1} (1 - \epsilon)^{\delta(j, k)} \left( \frac{\epsilon}{2} \right)^{1 - \delta(j, k)} \frac{\partial}{\partial \sigma_j^2} \left( \mathcal{N}(b; \mu_j, \sigma_j^2) \right)\]

\[\frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{M}_{k1}(b; \theta) \right) = \mathcal{M}_{k1}(b; \theta)^{-1} \epsilon^{\delta(j, k)} \left( \frac{1 - \epsilon}{2} \right)^{1 - \delta(j, k)} \frac{\partial}{\partial \sigma_j^2} \left( \mathcal{N}(b; \mu_j, \sigma_j^2) \right)\]

\[\frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{M}_{k \neq}(b; \theta) \right) = \mathcal{M}_{k \neq}(b; \theta)^{-1} \tilde{\epsilon}^{\delta(j, k)} \left( \frac{1 - \tilde{\epsilon}}{2} \right)^{1 - \delta(j, k)} \frac{\partial}{\partial \sigma_j^2} \left( \mathcal{N}(b; \mu_j, \sigma_j^2) \right)\]

and

\[\frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{N}(b; \mu_j, \sigma_j^2) \right) = (2 \sigma_j^2)^{-1} \left( (b - \mu_j)^2 - \sigma_j^2 \right)\]

\[\frac{\partial}{\partial \sigma_j^2} \left( \mathcal{N}(b; \mu_j, \sigma_j^2) \right) = \mathcal{N}(b; \mu_j, \sigma_j^2) \frac{\partial}{\partial \sigma_j^2} \left( \log \mathcal{N}(b; \mu_j, \sigma_j^2) \right)\]

The derivative of the energy w.r.t. \(\eta\) is

\[\begin{split}\frac{\partial \mathcal{E}}{\partial \eta} &= - \frac{\partial}{\partial \eta} \left( \mathbb{E}_{q_F q_R} \left[ \log p(\tilde{b} | f, r; \theta) \right] \right) \\ &= - \sum_{n=1}^N \sum_{m > n} \sum_{k = -1}^1 q_{F_{nmk}} \sum_{u=1}^U \left( q_{R_{nu0}} q_{R_{mu1}} + q_{R_{nu1}} q_{R_{mu0}} \right) \frac{\partial}{\partial \eta} \left( \log \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) \right)\end{split}\]

where

\[\frac{\partial}{\partial \eta} \left( \log \mathcal{M}_{k \neq}(\tilde{b}; \theta) \right) = \mathcal{M}_{k \neq}(\tilde{b}; \theta)^{-1} \left( (2 \epsilon - 1) \mathcal{N}(\tilde{b}; \mu_k, \sigma_k^2) - \frac{(2 \epsilon - 1)}{2} \sum_{l \neq k} \mathcal{N}(\tilde{b}; \mu_l, \sigma_l) \right)\]

The derivative of the energy w.r.t. \(\epsilon\) is

\[\begin{split}\frac{\partial \mathcal{E}}{\partial \epsilon} &= - \frac{\partial}{\partial \epsilon} \left( \mathbb{E}_{q_F q_R} \left[ \log p(\tilde{b} | f, r; \theta) \right] \right) \\ &= - \sum_{n=1}^N \sum_{m > n} \sum_{k = -1}^1 q_{F_{nmk}} \bigg( \sum_{u=1}^U q_{R_{nu0}} q_{R_{mu0}} \frac{\partial}{\partial \epsilon} \left( \log \mathcal{M}_{k0}(\tilde{b}_{nmu}; \theta) \right) \\ &\qquad + q_{R_{nu1}} q_{R_{mu1}} \frac{\partial}{\partial \epsilon} \left( \log \mathcal{M}_{k1}(\tilde{b}_{nmu}; \theta) \right) \\ &\qquad + \left( q_{R_{nu0}} q_{R_{mu1}} + q_{R_{nu1}} q_{R_{mu0}} \right) \frac{\partial}{\partial \epsilon} \left( \log \mathcal{M}_{k \neq}(\tilde{b}_{nmu}; \theta) \right) \bigg)\end{split}\]

where

\[\frac{\partial}{\partial \epsilon} \left( \log \mathcal{M}_{k0}(\tilde{b}; \theta) \right) = \mathcal{M}_{k0}(\tilde{b}; \theta)^{-1} \left( \frac{1}{2} \sum_{l \neq k} \mathcal{N}(\tilde{b}; \mu_l, \sigma_l^2) - \mathcal{N}(\tilde{b}; \mu_k, \sigma_k^2) \right)\]

\[\frac{\partial}{\partial \epsilon} \left( \log \mathcal{M}_{k1}(\tilde{b}; \theta) \right) = \mathcal{M}_{k1}(\tilde{b}; \theta)^{-1} \left( \mathcal{N}(\tilde{b}; \mu_k, \sigma_k^2) - \frac{1}{2} \sum_{l \neq k} \mathcal{N}(\tilde{b}; \mu_l, \sigma_l^2) \right)\]

\[\frac{\partial}{\partial \epsilon} \left( \log \mathcal{M}_{k \neq}(\tilde{b}; \theta) \right) = \mathcal{M}_{k \neq}(\tilde{b}; \theta)^{-1} \left( (2 \eta - 1) \mathcal{N}(\tilde{b}; \mu_k, \sigma_k^2) - \frac{(2 \eta - 1)}{2} \sum_{l \neq k} \mathcal{N}(\tilde{b}; \mu_l, \sigma_l^2) \right)\]