What is sigmoid? what is sigmoid CPD? why can it reduce parameters? and what can it capture?
What is Sigmoid
Actually, Sigmoid is a function with name sigmoid function. It's a mathematical function having an S shape (sigmoid curve). Often, sigmoid function refers to the special case of logistic function shown below and defined by the formula
$$ S(x) = \frac{e^x}{1+e^x} $$
or
$$ S(x) = \frac{1}{1+e^{-x}}$$
What is Sigmoid CPD
Sigmoid CPD(Conditional Probability Distribution) is an variant of linear CPD, it just applies a sigmoid function on the linear result. And the parameters in Sigmoid CPD are just the weights in the linear relationship. For distribution $$$P(y_1 \mid X_1, X_2,…, X_n)$$$ , the linear CPD is
$$ P_L(y_1 \mid X_1, X_2, …, X_n) = w_0 + \sum_i^n w_i X_i $$ where $$$w_i$$$ is the weight of each variable, and $$$w_0$$$ is the bias.
so the sigmoid CPD is $$ P_S(y_1 \mid X_1, X_2, … X_n) = \frac {1}{1+e^{-P_L(X_1, X_2, …, X_n)}}$$
Compared with traditional CPD, whose number of parameters is proportional to $$$ \prod_i^n d_i $$$ ($$$n$$$ is the number of variables and $$$d_i$$$ is the cardinality of the i-th variable), the number of parameters in Sigmoid CPD is reduced to $$$n$$$.
What can Sigmoid CPD capture
Sigmoid CPD can only capture the effect of each variable for the weight is applied to each variable independently. But it can not capture relations between variables because there are no interaction terms between variables(unless one of the variables is a function of some set of other variables, but this is a special case).
For example, Consider genes A and B that might be involved in spinal muscular atrophy. Assume that A has 2 alleles A1 and A2, and B has 2 alleles, B1 and B2. Which of the following relationships between A and B can a sigmoid CPD capture? (From Coursera PGM class PA2 quiz)
1. Alleles A1 and B1 each independently make a person likely to have SMA.
2. When the alleles are A1 and B2 or A2 and B1 the person has SMA; otherwise the person does not have SMA.
3. Neither gene A nor gene B contribute to SMA.
4. Allele A1 and allele B1 make a person more likely to be have SMA when both of these alleles are present, but neither affect SMA otherwise.
5. Allele A1 and allele B1 make a person equally more likely to have SMA, but when both are present the effect on SMA is the same as when only one is present.
6. Allele A1 makes a person more likely to have SMA, while allele B1 independently makes a person less likely to have SMA.
7. Gene A contributes to SMA, but gene B does not contribute to SMA and thus does not affect the effects of gene A on SMA.
The answer is 1, 3, 6, 7.
For each choice, reasons are:
1. Since their contributions are independent, a sigmoid CPD that weights the alleles for each gene based on the extent of their contribution would capture this perfectly.
2. This XOR relationship means that the effect of the allele for gene A depends on which allele for gene B is present; since the sigmoid CPD does not have interactive terms, it will not be able to capture this.
3. A sigmoid CPD can capture this by giving alleles for copies of gene A as well as alleles for copies of gene B weights with value zero.
4. This AND relationship cannot be captured by a sigmoid CPD because interaction terms between the alleles are not present.
5. This AND relationship cannot be captured by a sigmoid CPD because interaction terms between the alleles are not present.
6. A sigmoid CPD can capture this by making the weights for the indicators for allele A1 positive while making the weights for the indicators for allele B1 negative.
7. A sigmoid CPD can capture this by giving the alleles for copies of gene A positive weights and the alleles for copies of gene B zero weights.