Category: Side Quests

  • On the products of Bernoulli random variables

    On the products of Bernoulli random variables

    I took another detour into the world of binary neural networks. In the interests of evaluating dot-product-like functions of two binary vectors, I wanted to assess how the distributions of multiplied bits (equivalent to ANDed bits) relate to the distributions of the individual bits. This post captures the results of this side-trip.

    Products of independent Bernoullis

    The simplest case is in the light of independence, when XBern(πX)

    YBern(πY) and P(X,Y)=P(X)P(Y)

    Let Z=XY be the product of the two bits, equivalent to Z=XY. As Z{0,1} we can model ZBern(πZ) for some πZ.

    The truth table for the four possible outcomes guides us here:

    XYZP(X)P(Y)
    0001πX1πY
    0101πXπY
    100πX1πY
    111πXπY

    P(Z=1)=P(X=1)P(Y=1)=πXπY; a bit of arithmetic also shows that P(Z=0)=1πXπY. With πX[0,1]πY[0,1]πXπY[0,1], we see that ZBern(πXπY)

    Products of Non-Independent Bernoullis

    Suppose that XBern(πX). A second Bernoulli random variable will be dependent on X only if its single parameter is a function of X, so let Y|XBern(g(X)) where g:{0,1}[0,1] generates the parameter of Y’s distribution.

    As there are only two possible outcomes from such a function g we denote them individually as: πY|X=0=g(0) and πY|X=1=g(1)

    We again are interested in the distribution of a given Z=XY=XY, and again the truth table leads us to a straightforward solution:

    XYZP(X)P(Y|X)
    0001πX1πY|X=0
    0101πXπY|X=0
    100πX1πY|X=1
    111πXπY|X=1

    P(Z=1)=P(X=1)P(Y=1|X=1)=πXπY|X=1=πXg(1)

    A bit of arithmetic shows that P(Z=0)=1πXπY|X=1=1πXg(1)

    With πX[0,1]πY|X=1[0,1]πXπY|X=1[0,1] we conclude that ZBern(πXπY|X=1)

    The parameter πY|X=0=g(0) for the case where X=0 drops out; it is redundant as Y=1|X=0 is simply another way for Z to equal zero.

    In other words, the distribution of Z depends only on the distributions of X and of Y|X=1, not on Y|X=0. The constraint that all events must result in either one or the other of two outcomes allows the behavior of the combined system to be fully characterized with a constant number of parameters, and the distribution of the product / Boolean AND even of dependent Bernoullis is simply yet another Bernoulli.