Bayesian Neural Network
A Bayesian neural network is a neural network with a prior distribution on its weights (Neal, 2012).
Consider a data set $\{(\mathbf{x}_n, y_n)\}$, where each data point comprises features $\mathbf{x}_n\in\mathbb{R}^D$ and output $y_n\in\mathbb{R}$. Define the likelihood for each data point as \[\begin{aligned} p(y_n \mid \mathbf{w}, \mathbf{x}_n, \sigma^2) &= \text{Normal}(y_n \mid \mathrm{NN}(\mathbf{x}_n\;;\;\mathbf{w}), \sigma^2),\end{aligned}\] where $\mathrm{NN}$ is a neural network whose weights and biases form the latent variables $\mathbf{w}$. Assume $\sigma^2$ is a known variance.
Define the prior on the weights and biases $\mathbf{w}$ to be the standard normal \[\begin{aligned} p(\mathbf{w}) &= \text{Normal}(\mathbf{w} \mid \mathbf{0}, \mathbf{I}).\end{aligned}\]
Let’s build the model in Edward. We define a 3-layer Bayesian neural network with $\tanh$ nonlinearities.
import tensorflow as tf
from edward.models import Normal

def neural_network(x):
  # Two tanh hidden layers of 10 units each, followed by a linear output.
  h = tf.tanh(tf.matmul(x, W_0) + b_0)
  h = tf.tanh(tf.matmul(h, W_1) + b_1)
  h = tf.matmul(h, W_2) + b_2
  return tf.reshape(h, [-1])

N = 40  # number of data points
D = 1   # number of features

# Standard normal priors on all weights and biases.
W_0 = Normal(loc=tf.zeros([D, 10]), scale=tf.ones([D, 10]))
W_1 = Normal(loc=tf.zeros([10, 10]), scale=tf.ones([10, 10]))
W_2 = Normal(loc=tf.zeros([10, 1]), scale=tf.ones([10, 1]))
b_0 = Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_1 = Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_2 = Normal(loc=tf.zeros(1), scale=tf.ones(1))

x = tf.cast(x_train, dtype=tf.float32)
y = Normal(loc=neural_network(x), scale=0.1 * tf.ones(N))  # sigma = 0.1, known
This program builds the model assuming the features x_train
already exist in the Python environment. Alternatively, one can define a TensorFlow placeholder,
x = tf.placeholder(tf.float32, [N, D])
The placeholder must be fed with data later during inference.
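Since the latent variables above are all normal, variational inference with a fully factorized normal approximation is a natural fit. Below is a minimal sketch using ed.KLqp, assuming the placeholder version of x above; x_train and y_train are hypothetical NumPy arrays of shapes [N, D] and [N]. It mirrors the approach in examples/bayesian_nn.py but is not a verbatim excerpt.
import edward as ed
import numpy as np

# Hypothetical toy data; substitute your own arrays.
x_train = np.linspace(-3.0, 3.0, num=N).reshape([N, D]).astype(np.float32)
y_train = np.cos(x_train).reshape([N]).astype(np.float32)

# Fully factorized normal variational factors; softplus keeps scales positive.
qW_0 = Normal(loc=tf.get_variable("qW_0/loc", [D, 10]),
              scale=tf.nn.softplus(tf.get_variable("qW_0/scale", [D, 10])))
qW_1 = Normal(loc=tf.get_variable("qW_1/loc", [10, 10]),
              scale=tf.nn.softplus(tf.get_variable("qW_1/scale", [10, 10])))
qW_2 = Normal(loc=tf.get_variable("qW_2/loc", [10, 1]),
              scale=tf.nn.softplus(tf.get_variable("qW_2/scale", [10, 1])))
qb_0 = Normal(loc=tf.get_variable("qb_0/loc", [10]),
              scale=tf.nn.softplus(tf.get_variable("qb_0/scale", [10])))
qb_1 = Normal(loc=tf.get_variable("qb_1/loc", [10]),
              scale=tf.nn.softplus(tf.get_variable("qb_1/scale", [10])))
qb_2 = Normal(loc=tf.get_variable("qb_2/loc", [1]),
              scale=tf.nn.softplus(tf.get_variable("qb_2/scale", [1])))

# KLqp minimizes KL(q || p); the placeholder x is fed via the data dictionary.
inference = ed.KLqp({W_0: qW_0, b_0: qb_0,
                     W_1: qW_1, b_1: qb_1,
                     W_2: qW_2, b_2: qb_2},
                    data={x: x_train, y: y_train})
inference.run(n_iter=1000)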
A toy demonstration is available in the Getting Started section. Source code is available at examples/bayesian_nn.py
in the GitHub repository.
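After inference, predictions come from the posterior predictive: the same network with weights drawn from the variational approximation rather than the prior. Below is a hedged sketch reusing the variational factors from the sketch above; x_test is a hypothetical test array, which must have shape [N, D] here because y was constructed with a fixed batch size of N.
# Swap each prior for its fitted variational factor.
y_post = ed.copy(y, {W_0: qW_0, b_0: qb_0,
                     W_1: qW_1, b_1: qb_1,
                     W_2: qW_2, b_2: qb_2})

sess = ed.get_session()
# Each session run re-samples the weights, so repeated runs yield
# Monte Carlo draws from the posterior predictive.
draws = [sess.run(y_post, feed_dict={x: x_test}) for _ in range(10)]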
References
Neal, R. M. (2012). Bayesian learning for neural networks (Vol. 118). Springer Science & Business Media.