# Gumbel Softmax
Gumbel Softmax aims to tackle discrete optimization. This blog contains my understanding of it, as well as the necessary derivations.
## REINFORCE
Suppose that there is an underlying distribution $p_\theta(z)$ over a discrete variable $z$, and that we want to optimize $\mathbb{E}_{z \sim p_\theta}[f(z)]$ with respect to $\theta$. Originally, discrete optimization is tackled in RL style through the REINFORCE algorithm, where the gradient of an expectation is converted to an expectation of gradients:

$$\nabla_\theta\, \mathbb{E}_{z \sim p_\theta}\left[f(z)\right] = \mathbb{E}_{z \sim p_\theta}\left[f(z)\, \nabla_\theta \log p_\theta(z)\right].$$

More problems arise, such as high variance (convergence issues) and poor sample efficiency at each step.
Gumbel Softmax
relieves them to a certain extent.
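For reference, the REINFORCE estimator can be sketched in a few lines of NumPy. This is a minimal sketch under my own illustrative assumptions: the distribution is a softmax over logits, and the toy objective `f` rewards one particular category.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_grad(logits, f, n_samples=50_000):
    """Monte Carlo estimate of  grad_logits E_{z ~ softmax(logits)}[f(z)]
    via the score function:  E[f(z) * grad_logits log p(z)]."""
    probs = softmax(logits)
    z = rng.choice(len(probs), size=n_samples, p=probs)
    # gradient of log p(z) w.r.t. the logits is  one_hot(z) - probs
    grad_log_p = np.eye(len(probs))[z] - probs
    return (f(z)[:, None] * grad_log_p).mean(axis=0)

logits = np.array([0.5, 1.0, -0.5])
f = lambda z: (z == 2).astype(float)   # toy objective: reward 1 iff category 2
estimate = reinforce_grad(logits, f)

# Analytic gradient for comparison: probs[2] * (e_2 - probs)
exact = softmax(logits)[2] * (np.eye(3)[2] - softmax(logits))
```

Even with 50,000 samples the estimate only matches the analytic gradient to about two decimal places, which is the variance problem mentioned above.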
## Gumbel-Max
Sampling from an arbitrary discrete distribution can be reparameterized through the Gumbel-Max trick. Suppose that we would like to draw $z \sim \mathrm{Categorical}(\pi_1, \dots, \pi_k)$. Gumbel-Max takes the form of:

$$z = \arg\max_{i}\left(\log \pi_i + g_i\right), \quad g_i \overset{\text{i.i.d.}}{\sim} \mathrm{Gumbel}(0, 1).$$

It can be proved that:

$$P(z = i) = \pi_i.$$
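A quick NumPy sketch (the test distribution and sample count are illustrative choices of mine) confirms empirically that the argmax index follows $\mathrm{Categorical}(\pi)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_max(log_pi, n):
    """n categorical samples via  z = argmax_i (log pi_i + g_i),  g_i ~ Gumbel(0,1)."""
    u = rng.random((n, len(log_pi)))
    g = -np.log(-np.log(u))                  # Gumbel(0, 1) via inverse CDF
    return np.argmax(log_pi + g, axis=1)

pi = np.array([0.2, 0.5, 0.3])
z = gumbel_max(np.log(pi), n=200_000)
freq = np.bincount(z, minlength=3) / len(z)  # empirical frequencies, close to pi
```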
Merit: the parameters $\pi$ of the underlying distribution now appear explicitly in the sampling process. Moreover, all randomness in the sampling process is transferred to the parameter-free Gumbel distribution.
## Gumbel Distribution
Let us inspect the distribution of $g = -\log(-\log u)$, where $u \sim \mathrm{Uniform}(0, 1)$. Its cumulative distribution function is

$$F(g) = P\left(-\log(-\log u) \le g\right) = P\left(u \le e^{-e^{-g}}\right) = e^{-e^{-g}},$$

and its density function can be derived as

$$f(g) = F'(g) = e^{-\left(g + e^{-g}\right)}.$$
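To make this concrete, here is a small sketch (sample size and check points are arbitrary choices) that samples $g = -\log(-\log u)$ and compares the empirical CDF against $F(g) = e^{-e^{-g}}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse-CDF sampling: if u ~ Uniform(0, 1), then g = -log(-log u) ~ Gumbel(0, 1).
u = rng.random(500_000)
g = -np.log(-np.log(u))

# The empirical CDF should match F(x) = exp(-exp(-x)) at any check point.
for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = (g <= x).mean()
    theoretical = np.exp(-np.exp(-x))
```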
## Deriving Gumbel-Max
Denote $x_i = \log \pi_i + g_i$, so that $z = \arg\max_i x_i$. We have

$$
\begin{aligned}
P(z = i) &= P\left(x_i \ge x_j,\ \forall j \ne i\right) \\
&= \int_{-\infty}^{\infty} f(g_i) \prod_{j \ne i} P\left(g_j < g_i + \log \tfrac{\pi_i}{\pi_j}\right) \mathrm{d}g_i \\
&= \int_{-\infty}^{\infty} e^{-\left(g_i + e^{-g_i}\right)} \prod_{j \ne i} \exp\left(-\tfrac{\pi_j}{\pi_i} e^{-g_i}\right) \mathrm{d}g_i \\
&= \int_{-\infty}^{\infty} e^{-g_i} \exp\left(-\tfrac{e^{-g_i}}{\pi_i}\right) \mathrm{d}g_i \qquad \left(\textstyle\sum_j \pi_j = 1\right) \\
&= \int_{0}^{\infty} e^{-t / \pi_i}\, \mathrm{d}t \qquad \left(t = e^{-g_i}\right) \\
&= \pi_i.
\end{aligned}
$$

Therefore, sampling from an arbitrary categorical distribution can be reparameterized as sampling from the Gumbel distribution.
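The key step of the proof reduces to the identity $\int_{-\infty}^{\infty} e^{-g} \exp\left(-e^{-g}/\pi_i\right) \mathrm{d}g = \pi_i$; a quick numerical quadrature (grid bounds and resolution are arbitrary choices) confirms it:

```python
import numpy as np

# Riemann-sum check of  ∫ exp(-g) * exp(-exp(-g) / pi) dg = pi  over the real line
# (the integrand is negligible outside the grid below).
g = np.linspace(-20.0, 40.0, 2_000_000)
dg = g[1] - g[0]
for pi_i in (0.1, 0.25, 0.6):
    integrand = np.exp(-g) * np.exp(-np.exp(-g) / pi_i)
    value = integrand.sum() * dg             # close to pi_i
```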
## Putting It Together
Recall that sampling from an arbitrary categorical distribution can be reparameterized as sampling from the Gumbel distribution, followed by an argmax. To tackle the problem underlying discrete optimization, we would like to approximate the discrete argmax operation with a continuous operation, such that gradients flow back to the parameters that the sampled distribution is conditioned on (here, the probabilities $\pi_i$, or equivalently the logits $\log \pi_i$).
It is intuitive that by replacing argmax with a temperature-controlled softmax, we would obtain a continuous approximation where gradients to the distribution parameters are well defined. As the temperature $\tau \to 0$, the softmax operation would approach argmax in both sampled value and expectation value.
In the computation graph, we can replace the intended sampled one-hot values with:

$$y_i = \frac{\exp\left(\left(\log \pi_i + g_i\right) / \tau\right)}{\sum_{j} \exp\left(\left(\log \pi_j + g_j\right) / \tau\right)}.$$

This approach basically alleviates the non-differentiability of the sampling process by resorting to these soft, expectation-like values.
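Putting the pieces together as a sketch (the temperature and sample counts are illustrative choices, not values from the post): with a small $\tau$, the relaxed samples are nearly one-hot, and since softmax is monotone, their argmax follows exactly the Gumbel-Max categorical distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(log_pi, tau, n):
    """Relaxed samples  y = softmax((log pi + g) / tau),  g ~ Gumbel(0, 1)."""
    g = -np.log(-np.log(rng.random((n, len(log_pi)))))
    x = (log_pi + g) / tau
    x -= x.max(axis=1, keepdims=True)        # numerical stabilization
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

pi = np.array([0.2, 0.5, 0.3])
y = gumbel_softmax(np.log(pi), tau=0.1, n=100_000)

# argmax of the relaxed sample has the Gumbel-Max (i.e. categorical) distribution,
# and at small tau each row of y is close to a one-hot vector.
freq = np.bincount(y.argmax(axis=1), minlength=3) / len(y)
```

In practice, frameworks anneal $\tau$ during training: a larger $\tau$ gives smoother (lower-variance) gradients, while a smaller $\tau$ gives samples closer to true one-hot draws.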
- Title: Gumbel Softmax
- Author: Jay Zhou
- Created at: 2023-06-01 21:18:48
- Updated at: 2023-06-30 01:07:14
- Link: https://ja1zhou.github.io/2023/06/01/gumbel-softmax/
- License: This work is licensed under CC BY-NC-SA 4.0.