A.1 The Alias Method
If many samples need to be generated from a discrete distribution, using the approach implemented in the SampleDiscrete() function would be wasteful: each generated sample would require computation. That approach could be improved to time by computing a cumulative distribution function (CDF) table once and then using binary search to generate each sample, but there is another option that is even more efficient, requiring just time for each sample; that approach is the alias method.
To understand how the alias method works, first consider the task of sampling from discrete outcomes, each with equal probability. In that case, computing the value gives a uniformly distributed index between 0 and and the corresponding outcome can be selected—no further work is necessary. The alias method allows a similar searchless sampling method if the outcomes have arbitrary probabilities .
The alias method is based on creating bins, one for each outcome. Bins are sampled uniformly and then two values stored in each bin are used to generate the final sample: if the th bin was sampled, then gives the probability of sampling the th outcome, and otherwise the alias is chosen; it is the index of a single alternative outcome. Though we will not include the proof here, it can be shown that this representation—the th bin associated with the th outcome and no more than a single alias per bin—is sufficient to represent arbitrary discrete probability distributions.
With the alias method, if the probabilities are all the same, then each bin’s probability is one, and it reduces to the earlier example with uniform probabilities. Otherwise, for outcomes where the associated probability is greater than the average probability, the outcome will be stored as the alias in one or more of the other bins. For outcomes where the associated is less than the average probability, will be less than one and the alias will point to one of the higher-probability outcomes.
For a specific example, consider the probabilities . A corresponding alias table is shown in Table A.1. It is possible to see that, for example, the first sample is chosen with probability : there is a probability of choosing the first table entry, in which case the first sample is always chosen. Otherwise, there is a probability of choosing the second and third table entries, and for each, there is a chance of choosing the alias, giving in sum an additional probability of choosing the first sample. The other probabilities can be verified similarly.
Index | Alias Index | |
---|---|---|
1 | n/a | |
2 | 1 | |
3 | 1 | |
4 | 2 |
One way to interpret an alias table is that each bin represents of the total probability mass function. If outcomes are first allocated to their corresponding bins, then the probability mass of outcomes that are greater than must be distributed to other bins that have associated probabilities less than . This idea is illustrated in Figure A.1, which corresponds to the example of Table A.1.
The AliasTable class implements algorithms for generating and sampling from alias tables. As with the other sampling code, its implementation is found in util/sampling.h and util/sampling.cpp.
Its constructor takes an array of weights, not necessarily normalized, that give the relative probabilities for the possible outcomes.
The Bin structure represents an alias table bin. It stores the probability , the corresponding outcome’s probability , and an alias.
We have found that with large numbers of outcomes, especially when the magnitudes of their weights vary significantly, it is important to use double precision to compute their sum so that the alias table initialization algorithm works correctly. Therefore, here std::accumulate takes the double-precision value 0. as its initial value, which in turn causes all its computation to be in double precision. Given the sum of weights, the normalized probabilities can be computed.
The first stage of the alias table initialization algorithm is to split the outcomes into those that have probability less than the average and those that have probability higher than the average. Two std::vectors of the Outcome structure are used for this.
Here and in the remainder of the initialization phase, we will scale the individual probabilities by the number of bins , working in terms of . Thus, the average value is 1, which will be convenient in the following.
To initialize the alias table, one outcome is taken from under and one is taken from over. Together, they make it possible to initialize the element of bins that corresponds to the outcome from under. After that bin has been initialized, the outcome from over will still have some excess probability that is not yet reflected in bins. It is added to the appropriate work list and the loop executes again until under and over are empty. This algorithm runs in time.
It is not immediately obvious that this approach will successfully initialize the alias table, or that it will necessarily terminate. We will not rigorously show that here, but informally, we can see that at the start, there must be at least one item in each work list unless they all have the same probability (in which case, initialization is trivial). Then, each time through the loop, we initialize one bin, which consumes worth of probability mass. With one less bin to initialize and that much less probability to distribute, we have the same average probability over the remaining bins. That brings us to the same setting as the starting condition: some of the remaining items in the list must be above the average and some must be below, unless they are all equal to it.
The probability of un must be less than one. We can initialize its bin’s q with , as that is equal to the probability it should be sampled if its bin is chosen. In order to allocate the remainder of the bin’s probability mass, the alias is set to ov. Because , it certainly has enough probability to fill the remainder of the bin—we just need of it.
In initializing bins[un.index], we have consumed worth of the scaled probability mass. The remainder, un.pHat + ov.pHat - 1, is the as-yet unallocated probability for ov.index; it is added to the appropriate work list based on how much is left.
Due to floating-point round-off error, there may be work items remaining on either of the two work lists with the other one empty. These items have probabilities slightly less than or slightly greater than one and should be given probability in the alias table. The fragment that handles this, <<Handle remaining alias table work items>>, is not included in the book.
Given an initialized alias table, sampling is easy. As described before, an entry is chosen with uniform probability and then either the corresponding sample or its alias is returned. As with the SampleDiscrete() function, a new uniform random sample derived from the original one is optionally returned.
The index for the chosen entry is found by multiplying the random sample by the number of entries. Because u was only used for the discrete sampling decision of selecting an initial entry, it is possible to derive a new uniform random sample from it. That computation is done here to get an independent uniform sample up that is used to decide whether to sample the alias at the current entry.
If the initial entry is selected, the various return values are easily computed.
Otherwise the appropriate values for the alias are returned.
Beyond sampling, it is useful to be able to query the size of the table and the probability of a given outcome. These two operations are easily provided.