Generating Random Numbers

This post is by Eric Novik | 29 Oct 2016
2 minute read



This post was inspired by examples from Introducing Monte Carlo Methods with R by Robert and Casella.

Introduction

Random number generation is a fundamental problem in any probabilistic system. The numbers generated from functions like rnorm() are not, strictly speaking, random. They are deterministic numbers that have the properties of random numbers. For a discussion on how to test these properties, see John Cook’s description in the book Beautiful Testing.

We can verify that the sequence is deterministic by setting the seed of the random number generator as follows.

set.seed(123)
runif(5)

    ## [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673

set.seed(123)
runif(5)

    ## [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673

Generating Numbers from a Uniform Distribution

It turns out that the only thing we need to generate random numbers from an arbitrary distribution with a known CDF is the ability to generate uniform random numbers. We can then generate samples from any distribution using a probability integral transform or an inverse transform. To see this, we start from the other end. If \(X\) is a random variable with \(F\) as its CDF, and we let \(Y = F(X)\) then \(Y \sim Uniform(0, 1)\). In words, if we feed a random variable generated from some distribution through its own CDF, we get back \(Uniform(0, 1)\).

First, let’s see if it looks like it is true using a simulation. Suppose \(X\) is drawn from a standard logistic distribution with the CDF \(F(x) = 1 / (1 + exp^{-x})\)

library(ggplot2)
n <- 1e5
X <- rlogis(n) # draw X from standard logistic
plot_dens <- function(data) {
  qplot(
    data,
    geom = "density",
    fill = "red",
    color = "red",
    alpha = I(1 / 5)
  ) + theme_minimal() + theme(legend.position = 'none')
}
plot_dens(X)
one dense 1
Y <- plogis(X) # Feed X back to its own CDF
plot_dens(Y)
one dense 2

\(Y\) was generated from \(X\), where \(X \sim Logistic(0, 1)\). Now let’s compare it to a uniform distribution generated with R’s runif() function.

d <- data.frame(unif_from_logis = Y,
                unif_from_runif = runif(n))
d <- tidyr::gather(d)
plot_dens <- function(data) {
  qplot(
    value,
    fill = key,
    color = key,
    alpha = I(1 / 5),
    geom = "density",
    data = data
  ) + theme_minimal() + facet_wrap(~ key)
}
plot_dens(d)
two dense 1

Probability Integral Transform

Indeed, \(Y\) looks uniform and indistinguishable from \(U\), which was generated from the uniform distribution directly. Why is that? Here is the math:

If we start from the other end, where we let \(X = F^{-1}(U)\):

For an intuitive explanation of why this works, check out the following video by Brian Zhang.

Generate Random Numbers from an Arbitrary Density with a Known CDF

Using the above result, we can obtain draws from a target distribution using the uniform. For example, if we solve the logistic CDF for \(x\), we get \(x = -\log(1 - U) / U\), where \(X \sim Logistic(0, 1)\).

U <- runif(n)
X <- -log((1 - U) / U)
d <- data.frame(logis_from_unif = X,
                logis_from_rlogis = rlogis(n))
d <- tidyr::gather(d)
plot_dens(d)
compare one

These distributions look almost identical and indeed they are, but what if we do not know the analytical form of the CDF? In that case, we can not use this method, and must resort to numerical integration techniques such as MCMC. We will cover those in later posts.