###### Common Statistical Operations

# Beyond Random() — Normal Distributions in SQL

### Leaving random() Behind

If you’ve ever had to generate sample data in your database, odds are you’ve started with * random()* or its equivalent. It’s fast and it’s reliable. Unfortunately, it’s also evenly distributed. You’re just as likely to get

*as*

**0.5***. Most real-world data is not like this.*

**0.9**After an even distribution, your most basic distribution of random numbers is a normal distribution. In this post, we’ll show you how to use the Marsaglia Polar Method to convert * random()*’s uniform distribution into a normal distribution that looks like this:

### Two Random Numbers Are Better Than One

The Marsaglia method converts a pair of uniformly distributed random numbers into a pair of normally distributed random numbers.

First we need a list of random numbers. We’ll use * generate_series* for brevity. If you’re on Redshift or MySQL, check out the alternatives to generate_series. The Marsaglia method requires numbers between -1 and 1. We’ll generate 100,000 of them:

with

numbersas(

select

2*random()-1asx,

2*random()-1asy

fromgenerate_series(0,100000)

),

The Marsaglia method is a type of rejection sampling, which is a fancy way of saying that it throws away inputs that don’t fit the distribution.

Since it’s based on polar coordinates, the sum of the squares of the numbers needs to be less than one. This ends up filtering out about 𝜋/4 (21%) of the data, so make you you generate enough numbers at the outset.

polar_pointsas(

select

x,

y,

x*x+y*yass

fromnumbers

wherex*x+y*y<1

),

Now we know we have the right set of inputs, so we can turn them into normally distributed pairs. We’ll compute the new pairs separately, and then union the lists together to get one long list of normally distributed numbers.

The formulas are straightforward:

Translating these to SQL, we get:

marsagliaas(

select

x*sqrt((-2*ln(s))/s)asn

frompolar_points

unionselect

y*sqrt((-2*ln(s))/s)asn

frompolar_points

)

### Bringing It All Together

To make a pretty graph of your distribution like the one above, combine the previous steps and then bucket and count the numbers (by rounding them) like so:

with

numbersas(

select

2*random()-1asx,

2*random()-1asy

fromgenerate_series(0,100000)

),

polar_pointsas(

select

x,

y,

x*x+y*yass

fromnumbers

wherex*x+y*y<1

),

marsagliaas(

select

x*sqrt((-2*ln(s))/s)asn

frompolar_points

unionselecty*sqrt((-2*ln(s))/s)asn

frompolar_points

)

select

round(n::numeric,1),

count(1)

frommarsagliagroupby1orderby1

Now your randomly generated test data can look a lot more normal!