# Adding Summary Statistics to your SQL Table

At Periscope Data we continuously help people build, diagnose, and update their data. When approaching a table of data for the first time, it helps our understanding to see summarizing statistics.

By looking at such staples as the range, sum, and average, you can get a quick glance at the shape and distribution of the data before taking a deep dive.

Today we will demonstrate how we append this data to our SQL tables.

### Understanding Blog Post Metrics

Whenever we write a blog post, we want to understand if readers are visiting the Periscope Data homepage and signing up for a trial. We have tables * blog_pings*,

*, and*

**homepage_pings***that let us observe when users visit the homepage after reading a blog post.*

**signup_pings**We want to count visitors to the blog even if they did not visit the homepage. Since * blog_pings* includes everyone in

*, which in turn includes all of*

**homepage_pings***, we use*

**signup_pings***to merge these records without losing any of the rows.*

**left join**select

post

,count(distinctblog_pings.cookie)asblog

,count(distincthomepage_pings.cookie)ashomepage

,count(distinctsignup_pings.cookie)assignups

from

blog_pings

leftjoinhomepage_pingson

blog_pings.cookie=homepage_pings.cookie

leftjoinsignup_pingson

blog_pings.cookie=signup_pings.cookie

groupby

1

Running this query gives us the counts of visitors who visited the blog page, the home page, and those who signed up for a trial of Periscope Data.

This table makes it very easy to analyze the conversion rates of the most active and least active blog posts, but not how an average post performed just by scrolling through the table. We can add the average and standard deviation as headers to the table to guide the analyst when checking up on a blog post.

First, we bundle the previous query into a temporary table using with * blog_signups* as

**(...)**withblog_signupsas(

select

post

,count(distinctblog_pings.cookie)asblog

,count(distincthomepage_pings.cookie)ashomepage

,count(distinctsignup_pings.cookie)assignups

from

blog_pings

leftjoinhomepage_pingson

blog_pings.cookie=homepage_pings.cookie

leftjoinsignup_pingson

blog_pings.cookie=signup_pings.cookie

groupby

1

)

Now we can pull statistics like the total, mean, and standard deviation from * blog_signups*.

select,'Total'sum(blog),sum(homepage),

sum(signups)fromblog_signups

unionselect,'Average'avg(blog),avg(homepage),

avg(signups)fromblog_signups

unionselect, stddev(blog),'Standard Deviation'

stddev(homepage), stddev(signups)fromblog_signups

Having these statistics is great, but it becomes tedious to check multiple data sources. We can address this by simply folding our statistics into the main table by * unioning* them together.

select,'Total'sum(blog),sum(homepage),

sum(signups)fromblog_signups

unionselect,'Average'avg(blog),

avg(homepage),avg(signups)fromblog_signups

unionselect, stddev(blog),'Standard Deviation'

stddev(homepage), stddev(signups)fromblog_signups

unionselect*fromblog_signups

This will make sorting the table difficult, since the Total, Average, and Standard Deviation are on either ends of the range of the data. To help us in sorting the data, we can add an * index* column that we will use as a primary sorting key. First, we add an index to

*:*

**blog_signups**withblog_signupsas(

select

1asindex

, post

,count(distinctblog_pings.cookie)asblog

,count(distincthomepage_pings.cookie)ashomepage

,count(distinctsignup_pings.cookie)assignups

from

blog_pings

leftjoinhomepage_pingson

blog_pings.cookie=homepage_pings.cookie

leftjoinsignup_pingson

blog_pings.cookie=signup_pings.cookie

groupby

1,2

)

Then we index the statistics in our order preference:

select4,,'Total'sum(blog),sum(homepage),

sum(signups)fromblog_signups

unionselect3,,'Average'avg(blog),avg(homepage),

avg(signups)fromblog_signups

unionselect2,, stddev(blog),'Standard Deviation'

stddev(homepage), stddev(signups)fromblog_signups

Now we wrap this entire query into another * with* block which we call

*:*

**signup_statistics**withsignup_statisticsas(

withblog_signupsas(...)

select4,,'Total'sum(blog),sum(homepage),

sum(signups)fromblog_signups

unionselect3,,'Average'avg(blog),avg(homepage),

avg(signups)fromblog_signups

unionselect2,, stddev(blog),'Standard Deviation'

stddev(homepage), stddev(signups)fromblog_signups

unionselect*fromblog_signups

)

select

post

, blog

, homepage

, signups

from

signup_statistics

orderby

indexdesc

, signupsdesc

And there we have it! Now whenever an author wants to check up on their post, they will be able to more easily understand how their post faired against the distribution.