We talk about data a lot at the Sisense for Cloud Data Teams office. Recently, the discussion turned to how much time we spend managing our email inboxes. <a href="https://twitter.com/AustinBG" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Austin</a>, one of our engineers, decided to analyze his own email usage for answers.



Ever since I first heard the phrase Inbox Zero, I have been aggressively pursuing it with careful inbox management and quick email response times. I wanted to see how I was doing, so I <a href="https://takeout.google.com/settings/takeout" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">downloaded a data dump</a> of my gmail data. ​



<h3 class="wp-block-heading">Getting Your Data Download from Gmail</h3>



Once you have your Gmail data, you’ll need to convert your .mbox file format into a CSV. <a href="https://www.americaninno.com/boston/how-to-export-your-gmail-to-excel/" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">This post by Claire Willett</a> offers some code to help use python’s <a href="https://docs.python.org/2/library/mailbox.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">mailbox</a> library to to help you do this.



To analyze how much time I was spending on email, I decided to pull out the subject, from, to, and date fields. ​



<h3 class="wp-block-heading">Analyzing the Data</h3>



The first thing I do with any dataset is to look for a date column and plot the growth over time. I have been using this email account for about one year, so I decided to look at a weekly overview.



I quickly encountered dates that came with different timezones and varying formats, for example Wed, 02 Mar 2016 14:43:34 +0200 versus Tue, 1 Mar 2016 19:25:02 -0600 (CST). Redshift had some trouble with the variable timezone formats, and because I was interested in weekly granularity, I chopped off the timestamp by removing anything that came after + and -. ​



<pre class="wp-block-code"><code>select
 date_trunc('week', 
 (regexp_replace(date, ' [+\-].*'))::timestamp
 )
 , count(1)
from
 my_email
group by
 1</code></pre>



Here are my results:



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/image-1-email-blog.png" alt="Email volume chart" class="wp-image-72762"/></figure>



Immediately an elbow in the growth popped out at me. In January, we changed how we were logging information from our production servers, which led to a much greater volume of emails.



The engineering team recognized that while having the alerts was great, the email flood was not. We switched to using <a href="https://rollbar.com/" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Rollbar</a> to aggregate emails coming from the servers, and the volume tapers off at the end of January. ​ This led me to ask who was sending me all of these emails, so I drew a pie chart of the top ten accounts sending me email.



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/image-2-email-blog.png" alt="Email pie chart" class="wp-image-72767"/></figure>



As expected, most of my volume comes from email lists that handle things like our server emails, signups from people interested in trying out Sisense for Cloud Data Teams, and messages to our all-purpose email.



I saw my coworkers Andreas and Sunny impressively show up in seventh and eighth place, respectively, and wanted to take a second look while ignoring anything coming from an automated address.



I decided to use regexp_substr to parse the sender out of the email address. Some of the emails come from our previous .io domain, some use the named noreply format, while others do not.



To do a consistent comparison I only wanted to consider any characters up to a space or @ symbol. ​



<pre class="wp-block-code"><code>regexp_substr("from", '[^@ ]*') </code></pre>



This regex says to record any characters that are not @ or space, and then to stop when it sees one of those characters. I can then use the extracted name to quickly filter out lists: ​



<pre class="wp-block-code"><code>regexp_substr("from", '[^@ ]*') not in (
 'noreply'
 , 'leads'
 , 'signups'
 , 'hello'
 , 'alerts'
 , 'Rollbar'
 , 'Periscope'
 , 'support'
 , '"Slack"'
 , 'OpsGenie' 
)</code></pre>



Re-running my weekly cumulative graph with this filter dropped my total from 255,000 to 45,600. Progress! Knowing now that my analysis would not be dominated by email lists, I wanted to ask a more involved question: whose emails do I respond to most quickly, and whose sit in my inbox the longest?



To start answering this question, I decided to do a self-join on my email. Since different email servers make different modifications to the subject line, I first I stripped the Re: out of the subjects so that I could match emails by subject:



And simplified the problem by only looking at emails with a single response: ​



<pre class="wp-block-code"><code>with
 replied_once as (
 select
 subject
 from
 (
 select
 subject, count(1)
 from
 [austins_email_no_lists]
 group by
 subject
 ) where count = 2
 )
 
select * from response_times, replied_once 
where response_times.subject = replied_once.subject </code></pre>



Then I plotted the average over time: ​



<pre class="wp-block-code"><code>select
 date_trunc('week', (recv_time)::timestamp)::date
 , avg(datediff(minute, recv_time, send_time))
from
 response_times
group by
 1</code></pre>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/image-3-email-blog.png" alt="Pie chart of email response times" class="wp-image-72774"/></figure>



My response times have been pretty dynamic! I spent a lot of time in August, September, and October building charts for new Sisense for Cloud Data Teams trials, and would have to spend time running queries before I could respond.



In February, I stepped up my recruiting efforts and my email response times have definitely seen the effects of coordinating schedules since.



Finally, I wanted to know how much time I was spending communicating with Sisense for Cloud Data Teams teams over email. We use <a href="https://slack.com/" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Slack</a> for most of our internal communications, so I wanted to see where we were using email to communicate.



I mapped each of the internal email addresses to their Sisense for Cloud Data Teams team, and looked at the sum and average number of hours it takes me to reply: ​



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/image-4-email-blog.png" alt="Pie chart of email sources" class="wp-image-72781"/></figure>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/image-5-email-blog.png" alt="graph of email response time" class="wp-image-72791"/></figure>



I am clearly on top of my marketing email response time, but it looks like I have some work to do with the sales team. If you want to see where you are spending your time with email, download your own <a href="https://takeout.google.com/settings/takeout" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Gmail archive</a> and start exploring!

Analyzing Your Email with SQL

LinkedIn

Twitter

GitHub

curve-image-unique-image-unique

curve

3-dark-2-image-unique-image-unique

3 DARK 2

Get the latest in analytics right in your inbox.

Article