Tips and Tricks
The Best Advice for First-Time Citizen Data Scientists
One of my favorite things about data is its ability to communicate a lot of information with incredible clarity and efficiency. We’ve all heard the saying that a picture’s worth a thousand words, but a well-designed data visualization can instantly illustrate insights that no number of words can explain. The biggest reason I joined Periscope Data is that I believe in the power of data as a means of communication, a language that tells stories in a way that no other can.
With Periscope Data discovery for business, a new generation of citizen data scientists will gain access to that language to do their own research and tell their own stories. These new storytellers will soon find exciting new tools to dive into data and share insights. To help get them started on their journey, I want to share some advice from my personal experience.
My education and career have been dedicated to telling better stories through data visualizations, so I’d love to share some of what I’ve learned. When I was just getting started in data communication and visualization, I learned an important lesson that has stuck with me through the years. Whether you’re just getting started in data analysis or you’re a veteran like me, telling better data-driven stories starts with the same approach: be skeptical of the data.
I don’t mean skeptical in the sense that you should doubt your findings; the point of this analysis is to give you answers that are objectively reliable. I mean you should question your assumptions and always ask questions about the data you are studying. How was it collected? How often is it refreshed? What is being calculated in the columns you see?
To extract the best insights and to make the most convincing recommendations with your data, you need to really understand the story behind the data. The best analysts aren’t just reviewing datasets for insights; they’re learning something from every step of the process to create a more holistic understanding of what they’re seeing.
Think about an analyst who has a dataset that includes numbers on “users” and “sales.” That data can be used to tell a lot of stories, but first you have to know exactly what the data is saying. For starters, what do the terms actually mean? Is that column labeled “users” showing a total of all users ever registered, daily/monthly active users, paid users, new users or something else entirely? The same is true for the “sales” column. Is that related to a specific product or inclusive of every product? Are you looking at lifetime sales, monthly sales, annual sales or something else?
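To see how much a single label can hide, here is a small Python sketch. The records, field names and dates are all made up for illustration; they are not from any real dataset or from Periscope Data. The point is only that the same table can honestly produce four different numbers for “users.”

```python
from datetime import date

# Hypothetical user records, invented for this example.
users = [
    {"id": 1, "registered": date(2023, 1, 5), "last_active": date(2024, 3, 20), "paid": True},
    {"id": 2, "registered": date(2023, 6, 1), "last_active": date(2023, 7, 2), "paid": False},
    {"id": 3, "registered": date(2024, 3, 1), "last_active": date(2024, 3, 28), "paid": False},
    {"id": 4, "registered": date(2024, 2, 10), "last_active": date(2024, 3, 15), "paid": True},
]

def in_month(d, year, month):
    """Return True if the date falls in the given calendar month."""
    return d.year == year and d.month == month

# Four reasonable answers to "how many users do we have?"
user_counts = {
    "all registered": len(users),
    "active in March 2024": sum(in_month(u["last_active"], 2024, 3) for u in users),
    "paid": sum(u["paid"] for u in users),
    "new in March 2024": sum(in_month(u["registered"], 2024, 3) for u in users),
}

for label, count in user_counts.items():
    print(f"{label}: {count}")
```

If a chart simply says “users,” any one of these counts could be behind it, which is why it pays to ask.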
Once you start asking questions about the data, you learn a lot more about what the numbers actually mean. That’s the purpose of analysis — to use the data to come to a better understanding of a bigger picture.
Before you can do that real analysis, you have to interview the data. You have to see where it’s coming from and how it gets collected. Question the entry process. Question chart and dashboard titles. Question any biases or assumptions that you might be bringing to the interpretation.
Another area that deserves skepticism is null values, or what is entered when a field is intentionally or accidentally left blank. Do null entries automatically turn into a “0” or an “n/a”? Are they excluded from bigger calculations? Are they affecting a mean value? How is the system set up to treat null values, and how does that affect the bigger picture with that data?
The best place to answer these questions is with your data team. Talk to them about the charts they’re creating and find out exactly what the data means and what happens to it before you see it. It’s always a good idea to get very specific with the names of the columns you’re seeing, perhaps to the point that you agree on a shared language to use in the dataset creation process to maximize clarity.
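A quick Python sketch shows why the question about means matters. The sales figures below are hypothetical, and the two helper functions are just illustrations of two common policies, not a description of how any particular system behaves.

```python
from statistics import mean

# Hypothetical daily sales; None marks a day where nothing was recorded.
daily_sales = [120, None, 80, None, 100]

def mean_excluding_nulls(values):
    """Average only the entries that were actually recorded."""
    recorded = [v for v in values if v is not None]
    return mean(recorded)

def mean_nulls_as_zero(values):
    """Average every entry, counting each missing one as 0."""
    return mean(0 if v is None else v for v in values)

excluded = mean_excluding_nulls(daily_sales)
zero_filled = mean_nulls_as_zero(daily_sales)

print("Mean with nulls excluded:", excluded)
print("Mean with nulls counted as 0:", zero_filled)
```

With these numbers, the first policy gives an average of 100 and the second gives 60. Neither is wrong on its own, but they tell very different stories, so it is worth knowing which one your dashboard uses.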
If you’re looking for more tips to improve your dives into data, our How to Chart Your Data Discoveries guide is full of them. It’s a great way to make sure you’re getting the most information and value out of your data.