R & Python 101: Advanced Statistical Analysis

Data analysis is not new. Companies have been storing and tracking numbers for a long time. What started in basic spreadsheets has advanced to pivot tables, SQL queries and beyond that, advanced coding languages. With each step of this evolution, businesses are empowered to move away from simply describing what has happened into a more thorough picture of understanding why things happen and better predicting what is going to happen next. As data teams get more advanced, they move closer and closer to the executive decision makers, working together to identify operational moves that will optimize results.

Using R and Python for advanced statistical analysis

SQL is a descriptive language; it does a great job of answering questions about what is happening. R and Python allow teams to answer questions about why something is happening. As analysis evolves, executives want a data-based understanding of what is actually meaningful, so your data team needs to be able to identify the variables that correlate with intended business outcomes. SQL alone makes this cumbersome, but a language like R can handle this type of analysis in a single line.

To illustrate how easy writing advanced analytics can be with these languages, look at the image below. On the left is SQL code intended to determine a correlation between just two variables. On the right is the equivalent in R, which analyzes many relationships at once across a large matrix of data.

Incorporating more advanced languages into your data approach opens the door for more powerful executive conversations. R and Python help your team uncover patterns that were previously not visible and help you quantify your findings to easily illustrate their importance. Imagine that your team was tasked with finding ways to minimize customer churn. Using Periscope’s advanced language support, you could pass a table of data from SQL into R and run correlations across as many variables as you can imagine. Any factors that show a strong enough correlation would be a great starting point for identifying causal relationships with churn.

To establish causal relations, your team could make educated recommendations about operational adjustments based on those variables. Those hypotheses could be tested for a period and anything that shows a statistically significant improvement can be formally adopted into your company’s process. It’s an easy way to cut through the noise and pinpoint exactly what is driving your business. These new languages give analysts the tools to look for connections that they think are relevant while still allowing data to be the ultimate decision maker.
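One common way to check whether such a test produced a statistically significant improvement is a two-proportion z-test. The article does not name a specific test, so treat this as one reasonable choice among several; the churn counts below are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 120 of 1,000 control customers churned,
# versus 90 of 1,000 customers who received the operational change.
z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these made-up counts the p-value falls below the conventional 0.05 threshold, which is the kind of result that would support adopting the change.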

To take statistical analysis to the next level, data teams that have skilled scientists can use the new languages to perform detailed predictive analysis, such as logistic regressions and cluster analysis, that can be used to create machine learning models for even more advanced recommendations. Even better, these languages are created and improved by thousands of experts, with new capabilities appearing frequently. Utilizing R and Python as part of your data team’s analysis is a good way to ensure you always have access to the best tools available.
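To give a sense of what a logistic regression involves, here is a deliberately small sketch that fits a one-variable model with plain gradient descent. It is a teaching example under assumed data (support tickets versus churn), with an arbitrary learning rate and iteration count; a real project would normally use an established library such as scikit-learn or R's `glm`.

```python
from math import exp


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1 / (1 + exp(-z))
    ez = exp(z)
    return ez / (1 + ez)


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log-loss
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical training data: support tickets filed and whether the customer churned.
tickets = [0, 1, 1, 0, 4, 5, 3, 4]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(tickets, churned)


def churn_probability(ticket_count):
    return sigmoid(w * ticket_count + b)


print(f"P(churn | 0 tickets) = {churn_probability(0):.2f}")
print(f"P(churn | 5 tickets) = {churn_probability(5):.2f}")
```

The fitted model turns a raw variable into a probability, which is what makes it usable as an input to recommendations rather than just a description of past behavior.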

Advanced statistical analysis in Periscope Data

For Periscope users, data can easily be passed from SQL into R or Python, where it can be analyzed and visualized before passing it back into a formal Periscope dashboard for distribution, collaboration and presentation. The libraries available in those languages represent the combined brilliance of thousands of the most skilled professionals and academics in the field.

The beautiful thing about Periscope Data is that our agile platform allows dashboards to refresh instantly to show new information. There’s no need to download any new information or re-run reports to get new visualizations. This approach allows a team’s most skilled data scientists to avoid doing the same work twice, freeing them up to work on more complex analyses that illustrate a deeper understanding of emerging business needs. The overall effect of this model is that every analyst, employee and executive can view data from the perspective of the most advanced scientist at the company.

To learn more about how your data team can use R and Python to uncover hidden insights, download our guide.

Britton Stamper
A self-described data visualization evangelist, Britton spends his time working with and teaching anyone who will listen about the great benefits of aligning a visual’s design with a business need. He’s willing to go to any length to get people to understand the need for including data in decision making.