
Connecting Python and Periscope Data

Sometimes you need to step away from reporting and analytics and do some data science. You might want to build a new lead scoring model or use a random forest to look for patterns in your churn, and languages such as R or Python offer the power and flexibility to get that done.

Periscope Data, with its easy data exploration, visualization and cross-database analytics, is an amazing tool to have available while you code.

  • It’s the ideal location to build an input dataset for a model. All your data lives in one place, and cross-database joins allow you to quickly create datasets that contain everything you need.
  • You can output your work directly to Periscope for visualization. Having an easy-to-use tool with advanced data visualization is great for exploring derived datasets or examining the results of a model.
  • Model results can be put back into a database and go directly to consumers. For instance, Python can build a lead scoring model, and the scores can then be used in dashboards or queries for later analysis or segmentation.

 

The way you can use Periscope Data and Python varies depending on your level of service, specifically on whether you use only the visualization platform, our cache or our full analytics warehouse. Here are a few different ways you can use Periscope Data and Python together. Everything below applies to R as well, but I'll use Python in the examples.

Connect to Your Own Database

The simplest way to use Python with Periscope Data is to connect to an existing database and interact with that database directly. Python allows for easy access to most database types through pyodbc, psycopg2 or other common libraries. You can easily pull data out and write it back to the database.

We use Redshift, so I’ll use that as an example.

For our connection I will use the psycopg2 library, which can connect to Redshift quickly and easily. If you'll be doing modeling, I also recommend the pandas library for consuming and working with data in Python. You can read your SQL directly into a dataframe and be modeling in minutes.

 

import psycopg2
import pandas as pd

## Connect to the Redshift database
conn = psycopg2.connect(
           host="analytics.********.us-west-1.rds.amazonaws.com",
           user='user',
           port=5432,
           password='password',
           dbname='analytics')
cur = conn.cursor()

## Use the cursor to create a table, or run other queries that interact directly with the database
cur.execute("create table new_data as select * from old_data limit 10")
conn.commit()

## Combine psycopg2 with pandas for data interaction and modeling
query = "select * from new_data"
df = pd.read_sql(query, conn)
print(df.head())

This approach does require you to maintain an independent analytics database, and it doesn't allow cross-database joins, but you can visualize your results in Periscope and consume your model results in dashboards immediately!
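As a sketch of the write-back step, the same psycopg2 connection can push model scores into a table of their own, ready for dashboards. The `lead_scores` table and its columns here are hypothetical names for illustration:

```python
## SQL for a hypothetical lead_scores table; the table and column
## names are illustrative, not part of Periscope itself
CREATE_SQL = """create table if not exists lead_scores (
    lead_id integer,
    score float8
)"""
INSERT_SQL = "insert into lead_scores (lead_id, score) values (%s, %s)"

def write_scores(conn, scores):
    ## Write (lead_id, score) pairs back through an open psycopg2
    ## connection, such as `conn` from the example above
    cur = conn.cursor()
    cur.execute(CREATE_SQL)
    cur.executemany(INSERT_SQL, scores)
    conn.commit()

## Scores a model might produce (made-up values)
scores = [(101, 0.87), (102, 0.34), (103, 0.61)]
## write_scores(conn, scores)
```

Once the table exists, the scores can be joined against the rest of your data in any Periscope chart.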

Use the Periscope Data CSV Endpoint

For cache users, Periscope Data allows the creation of CSV endpoints, where a chart's data can be retrieved over a simple HTTP request. This is a great solution for simple work, and it lets you use always-fresh data from multiple data sources. If you need to work with a small dataset in Python, this is a great way to go.

Your first step is to create a chart with the table you want to work with, and then enable public CSV access. This setting can be found in the advanced section of the series settings tab in the editor.

Once you have enabled access, you can point Python at the link using the urllib2 library and read the data directly into pandas:

import urllib2
import pandas as pd

url = 'https://app.periscopedata.com/api/your-CSV-endpoint'
response = urllib2.urlopen(url)

df = pd.read_csv(response)
print(df.head())

The CSV endpoint is built as a lightweight solution for grabbing data from a chart, and is thus capped to the same 5 MB limit as its parent chart. With most tables this works out to between 5,000 and 25,000 rows, though if you are careful to avoid strings and exclude unnecessary data, you can fit more.
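To see which columns are eating into that budget, pandas can report a rough per-column footprint. In-memory size is only a proxy for the CSV size, and the `notes` column below is made up, but string columns tend to dominate both:

```python
import pandas as pd

## A toy dataset with one heavy string column (all values are made up)
df = pd.DataFrame({
    "lead_id": range(1000),
    "score": [0.5] * 1000,
    "notes": ["free-form text " * 5] * 1000,
})

## Per-column memory footprint; deep=True counts the actual string contents
per_column = df.memory_usage(deep=True)
print(per_column)

## Dropping unneeded string columns before charting keeps the endpoint small
slim = df.drop(columns=["notes"])
```

Trimming the chart's SQL to just the columns you need buys you the most rows under the cap.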

Get Real with a Periscope Data Warehouse

If you want the most power and flexibility, Periscope Data also offers a full data warehouse. This means—in addition to a very fast, dedicated private cluster—you have full read and write access to all your Periscope SQL views.

Using Python much as in the first example, your Periscope views can act as input datasets for your models. You can do all your work across any connected database, dataset and data size, including datasets in different databases, CSVs and data from third-party integrations.

You can push your model outputs directly back into the Periscope Data warehouse, where your results will be instantly usable across your company and can be recombined with all your data, creating a virtuous cycle. No new databases, no data caps and no limitations. Your model output can be immediately visualized and shared.
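As a sketch, the same pandas workflow can handle both ends: read a Periscope view as the model input, then push the results back with `to_sql`. The connection URL, host, view name and table name below are placeholders, and the SQLAlchemy calls are commented out since they need real warehouse credentials:

```python
import pandas as pd
## from sqlalchemy import create_engine

## Placeholder connection URL; substitute your Periscope warehouse credentials
warehouse_url = "postgresql+psycopg2://user:password@warehouse-host:5439/analytics"

## Hypothetical model output
results = pd.DataFrame({"lead_id": [101, 102, 103],
                        "score": [0.87, 0.34, 0.61]})

## engine = create_engine(warehouse_url)
## inputs = pd.read_sql("select * from my_periscope_view", engine)
## results.to_sql("lead_scores", engine, if_exists="replace", index=False)
```

With the results landed as a table, any chart or dashboard in Periscope can query them right away.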

Happy Scripting!

Want to discuss this article? Join the Periscope Data Community!

Sean Cook
Sean has chased the best tech from San Diego to NYC to SF working in GIS, data science and marketing. You can find him alternatively creating dazzling new analyses and driving content strategy at Periscope Data.