Setting up Your Redshift Cluster
You’ve done your data warehouse research and have settled on Amazon Redshift. Now you just need to get everything set up. We’re heavy users of Redshift, so this is something we have a lot of experience with.
Setting up your cluster
Setting up a Redshift cluster is extremely easy. The details of connecting to your Redshift cluster vary depending on how you set it up, but the basics are the same.
First, you need to decide on what type of node you’ll use — Dense Compute or Dense Storage.
Dense Compute nodes have more ECU and memory per dollar than Dense Storage nodes, but come with far less storage. We highly value speed at Periscope, so we’ve found Dense Compute nodes to be the most effective. The more data you are querying, the more compute you need to keep queries fast.
Storage nodes can work well if you have too much data to fit on SSD nodes within your budget, or you want to store a lot more data than you expect to query.
Number of Nodes
Now you need to figure out how many nodes to use. This depends somewhat on your dataset, but for single query performance, the more the merrier.
The size of your data will determine the smallest cluster you can have. The smallest Dense Compute nodes come with only 160GB drives, so even if your row count is in the low billions, you may still require 10+ nodes.
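As a rough illustration of that sizing arithmetic, here is a minimal sketch that rounds your data size up to a whole number of nodes. The function name min_nodes is hypothetical, and the sketch assumes 160GB of disk per node (the figure quoted above) and no allowance for free working space, so treat the result as a lower bound rather than a recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: smallest node count whose combined disk holds the data.
# Arguments: data size in GB, then per-node disk in GB (defaults to 160,
# the drive size quoted in the text). Ignores headroom for temporary space.
min_nodes() {
  local data_gb=$1 node_gb=${2:-160}
  # Integer ceiling division: a partially filled node still counts as one node.
  echo $(( (data_gb + node_gb - 1) / node_gb ))
}

min_nodes 1600   # prints 10
min_nodes 1700   # prints 11
```

In practice you would likely want more nodes than this minimum, both for query speed and to leave some disk free.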
The last step is network setup. Clusters in US East (North Virginia) do not require a VPC, while the rest do. For any production usage, we suggest using a VPC, as you’ll get better network connectivity to your EC2 instances.
A default VPC is created if one doesn’t exist. If you want to access Redshift from outside of AWS, then add a public IP by setting Publicly Accessible to true. Whether you want a public IP on your cluster is up to you — the rest of this post explains how to connect to both public and private IPs.
In either case, take note of the VPC Security group. You’ll need to allow access to the cluster through it later.
We’ll start with the simplest cluster setup possible — a cluster in Virginia not in any VPC. This kind of setup is best used for prototyping.
Once the cluster boots, the Configuration tab in the AWS Redshift console will show you the endpoint address.
Before connecting, we need to allow your IP in the Cluster Security Group. Click the security group’s link, then click Add Connection Type. The default is your current IP.
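Now connect directly to your cluster, replacing YOUR_CLUSTER_ENDPOINT with the endpoint address from the Configuration tab:
psql -h YOUR_CLUSTER_ENDPOINT \
-p 5439 -U periscope dev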
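If you connect often, it can help to keep the endpoint out of your shell history and scripts. The sketch below is one way to do that, not the only one: it assembles the psql arguments from environment variables. The variable names (REDSHIFT_HOST, REDSHIFT_PORT, REDSHIFT_USER, REDSHIFT_DB) and the function name are our own inventions for illustration; the defaults are the port, user, and database from the command above.

```shell
#!/usr/bin/env bash
# Sketch: build psql arguments from environment variables.
# REDSHIFT_HOST (hypothetical name) must hold the endpoint address shown
# in the console; the other variables fall back to the values used above.
redshift_psql_args() {
  # Abort with a message if the endpoint has not been provided.
  local host=${REDSHIFT_HOST:?set REDSHIFT_HOST to your cluster endpoint}
  echo "-h $host -p ${REDSHIFT_PORT:-5439} -U ${REDSHIFT_USER:-periscope} ${REDSHIFT_DB:-dev}"
}

# Usage (word splitting of the output is intentional here):
#   export REDSHIFT_HOST=...   # your endpoint address
#   psql $(redshift_psql_args)
```

The function only prints arguments, so you can check what will be passed to psql before actually connecting.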
If your cluster is in a VPC with a public IP, there’s one more step: Head to the VPC’s security group for this cluster, and whitelist port 5439 for your IP address.
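This post uses the AWS console for that step, but if you prefer the AWS CLI, the same rule can be sketched as below. The helper only prints the command so you can review it before running it yourself. The function name is hypothetical, and the security group ID and IP address in the example are placeholders, not real values.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the AWS CLI call that opens port 5439 to one IP.
# Arguments: the VPC security group ID, then your IPv4 address.
allow_redshift_ingress_cmd() {
  local group_id=$1 ip=$2
  # The /32 suffix restricts the rule to exactly one IPv4 address.
  echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port 5439 --cidr $ip/32"
}

# Placeholder group ID and a documentation-range IP, for illustration only.
allow_redshift_ingress_cmd sg-0123456789abcdef0 203.0.113.10
```

Once the printed command looks right, paste it into a shell that has AWS credentials configured.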
If your cluster doesn’t have a public IP, stay tuned! Next week, we’ll cover connecting to private clusters via SSH Tunnels.
Thanks for reading, and if you have any questions, send us an email.