Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the Cloud. You can start with several hundred gigabytes of data, then scale upward when your company needs it. This structure makes it possible to use your data to gain new insights into your customers, business opportunities, and other daily needs.
You get started by launching a set of nodes, which are referred to as an “Amazon Redshift cluster.” Once you provision the cluster, you can upload your data and then run analysis queries against it. It uses the same SQL-based tools that most companies already use for internal needs, which can make your data access faster and more secure because you’re not forced to host it at your physical location.
At the time of this writing, Amazon offers all new users of Redshift a 2-month free trial of this data warehouse service to see if it meets their needs. If you start small, the pricing after the free trial begins at just $0.25 per hour, moving to just $250 per year for each terabyte that you store. That makes it about 10% of the cost of a similar solution.
If you’re looking for data storage solutions that are fast and scalable today, then here are the pros and cons of Amazon Redshift to consider.
List of the Pros of Amazon Redshift
1. It is one of the quickest solutions of its type available today.
When it comes to loading your data or querying it for reporting or analytical purposes, there are few competitors that can match what Redshift offers. It utilizes the MPP (Massively Parallel Processing) architecture to load your data at high speed. It will also parallelize and distribute your queries across multiple nodes for fast access when needed. You also have the option with this service to use SSD-based data warehouses, making it possible to run a complex query without a massive time commitment.
2. You receive a high-performance warehouse solution.
The design of Redshift allows you to take advantage of parallelization in your backup and restore operations, in addition to the data loading requirements you have. This structure provides you with efficient data compression rates, allowing your queries and distribution to be optimized, no matter how much data you store. These benefits are possible because of the columnar storage database offered to you, which is optimized for repetitive data. The I/O operations are reduced on the disk, improving your performance as a result.
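To see why a columnar layout compresses repetitive data so well, here is a minimal Python sketch of run-length encoding applied to a single column. It illustrates the general idea only and is not Redshift's actual on-disk format; the function names and the sample column are made up for this example.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# A sorted "country" column, stored on its own as a columnar layout would do.
country = ["CA"] * 4 + ["US"] * 6 + ["MX"] * 2
encoded = rle_encode(country)
print(encoded)  # 12 stored values collapse into 3 pairs
```

Because the values of one column sit next to each other, long runs like these are common, and fewer stored values means fewer disk reads for each query.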
3. You receive access to a machine learning process.
Redshift uses machine learning to ensure you receive high throughput based on the workloads you have. It does this through sophisticated algorithms that predict the run times of incoming queries. Then it assigns them to whatever queue will optimize the processing speed for you. That means your short reports or dashboard queries go through an express queue instead of a standard routing structure, so they are not stuck waiting behind longer jobs.
4. It offers result caching.
Redshift also uses result caching to deliver sub-second response times for repeated queries. Visualization tools, business intelligence tools, and dashboards that execute repeat queries will receive a significant boost in performance because of this structure. It searches the cache to determine if a prior run created a cached result. If there is one and the data has not changed, then you see the cached result instead of having the query run again.
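The caching rule described above (reuse a result only if the same query ran before and the data has not changed) can be sketched in a few lines of Python. This is a toy model for illustration, not how Redshift implements it; the `ResultCache` class and the `slow_query` stand-in are hypothetical names.

```python
class ResultCache:
    """Toy result cache: reuse a result until the underlying data changes."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually executes the SQL
        self._cache = {}             # exact query text -> cached result

    def mark_data_changed(self):
        """Drop every cached result, since none can be trusted any more."""
        self._cache.clear()

    def execute(self, sql):
        """Return the cached result if present, otherwise run and store it."""
        if sql not in self._cache:
            self._cache[sql] = self._run_query(sql)
        return self._cache[sql]


calls = []


def slow_query(sql):
    """Stand-in for a real warehouse query; records each real execution."""
    calls.append(sql)
    return [("total", 42)]


cache = ResultCache(slow_query)
cache.execute("SELECT COUNT(*) FROM sales")
cache.execute("SELECT COUNT(*) FROM sales")  # served from the cache
cache.mark_data_changed()
cache.execute("SELECT COUNT(*) FROM sales")  # runs again after a data change
print(len(calls))  # number of times the query really ran
```

Three requests lead to only two real executions here, which is the effect a dashboard that refreshes the same query over and over would benefit from.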
5. You will find that it is very easy to deploy.
Amazon Redshift is one of the easiest data warehouse solutions available today to set up and operate from a business perspective. All you need to do is log into your AWS console, then follow the prompts presented to deploy your new data warehouse. It will automatically provision your infrastructure at that point. Many of the administrative tasks are automated as well, including replication and backups, which means you get to focus on your data instead of the task of administering it. You can make adjustments to tune specific workloads too.
6. It integrates with third-party tools.
You can choose to enhance your interactions with Redshift by working with an extensive list of third-party providers that help to transform and visualize your data. There are business intelligence partners, data integration experts, consulting and system integration assistance, and query and data modeling opportunities all with certified solutions that are guaranteed to work with Amazon.
7. You will find it to be a cost-effective solution for your business.
If you want to start small, then you can take advantage of the $0.25 per hour rate with no commitments. Redshift is the only provider of its type that offers on-demand pricing without any upfront costs. If you commit to a 3-year term, however, you can save up to 75% on your Cloud-based data warehousing needs. Your hourly rate is based on the number and type of nodes that are in your cluster. Even with dense storage, the maximum cost of current generation products is about $7 per hour.
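The pricing arithmetic above is easy to check for yourself. The sketch below uses only the figures quoted in this article ($0.25 per hour and a discount of up to 75%) and assumes the hourly rate applies per node; the function names are made up for this example, and real bills depend on node type and region.

```python
HOURS_PER_YEAR = 24 * 365


def on_demand_cost(hourly_rate, node_count, hours):
    """On-demand cost: hourly rate per node x number of nodes x hours run."""
    return hourly_rate * node_count * hours


def reserved_cost(on_demand_total, discount):
    """Apply a term discount expressed as a fraction (0.75 means 75%)."""
    return on_demand_total * (1 - discount)


# One entry-level node running all year at the quoted $0.25 per hour.
one_node_year = on_demand_cost(0.25, 1, HOURS_PER_YEAR)
discounted_year = reserved_cost(one_node_year, 0.75)
print(f"On demand for a year:   ${one_node_year:,.2f}")
print(f"With a 75% discount:    ${discounted_year:,.2f}")
```

Changing `node_count` or `hourly_rate` shows quickly how the total scales as a cluster grows.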
8. You get to choose your node type when working with Amazon Redshift.
There are two node types available to optimize your data warehousing needs if you choose Amazon Redshift. The first option, called Dense Compute, allows you to create a high-performance solution with fast CPUs, solid-state disks, and large amounts of memory. You can scale further to use Dense Storage nodes that offer larger hard disk drives (HDD) at low price points. If you want to switch between nodes or scale your cluster, then a single API call or a couple of clicks from your console are all that is necessary to get the work done.
9. It offers a consistent backup for your data.
Amazon Redshift offers a consistent backup of your data and files, and it recovers them when failures or corruption occur. The features available to you in this area include data recovery that is continuous and automatic, even if a drive or node fails. They also include disaster recovery backups to limit the amount of data loss that you experience. Data restoration can occur from different regions too, depending on how you set up your account. These benefits even apply if you go beyond the first petabyte that you store in the Cloud with this service.
10. You receive end-to-end encryption with Amazon Redshift.
All you need to do is set your parameter settings property to use SSL for enhanced data security while in transit. This data warehousing system also uses AES-256 hardware-accelerated encryption for your data when it is at rest. When you select encryption for your data when it is resting, all of it that is written to disk, including any backups you may have, receives this encryption benefit. Redshift then takes care of your key management by default.
11. It offers network isolation.
Choosing Amazon Redshift gives you the advantage of configuring your firewall rules to control the network access to the clusters which comprise your data warehouse. It is possible to run Redshift inside Amazon VPC to isolate your clusters through your own virtual network. You can then connect this to your existing IT infrastructure if desired using an encrypted IPsec VPN to maximize ongoing data access.
12. You can audit all of your API calls through Redshift.
Because Amazon Redshift integrates with CloudTrail, you can audit all the API calls made through the system. It logs all your SQL operations, including queries, database changes, and connection attempts. You can then access the information using SQL queries against the system tables or choose to download the logs to a secure location on Amazon S3. The system is compliant with SOC1, SOC2, and SOC3. It is also compliant with PCI DSS Level 1 requirements.
13. It natively integrates with the AWS analytics ecosystem.
When you choose Amazon Redshift for your data warehousing needs, then you will find that it fully integrates with the AWS analytics ecosystem.
• You can use AWS Glue to extract, transform, and load data into Redshift.
• Capture, transform, and load streaming data into Redshift using Amazon Kinesis Data Firehose for analytics which are almost in real-time.
• Create dashboards, visualizations, and reports through Amazon QuickSight.
You can even use the AWS Database Migration Service if you want to improve the speed of your data movement to Redshift with a free 6-month trial of their DMS service.
14. You have access to plenty of training materials for Amazon Redshift.
If you access the documents page for Amazon Redshift, then you will find a variety of resources available to you as a first-time user. There is a complete overview of how to manage the system when you have data warehousing needs. You have access to a “getting started” guide which takes you through all the steps necessary to create clusters, database tables, and test queries. A cluster management guide will show you how to manage the clusters correctly, while a database developer guide offers explanations on how to build, design, query, and maintain the information that makes up the foundation of your data warehouse.
List of the Cons of Amazon Redshift
1. It requires that you enforce uniqueness on your end.
There is no structure currently available at this time of writing which allows Amazon Redshift to help you maintain data integrity through the use of unique indexes. You are responsible for this structure on your end of the data storage process. That means there are no checks on the values in your expressions or columns to determine if the index key has been compromised in some way.
The “check” and “unique” constraints are not enforced because of this structure. A “check” constraint cannot be declared at all, and a “unique” constraint is informational only, which creates some limitations for some agencies.
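Since uniqueness is your responsibility, a common approach is to run a duplicate check after each load. The sketch below shows one such query; it uses Python's built-in SQLite module as a stand-in so that it runs anywhere, and the `customers` table and its columns are hypothetical. The same `GROUP BY ... HAVING` pattern is plain SQL that should carry over to a warehouse table.

```python
import sqlite3

# SQLite stands in for the warehouse so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (2, "b2@example.com")],
)

# Report every key value that appears more than once.
DUPLICATE_CHECK = """
    SELECT customer_id, COUNT(*) AS copies
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
"""
duplicates = conn.execute(DUPLICATE_CHECK).fetchall()
print(duplicates)  # each row is (key value, number of copies)
```

An empty result means the key is still unique; anything else is a row you need to clean up yourself, because the warehouse will not reject it for you.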
2. You only receive support for parallel upload with specific data.
Parallel loading receives support for DynamoDB, Amazon S3, and Amazon EMR when you opt for Redshift as your Cloud-based data warehousing solution. These sources use the MPP architecture that gives you the high speeds needed for your loads. If you have any other source for your data, however, then this feature is not supported at all. You’re required to use JDBC inserts or scripts to load the data into Redshift. Your other option would be to utilize an ETL solution that loads your data into the warehouse from a different source.
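For the supported sources, the load itself is done with a COPY statement. The sketch below only composes the statement text for an S3 load; the helper name, table, bucket prefix, and IAM role ARN are all placeholders invented for this example, and the values are interpolated directly, so they should come from trusted configuration rather than user input.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn, gzip=True):
    """Compose a COPY statement that loads the files under an S3 prefix."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS CSV",
    ]
    if gzip:
        parts.append("GZIP")  # input files are gzip-compressed
    return "\n".join(parts) + ";"


# Placeholder names: replace with your own table, bucket, and role.
statement = build_copy_statement(
    table="sales",
    s3_prefix="s3://example-bucket/sales/part-",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

Pointing COPY at a prefix shared by several files, rather than at one large file, is what gives the cluster separate pieces to load in parallel.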
3. You must understand the distribution and sort keys.
The distribution and sort keys will determine how your data is indexed and stored when you choose Redshift for your data warehousing needs. This process applies across all nodes. That means you must have a firm understanding of the concepts behind these keys, including knowledge about how to set them properly on your tables to create the optimized performance you want through this solution.
There can only be a single distribution key for each table. You cannot change it later on, which means you must anticipate future workloads before making a decision. The primary keys can be declared as well, but not enforced.
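To get a feel for why the choice of distribution key matters, the Python sketch below spreads 10,000 rows over 8 slices by hashing a key value. It is a simulation under stated assumptions: SHA-256 stands in for whatever hash the service really uses, the slice count is arbitrary, and the `orders` table in the DDL string is hypothetical.

```python
import hashlib
from collections import Counter

# Hypothetical table showing where the keys are declared.
EXAMPLE_DDL = """
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    status      VARCHAR(16),
    ordered_at  TIMESTAMP
)
DISTKEY (customer_id)
SORTKEY (ordered_at);
"""


def rows_per_slice(key_values, slice_count):
    """Assign each row to a slice by hashing its distribution key value."""
    counts = Counter(
        int.from_bytes(hashlib.sha256(str(v).encode()).digest()[:4], "big")
        % slice_count
        for v in key_values
    )
    return [counts.get(i, 0) for i in range(slice_count)]


# customer_id is nearly unique; status has only two distinct values.
customer_ids = list(range(10_000))
statuses = ["open" if i % 10 else "closed" for i in range(10_000)]

by_customer = rows_per_slice(customer_ids, 8)
by_status = rows_per_slice(statuses, 8)
print("by customer_id:", by_customer)
print("by status:     ", by_status)
```

A key with many distinct values spreads rows fairly evenly, while a key with only two values can fill at most two slices and leaves the rest idle, which is the kind of skew you want to anticipate before settling on a key.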
4. It does not work as a live app database.
You will discover that Amazon Redshift does an excellent job of running queries with a significant amount of data, running reporting, handling analytics, and similar tasks, but it still isn’t a solution if you’re trying to run live web apps. You’ll need to pull data into a caching layer, or opt for a Postgres instance, if you’re serving data from Redshift to any web apps.
During a Redshift training opportunity, Lars Kamp ran a survey of attendees about the issues they’ve experienced with this solution. 91% reported that their queries were too slow. 64% said that their dashboards were slow as well. 55% said that it was hard to understand what was going on with the database. It takes time to find solutions to these problems to turn Redshift into the powerful tool it can be.
5. You are placing your data onto a Cloud-based system.
There are unique pros and cons to consider when you’re running Cloud-based systems. Although having your data managed off-site by a third party can reduce physical security concerns, you are also placing the safety of your data in the hands of someone else.
There are privacy concerns that some companies may have when using Redshift because of the value of their intellectual property. You also have connection issues to think about, since a lack of access to an ISP limits your ability to access these services. There is the possibility of outages too, which means any failures will be public.
6. It is a little behind the times with its PostgreSQL setup.
The structure of Amazon Redshift is based on PostgreSQL 8.0.2. That version is more than a decade old at this point. PostgreSQL has seen marked improvements in multiple areas since then, but these features are not currently available if you choose this data warehousing solution. You’ll discover that many of the basic features you would expect from newer PostgreSQL versions are not made available to you through this system.
7. You must handle the costs of data migration and integration.
Because you’re working with a petabyte-level data warehousing solution, you must consider the bandwidth you’ll require to transmit this data during the initial phases of this project. Your internal systems must send the information to the Cloud-based Redshift system over the network, or you must ship it to AWS on USB drives through your preferred shipper. If you’re a small business still operating on capped data use, then it may not be possible to send over all your data to be warehoused.
8. There are no stored procedures available to you in Amazon Redshift.
When you decide to use Redshift for your data warehousing needs, then you’ll need to parse and run your SQL script files one statement at a time. That’s because there are no stored procedures available to you. You must check and count affected rows, then execute a complex join query against some of your system views or tables to generate the needed results. Unless you’re familiar with database management systems, the learning curve for these processes will be quite high for the average person.
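Running a script one statement at a time and checking the affected rows can be handled by a small client-side helper. The sketch below is one way to do it, using Python's built-in SQLite module as a stand-in database so it runs anywhere; the helper names and the `orders` table are hypothetical, and it assumes each statement ends at the end of a line.

```python
import sqlite3


def split_statements(script):
    """Yield complete statements, assuming each one ends at a line end."""
    buffer = ""
    for line in script.splitlines():
        buffer += line + "\n"
        if sqlite3.complete_statement(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # final statement without a trailing semicolon
        yield buffer.strip()


def run_script(conn, script):
    """Execute statements one by one and record affected rows for each."""
    results = []
    for statement in split_statements(script):
        cursor = conn.execute(statement)
        results.append((statement.split()[0].upper(), cursor.rowcount))
    return results


SCRIPT = """
CREATE TABLE orders (order_id INTEGER, status TEXT);
INSERT INTO orders VALUES (1, 'open'), (2, 'open'), (3, 'closed');
UPDATE orders
   SET status = 'shipped'
 WHERE status = 'open';
"""

conn = sqlite3.connect(":memory:")
report = run_script(conn, SCRIPT)
for verb, affected in report:
    print(verb, affected)
```

Keeping the per-statement row counts lets you stop a script as soon as a step touches more or fewer rows than you expected.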
9. Your performance levels decrease as the clusters increase.
If you want to achieve consistent results when using Amazon Redshift, then you must keep your clusters below 75% of their capacity. If you let the clusters become overloaded with multiple queries, then you’ll start to have performance issues as well. Do your best to limit yourself to 10 concurrent queries or fewer when working with this data warehousing solution. You’ll need to run your maintenance or heavy loads during quiet periods, which may fall outside of the timeframe you envisioned for this process.
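One way to stay under a concurrency ceiling like the 10 queries mentioned above is to gate your query calls on the client side. The Python sketch below does this with a semaphore; the `QueryGate` class and the `fake_query` function are hypothetical, and the short sleep merely simulates a query running.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class QueryGate:
    """Let at most `limit` queries run at the same time."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest concurrency observed so far

    def run(self, query_fn, sql):
        with self._slots:  # blocks while `limit` queries are in flight
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                return query_fn(sql)
            finally:
                with self._lock:
                    self._running -= 1


def fake_query(sql):
    """Stand-in for a real query call; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return f"done: {sql}"


gate = QueryGate(limit=10)
statements = [f"SELECT {i}" for i in range(30)]
with ThreadPoolExecutor(max_workers=30) as pool:
    results = list(pool.map(lambda sql: gate.run(fake_query, sql), statements))

print(len(results), "queries finished, peak concurrency:", gate.peak)
```

Thirty callers submit work at once, but the gate never lets more than ten reach the database together; the rest simply wait their turn.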
If you’re looking for an affordable and effective solution for data warehousing, then the pros and cons of Amazon Redshift are essential to review. There are some limitations with this service, but you will also find that it is lightyears ahead of some competitors, such as Snowflake. It does take some time to learn, especially if you require custom scripts for real-time data access, though most agencies which use this service find that its accuracy, consistency, and scalability are exactly what they need to push for greater success.