Amazon Redshift - Fundamentals
In late 2017, we set out to replace and upgrade our existing reporting and analytics infrastructure with something that would be a better fit for our workloads. Keeping costs and required maintenance to a minimum would be a nice plus, making for an easy sell. After a bit of research, it was obvious Amazon Redshift had the potential to tick all the right boxes. While steadily porting the most problematic workloads away from our existing infrastructure, I started writing an investigative article on the fundamental concepts of Amazon Redshift. I learned a lot studying each individual building block, which allowed me to make some small but impactful changes to our own setup along the way.
The outcome is a 10,000-word document (1 hour reading time), covering seven topics:
- Storage
- Distribution
- Importing data
- Table maintenance
- Exporting data
- Query processing
- Workload management
The text is available in three formats:
The project is open source and available on GitHub.
Thanks to everyone who proofread earlier iterations and provided me with indispensable feedback.
I hope this work can teach you as much as it taught me. I’m looking forward to your feedback.