Although be already familiar with Cloud Computing for may years, I haven’t look inside many services provided by Amazon Web Service. Because my company (Alibaba) has it’s own cloud platform: Aliyun, so we are only allowed to use home-made cloud products, such as ECS(like EC2 in AWS), RDS(like RDS in AWS), ODPS(like EMR in AWS).
These days I have read some sections of “Amazon Redshift Database Developer Guide” on my Kindle at my commute time.
Amazon Redshift is built on PostgreSQL, which is not very popular in China but pretty famous in Japan and USA. The book said that primary key and foreign key are only used for informal and constrains are totally not supported. I guess Redshift do distributed the rows of every tables into different servers in the cluster therefore keeping constrains is almost impossible.
Columnar Storage is used in Redshift because it is a perfect solution for OLAP (OnLine Analytical Processing) in which situation users tends to retrieve or load tremendous of records. Column-oriented Storage is also suitable for compression and will conserve colossal disk space.
The interesting thing is the architecture of Amazon Redshift and Greenplum looks very similar: both distribute the rows, both use PostgreSQL as back-end engine. Greenplum has open-sourced recently, which make common users to build private OLAP platform much easier. This lead a new question for me: if users could build a private cloud on their bare-metal servers very easily (by the software of OpenStack, OpenShift, Mesos, Greenplum etc.), is it still necessary to build their services and store their data into public cloud? Or the only value of public cloud will be maintaining and managing large mount of bare-metal servers?