At Globant we work with many clients, some of whom use multiple Hadoop distributions to avoid vendor lock-in. We thought it would be useful to share the experience we have gained with the community. In this post, we provide a brief comparison of the most prominent Hadoop distributions today, namely Intel, Hortonworks, Cloudera, MapR, IBM and Amazon. We consider only the technical point of view, based on the information publicly available at the time of writing.
(1) This refers to any software or mechanism that provides additional data protection or security, such as data encryption. For example, Intel’s Advanced Encryption Standard New Instructions (Intel® AES-NI).
(2) This covers any kind of replication mechanism that gives the system fault tolerance and error recovery capabilities beyond Hadoop’s default fault tolerance. For example, MapR’s No NameNode High-Availability (HA) Architecture.