Datastax – Cassandra

At Globant we are taking advantage of Cassandra within different business verticals, in the cases where it makes sense to exploit its main benefits. Specifically, we use it when we can relax consistency in exchange for read and write response times that relational databases would struggle to achieve at the same data volumes, while leaving our clients a tool that is easy to maintain and approachable for people with SQL skills.

Other benefits:

  • Tunable consistency: Consistency levels in Cassandra let us trade data accuracy against availability and latency, and they can be set on each individual read or write query. This allows us to tune consistency on a per-query basis, favoring response time or data accuracy depending on each query’s requirements. Cassandra offers a range of consistency levels, from ONE (a single replica must respond) up to ALL (every replica must respond, the strongest level), with QUORUM (a majority of replicas) as a common middle ground.

  • Masterless, no single point of failure: Cassandra has no master-slave architecture, so all nodes have the same responsibilities and there is no master whose failure could bring the cluster down (a single point of failure, or SPOF). This also simplifies scaling, since capacity can be added by adding nodes.

  • Simple and easy to use: Installation, deployment, and maintenance processes are straightforward, and the required maintenance is minimal.

  • CQL: Cassandra’s native query language is SQL-like, which allows DBAs to adapt easily to this new tool.

  • Snitch/Sharding: When working with multiple nodes, we must choose carefully how our keys and their replicas are distributed. Cassandra offers multiple out-of-the-box strategies to match our use cases and the selected cloud infrastructure, for example snitches such as Ec2Snitch for AWS.
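To make the consistency trade-off concrete, here is a minimal Python sketch (not Cassandra or driver code) that computes how many replica acknowledgements each level needs for a given replication factor (RF), and checks the usual rule of thumb: a read is guaranteed to see the latest acknowledged write when the read and write replica sets must overlap, that is, R + W > RF. The function names are hypothetical, and multi-datacenter levels such as LOCAL_QUORUM are left out.

```python
# Conceptual sketch (not the Cassandra driver API): replica acknowledgements
# required per consistency level, and the R + W > RF overlap rule.

FIXED_LEVELS = {"ONE": 1, "TWO": 2, "THREE": 3}


def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge a read or write at `level` when the
    replication factor is `rf`."""
    if level in FIXED_LEVELS:
        needed = FIXED_LEVELS[level]
    elif level == "QUORUM":
        needed = rf // 2 + 1  # strict majority of the replicas
    elif level == "ALL":
        needed = rf  # every replica: strongest, least available
    else:
        raise ValueError(f"unsupported consistency level: {level}")
    if needed > rf:
        # A real cluster would fail such a request as unavailable.
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        overlap = reads_see_latest_write(read, write, 3)
        print(f"read={read:<6} write={write:<6} overlap={overlap}")
```

With RF = 3, ONE/ONE reports no overlap (1 + 1 is not greater than 3), while QUORUM/QUORUM (2 + 2 > 3) and ONE/ALL (1 + 3 > 3) do. QUORUM on both sides keeps that guarantee while still tolerating one unavailable replica, which is why it is a popular choice.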
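The masterless design and key distribution can be illustrated with a simplified token ring. In the Python sketch below, each node owns one token, a partition key is hashed onto the ring, and its replicas are the next RF nodes walking clockwise, roughly how SimpleStrategy places data. Assumptions: MD5 stands in for Cassandra’s default Murmur3 partitioner, virtual nodes and snitch/rack awareness are ignored, and the node names are hypothetical. Because placement depends only on the hash and the set of nodes, any node can act as coordinator and compute the same answer without asking a master.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Hash a string onto the ring. MD5 is a stand-in here; Cassandra's
    default partitioner uses Murmur3."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class TokenRing:
    """Simplified ring: one token per node, no vnodes, no rack/datacenter awareness."""

    def __init__(self, nodes: list[str]) -> None:
        # Sort nodes by token so every node derives the same ring order.
        self._ring = sorted((token_for(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, partition_key: str, rf: int) -> list[str]:
        """Return the rf nodes responsible for partition_key, primary first."""
        if not 0 < rf <= len(self._ring):
            raise ValueError("rf must be between 1 and the number of nodes")
        # The first node whose token is >= the key's token owns the key;
        # past the highest token we wrap around to the start of the ring.
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("customer:42", rf=3))
```

With one token per node, adding a fifth node only takes over part of a single neighbor’s range, which is one reason scaling out does not require reshuffling the whole data set.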

MongoDB

At Globant we’ve been using MongoDB for a while now with clients that need to scale their operations or run specific analytics, taking advantage of the following:

  • Schemaless: Documents are stored in BSON (Binary JSON), so any valid JSON can be easily imported and queried. A schemaless model is very flexible: no more blocking ALTER TABLE statements. This makes it very easy for us to create and consume APIs.

  • Sharding: Sharding is the process of storing data records across multiple machines, and it is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data or to provide acceptable read and write throughput. Sharding solves this problem with horizontal scaling. Data is split between shards using a shard key, a single field (or a compound set of fields) whose values determine where each document is stored. The data is partitioned into “chunks”, contiguous ranges of shard key values holding 64 MB of data by default at the time of writing. A chunk is the smallest unit of data that MongoDB distributes and migrates between shards.

  • Supports MapReduce combined with sharding: MapReduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. When a sharded collection is the input for a map-reduce operation, MongoDB automatically dispatches the MapReduce job to each shard in parallel. No special option is required. MongoDB waits for the jobs on all shards to finish and then combines their results.

  • Replica Sets and Automatic Failover: We have found replica sets easy to configure and maintain, and if the primary fails, the remaining members automatically elect a new one.

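  • Aggregation Framework: It is useful for running calculations and serving JSON documents with aggregated data. Although aggregation pipelines are not written in standard SQL, they are easy to use if you are familiar with data pipeline processing. Nevertheless, the framework should be used carefully because of restrictions on the memory a pipeline can consume: each stage is limited in the RAM it may use, and exceeding that limit fails the operation unless the allowDiskUse option lets it spill to temporary files. In short, it lets us compute aggregations inside the database that could take a while if they were processed outside MongoDB.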

  • Graph representation: Although MongoDB is not a graph-oriented database, representing graphs and trees is straightforward by storing links inside documents. One-level queries are extremely fast, and with a suitable shard key, sharding lets us apply the locality principle so that related nodes are stored together. Although each graph has its own characteristics, with this kind of representation documents remain small, both for relationships and for node information.

  • MMS (MongoDB Monitoring Service): It’s really good to have a monitoring system built by MongoDB itself, since we can monitor indicators that are specific to MongoDB, such as “replication lag”. In our experience, the other “generic” monitoring tools that we use for other applications, even with a MongoDB connector, were missing those MongoDB-specific indicators. MMS is a very complete tool with charts, alerts, and e-mail notifications, and it’s free.

  • Profiler: During performance testing, it is really helpful for quickly finding the offending queries, that is, those that take longer than a specified time threshold.
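The range-based chunk layout described in the sharding item can be sketched as a small router that maps a shard key value to the chunk containing it, similar in spirit to what the mongos query router does with the cluster’s chunk metadata. This is a conceptual Python sketch, not the MongoDB API: the chunk boundaries and shard names are hypothetical, the first chunk’s lower bound of negative infinity plays the role of MongoDB’s MinKey, and hashed sharding, chunk splitting, and migration are left out.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    lower: float  # inclusive lower bound of the shard key range
    shard: str    # shard currently holding this chunk (hypothetical name)


class ChunkRouter:
    """Chunks are contiguous: each covers [its lower bound, the next chunk's lower bound)."""

    def __init__(self, chunks: list[Chunk]) -> None:
        self._chunks = sorted(chunks, key=lambda c: c.lower)
        self._lowers = [c.lower for c in self._chunks]

    def shard_for(self, key: float) -> str:
        """Targeted lookup: the single shard owning `key`."""
        index = bisect.bisect_right(self._lowers, key) - 1
        if index < 0:
            raise KeyError(f"no chunk covers shard key {key}")
        return self._chunks[index].shard

    def shards_for_range(self, lo: float, hi: float) -> set[str]:
        """Shards that a range query lo <= key < hi has to visit."""
        if hi <= lo:
            return set()
        first = max(bisect.bisect_right(self._lowers, lo) - 1, 0)
        last = bisect.bisect_left(self._lowers, hi) - 1
        return {chunk.shard for chunk in self._chunks[first:last + 1]}


if __name__ == "__main__":
    router = ChunkRouter([
        Chunk(float("-inf"), "shard-a"),  # plays the role of MinKey
        Chunk(1000, "shard-b"),
        Chunk(5000, "shard-c"),
    ])
    print(router.shard_for(42))  # shard-a
    print(router.shards_for_range(500, 2000))
```

A query that includes the shard key can be routed to just the shards owning the matching chunks, while a query without it has to be broadcast to every shard, which is why the choice of shard key matters so much.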
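The two-phase flow of MapReduce on a sharded collection can be simulated in plain Python: each shard maps and reduces its own documents, and the partial results are then reduced again by key. This is a sketch of the paradigm, not MongoDB’s mapReduce command; the order documents and field names are hypothetical, and the shards run one after another here rather than in parallel. It also shows why MongoDB requires the reduce function to be associative, commutative, and idempotent: the same reducer is applied to values that are themselves reduced results. (Newer MongoDB releases deprecate map-reduce in favor of the aggregation pipeline.)

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Doc = dict[str, Any]
Mapper = Callable[[Doc], Iterable[tuple[str, Any]]]
Reducer = Callable[[str, list[Any]], Any]


def map_reduce(docs: Iterable[Doc], mapper: Mapper, reducer: Reducer) -> dict[str, Any]:
    """Single-node MapReduce: emit (key, value) pairs, group by key, reduce each group."""
    groups: dict[str, list[Any]] = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def sharded_map_reduce(shards: list[list[Doc]], mapper: Mapper, reducer: Reducer) -> dict[str, Any]:
    # Phase 1: every shard processes only its local documents.
    partials = [map_reduce(docs, mapper, reducer) for docs in shards]
    # Phase 2: once all shards are done, merge partial results and reduce again per key.
    merged: dict[str, list[Any]] = defaultdict(list)
    for partial in partials:
        for key, value in partial.items():
            merged[key].append(value)
    return {key: reducer(key, values) for key, values in merged.items()}


if __name__ == "__main__":
    shards = [
        [{"region": "north", "amount": 10}, {"region": "south", "amount": 5}],
        [{"region": "north", "amount": 7}],
    ]
    totals = sharded_map_reduce(
        shards,
        mapper=lambda order: [(order["region"], order["amount"])],
        reducer=lambda region, amounts: sum(amounts),
    )
    print(totals)  # {'north': 17, 'south': 5}
```

A reducer that returned a plain average, for instance, would give wrong results in phase 2, because an average of per-shard averages is not the overall average; emitting (sum, count) pairs and dividing at the end avoids that.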
