Oozie is an open-source workflow and coordination service for managing data processing jobs on Apache Hadoop. It is an extensible, scalable and data-aware service that orchestrates dependencies between jobs running on Hadoop (including HDFS, Pig and MapReduce). Oozie is a lot of things, but it is not:

  • A workflow solution for off-Hadoop processing
  • Another query processing API, a la Cascading

Oozie Benefits:

  • Complex workflow action dependencies: An Oozie workflow comprises actions and the dependencies among them.
  • Reduces Time-To-Market (TTM): The DAG specification lets users define the workflow declaratively.
  • Frequency execution: Users can specify an execution frequency, and an action in the workflow can wait for data to arrive before it is triggered.
  • Native Hadoop stack integration: Oozie supports all types of Hadoop jobs.
  • Oozie is validated against the Hadoop stack.
  • Oozie is integrated with the Yahoo! Distribution of Hadoop with security and is a primary mechanism to manage a variety of complex data analysis.
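
To make the idea of "actions and the dependencies among them" concrete, here is a minimal sketch in plain Python. It is not Oozie's workflow definition format or API. The action names are hypothetical, and the ordering uses graphlib.TopologicalSorter from the Python standard library (3.9+) as a stand-in for what a workflow engine does when it decides which action may run next.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def execution_order(dag):
    """Return one valid order in which the actions could run."""
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

Because every action here depends on the one before it, only one order is valid; a workflow with independent branches would admit several.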


Azkaban vs. Oozie:

What do Azkaban and Oozie do?

  • Both can run a series of MapReduce, Pig, Java and script actions as a single workflow job
  • Both allow regular scheduling of workflow jobs


On the Implementation Side

Runtime

  • Azkaban runs as standalone (one workflow) or as a server (one user, multiple workflows)
  • Oozie runs as a server (multiple users, multiple workflows)


On the Functional Side

Regular Scheduling

  • Azkaban interval job scheduling is time-based
  • Oozie interval job scheduling is based on both time and input-data dependencies
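
The difference between the two scheduling styles can be sketched as two trigger checks. This is an illustration only, not either tool's actual scheduler or API; the function names, parameters and the input path are all hypothetical.

```python
from datetime import datetime, timedelta


def time_based_ready(now, last_run, interval):
    """Time-only trigger: ready once the interval has elapsed."""
    return now - last_run >= interval


def time_and_data_ready(now, last_run, interval, required_inputs, available_inputs):
    """Time-and-data trigger: the interval must have elapsed AND
    every required input must already be available."""
    inputs_present = set(required_inputs) <= set(available_inputs)
    return time_based_ready(now, last_run, interval) and inputs_present


# Hypothetical hourly job that needs one input dataset.
last_run = datetime(2024, 1, 1, 11, 0)
now = datetime(2024, 1, 1, 12, 0)
hourly = timedelta(hours=1)
required = ["/data/in/2024-01-01-11"]  # hypothetical path

print(time_based_ready(now, last_run, hourly))
print(time_and_data_ready(now, last_run, hourly, required, available_inputs=[]))
```

With the time-only check the job fires as soon as the hour is up; with the combined check it keeps waiting until the input shows up.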