Sqoop is a tool designed to import data from relational databases into Hadoop. It uses JDBC to connect to a database, examine its tables, and generate the code needed to run a MapReduce job that reads data from those tables in parallel.
Sqoop can also import tables into Hive for further relational processing, as well as export tabular data from HDFS back to databases.
For certain databases, such as MySQL, Sqoop provides further performance enhancements by using database-specific tools to facilitate imports and exports.
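As a rough illustration of the import and export paths described above, the following Python sketch only assembles Sqoop command lines and does not run them. The JDBC URL, table names, and HDFS paths are hypothetical placeholders, and credential options are left out for brevity. `--num-mappers` sets how many parallel map tasks read from the table, and `--direct` asks Sqoop to use a database-specific fast path where one exists (for MySQL, for example).

```python
import shlex


def build_import_command(jdbc_url, table, target_dir, num_mappers=4, direct=False):
    """Return the argv list for a `sqoop import` call (nothing is executed)."""
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string
        "--table", table,                   # source table in the database
        "--target-dir", target_dir,         # destination directory in HDFS
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]
    if direct:
        # Request the database-specific import path instead of plain JDBC.
        argv.append("--direct")
    return argv


def build_export_command(jdbc_url, table, export_dir):
    """Return the argv list for a `sqoop export` call (HDFS back to the database)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,            # destination table, assumed to exist already
        "--export-dir", export_dir,  # HDFS directory holding the data to export
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    url = "jdbc:mysql://db.example.com/sales"
    print(shlex.join(build_import_command(url, "orders", "/data/orders", direct=True)))
    print(shlex.join(build_export_command(url, "order_totals", "/results/order_totals")))
```

Building the command as a list rather than a single string keeps each argument separate, so the list can be passed to `subprocess.run` as-is on a machine where Sqoop is installed.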
Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:
- Imports individual tables or entire databases to files in HDFS
- Generates Java classes to allow you to interact with your imported data
- Provides the ability to import from SQL databases straight into your Hive data warehouse
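The three capabilities listed above correspond roughly to separate Sqoop invocations. The Python sketch below only assembles those command lines, assuming a hypothetical MySQL database at `db.example.com`; the table name and directories are placeholders as well, and credential options are omitted.

```python
import shlex


def import_all_tables(jdbc_url, warehouse_dir):
    """Import every table of a database into files under one HDFS parent directory."""
    return ["sqoop", "import-all-tables",
            "--connect", jdbc_url,
            "--warehouse-dir", warehouse_dir]


def generate_classes(jdbc_url, table, outdir):
    """Generate the Java class that represents rows of one table."""
    return ["sqoop", "codegen",
            "--connect", jdbc_url,
            "--table", table,
            "--outdir", outdir]  # where the generated source files are written


def import_into_hive(jdbc_url, table):
    """Import one table and load it into the Hive warehouse."""
    return ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--hive-import"]


if __name__ == "__main__":
    url = "jdbc:mysql://db.example.com/sales"  # hypothetical database
    for argv in (import_all_tables(url, "/data/sales"),
                 generate_classes(url, "orders", "generated-src"),
                 import_into_hive(url, "orders")):
        print(shlex.join(argv))
```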
Why was it made?
Sqoop addresses the need to integrate co-existing storage solutions based on RDBMS and Hadoop architectures. Legacy systems that depend heavily on an RDBMS but need the parallel processing capabilities of a Hadoop cluster can use Sqoop to import raw data into the cluster and export the processing results back to the RDBMS.