Sqoop (SQL-to-Hadoop) is a Hadoop ecosystem component and an ETL tool that extracts data from structured data stores such as relational databases, using MapReduce under the hood. This command-line tool efficiently transfers large volumes of data into HDFS, Hive, or HBase. Similarly, it exports data from Hadoop back to relational (SQL) databases.
Sqoop Import
Imports individual tables or all the tables from an RDBMS into HDFS, Hive, or HBase.
In HDFS, each row of the table is treated as a record, each table becomes its own sub-directory, and by default the table data is stored as text files. Instead of plain text, users can opt to store the data as binary SequenceFiles, in Avro format, or in a columnar format such as Parquet. Users can also choose whether the imported data should be compressed.
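As a minimal sketch, an import choosing Avro as the storage format with Snappy compression might look like the following (the host, database, credentials, table, and paths are hypothetical placeholders):

```shell
# Import one table into HDFS as compressed Avro data files.
# Connection details below are examples; substitute your own.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop_user \
  --password-file /user/sqoop/.db_password \
  --table customers \
  --target-dir /data/sales/customers \
  --as-avrodatafile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec
```

Swapping `--as-avrodatafile` for `--as-sequencefile`, `--as-parquetfile`, or omitting it entirely (text is the default) selects the other storage formats mentioned above.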
In Hive, the target database should be created before importing all the tables from the RDBMS; otherwise, the tables are imported into Hive's 'default' database.
Sqoop Import takes care of creating the table itself if it does not already exist in the Hive metastore.
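A sketch of importing every table into a pre-created Hive database, assuming hypothetical connection details and a database named `analytics`:

```shell
# Import all tables from the RDBMS into the Hive database 'analytics'.
# If --hive-database is omitted (or the database does not exist),
# tables land in Hive's 'default' database instead.
sqoop import-all-tables \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop_user \
  --password-file /user/sqoop/.db_password \
  --hive-import \
  --hive-database analytics
```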
Sqoop Export
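As noted above, Sqoop can also move data in the other direction, from Hadoop back to a SQL database. A minimal sketch, with hypothetical names, and assuming the target table already exists in the RDBMS:

```shell
# Export comma-delimited HDFS data back into an existing RDBMS table.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop_user \
  --password-file /user/sqoop/.db_password \
  --table customers_export \
  --export-dir /data/sales/customers \
  --input-fields-terminated-by ','
```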