Tuesday, August 25, 2020

Sqoop Complete Tutorial Part-2

This is the continuation of the "Sqoop Complete Tutorial". If you want to read Part-1, please click here. We are about to discuss the following topics in the current and upcoming parts.

Importing data from MySQL to HDFS

  • Importing a table from MySQL to the default HDFS location
  • Importing a table from MySQL to a user-specified location in HDFS
  • Importing all the tables from MySQL to the default HDFS location
  • Importing all the tables from MySQL to a user-specified location in HDFS
  • Importing all the tables, excluding a few, from MySQL to HDFS
  • Importing part of a table's data from MySQL to HDFS using a WHERE condition
  • Importing a few columns of a table from MySQL to HDFS
  • Importing and compressing a table's data to HDFS
  • Importing data by specifying the column delimiter
  • Importing data by splitting it based on a specific column
  • Importing data from MySQL to HDFS based on a SQL query
  • Incrementally importing data from MySQL into HDFS
  • Importing and appending data
  • Importing data from MySQL and storing it in sequence file format in HDFS
  • Reading the contents of the source database without importing it
  • Getting the list of databases
The format of the import command remains largely the same; however, some additional keywords are used when the target is not HDFS.

Importing data from MySQL to Hive
  • Importing a table from MySQL to Hive's default database
  • Importing a table from MySQL to a user-defined database in Hive
  • Importing all the tables from MySQL to Hive
  • Importing all the tables, excluding a few, from MySQL to Hive
  • Importing a table's structure from MySQL to Hive without data
  • Importing a table from MySQL to Hive with a specific delimiter
Sqoop Automation
  • Sqoop - Creating a job
  • Sqoop - Displaying the list of the jobs
  • Sqoop - Displaying the properties of a job
  • Sqoop - Executing the job
  • Sqoop - Deleting the job

Importing data from MySQL to HBase
  • Importing a table from MySQL to HBase
  • Importing a table with some selected columns from MySQL to HBase

OK, let us begin the exercises, working through the above-mentioned points one by one.

Importing data from MySQL to HDFS


1) Importing a table from MySQL to the default HDFS location.

sqoop import \
        --connect jdbc:mysql://localhost/retail_db \
        --table categories \
        --username root \
        --password cloudera \
        -m 1


Explanation: "Sqoop Import" is the keyword to be used while import. Sqoop is a JDBC utility to move data between Hadoop Ecosystems and relational databases. Username and password must be provided to connect to the RDBMS. 

The number of mappers is the user's choice; it can be set based on the size of the data being imported. If the password should not be disclosed in the command, the user can simply pass the -P argument without providing a password; at run time, Sqoop will prompt for it. Refer to the example below.

Importing a table from MySQL to the default HDFS location, hiding the password:

sqoop import \
        --connect jdbc:mysql://localhost/retail_db \
        --table categories \
        --username root \
        -P \
        -m 1

The above command will create a directory named “categories” under the user's HDFS home directory ("/user/cloudera"), and the table data will be stored in it as a file (part-m-00000).
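
To verify the result, you can list the new directory and inspect the imported file with the HDFS shell. A minimal sketch, assuming the default home directory /user/cloudera:

# list the directory created by the import
hdfs dfs -ls /user/cloudera/categories
# view the first few rows of the imported part file
hdfs dfs -cat /user/cloudera/categories/part-m-00000 | head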

2) Importing a table from MySQL to a user-specified target location in HDFS.

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--table categories \
--username root \
--password cloudera \
--target-dir 'retail_db' \
-m 1
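
Since 'retail_db' above is a relative path, Sqoop creates it under the user's HDFS home directory (e.g. /user/cloudera/retail_db). An absolute path can be supplied instead; a minimal sketch, assuming the hypothetical target directory /user/cloudera/retail_db/categories:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--table categories \
--username root \
--password cloudera \
--target-dir '/user/cloudera/retail_db/categories' \
-m 1

Note that the target directory must not already exist; otherwise, the import fails.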

3) Importing all the tables from MySQL to the default HDFS location.

sqoop import-all-tables \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera

Note: All the tables will be placed under the user's HDFS home directory, each table in its own sub-directory as files.

4) Importing all the tables from MySQL to a user-specified target location in HDFS.

sqoop import-all-tables \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera \
--warehouse-dir 'retaildb'

Note: All the tables will be placed in the specified directory, one sub-directory per table. The --target-dir option does not work with import-all-tables; --warehouse-dir must be used instead. No mapper count is specified here, so the default is used, since a single mapper would be insufficient to import the "retail_db" database, whose volume is larger than the HDFS block size.
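
Before running import-all-tables, it can be helpful to confirm which databases and tables the connection can see (this also previews the "list of databases" item from the outline above). A minimal sketch using the same connection details:

# list the databases visible to this MySQL user
sqoop list-databases \
--connect jdbc:mysql://localhost/ \
--username root \
--password cloudera

# list the tables inside the retail_db database
sqoop list-tables \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera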

 

Please click here for the next part.

