Showing posts with label bigdata;. Show all posts
Showing posts with label bigdata;. Show all posts

Sunday, September 27, 2020

Sqoop Complete Tutorial Part-7

This is the continuation part of "Sqoop Complete Tutorial". If you want to read -


18) Importing all tables from MySQL to Hive  

Importing a table from MySQL to Hive's default database. The below command will help in copying all the tables from MySQL to Hive user database. 

sqoop import-all-tables
--connect jdbc:mysql://localhost/empdept
--username root
--password cloudera
--hive-import
--hive-database dbTest

The result is below:


If you look at the result, the data is replicated. This is because the tables "emp" and "dept" already exists in the database. hive-overwrite will help in replacing the data if already exist. 

sqoop import-all-tables
--connect jdbc:mysql://localhost/empdept
--username root
--password cloudera
--hive-import
--warehouse-dir /user/hive/warehouse/dbtest
--hive-database dbtest
--hive-overwrite

Here we have additionally provided the warehouse directory to specify the location of the database.


19) Importing all tables but excluding few from MySQL to Hive  

I have created a table named "location" in my current database 'empdept'.


I am about to import all tables but excluding 'emp' and 'dept' since those were already imported. Since "location" is the only table to import, I can specify the table name, however, let's see how it can be done with sqoop-import-all.

sqoop import-all-tables
--connect jdbc:mysql://localhost/empdept
--username root
--password cloudera
--hive-import
--hive-database dbtest
--exclude-tables "emp,dept"


If you look at the above screenshot, the import process selecting only "loc" table and excluding the tables "emp" and "dept" from the import.


The import process is completed and the table schema and data populated into Hive warehouse/database. Let's verify in Hive.

Sqoop Complete Tutorial Part-6

This is the continuation part of "Sqoop Complete Tutorial". If you want to read -


16) Importing a table from MySQL to Hive's default database.  

Importing a table from MySQL to Hive's default database. The below command will help in copying "emp" table and data from MySQL to Hive "default" database as "employee".

sqoop import
--connect jdbc:mysql://localhost/empdept
--table emp
--username root
--password cloudera
--hive-import
--hive-table employees



Verifying the data:


17) Importing a table from MySQL to Hive's user database.  

Importing a table from MySQL to Hive's default database. The below command will help in copying "emp" table and data from MySQL to Hive's user database (dbTest) as "employee" table.

sqoop import 
--connect jdbc:mysql://localhost/empdept 
--table emp 
--username root 
--password cloudera 
--hive-import 
--hive-table employees
--hive-database dbTest


Verifying the data:



Please click here for the next part.




Monday, September 7, 2020

Sqoop Complete Tutorial Part-5

This is the continuation part of "Sqoop Complete Tutorial". If you want to read -


14) Read the contents of the database without importing it. 

sqoop list-tables
--connect jdbc:mysql://localhost/empdept
--username root
--password cloudera


This above command will list out the tables from the given database.

15) Get the list of of the databases using Sqoop

sqoop list-databases
--connect jdbc:mysql://localhost/
--username root
--password cloudera



With the above command, we completed learning how to import data from MySQL to HDFS using Sqoop. The upcoming articles are on how to import data from MySQL to Hive.

Please click here for the next part


Wednesday, August 26, 2020

Sqoop Complete Tutorial Part-4

This is the continuation part of "Sqoop Complete Tutorial". If you want to read -


11) Importing data from MySQL to HDFS based on SQL query 

sqoop import-all-tables
--connect jdbc:mysql://localhost/retail_db
--username root
--password cloudera
--exclude-tables "order_items" 

Append & LastModified

The following is as per the Sqoop documentation.

Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row’s id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.

Below two are examples of incremental imports.

12) Importing data from MySQL to HDFS - Incremental Import

Assume that the below is the "emp" data which is in MySQL's "EMPDEPT" database.


Importing it into HDFS using the below command

sqoop import
--connect jdbc:mysql://localhost:3306/empdept
--table emp
--username root
--password cloudera
-m1

After importing it into HDFS, insert two rows in MySQL's table.

These two records need to be updated/appended in HDFS location. To achieve it use the below command.

sqoop import
--connect jdbc:mysql://localhost:3306/empdept
--table emp
--username root
--password cloudera
--check-column empno
--incremental append
--last-value 7521

13) Import and append data from MySQL to HDFS

Consider the same example specified above.

As per the data, the last entry in the table is 1981-09-28, and if we update one or two records, as shown below. 

The records which are updated in the source table should also need to be updated in the destination (HDFS). The command to do that -

sqoop import
--connect jdbc:mysql://localhost:3306/empdept
--table emp
--username root
--password cloudera   
--check-column HireDate
--incremental lastmodified
--last-value 1981-09-28
--target-dir /user/cloudera/empdept


Please click here for the next part


Big Data & SQL

Hi Everybody, Please do visit my new blog that has much more information about Big Data and SQL. The site covers big data and almost all the...