This is a continuation of the "Sqoop Complete Tutorial" series. Please refer to the earlier parts for the preceding examples.
11) Importing all tables from MySQL to HDFS, excluding specific tables
sqoop import-all-tables \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera \
--exclude-tables "order_items"
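This command imports every table in the retail_db database except order_items. By default each table is written to its own directory under the importing user's HDFS home directory; a quick way to confirm, assuming the cloudera user's home of /user/cloudera, is:

hdfs dfs -ls /user/cloudera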
Append & LastModified
The following is as per the Sqoop documentation.
Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
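As a sketch of that saved-job mechanism, assuming the empdept database and emp table used in the examples below (the job name and the starting --last-value are illustrative), an incremental append import can be stored once and re-run repeatedly, with Sqoop tracking the last value between runs:

# Create a saved job that remembers the incremental state
sqoop job --create emp_incremental -- import \
--connect jdbc:mysql://localhost:3306/empdept \
--username root \
--password cloudera \
--table emp \
--check-column empno \
--incremental append \
--last-value 0 \
-m 1

# Run it whenever new rows should be picked up
# (depending on configuration, sqoop job --exec may prompt for the database password)
sqoop job --exec emp_incremental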
The following two examples demonstrate incremental imports.
12) Importing data from MySQL to HDFS - Incremental Import (append)
Assume the following "emp" data exists in MySQL's "EMPDEPT" database. Import it into HDFS using the command below:
sqoop import \
--connect jdbc:mysql://localhost:3306/empdept \
--table emp \
--username root \
--password cloudera \
-m 1
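To confirm the initial load, you can cat the imported files. This assumes Sqoop's default behaviour of writing table emp under the cloudera user's HDFS home directory (/user/cloudera/emp); adjust the path if you specified --target-dir:

hdfs dfs -cat /user/cloudera/emp/part-m-*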
After the initial import into HDFS completes, insert two new rows into the emp table in MySQL; a sample INSERT is sketched next, followed by the incremental import command that appends those records to the data already in HDFS.
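The INSERT below is only illustrative. It assumes the classic emp column layout (empno, ename, job, mgr, hiredate, sal, comm, deptno) and made-up values; adjust the columns and values to your actual table:

mysql -u root -pcloudera empdept -e "INSERT INTO emp VALUES
(7935, 'RAVI', 'CLERK', 7782, '1982-01-23', 1300, NULL, 10),
(7936, 'KIRAN', 'ANALYST', 7566, '1982-04-19', 3000, NULL, 20);"

With the new rows in place, the following command appends to HDFS every row whose empno is greater than the supplied --last-value: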
sqoop import \
--connect jdbc:mysql://localhost:3306/empdept \
--table emp \
--username root \
--password cloudera \
--check-column empno \
--incremental append \
--last-value 7521
13) Importing updated data from MySQL to HDFS - Incremental Import (lastmodified)
Consider the same example as above. As per the data, the most recent HireDate in the table is 1981-09-28. Now suppose one or two records in the source table are updated; a sample UPDATE is sketched next. Those updated records also need to be reflected in the destination (HDFS), which the lastmodified import that follows handles.
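The UPDATE below is only illustrative; the column names, empno, and values are assumptions based on the classic emp layout. The important point is that the change moves the row's HireDate (the check column) past the previous --last-value:

mysql -u root -pcloudera empdept -e "UPDATE emp SET sal = sal + 500, hiredate = '1981-10-05' WHERE empno = 7844;"

The lastmodified import below then re-imports every row whose HireDate is more recent than the supplied --last-value: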
sqoop import \
--connect jdbc:mysql://localhost:3306/empdept \
--table emp \
--username root \
--password cloudera \
--check-column HireDate \
--incremental lastmodified \
--last-value 1981-09-28 \
--target-dir /user/cloudera/empdept
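Note that on recent Sqoop 1.4.x releases, a lastmodified import into a target directory that already exists fails unless you also tell Sqoop how to combine the old and new data. Depending on your version, you may need --append (simply adds new files) or --merge-key (merges on the given key and rewrites the directory), as in the variant below, which assumes empno is the primary key:

sqoop import \
--connect jdbc:mysql://localhost:3306/empdept \
--table emp \
--username root \
--password cloudera \
--check-column HireDate \
--incremental lastmodified \
--last-value 1981-09-28 \
--merge-key empno \
--target-dir /user/cloudera/empdept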
Please click here for the next part