
Tuesday, September 29, 2020

HDFS Basic Commands

This article explores some basic Hadoop commands that help in our day-to-day activities.

Hadoop file system shell commands are organized much like their Unix/Linux counterparts, so anyone used to the Unix shell can pick them up easily. The commands work with HDFS and the other file systems that Hadoop supports.

1) List the contents of a directory

ls
lists the files in the current directory of the local system

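hadoop fs -ls
lists the contents of the current user's HDFS home directory (/user/cloudera/ for the cloudera user)

hadoop fs -ls /
lists the contents of the HDFS root directory.

hdfs dfs -ls
also lists the current user's HDFS home directory, since no path is given; use hdfs dfs -ls / for the root directory.

Note: hadoop fs works with any file system that Hadoop supports (HDFS, the local file system and others), while hdfs dfs is specific to HDFS. Both forms work against HDFS; the older hadoop dfs form is deprecated.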

hadoop fs -ls /user/cloudera
/user/cloudera is the HDFS home directory of the cloudera user in the Cloudera VM; files copied with a relative destination path land here.

hadoop fs -ls -R /
recursively lists the entries in all subdirectories of the given path
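
When a script needs only the paths from a listing, the output of -ls can be filtered. The sketch below assumes the usual eight-column listing layout (permissions, replication, owner, group, size, date, time, path) and paths without spaces; the function name hdfs_list_paths is made up for illustration.

```shell
# Print only the path column of an HDFS directory listing.
# Assumption: the usual 8-column "hadoop fs -ls" layout and paths without spaces.
hdfs_list_paths() {
    # The "Found N items" header line has fewer than 8 fields, so NF >= 8 skips it.
    hadoop fs -ls "$1" | awk 'NF >= 8 { print $8 }'
}

# Usage:
#   hdfs_list_paths /user/cloudera
```

Filtering on the field count rather than on the header text keeps the function independent of the exact wording of the header line.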

2) Create or delete a directory

hadoop fs -mkdir /path/directory_name
mkdir is the command to create a folder/directory in a given path. A relative path is created under the HDFS home directory of the current user.

Example:
hadoop fs -mkdir testdir1
hadoop fs -mkdir /user/cloudera/testdir2

hadoop fs -rm -r /user/cloudera/testdir2
-rm deletes a specific file; with -r it deletes a folder/directory and everything under it. The older -rmr form is deprecated in favor of -rm -r.

Example:
hadoop fs -rm /user/cloudera/testdir2/file1.txt
hadoop fs -rm -r /user/cloudera/testdir2
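
In scripts it is often handy to create a directory only when it is missing. A minimal sketch, assuming the -test -d and -mkdir -p options of the file system shell; the function name ensure_hdfs_dir is hypothetical.

```shell
# Create an HDFS directory only if it is not already there.
ensure_hdfs_dir() {
    # "-test -d" exits with status 0 when the path exists and is a directory.
    if hadoop fs -test -d "$1"; then
        echo "exists: $1"
    else
        # "-p" also creates missing parent directories, like "mkdir -p" on Linux.
        hadoop fs -mkdir -p "$1" && echo "created: $1"
    fi
}

# Usage:
#   ensure_hdfs_dir /user/cloudera/testdir1
```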

Note: If the NameNode is in safe mode, HDFS is read-only and you will not be able to create any directories.

To check the status of safe mode
hdfs dfsadmin -safemode get

To turn safe mode ON
hdfs dfsadmin -safemode enter

To turn safe mode OFF (leave safe mode)
hdfs dfsadmin -safemode leave

The older hadoop dfsadmin form is deprecated in favor of hdfs dfsadmin.
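
The safe mode check can be combined with a write operation so that a script fails early with a clear message. This is only a sketch: it assumes that the status output contains the text "Safe mode is ON", and both function names are made up for illustration.

```shell
# Succeeds (status 0) when the NameNode reports that safe mode is on.
# Assumption: the status line contains the text "Safe mode is ON".
safemode_is_on() {
    hdfs dfsadmin -safemode get | grep -q 'Safe mode is ON'
}

# Create a directory only when HDFS is writable.
mkdir_unless_safemode() {
    if safemode_is_on; then
        echo "NameNode is in safe mode; not creating $1" >&2
        return 1
    fi
    hadoop fs -mkdir -p "$1"
}

# Usage:
#   mkdir_unless_safemode /user/cloudera/testdir1
```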


3) Copy a file from the local system to HDFS

hadoop fs -put <sourcefilepath> <destinationfilepath>

Examples:

hadoop fs -put Desktop/Documents/emp.txt /user/cloudera/empdir

hadoop fs -copyFromLocal Desktop/Documents/emp.txt /user/cloudera/emp.txt

To know more about "copyFromLocal", "put", "copyToLocal" and "get", see the article "Difference between CopyFromLocal, Put, CopyToLocal and Get".

4) Read the file

hadoop fs -cat /user/cloudera/emp.txt

The above command prints the contents of the file to the console. Avoid it for large files, since the whole file is read and streamed to the terminal, which puts load on I/O. It is best suited to files with small data.
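
To peek at a larger file without printing all of it, the output of -cat can be piped through the local head command. A minimal sketch; hdfs_head is a hypothetical helper name, not a Hadoop command.

```shell
# Show only the first N lines (default 10) of an HDFS file.
hdfs_head() {
    # head stops reading after N lines, so the rest of the file is not printed.
    hadoop fs -cat "$1" | head -n "${2:-10}"
}

# Usage:
#   hdfs_head /user/cloudera/emp.txt 5
```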

5) Copy the file from HDFS to Local System

hadoop fs -get /user/cloudera/emp.txt Desktop/Documents/emp1.txt
hadoop fs -copyToLocal /user/cloudera/emp.txt Desktop/Documents/emp2.txt

These are the reverse of put and copyFromLocal. For more information, see the article "Difference between CopyFromLocal, Put, CopyToLocal and Get".


6) Move a file from one HDFS location to another

hadoop fs -mv emp.txt testDir

hadoop fs -mv testDir testDir2

hadoop fs -mv testDir2/testDir /user/cloudera

hadoop fs -mv testDir/emp.txt /user/cloudera

7) Admin Commands

sudo vi /etc/hadoop/conf/hdfs-site.xml 
Note: hdfs-site.xml is the configuration file where HDFS settings can be changed.

To view the config settings
go to --> Computer --> Browse Folder --> File System --> etc --> hadoop --> conf --> hdfs-site.xml

To change default configuration values such as dfs.replication or dfs.blocksize in hdfs-site.xml, open the file with sudo:

sudo vi /etc/hadoop/conf/hdfs-site.xml
Note: "vi" is a text editor; sudo is used because the file is normally writable only with root privileges.

Press "i" to switch to insert (edit) mode.

Modify the values as per your requirement.

To save and exit, press Esc and type :wq!
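
After editing, the values that the client configuration resolves to can be checked from the shell. A small sketch using hdfs getconf -confKey; the function name show_hdfs_conf is hypothetical, and the two keys are the ones mentioned above.

```shell
# Print the effective value of a few HDFS settings.
show_hdfs_conf() {
    for key in dfs.replication dfs.blocksize; do
        # "hdfs getconf -confKey" prints the value resolved from the
        # configuration files on the machine where it runs.
        printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key")"
    done
}

# Usage:
#   show_hdfs_conf
```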

hadoop fs -tail [-f] <file>

The hadoop fs -tail command prints the last 1KB of a file to the console (stdout). With -f it keeps printing appended data as the file grows, as in Unix.


File exists error in HDFS - CopyFromLocal

HDFS is a distributed file system designed to run on top of the local file system. We often need to copy files into it from different sources, such as the internet, a remote network, or the local file system. The "copyFromLocal" and "put" commands help with this task. While copying a file from the local file system to HDFS, if the file already exists at the destination, the command fails with a 'File exists' error.

Let's assume the file "emp.txt" already exists in the path /user/cloudera.

hadoop fs -put Desktop/emp.txt /user/cloudera/emp.txt

This returned the “file already exists” error.

hadoop fs -copyFromLocal Desktop/emp.txt /user/cloudera/emp.txt

This also returned the “file already exists” error.

hadoop fs -copyFromLocal -f Desktop/Documents/emp.txt /user/cloudera/emp.txt
This succeeded. The file was copied to the destination without any errors.

The "-f" option of -copyFromLocal overwrites the destination if it already exists. In current Hadoop releases -put accepts the same "-f" option.
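
A script can check for the destination first and overwrite only when asked to. The sketch below assumes the -test -e option of the file system shell; safe_put and its "force" argument are made-up names for illustration.

```shell
# Copy a local file to HDFS, refusing to overwrite unless "force" is given.
safe_put() {
    src=$1
    dest=$2
    mode=${3:-keep}
    # "-test -e" exits with status 0 when the destination path exists.
    if hadoop fs -test -e "$dest"; then
        if [ "$mode" = "force" ]; then
            hadoop fs -copyFromLocal -f "$src" "$dest"
        else
            echo "skipped: $dest already exists" >&2
            return 1
        fi
    else
        hadoop fs -copyFromLocal "$src" "$dest"
    fi
}

# Usage:
#   safe_put Desktop/emp.txt /user/cloudera/emp.txt          # keeps an existing file
#   safe_put Desktop/emp.txt /user/cloudera/emp.txt force    # overwrites it
```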


Hope you find this article helpful.

Monday, September 28, 2020

Difference between CopyFromLocal, Put, CopyToLocal and Get

The purpose of this article is to explain a few HDFS commands that are similar in behavior but distinct.


CopyFromLocal and Put: Both commands copy files from the local file system to HDFS (or to another file system that Hadoop supports). The Hadoop documentation has traditionally described "copyFromLocal" as similar to "put", except that its source is restricted to a local file reference, while "put" can also read its input from stdin when the source is given as "-".

hadoop fs -put <Local system directory path, or - for stdin> <HDFS file path>

hadoop fs -copyFromLocal <Local system directory path>  <HDFS file path>

"Put" allows us to copy several file paths (files or folders on the local file system) to HDFS at once and can read from stdin, while copyFromLocal, on the other hand, is documented as limited to local file references.

Both commands accept the -f option to overwrite an existing file. Without -f, an error is returned if the file already exists at the destination.

In short, anything you do with copyFromLocal, you can do with "put", but not vice-versa.
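
One thing the Hadoop documentation lists for "put" is reading from stdin when the source is given as "-". A minimal sketch of that usage; the function name put_from_stdin and the destination path in the usage line are hypothetical.

```shell
# Write whatever arrives on stdin to an HDFS file.
# "-" as the source tells put to read its input from stdin.
put_from_stdin() {
    hadoop fs -put - "$1"
}

# Usage:
#   echo "1,John,Sales" | put_from_stdin /user/cloudera/emp_stdin.txt
```

This is useful when the data is produced by another command and there is no need to write it to a local file first.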

CopyToLocal and Get: These two commands are the opposite of "CopyFromLocal" and "Put": they copy files from HDFS to the local file system.
According to the Hadoop documentation, the destination is restricted to a local file reference when we use copyToLocal, while "Get" has no such restriction.

Anything you do with copyToLocal, you can do with "get", but not vice-versa.
hadoop fs -get <HDFS file path> <Local system directory path>
hadoop fs -copyToLocal <HDFS file path> <Local system directory path>

