
Saturday, August 29, 2020

Difference between Local File System vs HDFS

In an operating system, the file system is the mechanism used to keep track of files on a disk. Each file system has its own method of organizing files on the disk or partition.

HDFS is deployed on top of the existing operating system and brings its own method of organizing files. As a result, the system has two different file systems at the same time: the "local file system" and HDFS.

The main difference between the local file system and the Hadoop Distributed File System (HDFS) comes from the block size.

The block size is typically 4 KB in both Windows and Unix local file systems. In HDFS, the default block size was 64 MB in the initial version and is 128 MB in later versions, and it is configurable. This affects disk seeks. On a local file system, a large file occupies many 4 KB blocks, so reading it can require many disk seeks. Because an HDFS block is much larger, each seek is followed by a long sequential read, so far fewer seeks are needed for the same amount of data.
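To make the effect of block size concrete, here is a small Python sketch that counts how many blocks a file needs at each of the block sizes quoted above. The 1 GB file size and the helper name `block_count` are assumptions chosen only for illustration; the sketch counts blocks and does not measure real disk seeks.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def block_count(file_size_bytes, block_size_bytes):
    """Number of blocks needed to hold the file (integer ceiling division)."""
    return -(-file_size_bytes // block_size_bytes)

file_size = 1 * GB  # assumed example file size

local_blocks = block_count(file_size, 4 * KB)       # typical local file system
hdfs_old_blocks = block_count(file_size, 64 * MB)   # initial HDFS default
hdfs_new_blocks = block_count(file_size, 128 * MB)  # later HDFS default

print("4 KB blocks:  ", local_blocks)
print("64 MB blocks: ", hdfs_old_blocks)
print("128 MB blocks:", hdfs_new_blocks)
```

The same file that needs hundreds of thousands of 4 KB blocks fits in a handful of HDFS blocks, which is why far fewer seeks are involved when the data is read block by block.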

In HDFS, large files are automatically split into multiple chunks (blocks), which are distributed and stored across various slave machines (a.k.a. nodes). In the local file system, a file is stored as it is, on the disk of a single machine.
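The splitting and distribution described above can be sketched in a few lines of Python. This is only a toy model: the function names, the node names, and the round-robin placement are all assumptions made for illustration. Real HDFS decides block placement itself and also keeps replicas of each block, which this sketch ignores.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs that cover the whole file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

def assign_round_robin(blocks, nodes):
    """Toy placement: hand blocks to nodes in turn (not the real HDFS policy)."""
    placement = {node: [] for node in nodes}
    for i, block in enumerate(blocks):
        placement[nodes[i % len(nodes)]].append(block)
    return placement

# Assumed example: a 300 MB file, 128 MB blocks, three hypothetical nodes.
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = assign_round_robin(blocks, ["node1", "node2", "node3"])

for node, node_blocks in placement.items():
    for offset, length in node_blocks:
        print(node, "offset", offset // MB, "MB, length", length // MB, "MB")
```

Note that the last block only holds the remaining 44 MB of the file; it is not padded to the full block size in this model.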

A machine with a local file system is physically a single unit. HDFS spans many machines, but it is logically a single unit.


These are the differences I have found over a period of time. If you find any more, please share them with me.
