Monday, September 28, 2020

Big Data: Apache Hive & Impala Data Types Quick Reference

This article gives an overview of the data types available in both Apache Hive and Impala.


TINYINT - 1 byte 
Range: -128 to 127

SMALLINT - 2 bytes 
Range: -32,768 to 32,767

INT - 4 bytes
Range: -2,147,483,648 to 2,147,483,647

BIGINT - 8 bytes
Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
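
The ranges above follow directly from the byte widths, assuming two's-complement signed integers. The short Python sketch below recomputes them; `signed_range` and `smallest_type_for` are hypothetical helper names used only for illustration and are not part of Hive or Impala.

```python
# Byte widths taken from the list above.
INTEGER_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of this width."""
    bits = num_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type_for(value):
    """Hypothetical helper: pick the narrowest integer type that can hold value."""
    for name, width in INTEGER_WIDTHS.items():
        low, high = signed_range(width)
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in BIGINT")


# Print the range of each type, narrowest first.
for name, width in INTEGER_WIDTHS.items():
    low, high = signed_range(width)
    print(f"{name:<8} {low:,} to {high:,}")
```

A helper like `smallest_type_for` can be handy when choosing a column type from the largest value expected in the data.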

FLOAT - 4 bytes
Single-precision floating-point number

DOUBLE - 8 bytes
Double-precision floating-point number
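
To see what the difference between 4 and 8 bytes means in practice, the sketch below round-trips a value through both widths using Python's `struct` module. It assumes FLOAT and DOUBLE behave like IEEE 754 single- and double-precision numbers; the helper names are illustrative only.

```python
import struct


def round_trip_float32(value):
    """Store value in 4 bytes (single precision) and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def round_trip_float64(value):
    """Store value in 8 bytes (double precision) and read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


x = 0.1
print(len(struct.pack("<f", x)), len(struct.pack("<d", x)))  # byte widths
print(round_trip_float32(x))  # value after being squeezed into 4 bytes
print(round_trip_float64(x))  # value after being stored in 8 bytes
```

A Python float is already double precision, so the 8-byte round trip returns the value unchanged, while the 4-byte round trip loses some digits.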

DECIMAL
Hive 0.13.0 introduced user-definable precision and scale, declared as DECIMAL(precision, scale).
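
Precision is the total number of digits and scale is the number of digits after the decimal point. The sketch below checks whether a literal can be represented exactly under a given precision and scale. `fits_decimal` is a hypothetical helper written for this illustration; it does not reproduce how Hive or Impala round or reject values.

```python
from decimal import Decimal


def fits_decimal(value, precision, scale):
    """Hypothetical check: is value exactly representable as DECIMAL(precision, scale)?

    Pass value as a string to avoid binary floating-point artifacts.
    Trailing zeros written in the literal count as digits here.
    """
    number = Decimal(value)
    if not number.is_finite():
        return False
    _sign, digits, exponent = number.as_tuple()
    if not any(digits):
        return True  # zero fits any precision and scale
    fractional_digits = max(-exponent, 0)
    integer_digits = max(len(digits) + exponent, 0)
    return fractional_digits <= scale and integer_digits <= precision - scale


print(fits_decimal("123.45", 5, 2))
print(fits_decimal("1234.5", 5, 2))
print(fits_decimal("0.001", 5, 2))
```

With precision 5 and scale 2, three digits are left for the integer part, which is why 1234.5 does not fit even though it has only five digits in total.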

STRING
In Impala, the hard limit on the size of a STRING, and on the total size of a row, is 2 GB.
When writing to Parquet files, the limit on a STRING is 1 GB.

TIMESTAMP
The TIMESTAMP type was introduced in Hive 0.8.0. It supports the traditional UNIX timestamp with optional nanosecond precision.

The supported timestamp format is yyyy-mm-dd hh:mm:ss[.f…].
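
As a rough illustration of that format, the Python sketch below parses a timestamp string with an optional fractional-seconds part. `parse_timestamp` is a hypothetical helper, not a Hive or Impala function. Note that Python's `datetime` stores microseconds only, so this sketch drops any fractional digits beyond the sixth, whereas the TIMESTAMP type allows nanosecond precision.

```python
from datetime import datetime


def parse_timestamp(text):
    """Parse 'yyyy-mm-dd hh:mm:ss' with an optional '.fraction' suffix.

    Fractional digits beyond microseconds are dropped, because
    datetime cannot hold nanoseconds.
    """
    main, _, fraction = text.partition(".")
    parsed = datetime.strptime(main, "%Y-%m-%d %H:%M:%S")
    if fraction:
        if not fraction.isdigit():
            raise ValueError(f"invalid fractional seconds: {fraction!r}")
        # Pad or cut the fraction to exactly six digits (microseconds).
        microseconds = int(fraction[:6].ljust(6, "0"))
        parsed = parsed.replace(microsecond=microseconds)
    return parsed


print(parse_timestamp("2020-09-28 10:15:30"))
print(parse_timestamp("2020-09-28 10:15:30.5"))
print(parse_timestamp("2020-09-28 10:15:30.123456789"))
```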

Complex types:
Complex types (also referred to as nested types) in Hive let you represent multiple data values within a single row/column position. Impala 2.3 and higher also supports the complex types ARRAY, MAP, and STRUCT.

Arrays: ARRAY<data_type>
     Ordered collection of elements of the same type
Maps: MAP<primitive_type, data_type>
     Key-value pairs; the key must be a primitive type
Structs: STRUCT<col_name : data_type [COMMENT col_comment], …>
     Collection of named fields that may have different types
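
For readers who think in terms of a general-purpose language, the three complex types map loosely onto familiar structures. The Python sketch below is only an analogy: the row layout, field names, and values are made up for illustration, and the comments show the complex type each field would roughly correspond to.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Address:  # roughly STRUCT<street: STRING, zip_code: INT>
    street: str
    zip_code: int


@dataclass
class CustomerRow:  # one row of a hypothetical table
    name: str                   # STRING
    phones: List[str]           # roughly ARRAY<STRING>
    attributes: Dict[str, int]  # roughly MAP<STRING, INT>
    address: Address            # the STRUCT defined above


row = CustomerRow(
    name="Alice",
    phones=["555-0100", "555-0101"],
    attributes={"visits": 3, "orders": 1},
    address=Address(street="1 Main St", zip_code=12345),
)

print(row.phones[0])             # array element by position
print(row.attributes["visits"])  # map value by key
print(row.address.street)        # struct field by name
```

Hive uses a similar notation in queries: an index for array elements (`phones[0]`), a key for map values (`attributes['visits']`), and a dot for struct fields (`address.street`).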

