
Monday, September 28, 2020

Big Data: Apache Hive & Impala Data Types Quick Reference

This article gives an overview of the data types available in both Apache Hive and Impala.


TINYINT - 1 byte 
Range: -128 to 127

SMALLINT - 2 bytes 
Range: -32,768 to 32,767

INT - 4 bytes
Range: -2,147,483,648 to 2,147,483,647

BIGINT - 8 bytes
Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
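
The ranges above follow directly from the storage size: a signed integer stored in n bytes covers -2^(8n-1) to 2^(8n-1) - 1. A minimal Python sketch of that arithmetic (the names INTEGER_TYPE_BYTES and signed_range are illustrative, not part of Hive or Impala):

```python
# Storage sizes of the integer types listed above, in bytes.
INTEGER_TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of this size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for type_name, size in INTEGER_TYPE_BYTES.items():
    low, high = signed_range(size)
    print(f"{type_name}: {low:,} to {high:,}")
```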

FLOAT - 4 bytes
Single-precision floating-point number

DOUBLE - 8 bytes
Double-precision floating-point number
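
To get a feel for the difference between 4-byte and 8-byte floating point, the following Python sketch round-trips a value through a 4-byte IEEE 754 encoding using the standard struct module. It only illustrates single versus double precision in general; it does not show how Hive or Impala store or display values, and the helper name round_trip_float32 is made up for this example.

```python
import struct


def round_trip_float32(value):
    """Pack a Python float (8-byte double) into 4 bytes and unpack it again."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Storage sizes match the FLOAT and DOUBLE sizes listed above.
print(struct.calcsize("<f"), struct.calcsize("<d"))

# 0.5 is exactly representable in both widths; 0.1 is not, and the
# 4-byte version carries a larger rounding error than the 8-byte one.
print(round_trip_float32(0.5) == 0.5)
print(round_trip_float32(0.1) == 0.1)
```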

DECIMAL
Fixed-point number. Hive 0.13.0 introduced user-definable precision and scale, written as DECIMAL(precision, scale).
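
Precision is the total number of digits and scale is the number of digits after the decimal point, so DECIMAL(5, 2) leaves 3 digits for the integer part. The Python sketch below checks whether a literal fits a given precision and scale exactly. The function fits_decimal is a hypothetical helper for illustration only; it handles finite values and says nothing about how Hive or Impala round or reject values that do not fit.

```python
from decimal import Decimal


def fits_decimal(literal, precision, scale):
    """Return True if the literal is exactly representable with this precision and scale."""
    number = Decimal(literal)
    _, digits, exponent = number.as_tuple()
    # Digits to the right of the decimal point.
    fraction_digits = max(0, -exponent)
    # Digits to the left of the decimal point (zero needs none).
    integer_digits = 0 if number == 0 else max(0, len(digits) + exponent)
    return fraction_digits <= scale and integer_digits <= precision - scale


print(fits_decimal("123.45", 5, 2))   # 3 integer digits, 2 fraction digits
print(fits_decimal("1234.5", 5, 2))   # 4 integer digits exceed the 3 available
print(fits_decimal("0.005", 5, 2))    # 3 fraction digits exceed the scale
```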

STRING
In Impala, the hard limit on the size of a STRING and on the total size of a row is 2 GB.
When writing to Parquet files, the limit on a STRING is 1 GB.

TIMESTAMP

Timestamps were introduced in Hive 0.8.0. The type supports the traditional UNIX timestamp with optional nanosecond precision.

The supported timestamp format is yyyy-mm-dd hh:mm:ss[.f…].
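
As a rough illustration of that format, the Python sketch below splits a timestamp string into a date-time part and an optional fractional part, and returns the fraction as nanoseconds. It assumes at most nine fractional digits (nanosecond precision); the function name parse_timestamp_text is hypothetical, and this is not how Hive or Impala parse timestamps internally.

```python
from datetime import datetime


def parse_timestamp_text(text):
    """Parse 'yyyy-mm-dd hh:mm:ss' with an optional fraction into (datetime, nanoseconds)."""
    base, _, fraction = text.partition(".")
    moment = datetime.strptime(base, "%Y-%m-%d %H:%M:%S")
    if fraction and (not fraction.isdigit() or len(fraction) > 9):
        raise ValueError(f"unsupported fractional seconds: {fraction!r}")
    # Right-pad so that '.123' means 123 milliseconds, i.e. 123,000,000 ns.
    nanoseconds = int(fraction.ljust(9, "0")) if fraction else 0
    return moment, nanoseconds


print(parse_timestamp_text("2020-09-28 10:15:30"))
print(parse_timestamp_text("2020-09-28 10:15:30.123"))
```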

Complex types:
Complex types (also referred to as nested types) let you represent multiple data values within a single row/column position. Hive supports ARRAY, MAP, and STRUCT; Impala supports them in Impala 2.3 and higher.

ARRAY - ARRAY<data_type>
An ordered collection of elements of the same type

MAP - MAP<primitive_type, data_type>
A collection of key-value pairs

STRUCT - STRUCT<col_name : data_type [COMMENT col_comment], …>
A collection of named fields that can have different types
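
For readers more familiar with general-purpose languages, the three complex types map loosely onto a list, a dictionary, and a record. The Python sketch below is only an analogy: the Employee and Address classes and their fields are invented for illustration, and the comments show the corresponding HiveQL type and element-access syntax.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Address:                 # analogous to STRUCT<city: STRING, zip_code: INT>
    city: str
    zip_code: int


@dataclass
class Employee:                # one row of a hypothetical table
    name: str                  # STRING
    skills: List[str]          # ARRAY<STRING>
    phones: Dict[str, str]     # MAP<STRING, STRING>
    address: Address           # STRUCT<...>


row = Employee(
    name="Asha",
    skills=["hive", "impala"],
    phones={"home": "555-0100"},
    address=Address(city="Pune", zip_code=411001),
)

print(row.skills[0])        # HiveQL: skills[0]
print(row.phones["home"])   # HiveQL: phones['home']
print(row.address.city)     # HiveQL: address.city
```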

