Showing posts with label split function in hive and impala. Show all posts
Showing posts with label split function in hive and impala. Show all posts

Wednesday, September 30, 2020

Split equivalent in Impala

Split function splits the data based on the delimiter provided and it is mostly used function in Apache Hive. This function is not available in Impala. However, there is an alternative to it.

Let us first see the usage of the "split" function in Hive.

Below is the patient's blood pressure variations information.

TableName: PatientsData
Systolic-Diastolic
122/80, 122/83, 130/83, 135/86, 140/95, 147/92

SELECT split(data,'\/') as split_data from PatientsData;

Result:
split_data
122,80
122,83
130,83
130,83
135,86
140,95
147,92

SELECT split(data,'\/')[0] AS Systolic, 
               split(data,'\/')[1] AS Diastolic 
FROM PatientsData;

Result:
Systolic   Diastolic
122          80
122          83
130          83
130          83
135          86
140          95
147          92


Let's do the same exercise in Impala using "split_part" function.

SELECT split_part(data,'\/',1) AS Systolic, 
               split_part(data,'\/',2) AS Diastolic 
FROM PatientsData;

Result:
Systolic   Diastolic
122          80
122          83
130          83
130          83
135          86
140          95
147          92

As per the documentation from Apache, below is the description of the function.

SPLIT_PART(STRING source, STRING delimiter, BIGINT index)

Purpose: Returns the requested indexth part of the input source string split by the delimiter.

If the index is a positive number, returns the indexth part from the left within the source string. If the index is a negative number, returns the indexth part from the right within the source string. If the index is 0, returns an error.

The delimiter can consist of multiple characters, not just a single character.

All matching of the delimiter is done exactly, not using any regular expression patterns.

Return type: STRING



Big Data & SQL

Hi Everybody, Please do visit my new blog that has much more information about Big Data and SQL. The site covers big data and almost all the...