SQLBank: Word Count in HiveQL - Explode and Split Usage

Wednesday, September 30, 2020

Word Count in HiveQL - Explode and Split Usage

This article aims to explain the usage of the SPLIT function in HiveQL. If you are looking for a similar function in SQL Server, then please click here.

Let's create a staging table to load the data temporarily.
CREATE TABLE tempData (col1 STRING);

Load the data to the table.
LOAD DATA INPATH 'Desktop/DataFile' OVERWRITE INTO TABLE tempData;

To split the data from the above-created temp table
SELECT word, count(1) AS count FROM
(SELECT explode(split(col1, '\s')) AS word FROM tempData) temp
GROUP BY word
ORDER BY word;

Split function splits the data based on the delimiter provided. The Explode function will further split the data into smaller chunks. Let's see what these explode and split functions are doing with another example.

Below is the patient's blood pressure variations information.

TableName: PatientsData
Systolic-Diastolic
122/80, 122/83, 130/83, 135/86, 140/95, 147/92

SELECT split(data,'\/') as split_data from PatientsData;

Result:
split_data
122,80
122,83
130,83
130,83
135,86
140,95
147,92

SELECT split(data,'\/')[0] AS Systolic, split(data,'\/')[1] AS Diastolic from PatientsData;

Result:
Systolic Diastolic
122 80
122 83
130 83
130 83
135 86
140 95
147 92

SELECT explode(split(data,'\/')) as exploded_data from PatientsData;

Result:
exploded_data
122
80
122
83
130
83
130
83
135
86
140
95
147
92

Hope you understood the behavior of the function with the examples.

If you are looking for a word-count program using SQL Server, then click here.

If you are looking for a word-count program using Pig, then click here.

Wednesday, September 30, 2020

Word Count in HiveQL - Explode and Split Usage

No comments:

Post a Comment

Big Data & SQL