Showing posts with label concat_ws in hive. Show all posts
Showing posts with label concat_ws in hive. Show all posts

Wednesday, August 12, 2020

Concatenate rows (group concatenation) in MySQL, Hive, SQL Server and Oracle

The CONCAT() function joins the input strings or different column values and returns as a single string. However, if we need to concatenate rows of strings (rows from the same column) into a single string with a desired separator, this will not work.

There comes a function that fulfills this requirement and this function known with different names in various RDBMS and Big Data Technologies like Hive.

==========

MySQL:

==========

GROUP_CONCAT() is a function which merges the data from multiple rows into one field. It is a GROUP BY function which returns a string. Comma (,) will be used to separate the values in the string.

Example:

SELECT StudentName,GROUP_CONCAT(Subjects)

FROM tbStudentInfo

GROUP BY StudentName;

==========

Hive:

==========

Hive doesn’t have the same functionality like in MySQL however there are two functions collect_set() and CONCAT_WS() to be used to get the desired output.

Separator has to be specified explicitly.

Example:

SELECT StudentName,CONCAT_WS(‘,’, collect_set(Subjects)) as Group_Concat

FROM tbStudentInfo

GROUP BY StudentName;

==========

SQL Server:

==========

STRING_AGG() is the function which is introduced in SQL Server 2017 which is equivalent to MySQL’s GROUP_CONCAT function. There is a workaround to achieve group concatenation using STUFFF() along with FOR XML and PATH() functions if you are using older versions.

SQL Server 2017: Example:

SELECT  StudentName,

              STRING_AGG(Subjects,';') Subjects

FROM tbStudentInfo

GROUP BY

    StudentName;

Prior to SQL 2017: Example:

SELECT  DISTINCT StudentName,

              STUFF((SELECT ','+ a.Subjects FROM tbStudentsInfo a

WHERE a.StudentName = b.StudentName

FOR XML PATH('')),1,1,'') AS Subjects

              FROM tbStudentsInfo b

==========

Oracle 11g:

==========

LISTAGG orders data within each group specified in the ORDER BY clause for a specified measure, and then concatenates the values of the measure column.

SELECT

    StudentName,

    LISTAGG(Subjects, ', ') WITHIN GROUP (ORDER BY Subjects) "Subjeccts"

FROM tbStudentInfo

GROUP BY StudentName



Big Data & SQL

Hi Everybody, Please do visit my new blog that has much more information about Big Data and SQL. The site covers big data and almost all the...