Friday, October 2, 2020
SPACE function in Cloudera's Impala
TRANSLATE & REPLACE functions in Cloudera's Impala
Examples:
Oracle MySQL is a free software
input
string with each character in the from
argument replaced with the corresponding character in
the to
argument. The characters are matched in the order they appear
in from
and to
.
Output
---------------------
hetta earth
Wednesday, September 30, 2020
Chaining Common Table Expressions in Impala
This article will help you to learn how to implement chaining common table expressions in Cloudera's Impala.
-~ Chaining CTEs
WITH
Temp1 AS (
SELECT
OrderID,
OrderStatus,
ProductID
FROM temp2
WHERE order_status =
'completed'
),
Temp2 AS (
SELECT OrderID,
OrderStatus,
ProductID
FROM Orders
WHERE ProductID = 5
)
SELECT * FROM (SELECT COUNT(1) FROM Temp1) a;
Technically the query doesn't make any sense, it can be written in a
straight-forward and in a simple way -
SELECT COUNT(1)
FROM Orders
WHERE ProductID = 5
AND
order_status='completed'
However, observe the structure or the pattern. In order to execute
the first common table expression, it needs the values from the second
expression.
Get the unique values from multiple CTEs - Impala
Often, some requests for reports seem simple and straightforward. But it is challenging to find a better solution to be applied. Cloudera's Impala provides several features, and in order to accomplish the tasks in the best possible way, an analyst needs to be aware of them.
Assume that, after filtering out specific values, there are two separate result sets that need to be combined into a single output with distinct values. How would you do that?
Let's see with an example.
),
SELECT * FROM SHJ_Products WHERE Price > 500
)
Starting Hive & Hive metastore from command line
The following commands will help in starting the services of Hive and Hive metastore in case they didn't start automatically without logging into Cloudera manager.
sudo service hive-metastore start
sudo service hive-server2 start
Split equivalent in Impala
Split function splits the data based on the delimiter provided and it is mostly used function in Apache Hive. This function is not available in Impala. However, there is an alternative to it.
Let us first see the usage of the "split" function in Hive.
Below is the patient's blood pressure variations information.
TableName: PatientsData
Systolic-Diastolic
122/80, 122/83, 130/83, 135/86, 140/95, 147/92
SELECT split(data,'\/') as split_data from PatientsData;
Result:
split_data
122,80
122,83
130,83
130,83
135,86
140,95
147,92
SELECT split(data,'\/')[0] AS Systolic,
split(data,'\/')[1] AS Diastolic
FROM PatientsData;
Result:
Systolic Diastolic
122 80
122 83
130 83
130 83
135 86
140 95
147 92
Let's do the same exercise in Impala using "split_part" function.
SELECT split_part(data,'\/',1) AS Systolic,
split_part(data,'\/',2) AS Diastolic
FROM PatientsData;
Result:
Systolic Diastolic
122 80
122 83
130 83
130 83
135 86
140 95
147 92
As per the documentation from Apache, below is the description of the function.
SPLIT_PART(STRING source, STRING delimiter, BIGINT index)
Purpose: Returns the requested indexth part of the input source string split by the delimiter.
If the index is a positive number, returns the indexth part from the left within the source string. If the index is a negative number, returns the indexth part from the right within the source string. If the index is 0, returns an error.
The delimiter can consist of multiple characters, not just a single character.
All matching of the delimiter is done exactly, not using any regular expression patterns.
Return type: STRING
Joins in HiveQL
As we discussed earlier, HiveQL handles structured data only, much like SQL. This doesn't mean that Hive just manages structured data, it also processes and transforms the unstructured data into a readable structured way.
Unstructured data is usually not dependent on static data or other data files. But in structured data, especially if the data is imported from relational systems, due to normalization, the tables may be connected with other tables to obtain meaningful information for a few columns. Hence functionalities of SQL will also be needed in big-data platforms such as Joins, Sub-queries, casting, and conversion functions.
This article focuses on the joins that are available in Hive.
Below is the type of joins available in Hive.
- INNER JOIN
- LEFT OUTER JOIN
- RIGHT OUTER JOIN
- FULL OUTER JOIN
I am not specifying "self-join" explicitly because it is also an inner join.
INNER JOIN is the join type that combines two tables to return records that have matching values from both the tables.
LEFT OUTER JOIN returns all records from the left table and the matched records from the right table.
RIGHT OUTER JOIN is a join that returns all records from the right table, and the matched records from the left table.
FULL OUTER JOIN displays all the records i.e. matched records and unmatched records from both the tables.
Not just the description but the implementation is also as same as in SQL.
Please do let me know if you want to see how it is to be implemented in Hive.
Can we create a table based on a view in HIve?
Can we create a table based on a view in Hive?
Yes, we can.!!
Let's create "View" by combining Emp and Dept tables.
-~ To create a view
CREATE VIEW Emp_View
AS
SELECT Ename, DName FROM Emp
INNER JOIN Dept ON Emp.DeptNo = Dept.DeptNo;
Now let's create the table based on the view.
CREATE TABLE EmpDept
AS
SELECT * FROM Emp_View;
Creating a table and a view with the select statement.
In this article, you'll learn how to create a view and table based on a select statement.
-~ To create a view
CREATE VIEW Emp_View
AS
SELECT Ename, DName FROM Emp
INNER JOIN Dept ON Emp.DeptNo = Dept.DeptNo;
-~ To create a table
CREATE TABLE EmpDept
AS
SELECT Ename, DName FROM Emp
INNER JOIN Dept ON Emp.DeptNo = Dept.DeptNo;
Can we create a table based on a view? Click here.
Hive - Extended Properties
DATABASE LEVEL:
Database or schema both are the same thing. These words can be used interchangeably.
Big Data & SQL
Hi Everybody, Please do visit my new blog that has much more information about Big Data and SQL. The site covers big data and almost all the...
-
There is a ‘ CharIndex’ function in SQL Server which is similar to the Oracle ‘ Instr’ function. In Oracle, the syntax of the INSTR functi...
-
The CONCAT() function joins the input strings or different column values and returns as a single string. However, if we need to concatenate ...
-
Can we create a table based on a view in Hive? Yes, we can.!! Let's create "View" by combining Emp and Dept tables. -~ To crea...