Showing posts with label SQL Server. Show all posts
Showing posts with label SQL Server. Show all posts

Saturday, August 8, 2020

Creating Table From Existing Table in Hive, Impala, SQL Server, Oracle SQL*Plus, PostgreSQL and MySQL

In some scenarios we may need to create a table based on the existing ones. Sometimes we may need the table along with the data and sometimes we may need only the table structure. Most of the RDBMSs, Apache Hive and Cloudera's Impala supports CREATE TABLE.. AS and CREATE TABLE.. LIKE. 

In both the scenarios, we do not specify the columns at all; the column names and types are derived from the source table, query, or data file.

Use of CREATE TABLE ... LIKE is to create an empty table based on the definition of another table, including any column attributes and indexes defined in the original table. On the other hand, CREATE TABLE ... AS is to create the table based on the columns specified in the SELECT statement. It helps in fetch the only columns that required. However, the column definitions will be as same as in source table.

Let's see in which RDBMSs we have these functionalities. 

As stated in the above picture, most of the relational applications do support 'Create Table..As' and 'Create Table..Like' except Microsoft SQL Server. However there is another approach available in SQL Server to accomplish the same task. Before we discuss about SQL Server functionality we will look into the Create table statements which will work in MySQL, Oracle SQL*Plus, PostgreSQL, Hive and Impala.

CREATE TABLE <newTableName> AS <SelectStatement>

CREATE TABLE <newTableName> LIKE <ExistingTableName>

Let's see some examples:

CREATE TABLE EmployeeData AS SELECT EmpID, Ename, Sal FROM Emp;

As you see, we are selecting only required columns and not fetching the DeptNo from Emp table. This will create the table EmployeeData with Three columns along with the data. There is a workaround to create the table with the desired columns definitions without importing the data too.

CREATE TABLE EmployeeData AS SELECT EmpID, Ename, Sal FROM Emp WHERE 1=2;

Since 1=2 is false, the data import will not happen but the table will be created with the specified columns.

CREATE TABLE..LIKE will not give the scope to select the few columns from the existing table. It will copy the source table's structure. Let's see an example -

CREATE TABLE EmployeeData LIKE Emp;

If you are working with PostgreSQL and if you are getting a syntax error then use 

CREATE TABLE EmployeeData (like Emp); 

Now let's check how this can be accomplished using Microsoft SQL Server.

SELECT * INTO EmployeeData FROM Emp;

This statement will copy the Emp table's structure and data into EmployeeData table. Here also we can fetch the columns based on requirement; See the below example to understand.

SELECT EmpID, Ename, Sal INTO EmployeeData FROM Emp;

 If your requirement is to create table without importing the data, you can use a where clause that fails.

SELECT EmpID, Ename, Sal INTO EmployeeData FROM Emp WHERE 1=2

Hope you find this article helpful.



Thursday, July 30, 2020

Calculating "Approximate Median" in Cloudera Impala, Apache Hive, SQL Server, Oracle and MySQL

APPROX_MEDIAN is an approximate inverse distribution function that accept a nonstop/continuous dispersion model. It takes a numeric or datetime value and returns an estimated middle value or an approximate interpolated value that would be the middle value once the values are sorted. Nulls are ignored in the calculation.

In short, median is the middle value of a set of ordered data.

Median = {(n + 1) ÷ 2}th value

is the number of values in a set of data.

This function is available few RDBMSs like Oracle SQL*Plus and Cloudera Impala, also in Hive, we can achieve it using PERCENTILE function. If the function is not available in the RDBMS in which you work, we still can get the approximate median value in simple steps which we will discuss later.

Click here to get the “Emp” dataset from my previous post if the table and data not exists in your database.

Let's see how we implement it in Cloudera Impala first.

SELECT appx_median(sal) FROM emp;

Result: 
appx_median(Sal)
20000.00

SELECT DeptID, appx_median(sal) FROM emp GROUP BY DeptID;

 

Result: as shown in the picture.

The same can be accomplished in Hive with a different function.

SELECT DeptID, PERCENTILE(CAST(sal AS INT),0.5) FROM emp GROUP BY DeptID;


Let's try in Oracle SQL*Plus 12c

SELECT department_id "Department",

       APPROX_MEDIAN(salary DETERMINISTIC) "Median Salary"

  FROM employees

  GROUP BY department_id

  

In SQL Server:

The below will work only if the compact mode is 110 or higher

select

  percentile_cont(0.25) within group(order by sal) over () as percentile_cont_25,

  percentile_cont(0.50) within group(order by sal) over () as percentile_cont_50,

  percentile_cont(0.75) within group(order by sal) over () as percentile_cont_75,

  percentile_cont(0.95) within group(order by sal) over () as percentile_cont_95

from emp;

In case of MySQL, there are many ways to calculate the median value. The workarounds can be found here.

Hope you find this article useful in calculating approximate median in Big Data technologies like Cloudera Impala, Apache Hive and various traditional RDBMSs.

Big Data & SQL

Hi Everybody, Please do visit my new blog that has much more information about Big Data and SQL. The site covers big data and almost all the...