Monday, September 28, 2020

TOP, LIMIT, ROWNUM vs DENSE_RANK

What would you do if you were asked to identify the top-ten products based on their prices?

In SQL Server, you would probably use a TOP clause with a specified number of rows and a descending order of price.

In MySQL and Impala, you would use a LIMIT clause with a specified number of rows and a descending order of price.

Basically, the TOP (SQL Server), LIMIT (MySQL and Impala), and ROWNUM (Oracle) keywords are used for pagination, paged results, or limiting the number of rows returned, which is useful on large tables. They will not help in identifying rankings directly without some workaround.
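For illustration, here is how the same top-ten query looks in each dialect (a sketch, assuming a Products table with a Price column, as created below):

```sql
-- SQL Server: TOP
SELECT TOP 10 ProductName, Price FROM Products ORDER BY Price DESC;

-- MySQL / Impala: LIMIT
SELECT ProductName, Price FROM Products ORDER BY Price DESC LIMIT 10;

-- Oracle: classic ROWNUM needs the ORDER BY in a subquery,
-- because ROWNUM is assigned before the sort happens
SELECT * FROM (SELECT ProductName, Price FROM Products ORDER BY Price DESC)
WHERE ROWNUM <= 10;
```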

Let's create some sample data and do some exercises to understand the scenario.

The following statement will create a "Products" table:

CREATE TABLE Products
(
ProductName STRING,
Price DECIMAL(7,2)
);



INSERT INTO Products (ProductName, Price) VALUES
('Delights breads',25),
('Galaxy Chocolates',20),
('Kitkat Chocolates',22),
('Rainbow Chocolates',19),
('Americana Chocobread',26),
('Palm Milky Chocobars',28),
('Bounty chocolates',26),
('Sparkles chocos',23),
('Smiley Cocos',21),
('DelightPlus chocos',22),
('Softy chocobar',18),
('Minis chocos',8);


Now we have the "Products" table with data.

Let's query the table to retrieve "Products" data in descending order of "Price":

SELECT ProductName, Price FROM Products
ORDER BY Price DESC;


Now, let us retrieve the top-ten product information based on the highest price using the LIMIT clause.

SELECT ProductName, Price FROM Products
ORDER BY Price DESC
LIMIT 10;



This returned ten rows; however, looking at the data we can see it is not giving the information we are looking for, i.e. the top-ten products. Some products share the same price, so these ten rows cover only the top-8 price ranks.

In this scenario, we need to use DENSE_RANK to fetch the ranking of the products based on their price.

SELECT * FROM (
    SELECT ProductName, Price,
           DENSE_RANK() OVER (ORDER BY Price DESC) AS RankValue
    FROM Products
) AS Tab
WHERE RankValue <= 10;


Finally, we have successfully retrieved top-ten product information.
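As a side note, DENSE_RANK differs from the other ranking functions in how ties are handled; a quick sketch against the same table:

```sql
SELECT ProductName, Price,
       ROW_NUMBER() OVER (ORDER BY Price DESC) AS RowNum,   -- unique 1..N, ties broken arbitrarily
       RANK()       OVER (ORDER BY Price DESC) AS RankVal,  -- ties share a rank; gaps follow
       DENSE_RANK() OVER (ORDER BY Price DESC) AS DenseVal  -- ties share a rank; no gaps
FROM Products;
```

With our data, the two products priced 26 both get DENSE_RANK 2, and the next price (25) gets 3, whereas RANK would jump to 4.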


Hope you find this article helpful.



Sunday, September 27, 2020

Sqoop Complete Tutorial Part-7

This is the continuation part of "Sqoop Complete Tutorial". If you want to read -


18) Importing all tables from MySQL to Hive  

The below command will help in copying all the tables from MySQL to the Hive user database.

sqoop import-all-tables \
--connect jdbc:mysql://localhost/empdept \
--username root \
--password cloudera \
--hive-import \
--hive-database dbTest

The result is below:


If you look at the result, the data is duplicated. This is because the tables "emp" and "dept" already exist in the database. The --hive-overwrite option will replace the data if it already exists.

sqoop import-all-tables \
--connect jdbc:mysql://localhost/empdept \
--username root \
--password cloudera \
--hive-import \
--warehouse-dir /user/hive/warehouse/dbtest \
--hive-database dbtest \
--hive-overwrite

Here we have additionally provided the warehouse directory to specify the location of the database.


19) Importing all tables but excluding a few from MySQL to Hive  

I have created a table named "location" in my current database 'empdept'.


I am about to import all tables excluding 'emp' and 'dept', since those were already imported. As "location" is the only table left to import, I could simply specify its name; however, let's see how it can be done with sqoop import-all-tables.

sqoop import-all-tables \
--connect jdbc:mysql://localhost/empdept \
--username root \
--password cloudera \
--hive-import \
--hive-database dbtest \
--exclude-tables "emp,dept"


If you look at the above screenshot, the import process selects only the "location" table and excludes the tables "emp" and "dept" from the import.


The import process is complete, and the table schema and data were populated into the Hive warehouse/database. Let's verify in Hive.

Sqoop Complete Tutorial Part-6

This is the continuation part of "Sqoop Complete Tutorial". If you want to read -


16) Importing a table from MySQL to Hive's default database.  

The below command will help in copying the "emp" table and its data from MySQL to Hive's "default" database as "employees".

sqoop import \
--connect jdbc:mysql://localhost/empdept \
--table emp \
--username root \
--password cloudera \
--hive-import \
--hive-table employees



Verifying the data:


17) Importing a table from MySQL to Hive's user database.  

The below command will help in copying the "emp" table and its data from MySQL to Hive's user database (dbTest) as the "employees" table.

sqoop import \
--connect jdbc:mysql://localhost/empdept \
--table emp \
--username root \
--password cloudera \
--hive-import \
--hive-table employees \
--hive-database dbTest


Verifying the data:



Please click here for the next part.




Saturday, September 26, 2020

Get Table Row Count And Used Size

The following query will help you check the total number of rows and the used size of the table(s).

SELECT
s.Name AS SchemaName,
t.Name AS TableName,
p.rows AS RowCounts,
CAST(ROUND((SUM(a.used_pages) / 128.00), 2) AS NUMERIC(36, 2)) AS Used_MB,
CAST(ROUND((SUM(a.total_pages) - SUM(a.used_pages)) / 128.00, 2) AS NUMERIC(36, 2)) AS Unused_MB,
CAST(ROUND((SUM(a.total_pages) / 128.00), 2) AS NUMERIC(36, 2)) AS Total_MB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
GROUP BY t.Name, s.Name, p.Rows
ORDER BY Total_MB DESC
GO




Saturday, September 12, 2020

Last Modified Date - Stored Procedure

Usually, several stored procedures are handed over to DBAs to deploy in the production environment. Some replace existing stored procedures, while others are new to the environment.

If the deployment process is not automated, DBAs can be left unsure whether an existing stored procedure was replaced or missed.

The script below will help you clear any doubts by showing the last modified date of each stored procedure.

SELECT name, create_date, modify_date
FROM sys.objects
WHERE type = 'P'
ORDER BY modify_date DESC
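A variant of the same idea (a sketch; the seven-day window is an assumption, adjust it to your deployment cadence) uses the sys.procedures catalog view and restricts the output to recently changed procedures:

```sql
-- Only procedures modified in the last 7 days (hypothetical window)
SELECT name, create_date, modify_date
FROM sys.procedures
WHERE modify_date >= DATEADD(DAY, -7, GETDATE())
ORDER BY modify_date DESC;
```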


Please do let me know if you find any other approach.


Thursday, September 10, 2020

Import from Excel - Named Sheet

This article is to let you know how to import Excel data into SQL Server.

First of all, download the set of components (i.e. OLE DB support) that facilitates the transfer of data between Microsoft Office System files and non-Microsoft Office applications.

Please click here for the Microsoft download link.
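One more prerequisite worth noting: OPENROWSET ad hoc access is often disabled by default. A minimal sketch to enable it (requires sysadmin rights; check your security policy before enabling this on a shared server):

```sql
-- Enable ad hoc distributed queries so OPENROWSET can be used
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
```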

The Excel file has only one sheet, which is named "Sheet1" by default. Use the below script to import its data into SQL Server.

USE TestDB1
GO

CREATE TABLE SampleTable(IDCol INT, Col2 VARCHAR(10))

INSERT INTO SampleTable
SELECT * FROM
OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                 'Excel 12.0; Database=D:\TestDataFolder\sample.xlsx',
                 'SELECT * FROM [Sheet1$]')
GO

SELECT * FROM SampleTable


If the Excel file's tab has a specific name, i.e. a named sheet, then:

USE TestDB1
GO

CREATE TABLE SampleTable(IDCol INT, Col2 VARCHAR(10))

INSERT INTO SampleTable
SELECT * FROM
OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                 'Excel 12.0; Database=D:\TestDataFolder\sample.xlsx',
                 'SELECT * FROM [Sample Data$]')
GO

SELECT * FROM SampleTable


Hope you find this article helpful.

Generate multiple CSV data files from SQL

Assume that there are thousands of rows in a table and the rows need to be split and saved in different CSV files based on date or category or status. Have you ever come across such a requirement?

If so, here is the code to accomplish it.


USE TestDB1
GO

DECLARE @TestData20 TABLE(IntValCol INT, DateCol DATETIME)

INSERT INTO @TestData20 (IntValCol, DateCol) VALUES
(1,'09/05/2020'), (2,'09/05/2020'), (3,'09/06/2020'), (4,'09/06/2020'),
(5,'09/07/2020'), (6,'09/07/2020'), (7,'09/08/2020'), (8,'09/08/2020'),
(9,'09/09/2020'), (10,'09/09/2020'),(11,'09/10/2020'), (12,'09/10/2020') 

-- Declaring Variables
DECLARE @MinDate DATETIME,
        @MaxDate DATETIME,
        @FileName VARCHAR(30),
        @FilePath VARCHAR(100),
        @BCPCommand VARCHAR(4000)

-- Assigning Values To Variables
SELECT @MinDate = MIN(DateCol) FROM @TestData20
SELECT @MaxDate = MAX(DateCol) FROM @TestData20

--Creating the loop
WHILE @MinDate <= @MaxDate
BEGIN
       --Setting up the filename and filepath
       SET @FileName = 'TempData'+'_'+CONVERT(VARCHAR(8),@MinDate,112)+'.csv'
       SET @FilePath = 'D:\TestDataFolder\'+@FileName

       --Fetching the data into a table
       SELECT * INTO Temp FROM @TestData20 WHERE DateCol = @MinDate 

       --Assigning the BCP statement to the variable
       SET @BCPCommand = 'BCP  TestDB1.dbo.Temp out '+@FilePath+' -T -w -t, '

       --Executing the BCP command through command-shell
       EXEC master..xp_cmdshell @BCPCommand

       --Dropping the table, to utilize it again
       DROP TABLE Temp

       SET @MinDate = DATEADD(D,1,@MinDate)

END
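Note that xp_cmdshell is disabled by default. A minimal sketch to enable it before running the loop above (requires sysadmin rights; enabling it has security implications, so check your environment's policy first):

```sql
-- Enable xp_cmdshell so the BCP command can be executed from T-SQL
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
```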




Hope you find this article helpful.


Monday, September 7, 2020

SQL Server - Parsing, Casting and Conversion

Sometimes data is stored in a table in a format different from the data type it really belongs to. A date might be stored as an integer or a string, numeric values might be stored as character strings, and currency might be stored in a plain number format. The reason could be whatever the design or the business requires. Since one cannot expect the stored format as output, we need casting, parsing, and conversion methods.

This article focuses on the following functions -

1) CAST

2) CONVERT

3) PARSE

4) TRY_PARSE

5) TRY_CAST

6) TRY_CONVERT

CAST:

The CAST function converts an expression from one data type to another. It is part of the ANSI SQL specification.

Examples:

SELECT CAST(25.65 AS INT);
The query returns 25.65 as 25 by converting the decimal into an integer. 

SELECT CAST(25.65 AS VARCHAR);
The query returns 25.65 as '25.65', which is a character string. 

SELECT CAST('2017-08-25' AS DATETIME);
The query returns "2017-08-25" as "2017-08-25 00:00:00.000" by converting the string into a DATETIME value.

Let's see one more example. 

USE AdventureWorks2017;
GO

SELECT Name,
             sellstartdate,
             CAST(sellstartdate AS VARCHAR) converteddate,
             ListPrice
FROM Production.Product
WHERE CAST(ListPrice AS int) LIKE '3%';
GO




CONVERT:

CONVERT is a SQL Server implementation-specific function. It does the same as CAST, and there isn't any difference in performance; however, CONVERT offers greater flexibility when converting among date and time values, fractional numbers, and monetary signifiers, thanks to its style parameter.

Examples:

SELECT CONVERT (INT, 25.65)
The query returns "25" by converting the 'decimal' into an integer. 

SELECT CONVERT(DATETIME, '2017-08-25')
The query returns "2017-08-25 00:00:00.000" a proper DateTime format. 

SELECT CONVERT(VARCHAR(10), GETDATE(), 101)

The query returns "09/02/2020" by converting the DateTime value into a character string.


Let's see one more example. 

USE AdventureWorks2017;
GO

SELECT Name,
             sellstartdate,
             CONVERT(VARCHAR(10),sellstartdate,101) converteddate,
             ListPrice
FROM Production.Product
WHERE CONVERT(INT, ListPrice) LIKE '3%';
GO


Please click here for more information about the "CONVERT" function.


PARSE:

PARSE returns the result of an expression translated to the requested data type in SQL Server. It is not an alternative to the CAST or CONVERT functions, though it also converts from one data type to another.

Let's see this date: "Wednesday, 2 September 2020". There isn't any issue with this date or its format; it is human-readable, and we know that in some places people use such a date format. However, built-in functions like CAST or CONVERT cannot convert such locale-specific data used in different time zones or geo-locations.

Let us test.

SELECT CAST('Wednesday, 2 September 2020' AS datetime)

SELECT CAST('Wednesday, 2 September 2020' AS datetime2)

SELECT CONVERT(DATETIME,'Wednesday, 2 September 2020')

SELECT CONVERT(DATETIME2,'Wednesday, 2 September 2020')

All the above statements return a 'conversion failed' error, since CAST and CONVERT cannot convert such data into other formats.

SELECT PARSE('Wednesday, 2 September 2020' AS datetime)

SELECT PARSE('Wednesday, 2 September 2020' AS datetime2)

SELECT PARSE('Wednesday, 2 September 2020' AS datetime2 USING 'en-US');

The above three statements return the converted data as shown below.


Let's see one more example.


Above is the test data. EmployeeID is usually an integer value; however, it was intentionally declared as VARCHAR. The same goes for Salary, "joindate", and "updatedate".

In the SELECT statement, the +1 (plus one) at the end of the first PARSE expression checks whether the value is convertible to a numeric value. If the outcome is not a number, the addition will return an error.
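The +1 check can be seen in isolation (a minimal sketch; the literal values are made up for illustration):

```sql
-- A parsable string: PARSE succeeds, so the addition works
SELECT PARSE('123' AS INT) + 1;   -- returns 124

-- A non-numeric string: PARSE fails here, so the whole statement errors out
SELECT PARSE('A123' AS INT) + 1;
```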

PARSE will not proceed further and throws an error if there is any null value.


If PARSE is used in a transaction and there is any null, the whole transaction will be rolled back. This led to the introduction of TRY_PARSE.


TRY_PARSE:

The TRY_PARSE function is the same as PARSE; the only difference is that it returns a null value when the conversion fails instead of cancelling the whole conversion process.

USE dbTEST
GO

CREATE TABLE EmpTest(
                           EmpID VARCHAR(4),
                           Ename VARCHAR(50),
                           Salary varchar(5),
                           joindate VARCHAR(40),
                           updateddate varchar(20))

INSERT INTO EmpTest VALUES
('A123','Kate','21000','Tuesday, 1 September 2020','09/01/2020'),
('B199','Jamon','23000','Wednesday, 2 September 2020','09/02/2020'),
('','Sandra',NULL,'Tuesday, 1 September 2020','09/01/2020') 

SELECT TRY_PARSE(RIGHT(EmpID,3) AS INT)+1,
             TRY_PARSE(Salary AS DECIMAL(15,2)),
             TRY_PARSE(joindate AS date),
             TRY_PARSE(updateddate AS DATETIME)
FROM EmpTest



Similarly, there are two more functions, TRY_CONVERT and TRY_CAST. These will also return a null value when the conversion fails, instead of cancelling the transaction and returning an error.


TRY_CONVERT:

SELECT TRY_CONVERT(INT, RIGHT(EmpID,3)),
             TRY_CONVERT(DECIMAL(15,2), Salary),
             TRY_CONVERT(DATETIME, updateddate)
FROM EmpTest


TRY_CAST:

SELECT TRY_CAST(RIGHT(EmpID,3) AS INT),
             TRY_CAST(Salary AS DECIMAL(15,2)),
             TRY_CAST(updateddate AS DATETIME)
FROM EmpTest




