Monday, July 27, 2020

Disk Space Usage For Analysis

The following implementation helped me in analyzing the disk space usage. 

There are many ways to get the available disk space in the server and using SCOM we can get the alerts whenever the disk space reaches below the threshold levels. This implementation fetches the available space information and stores it in a table on a daily basis which helps in comparing the data with the previous days and also over a period of time to understand how much disk space is being utilized. 

Undoubtedly, when we have the data, we will have lot of scope in analyzing it. 

As the famous quote goes, “If you torture the data long enough, it will confess”. 😊

Our requirement is to fetch the information in the format below and using xp_fixeddrives, we get the following results:

 

Code:

USE TestDB1

GO

-- Create a table to store the information

-- Adding a default value to the Updated Date Time. It records when the execution took place.

-- Inserting values into the table

 -- Retrieving the stored information in a single row / readable format.

As you see, this is implemented without leaving the SQL Server Management Studio and without using any third-party tool.


Thursday, July 23, 2020

Adding Multiple Columns With Default Value

Almost a decade before, I came across a requirement where I needed to add multiple columns in a table with default value. It took a while on that time to figure it out.

I found it in my old SQL Projects repository and thought to share it with you.


-- Creating a table and adding a row

CREATE TABLE TestTable1(Id INT, SomeCol VARCHAR(10))

INSERT INTO TestTable1 SELECT 10, 'abcdefgh'

-- Adding Multiple Columns With Default Values

ALTER TABLE TestTable1

                        ADD Col1 VARCHAR(100) NOT NULL DEFAULT('SomeInformation'),

                                    Col2 INT NOT NULL DEFAULT(999)

GO

-- Retrieve Data from Table

SELECT * FROM TestTable1

GO



Tuesday, July 21, 2020

SQL Server - Replicate Function

SQL Server "Replicate" Function repeats a character expression to a specified number of times. It is equivalent to or alternative of Oracle's RPAD function.

Let's' check how this will be useful to the developers.

SELECT LEN(REPLICATE(2, '12'))

The above query will return 12 as a result due to 2 will be called 12 times (222222222222) hence the string length will be 12.

SELECT REPLICATE(2, 5)

The above statement will place 2 five times hence the result will be 22222.

SELECT REPLICATE('SQL', 5)

The above statement will place 'SQL' five times hence the result will be 'SQLSQLSQLSQLSQL'.

Let's' go through with more examples.

/**********/
CASE-1
/**********/
DECLARE @String AS NUMERIC(8,2)
SELECT @String = .2
SELECT @String

The above example returns 0.20 as a result. Assume that there is a requirement to have a 4 digit string (fixed length) like '00.20' but not just 0.20. Though logically this requirement is inappropriate but just think of the format.

/**********/
CASE-2
/**********/
DECLARE @String AS NUMERIC(8,2)
SELECT @String = .02
SELECT @String

The above statement returns 0.02 as a result. Similar to above case, requirement is to have 00.02.

Let's' now use the 'Replicate' function to get the required result for the above said two cases.

DECLARE @String AS NUMERIC(8,2)
SELECT @String = .2
SELECT CAST(REPLICATE(0, 1) AS VARCHAR(5)) + Cast(@String AS VARCHAR(5))

Or you can use the following instead
SELECT CAST(REPLICATE(0, Len(@String)-3) AS VARCHAR(5)) + Cast(@String AS VARCHAR(5))

The above query will return 00.20 as a result.

We have provided '.2' but SQL Server will store the number as 0.20 due to the datatype what we have provided. Hence the total string length will be 4 (along with the period).

SELECT CAST(REPLICATE(0, Len(@String)) AS VARCHAR(5)) + Cast(@String AS VARCHAR(5))

The result will be 00000.20, since we have requested to place the 0 for 4 times (lengh of the string is 4)

DECLARE @String AS NUMERIC(8,2)
SELECT @String = 08.2
SELECT Cast(Replicate(0,5-Len(@String)) AS VARCHAR(5)) + Cast(@String AS VARCHAR(5))

The result will be 08.20 for the above statement due to we had commanded that the length of the string will be 5 and out of five we have provided 4 characters to the variable @String.

Let's' write a stored procedure to avoid providing length parameter manually.

/******************** Procedure **********************/
CREATE PROCEDURE uspReplicate(@String AS NUMERIC(8,2))
AS
DECLARE
@length AS INT

BEGIN
SET @length = LEN(@String)

IF @length < 5
BEGIN
SELECT CAST(REPLICATE(0, 5-@length) AS VARCHAR(10)) + Cast(@String AS VARCHAR(10))
END
ELSE
SELECT @String
END
/************* End of the Procedure *****************/

EXEC uspReplicate 0.02
Result : 00.02

EXEC uspReplicate .02
Result : 00.02

EXEC uspReplicate .2
Result : 00.20

EXEC uspReplicate 2.02
Result : 02.02

EXEC uspReplicate 2.0
Result : 02.00

EXEC uspReplicate 2
Result : 02.00

The below example will let you know the real use of this function.
DECLARE @Account1 INT='6700'
DECLARE @Account2 INT='007800'
DECLARE @Account3 INT='108978'

SELECT @Account1, @Account2, @Account3

If you look at the values, the account number should be 6 digit value. But when you retrieve them, the returned values are 6700, 7800 and 108978. To bring all of them in proper 6 digit format, use the below code.

SELECT CONCAT(REPLICATE('0', 6-LEN(@Account1)),@Account1)
SELECT CONCAT(REPLICATE('0', 6-LEN(@Account2)),@Account2)
SELECT CONCAT(REPLICATE('0', 6-LEN(@Account3)),@Account3)

Hope this article helped you in understanding the usage of replicate function.

Alert on Scheduled SQL Jobs - Missed to enable

During the deployments, windows patching or any other activity, DBAs often disable the jobs to avoid data corruption and jobs failure. Once the activity is completed, DBAs need to re-enable them.

Since it is a manual intervention, it is possible that a job might be left behind from re-enabling. It happened in my case long time ago and since then I have been extra cautious not to repeat the same mistake again. 

I thought to create an alert mechanism that shall help me in identifying if anything is missed. 

Sharing the code below, hoping this will help you as well. 

Steps:

1) Fetch all the active jobs information in a table

2) Compare the information with the current status

3) Unmatched information should be emailed. 

Code: 



Monday, July 20, 2020

Analytical & Window Functions


Please refer to my previous post in which schema and data for EMP and DEPT tables available.

In this article we are about to discuss about SQL Server Analytical and Window functions. As stated in Microsoft docs, analytic functions calculate an aggregate value based on a group of rows. Unlike aggregate functions, however, analytic functions can return multiple rows for each group. Use analytic functions to compute moving averages, running totals, percentages or top-N results within a group.

These functions are common in most of the RDBMS applications and are widely used by the data and business analysts.

NTILE
It divides/distributes an ordered data set (or partition) into a specified number of groups which we call it buckets and assigns an appropriate (bucket) number to each row. The bucket number will represent each row to which bucket it belongs to.

In other words, it is used to divide rows into equal sets and assign a number to each row.

SELECT Ename, sal, NTILE(2) OVER (ORDER BY sal DESC) Bucket FROM Emp;


SELECT Ename, sal, NTILE(5) OVER (ORDER BY sal DESC) Bucket FROM Emp;


The following query retrieves the records from the first bucket.

SELECT * FROM (
SELECT          Ename,
sal, NTILE(4) OVER (ORDER BY sal DESC) Bucket
FROM Emp
) EmpAlias
WHERE Bucket=1;
ROW NUMBER:
This function represents each row with a unique and sequential value based on the column used in OVER clause. Here, we are having 10 rows in our Emp table and will use ROW_NUMBER on these records.

This can also be used to assign a serial /row number to the rows within the provided dataset.

SELECT DeptNo, sal, ROW_NUMBER() OVER (ORDER BY sal) AS row_num FROM emp;

SELECT 
     DeptNo, 
     sal, 
     ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS row_num 
FROM emp;

RANK:
This function is used to assign a rank to the rows based on the column values in OVER clause.

The row with equal values assigned the same rank with next rank value skipped.   

SELECT DeptNo, sal, RANK() OVER(ORDER BY sal DESC) AS rnk FROM emp;


SELECT   DeptNo, 
                 sal, 
                 RANK() OVER(PARTITION BY DeptNo ORDER BY sal DESC) AS rnk 
FROM emp;
DENSE_RANK: The DENSE_RANK analytics function used to assign a rank to each row. The rows with equal values receive the same rank and this rank assigned in the sequential order so that no rank values are skipped.

SELECT 
          DeptNo, 
          sal, 
          DENSE_RANK() OVER(PARTITION BY DeptNo ORDER BY sal DESC) AS dns_rnk 
FROM emp;
Let us use all the above functions in one query to see the difference in the results.

SELECT DeptNo AS dept, sal AS sal,
ROW_NUMBER() OVER (PARTITION BY DeptNo ORDER BY sal DESC) AS RowNumber,
RANK() OVER (PARTITION BY DeptNo ORDER BY sal DESC) AS iRank,
DENSE_RANK() OVER(PARTITION BY DeptNo ORDER BY sal DESC) AS DenseRank
FROM emp;


CUME_DIST:
This function stands for cumulative distribution. It computes the relative position of a column value in a group. Here, we can calculate the cumulative distribution of salaries among all departments. For a row, the cumulative distribution of salary is calculated as:

SELECT DeptNo, sal, CUME_DIST() OVER (ORDER BY sal) AS cum_dist FROM emp;

SELECT DeptNo, sal, CUME_DIST() OVER (ORDER BY sal) AS cum_dist FROM emp
WHERE DEPTNO in(20,30);


CUME_DIST(salary) = Number of rows with the value lower than or equals to salary / total number of rows in the dataset.

In the above example, due to ORDER BY clause, 1st row from salary will be counted as 1 and it will be divided by the total number of rows. That is 1/14 = 0.1

For the second row, it is 2/14 = 0.14;

For the 4th row and the next immediate row too has the same value, it will be calculated as 5/14 = 0.35 and assign it for the both rows.

Look at the outcome to understand.

PERCENT_RANK:
It is very similar to the CUME_DIST function. It ranks the row as a percentage. In other words, it calculates the relative rank of a row within a group of rows.

The range of values returned by PERCENT_RANK is between 0 to 1 and first row in the dataset is always zero. This means the return value is of the double type.

Let’s rank the salary by department wise as percentage:

Percent_Rank = (rank decreased by 1)/(remaining rows in the group)

SELECT DeptNo, sal,
RANK() OVER (PARTITION BY deptNo ORDER BY sal DESC) AS iRank,
CUME_DIST() OVER (PARTITION BY deptno ORDER BY sal) AS cum_dist,
PERCENT_RANK() OVER (PARTITION BY deptNo ORDER BY sal) AS perc_rnk 
FROM EMP;
Go

If you observe its behavior when it calculates the relative rank for the rows with same values, it assigns same percentage rank value (0.75) to both of them. The behavior is similar to rank function.

Range Between & Rows Between:
These functions are called window functions which fetches records right before and after the current record to perform the aggregation. It is similar to lead and lag functions however a window function defines a frame or window of rows with a given length around the current row, and performs a calculation across the set of data in the window. Mostly these functions will be used to get the cumulative sum/average, running or moving sum or averages.

Examples:
SELECT DISTINCT    DeptNo, --sal,
            SUM(sal) OVER(ORDER BY DeptNo RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'RangeUnbound'
FROM emp
GO


SELECT DISTINCT    DeptNo, --sal,
            SUM(sal) OVER(ORDER BY DeptNo ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'RowsUnbound'
FROM emp
GO


SELECT DISTINCT    EMPNO, sal,
            SUM(sal) OVER(ORDER BY EmpNo ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'RowsUnbound'
FROM emp
GO


SELECT DISTINCT    EMPNO, sal,
            SUM(sal) OVER(ORDER BY EmpNo RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'RangeUnbound'
FROM emp
GO


SELECT DISTINCT    Job, sal, DeptNo,
            SUM(sal) OVER(ORDER BY DeptNo RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'RangeUnbound'
FROM emp
GO


lag AND Lead FUNCTIONS:
The LAG function gets the information from a past column, while LEAD brings information from an ensuing line. The two functions are fundamentally the same as one another and you can simply supplant one by the other by changing the sort request.

SELECT ename, deptno, sal,
LEAD(sal, 1) OVER(PARTITION BY deptno ORDER BY sal) AS lead1,
LEAD(sal, 2) OVER(PARTITION BY deptno ORDER BY sal) AS lead2,
LAG(sal,1) OVER(PARTITION BY deptno ORDER BY sal) AS lag1,
LAG(sal,2) OVER(PARTITION BY deptno ORDER BY sal) AS lag2
FROM emp ORDER BY deptno,sal;


Hope this article helped you in understanding Analytical and Window functions.


Big Data & SQL

Hi Everybody, Please do visit my new blog that has much more information about Big Data and SQL. The site covers big data and almost all the...