Monday, July 27, 2020

Handling "Json" and "Unstructured" Data in SQL

The below is to understand how we can handle JSON data in SQL Server 2016 version on-wards.

Sample JSON data is:

  {"accountNumber": 2020112, "pin": 2525},

  {"accountNumber": 2567899, "pin": 1462}

  {"accountNumber": 6789925, "pin": 2614}

  {"accountNumber": 9925678, "pin": 6142}

This can be extracted into columns easily in SQL Server with the help of following code:

However, what if the data is not in the exact JSON format i.e. no curly braces, no colon (:) to indicate name-value pairs and no square bracket to hold arrays and values separated by comma (,), in addition to this if it is to be worked on prior to SQL Server 2016 versions?

Sample data:

(accountNumber=2020112)(accountPin=2525)(Phone=+12345678)(countryId=121) (DateOfBirth=19810726)(NumberOfCallsMade=381)

(accountNumber=202019)(accountPin=98291)(Phone=)(countryId=1881) (DateOfBirth=19860526)(NumberOfCallsMade=31)

If you look at the data, there are two rows and are not in same lengthy strings since the values lengths are different. No value provided for Phone attribute in the second row.

The reason behind highlighting these points is, recently I got to work with this data and SUBSTRING function in SQL Server alone is not much a help. I managed to create the report at the end by using PATINDEX along with SUBSTRING functions.

This is to let you know that, within the available resources we can sort out the things even if it looked complicated and unsolvable.

Let me add these two rows in a TEMP table.

 

Now let’s see how can we convert these strings into columns. 

SELECT

(SELECT LEFT(

     Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(

 SELECT SUBSTRING(SUBSTRING(StringVal,16,7), PATINDEX('%[0-9]%', SUBSTRING(StringVal,16,7)), LEN(SUBSTRING(StringVal,16,7))) Val

)X)AccountNumber,

(SELECT LEFT(

      Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(

 SELECT SUBSTRING(SUBSTRING(StringVal,35,5), PATINDEX('%[0-9]%', SUBSTRING(StringVal,35,5)), LEN(SUBSTRING(StringVal,35,5))) Val

)X)Pin,

(SELECT LEFT(

      Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(

 SELECT SUBSTRING(SUBSTRING(StringVal,47,10), PATINDEX('%[0-9]%', SUBSTRING(StringVal,47,10)), LEN(SUBSTRING(StringVal,47,10))) Val

)X)Phone,

(SELECT LEFT(

        Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(

 SELECT SUBSTRING(SUBSTRING(StringVal,60,15), PATINDEX('%[0-9]%', SUBSTRING(StringVal,60,15)), LEN(SUBSTRING(StringVal,60,15))) Val

)X)CountryID,

(SELECT LEFT(

          Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(

 SELECT SUBSTRING(SUBSTRING(StringVal,78,20), PATINDEX('%[0-9]%', SUBSTRING(StringVal,78,20)), LEN(SUBSTRING(StringVal,78,20))) Val

)X)DateOfBirth,

(SELECT LEFT(

          Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(

 SELECT SUBSTRING(SUBSTRING(StringVal,105,20), PATINDEX('%[0-9]%', SUBSTRING(StringVal,105,20)), LEN(SUBSTRING(StringVal,105,20))) Val

)X)NumberOfCallsMade

FROM #Temp

Let’s see what substrings are returning.

SELECT SUBSTRING(StringVal,16,7) FROM #Temp

SELECT SUBSTRING(StringVal,35,5) FROM #Temp

SELECT SUBSTRING(StringVal,47,10) FROM #Temp

SELECT SUBSTRING(StringVal,60,15) FROM #Temp

SELECT SUBSTRING(StringVal,78,20) FROM #Temp

SELECT SUBSTRING(StringVal,105,20) FROM #Temp

 

Substrings are returning invalid or unnecessary characters apart from the desired output. Hence we used PATINDEX to fetch only numbers.

This conversion can be achieved using some user defined function as well by reading row by row and character by character. As I said earlier there will be many ways to solve the issues.



Logical vs Physical Computed Columns

Many people have asked me about the difference between using and not using ‘PERSISTED’ in a computed column; and how do we know whether the column values are being stored logically or physically. 

I would like to share the answer to these questions and some more information here.

Let me start the with the definitions first. 

  • A computed column is basically a virtual column that is not physically stored in the table.
  • A computed column which is marked as PERSISTED will have its data stored physically.

Let’s see the difference between computed column and persisted computed column:


CREATE TABLE dbo.Products (

ProductID int IDENTITY (1,1) NOT NULL ,

QtyAvailable smallint ,

UnitPrice money ,

InventoryValue AS QtyAvailable * UnitPrice );

 

If we expand the table in object explorer, we see the column marked as computed with the datatype assigned based on the data of the columns & that also defines the computed column.


If we verify the properties of the table, it shows the following information.


Also, if we look into the system tables, it appears as a computed column.


Now drop this table and recreate it mentioning the keyword “PERSISTED” at the end of the table definition.

 

DROP TABLE dbo.Products

Go 

CREATE TABLE dbo.Products (

ProductID int IDENTITY (1,1) NOT NULL ,

QtyAvailable smallint ,

UnitPrice money ,

InventoryValue AS QtyAvailable * UnitPrice  PERSISTED)

 

Now, look at the properties, values or attributes as shown in the Picture-1, Picture-2 and Picture-3 along with the values of “sys.columns” system table. You will notice that there is no difference except for “is_persisted” column value in “sys.computed_columns” system table.

And if we talk about identifying whether it’s storing the data physically or logically, please continue reading the article.

 

CREATE TABLE dbo.Products (

ProductID int IDENTITY (1,1) NOT NULL ,

QtyAvailable smallint ,

UnitPrice money ,

InventoryValue AS QtyAvailable * UnitPrice );

 

--INSERTING 1000 ROWS

DECLARE @QtyAvailable SMALLINT, @UnitPrice MONEY

SELECT @QtyAvailable = 1

SELECT @UnitPrice = 10

while @QtyAvailable >=1 and @QtyAvailable <= 1000

BEGIN

    INSERT INTO dbo.Products values(@QtyAvailable, @UnitPrice)

    SELECT @QtyAvailable = @QtyAvailable + 1

    SELECT @UnitPrice = @UnitPrice + 1

END

GO

 

sp_spaceused 'dbo.Products'

GO

 

If you see, the data space it consumed is 24 KB out of the reserved 72 KB. Unused space remains 40 KB after the index size is being deducted.

Let us check the same for computed column with PERSISTED.

CREATE TABLE dbo.Products (

ProductID int IDENTITY (1,1) NOT NULL ,

QtyAvailable smallint ,

UnitPrice money ,

InventoryValue AS QtyAvailable * UnitPrice  PERSISTED)

 

--INSERTING 1000 ROWS

DECLARE @QtyAvailable SMALLINT, @UnitPrice MONEY

SELECT @QtyAvailable = 1

SELECT @UnitPrice = 10

while @QtyAvailable >=1 and @QtyAvailable <= 1000

BEGIN

    INSERT INTO dbo.Products values(@QtyAvailable, @UnitPrice)

    SELECT @QtyAvailable = @QtyAvailable + 1

    SELECT @UnitPrice = @UnitPrice + 1

END

GO

sp_spaceused 'dbo.Products'

GO


After using “PERSISTED”, the computed column values are being stored physically in the disk as you can see the difference in the space consumption.

Hope you liked this article & found it informative. Please do not hesitate to ask me for any further clarification.

Disk Space Usage For Analysis

The following implementation helped me in analyzing the disk space usage. 

There are many ways to get the available disk space in the server and using SCOM we can get the alerts whenever the disk space reaches below the threshold levels. This implementation fetches the available space information and stores it in a table on a daily basis which helps in comparing the data with the previous days and also over a period of time to understand how much disk space is being utilized. 

Undoubtedly, when we have the data, we will have lot of scope in analyzing it. 

As the famous quote goes, “If you torture the data long enough, it will confess”. 😊

Our requirement is to fetch the information in the format below and using xp_fixeddrives, we get the following results:

 

Code:

USE TestDB1

GO

-- Create a table to store the information

-- Adding a default value to the Updated Date Time. It records when the execution took place.

-- Inserting values into the table

 -- Retrieving the stored information in a single row / readable format.

As you see, this is implemented without leaving the SQL Server Management Studio and without using any third-party tool.


Thursday, July 23, 2020

Adding Multiple Columns With Default Value

Almost a decade before, I came across a requirement where I needed to add multiple columns in a table with default value. It took a while on that time to figure it out.

I found it in my old SQL Projects repository and thought to share it with you.


-- Creating a table and adding a row

CREATE TABLE TestTable1(Id INT, SomeCol VARCHAR(10))

INSERT INTO TestTable1 SELECT 10, 'abcdefgh'

-- Adding Multiple Columns With Default Values

ALTER TABLE TestTable1

                        ADD Col1 VARCHAR(100) NOT NULL DEFAULT('SomeInformation'),

                                    Col2 INT NOT NULL DEFAULT(999)

GO

-- Retrieve Data from Table

SELECT * FROM TestTable1

GO



Tuesday, July 21, 2020

SQL Server - Replicate Function

SQL Server "Replicate" Function repeats a character expression to a specified number of times. It is equivalent to or alternative of Oracle's RPAD function.

Let's' check how this will be useful to the developers.

SELECT LEN(REPLICATE(2, '12'))

The above query will return 12 as a result due to 2 will be called 12 times (222222222222) hence the string length will be 12.

SELECT REPLICATE(2, 5)

The above statement will place 2 five times hence the result will be 22222.

SELECT REPLICATE('SQL', 5)

The above statement will place 'SQL' five times hence the result will be 'SQLSQLSQLSQLSQL'.

Let's' go through with more examples.

/**********/
CASE-1
/**********/
DECLARE @String AS NUMERIC(8,2)
SELECT @String = .2
SELECT @String

The above example returns 0.20 as a result. Assume that there is a requirement to have a 4 digit string (fixed length) like '00.20' but not just 0.20. Though logically this requirement is inappropriate but just think of the format.

/**********/
CASE-2
/**********/
DECLARE @String AS NUMERIC(8,2)
SELECT @String = .02
SELECT @String

The above statement returns 0.02 as a result. Similar to above case, requirement is to have 00.02.

Let's' now use the 'Replicate' function to get the required result for the above said two cases.

DECLARE @String AS NUMERIC(8,2)
SELECT @String = .2
SELECT CAST(REPLICATE(0, 1) AS VARCHAR(5)) + Cast(@String AS VARCHAR(5))

Or you can use the following instead
SELECT CAST(REPLICATE(0, Len(@String)-3) AS VARCHAR(5)) + Cast(@String AS VARCHAR(5))

The above query will return 00.20 as a result.

We have provided '.2' but SQL Server will store the number as 0.20 due to the datatype what we have provided. Hence the total string length will be 4 (along with the period).

SELECT CAST(REPLICATE(0, Len(@String)) AS VARCHAR(5)) + Cast(@String AS VARCHAR(5))

The result will be 00000.20, since we have requested to place the 0 for 4 times (lengh of the string is 4)

DECLARE @String AS NUMERIC(8,2)
SELECT @String = 08.2
SELECT Cast(Replicate(0,5-Len(@String)) AS VARCHAR(5)) + Cast(@String AS VARCHAR(5))

The result will be 08.20 for the above statement due to we had commanded that the length of the string will be 5 and out of five we have provided 4 characters to the variable @String.

Let's' write a stored procedure to avoid providing length parameter manually.

/******************** Procedure **********************/
CREATE PROCEDURE uspReplicate(@String AS NUMERIC(8,2))
AS
DECLARE
@length AS INT

BEGIN
SET @length = LEN(@String)

IF @length < 5
BEGIN
SELECT CAST(REPLICATE(0, 5-@length) AS VARCHAR(10)) + Cast(@String AS VARCHAR(10))
END
ELSE
SELECT @String
END
/************* End of the Procedure *****************/

EXEC uspReplicate 0.02
Result : 00.02

EXEC uspReplicate .02
Result : 00.02

EXEC uspReplicate .2
Result : 00.20

EXEC uspReplicate 2.02
Result : 02.02

EXEC uspReplicate 2.0
Result : 02.00

EXEC uspReplicate 2
Result : 02.00

The below example will let you know the real use of this function.
DECLARE @Account1 INT='6700'
DECLARE @Account2 INT='007800'
DECLARE @Account3 INT='108978'

SELECT @Account1, @Account2, @Account3

If you look at the values, the account number should be 6 digit value. But when you retrieve them, the returned values are 6700, 7800 and 108978. To bring all of them in proper 6 digit format, use the below code.

SELECT CONCAT(REPLICATE('0', 6-LEN(@Account1)),@Account1)
SELECT CONCAT(REPLICATE('0', 6-LEN(@Account2)),@Account2)
SELECT CONCAT(REPLICATE('0', 6-LEN(@Account3)),@Account3)

Hope this article helped you in understanding the usage of replicate function.

Alert on Scheduled SQL Jobs - Missed to enable

During the deployments, windows patching or any other activity, DBAs often disable the jobs to avoid data corruption and jobs failure. Once the activity is completed, DBAs need to re-enable them.

Since it is a manual intervention, it is possible that a job might be left behind from re-enabling. It happened in my case long time ago and since then I have been extra cautious not to repeat the same mistake again. 

I thought to create an alert mechanism that shall help me in identifying if anything is missed. 

Sharing the code below, hoping this will help you as well. 

Steps:

1) Fetch all the active jobs information in a table

2) Compare the information with the current status

3) Unmatched information should be emailed. 

Code: 



Big Data & SQL

Hi Everybody, Please do visit my new blog that has much more information about Big Data and SQL. The site covers big data and almost all the...