The below is to understand how we can handle JSON data in SQL Server 2016 version on-wards.
Sample
JSON data is:
{"accountNumber": 2020112,
"pin": 2525},
{"accountNumber": 2567899,
"pin": 1462}
{"accountNumber": 6789925,
"pin": 2614}
{"accountNumber": 9925678, "pin": 6142}
This can be extracted into columns easily in SQL Server with the help of following code:
However, what if the data is not in the exact JSON format i.e. no curly braces, no colon (:) to indicate name-value pairs and no square bracket to hold arrays and values separated by comma (,), in addition to this if it is to be worked on prior to SQL Server 2016 versions?
Sample
data:
(accountNumber=2020112)(accountPin=2525)(Phone=+12345678)(countryId=121)
(DateOfBirth=19810726)(NumberOfCallsMade=381)
(accountNumber=202019)(accountPin=98291)(Phone=)(countryId=1881) (DateOfBirth=19860526)(NumberOfCallsMade=31)
If you look at the data, there are two rows and are not in same lengthy strings since the values lengths are different. No value provided for Phone attribute in the second row.
The reason behind highlighting these points is, recently I got to work with this data and SUBSTRING function in SQL Server alone is not much a help. I managed to create the report at the end by using PATINDEX along with SUBSTRING functions.
This is to let you know that, within the available resources we can sort out the things even if it looked complicated and unsolvable.
Let me add these two rows in a TEMP table.
Now let’s see how can we convert these strings into columns.
SELECT
(SELECT LEFT(
Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(
SELECT SUBSTRING(SUBSTRING(StringVal,16,7), PATINDEX('%[0-9]%', SUBSTRING(StringVal,16,7)), LEN(SUBSTRING(StringVal,16,7))) Val
)X)AccountNumber,
(SELECT LEFT(
Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(
SELECT SUBSTRING(SUBSTRING(StringVal,35,5), PATINDEX('%[0-9]%', SUBSTRING(StringVal,35,5)), LEN(SUBSTRING(StringVal,35,5))) Val
)X)Pin,
(SELECT LEFT(
Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(
SELECT SUBSTRING(SUBSTRING(StringVal,47,10), PATINDEX('%[0-9]%', SUBSTRING(StringVal,47,10)), LEN(SUBSTRING(StringVal,47,10))) Val
)X)Phone,
(SELECT LEFT(
Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(
SELECT SUBSTRING(SUBSTRING(StringVal,60,15), PATINDEX('%[0-9]%', SUBSTRING(StringVal,60,15)), LEN(SUBSTRING(StringVal,60,15))) Val
)X)CountryID,
(SELECT LEFT(
Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(
SELECT SUBSTRING(SUBSTRING(StringVal,78,20), PATINDEX('%[0-9]%', SUBSTRING(StringVal,78,20)), LEN(SUBSTRING(StringVal,78,20))) Val
)X)DateOfBirth,
(SELECT LEFT(
Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(
SELECT SUBSTRING(SUBSTRING(StringVal,105,20), PATINDEX('%[0-9]%', SUBSTRING(StringVal,105,20)), LEN(SUBSTRING(StringVal,105,20))) Val
)X)NumberOfCallsMade
FROM #Temp
Let’s see what substrings are returning.
SELECT SUBSTRING(StringVal,16,7) FROM #Temp
SELECT SUBSTRING(StringVal,35,5) FROM #Temp
SELECT SUBSTRING(StringVal,47,10) FROM #Temp
SELECT SUBSTRING(StringVal,60,15) FROM #Temp
SELECT SUBSTRING(StringVal,78,20) FROM #Temp
SELECT SUBSTRING(StringVal,105,20) FROM #Temp
Substrings are returning invalid or unnecessary characters apart from the desired output. Hence we used PATINDEX to fetch only numbers.
This conversion can be achieved using some user defined function as well by reading row by row and character by character. As I said earlier there will be many ways to solve the issues.