Monday, September 7, 2020

SQL Server - Parsing, Casting and Conversion

Sometimes data is stored in a table in a format different from the data type it really belongs to: a date might be stored as an integer or a string, numeric values might be stored as character strings, or currency might be stored in a plain numeric format. The reasons vary with the design or the business requirements. Since we cannot expect such data to come back in the desired type, we need casting, parsing, and conversion functions.

This article focuses on the following functions -

1) CAST

2) CONVERT

3) PARSE

4) TRY_PARSE

5) TRY_CAST

6) TRY_CONVERT

CAST:

The CAST function converts an expression from one data type to another. It is part of the ANSI SQL specification.

Examples:

SELECT CAST(25.65 AS INT);
The query returns the value of 25.65 as 25 by converting the decimal value into an integer (the fractional part is truncated). 

SELECT CAST(25.65 AS VARCHAR);
The query returns the value 25.65 as 25.65, but the latter is a character string. 

SELECT CAST('2017-08-25' AS DATETIME);
The query returns the value of "2017-08-25" as "2017-08-25 00:00:00.000" by converting the character string into a DATETIME value.

Let's see one more example. 

USE AdventureWorks2017;
GO

SELECT Name,
             sellstartdate,
             CAST(sellstartdate AS VARCHAR) converteddate,
             ListPrice
FROM Production.Product
WHERE CAST(ListPrice AS int) LIKE '3%';
GO




CONVERT:

CONVERT is a SQL Server implementation-specific function. It does the same job the CAST function does and there isn't any difference in performance; however, CONVERT allows noticeably greater flexibility, through its style parameter, when converting among date and time values, fractional numbers, and monetary values.

Examples:

SELECT CONVERT (INT, 25.65)
The query returns "25" by converting the decimal value into an integer. 

SELECT CONVERT(DATETIME, '2017-08-25')
The query returns "2017-08-25 00:00:00.000" a proper DateTime format. 

SELECT CONVERT(VARCHAR(10), GETDATE(), 101)

The query returns "09/02/2020" by converting the DateTime value into a character string.


Let's see one more example. 

USE AdventureWorks2017;
GO

SELECT Name,
             sellstartdate,
             CONVERT(VARCHAR(10),sellstartdate,101) converteddate,
             ListPrice
FROM Production.Product
WHERE CONVERT(INT, ListPrice) LIKE '3%';
GO


Please click here for more information about the "CONVERT" function.


PARSE:

PARSE returns the result of an expression translated to the requested data type in SQL Server. It is not a replacement for the CAST or CONVERT functions even though it also converts from one data type to another; it is intended specifically for converting character strings to date/time and number types, and it relies on the presence of the .NET CLR.

Let's see this date: "Wednesday, 2 September 2020". There isn't any issue with this date or its format; it's human-readable, and we know that in some regions people do use such a date format. However, the built-in functions CAST and CONVERT cannot convert culture-specific strings like the ones used in different locales.

Let us test.

SELECT CAST('Wednesday, 2 September 2020' AS datetime)

SELECT CAST('Wednesday, 2 September 2020' AS datetime2)

SELECT CONVERT(DATETIME,'Wednesday, 2 September 2020')

SELECT CONVERT(DATETIME2,'Wednesday, 2 September 2020')

All the above statements return a 'conversion failed' error, since CAST and CONVERT cannot convert such culture-specific strings.

SELECT PARSE('Wednesday, 2 September 2020' AS datetime)

SELECT PARSE('Wednesday, 2 September 2020' AS datetime2)

SELECT PARSE('Wednesday, 2 September 2020' AS datetime2 USING 'en-US');

The above three statements succeed and return the converted datetime values.


Let's see one more example.


The test data comes from the EmpTest table (its definition is shown in the TRY_PARSE section below). EmployeeID would usually be an integer value; however, it was intentionally declared as VARCHAR, and the same applies to Salary, joindate, and updateddate.

In the SELECT statement, the +1 (plus one) at the end of the first PARSE expression checks whether the result is usable as a numeric value; if the outcome is not a number, the addition raises an error. 
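The original query is not shown here, so the following is a reconstructed sketch of the kind of SELECT described above (the column names come from the EmpTest table defined later in this post; the query itself is an assumption, not the original):

```sql
-- Reconstructed PARSE example against the EmpTest test table
SELECT PARSE(RIGHT(EmpID, 3) AS INT) + 1,      -- +1 verifies the result is numeric
       PARSE(Salary AS DECIMAL(15,2)),
       PARSE(joindate AS DATE),
       PARSE(updateddate AS DATETIME)
FROM EmpTest;
```

With the sample rows shown later, this statement fails on the row whose EmpID is an empty string and whose Salary is NULL, which is exactly the behavior discussed next.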

PARSE will not proceed further and throws an error when it encounters a value it cannot convert, such as a NULL or an empty string.


If PARSE is used in a transaction and any value fails to convert, the whole transaction is rolled back. This limitation led to the introduction of TRY_PARSE. 


TRY_PARSE:

The TRY_PARSE function is the same as PARSE; the only difference is that it returns a NULL value when the conversion fails, instead of cancelling the whole conversion process. 

USE dbTEST
GO

CREATE TABLE EmpTest(
                           EmpID VARCHAR(4),
                           Ename VARCHAR(50),
                           Salary varchar(5),
                           joindate VARCHAR(40),
                           updateddate varchar(20))

INSERT INTO EmpTest VALUES
('A123','Kate','21000','Tuesday, 1 September 2020','09/01/2020'),
('B199','Jamon','23000','Wednesday, 2 September 2020','09/02/2020'),
('','Sandra',NULL,'Tuesday, 1 September 2020','09/01/2020') 

SELECT TRY_PARSE(RIGHT(EmpID,3) AS INT)+1,
             TRY_PARSE(Salary AS DECIMAL(15,2)),
             TRY_PARSE(joindate AS date),
             TRY_PARSE(updateddate AS DATETIME)
FROM EmpTest



Similarly, there are two more functions, TRY_CONVERT and TRY_CAST. These also return a NULL value when the conversion fails, instead of cancelling the transaction and returning an error.


TRY_CONVERT:

SELECT TRY_CONVERT(INT, RIGHT(EmpID,3)),
             TRY_CONVERT(DECIMAL(15,2), Salary),
             TRY_CONVERT(DATETIME, updateddate)
FROM EmpTest


TRY_CAST:

SELECT TRY_CAST(RIGHT(EmpID,3) AS INT),
             TRY_CAST(Salary AS DECIMAL(15,2)),
             TRY_CAST(updateddate AS DATETIME)
FROM EmpTest





Clearing Screen - HDFS Terminal, Hive, Impala

Sometimes we may need to clear the screen or console window of commands or any output generated from the previous task. 

The following commands are used to clear the contents of the screen in Hive, Impala, and HDFS terminal windows. If you have already worked in MS-DOS, you have surely looked for such a command in other terminals.

By the way, CLS is the command in DOS.


HDFS:

clear is the command to clean up the contents of the screen in an HDFS terminal window.



Impala:

!clear; is the command to clean-up the contents of the screen in Impala.


Hive:

!clear; is the command to clean-up the contents of the screen in Hive.



Sqoop Complete Tutorial Part-5

This is a continuation of the "Sqoop Complete Tutorial" series.


14) List the tables of a database without importing the data. 

sqoop list-tables
--connect jdbc:mysql://localhost/empdept
--username root
--password cloudera


The above command lists the tables in the given database.

15) Get the list of the databases using Sqoop

sqoop list-databases
--connect jdbc:mysql://localhost/
--username root
--password cloudera



With the above command, we have completed learning how to import data from MySQL to HDFS using Sqoop. The upcoming articles cover how to import data from MySQL into Hive.

Please click here for the next part


SQL Server Error Log - Search

The SQL Server error log contains user-defined events and certain system events you can use for troubleshooting. Both errors and informational messages are stored in the error logs. The log can be huge, and sometimes it is difficult to find a specific piece of information or error in it. 

Below is the query to retrieve the complete log from SSMS.  

EXEC xp_ReadErrorLog
GO

If you want to search through this log, you can use the queries below.

To know the SQL Server version, edition, and other information -
EXEC xp_ReadErrorLog 0, 1, N'Build', N'Edition', NULL, NULL, 'DESC'
GO

To know on which port SQL Server is listening on -
EXEC xp_ReadErrorLog 0, 1, N'Server is listening on', N'any', NULL, NULL, 'DESC'
GO

To know the CLR version loaded in the SQL Server -
EXEC xp_ReadErrorLog 0, 1, N'CLR version', N'loaded', NULL, NULL, 'DESC'
GO

To know the authentication mode -
EXEC xp_ReadErrorLog 0, 1, N'Authentication Mode', NULL, NULL, NULL, 'DESC'
GO

To know the number of sockets and processors -
EXEC xp_ReadErrorLog 0, 1, N'sockets', N'processors', NULL, NULL, 'DESC'
GO



If you observe the syntax, I've provided two keywords to filter and fetch the required information; xp_ReadErrorLog applies both search strings together, and the remaining parameters can bound the search by start and end time and set the sort order.

The same can be applied to find errors.
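For example, a sketch of searching for error entries (the keyword 'error' is illustrative; substitute whatever message text you are hunting for):

```sql
-- Search the current error log (0) of the SQL Server instance (log type 1)
-- for entries containing the word 'error', newest first
EXEC xp_ReadErrorLog 0, 1, N'error', NULL, NULL, NULL, N'DESC';
GO
```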


Saturday, September 5, 2020

Find a specific column from the database objects

There are many ways to find out a specific column from the database objects like Tables and stored procedures.

1) Red Gate's SQL Search:

This tool will be integrated into SQL Server Management Studio once installed. This will help in searching across multiple object types and multiple databases.


2) Using a query:
The following query will find the tables and stored procedures that contain a certain field.  

SELECT sysobjects.name FROM syscolumns
LEFT JOIN sysobjects ON sysobjects.id = syscolumns.id
WHERE syscolumns.name like '%EmployeeID%'
ORDER BY 1

3) Using a query:
The following query will find the tables (only) that contain a certain field.

SELECT * FROM information_schema.columns
WHERE
column_name LIKE '%EmployeeID%'
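A further option, not from the original list but sketched here using the modern catalog views: search object definitions directly, which also catches columns referenced inside stored procedures, views, and functions.

```sql
-- Find objects whose definition mentions the column name anywhere
SELECT OBJECT_NAME(object_id) AS object_name
FROM sys.sql_modules
WHERE definition LIKE '%EmployeeID%';
```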



Sunday, August 30, 2020

Status of SQL Server Agent

SQL Server Agent is a Microsoft Windows service that executes scheduled jobs and DBAs need to ensure the Agent is running. There are several methods to know the status of the SQL Server Agent.

1) Check if the SQL Server Agent is running in Services.

    Control panel --> Windows Administrative Tasks --> Services



2) In SSMS (SQL Server Management Studio), go to Object Explorer, scroll down to find "SQL Server Agent". Right-click and check the properties.


3) By executing the following query

SELECT [servicename], [status_desc]
FROM   sys.dm_server_services
WHERE  [servicename] LIKE N'%SQL Server Agent%';


4) Alternatively, query the "sysprocesses" system table to see whether any SQL Server Agent processes are running.

SELECT * FROM sysprocesses
WHERE program_name LIKE N'SQLAgent%'


5) Another query to find out the SQL Server Agent's status (note that xp_servicecontrol is an undocumented procedure)

EXEC master.dbo.xp_servicecontrol 'QueryState', 'SQLServerAgent';
GO


6) Open the Microsoft SQL Server Configuration Manager from Configuration Tools and see the status of SQL Agent service. 

 

Please do let me know if you find any other method.


Server Principals Login Create and Modification Date

The certificate-based logins that start and end with double hash marks are stated below -

##MS_SQLResourceSigningCertificate##
##MS_SQLReplicationSigningCertificate##
##MS_SQLAuthenticatorCertificate##
##MS_PolicySigningCertificate##
##MS_SmoExtendedSigningCertificate##
##MS_AgentSigningCertificate##

By default, they are Windows-based logins and are enabled. If you want to look into both Windows- and SQL-based logins, query sys.server_principals as shown below.

USE master
GO
SELECT * FROM sys.server_principals
WHERE  name LIKE '%##%'


If you look into sys.server_principals for the certificate-based logins, you might be confused by the "create date" and "modify date" of those logins. The logins were created when SQL Server was installed; however, the dates differ from the installation date.

These logins will be dropped and recreated during the 'script level upgrade' which usually takes place after cumulative updates/patches or an unexpected restart of the active SQL node.

There won't be any change in the dates (since no logins will be recreated) when the services are restarted or routine failover takes place.


If you need more information on Server Principals, please click here.




Certificate Based Server Logins - SQL Server

The SQL and Windows login information is available in sys.server_principals, along with the logins that were created from certificates when SQL Server was installed. However, these certificate-based logins are for internal system use only and cannot be used to log in to the system.

As per the documentation, these principal accounts do not have passwords that can be changed by administrators, as they are based on certificates issued to Microsoft.

Use the below query to identify those logins.

SELECT * FROM sys.server_principals
WHERE  name LIKE '%##%'
              AND is_disabled = 0


If you want to know all the logins that have double hash marks (##), use the below query.
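The original query is not shown here; a minimal sketch that lists every server principal whose name contains double hash marks, regardless of whether it is disabled:

```sql
USE master
GO

-- All principals with ## in the name, enabled or disabled
SELECT name, type_desc, is_disabled
FROM sys.server_principals
WHERE name LIKE '%##%';
```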


As you can see, there are eight principals created by SQL Server, covering both SQL and Windows logins, and based on the configuration the SQL-based logins were disabled. As I stated earlier, these logins cannot be used to log in to the SQL Server instance since they do not have access. Please check below -

USE master
GO

SELECT a.name, a.is_disabled, b.hasaccess FROM sys.server_principals a
JOIN sys.syslogins b ON a.name = b.name
WHERE a.name LIKE '%##%'

The login ##MS_AgentSigningCertificate## is not disabled and has access to the system; however, one cannot impersonate a login mapped to a certificate or asymmetric key. 

These certificates are either used as permission containers or used to encrypt data. If you want to see what permissions they have, use the below query -

SELECT a.name principal_name, b.class_desc, b.permission_name
FROM sys.server_permissions b
JOIN sys.server_principals a ON b.grantee_principal_id = a.principal_id
WHERE name LIKE '%Certificate%'
ORDER BY a.name

Please note that these are created by the system and should not be deleted.

Have you ever observed the creation date and modify date for these certificate-based logins?

Click here to know more.



Saturday, August 29, 2020

Difference between Local File System vs HDFS

In an operating system, the file system is the strategy used to keep track of files on a disk. It has its own method of organizing the files on the disk or partition. 

HDFS is deployed on top of the existing operating system, bringing its own file system method. This way, the system has two different file systems at the same time, which we call the "local file system" and HDFS.

There is a difference between the Local File System and the Hadoop Distributed File System (HDFS) and the difference is mainly because of the block size. 

The block size is typically 4 KB in both Windows and Unix local file systems, whereas the HDFS block size was 64 MB in the initial version and 128 MB in later versions, and it is configurable. This impacts disk seeks: for a large file, the local file system performs many seeks due to its 4 KB block size, while HDFS's much larger blocks let data be read sequentially after each individual seek. 
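A quick back-of-the-envelope comparison (illustrative sizes only, using the 4 KB and 128 MB figures above) shows how many blocks a 1 GiB file occupies under each block size:

```shell
FILE_BYTES=$((1024 * 1024 * 1024))   # 1 GiB file
LOCAL_BLOCK=$((4 * 1024))            # 4 KB local file system block
HDFS_BLOCK=$((128 * 1024 * 1024))    # 128 MB HDFS block

# Blocks needed = file size / block size
echo "local blocks: $((FILE_BYTES / LOCAL_BLOCK))"
echo "hdfs blocks:  $((FILE_BYTES / HDFS_BLOCK))"
```

The local file system needs hundreds of thousands of blocks where HDFS needs only a handful, which is why HDFS reads large files with far fewer seeks.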

In HDFS, large files are automatically split into multiple chunks, distributed, and stored across various slave machines (aka nodes). In the local file system, files are saved as they are.

The machine with the local file system is physically a single unit. HDFS is logically a single unit.


These are the differences I found over a period of time. If you find any more, please do share with me. 
