Showing posts with label Windowing Functions. Show all posts
Showing posts with label Windowing Functions. Show all posts

Sunday, August 23, 2020

Aggregate Functions in Analytic Context

The functions SUM, AVG, COUNT, MIN, and MAX are well-known aggregate functions that we use every day. They are to compute/summarize the multiple rows by grouping them and provide a single summary value. 

However, when it comes to analysis, the aggregate functions carry out twofold responsibility: The same aggregate function which computes on each row can also help in computing based on the range of rows or partitions.


Considering the "EMP" data, the following aggregations/analysis can be done based on Salary.

1) Identify the total salary costing to the company.

2) Identify the total salary budget for each department.

3) Identify the total salary budget for each role in each department.

This means, in the sales department, how much salary is being paid to Salesmen and to other teams within the department.

Point (1) can be achieved with a simple aggregation however the other two are not limited to one result value and operate on a certain range or partition or window where the input rows are ordered and grouped using flexible conditions like data window/range of rows expressed through an OVER() clause. The range or partition in the above scenario is "department" for the 2nd point and "Job" for the 3rd one.

Here is the data for an example.


Aggregate functions with OVER clause:

SELECT DISTINCT [JOB]
      ,a.[DEPTNO],
  b.[DNAME],
  b.LOC,
  SUM(SAL) OVER(PARTITION BY a.DeptNo) DepartmentwiseSalaryCTC,
  SUM(SAL) OVER(PARTITION BY a.Job) JobwiseSalaryCTC
  FROM [TestDB1].[dbo].[EMP] a
  JOIN [TestDB1].[dbo].[DEPT] b ON a.DeptNo = b.DeptNo


If you look at the summary, employees salary cost to company based on department and job is retrieved. This way, we can use all the aggregate functions with the OVER clause to analyze the data in the most possible ways.

SELECT DISTINCT [JOB], a.[DEPTNO], b.[DNAME],
  SUM(SAL) OVER(PARTITION BY a.DeptNo) DepartmentSalaryCTC,
  SUM(SAL) OVER(PARTITION BY a.Job) JobSalaryCTC,
  AVG(SAL) OVER(PARTITION BY a.DeptNo) AVGSalaryPerDept,
  AVG(SAL) OVER(PARTITION BY a.Job) AVGSalaryPerRole,
  MAX(SAL) OVER(PARTITION BY a.DeptNo) MaxSalaryInEachDept,
  MAX(SAL) OVER(PARTITION BY a.Job) MaxSalaryInEachRole,
  MIN(SAL) OVER(PARTITION BY a.DeptNo) MinSalaryInEachDept,
  MIN(SAL) OVER(PARTITION BY a.Job) MinSalaryInEachRole
  FROM [TestDB1].[dbo].[EMP] a
  JOIN [TestDB1].[dbo].[DEPT] b ON a.DeptNo = b.DeptNo


Please note that the syntax or the way of analysis is the same in any RDBMS and Big Data technologies like Hive and Impala.


Sunday, August 16, 2020

Computing Running & Cumulative Average

This article helps you in understanding how to calculate running average and cumulative average on a given data-set.

 

As discussed earlier, windowing functions are the same in SQL Server, Oracle & Hive and perform computations across a set of table rows that are related to the current row either by the department, area, category, or date/time.

 

Please observe the sample data-set:


Monthly summarized sales amount by location is specified from which we are about to calculate running average as well as cumulative average.

Running Average / Moving Average / Rolling Average:

The reason to compute a rolling average is to streamline the highs and lows of the dataset and figure out the patterns or trends in the information.

Cumulative Average:

Cumulative or Span of Time is the most common way time is used in a measure. Traditionally Cumulative measures sum data across a span of time.


Both running and cumulative average can be accomplished in a single SELECT statement.

SELECT DISTINCT Location,YYYYMM, Amount,

AVG(Amount) OVER(ORDER BY Location ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) 

    AS 'MovingAverage',

AVG(Amount) OVER(ORDER BY Location ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 

    AS 'CumulativeAverage'

FROM SalesData

GO


Do let me know if you need more clarification.


Big Data & SQL

Hi Everybody, Please do visit my new blog that has much more information about Big Data and SQL. The site covers big data and almost all the...