Group Functions using SQL

Group functions produce results based on a group or set of rows. For example, with group functions you can get counts, sums/totals, averages, maximums, minimums, etc. If you include the distinct clause then you will evaluate only non-duplicate values while if you include ALL (the default) you will evaluate all values. The data types may be VARCHAR2, CHAR, NUMBER or DATE for the functions COUNT, MAX and MIN. Null values will be ignored by all group functions except COUNT(*). The NVL function can be used to overcome this.

FUNCTION list:

COUNT - counts the group
SUM - totals the group
AVG - averages the group
MAX - finds maximum in the group
MIN - find minimum in the group
STDDEV - finds the standard deviation
VARIANCE - finds the variance

The only group function that deals with null values is COUNT(*). Other functions can be forced to deal with null values by using the NVL function.

SQL CODE:

SQL> SELECT COUNT(*) FROM donor;

 COUNT(*)
---------
        6
SQL> SELECT COUNT(NVL(yrgoal,0)), AVG(NVL(yrgoal,0)), SUM(NVL(yrgoal,0))
  2  FROM donor;

COUNT(NVL(YRGOAL,0)) AVG(NVL(YRGOAL,0)) SUM(NVL(YRGOAL,0))
-------------------- ------------------ ------------------
                   6          183.33333               1100

SQL>

The rest of these examples will ignore null values if the column being tested contains a null value.

SQL CODE:

SQL> SELECT COUNT(yrgoal), AVG(yrgoal), SUM(yrgoal)
  2  FROM donor;

COUNT(YRGOAL) AVG(YRGOAL) SUM(YRGOAL)
------------- ----------- -----------
            5         220        1100
SQL> SELECT MIN(yrgoal), MAX(yrgoal)
  2  FROM donor;

MIN(YRGOAL) MAX(YRGOAL)
----------- -----------
	50 	500

SQL> SELECT COUNT(datefst), MAX(datefst), MIN(datefst)
  2  FROM donor;

COUNT(DATEFST) MAX(DATEF MIN(DATEF
-------------- --------- ---------
             6 03-JUL-98 04-MAR-92

SQL> SELECT COUNT(DISTINCT(yrgoal)) FROM donor;

COUNT(DISTINCT(YRGOAL))
-----------------------
                      4

SQL> SELECT SUM(DISTINCT(yrgoal))
  2  FROM donor;

SUM(DISTINCT(YRGOAL))
---------------------
                 1050

SQL> SELECT STDDEV(yrgoal), VARIANCE(yrgoal)
  2  FROM donor;

STDDEV(YRGOAL) VARIANCE(YRGOAL)
-------------- ----------------
	213.8925 	45750

Instead of treating the whole table as the group, the GROUP BY clause can be used to make groups based on particular criteria. In the first example below, I am printing the state and a count of the number of cities in each state. The select line has state and count(city) and the GROUP BY clause has state (note that state did not have to be included in the select by, only in the group by). SQL will produce one display line for each state or group. Including the GROUP BY function means that you will only get grouped results, you will not see individual groups. A WHERE clause can be used to exclude things from the GROUPS as shown in the second example below. One other important thing to note, you must use the column not the column alias in the group by clause. One other note, the group by clause causes sorting to be in ascending order by the column in the group by clause, to override this, you can use the order by clause. This is shown in the third example which is ordered by state desc.

SQL CODE:

SQL> SELECT state, COUNT(city)
  2  FROM donor
  3  GROUP BY state;

ST COUNT(CITY)
-- -----------
MA           4
RI           2

SQL>

SQL> SELECT state, COUNT(city)
  2  FROM donor
  3  WHERE city !='Seekonk'
  4  GROUP BY state;

ST COUNT(CITY)
-- -----------
MA           3
RI           2

SQL> SELECT state, COUNT(city)
  2  FROM donor
  3  GROUP BY state
  4  ORDER BY state desc;

ST COUNT(CITY)
-- -----------
RI           2
MA           4

SQL>

You can also use a function in the order by, let's say I wanted to order by the count of the cities in descending order as opposed to the state name.

SQL CODE:

SQL> SELECT state, COUNT(city)
  2  FROM donor
  3  GROUP by state
  4  ORDER BY COUNT(city) desc;

ST COUNT(CITY)
-- -----------
MA           4
RI           2

You can also group within a group. In this example, I grouped by state and then by contact within state. This means that first the rows will be grouped by state and then on contact within state. The SUM of yrgoals is for each contact within each state. Note again that with the GROUP BY I can only display grouped information so the only things in the select can be the columns I group by and group functions. Again, I could have eliminated a particular contact, as shown in the second example below, by the use of the WHERE clause. In the example, I eliminated contact John Adams from the results.

SQL CODE:

SQL> SELECT state, contact, SUM(yrgoal)
  2  FROM donor
  3  GROUP BY state, contact;

ST CONTACT      SUM(YRGOAL)
-- ------------ -----------
MA Amy Costa            150
MA John Adams            50
MA John Smith           500
RI Susan Jones          400

SQL> SELECT state, contact, SUM(yrgoal)
  2  FROM donor
  3  WHERE contact  !='John Adams'
  4  GROUP BY state, contact;

ST CONTACT      SUM(YRGOAL)
-- ------------ -----------
MA Amy Costa            150
MA John Smith           500
RI Susan Jones          400

A tricky part of grouping is when you can use the WHERE clause and when you must use the HAVING clause. My understanding is that if you restrict prior to forming the groups the WHERE clause can be used. In the example above I eliminated contact John Adams. However if you restrict the actual groups such as in the example below where I was testing to see if SUM(yrgoal) was > 100 then you need to use the HAVING clause. The third example below, brings the WHERE and the HAVING together with the WHERE eliminating JOHN ADAMS and the HAVING eliminating groups that do not have a SUM(yrgoal) > 100.

SQL CODE:

SQL> SELECT state, contact, SUM(yrgoal)
  2  FROM donor
  3  WHERE SUM(yrgoal) > 100
  4  GROUP by state, contact;
WHERE SUM(yrgoal) > 100
      *
ERROR at line 3:
ORA-00934: group function is not allowed here

SQL> SELECT state, contact, SUM(yrgoal)
  2  FROM donor
  3  GROUP BY state, contact
  4  HAVING SUM(yrgoal) > 100;

ST CONTACT      SUM(YRGOAL)
-- ------------ -----------
MA Amy Costa            150
MA John Smith           500
RI Susan Jones          400

SQL> SELECT state, contact, SUM(yrgoal)
  2  FROM donor
  3  WHERE contact != 'John Adams'
  4  GROUP BY state, contact
  5  HAVING SUM(yrgoal) > 100
  6  ORDER by SUM(yrgoal);

ST CONTACT      SUM(YRGOAL)
-- ------------ -----------
MA Amy Costa            150
RI Susan Jones          400
MA John Smith           500

SQL>

The two examples below show the minimum or the average yrgoals by state (example 1) and by contact within state (example 2).

SQL CODE:

SQL> SELECT MIN(AVG(yrgoal))
  2  FROM donor
  3  GROUP BY state;

MIN(AVG(YRGOAL))
----------------
             175

SQL> SELECT MIN(AVG(yrgoal))
  2  FROM donor
  3  GROUP BY state, contact;

MIN(AVG(YRGOAL))
----------------
              50

SQL>