Conditional aggregation of data

Data aggregation represents a mathematical operation (summarization, counting, averaging) that we perform over a set of data. Conditional aggregation, as its name implies, is performing data aggregation over a set of data that meets certain condition which is contained within a given data range. In this “recipe” we’ll cover functions that use single condition for performing aggregation.

We will start with the frequently used aggregation function, which is the SUMIF function. Its syntax:

SUMIF (<range>, <criteria>,<sum range>)

00054-1The first argument of this function is the range of cells. The range is a set of cells in a column of a table, an entire column, or a named column. Second argument is the criterion to compare the cells of the previously defined range. This can be a value (text under the signs, number, date, logical value) or condition specified under the signs. The third argument is sum range: a range of cellst that we summarize in case that certain cell meets given criterion. In the sequence of examples that follow, we will explain how everything is possible to use this function.

For the table set in the picture we will try to summarize all those values of sales that were played in Belgrade. After entering the equality sign and selecting the SUMIF function, we enter the first argument B2: B7, which represents all the cells in which the names of cities are in the table. Then we enter a condition, which is the name of the city of Belgrade under the signs. Finally, we choose the column that contains the sales values.

=SUMIF(B2:B7,”BEOGRAD”,C2:C7)

After we have entered the function, we will get the sum of all data that satisfy the condition that the sale was made in Belgrade.

When we enter the text, we can also use “wildcard” characters “star” (*) and “question mark” (?). The asterisk changes more than one character in the text, and the question mark only one character. For example:

=SUMIF(B2:B7,”B*”,C2:C7)

calculates sum of all values for cities that begin with “B” letter. Or:

=SUMIF(B2:B7,”B?OGRAD”,C2:C7)

calculates sum of all values for cities which begin with “B”, then any other letter, and finishes with “OGRAD” string.

If we summarize data in comparison with some numerical value, we can state the condition under the quotation signs. For example, if we want to sum up all the values in column C that are greater than 100,000, we will use the formula:

=SUMIF(C2:C7,”>100000″,C2:C7)

As you can see, in some cases it is possible to specify two identical ranges as an argument. In the specific case, we restricted the testing of conditions and summing up to the range C2: C7. As an argument we could also quote the whole column C, in which case the formula would look like this:

=SUMIF(C:C,”>100000″,C:C)

In addition to the SUMIF function, the AVERAGEIF and COUNTIF functions are also used to perform data aggregation. AVERAGEIF has the same syntax as SUMIF, but it differs only in that it performs conditional averaging of a series of data. The COUNTIF function is used to count elements that meet a certain condition. Its syntax is:

COUNTIF (<range>, <criteria>)

For example, if we want to count all those cells in column B that have the value of “BEOGRAD” we would use the following formula:

=COUNTIF(B2:B7,”BEOGRAD”)

Average sales value in “BEOGRAD” is calculated by following formula:

=AVERAGEIF(B2:B7,”BEOGRAD”,C2:C7)

For each of these functions, the same rules apply for assigning arguments or formulating the conditions used to aggregate data.