Aggregate functions
If you are already familiar with aggregate functions, then you can skip straight to the syntax and semantics section or the section that lists all of the YSQL aggregate functions and that links, in turn, to the definitive account of each function.
This page has only the Synopsis section and the section Organization of the aggregate functions documentation section.
Synopsis
Aggregate functions operate on a set of values and return a single value that reflects a property of the set. The functions count()
and avg()
are very familiar examples.
In the limit, the values in the set that the aggregate function operates on are taken from the whole of the result set that the FROM
list defines, subject to whatever restriction the subquery's WHERE
clause might define. Very commonly, the set in question is split into subsets according to what the GROUP BY
clause specifies.
Very many aggregate functions may be invoked, not only using the ordinary syntax where GROUP BY
might be used, but also as window functions.
Notice these differences and similarities between aggregate functions and window functions:
- A window function produces, in general, a different output value for each different input row in the window.
- When an aggregate function is invoked using the regular
GROUP BY
clause, it produces a single value for each entire subset that theGROUP BY
clause defines. - When an aggregate function is invoked in the same way as a window function, it might, or might not, produce the same value for each different input row in the window. The exact behavior depends on what the frame clause specifies.
- All of the thirty-seven aggregate functions are listed in the four tables in the section Signature and purpose of each aggregate function.
Organization of the aggregate functions documentation
The remaining pages are organized as follows:
Informal overview of function invocation using the GROUP BY clause
Here. Skip this section entirely if you are already familiar with aggregate functions. It presents code examples that classify the aggregate functions into three kinds according to how they may be invoked:
This section focuses on the effect that each illustrated function has. It leaves formal definitions to the invocation syntax and semantics section and the Signature and purpose of each aggregate function section.
Aggregate function invocation—SQL syntax and semantics
Here. This section presents the formal treatment of the syntax and semantics of how an aggregate function is invoked as a special kind of SELECT
list item—with the invocation syntax optionally decorated with an ORDER BY
clause, or a FILTER
clause. This account also explains the use of the HAVING
clause which lets you restrict a result set according the value(s) returned by a list of aggregate functions.
There are four variants of the GROUP BY
invocation style: GROUP BY <column list>
; GROUP BY GROUPING SETS
; GROUP BY ROLLUP
; and GROUP BY CUBE
. Further, all but the bare GROUP BY <column list>
allow the use of a GROUPING
keyword in the SELECT
list to label the different GROUPING SETS
. Because all of this requires a fairly lengthy explanation, this is covered in the dedicated section Using the GROUPING SETS, ROLLUP, and CUBE syntax for aggregate function invocation
.
Signature and purpose of each aggregate function
Here. The following list groups the thirty-seven aggregate functions in the same way that the sidebar items group them. The rationale for the grouping is explained in the referenced sections.
array_agg()
string_agg()
jsonb_agg()
jsonb_object_agg()
bit_and()
bit_or()
bool_and()
bool_or()
variance()
var_pop()
var_samp()
stddev()
stddev_pop()
stddev_samp()
covar_pop()
covar_samp()
corr()
regr_avgy()
regr_avgx()
regr_count()
regr_slope()
regr_intercept()
regr_r2()
regr_syy()
regr_sxx()
regr_sxy()
mode()
percentile_disc()
percentile_cont()
rank()
dense_rank()
percent_rank()
cume_dist()
Aggregate functions case study—the "68–95–99.7" rule
Here. Regard this section as an optional extra. It shows the use of aggregate functions to demonstrate the so-called "68–95–99.7 rule"—described in this Wikipedia article. This case-study focuses on just one part of the rule:
68.27% of the values in a normal distribution lie within one standard deviation each side of the mean.