
The pandas aggregate function applies one or more operations along the specified axis.

The DataFrame.aggregate() method (also available under the shorter alias DataFrame.agg()) applies one or more aggregations across one or more columns. The aggregation can be given as a callable, a string, a dict, or a list of strings/callables. The most frequently used aggregations are:

sum: Return the sum of the values for the requested axis
min: Return the minimum of the values for the requested axis
max: Return the maximum of the values for the requested axis

Compared with calling methods such as sum() one at a time, aggregate() is more flexible: it accepts a string, a function, or a list of these, and computes all the requested aggregates in a single call.
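A minimal sketch of the accepted forms of func, assuming the same sample DataFrame as Example 1 (recreated here so the snippet runs on its own). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as Example 1, including some missing values
df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

# 1. A string naming an aggregation -> one value per column (a Series)
by_name = df.agg('max')

# 2. A callable -> called once with each column as a Series
by_callable = df.agg(lambda s: s.max() - s.min())

# 3. A list -> one row per aggregation (a DataFrame)
by_list = df.agg(['min', 'max'])

# 4. A dict -> a different aggregation for each selected column
by_dict = df.agg({'P': 'min', 'R': 'max'})

print(by_name, by_callable, by_list, by_dict, sep='\n\n')
```

A single string or callable gives a Series indexed by column, while a list gives a DataFrame with one row per aggregation name.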

Syntax

DataFrame.agg(func=None, axis=0, *args, **kwargs)

  • func : function, str, list or dict – The aggregation(s) to apply: a callable, a method name such as 'sum', a list of these, or a dict mapping column labels to any of these.
  • axis : {0 or ‘index’, 1 or ‘columns’}, default 0 – 0 or ‘index’ applies func to each column; 1 or ‘columns’ applies it to each row.
  • *args : Positional arguments to pass to func.
  • **kwargs : Keyword arguments to pass to func.
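The sketch below illustrates the axis parameter and how extra arguments are forwarded. The helper spread() and its offset parameter are hypothetical, made up for illustration; the point is only that a keyword argument given to agg() reaches the callable together with each column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

# Hypothetical helper: range of the non-missing values, shifted by `offset`
def spread(s, offset=0):
    return s.max() - s.min() + offset

# axis=0 (the default): spread() is called once per column, with offset=100
per_column = df.agg(spread, offset=100)

# axis=1 and axis='columns' are two spellings of the same thing:
# the aggregation is computed once per row
row_sums_1 = df.agg('sum', axis=1)
row_sums_columns = df.agg('sum', axis='columns')

print(per_column)
print(row_sums_1.equals(row_sums_columns))
```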

Example 1: Using pandas aggregate functions over rows

Here a DataFrame is created first, and then different operations are applied to it with the pandas aggregate function.

Input:

import numpy as np
import pandas as pd

# Sample data with some missing values (NaN)
df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

Input:

df

Output:

      P     Q   R
0  15.0  22.0  37
1  49.0   NaN  64
2   NaN  89.0  99
3  53.0   NaN  71

Here the sum and the minimum of each column are calculated with the pandas agg() function. Missing values (NaN) are skipped, so, for example, the sum of column P is 15 + 49 + 53 = 117.

Input:

df.agg(['sum', 'min'])

Output:

         P      Q    R
sum  117.0  111.0  271
min   15.0   22.0   37
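To see what agg(['sum', 'min']) computes, the sketch below compares each row of the result with the equivalent DataFrame method. It also shows that NaN handling can be changed through a keyword argument: skipna is a parameter of DataFrame.sum(), and agg() passes it through to the named method.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

summary = df.agg(['sum', 'min'])

# Each row of the result holds the same numbers as the matching
# DataFrame method; NaN cells are skipped (skipna=True by default)
same_sum = summary.loc['sum'].tolist() == df.sum().tolist()
same_min = summary.loc['min'].tolist() == df.min().tolist()

# Forwarding skipna=False makes NaN propagate instead of being skipped
strict_sum = df.agg('sum', skipna=False)

print(same_sum, same_min)
print(strict_sum)
```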

Example 2: Using different agg() functions on each column

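In this example, a dict maps each column to its own list of aggregations. A cell is NaN where that aggregation was not requested for that column (for example, max for P). The row order of the result may differ between pandas versions.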

Input:

df

Output:

      P     Q   R
0  15.0  22.0  37
1  49.0   NaN  64
2   NaN  89.0  99
3  53.0   NaN  71

Input:

df.agg({'P' : ['sum', 'min'], 'Q' : ['min', 'max']})

Output:

         P     Q
max    NaN  89.0
min   15.0  22.0
sum  117.0   NaN
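The row order of a dict aggregation has not been the same in every pandas release (the output above is sorted alphabetically). One option, sketched here, is to fix the order explicitly with reindex() so the result looks the same regardless of version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

result = df.agg({'P': ['sum', 'min'], 'Q': ['min', 'max']})

# Put the rows in a fixed, explicit order
result = result.reindex(['sum', 'min', 'max'])

# NaN marks combinations that were not requested (e.g. 'max' for P)
print(result)
```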

Example 3: Aggregating over columns

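Here the aggregation is applied across the columns of each row by passing axis="columns" (axis=1 is equivalent), so the result has one value per row. Missing values are skipped, so the means of rows 1, 2 and 3 are each taken over two values.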

Input:

df

Output:

      P     Q   R
0  15.0  22.0  37
1  49.0   NaN  64
2   NaN  89.0  99
3  53.0   NaN  71

Input:

df.agg("mean", axis="columns")

Output:

0    24.666667
1    56.500000
2    94.000000
3    62.000000
dtype: float64
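A short check of the row-wise mean on the same DataFrame: axis="columns" and axis=1 give the same result, the values match df.mean(axis=1), and the NaN in row 1 is simply left out of the average.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

row_means = df.agg('mean', axis='columns')

# axis=1 is an equivalent spelling, and the values match df.mean(axis=1)
same_as_axis_1 = row_means.equals(df.agg('mean', axis=1))
same_as_method = np.allclose(row_means, df.mean(axis=1))

# Row 1 is [49, NaN, 64]: the NaN is ignored, so only two values are averaged
row_1_check = (49 + 64) / 2

print(row_means)
print(same_as_axis_1, same_as_method, row_1_check)
```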

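Example 4: Aggregating grouped data with groupby()

This example uses a different DataFrame, with a key column and two numeric columns, data1 and data2. Here aggregate() is called on the result of groupby('key'), so each aggregation is computed separately for every key.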
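The code that builds this DataFrame is not shown. As a hedged sketch, the hypothetical values below are one set consistent with both outputs in this example; any data with the same per-key values would give the same results.

```python
import pandas as pd

# Hypothetical data, chosen only to be consistent with the outputs shown
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': [0, 1, 2, 3, 4, 5],
                   'data2': [5, 0, 3, 3, 7, 9]})

# min, median and max of every non-key column, computed per key
stats = df.groupby('key').aggregate(['min', 'median', 'max'])

# A different aggregation for each column, computed per key
picked = df.groupby('key').aggregate({'data1': 'min', 'data2': 'max'})

print(stats)
print(picked)
```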

Input:

df.groupby('key').aggregate(['min', 'median', 'max'])

Output:

    data1            data2
      min median max   min median max
key
A       0    1.5   3     3    4.0   5
B       1    2.5   4     0    3.5   7
C       2    3.5   5     3    6.0   9

Another useful pattern is to pass a dictionary that maps each column name to the operation to apply to that column:

Input:

df.groupby('key').aggregate({'data1': 'min',
                             'data2': 'max'})

Output:

     data1  data2
key
A        0      5
B        1      7
C        2      9
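Dictionary values can also be lists. A minimal sketch, reusing the hypothetical data assumed for this example: the result then has two-level column labels of the form (column, aggregation).

```python
import pandas as pd

# Hypothetical data, consistent with the outputs in this example
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': [0, 1, 2, 3, 4, 5],
                   'data2': [5, 0, 3, 3, 7, 9]})

# Lists as dict values -> MultiIndex columns such as ('data1', 'min')
result = df.groupby('key').aggregate({'data1': ['min', 'max'],
                                      'data2': ['sum']})
print(result)
```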
Submitted by shiksha.dahiya on February 16, 2021

Shiksha is working as a Data Scientist at iVagus. She has expertise in Data Science and Machine Learning.
