The pandas DataFrame.sum() function returns the sum of the values along the given axis. With the index axis (axis=0), it adds up the values in each column and returns a Series containing one sum per column.
By default, it skips missing values in the DataFrame while calculating the sum.
Syntax
DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0,
**kwargs)
Parameters:
- axis: Axis along which the sum is calculated, {index (0), columns (1)}. Use 0 to sum down the rows, giving one result per column, and 1 to sum across the columns, giving one result per row.
- skipna: By default True. Excludes null values when computing the result.
- level: By default None. If the index is a MultiIndex, the sum is computed within the given level only. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0, where groupby(level=...).sum() is used instead.
- numeric_only: By default None. If True, only float, int and boolean columns are included. If None, it attempts to use everything.
- min_count: By default 0. The required number of valid values to perform the operation. If fewer than min_count non-null values are present, the result is NaN.
- **kwargs: Additional keyword arguments to be passed to the function.
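As a small sketch of how numeric_only and min_count interact, the snippet below uses a hypothetical toy DataFrame (the names df, x, y and label are illustrative only and do not come from the examples in this article). It assumes a pandas version in which numeric_only=True is accepted, which should cover current releases.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "y" holds no valid values, "label" is text.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, np.nan, np.nan],
    "label": ["p", "q", "r"],
})

# numeric_only=True leaves out the text column.
# With the default min_count=0, the sum of an all-NaN column is 0.0.
default_sum = df.sum(numeric_only=True)

# min_count=1 requires at least one valid value, so "y" becomes NaN.
strict_sum = df.sum(numeric_only=True, min_count=1)

print(default_sum)
print(strict_sum)
```

Setting min_count=1 is a common way to tell "no data at all" apart from "data that happens to add up to zero".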
Return:
It returns the sum in the form of a Series, or a DataFrame if a level is specified.
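To make the shape of the return value concrete, here is a minimal sketch on a hypothetical two-column DataFrame (df, a and b are made-up names). It shows that the result is a Series labelled by column names for axis=0 and by row labels for axis=1.

```python
import pandas as pd

# Hypothetical toy data, not taken from the examples in this article.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})

col_totals = df.sum(axis=0)  # one entry per column
row_totals = df.sum(axis=1)  # one entry per row

print(type(col_totals).__name__)  # Series
print(list(col_totals.index))     # ['a', 'b']
print(list(row_totals.index))     # [0, 1, 2]
```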
Examples
import pandas as pd
import numpy as np
# Marks of five students; np.nan marks a missing score
Students_name=['John','Alice','Jack','Tom','Monica']
Physics_marks=[44,np.nan,47,28,39]
Chemistry_marks=[45,46,np.nan,40,30]
Maths_marks=[35,38,29,30,np.nan]
# Build the DataFrame with one column per subject
Students_marks=pd.DataFrame({'Name':Students_name, 'Physics':Physics_marks,
'Chemistry':Chemistry_marks,
'Maths':Maths_marks})
Students_marks
Output:
| | Name | Physics | Chemistry | Maths |
|---|---|---|---|---|
| 0 | John | 44.0 | 45.0 | 35.0 |
| 1 | Alice | NaN | 46.0 | 38.0 |
| 2 | Jack | 47.0 | NaN | 29.0 |
| 3 | Tom | 28.0 | 40.0 | 30.0 |
| 4 | Monica | 39.0 | 30.0 | NaN |
1. When axis = 0, i.e. summing down the rows of each column (excluding null values)
Students_marks.sum(axis=0)
Output
Name JohnAliceJackTomMonica
Physics 158.0
Chemistry 161.0
Maths 132.0
dtype: object
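In the output above, the Name column is concatenated because adding strings joins them, and the result has dtype object. If only the numeric totals are wanted, one option is numeric_only=True. The sketch below rebuilds the same data under the illustrative name students_marks.

```python
import numpy as np
import pandas as pd

# Same data as in the example above, rebuilt so the snippet runs on its own.
students_marks = pd.DataFrame({
    "Name": ["John", "Alice", "Jack", "Tom", "Monica"],
    "Physics": [44, np.nan, 47, 28, 39],
    "Chemistry": [45, 46, np.nan, 40, 30],
    "Maths": [35, 38, 29, 30, np.nan],
})

# numeric_only=True leaves out the text column "Name".
numeric_totals = students_marks.sum(axis=0, numeric_only=True)
print(numeric_totals)
# Physics      158.0
# Chemistry    161.0
# Maths        132.0
# dtype: float64
```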
2. When axis = 1, i.e. summing across the columns of each row (excluding null values)
# Sum across the columns of each row; numeric_only=True leaves out the text column 'Name'
Students_marks.sum(axis=1, numeric_only=True)
Output
0 124.0
1 84.0
2 76.0
3 98.0
4 69.0
dtype: float64
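A per-row total is often stored back in the table. The following sketch selects the subject columns explicitly and attaches the result as a new column. The names subjects, with_total and Total are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

# Same data as in the example above, rebuilt so the snippet runs on its own.
students_marks = pd.DataFrame({
    "Name": ["John", "Alice", "Jack", "Tom", "Monica"],
    "Physics": [44, np.nan, 47, 28, 39],
    "Chemistry": [45, 46, np.nan, 40, 30],
    "Maths": [35, 38, 29, 30, np.nan],
})

# Restrict the sum to the subject columns, then add it as a "Total" column.
subjects = ["Physics", "Chemistry", "Maths"]
with_total = students_marks.assign(Total=students_marks[subjects].sum(axis=1))

print(with_total[["Name", "Total"]])
```

Selecting the columns by name keeps the text column out of the calculation without relying on numeric_only.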
3. Including null values
# Without skipping null values: any row that contains a NaN sums to NaN
Students_marks.sum(axis=1, skipna=False, numeric_only=True)
Output
0 124.0
1 NaN
2 NaN
3 98.0
4 NaN
dtype: float64
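With skipna=False, a row total is NaN exactly when that row has at least one missing mark. The sketch below checks this against isna() on the numeric columns only (marks is an illustrative name for that subset).

```python
import numpy as np
import pandas as pd

# Numeric columns of the example data, rebuilt so the snippet runs on its own.
marks = pd.DataFrame({
    "Physics": [44, np.nan, 47, 28, 39],
    "Chemistry": [45, 46, np.nan, 40, 30],
    "Maths": [35, 38, 29, 30, np.nan],
})

strict_totals = marks.sum(axis=1, skipna=False)  # NaN propagates into the total
has_missing = marks.isna().any(axis=1)           # True where any mark is missing

# The NaN totals line up with the rows that contain a missing value.
print((strict_totals.isna() == has_missing).all())  # True
```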
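4. Using min_count
# Require at least two valid values per row; numeric_only=True leaves out the text column 'Name'
Students_marks.sum(axis=1, min_count=2, numeric_only=True)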
Output
0 124.0
1 84.0
2 76.0
3 98.0
4 69.0
dtype: float64
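The result above is the same as without min_count because every row has at least two valid marks. As a sketch of the case where the threshold does matter, the snippet below raises it to 3 on the numeric columns (marks is an illustrative name), so rows with a missing mark no longer qualify.

```python
import numpy as np
import pandas as pd

# Numeric columns of the example data, rebuilt so the snippet runs on its own.
marks = pd.DataFrame({
    "Physics": [44, np.nan, 47, 28, 39],
    "Chemistry": [45, 46, np.nan, 40, 30],
    "Maths": [35, 38, 29, 30, np.nan],
})

# Rows 1, 2 and 4 have only two valid values, so their sum becomes NaN.
totals_min3 = marks.sum(axis=1, min_count=3)
print(totals_min3)
# 0    124.0
# 1      NaN
# 2      NaN
# 3     98.0
# 4      NaN
# dtype: float64
```

Since there are exactly three subject columns here, min_count=3 gives the same result as skipna=False.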
5. Using Specific level in Multi-Index DataFrame
# Same marks as before, with a roll number for each student
Students_name=['John','Alice','Jack','Tom','Monica']
Roll_No=[10,11,15,17,25]
Physics_marks=[44,np.nan,47,28,39]
Chemistry_marks=[45,46,np.nan,40,30]
Maths_marks=[35,38,29,30,np.nan]
Students_marks=pd.DataFrame({'Name':Students_name, 'Roll No':Roll_No, 'Physics':Physics_marks,
'Chemistry':Chemistry_marks,
'Maths':Maths_marks})
Students_marks.set_index(['Name','Roll No'], inplace=True)
print(Students_marks)
Output
| Name | Roll No | Physics | Chemistry | Maths |
|---|---|---|---|---|
| John | 10 | 44.0 | 45.0 | 35.0 |
| Alice | 11 | NaN | 46.0 | 38.0 |
| Jack | 15 | 47.0 | NaN | 29.0 |
| Tom | 17 | 28.0 | 40.0 | 30.0 |
| Monica | 25 | 39.0 | 30.0 | NaN |
# Sum of values for the level 'Roll No' only (the level argument requires pandas older than 2.0)
Students_marks.sum(level='Roll No')
Output
| Roll No | Physics | Chemistry | Maths |
|---|---|---|---|
| 10 | 44.0 | 45.0 | 35.0 |
| 11 | 0.0 | 46.0 | 38.0 |
| 15 | 47.0 | 0.0 | 29.0 |
| 17 | 28.0 | 40.0 | 30.0 |
| 25 | 39.0 | 30.0 | 0.0 |
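The level argument of sum() was deprecated in pandas 1.3 and is not available in pandas 2.0 and later. A sketch of the equivalent using groupby on the index level is shown below. It rebuilds the same data under the illustrative name students_marks, and by_roll and by_roll_strict are made-up names. Passing min_count=1 is an optional variation that keeps missing marks as NaN instead of 0.0.

```python
import numpy as np
import pandas as pd

# Same Multi-Index data as above, rebuilt so the snippet runs on its own.
students_marks = pd.DataFrame({
    "Name": ["John", "Alice", "Jack", "Tom", "Monica"],
    "Roll No": [10, 11, 15, 17, 25],
    "Physics": [44, np.nan, 47, 28, 39],
    "Chemistry": [45, 46, np.nan, 40, 30],
    "Maths": [35, 38, 29, 30, np.nan],
}).set_index(["Name", "Roll No"])

# Group by the 'Roll No' level and sum within each group.
# Missing marks contribute nothing, so they show up as 0.0.
by_roll = students_marks.groupby(level="Roll No").sum()

# Optional: require at least one valid value so missing marks stay NaN.
by_roll_strict = students_marks.groupby(level="Roll No").sum(min_count=1)

print(by_roll)
print(by_roll_strict)
```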