Once your values are in the DataFrame, you can perform a wide variety of operations on them, such as calculating statistics with pandas.
For instance, let’s say that you want to find the maximum price among all the cars in the DataFrame.
Obviously, you can derive this value just by looking at the dataset, but the method presented below would work for much larger datasets.
To get the maximum price for our Cars example, you’ll need to add the following portion to the Python code (and then print the results):
max1 = df['Price'].max()
Here is the complete Python code:
import pandas as pd
cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]
        }
df = pd.DataFrame(cars, columns=['Brand', 'Price'])
# maximum of the 'Price' column
max1 = df['Price'].max()
print(max1)
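The same pattern extends to other statistics. The sketch below is only an illustration built on the Cars data above; the variable names (`max_price`, `top_brands`, and so on) are my own and not part of the original code. It also looks up which brand carries the maximum price by using a boolean mask.

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(cars, columns=['Brand', 'Price'])

# Basic statistics on the 'Price' column
max_price = df['Price'].max()
min_price = df['Price'].min()
mean_price = df['Price'].mean()

# Brands whose price equals the maximum (a list, in case of ties)
top_brands = df.loc[df['Price'] == max_price, 'Brand'].tolist()

print(max_price, min_price, mean_price)
print(top_brands)
```

The mask approach returns every row that ties for the maximum, which can be handy when the highest price is not unique.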
In the real world, a pandas DataFrame is usually created by loading a dataset from persistent storage, such as an Excel file, a CSV file, or a MySQL database.
However, to keep things easy to follow, I’ll be using Python data structures (a dictionary and lists) here.
As depicted in the Excel sheet above, if we treat the column names as “keys” and the list of items under each column as “values”, we can easily represent the same data with a Python dictionary:
my_dict = {
'name' : ["a", "b", "c", "d", "e","f", "g"],
'age' : [20,27, 35, 55, 18, 21, 35],
'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}
We can create a pandas DataFrame from this dictionary as follows:
import pandas as pd
df = pd.DataFrame(my_dict)
The resulting DataFrame looks similar to what we’ve seen in the Excel sheet above.
In Python versions before 3.7, dictionaries did not guarantee to preserve insertion order, so the columns could appear in a different sequence from the one defined in the dictionary. With Python 3.7 or later and a current pandas release, the columns follow the order of the dictionary keys.
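If the column order matters, one option is to state it explicitly through the `columns` argument, the same way the Cars example does. This is a minimal sketch using the dictionary above; the chosen order is only an example.

```python
import pandas as pd

my_dict = {
    'name': ["a", "b", "c", "d", "e", "f", "g"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}

# Passing 'columns' fixes the column order regardless of how the dict is ordered
df = pd.DataFrame(my_dict, columns=['name', 'age', 'designation'])
print(list(df.columns))
```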
Find maximum values & position in columns and rows of a Dataframe in Pandas
In this article, we discuss how to find the maximum value and its index position in the columns and rows of a DataFrame.
DataFrame.max()
The pandas DataFrame.max() method finds the maximum of the values in the object and returns it. If the input is a Series, the method returns a scalar, which is the maximum of the values in the Series. If the input is a DataFrame, the method returns a Series holding the maximum of the values along the specified axis. The index axis (axis=0) is the default, which gives one maximum per column.
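To make the difference between the two return types concrete, here is a small sketch on a throwaway DataFrame. The name `frame` and its values are made up for this illustration only.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the return types
frame = pd.DataFrame({'x': [1, 5, 3], 'y': [4, 2, 6]})

series_max = frame['x'].max()   # Series input -> a single scalar
frame_max = frame.max()         # DataFrame input -> a Series, one value per column

print(series_max)
print(frame_max)
```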
import numpy as np
import pandas as pd
# List of Tuples
matrix = [(10, 56, 17),
          (np.nan, 23, 11),
          (49, 36, 55),
          (75, np.nan, 34),
          (89, 21, 44)
          ]
# Create a DataFrame
abc = pd.DataFrame(matrix, index = list('abcde'), columns = list('xyz'))
# output
abc
Output:
How to find Maximum values of every column?
To find the maximum value of each column, call the max() method on the DataFrame object without any arguments.
# find the maximum of each column
maxValues = abc.max()
print(maxValues)
Output :
It returns a Series of maximum values in which the index holds the column names and the values are the maxima of each column: 89.0 for x, 56.0 for y, and 55.0 for z.
How to find maximum values of every row?
To find the maximum value of each row, call the max() method on the DataFrame object with the argument axis = 1.
# find the maximum values of each row
maxValues = abc.max(axis = 1)
print(maxValues)
Output :
It returns a Series of maximum values in which the index holds the row labels and the values are the maxima of each row: 56.0 for a, 23.0 for b, 55.0 for c, 75.0 for d, and 89.0 for e. In the examples above, NaN values are skipped while finding the maximum along either axis. We can include NaN values as well if we want.
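Putting the two calls side by side, the following self-contained sketch rebuilds the same DataFrame and computes both the column-wise and row-wise maxima. The names `col_max` and `row_max` are mine; `np.nan` is used for the missing values.

```python
import numpy as np
import pandas as pd

matrix = [(10, 56, 17),
          (np.nan, 23, 11),
          (49, 36, 55),
          (75, np.nan, 34),
          (89, 21, 44)]
abc = pd.DataFrame(matrix, index=list('abcde'), columns=list('xyz'))

col_max = abc.max()         # default axis=0: one maximum per column
row_max = abc.max(axis=1)   # axis=1: one maximum per row

print(col_max)
print(row_max)
```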
How to find maximum values of every column without skipping NaN?
# find maximum value of each
# column without skipping NaN
maxValues = abc.max(skipna = False)
print(maxValues)
Output :
By passing skipna=False we stop NaN values from being skipped. If a column contains any NaN value, the result for that column is NaN. Here x and y each contain a NaN, so their results are NaN, while z still gives 55.0.
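A short sketch of this behaviour, assuming the same `abc` DataFrame as before. The helper variable `has_nan` is my own addition; it shows which columns are affected.

```python
import numpy as np
import pandas as pd

matrix = [(10, 56, 17),
          (np.nan, 23, 11),
          (49, 36, 55),
          (75, np.nan, 34),
          (89, 21, 44)]
abc = pd.DataFrame(matrix, index=list('abcde'), columns=list('xyz'))

# Columns that contain at least one NaN
has_nan = abc.isna().any()

# With skipna=False, any column containing NaN yields NaN
max_with_nan = abc.max(skipna=False)

print(has_nan)
print(max_with_nan)
```

Comparing `has_nan` with `max_with_nan` makes it easy to check that only the columns holding a NaN lose their maximum.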
How to find maximum values of a single column or selected columns?
To get the maximum value of a single column, see the following example:
# find maximum value of a
# single column 'x'
maxClm = abc['x'].max()
print("Maximum value in column 'x':")
print(maxClm)
Output :
We have another way to find the maximum value of a column:
# find maximum value of a
# single column 'x'
maxClm = abc.max()['x']
The result will be the same as above.
A list of columns can also be passed instead of a single column to find the maximum values of the specified columns:
# find maximum values of a list of columns
maxValues = abc[['x', 'z']].max()
print("Maximum value in column 'x' & 'z': ")
print(maxValues)
Output :
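The three variants above can be checked against each other in one place. This is a sketch that assumes the example DataFrame is named `abc`, as created earlier; the result names are illustrative.

```python
import numpy as np
import pandas as pd

matrix = [(10, 56, 17),
          (np.nan, 23, 11),
          (49, 36, 55),
          (75, np.nan, 34),
          (89, 21, 44)]
abc = pd.DataFrame(matrix, index=list('abcde'), columns=list('xyz'))

x_max_direct = abc['x'].max()        # select the column, then take its maximum
x_max_lookup = abc.max()['x']        # take all column maxima, then pick 'x'
xz_max = abc[['x', 'z']].max()       # maxima for a list of columns

print(x_max_direct, x_max_lookup)
print(xz_max)
```

Selecting the column first avoids computing maxima for columns you do not need, which may matter on wide DataFrames.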
How to get position of maximum values of every column?
DataFrame.idxmax()
The pandas DataFrame.idxmax() method returns the index label of the first occurrence of the maximum over the requested axis. While finding the index of the maximum value along any axis, all NA/null values are excluded by default.
Syntax: DataFrame.idxmax(axis=0, skipna=True)
Parameters :
axis : 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise
skipna : Exclude NA/null values. If an entire row/column is NA, the result will be NA
Returns : idxmax : Series
Let’s take some examples to understand how to use it :
How to get row index label of Maximum value in every column?
Output :
It returns a Series with the column names as its index and, as values, the row index labels where the maximum value of each column is found: e for x, a for y, and c for z.
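A minimal sketch of this step, assuming the example DataFrame `abc` from earlier. Calling idxmax() with the default axis gives, for each column, the row label of its maximum. The `summary` table that pairs each label with the value is my own addition.

```python
import numpy as np
import pandas as pd

matrix = [(10, 56, 17),
          (np.nan, 23, 11),
          (49, 36, 55),
          (75, np.nan, 34),
          (89, 21, 44)]
abc = pd.DataFrame(matrix, index=list('abcde'), columns=list('xyz'))

# Row label of the maximum value in every column (NaN skipped by default)
max_row_labels = abc.idxmax()

# Pair each label with the corresponding maximum value
summary = pd.DataFrame({'row': max_row_labels, 'value': abc.max()})

print(max_row_labels)
print(summary)
```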
How to find Column names of Maximum value in every row?
# find the column name of maximum
# values in every row
maxValueIndex = abc.idxmax(axis = 1)
print("Max values of row are at following columns :")
print(maxValueIndex)
Output :
It returns a Series with the row index labels as its index and, as values, the column names where the maximum value of each row is found: y for rows a and b, z for row c, and x for rows d and e.
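The row-wise case can be sketched the same way, again assuming the example DataFrame is `abc`. The `row_summary` table is an illustrative addition that shows each row's winning column next to its value.

```python
import numpy as np
import pandas as pd

matrix = [(10, 56, 17),
          (np.nan, 23, 11),
          (49, 36, 55),
          (75, np.nan, 34),
          (89, 21, 44)]
abc = pd.DataFrame(matrix, index=list('abcde'), columns=list('xyz'))

# Column name of the maximum value in every row
max_col_names = abc.idxmax(axis=1)

# Pair each column name with the corresponding row maximum
row_summary = pd.DataFrame({'column': max_col_names, 'value': abc.max(axis=1)})

print(max_col_names)
print(row_summary)
```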