dataframe

 

Characteristics of Pandas DataFrames

These characteristics (i.e. tabular format with rows and columns that can have headers) make pandas dataframes very versatile, not only for storing different types of data, but also for maintaining the relationships between cells across the same row and/or column.

The table below is an example of the type of tabular dataset that can easily be imported into a pandas dataframe.

month precip_in
Jan 0.70
Feb 0.75
Mar 1.85
Apr 2.93
May 3.05
June 2.02
July 1.93
Aug 1.62
Sept 1.84
Oct 1.31
Nov 1.39
Dec 0.84

The relationship between the value Jan in the month column and the value 0.70 in the precip_in column is maintained.

month precip_in
Jan 0.70

These two values (Jan and 0.70) are considered part of the same record, representing the same observation in the pandas dataframe. In addition, pandas dataframes have other characteristics that differentiate them from other data structures:

  1. Each column in a pandas dataframe can have a label name (i.e. header name such as month) and can contain a different type of data from its neighboring columns (e.g. column_1 with numeric values and column_2 with text strings).
  2. By default, each row has an index within a range of values beginning at [0]. However, the row index in pandas dataframes can also be set as labels (e.g. a location name, date).
  3. All cells in a pandas dataframe have both a row index and a column index (i.e. two-dimensional table structure), even if there is only one cell (i.e. value) in the pandas dataframe.
  4. In addition to selecting cells through location-based indexing (e.g. cell at row 1, column 1), you can also query for data within pandas dataframes based on specific values (e.g. querying for specific text strings or numeric values).
  5. Because of the tabular structure, you can work with cells in pandas dataframes:
    • across an entire row
    • across an entire column (or series, a one-dimensional array in pandas)
    • by selecting cells based on location or specific values
  6. Due to their inherent tabular structure, pandas dataframes also allow for cells to have null values (i.e. no data values such as blank space, NaN, -999, etc.).
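
The characteristics listed above can be sketched in a few lines of code. This is a minimal illustration rather than part of the original lesson: the three-row dataframe and the variable names (precip, by_month, wet_months) are assumptions chosen for the example, and the missing value is represented with NaN.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe (a subset of the precipitation table,
# with one value deliberately set to missing for the example)
precip = pd.DataFrame(columns=["month", "precip_in"],
                      data=[["Jan", 0.70],
                            ["Feb", np.nan],
                            ["Mar", 1.85]])

# 1. Each column has its own data type (text vs. numeric)
print(precip.dtypes)

# 2. The default row index starts at 0, but labels can be used instead
by_month = precip.set_index("month")
print(by_month.loc["Mar", "precip_in"])

# 3. and 4. Select a cell by location (row 0, column 1) ...
print(precip.iloc[0, 1])

# ... or query for rows based on specific values
wet_months = precip[precip["precip_in"] > 1.0]
print(wet_months)

# 5. Work across an entire column (a pandas Series)
print(precip["precip_in"].mean())

# 6. Cells can hold null values, which pandas can detect
print(precip["precip_in"].isna().sum())
```

Note that placeholder values such as -999 are not treated as missing automatically; they would need to be converted to NaN first before pandas skips them in calculations like the mean.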

Tabular Structure of Pandas Dataframes

As described in the previous paragraphs, the structure of a pandas dataframe includes the column names and the rows that represent individual observations (i.e. records).

In a typical pandas dataframe, the default row index is a range of values beginning at [0], and the column headers are also organized into an index of the column names.
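
As a small sketch of these two indexes (the dataframe df and its values are made up for illustration), the index and columns attributes expose the default row index and the index of column names:

```python
import pandas as pd

# Hypothetical two-row dataframe used only for illustration
df = pd.DataFrame(columns=["month", "precip_in"],
                  data=[["Jan", 0.70], ["Feb", 0.75]])

# Default row index: a range of integers starting at 0
print(df.index)

# Column headers are stored in their own index
print(df.columns)
print(list(df.columns))
```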

The function DataFrame from pandas (e.g. pd.DataFrame) can be used to manually define a pandas dataframe.

One way to use this function is to provide a list of column names (to the parameter columns) and a list of data values (to the parameter data), which is composed of individual lists of values for each row:

# Dataframe with 2 columns and 2 rows
dataframe = pd.DataFrame(columns=["column_1", "column_2"],
                         data=[
                              [value_column_1, value_column_2],  
                              [value_column_1, value_column_2]
                         ])

In the example below, the pandas dataframe is created using the average monthly precipitation values in inches for Boulder, CO.

The pandas dataframe is created with a column called month containing abbreviated month names as text strings and another column called precip_in for the precipitation (inches) as numeric values.

For example, the first row is created using ["Jan", 0.70], with Jan as the value for month and 0.70 as the value for precip_in.

# Import matplotlib for plotting
import matplotlib.pyplot as plt
# Import pandas with alias pd
import pandas as pd
# Average monthly precip for Boulder, CO
avg_monthly_precip = pd.DataFrame(columns=["month", "precip_in"],
                                  data=[
                                       ["Jan", 0.70],  ["Feb", 0.75],
                                       ["Mar", 1.85],  ["Apr", 2.93],
                                       ["May", 3.05],  ["June", 2.02],
                                       ["July", 1.93], ["Aug", 1.62],
                                       ["Sept", 1.84], ["Oct", 1.31],
                                       ["Nov", 1.39],  ["Dec", 0.84]
                                  ])

# Notice the nicely formatted output without use of print
avg_monthly_precip
  month precip_in
0 Jan 0.70
1 Feb 0.75
2 Mar 1.85
3 Apr 2.93
4 May 3.05
5 June 2.02
6 July 1.93
7 Aug 1.62
8 Sept 1.84
9 Oct 1.31
10 Nov 1.39
11 Dec 0.84

You can see from the pandas dataframe that each row has an index value, and that the default indexing still begins with [0], as it does for Python lists and numpy arrays.
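
To make the zero-based indexing concrete, the sketch below compares a Python list with the first rows of the dataframe. It rebuilds only three rows so that it runs on its own; the names months and first_row are assumptions introduced for the example.

```python
import pandas as pd

# Rebuild the first three rows of the dataframe so the block runs on its own
avg_monthly_precip = pd.DataFrame(columns=["month", "precip_in"],
                                  data=[["Jan", 0.70], ["Feb", 0.75],
                                        ["Mar", 1.85]])

months = ["Jan", "Feb", "Mar"]

# Position 0 is the first element of a Python list ...
print(months[0])

# ... and position 0 is also the first row of the dataframe
first_row = avg_monthly_precip.iloc[0]
print(first_row["month"], first_row["precip_in"])

# shape reports (number of rows, number of columns)
print(avg_monthly_precip.shape)
```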

A Quick Plot

You can plot a pandas dataframe using matplotlib directly or using the pandas .plot() method, which wraps around matplotlib.

f, ax = plt.subplots()
avg_monthly_precip.plot(x="month",
                        y="precip_in",
                        title="Plot of Pandas Data Frame using Pandas .plot",
                        ax=ax)
plt.show()

Plot of monthly precipitation using pandas .plot()

Alternatively, you can plot using the standard matplotlib approach. This course encourages the matplotlib approach, which is more flexible as you begin to create more complex plots.

f, ax = plt.subplots()
ax.plot(avg_monthly_precip.month,
        avg_monthly_precip.precip_in)

ax.set(title="Plot of Pandas Data Frame using Matplotlib ax.plot")
plt.show()

Plot of monthly precipitation using matplotlib ax.plot()
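
As a hedged illustration of the flexibility mentioned above, the sketch below draws a bar chart and sets axis labels on the same axes object. The choice of a bar chart, the color, and the label text are assumptions made for this example, and only the first four months are rebuilt so that the block is self-contained.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the first four rows so the block runs on its own
avg_monthly_precip = pd.DataFrame(columns=["month", "precip_in"],
                                  data=[["Jan", 0.70], ["Feb", 0.75],
                                        ["Mar", 1.85], ["Apr", 2.93]])

f, ax = plt.subplots()

# Draw a bar chart directly with matplotlib
ax.bar(avg_monthly_precip["month"],
       avg_monthly_precip["precip_in"],
       color="teal")

# The title and axis labels are set on the same axes object
ax.set(title="Average Monthly Precipitation (first four months)",
       xlabel="Month",
       ylabel="Precipitation (inches)")
plt.show()
```

Because every element is set through the axes object ax, further customization (such as additional data series or subplots) can be layered on in the same way.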
Submitted by shiksha.dahiya on February 10, 2021

Shiksha is working as a Data Scientist at iVagus. She has expertise in Data Science and Machine Learning.
