An Introduction to Pandas in Python

Submitted by devanshi.srivastava on


Development of pandas began in 2008 at AQR Capital Management, where Wes McKinney needed a high-performance, flexible tool for quantitative analysis of data.

By the end of 2009 it had been released as open source. In 2012, another AQR employee, Chang She, became the second major contributor to the pandas library.

In 2015, pandas became a NumFOCUS-sponsored project. In 2018, the first in-person developer sprint took place.

What Is Pandas In Python?

  • Before Pandas, Python was mainly used for data munging and preparation and contributed very little to data analysis itself. Pandas solved this problem.
  • The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".
  • Pandas is an open-source library built on top of the NumPy library. It is fast and offers high performance and productivity for users. It is used to manipulate numerical data and time series.
  • Pandas is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics.
  • Pandas can analyze large data sets and supports drawing conclusions based on statistical theory. Data analysis with Pandas typically follows five steps: load, prepare, manipulate, model, and analyze.
  • It can clean messy data sets and make them readable and relevant.
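As a rough illustration of the steps listed above, here is a minimal sketch that loads, prepares, manipulates, and analyzes a small messy data set. The CSV content and the column names (`name`, `city`, `sales`) are invented for this example, and the modeling step is left out because it is usually handed to another library.

```python
import io

import pandas as pd

# Hypothetical raw data: stray spaces, missing values, and a duplicated row.
raw_csv = io.StringIO(
    "name,city,sales\n"
    " Alice ,Delhi,100\n"
    "Bob,Mumbai,\n"
    "Bob,Mumbai,\n"
    "Carol,Delhi,250\n"
)

# Load: read the CSV text into a DataFrame.
df = pd.read_csv(raw_csv)

# Prepare: strip whitespace, fill missing sales with 0, drop duplicate rows.
df["name"] = df["name"].str.strip()
df["sales"] = df["sales"].fillna(0)
df = df.drop_duplicates().reset_index(drop=True)

# Manipulate: add a derived column with each row's share of total sales.
df["sales_share"] = df["sales"] / df["sales"].sum()

# Analyze: total sales per city.
sales_by_city = df.groupby("city")["sales"].sum()
print(sales_by_city)
```

Filling missing sales with 0 is only one possible choice; depending on the data, dropping those rows or imputing a value may be more appropriate.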

Key Features of the Pandas Library

  • It is fast and efficient for manipulating and analyzing data.
  • It supports size mutability: columns can be inserted into and deleted from data structures.
  • It supports flexible reshaping and pivoting of data sets.
  • It provides time-series functionality.
  • It provides high-performance merging and joining of data sets.
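The features above can be sketched on a tiny, made-up sales table. The data and the names `sales`, `stores`, `wide`, `merged`, and `daily` are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical daily sales records in "long" format.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]
        ),
        "store": ["A", "B", "A", "B"],
        "amount": [10, 20, 30, 40],
    }
)

# Size mutability: add a column, then drop it again.
sales["amount_doubled"] = sales["amount"] * 2
sales = sales.drop(columns=["amount_doubled"])

# Reshaping and pivoting: one row per date, one column per store.
wide = sales.pivot(index="date", columns="store", values="amount")

# Merging and joining: attach store metadata with a SQL-style left join.
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})
merged = sales.merge(stores, on="store", how="left")

# Time-series functionality: total amount per calendar day.
daily = sales.set_index("date")["amount"].resample("D").sum()

print(wide)
print(merged)
print(daily)
```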

Why Is Pandas Used for Data Science?

Pandas is widely used in data science because it is built on top of the NumPy library, which means that many of the structures used in NumPy are also available in Pandas. The data produced by Pandas is often used as input for statistical analysis in SciPy, machine learning algorithms in scikit-learn, and plotting functions in Matplotlib.
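A small sketch of this relationship with NumPy, using invented column names (`height_cm`, `weight_kg`): Pandas objects can be converted to NumPy arrays, and NumPy functions can be applied to them directly. Arrays like `features` below are the kind of input that other numerical libraries generally accept.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, for illustration only.
df = pd.DataFrame(
    {
        "height_cm": [150.0, 160.0, 170.0],
        "weight_kg": [50.0, 60.0, 70.0],
    }
)

# A single column can be exposed as a one-dimensional NumPy array.
heights = df["height_cm"].to_numpy()

# NumPy functions accept a Series directly and return a Series.
log_weight = np.log(df["weight_kg"])

# The whole frame converts to a two-dimensional array (rows x columns).
features = df.to_numpy()
print(features.shape)
```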

Pandas programs can easily be run from a Jupyter Notebook, since Jupyter can execute the code in a single cell rather than the whole file.