Pandas DataFrame Set Operations

In data science, we often extract and scrape data from various sources, and while analyzing that data we regularly need to compare one dataset with another. Set operations let us find which rows two DataFrames have in common and which rows appear in only one of them.

In mathematics, set operations are used to compare collections of elements, and they serve the same purpose in pandas.

To do this, we treat a pandas DataFrame as a mathematical set, where each row of the DataFrame is considered an element (member) of the set.

1. Union

Given two sets A and B, the union of A and B (A ∪ B) is the set of all elements that are in A, in B, or in both.
This operation is used to collect every distinct element present across the given tables.

In pandas, the union of two DataFrames can be computed with pd.concat() followed by drop_duplicates().
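The union described above can be sketched as follows. This is a minimal example, not a definitive recipe: the DataFrames df_a and df_b, their columns, and their values are hypothetical sample data invented for illustration, and both frames are assumed to share the same columns.

```python
import pandas as pd

# Hypothetical sample data: each row is treated as one element of a set.
df_a = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
df_b = pd.DataFrame({"id": [3, 4], "name": ["Cy", "Dee"]})

# Stack the rows of both DataFrames, then drop rows that occur more than once.
union = (
    pd.concat([df_a, df_b], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)

print(union)
```

The row (3, "Cy") exists in both inputs but appears only once in the result, which is what makes this a set union rather than a plain concatenation.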

2. Intersection

Given two sets A and B, the intersection of A and B (A ∩ B) is the set of elements that are in both A and B. Unlike the union, it keeps only the elements the two sets have in common.

In pandas, the intersection can be computed with the merge() method: an inner join on all columns keeps only the rows that appear in both DataFrames.
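A minimal sketch of the intersection, assuming both DataFrames have identical columns. The names df_a and df_b and their contents are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: each row is treated as one element of a set.
df_a = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
df_b = pd.DataFrame({"id": [3, 4], "name": ["Cy", "Dee"]})

# With no `on` argument, merge() joins on every column the two DataFrames
# share; how="inner" keeps only the rows that are present in both.
intersection = pd.merge(df_a, df_b, how="inner")

print(intersection)
```

If either input may contain repeated rows, chaining drop_duplicates() after the merge keeps the result a proper set.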

3. Difference

Given two sets A and B:

  • A - B, the difference of A and B, is the set of all elements that are in A but not in B.
  • B - A, the difference of B and A, is the set of all elements that are in B but not in A.

Note: A - B and B - A are generally not the same set.

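In pandas, the difference can be computed with the isin() method in tandem with boolean indexing: build a boolean mask that marks the rows of one DataFrame that also appear in the other, then negate the mask to select the remaining rows.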
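One possible way to sketch the difference is shown below. Calling isin() directly on a DataFrame compares values cell by cell, so this example first turns each row into a tuple in order to compare whole rows. That tuple step is an assumption of this sketch rather than something the text prescribes, and df_a, df_b, and their contents are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: each row is treated as one element of a set.
df_a = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
df_b = pd.DataFrame({"id": [3, 4], "name": ["Cy", "Dee"]})

# Represent every row as a tuple so that whole rows can be compared.
rows_a = df_a.apply(tuple, axis=1)
rows_b = df_b.apply(tuple, axis=1)

# A - B: rows of df_a whose tuple does not occur in df_b.
a_minus_b = df_a[~rows_a.isin(rows_b)]

# B - A: rows of df_b whose tuple does not occur in df_a.
b_minus_a = df_b[~rows_b.isin(rows_a)]

print(a_minus_b)
print(b_minus_a)
```

Here A - B keeps the rows with id 1 and 2, while B - A keeps only the row with id 4, which illustrates that the two differences are not interchangeable. When a single column uniquely identifies each row, applying isin() to that column alone is a simpler alternative.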

Mon, 02/15/2021 - 15:10

Authored by

Devanshi is working as a Data Scientist with iVagus. She has expertise in Python, NumPy, Pandas and other data science technologies.