Quick Tip: Comparing two pandas dataframes and getting the differences

There are times when working with different pandas dataframes that you might need to get the data that is ‘different’ between the two dataframes (i.e.,g Comparing two pandas dataframes and getting the differences). This seems like a straightforward issue, but apparently its still a popular ‘question’ for many people and is my most popular question on stackoverflow.

As an example, let’s look at two pandas dataframes. Both have date indexes and the same structure. How can we compare these two dataframes and find which rows are in dataframe 2 that aren’t in dataframe 1?

dataframe 1 (named df1):

dataframe 2 (named df2):

The answer, it seems, is quite simple – but I couldn’t figure it out at the time.  Thanks to the generosity of stackoverflow users, the answer (or at least an answer that works) is simply to concat the dataframes then perform a group-by via columns and finally re-index to get the unique records based on the index.

Here’s the code (as provided by user alko on stackoverlow):

This simple approach leads to the correct answer:

There are most likely more ‘pythonic’ answers (one suggestion is here) and I’d recommend you dig into those other approaches, but the above works, is easy to read and is  fast enough for my needs.


Want more information about pandas for data analysis? Check out the book Python for Data Analysis by the creator of pandas, Wes McKinney.

3
Leave a Reply

avatar
2 Comment threads
1 Thread replies
0 Followers
 
Most reacted comment
Hottest comment thread
2 Comment authors
Eric D. Brown, D.Sc.Jurryt Pietersma Recent comment authors
  Subscribe  
newest oldest most voted
Notify of
trackback

[…] post Quick Tip: Comparing two pandas dataframes and getting the differences appeared first on Python […]

Jurryt Pietersma
Guest

Consider passing the dataframes to concat in a dictionary, results in a multi-index dataframe from which you can easily delete the duplicates, which results in a multi-index dataframe with the differences between the dataframes:

https://stackoverflow.com/questions/20225110/comparing-two-dataframes-and-getting-the-differences/42652112#42652112