Pandas datetime disaster: cannot convert rows of dates to anything uniformly usable

Given this sample data, exported from an Excel file to CSV using pandas, I have tried every form of pd.to_datetime to convert these seemingly uniform date strings to datetime format. With errors='coerce' I lose a bunch of dates. No good. With errors='ignore' some columns come back with dtype datetime and others remain object. No good. The goal is to grab the year from each of these dates and then bin the years into five-year bins from 1980 to 2000. At this point I am thinking the pandas datetime parser is like the Kardashian of parsers, famous for nothing.

Date_1  Date_2    Date_3    Date_4  Date_5  Date_6
1000    9/1/2019  NaN       NaN     NaN     NaN
1001    NaN       NaN       NaN     NaN     NaN
1002    NaN       1/1/2000  NaN     NaN     NaN
1003    NaN       NaN       NaN     NaN     NaN
1004    NaN       4/1/2016  NaN     NaN     NaN
1005    NaN       NaN       NaN     NaN     1/1/2013

What have I tried? pd.to_datetime, with various flags and without. This is the most common error:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\arrays\ in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2053         try:
-> 2054             values, tz_parsed = conversion.datetime_to_datetime64(data)
   2055             # If tzaware, these values represent unix timestamps, so we

pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>
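For reference, here is a minimal sketch of converting each column separately with an explicit format. The column names and values are hypothetical stand-ins modelled on the sample above, and the month/day/year format is an assumption about the real file. Whether this avoids the TypeError shown here depends on what the real columns actually contain.

```python
import pandas as pd

# Hypothetical stand-in for the real data: date strings mixed with missing values.
df = pd.DataFrame(
    {
        "Date_2": ["9/1/2019", None, None],
        "Date_3": [None, "1/1/2000", "4/1/2016"],
    },
    index=[1000, 1002, 1004],
)

# A whole DataFrame passed to pd.to_datetime is interpreted as year/month/day
# component columns, so the conversion is applied one column at a time here.
# The explicit format assumes month/day/year; errors="coerce" turns anything
# that does not match into NaT instead of raising.
converted = df.apply(pd.to_datetime, format="%m/%d/%Y", errors="coerce")

print(converted.dtypes)
```

With an explicit format, every column is parsed by the same rule, so the resulting dtypes should not differ from column to column.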

I even tried converting all the dates to plain strings and using a regex to grab just the year. I only need the year from each date, to then use pd.cut or groupby and get counts per bin like the following:

1980 - 1985     347
1986 - 1990     450
1995 - 2000     47

and so on.
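A sketch of the intended pipeline, under stated assumptions: the sample frame, the bin edges, and the bin labels below are all hypothetical, and the dates are assumed to be month/day/year strings. It gathers every date column into one long Series, extracts the year, and counts years per five-year bin with pd.cut.

```python
import pandas as pd

# Hypothetical sample; the real file has more columns and rows.
df = pd.DataFrame(
    {
        "Date_2": ["9/1/1984", None, "3/15/1992"],
        "Date_3": [None, "1/1/2000", "4/1/1988"],
    }
)

# Melt all date columns into one long column and drop the empty cells,
# then parse once. Assumes month/day/year strings.
raw = df.melt(value_name="date")["date"].dropna()
years = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce").dt.year

# Bin edges and labels are assumptions: right-closed intervals
# (1980, 1985], (1985, 1990], ... with include_lowest=True so that
# 1980 itself lands in the first bin.
edges = [1980, 1985, 1990, 1995, 2000]
labels = ["1980 - 1985", "1986 - 1990", "1991 - 1995", "1996 - 2000"]
binned = pd.cut(years, bins=edges, labels=labels, include_lowest=True)

# sort=False keeps the bins in chronological order.
counts = binned.value_counts(sort=False)
print(counts)
```

Note that pd.cut returns NaN for any year outside the edges, so a date such as 9/1/2019 from the sample would not be counted in any bin. That may account for part of a lower-than-expected total.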

However, after what I thought was a sound set of operations, I keep ending up with dramatically fewer dates than are in the actual dataset: about 50% of the dates just disappear no matter which datetime conversion I attempt. It is frustrating enough that I have linked half the dataset in CSV format here so you can check what I am actually dealing with.
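One way to see which dates disappear is to compare the raw strings with the parsed result and list the values that were present in the input but came back as NaT. The Series below is hypothetical. It only illustrates the check and makes no claim about what the real file contains.

```python
import pandas as pd

# Hypothetical values: two in month/day/year form, one ISO-style date,
# one placeholder string, and one genuinely missing cell.
raw = pd.Series(["9/1/2019", "1/1/2000", "2016-04-01", "n/a", None])

parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Values that were present in the input but became NaT after parsing.
failed = raw[raw.notna() & parsed.isna()]

print(f"{raw.notna().sum()} non-null inputs, {parsed.notna().sum()} parsed")
print(failed.value_counts())
```

Running the same comparison on each real column would show whether the lost values share a pattern, such as a second date format or stray text.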

Content Attribution

This content was originally published by John Taylor at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
