BUG: assignment via loc silently fails with differing dtypes #61346
Comments
Confirmed on main. Still silently works when assigning differing dtypes to columns via It seems to me that this should be raising an error, consistent with the behavior introduced for other dtype mismatches (e.g., int64 ← str, which now raises a |
This may be well known, but just in case,
This isn't so clear to me, e.g.
df = pd.DataFrame({"a": [1.0, 2.5, 3.0]})
df.loc[:, "a"] = 5
print(df)
#      a
# 0  5.0
# 1  5.0
# 2  5.0
Should this raise? I personally think the answer there is no. But I'm not sure we ever made any decisions on which implicit conversions should and should not be allowed. This is somewhat related to PDEP-6. |
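A minimal sketch of what the example above relies on, assuming nothing beyond the snippet itself: the integer 5 can be stored exactly as a float, so the column does not need a new dtype to hold it.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.5, 3.0]})
dtype_before = df["a"].dtype

# Assign an int scalar into a float64 column via .loc.
df.loc[:, "a"] = 5

# The value is representable as a float, so the column keeps its dtype.
dtype_after = df["a"].dtype
print(dtype_before, dtype_after)
print(df["a"].tolist())
```

Whether other, lossy conversions should be allowed is the open question raised in the comment; this sketch only covers the lossless case.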
cc @pandas-dev/pandas-core |
I don't think that is what is going on here. It's not about incompatible types not being recognized. It's about the automatic conversion that is done with strings that are formatted datetime objects being assigned to a series that has
df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')
df.loc[:, 'bar'] = df.loc[:, 'bar'].dt.strftime('%Y%m%d')
the first statement sets the dtype of
>>> df.loc[:, "bar"] = ["290102", "300304"]
>>> df
          foo        bar
0  2025-04-23 2002-01-29
1  2025-04-22 2004-03-30
I'm not sure if we want to change the behavior in this case. If
On the other hand, as shown in the example, if a user did something like that, it is unclear whether they wanted the dates parsed as YYMMDD or DDMMYY. So maybe we should be warning if things are ambiguous? |
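To make the YYMMDD versus DDMMYY ambiguity concrete, here is a small sketch using only the standard library's `datetime.strptime`. It is not how pandas infers formats internally; it only shows that the same six-digit strings are valid under both readings and give different dates.

```python
from datetime import datetime

values = ["290102", "300304"]

# Reading 1: day, month, two-digit year.
as_ddmmyy = [datetime.strptime(v, "%d%m%y").date().isoformat() for v in values]

# Reading 2: two-digit year, month, day.
as_yymmdd = [datetime.strptime(v, "%y%m%d").date().isoformat() for v in values]

print(as_ddmmyy)  # ['2002-01-29', '2004-03-30']
print(as_yymmdd)  # ['2029-01-02', '2030-03-04']
```

The first reading matches the output shown in the comment above; the second is just as plausible for a user who wrote the strings with `%y%m%d` in mind.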
Yup, looks like it's going down the mixed formats path (🙀)
In [8]: df = pd.DataFrame({'foo': ['2025-04-23', '2025-04-22']}); df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')
In [9]: df.loc[:, 'bar'] = ['12/01/2020', '13/01/2020']
In [10]: df
Out[10]:
          foo        bar
0  2025-04-23 2020-12-01
1  2025-04-22 2020-01-13 |
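One way to sidestep the mixed-format parsing in the session above is to parse the strings yourself with a single explicit format before assigning. This is a sketch, and the day-first format `%d/%m/%Y` is an assumption about what the user meant, not something stated in the thread.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["2025-04-23", "2025-04-22"]})
df["bar"] = pd.to_datetime(df["foo"], format="%Y-%m-%d")

# Parse every string with the same explicit format (assumed day-first),
# so no per-element format guessing happens during the assignment.
parsed = pd.to_datetime(["12/01/2020", "13/01/2020"], format="%d/%m/%Y")
df.loc[:, "bar"] = parsed

print(df)
```

With an explicit format, both rows are read as January dates instead of one row being month-first and the other day-first.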
@Dr-Irv, thanks for the detailed explanation. From what you’ve described, it appears that the automatic conversion of string-formatted datetime values when assigned via .loc is intentional and, in many cases, desirable for allowing flexible value updates in datetime columns. Given this understanding, it seems the current behavior isn’t a bug per se but a design choice. If the consensus among maintainers and the community is that this behavior should remain as is, then it might make sense to close this issue. However, if there’s broader interest in re-evaluating the behavior—perhaps to introduce warnings or alternative handling for ambiguous string formats—it could be worthwhile to change the title with a view to moving the discussion towards a new enhancement proposal. |
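For readers who do want the column replaced by its string form rather than re-parsed, a minimal sketch follows. It assumes that plain column assignment (`df['bar'] = ...`) replaces the whole column, dtype included, whereas `.loc[:, 'bar'] = ...` writes into the existing datetime column.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["2025-04-23", "2025-04-22"]})
df["bar"] = pd.to_datetime(df["foo"], format="%Y-%m-%d")

# Plain column assignment swaps in a new column, so the formatted
# strings are stored as strings instead of being parsed back to dates.
df["bar"] = df["bar"].dt.strftime("%Y%m%d")

print(df["bar"].tolist())  # ['20250423', '20250422']
```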
Sure, but it would probably be in line with PDEP-4 for the parsing to be consistent rather than changing format mid-column? |
@MarcoGorelli Is your sample in #61346 (comment) a bug not already covered by other open issues? |
I couldn't find one, so I've made a new one: #61353. With that, I agree that what's described here is intentional and not a bug 👍 |
thanks @MarcoGorelli |
This discussion has been quite illuminating: thanks for all your responses. In the past I’ve avoided using |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I expect bar to look like
instead of
Expected Behavior
bar should look like
Installed Versions