BUG: assignment via loc silently fails with differing dtypes #61346

Closed
3 tasks done
zbs opened this issue Apr 23, 2025 · 11 comments
Labels
Bug · Closing Candidate (may be closeable, needs more eyeballs) · Dtype Conversions (unexpected or buggy dtype conversions)

Comments

@zbs

zbs commented Apr 23, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
print(pd.__version__)
df = pd.DataFrame({'foo': ['2025-04-23', '2025-04-22']})
df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')
df.loc[:, 'bar'] = df.loc[:, 'bar'].dt.strftime('%Y%m%d')
print(df)

# Yields
# 2.2.3
#           foo        bar
# 0  2025-04-23 2025-04-23
# 1  2025-04-22 2025-04-22

Issue Description

I expect bar to look like

20250423
20250422

instead of

2025-04-23
2025-04-22

Expected Behavior

bar should look like

20250423
20250422

Installed Versions

[ins] In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.12.10
python-bits           : 64
OS                    : Linux
OS-release            : 4.18.0-372.32.1.el8_6.x86_64
Version               : #1 SMP Fri Oct 7 12:35:10 EDT 2022
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.3
numpy                 : 1.26.4
pytz                  : 2025.2
dateutil              : 2.9.0.post0
pip                   : 25.0.1
Cython                : 3.0.12
sphinx                : None
IPython               : 8.35.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.13.3
blosc                 : None
bottleneck            : 1.4.2
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2024.9.0
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : 3.1.6
lxml.etree            : 5.3.2
matplotlib            : 3.10.1
numba                 : 0.61.2
numexpr               : 2.10.2
odfpy                 : None
openpyxl              : 3.1.5
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : 15.0.2
pyreadstat            : None
pytest                : 8.3.5
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.15.2
sqlalchemy            : 2.0.39
tables                : 3.9.2
tabulate              : 0.9.0
xarray                : 2025.3.1
xlrd                  : 2.0.1
xlsxwriter            : 3.2.2
zstandard             : 0.23.0
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None

zbs added the Bug and Needs Triage labels Apr 23, 2025

@arthurlw
Contributor

arthurlw commented Apr 23, 2025

Confirmed on main. Assignment still silently succeeds when values of a differing dtype are assigned to a column via .loc (in the example above, strings assigned to a datetime64 column).

It seems to me that this should be raising an error, consistent with the behavior introduced for other dtype mismatches (e.g., int64 ← str, which now raises a LossySetitemError when assigning with .loc).

@rhshadrach
Member

rhshadrach commented Apr 24, 2025

This may be well known, but just in case, df['bar'] = df.loc[:, 'bar'].dt.strftime('%Y%m%d') gives the desired behavior for the OP and is what should be used when you want to overwrite a column with a (possibly) different dtype.
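
A minimal sketch of the difference described above, using the frame from the original report. The names `as_text`, `replaced`, and `in_place` are illustrative only, and the `.loc` result reflects the behavior reported in this thread for pandas 2.2.3, so it may differ on other versions.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["2025-04-23", "2025-04-22"]})
df["bar"] = pd.to_datetime(df["foo"], format="%Y-%m-%d")

# Compute the strings once so both assignments receive the same values.
as_text = df["bar"].dt.strftime("%Y%m%d")

# df[col] = ... swaps in a new column, so the string values are kept as strings.
replaced = df.copy()
replaced["bar"] = as_text

# df.loc[:, col] = ... writes into the existing datetime64 column; per this
# thread, the strings are parsed back into datetimes rather than stored as text.
in_place = df.copy()
in_place.loc[:, "bar"] = as_text

print(replaced["bar"].tolist())
print(replaced.dtypes["bar"], in_place.dtypes["bar"])
```

Comparing the two printed dtypes is a quick way to check which behavior a given pandas version applies.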

> It seems to me that this should be raising an error, consistent with the behavior introduced for other dtype mismatches

This isn't so clear to me, e.g.

df = pd.DataFrame({"a": [1.0, 2.5, 3.0]})
df.loc[:, "a"] = 5
print(df)
#      a
# 0  5.0
# 1  5.0
# 2  5.0

Should this raise? I personally think the answer there is no. But I'm not sure we ever made any decisions on which implicit conversions should and should not be allowed. This is somewhat related to PDEP-6.
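
For context, PDEP-6 deals with setitem-like operations whose value does not fit the existing dtype. The sketch below probes two cases: the lossless int-into-float assignment shown above, and a string assigned into an int64 Series. The helper `try_set` is hypothetical and not part of pandas. The outcome of the second case depends on the installed pandas version (it may upcast silently, warn, or raise), so no output is claimed for it.

```python
import warnings

import pandas as pd


def try_set(series, value):
    """Assign `value` to the first row via .loc and describe what happened."""
    s = series.copy()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            s.loc[0] = value
        except (TypeError, ValueError) as exc:
            return f"raised {type(exc).__name__}"
    if caught:
        return f"warned {caught[0].category.__name__}"
    return f"ok, dtype={s.dtype}"


floats = pd.Series([1.0, 2.5, 3.0])
ints = pd.Series([1, 2, 3])

print(try_set(floats, 5))   # 5 is representable as a float, so the dtype is kept
print(try_set(ints, "x"))   # incompatible value; result is version-dependent
```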

@rhshadrach
Member

cc @pandas-dev/pandas-core

rhshadrach added the Needs Discussion and Dtype Conversions labels and removed the Needs Triage label Apr 24, 2025

@Dr-Irv
Contributor

Dr-Irv commented Apr 24, 2025

I don't think that is what is going on here. It's not about incompatible types not being recognized. It's about the automatic conversion that is applied when strings that look like formatted datetimes are assigned to a Series that has datetime64 dtype. With these statements:

df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')
df.loc[:, 'bar'] = df.loc[:, 'bar'].dt.strftime('%Y%m%d')

the first statement sets the dtype of "bar" to datetime64. In the second statement, the expression df.loc[:, 'bar'].dt.strftime('%Y%m%d') has object dtype: it is a set of strings. But because it is being assigned to a column with datetime64 dtype, we first try to parse the strings to see whether they are valid dates. They are, so the dtype stays datetime64.
For example:

>>> df.loc[:, "bar"] = ["290102", "300304"]
>>> df
          foo        bar
0  2025-04-23 2002-01-29
1  2025-04-22 2004-03-30

I'm not sure if we want to change the behavior in this case. If .loc is used to change values in a column with datetime64 dtype, the ability to parse a string is useful as it lets you fix individual values (or selected rows) without having to parse the strings into dates.

On the other hand, as shown in the example, if a user did something like that, it is unclear whether they wanted the dates parsed as YYMMDD or DDMMYY. So maybe we should warn when the format is ambiguous?
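
One way to sidestep the ambiguity, sketched here as an assumption rather than a recommendation from the maintainers: parse the strings explicitly with `pd.to_datetime` and a `format`, then assign the resulting datetimes via .loc. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["2025-04-23", "2025-04-22"]})
df["bar"] = pd.to_datetime(df["foo"], format="%Y-%m-%d")

raw = ["290102", "300304"]

# The same strings read two ways: day-month-year vs. year-month-day.
as_ddmmyy = pd.to_datetime(raw, format="%d%m%y")
as_yymmdd = pd.to_datetime(raw, format="%y%m%d")

# Assign already-parsed datetimes so .loc has no strings left to interpret.
df.loc[:, "bar"] = as_yymmdd

print(as_ddmmyy.strftime("%Y-%m-%d").tolist())
print(df)
```

With `%d%m%y` the values match the output shown in the example above (2002-01-29 and 2004-03-30); with `%y%m%d` they become 2029-01-02 and 2030-03-04.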

@MarcoGorelli
Member

Yup, looks like it's going down the mixed formats path (🙀 )

In [8]: df = pd.DataFrame({'foo': ['2025-04-23', '2025-04-22']}); df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')

In [9]: df.loc[:, 'bar'] = ['12/01/2020', '13/01/2020']

In [10]: df
Out[10]:
          foo        bar
0  2025-04-23 2020-12-01
1  2025-04-22 2020-01-13
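
For comparison, a short sketch of what a single explicit format does with the same two strings when they are parsed directly with `pd.to_datetime` rather than through .loc. The names `day_first` and `month_first_error` are illustrative.

```python
import pandas as pd

raw = ["12/01/2020", "13/01/2020"]

# A day-first format fits both strings.
day_first = pd.to_datetime(raw, format="%d/%m/%Y")
print(day_first.strftime("%Y-%m-%d").tolist())

# A month-first format cannot fit "13/01/2020" (there is no month 13),
# so parsing fails instead of switching format mid-column.
try:
    pd.to_datetime(raw, format="%m/%d/%Y")
    month_first_error = None
except ValueError as exc:
    month_first_error = exc

print(type(month_first_error).__name__)
```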

@simonjayhawkins
Member

@Dr-Irv, thanks for the detailed explanation. From what you’ve described, it appears that the automatic conversion of string-formatted datetime values when assigned via .loc is intentional and, in many cases, desirable for allowing flexible value updates in datetime columns.

Given this understanding, it seems the current behavior isn’t a bug per se but a design choice. If the consensus among maintainers and the community is that this behavior should remain as is, then it might make sense to close this issue.

However, if there’s broader interest in re-evaluating the behavior—perhaps to introduce warnings or alternative handling for ambiguous string formats—it could be worthwhile to change the title with a view to moving the discussion towards a new enhancement proposal.

@MarcoGorelli
Member

> Given this understanding, it seems the current behavior isn’t a bug per se but a design choice

Sure, but it would probably be in line with PDEP-4 for the parsing to be consistent rather than changing format mid-column?

@simonjayhawkins
Member

@MarcoGorelli Is your sample in #61346 (comment) a bug not already covered by other open issues?

@MarcoGorelli
Member

I couldn't find one, so I've made a new one: #61353

With that, I agree that what's described here is intentional and not a bug 👍

rhshadrach added the Closing Candidate label and removed the Needs Discussion label Apr 25, 2025

@simonjayhawkins
Member

> I've made a new one: #61353

thanks @MarcoGorelli

@zbs
Author

zbs commented Apr 26, 2025

This discussion has been quite illuminating: thanks for all your responses. In the past I’ve avoided using df[col] because pandas will complain if df is a view; instead I use df.loc wherever possible. Given that the above suggests using df[col], is that warning no longer valid?


6 participants