Add warning to `.groupby` when null keys would be dropped due to default `dropna` #61351

tehunter · 2025-04-24T21:01:55Z

closes ENH: Add warning when DataFrame.groupby drops NA keys #61339
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

TODO:

Compare the performance of the approaches for checking the codes (`codes.min()` was about 3x faster)
Run full test suite to ensure nothing broke
Add tests/implementation for .pivot_table/.stack/etc. (possibly in a follow-up PR?)
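
For context on the first TODO item, here is a rough sketch of two equivalent ways to detect a null key from group codes. It assumes nulls are marked with the sentinel code `-1`, and it uses plain Python lists as a stand-in for the NumPy code arrays the grouper would actually hold. The function names are hypothetical and nothing here measures the timing difference noted above.

```python
from typing import Sequence


def has_null_code_min(codes: Sequence[int]) -> bool:
    """Detect a null key with a single min() reduction.

    Assumes -1 is the only negative code; the guard avoids calling
    min() on an empty sequence, which would raise ValueError.
    """
    return len(codes) > 0 and min(codes) < 0


def has_null_code_any(codes: Sequence[int]) -> bool:
    """Detect a null key by comparing each code to the -1 sentinel."""
    return any(code == -1 for code in codes)


codes_with_null = [0, 1, -1, 0]      # third row has a null key
codes_without_null = [0, 1, 2, 0]    # every row has a valid key
```

Both checks agree on these inputs; which one is faster on real code arrays is what the TODO item is meant to settle.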

tehunter · 2025-04-24T21:02:31Z

pandas/core/groupby/grouper.py

+_NULL_KEY_MESSAGE = (
+    "`dropna` is not specified but grouper encountered null group keys. These keys "
+    "will be dropped from the result by default. To keep null keys, set `dropna=False`, "
+    "or to hide this warning and drop null keys, set `dropna=True`."
+)


Is this a standard approach for a warning message that could be hit from two lines of code?

tehunter · 2025-04-24T21:04:31Z

pandas/core/groupby/groupby.py

+    @property
+    def dropna(self) -> bool:
+        if self._dropna is lib.no_default:
+            return True
+        return self._dropna


I know the implementation is trivial, but this is redundant with Grouper. I'm not sure we can get around it while still being a class property, but should the default value be referenced as a constant defined just once?
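
To make the question concrete, here is a minimal sketch of defining the default once and resolving the sentinel through a shared helper. The sentinel enum, `_DROPNA_DEFAULT`, `resolve_dropna`, and both class names are hypothetical stand-ins, not the actual pandas classes or `lib.no_default`.

```python
import enum


class _NoDefault(enum.Enum):
    """Stand-in sentinel, playing the role of lib.no_default."""

    no_default = "NO_DEFAULT"


no_default = _NoDefault.no_default

# Hypothetical constant: the effective default lives in exactly one place.
_DROPNA_DEFAULT = True


def resolve_dropna(value) -> bool:
    """Map the sentinel to the shared default; pass explicit values through."""
    return _DROPNA_DEFAULT if value is no_default else value


class SketchGrouper:
    def __init__(self, dropna=no_default):
        self._dropna = dropna

    @property
    def dropna(self) -> bool:
        return resolve_dropna(self._dropna)

    @property
    def dropna_was_specified(self) -> bool:
        # Lets warning logic tell "defaulted" apart from "explicitly set".
        return self._dropna is not no_default


class SketchGroupBy:
    def __init__(self, dropna=no_default):
        self._dropna = dropna

    @property
    def dropna(self) -> bool:
        return resolve_dropna(self._dropna)
```

Each class still exposes its own property, but a later change to the default would touch only the constant.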

tehunter added 7 commits April 24, 2025 18:43

add NullKeyWarning

1705532

Add tests

bfa5846

fix index and Series tests

992eaff

add multi-index and categorical tests

d1c5053

implement dropna null key warning

47cabb2

add test for Series.groupby

0bf986e

implement for Series.groupby

cee2378

tehunter added 3 commits April 25, 2025 13:50

add mode.null_grouper_warning option

41131a1

fix tests which trigger NullKeyWarning

692c153

this will help with PDEP-11 (pandas-dev#53094) as an intermediate step to identify tests that will fail under the default value

resolve repr change and empty grouper bug

9822a66
