Skip to content

Fix last issue with GC bridge comparison and enable all tests #115049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

filipnavara
Copy link
Member

@filipnavara filipnavara commented Apr 25, 2025

Fixes an issue in the "new" GC bridge when overflow collections happen in GC. This can happen when a concurrent GC is triggered and the user code calls GC.Collect. In this case the bridge work is restarted and processing_stw_step gets called twice. The state between these two calls was not cleared properly which resulted in accessing dead objects.

Additionally, fixing this issue unblocked all the remaining GC bridge tests which were previously crashing or producing different results than the (default) Tarjan bridge.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 25, 2025
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad
See info in area-owners.md if you want to be subscribed.

@@ -654,6 +654,7 @@ static void
reset_data (void)
{
dyn_array_ptr_empty (&registered_bridges);
free_data ();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure [yet] why is this needed. There are cases where the data were not freed and processing_stw_step started with non-empty hash_table still referencing objects from the previous pass. It's not immediately obvious whether this happens only for the bridge comparison tests or whether it can happen for real-life scenarios too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tarjan bridge frees in processing_after_callback. Seems like it makes more sense to have the cleanup there for consistency. But I'm also curious to understand how we come to this.

Copy link
Member Author

@filipnavara filipnavara Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's actually a very reasonable explanation in the comments:

/*
Do the first bridge step here, as the collector liveness state will become useless after that.
An important optimization is to only process the possibly dead part of the object graph and skip
over all live objects as we transitively know everything they point must be alive too.
The above invariant is completely wrong if we let the gray queue be drained and mark/copy everything.
This has the unfortunate side effect of making overflow collections perform the first step twice, but
given we now have heuristics that perform major GC in anticipation of minor overflows this should not
be a big deal.
*/
sgen_client_bridge_processing_stw_step ();

if (overflow_generation_to_collect != -1) {
SGEN_ASSERT (0, !sgen_concurrent_collection_in_progress, "We don't yet support overflow collections with the concurrent collector");
/*
* We need to do an overflow collection, either because we ran out of memory
* or the nursery is fully pinned.
*/
if (overflow_generation_to_collect == GENERATION_NURSERY)
collect_nursery (overflow_reason, TRUE);
else
major_do_collection (overflow_reason, TRUE, forced_serial);
oldest_generation_collected = MAX (oldest_generation_collected, overflow_generation_to_collect);
}

This means it's an actual product bug and not just testing bug. Under the conditions described in the comment the processing_stw_step will get called twice and the results of the first call need to be thrown away because they may no longer point to valid memory. It also requires explicit GC.Collect(1) call to trigger if I understand it correctly.

@filipnavara filipnavara marked this pull request as ready for review April 25, 2025 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-GC-mono community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants