-
Notifications
You must be signed in to change notification settings - Fork 5k
Fix last issue with GC bridge comparison and enable all tests #115049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Tagging subscribers to this area: @BrzVlad |
@@ -654,6 +654,7 @@ static void | |||
reset_data (void) | |||
{ | |||
dyn_array_ptr_empty (®istered_bridges); | |||
free_data (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure [yet] why is this needed. There are cases where the data were not freed and processing_stw_step
started with non-empty hash_table
still referencing objects from the previous pass. It's not immediately obvious whether this happens only for the bridge comparison tests or whether it can happen for real-life scenarios too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tarjan bridge frees in processing_after_callback
. Seems like it makes more sense to have the cleanup there for consistency. But I'm also curious to understand how we come to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's actually a very reasonable explanation in the comments:
runtime/src/mono/mono/sgen/sgen-gc.c
Lines 1179 to 1191 in 3eef331
/* | |
Do the first bridge step here, as the collector liveness state will become useless after that. | |
An important optimization is to only process the possibly dead part of the object graph and skip | |
over all live objects as we transitively know everything they point must be alive too. | |
The above invariant is completely wrong if we let the gray queue be drained and mark/copy everything. | |
This has the unfortunate side effect of making overflow collections perform the first step twice, but | |
given we now have heuristics that perform major GC in anticipation of minor overflows this should not | |
be a big deal. | |
*/ | |
sgen_client_bridge_processing_stw_step (); |
runtime/src/mono/mono/sgen/sgen-gc.c
Lines 2671 to 2685 in 3eef331
if (overflow_generation_to_collect != -1) { | |
SGEN_ASSERT (0, !sgen_concurrent_collection_in_progress, "We don't yet support overflow collections with the concurrent collector"); | |
/* | |
* We need to do an overflow collection, either because we ran out of memory | |
* or the nursery is fully pinned. | |
*/ | |
if (overflow_generation_to_collect == GENERATION_NURSERY) | |
collect_nursery (overflow_reason, TRUE); | |
else | |
major_do_collection (overflow_reason, TRUE, forced_serial); | |
oldest_generation_collected = MAX (oldest_generation_collected, overflow_generation_to_collect); | |
} |
This means it's an actual product bug and not just testing bug. Under the conditions described in the comment the processing_stw_step
will get called twice and the results of the first call need to be thrown away because they may no longer point to valid memory. It also requires explicit GC.Collect(1)
call to trigger if I understand it correctly.
Fixes an issue in the "new" GC bridge when overflow collections happen in GC. This can happen when a concurrent GC is triggered and the user code calls
GC.Collect
. In this case the bridge work is restarted andprocessing_stw_step
gets called twice. The state between these two calls was not cleared properly which resulted in accessing dead objects.Additionally, fixing this issue unblocked all the remaining GC bridge tests which were previously crashing or producing different results than the (default) Tarjan bridge.