    Fix race in Parallel Hash Join batch cleanup. · 3b8981b6
    Thomas Munro authored
    With very unlucky timing and parallel_leader_participation off, PHJ
    could attempt to access per-batch state just as it was being freed.
    There was code intended to prevent that by checking for a cleared
    pointer, but it was buggy.
    
    Fix by introducing an extra barrier phase.  The new phase
    PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to
    find a batch to help with, and PHJ_BUILD_DONE means that it is too late.
    The last to detach will free the array of per-batch state as before, but
    now it will also atomically advance the phase at the same time, so that
    late attachers can avoid the hazard, without the data race.  This
    mirrors the way per-batch hash tables are freed (see phases
    PHJ_BATCH_PROBING and PHJ_BATCH_DONE).
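
The idea described above can be illustrated with a small standalone sketch. This is not PostgreSQL's Barrier API or the code in nodeHashjoin.c: the names ToyBuildBarrier, toy_init, toy_attach and toy_detach are hypothetical, and a pthread mutex stands in for whatever synchronization the real barrier uses. The sketch only shows the key property: the last participant to detach advances the phase in the same critical section in which the participant count reaches zero, so a late attacher sees either "running" (state still valid) or "done" (hands off), never a half-freed state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical phases, modelled on PHJ_BUILD_RUNNING / PHJ_BUILD_DONE. */
typedef enum { TOY_BUILD_RUNNING, TOY_BUILD_DONE } ToyPhase;

typedef struct
{
    pthread_mutex_t lock;
    ToyPhase    phase;
    int         participants;
    int        *batches;        /* stand-in for the per-batch state array */
} ToyBuildBarrier;

void
toy_init(ToyBuildBarrier *b, int nbatches)
{
    pthread_mutex_init(&b->lock, NULL);
    b->phase = TOY_BUILD_RUNNING;
    b->participants = 0;
    b->batches = calloc((size_t) nbatches, sizeof(int));
}

/*
 * Try to attach.  Returns true if the per-batch state may be accessed,
 * false if we arrived too late and must not touch it.
 */
bool
toy_attach(ToyBuildBarrier *b)
{
    bool        ok;

    pthread_mutex_lock(&b->lock);
    ok = (b->phase == TOY_BUILD_RUNNING);
    if (ok)
        b->participants++;
    pthread_mutex_unlock(&b->lock);
    return ok;
}

/*
 * Detach.  The last participant advances the phase and frees the shared
 * array while still holding the lock, so no attacher can slip in between
 * "count reached zero" and "phase is DONE".  Returns true if we were last.
 */
bool
toy_detach(ToyBuildBarrier *b)
{
    bool        last;

    pthread_mutex_lock(&b->lock);
    assert(b->participants > 0);
    last = (--b->participants == 0);
    if (last)
    {
        b->phase = TOY_BUILD_DONE;
        free(b->batches);
        b->batches = NULL;
    }
    pthread_mutex_unlock(&b->lock);
    return last;
}
```

Compared with checking for a cleared pointer, the decision here is made on the phase, and the phase changes together with the participant count, which is the property the commit message relies on.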
    
    Revealed by a one-off build farm failure, where BarrierAttach() failed a
    sanity check assertion, because the memory had been clobbered by
    dsa_free().
    
    Back-patch to 11, where the code arrived.
    Reported-by: Michael Paquier <michael@paquier.xyz>
    Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
nodeHashjoin.c 47.2 KB