• Andres Freund's avatar
    Make simplehash.h grow hashtable in additional cases. · d4c62a6b
    Andres Freund authored
    Increase the size when either the distance between actual and optimal
    slot grows too large, or when too many subsequent entries would have
    to be moved.
    
    This addresses reports that the simplehash performed, sometimes
    considerably, worse than dynahash.
    
    The reason turned out to be that insertions into the hashtable where,
    due to the use of parallel query, in effect done from another
    hashtable, in hash-value order.  If the target hashtable, due to
    mis-estimation, was sized a lot smaller than the source table(s) that
    lead to very imbalanced tables; a lot of entries in many close-by
    buckets from the source tables were inserted into a single, wider,
    bucket on the target table.  As the growth factor was solely computed
    based on the fillfactor, the performance of the table decreased
    further and further.
    
    b81b5a96 was an attempt to address this problem for hash
    aggregates (but not for bitmap scans), but it turns out that the
    current method of mixing hash values often actually leaves neighboring
    hash-values close to each other, just in different value range.  It
    might be worth revisiting that independently of the performance issues
    addressed in this patch..
    
    To address that problem resize tables in two additional cases: Firstly
    when the optimal position for an entry would be far from the actual
    position, secondly when many entries would have to be moved to make
    space for the new entry (while satisfying the robin hood property).
    
    Due to the additional resizing threshold it seems possible, and
    testing confirms that so far, that a higher fillfactor doesn't hurt
    performance and saves a bit of memory.  It seems better to increase it
    now, before a release containing any of this code, rather than wonder
    in some later release.
    
    The various boundaries aren't determined in a particularly scientific
    manner, they might need some fine-tuning.
    
    In all my tests the new code now, even with parallelism, performs at
    least as good as the old code, in several scenarios significantly
    better.
    
    Reported-By: Dilip Kumar, Robert Haas, Kuntal Ghosh
    Discussion:
        https://postgr.es/m/CAFiTN-vagvuAydKG9VnWcoK=ADAhxmOa4ZTrmNsViBBooTnriQ@mail.gmail.com
        https://postgr.es/m/CAGz5QC+=fNTYgzMLTBUNeKt6uaWZFXJbkB5+7oWm-n9DwVxcLA@mail.gmail.com
    d4c62a6b
simplehash.h 24.9 KB