• Andres Freund's avatar
    Add a macro templatized hashtable. · b30d3ea8
    Andres Freund authored
    dynahash.c hash tables aren't quite fast enough for some
    use-cases. There are several reasons for lacking performance:
    - the use of chaining for collision handling makes them cache
      inefficient, that's especially an issue when the tables get bigger.
    - as the element sizes for dynahash are only determined at runtime,
      offset computations are somewhat expensive
    - hash and element comparisons are indirect function calls, causing
      unnecessary pipeline stalls
    - it's two level structure has some benefits (somewhat natural
      partitioning), but increases the number of indirections
    to fix several of these the hash tables have to be adjusted to the
    individual use-case at compile-time. C unfortunately doesn't provide a
    good way to do compile code generation (like e.g. c++'s templates for
    all their weaknesses do).  Thus the somewhat ugly approach taken here is
    to allow for code generation using a macro-templatized header file,
    which generates functions and types based on a prefix and other
    parameters.
    
    Later patches use this infrastructure to use such hash tables for
    tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation,
    ...). In queries where these use up a large fraction of the time, this
    has been measured to lead to performance improvements of over 100%.
    
    There are other cases where this could be useful (e.g. catcache.c).
    
    The hash table design chosen is a variant of linear open-addressing. The
    biggest disadvantage of simple linear addressing schemes are highly
    variable lookup times due to clustering, and deletions leaving a lot of
    tombstones around.  To address these issues a variant of "robin hood"
    hashing is employed.  Robin hood hashing optimizes chaining lengths by
    moving elements close to their optimal bucket ("rich" elements), out of
    the way if a to-be-inserted element is further away from its optimal
    position (i.e. it's "poor").  While that can make insertions slower, the
    average lookup performance is a lot better, and higher fill factors can
    be used in a still performant manner.  To avoid tombstones - which
    normally solve the issue that a deleted node's presence is relevant to
    determine whether a lookup needs to continue looking or is done -
    buckets following a deleted element are shifted backwards, unless
    they're empty or already at their optimal position.
    
    There's further possible improvements that can be made to this
    implementation. Amongst others:
    - Use distance as a termination criteria during searches. This is
      generally a good idea, but I've been able to see the overhead of
      distance calculations in some cases.
    - Consider combining the 'empty' status into the hashvalue, and enforce
      storing the hashvalue. That could, in some cases, increase memory
      density and remove a few instructions.
    - Experiment further with the, very conservatively choosen, fillfactor.
    - Make maximum size of hashtable configurable, to allow storing very
      very large tables. That'd require 64bit hash values to be more common
      than now, though.
    - some smaller memcpy calls could be optimized to copy larger chunks
    But since the new implementation is already considerably faster than
    dynahash it seem sensible to start using it.
    
    Reviewed-By: Tomas Vondra
    Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
    b30d3ea8
simplehash.h 22.6 KB