    Log when GetNewOidWithIndex() fails to find unused OID many times. · 7fbcee1b
    Fujii Masao authored
    GetNewOidWithIndex() generates new OIDs one by one until it finds
    one not in the relation. If there are very long runs of consecutive
    existing OIDs, GetNewOidWithIndex() needs to iterate many times
    in the loop to find an unused OID. Since a TOAST table can have a
    large number of entries and there can be such long runs of OIDs,
    it can take a great many iterations to find a new OID not in the
    TOAST table. Furthermore, if all (i.e., 2^32) OIDs are already used,
    GetNewOidWithIndex() enters something like a busy loop and repeats
    the iterations until at least one OID is marked as unused.
    
    There are some reported problems caused by a large number of
    iterations in GetNewOidWithIndex(). For example, when inserting
    a billion records into a table, all the backends doing that
    insertion hung with 100% CPU usage at some point.
    
    Previously there was no easy way to detect that GetNewOidWithIndex()
    had failed to find an unused OID many times. So, for example, a full
    gdb backtrace of the hung backends had to be taken in order to
    investigate the problem. This is inconvenient, and gdb may not be
    available in some production environments.
    
    To make this easy to detect, this commit makes GetNewOidWithIndex()
    log when it has iterated more than GETNEWOID_LOG_THRESHOLD times but
    has not yet found an OID unused in the relation. This commit also
    makes it repeat the logging at exponentially increasing intervals
    until it has iterated more than GETNEWOID_LOG_MAX_INTERVAL times,
    and thereafter repeat the logging every GETNEWOID_LOG_MAX_INTERVAL
    iterations until an unused OID is found. These macros are used so
    that the server log does not fill up with similar messages.
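
    The logging schedule described above can be sketched roughly as
    follows. This is only an illustration, not the code in catalog.c:
    the macro values are made-up small numbers, and next_log_point()
    and count_log_messages() are hypothetical helpers that simulate
    failed probes instead of scanning an index.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the commit message does not state the real ones. */
#define GETNEWOID_LOG_THRESHOLD     1000
#define GETNEWOID_LOG_MAX_INTERVAL  128000

/*
 * Hypothetical helper: given the retry count at which a message was just
 * logged, return the retry count at which the next message is due.
 * The interval doubles while the doubled value stays within the maximum
 * interval, and grows by a fixed GETNEWOID_LOG_MAX_INTERVAL afterwards.
 */
uint64_t
next_log_point(uint64_t retries_before_log)
{
    assert(retries_before_log > 0);

    if (retries_before_log * 2 <= GETNEWOID_LOG_MAX_INTERVAL)
        return retries_before_log * 2;
    return retries_before_log + GETNEWOID_LOG_MAX_INTERVAL;
}

/*
 * Hypothetical helper: simulate total_retries consecutive failed probes
 * and return how many log messages this schedule would have emitted.
 */
int
count_log_messages(uint64_t total_retries)
{
    uint64_t    retries_before_log = GETNEWOID_LOG_THRESHOLD;
    uint64_t    retries;
    int         nlogs = 0;

    for (retries = 1; retries <= total_retries; retries++)
    {
        if (retries >= retries_before_log)
        {
            /* A real implementation would emit a LOG message here. */
            nlogs++;
            retries_before_log = next_log_point(retries_before_log);
        }
    }
    return nlogs;
}
```

    With these illustrative values, messages would be due after 1000,
    2000, 4000, ..., 128000 retries, and then after every further
    128000 retries (256000, 384000, ...).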
    
    In the discussion on pgsql-hackers, there was another idea: report
    the large number of iterations in GetNewOidWithIndex() via a wait
    event. But GetNewOidWithIndex() traverses indexes to find an unused
    OID, which will do I/O, acquire locks, etc., and those operations
    will overwrite the wait event and reset it to nothing once done.
    So that idea doesn't work well, and we didn't adopt it.
    
    Author: Tomohiro Hiramitsu
    Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao
    Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org
catalog.c 18.1 KB