• Tom Lane's avatar
    Fix assorted issues in parallel vacuumdb. · 94173d3e
    Tom Lane authored
    Avoid storing the result of PQsocket() in a pgsocket variable; it's
    declared as int, and the no-socket test is properly written as "x < 0"
    not "x == PGINVALID_SOCKET".  This accidentally had no bad effect
    because we never got to init_slot() with a bad connection, but it's
    still wrong.
    
    Actually, it seems like we should avoid storing the result for a long
    period at all.  The function's not so expensive that it's worth avoiding,
    and the existing coding technique here would fail if anyone tried to
    PQreset the connection during the life of the program.  Hence, just
    re-call PQsocket every time we construct a select(2) mask.
    
    Speaking of select(), GetIdleSlot imagined that it could compute the
    select mask once and continue to use it over multiple calls to
    select_loop(), which is pretty bogus since that would stomp on the
    mask on return.  This could only matter if the function's outer loop
    iterated more than once, which is unlikely (it'd take some connection
    receiving data, but not enough to complete its command).  But if it
    did happen, we'd acquire "tunnel vision" and stop watching the other
    connections for query termination, with the effect of losing parallelism.
    
    Another way in which GetIdleSlot could lose parallelism is that once
    PQisBusy returns false, it would lock in on that connection and do
    PQgetResult until that returns NULL; in some cases that could result
    in blocking.  (Perhaps this can never happen in vacuumdb due to the
    limited set of commands that it can issue, but I'm not quite sure
    of that, and even if true today it's not a future-proof assumption.)
    Refactor the code to do that properly, so that it risks blocking in
    PQgetResult only in cases where we need to wait anyway.
    
    Another loss-of-parallelism problem, which *is* easily demonstrable,
    is that any setup queries issued during prepare_vacuum_command() were
    always issued on the last-to-be-created connection, whether or not
    that was idle.  Long-running operations on that connection thus
    prevented issuance of additional operations on the other ones, except
    in the limited cases where no preparatory query was needed.  Instead,
    wait till we've identified a free connection and use that one.
    
    Also, avoid core dump due to undersized malloc request in the case
    that no tables are identified to be vacuumed.
    
    The bogus no-socket test was noted by CharSyam, the other problems
    identified in my own code review.  Back-patch to 9.5 where parallel
    vacuumdb was introduced.
    
    Discussion: https://postgr.es/m/CAMrLSE6etb33-192DTEUGkV-TsvEcxtBDxGWG1tgNOMnQHwgDA@mail.gmail.com
    94173d3e
vacuumdb.c 25.3 KB