• Tom Lane's avatar
    Account explicitly for long-lived FDs that are allocated outside fd.c. · 3d475515
    Tom Lane authored
    The comments in fd.c have long claimed that all file allocations should
    go through that module, but in reality that's not always practical.
    fd.c doesn't supply APIs for invoking some FD-producing syscalls like
    pipe() or epoll_create(); and the APIs it does supply for non-virtual
    FDs are mostly insistent on releasing those FDs at transaction end;
    and in some cases the actual open() call is in code that can't be made
    to use fd.c, such as libpq.
    
    This has led to a situation where, in a modern server, there are likely
    to be seven or so long-lived FDs per backend process that are not known
    to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
    few spare FDs if max_files_per_process is >= the system ulimit and
    fd.c had opened all the files it thought it safely could.  The
    contrib/postgres_fdw regression test, in particular, could easily be
    made to fall over by running it under a restrictive ulimit.
    
    To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
    that allow outside callers to tell fd.c that they have or want to allocate
    a FD that's not directly managed by fd.c.  Add calls to track all the
    fixed FDs in a standard backend session, so that we are honestly
    guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
    limit in a backend's idle state.  The coding rules for these functions say
    that there's no need to call them in code that just allocates one FD over
    a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
    That means that there aren't all that many places where we need to worry.
    But postgres_fdw and dblink must use this facility to account for
    long-lived FDs consumed by libpq connections.  There may be other places
    where it's worth doing such accounting, too, but this seems like enough
    to solve the immediate problem.
    
    Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
    (Callers can choose to ignore this limit, but of course it's unwise
    to do so except for fixed file allocations.)  I also reduced the limit
    on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
    Conceivably a smarter rule could be used here --- but in practice,
    on reasonable systems, max_safe_fds should be large enough that this
    isn't much of an issue, so KISS for now.  To avoid possible regression
    in the number of external or allocated files that can be opened,
    increase FD_MINFREE and the lower limit on max_files_per_process a
    little bit; we now insist that the effective "ulimit -n" be at least 64.
    
    This seems like pretty clearly a bug fix, but in view of the lack of
    field complaints, I'll refrain from risking a back-patch.
    
    Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org
    3d475515
xlog.c 385 KB