1. 24 Feb, 2020 8 commits
    • expression eval: Reduce number of steps for agg transition invocations. · 2742c450
      Andres Freund authored
      Do so by combining the various steps that are part of aggregate
      transition function invocation into one larger step. As some of the
      current steps are only necessary for some aggregates, have one variant
      of the aggregate transition step for each possible combination.
      
      To avoid further manual copies of code in the different transition
      step implementations, move most of the code into helper functions
      marked as "always inline".
      
      The benefit of this change is an increase in performance when
      aggregating lots of rows. This comes partly from fewer indirect
      jumps, thanks to the reduced number of steps, and partly from less
      redundant setup code across steps. This mainly benefits
      interpreted execution, but the code generated by JIT is also improved
      a bit.
      
      As a nice side-effect it also ends up making the code a bit simpler.
      
      A small additional optimization is removing the need to set
      aggstate->curaggcontext before calling ExecAggInitGroup, instead
      passing curaggcontext as an argument. Unlike in other
      aggregate-related functions, it was only needed there to fetch a
      memory context to copy the transition value into.
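
      As a rough standalone illustration of the pattern (shared transition
      logic in an always-inline helper, plus one thin specialized step per
      flag combination so the compiler discards the unused branches), the
      sketch below uses hypothetical names and a plain GCC/Clang attribute
      rather than the backend's portability macro; it is not the actual
      executor code:

          #include <stdint.h>
          #include <stdio.h>

          /* Shared transition logic; constant flags are folded away per variant. */
          static inline __attribute__((always_inline)) void
          agg_trans(int64_t *state, int64_t value, int check_null, int by_value)
          {
              if (check_null && value == 0)   /* stand-in for a strict NULL check */
                  return;
              if (by_value)
                  *state += value;            /* by-value transition */
              else
                  *state += value * 2;        /* stand-in for by-reference handling */
          }

          /* One specialized "step" per combination, as described above. */
          static void step_byval(int64_t *state, int64_t v)        { agg_trans(state, v, 0, 1); }
          static void step_byval_strict(int64_t *state, int64_t v) { agg_trans(state, v, 1, 1); }

          int
          main(void)
          {
              int64_t state = 0;

              step_byval(&state, 3);
              step_byval_strict(&state, 0);   /* skipped by the strict variant */
              printf("%lld\n", (long long) state);
              return 0;
          }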
      
      Author: Andres Freund
      Discussion:
         https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
         https://postgr.es/m/5c371df7cee903e8cd4c685f90c6c72086d3a2dc.camel@j-davis.com
    • Properly issue WAL record for CID of first catalog tuple in multi-insert · 7d672b76
      Michael Paquier authored
      Multi-insert for heap is not yet used actively for catalogs, but the
      code to support this case is in place for logical decoding.  The
      existing code forgot to issue an XLOG_HEAP2_NEW_CID record for the first
      tuple inserted, leading to failures when attempting to use multiple
      inserts for catalogs at decoding time.  This commit fixes the problem by
      WAL-logging the needed CID.
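
      A toy sketch of the corrected shape, with hypothetical names standing in
      for the real heap-insert and WAL-logging routines: when logical decoding
      needs command IDs, the per-tuple CID record is emitted for every tuple
      in the batch, including the first one (the case the original code
      missed).

          #include <stdbool.h>
          #include <stdio.h>

          /* Hypothetical stand-in for the real WAL-logging call. */
          static void
          log_new_cid(int tuple_index)
          {
              printf("NEW_CID record for tuple %d\n", tuple_index);
          }

          static void
          multi_insert(int ntuples, bool need_cids)
          {
              for (int i = 0; i < ntuples; i++)
              {
                  /* ... place tuple i on the heap page ... */
                  if (need_cids)
                      log_new_cid(i);     /* covers i == 0 as well */
              }
          }

          int
          main(void)
          {
              multi_insert(3, true);
              return 0;
          }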
      
      This is not an active bug, so no back-patch is done.
      
      Author: Daniel Gustafsson
      Discussion: https://postgr.es/m/E0D4CC67-A1CF-4DF4-991D-B3AC2EB5FAE9@yesql.se
    • Account explicitly for long-lived FDs that are allocated outside fd.c. · 3d475515
      Tom Lane authored
      The comments in fd.c have long claimed that all file allocations should
      go through that module, but in reality that's not always practical.
      fd.c doesn't supply APIs for invoking some FD-producing syscalls like
      pipe() or epoll_create(); and the APIs it does supply for non-virtual
      FDs are mostly insistent on releasing those FDs at transaction end;
      and in some cases the actual open() call is in code that can't be made
      to use fd.c, such as libpq.
      
      This has led to a situation where, in a modern server, there are likely
      to be seven or so long-lived FDs per backend process that are not known
      to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
      few spare FDs if max_files_per_process is >= the system ulimit and
      fd.c had opened all the files it thought it safely could.  The
      contrib/postgres_fdw regression test, in particular, could easily be
      made to fall over by running it under a restrictive ulimit.
      
      To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
      that allow outside callers to tell fd.c that they have or want to allocate
      a FD that's not directly managed by fd.c.  Add calls to track all the
      fixed FDs in a standard backend session, so that we are honestly
      guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
      limit in a backend's idle state.  The coding rules for these functions say
      that there's no need to call them in code that just allocates one FD over
      a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
      That means that there aren't all that many places where we need to worry.
      But postgres_fdw and dblink must use this facility to account for
      long-lived FDs consumed by libpq connections.  There may be other places
      where it's worth doing such accounting, too, but this seems like enough
      to solve the immediate problem.
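
      A toy model of the accounting idea, not the fd.c implementation (the
      real entry points are the Acquire/Reserve/ReleaseExternalFD functions
      named above); names and constants below are illustrative only:

          #include <stdbool.h>
          #include <stdio.h>

          static int max_safe_fds = 90;      /* what fd.c probed at startup (example) */
          static int num_external_fds = 0;

          /* A long-lived FD not opened through fd.c must reserve a slot first... */
          static bool
          acquire_external_fd(void)
          {
              if (num_external_fds < max_safe_fds / 3)   /* limit described above */
              {
                  num_external_fds++;
                  return true;
              }
              return false;                  /* caller should report "too many FDs" */
          }

          /* ...and give the slot back when the descriptor is finally closed. */
          static void
          release_external_fd(void)
          {
              num_external_fds--;
          }

          int
          main(void)
          {
              if (acquire_external_fd())
              {
                  /* ... open a long-lived descriptor, e.g. a libpq connection ... */
                  release_external_fd();
              }
              return 0;
          }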
      
      Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
      (Callers can choose to ignore this limit, but of course it's unwise
      to do so except for fixed file allocations.)  I also reduced the limit
      on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
      Conceivably a smarter rule could be used here --- but in practice,
      on reasonable systems, max_safe_fds should be large enough that this
      isn't much of an issue, so KISS for now.  To avoid possible regression
      in the number of external or allocated files that can be opened,
      increase FD_MINFREE and the lower limit on max_files_per_process a
      little bit; we now insist that the effective "ulimit -n" be at least 64.
      
      This seems like pretty clearly a bug fix, but in view of the lack of
      field complaints, I'll refrain from risking a back-patch.
      
      Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org
    • Change client-side fsync_fname() to report errors fatally · 1420617b
      Peter Eisentraut authored
      Given all we have learned about fsync() error handling in the last few
      years, reporting an fsync() error non-fatally is not useful,
      unless you don't care much about the file, in which case you probably
      don't need to use fsync() in the first place.
      
      Change fsync_fname() and durable_rename() to exit(1) on fsync() errors
      other than those that we specifically chose to ignore.
      
      This affects initdb, pg_basebackup, pg_checksums, pg_dump, pg_dumpall,
      and pg_rewind.
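
      A minimal frontend-style sketch of the new behaviour, with a
      hypothetical helper name; the real code still tolerates the few errno
      values it deliberately chooses to ignore:

          #include <errno.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>

          /* fsync a file by name; any fsync() failure is treated as fatal. */
          static void
          fsync_or_die(const char *fname)
          {
              int fd = open(fname, O_RDONLY);

              if (fd < 0)
              {
                  fprintf(stderr, "could not open \"%s\": %s\n", fname, strerror(errno));
                  exit(1);
              }
              if (fsync(fd) != 0)
              {
                  fprintf(stderr, "could not fsync \"%s\": %s\n", fname, strerror(errno));
                  exit(1);                   /* the behaviour this commit adopts */
              }
              close(fd);
          }

          int
          main(int argc, char **argv)
          {
              if (argc > 1)
                  fsync_or_die(argv[1]);
              return 0;
          }
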
      Reviewed-by: Michael Paquier <michael@paquier.xyz>
      Discussion: https://www.postgresql.org/message-id/flat/d239d1bd-aef0-ca7c-dc0a-da14bdcf0392%402ndquadrant.com
    • Adapt hashfn.c and hashutils.h for frontend use. · a91e2fa9
      Robert Haas authored
      hash_any() and its various variants are defined to return Datum,
      which is a backend-only concept, but the underlying functions
      actually want to return uint32 and uint64, and only return Datum
      because it's convenient for callers who are using them to
      implement a hash function for some SQL datatype.
      
      However, changing these functions to return uint32 and uint64
      seems like it might lead to programming errors or back-patching
      difficulties, both because they are widely used and because
      failure to use UInt{32,64}GetDatum() might not provoke a
      compilation error. Instead, rename the existing functions as
      well as changing the return type, and add static inline wrappers
      for those callers that need the previous behavior.
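
      A standalone sketch of the wrapper pattern, using hypothetical *_demo
      names and a stand-in hash; the point is the shape (a worker with the
      natural return type plus a static inline Datum wrapper for existing
      callers), not the real hash implementation:

          #include <stdint.h>
          #include <stdio.h>

          typedef uintptr_t Datum;            /* backend-only type, modelled here */
          #define UInt32GetDatum(x) ((Datum) (x))

          /* Renamed worker with the natural return type, usable from frontend code. */
          static uint32_t
          hash_bytes_demo(const unsigned char *k, int keylen)
          {
              uint32_t h = 5381;              /* stand-in hash, not the real algorithm */

              while (keylen-- > 0)
                  h = h * 33 + *k++;
              return h;
          }

          /* Static inline wrapper preserving the old Datum-returning signature. */
          static inline Datum
          hash_any_demo(const unsigned char *k, int keylen)
          {
              return UInt32GetDatum(hash_bytes_demo(k, keylen));
          }

          int
          main(void)
          {
              printf("%u\n", (unsigned) hash_any_demo((const unsigned char *) "abc", 3));
              return 0;
          }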
      
      Although this commit adapts hashutils.h and hashfn.c so that they
      can be compiled as frontend code, it does not actually do
      anything that would cause them to be so compiled. That is left
      for another commit.
      
      Patch by me, reviewed by Suraj Kharage and Mark Dilger.
      
      Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
    • Put all the prototypes for hashfn.c into the same header file. · 9341c783
      Robert Haas authored
      Previously, some of the prototypes for functions in hashfn.c were
      in utils/hashutils.h and others were in utils/hsearch.h, but that
      is confusing and has no particular benefit.
      
      Patch by me, reviewed by Suraj Kharage and Mark Dilger.
      
      Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
    • Move bitmap_hash and bitmap_match to bitmapset.c. · 07b95c3d
      Robert Haas authored
      The closely-related function bms_hash_value is already defined in that
      file, and this change means that hashfn.c no longer needs to depend on
      nodes/bitmapset.h. That gets us closer to allowing use of the hash
      functions in hashfn.c in frontend code.
      
      Patch by me, reviewed by Suraj Kharage and Mark Dilger.
      
      Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
    • Add prefix checks in exclude lists for pg_rewind, pg_checksums and base backups · bf883b21
      Michael Paquier authored
      An instance of PostgreSQL crashing at an unfortunate moment could leave
      behind temporary pg_internal.init files, potentially causing failures when
      verifying checksums.  As the same exclusion lists are used between
      pg_rewind, pg_checksums and basebackup.c, all those tools are extended
      with prefix checks to keep everything in sync, with dedicated checks
      added for pg_internal.init.
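
      A standalone sketch of how such a prefix check can work, with an
      illustrative two-entry exclude list rather than the actual lists shared
      by basebackup.c, pg_rewind and pg_checksums:

          #include <stdbool.h>
          #include <stdio.h>
          #include <string.h>

          /* Simplified exclude-list entry: a name plus whether to match it as a prefix. */
          struct exclude_entry
          {
              const char *name;
              bool        match_prefix;
          };

          /* pg_internal.init gets a prefix check so leftover temporary variants
           * such as "pg_internal.init.12345" are skipped as well. */
          static const struct exclude_entry excludes[] = {
              {"pg_internal.init", true},
              {"postmaster.pid", false},
          };

          static bool
          is_excluded(const char *filename)
          {
              for (size_t i = 0; i < sizeof(excludes) / sizeof(excludes[0]); i++)
              {
                  size_t len = strlen(excludes[i].name);

                  if (excludes[i].match_prefix
                      ? strncmp(filename, excludes[i].name, len) == 0
                      : strcmp(filename, excludes[i].name) == 0)
                      return true;
              }
              return false;
          }

          int
          main(void)
          {
              printf("%d\n", is_excluded("pg_internal.init.12345"));   /* 1: skipped */
              printf("%d\n", is_excluded("postmaster.pid.old"));       /* 0: kept    */
              return 0;
          }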
      
      Backpatch down to 11, where pg_checksums (pg_verify_checksums in 11) and
      checksum verification for base backups have been introduced.
      
      Reported-by: Michael Banck
      Author: Michael Paquier
      Reviewed-by: Kyotaro Horiguchi, David Steele
      Discussion: https://postgr.es/m/62031974fd8e941dd8351fbc8c7eff60d59c5338.camel@credativ.de
      Backpatch-through: 11
  2. 22 Feb, 2020 3 commits
  3. 21 Feb, 2020 16 commits
  4. 20 Feb, 2020 3 commits
  5. 19 Feb, 2020 10 commits
    • Doc: discourage use of partial indexes for poor-man's-partitioning. · 6a8e5605
      Tom Lane authored
      Creating a bunch of non-overlapping partial indexes is generally
      a bad idea, so add an example saying not to do that.
      
      Back-patch to v10.  Before that, the alternative of using (real)
      partitioning wasn't available, so that the tradeoff isn't quite
      so clear cut.
      
      Discussion: https://postgr.es/m/CAKVFrvFY-f7kgwMRMiPLbPYMmgjc8Y2jjUGK_Y0HVcYAmU6ymg@mail.gmail.com
    • Remove support for upgrading extensions from "unpackaged" state. · 70a77320
      Tom Lane authored
      Andres Freund pointed out that allowing non-superusers to run
      "CREATE EXTENSION ... FROM unpackaged" has security risks, since
      the unpackaged-to-1.0 scripts don't try to verify that the existing
      objects they're modifying are what they expect.  Just attaching such
      objects to an extension doesn't seem too dangerous, but some of them
      do more than that.
      
      We could have resolved this, perhaps, by still requiring superuser
      privilege to use the FROM option.  However, it's fair to ask just what
      we're accomplishing by continuing to lug the unpackaged-to-1.0 scripts
      forward.  None of them have received any real testing since 9.1 days,
      so they may not even work anymore (even assuming that one could still
      load the previous "loose" object definitions into a v13 database).
      And an installation that's trying to go from pre-9.1 to v13 or later
      in one jump is going to have worse compatibility problems than whether
      there's a trivial way to convert their contrib modules into extension
      style.
      
      Hence, let's just drop both those scripts and the core-code support
      for "CREATE EXTENSION ... FROM".
      
      Discussion: https://postgr.es/m/20200213233015.r6rnubcvl4egdh5r@alap3.anarazel.de
    • Fix typo · 2f9c46a3
      Peter Eisentraut authored
      Reported-by: Daniel Verite <daniel@manitou-mail.org>
    • Fix confusion about event trigger vs. plain function in plpgsql. · 761a5688
      Tom Lane authored
      The function hash table keys made by compute_function_hashkey() failed
      to distinguish event-trigger call context from regular call context.
      This meant that once we'd successfully made a hash entry for an event
      trigger (either by validation, or by normal use as an event trigger),
      an attempt to call the trigger function as a plain function would
      find this hash entry and thereby bypass the you-can't-do-that check in
      do_compile().  Thus we'd attempt to execute the function, leading to
      strange errors or even crashes, depending on function contents and
      server version.
      
      To fix, add an isEventTrigger field to PLpgSQL_func_hashkey,
      paralleling the longstanding infrastructure for regular triggers.
      This fits into what had been pad space, so there's no risk of an ABI
      break, even assuming that any third-party code is looking at these
      hash keys.  (I considered replacing isTrigger with a PLpgSQL_trigtype
      enum field, but felt that that carried some API/ABI risk.  Maybe we
      should change it in HEAD though.)
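
      A simplified standalone model of the idea: the new boolean occupies what
      was previously padding, so the key's size and layout stay stable while
      event-trigger and plain-call contexts now hash and compare differently
      (field names here are illustrative, not the exact PLpgSQL_func_hashkey
      layout):

          #include <stdbool.h>
          #include <stdio.h>
          #include <string.h>

          typedef unsigned int Oid;

          typedef struct demo_func_hashkey
          {
              Oid   funcOid;
              bool  isTrigger;        /* regular-trigger call context */
              bool  isEventTrigger;   /* new flag, carved out of former pad space */
              Oid   inputCollation;
              /* ... argument type OIDs etc. in the real key ... */
          } demo_func_hashkey;

          int
          main(void)
          {
              demo_func_hashkey plain, evttrig;

              memset(&plain, 0, sizeof(plain));
              memset(&evttrig, 0, sizeof(evttrig));
              plain.funcOid = evttrig.funcOid = 12345;
              evttrig.isEventTrigger = true;

              /* The two call contexts no longer collide on the same cache entry. */
              printf("same key? %d\n", memcmp(&plain, &evttrig, sizeof(plain)) == 0);
              return 0;
          }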
      
      Per bug #16266 from Alexander Lakhin.  This has been broken since
      event triggers were invented, so back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/16266-fcd7f838e97ba5d4@postgresql.org
    • Set gen_random_uuid() to volatile · 2ed19a48
      Peter Eisentraut authored
      It was set to immutable.  This was a mistake in the initial
      commit (5925e554).
      Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
      Discussion: https://www.postgresql.org/message-id/flat/20200218185452.GA8710%40depesz.com
    • Minor refactor of nodeAgg.c. · 5b618e1f
      Jeff Davis authored
        * Separate calculation of hash value from the lookup.
        * Split build_hash_table() into two functions.
        * Change lookup_hash_entry() to return AggStatePerGroup. That's all
          the caller needed, anyway.
      
      These changes are to support the upcoming Disk-based Hash Aggregation
      work.
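
      A standalone sketch of the shape of that refactor, with toy types
      standing in for AggStatePerGroup and the real hash table: the hash value
      is computed separately from the lookup, and the lookup hands back the
      per-group state pointer directly:

          #include <stdint.h>
          #include <stdio.h>

          /* Stand-in for AggStatePerGroup: per-group transition state. */
          typedef struct group_state { long count; } group_state;

          #define NBUCKETS 16
          static group_state table[NBUCKETS];
          static int used[NBUCKETS];
          static int keys[NBUCKETS];

          /* Step 1: hash calculation, kept separate from the lookup... */
          static uint32_t
          calc_hash(int key)
          {
              return (uint32_t) key * 2654435761u;
          }

          /* ...step 2: lookup/insert that returns the per-group state, which is
           * all the caller needs (mirroring lookup_hash_entry's new contract). */
          static group_state *
          lookup_group(uint32_t hash, int key)
          {
              uint32_t i = hash % NBUCKETS;

              while (used[i] && keys[i] != key)
                  i = (i + 1) % NBUCKETS;     /* linear probing */
              used[i] = 1;
              keys[i] = key;
              return &table[i];
          }

          int
          main(void)
          {
              int input[] = {1, 2, 1, 3, 1};

              for (unsigned n = 0; n < sizeof(input) / sizeof(input[0]); n++)
                  lookup_group(calc_hash(input[n]), input[n])->count++;
              printf("group 1 count: %ld\n", lookup_group(calc_hash(1), 1)->count);
              return 0;
          }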
      
      Discussion: https://postgr.es/m/31f5ab871a3ad5a1a91a7a797651f20e77ac7ce3.camel%40j-davis.com
    • logtape.c: allocate read buffer even for an empty tape. · 8021985d
      Jeff Davis authored
      Prior to this commit, the read buffer was allocated at the time the tape
      was rewound; but, as an optimization, it would not be allocated at all if
      the tape was empty.
      
      That optimization meant that it was valid to have a rewound tape with
      the buffer set to NULL, but only if a number of conditions were met
      and only if the API was used properly. After 7fdd919a refactored the
      code to support lazily-allocating the buffer, Coverity started
      complaining.
      
      The optimization for empty tapes doesn't seem important, so just
      allocate the buffer whether the tape has any data or not.
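
      A toy sketch of the resulting rule, with illustrative names: the rewind
      path allocates the read buffer unconditionally, so a rewound tape can no
      longer be observed with a NULL buffer:

          #include <stdio.h>
          #include <stdlib.h>

          struct tape
          {
              long   nblocks;      /* 0 for an empty tape */
              char  *buffer;       /* read buffer */
              size_t buffer_size;
          };

          /* Allocate the buffer whether or not the tape holds any data. */
          static void
          rewind_for_read(struct tape *t, size_t buffer_size)
          {
              t->buffer = malloc(buffer_size);   /* previously skipped when nblocks == 0 */
              t->buffer_size = buffer_size;
          }

          int
          main(void)
          {
              struct tape t = {0, NULL, 0};

              rewind_for_read(&t, 8192);
              printf("buffer allocated: %s\n", t.buffer ? "yes" : "no");
              free(t.buffer);
              return 0;
          }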
      
      Discussion: https://postgr.es/m/20351.1581868306%40sss.pgh.pa.us
    • Fix measurement of elapsed time during heap truncation in VACUUM. · 00749197
      Fujii Masao authored
      VACUUM may truncate heap in several batches. The activity report
      is logged for each batch, and contains the number of pages in the table
      before and after the truncation, and also the elapsed time during
      the truncation. Previously the elapsed time reported in each batch was
      the total elapsed time since starting the truncation until finishing
      each batch. For example, if the truncation was processed dividing into
      three batches, the second batch reported the accumulated time elapsed
      during both the first and second batches. This is strange and confusing
      because the page counts reported alongside are per batch rather than
      cumulative. Instead, each batch should report only the time elapsed
      during that batch.
      
      The cause of this issue was that the resource usage snapshot was
      initialized only at the beginning of the truncation and was never
      reset later. This commit fixes the issue by changing VACUUM so that
      the resource usage snapshot is reset at each batch.
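
      A toy sketch of the fix's shape, using plain clock() rather than the
      backend's resource-usage snapshot machinery: the snapshot is retaken at
      the top of each batch, so each report covers only that batch:

          #include <stdio.h>
          #include <time.h>

          int
          main(void)
          {
              for (int batch = 0; batch < 3; batch++)
              {
                  clock_t start = clock();    /* re-initialized per batch (the fix) */

                  /* ... truncate one batch of pages here ... */

                  double elapsed = (double) (clock() - start) / CLOCKS_PER_SEC;

                  printf("batch %d: %.6f s elapsed\n", batch, elapsed);
              }
              return 0;
          }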
      
      Back-patch to all supported branches.
      
      Reported-by: Tatsuhito Kasahara
      Author: Tatsuhito Kasahara
      Reviewed-by: Masahiko Sawada, Fujii Masao
      Discussion: https://postgr.es/m/CAP0=ZVJsf=NvQuy+QXQZ7B=ZVLoDV_JzsVC1FRsF1G18i3zMGg@mail.gmail.com
    • Clean up some code, comments and docs referring to Windows 2000 and older · e2e02191
      Michael Paquier authored
      This fixes and updates a couple of comments related to outdated Windows
      versions.  In particular, src/common/exec.c had a fallback implementation
      for reading a line from a pipe because stdin/stdout/stderr do not exist
      on Windows 2000; that fallback is removed to simplify src/common/, as
      Postgres is unlikely to still be running on such platforms.
      
      Author: Michael Paquier
      Reviewed-by: Kyotaro Horiguchi, Juan José Santamaría Flecha
      Discussion: https://postgr.es/m/20191219021526.GC4202@paquier.xyz
    • Stop demanding that top xact must be seen before subxact in decoding. · e3ff789a
      Amit Kapila authored
      The check, which manifests as
      
      ERROR:  subtransaction logged without previous top-level txn record
      
      forbids legitimate behaviours like:
       - The first xl_xact_assignment record is beyond reading, i.e. earlier
         than restart_lsn.
       - After restart_lsn there is some change of a subxact.
       - After that, there is second xl_xact_assignment (for another subxact)
         revealing the relationship between top and first subxact.
      
      Such a transaction won't be streamed anyway because we haven't seen it in
      full.  Determining for sure whether the transaction of some record
      encountered after the snapshot was deserialized can be streamed or not
      requires knowing whether it wrote something before the deserialization
      point; if it did, it hasn't been seen in full and can't be decoded.  The
      snapshot doesn't carry that information, so there is no easy way to relax
      the check.
      
      Reported-by: Hsu, John
      Diagnosed-by: Arseny Sher
      Author: Arseny Sher, Amit Kapila
      Reviewed-by: Amit Kapila, Dilip Kumar
      Backpatch-through: 9.5
      Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com