1. 16 Sep, 2020 2 commits
    • David Rowley's avatar
      Optimize compactify_tuples function · 19c60ad6
      David Rowley authored
      This function could often be seen in profiles of vacuum and could often
      be a significant bottleneck during recovery. The problem was that a qsort
      was performed in order to sort an array of item pointers in reverse offset
      order so that we could use that to safely move tuples up to the end of the
      page without overwriting the memory of yet-to-be-moved tuples. i.e. we
      used to compact the page starting at the back of the page and move towards
      the front. The qsort that this required could be expensive for pages with
      a large number of tuples.
      
      In this commit, we take another approach to tuple compactification.
      
      Now, instead of sorting the remaining item pointers array we first check
      if the array is presorted and only memmove() the tuples that need to be
      moved. This presorted check can be done very cheaply in the calling
      functions when the array is being populated. This presorted case is very
      fast.
      
      When the item pointer array is not presorted we must copy tuples that need
      to be moved into a temp buffer before copying them back into the page
      again. This differs from what we used to do here as we're now copying the
      tuples back into the page in reverse line pointer order. Previously we
      left the existing order alone.  Reordering the tuples results in an
      increased likelihood of hitting the pre-sorted case the next time around.
      Any newly added tuple which consumes a new line pointer will also maintain
      the correct sort order of tuples in the page which will also result in the
      presorted case being hit the next time.  Only consuming an unused line
      pointer can cause the order of tuples to go out again, but that will be
      corrected next time the function is called for the page.
      
      Benchmarks have shown that the non-presorted case is at least equally as
      fast as the original qsort method even when the page just has a few
      tuples. As the number of tuples becomes larger the new method maintains
      its performance whereas the original qsort method became much slower when
      the number of tuples on the page became large.
      
      Author: David Rowley
      Reviewed-by: Thomas Munro
      Tested-by: Jakub Wartak
      Discussion: https://postgr.es/m/CA+hUKGKMQFVpjr106gRhwk6R-nXv0qOcTreZuQzxgpHESAL6dw@mail.gmail.com
      19c60ad6
    • Alvaro Herrera's avatar
      Fix use-after-free bug with event triggers in an extension script · ced138e8
      Alvaro Herrera authored
      ALTER TABLE commands in an extension script are added to an event
      trigger command list; but starting with commit b5810de3 they do so in
      a memory context that's too short-lived, so when execution ends and time
      comes to use the entries, they've already been freed.
      
      (This would also be a problem with ALTER TABLE commands in a
      multi-command query string, but these serendipitously end in
      PortalContext -- which probably explains why it took so long for this to
      be reported.)
      
      Fix by using the memory context specifically set for that, instead.
      
      Backpatch to 13, where the aforementioned commit appeared.
      
      Reported-by: Philippe Beaudoin
      Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
      Discussion: https://postgr.es/m/20200902193715.6e0269d4@firost
      ced138e8
  2. 15 Sep, 2020 3 commits
  3. 14 Sep, 2020 6 commits
    • Tom Lane's avatar
      Make walsenders show their replication commands in pg_stat_activity. · f560209c
      Tom Lane authored
      A walsender process that has executed a SQL command left the text of
      that command in pg_stat_activity.query indefinitely, which is quite
      confusing if it's in RUNNING state but not doing that query.  An easy
      and useful fix is to treat replication commands as if they were SQL
      queries, and show them in pg_stat_activity according to the same rules
      as for regular queries.  While we're at it, it seems also sensible to
      set debug_query_string, allowing error logging and debugging to see
      the replication command.
      
      While here, clean up assorted silliness in exec_replication_command:
      
      * The SQLCmd path failed to restore CurrentMemoryContext to the caller's
      value, and failed to delete the temp context created in this routine.
      It's only through great good fortune that these oversights did not
      result in long-term memory leaks or other problems.  It seems cleaner
      to code SQLCmd as a separate early-exit path, so do it like that.
      
      * Remove useless duplicate call of SnapBuildClearExportedSnapshot().
      
      * replication_scanner_finish() was never called.
      
      None of those things are significant enough to merit a backpatch,
      so this is for HEAD only.
      
      Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us
      f560209c
    • Noah Misch's avatar
      Fix interpolation in test name. · 47a3a1c3
      Noah Misch authored
      A pre-commit review had reported the problem, but the fix reached only
      v10 and earlier.  Back-patch to v11.
      
      Discussion: https://postgr.es/m/20200423.140546.1055476118690602079.horikyota.ntt@gmail.com
      47a3a1c3
    • Fujii Masao's avatar
      Fix typos. · 95233011
      Fujii Masao authored
      Author: Naoki Nakamichi
      Discussion: https://postgr.es/m/b6919d145af00295a8e86ce4d034b7cd@oss.nttdata.com
      95233011
    • Michael Paquier's avatar
      Make index_set_state_flags() transactional · 83158f74
      Michael Paquier authored
      3c840464 is the original commit that introduced index_set_state_flags(),
      where the presence of SnapshotNow made necessary the use of an in-place
      update.  SnapshotNow has been removed in 813fb031, so there is no actual
      reasons to not make this operation transactional.
      
      Note that while making the operation more robust, using a transactional
      operation in this routine was not strictly necessary as there was no use
      case for it yet.  However, some future features are going to need a
      transactional behavior, like support for CREATE/DROP INDEX CONCURRENTLY
      with partitioned tables, where indexes in a partition tree need to have
      all their pg_index.indis* flags updated in the same transaction to make
      the operation stable to the end-user by keeping partition trees
      consistent, even with a failure mid-flight.
      
      REINDEX CONCURRENTLY uses already transactional updates when swapping
      the old and new indexes, making this change more consistent with the
      index-swapping logic.
      
      Author: Michael Paquier
      Reviewed-by: Anastasia Lubennikova
      Discussion: https://postgr.es/m/20200903080440.GA8559@paquier.xyz
      83158f74
    • Peter Eisentraut's avatar
      Message fixes and style improvements · 3e0242b2
      Peter Eisentraut authored
      3e0242b2
    • Michael Paquier's avatar
      Avoid useless allocations for information of dumpable objects in pg_dump/ · ac673a1a
      Michael Paquier authored
      If there are no objects of a certain type, there is no need to do an
      allocation for a set of DumpableObject items.  The previous coding did
      an allocation of 1 byte instead as per the fallback of pg_malloc() in
      the event of an allocation size of zero.  This assigns NULL instead for
      a set of dumpable objects.
      
      A similar rule already applied to findObjectByOid(), so this makes the
      code more defensive as we would just fail with a pointer dereference
      instead of attempting to use some incorrect data if a non-existing,
      positive, OID is given by a caller of this function.
      
      Author: Daniel Gustafsson
      Reviewed-by: Julien Rouhaud, Ranier Vilela
      Discussion: https://postgr.es/m/26C43E58-BDD0-4F1A-97CC-4A07B52E32C5@yesql.se
      ac673a1a
  4. 13 Sep, 2020 1 commit
    • Tom Lane's avatar
      Use the properly transformed RangeVar for expandTableLikeClause(). · 19f5a37b
      Tom Lane authored
      transformCreateStmt() adjusts the transformed statement's RangeVar
      to specify the target schema explicitly, for the express reason
      of making sure that auxiliary statements derived by parse
      transformation operate on the right table.  But the refactoring
      I did in commit 50289819 got this wrong and passed the untransformed
      RangeVar to expandTableLikeClause().  This could lead to assertion
      failures or weird misbehavior if the wrong table was accessed.
      
      Per report from Alexander Lakhin.  Like the previous patch, back-patch
      to all supported branches.
      
      Discussion: https://postgr.es/m/05051f9d-b32b-cb35-6735-0e9f2ab86b5f@gmail.com
      19f5a37b
  5. 12 Sep, 2020 3 commits
  6. 11 Sep, 2020 6 commits
  7. 10 Sep, 2020 11 commits
  8. 09 Sep, 2020 6 commits
  9. 08 Sep, 2020 2 commits
    • Alvaro Herrera's avatar
      Check default partitions constraints while descending · f481d282
      Alvaro Herrera authored
      Partitioning tuple route code assumes that the partition chosen while
      descending the partition hierarchy is always the correct one.  This is
      true except when the partition is the default partition and another
      partition has been added concurrently: the partition constraint changes
      and we don't recheck it.  This can lead to tuples mistakenly being added
      to the default partition that should have been rejected.
      
      Fix by rechecking the default partition constraint while descending the
      hierarchy.
      
      An isolation test based on the reproduction steps described by Hao Wu
      (with tweaks for extra coverage) is included.
      
      Backpatch to 12, where this bug came in with 898e5e32.
      
      Reported by: Hao Wu <hawu@vmware.com>
      Author: Amit Langote <amitlangote09@gmail.com>
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/CA+HiwqFqBmcSSap4sFnCBUEL_VfOMmEKaQ3gwUhyfa4c7J_-nA@mail.gmail.com
      Discussion: https://postgr.es/m/DM5PR0501MB3910E97A9EDFB4C775CF3D75A42F0@DM5PR0501MB3910.namprd05.prod.outlook.com
      f481d282
    • Tom Lane's avatar
      Install an error check into cancel_before_shmem_exit(). · c9ae5cbb
      Tom Lane authored
      Historically, cancel_before_shmem_exit() just silently did nothing
      if the specified callback wasn't the top-of-stack.  The folly of
      ignoring this case was exposed by the bugs fixed in 30364019 and
      bab15004, so let's make it throw elog(ERROR) instead.
      
      There is a decent argument to be made that PG_ENSURE_ERROR_CLEANUP
      should use some separate infrastructure, so it wouldn't break if
      something inside the guarded code decides to register a new
      before_shmem_exit callback.  However, a survey of the surviving
      uses of before_shmem_exit() and PG_ENSURE_ERROR_CLEANUP doesn't
      show any plausible conflicts of that sort today, so for now we'll
      forgo the extra complexity.  (It will almost certainly become
      necessary if anyone ever wants to wrap PG_ENSURE_ERROR_CLEANUP
      around arbitrary user-defined actions, though.)
      
      No backpatch, since this is developer support not a production issue.
      
      Bharath Rupireddy, per advice from Andres Freund, Robert Haas, and myself
      
      Discussion: https://postgr.es/m/CALj2ACWk7j4F2v2fxxYfrroOF=AdFNPr1WsV+AGtHAFQOqm_pw@mail.gmail.com
      c9ae5cbb