  1. 20 Apr, 2022 1 commit
    • Fix breakage in AlterFunction(). · 08a9e7a8
      Tom Lane authored
      An ALTER FUNCTION command that tried to update both the function's
      proparallel property and its proconfig list failed to do the former,
      because it stored the new proparallel value into a tuple that was
      no longer the interesting one.  Carelessness in 7aea8e4f.
      
      (I did not bother with a regression test, because the only likely
      future breakage would be for someone to ignore the comment I added
      and add some other field update after the heap_modify_tuple step.
      A test using existing function properties could not catch that.)
      
      Per report from Bryn Llewellyn.  Back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/8AC9A37F-99BD-446F-A2F7-B89AD0022774@yugabyte.com
      08a9e7a8
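
The ordering hazard described in this commit can be sketched outside the backend. The following is a minimal illustration, not the actual AlterFunction() code: `ToyProc`, `toy_modify_config()`, `alter_buggy()` and `alter_fixed()` are hypothetical names, and the toy copy function only stands in for the copy-returning behaviour of heap_modify_tuple().

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a catalog row; NOT the real pg_proc layout. */
typedef struct ToyProc
{
    char parallel;      /* e.g. 'u' = unsafe, 's' = safe */
    char config[32];    /* stand-in for the proconfig list */
} ToyProc;

/* In the spirit of heap_modify_tuple(): returns a NEW copy with one
 * field replaced; the old struct is left untouched. */
static ToyProc *
toy_modify_config(const ToyProc *old, const char *config)
{
    ToyProc *copy = malloc(sizeof(ToyProc));

    if (copy == NULL)
        abort();
    *copy = *old;
    strncpy(copy->config, config, sizeof(copy->config) - 1);
    copy->config[sizeof(copy->config) - 1] = '\0';
    return copy;
}

/* Buggy order: the parallel update lands in the old struct, which is
 * no longer the one that gets returned (and, in the backend, stored). */
static ToyProc *
alter_buggy(ToyProc *tup, char parallel, const char *config)
{
    ToyProc *newtup = toy_modify_config(tup, config);

    tup->parallel = parallel;   /* too late: newtup was copied already */
    return newtup;
}

/* Fixed order: finish all in-place field updates before copying. */
static ToyProc *
alter_fixed(ToyProc *tup, char parallel, const char *config)
{
    tup->parallel = parallel;
    return toy_modify_config(tup, config);
}
```

With a row that starts as parallel 'u', `alter_buggy()` returns a copy that still says 'u', while `alter_fixed()` returns one that carries both changes. The caller owns the returned copy and should free() it.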
  2. 19 Apr, 2022 2 commits
  3. 18 Apr, 2022 2 commits
    • Avoid invalid array reference in transformAlterTableStmt(). · e805735a
      Tom Lane authored
      Don't try to look at the attidentity field of system attributes,
      because they're not there in the TupleDescAttr array.  Sometimes
      this is harmless because we accidentally pick up a zero, but
      otherwise we'll report "no owned sequence found" from an attempt
      to alter a system attribute.  (It seems possible that a SIGSEGV
      could occur, too, though I've not seen it in testing.)
      
      It's not in this function's charter to complain that you can't
      alter a system column, so instead just hard-wire an assumption
      that system attributes aren't identities.  I didn't bother with
      a regression test because the appearance of the bug is very
      erratic.
      
      Per bug #17465 from Roman Zharkov.  Back-patch to all supported
      branches.  (There's not actually a live bug before v12, because
      before that get_attidentity() did the right thing anyway.
      But for consistency I changed the test in the older branches too.)
      
      Discussion: https://postgr.es/m/17465-f2a554a6cb5740d3@postgresql.org
      e805735a
    • Fix race in TAP test 002_archiving.pl when restoring history file · 8bcf90c7
      Michael Paquier authored
      This test, introduced in df86e52c, uses a second standby to check that
      it is able to correctly remove RECOVERYHISTORY and RECOVERYXLOG at the
      end of recovery.  This standby uses the archives of the primary to
      restore its contents, with some of the archive's contents coming from
      the first standby previously promoted.  In slow environments, it was
      possible that the test did not check what it should, as the history file
      generated by the promotion of the first standby may not be stored yet on
      the archives the second standby feeds on.  So, it could be possible that
      the second standby selects an incorrect timeline, without restoring a
      history file at all.
      
      This commit adds a wait phase to make sure that the history file
      required by the second standby is archived before this cluster is
      created.  This relies on poll_query_until() with pg_stat_file() and an
      absolute path, something not supported in REL_10_STABLE.
      
      While on it, this adds a new test to check that the history file has
      been restored by looking at the logs of the second standby.  This
      ensures that a RECOVERYHISTORY, whose removal needs to be checked,
      is created in the first place.  This should make the test more robust.
      
      This test was introduced by df86e52c, but the problem came to light as an
      effect of the bug fixed by acf1dd42, where the extra restore_command
      calls made the test much slower.
      
      Reported-by: Andres Freund
      Discussion: https://postgr.es/m/YlT23IvsXkGuLzFi@paquier.xyz
      Backpatch-through: 11
      8bcf90c7
  4. 17 Apr, 2022 1 commit
  5. 14 Apr, 2022 2 commits
    • Rethink the delay-checkpoint-end mechanism in the back-branches. · 10520f43
      Robert Haas authored
      The back-patch of commit bbace569 had
      the unfortunate effect of changing the layout of PGPROC in the
      back-branches, which could break extensions. This happened because it
      changed the delayChkpt from type bool to type int. So, change it back,
      and add a new bool delayChkptEnd field instead. The new field should
      fall within what used to be padding space within the struct, and so
      hopefully won't cause any extensions to break.
      
      Per report from Markus Wanner and discussion with Tom Lane and others.
      
      Patch originally by me, somewhat revised by Markus Wanner per a
      suggestion from Michael Paquier. A very similar patch was developed
      by Kyotaro Horiguchi, but I failed to see the email in which that was
      posted before writing one of my own.
      
      Discussion: http://postgr.es/m/CA+Tgmoao-kUD9c5nG5sub3F7tbo39+cdr8jKaOVEs_1aBWcJ3Q@mail.gmail.com
      Discussion: http://postgr.es/m/20220406.164521.17171257901083417.horikyota.ntt@gmail.com
      10520f43
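
The padding argument in this commit can be checked on a toy struct. This is only a sketch under stated assumptions: `ToyProcBefore` and `ToyProcAfter` are hypothetical and much smaller than the real PGPROC, and the result depends on the platform ABI (it holds where `int` has stricter alignment than `bool`, as on common 64-bit targets).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "before" layout: a bool followed by a wider field,
 * which leaves padding bytes after the bool on common ABIs. */
typedef struct ToyProcBefore
{
    bool delayChkpt;
    int  pid;
} ToyProcBefore;

/* Hypothetical "after" layout: the new bool is placed where the
 * padding used to be, so later fields should not move. */
typedef struct ToyProcAfter
{
    bool delayChkpt;
    bool delayChkptEnd;
    int  pid;
} ToyProcAfter;

/* True if code compiled against the old layout would still find
 * `pid` at the same offset and the struct at the same size. */
static bool
toy_layout_is_compatible(void)
{
    return sizeof(ToyProcBefore) == sizeof(ToyProcAfter) &&
           offsetof(ToyProcBefore, pid) == offsetof(ToyProcAfter, pid);
}
```

Changing `delayChkpt` itself from `bool` to `int`, by contrast, would typically shift every following field, which is the kind of breakage the commit is undoing.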
    • pageinspect: Fix handling of all-zero pages · df6bbe73
      Michael Paquier authored
      Getting an all-zero page from get_raw_page() is considered a valid
      case by the buffer manager and it can happen for example when finding a
      corrupted page with zero_damaged_pages enabled (using zero_damaged_pages
      to look at corrupted pages happens), or after a crash when a relation
      file is extended before any WAL for its new data is generated (before a
      vacuum or autovacuum job comes in to do some cleanup).
      
      However, none of the pageinspect functions for the index AMs (except
      hash, which has its own idea of new pages), heap, the FSM or the page
      header have ever worked with all-zero pages, causing various crashes
      when going through the page internals.
      
      This commit changes all the pageinspect functions to be compliant with
      all-zero pages, where the choice is made to return NULL or no rows for
      SRFs when finding a new page.  get_raw_page() still works the same way,
      returning a batch of zeros in the bytea of the page retrieved.  A hard
      error could be used but NULL, while more invasive, is useful when
      scanning relation files in full to get a batch of results for a single
      relation in one query.  Tests are added for all the code paths
      impacted.
      
      Reported-by: Daria Lepikhova
      Author: Michael Paquier
      Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru
      Backpatch-through: 10
      df6bbe73
  6. 13 Apr, 2022 2 commits
    • Prevent access to no-longer-pinned buffer in heapam_tuple_lock(). · c590e514
      Tom Lane authored
      heap_fetch() used to have a "keep_buf" parameter that told it to return
      ownership of the buffer pin to the caller after finding that the
      requested tuple TID exists but is invisible to the specified snapshot.
      This was thoughtlessly removed in commit 5db6df0c, which broke
      heapam_tuple_lock() (formerly EvalPlanQualFetch) because that function
      needs to do more accesses to the tuple even if it's invisible.  The net
      effect is that we would continue to touch the page for a microsecond or
      two after releasing pin on the buffer.  Usually no harm would result;
      but if a different session decided to defragment the page concurrently,
      we could see garbage data and mistakenly conclude that there's no newer
      tuple version to chain up to.  (It's hard to say whether this has
      happened in the field.  The bug was actually found thanks to a later
      change that allowed valgrind to detect accesses to non-pinned buffers.)
      
      The most reasonable way to fix this is to reintroduce keep_buf,
      although I made it behave slightly differently: buffer ownership
      is passed back only if there is a valid tuple at the requested TID.
      In HEAD, we can just add the parameter back to heap_fetch().
      To avoid an API break in the back branches, introduce an additional
      function heap_fetch_extended() in those branches.
      
      In HEAD there is an additional, less obvious API change: tuple->t_data
      will be set to NULL in all cases where buffer ownership is not returned,
      in particular when the tuple exists but fails the time qual (and
      !keep_buf).  This is to defend against any other callers attempting to
      access non-pinned buffers.  We concluded that making that change in back
      branches would be more likely to introduce problems than cure any.
      
      In passing, remove a comment about heap_fetch that was obsoleted by
      9a8ee1dc.
      
      Per bug #17462 from Daniil Anisimov.  Back-patch to v12 where the bug
      was introduced.
      
      Discussion: https://postgr.es/m/17462-9c98a0f00df9bd36@postgresql.org
      c590e514
    • Docs: wording improvement for compute_query_id = regress · ea669b80
      David Rowley authored
      It's more accurate to say that the query identifier is not shown when
      compute_query_id = regress rather than to say it is hidden.
      
      This change (ebf6c5249) appeared in v14, so it makes sense to backpatch
      this small adjustment to keep the documents consistent between v14 and
      master.
      
      Author: Justin Pryzby
      Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
      Backpatch-through: 14, where compute_query_id = regress was added
      ea669b80
  7. 12 Apr, 2022 3 commits
  8. 06 Apr, 2022 2 commits
  9. 05 Apr, 2022 1 commit
  10. 02 Apr, 2022 2 commits
  11. 01 Apr, 2022 1 commit
  12. 31 Mar, 2022 3 commits
  13. 30 Mar, 2022 1 commit
  14. 29 Mar, 2022 1 commit
  15. 28 Mar, 2022 3 commits
  16. 27 Mar, 2022 2 commits
    • Fix breakage of get_ps_display() in the PS_USE_NONE case. · 3f7a59c5
      Tom Lane authored
      Commit 8c6d30f2 caused this function to fail to set *displen
      in the PS_USE_NONE code path.  If the variable's previous value
      had been negative, that'd lead to a memory clobber at some call
      sites.  We'd managed not to notice due to very thin test coverage
      of such configurations, but this appears to explain buildfarm member
      lorikeet's recent struggles.
      
      Credit to Andrew Dunstan for spotting the problem.  Back-patch
      to v13 where the bug was introduced.
      
      Discussion: https://postgr.es/m/136102.1648320427@sss.pgh.pa.us
      3f7a59c5
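
The general rule behind this fix is that an output parameter must be assigned on every return path. A minimal sketch, assuming a toy function rather than the real get_ps_display(): `toy_ps_enabled`, `toy_ps_text` and `toy_get_ps_display()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical switch standing in for "process titles are disabled". */
static bool toy_ps_enabled = false;
static const char *toy_ps_text = "idle";

/* Returns the display string and always sets *displen, including on
 * the early-return path that the bug left uninitialized. */
static const char *
toy_get_ps_display(int *displen)
{
    if (!toy_ps_enabled)
    {
        *displen = 0;       /* the assignment the buggy path skipped */
        return "";
    }

    *displen = (int) strlen(toy_ps_text);
    return toy_ps_text;
}
```

A caller that later uses the length for a copy or truncation can then rely on it regardless of configuration, even if its variable previously held a stale or negative value.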
    • pageinspect: Add more sanity checks to prevent out-of-bound reads · 27d38444
      Michael Paquier authored
      A couple of code paths use the special area on the page passed by the
      function caller, expecting to find some data in it.  However, feeding
      an incorrect page can lead to out-of-bound reads when trying to access
      the page special area (like a heap page that has no special area,
      leading PageGetSpecialPointer() to grab a pointer outside the allocated
      page).
      
      The functions used for hash and btree indexes have some protection
      already against that, while some other functions using a relation OID
      as argument would make sure that the access method involved is correct,
      but functions taking a raw page as input without knowing the relation
      the page is attached to would run into problems.
      
      This commit improves the set of checks used in the code paths of BRIN,
      btree (including one check if a leaf page is found with a non-zero
      level), GIN and GiST to verify that the page given as input has a
      special area size that fits with each access method, which is done
      through PageGetSpecialSize(), before calling PageGetSpecialPointer().
      
      The scope of the checks done is limited to work with pages that one
      would pass after getting a block with get_raw_page(), as it is possible
      to craft byteas that could bypass existing code paths.  Having too many
      checks would also impact the usability of pageinspect, as the existing
      code is very useful to look at the content details in a corrupted page,
      so the focus is really to avoid out-of-bound reads as this is never a
      good thing even with functions whose execution is limited to
      superusers.
      
      The safest approach could be to rework the functions so that they fetch a
      block using a relation OID and a block number, but there are also cases
      where using a raw page is useful.
      
      Tests are added to cover all the code paths that needed such checks, and
      an error message for hash indexes is reworded to fit better with what
      this commit adds.
      
      Reported-By: Alexander Lakhin
      Author: Julien Rouhaud, Michael Paquier
      Discussion: https://postgr.es/m/16527-ef7606186f0610a1@postgresql.org
      Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru
      Backpatch-through: 10
      27d38444
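
The check-before-dereference pattern described above can be sketched on a toy page. This is not the pageinspect code: `ToyPage`, `TOY_PAGE_SIZE` and `toy_get_special_checked()` are hypothetical, and the real functions compare PageGetSpecialSize() against the size of each access method's opaque struct.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 8192

/* Toy page: `special` is the offset in `data` where the special area
 * starts. A page without a special area has special == TOY_PAGE_SIZE. */
typedef struct ToyPage
{
    uint16_t special;
    uint8_t  data[TOY_PAGE_SIZE];
} ToyPage;

/* Returns a pointer to the special area only if its size matches what
 * the access method expects; otherwise NULL, so the caller can raise
 * an error instead of reading past the end of the page. */
static const uint8_t *
toy_get_special_checked(const ToyPage *page, size_t expected_size)
{
    if (expected_size == 0 || page->special > TOY_PAGE_SIZE)
        return NULL;
    if ((size_t) TOY_PAGE_SIZE - page->special != expected_size)
        return NULL;
    return page->data + page->special;
}
```

Feeding a heap-like page (no special area) to a caller that expects, say, 16 bytes of index metadata then yields NULL rather than a pointer just beyond the buffer.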
  17. 26 Mar, 2022 1 commit
    • Suppress compiler warning in relptr_store(). · 0144c9c7
      Tom Lane authored
      clang 13 with -Wextra warns that "performing pointer subtraction with
      a null pointer has undefined behavior" in the places where freepage.c
      tries to set a relptr variable to constant NULL.  This appears to be
      a compiler bug, but it's unlikely to get fixed instantly.  Fortunately,
      we can work around it by introducing an inline support function, which
      seems like a good change anyway because it removes the macro's existing
      double-evaluation hazard.
      
      Backpatch to v10 where this code was introduced.
      
      Patch by me, based on an idea of Andres Freund's.
      
      Discussion: https://postgr.es/m/48826.1648310694@sss.pgh.pa.us
      0144c9c7
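
The double-evaluation hazard mentioned above is easy to reproduce in isolation. The sketch below is not the relptr.h code: `OFFSET_STORE_MACRO`, `offset_store_eval()`, `next_slot()` and `arena` are hypothetical names that only mirror the shape of the problem and of the inline-function remedy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro form: `val` appears twice, so an argument with
 * side effects is evaluated twice when it is not NULL. */
#define OFFSET_STORE_MACRO(base, val) \
    ((val) == NULL ? (size_t) 0 \
                   : (size_t) ((char *) (val) - (char *) (base)))

/* Inline-function form: arguments are evaluated exactly once, and the
 * NULL case never reaches the pointer subtraction. */
static inline size_t
offset_store_eval(char *base, char *val)
{
    if (val == NULL)
        return 0;
    assert(val > base);
    return (size_t) (val - base);
}

/* Helper with a visible side effect, used to count evaluations. */
static char arena[64];
static int  next_slot_calls = 0;

static char *
next_slot(void)
{
    next_slot_calls++;
    return arena + 8;
}
```

Passing `next_slot()` to the macro bumps the counter twice, while the inline function bumps it once; both compute the same offset.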
  18. 25 Mar, 2022 2 commits
    • Harden TAP tests that intentionally corrupt page checksums. · 579cef5f
      Tom Lane authored
      The previous method for doing that was to write zeroes into a
      predetermined set of page locations.  However, there's a roughly
      1-in-64K chance that the existing checksum will match by chance,
      and yesterday several buildfarm animals started to reproducibly
      see that, resulting in test failures because no checksum mismatch
      was reported.
      
      Since the checksum includes the page LSN, test success depends on
      the length of the installation's WAL history, which is affected by
      (at least) the initial catalog contents, the set of locales installed
      on the system, and the length of the pathname of the test directory.
      Sooner or later we were going to hit a chance match, and today is
      that day.
      
      Harden these tests by specifically inverting the checksum field and
      leaving all else alone, thereby guaranteeing that the checksum is
      incorrect.
      
      In passing, fix places that were using seek() to set up for syswrite(),
      a combination that the Perl docs very explicitly warn against.  We've
      probably escaped problems because no regular buffered I/O is done on
      these filehandles; but if it ever breaks, we wouldn't deserve or get
      much sympathy.
      
      Although we've only seen problems in HEAD, now that we recognize the
      environmental dependencies it seems like it might be just a matter
      of time until someone manages to hit this in back-branch testing.
      Hence, back-patch to v11 where we started doing this kind of test.
      
      Discussion: https://postgr.es/m/3192026.1648185780@sss.pgh.pa.us
      579cef5f
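
The hardening idea, flipping only the stored checksum so it cannot match by chance, can be sketched in a few lines. The actual tests are written in Perl; this C version is only an illustration, and it assumes the checksum is a 2-byte field at byte offset 8 of the page header (after the 8-byte page LSN). `invert_checksum_field()` and the `TOY_` constants are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: 8-byte LSN, then a 2-byte checksum. */
#define TOY_CHECKSUM_OFFSET 8
#define TOY_CHECKSUM_LEN    2

/* Invert every bit of the stored checksum and leave all other bytes
 * alone. If the stored value was correct before, it is guaranteed to
 * be wrong afterwards, since x != ~x for any 16-bit x. */
static void
invert_checksum_field(uint8_t *page_header)
{
    for (size_t i = 0; i < TOY_CHECKSUM_LEN; i++)
        page_header[TOY_CHECKSUM_OFFSET + i] ^= 0xFF;
}
```

Overwriting page contents with zeroes, by comparison, changes the computed checksum to some other 16-bit value that still has a small chance of equalling the stored one.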
    • Fix replay of create database records on standby · ffd28516
      Alvaro Herrera authored
      Crash recovery on standby may encounter missing directories when
      replaying create database WAL records.  Prior to this patch, the standby
      would fail to recover in such a case.  However, the directories could be
      legitimately missing.  Consider a sequence of WAL records as follows:
      
          CREATE DATABASE
          DROP DATABASE
          DROP TABLESPACE
      
      If, after replaying the last WAL record and removing the tablespace
      directory, the standby crashes and has to replay the create database
      record again, the crash recovery must be able to move on.
      
      This patch adds a mechanism similar to invalid-page tracking, to keep a
      tally of missing directories during crash recovery.  If all the missing
      directory references are matched with corresponding drop records at the
      end of crash recovery, the standby can safely continue following the
      primary.
      
      Backpatch to 13, at least for now.  The bug is older, but fixing it in
      older branches requires more careful study of the interactions with
      commit e6d80695, which appeared in 13.
      
      A new TAP test file is added to verify the condition.  However, because
      it depends on commit d6d317dbf615, it can only be added to branch
      master.  I (Álvaro) manually verified that the code behaves as expected
      in branch 14.  It's a bit nervous-making to leave the code uncovered by
      tests in older branches, but leaving the bug unfixed is even worse.
      Also, the main reason this fix took so long is precisely that we
      couldn't agree on a good strategy to approach testing for the bug, so
      perhaps this is the best we can do.
      
      Diagnosed-by: Paul Guo <paulguo@gmail.com>
      Author: Paul Guo <paulguo@gmail.com>
      Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
      Author: Asim R Praveen <apraveen@pivotal.io>
      Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
      ffd28516
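
The tally mechanism described in this commit can be approximated with a small fixed-size table. This is a simplified sketch, not the backend implementation: `ToyMissingDirs` and the `toy_` functions are hypothetical, paths are plain strings, and a drop record is modelled as forgetting one exact path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_MISSING 16
#define TOY_PATH_LEN    64

typedef struct ToyMissingDirs
{
    char   paths[TOY_MAX_MISSING][TOY_PATH_LEN];
    size_t count;
} ToyMissingDirs;

/* Remember a directory that a replayed record expected but did not
 * find. Returns false if the table is full or the path is too long. */
static bool
toy_log_missing_dir(ToyMissingDirs *t, const char *path)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->paths[i], path) == 0)
            return true;        /* already tracked */

    if (t->count == TOY_MAX_MISSING || strlen(path) >= TOY_PATH_LEN)
        return false;

    strcpy(t->paths[t->count++], path);
    return true;
}

/* A later drop record explains the absence, so forget the entry. */
static void
toy_forget_missing_dir(ToyMissingDirs *t, const char *path)
{
    for (size_t i = 0; i < t->count; i++)
    {
        if (strcmp(t->paths[i], path) == 0)
        {
            t->count--;
            if (i != t->count)
                strcpy(t->paths[i], t->paths[t->count]);
            return;
        }
    }
}

/* At the end of crash recovery, any leftover entry means a directory
 * was missing for no recorded reason. */
static bool
toy_recovery_is_consistent(const ToyMissingDirs *t)
{
    return t->count == 0;
}
```

For the CREATE DATABASE / DROP DATABASE / DROP TABLESPACE sequence above, the create record would log the missing directory and the subsequent drop record would forget it, leaving the table empty when recovery ends.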
  19. 24 Mar, 2022 1 commit
  20. 23 Mar, 2022 6 commits
  21. 22 Mar, 2022 1 commit