1. 25 Mar, 2022 2 commits
    • Tom Lane's avatar
      Harden TAP tests that intentionally corrupt page checksums. · 579cef5f
      Tom Lane authored
      The previous method for doing that was to write zeroes into a
      predetermined set of page locations.  However, there's a roughly
      1-in-64K chance that the existing checksum will match by chance,
      and yesterday several buildfarm animals started to reproducibly
      see that, resulting in test failures because no checksum mismatch
      was reported.
      
      Since the checksum includes the page LSN, test success depends on
      the length of the installation's WAL history, which is affected by
      (at least) the initial catalog contents, the set of locales installed
      on the system, and the length of the pathname of the test directory.
      Sooner or later we were going to hit a chance match, and today is
      that day.
      
      Harden these tests by specifically inverting the checksum field and
      leaving all else alone, thereby guaranteeing that the checksum is
      incorrect.
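
A minimal sketch of the inversion idea, in Python rather than the Perl used by the TAP tests. It assumes the 16-bit checksum field sits at byte offset 8 of the page header, directly after the 8-byte page LSN; the name `corrupt_checksum` is hypothetical.

```python
CHECKSUM_OFFSET = 8   # assumed: the checksum follows the 8-byte page LSN
CHECKSUM_SIZE = 2     # the checksum is a 16-bit field


def corrupt_checksum(page: bytes) -> bytes:
    """Return a copy of `page` with every bit of the checksum field flipped.

    All other bytes are left alone, so the checksum computed from the page
    contents stays the same while the stored value is guaranteed to differ.
    """
    buf = bytearray(page)
    for i in range(CHECKSUM_OFFSET, CHECKSUM_OFFSET + CHECKSUM_SIZE):
        buf[i] ^= 0xFF  # flipping all bits can never yield the original byte
    return bytes(buf)
```

Unlike overwriting page locations with zeroes, this cannot produce a chance match: the stored value differs from the original in every bit.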
      
      In passing, fix places that were using seek() to set up for syswrite(),
      a combination that the Perl docs very explicitly warn against.  We've
      probably escaped problems because no regular buffered I/O is done on
      these filehandles; but if it ever breaks, we wouldn't deserve or get
      much sympathy.
      
      Although we've only seen problems in HEAD, now that we recognize the
      environmental dependencies it seems like it might be just a matter
      of time until someone manages to hit this in back-branch testing.
      Hence, back-patch to v11 where we started doing this kind of test.
      
      Discussion: https://postgr.es/m/3192026.1648185780@sss.pgh.pa.us
      579cef5f
    • Alvaro Herrera's avatar
      Fix replay of create database records on standby · ffd28516
      Alvaro Herrera authored
      Crash recovery on standby may encounter missing directories when
      replaying create database WAL records.  Prior to this patch, the standby
      would fail to recover in such a case.  However, the directories could be
      legitimately missing.  Consider a sequence of WAL records as follows:
      
          CREATE DATABASE
          DROP DATABASE
          DROP TABLESPACE
      
      If, after replaying the last WAL record and removing the tablespace
      directory, the standby crashes and has to replay the create database
      record again, the crash recovery must be able to move on.
      
      This patch adds a mechanism similar to invalid-page tracking, to keep a
      tally of missing directories during crash recovery.  If all the missing
      directory references are matched with corresponding drop records at the
      end of crash recovery, the standby can safely continue following the
      primary.
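
The bookkeeping described above can be sketched roughly as follows. This is a simplified Python model, not the server's C code; the class name, method names and directory paths are all hypothetical.

```python
class MissingDirTracker:
    """Tally of directories found missing while replaying WAL."""

    def __init__(self):
        self._missing = set()

    def note_missing(self, path: str) -> None:
        # Replay of a create database record found no target directory.
        self._missing.add(path)

    def note_dropped(self, path: str) -> None:
        # A later drop record (database or tablespace) explains why the
        # directory, and anything below it, was missing.
        prefix = path.rstrip("/") + "/"
        self._missing = {
            p for p in self._missing
            if p != path and not p.startswith(prefix)
        }

    def check_consistent(self) -> None:
        # At the end of crash recovery every missing directory must have
        # been matched by a drop record; otherwise something is wrong.
        if self._missing:
            raise RuntimeError(
                f"unresolved missing directories: {sorted(self._missing)}"
            )
```

With the CREATE DATABASE / DROP DATABASE / DROP TABLESPACE sequence, the entry recorded at the create step is cleared by the later drop, so the final check passes.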
      
      Backpatch to 13, at least for now.  The bug is older, but fixing it in
      older branches requires more careful study of the interactions with
      commit e6d80695, which appeared in 13.
      
      A new TAP test file is added to verify the condition.  However, because
      it depends on commit d6d317dbf615, it can only be added to branch
      master.  I (Álvaro) manually verified that the code behaves as expected
      in branch 14.  It's a bit nervous-making to leave the code uncovered by
      tests in older branches, but leaving the bug unfixed is even worse.
      Also, the main reason this fix took so long is precisely that we
      couldn't agree on a good strategy to approach testing for the bug, so
      perhaps this is the best we can do.
      Diagnosed-by: Paul Guo <paulguo@gmail.com>
      Author: Paul Guo <paulguo@gmail.com>
      Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
      Author: Asim R Praveen <apraveen@pivotal.io>
      Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
      ffd28516
  2. 24 Mar, 2022 1 commit
  3. 23 Mar, 2022 6 commits
  4. 22 Mar, 2022 2 commits
    • Andres Freund's avatar
      Add missing dependency of pg_dumpall to WIN32RES. · 2d608c96
      Andres Freund authored
      When cross-building to windows, or building with mingw on windows, the build
      could fail with
        x86_64-w64-mingw32-gcc: error: win32ver.o: No such file or directory
      because pg_dumpall didn't depend on WIN32RES, but its recipe references
      it. The build nevertheless succeeded most of the time, due to
      pg_dump/pg_restore having the required dependency, causing win32ver.o to be
      built.
      Reported-By: Thomas Munro <thomas.munro@gmail.com>
      Discussion: https://postgr.es/m/CA+hUKGJeekpUPWW6yCVdf9=oBAcCp86RrBivo4Y4cwazAzGPng@mail.gmail.com
      Backpatch: 10-, omission present on all live branches
      2d608c96
    • Michael Paquier's avatar
      Fix failures in SSL tests caused by out-of-tree keys and certificates · fdb1be49
      Michael Paquier authored
      This issue is environment-sensitive: the SSL tests could fail in
      various ways by picking up defaults for sslcert, sslkey,
      sslrootkey, sslrootcert, sslcrl and sslcrldir from a local setup,
      which is ~/.postgresql/ by default.  Horiguchi-san has reported two
      failures, but more advanced testing from me (that is, placing garbage
      SSL configuration in ~/.postgresql/ for all the configuration
      parameters) has shown dozens of failures that can be triggered in the
      whole test suite.
      
      History has shown that we are not good at addressing such
      issues: fixing them locally, as in dd877998, has not stopped such
      problems from appearing.  This commit strengthens the entire test
      suite to put an end to this set of problems by embedding invalid
      default values in all the connection strings used in the tests.  The
      invalid values are placed first in each connection string, relying on
      the follow-up values passed in the connection string to override any
      invalid value previously set.  Note that two tests related to CRLs are
      required to fail with certain pre-set configurations, but we can rely
      on enforcing an empty value instead after the invalid set of values.
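
A rough illustration of the ordering trick, as a toy Python model rather than the Perl test code. The parser below simply lets the last occurrence of a keyword win, which is the behavior the commit message relies on; the placeholder value `invalid` and both function names are hypothetical.

```python
# Placeholder defaults that point nowhere useful.
INVALID_DEFAULTS = {
    "sslcert": "invalid",
    "sslkey": "invalid",
    "sslrootcert": "invalid",
    "sslcrl": "invalid",
    "sslcrldir": "invalid",
}


def build_connstr(test_params: dict) -> str:
    """Put the invalid defaults first, then the values the test really wants."""
    parts = [f"{k}={v}" for k, v in INVALID_DEFAULTS.items()]
    parts += [f"{k}={v}" for k, v in test_params.items()]
    return " ".join(parts)


def effective_params(connstr: str) -> dict:
    """Toy parser: a later keyword=value pair overrides an earlier one."""
    result = {}
    for token in connstr.split():
        key, _, value = token.partition("=")
        result[key] = value
    return result
```

Any parameter a test does not set explicitly keeps the invalid placeholder, so it can no longer be filled in silently from files under ~/.postgresql/.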
      
      Reported-by: Kyotaro Horiguchi
      Reviewed-by: Andrew Dunstan, Daniel Gustafsson, Kyotaro Horiguchi
      Discussion: https://postgr.es/m/20220316.163658.1122740600489097632.horikyota.ntt@gmail.com
      backpatch-through: 10
      fdb1be49
  5. 21 Mar, 2022 2 commits
    • Tom Lane's avatar
      Fix assorted missing logic for GroupingFunc nodes. · 48b6035f
      Tom Lane authored
      The planner needs to treat GroupingFunc like Aggref for many purposes,
      in particular with respect to processing of the argument expressions,
      which are not to be evaluated at runtime.  A few places hadn't gotten
      that memo, notably including subselect.c's processing of outer-level
      aggregates.  This resulted in assertion failures or wrong plans for
      cases in which a GROUPING() construct references an outer aggregation
      level.
      
      Also fix missing special cases for GroupingFunc in cost_qual_eval
      (resulting in wrong cost estimates for GROUPING(), although it's
      not clear that that would affect plan shapes in practice) and in
      ruleutils.c (resulting in excess parentheses in pretty-print mode).
      
      Per bug #17088 from Yaoguang Chen.  Back-patch to all supported
      branches.
      
      Richard Guo, Tom Lane
      
      Discussion: https://postgr.es/m/17088-e33882b387de7f5c@postgresql.org
      48b6035f
    • Tom Lane's avatar
      Fix risk of deadlock failure while dropping a partitioned index. · 05ccf974
      Tom Lane authored
      DROP INDEX needs to lock the index's table before the index itself,
      else it will deadlock against ordinary queries that acquire the
      relation locks in that order.  This is correctly mechanized for
      plain indexes by RangeVarCallbackForDropRelation; but in the case of
      a partitioned index, we neglected to lock the child tables in advance
      of locking the child indexes.  We can fix that by traversing the
      inheritance tree and acquiring the needed locks in RemoveRelations,
      after we have acquired our locks on the parent partitioned table and
      index.
      
      While at it, do some refactoring to eliminate confusion between
      the actual and expected relkind in RangeVarCallbackForDropRelation.
      We can save a couple of syscache lookups too, by having that function
      pass back info that RemoveRelations will need.
      
      Back-patch to v11 where partitioned indexes were added.
      
      Jimmy Yih, Gaurab Dey, Tom Lane
      
      Discussion: https://postgr.es/m/BYAPR05MB645402330042E17D91A70C12BD5F9@BYAPR05MB6454.namprd05.prod.outlook.com
      05ccf974
  6. 20 Mar, 2022 1 commit
  7. 19 Mar, 2022 1 commit
  8. 18 Mar, 2022 1 commit
  9. 17 Mar, 2022 1 commit
    • Tom Lane's avatar
      Revert applying column aliases to the output of whole-row Vars. · 1d072bd2
      Tom Lane authored
      In commit bf7ca158, I had the bright idea that we could make the
      result of a whole-row Var (that is, foo.*) track any column aliases
      that had been applied to the FROM entry the Var refers to.  However,
      that's not terribly logically consistent, because now the output of
      the Var is no longer of the named composite type that the Var claims
      to emit.  bf7ca158 tried to handle that by changing the output
      tuple values to be labeled with a blessed RECORD type, but that's
      really pretty disastrous: we can wind up storing such tuples onto
      disk, whereupon they're not readable by other sessions.
      
      The only practical fix I can see is to give up on what bf7ca158
      tried to do, and say that the column names of tuples produced by
      a whole-row Var are always those of the underlying named composite
      type, query aliases or no.  While this introduces some inconsistencies,
      it removes others, so it's not that awful in the abstract.  What *is*
      kind of awful is to make such a behavioral change in a back-patched
      bug fix.  But corrupt data is worse, so back-patched it will be.
      
      (A workaround available to anyone who's unhappy about this is to
      introduce an extra level of sub-SELECT, so that the whole-row Var is
      referring to the sub-SELECT's output and not to a named table type.
      Then the Var is of type RECORD to begin with and there's no issue.)
      
      Per report from Miles Delahunty.  The faulty commit dates to 9.5,
      so back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
      1d072bd2
  10. 16 Mar, 2022 8 commits
    • Tomas Vondra's avatar
      Fix publish_as_relid with multiple publications · 677a1dc0
      Tomas Vondra authored
      Commit 83fd4532 allowed publishing of changes via ancestors, for
      publications defined with publish_via_partition_root. But the way
      the ancestor was determined in get_rel_sync_entry() was incorrect,
      simply updating the same variable. So with multiple publications,
      replicating different ancestors, the outcome depended on the order
      of publications in the list - the value from the last loop was used,
      even if it wasn't the top-most ancestor.
      
      This is probably a rare situation, as in most cases publications do
      not overlap, so each partition has exactly one candidate ancestor
      to replicate as and there's no ambiguity.
      
      Fixed by tracking the "ancestor level" for each publication, and
      picking the top-most ancestor. Adds a test case, verifying the
      correct ancestor is used for publishing the changes and that this
      does not depend on order of publications in the list.
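
The "ancestor level" idea can be sketched like this. It is a simplified Python model with hypothetical names; it assumes `ancestors` lists the partition's ancestors from the immediate parent up to the root, and that each publication is given as the set of relations it publishes via the partition root.

```python
def pick_publish_as(relation, ancestors, publications):
    """Return the relation whose identity should be used when publishing.

    The level is 1 for the immediate parent and grows toward the root;
    level 0 means "publish as the partition itself".
    """
    best, best_level = relation, 0
    for published in publications:
        for level, ancestor in enumerate(ancestors, start=1):
            # Keep the top-most ancestor seen so far instead of letting the
            # last publication in the list overwrite the result.
            if ancestor in published and level > best_level:
                best, best_level = ancestor, level
    return best
```

Because the comparison is on the level and not on list position, `pick_publish_as("part", ["mid", "root"], [{"root"}, {"mid"}])` and the same call with the publications swapped both yield `"root"`.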
      
      Older releases have another bug in this loop - once all actions are
      replicated, the loop is terminated, on the assumption that inspecting
      additional publications is unnecessary. But that misses the fact that
      those additional publications may replicate different ancestors.
      
      Fixed by removal of this break condition. We might still terminate the
      loop in some cases (e.g. when replicating all actions and the ancestor
      is the partition root).
      
      Backpatch to 13, where publish_via_partition_root was introduced.
      
      Initial report and fix by me, test added by Hou zj. Reviews and
      improvements by Amit Kapila.
      
      Author: Tomas Vondra, Hou zj, Amit Kapila
      Reviewed-by: Amit Kapila, Hou zj
      Discussion: https://postgr.es/m/d26d24dd-2fab-3c48-0162-2b7f84a9c893%40enterprisedb.com
      677a1dc0
    • Alexander Korotkov's avatar
      Fix default signature length for gist_ltree_ops · 7d30f59d
      Alexander Korotkov authored
      911e7020 implemented operator class parameters including the signature length
      in ltree.  Previously, the signature length for gist_ltree_ops was 8.  Because
      of a bug in 911e7020, the default signature length for gist_ltree_ops became 28 for
      ltree 1.1 (where options method is NOT provided) and 8 for ltree 1.2 (where
      options method is provided).  This commit changes the default signature length
      for ltree 1.1 to 8.
      
      Existing gist_ltree_ops indexes might be corrupted in various scenarios.
      Thus, we have to recommend reindexing all the gist_ltree_ops indexes after
      the upgrade.
      
      Reported-by: Victor Yegorov
      Reviewed-by: Tomas Vondra, Tom Lane, Andres Freund, Nikita Glukhov
      Reviewed-by: Andrew Dunstan
      Author: Tomas Vondra, Alexander Korotkov
      Discussion: https://postgr.es/m/17406-71e02820ae79bb40%40postgresql.org
      Discussion: https://postgr.es/m/d80e0a55-6c3e-5b26-53e3-3c4f973f737c%40enterprisedb.com
      7d30f59d
    • Thomas Munro's avatar
      Fix race between DROP TABLESPACE and checkpointing. · 26e00793
      Thomas Munro authored
      Commands like ALTER TABLE SET TABLESPACE may leave files for the next
      checkpoint to clean up.  If such files are not removed by the time DROP
      TABLESPACE is called, we request a checkpoint so that they are deleted.
      However, there is presently a window before checkpoint start where new
      unlink requests won't be scheduled until the following checkpoint.  This
      means that the checkpoint forced by DROP TABLESPACE might not remove the
      files we expect it to remove, and the following ERROR will be emitted:
      
      	ERROR:  tablespace "mytblspc" is not empty
      
      To fix, add a call to AbsorbSyncRequests() just before advancing the
      unlink cycle counter.  This ensures that any unlink requests forwarded
      prior to checkpoint start (i.e., when ckpt_started is incremented) will
      be processed by the current checkpoint.  Since AbsorbSyncRequests()
      performs memory allocations, it cannot be called within a critical
      section, so we also need to move SyncPreCheckpoint() to before
      CreateCheckPoint()'s critical section.
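
The window can be illustrated with a deliberately simplified Python model. It assumes that forwarded unlink requests are stamped with the current cycle counter when they are absorbed, and that a checkpoint only removes files whose stamp is older than the counter; the class and its members are hypothetical stand-ins, not the real data structures.

```python
class CheckpointerModel:
    def __init__(self):
        self.cycle = 0      # unlink cycle counter
        self.queue = []     # requests forwarded by backends, not yet absorbed
        self.pending = []   # (path, cycle) pairs already absorbed
        self.unlinked = []  # files actually removed

    def forward_unlink(self, path):
        self.queue.append(path)

    def absorb(self):
        # Stand-in for absorbing the queue: stamp with the current cycle.
        self.pending += [(p, self.cycle) for p in self.queue]
        self.queue.clear()

    def checkpoint(self, absorb_first):
        if absorb_first:
            self.absorb()   # the fix: absorb before advancing the counter
        self.cycle += 1
        self.absorb()       # requests absorbed later get the new stamp
        keep = []
        for path, cycle in self.pending:
            if cycle < self.cycle:
                self.unlinked.append(path)
            else:
                keep.append((path, cycle))
        self.pending = keep
```

In this model, a request forwarded before `checkpoint(absorb_first=False)` is left for the following checkpoint, while `checkpoint(absorb_first=True)` removes it in the current one.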
      
      This is an old bug, so back-patch to all supported versions.
      
      Author: Nathan Bossart <nathandbossart@gmail.com>
      Reported-by: Nathan Bossart <nathandbossart@gmail.com>
      Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13
      26e00793
    • Michael Paquier's avatar
      pageinspect: Fix memory context allocation of page in brin_revmap_data() · dc5b3bda
      Michael Paquier authored
      This caused the function to fail, as the aligned copy of the raw page
      given by the function caller was not saved in the correct memory
      context, which needs to be multi_call_memory_ctx in this case.
      
      Issue introduced by 076f4d9.
      
      Per buildfarm members sifika, mylodon and longfin.  I have reproduced
      that locally with macos.
      
      Discussion: https://postgr.es/m/YjFPOtfCW6yLXUeM@paquier.xyz
      Backpatch-through: 10
      dc5b3bda
    • Thomas Munro's avatar
      Fix documentation typo in commit 5e6368b4. · ea70f694
      Thomas Munro authored
      Back-patch to 14.
      ea70f694
    • Thomas Munro's avatar
      Fix waiting in RegisterSyncRequest(). · 1396b5c6
      Thomas Munro authored
      If we run out of space in the checkpointer sync request queue (which is
      hopefully rare on real systems, but common with a very small buffer pool),
      we wait for it to drain.  While waiting, we should report that as a wait
      event so that users know what is going on, and also handle postmaster
      death, since otherwise the loop might never terminate if the
      checkpointer has exited.
      
      Back-patch to 12.  Although the problem exists in earlier releases too,
      the code is structured differently before 12 so I haven't gone any
      further for now, in the absence of field complaints.
      Reported-by: Andres Freund <andres@anarazel.de>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
      1396b5c6
    • Michael Paquier's avatar
      pageinspect: Fix handling of page sizes and AM types · b61e6214
      Michael Paquier authored
      This commit fixes a set of issues related to the use of the SQL
      functions in this module when the caller is able to pass down raw page
      data as input argument:
      - The page size check was fuzzy in a couple of places, sometimes
      looking after only a sub-range, but what we are looking for is an exact
      match on BLCKSZ.  After considering a few options here, I have settled
      on generalizing the use of get_page_from_raw().  Most of the SQL
      functions already used that, and this is not strictly required if not
      accessing an 8-byte-wide value from a raw page, but this feels safer in
      the long run for alignment-picky environments, particularly if a code
      path begins to access such values.  This also reduces the number of
      strings that need to be translated.
      - The BRIN function brin_page_items() uses a Relation but it did not
      check the access method of the opened index, potentially leading to
      crashes.  All the other functions in need of a Relation already did
      that.
      - Some code paths could fail on elog(), but we should use ereport()
      for failures that can be triggered by the user.
      
      Tests are added to stress all the cases that are fixed as of this
      commit, with some junk raw pages (\set VERBOSITY ensures that this works
      across all page sizes) and unexpected index types when functions open
      relations.
      
      Author: Michael Paquier, Justin Pryzby
      Discussion: https://postgr.es/m/20220218030020.GA1137@telsasoft.com
      Backpatch-through: 10
      b61e6214
    • Thomas Munro's avatar
      Wake up for latches in CheckpointWriteDelay(). · 78c0f85e
      Thomas Munro authored
      The checkpointer shouldn't ignore its latch.  Other backends may be
      waiting for it to drain the request queue.  Hopefully real systems don't
      have a full queue often, but the condition is reached easily when
      shared_buffers is small.
      
      This involves defining a new wait event, which will appear in the
      pg_stat_activity view often due to spread checkpoints.
      
      Back-patch only to 14.  Even though the problem exists in earlier
      branches too, it's hard to hit there.  In 14 we stopped using signal
      handlers for latches on Linux, *BSD and macOS, which were previously
      hiding this problem by interrupting the sleep (though not reliably, as
      the signal could arrive before the sleep begins; precisely the problem
      latches address).
      Reported-by: Andres Freund <andres@anarazel.de>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
      78c0f85e
  11. 15 Mar, 2022 1 commit
    • Thomas Munro's avatar
      Back-patch LLVM 14 API changes. · d9f7ad54
      Thomas Munro authored
      Since LLVM 14 has stopped changing and is about to be released,
      back-patch the following changes from the master branch:
      
        e6a7600202105919bffd62b3dfd941f4a94e082b
        807fee1a39de6bb8184082012e643951abb9ad1d
        a56e7b66010f330782243de9e25ac2a6596be0e1
      
      Back-patch to 11, where LLVM JIT support came in.
      d9f7ad54
  12. 11 Mar, 2022 1 commit
    • Tom Lane's avatar
      Restore the previous semantics of get_constraint_index(). · 8dcd1c35
      Tom Lane authored
      Commit 8b069ef5 changed this function to look at pg_constraint.conindid
      rather than searching pg_depend.  That was a good performance improvement,
      but it failed to preserve the exact semantics.  The old code would only
      return an index that was "owned by" (internally dependent on) the
      specified constraint, whereas the new code will also return indexes that
      are just referenced by foreign key constraints.  This confuses ALTER
      TABLE, which was implicitly expecting the previous semantics, into
      failing with errors like
          ERROR:  relation 146621 has multiple clustered indexes
      or
          ERROR:  "pk_attbl" is not an index for table "atref"
      
      We can fix this without reverting the performance improvement by adding
      a contype check in get_constraint_index().  Another way could be to
      make ALTER TABLE check it, but I'm worried that extension code could
      also have subtle dependencies on the old semantics.
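
A sketch of what such a contype check amounts to, in Python with a dictionary standing in for a pg_constraint row. Which constraint types count as owning their index (primary key, unique and exclusion here) is an assumption of this sketch, not something the message spells out.

```python
INVALID_OID = 0

# pg_constraint.contype codes assumed to own their index:
# 'p' = primary key, 'u' = unique, 'x' = exclusion.
INDEX_OWNING_CONTYPES = {"p", "u", "x"}


def get_constraint_index(constraint: dict) -> int:
    """Return conindid only when the constraint owns the index.

    A foreign key ('f') merely references an index, so it yields
    INVALID_OID, matching the older pg_depend-based semantics.
    """
    if constraint["contype"] in INDEX_OWNING_CONTYPES:
        return constraint["conindid"]
    return INVALID_OID
```

The lookup still reads conindid directly, so the performance gain from avoiding the pg_depend search is kept.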
      
      Tom Lane and Japin Li, per bug #17409 from Holly Roberts.
      Back-patch to v14 where the error crept in.
      
      Discussion: https://postgr.es/m/17409-52871dda8b5741cb@postgresql.org
      8dcd1c35
  13. 09 Mar, 2022 1 commit
  14. 05 Mar, 2022 1 commit
  15. 04 Mar, 2022 2 commits
    • Tom Lane's avatar
      Fix pg_regress to print the correct postmaster address on Windows. · a008c03d
      Tom Lane authored
      pg_regress reported "Unix socket" as the default location whenever
      HAVE_UNIX_SOCKETS is defined.  However, that's not been accurate
      on Windows since 8f3ec75d.  Update this logic to match what libpq
      actually does now.
      
      This is just cosmetic, but still it's potentially misleading.
      Back-patch to v13 where 8f3ec75d came in.
      
      Discussion: https://postgr.es/m/3894060.1646415641@sss.pgh.pa.us
      a008c03d
    • Tom Lane's avatar
      Fix bogus casting in BlockIdGetBlockNumber(). · 5c9d17e9
      Tom Lane authored
      This macro cast the result to BlockNumber after shifting, not before,
      which is the wrong thing.  Per the C spec, the uint16 fields would
      promote to int not unsigned int, so that (for 32-bit int) the shift
      potentially shifts a nonzero bit into the sign position.  I doubt
      there are any production systems where this would actually end with
      the wrong answer, but it is undefined behavior per the C spec, and
      clang's -fsanitize=undefined option reputedly warns about it on some
      platforms.  (I can't reproduce that right now, but the code is
      undeniably wrong per spec.)  It's easy to fix by casting to
      BlockNumber (uint32) in the proper places.
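
The arithmetic can be checked with a small Python emulation. Python integers never overflow, so the first helper only reports whether the shift would leave the range of a 32-bit signed int, which is where C's behavior becomes undefined; both function names are hypothetical.

```python
INT32_MAX = 2**31 - 1


def shift_overflows_int(bi_hi: int) -> bool:
    """True if `bi_hi << 16` is not representable in a 32-bit signed int.

    This is the case the old macro could hit, because a uint16 operand is
    promoted to int before the shift.
    """
    return (bi_hi << 16) > INT32_MAX


def block_number(bi_hi: int, bi_lo: int) -> int:
    """Cast-before-shift: all arithmetic stays within unsigned 32 bits."""
    return (((bi_hi & 0xFFFF) << 16) | (bi_lo & 0xFFFF)) & 0xFFFFFFFF
```

Only high halves of 0x8000 and above reach the sign bit, which corresponds to block numbers in the upper half of the 32-bit range.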
      
      It's been wrong for ages, so back-patch to all supported branches.
      
      Report and patch by Zhihong Yu (cosmetic tweaking by me)
      
      Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
      5c9d17e9
  16. 03 Mar, 2022 1 commit
    • Tom Lane's avatar
      Clean up assorted failures under clang's -fsanitize=undefined checks. · b0bc196e
      Tom Lane authored
      Most of these are cases where we could call memcpy() or other libc
      functions with a NULL pointer and a zero count, which is forbidden
      by POSIX even though every production version of libc allows it.
      We've fixed such things before in a piecemeal way, but apparently
      never made an effort to try to get them all.  I don't claim that
      this patch does so either, but it gets every failure I observe in
      check-world, using clang 12.0.1 on current RHEL8.
      
      numeric.c has a different issue that the sanitizer doesn't like:
      "ln(-1.0)" will compute log10(0) and then try to assign the
      resulting -Inf to an integer variable.  We don't actually use the
      result in such a case, so there's no live bug.
      
      Back-patch to all supported branches, with the idea that we might
      start running a buildfarm member that tests this case.  This includes
      back-patching c1132aae3 (Check the size in COPY_POINTER_FIELD),
      which previously silenced some of these issues in copyfuncs.c.
      
      Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
      b0bc196e
  17. 02 Mar, 2022 1 commit
    • Tom Lane's avatar
      Allow root-owned SSL private keys in libpq, not only the backend. · 2a1f8463
      Tom Lane authored
      This change makes libpq apply the same private-key-file ownership
      and permissions checks that we have used in the backend since commit
      9a83564c.  Namely, that the private key can be owned by either the
      current user or root (with different file permissions allowed in the
      two cases).  This allows system-wide management of key files, which
      is just as sensible on the client side as the server, particularly
      when the client is itself some application daemon.
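
The ownership rule can be sketched as a pure function over `stat` results. This is an illustrative Python version, not libpq's C code; the exact masks (no group or world access for a user-owned key, at most group read for a root-owned key) are my reading of the documented backend rule and should be checked against the source.

```python
import stat


def key_file_acceptable(st_uid: int, st_mode: int, euid: int) -> bool:
    """Decide whether a private key file's owner and mode are acceptable."""
    if st_uid == euid:
        # Owned by the current user: no group or world access at all.
        return not (st_mode & (stat.S_IRWXG | stat.S_IRWXO))
    if st_uid == 0:
        # Owned by root: group may read, but nothing more.
        return not (st_mode & (stat.S_IWGRP | stat.S_IXGRP | stat.S_IRWXO))
    # Owned by anyone else: always rejected.
    return False
```

In practice the inputs would come from `os.stat(path)` and `os.geteuid()`; passing them in explicitly keeps the rule easy to test.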
      
      Sync the comments about this between libpq and the backend, too.
      
      Back-patch of a59c79564 and 50f03473e into all supported branches.
      
      David Steele
      
      Discussion: https://postgr.es/m/f4b7bc55-97ac-9e69-7398-335e212f7743@pgmasters.net
      2a1f8463
  18. 25 Feb, 2022 2 commits
    • Tom Lane's avatar
      Disallow execution of SPI functions during plperl function compilation. · ac910bb2
      Tom Lane authored
      Perl can be convinced to execute user-defined code during compilation
      of a plperl function (or at least a plperlu function).  That's not
      such a big problem as long as the activity is confined within the
      Perl interpreter, and it's not clear we could do anything about that
      anyway.  However, if such code tries to use plperl's SPI functions,
      we have a bigger problem.  In the first place, those functions are
      likely to crash because current_call_data->prodesc isn't set up yet.
      In the second place, because it isn't set up, we lack critical info
      such as whether the function is supposed to be read-only.  And in
      the third place, this path allows code execution during function
      validation, which is strongly discouraged because of the potential
      for security exploits.  Hence, reject execution of the SPI functions
      until compilation is finished.
      
      While here, add check_spi_usage_allowed() calls to various functions
      that hadn't gotten the memo about checking that.  I think that perhaps
      plperl_sv_to_literal may have been intentionally omitted on the grounds
      that it was safe at the time; but if so, the addition of transforms
      functionality changed that.  The others are more recently added and
      seem to be flat-out oversights.
      
      Per report from Mark Murawski.  Back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/9acdf918-7fff-4f40-f750-2ffa84f083d2@intellasoft.net
      ac910bb2
    • Andres Freund's avatar
      pg_waldump: Fix error message for WAL files smaller than XLOG_BLCKSZ. · 9ff7fd90
      Andres Freund authored
      When opening a WAL file smaller than XLOG_BLCKSZ (e.g. 0 bytes long) while
determining the wal_segment_size, pg_waldump checked errno, despite errno not
      being set by the short read, resulting in a bogus error message.
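
The distinction between a failed read and a short read can be sketched as below. This is an illustrative Python helper, not pg_waldump's code; the function name and the message wording are hypothetical.

```python
import os
from typing import Optional


def read_failure_message(path: str, nread: int, expected: int,
                         errno_value: Optional[int]) -> str:
    """Build an error message for a read that returned too little data.

    `errno_value` is None when the read call itself succeeded but returned
    fewer bytes than requested; in that case errno carries no information
    and must not be reported.
    """
    if errno_value is not None:
        return f'could not read file "{path}": {os.strerror(errno_value)}'
    return f'could not read file "{path}": read {nread} of {expected}'
```

For a zero-length file, the second branch reports how many bytes were read instead of whatever stale value errno happened to hold.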
      
      Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
      Discussion: https://postgr.es/m/20220214.181847.775024684568733277.horikyota.ntt@gmail.com
      Backpatch: 11-, the bug was introduced in fc49e24f
      9ff7fd90
  19. 24 Feb, 2022 1 commit
  20. 23 Feb, 2022 1 commit
  21. 22 Feb, 2022 1 commit
    • Michael Paquier's avatar
      Add compute_query_id = regress · 627c79a1
      Michael Paquier authored
      "regress" is a new mode added to compute_query_id aimed at facilitating
      regression testing when a module computing query IDs is loaded into the
      backend, like pg_stat_statements.  It works the same way as "auto",
      meaning that query IDs are computed if a module enables it, except that
      query IDs are hidden in EXPLAIN outputs to ensure regression output
      stability.
      
      Like any GUCs of the kind (force_parallel_mode, etc.), this new
      configuration can be added to an instance's postgresql.conf, or just
      passed down with PGOPTIONS at command level.  compute_query_id uses an
      enum for its set of option values, meaning that this addition ensures
      ABI compatibility.
      
      Using this new configuration mode allows installcheck-world to pass when
      running the tests on an instance with pg_stat_statements enabled,
      stabilizing the test output while checking the paths doing query ID
      computations.
      
      Reported-by: Anton Melnikov
      Reviewed-by: Julien Rouhaud
      Discussion: https://postgr.es/m/1634283396.372373993@f75.i.mail.ru
      Discussion: https://postgr.es/m/YgHlxgc/OimuPYhH@paquier.xyz
      Backpatch-through: 14
      627c79a1
  22. 21 Feb, 2022 1 commit
    • Andres Freund's avatar
      Fix temporary object cleanup failing due to toast access without snapshot. · 7bbfe599
      Andres Freund authored
      When cleaning up temporary objects during process exit the cleanup could fail
      with:
        FATAL: cannot fetch toast data without an active snapshot
      
      The bug is caused by RemoveTempRelationsCallback() not setting up a
      snapshot. If an object with toasted catalog data needs to be cleaned up,
      init_toast_snapshot() could fail with the above error.
      
      Most of the time, however, the problem is masked due to cached catalog
      snapshots being returned by GetOldestSnapshot(). But dropping an object can
      cause catalog invalidations to be emitted. If no further catalog accesses are
      necessary between the invalidation processing and the next toast datum
      deletion, the bug becomes visible.
      
      It's easy to miss this bug because it typically happens after clients
      disconnect and the FATAL error just ends up in the log.
      
      Luckily temporary table cleanup at the next use of the same temporary schema
      or during DISCARD ALL does not have the same problem.
      
      Fix the bug by pushing a snapshot in RemoveTempRelationsCallback(). Also add
      isolation tests for temporary object cleanup, including objects with toasted
      catalog data.
      
      A future HEAD-only commit will add more assertions.
      
      Reported-By: Miles Delahunty
      Author: Andres Freund
      Discussion: https://postgr.es/m/CAOFAq3BU5Mf2TTvu8D9n_ZOoFAeQswuzk7yziAb7xuw_qyw5gw@mail.gmail.com
      Backpatch: 10-
      7bbfe599
  23. 20 Feb, 2022 1 commit