1. 04 Oct, 2021 1 commit
    • Michael Paquier's avatar
      Fix snapshot builds during promotion of hot standby node with 2PC · 828f7f00
      Michael Paquier authored
      Some specific logic is done at the end of recovery when involving 2PC
      transactions:
      1) Call RecoverPreparedTransactions(), to recover the state of 2PC
      transactions into memory (re-acquire locks, etc.).
      2) ShutdownRecoveryTransactionEnvironment(), to move back to normal
      operations, mainly cleaning up recovery locks and KnownAssignedXids
      (including any 2PC transaction tracked previously).
      3) Switch XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE, which is
      the tipping point for any process calling RecoveryInProgress() to check
      if the cluster is still in recovery or not.
      
      Any snapshot taken between steps 2) and 3) would be empty, causing any
      transaction relying on a snapshot at this point to potentially corrupt
      data as there could still be some 2PC transactions to track, with
      RecentXmin moving backwards on successive calls to GetSnapshotData() in
      the same transaction.
      
      As SharedRecoveryState is the point to take into account to know if it
      is safe to discard KnownAssignedXids, this commit moves step 2) after
      step 3), so as we can never finish with empty snapshots.
      
      This exists since the introduction of hot standby, so backpatch all the
      way down.  The window with incorrect snapshots is extremely small, but I
      have seen it when running 023_pitr_prepared_xact.pl, as did buildfarm
      member fairywren.  Thomas Munro also found it independently.  Special
      thanks to Andres Freund for taking the time to analyze this issue.
      
      Reported-by: Thomas Munro, Michael Paquier
      Analyzed-by: Andres Freund
      Discussion: https://postgr.es/m/20210422203603.fdnh3fu2mmfp2iov@alap3.anarazel.de
      Backpatch-through: 9.6
      828f7f00
  2. 03 Oct, 2021 1 commit
    • Tom Lane's avatar
      Fix checking of query type in plpgsql's RETURN QUERY command. · e0eba586
      Tom Lane authored
      Prior to v14, we insisted that the query in RETURN QUERY be of a type
      that returns tuples.  (For instance, INSERT RETURNING was allowed,
      but not plain INSERT.)  That happened indirectly because we opened a
      cursor for the query, so spi.c checked SPI_is_cursor_plan().  As a
      consequence, the error message wasn't terribly on-point, but at least
      it was there.
      
      Commit 2f48ede0 lost this detail.  Instead, plain RETURN QUERY
      insisted that the query be a SELECT (by checking for SPI_OK_SELECT)
      while RETURN QUERY EXECUTE failed to check the query type at all.
      Neither of these changes was intended.
      
      The only convenient place to check this in the EXECUTE case is inside
      _SPI_execute_plan, because we haven't done parse analysis until then.
      So we need to pass down a flag saying whether to enforce that the
      query returns tuples.  Fortunately, we can squeeze another boolean
      into struct SPIExecuteOptions without an ABI break, since there's
      padding space there.  (It's unlikely that any extensions would
      already be using this new struct, but preserving ABI in v14 seems
      like a smart idea anyway.)
      
      Within spi.c, it seemed like _SPI_execute_plan's parameter list
      was already ridiculously long, and I didn't want to make it longer.
      So I thought of passing SPIExecuteOptions down as-is, allowing that
      parameter list to become much shorter.  This makes the patch a bit
      more invasive than it might otherwise be, but it's all internal to
      spi.c, so that seems fine.
      
      Per report from Marc Bachmann.  Back-patch to v14 where the
      faulty code came in.
      
      Discussion: https://postgr.es/m/1F2F75F0-27DF-406F-848D-8B50C7EEF06A@gmail.com
      e0eba586
  3. 02 Oct, 2021 2 commits
    • Tom Lane's avatar
      Update our mapping of Windows time zone names using CLDR info. · fa8db487
      Tom Lane authored
      This corrects a bunch of entries in win32_tzmap[], and adds a few
      new ones, based on the CLDR project's windowsZones.xml file.
      Non-cosmetic changes fall into four main categories:
      
      * Flat-out errors:
      
      US/Aleutan doesn't exist
      America/Salvador doesn't exist
      Asia/Baku is wrong for Yerevan
      Asia/Dhaka (Bangladesh) is wrong for Astana (Kazakhstan)
      Europe/Bucharest is wrong for Chisinau
      America/Mexico_City is wrong for Chetumal
      America/Buenos_Aires is wrong for Cayenne
      America/Caracas has its own zone, so poor fit for La Paz
      US/Eastern is wrong for Haiti
      US/Eastern is wrong for Indiana (East)
      Asia/Karachi is wrong for Tashkent
      Etc/UTC+12 doesn't exist
      Signs of Etc/GMT zones were backwards
      
      * Judgment calls:
      
      (These changes follow CLDR's choices, except for the first one)
      
      Use Europe/London for "Greenwich Standard Time", since that seems much
      more likely than Africa/Casablanca to be what people will think that
      zone name means.  CLDR has Atlantic/Reykjavik here, but that's no better.
      
      Asia/Shanghai seems a better fit than Hong Kong for "China Standard
      Time".
      
      Europe/Sarajevo is now a link to Belgrade, ie "Central Europe Standard
      Time"; so use Warsaw for "Central European Standard Time".
      
      America/Sao_Paulo seems more representative than Araguaina for
      "E. South America Standard Time".
      
      Africa/Johannesburg seems more representative than Harare for
      "South Africa Standard Time".
      
      * New Windows zone names:
      
      "Israel Standard Time"
      "Kaliningrad Standard Time"
      "Russia Time Zone N" for various N
      "Singapore Standard Time"
      "South Sudan Standard Time"
      "W. Central Africa Standard Time"
      "West Bank Standard Time"
      "Yukon Standard Time"
      
      Some of these replace older spellings, but I kept the older spellings
      too in case our code runs on a machine with the older data.
      
      * Replace aliases (tzdb Links) with underlying city-named zones:
      
      (This tracks tzdb's longstanding practice, and reduces inconsistency
      with the rest of the entries, as well as with CLDR.)
      
      US/Alaska
      Asia/Kuwait
      Asia/Muscat
      Canada/Atlantic
      Australia/Canberra
      Canada/Saskatchewan
      US/Central
      US/Eastern
      US/Hawaii
      US/Mountain
      Canada/Newfoundland
      US/Pacific
      
      Back-patch to all supported branches, as is our usual practice for
      time zone data updates.
      
      Discussion: https://postgr.es/m/3266414.1633045628@sss.pgh.pa.us
      fa8db487
    • Tom Lane's avatar
      Re-alphabetize the win32_tzmap[] array. · 81464999
      Tom Lane authored
      The original intent seems to have been to sort case-insensitively
      by the Windows zone name, but various changes over the years did
      not get that memo.  This commit just moves a few entries to
      restore exact alphabetic order, to ease comparison to the outputs
      of processing scripts.
      
      Back-patch to all supported branches, as is our usual practice for
      time zone data updates.
      
      Discussion: https://postgr.es/m/3266414.1633045628@sss.pgh.pa.us
      81464999
  4. 01 Oct, 2021 7 commits
  5. 30 Sep, 2021 2 commits
    • Tom Lane's avatar
      Remove gratuitous environment dependency in 002_types.pl test. · afc6081f
      Tom Lane authored
      Computing related timestamps by subtracting "N days" is sensitive
      to the prevailing timezone, since we interpret that as "same local
      time on the N'th prior day".  Even though the intervals in question
      are only two to four days, through remarkable bad luck they managed
      to cross the end of Ramadan in 2014, causing the test's output to
      change if timezone is set to Africa/Casablanca.  (Maybe in other
      Muslim areas as well; I didn't check.)  There's absolutely no reason
      for this test to exercise interval subtraction, so just get rid of
      that and use plain timestamptz constants representing the intended
      values.
      
      Per report from Andres Freund.  Back-patch to v10 where this test
      script came in.
      
      Discussion: https://postgr.es/m/20210930183641.7lh4jhvpipvromca@alap3.anarazel.de
      afc6081f
    • Alvaro Herrera's avatar
      Repair two portability oversights of new test · e3731bac
      Alvaro Herrera authored
      First, as pointed out by Tom Lane and Michael Paquier, I failed to
      realize that Windows' PostgresNode needs an extra pg_hba.conf line
      (added by PostgresNode->set_replication_conf, called internally by
      ->init() when 'allows_streaming=>1' is given -- but I purposefully
      omitted that).  I think a good fix should be to have nodes with only
      'has_archiving=>1' set up for replication too, but that's a bigger
      discussion.  Fix it by calling ->set_replication_conf, which is not
      unprecedented, as pointed out by Andrew Dunstan.
      
      I also forgot to uncomment a ->finish() call for a pumpable IPC::Run
      file descriptor.  Apparently this is innocuous in almost all platforms.
      
      Backpatch to 14.  The older branches were added this file too, but not
      this particular part of the test.
      
      Discussion: https://postgr.es/m/3000074.1632947632@sss.pgh.pa.us
      Discussion: https://postgr.es/m/YVT7qwhR8JmC2kfz@paquier.xyz
      e3731bac
  6. 29 Sep, 2021 6 commits
    • Alvaro Herrera's avatar
      Fix WAL replay in presence of an incomplete record · 64a8687a
      Alvaro Herrera authored
      Physical replication always ships WAL segment files to replicas once
      they are complete.  This is a problem if one WAL record is split across
      a segment boundary and the primary server crashes before writing down
      the segment with the next portion of the WAL record: WAL writing after
      crash recovery would happily resume at the point where the broken record
      started, overwriting that record ... but any standby or backup may have
      already received a copy of that segment, and they are not rewinding.
      This causes standbys to stop following the primary after the latter
      crashes:
        LOG:  invalid contrecord length 7262 at A8/D9FFFBC8
      because the standby is still trying to read the continuation record
      (contrecord) for the original long WAL record, but it is not there and
      it will never be.  A workaround is to stop the replica, delete the WAL
      file, and restart it -- at which point a fresh copy is brought over from
      the primary.  But that's pretty labor intensive, and I bet many users
      would just give up and re-clone the standby instead.
      
      A fix for this problem was already attempted in commit 515e3d84a0b5, but
      it only addressed the case for the scenario of WAL archiving, so
      streaming replication would still be a problem (as well as other things
      such as taking a filesystem-level backup while the server is down after
      having crashed), and it had performance scalability problems too; so it
      had to be reverted.
      
      This commit fixes the problem using an approach suggested by Andres
      Freund, whereby the initial portion(s) of the split-up WAL record are
      kept, and a special type of WAL record is written where the contrecord
      was lost, so that WAL replay in the replica knows to skip the broken
      parts.  With this approach, we can continue to stream/archive segment
      files as soon as they are complete, and replay of the broken records
      will proceed across the crash point without a hitch.
      
      Because a new type of WAL record is added, users should be careful to
      upgrade standbys first, primaries later. Otherwise they risk the standby
      being unable to start if the primary happens to write such a record.
      
      A new TAP test that exercises this is added, but the portability of it
      is yet to be seen.
      
      This has been wrong since the introduction of physical replication, so
      backpatch all the way back.  In stable branches, keep the new
      XLogReaderState members at the end of the struct, to avoid an ABI
      break.
      
      Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Reviewed-by: default avatarKyotaro Horiguchi <horikyota.ntt@gmail.com>
      Reviewed-by: default avatarNathan Bossart <bossartn@amazon.com>
      Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
      64a8687a
    • Bruce Momjian's avatar
      doc: PG 14 relnotes, improve cache invalidation wording · 4f2c7531
      Bruce Momjian authored
      Reported-by: Simon Riggs (privately)
      
      Backpatch-through: 14 only
      4f2c7531
    • Fujii Masao's avatar
      pgbench: Fix handling of socket errors during benchmark. · 8231c500
      Fujii Masao authored
      Previously socket errors such as invalid socket or socket wait method failures
      during benchmark caused pgbench to exit with status 0. Instead, errors during
      the run should result in exit status 2.
      
      Back-patch to v12 where pgbench started reporting exit status.
      
      Original complaint and patch by Hayato Kuroda.
      
      Author: Yugo Nagata, Fabien COELHO
      Reviewed-by: Kyotaro Horiguchi, Fujii Masao
      Discussion: https://postgr.es/m/TYCPR01MB5870057375ACA8A73099C649F5349@TYCPR01MB5870.jpnprd01.prod.outlook.com
      8231c500
    • Fujii Masao's avatar
      pgbench: Correct log level of message output when socket wait method fails. · 8021334d
      Fujii Masao authored
      The failure of socket wait method like "select()" doesn't terminate pgbench.
      So the log level of error message when that failure happens should be ERROR.
      But previously FATAL was used in that case.
      
      Back-patch to v13 where pgbench started using common logging API.
      
      Author: Yugo Nagata, Fabien COELHO
      Reviewed-by: Kyotaro Horiguchi, Fujii Masao
      Discussion: https://postgr.es/m/20210617005934.8bd37bf72efd5f1b38e6f482@sraoss.co.jp
      8021334d
    • Michael Paquier's avatar
      Clarify use of "statistics objects" in the code · 2cf9cf5d
      Michael Paquier authored
      The code inconsistently used "statistic object" or "statistics" where
      the correct term, as discussed, is actually "statistics object".  This
      improves the state of the code to be more consistent.
      
      While on it, fix an incorrect error message introduced in a4d75c86.  This
      error should never happen, as the code states, but it would be
      misleading.
      
      Author: Justin Pryzby
      Reviewed-by: Álvaro Herrera, Michael Paquier
      Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
      Backpatch-through: 14
      2cf9cf5d
    • Michael Paquier's avatar
      doc: Fix some typos and markups · 2a27dbae
      Michael Paquier authored
      Author: Ekaterina Kiryanova
      Discussion: https://postgr.es/m/8a14e78f-6991-7a6e-4711-fe376635f2ad@postgrespro.ru
      Backpatch-through: 14
      2a27dbae
  7. 28 Sep, 2021 3 commits
  8. 27 Sep, 2021 3 commits
  9. 26 Sep, 2021 1 commit
  10. 25 Sep, 2021 6 commits
  11. 23 Sep, 2021 4 commits
  12. 22 Sep, 2021 2 commits
    • Amit Kapila's avatar
      Invalidate all partitions for a partitioned table in publication. · 9eff8593
      Amit Kapila authored
      Updates/Deletes on a partition were allowed even without replica identity
      after the parent table was added to a publication. This would later lead
      to an error on subscribers. The reason was that we were not invalidating
      the partition's relcache and the publication information for partitions
      was not getting rebuilt. Similarly, we were not invalidating the
      partitions' relcache after dropping a partitioned table from a publication
      which will prohibit Updates/Deletes on its partition without replica
      identity even without any publication.
      
      Reported-by: Haiying Tang
      Author: Hou Zhijie and Vignesh C
      Reviewed-by: Vignesh C and Amit Kapila
      Backpatch-through: 13
      Discussion: https://postgr.es/m/OS0PR01MB6113D77F583C922F1CEAA1C3FBD29@OS0PR01MB6113.jpnprd01.prod.outlook.com
      9eff8593
    • Peter Geoghegan's avatar
      Fix "single value strategy" index deletion issue. · e665129c
      Peter Geoghegan authored
      It is not appropriate for deduplication to apply single value strategy
      when triggered by a bottom-up index deletion pass.  This wastes cycles
      because later bottom-up deletion passes will overinterpret older
      duplicate tuples that deduplication actually just skipped over "by
      design".  It also makes bottom-up deletion much less effective for low
      cardinality indexes that happen to cross a meaningless "index has single
      key value per leaf page" threshold.
      
      To fix, slightly narrow the conditions under which deduplication's
      single value strategy is considered.  We already avoided the strategy
      for a unique index, since our high level goal must just be to buy time
      for VACUUM to run (not to buy space).  We'll now also avoid it when we
      just had a bottom-up pass that reported failure.  The two cases share
      the same high level goal, and already overlapped significantly, so this
      approach is quite natural.
      
      Oversight in commit d168b666, which added bottom-up index deletion.
      
      Author: Peter Geoghegan <pg@bowt.ie>
      Discussion: https://postgr.es/m/CAH2-WznaOvM+Gyj-JQ0X=JxoMDxctDTYjiEuETdAGbF5EUc3MA@mail.gmail.com
      Backpatch: 14-, where bottom-up deletion was introduced.
      e665129c
  13. 21 Sep, 2021 2 commits
    • Michael Paquier's avatar
      Fix places in TestLib.pm in need of adaptation to the output of Msys perl · 90251ff1
      Michael Paquier authored
      Contrary to the output of native perl, Msys perl generates outputs with
      CRLFs characters.  There are already places in the TAP code where CRLFs
      (\r\n) are automatically converted to LF (\n) on Msys, but we missed a
      couple of places when running commands and using their output for
      comparison, that would lead to failures.
      
      This problem has been found thanks to the test added in 5adb067 using
      TestLib::command_checks_all(), but after a closer look more code paths
      were missing a filter.
      
      This is backpatched all the way down to prevent any surprises if a new
      test is introduced in stable branches.
      
      Reviewed-by: Andrew Dunstan, Álvaro Herrera
      Discussion: https://postgr.es/m/1252480.1631829409@sss.pgh.pa.us
      Backpatch-through: 9.6
      90251ff1
    • Tom Lane's avatar
      Fix misevaluation of STABLE parameters in CALL within plpgsql. · 2ad5f963
      Tom Lane authored
      Before commit 84f5c290, a STABLE function in a plpgsql CALL
      statement's argument list would see an up-to-date snapshot,
      because exec_stmt_call would push a new snapshot.  I got rid of
      that because the possibility of the snapshot disappearing within
      COMMIT made it too hard to manage a snapshot across the CALL
      statement.  That's fine so far as the procedure itself goes,
      but I forgot to think about the possibility of STABLE functions
      within the CALL argument list.  As things now stand, those'll
      be executed with the Portal's snapshot as ActiveSnapshot,
      keeping them from seeing updates more recent than Portal startup.
      
      (VOLATILE functions don't have a problem because they take their
      own snapshots; which indeed is also why the procedure itself
      doesn't have a problem.  There are no STABLE procedures.)
      
      We can fix this by pushing a new snapshot transiently within
      ExecuteCallStmt itself.  Popping the snapshot before we get
      into the procedure proper eliminates the management problem.
      The possibly-useless extra snapshot-grab is slightly annoying,
      but it's no worse than what happened before 84f5c290.
      
      Per bug #17199 from Alexander Nawratil.  Back-patch to v11,
      like the previous patch.
      
      Discussion: https://postgr.es/m/17199-1ab2561f0d94af92@postgresql.org
      2ad5f963