1. 17 Sep, 2020 6 commits
    • Heikki Linnakangas's avatar
      Add support for building GiST index by sorting. · 16fa9b2b
      Heikki Linnakangas authored
      This adds a new optional support function to the GiST access method:
      sortsupport. If it is defined, the GiST index is built by sorting all data
      to the order defined by the sortsupport's comparator function, and packing
      the tuples in that order to GiST pages. This is similar to how B-tree
      index build works, and is much faster than inserting the tuples one by
      one. The resulting index is smaller too, because the pages are packed more
      tightly, upto 'fillfactor'. The normal build method works by splitting
      pages, which tends to lead to more wasted space.
      
      The quality of the resulting index depends on how good the opclass-defined
      sort order is. A good order preserves locality of the input data.
      
      As the first user of this facility, add 'sortsupport' function to the
      point_ops opclass. It sorts the points in Z-order (aka Morton Code), by
      interleaving the bits of the X and Y coordinates.
      
      Author: Andrey Borodin
      Reviewed-by: Pavel Borisov, Thomas Munro
      Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
      16fa9b2b
    • Michael Paquier's avatar
      doc: Apply more consistently <productname> markup for OpenSSL · 089da3c4
      Michael Paquier authored
      OpenSSL was quoted in inconsistent ways in many places of the docs,
      sometimes with <application>, <productname> or just nothing.
      
      Author: Daniel Gustafsson
      Discussion: https://postgr.es/m/DA91E5F0-5F9D-41A7-A7A6-B91CDE0F1D63@yesql.se
      089da3c4
    • Michael Paquier's avatar
      Improve tab completion of IMPORT FOREIGN SCHEMA in psql · 7307df16
      Michael Paquier authored
      It is not possible to get a list of foreign schemas as the server is not
      known, so this provides instead a list of local schemas, which is more
      useful than nothing if using a loopback server or having schema names
      matching in the local and remote servers.
      
      Author: Jeff Janes
      Reviewed-by: Tom Lane, Michael Paquier
      Discussion: https://postgr.es/m/CAMkU=1wr7Roj41q-XiJs=Uyc2xCmHhcGGy7J-peJQK-e+w=ghw@mail.gmail.com
      7307df16
    • Tom Lane's avatar
      Teach walsender to update its process title for replication commands. · babef40c
      Tom Lane authored
      Because the code path taken for SQL commands executed in a walsender
      will update the process title, we pretty much have to update the
      title for replication commands as well.  Otherwise, the title shows
      "idle" for the rest of a logical walsender's lifetime once it's
      executed any SQL command.
      
      Playing with this, I confirm that a walsender now typically spends
      most of its life reporting
      	walsender postgres [local] START_REPLICATION
      Considering this in isolation, it might be better to have it say
      	walsender postgres [local] sending replication data
      However, consistency with the other cases seems to be a stronger
      argument.
      
      In passing, remove duplicative pgstat_report_activity call.
      
      Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us
      babef40c
    • Tom Lane's avatar
      Improve formatting of create_help.pl and plperl_opmask.pl output. · add10584
      Tom Lane authored
      Adjust the whitespace in the emitted files so that it matches
      what pgindent would do.  This makes the generated files look
      like they match project style, and avoids confusion if someone
      does run pgindent on the generated files.
      
      Also, add probes.h to pgindent's exclusion list, because it can
      confuse pgindent, plus there's not much point in processing it.
      
      Daniel Gustafsson, additional fixes by me
      
      Discussion: https://postgr.es/m/79ed5348-be7a-b647-dd40-742207186a22@2ndquadrant.com
      add10584
    • Alvaro Herrera's avatar
      Fix bogus completion tag usage in walsender · 07082b08
      Alvaro Herrera authored
      Since commit fd5942c1 (2012, 9.3-era), walsender has been sending
      completion tags for certain replication commands twice -- and they're
      not even consistent.  Apparently neither libpq nor JDBC have a problem
      with it, but it's not kosher.  Fix by remove the EndCommand() call in
      the common code path for them all, and inserting specific calls to
      EndReplicationCommand() specifically in those places where it's needed.
      
      EndReplicationCommand() is a new simple function to send the completion
      tag for replication commands.  Do this instead of sending a generic
      SELECT completion tag for them all, which was also pretty bogus (if
      innocuous).  While at it, change StartReplication() to use
      EndReplicationCommand() instead of pg_puttextmessage().
      
      In commit 2f966131, I failed to realize that replication commands
      are not close-enough kin of regular SQL commands, so the
      DROP_REPLICATION_SLOT tag I added is undeserved and a type pun.  Take it
      out.
      
      Backpatch to 13, where the latter commit appeared.  The duplicate tag
      has been sent since 9.3, but since nothing is broken, it doesn't seem
      worth fixing.
      
      Per complaints from Tom Lane.
      
      Discussion: https://postgr.es/m/1347966.1600195735@sss.pgh.pa.us
      07082b08
  2. 16 Sep, 2020 12 commits
    • Tom Lane's avatar
      Centralize setup of SIGQUIT handling for postmaster child processes. · 44fc6e25
      Tom Lane authored
      We decided that the policy established in commit 7634bd4f for
      the bgwriter, checkpointer, walwriter, and walreceiver processes,
      namely that they should accept SIGQUIT at all times, really ought
      to apply uniformly to all postmaster children.  Therefore, get
      rid of the duplicative and inconsistent per-process code for
      establishing that signal handler and removing SIGQUIT from BlockSig.
      Instead, make InitPostmasterChild do it.
      
      The handler set up by InitPostmasterChild is SignalHandlerForCrashExit,
      which just summarily does _exit(2).  In interactive backends, we
      almost immediately replace that with quickdie, since we would prefer
      to try to tell the client that we're dying.  However, this patch is
      changing the behavior of autovacuum (both launcher and workers), as
      well as walsenders.  Those processes formerly also used quickdie,
      but AFAICS that was just mindless copy-and-paste: they don't have
      any interactive client that's likely to benefit from being told this.
      
      The stats collector continues to be an outlier, in that it thinks
      SIGQUIT means normal exit.  That should probably be changed for
      consistency, but there's another patch set where that's being
      dealt with, so I didn't do so here.
      
      Discussion: https://postgr.es/m/644875.1599933441@sss.pgh.pa.us
      44fc6e25
    • Tom Lane's avatar
      Don't fetch partition check expression during InitResultRelInfo. · 2000b6c1
      Tom Lane authored
      Since there is only one place that actually needs the partition check
      expression, namely ExecPartitionCheck, it's better to fetch it from
      the relcache there.  In this way we will never fetch it at all if
      the query never has use for it, and we still fetch it just once when
      we do need it.
      
      The reason for taking an interest in this is that if the relcache
      doesn't already have the check expression cached, fetching it
      requires obtaining AccessShareLock on the partition root.  That
      means that operations that look like they should only touch the
      partition itself will also take a lock on the root.  In particular
      we observed that TRUNCATE on a partition may take a lock on the
      partition's root, contributing to a deadlock situation in parallel
      pg_restore.
      
      As written, this patch does have a small cost, which is that we
      are microscopically reducing efficiency for the case where a partition
      has an empty check expression.  ExecPartitionCheck will be called,
      and will go through the motions of setting up and checking an empty
      qual, where before it would not have been called at all.  We could
      avoid that by adding a separate boolean flag to track whether there
      is a partition expression to test.  However, this case only arises
      for a default partition with no siblings, which surely is not an
      interesting case in practice.  Hence adding complexity for it
      does not seem like a good trade-off.
      
      Amit Langote, per a suggestion by me
      
      Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
      2000b6c1
    • Peter Geoghegan's avatar
      Fix amcheck child check pg_upgrade bug. · aac80bfc
      Peter Geoghegan authored
      Commit d114cc53 overlooked the fact that pg_upgrade'd B-Tree indexes
      have leaf page high keys whose offset numbers do not match the one from
      the copy of the tuple one level up (the copy stored with a downlink for
      leaf page's right sibling page).  This led to false positive reports of
      corruption from bt_index_parent_check() when it was called to verify a
      pg_upgrade'd index.
      
      To fix, skip comparing the offset number on pg_upgrade'd B-Tree indexes.
      
      Author: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
      Author: Peter Geoghegan <pg@bowt.ie>
      Reported-By: default avatarAndrew Bille <andrewbille@gmail.com>
      Diagnosed-By: default avatarAnastasia Lubennikova <a.lubennikova@postgrespro.ru>
      Bug: #16619
      Discussion: https://postgr.es/m/16619-aaba10f83fdc1c3c@postgresql.org
      Backpatch: 13-, where child check was enhanced.
      aac80bfc
    • Tom Lane's avatar
      Avoid unnecessary recursion to child tables in ALTER TABLE SET NOT NULL. · e5fac1cb
      Tom Lane authored
      If a partitioned table's column is already marked NOT NULL, there is
      no need to examine its partitions, because we can rely on previous
      DDL to have enforced that the child columns are NOT NULL as well.
      (Unfortunately, the same cannot be said for traditional inheritance,
      so for now we have to restrict the optimization to partitioned tables.)
      Hence, we may skip recursing to child tables in this situation.
      
      The reason this case is worth worrying about is that when pg_dump dumps
      a partitioned table having a primary key, it will include the requisite
      NOT NULL markings in the CREATE TABLE commands, and then add the
      primary key as a separate step.  The primary key addition generates a
      SET NOT NULL as a subcommand, just to be sure.  So the situation where
      a SET NOT NULL is redundant does arise in the real world.
      
      Skipping the recursion does more than just save a few cycles: it means
      that a command such as "ALTER TABLE ONLY partition_parent ADD PRIMARY
      KEY" will take locks only on the partition parent table, not on the
      partitions.  It turns out that parallel pg_restore is effectively
      assuming that that's true, and has little choice but to do so because
      the dependencies listed for such a TOC entry don't include the
      partitions.  pg_restore could thus issue this ALTER while data restores
      on the partitions are still in progress.  Taking unnecessary locks on
      the partitions not only hurts concurrency, but can lead to actual
      deadlock failures, as reported by Domagoj Smoljanovic.
      
      (A contributing factor in the deadlock is that TRUNCATE on a child
      partition wants a non-exclusive lock on the parent.  This seems
      likewise unnecessary, but the fix for it is more invasive so we
      won't consider back-patching it.  Fortunately, getting rid of one
      of these two poor behaviors is enough to remove the deadlock.)
      
      Although support for partitioned primary keys came in with v11,
      this patch is dependent on the SET NOT NULL refactoring done by
      commit f4a3fdfb, so we can only patch back to v12.
      
      Patch by me; thanks to Alvaro Herrera and Amit Langote for review.
      
      Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
      e5fac1cb
    • Tom Lane's avatar
      Fix bogus cache-invalidation logic in logical replication worker. · 3d65b059
      Tom Lane authored
      The code recorded cache invalidation events by zeroing the "localreloid"
      field of affected cache entries.  However, it's possible for an inval
      event to occur even while we have the entry open and locked.  So an
      ill-timed inval could result in "cache lookup failed for relation 0"
      errors, if the worker's code tried to use the cleared field.  We can
      fix that by creating a separate bool field to record whether the entry
      needs to be revalidated.  (In the back branches, cram the bool into
      what had been padding space, to avoid an ABI break in the somewhat
      unlikely event that any extension is looking at this struct.)
      
      Also, rearrange the logic in logicalrep_rel_open so that it
      does the right thing in cases where table_open would fail.
      We should retry the lookup by name in that case, but we didn't.
      
      The real-world impact of this is probably small.  In the first place,
      the error conditions are very low probability, and in the second place,
      the worker would just exit and get restarted.  We only noticed because
      in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly,
      preventing the worker from making progress.  Nonetheless, it's clearly
      a bug, and it impedes a useful type of testing; so back-patch to v10
      where this code was introduced.
      
      Discussion: https://postgr.es/m/1032727.1600096803@sss.pgh.pa.us
      3d65b059
    • Fujii Masao's avatar
      Add leader_pid field into the example of file_fdw for csvlog. · e568ed0e
      Fujii Masao authored
      Commit b8fdee7d added leader_pid field into csvlog,
      but forgot to update the example of file_fdw for csvlog.
      
      Author: Yuta Katsuragi
      e568ed0e
    • Michael Paquier's avatar
      Avoid retrieval of CHECK constraints and DEFAULT exprs in data-only dump · 5423853f
      Michael Paquier authored
      Those extra queries are not necessary when doing a data-only dump.  With
      this change, this means that the dependencies between CHECK/DEFAULT and
      the parent table are not tracked anymore for a data-only dump.  However,
      these dependencies are only used for the schema generation and we have
      never guaranteed that a dump can be reloaded if a CHECK constraint uses
      a custom function whose behavior changes when loading the data, like
      when using cross-table references in the CHECK function.
      
      Author: Julien Rouhaud
      Reviewed-by: Daniel Gustafsson, Michael Paquier
      Discussion: https://postgr.es/m/20200712054850.GA92357@nol
      5423853f
    • Jeff Davis's avatar
      Change LogicalTapeSetBlocks() to use nBlocksWritten. · c8aeaf3a
      Jeff Davis authored
      Previously, it was based on nBlocksAllocated to account for tapes with
      open write buffers that may not have made it to the BufFile yet.
      
      That was unnecessary, because callers do not need to get the number of
      blocks while a tape has an open write buffer; and it also conflicted
      with the preallocation logic added for HashAgg.
      
      Reviewed-by: Peter Geoghegan
      Discussion: https://postgr.es/m/ce5af05900fdbd0e9185747825a7423c48501964.camel@j-davis.com
      Backpatch-through: 13
      c8aeaf3a
    • Jeff Davis's avatar
      HashAgg: release write buffers sooner by rewinding tape. · 3bd35d4f
      Jeff Davis authored
      This was an oversight. The purpose of 7fdd919a was to avoid keeping
      tape buffers around unnecessisarily, but HashAgg didn't rewind early
      enough.
      
      Reviewed-by: Peter Geoghegan
      Discussion: https://postgr.es/m/1fb1151c2cddf8747d14e0532da283c3f97e2685.camel@j-davis.com
      Backpatch-through: 13
      3bd35d4f
    • Amit Kapila's avatar
      Fix initialization of RelationSyncEntry for streaming transactions. · 69bd6067
      Amit Kapila authored
      In commit 46482432, for each RelationSyncEntry we maintained the list
      of xids (streamed_txns) for which we have already sent the schema. This
      helps us to track when to send the schema to the downstream node for
      replication of streaming transactions. Before this list got initialized,
      we were processing invalidation messages which access this list and led
      to an assertion failure.
      
      In passing, clean up the nearby code:
      
      * Initialize the list of xids with NIL instead of NULL which is our usual
      coding practice.
      * Remove the MemoryContext switch for creating a RelationSyncEntry in dynahash.
      
      Diagnosed-by: Amit Kapila and Tom Lane
      Author: Amit Kapila
      Reviewed-by: Tom Lane and Dilip Kumar
      Discussion: https://postgr.es/m/904373.1600033123@sss.pgh.pa.us
      69bd6067
    • David Rowley's avatar
      Optimize compactify_tuples function · 19c60ad6
      David Rowley authored
      This function could often be seen in profiles of vacuum and could often
      be a significant bottleneck during recovery. The problem was that a qsort
      was performed in order to sort an array of item pointers in reverse offset
      order so that we could use that to safely move tuples up to the end of the
      page without overwriting the memory of yet-to-be-moved tuples. i.e. we
      used to compact the page starting at the back of the page and move towards
      the front. The qsort that this required could be expensive for pages with
      a large number of tuples.
      
      In this commit, we take another approach to tuple compactification.
      
      Now, instead of sorting the remaining item pointers array we first check
      if the array is presorted and only memmove() the tuples that need to be
      moved. This presorted check can be done very cheaply in the calling
      functions when the array is being populated. This presorted case is very
      fast.
      
      When the item pointer array is not presorted we must copy tuples that need
      to be moved into a temp buffer before copying them back into the page
      again. This differs from what we used to do here as we're now copying the
      tuples back into the page in reverse line pointer order. Previously we
      left the existing order alone.  Reordering the tuples results in an
      increased likelihood of hitting the pre-sorted case the next time around.
      Any newly added tuple which consumes a new line pointer will also maintain
      the correct sort order of tuples in the page which will also result in the
      presorted case being hit the next time.  Only consuming an unused line
      pointer can cause the order of tuples to go out again, but that will be
      corrected next time the function is called for the page.
      
      Benchmarks have shown that the non-presorted case is at least equally as
      fast as the original qsort method even when the page just has a few
      tuples. As the number of tuples becomes larger the new method maintains
      its performance whereas the original qsort method became much slower when
      the number of tuples on the page became large.
      
      Author: David Rowley
      Reviewed-by: Thomas Munro
      Tested-by: Jakub Wartak
      Discussion: https://postgr.es/m/CA+hUKGKMQFVpjr106gRhwk6R-nXv0qOcTreZuQzxgpHESAL6dw@mail.gmail.com
      19c60ad6
    • Alvaro Herrera's avatar
      Fix use-after-free bug with event triggers in an extension script · ced138e8
      Alvaro Herrera authored
      ALTER TABLE commands in an extension script are added to an event
      trigger command list; but starting with commit b5810de3 they do so in
      a memory context that's too short-lived, so when execution ends and time
      comes to use the entries, they've already been freed.
      
      (This would also be a problem with ALTER TABLE commands in a
      multi-command query string, but these serendipitously end in
      PortalContext -- which probably explains why it took so long for this to
      be reported.)
      
      Fix by using the memory context specifically set for that, instead.
      
      Backpatch to 13, where the aforementioned commit appeared.
      
      Reported-by: Philippe Beaudoin
      Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
      Discussion: https://postgr.es/m/20200902193715.6e0269d4@firost
      ced138e8
  3. 15 Sep, 2020 3 commits
  4. 14 Sep, 2020 6 commits
    • Tom Lane's avatar
      Make walsenders show their replication commands in pg_stat_activity. · f560209c
      Tom Lane authored
      A walsender process that has executed a SQL command left the text of
      that command in pg_stat_activity.query indefinitely, which is quite
      confusing if it's in RUNNING state but not doing that query.  An easy
      and useful fix is to treat replication commands as if they were SQL
      queries, and show them in pg_stat_activity according to the same rules
      as for regular queries.  While we're at it, it seems also sensible to
      set debug_query_string, allowing error logging and debugging to see
      the replication command.
      
      While here, clean up assorted silliness in exec_replication_command:
      
      * The SQLCmd path failed to restore CurrentMemoryContext to the caller's
      value, and failed to delete the temp context created in this routine.
      It's only through great good fortune that these oversights did not
      result in long-term memory leaks or other problems.  It seems cleaner
      to code SQLCmd as a separate early-exit path, so do it like that.
      
      * Remove useless duplicate call of SnapBuildClearExportedSnapshot().
      
      * replication_scanner_finish() was never called.
      
      None of those things are significant enough to merit a backpatch,
      so this is for HEAD only.
      
      Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us
      f560209c
    • Noah Misch's avatar
      Fix interpolation in test name. · 47a3a1c3
      Noah Misch authored
      A pre-commit review had reported the problem, but the fix reached only
      v10 and earlier.  Back-patch to v11.
      
      Discussion: https://postgr.es/m/20200423.140546.1055476118690602079.horikyota.ntt@gmail.com
      47a3a1c3
    • Fujii Masao's avatar
      Fix typos. · 95233011
      Fujii Masao authored
      Author: Naoki Nakamichi
      Discussion: https://postgr.es/m/b6919d145af00295a8e86ce4d034b7cd@oss.nttdata.com
      95233011
    • Michael Paquier's avatar
      Make index_set_state_flags() transactional · 83158f74
      Michael Paquier authored
      3c840464 is the original commit that introduced index_set_state_flags(),
      where the presence of SnapshotNow made necessary the use of an in-place
      update.  SnapshotNow has been removed in 813fb031, so there is no actual
      reasons to not make this operation transactional.
      
      Note that while making the operation more robust, using a transactional
      operation in this routine was not strictly necessary as there was no use
      case for it yet.  However, some future features are going to need a
      transactional behavior, like support for CREATE/DROP INDEX CONCURRENTLY
      with partitioned tables, where indexes in a partition tree need to have
      all their pg_index.indis* flags updated in the same transaction to make
      the operation stable to the end-user by keeping partition trees
      consistent, even with a failure mid-flight.
      
      REINDEX CONCURRENTLY uses already transactional updates when swapping
      the old and new indexes, making this change more consistent with the
      index-swapping logic.
      
      Author: Michael Paquier
      Reviewed-by: Anastasia Lubennikova
      Discussion: https://postgr.es/m/20200903080440.GA8559@paquier.xyz
      83158f74
    • Peter Eisentraut's avatar
      Message fixes and style improvements · 3e0242b2
      Peter Eisentraut authored
      3e0242b2
    • Michael Paquier's avatar
      Avoid useless allocations for information of dumpable objects in pg_dump/ · ac673a1a
      Michael Paquier authored
      If there are no objects of a certain type, there is no need to do an
      allocation for a set of DumpableObject items.  The previous coding did
      an allocation of 1 byte instead as per the fallback of pg_malloc() in
      the event of an allocation size of zero.  This assigns NULL instead for
      a set of dumpable objects.
      
      A similar rule already applied to findObjectByOid(), so this makes the
      code more defensive as we would just fail with a pointer dereference
      instead of attempting to use some incorrect data if a non-existing,
      positive, OID is given by a caller of this function.
      
      Author: Daniel Gustafsson
      Reviewed-by: Julien Rouhaud, Ranier Vilela
      Discussion: https://postgr.es/m/26C43E58-BDD0-4F1A-97CC-4A07B52E32C5@yesql.se
      ac673a1a
  5. 13 Sep, 2020 1 commit
    • Tom Lane's avatar
      Use the properly transformed RangeVar for expandTableLikeClause(). · 19f5a37b
      Tom Lane authored
      transformCreateStmt() adjusts the transformed statement's RangeVar
      to specify the target schema explicitly, for the express reason
      of making sure that auxiliary statements derived by parse
      transformation operate on the right table.  But the refactoring
      I did in commit 50289819 got this wrong and passed the untransformed
      RangeVar to expandTableLikeClause().  This could lead to assertion
      failures or weird misbehavior if the wrong table was accessed.
      
      Per report from Alexander Lakhin.  Like the previous patch, back-patch
      to all supported branches.
      
      Discussion: https://postgr.es/m/05051f9d-b32b-cb35-6735-0e9f2ab86b5f@gmail.com
      19f5a37b
  6. 12 Sep, 2020 3 commits
  7. 11 Sep, 2020 6 commits
  8. 10 Sep, 2020 3 commits
    • Alvaro Herrera's avatar
      Print WAL logical message contents in pg_waldump · 9f1cf97b
      Alvaro Herrera authored
      This helps debuggability when looking at WAL streams containing logical
      messages.
      
      Author: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
      Reviewed-by: default avatarÁlvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/CAExHW5sWx49rKmXbg5H1Xc1t+nRv9PaYKQmgw82HPt6vWDVmDg@mail.gmail.com
      9f1cf97b
    • Tom Lane's avatar
      Use _exit(2) for SIGQUIT during ProcessStartupPacket, too. · 58c6fecc
      Tom Lane authored
      Bring the signal handling for startup-packet collection into line
      with the policy established in commits bedadc73 and 8e19a826,
      namely don't risk running atexit callbacks when handling SIGQUIT.
      
      Ideally, we'd not do so for SIGTERM or timeout interrupts either,
      but that change seems a bit too risky for the back branches.
      For now, just improve the comments in this area to describe the risk.
      
      Also relocate where BackendInitialize re-disables these interrupts,
      to minimize the code span where they're active.  This doesn't buy
      a whole lot of safety, but it can't hurt.
      
      In passing, rename startup_die() to remove confusion about whether
      it is for the startup process.
      
      Like the previous commits, back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/1850884.1599601164@sss.pgh.pa.us
      58c6fecc
    • Robert Haas's avatar
      New contrib module, pg_surgery, with heap surgery functions. · 34a947ca
      Robert Haas authored
      Sometimes it happens that the visibility information for a tuple
      becomes corrupted, either due to bugs in the database software or
      external factors. Provide a function heap_force_kill() that can
      be used to truncate such dead tuples to dead line pointers, and
      a function heap_force_freeze() that can be used to overwrite the
      visibility information in such a way that the tuple becomes
      all-visible.
      
      These functions are unsafe, in that you can easily use them to
      corrupt a database that was not previously corrupted, and you can
      use them to further corrupt an already-corrupted database or to
      destroy data. The documentation accordingly cautions against
      casual use. However, in some cases they permit recovery of data
      that would otherwise be very difficult to recover, or to allow a
      system to continue to function when it would otherwise be difficult
      to do so.
      
      Because we may want to add other functions for performing other
      kinds of surgery in the future, the new contrib module is called
      pg_surgery rather than something specific to these functions. I
      proposed back-patching this so that it could be more easily used
      by people running existing releases who are facing these kinds of
      problems, but that proposal did not attract enough support, so
      no back-patch for now.
      
      Ashutosh Sharma, reviewed and tested by Andrey M. Borodin,
      M. Beena Emerson, Masahiko Sawada, Rajkumar Raghuwanshi,
      Asim Praveen, and Mark Dilger, and somewhat revised by me.
      
      Discussion: http://postgr.es/m/CA+TgmoZW1fsU-QUNCRUQMGUygBDPVeOTLCqRdQZch=EYZnctSA@mail.gmail.com
      34a947ca