1. 18 Mar, 2014 16 commits
    • Fix compilation of pg_xlogdump, now that rm_safe_restartpoint is no more. · 033dc1c9
      Heikki Linnakangas authored
      Oops. Pointed out by Andres Freund.
    • Remove rm_safe_restartpoint machinery. · 59a5ab3f
      Heikki Linnakangas authored
      It is no longer used: none of the resource managers have multi-record
      actions that would make it unsafe to perform a restartpoint.
      
      Also don't allow rm_cleanup to write WAL records; that is no longer
      required either. Move the call to the rm_cleanup routines to make it
      more symmetric with rm_startup.
    • Fix misc typos in comments. · 1d3b258c
      Heikki Linnakangas authored
    • Logical decoding documentation corrections. · 3ee4fcfc
      Robert Haas authored
      Thom Brown
    • Fix uninitialized variable. · a3b30d4c
      Robert Haas authored
      Report from Andres Freund, but not his fix.
    • Make the handling of interrupted B-tree page splits more robust. · 40dae7ec
      Heikki Linnakangas authored
      Splitting a page consists of two separate steps: splitting the child page,
      and inserting the downlink for the new right page into the parent.
      Previously, we handled the case where you crash between those steps with a
      cleanup routine, run after WAL recovery had finished, that completed the
      incomplete split. However, that doesn't help if the page split is
      interrupted but the database doesn't crash, so that no WAL recovery is
      performed. That could happen, for example, if you run out of disk space.
      
      Remove the end-of-recovery cleanup step. Instead, when a page is split, the
      left page is marked with a new INCOMPLETE_SPLIT flag, and when the downlink
      is inserted into the parent, the flag is cleared again. If an insertion sees
      a page with the flag set, it knows that the split was interrupted for some
      reason, and inserts the missing downlink before proceeding.
      
      I used the same approach to fix GIN and GiST split algorithms earlier. This
      was the last WAL cleanup routine, so we could get rid of that whole
      machinery now, but I'll leave that for a separate patch.
      
      Reviewed by Peter Geoghegan.
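Below is a minimal C sketch of the flag protocol described above. It uses a toy in-memory page, not PostgreSQL's real B-tree structures. The names SketchPage, SKETCH_INCOMPLETE_SPLIT and the sketch_* functions are hypothetical. Only the ordering comes from the commit message: set the flag on the left page at split time, clear it when the downlink lands, and have any later insertion finish a flagged split first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real B-tree code defines its own page flags. */
#define SKETCH_INCOMPLETE_SPLIT 0x01u

typedef struct SketchPage
{
    unsigned flags;            /* page-level flag bits */
    struct SketchPage *right;  /* right sibling */
    bool has_downlink;         /* does the parent point at this page? */
} SketchPage;

/* Step 1: split the child. The left half stays flagged until step 2 is done. */
static void
sketch_split_child(SketchPage *left, SketchPage *newright)
{
    newright->flags = 0;
    newright->right = left->right;
    newright->has_downlink = false;
    left->right = newright;
    left->flags |= SKETCH_INCOMPLETE_SPLIT;
}

/* Step 2: insert the downlink for the new right page into the parent, then clear the flag. */
static void
sketch_insert_downlink(SketchPage *left)
{
    assert(left->flags & SKETCH_INCOMPLETE_SPLIT);
    left->right->has_downlink = true;
    left->flags &= ~SKETCH_INCOMPLETE_SPLIT;
}

/* A later insertion that lands on a flagged page finishes the split first. */
static void
sketch_insert(SketchPage *page)
{
    if (page->flags & SKETCH_INCOMPLETE_SPLIT)
        sketch_insert_downlink(page);
    /* ... the normal insertion into page would follow here ... */
}
```

If the process fails between sketch_split_child() and sketch_insert_downlink(), no recovery-time cleanup is needed. The flag persists on the page, and the next sketch_insert() that reaches it completes the work.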
    • Fix some remaining int64 vestiges in contrib/test_shm_mq. · b6ec7c92
      Tom Lane authored
      Andres Freund and Tom Lane
    • test_shm_mq: Use Size rather than uint64. · c676ac0f
      Robert Haas authored
      Commit 3bd261ca updated the API but neglected to make the corresponding
      edits here.
      
      Per Tom Lane and the buildfarm.
    • Documentation for logical decoding. · 49c0864d
      Robert Haas authored
      Craig Ringer, Andres Freund, Christian Kruse, with edits by me.
    • Add pg_recvlogical, a tool to receive logical decoding data. · 8bdd12bb
      Robert Haas authored
      This is fairly basic at the moment, but it's at least useful for
      testing and debugging, and possibly more.
      
      Andres Freund
    • Rewrite comment for shm_mq_receive_bytes. · 250f8a7b
      Robert Haas authored
      The comment and the code diverged at some point before the initial
      commit of this feature, and I failed to notice.
      
      Noted by Tom Lane.
    • Fix relcache reference leak in refresh_by_match_merge(). · f7271c44
      Tom Lane authored
      One path through the loop over indexes forgot to do index_close().  Rather
      than adding a fourth call, restructure slightly so that there's only one.
      
      In passing, get rid of an unnecessary syscache lookup: the pg_index struct
      for the index is already available from its relcache entry.
      
      Per report from YAMAMOTO Takashi, though this is a bit different from his
      suggested patch.  This is new code in HEAD, so no need for back-patch.
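The restructuring can be pictured with a toy loop in which every path through the body reaches a single close call, so no branch can skip it. sketch_index_open() and sketch_index_close() are hypothetical stand-ins that only maintain a counter. They are not index_open()/index_close(). The qualification test (unique, no predicate) is also invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: they only track how many references are open. */
static int sketch_open_refs = 0;

static void sketch_index_open(void)  { sketch_open_refs++; }
static void sketch_index_close(void) { assert(sketch_open_refs > 0); sketch_open_refs--; }

/*
 * Count indexes that pass a hypothetical qualification test. Rather than
 * closing the index on each branch that skips it, the body only records
 * its decision, and one close at the bottom runs on every path.
 */
static int
sketch_count_qualifying(const bool *unique, const bool *has_predicate, int nindexes)
{
    int count = 0;

    for (int i = 0; i < nindexes; i++)
    {
        bool qualifies = false;

        sketch_index_open();

        if (unique[i] && !has_predicate[i])
            qualifies = true;

        if (qualifies)
            count++;

        sketch_index_close();   /* the only close for this iteration */
    }
    return count;
}
```

The design choice is the one the commit message describes. Adding a close to each branch invites exactly the leak that was fixed, while funnelling every branch to one call makes a missed path impossible.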
    • Improve shm_mq portability around MAXIMUM_ALIGNOF and sizeof(Size). · 3bd261ca
      Robert Haas authored
      Revise the original decision to expose a uint64-based interface and
      use Size everywhere possible.  Avoid assuming that MAXIMUM_ALIGNOF is
      8, or making any assumption about the relationship between that value
      and sizeof(Size).  If MAXIMUM_ALIGNOF is bigger, we'll now insert
      padding after the length word; if it's smaller, we are now prepared
      to read and write the length word in chunks.
      
      Per discussion with Tom Lane.
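The following is a rough sketch of the two layout rules, under stated assumptions. size_t stands in for Size, maxalign stands in for MAXIMUM_ALIGNOF (assumed to be a power of two), and the chunk size is an arbitrary parameter. The helper names are hypothetical, and this is not the shm_mq implementation. It only shows two things: padding the length word up to the alignment boundary, and copying the length word in pieces that reassemble to the original value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round len up to a multiple of align (align must be a power of two), in the style of MAXALIGN. */
static size_t
sketch_align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Bytes the length word occupies: sizeof(size_t), padded out to the alignment. */
static size_t
sketch_length_word_bytes(size_t maxalign)
{
    return sketch_align_up(sizeof(size_t), maxalign);
}

/*
 * Copy the length word into dst in pieces of at most `chunk` bytes, as a
 * writer must when it can only rely on smaller aligned units. Returns the
 * number of pieces used.
 */
static int
sketch_write_length_chunked(unsigned char *dst, size_t nbytes, size_t chunk)
{
    const unsigned char *src = (const unsigned char *) &nbytes;
    size_t done = 0;
    int npieces = 0;

    while (done < sizeof(nbytes))
    {
        size_t n = sizeof(nbytes) - done;

        if (n > chunk)
            n = chunk;
        memcpy(dst + done, src + done, n);
        done += n;
        npieces++;
    }
    return npieces;
}
```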
    • Fix pg_dumpall option parsing: -i doesn't take an argument. · 19f2d6cd
      Tom Lane authored
      This used to work properly, but got fat-fingered in commit 3dee636e.
      Per bug #9620 from Nicolas Payart.
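In getopt(3), a trailing colon in the option string decides whether an option takes an argument, so "i:" makes getopt() swallow the next word. The sketch below uses only POSIX getopt(). The option strings are cut-down hypothetical examples, not pg_dumpall's actual ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/*
 * Return 1 if "-v" was recognized as its own option. With optstring "iv",
 * -i is a plain flag; with "i:v", getopt() treats -i as requiring an
 * argument and consumes the following word ("-v") as that argument.
 */
static int
sketch_saw_v(int argc, char **argv, const char *optstring)
{
    int saw_v = 0;
    int c;

    opterr = 0;   /* don't print diagnostics for unknown options */
    optind = 1;   /* restart the scan; assumes any previous scan ran to completion */
    while ((c = getopt(argc, argv, optstring)) != -1)
    {
        if (c == 'v')
            saw_v = 1;
    }
    return saw_v;
}
```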
    • Fix help message and documentation in pg_receivexlog. · e726e59d
      Fujii Masao authored
      Add SLOTNAME placeholder to --slot option in help message and
      documentation.
    • Make it easy to detach completely from shared memory. · 79a4d24f
      Robert Haas authored
      The new function dsm_detach_all() can be used either by postmaster
      children that don't wish to take any risk of accidentally corrupting
      shared memory, or by forked children of regular backends with
      the same need.  This patch also updates the postmaster children that
      already do PGSharedMemoryDetach() to do dsm_detach_all() as well.
      
      Per discussion with Tom Lane.
  2. 17 Mar, 2014 11 commits
  3. 16 Mar, 2014 1 commit
    • Cleanups from the remove-native-krb5 patch · 0294023a
      Magnus Hagander authored
      krb_srvname is actually no longer available as a server-side parameter,
      since with GSSAPI we accept all principals in our keytab. It's still used
      in libpq for client-side specification.
      
      In passing, remove the declaration of krb_server_hostname, whose
      functionality had already been removed.
      
      Noted by Stephen Frost, though with a different solution than he suggested.
  4. 15 Mar, 2014 2 commits
  5. 14 Mar, 2014 2 commits
    • Fix race condition in B-tree page deletion. · efada2b8
      Heikki Linnakangas authored
      In short, we don't allow a page to be deleted if it's the rightmost child
      of its parent, but that situation can change after we check for it.
      
      Problem
      -------
      
      We check that the page to be deleted is not the rightmost child of its
      parent, and then lock its left sibling, the page itself, its right sibling,
      and the parent, in that order. However, if the parent page is split after
      the check but before acquiring the locks, the target page might become the
      rightmost child, if the split happens at the right place. That leads to an
      error in vacuum (I reproduced this by setting a breakpoint in a debugger):
      
      ERROR:  failed to delete rightmost child 41 of block 3 in index "foo_pkey"
      
      We currently re-check whether the page has become the rightmost child, and
      throw the above error if it has. We could easily just give up rather than throw
      an error, but that approach doesn't scale to half-dead pages. To recap,
      although we don't normally allow deleting the rightmost child, if the page
      is the *only* child of its parent, we delete the child page and mark the
      parent page as half-dead in one atomic operation. But before we do that, we
      check that the parent can later be deleted, by checking that it in turn is
      not the rightmost child of the grandparent (potentially recursing all the
      way up to the root). But the same situation can arise there: the
      grandparent can be split while we're not holding the locks. We end up with
      a half-dead page that we cannot delete.
      
      To make things worse, the keyspace of the deleted page has already been
      transferred to its right sibling. As the README points out, the keyspace at
      the grandparent level is "out-of-whack" until the half-dead page is deleted,
      and if enough tuples with keys in the transferred keyspace are inserted, the
      page might get split and a downlink might be inserted into the grandparent
      that is out-of-order. That might not cause any serious problem if it's
      transient (as the README ponders), but is surely bad if it stays that way.
      
      Solution
      --------
      
      This patch changes the page deletion algorithm to avoid that problem. After
      checking that the topmost page in the chain of to-be-deleted pages is not
      the rightmost child of its parent, and then deleting the pages from bottom
      up, unlink the pages from top to bottom. This way, the intermediate stages
      are similar to the intermediate stages in page splitting, and there is no
      transient stage where the keyspace is "out-of-whack". The topmost page in
      the to-be-deleted chain doesn't have a downlink pointing to it, like a page
      split before the downlink has been inserted.
      
      This also allows us to get rid of the cleanup step after WAL recovery, if we
      crash during page deletion. The deletion will be continued at next VACUUM,
      but the tree is consistent for searches and insertions at every step.
      
      This bug is old and affects all supported versions, but this patch is too
      big to back-patch (and changes the WAL record formats of related records).
      We have not heard any reports of the bug from users, so clearly it's not
      easy to bump into. Maybe backpatch later, after this has had some field
      testing.
      
      Reviewed by Kevin Grittner and Peter Geoghegan.
    • Prevent interrupts while reporting non-ERROR elog messages. · 6c461cb9
      Tom Lane authored
      This should eliminate the risk of recursive entry to syslog(3), which
      appears to be the cause of the hang reported in bug #9551 from James
      Morton.
      
      Arguably, the real problem here is auth.c's willingness to turn on
      ImmediateInterruptOK while executing fairly wide swaths of backend code.
      We may well need to work at narrowing the code ranges in which the
      authentication_timeout interrupt is enabled.  For the moment, though,
      this is a cheap and reasonably noninvasive fix for a field-reported
      failure; the other approach would be complex and not necessarily
      bug-free itself.
      
      Back-patch to all supported branches.
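The commit message does not spell out the mechanism, so the following is a sketch under an assumption. It models a holdoff counter in the style of PostgreSQL's HOLD_INTERRUPTS()/RESUME_INTERRUPTS(), where a pending interrupt is serviced only once the count is back to zero. Every name below is hypothetical. The point is only that an interrupt arriving while a message is being written is deferred rather than re-entering the reporting code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a holdoff counter and a pending-interrupt flag. */
static volatile int sketch_holdoff_count = 0;
static volatile bool sketch_interrupt_pending = false;
static int sketch_interrupts_serviced = 0;

#define SKETCH_HOLD_INTERRUPTS() do { sketch_holdoff_count++; } while (0)
#define SKETCH_RESUME_INTERRUPTS() \
    do { assert(sketch_holdoff_count > 0); sketch_holdoff_count--; } while (0)

/* A pending interrupt is only serviced when nothing is holding interrupts off. */
static void
sketch_check_for_interrupts(void)
{
    if (sketch_interrupt_pending && sketch_holdoff_count == 0)
    {
        sketch_interrupt_pending = false;
        sketch_interrupts_serviced++;
    }
}

/* Emit a non-ERROR message; an interrupt arriving mid-write is deferred. */
static void
sketch_report_message(void)
{
    SKETCH_HOLD_INTERRUPTS();
    sketch_interrupt_pending = true;   /* simulate a signal arriving mid-write */
    sketch_check_for_interrupts();     /* no effect: holdoff count is nonzero */
    /* ... write the message to syslog or stderr here ... */
    SKETCH_RESUME_INTERRUPTS();
}
```

After sketch_report_message() returns, the interrupt is still pending. The next sketch_check_for_interrupts() call, made outside the reporting code, services it.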
  6. 13 Mar, 2014 5 commits
    • Allow psql to print COPY command status in more cases. · f70a78bc
      Tom Lane authored
      Previously, psql would print the "COPY nnn" command status only for COPY
      commands executed server-side.  Now it will print that for frontend copies
      too (including \copy).  However, we continue to suppress the command status
      for COPY TO STDOUT, since in that case the copy data has been routed to the
      same place that the command status would go, and there is a risk of the
      status line being mistaken for another line of COPY data.  Doing that would
      break existing scripts, and it doesn't seem worth the benefit --- this case
      seems fairly analogous to SELECT, for which we also suppress the command
      status.
      
      Kumar Rajeev Rastogi, with substantial review by Amit Khandekar
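The resulting rule reduces to one predicate. This is a hypothetical helper, not psql's code. It suppresses the "COPY nnn" line only when the copied rows go to the same stream the status line would go to (COPY TO STDOUT). It prints the status in every other case, including COPY FROM STDIN and a \copy whose data goes to a file.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should a psql-like client print "COPY nnn"?
 * copy_to:        true for COPY ... TO, false for COPY ... FROM.
 * data_on_stdout: true when the copied rows are written to the same
 *                 stream that the status line would be written to.
 */
static bool
sketch_print_copy_status(bool copy_to, bool data_on_stdout)
{
    if (copy_to && data_on_stdout)
        return false;   /* the status could be mistaken for a data row */
    return true;
}
```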
    • Avoid transaction-commit race condition while receiving a NOTIFY message. · 7bae0284
      Tom Lane authored
      Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
      whether a NOTIFY message's originating transaction is in progress,
      committed, or aborted.  The previous coding could accept a message from a
      transaction that was still in-progress according to the PGPROC array;
      if the client were fast enough at starting a new transaction, it might fail
      to see table rows added/updated by the message-sending transaction.  Which
      of course would usually be the point of receiving the message.  We noted
      this type of race condition long ago in tqual.c, but async.c overlooked it.
      
      The race condition probably cannot occur unless there are multiple NOTIFY
      senders in action, since an individual backend doesn't send NOTIFY signals
      until well after it's done committing.  But if two senders commit in close
      succession, it's certainly possible that we could see the second sender's
      message within the race condition window while responding to the signal
      from the first one.
      
      Per bug #9557 from Marko Tiikkaja.  This patch is slightly more invasive
      than what he proposed, since it removes the now-redundant
      TransactionIdDidAbort call.
      
      Back-patch to 9.0, where the current NOTIFY implementation was introduced.
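Below is a toy model of the check ordering, with hypothetical types. It assumes, as in PostgreSQL, that a committing transaction makes its commit visible before it disappears from the PGPROC array. That leaves a short window in which both "still running" and "committed" would answer yes. Asking the in-progress question first classifies that window as in progress. The abort case then falls out as "neither", which is why a separate abort check becomes redundant.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    SKETCH_IN_PROGRESS,
    SKETCH_COMMITTED,
    SKETCH_ABORTED
} SketchXactState;

/*
 * Hypothetical view of one transaction: in_proc_array models whether it
 * still appears as running; commit_logged models whether its commit is
 * already visible. During the commit window both are true.
 */
typedef struct
{
    bool in_proc_array;
    bool commit_logged;
} SketchXact;

static SketchXactState
sketch_classify(const SketchXact *x)
{
    /* Ask "still running?" first, so the commit window counts as in progress. */
    if (x->in_proc_array)
        return SKETCH_IN_PROGRESS;
    if (x->commit_logged)
        return SKETCH_COMMITTED;
    /* Neither running nor committed: it aborted (or crashed). */
    return SKETCH_ABORTED;
}
```

With the opposite order, a transaction inside the window would be classified as committed. A listener could then act on the message before the sender's changes are visible to a new snapshot.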
    • Fix a couple of typos in docs. · 16ff08b7
      Heikki Linnakangas authored
      Thom Brown
    • C comments: remove odd blank lines after #ifdef WIN32 lines · 242c2737
      Bruce Momjian authored
      A few more
  7. 12 Mar, 2014 3 commits