1. 16 Aug, 2011 9 commits
    • Tom Lane's avatar
      Forget about targeting catalog cache invalidations by tuple TID. · 632ae682
      Tom Lane authored
      The TID isn't stable enough: we might queue an sinval event before a VACUUM
      FULL, and then process it afterwards, when the target tuple no longer has
      the same TID.  So we must invalidate entries on the basis of hash value
      only.  The old coding can be shown to result in various bizarre,
      hard-to-reproduce errors in the presence of concurrent VACUUM FULLs on
      system catalogs, and could easily result in permanent catalog corruption,
      up to and including complete loss of tables.
      
      This commit is just a minimal fix that removes the unsafe comparison.
      We should remove transmission of the tuple TID from sinval messages
      altogether, and then arrange to suppress the extra message in the common
      case of a heap_update that doesn't change the key hashvalue.  But that's
      going to be much more invasive, and will only produce a probably-marginal
      performance gain, so it doesn't seem like material for a back-patch.
      
      Back-patch to 9.0.  Before that, VACUUM FULL refused to do any tuple moving
      if it found any INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples (and
      CLUSTER would give up altogether), so there was no risk of moving a tuple
      that might be the subject of an unsent sinval message.
      632ae682
    • Tom Lane's avatar
      Fix incorrect order of operations during sinval reset processing. · f4d7f1ad
      Tom Lane authored
      We have to be sure that we have revalidated each nailed-in-cache relcache
      entry before we try to use it to load data for some other relcache entry.
      The introduction of "mapped relations" in 9.0 broke this, because although
      we updated the state kept in relmapper.c early enough, we failed to
      propagate that information into relcache entries soon enough; in
      particular, we could try to fetch pg_class rows out of pg_class before
      we'd updated its relcache entry's rd_node.relNode value from the map.
      
      This bug accounts for Dave Gould's report of failures after "vacuum full
      pg_class", and I believe that there is risk for other system catalogs
      as well.
      
      The core part of the fix is to copy relmapper data into the relcache
      entries during "phase 1" in RelationCacheInvalidate(), before they'll be
      used in "phase 2".  To try to future-proof the code against other similar
      bugs, I also rearranged the order in which nailed relations are visited
      during phase 2: now it's pg_class first, then pg_class_oid_index, then
      other nailed relations.  This should ensure that RelationClearRelation can
      apply RelationReloadIndexInfo to all nailed indexes without risking use
      of not-yet-revalidated relcache entries.
      
      Back-patch to 9.0 where the relation mapper was introduced.
      f4d7f1ad
    • Tom Lane's avatar
      Preserve toast value OIDs in toast-swap-by-content for CLUSTER/VACUUM FULL. · 7b0d0e93
      Tom Lane authored
      This works around the problem that a catalog cache entry might contain a
      toast pointer that we try to dereference just as a VACUUM FULL completes
      on that catalog.  We will see the sinval message on the cache entry when
      we acquire lock on the toast table, but by that point we've already told
      tuptoaster.c "here's the pointer to fetch", so it's difficult from a code
      structural standpoint to update the pointer before we use it.  Much less
      painful to ensure that toast pointers are not invalidated in the first
      place.  We have to add a bit of code to deal with the case that a value
      that previously wasn't toasted becomes so; but that should be a
      seldom-exercised corner case, so the inefficiency shouldn't be significant.
      
      Back-patch to 9.0.  In prior versions, we didn't allow CLUSTER on system
      catalogs, and VACUUM FULL didn't result in reassignment of toast OIDs, so
      there was no problem.
      7b0d0e93
    • Tom Lane's avatar
      Fix race condition in relcache init file invalidation. · 2ada6779
      Tom Lane authored
      The previous code tried to synchronize by unlinking the init file twice,
      but that doesn't actually work: it leaves a window wherein a third process
      could read the already-stale init file but miss the SI messages that would
      tell it the data is stale.  The result would be bizarre failures in catalog
      accesses, typically "could not read block 0 in file ..." later during
      startup.
      
      Instead, hold RelCacheInitLock across both the unlink and the sending of
      the SI messages.  This is more straightforward, and might even be a bit
      faster since only one unlink call is needed.
      
      This has been wrong since it was put in (in 2002!), so back-patch to all
      supported releases.
      2ada6779
    • Magnus Hagander's avatar
      Adjust total size in pg_basebackup progress report when reality changes · 1bb69245
      Magnus Hagander authored
      When streaming including WAL, the size estimate will always be incorrect,
      since we don't know how much WAL is included. To make sure the output doesn't
      look completely unreasonable, this patch increases the total size whenever we
      go past the estimate, to make sure we never go above 100%.
      1bb69245
    • Heikki Linnakangas's avatar
      Fix bogus comment that claimed that the new BACKUP METHOD line in · 2877c67b
      Heikki Linnakangas authored
      backup_label was new in 9.0. Spotted by Fujii Masao.
      2877c67b
    • Peter Eisentraut's avatar
      Make pg_basebackup progress report translatable · 3b3f0935
      Peter Eisentraut authored
      Also fix a potential portability bug, because INT64_FORMAT is only
      guaranteed to be available with snprintf, not fprintf.
      3b3f0935
    • Peter Eisentraut's avatar
      Use less cryptic variable names · 005e5c30
      Peter Eisentraut authored
      005e5c30
    • Bruce Momjian's avatar
      In pg_upgrade, avoid dumping orphaned temporary tables. This makes the · 2411fbda
      Bruce Momjian authored
      pg_upgrade schema matching pattern match pg_dump/pg_dumpall.
      
      Fix for 9.0, 9.1, and 9.2.
      2411fbda
  2. 15 Aug, 2011 2 commits
  3. 14 Aug, 2011 3 commits
    • Tom Lane's avatar
      Fix unsafe order of operations in foreign-table DDL commands. · 52994e9e
      Tom Lane authored
      When updating or deleting a system catalog tuple, it's necessary to acquire
      RowExclusiveLock on the catalog before looking up the tuple; otherwise a
      concurrent VACUUM FULL on the catalog might move the tuple to a different
      TID before we can apply the update.  Coding patterns that find the tuple
      via a table scan aren't at risk here, but when obtaining the tuple from a
      catalog cache, correct ordering is important; and several routines in
      foreigncmds.c got it wrong.  Noted while running the regression tests in
      parallel with VACUUM FULL of assorted system catalogs.
      
      For consistency I moved all the heap_open calls to the starts of their
      functions, including a couple for which there was no actual bug.
      
      Back-patch to 8.4 where foreigncmds.c was added.
      52994e9e
    • Peter Eisentraut's avatar
      Message style improvements · 85612039
      Peter Eisentraut authored
      85612039
    • Peter Eisentraut's avatar
      Fix typo · 7431cb25
      Peter Eisentraut authored
      7431cb25
  4. 13 Aug, 2011 2 commits
    • Tom Lane's avatar
      Fix incorrect timeout handling during initial authentication transaction. · 592b615d
      Tom Lane authored
      The statement start timestamp was not set before initiating the transaction
      that is used to look up client authentication information in pg_authid.
      In consequence, enable_sig_alarm computed a wrong value (far in the past)
      for statement_fin_time.  That didn't have any immediate effect, because the
      timeout alarm was set without reference to statement_fin_time; but if we
      subsequently blocked on a lock for a short time, CheckStatementTimeout
      would consult the bogus value when we cancelled the lock timeout wait,
      and then conclude we'd timed out, leading to immediate failure of the
      connection attempt.  Thus an innocent "vacuum full pg_authid" would cause
      failures of concurrent connection attempts.  Noted while testing other,
      more serious consequences of vacuum full on system catalogs.
      
      We should set the statement timestamp before StartTransactionCommand(),
      so that the transaction start timestamp is also valid.  I'm not sure if
      there are any non-cosmetic effects of it not being valid, but the xact
      timestamp is at least sent to the statistics machinery.
      
      Back-patch to 9.0.  Before that, the client authentication timeout was done
      outside any transaction and did not depend on this state to be valid.
      592b615d
    • Bruce Momjian's avatar
      Make USECS_PER_* timestamp macros visible even when we are not using · 6d7bd5de
      Bruce Momjian authored
      integer timestamps.
      6d7bd5de
  5. 11 Aug, 2011 5 commits
  6. 10 Aug, 2011 5 commits
    • Tom Lane's avatar
      Remove wal_sender_delay GUC, because it's no longer useful. · cff75130
      Tom Lane authored
      The latch infrastructure is now capable of detecting all cases where the
      walsender loop needs to wake up, so there is no reason to have an arbitrary
      timeout.
      
      Also, modify the walsender loop logic to follow the standard pattern of
      ResetLatch, test for work to do, WaitLatch.  The previous coding was both
      hard to follow and buggy: it would sometimes busy-loop despite having
      nothing available to do, eg between receipt of a signal and the next time
      it was caught up with new WAL, and it also had interesting choices like
      deciding to update to WALSNDSTATE_STREAMING on the strength of information
      known to be obsolete.
      cff75130
    • Tom Lane's avatar
      Add a bit of debug logging to backend_read_statsfile(). · 79b2ee20
      Tom Lane authored
      This is in hopes of learning more about what causes "pgstat wait timeout"
      warnings in the buildfarm.  This patch should probably be reverted once
      we've learned what we can.  As coded, it will result in regression test
      "failures" at half the delay that the existing code does, so I expect
      to see a few more than before.
      79b2ee20
    • Tom Lane's avatar
      Change the autovacuum launcher to use WaitLatch instead of a poll loop. · 4dab3d5a
      Tom Lane authored
      In pursuit of this (and with the expectation that WaitLatch will be needed
      in more places), convert the latch field that was already added to PGPROC
      for sync rep into a generic latch that is activated for all PGPROC-owning
      processes, and change many of the standard backend signal handlers to set
      that latch when a signal happens.  This will allow WaitLatch callers to be
      wakened properly by these signals.
      
      In passing, fix a whole bunch of signal handlers that had been hacked to do
      things that might change errno, without adding the necessary save/restore
      logic for errno.  Also make some minor fixes in unix_latch.c, and clean
      up bizarre and unsafe scheme for disowning the process's latch.  Much of
      this has to be back-patched into 9.1.
      
      Peter Geoghegan, with additional work by Tom
      4dab3d5a
    • Heikki Linnakangas's avatar
      Oops, we're working on version 9.2 already, not 9.1. Update the · 1f1b70a7
      Heikki Linnakangas authored
      PG_CONTROL_VERSION accordingly; I updated it wrong in previous commit.
      1f1b70a7
    • Heikki Linnakangas's avatar
      If backup-end record is not seen, and we reach end of recovery from a · 41f9ffd9
      Heikki Linnakangas authored
      streamed backup, throw an error and refuse to start up. The restore has not
      finished correctly in that case and the data directory is possibly corrupt.
      We already errored out in case of archive recovery, but could not during
      crash recovery because we couldn't distinguish between the case that
      pg_start_backup() was called and the database then crashed (must not error,
      data is OK), and the case that we're restoring from a backup and not all
      the needed WAL was replayed (data can be corrupt).
      
      To distinguish those cases, add a line to backup_label to indicate
      whether the backup was taken with pg_start/stop_backup(), or by streaming
      (ie. pg_basebackup).
      
      This requires re-initdb, because of a new field added to the control file.
      41f9ffd9
  7. 09 Aug, 2011 7 commits
    • Tom Lane's avatar
      Measure WaitLatch's timeout parameter in milliseconds, not microseconds. · 9f17ffd8
      Tom Lane authored
      The original definition had the problem that timeouts exceeding about 2100
      seconds couldn't be specified on 32-bit machines.  Milliseconds seem like
      sufficient resolution, and finer grain than that would be fantasy anyway
      on many platforms.
      
      Back-patch to 9.1 so that this aspect of the latch API won't change between
      9.1 and later releases.
      
      Peter Geoghegan
      9f17ffd8
    • Tom Lane's avatar
      Documentation improvement and minor code cleanups for the latch facility. · 4e15a4db
      Tom Lane authored
      Improve the documentation around weak-memory-ordering risks, and do a pass
      of general editorialization on the comments in the latch code.  Make the
      Windows latch code more like the Unix latch code where feasible; in
      particular provide the same Assert checks in both implementations.
      Fix poorly-placed WaitLatch call in syncrep.c.
      
      This patch resolves, for the moment, concerns around weak-memory-ordering
      bugs in latch-related code: we have documented the restrictions and checked
      that existing calls meet them.  In 9.2 I hope that we will install suitable
      memory barrier instructions in SetLatch/ResetLatch, so that their callers
      don't need to be quite so careful.
      4e15a4db
    • Tom Lane's avatar
      Avoid creating PlaceHolderVars immediately within PlaceHolderVars. · cff60f2d
      Tom Lane authored
      Such a construction is useless since the lower PlaceHolderVar is already
      nullable; no need to make it more so.  Noted while pursuing bug #6154.
      
      This is just a minor planner efficiency improvement, since the final plan
      will come out the same anyway after PHVs are flattened.  So not worth the
      risk of back-patching.
      cff60f2d
    • Peter Eisentraut's avatar
      Use clearer notation for getnameinfo() return handling · f4a9da0a
      Peter Eisentraut authored
      Writing
      
          if (getnameinfo(...))
              handle_error();
      
      reads quite strangely, so use something like
      
          if (getnameinfo(...) != 0)
              handle_error();
      
      instead.
      f4a9da0a
    • Heikki Linnakangas's avatar
      Change the way string relopts are allocated. · 77949a29
      Heikki Linnakangas authored
      Don't try to allocate the default value for a string relopt in the same
      palloc chunk as the relopt_string struct. That didn't work too well if you
      added a built-in string relopt in the stringRelOpts array, as it's not
      possible to have an initializer for a variable length struct in C. This
      makes the code slightly simpler too.
      
      While we're at it, move the call to validator function in
      add_string_reloption to before the allocation, so that if someone does pass
      a bogus default value, we don't leak memory.
      77949a29
    • Heikki Linnakangas's avatar
      5b6c8436
    • Tom Lane's avatar
      Fix nested PlaceHolderVar expressions that appear only in targetlists. · 77ba2325
      Tom Lane authored
      A PlaceHolderVar's expression might contain another, lower-level
      PlaceHolderVar.  If the outer PlaceHolderVar is used, the inner one
      certainly will be also, and so we have to make sure that both of them get
      into the placeholder_list with correct ph_may_need values during the
      initial pre-scan of the query (before deconstruct_jointree starts).
      We did this correctly for PlaceHolderVars appearing in the query quals,
      but overlooked the issue for those appearing in the top-level targetlist;
      with the result that nested placeholders referenced only in the targetlist
      did not work correctly, as illustrated in bug #6154.
      
      While at it, add some error checking to find_placeholder_info to ensure
      that we don't try to create new placeholders after it's too late to do so;
      they have to all be created before deconstruct_jointree starts.
      
      Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
      77ba2325
  8. 08 Aug, 2011 4 commits
  9. 07 Aug, 2011 3 commits