1. 14 Dec, 2012 4 commits
  2. 13 Dec, 2012 2 commits
    • Allow a streaming replication standby to follow a timeline switch. · abfd192b
      Heikki Linnakangas authored
      Before this patch, streaming replication would refuse to start replicating
      if the primary's timeline didn't exactly match the standby's. The mismatch
      arises when you have a master and two standbys, and you promote one of the
      standbys to become the new master. Promotion bumps up the timeline ID, and
      after that bump, the other standby would refuse to continue.
      
      There's significantly more timeline related logic in streaming replication
      now. First of all, when a standby connects to primary, it will ask the
      primary for any timeline history files that are missing from the standby.
      The missing files are sent using a new replication command TIMELINE_HISTORY,
      and stored in standby's pg_xlog directory. Using the timeline history files,
      the standby can follow the latest timeline present in the primary
      (recovery_target_timeline='latest'), just as it can follow new timelines
      appearing in an archive directory.
      
      START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
      timeline to stream WAL from. This allows the standby to request the primary
      to send over WAL that precedes the promotion. The replication protocol is
      changed slightly (in a backwards-compatible way, although there's little hope
      of streaming replication working across major versions anyway) to allow
      replication to stop when the end of the timeline is reached, returning the
      walsender to a state where it accepts the next replication command.
      
      Many thanks to Amit Kapila for testing and reviewing various versions of
      this patch.
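      The timeline-following idea can be illustrated with a small sketch (Python,
      with invented names; this is not the server code): given the parsed contents
      of a timeline history file, the standby can work out which timeline any WAL
      position belongs to, which is what lets it request pre-promotion WAL from
      the new primary.

```python
# Hypothetical sketch, not PostgreSQL source. A timeline history is
# modeled as a list of (tli, switch_lsn) entries in ascending order:
# WAL positions below switch_lsn belong to that (older) timeline.

def timeline_for_position(history, target_tli, pos):
    """Return the timeline the WAL position `pos` belongs to."""
    for tli, switch_lsn in history:
        if pos < switch_lsn:
            return tli
    return target_tli

# Example: timeline 1 switched to timeline 2 at LSN 1000.
history = [(1, 1000)]
print(timeline_for_position(history, 2, 500))   # pre-promotion WAL: timeline 1
print(timeline_for_position(history, 2, 1500))  # post-promotion WAL: timeline 2
```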
    • Make xlog_internal.h includable in frontend context. · 52766871
      Heikki Linnakangas authored
      This makes unnecessary the ugly hack used to #include postgres.h in
      pg_basebackup.
      
      Based on Alvaro Herrera's patch
  3. 12 Dec, 2012 3 commits
    • In multi-insert, don't go into infinite loop on a huge tuple and fillfactor. · 6264cd3d
      Heikki Linnakangas authored
      If a tuple is larger than page size minus space reserved for fillfactor,
      heap_multi_insert would never find a page that it fits in and repeatedly ask
      for a new page from RelationGetBufferForTuple. If a tuple is too large to
      fit on any page, taking fillfactor into account, RelationGetBufferForTuple
      will always expand the relation. In a normal insert, heap_insert will accept
      that and put the tuple on the new page. heap_multi_insert, however, does a
      fillfactor check of its own, and doesn't accept the newly-extended page
      RelationGetBufferForTuple returns, even though there is no other choice to
      make the tuple fit.
      
      Fix that by making the logic in heap_multi_insert more like the heap_insert
      logic. The first tuple is always put on the page RelationGetBufferForTuple
      gives us, and the fillfactor check is only applied to the subsequent tuples.
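      As a rough illustration of the fixed loop (a toy Python model with assumed
      names and sizes, not the C code): the first tuple of a batch is always
      placed on the page RelationGetBufferForTuple returned, and the fillfactor
      limit only applies to the tuples after it.

```python
# Toy model of the corrected heap_multi_insert placement logic.
PAGE_SIZE = 8192

def fill_page(tuple_sizes, fillfactor=0.5):
    """Return how many tuples from the batch fit on one empty page."""
    limit = PAGE_SIZE * fillfactor  # space usable under fillfactor
    used = 0
    placed = 0
    for i, size in enumerate(tuple_sizes):
        # First tuple: accept unconditionally, like heap_insert does.
        if i > 0 and used + size > limit:
            break
        used += size
        placed += 1
    return placed

# An oversized first tuple no longer causes an infinite retry loop:
# it is simply placed alone on the freshly extended page.
print(fill_page([6000, 1000]))                    # 1
print(fill_page([1000, 1000, 1000, 1000, 1000]))  # 4 (limit is 4096)
```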
      
      Report from David Gould, although I didn't use his patch.
    • Add defenses against integer overflow in dynahash numbuckets calculations. · 691c5ebf
      Tom Lane authored
      The dynahash code requires the number of buckets in a hash table to fit
      in an int; but since we calculate the desired hash table size dynamically,
      there are various scenarios where we might calculate too large a value.
      The resulting overflow can lead to infinite loops, division-by-zero
      crashes, etc.  I (tgl) had previously installed some defenses against that
      in commit 299d1716, but that covered only one
      call path.  Moreover it worked by limiting the request size to work_mem,
      but on a 64-bit machine it's possible to set work_mem high enough that the
      problem appears anyway.  So let's fix the problem at the root by installing
      limits in the dynahash.c functions themselves.
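      The idea behind the fix can be sketched in Python (names and the exact
      clamping rule are illustrative assumptions, not the dynahash.c code): clamp
      the requested element count before rounding up to a power of two, so the
      resulting bucket count always fits in a signed 32-bit int.

```python
# Illustrative clamp against int overflow in a bucket-count calculation.
INT_MAX = 2**31 - 1

def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p <<= 1
    return p

def choose_nbuckets(nelem):
    # Clamp first, so the rounded-up power of two cannot exceed INT_MAX.
    nelem = min(nelem, 2**30)
    return next_pow2(nelem)

print(choose_nbuckets(1000))                 # 1024
print(choose_nbuckets(10**12) <= INT_MAX)    # True: huge requests are clamped
```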
      
      Trouble report and patch by Jeff Davis.
    • Disable event triggers in standalone mode. · cd3413ec
      Tom Lane authored
      Per discussion, this seems necessary to allow recovery from broken event
      triggers, or broken indexes on pg_event_trigger.
      
      Dimitri Fontaine
  4. 11 Dec, 2012 6 commits
    • Fix performance problems with autovacuum truncation in busy workloads. · b19e4250
      Kevin Grittner authored
      In situations where a table has over 8MB of empty pages at its end,
      the truncation work for those trailing pages takes longer than
      deadlock_timeout, and the table is frequently accessed by processes
      other than autovacuum, the autovacuum worker process could be canceled
      by the deadlock checking code. The truncation work done by autovacuum
      up to that point was lost, and the attempt was retried by a later
      autovacuum worker. The attempts could continue indefinitely without
      making progress, consuming resources and blocking other processes for
      up to deadlock_timeout each time.
      
      This patch has the autovacuum worker check at 20ms intervals whether
      it is blocking any other process. If such a condition
      develops, the autovacuum worker will persist the work it has done
      so far, release its lock on the table, and sleep in 50ms intervals
      for up to 5 seconds, hoping to be able to re-acquire the lock and
      try again. If it is unable to get the lock in that time, it moves
      on and a worker will try to continue later from the point this one
      left off.
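      The steps above can be sketched as a toy Python loop (timings taken from
      the commit message; the helper callbacks and their names are invented for
      illustration, not the actual C implementation):

```python
import time

def truncate_with_yield(has_waiters, reacquire, do_some_truncation,
                        check_every=0.020, retry_every=0.050, retry_for=5.0):
    """Truncate trailing empty pages, yielding the lock under contention."""
    while do_some_truncation():          # truncate one chunk; False when done
        time.sleep(check_every)          # check for blocked processes at 20ms
        if has_waiters():
            # Persist the work done so far and release the table lock,
            # then retry in 50ms steps for up to 5 seconds.
            deadline = time.monotonic() + retry_for
            while time.monotonic() < deadline:
                if reacquire():
                    break                # got the lock back; keep truncating
                time.sleep(retry_every)
            else:
                return False             # give up; a later worker resumes
    return True

# Toy run: no contention, two chunks of work, completes fully.
work = iter([True, True, False])
done = truncate_with_yield(lambda: False, lambda: True, lambda: next(work),
                           check_every=0, retry_every=0, retry_for=0)
print(done)  # True
```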
      
      While this patch doesn't change the rules about when and what to
      truncate, it does cause the truncation to occur sooner, with less
      blocking, and with the consumption of fewer resources when there is
      contention for the table's lock.
      
      The only user-visible change other than improved performance is
      that the table size during truncation may change incrementally
      instead of just once.
      
      This problem exists in all supported versions but is infrequently
      reported, although some reports of performance problems when
      autovacuum runs might be caused by this. Initial commit is just the
      master branch, but this should probably be backpatched once the
      build farm and general developer usage confirm that there are no
      surprising effects.
      
      Jan Wieck
    • Fix pg_upgrade for invalid indexes · e95c4bd1
      Bruce Momjian authored
      All versions of pg_upgrade upgraded invalid indexes caused by CREATE
      INDEX CONCURRENTLY failures and marked them as valid.  The patch adds a
      check to all pg_upgrade versions and throws an error during upgrade or
      --check.
      
      Backpatch to 9.2, 9.1, 9.0.  Patch slightly adjusted.
    • Consistency check should compare last record replayed, not last record read. · 970fb12d
      Heikki Linnakangas authored
      EndRecPtr is the last record that we've read, but not necessarily yet
      replayed. CheckRecoveryConsistency should compare minRecoveryPoint with the
      last replayed record instead. This caused recovery to think it's reached
      consistency too early.
      
      Now that we do the check in CheckRecoveryConsistency correctly, we have to
      move the call of that function to after redoing a record. The current place,
      after reading a record but before replaying it, is wrong. In particular, if
      there are no more records after the one ending at minRecoveryPoint, we don't
      enter hot standby until one extra record is generated and read by the
      standby, and CheckRecoveryConsistency is called. These two bugs conspired
      to make the code appear to work correctly, except for the small window
      between reading the last record that reaches minRecoveryPoint, and
      replaying it.
      
      In passing, rename recoveryLastRecPtr, which is the last record
      replayed, to lastReplayedEndRecPtr. This makes it slightly less confusing
      with replayEndRecPtr, which is the last record read that we're about to
      replay.
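      A minimal model of the corrected check (invented names; the real logic
      lives in C in xlog.c): consistency is declared only once the last
      *replayed* record reaches minRecoveryPoint, not merely the last record
      *read*.

```python
# Toy model of the corrected CheckRecoveryConsistency comparison.
def is_consistent(min_recovery_point, last_replayed_end):
    """Consistent only when replay, not just reading, has passed the point."""
    return last_replayed_end >= min_recovery_point

# A record ending at LSN 900 has been read but not yet replayed:
min_recovery_point = 900
last_replayed_end = 800
print(is_consistent(min_recovery_point, last_replayed_end))  # False
# Only after the record is replayed is the standby really consistent:
print(is_consistent(min_recovery_point, 900))                # True
```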
      
      Original report from Kyotaro HORIGUCHI, further diagnosis by Fujii Masao.
      Backpatch to 9.0, where Hot Standby subtly changed the test from
      "minRecoveryPoint < EndRecPtr" to "minRecoveryPoint <= EndRecPtr". The
      former works because where the test is performed, we have always read one
      more record than we've replayed.
    • Add mode where contrib installcheck runs each module in a separately named database. · ad69bd05
      Andrew Dunstan authored
      Normally each module is tested in a database named contrib_regression,
      which is dropped and recreated at the beginning of each pg_regress run.
      This new mode, enabled by adding USE_MODULE_DB=1 to the make command
      line, runs most modules in a database with the module name embedded in
      it.
      
      This will make testing pg_upgrade on clusters with the contrib modules
      a lot easier.
      
      Second attempt at this, this time accommodating make versions older
      than 3.82.
      
      Still to be done: adapt to the MSVC build system.
      
      Backpatch to 9.0, which is the earliest version it is reasonably
      possible to test upgrading from.
    • Fix pg_upgrade -O/-o options · acdb8c22
      Bruce Momjian authored
      Fix previous commit that added synchronous_commit=off, but broke -O/-o
      due to missing space in argument passing.
      
      Backpatch to 9.2.
    • doc: Remove blastwave.org link · 8e48d77c
      Peter Eisentraut authored
      Apparently, this service has been dead since 2008.
  5. 10 Dec, 2012 2 commits
    • Update minimum recovery point on truncation. · 7bffc9b7
      Heikki Linnakangas authored
      If a file is truncated, we must update minRecoveryPoint. Once a file is
      truncated, there's no going back; it would not be safe to stop recovery
      at a point earlier than that anymore.
      
      Per report from Kyotaro HORIGUCHI. Backpatch to 8.4. Before that,
      minRecoveryPoint was not updated during recovery at all.
    • Fix the tracking of min recovery point timeline. · 6be79966
      Heikki Linnakangas authored
      Forgot to update it in the right place. Also, consider a checkpoint record
      that switches to a new timeline to be on the new timeline.
      
      This fixes erroneous "requested timeline 2 does not contain minimum recovery
      point" errors, pointed out by Amit Kapila while testing another patch.
  6. 09 Dec, 2012 1 commit
    • Fix assorted bugs in privileges-for-types patch. · b46c9211
      Tom Lane authored
      Commit 72920557 added privileges on data
      types, but there were a number of oversights.  The implementation of
      default privileges for types missed a few places, and pg_dump was
      utterly innocent of the whole concept.  Per bug #7741 from Nathan Alden,
      and subsequent wider investigation.
  7. 08 Dec, 2012 2 commits
    • Support automatically-updatable views. · a99c42f2
      Tom Lane authored
      This patch makes "simple" views automatically updatable, without the need
      to create either INSTEAD OF triggers or INSTEAD rules.  "Simple" views
      are those classified as updatable according to SQL-92 rules.  The rewriter
      transforms INSERT/UPDATE/DELETE commands on such views directly into an
      equivalent command on the underlying table, which will generally have
      noticeably better performance than is possible with either triggers or
      user-written rules.  A view that has INSTEAD OF triggers or INSTEAD rules
      continues to operate the same as before.
      
      For the moment, security_barrier views are not considered simple.
      Also, we do not support WITH CHECK OPTION.  These features may be
      added in future.
      
      Dean Rasheed, reviewed by Amit Kapila
    • Update iso.org page link · d12d9f59
      Peter Eisentraut authored
      The old one is responding with 404.
  8. 07 Dec, 2012 5 commits
    • Improve pg_upgrade's status display · 6dd95845
      Bruce Momjian authored
      Pg_upgrade displays file names during copy and database names during
      dump/restore.  Andrew Dunstan identified three bugs:
      
      *  long file names were being truncated to 60 _leading_ characters, which
         often do not change for long file names
      
      *  file names were truncated to 60 characters in log files
      
      *  carriage returns were being output to log files
      
      This commit fixes these --- it prints 60 _trailing_ characters to the
      status display, and full path names without carriage returns to log
      files.  It also suppresses status output to the log file unless verbose
      mode is used.
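      The display fix amounts to tail-truncating instead of head-truncating. A
      quick Python sketch (function name and ellipsis convention are assumptions,
      not pg_upgrade's code): keep the last 60 characters of a long path, since
      that is the part that actually varies between files.

```python
# Toy sketch of trailing truncation for a fixed-width status display.
def status_display(path, width=60):
    """Show the tail of an over-long path, prefixed with an ellipsis."""
    if len(path) <= width:
        return path
    return "..." + path[-(width - 3):]

long_path = "/very/long/cluster/path/base/16384/" + "x" * 60
print(len(status_display(long_path)))  # 60
print(status_display("short"))         # short
```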
    • Correct xmax test for COPY FREEZE · ef754fb5
      Simon Riggs authored
    • Optimize COPY FREEZE with CREATE TABLE also. · 1f023f92
      Simon Riggs authored
      Jeff Davis, additional test by me
    • Clarify that COPY FREEZE is not a hard rule. · 1eb6cee4
      Simon Riggs authored
      Remove message when FREEZE not honoured,
      clarify reasons in comments and docs.
    • Improve pl/pgsql to support composite-type expressions in RETURN. · 31a89185
      Tom Lane authored
      For some reason lost in the mists of prehistory, RETURN was only coded to
      allow a simple reference to a composite variable when the function's return
      type is composite.  Allow an expression instead, while preserving the
      efficiency of the original code path in the case where the expression is
      indeed just a composite variable's name.  Likewise for RETURN NEXT.
      
      As is true in various other places, the supplied expression must yield
      exactly the number and data types of the required columns.  There was some
      discussion of relaxing that for pl/pgsql, but no consensus yet, so this
      patch doesn't address that.
      
      Asif Rehman, reviewed by Pavel Stehule
  9. 06 Dec, 2012 3 commits
    • Background worker processes · da07a1e8
      Alvaro Herrera authored
      Background workers are postmaster subprocesses that run arbitrary
      user-specified code.  They can request shared memory access as well as
      backend database connections; or they can just use plain libpq frontend
      database connections.
      
      Modules listed in shared_preload_libraries can register background
      workers in their _PG_init() function; this is early enough that it's not
      necessary to provide an extra GUC option, because the necessary extra
      resources can be allocated early on.  Modules can install more than one
      bgworker, if necessary.
      
      Care is taken that these extra processes do not interfere with other
      postmaster tasks: only one such process is started on each ServerLoop
      iteration, so even if a large number of them are waiting to be started,
      the postmaster is still able to quickly service external connection
      requests. Also, the shutdown sequence should not be impacted by a worker
      process that's reasonably well behaved (i.e., one that promptly responds
      to termination signals).
      
      The current implementation lets worker processes specify their start
      time, i.e. at what point in the server startup process they are to be
      started: right after postmaster start (in which case they mustn't ask
      for shared memory access), when consistent state has been reached
      (useful during recovery in a HOT standby server), or when recovery has
      terminated (i.e. when normal backends are allowed).
      
      In case of a bgworker crash, the actions taken depend on registration
      data: if shared memory was requested, then all other connections are
      taken down (as well as other bgworkers), just as if a regular backend
      had crashed.  The bgworker itself is restarted, too, within a
      configurable timeframe (which can be configured to be never).
      
      More features to add to this framework can be imagined without much
      effort, and have been discussed, but this seems good enough as a useful
      unit already.
      
      An elementary sample module is supplied.
      
      Author: Álvaro Herrera
      
      This patch is loosely based on prior patches submitted by KaiGai Kohei,
      and unsubmitted code by Simon Riggs.
      
      Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund,
      Heikki Linnakangas, Simon Riggs, Amit Kapila
    • Fix intermittent crash in DROP INDEX CONCURRENTLY. · e31d5248
      Tom Lane authored
      When deleteOneObject closes and reopens the pg_depend relation,
      we must see to it that the relcache pointer held by the calling function
      (typically performMultipleDeletions) is updated.  Usually the relcache
      entry is retained so that the pointer value doesn't change, which is why
      the problem had escaped notice ... but after a cache flush event there's
      no guarantee that the same memory will be reassigned.  To fix, change
      the recursive functions' APIs so that we pass around a "Relation *"
      not just "Relation".
      
      Per investigation of occasional buildfarm failures.  This is trivial
      to reproduce with -DCLOBBER_CACHE_ALWAYS, which points up the sad
      lack of any buildfarm member running that way on a regular basis.
    • Update comment at top of index_create · 5e15cdb2
      Alvaro Herrera authored
      I neglected to update it in commit f4c4335a.
      
      Michael Paquier
  10. 05 Dec, 2012 4 commits
    • Ensure recovery pause feature doesn't pause unless users can connect. · af4aba2f
      Tom Lane authored
      If we're not in hot standby mode, then there's no way for users to connect
      to reset the recoveryPause flag, so we shouldn't pause.  The code was aware
      of this but the test to see if pausing was safe was seriously inadequate:
      it wasn't paying attention to reachedConsistency, and besides what it was
      testing was that we could legally enter hot standby, not that we have
      done so.  Get rid of that in favor of checking LocalHotStandbyActive,
      which because of the coding in CheckRecoveryConsistency is tantamount to
      checking that we have told the postmaster to enter hot standby.
      
      Also, move the recoveryPausesHere() call that reacts to asynchronous
      recoveryPause requests so that it's not in the middle of application of a
      WAL record.  I put it next to the recoveryStopsHere() call --- in future
      those are going to need to interact significantly, so this seems like a
      good waystation.
      
      Also, don't bother trying to read another WAL record if we've already
      decided not to continue recovery.  This was no big deal when the code was
      written originally, but now that reading a record might entail actions like
      fetching an archive file, it seems a bit silly to do it like that.
      
      Per report from Jeff Janes and subsequent discussion.  The pause feature
      needs quite a lot more work, but this gets rid of some indisputable bugs,
      and seems safe enough to back-patch.
    • Must not reach consistency before XLOG_BACKUP_RECORD · 6aa2e49a
      Simon Riggs authored
      When waiting for an XLOG_BACKUP_RECORD, the minRecoveryPoint
      will be incorrect, so we must not declare recovery consistent
      before we have seen the record. This was a major bug that allowed
      recovery to end too early in some cases, letting users see an
      inconsistent database. This patch is for HEAD and 9.2; a different
      fix is required for 9.1 and 9.0.
      
      Simon Riggs and Andres Freund, bug report by Jeff Janes
    • Add pgstatginindex() function to get the size of the GIN pending list. · 357cbaae
      Heikki Linnakangas authored
      Fujii Masao, reviewed by Kyotaro Horiguchi.
  11. 04 Dec, 2012 8 commits