1. 26 Nov, 2012 2 commits
    • Tom Lane's avatar
      Revert patch for taking fewer snapshots. · 53299429
      Tom Lane authored
      This reverts commit d573e239, "Take fewer
      snapshots".  While that seemed like a good idea at the time, it caused
      execution to use a snapshot that had been acquired before locking any of
      the tables mentioned in the query.  This created user-visible anomalies
      that were not present in any prior release of Postgres, as reported by
      Tomas Vondra.  While this whole area could do with a redesign (since there
      are related cases that have anomalies anyway), it doesn't seem likely that
      any future patch would be reasonably back-patchable; and we don't want 9.2
      to exhibit a behavior that's subtly unlike either past or future releases.
      Hence, revert to prior code while we rethink the problem.
      53299429
    • Tom Lane's avatar
      Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees. · d3237e04
      Tom Lane authored
      In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is
      pretty useless (there being only one output row), but nonetheless it
      shouldn't fail.  But it could fail if "tab" is an inheritance parent,
      because planagg.c's code for fixing up equivalence classes after making the
      index-optimized MIN/MAX transformation wasn't prepared to find child-table
      versions of the aggregate expression.  The least ugly fix seems to be
      to add an option to mutate_eclass_expressions() to skip child-table
      equivalence class members, which aren't used anymore at this stage of
      planning so it's not really necessary to fix them.  Since child members
      are ignored in many cases already, it seems plausible for
      mutate_eclass_expressions() to have an option to ignore them too.
      
      Per bug #7703 from Maxim Boguk.
      
      Back-patch to 9.1.  Although the same code exists before that, it cannot
      encounter child-table aggregates AFAICS, because the index optimization
      transformation cannot succeed on inheritance trees before 9.1 (for lack
      of MergeAppend).
      d3237e04
  2. 25 Nov, 2012 2 commits
  3. 23 Nov, 2012 2 commits
  4. 22 Nov, 2012 2 commits
    • Tom Lane's avatar
      Fix pg_resetxlog to use correct path to postmaster.pid. · 455b8887
      Tom Lane authored
      Since we've already chdir'd into the data directory, the file should
      be referenced as just "postmaster.pid", without prefixing the directory
      path.  This is harmless in the normal case where an absolute PGDATA path
      is used, but quite dangerous if a relative path is specified, since the
      program might then fail to notice an active postmaster.
      
      Reported by Hari Babu.  This got broken in my commit
      eb5949d1, so patch all active versions.
      455b8887
    • Heikki Linnakangas's avatar
      Avoid bogus "out-of-sequence timeline ID" errors in standby-mode. · 24c19e6b
      Heikki Linnakangas authored
      When startup process opens a WAL segment after replaying part of it, it
      validates the first page on the WAL segment, even though the page it's
      really interested in later in the file. As part of the validation, it checks
      that the TLI on the page header is >= the TLI it saw on the last page it
      read. If the segment contains a timeline switch, and we have already
      replayed it, and then re-open the WAL segment (because of streaming
      replication got disconnected and reconnected, for example), the TLI check
      will fail when the first page is validated. Fix that by relaxing the TLI
      check when re-opening a WAL segment.
      
      Backpatch to 9.0. Earlier versions had the same code, but before standby
      mode was introduced in 9.0, recovery never tried to re-read a segment after
      partially replaying it.
      
      Reported by Amit Kapila, while testing a new feature.
      24c19e6b
  5. 21 Nov, 2012 2 commits
    • Tom Lane's avatar
      Don't launch new child processes after we've been told to shut down. · 27b2c6a1
      Tom Lane authored
      Once we've received a shutdown signal (SIGINT or SIGTERM), we should not
      launch any more child processes, even if we get signals requesting such.
      The normal code path for spawning backends has always understood that,
      but the postmaster's infrastructure for hot standby and autovacuum didn't
      get the memo.  As reported by Hari Babu in bug #7643, this could lead to
      failure to shut down at all in some cases, such as when SIGINT is received
      just before the startup process sends PMSIGNAL_RECOVERY_STARTED: we'd
      launch a bgwriter and checkpointer, and then those processes would have no
      idea that they ought to quit.  Similarly, launching a new autovacuum worker
      would result in waiting till it finished before shutting down.
      
      Also, switch the order of the code blocks in reaper() that detect startup
      process crash versus shutdown termination.  Once we've sent it a signal,
      we should not consider that exit(1) is surprising.  This is just a cosmetic
      fix since shutdown occurs correctly anyway, but better not to log a phony
      complaint about startup process crash.
      
      Back-patch to 9.0.  Some parts of this might be applicable before that,
      but given the lack of prior complaints I'm not going to worry too much
      about older branches.
      27b2c6a1
    • Heikki Linnakangas's avatar
      Speed up operations on numeric, mostly by avoiding palloc() overhead. · 5cb0e335
      Heikki Linnakangas authored
      In many functions, a NumericVar was initialized from an input Numeric, to be
      passed as input to a calculation function. When the NumericVar is not
      modified, the digits array of the NumericVar can point directly to the digits
      array in the original Numeric, and we can avoid a palloc() and memcpy(). Add
      init_var_from_num() function to initialize a var like that.
      
      Remove dscale argument from get_str_from_var(), as all the callers just
      passed the dscale of the variable. That means that the rounding it used to
      do was not actually necessary, and get_str_from_var() no longer scribbles on
      its input. That makes it safer in general, and allows us to use the new
      init_var_from_num() function in e.g numeric_out().
      
      Also modified numericvar_to_int8() to no scribble on its input either. It
      creates a temporary copy to avoid that. To compensate, the callers no longer
      need to create a temporary copy, so the net # of pallocs is the same, but this
      is nicer.
      
      In the passing, use a constant for the number 10 in get_str_from_var_sci(),
      when calculating 10^exponent. Saves a palloc() and some cycles to convert
      integer 10 to numeric.
      
      Original patch by Kyotaro HORIGUCHI, with further changes by me. Reviewed
      by Pavel Stehule.
      5cb0e335
  6. 19 Nov, 2012 3 commits
    • Bruce Momjian's avatar
      In pg_upgrade, report errno string if file existence check returns an · b55743a5
      Bruce Momjian authored
      error and errno != ENOENT.
      b55743a5
    • Tom Lane's avatar
      Improve handling of INT_MIN / -1 and related cases. · 1f7cb5c3
      Tom Lane authored
      Some platforms throw an exception for this division, rather than returning
      a necessarily-overflowed result.  Since we were testing for overflow after
      the fact, an exception isn't nice.  We can avoid the problem by treating
      division by -1 as negation.
      
      Add some regression tests so that we'll find out if any compilers try to
      optimize away the overflow check conditions.
      
      This ought to be back-patched, but I'm going to see what the buildfarm
      reports about the regression tests first.
      
      Per discussion with Xi Wang, though this is different from the patch he
      submitted.
      1f7cb5c3
    • Heikki Linnakangas's avatar
      Fix archive_cleanup_command. · 644a0a63
      Heikki Linnakangas authored
      When I moved ExecuteRecoveryCommand() from xlog.c to xlogarchive.c, I didn't
      realize that it's called from the checkpoint process, not the startup
      process. I tried to use InRedo variable to decide whether or not to attempt
      cleaning up the archive (must not do so before we have read the initial
      checkpoint record), but that variable is only valid within the startup
      process.
      
      Instead, let ExecuteRecoveryCommand() always clean up the archive, and add
      an explicit argument to RestoreArchivedFile() to say whether that's allowed
      or not. The caller knows better.
      
      Reported by Erik Rijkers, diagnosis by Fujii Masao. Only 9.3devel is
      affected.
      644a0a63
  7. 18 Nov, 2012 3 commits
    • Tom Lane's avatar
      Limit values of archive_timeout, post_auth_delay, auth_delay.milliseconds. · b6e3798f
      Tom Lane authored
      The previous definitions of these GUC variables allowed them to range
      up to INT_MAX, but in point of fact the underlying code would suffer
      overflows or other errors with large values.  Reduce the maximum values
      to something that won't misbehave.  There's no apparent value in working
      harder than this, since very large delays aren't sensible for any of
      these.  (Note: the risk with archive_timeout is that if we're late
      checking the state, the timestamp difference it's being compared to
      might overflow.  So we need some amount of slop; the choice of INT_MAX/2
      is arbitrary.)
      
      Per followup investigation of bug #7670.  Although this isn't a very
      significant fix, might as well back-patch.
      b6e3798f
    • Tom Lane's avatar
      Fix syslogger to not fail when log_rotation_age exceeds 2^31 milliseconds. · d038966d
      Tom Lane authored
      We need to avoid calling WaitLatch with timeouts exceeding INT_MAX.
      Fortunately a simple clamp will do the trick, since no harm is done if
      the wait times out before it's really time to rotate the log file.
      Per bug #7670 (probably bug #7545 is the same thing, too).
      
      In passing, fix bogus definition of log_rotation_age's maximum value in
      guc.c --- it was numerically right, but only because MINS_PER_HOUR and
      SECS_PER_MINUTE have the same value.
      
      Back-patch to 9.2.  Before that, syslogger wasn't using WaitLatch.
      d038966d
    • Tom Lane's avatar
      Assert that WaitLatch's timeout is not more than INT_MAX milliseconds. · 14ddff44
      Tom Lane authored
      The behavior with larger values is unspecified by the Single Unix Spec.
      It appears that BSD-derived kernels report EINVAL, although Linux does not.
      If waiting for longer intervals is desired, the calling code has to do
      something to limit the delay; we can't portably fix it here since "long"
      may not be any wider than "int" in the first place.
      
      Part of response to bug #7670, though this change doesn't fix that
      (in fact, it converts the problem from an ERROR into an Assert failure).
      No back-patch since it's just an assertion addition.
      14ddff44
  8. 17 Nov, 2012 1 commit
  9. 16 Nov, 2012 2 commits
    • Peter Eisentraut's avatar
    • Tom Lane's avatar
      Improve check_partial_indexes() to consider join clauses in proof attempts. · 1746ba92
      Tom Lane authored
      Traditionally check_partial_indexes() has only looked at restriction
      clauses while trying to prove partial indexes usable in queries.  However,
      join clauses can also be used in some cases; mainly, that a strict operator
      on "x" proves an "x IS NOT NULL" index predicate, even if the operator is
      in a join clause rather than a restriction clause.  Adding this code fixes
      a regression in 9.2, because previously we would take join clauses into
      account when considering whether a partial index could be used in a
      nestloop inner indexscan path.  9.2 doesn't handle nestloop inner
      indexscans in the same way, and this consideration was overlooked in the
      rewrite.  Moving the work to check_partial_indexes() is a better solution
      anyway, since the proof applies whether or not we actually use the index
      in that particular way, and we don't have to do it over again for each
      possible outer relation.  Per report from Dave Cramer.
      1746ba92
  10. 15 Nov, 2012 2 commits
  11. 14 Nov, 2012 4 commits
  12. 13 Nov, 2012 6 commits
    • Tom Lane's avatar
      Fix memory leaks in record_out() and record_send(). · 273986bf
      Tom Lane authored
      record_out() leaks memory: it fails to free the strings returned by the
      per-column output functions, and also is careless about detoasted values.
      This results in a query-lifespan memory leakage when returning composite
      values to the client, because printtup() runs the output functions in the
      query-lifespan memory context.  Fix it to handle these issues the same way
      printtup() does.  Also fix a similar leakage in record_send().
      
      (At some point we might want to try to run output functions in
      shorter-lived memory contexts, so that we don't need a zero-leakage policy
      for them.  But that would be a significantly more invasive patch, which
      doesn't seem like material for back-patching.)
      
      In passing, use appendStringInfoCharMacro instead of appendStringInfoChar
      in the innermost data-copying loop of record_out, to try to shave a few
      cycles from this function's runtime.
      
      Per trouble report from Carlos Henrique Reimer.  Back-patch to all
      supported versions.
      273986bf
    • Simon Riggs's avatar
      Skip searching for subxact locks at commit. · d9fad107
      Simon Riggs authored
      At commit all standby locks are released
      for the top-level transaction, so searching
      for locks for each subtransaction is both
      pointless and costly (N^2) in the presence
      of many AccessExclusiveLocks.
      d9fad107
    • Simon Riggs's avatar
      Clarify docs on hot standby lock release · 68f7fe14
      Simon Riggs authored
      Andres Freund and Simon Riggs
      68f7fe14
    • Peter Eisentraut's avatar
      8f40ad1f
    • Tom Lane's avatar
      Fix multiple problems in WAL replay. · 3bbf668d
      Tom Lane authored
      Most of the replay functions for WAL record types that modify more than
      one page failed to ensure that those pages were locked correctly to ensure
      that concurrent queries could not see inconsistent page states.  This is
      a hangover from coding decisions made long before Hot Standby was added,
      when it was hardly necessary to acquire buffer locks during WAL replay
      at all, let alone hold them for carefully-chosen periods.
      
      The key problem was that RestoreBkpBlocks was written to hold lock on each
      page restored from a full-page image for only as long as it took to update
      that page.  This was guaranteed to break any WAL replay function in which
      there was any update-ordering constraint between pages, because even if the
      nominal order of the pages is the right one, any mixture of full-page and
      non-full-page updates in the same record would result in out-of-order
      updates.  Moreover, it wouldn't work for situations where there's a
      requirement to maintain lock on one page while updating another.  Failure
      to honor an update ordering constraint in this way is thought to be the
      cause of bug #7648 from Daniel Farina: what seems to have happened there
      is that a btree page being split was rewritten from a full-page image
      before the new right sibling page was written, and because lock on the
      original page was not maintained it was possible for hot standby queries to
      try to traverse the page's right-link to the not-yet-existing sibling page.
      
      To fix, get rid of RestoreBkpBlocks as such, and instead create a new
      function RestoreBackupBlock that restores just one full-page image at a
      time.  This function can be invoked by WAL replay functions at the points
      where they would otherwise perform non-full-page updates; in this way, the
      physical order of page updates remains the same no matter which pages are
      replaced by full-page images.  We can then further adjust the logic in
      individual replay functions if it is necessary to hold buffer locks
      for overlapping periods.  A side benefit is that we can simplify the
      handling of concurrency conflict resolution by moving that code into the
      record-type-specfic functions; there's no more need to contort the code
      layout to keep conflict resolution in front of the RestoreBkpBlocks call.
      
      In connection with that, standardize on zero-based numbering rather than
      one-based numbering for referencing the full-page images.  In HEAD, I
      removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4.  They are
      still there in the header files in previous branches, but are no longer
      used by the code.
      
      In addition, fix some other bugs identified in the course of making these
      changes:
      
      spgRedoAddNode could fail to update the parent downlink at all, if the
      parent tuple is in the same page as either the old or new split tuple and
      we're not doing a full-page image: it would get fooled by the LSN having
      been advanced already.  This would result in permanent index corruption,
      not just transient failure of concurrent queries.
      
      Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
      tail page as a candidate for a full-page image; in the worst case this
      could result in torn-page corruption.
      
      heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
      exclusive lock: it did the former in the normal path but the latter for a
      full-page image.  A plain exclusive lock seems sufficient, so change to
      that.
      
      Also, remove gistRedoPageDeleteRecord(), which has been dead code since
      VACUUM FULL was rewritten.
      
      Back-patch to 9.0, where hot standby was introduced.  Note however that 9.0
      had a significantly different WAL-logging scheme for GIST index updates,
      and it doesn't appear possible to make that scheme safe for concurrent hot
      standby queries, because it can leave inconsistent states in the index even
      between WAL records.  Given the lack of complaints from the field, we won't
      work too hard on fixing that branch.
      3bbf668d
    • Peter Eisentraut's avatar
      Use a stamp file for the XSLT HTML doc build · 9b3ac49e
      Peter Eisentraut authored
      This way it works more like the DSSSL build, and dependencies are
      tracked better by make.
      
      Also copy the CSS stylesheet to the html directory.  This was forgotten
      when the output directory was changed.
      9b3ac49e
  13. 12 Nov, 2012 5 commits
    • Heikki Linnakangas's avatar
      Oops, have to rename local variables called 'errcontext' in contrib, too. · d092d116
      Heikki Linnakangas authored
      As pointed out by Alvaro.
      d092d116
    • Heikki Linnakangas's avatar
      Use correct text domain for translating errcontext() messages. · dbdf9679
      Heikki Linnakangas authored
      errcontext() is typically used in an error context callback function, not
      within an ereport() invocation like e.g errmsg and errdetail are. That means
      that the message domain that the TEXTDOMAIN magic in ereport() determines
      is not the right one for the errcontext() calls. The message domain needs to
      be determined by the C file containing the errcontext() call, not the file
      containing the ereport() call.
      
      Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use
      for the errcontext message. "errcontext" was used in a few places as a
      variable or struct field name, I had to rename those out of the way, now
      that errcontext is a macro.
      
      We've had this problem all along, but this isn't doesn't seem worth
      backporting. It's a fairly minor issue, and turning errcontext from a
      function to a macro requires at least a recompile of any external code that
      calls errcontext().
      dbdf9679
    • Heikki Linnakangas's avatar
      Silence "expression result unused" warnings in AssertVariableIsOfTypeMacro · c9d44a75
      Heikki Linnakangas authored
      At least clang 3.1 generates those warnings. Prepend (void) to avoid them,
      like we have in AssertMacro.
      c9d44a75
    • Peter Eisentraut's avatar
      doc: "only relevant" -> "relevant only" · 42218f29
      Peter Eisentraut authored
      Karl O. Pinc
      42218f29
    • Tom Lane's avatar
      Check for stack overflow in transformSetOperationTree(). · 34f3b396
      Tom Lane authored
      Since transformSetOperationTree() recurses, it can be driven to stack
      overflow with enough UNION/INTERSECT/EXCEPT clauses in a query.  Add a
      check to ensure it fails cleanly instead of crashing.  Per report from
      Matthew Gerber (though it's not clear whether this is the only thing
      going wrong for him).
      
      Historical note: I think the reasoning behind not putting a check here in
      the beginning was that the check in transformExpr() ought to be sufficient
      to guard the whole parser.  However, because transformSetOperationTree()
      recurses all the way to the bottom of the set-operation tree before doing
      any analysis of the statement's expressions, that check doesn't save it.
      34f3b396
  14. 09 Nov, 2012 3 commits
    • Alvaro Herrera's avatar
      Remove leftover LWLockRelease() call · fa12cb7f
      Alvaro Herrera authored
      This code was refactored in d5497b95 but an extra LWLockRelease call was
      left behind.
      
      Per report from Erik Rijkers
      fa12cb7f
    • Peter Eisentraut's avatar
      XSLT stylesheet: Add slash to directory name · 732740e7
      Peter Eisentraut authored
      Some versions of the XSLT stylesheets don't handle the missing slash
      correctly (they concatenate directory and file name without the slash).
      This might never have worked correctly.
      732740e7
    • Tom Lane's avatar
      Fix WaitLatch() to return promptly when the requested timeout expires. · 3e7fdcff
      Tom Lane authored
      If the sleep is interrupted by a signal, we must recompute the remaining
      time to wait; otherwise, a steady stream of non-wait-terminating interrupts
      could delay return from WaitLatch indefinitely.  This has been shown to be
      a problem for the autovacuum launcher, and there may well be other places
      now or in the future with similar issues.  So we'd better make the function
      robust, even though this'll add at least one gettimeofday call per wait.
      
      Back-patch to 9.2.  We might eventually need to fix 9.1 as well, but the
      code is quite different there, and the usage of WaitLatch in 9.1 is so
      limited that it's not clearly important to do so.
      
      Reported and diagnosed by Jeff Janes, though I rewrote his patch rather
      heavily.
      3e7fdcff
  15. 08 Nov, 2012 1 commit
    • Tom Lane's avatar
      Rename ResolveNew() to ReplaceVarsFromTargetList(), and tweak its API. · dcc55dd2
      Tom Lane authored
      This function currently lacks the option to throw error if the provided
      targetlist doesn't have any matching entry for a Var to be replaced.
      Two of the four existing call sites would be better off with an error,
      as would the usage in the pending auto-updatable-views patch, so it seems
      past time to extend the API to support that.  To do so, replace the "event"
      parameter (historically of type CmdType, though it was declared plain int)
      with a special-purpose enum type.
      
      It's unclear whether this function might be called by third-party code.
      Since many C compilers wouldn't warn about a call site continuing to use
      the old calling convention, rename the function to forcibly break any
      such code that hasn't been updated.  The old name was none too well chosen
      anyhow.
      dcc55dd2