1. 13 Nov, 2012 5 commits
    • Skip searching for subxact locks at commit. · d9fad107
      Simon Riggs authored
      At commit all standby locks are released
      for the top-level transaction, so searching
      for locks for each subtransaction is both
      pointless and costly (N^2) in the presence
      of many AccessExclusiveLocks.
      d9fad107
    • Clarify docs on hot standby lock release · 68f7fe14
      Simon Riggs authored
      Andres Freund and Simon Riggs
      68f7fe14
    • Peter Eisentraut's avatar
      8f40ad1f
    • Fix multiple problems in WAL replay. · 3bbf668d
      Tom Lane authored
      Most of the replay functions for WAL record types that modify more than
      one page failed to ensure that those pages were locked correctly to ensure
      that concurrent queries could not see inconsistent page states.  This is
      a hangover from coding decisions made long before Hot Standby was added,
      when it was hardly necessary to acquire buffer locks during WAL replay
      at all, let alone hold them for carefully-chosen periods.
      
      The key problem was that RestoreBkpBlocks was written to hold lock on each
      page restored from a full-page image for only as long as it took to update
      that page.  This was guaranteed to break any WAL replay function in which
      there was any update-ordering constraint between pages, because even if the
      nominal order of the pages is the right one, any mixture of full-page and
      non-full-page updates in the same record would result in out-of-order
      updates.  Moreover, it wouldn't work for situations where there's a
      requirement to maintain lock on one page while updating another.  Failure
      to honor an update ordering constraint in this way is thought to be the
      cause of bug #7648 from Daniel Farina: what seems to have happened there
      is that a btree page being split was rewritten from a full-page image
      before the new right sibling page was written, and because lock on the
      original page was not maintained it was possible for hot standby queries to
      try to traverse the page's right-link to the not-yet-existing sibling page.
      
      To fix, get rid of RestoreBkpBlocks as such, and instead create a new
      function RestoreBackupBlock that restores just one full-page image at a
      time.  This function can be invoked by WAL replay functions at the points
      where they would otherwise perform non-full-page updates; in this way, the
      physical order of page updates remains the same no matter which pages are
      replaced by full-page images.  We can then further adjust the logic in
      individual replay functions if it is necessary to hold buffer locks
      for overlapping periods.  A side benefit is that we can simplify the
      handling of concurrency conflict resolution by moving that code into the
      record-type-specific functions; there's no more need to contort the code
      layout to keep conflict resolution in front of the RestoreBkpBlocks call.
      
      In connection with that, standardize on zero-based numbering rather than
      one-based numbering for referencing the full-page images.  In HEAD, I
      removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4.  They are
      still there in the header files in previous branches, but are no longer
      used by the code.
      
      In addition, fix some other bugs identified in the course of making these
      changes:
      
      spgRedoAddNode could fail to update the parent downlink at all, if the
      parent tuple is in the same page as either the old or new split tuple and
      we're not doing a full-page image: it would get fooled by the LSN having
      been advanced already.  This would result in permanent index corruption,
      not just transient failure of concurrent queries.
      
      Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
      tail page as a candidate for a full-page image; in the worst case this
      could result in torn-page corruption.
      
      heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
      exclusive lock: it did the former in the normal path but the latter for a
      full-page image.  A plain exclusive lock seems sufficient, so change to
      that.
      
      Also, remove gistRedoPageDeleteRecord(), which has been dead code since
      VACUUM FULL was rewritten.
      
      Back-patch to 9.0, where hot standby was introduced.  Note however that 9.0
      had a significantly different WAL-logging scheme for GIST index updates,
      and it doesn't appear possible to make that scheme safe for concurrent hot
      standby queries, because it can leave inconsistent states in the index even
      between WAL records.  Given the lack of complaints from the field, we won't
      work too hard on fixing that branch.
      3bbf668d
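The ordering argument in the message above can be illustrated with a toy model. This is only a sketch under stated assumptions: the record layout, the `replay_record` function, and the page names are hypothetical, buffer locking is not modelled at all, and none of this is the actual PostgreSQL code. It shows only that when each block is handled at its own point in the replay function (restore its full-page image if present, otherwise apply the incremental change), the physical order of page updates is the same no matter which blocks carry images.

```python
def replay_record(record, pages, log):
    """Apply a multi-page WAL record block by block, in a fixed order.

    Hypothetical layout: record["blocks"] is a list of dicts, indexed from
    zero, each naming a page and carrying either a full-page image or an
    "apply" callable for the incremental change.
    """
    for block_index, block in enumerate(record["blocks"]):
        page = block["page"]
        if block.get("full_page_image") is not None:
            # Restore just this one image, at the point where the
            # incremental update would otherwise have happened.
            pages[page] = block["full_page_image"]
            log.append((block_index, page, "image"))
        else:
            pages[page] = block["apply"](pages.get(page))
            log.append((block_index, page, "delta"))


# A split-like record: the new right sibling must be written before the
# original (left) page, even though only the left page has an image.
pages = {"left": "old-left"}
log = []
record = {
    "blocks": [
        {"page": "right", "apply": lambda old: "new-right"},
        {"page": "left", "full_page_image": "left-after-split"},
    ]
}
replay_record(record, pages, log)
print(log)
```

Restoring all images up front, as the message describes for the old function, would have touched the left page first in this example.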
    • Use a stamp file for the XSLT HTML doc build · 9b3ac49e
      Peter Eisentraut authored
      This way it works more like the DSSSL build, and dependencies are
      tracked better by make.
      
      Also copy the CSS stylesheet to the html directory.  This was forgotten
      when the output directory was changed.
      9b3ac49e
  2. 12 Nov, 2012 5 commits
    • Oops, have to rename local variables called 'errcontext' in contrib, too. · d092d116
      Heikki Linnakangas authored
      As pointed out by Alvaro.
      d092d116
    • Use correct text domain for translating errcontext() messages. · dbdf9679
      Heikki Linnakangas authored
      errcontext() is typically used in an error context callback function, not
      within an ereport() invocation as, e.g., errmsg and errdetail are. That means
      that the message domain that the TEXTDOMAIN magic in ereport() determines
      is not the right one for the errcontext() calls. The message domain needs to
      be determined by the C file containing the errcontext() call, not the file
      containing the ereport() call.
      
      Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use
      for the errcontext message. "errcontext" was used in a few places as a
      variable or struct field name; I had to rename those out of the way, now
      that errcontext is a macro.
      
      We've had this problem all along, but this doesn't seem worth
      backporting. It's a fairly minor issue, and turning errcontext from a
      function to a macro requires at least a recompile of any external code that
      calls errcontext().
      dbdf9679
    • Silence "expression result unused" warnings in AssertVariableIsOfTypeMacro · c9d44a75
      Heikki Linnakangas authored
      At least clang 3.1 generates those warnings. Prepend (void) to avoid them,
      like we have in AssertMacro.
      c9d44a75
    • doc: "only relevant" -> "relevant only" · 42218f29
      Peter Eisentraut authored
      Karl O. Pinc
      42218f29
    • Check for stack overflow in transformSetOperationTree(). · 34f3b396
      Tom Lane authored
      Since transformSetOperationTree() recurses, it can be driven to stack
      overflow with enough UNION/INTERSECT/EXCEPT clauses in a query.  Add a
      check to ensure it fails cleanly instead of crashing.  Per report from
      Matthew Gerber (though it's not clear whether this is the only thing
      going wrong for him).
      
      Historical note: I think the reasoning behind not putting a check here in
      the beginning was that the check in transformExpr() ought to be sufficient
      to guard the whole parser.  However, because transformSetOperationTree()
      recurses all the way to the bottom of the set-operation tree before doing
      any analysis of the statement's expressions, that check doesn't save it.
      34f3b396
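A rough sketch of the idea, not the actual parser code: a recursive tree walk that checks a limit on entry to every call fails cleanly instead of exhausting the stack. A simple depth counter stands in here for a real stack-depth check, and the tuple-based tree, the `transform_set_op` function, and the exception name are all hypothetical.

```python
class StackDepthExceeded(Exception):
    """Raised instead of letting the recursion run out of stack."""


def transform_set_op(node, depth=0, max_depth=100):
    """Walk a set-operation tree given as nested (op, left, right) tuples."""
    # Check on entry to every recursive call, before descending further.
    if depth > max_depth:
        raise StackDepthExceeded("set-operation tree is nested too deeply")
    if isinstance(node, tuple):
        op, left, right = node
        return (
            op,
            transform_set_op(left, depth + 1, max_depth),
            transform_set_op(right, depth + 1, max_depth),
        )
    return node  # a leaf query


# Build a left-deep chain of 500 UNIONs and show that it is rejected cleanly.
tree = "q0"
for i in range(1, 501):
    tree = ("UNION", tree, "q%d" % i)
try:
    transform_set_op(tree)
except StackDepthExceeded as exc:
    print("rejected:", exc)
```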
  3. 09 Nov, 2012 3 commits
    • Remove leftover LWLockRelease() call · fa12cb7f
      Alvaro Herrera authored
      This code was refactored in d5497b95 but an extra LWLockRelease call was
      left behind.
      
      Per report from Erik Rijkers
      fa12cb7f
    • XSLT stylesheet: Add slash to directory name · 732740e7
      Peter Eisentraut authored
      Some versions of the XSLT stylesheets don't handle the missing slash
      correctly (they concatenate directory and file name without the slash).
      This might never have worked correctly.
      732740e7
    • Fix WaitLatch() to return promptly when the requested timeout expires. · 3e7fdcff
      Tom Lane authored
      If the sleep is interrupted by a signal, we must recompute the remaining
      time to wait; otherwise, a steady stream of non-wait-terminating interrupts
      could delay return from WaitLatch indefinitely.  This has been shown to be
      a problem for the autovacuum launcher, and there may well be other places
      now or in the future with similar issues.  So we'd better make the function
      robust, even though this'll add at least one gettimeofday call per wait.
      
      Back-patch to 9.2.  We might eventually need to fix 9.1 as well, but the
      code is quite different there, and the usage of WaitLatch in 9.1 is so
      limited that it's not clearly important to do so.
      
      Reported and diagnosed by Jeff Janes, though I rewrote his patch rather
      heavily.
      3e7fdcff
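The fix described above follows a common pattern: fix a deadline once, then recompute the remaining time after every interrupted sleep. The sketch below is a generic illustration, not the WaitLatch code. The `wait_once` primitive and the injectable `clock` are assumptions made so the loop can be exercised without real signals.

```python
import time


def wait_with_timeout(wait_once, timeout_s, clock=time.monotonic):
    """Wait for a real wakeup, giving up once timeout_s has elapsed.

    wait_once(remaining) is a hypothetical primitive: it blocks for at most
    `remaining` seconds and returns True for a real wakeup, or False if it
    came back early because of an unrelated interrupt (or timed out).
    """
    deadline = clock() + timeout_s
    remaining = timeout_s
    while remaining > 0:
        if wait_once(remaining):
            return "woken"
        # Interrupted or timed out: recompute what is really left instead
        # of restarting the sleep with the full original timeout.
        remaining = deadline - clock()
    return "timeout"
```

Without the recomputation, every interrupt would restart the full timeout, so a steady stream of interrupts could keep the loop from ever returning. The price is one extra clock read per wait.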
  4. 08 Nov, 2012 3 commits
    • Rename ResolveNew() to ReplaceVarsFromTargetList(), and tweak its API. · dcc55dd2
      Tom Lane authored
      This function currently lacks the option to throw error if the provided
      targetlist doesn't have any matching entry for a Var to be replaced.
      Two of the four existing call sites would be better off with an error,
      as would the usage in the pending auto-updatable-views patch, so it seems
      past time to extend the API to support that.  To do so, replace the "event"
      parameter (historically of type CmdType, though it was declared plain int)
      with a special-purpose enum type.
      
      It's unclear whether this function might be called by third-party code.
      Since many C compilers wouldn't warn about a call site continuing to use
      the old calling convention, rename the function to forcibly break any
      such code that hasn't been updated.  The old name was none too well chosen
      anyhow.
      dcc55dd2
    • Don't trash input list structure in does_not_exist_skipping(). · 75af5ae9
      Tom Lane authored
      The trigger and rule cases need to split up the input name list, but
      they mustn't corrupt the passed-in data structure, since it could be part
      of a cached utility-statement parsetree.  Per bug #7641.
      75af5ae9
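As a generic illustration of the hazard (not the actual function), splitting a qualified name should build new lists and leave the caller's list alone, since that list may belong to a cached structure that is reused later. The `split_qualified_name` helper and the sample name are hypothetical.

```python
def split_qualified_name(name_list):
    """Return (relation_name_parts, object_name) without mutating name_list."""
    relation = list(name_list[:-1])  # a fresh list; the input stays intact
    object_name = name_list[-1]
    return relation, object_name


cached = ["public", "orders", "audit_trigger"]
relation, trigger = split_qualified_name(cached)
print(relation, trigger, cached)
```

A destructive version (for example, popping the last element off the input) would give the same return values on the first call but a shorter, wrong list on every later use of the cached tree.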
    • Teach pg_basebackup and pg_receivexlog to reply to server keepalives. · a9dad564
      Heikki Linnakangas authored
      Without this, the connection will be killed after timeout if
      wal_sender_timeout is set in the server.
      
      Original patch by Amit Kapila, modified by me to fit recent changes in the
      code.
      a9dad564
  5. 07 Nov, 2012 5 commits
  6. 06 Nov, 2012 1 commit
  7. 05 Nov, 2012 1 commit
    • Fix handling of inherited check constraints in ALTER COLUMN TYPE. · 5ed6546c
      Tom Lane authored
      This case got broken in 8.4 by the addition of an error check that
      complains if ALTER TABLE ONLY is used on a table that has children.
      We do use ONLY for this situation, but it's okay because the necessary
      recursion occurs at a higher level.  So we need to have a separate
      flag to suppress recursion without making the error check.
      
      Reported and patched by Pavan Deolasee, with some editorial adjustments by
      me.  Back-patch to 8.4, since this is a regression of functionality that
      worked in earlier branches.
      5ed6546c
  8. 02 Nov, 2012 1 commit
  9. 01 Nov, 2012 3 commits
    • Fix bogus handling of $(X) (i.e., ".exe") in isolationtester Makefile. · ef28e05a
      Tom Lane authored
      I'm not sure why commit 1eb1dde0 seems
      to have made this start to fail on Cygwin when it never did before ---
      but nonetheless, the coding was pretty bogus, and unlike the way we
      handle $(X) anywhere else.  Per buildfarm.
      ef28e05a
    • Limit the number of rel sets considered in consider_index_join_outer_rels. · 19e36477
      Tom Lane authored
      In bug #7626, Brian Dunavant exposes a performance problem created by
      commit 3b8968f2: that commit attempted to
      consider *all* possible combinations of indexable join clauses, but if said
      clauses join to enough different relations, there's an exponential increase
      in the number of outer-relation sets considered.
      
      In Brian's example, all the clauses come from the same equivalence class,
      which means it's redundant to use more than one of them in an indexscan
      anyway.  So we can prevent the problem in this class of cases (which is
      probably the majority of real examples) by rejecting combinations that
      would only serve to add a known-redundant clause.
      
      But that still leaves us exposed to exponential growth of planning time
      when the query has a lot of non-equivalence join clauses that are usable
      with the same index.  I chose to prevent such cases by setting an upper
      limit on the number of relation sets considered, equal to ten times the
      number of index clauses considered so far.  (This sliding limit still
      allows new relsets to be added on as we move to additional index columns,
      which is probably more important than considering even more combinations of
      clauses for the previous column.)  This should keep the amount of work done
      roughly linear rather than exponential in the apparent query complexity.
      This part of the fix is pretty ad-hoc; but without a clearer idea of
      real-world cases for which this would result in markedly inferior plans,
      it's hard to see how to do better.
      19e36477
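The sliding limit can be sketched as follows. This is a simplified model rather than the planner's code: the `consider_relsets` function and its input shape (one list of candidate relation sets per index clause) are assumptions, and only the cap of ten times the number of clauses considered so far comes from the message above.

```python
def consider_relsets(candidates_per_clause, limit_factor=10):
    """Collect distinct outer-relation sets under a sliding cap.

    candidates_per_clause holds one list of frozensets per index clause,
    in the order the clauses are examined.
    """
    relsets = []
    clauses_seen = 0
    for candidates in candidates_per_clause:
        clauses_seen += 1
        for relset in candidates:
            if len(relsets) >= limit_factor * clauses_seen:
                break  # cap reached for now; later clauses raise it again
            if relset not in relsets:
                relsets.append(relset)
    return relsets


first = [frozenset({i}) for i in range(25)]
second = [frozenset({100 + i}) for i in range(25)]
print(len(consider_relsets([first])), len(consider_relsets([first, second])))
```

Because the cap grows with each clause examined, later index columns can still contribute new sets even after an earlier column has hit the limit, while the total stays linear in the number of clauses.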
    • Have make never delete intermediate files automatically · 1eb1dde0
      Peter Eisentraut authored
      Several hacks in certain modes already thought this was a bad idea, so
      just disable it globally.
      1eb1dde0
  10. 31 Oct, 2012 4 commits
    • Alvaro Herrera's avatar
      2f1692d2
    • Document that TCP keepalive settings read as 0 on Unix-socket connections. · e774b764
      Tom Lane authored
      Per bug #7631 from Rob Johnson.  The code is operating as designed, but the
      docs didn't explain it.
      e774b764
    • Fix erroneous choices of segNo variables · 9b8dd7e8
      Alvaro Herrera authored
      Commit dfda6eba (which changed segment numbers to use a single 64 bit
      variable instead of log/seg) introduced a couple of bogus choices of
      exactly which log segment number variable to use in each case.
      
      This is currently pretty harmless; in one place, the bogus number was
      only being used in an error message for a pretty unlikely condition
      (failure to fsync a WAL segment file).  In the other, it was using a
      global variable instead of the local variable; but all callsites were
      passing the value of the global variable anyway.
      
      No need to backpatch because that commit is not on earlier branches.
      9b8dd7e8
    • Fix ALTER EXTENSION / SET SCHEMA · 04f28bdb
      Alvaro Herrera authored
      In its original conception, it was leaving some objects in the old
      schema, but without their proper pg_depend entries; this meant that the
      old schema could be dropped, causing future pg_dump calls to fail on the
      affected database.  This was originally reported by Jeff Frost as #6704;
      there have been other complaints elsewhere that can probably be traced
      to this bug.
      
      To fix, be more consistent about altering a table's subsidiary objects
      along with the table itself; this requires some restructuring in how tables
      are relocated when altering an extension -- hence the new
      AlterTableNamespaceInternal routine which encapsulates it for both the
      ALTER TABLE and the ALTER EXTENSION cases.
      
      There was another bug lurking here, which was unmasked after fixing the
      previous one: certain objects would be reached twice via the dependency
      graph, and the second attempt to move them would cause the entire
      operation to fail.  Per discussion, it seems the best fix for this is to
      do more careful tracking of objects already moved: we now maintain a
      list of moved objects, to avoid attempting to do it twice for the same
      object.
      
      Authors: Alvaro Herrera, Dimitri Fontaine
      Reviewed by Tom Lane
      04f28bdb
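The second fix amounts to remembering what has already been moved while walking the dependency graph. A minimal sketch follows, assuming a plain dict-based graph and a hypothetical `move_objects` helper. A set is used here for the lookup where the message speaks of a list, and the real code works on catalog objects rather than strings.

```python
def move_objects(roots, dependencies, new_schema, schema_of):
    """Move every object reachable from roots, each exactly once."""
    moved = set()  # objects already relocated
    order = []

    def visit(obj):
        if obj in moved:
            return  # reached again via another dependency path: skip it
        moved.add(obj)
        schema_of[obj] = new_schema
        order.append(obj)
        for dep in dependencies.get(obj, ()):
            visit(dep)

    for root in roots:
        visit(root)
    return order


# The sequence is reachable both from the extension and from the table.
deps = {"ext": ["table", "seq"], "table": ["seq"]}
schemas = {"ext": "old", "table": "old", "seq": "old"}
print(move_objects(["ext"], deps, "new", schemas))
```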
  11. 28 Oct, 2012 1 commit
    • Preserve intermediate .c files in coverage mode · 4af3dda1
      Peter Eisentraut authored
      The introduction of the .y -> .c pattern rule causes some .c files such
      as bootparse.c to be considered intermediate files in the .y -> .c -> .o
      rule chain, which make would automatically delete.  But in coverage
      mode, the processing tools such as genhtml need those files, so mark
      them as "precious" so that make preserves them.
      4af3dda1
  12. 26 Oct, 2012 3 commits
    • Throw error if expiring tuple is again updated or deleted. · 6868ed74
      Kevin Grittner authored
      This prevents surprising behavior when a FOR EACH ROW trigger
      BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or
      deletes the old row.  Prior to this patch the requested action
      on the row could be silently ignored while all triggered actions
      based on the occurrence of the requested action could be committed.
      One example of how this could happen is if the BEFORE DELETE
      trigger for a "parent" row deleted "children" which had trigger
      functions to update summary or status data on the parent.
      
      This also prevents similar surprising problems if the query has a
      volatile function which updates a target row while it is already
      being updated.
      
      There are related issues present in FOR UPDATE cursors and READ
      COMMITTED queries which are not handled by this patch.  These
      issues need further evaluation to determine what change, if any, is
      needed.
      
      Where the new error messages are generated, in most cases the best
      fix will be to move code from the BEFORE trigger to an AFTER
      trigger.  Where this is not feasible, the trigger can avoid the
      error by re-issuing the triggering statement and returning NULL.
      
      Documentation changes will be submitted in a separate patch.
      
      Kevin Grittner and Tom Lane with input from Florian Pflug and
      Robert Haas, based on problems encountered during conversion of
      Wisconsin Circuit Court trigger logic to plpgsql triggers.
      6868ed74
    • Prefer actual constants to pseudo-constants in equivalence class machinery. · 17804fa7
      Tom Lane authored
      generate_base_implied_equalities_const() should prefer plain Consts over
      other em_is_const eclass members when choosing the "pivot" value that
      all the other members will be equated to.  This makes it more likely that
      the generated equalities will be useful in constraint-exclusion proofs.
      Per report from Rushabh Lathia.
      17804fa7
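The selection rule can be sketched as a single pass over the class members. This is an illustration only: the dict-based members, the `is_plain_const` flag, and the `choose_pivot` function are hypothetical stand-ins for the planner's data structures, and the parameter `$1` is just one example of something treated as constant without being a literal.

```python
def choose_pivot(members):
    """Pick the member all others are equated to, preferring plain constants."""
    pivot = None
    for member in members:
        if not member["is_const"]:
            continue
        # Take the first constant-like member, but let a plain constant
        # displace a pseudo-constant chosen earlier.
        if pivot is None or (
            member["is_plain_const"] and not pivot["is_plain_const"]
        ):
            pivot = member
    return pivot


members = [
    {"expr": "t.x", "is_const": False, "is_plain_const": False},
    {"expr": "$1", "is_const": True, "is_plain_const": False},
    {"expr": "42", "is_const": True, "is_plain_const": True},
]
print(choose_pivot(members)["expr"])
```

Equalities of the form `t.x = 42` are the kind a constraint-exclusion proof can compare against a literal bound, which is why the literal makes the more useful pivot.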
    • In pg_dump, dump SEQUENCE SET items in the data not pre-data section. · 5a39114f
      Tom Lane authored
      Represent a sequence's current value as a separate TableDataInfo dumpable
      object, so that it can be dumped within the data section of the archive
      rather than in pre-data.  This fixes an undesirable inconsistency between
      the meanings of "--data-only" and "--section=data", and also fixes dumping
      of sequences that are marked as extension configuration tables, as per a
      report from Marko Kreen back in July.  The main cost is that we do one more
      SQL query per sequence, but that's probably not very meaningful in most
      databases.
      
      Back-patch to 9.1, since it has the extension configuration issue even
      though not the --section switch.
      5a39114f
  13. 24 Oct, 2012 2 commits
    • Tweak genericcostestimate's fudge factor for index size. · bf01e34b
      Tom Lane authored
      To provide some bias against using a large index when a small one would do
      as well, genericcostestimate adds a "fudge factor", which for a long time
      was random_page_cost * index_pages/10000.  However, this can grow to be the
      dominant term in indexscan cost estimates when the index involved is large
      enough, a behavior that was never intended.  Change to a ln(1 + n/10000)
      formulation, which has nearly the same behavior up to a few hundred pages
      but tails off significantly thereafter.  (A log curve seems correct on
      first principles, since what we're trying to account for here is index
      descent costs, which are typically logarithmic.)  Per bug #7619 from Niko
      Kiirala.
      
      Possibly this change should get back-patched, but I'm hesitant to mess with
      cost estimates in stable branches.
      bf01e34b
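To see how the two formulations diverge, the sketch below evaluates both for a few index sizes. It assumes the logarithmic term is still scaled by random_page_cost (the message gives only the ln(1 + n/10000) part) and uses 4.0 as an illustrative cost value. The function names are hypothetical.

```python
import math


def old_fudge(index_pages, random_page_cost=4.0):
    """Linear fudge factor: random_page_cost * index_pages / 10000."""
    return random_page_cost * index_pages / 10000.0


def new_fudge(index_pages, random_page_cost=4.0):
    """Logarithmic form; scaling by random_page_cost is an assumption."""
    return random_page_cost * math.log(1.0 + index_pages / 10000.0)


for pages in (100, 1_000, 100_000, 1_000_000):
    print(pages, round(old_fudge(pages), 4), round(new_fudge(pages), 4))
```

For small indexes ln(1 + x) is close to x, so the two agree closely. For very large indexes the linear term keeps growing while the logarithmic one flattens out.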
    • When converting a table to a view, remove its system columns. · a4e8680a
      Tom Lane authored
      Views should not have any pg_attribute entries for system columns.
      However, we forgot to remove such entries when converting a table to a
      view.  This could lead to crashes later on, if someone attempted to
      reference such a column, as reported by Kohei KaiGai.
      
      Patch in HEAD only.  This bug has been there forever, but in the back
      branches we will have to defend against existing mis-converted views,
      so it doesn't seem worthwhile to change the conversion code too.
      a4e8680a
  14. 23 Oct, 2012 1 commit
    • Add context info to OAT_POST_CREATE security hook · f4c4335a
      Alvaro Herrera authored
      ... and have sepgsql use it to determine whether to check permissions
      during certain operations.  Indexes that are being created as a result
      of REINDEX, for instance, do not need to have their permissions checked;
      they were already checked when the index was created.
      
      Author: KaiGai Kohei, slightly revised by me
      f4c4335a
  15. 21 Oct, 2012 1 commit
    • Correct predicate locking for DROP INDEX CONCURRENTLY. · 4c9d0901
      Kevin Grittner authored
      For the non-concurrent case there is an AccessExclusiveLock
      on both the index and the heap at a time during which no other
      process is using either, before which the index is maintained and
      used for scans, and after which the index is no longer used or
      maintained.  Predicate locks can safely be moved from the index to
      the related heap relation under the protection of these locks.
      This was done prior to the introduction of DROP INDEX CONCURRENTLY
      and continues to be done for non-concurrent index drops.
      
      For concurrent index drops, the predicate locks must be moved when
      there are no index scans in progress on that index and no more can
      subsequently start, and before heap inserts stop maintaining the
      index.  As long as these conditions are guaranteed when the
      TransferPredicateLocksToHeapRelation() function is called,
      stronger locks are not needed for correctness.
      
      Kevin Grittner based on questions by Tom Lane in reviewing the
      DROP INDEX CONCURRENTLY patch and in cooperation with Andres
      Freund and Simon Riggs.
      4c9d0901
  16. 20 Oct, 2012 1 commit
    • Fix pg_dump's handling of DROP DATABASE commands in --clean mode. · edef20f6
      Tom Lane authored
      In commit 4317e024, I accidentally broke
      this behavior while rearranging code to ensure that --create wouldn't
      affect whether a DATABASE entry gets put into archive-format output.
      Thus, 9.2 would issue a DROP DATABASE command in --clean mode, which is
      either useless or dangerous depending on the usage scenario.
      It should not do that, and no longer does.
      
      A bright spot is that this refactoring makes it easy to allow the
      combination of --clean and --create to work sensibly, ie, emit DROP
      DATABASE then CREATE DATABASE before reconnecting.  Ordinarily we'd
      consider that a feature addition and not back-patch it, but it seems
      silly to not include the extra couple of lines required in the 9.2
      version of the code.
      
      Per report from Guillaume Lelarge, though this is slightly more extensive
      than his proposed patch.
      edef20f6