  1. 02 Aug, 2012 3 commits
    • Fix race conditions associated with SPGiST redirection tuples. · 962e0cc7
      Tom Lane authored
      The correct test for whether a redirection tuple is removable is whether
      the tuple's xid < RecentGlobalXmin, not OldestXmin; the previous coding
      failed to protect index searches being done in concurrent transactions that
      have no XID.  This mirrors the recent fix in btree's page recycling logic
      made in commit d3abbbeb.
      
      Also, WAL-log the newest XID of any removed redirection tuple on an index
      page, and apply ResolveRecoveryConflictWithSnapshot during InHotStandby WAL
      replay.  This protects against concurrent Hot Standby transactions possibly
      needing to see the redirection tuple(s).
      
      Per my query of 2012-03-12 and subsequent discussion.
    • Update release notes for libpq feature change. · 7719ed04
      Tom Lane authored
    • Replace libpq's "row processor" API with a "single row" mode. · 41b9c845
      Tom Lane authored
      After taking a while to digest the row-processor feature that was added to
      libpq in commit 92785dac, we've concluded
      it is over-complicated and too hard to use.  Leave the core infrastructure
      changes in place (that is, there's still a row processor function inside
      libpq), but remove the exposed API pieces, and instead provide a "single
      row" mode switch that causes PQgetResult to return one row at a time in
      separate PGresult objects.
      
      This approach incurs more overhead than proper use of a row processor
      callback would, since construction of a PGresult per row adds extra cycles.
      However, it is far easier to use and harder to break.  The single-row mode
      still affords applications the primary benefit that the row processor API
      was meant to provide, namely not having to accumulate large result sets in
      memory before processing them.  Preliminary testing suggests that we can
      probably buy back most of the extra cycles by micro-optimizing construction
      of the extra results, but that task will be left for another day.
      
      Marko Kreen
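
The difference between the default mode and the "single row" mode described above can be pictured with a toy model.  This is only a sketch in Python, not libpq code: `ToyResult`, `whole_result`, and `single_row_results` are hypothetical names, and the status strings merely stand in for result status codes.  It assumes that single-row mode hands back one result object per row, followed by a final zero-row result that marks the end of the set.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]


@dataclass
class ToyResult:
    """Hypothetical stand-in for a PGresult: a status plus the rows it holds."""
    status: str
    rows: List[Row]


def whole_result(source: Iterable[Row]) -> ToyResult:
    """Default mode: accumulate every row in memory, then return one result."""
    return ToyResult("TUPLES_OK", list(source))


def single_row_results(source: Iterable[Row]) -> Iterator[ToyResult]:
    """Single-row mode: one small result per row, never the whole set at once."""
    for row in source:
        yield ToyResult("SINGLE_TUPLE", [row])
    # Assumed end marker: a final result carrying zero rows.
    yield ToyResult("TUPLES_OK", [])


if __name__ == "__main__":
    for result in single_row_results(iter([(1, "a"), (2, "b")])):
        print(result.status, result.rows)
```

The per-row construction of a result object is the extra overhead the message mentions; the gain is that the caller never holds more than one row at a time.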
  2. 01 Aug, 2012 1 commit
  3. 31 Jul, 2012 3 commits
    • Fix WITH attached to a nested set operation (UNION/INTERSECT/EXCEPT). · f6ce81f5
      Tom Lane authored
      Parse analysis neglected to cover the case of a WITH clause attached to an
      intermediate-level set operation; it only handled WITH at the top level
      or WITH attached to a leaf-level SELECT.  Per report from Adam Mackler.
      
      In HEAD, I rearranged the order of SelectStmt's fields to put withClause
      with the other fields that can appear on non-leaf SelectStmts.  In back
      branches, leave it alone to avoid a possible ABI break for third-party
      code.
      
      Back-patch to 8.4 where WITH support was added.
    • Fix syslogger so that log_truncate_on_rotation works in the first rotation. · b76356ac
      Tom Lane authored
      In the original coding of the log rotation stuff, we did not bother to make
      the truncation logic work for the very first rotation after postmaster
      start (or after a syslogger crash and restart).  It just always appended
      in that case.  It did not seem terribly important at the time, but we've
      recently had two separate complaints from people who expected it to work
      unsurprisingly.  (Both users tend to restart the postmaster about as often
      as a log rotation is configured to happen, which is maybe not typical use,
      but still...)  Since the initial log file is opened in the postmaster,
      fixing this requires passing down some more state to the syslogger child
      process.
      
      It's always been like this, so back-patch to all supported branches.
    • pg_basebackup: stylistic adjustments · 2f29f011
      Alvaro Herrera authored
      The most user-visible part of this is to change the long options
      --statusint and --noloop to --status-interval and --no-loop,
      respectively, per discussion.
      
      Also, consistently enclose file names in double quotes, per our
      conventions; and consistently use the term "transaction log file" to
      talk about WAL segments.  (Someday we may need to go over this
      terminology and make it consistent across the whole source code.)
      
      Finally, reflow the code to better fit in 80 columns, and have pgindent
      fix it up some more.
  4. 30 Jul, 2012 1 commit
  5. 27 Jul, 2012 2 commits
  6. 26 Jul, 2012 5 commits
    • Document that the pg_upgrade user of rsync might want to skip some · 69451b09
      Bruce Momjian authored
      files, like postmaster.pid.
      
      Backpatch to 9.2.
    • Only allow autovacuum to be auto-canceled by a directly blocked process. · 26b43869
      Tom Lane authored
      In the original coding of the autovacuum cancel feature, commit
      acac68b2, an autovacuum process was
      considered a target for cancellation if it was found to hard-block any
      process examined in the deadlock search.  This patch tightens the test so
      that the autovacuum must directly hard-block the current process.  This
      should make the behavior more predictable in general, and in particular
      it ensures that an autovacuum will not be canceled with less than
      deadlock_timeout grace period.  In the old coding, it was possible for an
      autovacuum to be canceled almost instantly, given unfortunate timing of two
      or more other processes' lock attempts.
      
      This also justifies the logging methodology in the recent commit
      d7318d43; without this restriction, that
      patch isn't providing enough information to see the connection of the
      canceling process to the autovacuum.  Like that one, patch all the way
      back.
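
The tightened test can be illustrated with a small wait-for graph.  This is a sketch under stated assumptions rather than the lock manager's actual logic: `old_rule_cancels`, `new_rule_cancels`, and the `blocked_by` mapping (process name to the set of processes hard-blocking it) are hypothetical.

```python
from typing import Dict, Set


def old_rule_cancels(current: str, blocked_by: Dict[str, Set[str]],
                     autovac: str) -> bool:
    """Old rule: cancel if autovacuum hard-blocks ANY process reached
    while searching the wait-for graph from the current process."""
    seen, stack = set(), [current]
    while stack:
        proc = stack.pop()
        if proc in seen:
            continue
        seen.add(proc)
        blockers = blocked_by.get(proc, set())
        if autovac in blockers:
            return True
        stack.extend(blockers)
    return False


def new_rule_cancels(current: str, blocked_by: Dict[str, Set[str]],
                     autovac: str) -> bool:
    """New rule: cancel only if autovacuum directly hard-blocks the
    process that is running the deadlock check."""
    return autovac in blocked_by.get(current, set())


if __name__ == "__main__":
    # A waits for B, and B waits for the autovacuum worker.
    graph = {"A": {"B"}, "B": {"autovacuum"}}
    print(old_rule_cancels("A", graph, "autovacuum"))  # True
    print(new_rule_cancels("A", graph, "autovacuum"))  # False
    print(new_rule_cancels("B", graph, "autovacuum"))  # True
```

Under the old rule, A's deadlock check could cancel the autovacuum even if B had only just started waiting; under the new rule only B's own check, run after B has waited out its own timeout, can do so.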
    • Tab complete table names after ALTER TABLE x [NO] INHERIT. · d20cdd31
      Robert Haas authored
      Jeff Janes
    • Log a better message when canceling autovacuum. · d7318d43
      Robert Haas authored
      The old message was at DEBUG2, so typically it didn't show up in the
      log at all.  As a result, in most cases where autovacuum was canceled,
      the only information that was logged was the table being vacuumed,
      with no indication as to what problem caused the cancel.  Crank up
      the level to LOG and add some more details to assist with debugging.
      
      Back-patch all the way, per discussion on pgsql-hackers.
    • Simplify pg_upgrade's handling when returning directory listings. · 4da8fc05
      Bruce Momjian authored
      Backpatch to 9.2.
  7. 25 Jul, 2012 3 commits
    • Fix longstanding crash-safety bug with newly-created-or-reset sequences. · af026b5d
      Tom Lane authored
      If a crash occurred immediately after the first nextval() call for a serial
      column, WAL replay would restore the sequence to a state in which it
      appeared that no nextval() had been done, thus allowing the first sequence
      value to be returned again by the next nextval() call; as reported in
      bug #6748 from Xiangming Mei.
      
      More generally, the problem would occur if an ALTER SEQUENCE was executed
      on a freshly created or reset sequence.  (The manifestation with serial
      columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
      to serial column creation.)  The cause is that sequence creation attempted
      to save one WAL entry by writing out a WAL record that made it appear that
      the first nextval() had already happened (viz, with is_called = true),
      while marking the sequence's in-database state with log_cnt = 1 to show
      that the first nextval() need not emit a WAL record.  However, ALTER
      SEQUENCE would emit a new WAL entry reflecting the actual in-database state
      (with is_called = false).  Then, nextval would allocate the first sequence
      value and set is_called = true, but it would trust the log_cnt value and
      not emit any WAL record.  A crash at this point would thus restore the
      sequence to its post-ALTER state, causing the next nextval() call to return
      the first sequence value again.
      
      To fix, get rid of the idea of logging an is_called status different from
      reality.  This means that the first nextval-driven WAL record will happen
      at the first nextval call not the second, but the marginal cost of that is
      pretty negligible.  In addition, make sure that ALTER SEQUENCE resets
      log_cnt to zero in any case where it touches sequence parameters that
      affect future nextval results.  This will result in some user-visible
      changes in the contents of a sequence's log_cnt column, as reflected in the
      patch's regression test changes; but no application should be depending on
      that anyway, since it was already true that log_cnt changes rather
      unpredictably depending on checkpoint timing.
      
      In addition, make some basically-cosmetic improvements to get rid of
      sequence.c's undesirable intimacy with page layout details.  It was always
      really trying to WAL-log the contents of the sequence tuple, so we should
      have it do that directly using a HeapTuple's t_data and t_len, rather than
      backing into it with some magic assumptions about where the tuple would be
      on the sequence's page.
      
      Back-patch to all supported branches.
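
The failure sequence described above can be replayed in a toy model.  The sketch below is not sequence.c: `ToySequence`, `SeqState`, `LOG_AHEAD`, and the `fixed` flag are hypothetical, and WAL is reduced to "the most recent record becomes the state after a crash".  It assumes only what the message states about is_called and log_cnt.

```python
from dataclasses import dataclass, replace
from typing import Tuple

LOG_AHEAD = 32  # assumed number of values covered by one WAL record


@dataclass
class SeqState:
    last_value: int
    is_called: bool
    log_cnt: int


class ToySequence:
    """Toy model: in-database state plus the most recent WAL record."""

    def __init__(self, start: int, fixed: bool):
        self.fixed = fixed
        if fixed:
            # Fixed coding: WAL and in-database state agree.
            self.state = SeqState(start, False, 0)
            self.wal = replace(self.state)
        else:
            # Old coding: WAL pretends the first nextval() already happened,
            # and log_cnt = 1 lets that first nextval() skip its WAL record.
            self.state = SeqState(start, False, 1)
            self.wal = SeqState(start, True, 0)

    def alter(self) -> None:
        """ALTER SEQUENCE: log the actual in-database state."""
        if self.fixed:
            self.state.log_cnt = 0  # force the next nextval() to emit WAL
        self.wal = replace(self.state, log_cnt=0)

    def nextval(self) -> int:
        s = self.state
        value = s.last_value + 1 if s.is_called else s.last_value
        if s.log_cnt > 0:
            s.log_cnt -= 1  # trust the pre-logged budget: no WAL record
        else:
            # Log a value far enough ahead to cover the next LOG_AHEAD calls.
            self.wal = SeqState(value + LOG_AHEAD, True, 0)
            s.log_cnt = LOG_AHEAD
        s.last_value, s.is_called = value, True
        return value

    def crash_and_recover(self) -> None:
        """WAL replay: the state becomes whatever the last record said."""
        self.state = replace(self.wal)


def first_two_values(fixed: bool) -> Tuple[int, int]:
    seq = ToySequence(start=1, fixed=fixed)
    seq.alter()  # e.g. the OWNED BY step of serial column creation
    before = seq.nextval()
    seq.crash_and_recover()
    return before, seq.nextval()


if __name__ == "__main__":
    print(first_two_values(fixed=False))  # old coding hands out 1 twice
    print(first_two_values(fixed=True))   # fixed coding skips ahead instead
```

In this model the old coding returns the first value twice across the crash, while the fixed coding never repeats a value, at the price of one earlier WAL record and some skipped values after recovery.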
    • Peter Eisentraut's avatar
    • Add translator comments to module names · 58f17dcf
      Alvaro Herrera authored
  8. 24 Jul, 2012 1 commit
    • Change syntax of new CHECK NO INHERIT constraints · d7b47e51
      Alvaro Herrera authored
      The initially implemented syntax, "CHECK NO INHERIT (expr)", was not
      deemed very good, so switch to "CHECK (expr) NO INHERIT" instead.  This
      way it looks similar to SQL-standard constraint attributes.
      
      Backport to 9.2, where the new syntax and feature were introduced.
      
      Per discussion.
  9. 23 Jul, 2012 2 commits
  10. 22 Jul, 2012 2 commits
    • Fix name collision between concurrent regression tests. · b71258af
      Tom Lane authored
      Commit f5bcd398 introduced a test using
      a table named "circles" in inherit.sql.  Unfortunately, the concurrently
      executed constraints test was already using that table name, so the
      parallel regression tests would sometimes fail.  Rename table to dodge
      the problem.  Per buildfarm.
    • Improve copydir() code for the case that fsync is off. · 2d46a57d
      Tom Lane authored
      We should avoid calling sync_file_range or posix_fadvise in this case,
      since (a) we don't really care if the data gets synced, and might as
      well save the kernel calls; (b) at least on Linux we know that the
      kernel might block us until it's scheduled the write.
      
      Also, avoid making a useless second traversal of the directory tree
      if we're not actually going to call fsync(2) after all.
  11. 21 Jul, 2012 5 commits
    • Use --nosync during make check's initdb call. · 2c4f5b4b
      Tom Lane authored
      We left this out of commit b966dd6c
      so as to get some more buildfarm testing of the new fsync code in initdb.
      But since no problems have turned up, it's probably time to save the
      cycles.
    • Suppress volatile-related warning seen in some compilers. · 1f115d98
      Tom Lane authored
      Antique versions of gcc complain about vars that are initialized outside
      PG_TRY and then modified within it.  Rather than marking the var volatile,
      expend one more line of code.
    • Account for SRFs in targetlists in planner rowcount estimates. · 31c7c642
      Tom Lane authored
      We made use of the ROWS estimate for set-returning functions used in FROM,
      but not for those used in SELECT targetlists; which is a bit of an
      oversight considering there are common usages that require the latter
      approach.  Improve that.  (I had initially thought it might be worth
      folding this into cost_qual_eval, but after investigation concluded that
      that wouldn't be very helpful, so just do it separately.)  Per complaint
      from David Johnston.
      
      Back-patch to 9.2, but not further, for fear of destabilizing plan choices
      in existing releases.
    • Revert temporary patch to debug Windows breakage. · ed0af332
      Robert Haas authored
      This reverts commit 0a248208.
    • Repair plpgsql_validator breakage. · 0635c0b5
      Robert Haas authored
      Commit 3a0e4d36 arranged to
      reference stack-allocated variables after they were out of scope.
      That's no good, so let's arrange to not do that after all.
  12. 20 Jul, 2012 7 commits
    • Andrew Dunstan's avatar
    • Temporary patch to try to debug why event trigger patch broke Windows. · 0a248208
      Robert Haas authored
      Apologies for the ugliness.
    • Remove prepared transactions from main isolation test schedule. · ae55d9fb
      Andrew Dunstan authored
      There is no point in running this test when prepared transactions are disabled,
      which is the default. New make targets that include the test are provided. This
      avoids wasting cycles on buildfarm machines.
      
      Backpatch to 9.1 where these tests were introduced.
    • pg_dump: Simplify mkdir() error checking · 8ca03aa4
      Peter Eisentraut authored
      mkdir() can check for errors itself.  We don't need to code that
      ourselves again.
    • connoinherit may be true only for CHECK constraints · f5bcd398
      Alvaro Herrera authored
      The code was setting it true for other constraints, which is
      bogus.  Doing so caused bogus catalog entries for such constraints, and
      in particular caused an error to be raised when trying to drop a
      constraint of types other than CHECK from a table that has children,
      such as reported in bug #6712.
      
      In 9.2, additionally ignore connoinherit=true for other constraint
      types, to avoid having to force initdb; existing databases might already
      contain bogus catalog entries.
      
      Includes a catversion bump (in HEAD only).
      
      Bug report from Miroslav Šulc
      Analysis from Amit Kapila and Noah Misch; Amit also contributed the patch.
    • Fix whole-row Var evaluation to cope with resjunk columns (again). · 8e617e29
      Tom Lane authored
      When a whole-row Var is reading the result of a subquery, we need it to
      ignore any "resjunk" columns that the subquery might have evaluated for
      GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
      68e40998, but that fix only covered
      whole-row Vars of named composite types, not those of RECORD type; and it
      was mighty klugy anyway, since it just assumed without checking that any
      extra columns in the result must be resjunk.  A proper fix requires getting
      hold of the subquery's targetlist so we can actually see which columns are
      resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
      the bullet and add some infrastructure to make that possible.
      
      Per report from Andrew Dunstan and additional testing by Merlin Moncure.
      Back-patch to all supported branches.  In 8.3, also back-patch commit
      292176a1, which for some reason I had
      not done at the time, but it's a prerequisite for this change.
    • Make new event trigger facility actually do something. · 3a0e4d36
      Robert Haas authored
      Commit 3855968f added syntax, pg_dump,
      psql support, and documentation, but the triggers didn't actually fire.
      With this commit, they now do.  This is still a pretty basic facility
      overall because event triggers do not get a whole lot of information
      about what the user is trying to do unless you write them in C; and
      there's still no option to fire them anywhere except at the very
      beginning of the execution sequence, but it's better than nothing,
      and a good building block for future work.
      
      Along the way, add a regression test for ALTER LARGE OBJECT, since
      testing of event triggers reveals that we haven't got one.
      
      Dimitri Fontaine and Robert Haas
  13. 19 Jul, 2012 2 commits
    • Rethink checkpointer's fsync-request table representation. · be86e3dd
      Tom Lane authored
      Instead of having one hash table entry per relation/fork/segment, just have
      one per relation, and use bitmapsets to represent which specific segments
      need to be fsync'd.  This eliminates the need to scan the whole hash table
      to implement FORGET_RELATION_FSYNC, which fixes the O(N^2) behavior
      recently demonstrated by Jeff Janes for cases involving lots of TRUNCATE or
      DROP TABLE operations during a single checkpoint cycle.  Per an idea from
      Robert Haas.
      
      (FORGET_DATABASE_FSYNC still sucks, but since dropping a database is a
      pretty expensive operation anyway, we'll live with that.)
      
      In passing, improve the delayed-unlink code: remove the pass over the list
      in mdpreckpt, since it wasn't doing anything for us except supporting a
      useless Assert in mdpostckpt, and fix mdpostckpt so that it will absorb
      fsync requests every so often when clearing a large backlog of deletion
      requests.
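
The representation change can be sketched as follows.  This is an illustrative model, not the checkpointer's data structure: `PendingFsyncs` and its methods are hypothetical, a Python integer used as a bitmask stands in for a bitmapset, and `fork=None` is an assumed way of saying "all forks of this relation".

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


class PendingFsyncs:
    """One table entry per relation; segments are bits in a per-fork mask."""

    def __init__(self) -> None:
        # relation id -> fork number -> bitmask of segment numbers
        self._by_rel: Dict[Hashable, Dict[int, int]] = defaultdict(
            lambda: defaultdict(int))

    def remember(self, rel: Hashable, fork: int, segno: int) -> None:
        """Record that one segment of one fork needs an fsync."""
        self._by_rel[rel][fork] |= 1 << segno

    def forget_relation(self, rel: Hashable, fork: Optional[int] = None) -> None:
        """Drop pending requests with a single lookup; no full-table scan."""
        if fork is None:
            self._by_rel.pop(rel, None)
        else:
            self._by_rel.get(rel, {}).pop(fork, None)

    def pending(self) -> List[Tuple[Hashable, int, int]]:
        """Expand the bitmasks back into (relation, fork, segment) triples."""
        out = []
        for rel, forks in self._by_rel.items():
            for fork, mask in forks.items():
                segno = 0
                while mask:
                    if mask & 1:
                        out.append((rel, fork, segno))
                    mask >>= 1
                    segno += 1
        return sorted(out)


if __name__ == "__main__":
    table = PendingFsyncs()
    table.remember("rel_a", 0, 0)
    table.remember("rel_a", 0, 3)
    table.remember("rel_b", 0, 2)
    table.forget_relation("rel_a")  # one dictionary operation
    print(table.pending())
```

With one entry per relation/fork/segment, forgetting a relation means visiting every entry in the table, so N drops against a table of N entries cost O(N^2); keyed by relation, each forget is a single lookup.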
    • Send only one FORGET_RELATION_FSYNC request when dropping a relation. · 3072b7ba
      Tom Lane authored
      We were sending one per fork, but a little bit of refactoring allows us
      to send just one request with forknum == InvalidForkNumber.  This not only
      reduces pressure on the shared-memory request queue, but saves repeated
      traversals of the checkpointer's hash table.
  14. 18 Jul, 2012 3 commits
    • Refactor the way code is shared between some range type functions. · a7a4add6
      Heikki Linnakangas authored
      Functions like range_eq, range_before etc. are exposed at the SQL-level, but
      they're also used internally by the GiST consistent support function. The
      code sharing was done by a hack, TrickFunctionCall2, which relied on the
      knowledge that all the functions used fn_extra the same way. This commit
      splits the functions into internal versions that take a TypeCacheEntry as
      argument, and thin wrappers to expose the functions at the SQL-level. The
      internal versions can then be called directly and in a less hacky way from
      the GiST consistent function.
      
      This is just cosmetic, but backpatch to 9.2 anyway, to avoid having a
      different version of this code in the 9.2 branch. That would make
      backpatching fixes in this area more difficult.
      
      Alexander Korotkov
    • Fix statistics breakage from bgwriter/checkpointer process split. · 80e373c3
      Tom Lane authored
      ForwardFsyncRequest() supposed that it could only be called in regular
      backends, which used to be true; but since the splitup of bgwriter and
      checkpointer, it is also called in the bgwriter.  We do not want to count
      such calls in pg_stat_bgwriter.buffers_backend statistics, so fix things
      so that they aren't.
      
      (It's worth noting here that this implies an alarmingly large increase in
      the expected amount of cross-process fsync request traffic, which may well
      mean that the process splitup was not such a hot idea.)
    • Fix management of pendingOpsTable in auxiliary processes. · 4a9c30a8
      Tom Lane authored
      mdinit() was misusing IsBootstrapProcessingMode() to decide whether to
      create an fsync pending-operations table in the current process.  This led
      to creating a table not only in the startup and checkpointer processes as
      intended, but also in the bgwriter process, not to mention other auxiliary
      processes such as walwriter and walreceiver.  Creation of the table in the
      bgwriter is fatal, because it absorbs fsync requests that should have gone
      to the checkpointer; instead they just sit in bgwriter local memory and are
      never acted on.  So writes performed by the bgwriter were not being fsync'd
      which could result in data loss after an OS crash.  I think there is no
      live bug with respect to walwriter and walreceiver because those never
      perform any writes of shared buffers; but the potential is there for
      future breakage in those processes too.
      
      To fix, make AuxiliaryProcessMain() export the current process's
      AuxProcType as a global variable, and then make mdinit() test directly for
      the types of aux process that should have a pendingOpsTable.  Having done
      that, we might as well also get rid of the random bool flags such as
      am_walreceiver that some of the aux processes had grown.  (Note that we
      could not have fixed the bug by examining those variables in mdinit(),
      because it's called from BaseInit() which is run by AuxiliaryProcessMain()
      before entering any of the process-type-specific code.)
      
      Back-patch to 9.2, where the problem was introduced by the split-up of
      bgwriter and checkpointer processes.  The bogus pendingOpsTable exists
      in walwriter and walreceiver processes in earlier branches, but absent
      any evidence that it causes actual problems there, I'll leave the older
      branches alone.
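
The shape of the fix, testing the process type directly, might look like this in miniature.  The sketch is hypothetical throughout: the enum members mirror the process types named in the message, but `needs_pending_ops_table` and `route_fsync_request` are invented for illustration and do not correspond to actual functions.

```python
from enum import Enum, auto
from typing import List


class AuxProcType(Enum):
    STARTUP = auto()
    BGWRITER = auto()
    CHECKPOINTER = auto()
    WALWRITER = auto()
    WALRECEIVER = auto()


# Per the message above, only these aux processes should own a table.
_OWNS_PENDING_OPS = frozenset({AuxProcType.STARTUP, AuxProcType.CHECKPOINTER})


def needs_pending_ops_table(proc_type: AuxProcType) -> bool:
    """Decide from the process type itself, not from a bootstrap-mode flag."""
    return proc_type in _OWNS_PENDING_OPS


def route_fsync_request(proc_type: AuxProcType, request: str,
                        local_table: List[str],
                        checkpointer_queue: List[str]) -> None:
    """Absorb the request locally only if this process owns a table;
    otherwise forward it to the checkpointer."""
    if needs_pending_ops_table(proc_type):
        local_table.append(request)
    else:
        checkpointer_queue.append(request)


if __name__ == "__main__":
    local: List[str] = []
    queue: List[str] = []
    route_fsync_request(AuxProcType.BGWRITER, "rel 16384, segment 0", local, queue)
    print(local, queue)  # the bgwriter's request reaches the queue
```

In the buggy coding the bgwriter behaved as if it owned a table, so its requests stayed in local memory and were never acted on; here they always go to the checkpointer's queue.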