  1. 22 Jul, 2012 1 commit
    • Improve copydir() code for the case that fsync is off. · 2d46a57d
      Tom Lane authored
      We should avoid calling sync_file_range or posix_fadvise in this case,
      since (a) we don't really care if the data gets synced, and might as
      well save the kernel calls; (b) at least on Linux we know that the
      kernel might block us until it's scheduled the write.
      
      Also, avoid making a useless second traversal of the directory tree
      if we're not actually going to call fsync(2) after all.
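The idea of skipping kernel hints when fsync is off can be modelled in a few lines of Python. This is only a sketch, not the C code in copydir(): the function name `copy_file` and its `enable_fsync` parameter are hypothetical, it assumes `os.posix_fadvise` with `POSIX_FADV_DONTNEED` as the write-back hint, and for brevity it syncs each file as it is written rather than in a later pass over the tree.

```python
import os


def copy_file(src, dst, enable_fsync, chunk_size=64 * 1024):
    """Copy src to dst and return how many write-back hints were issued.

    Hypothetical helper: when enable_fsync is False, no hint or fsync
    call is made at all, which saves the kernel calls entirely.
    """
    # posix_fadvise is only available on some Unix platforms.
    can_hint = enable_fsync and hasattr(os, "posix_fadvise")
    hints = 0
    offset = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(buf)
            if can_hint:
                # Push the data to the OS, then hint that write-back
                # of this byte range should start soon.
                fout.flush()
                os.posix_fadvise(fout.fileno(), offset, len(buf),
                                 os.POSIX_FADV_DONTNEED)
                hints += 1
            offset += len(buf)
        if enable_fsync:
            fout.flush()
            os.fsync(fout.fileno())
    return hints
```

With `enable_fsync=False` the loop reduces to plain reads and writes, which is the behaviour the commit message argues for.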
  2. 21 Jul, 2012 5 commits
    • Use --nosync during make check's initdb call. · 2c4f5b4b
      Tom Lane authored
      We left this out of commit b966dd6c
      so as to get some more buildfarm testing of the new fsync code in initdb.
      But since no problems have turned up, it's probably time to save the
      cycles.
    • Suppress volatile-related warning seen in some compilers. · 1f115d98
      Tom Lane authored
      Antique versions of gcc complain about vars that are initialized outside
      PG_TRY and then modified within it.  Rather than marking the var volatile,
      expend one more line of code.
    • Account for SRFs in targetlists in planner rowcount estimates. · 31c7c642
      Tom Lane authored
      We made use of the ROWS estimate for set-returning functions used in FROM,
      but not for those used in SELECT targetlists; which is a bit of an
      oversight considering there are common usages that require the latter
      approach.  Improve that.  (I had initially thought it might be worth
      folding this into cost_qual_eval, but after investigation concluded that
      that wouldn't be very helpful, so just do it separately.)  Per complaint
      from David Johnston.
      
      Back-patch to 9.2, but not further, for fear of destabilizing plan choices
      in existing releases.
    • Revert temporary patch to debug Windows breakage. · ed0af332
      Robert Haas authored
      This reverts commit 0a248208.
    • Repair plpgsql_validator breakage. · 0635c0b5
      Robert Haas authored
      Commit 3a0e4d36 arranged to
      reference stack-allocated variables after they were out of scope.
      That's no good, so let's arrange to not do that after all.
  3. 20 Jul, 2012 7 commits
    • Andrew Dunstan's avatar
    • Temporary patch to try to debug why event trigger patch broke Windows. · 0a248208
      Robert Haas authored
      Apologies for the ugliness.
    • Remove prepared transactions from main isolation test schedule. · ae55d9fb
      Andrew Dunstan authored
      There is no point in running this test when prepared transactions are disabled,
      which is the default. New make targets that include the test are provided. This
      will save some useless waste of cycles on buildfarm machines.
      
      Backpatch to 9.1 where these tests were introduced.
    • pg_dump: Simplify mkdir() error checking · 8ca03aa4
      Peter Eisentraut authored
      mkdir() can check for errors itself.  We don't need to code that
      ourselves again.
    • connoinherit may be true only for CHECK constraints · f5bcd398
      Alvaro Herrera authored
      The code was setting it true for other constraints, which is
      bogus.  Doing so caused bogus catalog entries for such constraints, and
      in particular caused an error to be raised when trying to drop a
      constraint of types other than CHECK from a table that has children,
      such as reported in bug #6712.
      
      In 9.2, additionally ignore connoinherit=true for other constraint
      types, to avoid having to force initdb; existing databases might already
      contain bogus catalog entries.
      
      Includes a catversion bump (in HEAD only).
      
      Bug report from Miroslav Šulc
      Analysis from Amit Kapila and Noah Misch; Amit also contributed the patch.
    • Fix whole-row Var evaluation to cope with resjunk columns (again). · 8e617e29
      Tom Lane authored
      When a whole-row Var is reading the result of a subquery, we need it to
      ignore any "resjunk" columns that the subquery might have evaluated for
      GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
      68e40998, but that fix only covered
      whole-row Vars of named composite types, not those of RECORD type; and it
      was mighty klugy anyway, since it just assumed without checking that any
      extra columns in the result must be resjunk.  A proper fix requires getting
      hold of the subquery's targetlist so we can actually see which columns are
      resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
      the bullet and add some infrastructure to make that possible.
      
      Per report from Andrew Dunstan and additional testing by Merlin Moncure.
      Back-patch to all supported branches.  In 8.3, also back-patch commit
      292176a1, which for some reason I had
      not done at the time, but it's a prerequisite for this change.
    • Make new event trigger facility actually do something. · 3a0e4d36
      Robert Haas authored
      Commit 3855968f added syntax, pg_dump,
      psql support, and documentation, but the triggers didn't actually fire.
      With this commit, they now do.  This is still a pretty basic facility
      overall because event triggers do not get a whole lot of information
      about what the user is trying to do unless you write them in C; and
      there's still no option to fire them anywhere except at the very
      beginning of the execution sequence, but it's better than nothing,
      and a good building block for future work.
      
      Along the way, add a regression test for ALTER LARGE OBJECT, since
      testing of event triggers reveals that we haven't got one.
      
      Dimitri Fontaine and Robert Haas
  4. 19 Jul, 2012 2 commits
    • Rethink checkpointer's fsync-request table representation. · be86e3dd
      Tom Lane authored
      Instead of having one hash table entry per relation/fork/segment, just have
      one per relation, and use bitmapsets to represent which specific segments
      need to be fsync'd.  This eliminates the need to scan the whole hash table
      to implement FORGET_RELATION_FSYNC, which fixes the O(N^2) behavior
      recently demonstrated by Jeff Janes for cases involving lots of TRUNCATE or
      DROP TABLE operations during a single checkpoint cycle.  Per an idea from
      Robert Haas.
      
      (FORGET_DATABASE_FSYNC still sucks, but since dropping a database is a
      pretty expensive operation anyway, we'll live with that.)
      
      In passing, improve the delayed-unlink code: remove the pass over the list
      in mdpreckpt, since it wasn't doing anything for us except supporting a
      useless Assert in mdpostckpt, and fix mdpostckpt so that it will absorb
      fsync requests every so often when clearing a large backlog of deletion
      requests.
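The data-structure change described in this commit can be illustrated with a small Python model. It is a sketch under assumptions, not the checkpointer's C code: the class `PendingOps` and its method names are hypothetical, an integer bitmask stands in for a bitmapset, and `fork=None` plays the role of "all forks".

```python
class PendingOps:
    """One table entry per relation; each fork maps to a bitmask of segments."""

    def __init__(self):
        self._table = {}  # relation -> {fork: int bitmask of segment numbers}

    def remember(self, rel, fork, segno):
        """Record that segment segno of (rel, fork) needs an fsync."""
        forks = self._table.setdefault(rel, {})
        forks[fork] = forks.get(fork, 0) | (1 << segno)

    def forget_relation(self, rel, fork=None):
        """Drop pending requests for rel; a single lookup, no full-table scan."""
        if fork is None:
            self._table.pop(rel, None)
            return
        forks = self._table.get(rel)
        if forks is not None:
            forks.pop(fork, None)
            if not forks:
                del self._table[rel]

    def pending_segments(self, rel, fork):
        """Return the sorted segment numbers still awaiting fsync."""
        mask = self._table.get(rel, {}).get(fork, 0)
        return [i for i in range(mask.bit_length()) if (mask >> i) & 1]
```

Because the table is keyed by relation alone, forgetting a relation no longer requires visiting every per-segment entry, which is where the quadratic behaviour with many TRUNCATE or DROP TABLE operations came from.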
    • Send only one FORGET_RELATION_FSYNC request when dropping a relation. · 3072b7ba
      Tom Lane authored
      We were sending one per fork, but a little bit of refactoring allows us
      to send just one request with forknum == InvalidForkNumber.  This not only
      reduces pressure on the shared-memory request queue, but saves repeated
      traversals of the checkpointer's hash table.
  5. 18 Jul, 2012 6 commits
    • Refactor the way code is shared between some range type functions. · a7a4add6
      Heikki Linnakangas authored
      Functions like range_eq, range_before etc. are exposed at the SQL-level, but
      they're also used internally by the GiST consistent support function. The
      code sharing was done by a hack, TrickFunctionCall2, which relied on the
      knowledge that all the functions used fn_extra the same way. This commit
      splits the functions into internal versions that take a TypeCacheEntry as
      argument, and thin wrappers to expose the functions at the SQL-level. The
      internal versions can then be called directly and in a less hacky way from
      the GiST consistent function.
      
      This is just cosmetic, but backpatch to 9.2 anyway, to avoid having a
      different version of this code in the 9.2 branch. That would make
      backpatching fixes in this area more difficult.
      
      Alexander Korotkov
    • Fix statistics breakage from bgwriter/checkpointer process split. · 80e373c3
      Tom Lane authored
      ForwardFsyncRequest() supposed that it could only be called in regular
      backends, which used to be true; but since the splitup of bgwriter and
      checkpointer, it is also called in the bgwriter.  We do not want to count
      such calls in pg_stat_bgwriter.buffers_backend statistics, so fix things
      so that they aren't.
      
      (It's worth noting here that this implies an alarmingly large increase in
      the expected amount of cross-process fsync request traffic, which may well
      mean that the process splitup was not such a hot idea.)
    • Fix management of pendingOpsTable in auxiliary processes. · 4a9c30a8
      Tom Lane authored
      mdinit() was misusing IsBootstrapProcessingMode() to decide whether to
      create an fsync pending-operations table in the current process.  This led
      to creating a table not only in the startup and checkpointer processes as
      intended, but also in the bgwriter process, not to mention other auxiliary
      processes such as walwriter and walreceiver.  Creation of the table in the
      bgwriter is fatal, because it absorbs fsync requests that should have gone
      to the checkpointer; instead they just sit in bgwriter local memory and are
      never acted on.  So writes performed by the bgwriter were not being fsync'd
      which could result in data loss after an OS crash.  I think there is no
      live bug with respect to walwriter and walreceiver because those never
      perform any writes of shared buffers; but the potential is there for
      future breakage in those processes too.
      
      To fix, make AuxiliaryProcessMain() export the current process's
      AuxProcType as a global variable, and then make mdinit() test directly for
      the types of aux process that should have a pendingOpsTable.  Having done
      that, we might as well also get rid of the random bool flags such as
      am_walreceiver that some of the aux processes had grown.  (Note that we
      could not have fixed the bug by examining those variables in mdinit(),
      because it's called from BaseInit() which is run by AuxiliaryProcessMain()
      before entering any of the process-type-specific code.)
      
      Back-patch to 9.2, where the problem was introduced by the split-up of
      bgwriter and checkpointer processes.  The bogus pendingOpsTable exists
      in walwriter and walreceiver processes in earlier branches, but absent
      any evidence that it causes actual problems there, I'll leave the older
      branches alone.
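The shape of the fix, testing the process type directly instead of inferring it from an unrelated mode flag, might look like this in a Python model. The enum members and the function name are hypothetical stand-ins chosen for illustration; only the rule that the startup and checkpointer processes are the ones meant to own a pending-operations table comes from the commit message.

```python
from enum import Enum, auto


class AuxProcType(Enum):
    """Hypothetical stand-in for the exported process-type variable."""
    NOT_AN_AUX_PROCESS = auto()
    STARTUP = auto()
    BGWRITER = auto()
    CHECKPOINTER = auto()
    WALWRITER = auto()
    WALRECEIVER = auto()


# Only these process types should absorb fsync requests locally.
_OWNS_PENDING_OPS = frozenset({AuxProcType.STARTUP, AuxProcType.CHECKPOINTER})


def needs_pending_ops_table(proc_type):
    """Decide from the process type itself whether to create the table."""
    return proc_type in _OWNS_PENDING_OPS
```

A bgwriter that wrongly created its own table would swallow requests meant for the checkpointer, so in this model the decision is an explicit allow-list rather than "any auxiliary process".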
    • Syntax support and documentation for event triggers. · 3855968f
      Robert Haas authored
      They don't actually do anything yet; that will get fixed in a
      follow-on commit.  But this gets the basic infrastructure in place,
      including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
      SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
      pg_dump and psql support; and documentation for the anticipated
      initial feature set.
      
      Dimitri Fontaine, with review and a bunch of additional hacking by me.
      Thom Brown extensively reviewed earlier versions of this patch set,
      but there's not a whole lot of that code left in this commit, as it
      turns out.
    • Get rid of useless global variable in pg_upgrade. · faf26bf1
      Tom Lane authored
      Since the scandir() emulation was taken out of pg_upgrade, there's
      no longer any need for scandir_file_pattern to exist as a global
      variable.  Replace it with a local in the one remaining function
      that was making use of it.
    • Improve pg_upgrade's load_directory() function. · 3d6ec663
      Tom Lane authored
      Error out on out-of-memory, rather than returning -1, which the sole
      existing caller wasn't checking for anyway.  There doesn't seem to be
      any use-case for making the caller check for failure here.
      
      Detect failure return from readdir().
      
      Use a less platform-dependent method of calculating the entrysize.
      It's possible, but not yet confirmed, that this explains bug #6733,
      in which Mike Wilson reports a pg_upgrade crash that did not occur
      in 9.1.  (Note that load_directory is effectively new code in 9.2,
      at least on platforms that have scandir().)
      
      Fix up comments, avoid uselessly using two counters, reduce the number
      of realloc calls to something sane.
  6. 17 Jul, 2012 6 commits
    • Improve coding around the fsync request queue. · 73b796a5
      Tom Lane authored
      In all branches back to 8.3, this patch fixes a questionable assumption in
      CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are
      no uninitialized pad bytes in the request queue structs.  This would only
      cause trouble if (a) there were such pad bytes, which could happen in 8.4
      and up if the compiler makes enum ForkNumber narrower than 32 bits, but
      otherwise would require not-currently-planned changes in the widths of
      other typedefs; and (b) the kernel has not uniformly initialized the
      contents of shared memory to zeroes.  Still, it seems a tad risky, and we
      can easily remove any risk by pre-zeroing the request array for ourselves.
      In addition to that, we need to establish a coding rule that struct
      RelFileNode can't contain any padding bytes, since such structs are copied
      into the request array verbatim.  (There are other places that are assuming
      this anyway, it turns out.)
      
      In 9.1 and up, the risk was a bit larger because we were also effectively
      assuming that struct RelFileNodeBackend contained no pad bytes, and with
      fields of different types in there, that would be much easier to break.
      However, there is no good reason to ever transmit fsync or delete requests
      for temp files to the bgwriter/checkpointer, so we can revert the request
      structs to plain RelFileNode, getting rid of the padding risk and saving
      some marginal number of bytes and cycles in fsync queue manipulation while
      we are at it.  The savings might be more than marginal during deletion of
      a temp relation, because the old code transmitted an entirely useless but
      nonetheless expensive-to-process ForgetRelationFsync request to the
      background process, and also had the background process perform the file
      deletion even though that can safely be done immediately.
      
      In addition, make some cleanup of nearby comments and small improvements to
      the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
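The pad-byte hazard can be demonstrated from Python with `ctypes`, which lays structures out using the platform's C alignment rules. This is an illustrative sketch only: the `Request` structure and its field names are hypothetical, chosen so that a one-byte field forces padding on typical ABIs, and `raw_request` is a made-up helper.

```python
import ctypes


class Request(ctypes.Structure):
    """Hypothetical request struct; the 1-byte field is usually followed by padding."""
    _fields_ = [
        ("rel", ctypes.c_uint32),
        ("fork", ctypes.c_uint8),
        ("segno", ctypes.c_uint32),
    ]


def raw_request(fill, rel, fork, segno):
    """Build a Request on top of memory pre-filled with `fill`; return its raw bytes."""
    buf = bytearray([fill]) * ctypes.sizeof(Request)
    req = Request.from_buffer(buf)
    # Assigning fields writes only the field bytes; padding keeps the fill value.
    req.rel, req.fork, req.segno = rel, fork, segno
    return bytes(buf)


def same_request(a, b):
    """Byte-wise comparison, in the spirit of comparing structs with memcmp()."""
    return a == b
```

On a platform where the struct contains padding, `raw_request(0xAA, 1, 2, 3)` and `raw_request(0x00, 1, 2, 3)` hold identical field values yet compare unequal byte-wise, whereas two requests built on pre-zeroed memory compare equal. That is the reason for pre-zeroing the request array.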
    • PL/Python: Remove PLy_result_ass_item · 71f2dd23
      Peter Eisentraut authored
      It is apparently no longer used after the new slicing support was
      implemented (a97207b6), so let's
      remove the dead code and see if anything cares.
    • Show step titles in the pg_upgrade man page · d6ce58c0
      Peter Eisentraut authored
      The upstream XSLT stylesheets missed that case.
      
      found by Álvaro Herrera
    • Remove recently added PL/Perl encoding tests · 65558995
      Alvaro Herrera authored
      These only pass cleanly on UTF8 and SQL_ASCII encodings, besides the
      Japanese encoding in which they were originally written, which is clearly
      not good enough.  Since the functionality they test has not ever been
      tested from PL/Perl, the best answer seems to be to remove the new tests
      completely.
      
      Per buildfarm results and ensuing discussion.
    • Put back storage/proc.h in postmaster.c. · 57b9bdda
      Tom Lane authored
      I took this out thinking it wasn't needed anymore, but the EXEC_BACKEND
      code still needs it.  Per buildfarm.
    • Introduce timeout handling framework · f34c68f0
      Alvaro Herrera authored
      Management of timeouts was getting a little cumbersome; what we
      originally had was more than enough back when we were only concerned
      about deadlocks and query cancel; however, when we added timeouts for
      standby processes, the code got considerably messier.  Since there are
      plans to add more complex timeouts, this seems a good time to introduce
      a central timeout handling module.
      
      External modules register their timeout handlers during process
      initialization, and later enable and disable them as they see fit using
      a simple API; timeout.c is in charge of keeping track of which timeouts
      are in effect at any time, installing a common SIGALRM signal handler,
      and calling setitimer() as appropriate to ensure timely firing of
      external handlers.
      
      timeout.c additionally supports pluggable modules to add their own
      timeouts, though this capability isn't exercised anywhere yet.
      
      Additionally, as of this commit, walsender processes are aware of
      timeouts; we had a preexisting bug there that made those ignore SIGALRM,
      thus being subject to unhandled deadlocks, particularly during the
      authentication phase.  This has already been fixed in back branches in
      commit 0bf8eb2a, which see for more details.
      
      Main author: Zoltán Böszörményi
      Some review and cleanup by Álvaro Herrera
      Extensive reworking by Tom Lane
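The register/enable/disable pattern with a single shared timer can be modelled as follows. This is a sketch of the general design, not the API of timeout.c: the class `TimeoutManager`, its method names, and the injected `arm_timer` callable are all hypothetical, and time is passed in explicitly so the model stays deterministic.

```python
class TimeoutManager:
    """Tracks active timeouts and keeps one timer armed for the nearest deadline."""

    def __init__(self, arm_timer):
        # arm_timer(seconds) programs the single shared timer; 0 cancels it.
        self._arm_timer = arm_timer
        self._handlers = {}   # name -> callable
        self._deadlines = {}  # name -> absolute deadline

    def register(self, name, handler):
        """Called once during initialization by the module owning the timeout."""
        self._handlers[name] = handler

    def enable_after(self, name, delay, now):
        if name not in self._handlers:
            raise KeyError(name)
        self._deadlines[name] = now + delay
        self._rearm(now)

    def disable(self, name, now):
        self._deadlines.pop(name, None)
        self._rearm(now)

    def on_alarm(self, now):
        """Common alarm entry point: fire every due handler, earliest first."""
        due = sorted((t, n) for n, t in self._deadlines.items() if t <= now)
        for _, name in due:
            # A handler that ran earlier may have disabled this one.
            if name in self._deadlines:
                del self._deadlines[name]
                self._handlers[name]()
        self._rearm(now)

    def _rearm(self, now):
        if self._deadlines:
            nearest = min(self._deadlines.values())
            self._arm_timer(max(nearest - now, 1e-6))
        else:
            self._arm_timer(0.0)
```

In a real Unix process, `arm_timer` could be `lambda s: signal.setitimer(signal.ITIMER_REAL, s)`, with `on_alarm(time.monotonic())` invoked from a SIGALRM handler. Callers then never touch the timer directly, which is the centralization the commit message describes.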
  7. 16 Jul, 2012 3 commits
    • Remove unreachable code · dd16f948
      Peter Eisentraut authored
      The Solaris Studio compiler warns about these instances, unlike more
      mainstream compilers such as gcc.  But manual inspection showed that
      the code is clearly not reachable, and we hope no worthy compiler will
      complain about removing this code.
    • Peter Eisentraut's avatar
      a76c857e
    • Avoid pre-determining index names during CREATE TABLE LIKE parsing. · c92be3c0
      Tom Lane authored
      Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE
      had to pre-assign names to indexes that had comments, because it made up an
      explicit CommentStmt command to apply the comment and so it had to know the
      name for the index.  This creates bad interactions with other indexes, as
      shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't
      take any other indexes into account so it could choose a conflicting name.
      
      To fix, add a field to IndexStmt that allows it to carry a comment to be
      assigned to the new index.  (This isn't a user-exposed feature of CREATE
      INDEX, only an internal option.)  Now we don't need preassignment of index
      names in any situation.
      
      I also took the opportunity to refactor DefineIndex to accept the IndexStmt
      as such, rather than passing all its fields individually in a mile-long
      parameter list.
      
      Back-patch to 9.2, but no further, because it seems too dangerous to change
      IndexStmt or DefineIndex's API in released branches.  The bug exists back
      to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given
      the lack of prior complaints we'll just let it go unfixed before 9.2.
  8. 15 Jul, 2012 1 commit
    • Prevent corner-case core dump in rfree(). · 54fd196f
      Tom Lane authored
      rfree() failed to cope with the case that pg_regcomp() had initialized the
      regex_t struct but then failed to allocate any memory for re->re_guts (ie,
      the first malloc call in pg_regcomp() failed).  It would try to touch the
      guts struct anyway, and thus dump core.  This is a sufficiently narrow
      corner case that it's not surprising it's never been seen in the field;
      but still a bug is a bug, so patch all active branches.
      
      Noted while investigating whether we need to call pg_regfree after a
      failure return from pg_regcomp.  Other than this bug, it turns out we
      don't, so adjust comments appropriately.
  9. 14 Jul, 2012 3 commits
  10. 13 Jul, 2012 2 commits
    • Add fsync capability to initdb, and use sync_file_range() if available. · b966dd6c
      Tom Lane authored
      Historically we have not worried about fsync'ing anything during initdb
      (in fact, initdb intentionally passes -F to each backend launch to prevent
      it from fsync'ing).  But with filesystems getting more aggressive about
      caching data, that's not such a good plan anymore.  Make initdb do a pass
      over the finished data directory tree to fsync everything.  For testing
      purposes, the -N/--nosync flag can be used to restore the old behavior.
      
      Also, testing shows that on Linux, sync_file_range() is much faster than
      posix_fadvise() for hinting to the kernel that an fsync is coming,
      apparently because the latter blocks on a rather small request queue while
      the former doesn't.  So use this function if available in initdb, and also
      in the backend's pg_flush_data() (where it currently will affect only the
      speed of CREATE DATABASE's cloning step).
      
      We will later make pg_regress invoke initdb with the --nosync flag
      to avoid slowing down cases such as "make check" in contrib.  But
      let's not do so until we've shaken out any portability issues in this
      patch.
      
      Jeff Davis, reviewed by Andres Freund
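A pass over a finished directory tree that fsyncs everything, with an opt-out flag, can be sketched in Python as below. This is a model of the idea rather than initdb's C implementation: the function names and the `nosync` parameter are hypothetical, it assumes a Unix-like system where a file opened read-only can be fsync'd, and it does not attempt the sync_file_range() pre-hinting step.

```python
import os


def fsync_path(path):
    """Open path read-only and flush it to stable storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_data_directory(root, nosync=False):
    """fsync every file and directory under root; return the number of files synced.

    With nosync=True nothing is touched, mirroring the testing flag that
    restores the old no-fsync behaviour.
    """
    if nosync:
        return 0
    files_synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fsync_path(os.path.join(dirpath, name))
            files_synced += 1
        try:
            # Syncing the directory itself persists its entries.
            fsync_path(dirpath)
        except OSError:
            # Some platforms cannot open or fsync a directory.
            pass
    return files_synced
```

Returning early when `nosync` is set avoids walking the tree at all, which is what makes the flag worthwhile for test runs that create many throwaway clusters.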
    • Cosmetic cleanup of ginInsertValue(). · 1a9405d2
      Tom Lane authored
      Make it clearer that the passed stack mustn't be empty, and that we
      are not supposed to fall off the end of the stack in the main loop.
      Tighten the loop that extracts the root block number, too.
      
      Markus Wanner and Tom Lane
  11. 12 Jul, 2012 4 commits