1. 24 Jun, 2012 7 commits
    • Replace int2/int4 in C code with int16/int32 · b8b2e3b2
      Peter Eisentraut authored
      The latter was already the dominant use, and it's preferable because
      in C the convention is that intXX means XX bits.  Therefore, allowing
      mixed use of int2, int4, int8, int16, int32 is obviously confusing.
      
      Remove the typedefs for int2 and int4 for now.  They don't seem to be
      widely used outside of the PostgreSQL source tree, and the few uses
      can probably be cleaned up by the time this ships.
    • Use UINT64CONST for 64-bit integer constants. · 0687a260
      Heikki Linnakangas authored
      Peter Eisentraut advised me that UINT64CONST is the proper way to do that,
      not LL suffix.
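A minimal sketch of why a constant macro is preferable to a hard-coded LL suffix. It relies on the standard `UINT64_C` macro from `<stdint.h>`; the names `MY_UINT64CONST` and `four_gigabytes` are hypothetical and are not taken from the PostgreSQL tree.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a UINT64CONST-style macro.  UINT64_C is the
 * standard C99 facility that appends whatever suffix the platform needs,
 * so callers never spell out L, LL, or ULL themselves.
 */
#define MY_UINT64CONST(x) UINT64_C(x)

/* Returns 2^32 as a 64-bit constant without a hand-written suffix. */
static uint64_t
four_gigabytes(void)
{
	return MY_UINT64CONST(0x100000000);
}
```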
    • Use LL suffix for 64-bit constants. · 96ff85e2
      Heikki Linnakangas authored
      Per warning from buildfarm member 'locust'. At least I think this is what's
      making it upset.
    • Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4
      Heikki Linnakangas authored
      This simplifies code that needs to do arithmetic on XLogRecPtrs.
      
      To avoid changing on-disk format of data pages, the LSN on data pages is
      still stored in the old format. That should keep pg_upgrade happy. However,
      we have XLogRecPtrs embedded in the control file, and in the structs that
      are sent over the replication protocol, so this change breaks compatibility
      of pg_basebackup and server. I didn't do anything about this in this patch;
      per discussion on -hackers, the right thing to do would be to change the
      replication protocol to be architecture-independent, so that you could use
      a newer version of pg_receivexlog, for example, against an older server
      version.
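A rough sketch of the simplification, not the committed code. The field names `xlogid` and `xrecoff` come from the commit messages in this log; the type names `OldRecPtr` and `FlatRecPtr` and all function names are hypothetical. It assumes the full 4 GB offset range of each logical log file is usable.

```c
#include <stdint.h>

/* Old-style pointer: two 32-bit halves. */
typedef struct
{
	uint32_t	xlogid;		/* high half: logical log file number */
	uint32_t	xrecoff;	/* low half: byte offset within that file */
} OldRecPtr;				/* hypothetical name for this sketch */

/* New-style pointer: one flat 64-bit byte position. */
typedef uint64_t FlatRecPtr;	/* hypothetical name for this sketch */

/* Combine the two halves into a single 64-bit position. */
static FlatRecPtr
flat_from_old(OldRecPtr p)
{
	return ((uint64_t) p.xlogid << 32) | p.xrecoff;
}

/* Split a 64-bit position back into the two-field form. */
static OldRecPtr
old_from_flat(FlatRecPtr p)
{
	OldRecPtr	r;

	r.xlogid = (uint32_t) (p >> 32);
	r.xrecoff = (uint32_t) p;
	return r;
}

/* With a flat integer, the distance between two positions is a subtraction. */
static uint64_t
bytes_between(FlatRecPtr from, FlatRecPtr to)
{
	return to - from;
}
```

With the struct form, the same distance calculation needs carry handling between the two fields, which is the kind of arithmetic the commit message refers to.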
    • Allow WAL record header to be split across pages. · 061e7efb
      Heikki Linnakangas authored
      This saves a few bytes of WAL space, but the real motivation is to make it
      predictable how much WAL space a record requires, as it no longer depends
      on whether we need to waste the last few bytes at end of WAL page because
      the header doesn't fit.
      
      The total length field of WAL record, xl_tot_len, is moved to the beginning
      of the WAL record header, so that it is still always found on the first page
      where a WAL record begins.
      
      Bump WAL version number again as this is an incompatible change.
    • Move WAL continuation record information to WAL page header. · 20ba5ca6
      Heikki Linnakangas authored
      The continuation record only contained one field, xl_rem_len, so it makes
      things simpler to just include it in the WAL page header. This wastes four
      bytes on pages that don't begin with a continuation from the previous page, plus
      four bytes on every page, because of padding.
      
      The motivation of this is to make it easier to calculate how much space a
      WAL record needs. Before this patch, it depended on how many page boundaries
      the record crosses. The motivation of that, in turn, is to separate the
      allocation of space in the WAL from the copying of the record data to the
      allocated space. Keeping the calculation of space required simple helps to
      keep the critical section of allocating the space from WAL short. But that's
      not included in this patch yet.
      
      Bump WAL version number again, as this is an incompatible change.
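To illustrate why a fixed-size page header makes the space calculation simple, here is a sketch under stated assumptions. The page size, header size, and all names are hypothetical, and the sketch ignores any extra header space a real implementation might reserve at segment boundaries.

```c
#include <stdint.h>

#define WAL_PAGE_SIZE	8192u	/* hypothetical WAL page size */
#define WAL_PAGE_HDR	24u		/* hypothetical fixed page header size */
#define WAL_USABLE		(WAL_PAGE_SIZE - WAL_PAGE_HDR)

/*
 * Map a count of payload bytes written since the start of the log to a
 * physical byte position, skipping one fixed-size header per page.
 */
static uint64_t
payload_to_physical(uint64_t payload_pos)
{
	uint64_t	full_pages = payload_pos / WAL_USABLE;
	uint64_t	offset = payload_pos % WAL_USABLE;

	return full_pages * WAL_PAGE_SIZE + WAL_PAGE_HDR + offset;
}

/*
 * Physical distance from the start of a record to the position where the
 * next record would begin.  It depends only on the start position and the
 * record length, with no special cases for continuation records.
 */
static uint64_t
physical_span(uint64_t payload_pos, uint64_t rec_len)
{
	return payload_to_physical(payload_pos + rec_len)
		- payload_to_physical(payload_pos);
}
```

Because the mapping is pure arithmetic, space could in principle be reserved by advancing a single payload counter, and the copying of record data could happen afterwards.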
    • Don't waste the last segment of each 4GB logical log file. · dfda6eba
      Heikki Linnakangas authored
      The comments claimed that wasting the last segment made it easier to do
      calculations with XLogRecPtrs, because you don't have problems representing
      last-byte-position-plus-1 that way. In my experience, however, it only made
      things more complicated, because there were two ways to represent the
      boundary at the beginning of a logical log file: xlogid = n+1 and xrecoff = 0,
      or as xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were
      picky about which representation was used.
      
      Also, use a 64-bit segment number instead of the log/seg combination, to
      point to a certain WAL segment. We assume that all platforms have a working
      64-bit integer type nowadays.
      
      This is an incompatible change in WAL format, so bumping WAL version number.
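A small sketch of addressing segments with a single 64-bit number. The 16 MB segment size is an assumption, and the macro and function names are hypothetical. With no segment wasted, each 4 GB logical log file holds exactly 4 GB / segment size segments, so the boundary has only one representation.

```c
#include <stdint.h>
#include <stdio.h>

#define SEG_SIZE		(16u * 1024u * 1024u)	/* assumed 16 MB segments */
#define SEGS_PER_LOGID	(UINT64_C(0x100000000) / SEG_SIZE)	/* 256 here */

/* Segment number containing a given 64-bit byte position. */
static uint64_t
segno_from_pos(uint64_t pos)
{
	return pos / SEG_SIZE;
}

/*
 * Render a segment number as the older log/seg pair, e.g. for display.
 * The 16-hex-digit layout is an illustration only.
 */
static void
format_segment(char *buf, size_t len, uint64_t segno)
{
	snprintf(buf, len, "%08X%08X",
			 (unsigned int) (segno / SEGS_PER_LOGID),
			 (unsigned int) (segno % SEGS_PER_LOGID));
}
```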
  2. 21 Jun, 2012 2 commits
    • Fix memory leak in ARRAY(SELECT ...) subqueries. · d14241c2
      Tom Lane authored
      Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which
      I think can only happen if the sub-select is embedded in a larger,
      correlated subquery) would leak memory for the duration of the query,
      due to not reclaiming the array generated in the previous execution.
      Per bug #6698 from Armando Miraglia.  Diagnosis and fix idea by Heikki,
      patch itself by me.
      
      This has been like this all along, so back-patch to all supported versions.
    • Add a small cache of locks owned by a resource owner in ResourceOwner. · eeb6f37d
      Heikki Linnakangas authored
      This speeds up reassigning locks to the parent owner, when the transaction
      holds a lot of locks, but only a few of them belong to the current resource
      owner. This particularly helps pg_dump when dumping a large number of
      objects.
      
      The cache can hold up to 15 locks in each resource owner. After that, the
      cache is marked as overflowed, and we fall back to the old method of
      scanning the whole local lock table. The tradeoff here is that the cache has
      to be scanned whenever a lock is released, so if the cache is too large,
      lock release becomes more expensive. 15 seems enough to cover pg_dump, and
      doesn't have much impact on lock release.
      
      Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.
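The cache-with-overflow idea can be sketched as follows. Only the limit of 15 comes from the commit message; the struct layout and function names are hypothetical, and locks are modeled as opaque pointers.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CACHED_LOCKS 15		/* cache size quoted in the commit message */

typedef struct
{
	int			nlocks;		/* entries used; MAX_CACHED_LOCKS + 1 = overflowed */
	const void *locks[MAX_CACHED_LOCKS];
} OwnerLockCache;			/* hypothetical name for this sketch */

static bool
cache_overflowed(const OwnerLockCache *c)
{
	return c->nlocks > MAX_CACHED_LOCKS;
}

/* Remember a lock; the 16th insertion marks the cache as overflowed. */
static void
cache_remember(OwnerLockCache *c, const void *lock)
{
	if (cache_overflowed(c))
		return;
	if (c->nlocks < MAX_CACHED_LOCKS)
		c->locks[c->nlocks] = lock;
	c->nlocks++;
}

/*
 * Forget a lock on release.  Returns false when the cache cannot answer
 * (overflowed, or lock not present), in which case the caller would fall
 * back to scanning the whole lock table.
 */
static bool
cache_forget(OwnerLockCache *c, const void *lock)
{
	if (cache_overflowed(c))
		return false;
	for (int i = c->nlocks - 1; i >= 0; i--)
	{
		if (c->locks[i] == lock)
		{
			/* fill the hole with the last entry */
			c->locks[i] = c->locks[c->nlocks - 1];
			c->nlocks--;
			return true;
		}
	}
	return false;
}
```

The linear scan in `cache_forget` is the cost the commit message mentions: it runs on every release, which is why the cache is kept small.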
  3. 20 Jun, 2012 1 commit
    • Improve tests for whether we can skip queueing RI enforcement triggers. · cfa0f425
      Tom Lane authored
      During an update of a PK row, we can skip firing the RI trigger if any old
      key value is NULL, because then the row could not have had any matching
      rows in the FK table.  Conversely, during an update of an FK row, the
      outcome is determined if any new key value is NULL.  In either case it
      becomes unnecessary to compare individual key values.
      
      This patch was inspired by discussion of Vik Reykja's patch to use IS NOT
      DISTINCT semantics for the key comparisons.  In the event there is no need
      for that and so this patch looks nothing like his, but he should still get
      credit for having re-opened consideration of the trigger skip logic.
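The NULL short-circuit described above can be sketched like this. The `KeyDatum` type and the function names are hypothetical; the sketch covers only the NULL test, not the key comparison that would follow when no value is NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* One key column value; is_null models SQL NULL.  Hypothetical type. */
typedef struct
{
	bool		is_null;
	int			value;
} KeyDatum;

static bool
any_key_null(const KeyDatum *key, size_t nkeys)
{
	for (size_t i = 0; i < nkeys; i++)
	{
		if (key[i].is_null)
			return true;
	}
	return false;
}

/*
 * PK-side update: an old key containing a NULL could not have had any
 * matching rows in the FK table, so there is nothing to check.
 */
static bool
pk_update_can_skip(const KeyDatum *old_key, size_t nkeys)
{
	return any_key_null(old_key, nkeys);
}

/*
 * FK-side update: if any new key value is NULL the outcome is already
 * determined, so individual key values need not be compared.
 */
static bool
fk_update_outcome_known(const KeyDatum *new_key, size_t nkeys)
{
	return any_key_null(new_key, nkeys);
}
```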
  4. 18 Jun, 2012 1 commit
    • Refer to the default foreign key match style as MATCH SIMPLE internally. · f5297bdf
      Tom Lane authored
      Previously we followed the SQL92 wording, "MATCH <unspecified>", but since
      SQL99 there's been a less awkward way to refer to the default style.
      
      In addition to the code changes, pg_constraint.confmatchtype now stores
      this match style as 's' (SIMPLE) rather than 'u' (UNSPECIFIED).  This
      doesn't affect pg_dump or psql because they use pg_get_constraintdef()
      to reconstruct foreign key definitions.  But other client-side code might
      examine that column directly, so this change will have to be marked as
      an incompatibility in the 9.3 release notes.
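Client-side code that reads pg_constraint.confmatchtype directly could tolerate both encodings. A hedged sketch: the function name is hypothetical, the 's' and 'u' codes are the ones named in the commit message, and 'f' and 'p' are the existing codes for MATCH FULL and MATCH PARTIAL.

```c
#include <stddef.h>

/*
 * Map a confmatchtype code to a display string, accepting both the old
 * and the new code for the default match style.  Returns NULL for an
 * unrecognized code.
 */
static const char *
match_type_name(char confmatchtype)
{
	switch (confmatchtype)
	{
		case 'f':
			return "MATCH FULL";
		case 'p':
			return "MATCH PARTIAL";
		case 's':				/* servers with this change */
		case 'u':				/* older servers */
			return "MATCH SIMPLE";
		default:
			return NULL;
	}
}
```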
  5. 17 Jun, 2012 1 commit
    • Fix stats collector to recover nicely when system clock goes backwards. · 9e18eacb
      Tom Lane authored
      Formerly, if the system clock went backwards, the stats collector would
      fail to update the stats file any more until the clock reading again
      exceeded whatever timestamp was last written into the stats file.  Such
      glitches in the clock's behavior are not terribly unlikely on machines
      not using NTP.  Such a scenario has been observed to cause regression test
      failures in the buildfarm, and it could have bad effects on the behavior
      of autovacuum, so it seems prudent to install some defenses.
      
      We could directly detect the clock going backwards by adding
      GetCurrentTimestamp calls in the stats collector's main loop, but that
      would hurt performance on platforms where GetCurrentTimestamp is expensive.
      To minimize the performance hit in normal cases, adopt a more complicated
      scheme wherein backends check for clock skew when reading the stats file,
      and if they see it, signal the stats collector by sending an extra stats
      inquiry message.  The stats collector does an extra GetCurrentTimestamp
      only when it receives an inquiry with an apparently out-of-order
      timestamp.
      
      To avoid unnecessary GetCurrentTimestamp calls, expand the inquiry messages
      to carry the backend's current clock reading as well as its stats cutoff
      time.  The latter, being intentionally slightly in-the-past, would trigger
      more clock rechecks than we need if it were used for this purpose.
      
      We might want to backpatch this change at some point, but let's let it
      shake out in the buildfarm for a while first.
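The division of labor between backends and the collector might look roughly like the sketch below. All type and function names are hypothetical, timestamps are plain integers, and the real message layout is not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t Timestamp;		/* hypothetical alias; units do not matter here */

/* Inquiry message carrying both values described in the commit message. */
typedef struct
{
	Timestamp	clock_time;		/* sender's current clock reading */
	Timestamp	cutoff_time;	/* oldest stats file timestamp it accepts */
} InquiryMsg;

/* Backend side: a stats file stamped in the future suggests clock skew. */
static bool
backend_sees_skew(Timestamp file_stamp, Timestamp backend_now)
{
	return file_stamp > backend_now;
}

/*
 * Collector side, cheap test: compare the sender's clock reading (not the
 * cutoff, which is deliberately in the past) with the last file write.
 */
static bool
inquiry_looks_out_of_order(const InquiryMsg *msg, Timestamp last_write)
{
	return msg->clock_time < last_write;
}

/* Collector side, expensive test: run only after the cheap test fires. */
static bool
clock_went_backwards(Timestamp collector_now, Timestamp last_write)
{
	return collector_now < last_write;
}
```

In this arrangement the collector reads its own clock only when `inquiry_looks_out_of_order()` returns true, so the normal path pays nothing extra.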
  6. 15 Jun, 2012 1 commit
    • Improve reporting of permission errors for array types · 15b1918e
      Peter Eisentraut authored
      Because permissions are assigned to element types, not array types,
      complaining about permission denied on an array type would be
      misleading to users.  So adjust the reporting to refer to the element
      type instead.
      
      In order not to duplicate the required logic in two dozen places,
      refactor the permission denied reporting for types a bit.
      
      pointed out by Yeb Havinga during the review of the type privilege
      feature
  7. 14 Jun, 2012 5 commits
    • New SQL functions pg_backup_in_progress() and pg_backup_start_time() · 68de499b
      Robert Haas authored
      Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
      Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
    • Add new function log_newpage_buffer. · 6cd015be
      Robert Haas authored
      When I implemented the ginbuildempty() function as part of
      implementing unlogged tables, I falsified the note in the header
      comment for log_newpage.  Although we could fix that up by changing
      the comment, it seems cleaner to add a new function which is
      specifically intended to handle this case.  So do that.
    • Remove misplaced sanity check from heap_create(). · a475c603
      Robert Haas authored
      Even when allow_system_table_mods is not set, we allow creation of any
      type of SQL object in pg_catalog, except for relations.  And you can
      get relations into pg_catalog, too, by initially creating them in some
      other schema and then moving them with ALTER .. SET SCHEMA.  So this
      restriction, which prevents relations (only) from being created in
      pg_catalog directly, is fairly pointless.  If we need a safety mechanism
      for this, it should be placed further upstream, so that it affects all
      SQL objects uniformly, and picks up both CREATE and SET SCHEMA.
      
      For now, just rip it out, per discussion with Tom Lane.
    • Remove RELKIND_UNCATALOGED. · d2c86a1c
      Robert Haas authored
      This may have been important at some point in the past, but it no
      longer does anything useful.
      
      Review by Tom Lane.
    • Stamp HEAD as 9.3devel. · bed88fce
      Tom Lane authored
      Let the hacking begin ...
  8. 10 Jun, 2012 2 commits
  9. 07 Jun, 2012 1 commit
  10. 01 Jun, 2012 1 commit
  11. 31 May, 2012 2 commits
    • Stamp 9.2beta2. · 4bec93ac
      Tom Lane authored
    • Force PL and range-type support functions to be owned by a superuser. · ad0009e7
      Tom Lane authored
      We allow non-superusers to create procedural languages (with restrictions)
      and range datatypes.  Previously, the automatically-created support
      functions for these objects ended up owned by the creating user.  This
      represents a rather considerable security hazard, because the owning user
      might be able to alter a support function's definition in such a way as to
      crash the server, inject trojan-horse SQL code, or even execute arbitrary
      C code directly.  It appears that right now the only actually exploitable
      problem is the infinite-recursion bug fixed in the previous patch for
      CVE-2012-2655.  However, it's not hard to imagine that future additions of
      more ALTER FUNCTION capability might unintentionally open up new hazards.
      To forestall future problems, cause these support functions to be owned by
      the bootstrap superuser, not the user creating the parent object.
  12. 30 May, 2012 2 commits
    • Expand the allowed range of timezone offsets to +/-15:59:59 from Greenwich. · cd0ff9c0
      Tom Lane authored
      We used to only allow offsets less than +/-13 hours, then it was +/-14,
      then it was +/-15.  That's still not good enough though, as per today's bug
      report from Patric Bechtel.  This time I actually looked through the Olson
      timezone database to find the largest offsets used anywhere.  The winners
      are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at
      +15:13:42 until 1867.  So we'd better allow offsets less than +/-16 hours.
      
      Given the history, we are way overdue to have some greppable #define
      symbols controlling this, so make some ... and also remove an obsolete
      comment that didn't get fixed the last time.
      
      Back-patch to all supported branches.
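A sketch of what greppable limit symbols could look like. The macro and function names below are hypothetical and are not claimed to match the ones the commit introduced; only the "less than 16 hours" rule is taken from the message.

```c
#include <stdbool.h>

#define SKETCH_SECS_PER_HOUR	3600
#define SKETCH_MAX_TZ_HOUR		15	/* largest whole-hour part allowed */
#define SKETCH_TZ_LIMIT			((SKETCH_MAX_TZ_HOUR + 1) * SKETCH_SECS_PER_HOUR)

/* Convert an hours:minutes:seconds magnitude to seconds. */
static int
hms_to_secs(int hours, int minutes, int seconds)
{
	return hours * SKETCH_SECS_PER_HOUR + minutes * 60 + seconds;
}

/* Accept signed offsets strictly inside +/-16 hours. */
static bool
tz_offset_in_range(int offset_secs)
{
	return offset_secs > -SKETCH_TZ_LIMIT && offset_secs < SKETCH_TZ_LIMIT;
}
```

Raising the limit again would then mean changing one constant instead of hunting for literal hour counts.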
    • Change the way parent pages are tracked during buffered GiST build. · d1996ed5
      Heikki Linnakangas authored
      We used to mimic the way a stack is constructed when descending the tree
      during normal GiST inserts, but that was quite complicated during a buffered
      build. It was also wrong: in GiST, the left-to-right relationships on
      different levels might not match each other, so that when you know the
      parent of a child page, you won't necessarily find the parent of the page to
      the right of the child page by following the rightlinks at the parent level.
      This sometimes led to "could not re-find parent" errors while building a
      GiST index.
      
      We now use a simple hash table to track the parent of every internal page.
      Whenever a page is split, and downlinks are moved from one page to another,
      we update the hash table accordingly. This is also better for performance
      than the old method, as we never need to move right to re-find the parent
      page, which could take a significant amount of time for buffers that were
      created much earlier in the index build.
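The parent-tracking table can be illustrated with a tiny fixed-capacity map from child block number to parent block number. This is a sketch only: it uses plain open addressing rather than PostgreSQL's own hash table facility, all names are hypothetical, and it assumes fewer than `MAP_SLOTS` entries and that `NO_BLOCK` is never used as a key.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SLOTS	1024u			/* fixed capacity, power of two */
#define NO_BLOCK	UINT32_MAX		/* empty slot marker and "not found" */

typedef struct
{
	uint32_t	child[MAP_SLOTS];
	uint32_t	parent[MAP_SLOTS];
} ParentMap;

static void
parent_map_init(ParentMap *m)
{
	memset(m, 0xFF, sizeof(*m));	/* every entry becomes NO_BLOCK */
}

/* Find the slot holding child, or the empty slot where it would go. */
static uint32_t
parent_map_slot(const ParentMap *m, uint32_t child)
{
	uint32_t	i = (child * 2654435761u) & (MAP_SLOTS - 1);

	while (m->child[i] != NO_BLOCK && m->child[i] != child)
		i = (i + 1) & (MAP_SLOTS - 1);
	return i;
}

/* Record or update the parent of a page, e.g. after a split moves downlinks. */
static void
parent_map_set(ParentMap *m, uint32_t child, uint32_t parent)
{
	uint32_t	i = parent_map_slot(m, child);

	m->child[i] = child;
	m->parent[i] = parent;
}

/* Look up a page's parent directly, with no walking of rightlinks. */
static uint32_t
parent_map_get(const ParentMap *m, uint32_t child)
{
	uint32_t	i = parent_map_slot(m, child);

	return m->child[i] == child ? m->parent[i] : NO_BLOCK;
}
```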
  13. 18 May, 2012 1 commit
    • Fix bug in gistRelocateBuildBuffersOnSplit(). · 1d27dcf5
      Heikki Linnakangas authored
      When we create a temporary copy of the old node buffer, in stack, we mustn't
      leak that into any of the long-lived data structures. Before this patch,
      when we called gistPopItupFromNodeBuffer(), it got added to the array of
      "loaded buffers". After gistRelocateBuildBuffersOnSplit() exits, the
      pointer added to the loaded buffers array points to garbage. Often that goes
      unnoticed, because when we go through the array of loaded buffers to unload
      them, buffers with a NULL pageBuffer are ignored, which can often happen by
      accident even if the pointer points to garbage.
      
      This patch fixes that by marking the temporary copy in stack explicitly as
      temporary, and refraining from adding buffers marked as temporary to the array
      of loaded buffers.
      
      While we're at it, initialize nodeBuffer->pageBlocknum to InvalidBlockNumber
      and improve comments a bit. This isn't strictly necessary, but makes
      debugging easier.
  14. 16 May, 2012 1 commit
  15. 15 May, 2012 1 commit
    • Put back AC_REQUIRE([AC_STRUCT_TM]). · f667747b
      Tom Lane authored
      The BSD-ish members of the buildfarm all seem to think removing this
      was a bad idea.  It looks to me like it resulted in omitting the system
      header inclusion necessary to detect the fields of struct tm correctly.
  16. 14 May, 2012 3 commits
  17. 11 May, 2012 2 commits
  18. 10 May, 2012 5 commits
    • Bruce Momjian's avatar
      ee24de40
    • Update comment for 'name' data type to say 63 "bytes". · d2fe836c
      Bruce Momjian authored
      Catalog version bump so everyone has the same comment for beta1.
    • Stamp 9.2beta1. · f70fa835
      Tom Lane authored
    • Fix outdated comment. · 60a3dffb
      Heikki Linnakangas authored
      Multi-insert records observe XLOG_HEAP_INIT_PAGE flag too, as Andres Freund
      pointed out.
    • Improve control logic for bgwriter hibernation mode. · 6308ba05
      Tom Lane authored
      Commit 6d90eaaa added a hibernation mode
      to the bgwriter to reduce the server's idle-power consumption.  However,
      its interaction with the detailed behavior of BgBufferSync's feedback
      control loop wasn't very well thought out.  That control loop depends
      primarily on the rate of buffer allocation, not the rate of buffer
      dirtying, so the hibernation mode has to be designed to operate only when
      no new buffer allocations are happening.  Also, the check for whether the
      system is effectively idle was not quite right and would fail to detect
      a constant low level of activity, thus allowing the bgwriter to go into
      hibernation mode in a way that would let the cycle time vary quite a bit,
      possibly further confusing the feedback loop.  To fix, move the wakeup
      support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
      StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
      unless no buffer allocations have happened recently.
      
      In addition, fix the delaying logic to remove the problem of possibly not
      responding to signals promptly, which was basically caused by trying to use
      the process latch's is_set flag for multiple purposes.  I can't prove it
      but I'm suspicious that that hack was responsible for the intermittent
      "postmaster does not shut down" failures we've been seeing in the buildfarm
      lately.  In any case it did nothing to improve the readability or
      robustness of the code.
      
      In passing, express the hibernation sleep time as a multiplier on
      BgWriterDelay, not a constant.  I'm not sure whether there's any value in
      exposing the longer sleep time as an independently configurable setting,
      but we can at least make it act like this for little extra code.
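The two points about hibernation (gate it on buffer allocations, and express the long sleep as a multiple of the normal delay) can be sketched as below. The multiplier value and all names are hypothetical, and the sketch leaves out latch and signal handling entirely.

```c
#include <stdbool.h>

#define SKETCH_HIBERNATE_FACTOR 50	/* hypothetical multiplier */

/*
 * Choose the next sleep time in milliseconds.  The long sleep is taken
 * only when the last cycle wrote nothing and no buffers were allocated
 * recently; otherwise the normal delay applies.
 */
static int
next_sleep_ms(int bgwriter_delay_ms, bool wrote_nothing,
			  unsigned int recent_allocs)
{
	if (wrote_nothing && recent_allocs == 0)
		return bgwriter_delay_ms * SKETCH_HIBERNATE_FACTOR;
	return bgwriter_delay_ms;
}
```

Tying the long sleep to the normal delay means a single setting scales both, which matches the "multiplier on BgWriterDelay" wording above.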
  19. 09 May, 2012 1 commit