1. 15 Jan, 2014 3 commits
    • Robert Haas's avatar
      Fix compiler warning. · d89746c7
      Robert Haas authored
      Kevin Gritter reports that his compiler complains about inq and outq
      being possibly-uninitialized at the point where they are passed to
      shm_mq_attach().  They are initialized by the call to
      setup_dynamic_shared_memory, but apparently his compiler is inlining
      that function and then having doubts about whether the for loop will
      always execute at least once.  Fix by initializing them to NULL.
      d89746c7
    • Robert Haas's avatar
      Fix compiler warning: Size isn't 64 bits on 32 bit platforms. · be361ef2
      Robert Haas authored
      Report by Peter Eisentraut.
      be361ef2
    • Tom Lane's avatar
      Improve FILES section of psql reference page. · 5df99f64
      Tom Lane authored
      Primarily, explain where to find the system-wide psqlrc file, per recent
      gripe from John Sutton.  Do some general wordsmithing and improve the
      markup, too.
      
      Also adjust psqlrc.sample so its comments about file location are somewhat
      trustworthy.  (Not sure why we bother with this file when it's empty,
      but whatever.)
      
      Back-patch to 9.2 where the startup file naming scheme was last changed.
      5df99f64
  2. 14 Jan, 2014 7 commits
    • Tom Lane's avatar
      Fix multiple bugs in index page locking during hot-standby WAL replay. · 061b079f
      Tom Lane authored
      In ordinary operation, VACUUM must be careful to take a cleanup lock on
      each leaf page of a btree index; this ensures that no indexscans could
      still be "in flight" to heap tuples due to be deleted.  (Because of
      possible index-tuple motion due to concurrent page splits, it's not enough
      to lock only the pages we're deleting index tuples from.)  In Hot Standby,
      the WAL replay process must likewise lock every leaf page.  There were
      several bugs in the code for that:
      
      * The replay scan might come across unused, all-zero pages in the index.
      While btree_xlog_vacuum itself did the right thing (ie, nothing) with
      such pages, xlogutils.c supposed that such pages must be corrupt and
      would throw an error.  This accounts for various reports of replication
      failures with "PANIC: WAL contains references to invalid pages".  To
      fix, add a ReadBufferMode value that instructs XLogReadBufferExtended
      not to complain when we're doing this.
      
      * btree_xlog_vacuum performed the extra locking if standbyState ==
      STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up
      for hot standby queries until the database has reached consistency, and
      we don't want to do the extra locking till then either, for fear of reading
      corrupted pages (which bufmgr.c would complain about).  Fix by exporting a
      new function from xlog.c that will report whether we're actually in hot
      standby replay mode.
      
      * To ensure full coverage of the index in the replay scan, btvacuumscan
      would emit a dummy WAL record for the last page of the index, if no
      vacuuming work had been done on that page.  However, if the last page
      of the index is all-zero, that would result in corruption of said page,
      since the functions called on it weren't prepared to handle that case.
      There's no need to lock any such pages, so change the logic to target
      the last normal leaf page instead.
      
      The first two of these bugs were diagnosed by Andres Freund, the other one
      by me.  Fixes based on ideas from Heikki Linnakangas and myself.
      
      This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
      061b079f
    • Robert Haas's avatar
      Documentation for test_shm_mq. · 16cad3e8
      Robert Haas authored
      Commit 4db3744f added this contrib
      module but neglected to document it.  Oops.
      16cad3e8
    • Robert Haas's avatar
      Mention that VACUUM FREEZE also effectively zeroes the table freeze age. · b6827094
      Robert Haas authored
      Maciek Sakrejda, reviewed by Amit Kapila
      b6827094
    • Robert Haas's avatar
      Fix typo in comment. · 246a9a8d
      Robert Haas authored
      Etsuro Fujita
      246a9a8d
    • Robert Haas's avatar
      Test code for shared memory message queue facility. · 4db3744f
      Robert Haas authored
      This code is intended as a demonstration of how the dynamic shared
      memory and dynamic background worker facilities can be used to establish
      a group of coooperating processes which can coordinate their activities
      using the shared memory message queue facility.  By itself, the code
      does nothing particularly interesting: it simply allows messages to
      be passed through a loop of workers and back to the original process.
      But it's a useful unit test, in addition to its demonstration value.
      4db3744f
    • Robert Haas's avatar
      Single-reader, single-writer, lightweight shared message queue. · ec9037df
      Robert Haas authored
      This code provides infrastructure for user backends to communicate
      relatively easily with background workers.  The message queue is
      structured as a ring buffer and allows messages of arbitary length
      to be sent and received.
      
      Patch by me.  Review by KaiGai Kohei and Andres Freund.
      ec9037df
    • Robert Haas's avatar
      Simple table of contents for a shared memory segment. · 6ddd5137
      Robert Haas authored
      This interface is intended to make it simple to divide a dynamic shared
      memory segment into different regions with distinct purposes.  It
      therefore serves much the same purpose that ShmemIndex accomplishes for
      the main shared memory segment, but it is intended to be more
      lightweight.
      
      Patch by me.  Review by Andres Freund.
      6ddd5137
  3. 13 Jan, 2014 7 commits
  4. 12 Jan, 2014 1 commit
    • Tom Lane's avatar
      Disallow LATERAL references to the target table of an UPDATE/DELETE. · 158b7fa6
      Tom Lane authored
      On second thought, commit 0c051c90 was
      over-hasty: rather than allowing this case, we ought to reject it for now.
      That leaves the field clear for a future feature that allows the target
      table to be re-specified in the FROM (or USING) clause, which will enable
      left-joining the target table to something else.  We can then also allow
      LATERAL references to such an explicitly re-specified target table.
      But allowing them right now will create ambiguities or worse for such a
      feature, and it isn't something we documented 9.3 as supporting.
      
      While at it, add a convenience subroutine to avoid having several copies
      of the ereport for disalllowed-LATERAL-reference cases.
      158b7fa6
  5. 11 Jan, 2014 7 commits
    • Tom Lane's avatar
      Fix possible crashes due to using elog/ereport too early in startup. · 910bac59
      Tom Lane authored
      Per reports from Andres Freund and Luke Campbell, a server failure during
      set_pglocale_pgservice results in a segfault rather than a useful error
      message, because the infrastructure needed to use ereport hasn't been
      initialized; specifically, MemoryContextInit hasn't been called.
      One known cause of this is starting the server in a directory it
      doesn't have permission to read.
      
      We could try to prevent set_pglocale_pgservice from using anything that
      depends on palloc or elog, but that would be messy, and the odds of future
      breakage seem high.  Moreover there are other things being called in main.c
      that look likely to use palloc or elog too --- perhaps those things
      shouldn't be there, but they are there today.  The best solution seems to
      be to move the call of MemoryContextInit to very early in the backend's
      real main() function.  I've verified that an elog or ereport occurring
      immediately after that is now capable of sending something useful to
      stderr.
      
      I also added code to elog.c to print something intelligible rather than
      just crashing if MemoryContextInit hasn't created the ErrorContext.
      This could happen if MemoryContextInit itself fails (due to malloc
      failure), and provides some future-proofing against someone trying to
      sneak in new code even earlier in server startup.
      
      Back-patch to all supported branches.  Since we've only heard reports of
      this type of failure recently, it may be that some recent change has made
      it more likely to see a crash of this kind; but it sure looks like it's
      broken all the way back.
      910bac59
    • Bruce Momjian's avatar
      Revert fd2ace80 · d84c584e
      Bruce Momjian authored
      Seems we want to document '=' plpgsql assignment instead.
      d84c584e
    • Tom Lane's avatar
      Fix compute_scalar_stats() for case that all values exceed WIDTH_THRESHOLD. · 62865262
      Tom Lane authored
      The standard typanalyze functions skip over values whose detoasted size
      exceeds WIDTH_THRESHOLD (1024 bytes), so as to limit memory bloat during
      ANALYZE.  However, we (I think I, actually :-() failed to consider the
      possibility that *every* non-null value in a column is too wide.  While
      compute_minimal_stats() seems to behave reasonably anyway in such a case,
      compute_scalar_stats() just fell through and generated no pg_statistic
      entry at all.  That's unnecessarily pessimistic: we can still produce
      valid stanullfrac and stawidth values in such cases, since we do include
      too-wide values in the average-width calculation.  Furthermore, since the
      general assumption in this code is that too-wide values are probably all
      distinct from each other, it seems reasonable to set stadistinct to -1
      ("all distinct").
      
      Per complaint from Kadri Raudsepp.  This has been like this since roughly
      neolithic times, so back-patch to all supported branches.
      62865262
    • Bruce Momjian's avatar
      docs: remove undocumented assign syntax in plpgsql examples · fd2ace80
      Bruce Momjian authored
      Pavel Stehule
      fd2ace80
    • Tom Lane's avatar
      Add another regression test cross-checking operator and function comments. · 28233ffa
      Tom Lane authored
      Add a query that lists all the functions that are operator implementation
      functions and have a SQL comment that doesn't just say "implementation of
      XYZ operator".  (Note that the preceding test checks that such functions'
      comments exactly match the corresponding operators' comments.)
      
      While it's not forbidden to add more functions to this list, that should
      only be done when we're encouraging users to use either the function or
      operator syntax for the functionality, which is a fairly rare situation.
      28233ffa
    • Andrew Dunstan's avatar
      Remove DESCR entries for json operator functions. · 11829ff8
      Andrew Dunstan authored
      Per -hackers discussion.
      11829ff8
    • Bruce Momjian's avatar
  6. 10 Jan, 2014 2 commits
    • Bruce Momjian's avatar
      Move username lookup functions from /port to /common · 111022ea
      Bruce Momjian authored
      Per suggestion from Peter E and Alvaro
      111022ea
    • Alvaro Herrera's avatar
      Accept pg_upgraded tuples during multixact freezing · 423e1211
      Alvaro Herrera authored
      The new MultiXact freezing routines introduced by commit 8e9a16ab8f7
      neglected to consider tuples that came from a pg_upgrade'd database; a
      vacuum run that tried to freeze such tuples would die with an error such
      as
      ERROR: MultiXactId 11415437 does no longer exist -- apparent wraparound
      
      To fix, ensure that GetMultiXactIdMembers is allowed to return empty
      multis when the infomask bits are right, as is done in other callsites.
      
      Per trouble report from F-Secure.
      
      In passing, fix a copy&paste bug reported by Andrey Karpov from VIVA64
      from their PVS-Studio static checked, that instead of setting relminmxid
      to Invalid, we were setting relfrozenxid twice.  Not an important
      mistake because that code branch is about relations for which we don't
      use the frozenxid/minmxid values at all in the first place, but seems to
      warrants a fix nonetheless.
      423e1211
  7. 09 Jan, 2014 7 commits
    • Tom Lane's avatar
      Remove unnecessary local variables to work around an icc optimization bug. · faab7a95
      Tom Lane authored
      Buildfarm member dunlin has been crashing since commit 8b49a604, but other
      machines seem fine with that code.  It turns out that removing the local
      variables in ordered_set_startup() that are copies of fields in "qstate"
      dodges the problem.  This might cost a few cycles on register-rich
      machines, but it's probably a wash on others, and in any case this code
      isn't performance-critical.  Thanks to Jeremy Drake for off-list
      investigation.
      faab7a95
    • Michael Meskes's avatar
    • Michael Meskes's avatar
      Fix descriptor output in ECPG. · d685e242
      Michael Meskes authored
      While working on most platforms the old way sometimes created alignment
      problems. This should fix it. Also the regresion tests were updated to test for
      the reported case.
      
      Report and fix by MauMau <maumau307@gmail.com>
      d685e242
    • Heikki Linnakangas's avatar
      Refactor checking whether we've reached the recovery target. · c945af80
      Heikki Linnakangas authored
      Makes the replay loop slightly more readable, by separating the concerns of
      whether to stop and whether to delay, and how to extract the timestamp from
      a record.
      
      This has the user-visible change that the timestamp of the last applied
      record is now updated after actually applying it. Before, it was updated
      just before applying it. That meant that pg_last_xact_replay_timestamp()
      could return the timestamp of a commit record that is in process of being
      replayed, but not yet applied. Normally the difference is small, but if
      min_recovery_apply_delay is set, there could be a significant delay between
      reading a record and applying it.
      
      Another behavioral change is that if you recover to a restore point, we stop
      after the restore point record, not before it. It makes no difference as far
      as running queries on the server is concerned, as applying a restore point
      record changes nothing, but if examine the timeline history you will see
      that the new timeline branched off just after the restore point record, not
      before it. One practical consequence is that if you do PITR to the new
      timeline, and set recovery target to the same named restore point again, it
      will find and stop recovery at the same restore point. Conceptually, I think
      it makes more sense to consider the restore point as part of the new
      timeline's history than not.
      
      In principle, setting the last-replayed timestamp before actually applying
      the record was a bug all along, but it doesn't seem worth the risk to
      backpatch, since min_recovery_apply_delay was only added in 9.4.
      c945af80
    • Peter Eisentraut's avatar
      pgcrypto: Make header files stand alone · 10a3b165
      Peter Eisentraut authored
      pgp.h used to require including mbuf.h and px.h first.  Include those in
      pgp.h, so that it can be used without prerequisites.  Remove mbuf.h
      inclusions in .c files where mbuf.h features are not used
      directly.  (px.h was always used.)
      10a3b165
    • Tom Lane's avatar
      We don't need to include pg_sema.h in s_lock.h anymore. · 220b3433
      Tom Lane authored
      Minor improvement to commit daa7527a:
      s_lock.h no longer has any need to mention PGSemaphoreData, so we can
      rip out the #include that supplies that.  In a non-HAVE_SPINLOCKS
      build, this doesn't really buy much since we still need the #include
      in spin.h --- but everywhere else, this reduces #include footprint by
      some trifle, and helps keep the different locking facilities separate.
      220b3433
    • Tom Lane's avatar
      Fix "cannot accept a set" error when only some arms of a CASE return a set. · 080b7db7
      Tom Lane authored
      In commit c1352052, I implemented an
      optimization that assumed that a function's argument expressions would
      either always return a set (ie multiple rows), or always not.  This is
      wrong however: we allow CASE expressions in which some arms return a set
      of some type and others just return a scalar of that type.  There may be
      other examples as well.  To fix, replace the run-time test of whether an
      argument returned a set with a static precheck (expression_returns_set).
      This adds a little bit of query startup overhead, but it seems barely
      measurable.
      
      Per bug #8228 from David Johnston.  This has been broken since 8.0,
      so patch all supported branches.
      080b7db7
  8. 08 Jan, 2014 6 commits
    • Robert Haas's avatar
      Reduce the number of semaphores used under --disable-spinlocks. · daa7527a
      Robert Haas authored
      Instead of allocating a semaphore from the operating system for every
      spinlock, allocate a fixed number of semaphores (by default, 1024)
      from the operating system and multiplex all the spinlocks that get
      created onto them.  This could self-deadlock if a process attempted
      to acquire more than one spinlock at a time, but since processes
      aren't supposed to execute anything other than short stretches of
      straight-line code while holding a spinlock, that shouldn't happen.
      
      One motivation for this change is that, with the introduction of
      dynamic shared memory, it may be desirable to create spinlocks that
      last for less than the lifetime of the server.  Without this change,
      attempting to use such facilities under --disable-spinlocks would
      quickly exhaust any supply of available semaphores.  Quite apart
      from that, it's desirable to contain the quantity of semaphores
      needed to run the server simply on convenience grounds, since using
      too many may make it harder to get PostgreSQL running on a new
      platform, which is mostly the point of --disable-spinlocks in the
      first place.
      
      Patch by me; review by Tom Lane.
      daa7527a
    • Heikki Linnakangas's avatar
      Fix pause_at_recovery_target + recovery_target_inclusive combination. · 3739e5ab
      Heikki Linnakangas authored
      If pause_at_recovery_target is set, recovery pauses *before* applying the
      target record, even if recovery_target_inclusive is set. If you then
      continue with pg_xlog_replay_resume(), it will apply the target record
      before ending recovery. In other words, if you log in while it's paused
      and verify that the database looks OK, ending recovery changes its state
      again, possibly destroying data that you were tring to salvage with PITR.
      
      Backpatch to 9.1, this has been broken since pause_at_recovery_target was
      added.
      3739e5ab
    • Heikki Linnakangas's avatar
      If multiple recovery_targets are specified, use the latest one. · 815d71de
      Heikki Linnakangas authored
      The docs say that only one of recovery_target_xid, recovery_target_time, or
      recovery_target_name can be specified. But the code actually did something
      different, so that a name overrode time, and xid overrode both time and name.
      Now the target specified last takes effect, whether it's an xid, time or
      name.
      
      With this patch, we still accept multiple recovery_target settings, even
      though docs say that only one can be specified. It's a general property of
      the recovery.conf file parser that you if you specify the same option twice,
      the last one takes effect, like with postgresql.conf.
      815d71de
    • Tom Lane's avatar
      Avoid extra AggCheckCallContext() checks in ordered-set aggregates. · 847e46ab
      Tom Lane authored
      In the transition functions, we don't really need to recheck this after the
      first call.  I had been feeling paranoid about possibly getting a non-null
      argument value in some other context; but it's probably game over anyway
      if we have a non-null "internal" value that's not what we are expecting.
      
      In the final functions, the general convention in pre-existing final
      functions seems to be that an Assert() is good enough, so do it like that
      here too.
      
      This seems to save a few tenths of a percent of overall query runtime,
      which isn't much, but still it's just overhead if there's not a plausible
      case where the checks would fire.
      847e46ab
    • Tom Lane's avatar
      Save a few cycles in advance_transition_function(). · e6336b8b
      Tom Lane authored
      Keep a pre-initialized FunctionCallInfoData in AggStatePerAggData, and
      re-use that at each row instead of doing InitFunctionCallInfoData each
      time.  This saves only half a dozen assignments and maybe some stack
      manipulation, and yet that seems to be good for a percent or two of the
      overall query run time for simple aggregates such as count(*).  The cost
      is that the FunctionCallInfoData (which is about a kilobyte, on 64-bit
      machines) stays allocated for the duration of the query instead of being
      short-lived stack data.  But we're already paying an equivalent space cost
      for each regular FuncExpr or OpExpr node, so I don't feel bad about paying
      it for aggregate functions.  The code seems a little cleaner this way too,
      since the number of things passed to advance_transition_function decreases.
      e6336b8b
    • Heikki Linnakangas's avatar
      Fix bug in determining when recovery has reached consistency. · d59ff6c1
      Heikki Linnakangas authored
      When starting WAL replay from an online checkpoint, the last replayed WAL
      record variable was initialized using the checkpoint record's location, even
      though the records between the REDO location and the checkpoint record had
      not been replayed yet. That was noted as "slightly confusing" but harmless
      in the comment, but in some cases, it fooled CheckRecoveryConsistency to
      incorrectly conclude that we had already reached a consistent state
      immediately at the beginning of WAL replay. That caused the system to accept
      read-only connections in hot standby mode too early, and also PANICs with
      message "WAL contains references to invalid pages".
      
      Fix by initializing the variables to the REDO location instead.
      
      In 9.2 and above, change CheckRecoveryConsistency() to use
      lastReplayedEndRecPtr variable when checking if backup end location has
      been reached. It was inconsistently using EndRecPtr for that check, but
      lastReplayedEndRecPtr when checking min recovery point. It made no
      difference before this patch, because in all the places where
      CheckRecoveryConsistency was called the two variables were the same, but
      it was always an accident waiting to happen, and would have been wrong
      after this patch anyway.
      
      Report and analysis by Tomonari Katsumata, bug #8686. Backpatch to 9.0,
      where hot standby was introduced.
      d59ff6c1