1. 09 Jan, 2014 4 commits
    • Heikki Linnakangas's avatar
      Refactor checking whether we've reached the recovery target. · c945af80
      Heikki Linnakangas authored
      Makes the replay loop slightly more readable, by separating the concerns of
      whether to stop and whether to delay, and how to extract the timestamp from
      a record.
      
      This has the user-visible change that the timestamp of the last applied
      record is now updated after actually applying it. Before, it was updated
      just before applying it. That meant that pg_last_xact_replay_timestamp()
      could return the timestamp of a commit record that is in process of being
      replayed, but not yet applied. Normally the difference is small, but if
      min_recovery_apply_delay is set, there could be a significant delay between
      reading a record and applying it.
      
      Another behavioral change is that if you recover to a restore point, we stop
      after the restore point record, not before it. It makes no difference as far
      as running queries on the server is concerned, as applying a restore point
      record changes nothing, but if examine the timeline history you will see
      that the new timeline branched off just after the restore point record, not
      before it. One practical consequence is that if you do PITR to the new
      timeline, and set recovery target to the same named restore point again, it
      will find and stop recovery at the same restore point. Conceptually, I think
      it makes more sense to consider the restore point as part of the new
      timeline's history than not.
      
      In principle, setting the last-replayed timestamp before actually applying
      the record was a bug all along, but it doesn't seem worth the risk to
      backpatch, since min_recovery_apply_delay was only added in 9.4.
      c945af80
    • Peter Eisentraut's avatar
      pgcrypto: Make header files stand alone · 10a3b165
      Peter Eisentraut authored
      pgp.h used to require including mbuf.h and px.h first.  Include those in
      pgp.h, so that it can be used without prerequisites.  Remove mbuf.h
      inclusions in .c files where mbuf.h features are not used
      directly.  (px.h was always used.)
      10a3b165
    • Tom Lane's avatar
      We don't need to include pg_sema.h in s_lock.h anymore. · 220b3433
      Tom Lane authored
      Minor improvement to commit daa7527a:
      s_lock.h no longer has any need to mention PGSemaphoreData, so we can
      rip out the #include that supplies that.  In a non-HAVE_SPINLOCKS
      build, this doesn't really buy much since we still need the #include
      in spin.h --- but everywhere else, this reduces #include footprint by
      some trifle, and helps keep the different locking facilities separate.
      220b3433
    • Tom Lane's avatar
      Fix "cannot accept a set" error when only some arms of a CASE return a set. · 080b7db7
      Tom Lane authored
      In commit c1352052, I implemented an
      optimization that assumed that a function's argument expressions would
      either always return a set (ie multiple rows), or always not.  This is
      wrong however: we allow CASE expressions in which some arms return a set
      of some type and others just return a scalar of that type.  There may be
      other examples as well.  To fix, replace the run-time test of whether an
      argument returned a set with a static precheck (expression_returns_set).
      This adds a little bit of query startup overhead, but it seems barely
      measurable.
      
      Per bug #8228 from David Johnston.  This has been broken since 8.0,
      so patch all supported branches.
      080b7db7
  2. 08 Jan, 2014 7 commits
    • Robert Haas's avatar
      Reduce the number of semaphores used under --disable-spinlocks. · daa7527a
      Robert Haas authored
      Instead of allocating a semaphore from the operating system for every
      spinlock, allocate a fixed number of semaphores (by default, 1024)
      from the operating system and multiplex all the spinlocks that get
      created onto them.  This could self-deadlock if a process attempted
      to acquire more than one spinlock at a time, but since processes
      aren't supposed to execute anything other than short stretches of
      straight-line code while holding a spinlock, that shouldn't happen.
      
      One motivation for this change is that, with the introduction of
      dynamic shared memory, it may be desirable to create spinlocks that
      last for less than the lifetime of the server.  Without this change,
      attempting to use such facilities under --disable-spinlocks would
      quickly exhaust any supply of available semaphores.  Quite apart
      from that, it's desirable to contain the quantity of semaphores
      needed to run the server simply on convenience grounds, since using
      too many may make it harder to get PostgreSQL running on a new
      platform, which is mostly the point of --disable-spinlocks in the
      first place.
      
      Patch by me; review by Tom Lane.
      daa7527a
    • Heikki Linnakangas's avatar
      Fix pause_at_recovery_target + recovery_target_inclusive combination. · 3739e5ab
      Heikki Linnakangas authored
      If pause_at_recovery_target is set, recovery pauses *before* applying the
      target record, even if recovery_target_inclusive is set. If you then
      continue with pg_xlog_replay_resume(), it will apply the target record
      before ending recovery. In other words, if you log in while it's paused
      and verify that the database looks OK, ending recovery changes its state
      again, possibly destroying data that you were tring to salvage with PITR.
      
      Backpatch to 9.1, this has been broken since pause_at_recovery_target was
      added.
      3739e5ab
    • Heikki Linnakangas's avatar
      If multiple recovery_targets are specified, use the latest one. · 815d71de
      Heikki Linnakangas authored
      The docs say that only one of recovery_target_xid, recovery_target_time, or
      recovery_target_name can be specified. But the code actually did something
      different, so that a name overrode time, and xid overrode both time and name.
      Now the target specified last takes effect, whether it's an xid, time or
      name.
      
      With this patch, we still accept multiple recovery_target settings, even
      though docs say that only one can be specified. It's a general property of
      the recovery.conf file parser that you if you specify the same option twice,
      the last one takes effect, like with postgresql.conf.
      815d71de
    • Tom Lane's avatar
      Avoid extra AggCheckCallContext() checks in ordered-set aggregates. · 847e46ab
      Tom Lane authored
      In the transition functions, we don't really need to recheck this after the
      first call.  I had been feeling paranoid about possibly getting a non-null
      argument value in some other context; but it's probably game over anyway
      if we have a non-null "internal" value that's not what we are expecting.
      
      In the final functions, the general convention in pre-existing final
      functions seems to be that an Assert() is good enough, so do it like that
      here too.
      
      This seems to save a few tenths of a percent of overall query runtime,
      which isn't much, but still it's just overhead if there's not a plausible
      case where the checks would fire.
      847e46ab
    • Tom Lane's avatar
      Save a few cycles in advance_transition_function(). · e6336b8b
      Tom Lane authored
      Keep a pre-initialized FunctionCallInfoData in AggStatePerAggData, and
      re-use that at each row instead of doing InitFunctionCallInfoData each
      time.  This saves only half a dozen assignments and maybe some stack
      manipulation, and yet that seems to be good for a percent or two of the
      overall query run time for simple aggregates such as count(*).  The cost
      is that the FunctionCallInfoData (which is about a kilobyte, on 64-bit
      machines) stays allocated for the duration of the query instead of being
      short-lived stack data.  But we're already paying an equivalent space cost
      for each regular FuncExpr or OpExpr node, so I don't feel bad about paying
      it for aggregate functions.  The code seems a little cleaner this way too,
      since the number of things passed to advance_transition_function decreases.
      e6336b8b
    • Heikki Linnakangas's avatar
      Fix bug in determining when recovery has reached consistency. · d59ff6c1
      Heikki Linnakangas authored
      When starting WAL replay from an online checkpoint, the last replayed WAL
      record variable was initialized using the checkpoint record's location, even
      though the records between the REDO location and the checkpoint record had
      not been replayed yet. That was noted as "slightly confusing" but harmless
      in the comment, but in some cases, it fooled CheckRecoveryConsistency to
      incorrectly conclude that we had already reached a consistent state
      immediately at the beginning of WAL replay. That caused the system to accept
      read-only connections in hot standby mode too early, and also PANICs with
      message "WAL contains references to invalid pages".
      
      Fix by initializing the variables to the REDO location instead.
      
      In 9.2 and above, change CheckRecoveryConsistency() to use
      lastReplayedEndRecPtr variable when checking if backup end location has
      been reached. It was inconsistently using EndRecPtr for that check, but
      lastReplayedEndRecPtr when checking min recovery point. It made no
      difference before this patch, because in all the places where
      CheckRecoveryConsistency was called the two variables were the same, but
      it was always an accident waiting to happen, and would have been wrong
      after this patch anyway.
      
      Report and analysis by Tomonari Katsumata, bug #8686. Backpatch to 9.0,
      where hot standby was introduced.
      d59ff6c1
    • Peter Eisentraut's avatar
      pg_upgrade: Fix fatal error handling · ca607b15
      Peter Eisentraut authored
      Restore exiting when pg_log(PG_FATAL) is called directly instead of
      calling pg_fatal().  Fault introduced in
      264aa14a.
      ca607b15
  3. 07 Jan, 2014 6 commits
    • Bruce Momjian's avatar
      Update copyright for 2014 · 7e04792a
      Bruce Momjian authored
      Update all files in head, and files COPYRIGHT and legal.sgml in all back
      branches.
      7e04792a
    • Tom Lane's avatar
      Fix LATERAL references to target table of UPDATE/DELETE. · 0c051c90
      Tom Lane authored
      I failed to think much about UPDATE/DELETE when implementing LATERAL :-(.
      The implemented behavior ended up being that subqueries in the FROM or
      USING clause (respectively) could access the update/delete target table as
      though it were a lateral reference; which seems fine if they said LATERAL,
      but certainly ought to draw an error if they didn't.  Fix it so you get a
      suitable error when you omit LATERAL.  Per report from Emre Hasegeli.
      0c051c90
    • Heikki Linnakangas's avatar
      Silence compiler warning on MSVC. · f68220df
      Heikki Linnakangas authored
      MSVC doesn't know that elog(ERROR) doesn't return, and gives a warning about
      missing return. Silence that.
      
      Amit Kapila
      f68220df
    • Magnus Hagander's avatar
      Move permissions check from do_pg_start_backup to pg_start_backup · 9544cc0d
      Magnus Hagander authored
      And the same for do_pg_stop_backup. The code in do_pg_* is not allowed
      to access the catalogs. For manual base backups, the permissions
      check can be handled in the calling function, and for streaming
      base backups only users with the required permissions can get past
      the authentication step in the first place.
      
      Reported by Antonin Houska, diagnosed by Andres Freund
      9544cc0d
    • Magnus Hagander's avatar
      Avoid including tablespaces inside PGDATA twice in base backups · b168c5ef
      Magnus Hagander authored
      If a tablespace was crated inside PGDATA it was backed up both as part
      of the PGDATA backup and as the backup of the tablespace. Avoid this
      by skipping any directory inside PGDATA that contains one of the active
      tablespaces.
      
      Dimitri Fontaine and Magnus Hagander
      b168c5ef
    • Peter Eisentraut's avatar
      Add more use of psprintf() · edc43458
      Peter Eisentraut authored
      edc43458
  4. 06 Jan, 2014 1 commit
  5. 05 Jan, 2014 1 commit
    • Tom Lane's avatar
      Cache catalog lookup data across groups in ordered-set aggregates. · 8b49a604
      Tom Lane authored
      The initial commit of ordered-set aggregates just did all the setup work
      afresh each time the aggregate function is started up.  But in a GROUP BY
      query, the catalog lookups need not be repeated for each group, since the
      column datatypes and sort information won't change.  When there are many
      small groups, this makes for a useful, though not huge, performance
      improvement.  Per suggestion from Andrew Gierth.
      
      Profiling of these cases suggests that it might be profitable to avoid
      duplicate lookups within tuplesort startup as well; but changing the
      tuplesort APIs would have much broader impact, so I left that for
      another day.
      8b49a604
  6. 04 Jan, 2014 3 commits
    • Tom Lane's avatar
      Fix translatability markings in psql, and add defenses against future bugs. · 92459e7a
      Tom Lane authored
      Several previous commits have added columns to various \d queries without
      updating their translate_columns[] arrays, leading to potentially incorrect
      translations in NLS-enabled builds.  Offenders include commit 89368676
      (added prosecdef to \df+), c9ac00e6 (added description to \dc+) and
      3b17efdf (added description to \dC+).  Fix those cases back to 9.3 or
      9.2 as appropriate.
      
      Since this is evidently more easily missed than one would like, in HEAD
      also add an Assert that the supplied array is long enough.  This requires
      an API change for printQuery(), so it seems inappropriate for back
      branches, but presumably all future changes will be tested in HEAD anyway.
      
      In HEAD and 9.3, also clean up a whole lot of sloppiness in the emitted
      SQL for \dy (event triggers): lack of translatability due to failing to
      pass words-to-be-translated through gettext_noop(), inadequate schema
      qualification, and sloppy formatting resulting in unnecessarily ugly
      -E output.
      
      Peter Eisentraut and Tom Lane, per bug #8702 from Sergey Burladyan
      92459e7a
    • Tom Lane's avatar
      Fix header comment for bitncmp(). · 5858cf8a
      Tom Lane authored
      The result is an int less than, equal to, or greater than zero, in the
      style of memcmp (and, in fact, exactly the output of memcmp in some cases).
      This comment previously said -1, 1, or 0, which was an overspecification,
      as noted by Emre Hasegeli.  All of the existing callers appear to be fine
      with the actual behavior, so just fix the comment.
      
      In passing, improve infelicitous formatting of some call sites.
      5858cf8a
    • Tom Lane's avatar
      Fix typo in comment. · 99299756
      Tom Lane authored
      classifyClauses was renamed to classifyConditions somewhere along the
      line, but this comment didn't get the memo.
      
      Ian Barwick
      99299756
  7. 03 Jan, 2014 3 commits
    • Alvaro Herrera's avatar
      Restore some comments lost during 15732b34 · 1a3e82a7
      Alvaro Herrera authored
      Michael Paquier
      1a3e82a7
    • Tom Lane's avatar
      Ooops, should use double not single quotes in StaticAssertStmt(). · a3b4aeec
      Tom Lane authored
      That's what I get for testing this on an older compiler.
      a3b4aeec
    • Tom Lane's avatar
      Fix calculation of maximum statistics-message size. · a7ef273e
      Tom Lane authored
      The PGSTAT_NUM_TABENTRIES macro should have been updated when new fields
      were added to struct PgStat_MsgTabstat in commit 64482890, but it wasn't.
      Fix that.
      
      Also, add a static assertion that we didn't overrun the intended size limit
      on stats messages.  This will not necessarily catch every mistake in
      computing the maximum array size for stats messages, but it will catch ones
      that have practical consequences.  (The assertion in fact doesn't complain
      about the aforementioned error in PGSTAT_NUM_TABENTRIES, because that was
      not big enough to cause the array length to increase.)
      
      No back-patch, as there's no actual bug in existing releases; this is just
      in the nature of future-proofing.
      
      Mark Dilger and Tom Lane
      a7ef273e
  8. 02 Jan, 2014 6 commits
    • Alvaro Herrera's avatar
      Handle 5-char filenames in SlruScanDirectory · 638cf09e
      Alvaro Herrera authored
      Original users of slru.c were all producing 4-digit filenames, so that
      was all that that code was prepared to handle.  Changes to multixact.c
      in the course of commit 0ac5ad51 made pg_multixact/members create
      5-digit filenames once a certain threshold was reached, which
      SlruScanDirectory wasn't prepared to deal with; in particular,
      5-digit-name files were not removed during truncation.  Change that
      routine to make it aware of those files, and have it process them just
      like any others.
      
      Right now, some pg_multixact/members directories will contain a mixture
      of 4-char and 5-char filenames.  A future commit is expected fix things
      so that each slru.c user declares the correct maximum width for the
      files it produces, to avoid such unsightly mixtures.
      
      Noticed while investigating bug #8673 reported by Serge Negodyuck.
      638cf09e
    • Alvaro Herrera's avatar
      Wrap multixact/members correctly during extension · a50d9762
      Alvaro Herrera authored
      In the 9.2 code for extending multixact/members, the logic was very
      simple because the number of entries in a members page was a proper
      divisor of 2^32, and thus at 2^32 wraparound the logic for page switch
      was identical than at any other page boundary.  In commit 0ac5ad51 I
      failed to realize this and introduced code that was not able to go over
      the 2^32 boundary.  Fix that by ensuring that when we reach the last
      page of the last segment we correctly zero the initial page of the
      initial segment, using correct uint32-wraparound-safe arithmetic.
      
      Noticed while investigating bug #8673 reported by Serge Negodyuck, as
      diagnosed by Andres Freund.
      a50d9762
    • Alvaro Herrera's avatar
      Handle wraparound during truncation in multixact/members · 722acf51
      Alvaro Herrera authored
      In pg_multixact/members, relying on modulo-2^32 arithmetic for
      wraparound handling doesn't work all that well.  Because we don't
      explicitely track wraparound of the allocation counter for members, it
      is possible that the "live" area exceeds 2^31 entries; trying to remove
      SLRU segments that are "old" according to the original logic might lead
      to removal of segments still in use.  To fix, have the truncation
      routine use a tailored SlruScanDirectory callback that keeps track of
      the live area in actual use; that way, when the live range exceeds 2^31
      entries, the oldest segments still live will not get removed untimely.
      
      This new SlruScanDir callback needs to take care not to remove segments
      that are "in the future": if new SLRU segments appear while the
      truncation is ongoing, make sure we don't remove them.  This requires
      examination of shared memory state to recheck for false positives, but
      testing suggests that this doesn't cause a problem.  The original coding
      didn't suffer from this pitfall because segments created when truncation
      is running are never considered to be removable.
      
      Per Andres Freund's investigation of bug #8673 reported by Serge
      Negodyuck.
      722acf51
    • Robert Haas's avatar
      Aggressively freeze tables when CLUSTER or VACUUM FULL rewrites them. · 3cff1879
      Robert Haas authored
      We haven't wanted to do this in the past on the grounds that in rare
      cases the original xmin value will be needed for forensic purposes, but
      commit 37484ad2 removes that objection,
      so now we can.
      
      Per extensive discussion, among many people, on pgsql-hackers.
      3cff1879
    • Tom Lane's avatar
      Fix contrib/pg_upgrade to clean all the cruft made during "make check". · 4cf81b73
      Tom Lane authored
      Although these files get cleaned up if the test runs to completion,
      a failure partway through leaves trash all over the floor.  The Makefile
      ought to be bright enough to get rid of it when you say "make clean".
      4cf81b73
    • Robert Haas's avatar
      Rename walLogHints to wal_log_hints for easier grepping. · 4b351841
      Robert Haas authored
      Michael Paquier
      4b351841
  9. 01 Jan, 2014 1 commit
  10. 30 Dec, 2013 4 commits
    • Tom Lane's avatar
      Fix broken support for event triggers as extension members. · c01bc51f
      Tom Lane authored
      CREATE EVENT TRIGGER forgot to mark the event trigger as a member of its
      extension, and pg_dump didn't pay any attention anyway when deciding
      whether to dump the event trigger.  Per report from Moshe Jacobson.
      
      Given the obvious lack of testing here, it's rather astonishing that
      ALTER EXTENSION ADD/DROP EVENT TRIGGER work, but they seem to.
      c01bc51f
    • Tom Lane's avatar
      Fix alphabetization in catalogs.sgml. · d7ee4311
      Tom Lane authored
      Some recent patches seem not to have grasped the concept that the catalogs
      are described in alphabetical order.
      d7ee4311
    • Tom Lane's avatar
      Remove dead code now that orindxpath.c is history. · f7fbf4b0
      Tom Lane authored
      We don't need make_restrictinfo_from_bitmapqual() anymore at all.
      generate_bitmap_or_paths() doesn't need to be exported, and we can
      drop its rather klugy restriction_only flag.
      f7fbf4b0
    • Tom Lane's avatar
      Extract restriction OR clauses whether or not they are indexable. · f343a880
      Tom Lane authored
      It's possible to extract a restriction OR clause from a join clause that
      has the form of an OR-of-ANDs, if each sub-AND includes a clause that
      mentions only one specific relation.  While PG has been aware of that idea
      for many years, the code previously only did it if it could extract an
      indexable OR clause.  On reflection, though, that seems a silly limitation:
      adding a restriction clause can be a win by reducing the number of rows
      that have to be filtered at the join step, even if we have to test the
      clause as a plain filter clause during the scan.  This should be especially
      useful for foreign tables, where the change can cut the number of rows that
      have to be retrieved from the foreign server; but testing shows it can win
      even on local tables.  Per a suggestion from Robert Haas.
      
      As a heuristic, I made the code accept an extracted restriction clause
      if its estimated selectivity is less than 0.9, which will probably result
      in accepting extracted clauses just about always.  We might need to tweak
      that later based on experience.
      
      Since the code no longer has even a weak connection to Path creation,
      remove orindxpath.c and create a new file optimizer/util/orclauses.c.
      
      There's some additional janitorial cleanup of now-dead code that needs
      to happen, but it seems like that's a fit subject for a separate commit.
      f343a880
  11. 29 Dec, 2013 3 commits
    • Kevin Grittner's avatar
      Don't attempt to limit target database for pg_restore. · 47f50262
      Kevin Grittner authored
      There was an apparent attempt to limit the target database for
      pg_restore to version 7.1.0 or later.  Due to a leading zero this
      was interpreted as an octal number, which allowed targets with
      version numbers down to 2.87.36.  The lowest actual release above
      that was 6.0.0, so that was effectively the limit.
      
      Since the success of the restore attempt will depend primarily on
      on what statements were generated by the dump run, we don't want
      pg_restore trying to guess whether a given target should be allowed
      based on version number.  Allow a connection to any version.  Since
      it is very unlikely that anyone would be using a recent version of
      pg_restore to restore to a pre-6.0 database, this has little to no
      practical impact, but it makes the code less confusing to read.
      
      Issue reported and initial patch suggestion from Joel Jacobson
      based on an article by Andrey Karpov reporting on issues found by
      PVS-Studio static code analyzer.  Final patch based on analysis by
      Tom Lane.  Back-patch to all supported branches.
      47f50262
    • Tom Lane's avatar
      Undo autoconf 2.69's attempt to #define _DARWIN_USE_64_BIT_INODE. · ed011d97
      Tom Lane authored
      Defining this symbol causes OS X 10.5 to use a buggy version of readdir(),
      which can sometimes fail with EINVAL if the previously-fetched directory
      entry has been deleted or renamed.  In later OS X versions that bug has
      been repaired, but we still don't need the #define because it's on by
      default.  So this is just an all-around bad idea, and we can do without it.
      ed011d97
    • Peter Eisentraut's avatar
      Update grammar · 71812a98
      Peter Eisentraut authored
      From: Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
      71812a98
  12. 28 Dec, 2013 1 commit