1. 27 Feb, 2015 5 commits
    • Redefine MemoryContextReset() as deleting, not resetting, child contexts. · eaa5808e
      Tom Lane authored
      That is, MemoryContextReset() now means what was formerly meant by
      MemoryContextResetAndDeleteChildren(), and the latter is now just a macro
      alias for the former.  If you really want the functionality that was
      formerly provided by MemoryContextReset(), what you have to do is
      MemoryContextResetChildren() plus MemoryContextResetOnly() (which is a
      new API to reset *only* the named context and not touch its children).
      
      The reason for this change is that nearly fifteen years of experience has
      proven that there is no place where old-style MemoryContextReset() is
      actually what you want.  Making that the default behavior has led to lots
      of context-leakage bugs, while we've not found anyplace where it's actually
      necessary to keep the child contexts; at least the standard regression
      tests do not reveal anyplace where this change breaks anything.  And there
      are upcoming patches that will introduce additional reasons why child
      contexts need to be removed.
      
      We could change existing calls of MemoryContextResetAndDeleteChildren to be
      just MemoryContextReset, but for the moment I'll leave them alone; they're
      not costing anything.
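
The difference between the old and new reset semantics can be sketched with a toy model. This is not PostgreSQL code: the `MemoryContext` class below and its method names are hypothetical stand-ins that only mirror the API names mentioned in the commit message.

```python
class MemoryContext:
    """Toy stand-in for a memory context: some allocations plus child contexts."""

    def __init__(self, name, parent=None):
        self.name = name
        self.allocations = []
        self.children = []
        self.parent = parent
        if parent is not None:
            parent.children.append(self)

    def alloc(self, obj):
        self.allocations.append(obj)

    def reset_only(self):
        # Models MemoryContextResetOnly(): empty this context, do not touch children.
        self.allocations.clear()

    def reset_children(self):
        # Models MemoryContextResetChildren(): empty all descendants but keep them.
        for child in self.children:
            child.reset_children()
            child.reset_only()

    def delete_children(self):
        # Remove all descendants entirely.
        for child in list(self.children):
            child.delete_children()
            child.allocations.clear()
            child.parent = None
        self.children.clear()

    def reset(self):
        # New-style MemoryContextReset(): delete children, then empty this context.
        self.delete_children()
        self.reset_only()

    def old_style_reset(self):
        # Old behaviour, now spelled ResetChildren() plus ResetOnly().
        self.reset_children()
        self.reset_only()
```

In this model a child created under `top` survives `top.old_style_reset()` as an empty context, which is the leak-prone behaviour the commit describes, while `top.reset()` removes it.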
    • Make CREATE OR REPLACE VIEW internally more consistent · fbef4342
      Alvaro Herrera authored
      The way that columns are added to a view is by calling
      AlterTableInternal with special subtype AT_AddColumnToView; but that
      subtype is changed to AT_AddColumnRecurse by ATPrepAddColumn.  This has
      no visible effect in the current code, since views cannot have
      inheritance children (thus the recursion step is a no-op) and adding a
      column to a view is executed identically to doing it to a table; but it
      does make a difference for future event trigger code keeping track of
      commands, because the current situation leads to confusing the case with
      a normal ALTER TABLE ADD COLUMN.
      
      Fix the problem by passing a flag to ATPrepAddColumn to prevent it from
      changing the command subtype.  The event trigger code can then properly
      ignore the subcommand.  (We could remove the call to ATPrepAddColumn,
      since views are never typed, and there is never a need for recursion,
      which are the two conditions that are checked by ATPrepAddColumn; but it
      seems more future-proof to keep the call in place.)
    • Invent a memory context reset/delete callback mechanism. · f65e8270
      Tom Lane authored
      This allows cleanup actions to be registered to be called just before a
      particular memory context's contents are flushed (either by deletion or
      MemoryContextReset).  The patch in itself has no use-cases for this, but
      several likely reasons for wanting this exist.
      
      In passing, per discussion, rearrange some boolean fields in struct
      MemoryContextData so as to avoid wasted padding space.  For safety,
      this requires making allowInCritSection's existence unconditional;
      but I think that's a better approach than what was there anyway.
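
A minimal sketch of the callback idea, assuming a toy `Context` class rather than the real struct MemoryContextData. The method name `register_reset_callback`, the call order, and the choice to fire each registration only once are assumptions of this sketch, not statements about the actual API.

```python
class Context:
    """Toy context whose contents can be flushed by reset or deletion."""

    def __init__(self):
        self.contents = []
        self._callbacks = []

    def register_reset_callback(self, func, arg):
        # Remember a cleanup action to run before the contents are flushed.
        self._callbacks.append((func, arg))

    def _flush(self):
        # Run the callbacks first, while the contents are still intact.
        # Assumption of this sketch: each registration fires only once.
        callbacks, self._callbacks = self._callbacks, []
        for func, arg in callbacks:
            func(arg)
        self.contents.clear()

    def reset(self):
        self._flush()

    def delete(self):
        self._flush()
```

A caller could use this to release a resource that lives outside the context, such as a file descriptor, at the moment the memory that references it goes away.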
    • Fix a couple of trivial issues in jsonb.c · 654809e7
      Alvaro Herrera authored
      Typo "aggreagate" appeared three times, and the return value of function
      JsonbIteratorNext() was being assigned to an int variable in a bunch of
      places.
    • Fix table_rewrite event trigger for ALTER TYPE/SET DATA TYPE CASCADE · 3f190f67
      Alvaro Herrera authored
      When a composite type being used in a typed table is modified by way
      of ALTER TYPE, a table rewrite occurs appearing to come from ALTER TYPE.
      The existing event_trigger.c code was unable to cope with that
      and raised a spurious error.  The fix is just to accept that command
      tag for the event, and document this properly.
      
      Noted while fooling with deparsing of DDL commands.  This appears to be
      an oversight in commit 618c9430.
      
      Thanks to Mark Wong for documentation wording help.
  2. 26 Feb, 2015 6 commits
    • Render infinite date/timestamps as 'infinity' for json/jsonb · bda76c1c
      Andrew Dunstan authored
      Commit ab14a73a raised an error in these cases and later the
      behaviour was copied to jsonb. This is what the XML code, which we
      then adopted, does, as the XSD types don't accept infinite values.
      However, json dates and timestamps are just strings as far as json is
      concerned, so there is no reason not to render these values as
      'infinity'.
      
      The json portion of this is backpatched to 9.4 where the behaviour was
      introduced. The jsonb portion only affects the development branch.
      
      Per gripe on pgsql-general.
    • Reconsider when to wait for WAL flushes/syncrep during commit. · fd6a3f3a
      Andres Freund authored
      Up to now RecordTransactionCommit() waited for WAL to be flushed (if
      synchronous_commit != off) and to be synchronously replicated (if
      enabled), even if a transaction did not have a xid assigned. The primary
      reason for that is that a sequence's nextval() did not assign a xid, yet
      its changes are worth waiting for on commit.
      
      This can be problematic because sometimes read only transactions do
      write WAL, e.g. HOT page prune records. That then could lead to read only
      transactions having to wait during commit. Not something people expect
      in a read only transaction.
      
      This led to such strange symptoms as backends being seemingly stuck
      during connection establishment when all synchronous replicas are
      down. Especially annoying when said stuck connection is the standby
      trying to reconnect to allow syncrep again...
      
      This behavior also is involved in a rather complicated <= 9.4 bug where
      the transaction started by catchup interrupt processing waited for
      syncrep using latches, but didn't get the wakeup because it was already
      running inside the same overloaded signal handler. Fixing the issue here
      doesn't properly solve that bug; it merely papers over the problem. In
      9.5 catchup interrupts aren't processed out of signal handlers anymore.
      
      To fix all this, make nextval() acquire a top level xid, and only wait for
      transaction commit if a transaction both acquired a xid and emitted WAL
      records.  If only a xid has been assigned we don't uselessly want to
      wait just because of writes to temporary/unlogged tables; if only WAL
      has been written we don't want to wait just because of HOT prunes.
      
      The xid assignment in nextval() is unlikely to cause overhead in
      real-world workloads. For one, it only happens once every SEQ_LOG_VALS
      (32) values anyway; for another, only calls to nextval() whose result is
      not then used in an insert or similar are affected.
      
      Discussion: 20150223165359.GF30784@awork2.anarazel.de,
          369698E947874884A77849D8FE3680C2@maumau,
          5CF4ABBA67674088B3941894E22A0D25@maumau
      
      Per complaint from maumau and Thom Brown
      
      Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
      better to have consistent behavior across all maintained branches.
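
The new rule for when a commit has to wait can be written as a small predicate. This is a sketch of the decision as described above, not the code in RecordTransactionCommit(); the function name and its boolean parameters are hypothetical, and settings such as synchronous_commit are deliberately left out.

```python
def must_wait_at_commit(has_xid: bool, wrote_wal: bool) -> bool:
    """Wait for WAL flush/syncrep only if the transaction has a xid AND wrote WAL."""
    return has_xid and wrote_wal


# Cases named in the commit message, as (has_xid, wrote_wal) pairs.
SCENARIOS = {
    "read-only query that triggered a HOT prune": (False, True),
    "writes only to temporary/unlogged tables": (True, False),
    "nextval(), which now acquires a top-level xid": (True, True),
}

for description, (has_xid, wrote_wal) in SCENARIOS.items():
    print(f"{description}: wait={must_wait_at_commit(has_xid, wrote_wal)}")
```

Only the third scenario waits, which matches the intent that sequence advances are still made durable before commit returns.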
    • Add note about how to make the SRF detoasted arguments live across calls. · a7920b87
      Fujii Masao authored
      Andrew Gierth and Ali Akbar
    • Free SQLSTATE and SQLERRM no earlier than other PL/pgSQL variables. · f5ef00ae
      Noah Misch authored
      "RETURN SQLERRM" prompted plpgsql_exec_function() to read from freed
      memory.  Back-patch to 9.0 (all supported versions).  Little code ran
      between the premature free and the read, so non-assert builds are
      unlikely to witness user-visible consequences.
    • Add hasRowSecurity to copyfuncs/outfuncs · 62a4a1af
      Stephen Frost authored
      The RLS patch added a hasRowSecurity field to PlannerGlobal and
      PlannedStmt but didn't update nodes/copyfuncs.c and nodes/outfuncs.c to
      reflect those additional fields.
      
      Correct that by adding entries to the appropriate functions for those
      fields.
      
      Pointed out by Robert.
    • Add locking clause for SB views for update/delete · 6f9bd50e
      Stephen Frost authored
      In expand_security_qual(), we were handling locking correctly when a
      PlanRowMark existed, but not when we were working with the target
      relation (which doesn't have any PlanRowMarks, but the subquery created
      for the security barrier quals still needs to lock the rows under it).
      
      Noted by Etsuro Fujita when working with the Postgres FDW, which wasn't
      properly issuing a SELECT ... FOR UPDATE to the remote side under a
      DELETE.
      
      Back-patch to 9.4 where updatable security barrier views were
      introduced.
      
      Per discussion with Etsuro and Dean Rasheed.
  3. 25 Feb, 2015 3 commits
    • Fix over-optimistic caching in fetch_array_arg_replace_nulls(). · 77903ede
      Tom Lane authored
      When I rewrote this in commit 56a79a86,
      I forgot that it's possible for the input array type to change from one
      call to the next (this can happen when applying the function to
      pg_statistic columns, for instance).  Fix that.
    • Fix dumping of views that are just VALUES(...) but have column aliases. · e9f1c01b
      Tom Lane authored
      The "simple" path for printing VALUES clauses doesn't work if we need
      to attach nondefault column aliases, because there's no place to do that
      in the minimal VALUES() syntax.  So modify get_simple_values_rte() to
      detect nondefault aliases and treat that as a non-simple case.  This
      further exposes that the "non-simple" path never actually worked;
      it didn't produce valid syntax.  Fix that too.  Per bug #12789 from
      Curtis McEnroe, and analysis by Andrew Gierth.
      
      Back-patch to all supported branches.  Before 9.3, this also requires
      back-patching the part of commit 092d7ded
      that created get_simple_values_rte() to begin with; inserting the extra
      test into the old factorization of that logic would've been too messy.
    • Remove null-pointer checks that are not needed. · 8794bf1c
      Michael Meskes authored
      If a pointer is guaranteed to carry information there is no need to check
      for NULL again. Patch by Michael Paquier.
  4. 24 Feb, 2015 4 commits
    • Improve parser's one-extra-token lookahead mechanism. · d809fd00
      Tom Lane authored
      There are a couple of places in our grammar that fail to be strict LALR(1),
      by requiring more than a single token of lookahead to decide what to do.
      Up to now we've dealt with that by using a filter between the lexer and
      parser that merges adjacent tokens into one in the places where two tokens
      of lookahead are necessary.  But that creates a number of user-visible
      anomalies, for instance that you can't name a CTE "ordinality" because
      "WITH ordinality AS ..." triggers folding of WITH and ORDINALITY into one
      token.  I realized that there's a better way.
      
      In this patch, we still do the lookahead basically as before, but we never
      merge the second token into the first; we replace just the first token by
      a special lookahead symbol when one of the lookahead pairs is seen.
      
      This requires a couple extra productions in the grammar, but it involves
      fewer special tokens, so that the grammar tables come out a bit smaller
      than before.  The filter logic is no slower than before, perhaps a bit
      faster.
      
      I also fixed the filter logic so that when backing up after a lookahead,
      the current token's terminator is correctly restored; this eliminates some
      weird behavior in error message issuance, as is shown by the one change in
      existing regression test outputs.
      
      I believe that this patch entirely eliminates odd behaviors caused by
      lookahead for WITH.  It doesn't really improve the situation for NULLS
      followed by FIRST/LAST unfortunately: those sequences still act like a
      reserved word, even though there are cases where they should be seen as two
      ordinary identifiers, eg "SELECT nulls first FROM ...".  I experimented
      with additional grammar hacks but couldn't find any simple solution for
      that.  Still, this is better than before, and it seems much more likely
      that we *could* somehow solve the NULLS case on the basis of this filter
      behavior than the previous one.
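
The change in the filter between lexer and parser can be illustrated on plain token lists. This is a toy sketch: tokens are upper-case strings, the pair table holds only the pairs mentioned above, and the token names `WITH_ORDINALITY` and `WITH_LA` are illustrative choices, not taken from the commit message.

```python
# First token -> second tokens that require one extra token of lookahead.
LOOKAHEAD_PAIRS = {
    "WITH": {"ORDINALITY"},
    "NULLS": {"FIRST", "LAST"},
}


def old_filter(tokens):
    """Old approach: fold both tokens of a pair into a single merged token."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt in LOOKAHEAD_PAIRS.get(tok, ()):
            out.append(f"{tok}_{nxt}")
            i += 2  # the second token is swallowed
        else:
            out.append(tok)
            i += 1
    return out


def new_filter(tokens):
    """New approach: replace only the first token; the second stays as it was."""
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt in LOOKAHEAD_PAIRS.get(tok, ()):
            out.append(tok + "_LA")  # special lookahead symbol
        else:
            out.append(tok)
    return out
```

With the input `["WITH", "ORDINALITY", "AS"]`, the old filter yields `["WITH_ORDINALITY", "AS"]`, so the grammar never sees ORDINALITY as a possible CTE name. The new filter yields `["WITH_LA", "ORDINALITY", "AS"]`, which leaves the grammar free to accept the second token as an ordinary name after the lookahead symbol.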
    • Error when creating names too long for tar format · 23a78352
      Peter Eisentraut authored
      The tar format (at least the version we are using) does not support
      file names or symlink targets longer than 99 bytes.  Until now, the tar
      creation code would silently truncate any names that are too long.  (Its
      original application was pg_dump, where this never happens.)  This
      creates problems when running base backups over the replication
      protocol.
      
      The most important problem is when a tablespace path is longer than 99
      bytes, which will result in a truncated tablespace path being backed up.
      Less importantly, the basebackup protocol also promises to back up any
      other files it happens to find in the data directory, which would also
      lead to file name truncation if someone put a file with a long name in
      there.
      
      Now both of these cases result in an error during the backup.
      
      Add tests verifying that an attempt to back up a too-long file name or
      symlink fails.
      Reviewed-by: Robert Haas <robertmhaas@gmail.com>
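
The behavioural change amounts to replacing silent truncation with an explicit error. The sketch below is a hypothetical Python model of that check, not the tar code in PostgreSQL; only the 99-byte limit comes from the commit message, and the function names are made up for illustration.

```python
TAR_NAME_MAX = 99  # bytes, the limit quoted in the commit message


def old_member_name(name: str) -> bytes:
    """Old behaviour: silently truncate names that do not fit."""
    return name.encode("utf-8")[:TAR_NAME_MAX]


def checked_member_name(name: str) -> bytes:
    """New behaviour: refuse names (or symlink targets) that do not fit."""
    encoded = name.encode("utf-8")
    if len(encoded) > TAR_NAME_MAX:
        raise ValueError(f"name too long for tar format: {name!r}")
    return encoded
```

Note that the limit applies to bytes, so a name with multi-byte characters can exceed it with fewer than 100 characters.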
    • Heikki Linnakangas's avatar
      347c7432
    • Fix typo in README. · dd58c609
      Heikki Linnakangas authored
      Kyotaro Horiguchi
  5. 23 Feb, 2015 10 commits
    • Fix invalid DocBook XML · b007bee1
      Peter Eisentraut authored
    • Fix stupid merge errors in previous commit · d1712d01
      Alvaro Herrera authored
      Brown paper bag installed permanently.
    • Further tweaking of raw grammar output to distinguish different inputs. · 56be925e
      Tom Lane authored
      Use a different A_Expr_Kind for LIKE/ILIKE/SIMILAR TO constructs, so that
      they can be distinguished from direct invocation of the underlying
      operators.  Also, postpone selection of the operator name when transforming
      "x IN (select)" to "x = ANY (select)", so that those syntaxes can be told
      apart at parse analysis time.
      
      I had originally thought I'd also have to do something special for the
      syntaxes IS NOT DISTINCT FROM, IS NOT DOCUMENT, and x NOT IN (SELECT...),
      which the grammar translates as though they were NOT (construct).
      On reflection though, we can distinguish those cases reliably by noting
      whether the parse location shown for the NOT is the same as for its child
      node.  This only requires tweaking the parse locations for NOT IN, which
      I've done here.
      
      These changes should have no effect outside the parser; they're just in
      support of being able to give accurate warnings for planned operator
      precedence changes.
    • Support more commands in event triggers · 296f3a60
      Alvaro Herrera authored
      COMMENT, SECURITY LABEL, and GRANT/REVOKE now also fire
      ddl_command_start and ddl_command_end event triggers, when they operate
      on database-local objects.
      
      Reviewed-By: Michael Paquier, Andres Freund, Stephen Frost
    • Replace checkpoint_segments with min_wal_size and max_wal_size. · 88e98230
      Heikki Linnakangas authored
      Instead of having a single knob (checkpoint_segments) that both triggers
      checkpoints and determines how many segments to recycle, they are now
      separate concerns. There is still an internal variable called
      CheckpointSegments, which triggers checkpoints. But it no longer determines
      how many segments to recycle at a checkpoint. That is now auto-tuned by
      keeping a moving average of the distance between checkpoints (in bytes),
      and trying to keep that many segments in reserve. The advantage of this is
      that you can set max_wal_size very high, but the system won't actually
      consume that much space if there isn't any need for it. The min_wal_size
      sets a floor for that; you can effectively disable the auto-tuning behavior
      by setting min_wal_size equal to max_wal_size.
      
      The max_wal_size setting is now the actual target size of WAL at which a
      new checkpoint is triggered, instead of the distance between checkpoints.
      Previously, you could calculate the actual WAL usage with the formula
      "(2 + checkpoint_completion_target) * checkpoint_segments + 1". With this
      patch, you set the desired WAL usage with max_wal_size, and the system
      calculates the appropriate CheckpointSegments with the reverse of that
      formula. That's a lot more intuitive for administrators to set.
      
      Reviewed by Amit Kapila and Venkata Balaji N.
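
The formula quoted above and its reverse can be worked through numerically. This is a sketch under stated assumptions: sizes are counted in WAL segments, the function names are hypothetical, and the second function is the plain algebraic inverse of the quoted formula, which may differ in rounding and clamping from what the server actually computes.

```python
def wal_usage_segments(checkpoint_segments: float, completion_target: float) -> float:
    """Old rule of thumb: (2 + checkpoint_completion_target) * checkpoint_segments + 1."""
    return (2 + completion_target) * checkpoint_segments + 1


def checkpoint_segments_for(max_wal_segments: float, completion_target: float) -> float:
    """Algebraic inverse of the formula above (an assumption of this sketch)."""
    return (max_wal_segments - 1) / (2 + completion_target)


# With checkpoint_segments = 3 and checkpoint_completion_target = 0.5:
usage = wal_usage_segments(3, 0.5)
print(usage)                                # segments of WAL in use
print(checkpoint_segments_for(usage, 0.5))  # recovers the original setting
```

The point of the new settings is that an administrator states the left-hand side (the WAL budget) directly and lets the system derive the internal trigger distance.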
    • Renumber GUC_* constants. · 0fec0003
      Heikki Linnakangas authored
      This moves all the regular flags back together (for aesthetic reasons), and
      makes room for more GUC_UNIT_* types.
    • Refactor unit conversions code in guc.c. · 1b630264
      Heikki Linnakangas authored
      Replace the if-switch-case constructs with two conversion tables,
      containing all the supported conversions between human-readable unit
      strings and the base units used in GUC variables. This makes the code
      easier to read, and makes adding new units simpler.
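
A table-driven conversion of this kind might look like the following. It is only an illustration of the technique: the two tables, their base units (kB and ms), and the `parse_with_unit` helper are assumptions of this sketch and do not reproduce the tables in guc.c.

```python
import re

# Multipliers from a human-readable unit to the base unit of each table.
MEMORY_TO_KB = {"kB": 1, "MB": 1024, "GB": 1024 ** 2, "TB": 1024 ** 3}
TIME_TO_MS = {"ms": 1, "s": 1000, "min": 60 * 1000, "h": 3600 * 1000, "d": 86400 * 1000}


def parse_with_unit(text: str, table: dict) -> int:
    """Convert a string such as '16MB' to the table's base unit via a lookup."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"invalid value: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    if unit == "":
        return value  # no unit given: already in the base unit
    if unit not in table:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * table[unit]
```

Adding a new unit then means adding one table entry instead of another branch in a switch statement.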
    • Guard against spurious signals in LockBufferForCleanup. · bc208a5a
      Andres Freund authored
      When LockBufferForCleanup() has to wait for a cleanup lock on a
      buffer, it does so by setting a flag in the buffer header and then waiting
      for other backends to signal it using ProcWaitForSignal().
      Unfortunately LockBufferForCleanup() missed that ProcWaitForSignal() can
      return for reasons other than the signal it is hoping for. If such a
      spurious signal arrives, the wait flag on the buffer header will still
      be set. That then triggers "ERROR: multiple backends attempting to wait
      for pincount 1".
      
      The fix is simple: unset the flag if it is still set when retrying. That
      implies an additional spinlock acquisition/release, but that's unlikely
      to matter given the cost of waiting for a cleanup lock.  Alternatively
      it'd have been possible to move responsibility for maintaining the
      relevant flag to the waiter altogether, but that would have been more
      invasive and might have had negative consequences due to possible
      floods of signals.
      
      This looks to be a very longstanding bug. The relevant code in
      LockBufferForCleanup() hasn't changed materially since its introduction
      and ProcWaitForSignal() was documented to return for unrelated reasons
      since 8.2.  The master only patch series removing ImmediateInterruptOK
      made it much easier to hit though, as ProcSendSignal/ProcWaitForSignal
      now uses a latch shared with other tasks.
      
      Per discussion with Kevin Grittner, Tom Lane and me.
      
      Backpatch to all supported branches.
      
      Discussion: 11553.1423805224@sss.pgh.pa.us
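
The failure mode and the fix can be replayed in a single-threaded toy model. Everything here is hypothetical scaffolding: `Buffer`, `scripted_wait`, and the `clear_stale_flag` switch exist only to reproduce the sequence of events described above, with the scripted wait standing in for ProcWaitForSignal(). It is assumed that the backend releasing the last other pin clears the flag before signalling.

```python
class Buffer:
    def __init__(self, pins):
        self.pins = pins          # backends holding a pin, including the waiter
        self.waiter_flag = False  # "someone is waiting for pincount 1"


def scripted_wait(events):
    """Return a wait function that replays 'spurious' and 'unpin' wakeups."""
    it = iter(events)

    def wait(buf):
        if next(it) == "unpin":
            # Assumed model of the signalling side: drop the pin, clear the flag.
            buf.pins -= 1
            buf.waiter_flag = False
        # A 'spurious' wakeup returns without touching the buffer.

    return wait


def lock_for_cleanup(buf, wait_for_signal, clear_stale_flag=True):
    while True:
        if buf.pins == 1:
            return  # we hold the only pin: cleanup lock obtained
        if buf.waiter_flag:
            raise RuntimeError("multiple backends attempting to wait for pincount 1")
        buf.waiter_flag = True
        wait_for_signal(buf)
        # The fix: after any wakeup, unset the flag if it is still set.
        if clear_stale_flag and buf.waiter_flag:
            buf.waiter_flag = False
```

Replaying `["spurious", "unpin"]` with `clear_stale_flag=False` raises the error on the second loop iteration, because the waiter trips over its own stale flag; with the default it retries cleanly.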
    • Add GUC to control the time to wait before retrieving WAL after failed attempt. · 5d2b45e3
      Fujii Masao authored
      Previously, when the standby server failed to retrieve WAL files from any source
      (i.e., streaming replication, local pg_xlog directory or WAL archive), it always
      waited for five seconds (hard-coded) before the next attempt. This is
      problematic in warm standby, for example, because restore_command can fail
      every five seconds, flooding the log files with its error messages, even when
      no new WAL file is expected to become available for a long time.
      
      This commit adds a new parameter, wal_retrieve_retry_interval, to control that
      wait time.
      
      Alexey Vasiliev and Michael Paquier, reviewed by Andres Freund and me.
    • Fix potential deadlock with libpq non-blocking mode. · 2a3f6e36
      Heikki Linnakangas authored
      If the libpq output buffer is full, the pqSendSome() function tries to drain
      any incoming data. This avoids a deadlock if the server, e.g., sends a lot of
      NOTICE messages and blocks until we read them. However, pqSendSome() only
      did that in blocking mode. In non-blocking mode, the deadlock could still
      happen.
      
      To fix, take a two-pronged approach:
      
      1. Change the documentation to instruct that when PQflush() returns 1, you
      should wait for both read- and write-ready, and call PQconsumeInput() if it
      becomes read-ready. That fixes the deadlock, but applications are not going
      to change overnight.
      
      2. In pqSendSome(), drain the input buffer before returning 1. This
      alleviates the problem for applications that only wait for write-ready. In
      particular, a slow but steady stream of NOTICE messages during COPY FROM
      STDIN will no longer cause a deadlock. The risk remains that the server
      attempts to send a large burst of data and fills its output buffer, and at
      the same time the client also sends enough data to fill its output buffer.
      The application will deadlock if it goes to sleep, waiting for the socket
      to become write-ready, before the server's data arrives. In practice,
      NOTICE messages and such that the server might be sending are usually
      short, so it's highly unlikely that the server would fill its output buffer
      so quickly.
      
      Backpatch to all supported versions.
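
The application-side loop recommended in point 1 can be sketched against a fake connection. This is not libpq: `flush()` and `consume_input()` are hypothetical methods that mimic the return conventions of PQflush() (0 done, 1 data still pending, -1 error) and PQconsumeInput() (true on success), and `wait_ready` stands in for a select()/poll() call on the socket.

```python
def flush_all(conn, wait_ready):
    """Flush queued output, reading input whenever the socket becomes read-ready."""
    while True:
        rc = conn.flush()
        if rc == 0:
            return
        if rc < 0:
            raise ConnectionError("flush failed")
        # Wait for BOTH read- and write-readiness, not only write-readiness.
        readable, _writable = wait_ready(conn)
        if readable and not conn.consume_input():
            raise ConnectionError("consume_input failed")


class FakeConn:
    """Peer that stops accepting our data until we have read what it sent."""

    def __init__(self):
        self.pending_out = 2  # chunks still to send
        self.unread_in = 1    # chunks the server sent that we have not read
        self.consumed = 0

    def flush(self):
        if self.unread_in:
            return 1  # server is blocked on us: no progress possible
        if self.pending_out:
            self.pending_out -= 1
        return 1 if self.pending_out else 0

    def consume_input(self):
        self.consumed += self.unread_in
        self.unread_in = 0
        return True


def fake_wait_ready(conn):
    return (conn.unread_in > 0, True)


conn = FakeConn()
flush_all(conn, fake_wait_ready)
```

An application that waited only for write-readiness would never call `consume_input()` here and would loop or sleep forever, which is the deadlock the commit describes.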
  6. 22 Feb, 2015 5 commits
    • Add parse location fields to NullTest and BooleanTest structs. · c063da17
      Tom Lane authored
      We did not need a location tag on NullTest or BooleanTest before, because
      no error messages referred directly to their locations.  That's planned
      to change though, so add these fields in a separate housekeeping commit.
      
      Catversion bump because stored rules may change.
    • Get rid of multiple applications of transformExpr() to the same tree. · 6a75562e
      Tom Lane authored
      transformExpr() has for many years had provisions to do nothing when
      applied to an already-transformed expression tree.  However, this was
      always ugly and of dubious reliability, so we'd be much better off without
      it.  The primary historical reason for it was that gram.y sometimes
      returned multiple links to the same subexpression, which is no longer true
      as of my BETWEEN fixes.  We'd also grown some lazy hacks in CREATE TABLE
      LIKE (failing to distinguish between raw and already-transformed index
      specifications) and one or two other places.
      
      This patch removes the need for and support for re-transforming already
      transformed expressions.  The index case is dealt with by adding a flag
      to struct IndexStmt to indicate that it's already been transformed;
      which has some benefit anyway in that tablecmds.c can now Assert that
      transformation has happened rather than just assuming.  The other main
      reason was some rather sloppy code for array type coercion, which can
      be fixed (and its performance improved too) by refactoring.
      
      I did leave transformJoinUsingClause() still constructing expressions
      containing untransformed operator nodes being applied to Vars, so that
      transformExpr() still has to allow Var inputs.  But that's a much narrower,
      and safer, special case than before, since Vars will never appear in a raw
      parse tree, and they don't have any substructure to worry about.
      
      In passing fix some oversights in the patch that added CREATE INDEX
      IF NOT EXISTS (missing processing of IndexStmt.if_not_exists).  These
      appear relatively harmless, but are still sloppy coding practice.
    • Represent BETWEEN as a special node type in raw parse trees. · 34af082f
      Tom Lane authored
      Previously, gram.y itself converted BETWEEN into AND (or AND/OR) nests of
      expression comparisons.  This was always as bogus as could be, but fixing
      it hasn't risen to the top of the to-do list.  The present patch invents an
      A_Expr representation for BETWEEN expressions, and does the expansion to
      comparison trees in parse_expr.c which is at least a slightly saner place
      to be doing semantic conversions.  There should be no change in the post-
      parse-analysis results.
      
      This does nothing for the semantic issues with BETWEEN (dubious connection
      to btree-opclass semantics, and multiple evaluation of possibly volatile
      subexpressions) ... but it's a necessary preliminary step before we could
      fix any of that.  The main immediate benefit is that preserving BETWEEN as
      an identifiable raw-parse-tree construct will enable better error messages.
      
      While at it, fix the code so that multiply-referenced subexpressions are
      physically duplicated before being passed through transformExpr().  This
      gets rid of one of the principal reasons why transformExpr() has
      historically had to allow already-processed input.
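
The two-stage handling can be sketched with nested lists standing in for parse nodes. This is a toy illustration, not the A_Expr representation itself: the node layout, the function names, and the restriction to plain BETWEEN and NOT BETWEEN are assumptions of the sketch.

```python
import copy


def raw_between(arg, low, high, negated=False):
    """Raw parse tree: keep BETWEEN as one identifiable node."""
    return ["NOT BETWEEN" if negated else "BETWEEN", arg, low, high]


def transform_between(node):
    """Later stage: expand the node into a tree of comparisons."""
    kind, arg, low, high = node
    # The argument is referenced twice, so give the second use its own copy
    # instead of linking the same subtree into the result in two places.
    arg_copy = copy.deepcopy(arg)
    if kind == "BETWEEN":
        return ["AND", [">=", arg, low], ["<=", arg_copy, high]]
    if kind == "NOT BETWEEN":
        return ["OR", ["<", arg, low], [">", arg_copy, high]]
    raise ValueError(f"unexpected node kind: {kind!r}")
```

Because the raw node survives until the transform step, an error raised before expansion can still say "BETWEEN" rather than refer to comparison operators the user never wrote.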
    • Rename variable in AllocSetContextCreate to be consistent. · 74811c40
      Jeff Davis authored
      Everywhere else in the file, "context" is of type MemoryContext and
      "set" is of type AllocSet. AllocSetContextCreate uses a variable of
      type AllocSet, so rename it from "context" to "set".
    • In array_agg(), don't create a new context for every group. · b419865a
      Jeff Davis authored
      Previously, each new array created a new memory context that started
      out at 8kB. This is incredibly wasteful when there are lots of small
      groups of just a few elements each.
      
      Change initArrayResult() and friends to accept a "subcontext" argument
      to indicate whether the caller wants the ArrayBuildState allocated in
      a new subcontext or not. If not, it can no longer be released
      separately from the rest of the memory context.
      
      Fixes bug report by Frank van Vugt on 2013-10-19.
      
      Tomas Vondra. Reviewed by Ali Akbar, Tom Lane, and me.
  7. 21 Feb, 2015 7 commits