1. 11 May, 2015 2 commits
    • Robert Haas's avatar
      Advance the stop point for multixact offset creation only at checkpoint. · f6a6c46d
      Robert Haas authored
      Commit b69bf30b advanced the stop point
      at vacuum time, but this has subsequently been shown to be unsafe as a
      result of analysis by myself and Thomas Munro and testing by Thomas
      Munro.  The crux of the problem is that the SLRU deletion logic may
      get confused about what to remove if, at exactly the right time during
      the checkpoint process, the head of the SLRU crosses what used to be
      the tail.
      
      This patch, by me, fixes the problem by advancing the stop point only
      following a checkpoint.  This has the additional advantage of making
      the removal logic work during recovery more like the way it works during
      normal running, which is probably good.
      
      At least one of the calls to DetermineSafeOldestOffset which this patch
      removes was already dead, because MultiXactAdvanceOldest is called only
      during recovery and DetermineSafeOldestOffset was set up to do nothing
      during recovery.  That, however, is inconsistent with the principle that
      recovery and normal running should work similarly, and was confusing to
      boot.
      
      Along the way, fix some comments that previous patches in this area
      neglected to update.  It's not clear to me whether there's any
      concrete basis for the decision to use only half of the multixact ID
      space, but it's neither necessary nor sufficient to prevent multixact
      member wraparound, so the comments should not say otherwise.
      f6a6c46d
    • Robert Haas's avatar
      Fix DetermineSafeOldestOffset for the case where there are no mxacts. · 312747c2
      Robert Haas authored
      Commit b69bf30b failed to take into
      account the possibility that there might be no multixacts in existence
      at all.
      
      Report by Thomas Munro; patch by me.
      312747c2
  2. 10 May, 2015 2 commits
    • Tom Lane's avatar
      Code review for foreign/custom join pushdown patch. · 1a8a4e5c
      Tom Lane authored
      Commit e7cb7ee1 included some design
      decisions that seem pretty questionable to me, and there was quite a lot
      of stuff not to like about the documentation and comments.  Clean up
      as follows:
      
      * Consider foreign joins only between foreign tables on the same server,
      rather than between any two foreign tables with the same underlying FDW
      handler function.  In most if not all cases, the FDW would simply have had
      to apply the same-server restriction itself (far more expensively, both for
      lack of caching and because it would be repeated for each combination of
      input sub-joins), or else risk nasty bugs.  Anyone who's really intent on
      doing something outside this restriction can always use the
      set_join_pathlist_hook.
      
      * Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
      to better reflect what they're for, and allow these custom scan tlists
      to be used even for base relations.
      
      * Change make_foreignscan() API to include passing the fdw_scan_tlist
      value, since the FDW is required to set that.  Backwards compatibility
      doesn't seem like an adequate reason to expect FDWs to set it in some
      ad-hoc extra step, and anyway existing FDWs can just pass NIL.
      
      * Change the API of path-generating subroutines of add_paths_to_joinrel,
      and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
      so that various less-used parameters are passed in a struct rather than
      as separate parameter-list entries.  The objective here is to reduce the
      probability that future additions to those parameter lists will result in
      source-level API breaks for users of these hooks.  It's possible that this
      is even a small win for the core code, since most CPU architectures can't
      pass more than half a dozen parameters efficiently anyway.  I kept root,
      joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
      code churn in joinpath.c --- in particular, putting jointype into the
      struct would have been problematic because of the subroutines' habit of
      changing their local copies of that variable.
      
      * Avoid ad-hocery in ExecAssignScanProjectionInfo.  It was probably all
      right for it to know about IndexOnlyScan, but if the list is to grow
      we should refactor the knowledge out to the callers.
      
      * Restore nodeForeignscan.c's previous use of the relcache to avoid
      extra GetFdwRoutine lookups for base-relation scans.
      
      * Lots of cleanup of documentation and missed comments.  Re-order some
      code additions into more logical places.
      1a8a4e5c
    • Tom Lane's avatar
      Add missing "static" marker. · c594c750
      Tom Lane authored
      Per buildfarm member pademelon.
      c594c750
  3. 09 May, 2015 5 commits
  4. 08 May, 2015 13 commits
    • Stephen Frost's avatar
      Change default for include_realm to 1 · 9a088417
      Stephen Frost authored
      The default behavior for GSS and SSPI authentication methods has long
      been to strip the realm off of the principal, however, this is not a
      secure approach in multi-realm environments and the use-case for the
      parameter at all has been superseded by the regex-based mapping support
      available in pg_ident.conf.
      
      Change the default for include_realm to be '1', meaning that we do
      NOT remove the realm from the principal by default.  Any installations
      which depend on the existing behavior will need to update their
      configurations (ideally by leaving include_realm set to 1 and adding a
      mapping in pg_ident.conf, but alternatively by explicitly setting
      include_realm=0 prior to upgrading).  Note that the mapping capability
      exists in all currently supported versions of PostgreSQL and so this
      change can be done today.  Barring that, existing users can update their
      configurations today to explicitly set include_realm=0 to ensure that
      the prior behavior is maintained when they upgrade.
      
      This needs to be noted in the release notes.
      
      Per discussion with Magnus and Peter.
      9a088417
    • Stephen Frost's avatar
      Modify pg_stat_get_activity to build a tuplestore · f91feba8
      Stephen Frost authored
      This updates pg_stat_get_activity() to build a tuplestore for its
      results instead of using the old-style multiple-call method.  This
      simplifies the function, though that wasn't the primary motivation for
      the change, which is that we may turn it into a helper function which
      can filter the results (or not) much more easily.
      f91feba8
    • Stephen Frost's avatar
      Bump catversion for pg_file_settings · 4b342fb5
      Stephen Frost authored
      Pointed out by Andres (thanks!)
      
      Apologies for not including it in the initial patch.
      4b342fb5
    • Stephen Frost's avatar
      Add pg_file_settings view and function · a97e0c33
      Stephen Frost authored
      The function and view added here provide a way to look at all settings
      in postgresql.conf, any #include'd files, and postgresql.auto.conf
      (which is what backs the ALTER SYSTEM command).
      
      The information returned includes the configuration file name, line
      number in that file, sequence number indicating when the parameter is
      loaded (useful to see if it is later masked by another definition of the
      same parameter), parameter name, and what it is set to at that point.
      This information is updated on reload of the server.
      
      This is unfiltered, privileged, information and therefore access is
      restricted to superusers through the GRANT system.
      
      Author: Sawada Masahiko, various improvements by me.
      Reviewers: David Steele
      a97e0c33
    • Andres Freund's avatar
      Fix two problems in infer_arbiter_indexes(). · bab64ef9
      Andres Freund authored
      The first is a pretty simple bug where a relcache entry is used after
      the relation is closed. In this particular situation it does not appear
      to have bad consequences unless compiled with RELCACHE_FORCE_RELEASE.
      
      The second is that infer_arbiter_indexes() skipped indexes that aren't
      yet valid according to indcheckxmin. That's not required here, because
      uniqueness checks don't care about visibility according to an older
      snapshot.  While thats not really a bug, it makes things undesirably
      non-deterministic.  There is some hope that this explains a test failure
      on buildfarm member jaguarundi.
      
      Discussion: 9096.1431102730@sss.pgh.pa.us
      bab64ef9
    • Heikki Linnakangas's avatar
      At promotion, archive last segment from old timeline with .partial suffix. · de768844
      Heikki Linnakangas authored
      Previously, we would archive the possible-incomplete WAL segment with its
      normal filename, but that causes trouble if the server owning that timeline
      is still running, and tries to archive the same segment later. It's not nice
      for the standby to trip up the master's archival like that. And it's pretty
      confusing, anyway, to have an incomplete segment in the archive that's
      indistinguishable from a normal, complete segment.
      
      To avoid such confusion, add a .partial suffix to the file. Or to be more
      precise, make a copy of the old segment under the .partial suffix, and
      archive that instead of the original file. pg_receivexlog also uses the
      .partial suffix for the same purpose, to tell apart incompletely streamed
      files from complete ones.
      
      There is no automatic mechanism to use the .partial files at recovery, so
      they will go unused, unless the administrator manually copies to them to
      the pg_xlog directory (and removes the .partial suffix). Recovery won't
      normally need the WAL - when recovering to the new timeline, it will find
      the same WAL on the first segment on the new timeline instead - but it
      nevertheless feels better to archive the file with the .partial suffix, for
      debugging purposes if nothing else.
      de768844
    • Heikki Linnakangas's avatar
      Add macros to check if a filename is a WAL segment or other such file. · 179cdd09
      Heikki Linnakangas authored
      We had many instances of the strlen + strspn combination to check for that.
      This makes the code a bit easier to read.
      179cdd09
    • Peter Eisentraut's avatar
      Fix whitespace · 16c73e77
      Peter Eisentraut authored
      16c73e77
    • Andres Freund's avatar
      Minor ON CONFLICT related comments and doc fixes. · e8898e91
      Andres Freund authored
      Geoff Winkless, Stephen Frost, Peter Geoghegan and me.
      e8898e91
    • Robert Haas's avatar
      Teach autovacuum about multixact member wraparound. · 53bb309d
      Robert Haas authored
      The logic introduced in commit b69bf30b
      and repaired in commits 669c7d20 and
      7be47c56 helps to ensure that we don't
      overwrite old multixact member information while it is still needed,
      but a user who creates many large multixacts can still exhaust the
      member space (and thus start getting errors) while autovacuum stands
      idly by.
      
      To fix this, progressively ramp down the effective value (but not the
      actual contents) of autovacuum_multixact_freeze_max_age as member space
      utilization increases.  This makes autovacuum more aggressive and also
      reduces the threshold for a manual VACUUM to perform a full-table scan.
      
      This patch leaves unsolved the problem of ensuring that emergency
      autovacuums are triggered even when autovacuum=off.  We'll need to fix
      that via a separate patch.
      
      Thomas Munro and Robert Haas
      53bb309d
    • Stephen Frost's avatar
      Remove reference to src/tools/backend/index.html · 195fbd40
      Stephen Frost authored
      src/tools/backend was removed back in 63f1ccd8, but
      backend/storage/lmgr/README didn't get the memo.
      
      Author: Amit Langote
      195fbd40
    • Andres Freund's avatar
      Remove dependency on ordering in logical decoding upsert test. · 581f4f96
      Andres Freund authored
      Buildfarm member magpie sorted the output differently than intended by
      Peter. "Resolve" the problem by simply not aggregating, it's not that
      many lines.
      581f4f96
    • Andres Freund's avatar
      Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE. · 168d5805
      Andres Freund authored
      The newly added ON CONFLICT clause allows to specify an alternative to
      raising a unique or exclusion constraint violation error when inserting.
      ON CONFLICT refers to constraints that can either be specified using a
      inference clause (by specifying the columns of a unique constraint) or
      by naming a unique or exclusion constraint.  DO NOTHING avoids the
      constraint violation, without touching the pre-existing row.  DO UPDATE
      SET ... [WHERE ...] updates the pre-existing tuple, and has access to
      both the tuple proposed for insertion and the existing tuple; the
      optional WHERE clause can be used to prevent an update from being
      executed.  The UPDATE SET and WHERE clauses have access to the tuple
      proposed for insertion using the "magic" EXCLUDED alias, and to the
      pre-existing tuple using the table name or its alias.
      
      This feature is often referred to as upsert.
      
      This is implemented using a new infrastructure called "speculative
      insertion". It is an optimistic variant of regular insertion that first
      does a pre-check for existing tuples and then attempts an insert.  If a
      violating tuple was inserted concurrently, the speculatively inserted
      tuple is deleted and a new attempt is made.  If the pre-check finds a
      matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
      If the insertion succeeds without detecting a conflict, the tuple is
      deemed inserted.
      
      To handle the possible ambiguity between the excluded alias and a table
      named excluded, and for convenience with long relation names, INSERT
      INTO now can alias its target table.
      
      Bumps catversion as stored rules change.
      
      Author: Peter Geoghegan, with significant contributions from Heikki
          Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
      Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
          Dean Rasheed, Stephen Frost and many others.
      168d5805
  5. 07 May, 2015 7 commits
    • Andres Freund's avatar
      Represent columns requiring insert and update privileges indentently. · 2c8f4836
      Andres Freund authored
      Previously, relation range table entries used a single Bitmapset field
      representing which columns required either UPDATE or INSERT privileges,
      despite the fact that INSERT and UPDATE privileges are separately
      cataloged, and may be independently held.  As statements so far required
      either insert or update privileges but never both, that was
      sufficient. The required permission could be inferred from the top level
      statement run.
      
      The upcoming INSERT ... ON CONFLICT UPDATE feature needs to
      independently check for both privileges in one statement though, so that
      is not sufficient anymore.
      
      Bumps catversion as stored rules change.
      
      Author: Peter Geoghegan
      Reviewed-By: Andres Freund
      2c8f4836
    • Alvaro Herrera's avatar
      Improve BRIN infra, minmax opclass and regression test · db5f98ab
      Alvaro Herrera authored
      The minmax opclass was using the wrong support functions when
      cross-datatypes queries were run.  Instead of trying to fix the
      pg_amproc definitions (which apparently is not possible), use the
      already correct pg_amop entries instead.  This requires jumping through
      more hoops (read: extra syscache lookups) to obtain the underlying
      functions to execute, but it is necessary for correctness.
      
      Author: Emre Hasegeli, tweaked by Álvaro
      Review: Andreas Karlsson
      
      Also change BrinOpcInfo to record each stored type's typecache entry
      instead of just the OID.  Turns out that the full type cache is
      necessary in brin_deform_tuple: the original code used the indexed
      type's byval and typlen properties to extract the stored tuple, which is
      correct in Minmax; but in other implementations that want to store
      something different, that's wrong.  The realization that this is a bug
      comes from Emre also, but I did not use his patch.
      
      I also adopted Emre's regression test code (with smallish changes),
      which is more complete.
      db5f98ab
    • Robert Haas's avatar
      Fix incorrect math in DetermineSafeOldestOffset. · 7be47c56
      Robert Haas authored
      The old formula didn't have enough parentheses, so it would do the wrong
      thing, and it used / rather than % to find a remainder.  The effect of
      these oversights is that the stop point chosen by the logic introduced in
      commit b69bf30b might be rather
      meaningless.
      
      Thomas Munro, reviewed by Kevin Grittner, with a whitespace tweak by me.
      7be47c56
    • Bruce Momjian's avatar
      Makefile: Add comment that doc uninstall clears man directories · 82ec7c95
      Bruce Momjian authored
      Report by Mario Valdez
      82ec7c95
    • Magnus Hagander's avatar
      Properly send SCM status updates when shutting down service on Windows · 1a241d22
      Magnus Hagander authored
      The Service Control Manager should be notified regularly during a shutdown
      that takes a long time. Previously we would increaes the counter, but forgot
      to actually send the notification to the system. The loop counter was also
      incorrectly initalized in the event that the startup of the system took long
      enough for it to increase, which could cause the shutdown process not to wait
      as long as expected.
      
      Krystian Bigaj, reviewed by Michael Paquier
      1a241d22
    • Magnus Hagander's avatar
      Fix indentation that could mask a future bug · d678bde6
      Magnus Hagander authored
      Michael Paquier, spotted using Coverity
      d678bde6
    • Magnus Hagander's avatar
      Fix minor resource leak in pg_dump · aa7cf3ee
      Magnus Hagander authored
      Michael Paquier, spotted using Coverity
      aa7cf3ee
  6. 06 May, 2015 1 commit
  7. 05 May, 2015 7 commits
    • Tom Lane's avatar
    • Tom Lane's avatar
      Fix incorrect declaration of citext's regexp_matches() functions. · b22527f2
      Tom Lane authored
      These functions should return SETOF TEXT[], like the core functions they
      are wrappers for; but they were incorrectly declared as returning just
      TEXT[].  This mistake had two results: first, if there was no match you got
      a scalar null result, whereas what you should get is an empty set (zero
      rows).  Second, the 'g' flag was effectively ignored, since you would get
      only one result array even if there were multiple matches, as reported by
      Jeff Certain.
      
      While ignoring 'g' is a clear bug, the behavior for no matches might well
      have been thought to be the intended behavior by people who hadn't compared
      it carefully to the core regexp_matches() functions.  So we should tread
      carefully about introducing this change in the back branches.  Still, it
      clearly is a bug and so providing some fix is desirable.
      
      After discussion, the conclusion was to introduce the change in a 1.1
      version of the citext extension (as we would need to do anyway); 1.0 still
      contains the incorrect behavior.  1.1 is the default and only available
      version in HEAD, but it is optional in the back branches, where 1.0 remains
      the default version.  People wishing to adopt the fix in back branches will
      need to explicitly do ALTER EXTENSION citext UPDATE TO '1.1'.  (I also
      provided a downgrade script in the back branches, so people could go back
      to 1.0 if necessary.)
      
      This should be called out as an incompatible change in the 9.5 release
      notes, although we'll also document it in the next set of back-branch
      release notes.  The notes should mention that any views or rules that use
      citext's regexp_matches() functions will need to be dropped before
      upgrading to 1.1, and then recreated again afterwards.
      
      Back-patch to 9.1.  The bug goes all the way back to citext's introduction
      in 8.4, but pre-9.1 there is no extension mechanism with which to manage
      the change.  Given the lack of previous complaints it seems unnecessary to
      change this behavior in 9.0, anyway.
      b22527f2
    • Peter Eisentraut's avatar
    • Alvaro Herrera's avatar
      Add geometry/range functions to support BRIN inclusion · 3b6db1f4
      Alvaro Herrera authored
      This commit adds the following functions:
          box(point) -> box
          bound_box(box, box) -> box
          inet_same_family(inet, inet) -> bool
          inet_merge(inet, inet) -> cidr
          range_merge(anyrange, anyrange) -> anyrange
      
      The first of these is also used to implement a new assignment cast from
      point to box.
      
      These functions are the first part of a base to implement an "inclusion"
      operator class for BRIN, for multidimensional data types.
      
      Author: Emre Hasegeli
      Reviewed by: Andreas Karlsson
      3b6db1f4
    • Robert Haas's avatar
      Fix some problems with patch to fsync the data directory. · 456ff086
      Robert Haas authored
      pg_win32_is_junction() was a typo for pgwin32_is_junction().  open()
      was used not only in a two-argument form, which breaks on Windows,
      but also where BasicOpenFile() should have been used.
      
      Per reports from Andrew Dunstan and David Rowley.
      456ff086
    • Peter Eisentraut's avatar
      hstore_plpython: Support tests on Python 2.3 · c0574cd5
      Peter Eisentraut authored
      Python 2.3 does not have the sorted() function, so do it the long way.
      c0574cd5
    • Peter Eisentraut's avatar
      Fix typos · ad8d6d06
      Peter Eisentraut authored
      Author: Erik Rijkers <er@xs4all.nl>
      ad8d6d06
  8. 04 May, 2015 3 commits
    • Robert Haas's avatar
      Use outerPlanState macro instead of referring to leffttree. · 40f42d2a
      Robert Haas authored
      This makes the executor code more consistent.  It also removes
      an apparently superfluous NULL test in nodeGroup.c.
      
      Qingqing Zhou, reviewed by Tom Lane, and further revised by me.
      40f42d2a
    • Tom Lane's avatar
      Improve procost estimates for some text search functions. · 2503982b
      Tom Lane authored
      The text search functions that involve parsing raw text into lexemes are
      remarkably CPU-intensive, so estimating them at the same cost as most other
      built-in functions seems like a mistake; moreover, doing so turns out to
      discourage the optimizer from using functional indexes on these functions.
      After some debate, we've agreed to raise procost from 1 to 100 for
      to_tsvector(), plainto_tsvector(), to_tsquery(), ts_headline(),
      ts_match_tt(), and ts_match_tq(), which are all the text search functions
      that parse raw text.
      
      Also increase procost for the 2-argument form of ts_rewrite()
      (tsquery_rewrite_query); while this function doesn't do text parsing,
      it does execute a user-supplied SQL query, so its previous procost of 1 is
      clearly a drastic underestimate.  It seems reasonable to assign it the same
      cost we assign to PL functions by default, so 100 is the number here too.
      
      I did not bother bumping catversion for this change, since it does not
      break catalog compatibility with the server executable nor result in
      any regression test changes.
      
      Per complaint from Andrew Gierth and subsequent discussion.
      2503982b
    • Robert Haas's avatar
      Recursively fsync() the data directory after a crash. · 2ce439f3
      Robert Haas authored
      Otherwise, if there's another crash, some writes from after the first
      crash might make it to disk while writes from before the crash fail
      to make it to disk.  This could lead to data corruption.
      
      Back-patch to all supported versions.
      
      Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
      by me.
      2ce439f3