  1. 11 Mar, 2016 8 commits
    • psql: Don't automatically use expanded format when there's 1 column. · 69ab7b9d
      Robert Haas authored
      Andreas Karlsson and Robert Haas
      69ab7b9d
    • Fix a typo, and remove unnecessary pgstat_report_wait_end(). · 481c76ab
      Robert Haas authored
      Per Amit Kapila.
      481c76ab
    • Refactor receivelog.c parameters · 38c83c9b
      Magnus Hagander authored
      Much cruft had accumulated over time with a large number of parameters
      passed down between functions very deep. With this refactoring, instead
      introduce a StreamCtl structure that holds the parameters, and pass around
      a pointer to this structure instead. This makes it much easier to add or
      remove fields that are needed deeper down in the implementation without
      having to modify every function header in the file.
      
      Patch by me after much nagging from Andres
      Reviewed by Craig Ringer and Daniel Gustafsson
      38c83c9b
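The refactoring pattern described above, bundling parameters into one structure and passing a pointer down the call chain, might look roughly like the following. This is only a sketch: the commit message does not list the real StreamCtl fields, so the struct name `StreamParams`, its fields, and the three functions are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter bundle (not the real StreamCtl layout). */
typedef struct StreamParams
{
    uint64_t    start_pos;      /* position to start streaming from */
    uint32_t    timeline;       /* timeline to stream */
    const char *basedir;        /* where received data is written */
    bool        synchronous;    /* flush after every write? */
} StreamParams;

/* Deepest level: only this function cares about basedir. */
static bool
write_segment(const StreamParams *p)
{
    return p->basedir != NULL;
}

/* Middle level: forwards the pointer, no parameter list to maintain. */
static bool
process_stream(const StreamParams *p)
{
    return write_segment(p);
}

/* Entry point: adding a field to StreamParams changes no signatures. */
bool
receive_stream(const StreamParams *p)
{
    return process_stream(p);
}
```

With this shape, a new field needed only by the deepest function is added to the struct once, and the intermediate function headers stay unchanged.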
    • Allow emit_log_hook to see original message text · 73e7e49d
      Simon Riggs authored
      emit_log_hook could only see the translated text, making it harder to identify
      which message was being sent. Pass original text to allow the exact message to
      be identified, whichever language is used for logging.
      
      Discussion: 20160216.184755.59721141.horiguchi.kyotaro@lab.ntt.co.jp
      Author: Kyotaro Horiguchi
      73e7e49d
    • Simplify GetLockNameFromTagType. · a414d96a
      Robert Haas authored
      The old code is wrong, because it returns a pointer to an automatic
      variable.  And it's also more clever than we really need to be
      considering that the case it's worrying about should never happen.
      a414d96a
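The bug class named in this message, returning a pointer to an automatic variable, and a simpler lookup that avoids it can be sketched as follows. The table contents and the function name `lock_tag_name` are assumptions made for illustration; this is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock tag names, for illustration only. */
static const char *const lock_tag_names[] = {
    "relation", "extend", "page", "tuple"
};

#define N_LOCK_TAGS (sizeof(lock_tag_names) / sizeof(lock_tag_names[0]))

/*
 * Broken pattern (shown only as a comment): buf is an automatic variable,
 * so the returned pointer dangles as soon as the function returns.
 *
 *     const char *bad_name(unsigned tag)
 *     {
 *         char buf[32];
 *         snprintf(buf, sizeof(buf), "unknown %u", tag);
 *         return buf;
 *     }
 */

/* Simpler alternative: only ever return pointers to static storage. */
const char *
lock_tag_name(unsigned tag)
{
    if (tag < N_LOCK_TAGS)
        return lock_tag_names[tag];
    return "???";               /* out-of-range tag: should never happen */
}
```

Because every return value points at a string with static storage duration, the caller can keep the pointer for as long as it likes.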
    • Blindly try to fix dtrace enabled builds, broken in 9cd00c45. · c94f0c29
      Andres Freund authored
      Reported-By: Peter Eisentraut
      Discussion: 56E2239E.1050607@gmx.net
      c94f0c29
    • Checkpoint sorting and balancing. · 9cd00c45
      Andres Freund authored
      Up to now checkpoints were written in the order they're in the
      BufferDescriptors. That's nearly random in a lot of cases, which
      performs badly on rotating media, but even on SSDs it causes slowdowns.
      
      To avoid that, sort checkpoints before writing them out. We currently
      sort by tablespace, relfilenode, fork and block number.
      
      One of the major reasons that previously wasn't done, was fear of
      imbalance between tablespaces. To address that, balance writes between
      tablespaces.
      
      The other prime concern was that the relatively large allocation to sort
      the buffers in might fail, preventing checkpoints from happening. Thus
      pre-allocate the required memory in shared memory, at server startup.
      
      This particularly makes it more efficient to have checkpoint flushing
      enabled, because that'll often result in a lot of writes that can be
      coalesced into one flush.
      
      Discussion: alpine.DEB.2.10.1506011320000.28433@sto
      Author: Fabien Coelho and Andres Freund
      9cd00c45
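The sort order named in this message (tablespace, relfilenode, fork, block number) can be illustrated with a plain `qsort` comparator. This is a sketch only: the `BufSortItem` struct and the function names are hypothetical, and the real patch works on preallocated shared memory rather than an ordinary array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sort entry describing one dirty buffer. */
typedef struct BufSortItem
{
    uint32_t tablespace;
    uint32_t relfilenode;
    int      fork;
    uint32_t block;
} BufSortItem;

/* Three-way comparison that cannot overflow, unlike "a - b". */
static int
cmp_u32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/* Order by tablespace, then relfilenode, then fork, then block number. */
static int
buf_item_cmp(const void *pa, const void *pb)
{
    const BufSortItem *a = pa;
    const BufSortItem *b = pb;
    int         c;

    if ((c = cmp_u32(a->tablespace, b->tablespace)) != 0)
        return c;
    if ((c = cmp_u32(a->relfilenode, b->relfilenode)) != 0)
        return c;
    if ((c = (a->fork > b->fork) - (a->fork < b->fork)) != 0)
        return c;
    return cmp_u32(a->block, b->block);
}

void
sort_checkpoint_buffers(BufSortItem *items, size_t n)
{
    qsort(items, n, sizeof(BufSortItem), buf_item_cmp);
}
```

After sorting, blocks of the same relation fork are adjacent and ascending, which is what lets neighbouring writes be coalesced.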
    • Allow to trigger kernel writeback after a configurable number of writes. · 428b1d6b
      Andres Freund authored
      Currently writes to the main data files of postgres all go through the
      OS page cache. This means that some operating systems can end up
      collecting a large number of dirty buffers in their respective page
      caches.  When these dirty buffers are flushed to storage rapidly, be it
      because of fsync(), timeouts, or dirty ratios, latency for other reads
      and writes can increase massively.  This is the primary reason for
      regular massive stalls observed in real world scenarios and artificial
      benchmarks; on rotating disks stalls on the order of hundreds of seconds
      have been observed.
      
      On linux it is possible to control this by reducing the global dirty
      limits significantly, reducing the above problem. But global
      configuration is rather problematic because it'll affect other
      applications; also PostgreSQL itself doesn't always want this
      behavior, e.g. for temporary files it's undesirable.
      
      Several operating systems allow some control over the kernel page
      cache. Linux has sync_file_range(2), several posix systems have msync(2)
      and posix_fadvise(2). sync_file_range(2) is preferable because it
      requires no special setup, whereas msync() requires the to-be-flushed
      range to be mmap'ed. For the purpose of flushing dirty data
      posix_fadvise(2) is the worst alternative, as flushing dirty data is
      just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
      from the page cache.  Thus the feature is enabled by default only on
      linux, but can be enabled on all systems that have any of the above
      APIs.
      
      While desirable and likely possible this patch does not contain an
      implementation for windows.
      
      With the infrastructure added, writes made via checkpointer, bgwriter
      and normal user backends can be flushed after a configurable number of
      writes. Each of these sources of writes is controlled by a separate GUC,
      checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
      respectively; they're separate because the number of flushes that is
      good differs for each, and because the performance considerations of
      controlled flushing for each of these are different.
      
      A later patch will add checkpoint sorting - after that flushes from the
      checkpoint will almost always be desirable. Bgwriter flushes are most of
      the time going to be random, which are slow on lots of storage hardware.
      Flushing in backends works well if the storage and bgwriter can keep up,
      but if not it can have negative consequences.  This patch is likely to
      have negative performance consequences without checkpoint sorting, but
      unfortunately so has sorting without flush control.
      
      Discussion: alpine.DEB.2.10.1506011320000.28433@sto
      Author: Fabien Coelho and Andres Freund
      428b1d6b
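The idea of asking the kernel to start writeback after a configurable number of writes can be sketched for a single, sequentially appended file as below. Everything here is an assumption made for illustration: the `FlushTracker` struct, the function names, and the simplification to append-only writes are not taken from the patch. On systems without `sync_file_range(2)` the sketch only does the bookkeeping.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one sequentially written file. */
typedef struct FlushTracker
{
    int     fd;
    int     flush_after;        /* request writeback after N writes; 0 = off */
    int     pending_writes;     /* writes since the last request */
    off_t   pending_start;      /* start of the not-yet-requested range */
    off_t   pending_bytes;      /* length of that range */
    int     flushes_issued;     /* number of writeback requests made */
} FlushTracker;

static void
issue_writeback(FlushTracker *t)
{
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
    /* Start writeback without waiting; it is a hint, so ignore the result. */
    (void) sync_file_range(t->fd, t->pending_start, t->pending_bytes,
                           SYNC_FILE_RANGE_WRITE);
#endif
    t->flushes_issued++;
    t->pending_start += t->pending_bytes;
    t->pending_bytes = 0;
    t->pending_writes = 0;
}

/* Append len bytes and request writeback once enough writes accumulated. */
ssize_t
tracked_append(FlushTracker *t, const void *buf, size_t len)
{
    ssize_t n = pwrite(t->fd, buf, len, t->pending_start + t->pending_bytes);

    if (n <= 0)
        return n;
    t->pending_bytes += n;
    t->pending_writes++;
    if (t->flush_after > 0 && t->pending_writes >= t->flush_after)
        issue_writeback(t);
    return n;
}
```

Only `SYNC_FILE_RANGE_WRITE` is passed, so the call starts writeback without waiting for it to finish and leaves the pages in the page cache, in contrast to `POSIX_FADV_DONTNEED`, which the message notes also evicts them.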
  2. 10 Mar, 2016 14 commits
    • Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too. · c82c92b1
      Tom Lane authored
      All along, this function should have treated WindowFuncs in a manner
      similar to Aggrefs, ie with an option whether or not to recurse into them.
      By not considering the case, it was always recursing, which is OK for most
      callers (although I suspect that the case in prepare_sort_from_pathkeys
      might represent a bug).  But now we need return-without-recursing behavior
      as well.  There are also more than a few callers that should never see a
      WindowFunc, and now we'll get some error checking on that.
      c82c92b1
    • Don't vacuum all-frozen pages. · fd31cd26
      Robert Haas authored
      Commit a892234f gave us enough
      infrastructure to avoid vacuuming pages where every tuple on the
      page is already frozen.  So, replace the notion of a scan_all or
      whole-table vacuum with the less onerous notion of an "aggressive"
      vacuum, which will scan pages that are all-visible, but still skip those
      that are all-frozen.
      
      This should greatly reduce the cost of anti-wraparound vacuuming
      on large clusters where the majority of data is never touched
      between one cycle and the next, because we'll no longer have to
      read all of those pages only to find out that we don't need to
      do anything with them.
      
      Patch by me, reviewed by Masahiko Sawada.
      fd31cd26
    • Refactor pull_var_clause's API to make it less tedious to extend. · 364a9f47
      Tom Lane authored
      In commit 1d97c19a and later c1d9579d, we extended
      pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
      to maintain, though, because it means every time we add a new behavior we
      must touch every last one of the call sites, even if there's a reasonable
      default behavior that most of them could use.  Let's switch over to using a
      bitmask of flags, instead; that seems more maintainable and might save a
      nanosecond or two as well.  This commit changes no behavior in itself,
      though I'm going to follow it up with one that does add a new behavior.
      
      In passing, remove flatten_tlist(), which has not been used since 9.1
      and would otherwise need the same API changes.
      
      Removing these enums means that optimizer/tlist.h no longer needs to
      depend on optimizer/var.h.  Changing that caused a number of C files to
      need addition of #include "optimizer/var.h" (probably we can thank old
      runs of pgrminclude for that); but on balance it seems like a good change
      anyway.
      364a9f47
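The switch from one enum argument per behavior to a single bitmask of flags can be sketched as follows. The flag names, the `AggAction` enum, and `agg_action()` are hypothetical stand-ins chosen for this illustration, not the identifiers used in the PostgreSQL source.

```c
#include <assert.h>

/* Hypothetical flag bits; each node type gets an "include" and a "recurse" bit. */
#define WALK_INCLUDE_AGGS   0x0001  /* return aggregates as-is */
#define WALK_RECURSE_AGGS   0x0002  /* descend into aggregate arguments */
#define WALK_INCLUDE_PHVS   0x0004  /* return placeholders as-is */
#define WALK_RECURSE_PHVS   0x0008  /* descend into placeholder contents */

typedef enum
{
    AGG_REJECT,                 /* default when neither bit is set */
    AGG_INCLUDE,
    AGG_RECURSE
} AggAction;

/* Decode the aggregate-related bits of a flags word. */
AggAction
agg_action(int flags)
{
    /* The two aggregate bits are mutually exclusive. */
    assert((flags & (WALK_INCLUDE_AGGS | WALK_RECURSE_AGGS))
           != (WALK_INCLUDE_AGGS | WALK_RECURSE_AGGS));

    if (flags & WALK_INCLUDE_AGGS)
        return AGG_INCLUDE;
    if (flags & WALK_RECURSE_AGGS)
        return AGG_RECURSE;
    return AGG_REJECT;
}
```

A caller that is happy with the defaults passes `0`, so adding a new pair of bits later does not force every call site to change.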
    • Rework wait for AccessExclusiveLocks on Hot Standby · 37c54863
      Simon Riggs authored
      Earlier version committed in 9.0 caused spurious waits in some cases.
      New infrastructure for lock waits in 9.3 used to correct and improve this.
      
      Jeff Janes based upon a proposal by Simon Riggs, who also reviewed
      Additional review comments from Amit Kapila
      37c54863
    • Provide much better wait information in pg_stat_activity. · 53be0b1a
      Robert Haas authored
      When a process is waiting for a heavyweight lock, we will now indicate
      the type of heavyweight lock for which it is waiting.  Also, you can
      now see when a process is waiting for a lightweight lock - in which
      case we will indicate the individual lock name or the tranche, as
      appropriate - or for a buffer pin.
      
      Amit Kapila, Ildus Kurbangaliev, reviewed by me.  Lots of helpful
      discussion and suggestions by many others, including Alexander
      Korotkov, Vladimir Borodin, and many others.
      53be0b1a
    • Document BRIN a bit more thoroughly · a3a8309d
      Alvaro Herrera authored
      The chapter "Interfacing Extensions To Indexes" and CREATE OPERATOR
      CLASS reference page were missed when BRIN was added.  We document
      all our other index access methods there, so make sure BRIN complies.
      
      Author: Álvaro Herrera
      Reported-By: Julien Rouhaud, Tom Lane
      Reviewed-By: Emre Hasegeli
      Discussion: https://www.postgresql.org/message-id/56CF604E.9000303%40dalibo.com
      Backpatch: 9.5, where BRIN was introduced
      a3a8309d
    • Avoid crash on old Windows with AVX2-capable CPU for VS2013 builds · 9d903882
      Magnus Hagander authored
      The Visual Studio 2013 CRT generates invalid code when it makes a 64-bit
      build that is later used on a CPU that supports AVX2 instructions using a
      version of Windows before 7SP1/2008R2SP1.
      
      Detect this combination, and in those cases turn off the generation of
      FMA3, per recommendation from the Visual Studio team.
      
      The bug is actually in the CRT shipping with Visual Studio 2013, but
      Microsoft have stated they're only fixing it in newer major versions.
      The fix is therefore conditioned specifically on being built with this
      version of Visual Studio, and not previous or later versions.
      
      Author: Christian Ullrich
      9d903882
    • Reduce size of two phase file header · e0694cf9
      Simon Riggs authored
      Previously 2PC header was fixed at 200 bytes, which in most cases wasted
      WAL space for a workload using 2PC heavily.
      
      Pavan Deolasee, reviewed by Petr Jelinek
      e0694cf9
    • Reduce lock level for altering fillfactor · fcb4bfdd
      Simon Riggs authored
      Fabrízio de Royes Mello and Simon Riggs
      fcb4bfdd
    • Code review for b6fb6471. · 090b287f
      Robert Haas authored
      Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev.
      Patch by Amit Langote.
      090b287f
    • Remove a couple of useless pstrdup() calls. · cc402116
      Tom Lane authored
      There's no point in pstrdup'ing the result of TextDatumGetCString,
      since that's necessarily already a freshly-palloc'd C string.
      
      These particular calls are unlikely to be of any consequence
      performance-wise, but still they're a bad precedent that can confuse
      future patch authors.
      
      Noted by Chapman Flack.
      cc402116
    • Avoid unlikely data-loss scenarios due to rename() without fsync. · 1d4a0ab1
      Andres Freund authored
      Renaming a file using rename(2) is not guaranteed to be durable in face
      of crashes. Use the previously added durable_rename()/durable_link_or_rename()
      in various places where we previously just renamed files.
      
      Most of the changed call sites are arguably not critical, but it seems
      better to err on the side of too much durability.  The most prominent
      known case where the previously missing fsyncs could cause data loss is
      crashes at the end of a checkpoint. After the actual checkpoint has been
      performed, old WAL files are recycled. When they're filled, their
      contents are fdatasynced, but we did not fsync the containing
      directory. An OS/hardware crash in an unfortunate moment could then end
      up leaving that file with its old name, but new content; WAL replay
      would thus not replay it.
      
      Reported-By: Tomas Vondra
      Author: Michael Paquier, Tomas Vondra, Andres Freund
      Discussion: 56583BDD.9060302@2ndquadrant.com
      Backpatch: All supported branches
      1d4a0ab1
    • Introduce durable_rename() and durable_link_or_rename(). · 606e0f98
      Andres Freund authored
      Renaming a file using rename(2) is not guaranteed to be durable in face
      of crashes; especially on filesystems like xfs and ext4 when mounted
      with data=writeback. To be certain that a rename() atomically replaces
      the previous file contents in the face of crashes and different
      filesystems, one has to fsync the old filename, rename the file, fsync
      the new filename, fsync the containing directory.  This sequence is not
      generally adhered to currently; which exposes us to data loss risks. To
      avoid having to repeat this arduous sequence, introduce
      durable_rename(), which wraps all that.
      
      Also add durable_link_or_rename(). Several places use link() (with a
      fallback to rename()) to rename a file, trying to avoid replacing the
      target file out of paranoia. Some of those rename sequences need to be
      durable as well. There seems little reason to extend several copies of the
      same logic, so centralize the link() callers.
      
      This commit does not yet make use of the new functions; they're used in
      a followup commit.
      
      Author: Michael Paquier, Andres Freund
      Discussion: 56583BDD.9060302@2ndquadrant.com
      Backpatch: All supported branches
      606e0f98
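The sequence spelled out in this message (fsync the old file, rename it, fsync the new name, fsync the containing directory) can be sketched with plain POSIX calls. This is not the PostgreSQL implementation: `durable_rename_sketch()` and `fsync_path()` are hypothetical names, error reporting is reduced to a return code, and fsync on a directory descriptor is assumed to be supported, as it is on Linux.

```c
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open path with the given flags, fsync it, and close it again. */
static int
fsync_path(const char *path, int flags)
{
    int fd = open(path, flags);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/* Rename oldpath to newpath so that the result survives a crash. */
int
durable_rename_sketch(const char *oldpath, const char *newpath)
{
    char dirbuf[4096];

    /* 1. Make sure the file contents are on disk under the old name. */
    if (fsync_path(oldpath, O_RDWR) != 0)
        return -1;

    /* 2. Perform the rename itself. */
    if (rename(oldpath, newpath) != 0)
        return -1;

    /* 3. Flush the file again under its new name. */
    if (fsync_path(newpath, O_RDWR) != 0)
        return -1;

    /* 4. Flush the directory entry; dirname() may modify its argument. */
    if (strlen(newpath) >= sizeof(dirbuf))
        return -1;
    strcpy(dirbuf, newpath);
    return fsync_path(dirname(dirbuf), O_RDONLY);
}
```

A caller would use it wherever a bare `rename()` was used before, for example `durable_rename_sketch("segment.tmp", "segment")`, and treat a nonzero result as a failure to make the rename durable.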
    • doc: Reorganize pg_resetxlog reference page · e19e4cf0
      Peter Eisentraut authored
      The pg_resetxlog reference page didn't have a proper options list, only
      running text listing the options and some explanations of them.  This
      might have worked when there were only a few options, but the list has
      grown over the releases, and now it's hard to find an option and its
      associated explanation.  So write out the options list as on other
      reference pages.
      e19e4cf0
  3. 09 Mar, 2016 18 commits
    • PostgresNode: add backup_fs_hot and backup_fs_cold · 28f6df3c
      Alvaro Herrera authored
      These simple methods rely on RecursiveCopy to create a filesystem-level
      backup of a server.  They aren't currently used anywhere yet, but will be
      useful for future tests.
      
      Author: Craig Ringer
      Reviewed-By: Michael Paquier, Salvador Fandino, Álvaro Herrera
      Commitfest-URL: https://commitfest.postgresql.org/9/569/
      28f6df3c
    • Add filter capability to RecursiveCopy::copypath · a31aaec4
      Alvaro Herrera authored
      This allows skipping copying certain files and subdirectories in tests.
      This is useful in some circumstances such as copying a data directory;
      future tests want this feature.
      
      Also POD-ify the module.
      
      Authors: Craig Ringer, Pallavi Sontakke
      Reviewed-By: Álvaro Herrera
      a31aaec4
    • Fix incorrect handling of NULL index entries in indexed ROW() comparisons. · a298a1e0
      Tom Lane authored
      An index search using a row comparison such as ROW(a, b) > ROW('x', 'y')
      would stop upon reaching a NULL entry in the "b" column, ignoring the
      fact that there might be non-NULL "b" values associated with later values
      of "a".  This happens because _bt_mark_scankey_required() marks the
      subsidiary scankey for "b" as required, which is just wrong: it's for
      a column after the one with the first inequality key (namely "a"), and
      thus can't be considered a required match.
      
      This bit of brain fade dates back to the very beginnings of our support
      for indexed ROW() comparisons, in 2006.  Kind of astonishing that no one
      came across it before Glen Takahashi, in bug #14010.
      
      Back-patch to all supported versions.
      
      Note: the given test case doesn't actually fail in unpatched 9.1, evidently
      because the fix for bug #6278 (i.e., stopping at nulls in either scan
      direction) is required to make it fail.  I'm sure I could devise a case
      that fails in 9.1 as well, perhaps with something involving making a cursor
      back up; but it doesn't seem worth the trouble.
      a298a1e0
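Why a NULL in the second column must not end the scan follows from how a row comparison is evaluated: the second column only matters when the first columns are equal. The sketch below models ROW(a, b) > ROW(x, y) with integers instead of strings and three-valued logic; the types and the function `row_gt()` are hypothetical and have nothing to do with the btree code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nullable column value. */
typedef struct NullableInt
{
    bool is_null;
    int  value;
} NullableInt;

typedef enum { TV_FALSE, TV_TRUE, TV_UNKNOWN } TriBool;

/* Evaluate ROW(a, b) > ROW(x, y), where only b may be NULL. */
TriBool
row_gt(int a, NullableInt b, int x, int y)
{
    /* The first unequal column decides the result on its own. */
    if (a != x)
        return (a > x) ? TV_TRUE : TV_FALSE;

    /* Only when a = x does b matter, and a NULL b makes the result unknown. */
    if (b.is_null)
        return TV_UNKNOWN;
    return (b.value > y) ? TV_TRUE : TV_FALSE;
}
```

For the bound ROW(1, 0), an index entry (1, NULL) yields unknown, but a later entry (2, NULL) or (2, 5) still satisfies the comparison, so the scan has to continue past the NULL.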
    • Re-pgindent vacuumlazy.c. · be060cbc
      Robert Haas authored
      be060cbc
    • pgbench: When -T is used, don't wait for transactions beyond end of run. · accf7616
      Robert Haas authored
      At low rates, this can lead to pgbench taking significantly longer to
      terminate than the user might expect.  Repair.
      
      Fabien Coelho, reviewed by Aleksander Alekseev, Álvaro Herrera, and me.
      accf7616
    • pgcrypto: support changing S2K iteration count · 188f359d
      Alvaro Herrera authored
      pgcrypto already supports key-stretching during symmetric encryption,
      including the salted-and-iterated method; but the number of iterations
      was not configurable.  This commit implements a new s2k-count parameter
      to pgp_sym_encrypt() which permits selecting a larger number of
      iterations.
      
      Author: Jeff Janes
      188f359d
    • Add a generic command progress reporting facility. · b6fb6471
      Robert Haas authored
      Using this facility, any utility command can report the target relation
      upon which it is operating, if there is one, and up to 10 64-bit
      counters; the intent of this is that users should be able to figure out
      what a utility command is doing without having to resort to ugly hacks
      like attaching strace to a backend.
      
      As a demonstration, this adds very crude reporting to lazy vacuum; we
      just report the target relation and nothing else.  A forthcoming patch
      will make VACUUM report a bunch of additional data that will make this
      much more interesting.  But this gets the basic framework in place.
      
      Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by
      Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao,
      and Masanori Oyama.
      b6fb6471
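The shape of such a facility, one target relation plus up to 10 64-bit counters per command, can be sketched as a small fixed-size record. The struct, constant, and function names below are assumptions for illustration; the real facility lives in shared backend status and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_PROGRESS_PARAMS 10    /* the message says "up to 10 64-bit counters" */

/* Hypothetical per-command progress record. */
typedef struct CommandProgress
{
    uint32_t target_rel;                    /* 0 means "no target relation" */
    int64_t  param[N_PROGRESS_PARAMS];      /* command-defined counters */
} CommandProgress;

/* Begin reporting: clear all counters and record the target relation. */
void
progress_start(CommandProgress *p, uint32_t target_rel)
{
    memset(p, 0, sizeof(*p));
    p->target_rel = target_rel;
}

/* Set one counter; returns -1 if the index is out of range. */
int
progress_update(CommandProgress *p, int index, int64_t value)
{
    if (index < 0 || index >= N_PROGRESS_PARAMS)
        return -1;
    p->param[index] = value;
    return 0;
}
```

A fixed number of generic counters keeps the record the same size for every command, at the cost of each command having to document what its slots mean.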
    • Fix incorrect tlist generation in create_gather_plan(). · 8776c15c
      Tom Lane authored
      This function is written as though Gather doesn't project; but it does.
      Even if it did not project, though, we must use build_path_tlist to ensure
      that the output columns receive correct sortgroupref labeling.
      
      Per report from Amit Kapila.
      8776c15c
    • postgres_fdw: Consider foreign joining and foreign sorting together. · aa09cd24
      Robert Haas authored
      Commit ccd8f979 gave us the ability to
      request that the remote side sort the data, and, later, commit
      e4106b25 gave us the ability to
      request that the remote side perform the join for us rather than doing
      it locally.  But we could not do both things at the same time: a
      remote SQL query that had an ORDER BY clause would never be a join.
      This commit adds that capability.
      
      Ashutosh Bapat, reviewed by me.
      aa09cd24
    • Fix copy-and-pasteo in comment. · d31f20e2
      Tom Lane authored
      Wensheng Zhang
      d31f20e2
    • Improve handling of pathtargets in planner.c. · 51c0f63e
      Tom Lane authored
      Refactor so that the internal APIs in planner.c deal in PathTargets not
      targetlists, and establish a more regular structure for deriving the
      targets needed for successive steps.
      
      There is more that could be done here; calculating the eval costs of each
      successive target independently is both inefficient and wrong in detail,
      since we won't actually recompute values available from the input node's
      tlist.  But it's no worse than what happened before the pathification
      rewrite.  In any case this seems like a good starting point for considering
      how to handle Konstantin Knizhnik's function-evaluation-postponement patch.
      51c0f63e
    • Add valgrind suppressions for python code. · 2f1f4439
      Andres Freund authored
      Python's allocator does some low-level tricks for efficiency;
      unfortunately they trigger valgrind errors. Those tricks can be disabled
      making instrumentation easier; but few people testing postgres will have
      such a build of python. So add broad suppressions of the resulting
      errors.
      
      See also https://svn.python.org/projects/python/trunk/Misc/README.valgrind
      
      This possibly will suppress valid errors, but without it it's basically
      impossible to use valgrind with plpython code.
      
      Author: Andres Freund
      Backpatch: 9.4, where we started to maintain valgrind suppressions
      2f1f4439
    • Add valgrind suppressions for bootstrap related code. · 5e43bee8
      Andres Freund authored
      Author: Andres Freund
      Backpatch: 9.4, where we started to maintain valgrind suppressions
      5e43bee8
    • Improve handling of group-column indexes in GroupingSetsPath. · 9e8b9942
      Tom Lane authored
      Instead of having planner.c compute a groupColIdx array and store it in
      GroupingSetsPaths, make create_groupingsets_plan() find the grouping
      columns by searching in the child plan node's tlist.  Although that's
      probably a bit slower for create_groupingsets_plan(), it's more like
      the way every other plan node type does this, and it provides positive
      confirmation that we know which child output columns we're supposed to be
      grouping on.  (Indeed, looking at this now, I'm not at all sure that it
      wasn't broken before, because create_groupingsets_plan() isn't demanding
      an exact tlist match from its child node.)  Also, this allows substantial
      simplification in planner.c, because it no longer needs to compute the
      groupColIdx array at all; no other cases were using it.
      
      I'd intended to put off this refactoring until later (like 9.7), but
      in view of the likely bug fix and the need to rationalize planner.c's
      tlist handling so we can do something sane with Konstantin Knizhnik's
      function-evaluation-postponement patch, I think it can't wait.
      9e8b9942
    • Handle invalid libpq sockets in more places · a40814d7
      Peter Eisentraut authored
      Also, make error messages consistent.
      
      From: Michael Paquier <michael.paquier@gmail.com>
      a40814d7
    • psql: Fix some strange code in SQL help creation · 92d4294d
      Peter Eisentraut authored
      Struct QL_HELP used to be defined as static in the sql_help.h header
      file, which is included in sql_help.c and help.c, thus creating two
      separate instances of the struct.  This causes a warning from GCC 6,
      because the struct is not used in sql_help.c.
      
      Instead, declare the struct as extern in the header file and define it
      in sql_help.c.  This also allows making a bunch of functions static
      because they are no longer needed outside of sql_help.c.
      Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
      92d4294d
    • ecpg: Fix typo · 0d0644dc
      Peter Eisentraut authored
      GCC 6 points out the redundant conditions, which were apparently typos.
      Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
      0d0644dc