Commits · b1fdc727c37b23da947b4b0d440f723f31beb84d · Abuhujair Javed / Postgres FD Implementation

11 Mar, 2016 14 commits

Fix Windows build broken in 6943a946 · b1fdc727
Teodor Sigaev authored Mar 11, 2016
```
Also it fixes dynamic array allocation disallowed by ANSI-C.

Author: Stas Kelvich
```
b1fdc727

Fix merge affixes for numeric ones · 8829af47

Teodor Sigaev authored Mar 11, 2016

Some dictionaries have duplicated base words with different affix set, we
just merge that sets into one set. But previously merging of sets of affixes
was actually a concatenation of strings but it's wrong for numeric
representation of affixes because such representation uses comma to
separate affixes.

Author: Artur Zakirov

8829af47

Bump catalog version missed in 6943a946 · a9eb6c83
Teodor Sigaev authored Mar 11, 2016

a9eb6c83

Tsvector editing functions · 6943a946

Teodor Sigaev authored Mar 11, 2016

Adds several tsvector editting function: convert tsvector to/from text array,
set weight for given lexemes, delete lexeme(s), unnest, filter lexemes
with given weights

Author: Stas Kelvich with some editorization by me
Reviewers: Tomas Vondram, Teodor Sigaev

6943a946

Minor additional refactoring of planner.c's PathTarget handling. · 49635d7b

Tom Lane authored Mar 11, 2016

Teach make_group_input_target() and make_window_input_target() to work
entirely with the PathTarget representation of tlists, rather than
constructing a tlist and immediately deconstructing it into PathTarget
format. In itself this only saves a few palloc's; the bigger picture is
that it opens the door for sharing cost_qual_eval work across all of
planner.c's constructions of PathTargets. I'll come back to that later.

In support of this, flesh out tlist.c's infrastructure for PathTargets
a bit more.

49635d7b

Allow setting sample ratio for auto_explain · 92f03fe7

Magnus Hagander authored Mar 11, 2016

New configuration parameter auto_explain.sample_ratio makes it
possible to log just a fraction of the queries meeting the configured
threshold, to reduce the amount of logging.

Author: Craig Ringer and Julien Rouhaud
Review: Petr Jelinek

92f03fe7

psql: Don't automatically use expanded format when there's 1 column. · 69ab7b9d
Robert Haas authored Mar 11, 2016
```
Andreas Karlsson and Robert Haas
```
69ab7b9d
Fix a typo, and remove unnecessary pgstat_report_wait_end(). · 481c76ab
Robert Haas authored Mar 11, 2016
```
Per Amit Kapila.
```
481c76ab

Refactor receivelog.c parameters · 38c83c9b

Magnus Hagander authored Mar 11, 2016

Much cruft had accumulated over time with a large number of parameters
passed down between functions very deep. With this refactoring, instead
introduce a StreamCtl structure that holds the parameters, and pass around
a pointer to this structure instead. This makes it much easier to add or
remove fields that are needed deeper down in the implementation without
having to modify every function header in the file.

Patch by me after much nagging from Andres
Reviewed by Craig Ringer and Daniel Gustafsson

38c83c9b

Allow emit_log_hook to see original message text · 73e7e49d

Simon Riggs authored Mar 11, 2016

emit_log_hook could only see the translated text, making it harder to identify
which message was being sent. Pass original text to allow the exact message to
be identified, whichever language is used for logging.

Discussion: 20160216.184755.59721141.horiguchi.kyotaro@lab.ntt.co.jp
Author: Kyotaro Horiguchi

73e7e49d

Simplify GetLockNameFromTagType. · a414d96a

Robert Haas authored Mar 10, 2016

The old code is wrong, because it returns a pointer to an automatic
variable. And it's also more clever than we really need to be
considering that the case it's worrying about should never happen.

a414d96a

Blindly try to fix dtrace enabled builds, broken in 9cd00c45. · c94f0c29
Andres Freund authored Mar 10, 2016
```
Reported-By: Peter Eisentraut
Discussion: 56E2239E.1050607@gmx.net
```
c94f0c29

Checkpoint sorting and balancing. · 9cd00c45

Andres Freund authored Feb 19, 2016

Up to now checkpoints were written in the order they're in the
BufferDescriptors. That's nearly random in a lot of cases, which
performs badly on rotating media, but even on SSDs it causes slowdowns.

To avoid that, sort checkpoints before writing them out. We currently
sort by tablespace, relfilenode, fork and block number.

One of the major reasons that previously wasn't done, was fear of
imbalance between tablespaces. To address that balance writes between
tablespaces.

The other prime concern was that the relatively large allocation to sort
the buffers in might fail, preventing checkpoints from happening. Thus
pre-allocate the required memory in shared memory, at server startup.

This particularly makes it more efficient to have checkpoint flushing
enabled, because that'll often result in a lot of writes that can be
coalesced into one flush.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund

9cd00c45

Allow to trigger kernel writeback after a configurable number of writes. · 428b1d6b

Andres Freund authored Feb 19, 2016

Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches. When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively. This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.

On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always generally want this
behavior, e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache. Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.

While desirable and likely possible this patch does not contain an
implementation for windows.

With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of flushes that are
good are separate, and because the performance considerations of
controlled flushing for each of these are different.

A later patch will add checkpoint sorting - after that flushes from the
ckeckpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences. This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund

428b1d6b

10 Mar, 2016 14 commits

Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too. · c82c92b1

Tom Lane authored Mar 10, 2016

All along, this function should have treated WindowFuncs in a manner
similar to Aggrefs, ie with an option whether or not to recurse into them.
By not considering the case, it was always recursing, which is OK for most
callers (although I suspect that the case in prepare_sort_from_pathkeys
might represent a bug). But now we need return-without-recursing behavior
as well. There are also more than a few callers that should never see a
WindowFunc, and now we'll get some error checking on that.

c82c92b1

Don't vacuum all-frozen pages. · fd31cd26

Robert Haas authored Mar 10, 2016

Commit a892234f gave us enough
infrastructure to avoid vacuuming pages where every tuple on the
page is already frozen.  So, replace the notion of a scan_all or
whole-table vacuum with the less onerous notion of an "aggressive"
vacuum, which will pages that are all-visible, but still skip those
that are all-frozen.

This should greatly reduce the cost of anti-wraparound vacuuming
on large clusters where the majority of data is never touched
between one cycle and the next, because we'll no longer have to
read all of those pages only to find out that we don't need to
do anything with them.

Patch by me, reviewed by Masahiko Sawada.

fd31cd26

Refactor pull_var_clause's API to make it less tedious to extend. · 364a9f47

Tom Lane authored Mar 10, 2016

In commit 1d97c19a and later c1d9579d, we extended
pull_var_clause's API by adding enum-type arguments. That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use. Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well. This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h. Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.

364a9f47

Rework wait for AccessExclusiveLocks on Hot Standby · 37c54863

Simon Riggs authored Mar 10, 2016

Earlier version committed in 9.0 caused spurious waits in some cases.
New infrastructure for lock waits in 9.3 used to correct and improve this.

Jeff Janes based upon a proposal by Simon Riggs, who also reviewed
Additional review comments from Amit Kapila

37c54863

Provide much better wait information in pg_stat_activity. · 53be0b1a

Robert Haas authored Mar 10, 2016

When a process is waiting for a heavyweight lock, we will now indicate
the type of heavyweight lock for which it is waiting.  Also, you can
now see when a process is waiting for a lightweight lock - in which
case we will indicate the individual lock name or the tranche, as
appropriate - or for a buffer pin.

Amit Kapila, Ildus Kurbangaliev, reviewed by me.  Lots of helpful
discussion and suggestions by many others, including Alexander
Korotkov, Vladimir Borodin, and many others.

53be0b1a

Document BRIN a bit more thoroughly · a3a8309d

Alvaro Herrera authored Mar 10, 2016

The chapter "Interfacing Extensions To Indexes" and CREATE OPERATOR
CLASS reference page were missed when BRIN was added.  We document
all our other index access methods there, so make sure BRIN complies.

Author: Álvaro Herrera
Reported-By: Julien Rouhaud, Tom Lane
Reviewed-By: Emre Hasegeli
Discussion: https://www.postgresql.org/message-id/56CF604E.9000303%40dalibo.com
Backpatch: 9.5, where BRIN was introduced

a3a8309d

Avoid crash on old Windows with AVX2-capable CPU for VS2013 builds · 9d903882

Magnus Hagander authored Mar 10, 2016

The Visual Studio 2013 CRT generates invalid code when it makes a 64-bit
build that is later used on a CPU that supports AVX2 instructions using a
version of Windows before 7SP1/2008R2SP1.

Detect this combination, and in those cases turn off the generation of
FMA3, per recommendation from the Visual Studio team.

The bug is actually in the CRT shipping with Visual Studio 2013, but
Microsoft have stated they're only fixing it in newer major versions.
The fix is therefor conditioned specifically on being built with this
version of Visual Studio, and not previous or later versions.

Author: Christian Ullrich

9d903882

Reduce size of two phase file header · e0694cf9

Simon Riggs authored Mar 10, 2016

Previously 2PC header was fixed at 200 bytes, which in most cases wasted
WAL space for a workload using 2PC heavily.

Pavan Deolasee, reviewed by Petr Jelinek

e0694cf9

Reduce lock level for altering fillfactor · fcb4bfdd
Simon Riggs authored Mar 10, 2016
```
Fabrízio de Royes Mello and Simon Riggs
```
fcb4bfdd

Code review for . · 090b287f

Robert Haas authored Mar 10, 2016

Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev.
Patch by Amit Langote.

090b287f

Remove a couple of useless pstrdup() calls. · cc402116

Tom Lane authored Mar 09, 2016

There's no point in pstrdup'ing the result of TextDatumGetCString,
since that's necessarily already a freshly-palloc'd C string.

These particular calls are unlikely to be of any consequence
performance-wise, but still they're a bad precedent that can confuse
future patch authors.

Noted by Chapman Flack.

cc402116

Avoid unlikely data-loss scenarios due to rename() without fsync. · 1d4a0ab1

Andres Freund authored Mar 09, 2016

Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes. Use the previously added durable_rename()/durable_link_or_rename()
in various places where we previously just renamed files.

Most of the changed call sites are arguably not critical, but it seems
better to err on the side of too much durability.  The most prominent
known case where the previously missing fsyncs could cause data loss is
crashes at the end of a checkpoint. After the actual checkpoint has been
performed, old WAL files are recycled. When they're filled, their
contents are fdatasynced, but we did not fsync the containing
directory. An OS/hardware crash in an unfortunate moment could then end
up leaving that file with its old name, but new content; WAL replay
would thus not replay it.

Reported-By: Tomas Vondra
Author: Michael Paquier, Tomas Vondra, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches

1d4a0ab1

Introduce durable_rename() and durable_link_or_rename(). · 606e0f98

Andres Freund authored Mar 09, 2016

Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes; especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, fsync the containing directory.  This sequence is not
generally adhered to currently; which exposes us to data loss risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.

Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason extend several copies of the
same logic, so centralize the link() callers.

This commit does not yet make use of the new functions; they're used in
a followup commit.

Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches

606e0f98

doc: Reorganize pg_resetxlog reference page · e19e4cf0

Peter Eisentraut authored Feb 29, 2016

The pg_resetxlog reference page didn't have a proper options list, only
running text listing the options and some explanations of them.  This
might have worked when there were only a few options, but the list has
grown over the releases, and now it's hard to find an option and its
associated explanation.  So write out the options list as on other
reference pages.

e19e4cf0

09 Mar, 2016 12 commits

PostgresNode: add backup_fs_hot and backup_fs_cold · 28f6df3c

Alvaro Herrera authored Mar 09, 2016

These simple methods rely on RecursiveCopy to create a filesystem-level
backup of a server.  They aren't currently used anywhere yet,but will be
useful for future tests.

Author: Craig Ringer
Reviewed-By: Michael Paquier, Salvador Fandino, Álvaro Herrera
Commitfest-URL: https://commitfest.postgresql.org/9/569/

28f6df3c

Add filter capability to RecursiveCopy::copypath · a31aaec4

Alvaro Herrera authored Mar 09, 2016

This allows skipping copying certain files and subdirectories in tests.
This is useful in some circumstances such as copying a data directory;
future tests want this feature.

Also POD-ify the module.

Authors: Craig Ringer, Pallavi Sontakke
Reviewed-By: Álvaro Herrera

a31aaec4

Fix incorrect handling of NULL index entries in indexed ROW() comparisons. · a298a1e0

Tom Lane authored Mar 09, 2016

An index search using a row comparison such as ROW(a, b) > ROW('x', 'y')
would stop upon reaching a NULL entry in the "b" column, ignoring the
fact that there might be non-NULL "b" values associated with later values
of "a". This happens because _bt_mark_scankey_required() marks the
subsidiary scankey for "b" as required, which is just wrong: it's for
a column after the one with the first inequality key (namely "a"), and
thus can't be considered a required match.

This bit of brain fade dates back to the very beginnings of our support
for indexed ROW() comparisons, in 2006. Kind of astonishing that no one
came across it before Glen Takahashi, in bug #14010.

Back-patch to all supported versions.

Note: the given test case doesn't actually fail in unpatched 9.1, evidently
because the fix for bug #6278 (i.e., stopping at nulls in either scan
direction) is required to make it fail. I'm sure I could devise a case
that fails in 9.1 as well, perhaps with something involving making a cursor
back up; but it doesn't seem worth the trouble.

a298a1e0

Re-pgindent vacuumlazy.c. · be060cbc
Robert Haas authored Mar 09, 2016

be060cbc

pgbench: When -T is used, don't wait for transactions beyond end of run. · accf7616

Robert Haas authored Mar 09, 2016

At low rates, this can lead to pgbench taking significantly longer to
terminate than the user might expect. Repair.

Fabien Coelho, reviewed by Aleksander Alekseev, Álvaro Herrera, and me.

accf7616

pgcrypto: support changing S2K iteration count · 188f359d

Alvaro Herrera authored Mar 09, 2016

pgcrypto already supports key-stretching during symmetric encryption,
including the salted-and-iterated method; but the number of iterations
was not configurable.  This commit implements a new s2k-count parameter
to pgp_sym_encrypt() which permits selecting a larger number of
iterations.

Author: Jeff Janes

188f359d

Add a generic command progress reporting facility. · b6fb6471

Robert Haas authored Mar 09, 2016

Using this facility, any utility command can report the target relation
upon which it is operating, if there is one, and up to 10 64-bit
counters; the intent of this is that users should be able to figure out
what a utility command is doing without having to resort to ugly hacks
like attaching strace to a backend.

As a demonstration, this adds very crude reporting to lazy vacuum; we
just report the target relation and nothing else. A forthcoming patch
will make VACUUM report a bunch of additional data that will make this
much more interesting. But this gets the basic framework in place.

Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by
Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao,
and Masanori Oyama.

b6fb6471

Fix incorrect tlist generation in create_gather_plan(). · 8776c15c

Tom Lane authored Mar 09, 2016

This function is written as though Gather doesn't project; but it does.
Even if it did not project, though, we must use build_path_tlist to ensure
that the output columns receive correct sortgroupref labeling.

Per report from Amit Kapila.

8776c15c

postgres_fdw: Consider foreign joining and foreign sorting together. · aa09cd24

Robert Haas authored Mar 09, 2016

Commit ccd8f979 gave us the ability to
request that the remote side sort the data, and, later, commit
e4106b25 gave us the ability to
request that the remote side perform the join for us rather than doing
it locally.  But we could not do both things at the same time: a
remote SQL query that had an ORDER BY clause would never be a join.
This commit adds that capability.

Ashutosh Bapat, reviewed by me.

aa09cd24

Fix copy-and-pasteo in comment. · d31f20e2
Tom Lane authored Mar 09, 2016
```
Wensheng Zhang
```
d31f20e2

Improve handling of pathtargets in planner.c. · 51c0f63e

Tom Lane authored Mar 09, 2016

Refactor so that the internal APIs in planner.c deal in PathTargets not
targetlists, and establish a more regular structure for deriving the
targets needed for successive steps.

There is more that could be done here; calculating the eval costs of each
successive target independently is both inefficient and wrong in detail,
since we won't actually recompute values available from the input node's
tlist. But it's no worse than what happened before the pathification
rewrite. In any case this seems like a good starting point for considering
how to handle Konstantin Knizhnik's function-evaluation-postponement patch.

51c0f63e

Add valgrind suppressions for python code. · 2f1f4439

Andres Freund authored Mar 08, 2016

Python's allocator does some low-level tricks for efficiency;
unfortunately they trigger valgrind errors. Those tricks can be disabled
making instrumentation easier; but few people testing postgres will have
such a build of python. So add broad suppressions of the resulting
errors.

See also https://svn.python.org/projects/python/trunk/Misc/README.valgrind

This possibly will suppress valid errors, but without it it's basically
impossible to use valgrind with plpython code.

Author: Andres Freund
Backpatch: 9.4, where we started to maintain valgrind suppressions

2f1f4439