Commits · afb9249d06f47d7a6d4a89fea0c3625fe43c5a5d · Abuhujair Javed / Postgres FD Implementation

12 May, 2015 8 commits

Add support for doing late row locking in FDWs. · afb9249d

Tom Lane authored May 12, 2015

Previously, FDWs could only do "early row locking", that is, lock a row as
soon as it's fetched, even though local restriction/join conditions might
discard the row later. This patch adds callbacks that allow FDWs to do
late locking in the same way that it's done for regular tables.

To make use of this feature, an FDW must support the "ctid" column as a
unique row identifier. Currently, since ctid has to be of type TID,
the feature is of limited use, though in principle it could be used by
postgres_fdw. We may eventually allow FDWs to specify another data type
for ctid, which would make it possible for more FDWs to use this feature.

This commit does not modify postgres_fdw to use late locking. We've
tested some prototype code for that, but it's not in committable shape,
and besides it's quite unclear whether it actually makes sense to do late
locking against a remote server. The extra round trips required are likely
to outweigh any benefit from improved concurrency.

Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me

afb9249d

pgbench: Don't fail during startup · aa4a0b95

Stephen Frost authored May 12, 2015

In pgbench, report, but ignore, any errors returned when attempting to
vacuum/truncate the default tables during startup.  If the tables are
needed, we'll error out soon enough anyway.

Per discussion with Tatsuo, David Rowley, Jim Nasby, Robert, Andres,
Fujii, Fabrízio de Royes Mello, Tomas Vondra, Michael Paquier, Peter,
based on a suggestion from Jeff Janes, patch from Robert, additional
message wording from Tom.

aa4a0b95

pg_basebackup -F t now succeeds with a long symlink target · 97e0aa69
Andrew Dunstan authored May 12, 2015

97e0aa69
doc build: use unique Makefile variable to control temp install · ea12b3ca
Bruce Momjian authored May 12, 2015

ea12b3ca

"Fix" test_ddl_deparse regress test schedule · 007c932e

Alvaro Herrera authored May 12, 2015

MSVC is not smart enough to figure it out, so dumb down the Makefile and
remove the schedule file.

Also add a .gitignore file.

Author: Michael Paquier

007c932e

doc: prevent SGML 'make check' from building temp install · e8c19263
Bruce Momjian authored May 12, 2015
```
Report by Alvaro Herrera
```
e8c19263

Map basebackup tablespaces using a tablespace_map file · 72d422a5

Andrew Dunstan authored May 12, 2015

Windows can't reliably restore symbolic links from a tar format, so
instead during backup start we create a tablespace_map file, which is
used by the restoring postgres to create the correct links in pg_tblspc.
The backup protocol also now has an option to request this file to be
included in the backup stream, and this is used by pg_basebackup when
operating in tar mode.

This is done on all platforms, not just Windows.

This means that pg_basebackup will not work in tar mode against 9.4
and older servers, as this protocol option isn't implemented there.

Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.

72d422a5

Replace some appendStringInfo* calls with more appropriate variants · d02f1647
Peter Eisentraut authored May 11, 2015
```
Author: David Rowley <dgrowleyml@gmail.com>
```
d02f1647

11 May, 2015 11 commits

Allow on-the-fly capture of DDL event details · b488c580

Alvaro Herrera authored May 11, 2015

This feature lets user code inspect and take action on DDL events.
Whenever a ddl_command_end event trigger is installed, DDL actions
executed are saved to a list which can be inspected during execution of
a function attached to ddl_command_end.

The set-returning function pg_event_trigger_ddl_commands can be used to
list actions so captured; it returns data about the type of command
executed, as well as the affected object.  This is sufficient for many
uses of this feature.  For the cases where it is not, we also provide a
"command" column of a new pseudo-type pg_ddl_command, which is a
pointer to a C structure that can be accessed by C code.  The struct
contains all the info necessary to completely inspect and even
reconstruct the executed command.

There is no actual deparse code here; that's expected to come later.
What we have is enough infrastructure that the deparsing can be done in
an external extension.  The intention is that we will add some deparsing
code in a later release, as an in-core extension.

A new test module is included.  It's probably insufficient as is, but it
should be sufficient as a starting point for a more complete and
future-proof approach.

Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick,
Abhijit Menon-Sen.

Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier,
Craig Ringer, David Steele.
Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost,
Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule.

Based on original work by Dimitri Fontaine, though I didn't use his
code.

Discussion:
  https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr
  https://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.org
  https://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org

b488c580

Allow LOCK TABLE .. ROW EXCLUSIVE MODE with INSERT · fa264243

Stephen Frost authored May 11, 2015

INSERT acquires RowExclusiveLock during normal operation and therefore
it makes sense to allow LOCK TABLE .. ROW EXCLUSIVE MODE to be executed
by users who have INSERT rights on a table (even if they don't have
UPDATE or DELETE).
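
As a rough illustration of the rule described above, the check can be thought of as "does the user hold any one of the privileges accepted for this lock mode". The sketch below is not PostgreSQL's code: every name in it (`ToyLockMode`, `TOY_ACL_*`, `toy_may_lock`) is hypothetical, and the privilege sets for modes other than ROW EXCLUSIVE are assumptions made only to give the example some contrast.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not PostgreSQL's real ACL bits or lock modes. */
enum
{
    TOY_ACL_SELECT = 1 << 0,
    TOY_ACL_INSERT = 1 << 1,
    TOY_ACL_UPDATE = 1 << 2,
    TOY_ACL_DELETE = 1 << 3
};

typedef enum
{
    TOY_ACCESS_SHARE,
    TOY_ROW_EXCLUSIVE,
    TOY_ACCESS_EXCLUSIVE
} ToyLockMode;

/* Returns the set of privileges of which any single one permits the lock. */
unsigned
toy_privileges_for_lock(ToyLockMode mode)
{
    switch (mode)
    {
        case TOY_ACCESS_SHARE:
            return TOY_ACL_SELECT;      /* assumption, for contrast only */
        case TOY_ROW_EXCLUSIVE:
            /* the behavior described above: INSERT alone now suffices */
            return TOY_ACL_INSERT | TOY_ACL_UPDATE | TOY_ACL_DELETE;
        default:
            return TOY_ACL_UPDATE | TOY_ACL_DELETE;     /* assumption */
    }
}

/* True if the privileges in "held" allow taking a lock of the given mode. */
bool
toy_may_lock(unsigned held, ToyLockMode mode)
{
    return (held & toy_privileges_for_lock(mode)) != 0;
}
```

With these definitions, a user holding only `TOY_ACL_INSERT` passes the check for `TOY_ROW_EXCLUSIVE` but not for `TOY_ACCESS_EXCLUSIVE`.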

Not back-patching this as it's a behavior change which, strictly
speaking, loosens security restrictions.

Per discussion with Tom and Robert (circa 2013).

fa264243

pg_upgrade: use single or double-quotes in command-line strings · 9d15292c
Bruce Momjian authored May 11, 2015
```
This is platform-dependent.
```
9d15292c

Fix incorrect checking of deferred exclusion constraint after a HOT update. · 20781765

Tom Lane authored May 11, 2015

If a row that potentially violates a deferred exclusion constraint is
HOT-updated later in the same transaction, the exclusion constraint would
be reported as violated when the check finally occurs, even if the row(s)
the new row originally conflicted with have since been removed. This
happened because the wrong TID was passed to check_exclusion_constraint(),
causing the live HOT-updated row to be seen as a conflicting row rather
than recognized as the row-under-test.

Per bug #13148 from Evan Martin. It's been broken since exclusion
constraints were invented, so back-patch to all supported branches.

20781765

Increase threshold for multixact member emergency autovac to 50%. · b4d4ce1d

Robert Haas authored May 11, 2015

Analysis by Noah Misch shows that the 25% threshold set by commit
53bb309d is lower than any other, similar autovac threshold.  While we
don't know exactly what value
will be optimal for all users, it is better to err a little on the
high side than on the low side.  A higher value increases the risk
that users might exhaust the available space and start seeing errors
before autovacuum can clean things up sufficiently, but a user who
hits that problem can compensate for it by reducing
autovacuum_multixact_freeze_max_age to a value dependent on their
average multixact size.  On the flip side, if the emergency cap
imposed by that patch kicks in too early, the user will experience
excessive wraparound scanning and will be unable to mitigate that
problem by configuration.  The new value will hopefully reduce the
risk of such bad experiences while still providing enough headroom
to avoid multixact member exhaustion for most users.

Along the way, adjust the documentation to reflect the effects of
commit 04e6d3b8, which taught autovacuum to run for multixact wraparound
even when autovacuum is configured off.

b4d4ce1d

initdb: only recommend pg_ctl to start the server · 2200713a
Bruce Momjian authored May 11, 2015
```
Previously we mentioned the 'postgres' binary method as well.
```
2200713a
docs: add "serialization anomaly" to transaction isolation table · 23c33198
Bruce Momjian authored May 11, 2015
```
Also distinguish between SQL-standard and Postgres behavior.

Report by David G. Johnston
```
23c33198
pg_dump: suppress "Tablespace:" comment for default tablespaces · c71e2734
Bruce Momjian authored May 11, 2015
```
Report by Hans Ginzel
```
c71e2734
Even when autovacuum=off, force it for members as we do in other cases. · 04e6d3b8
Robert Haas authored May 11, 2015
```
Thomas Munro, with some adjustments by me.
```
04e6d3b8

Advance the stop point for multixact offset creation only at checkpoint. · f6a6c46d

Robert Haas authored May 10, 2015

Commit b69bf30b advanced the stop point at vacuum time, but this has
subsequently been shown to be unsafe as a
result of analysis by myself and Thomas Munro and testing by Thomas
Munro.  The crux of the problem is that the SLRU deletion logic may
get confused about what to remove if, at exactly the right time during
the checkpoint process, the head of the SLRU crosses what used to be
the tail.

This patch, by me, fixes the problem by advancing the stop point only
following a checkpoint.  This has the additional advantage of making
the removal logic work during recovery more like the way it works during
normal running, which is probably good.

At least one of the calls to DetermineSafeOldestOffset which this patch
removes was already dead, because MultiXactAdvanceOldest is called only
during recovery and DetermineSafeOldestOffset was set up to do nothing
during recovery.  That, however, is inconsistent with the principle that
recovery and normal running should work similarly, and was confusing to
boot.

Along the way, fix some comments that previous patches in this area
neglected to update.  It's not clear to me whether there's any
concrete basis for the decision to use only half of the multixact ID
space, but it's neither necessary nor sufficient to prevent multixact
member wraparound, so the comments should not say otherwise.

f6a6c46d

Fix DetermineSafeOldestOffset for the case where there are no mxacts. · 312747c2

Robert Haas authored May 10, 2015

Commit b69bf30b failed to take into account the possibility that there
might be no multixacts in existence at all.

Report by Thomas Munro; patch by me.

312747c2

10 May, 2015 2 commits

Code review for foreign/custom join pushdown patch. · 1a8a4e5c

Tom Lane authored May 10, 2015

Commit e7cb7ee1 included some design decisions that seem pretty
questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments.  Clean up
as follows:

* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function.  In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs.  Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.

* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.

* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that.  Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.

* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries.  The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks.  It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway.  I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.

* Avoid ad-hocery in ExecAssignScanProjectionInfo.  It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.

* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.

* Lots of cleanup of documentation and missed comments.  Re-order some
code additions into more logical places.
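
The parameter-struct change in the list above follows a common C pattern, sketched here under stated assumptions. None of these names are the real planner API: `ToyJoinExtra`, `ToyJoinPathHook`, and the two fields are hypothetical. The point is only that frequently used values (including the join type, which callees may want to modify as a local copy) stay as separate parameters, while less-used inputs travel in one struct that can grow without changing the hook signature.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_JOIN_INNER, TOY_JOIN_LEFT } ToyJoinType;

/* Less-used inputs bundled together; fields can be appended later
 * without a source-level break for existing hooks. */
typedef struct ToyJoinExtra
{
    int n_restrict_clauses;
    int n_merge_clauses;
} ToyJoinExtra;

/* Hook signature: common values stay separate, the rest go in the struct. */
typedef int (*ToyJoinPathHook)(int outer_rel, int inner_rel,
                               ToyJoinType jointype,
                               const ToyJoinExtra *extra);

/* Example hook: returns how many paths it would add for this join. */
int
toy_count_paths(int outer_rel, int inner_rel,
                ToyJoinType jointype, const ToyJoinExtra *extra)
{
    (void) outer_rel;
    (void) inner_rel;

    /* jointype is passed by value, so a callee may change its own copy */
    if (jointype != TOY_JOIN_INNER)
        return 0;
    return extra->n_merge_clauses > 0 ? 2 : 1;
}

/* Caller side: invoke the hook if one is installed. */
int
toy_add_paths(ToyJoinPathHook hook, int outer_rel, int inner_rel,
              ToyJoinType jointype, const ToyJoinExtra *extra)
{
    return hook != NULL ? hook(outer_rel, inner_rel, jointype, extra) : 0;
}
```

If a third field were later added to `ToyJoinExtra`, `toy_count_paths` would compile unchanged, which is the kind of source-level stability the commit message is after.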

1a8a4e5c

Add missing "static" marker. · c594c750
Tom Lane authored May 09, 2015
```
Per buildfarm member pademelon.
```
c594c750

09 May, 2015 5 commits

Correct reindexdb documentation · f0a4b20b
Stephen Frost authored May 09, 2015
```
--schema takes a schema, not a table.

Author: Sawada Masahiko
```
f0a4b20b
doc: adjust ordering of pg_stat_statement paragraphs · da31c5ed
Bruce Momjian authored May 09, 2015
```
Clarify installation instructions

Patch by Ian Barwick
```
da31c5ed
Add new OID alias type regnamespace · cb9fa802
Andrew Dunstan authored May 09, 2015
```
Catalog version bumped

Kyotaro HORIGUCHI
```
cb9fa802

Add new OID alias type regrole · 0c90f676

Andrew Dunstan authored May 09, 2015

The new type has the scope of the whole database cluster, so with respect
to object dependencies it doesn't behave the same as the existing OID
alias types, which have database scope. To avoid confusion, constants of
the new type are prohibited from appearing where dependencies involving
them would be created.

Also, add a note to the docs about possible MVCC violation and
optimization issues, which apply to all the reg* types.

Kyotaro Horiguchi

0c90f676

Improve ParseConfigFp comment wrt head/tail · 0cf56f14

Stephen Frost authored May 09, 2015

The head_p and tail_p pointers passed to ParseConfigFp() are actually
input/output parameters, not strictly output parameters.  This updates
the function comment to reflect that.
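
To illustrate what "input/output" means for a head/tail pointer pair, here is a minimal sketch of the general pattern. It is not the ParseConfigFp() code: `ToyConfigItem`, `toy_append_item`, and the demo are hypothetical. The appender reads the caller's existing head and tail to find where the list ends, then writes them back, so a second call continues the same list instead of starting a new one.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyConfigItem
{
    const char *name;
    struct ToyConfigItem *next;
} ToyConfigItem;

/* Append item to the list described by *head_p and *tail_p.
 * Both pointers are read (existing list) and written (updated list). */
void
toy_append_item(ToyConfigItem *item,
                ToyConfigItem **head_p, ToyConfigItem **tail_p)
{
    item->next = NULL;
    if (*head_p == NULL)
        *head_p = item;             /* list was empty on input */
    else
        (*tail_p)->next = item;     /* uses the caller's existing tail */
    *tail_p = item;
}

/* Count the items reachable from head. */
size_t
toy_list_length(const ToyConfigItem *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Usage: items from two "files" accumulate into a single list. */
size_t
toy_demo_two_files(void)
{
    ToyConfigItem a = {"shared_buffers", NULL};
    ToyConfigItem b = {"work_mem", NULL};
    ToyConfigItem c = {"port", NULL};
    ToyConfigItem *head = NULL;
    ToyConfigItem *tail = NULL;

    /* first file */
    toy_append_item(&a, &head, &tail);
    toy_append_item(&b, &head, &tail);
    /* a second file continues the same list */
    toy_append_item(&c, &head, &tail);

    assert(head == &a && tail == &c);
    return toy_list_length(head);
}
```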

Per discussion with Tom.

0cf56f14

08 May, 2015 13 commits

Change default for include_realm to 1 · 9a088417

Stephen Frost authored May 08, 2015

The default behavior for GSS and SSPI authentication methods has long
been to strip the realm off of the principal. However, this is not a
secure approach in multi-realm environments, and the use case for the
parameter has been superseded by the regex-based mapping support
available in pg_ident.conf.

Change the default for include_realm to be '1', meaning that we do
NOT remove the realm from the principal by default. Any installations
which depend on the existing behavior will need to update their
configurations (ideally by leaving include_realm set to 1 and adding a
mapping in pg_ident.conf, but alternatively by explicitly setting
include_realm=0 prior to upgrading). Note that the mapping capability
exists in all currently supported versions of PostgreSQL and so this
change can be done today. Barring that, existing users can update their
configurations today to explicitly set include_realm=0 to ensure that
the prior behavior is maintained when they upgrade.

This needs to be noted in the release notes.

Per discussion with Magnus and Peter.

9a088417

Modify pg_stat_get_activity to build a tuplestore · f91feba8

Stephen Frost authored May 08, 2015

This updates pg_stat_get_activity() to build a tuplestore for its
results instead of using the old-style multiple-call method. This
simplifies the function, though that wasn't the primary motivation for
the change, which is that we may turn it into a helper function which
can filter the results (or not) much more easily.

f91feba8

Bump catversion for pg_file_settings · 4b342fb5
Stephen Frost authored May 08, 2015
```
Pointed out by Andres (thanks!)

Apologies for not including it in the initial patch.
```
4b342fb5

Add pg_file_settings view and function · a97e0c33

Stephen Frost authored May 08, 2015

The function and view added here provide a way to look at all settings
in postgresql.conf, any #include'd files, and postgresql.auto.conf
(which is what backs the ALTER SYSTEM command).

The information returned includes the configuration file name, line
number in that file, sequence number indicating when the parameter is
loaded (useful to see if it is later masked by another definition of the
same parameter), parameter name, and what it is set to at that point.
This information is updated on reload of the server.

This is unfiltered, privileged information and therefore access is
restricted to superusers through the GRANT system.

Author: Sawada Masahiko, various improvements by me.
Reviewers: David Steele

a97e0c33

Fix two problems in infer_arbiter_indexes(). · bab64ef9

Andres Freund authored May 08, 2015

The first is a pretty simple bug where a relcache entry is used after
the relation is closed. In this particular situation it does not appear
to have bad consequences unless compiled with RELCACHE_FORCE_RELEASE.

The second is that infer_arbiter_indexes() skipped indexes that aren't
yet valid according to indcheckxmin. That's not required here, because
uniqueness checks don't care about visibility according to an older
snapshot. While that's not really a bug, it makes things undesirably
non-deterministic. There is some hope that this explains a test failure
on buildfarm member jaguarundi.

Discussion: 9096.1431102730@sss.pgh.pa.us

bab64ef9

At promotion, archive last segment from old timeline with .partial suffix. · de768844

Heikki Linnakangas authored May 08, 2015

Previously, we would archive the possibly-incomplete WAL segment with its
normal filename, but that causes trouble if the server owning that timeline
is still running, and tries to archive the same segment later. It's not nice
for the standby to trip up the master's archival like that. And it's pretty
confusing, anyway, to have an incomplete segment in the archive that's
indistinguishable from a normal, complete segment.

To avoid such confusion, add a .partial suffix to the file. Or to be more
precise, make a copy of the old segment under the .partial suffix, and
archive that instead of the original file. pg_receivexlog also uses the
.partial suffix for the same purpose, to tell apart incompletely streamed
files from complete ones.

There is no automatic mechanism to use the .partial files at recovery, so
they will go unused, unless the administrator manually copies them to
the pg_xlog directory (and removes the .partial suffix). Recovery won't
normally need the WAL - when recovering to the new timeline, it will find
the same WAL on the first segment on the new timeline instead - but it
nevertheless feels better to archive the file with the .partial suffix, for
debugging purposes if nothing else.

de768844

Add macros to check if a filename is a WAL segment or other such file. · 179cdd09
Heikki Linnakangas authored May 08, 2015
```
We had many instances of the strlen + strspn combination to check for that.
This makes the code a bit easier to read.
```
179cdd09
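
A sketch of the kind of strlen + strspn check that commit 179cdd09 describes, written as plain functions rather than macros. The function names are hypothetical, and the file-name layout (24 uppercase hexadecimal digits, optionally followed by `.partial` as in commit de768844) is stated here as an assumption rather than taken from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed layout: a WAL segment name is exactly 24 uppercase hex digits. */
#define TOY_WAL_FNAME_LEN ((size_t) 24)
#define TOY_WAL_HEX_DIGITS "0123456789ABCDEF"

/* True if fname looks like a complete WAL segment file name. */
bool
toy_is_wal_segment_name(const char *fname)
{
    return strlen(fname) == TOY_WAL_FNAME_LEN &&
           strspn(fname, TOY_WAL_HEX_DIGITS) == TOY_WAL_FNAME_LEN;
}

/* True if fname is a segment name followed by exactly ".partial". */
bool
toy_is_partial_wal_segment_name(const char *fname)
{
    return strlen(fname) == TOY_WAL_FNAME_LEN + strlen(".partial") &&
           strspn(fname, TOY_WAL_HEX_DIGITS) == TOY_WAL_FNAME_LEN &&
           strcmp(fname + TOY_WAL_FNAME_LEN, ".partial") == 0;
}
```

Giving the check a name keeps call sites readable: a directory scan can test `toy_is_wal_segment_name(entry_name)` instead of repeating the two-part length and character-set comparison.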
Fix whitespace · 16c73e77
Peter Eisentraut authored May 08, 2015

16c73e77
Minor ON CONFLICT related comments and doc fixes. · e8898e91
Andres Freund authored May 08, 2015
```
Geoff Winkless, Stephen Frost, Peter Geoghegan and me.
```
e8898e91

Teach autovacuum about multixact member wraparound. · 53bb309d

Robert Haas authored May 08, 2015

The logic introduced in commit b69bf30b and repaired in commits 669c7d20
and 7be47c56 helps to ensure that we don't
overwrite old multixact member information while it is still needed,
but a user who creates many large multixacts can still exhaust the
member space (and thus start getting errors) while autovacuum stands
idly by.

To fix this, progressively ramp down the effective value (but not the
actual contents) of autovacuum_multixact_freeze_max_age as member space
utilization increases.  This makes autovacuum more aggressive and also
reduces the threshold for a manual VACUUM to perform a full-table scan.
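
The ramp-down idea can be sketched as follows. This is a simplified linear illustration, not the formula used by the patch: the function name, the fractional representation of member-space usage, and the "danger" fraction are all assumptions. Below the safe threshold the configured value is used as is; between the two thresholds the effective value shrinks; at or above the danger threshold it reaches zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute an effective freeze max age from the
 * configured one.  member_space_used, safe_fraction, and danger_fraction
 * are fractions of the member space in the range 0.0 to 1.0, with
 * safe_fraction < danger_fraction.
 */
uint32_t
toy_effective_freeze_max_age(uint32_t configured_max_age,
                             double member_space_used,
                             double safe_fraction,
                             double danger_fraction)
{
    double ramp;

    if (member_space_used <= safe_fraction)
        return configured_max_age;  /* plenty of room: no adjustment */
    if (member_space_used >= danger_fraction)
        return 0;                   /* as aggressive as possible */

    /* linear ramp from 0 at the safe threshold to 1 at the danger one */
    ramp = (member_space_used - safe_fraction) /
           (danger_fraction - safe_fraction);
    return (uint32_t) (configured_max_age * (1.0 - ramp));
}
```

Only the value returned by the helper changes; the configured setting itself is left untouched, matching the "effective value (but not the actual contents)" wording above.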

This patch leaves unsolved the problem of ensuring that emergency
autovacuums are triggered even when autovacuum=off.  We'll need to fix
that via a separate patch.

Thomas Munro and Robert Haas

53bb309d

Remove reference to src/tools/backend/index.html · 195fbd40

Stephen Frost authored May 08, 2015

src/tools/backend was removed back in 63f1ccd8, but
backend/storage/lmgr/README didn't get the memo.

Author: Amit Langote

195fbd40

Remove dependency on ordering in logical decoding upsert test. · 581f4f96

Andres Freund authored May 08, 2015

Buildfarm member magpie sorted the output differently than intended by
Peter. "Resolve" the problem by simply not aggregating; it's not that
many lines.

581f4f96

Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE. · 168d5805

Andres Freund authored May 08, 2015

The newly added ON CONFLICT clause allows specifying an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using an
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
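
The control flow described above (pre-check, speculative insert, re-check, delete and retry) can be modeled with a toy, single-threaded sketch. Everything here is hypothetical: the "table" is an array of integer keys, and a callback stands in for a concurrent session that sneaks in a row between the pre-check and the insert. It says nothing about how PostgreSQL implements locking, WAL, or index checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAPACITY 16

typedef struct ToyTable
{
    int    keys[TOY_CAPACITY];
    size_t nkeys;
} ToyTable;

typedef enum { TOY_INSERTED, TOY_CONFLICT_ACTION } ToyResult;

/* Stand-in for a concurrent session; runs once, after the pre-check. */
typedef void (*ToyRaceHook)(ToyTable *table);

/* True if key occurs among the first "upto" entries. */
static bool
toy_contains(const ToyTable *t, int key, size_t upto)
{
    for (size_t i = 0; i < upto; i++)
        if (t->keys[i] == key)
            return true;
    return false;
}

ToyResult
toy_speculative_insert(ToyTable *t, int key, ToyRaceHook race, int *attempts)
{
    *attempts = 0;
    for (;;)
    {
        size_t slot;

        (*attempts)++;

        /* 1. Pre-check: a match means the DO NOTHING/DO UPDATE path. */
        if (toy_contains(t, key, t->nkeys))
            return TOY_CONFLICT_ACTION;

        /* Simulated concurrent insert, at most once. */
        if (race != NULL)
        {
            race(t);
            race = NULL;
        }

        /* 2. Speculatively insert our tuple. */
        assert(t->nkeys < TOY_CAPACITY);
        slot = t->nkeys++;
        t->keys[slot] = key;

        /* 3. No conflicting entry ahead of ours: deemed inserted. */
        if (!toy_contains(t, key, slot))
            return TOY_INSERTED;

        /* 4. Conflict appeared: delete the speculative tuple and retry. */
        t->nkeys--;
    }
}

/* Example rival that inserts key 42. */
void
toy_rival_inserts_42(ToyTable *t)
{
    assert(t->nkeys < TOY_CAPACITY);
    t->keys[t->nkeys++] = 42;
}

/* Usage: returns how many attempts were needed when a rival wins the race. */
int
toy_demo_attempts_under_race(void)
{
    ToyTable  t = {{0}, 0};
    int       attempts;
    ToyResult r = toy_speculative_insert(&t, 42, toy_rival_inserts_42,
                                         &attempts);

    assert(r == TOY_CONFLICT_ACTION && t.nkeys == 1);
    return attempts;
}
```

In the demo, the first attempt inserts speculatively, sees the rival's row, and backs out; the second attempt's pre-check then finds the rival's row and takes the conflict path.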

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.

168d5805

07 May, 2015 1 commit

Represent columns requiring insert and update privileges independently. · 2c8f4836

Andres Freund authored May 08, 2015

Previously, relation range table entries used a single Bitmapset field
representing which columns required either UPDATE or INSERT privileges,
despite the fact that INSERT and UPDATE privileges are separately
cataloged, and may be independently held.  As statements so far required
either insert or update privileges but never both, that was
sufficient. The required permission could be inferred from the top level
statement run.

The upcoming INSERT ... ON CONFLICT UPDATE feature needs to
independently check for both privileges in one statement though, so that
is not sufficient anymore.
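
A minimal sketch of why two separate column sets are needed, using 64-bit masks as a stand-in for Bitmapsets. The type and function names are hypothetical, and the check is deliberately simplified. With one set per privilege, a single statement can require INSERT on some columns and UPDATE on others, and each requirement is checked against the matching grant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit n set means column n is involved; a stand-in for a Bitmapset. */
typedef uint64_t ToyColumnSet;

/* Per-relation requirements recorded for one statement. */
typedef struct ToyRangeTableEntry
{
    ToyColumnSet inserted_cols;     /* columns needing INSERT privilege */
    ToyColumnSet updated_cols;      /* columns needing UPDATE privilege */
} ToyRangeTableEntry;

/* Column-level grants held by the current user. */
typedef struct ToyColumnGrants
{
    ToyColumnSet can_insert;
    ToyColumnSet can_update;
} ToyColumnGrants;

/* True only if every required column is covered by the matching grant. */
bool
toy_check_column_privileges(const ToyRangeTableEntry *rte,
                            const ToyColumnGrants *grants)
{
    bool insert_ok = (rte->inserted_cols & ~grants->can_insert) == 0;
    bool update_ok = (rte->updated_cols & ~grants->can_update) == 0;

    return insert_ok && update_ok;
}
```

For an upsert that inserts columns 0 to 2 and updates column 2 on conflict, a user with INSERT on all three columns but no UPDATE grant fails the check, which a single merged set could not express.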

Bumps catversion as stored rules change.

Author: Peter Geoghegan
Reviewed-By: Andres Freund

2c8f4836