Commits · 1a4be103a523db8d47b464463ba175cc664442b2 · Abuhujair Javed / Postgres FD Implementation

09 Sep, 2016 10 commits

Convert PageAddItem into a macro to save a few cycles. · 1a4be103

Tom Lane authored Sep 09, 2016

Nowadays this is just a backwards-compatibility wrapper around
PageAddItemExtended, so let's avoid the extra level of function call.
In addition, because pretty much all callers are passing constants
for the two bool arguments, compilers will be able to constant-fold
the conversion to a flags bitmask.

Discussion: <552.1473445163@sss.pgh.pa.us>

1a4be103

Invent PageIndexTupleOverwrite, and teach BRIN and GiST to use it. · b1328d78

Tom Lane authored Sep 09, 2016

PageIndexTupleOverwrite performs approximately the same function as
PageIndexTupleDelete (or PageIndexDeleteNoCompact) followed by PageAddItem
targeting the same item pointer offset. But in the case where the new
tuple is the same size as the old, it avoids shuffling other data around on
the page, because the new tuple is placed where the old one was rather than
being appended to the end of the page. This has been shown to provide a
substantial speedup for some GiST use-cases.

Also, this change allows some API simplifications: we can get rid of
the rather klugy and error-prone PAI_ALLOW_FAR_OFFSET flag for
PageAddItemExtended, since that was used only to cover a corner case
for BRIN that's better expressed by using PageIndexTupleOverwrite.

Note that this patch causes a rather subtle WAL incompatibility: the
physical page content change represented by certain WAL records is now
different than it was before, because while the tuples have the same
itempointer line numbers, the tuples themselves are in different places.
I have not bumped the WAL version number because I think it doesn't matter
unless you are trying to do bitwise comparisons of original and replayed
pages, and in any case we're early in a devel cycle and there will probably
be more WAL changes before v10 gets out the door.

There is probably room to make use of PageIndexTupleOverwrite in SP-GiST
and GIN too, but that is left for a future patch.

Andrey Borodin, reviewed by Anastasia Lubennikova, whacked around a bit
by me

Discussion: <CAJEAwVGQjGGOj6mMSgMwGvtFd5Kwe6VFAxY=uEPZWMDjzbn4VQ@mail.gmail.com>

b1328d78

Fix locking a tuple updated by an aborted (sub)transaction · 5c609a74

Alvaro Herrera authored Sep 09, 2016

When heap_lock_tuple decides to follow the update chain, it tried to
also lock any version of the tuple that was created by an update that
was subsequently rolled back. This is pointless, since for all intents
and purposes that tuple exists no more; and moreover it causes
misbehavior, as reported independently by Marko Tiikkaja and Marti
Raudsepp: some SELECT FOR UPDATE/SHARE queries may fail to return
the tuples, and assertion-enabled builds crash.

Fix by having heap_lock_updated_tuple test the xmin and return success
immediately if the tuple was created by an aborted transaction.

The condition where tuples become invisible occurs when an updated tuple
chain is followed by heap_lock_updated_tuple, which reports the problem
as HeapTupleSelfUpdated to its caller heap_lock_tuple, which in turn
propagates that code outwards possibly leading the calling code
(ExecLockRows) to believe that the tuple exists no longer.

Backpatch to 9.3. Only on 9.5 and newer this leads to a visible
failure, because of commit 27846f02; before that, heap_lock_tuple
skips the whole dance when the tuple is already locked by the same
transaction, because of the ancient HeapTupleSatisfiesUpdate behavior.
Still, the buggy condition may also exist in more convoluted scenarios
involving concurrent transactions, so it seems safer to fix the bug in
the old branches too.

Discussion:
https://www.postgresql.org/message-id/CABRT9RC81YUf1=jsmWopcKJEro=VoeG2ou6sPwyOUTx_qteRsg@mail.gmail.com
https://www.postgresql.org/message-id/48d3eade-98d3-8b9a-477e-1a8dc32a724d@joh.to

5c609a74

In PageIndexTupleDelete, don't assume stored item lengths are MAXALIGNed. · 984d0a14

Tom Lane authored Sep 09, 2016

PageAddItem stores the item length as-is.  It MAXALIGN's the amount of
space actually allocated for each tuple, but not the stored length.
PageRepairFragmentation, PageIndexMultiDelete, and PageIndexDeleteNoCompact
are all on board with this and MAXALIGN item lengths after fetching them.
But PageIndexTupleDelete expects the stored length to be a MAXALIGN
multiple already.  This accidentally works for existing index AMs because
they all maxalign their tuple sizes internally; but we don't do that for
heap tuples, and it shouldn't be a requirement for index tuples either.

So, sync PageIndexTupleDelete with the rest of bufpage.c by having it
maxalign the item size after fetching.

Also add a check that pd_special is maxaligned, to ensure that the test
"(offset + size) > phdr->pd_special" is still doing the right thing.
(If offset and pd_special are aligned, it doesn't matter whether size is.)
Again, this is in sync with the rest of the routines here, except for
PageAddItem which doesn't test because it doesn't actually do anything
for which pd_special alignment matters.

This shouldn't have any immediate functional impact; it just adds the
flexibility to use PageIndexTupleDelete on index tuples with non-aligned
lengths.

Discussion: <3814.1473366762@sss.pgh.pa.us>

984d0a14

Make better use of existing enums in plpgsql · e0013deb

Peter Eisentraut authored Aug 18, 2016

plpgsql.h defines a number of enums, but most of the code passes them
around as ints.  Update structs and function prototypes to take enum
types instead.  This clarifies the struct definitions in plpgsql.h in
particular.
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>

e0013deb

Avoid reporting "cache lookup failed" for some user-reachable cases. · 967a7b0f

Tom Lane authored Sep 09, 2016

We have a not-terribly-thoroughly-enforced-yet project policy that internal
errors with SQLSTATE XX000 (ie, plain elog) should not be triggerable from
SQL. record_in, domain_in, and PL validator functions all failed to meet
this standard, because they threw plain elog("cache lookup failed for XXX")
errors on bad OIDs, and those are all invokable from SQL.

For record_in, the best fix is to upgrade typcache.c (lookup_type_cache)
to throw a user-facing error for this case. That seems consistent because
it was more than halfway there already, having user-facing errors for shell
types and non-composite types. Having done that, tweak domain_in to rely
on the typcache to throw an appropriate error. (This costs little because
InitDomainConstraintRef would fetch the typcache entry anyway.)

For the PL validator functions, we already have a single choke point at
CheckFunctionValidatorAccess, so just fix its error to be user-facing.

Dilip Kumar, reviewed by Haribabu Kommi

Discussion: <87wpxfygg9.fsf@credativ.de>

967a7b0f

Fix corruption of 2PC recovery with subxacts · ec253de1

Simon Riggs authored Sep 09, 2016

Reading 2PC state files during recovery was borked, causing corruptions during
recovery. Effect limited to servers with 2PC, subtransactions and
recovery/replication.

Stas Kelvich, reviewed by Michael Paquier and Pavan Deolasee

ec253de1

Correct TABLESAMPLE docs · f6647242

Simon Riggs authored Sep 09, 2016

Revert to original use of word “sample”, though with clarification,
per Tom Lane.

Discussion: 29052.1471015383@sss.pgh.pa.us

f6647242

Improve scalability of md.c for large relations. · 45e191e3

Andres Freund authored Sep 08, 2016

So far md.c used a linked list of segments. That proved to be a problem
when processing large relations, because every smgr.c/md.c level access
to a page incurred walking through a linked list of all preceding
segments. Thus making accessing pages O(#segments).

Replace the linked list of segments hanging off SMgrRelationData with an
array of opened segments. That allows O(1) access to individual
segments, if they've previously been opened.

Discussion: <20140331101001.GE13135@alap3.anarazel.de>
Reviewed-By: Peter Geoghegan, Tom Lane (in an older version)

45e191e3

Faster PageIsVerified() for the all zeroes case. · 417fefaf

Andres Freund authored Sep 08, 2016

That's primarily useful for testing very large relations, using sparse
files.

Discussion: <20140331101001.GE13135@alap3.anarazel.de>
Reviewed-By: Peter Geoghegan

417fefaf

08 Sep, 2016 8 commits

Fix mdtruncate() to close fd.c handle of deleted segments. · 769fd9d8

Andres Freund authored Sep 08, 2016

mdtruncate() forgot to FileClose() a segment's mdfd_vfd, when deleting
it. That lead to a fd.c handle to a truncated file being kept open until
backend exit.

The issue appears to have been introduced way back in 1a5c450f,
before that the handle was closed inside FileUnlink().

The impact of this bug is limited - only VACUUM and ON COMMIT TRUNCATE
for temporary tables, truncate files in place (i.e. TRUNCATE itself is
not affected), and the relation has to be bigger than 1GB. The
consequences of a leaked fd.c handle aren't severe either.

Discussion: <20160908220748.oqh37ukwqqncbl3n@alap3.anarazel.de>
Backpatch: all supported releases

769fd9d8

Fix two src/test/modules Makefiles · 19acee8c

Alvaro Herrera authored Sep 08, 2016

commit_ts and test_pg_dump were declaring targets before including the
PGXS stanza, which meant that the "all" target customarily defined as
the first (and therefore default target) was not the default anymore.
Fix that by moving those target definitions to after PGXS.

commit_ts was initially good, but I broke it in commit 9def031b;
test_pg_dump was born broken, probably copying from commit_ts' mistake.

In passing, fix a comment mistake in test_pg_dump/Makefile.

Backpatch to 9.6.

Noted by Tom Lane.

19acee8c

Allow pg_dump to dump non-extension members of an extension-owned schema. · df5d9bb8

Tom Lane authored Sep 08, 2016

Previously, if a schema was created by an extension, a normal pg_dump run
(not --binary-upgrade) would summarily skip every object in that schema.
In a case where an extension creates a schema and then users create other
objects within that schema, this does the wrong thing: we want pg_dump
to skip the schema but still create the non-extension-owned objects.

There's no easy way to fix this pre-9.6, because in earlier versions the
"dump" status for a schema is just a bool and there's no way to distinguish
"dump me" from "dump my members".  However, as of 9.6 we do have enough
state to represent that, so this is a simple correction of the logic in
selectDumpableNamespace.

In passing, make some cosmetic fixes in nearby code.

Martín Marqués, reviewed by Michael Paquier

Discussion: <99581032-71de-6466-c325-069861f1947d@2ndquadrant.com>

df5d9bb8

Don't print database's tablespace in pg_dump -C --no-tablespaces output. · e97e9c57

Tom Lane authored Sep 08, 2016

If the database has a non-default tablespace, we emitted a TABLESPACE
clause in the CREATE DATABASE command emitted by -C, even if
--no-tablespaces was also specified.  This seems wrong, and it's
inconsistent with what pg_dumpall does, so change it.  Per bug #14315
from Danylo Hlynskyi.

Back-patch to 9.5.  The bug is much older, but it'd be a more invasive
change before 9.5 because dumpDatabase() hasn't got an easy way to get
to the outputNoTablespaces flag.  Doesn't seem worth the work given
the lack of previous complaints.

Report: <20160908081953.1402.75347@wrigleys.postgresql.org>

e97e9c57

Fix minor memory leak in Standby startup · 67c6bd1c

Simon Riggs authored Sep 08, 2016

StandbyRecoverPreparedTransactions() leaked the buffer
used for two phase state file. This was leaked once
at startup and at every shutdown checkpoint seen.

Backpatch to 9.6

Stas Kelvich

67c6bd1c

MSVC: Pass any user-set MSBFLAGS to MSBuild and VCBUILD. · d299eb41

Noah Misch authored Sep 08, 2016

This is particularly useful to pass /m, to perform a parallel build.

Christian Ullrich, reviewed by Michael Paquier.

d299eb41

MSVC: Place gendef.pl temporary file in the target directory. · 976a9bbd

Noah Misch authored Sep 08, 2016

Until now, it used the current working directory.  This makes it safe
for simultaneous invocations of gendef.pl, with different target
directories, to run from a single current working directory, such as
$(top_srcdir).  The MSVC build system will soon rely on this.

Christian Ullrich, reviewed by Michael Paquier.

976a9bbd

9.6 release notes: correct summary item about freeze · c9cf432e
Bruce Momjian authored Sep 07, 2016
```
Previously it less precisely talked about autovacuum.

Backpatch-through: 9.6
```
c9cf432e

07 Sep, 2016 3 commits

Support renaming an existing value of an enum type. · 0ab9c56d

Tom Lane authored Sep 07, 2016

Not much to be said about this patch: it does what it says on the tin.

In passing, rename AlterEnumStmt.skipIfExists to skipIfNewValExists
to clarify what it actually does. In the discussion of this patch
we considered supporting other similar options, such as IF EXISTS
on the type as a whole or IF NOT EXISTS on the target name. This
patch doesn't actually add any such feature, but it might happen later.

Dagfinn Ilmari Mannsåker, reviewed by Emre Hasegeli

Discussion: <CAO=2mx6uvgPaPDf-rHqG8=1MZnGyVDMQeh8zS4euRyyg4D35OQ@mail.gmail.com>

0ab9c56d

Doc: minor documentation improvements about extensions. · bd180b60

Tom Lane authored Sep 07, 2016

Document the formerly-undocumented behavior that schema and comment
control-file entries for an extension are honored only during initial
installation, whereas other properties are also honored during updates.

While at it, do some copy-editing on the recently-added docs for CREATE
EXTENSION ... CASCADE, use links for some formerly vague cross references,
and make a couple other minor improvements.

Back-patch to 9.6 where CASCADE was added.  The other parts of this
could go further back, but they're probably not important enough to
bother.

bd180b60

Add a HINT for client-vs-server COPY failure cases. · 4f405c8e

Tom Lane authored Sep 06, 2016

Users often get confused between COPY and \copy and try to use client-side
paths with COPY. The server then cannot find the file (if remote), or sees
a permissions problem (if local), or some variant of that. Emit a hint
about this in the most common cases.

In future we might want to expand the set of errnos for which the hint
gets printed, but be conservative for now.

Craig Ringer, reviewed by Christoph Berg and Tom Lane

Discussion: <CAMsr+YEqtD97qPEzQDqrCt5QiqPbWP_X4hmvy2pQzWC0GWiyPA@mail.gmail.com>

4f405c8e

06 Sep, 2016 7 commits

Add location field to DefElem · 49eb0fd0

Peter Eisentraut authored Sep 06, 2016

Add a location field to the DefElem struct, used to parse many utility
commands.  Update various error messages to supply error position
information.

To propogate the error position information in a more systematic way,
create a ParseState in standard_ProcessUtility() and pass that to
interested functions implementing the utility commands.  This seems
better than passing the query string and then reassembling a parse state
ad hoc, which violates the encapsulation of the ParseState type.
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>

49eb0fd0

Doc: small improvements for documentation about VACUUM freezing. · 975768f8

Tom Lane authored Sep 06, 2016

Mostly, explain how row xmin's used to be replaced by FrozenTransactionId
and no longer are.  Do a little copy-editing on the side.

Per discussion with Egor Rogov.  Back-patch to 9.4 where the behavioral
change occurred.

Discussion: <575D7955.6060209@postgrespro.ru>

975768f8

Guard against possible memory allocation botch in batchmemtuples(). · f032722f

Tom Lane authored Sep 06, 2016

Negative availMemLessRefund would be problematic.  It's not entirely
clear whether the case can be hit in the code as it stands, but this
seems like good future-proofing in any case.  While we're at it,
insist that the value be not merely positive but not tiny, so as to
avoid doing a lot of repalloc work for little gain.

Peter Geoghegan

Discussion: <CAM3SWZRVkuUB68DbAkgw=532gW0f+fofKueAMsY7hVYi68MuYQ@mail.gmail.com>

f032722f

Teach appendShellString() to not quote strings containing "-". · cdc70597

Tom Lane authored Sep 06, 2016

Brain fade in commit a00c5831: I was thinking that a string starting with
"-" could be taken as a switch depending on command line syntax. That's
true, but having appendShellString() quote it will not help, so we may as
well not do so. Per complaint from Peter Eisentraut.

cdc70597

Repair whitespace in initdb message. · a2ee579b

Tom Lane authored Sep 06, 2016

What used to be four spaces somehow turned into a tab and a couple of
spaces in commit a00c5831, no doubt from overhelpful emacs autoindent.
Noted by Peter Eisentraut.

a2ee579b

Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL · dcb12ce8

Simon Riggs authored Sep 06, 2016

lazy_truncate_heap() was waiting for
VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds
not milliseconds as originally intended.

Found by code inspection.

Simon Riggs

dcb12ce8

C comment: fix file name mention on line 1 · 67e1e2aa
Bruce Momjian authored Sep 06, 2016
```
Author: Amit Langote
```
67e1e2aa

05 Sep, 2016 10 commits

Cosmetic code cleanup in commands/extension.c. · 25794e84

Tom Lane authored Sep 05, 2016

Some of the comments added by the CREATE EXTENSION CASCADE patch were
a bit sloppy, and I didn't care for redeclaring the same local variable
inside a nested block either. No functional changes.

25794e84

Have "make coverage" recurse into contrib as well · 2093f663
Alvaro Herrera authored Sep 05, 2016

2093f663

Make locale-dependent regex character classes work for large char codes. · c54159d4

Tom Lane authored Sep 05, 2016

Previously, we failed to recognize Unicode characters above U+7FF as
being members of locale-dependent character classes such as [[:alpha:]].
(Actually, the same problem occurs for large pg_wchar values in any
multibyte encoding, but UTF8 is the only case people have actually
complained about.) It's impractical to get Spencer's original code to
handle character classes or ranges containing many thousands of characters,
because it insists on considering each member character individually at
regex compile time, whether or not the character will ever be of interest
at run time. To fix, choose a cutoff point MAX_SIMPLE_CHR below which
we process characters individually as before, and deal with entire ranges
or classes as single entities above that. We can actually make things
cheaper than before for chars below the cutoff, because the color map can
now be a simple linear array for those chars, rather than the multilevel
tree structure Spencer designed. It's more expensive than before for
chars above the cutoff, because we must do a binary search in a list of
high chars and char ranges used in the regex pattern, plus call iswalpha()
and friends for each locale-dependent character class used in the pattern.
However, multibyte encodings are normally designed to give smaller codes
to popular characters, so that we can expect that the slow path will be
taken relatively infrequently. In any case, the speed penalty appears
minor except when we have to apply iswalpha() etc. to high character codes
at runtime --- and the previous coding gave wrong answers for those cases,
so whether it was faster is moot.

Tom Lane, reviewed by Heikki Linnakangas

Discussion: <15563.1471913698@sss.pgh.pa.us>

c54159d4

C comment: align dashes in GroupState node header · f80049f7
Bruce Momjian authored Sep 05, 2016
```
Author: Jim Nasby
```
f80049f7

Relax transactional restrictions on ALTER TYPE ... ADD VALUE. · 15bc038f

Tom Lane authored Sep 05, 2016

To prevent possibly breaking indexes on enum columns, we must keep
uncommitted enum values from getting stored in tables, unless we
can be sure that any such column is new in the current transaction.

Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE
from being executed at all in a transaction block, unless the target
enum type had been created in the current transaction. This patch
removes that restriction, and instead insists that an uncommitted enum
value can't be referenced unless it belongs to an enum type created
in the same transaction as the value. Per discussion, this should be
a bit less onerous. It does require each function that could possibly
return a new enum value to SQL operations to check this restriction,
but there aren't so many of those that this seems unmaintainable.

Andrew Dunstan and Tom Lane

Discussion: <4075.1459088427@sss.pgh.pa.us>

15bc038f

Add debug check function LWLockHeldByMeInMode() · 016abf1f

Simon Riggs authored Sep 05, 2016

Tests whether my process holds a lock in given mode.
Add initial usage in MarkBufferDirty().

Thomas Munro

016abf1f

Document LSN acronym in WAL Internals · ec03f412

Simon Riggs authored Sep 05, 2016

We previously didn't mention what an LSN actually was.

Simon Riggs and Michael Paquier

ec03f412

Dirty replication slots when using sql interface · d851bef2

Simon Riggs authored Sep 05, 2016

When pg_logical_slot_get_changes(...) sets confirmed_flush_lsn to the point at
which replay stopped, it doesn't dirty the replication slot. So if the replay
didn't cause restart_lsn or catalog_xmin to change as well, this change will
not get written out to disk. Even on a clean shutdown.

If Pg crashes or restarts, a subsequent pg_logical_slot_get_changes(...) call
will see the same changes already replayed since it uses the slot's
confirmed_flush_lsn as the start point for fetching changes. The caller can't
specify a start LSN when using the SQL interface.

Mark the slot as dirty after reading changes using the SQL interface so that
users won't see repeated changes after a clean shutdown. Repeated changes still
occur when using the walsender interface or after an unclean shutdown.

Craig Ringer

d851bef2

Remove duplicate code from ReorderBufferCleanupTXN(). · b6182081

Tom Lane authored Sep 04, 2016

Andres is apparently the only hacker who thinks this code is better as-is.
I (tgl) follow some of his logic, but the fact that it's setting off
warnings from static code analyzers seems like a sufficient reason to
put the complexity into a comment rather than the code.

Aleksander Alekseev

Discussion: <20160404190345.54d84ee8@fujitsu>

b6182081

Add regression test coverage for non-default timezone abbreviation sets. · c7f68bea

Tom Lane authored Sep 04, 2016

After further reflection about the mess cleaned up in commit 39b691f2,
I decided the main bit of test coverage that was still missing was to
check that the non-default abbreviation-set files we supply are usable.
Add that.

Back-patch to supported branches, just because it seems like a good
idea to keep this all in sync.

c7f68bea

04 Sep, 2016 2 commits

Remove vestigial references to "zic" in favor of "IANA database". · da6ea70c

Tom Lane authored Sep 04, 2016

Commit b2cbced9 instituted a policy of referring to the timezone database
as the "IANA timezone database" in our user-facing documentation.
Propagate that wording into a couple of places that were still using "zic"
to refer to the database, which is definitely not right (zic is the
compilation tool, not the data).

Back-patch, not because this is very important in itself, but because
we routinely cherry-pick updates to the tznames files and I don't want
to risk future merge failures.

da6ea70c

Update release notes to mention need for ALTER EXTENSION UPDATE. · 5a072244

Tom Lane authored Sep 04, 2016

Maybe we ought to make pg_upgrade do this for you, but it won't happen
in 9.6, so call out the need for it as a migration consideration.

5a072244