1. 06 Oct, 2017 2 commits
    • Fix traversal of half-frozen update chains · a5736bf7
      Alvaro Herrera authored
      When some tuple versions in an update chain are frozen due to them being
      older than freeze_min_age, the xmax/xmin trail can become broken.  This
      breaks HOT (and probably other things).  A subsequent VACUUM can break
      things in more serious ways, such as leaving orphan heap-only tuples
      whose root HOT redirect items were removed.  This can be seen when
      index creation (or REINDEX) complains with an error like
        ERROR:  XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t"
      
      Because of relfrozenxid constraints, we cannot avoid the freezing of the
      early tuples, so we must cope with the results: whenever we see an Xmin
      of FrozenTransactionId, consider it a match for whatever the previous
      Xmax value was.
      
      This problem seems to have appeared in 9.3 with multixact changes,
      though strictly speaking it seems unrelated.
      
      Since 9.4 we have commit 37484ad2 "Change the way we mark tuples as
      frozen", so the fix is simple: just compare the raw Xmin (still stored
      in the tuple header, since freezing merely set an infomask bit) to the
      Xmax.  But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so
      the original value is lost and we have nothing to compare the Xmax with.
      To cope with that case we need to compare the Xmin with FrozenXid,
      assume it's a match, and hope for the best.  Sadly, since you can
      pg_upgrade a 9.3 instance containing half-frozen pages to newer
      releases, we need to keep the old check in newer versions too, which
      seems a bit brittle; I hope we can somehow get rid of that.
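
      As a minimal sketch of that check, assuming PostgreSQL's usual
      tuple-header accessors (the function name and exact coding here are
      illustrative, not the committed code):

        #include "postgres.h"
        #include "access/htup_details.h"
        #include "access/transam.h"

        /*
         * Does the tuple 'htup' continue an update chain whose previous
         * version had the given 'xmax'?  (Hypothetical name.)
         */
        static bool
        update_chain_xmax_matches_xmin(TransactionId xmax, HeapTupleHeader htup)
        {
            /* since commit 37484ad2 (9.4+), freezing leaves the raw Xmin intact */
            TransactionId xmin = HeapTupleHeaderGetRawXmin(htup);

            if (TransactionIdEquals(xmax, xmin))
                return true;

            /*
             * Pages frozen the 9.3 way (Xmin overwritten with
             * FrozenTransactionId) can still reach us via pg_upgrade, so a
             * frozen Xmin counts as a match for any previous Xmax.
             */
            if (TransactionIdEquals(xmin, FrozenTransactionId))
                return true;

            return false;
        }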
      
      I didn't optimize the new function for performance.  The new coding is
      probably a bit slower than before, since there is a function call rather
      than a straight comparison, but I'd rather have it work correctly than
      be fast but wrong.
      
      This is a follow-up to commit 20b65522, which fixed a few related
      problems.  Apparently, in 9.6 and up there are more ways to get into
      trouble, but with this patch applied I can no longer reproduce a problem
      in 9.3 - 9.5, so the remaining breakage must be a separate bug.
      
      Reported-by: Peter Geoghegan
      Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood,
      	Yi Wen Wong, Álvaro
      Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
    • Basic partition-wise join functionality. · f49842d1
      Robert Haas authored
      Instead of joining two partitioned tables in their entirety we can, if
      it is an equi-join on the partition keys, join the matching partitions
      individually.  This involves teaching the planner about "other join"
      rels, which are related to regular join rels in the same way that
      other member rels are related to baserels.  This can use significantly
      more CPU time and memory than regular join planning, because there may
      now be a set of "other" rels not only for every base relation but also
      for every join relation.  In most practical cases, this probably
      shouldn't be a problem, because (1) it's probably unusual to join many
      tables each with many partitions using the partition keys for all
      joins, and (2) if you are in that scenario, you probably have a big
      enough machine to handle the increased memory cost of planning and (3)
      the resulting plan is highly likely to be better, so what you spend in
      planning you'll make up on the execution side.  All the same, for now,
      turn this feature off by default.
      
      Currently, we can only perform joins between two tables whose
      partitioning schemes are absolutely identical.  It would be nice to
      cope with other scenarios, such as extra partitions on one side or the
      other with no match on the other side, but that will have to wait for
      a future patch.
      
      Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
      Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
      Khandekar, and by me.  A few final adjustments by me.
      
      Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
      Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
  2. 05 Oct, 2017 10 commits
  3. 04 Oct, 2017 5 commits
  4. 03 Oct, 2017 3 commits
    • Allow multiple tables to be specified in one VACUUM or ANALYZE command. · 11d8d72c
      Tom Lane authored
      Not much to say about this; does what it says on the tin.
      
      However, formerly, if there was a column list then the ANALYZE action was
      implied; now it must be specified, or you get an error.  This is because
      it would otherwise be a bit unclear what the user meant if some tables
      have column lists and some don't.
      
      Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some
      editorialization by me
      
      Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com
    • Fix race condition with unprotected use of a latch pointer variable. · 45f9d086
      Tom Lane authored
      Commit 597a87cc introduced a latch pointer variable to replace use
      of a long-lived shared latch in the shared WalRcvData structure.
      This was not well thought out, because there are now hazards of the
      pointer variable changing while it's being inspected by another
      process.  This could obviously lead to a core dump in code like
      
      	if (WalRcv->latch)
      		SetLatch(WalRcv->latch);
      
      and there's a more remote risk of a torn read, if we have any
      platforms where reading/writing a pointer is not atomic.
      
      An actual problem would occur only if the walreceiver process
      exits (gracefully) while the startup process is trying to
      signal it, but that seems well within the realm of possibility.
      
      To fix, treat the pointer variable (not the referenced latch)
      as being protected by the WalRcv->mutex spinlock.  There
      remains a race condition that we could apply SetLatch to a
      process latch that no longer belongs to the walreceiver, but
      I believe that's harmless: at worst it'd cause an extra wakeup
      of the next process to use that PGPROC structure.
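
      As a minimal sketch of the resulting access pattern (the helper
      function is hypothetical, not the committed code):

        #include "postgres.h"
        #include "replication/walreceiver.h"
        #include "storage/latch.h"
        #include "storage/spin.h"

        static void
        wake_walreceiver_safely(void)
        {
            Latch  *latch;

            /* read the pointer under WalRcv->mutex so it cannot change mid-read */
            SpinLockAcquire(&WalRcv->mutex);
            latch = WalRcv->latch;
            SpinLockRelease(&WalRcv->mutex);

            /* set the latch only after the spinlock is released */
            if (latch)
                SetLatch(latch);
        }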
      
      Back-patch to v10 where the faulty code was added.
      
      Discussion: https://postgr.es/m/22735.1507048202@sss.pgh.pa.us
    • Fix coding rules violations in walreceiver.c · 89e434b5
      Alvaro Herrera authored
      1. Since commit b1a9bad9 we had pstrdup() inside a
      spinlock-protected critical section; reported by Andreas Seltenreich.
      Turn those into strlcpy() to stack-allocated variables instead, as
      sketched after this list.
      Backpatch to 9.6.
      
      2. Since commit 9ed551e0 we had a pfree() uselessly inside a
      spinlock-protected critical section.  Tom Lane noticed in code review.
      Move down.  Backpatch to 9.6.
      
      3. Since commit 64233902 we had GetCurrentTimestamp() (a kernel
      call) inside a spinlock-protected critical section.  Tom Lane noticed in
      code review.  Move it up.  Backpatch to 9.2.
      
      4. Since commit 1bb25580 we did elog(PANIC) while holding a spinlock.
      Tom Lane noticed in code review.  Release spinlock before dying.
      Backpatch to 9.2.
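
      As a before/after sketch of item 1 (fragments with illustrative
      variable names; the rule being enforced is that nothing which can
      allocate memory, and therefore elog(), may run under a spinlock):

        /* BAD: pstrdup() can palloc and elog() inside the critical section */
        SpinLockAcquire(&walrcv->mutex);
        conninfo = pstrdup((char *) walrcv->conninfo);
        SpinLockRelease(&walrcv->mutex);

        /* BETTER: strlcpy() into a stack buffer neither allocates nor errors */
        char        conninfo[MAXCONNINFO];

        SpinLockAcquire(&walrcv->mutex);
        strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO);
        SpinLockRelease(&walrcv->mutex);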
      
      Discussion: https://postgr.es/m/87h8vhtgj2.fsf@ansel.ydns.eu
  5. 02 Oct, 2017 4 commits
  6. 01 Oct, 2017 8 commits
  7. 30 Sep, 2017 5 commits
    • Fix pg_dump to assign domain array type OIDs during pg_upgrade. · 2632bcce
      Tom Lane authored
      During a binary upgrade, all type OIDs are supposed to be assigned by
      pg_dump based on their values in the old cluster.  But now that domains
      have arrays, there's nothing to base the arrays' type OIDs on, if we're
      upgrading from a pre-v11 cluster.  Make pg_dump search for an unused type
      OID to use for this purpose.  Per buildfarm.
      
      Discussion: https://postgr.es/m/E1dyLlE-0002gT-H5@gemulon.postgresql.org
    • Support arrays over domains. · c12d570f
      Tom Lane authored
      Allowing arrays with a domain type as their element type was left undone
      in the original domain patch, but not for any very good reason.  This
      omission leads to such surprising results as array_agg() not working on
      a domain column, because the parser can't identify a suitable output type
      for the polymorphic aggregate.
      
      In order to fix this, first clean up the APIs of coerce_to_domain() and
      some internal functions in parse_coerce.c so that we consistently pass
      around a CoercionContext along with CoercionForm.  Previously, we sometimes
      passed an "isExplicit" boolean flag instead, which is strictly less
      information; and coerce_to_domain() didn't even get that, but instead had
      to reverse-engineer isExplicit from CoercionForm.  That's contrary to the
      documentation in primnodes.h that says that CoercionForm only affects
      display and not semantics.  I don't think this change fixes any live bugs,
      but it makes things more consistent.  The main reason for doing it though
      is that now build_coercion_expression() receives ccontext, which it needs
      in order to be able to recursively invoke coerce_to_target_type().
      
      Next, reimplement ArrayCoerceExpr so that the node does not directly know
      any details of what has to be done to the individual array elements while
      performing the array coercion.  Instead, the per-element processing is
      represented by a sub-expression whose input is a source array element and
      whose output is a target array element.  This simplifies life in
      parse_coerce.c, because it can build that sub-expression by a recursive
      invocation of coerce_to_target_type().  The executor now handles the
      per-element processing as a compiled expression instead of hard-wired code.
      The main advantage of this is that we can use a single ArrayCoerceExpr to
      handle as many as three successive steps per element: base type conversion,
      typmod coercion, and domain constraint checking.  The old code used two
      stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty
      inefficient, and adding yet another array deconstruction to do domain
      constraint checking seemed very unappetizing.
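
      An abridged sketch of the reworked node (see primnodes.h for the
      real definition; collation, coercion-form, and location fields are
      omitted here):

        typedef struct ArrayCoerceExpr
        {
            Expr        xpr;
            Expr       *arg;            /* input array value */
            Expr       *elemexpr;       /* per-element expression: source
                                         * element in, target element out;
                                         * built by a recursive
                                         * coerce_to_target_type() call */
            Oid         resulttype;     /* target array type OID */
            int32       resulttypmod;   /* output typmod (usually -1) */
        } ArrayCoerceExpr;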
      
      In the case where we just need a single, very simple coercion function,
      doing this straightforwardly leads to a noticeable increase in the
      per-array-element runtime cost.  Hence, add an additional shortcut evalfunc
      in execExprInterp.c that skips unnecessary overhead for that specific form
      of expression.  The runtime speed of simple cases is within 1% or so of
      where it was before, while cases that previously required two levels of
      array processing are significantly faster.
      
      Finally, create an implicit array type for every domain type, as we do for
      base types, enums, etc.  Everything except the array-coercion case seems
      to just work without further effort.
      
      Tom Lane, reviewed by Andrew Dunstan
      
      Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us
    • Fix copy & pasto in 510b8cbf. · 248e3375
      Andres Freund authored
      Reported-By: Peter Geoghegan
    • Fix typo. · f1424123
      Andres Freund authored
      Reported-By: Thomas Munro and Jesper Pedersen
    • Extend & revamp pg_bswap.h infrastructure. · 510b8cbf
      Andres Freund authored
      Upcoming patches are going to address performance issues involving the
      slow system-provided ntohs/htons etc.  To prepare for that, expand
      pg_bswap.h to provide pg_ntoh{16,32,64} and pg_hton{16,32,64}, and
      optimize their implementations by using compiler intrinsics for
      gcc-compatible compilers and MSVC, falling back to manual shift-based
      implementations otherwise.
      
      Additionally, remove multiple-evaluation hazards from the existing
      BSWAP32/64 macros by replacing them with inline functions where
      necessary.  In the course of that, the naming scheme is changed to
      pg_bswap16/32/64.
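
      A simplified sketch of the pattern (the real pg_bswap.h covers 16,
      32, and 64 bits and more cases):

        #include <stdint.h>
        #ifdef _MSC_VER
        #include <stdlib.h>                     /* for _byteswap_ulong */
        #endif

        /* unlike a macro, an inline function evaluates its argument once */
        static inline uint32_t
        bswap32_sketch(uint32_t x)
        {
        #if defined(__GNUC__) || defined(__clang__)
            return __builtin_bswap32(x);        /* gcc/clang intrinsic */
        #elif defined(_MSC_VER)
            return _byteswap_ulong(x);          /* MSVC intrinsic */
        #else
            return ((x << 24) & 0xff000000) |   /* manual shift-based fallback */
                   ((x <<  8) & 0x00ff0000) |
                   ((x >>  8) & 0x0000ff00) |
                   ((x >> 24) & 0x000000ff);
        #endif
        }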
      
      Author: Andres Freund
      Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de
  8. 29 Sep, 2017 3 commits
    • Use Py_RETURN_NONE where suitable · 0008a106
      Peter Eisentraut authored
      This is more idiomatic style and available as of Python 2.4, which is
      our minimum.
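
      The change amounts to this (PL/Python is C code against the Python
      C API; the function here is a hypothetical example):

        #include <Python.h>

        static PyObject *
        plpy_noop(PyObject *self, PyObject *args)
        {
            /*
             * Old style:
             *     Py_INCREF(Py_None);
             *     return Py_None;
             *
             * Py_RETURN_NONE (Python >= 2.4) expands to exactly that pair.
             */
            Py_RETURN_NONE;
        }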
    • Fix inadequate locking during get_rel_oids(). · 19de0ab2
      Tom Lane authored
      get_rel_oids used to not take any relation locks at all, but that stopped
      being a good idea with commit 3c3bb993, which inserted a syscache lookup
      into the function.  A concurrent DROP TABLE could now produce "cache lookup
      failed", which we don't want to have happen in normal operation.  The best
      solution seems to be to transiently take a lock on the relation named by
      the RangeVar (which also makes the result of RangeVarGetRelid a lot less
      spongy).  But we shouldn't hold the lock beyond this function, because we
      don't want VACUUM to lock more than one table at a time.  (That would not
      be a big problem right now, but it will become one after the pending
      feature patch to allow multiple tables to be named in VACUUM.)
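
      As a sketch of the pattern, with illustrative naming (not the
      committed code):

        #include "postgres.h"
        #include "catalog/namespace.h"
        #include "storage/lmgr.h"

        /* hypothetical helper: resolve a VACUUM target under a transient lock */
        static Oid
        resolve_vacuum_target(RangeVar *rv)
        {
            /* the lock keeps a concurrent DROP from invalidating the lookup */
            Oid     relid = RangeVarGetRelid(rv, AccessShareLock, false);

            /* ... syscache lookups on relid are safe while the lock is held ... */

            /* release before returning: VACUUM locks one table at a time */
            UnlockRelationOid(relid, AccessShareLock);

            return relid;
        }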
      
      In passing, adjust vacuum_rel and analyze_rel to document that we don't
      trust the passed RangeVar to be accurate, and allow the RangeVar to
      possibly be NULL --- which it is anyway for a whole-database VACUUM,
      though we accidentally didn't crash for that case.
      
      The passed RangeVar is in fact inaccurate when dealing with a child
      partition, as of v10, and it has been wrong for a whole long time in the
      case of vacuum_rel() recursing to a TOAST table.  None of these things
      has presented a visible bug up to now, because the passed RangeVar is in fact
      only consulted for autovacuum logging, and in that particular context it's
      always accurate because autovacuum doesn't let vacuum.c expand partitions
      nor recurse to toast tables.  Still, this seems like trouble waiting to
      happen, so let's nail the door at least partly shut.  (Further cleanup
      is planned, in HEAD only, as part of the pending feature patch.)
      
      Fix some sadly inaccurate/obsolete comments too.  Back-patch to v10.
      
      Michael Paquier and Tom Lane
      
      Discussion: https://postgr.es/m/25023.1506107590@sss.pgh.pa.us
    • psql: Don't try to print a partition constraint we didn't fetch. · 69c16983
      Robert Haas authored
      If \d rather than \d+ is used, then verbose is false and we don't ask
      the server for the partition constraint; so we shouldn't print it in
      that case either.
      
      Maksim Milyutin, per a report from Jesper Pedersen.  Reviewed by
      Jesper Pedersen and Amit Langote.
      
      Discussion: http://postgr.es/m/2af5fc4d-7bcc-daa8-4fe6-86274bea363c@redhat.com