Commits · 80788a431e9bff06314a054109fdea66ac538199 · Abuhujair Javed / Postgres FD Implementation

13 Feb, 2015 2 commits

Simplify waiting logic in reading from / writing to client. · 80788a43

Heikki Linnakangas authored Feb 13, 2015

The client socket is always in non-blocking mode, and if we actually want
blocking behaviour, we emulate it by sleeping and retrying. But we have
retry loops at different layers for reads and writes, which was confusing.
To simplify, remove all the sleeping and retrying code from the lower
levels, from be_tls_read and secure_raw_read and secure_raw_write, and put
all the logic in secure_read() and secure_write().

80788a43

Simplify the way OpenSSL renegotiation is initiated in server. · 272923a0

Heikki Linnakangas authored Feb 13, 2015

At least in all modern versions of OpenSSL, it is enough to call
SSL_renegotiate() once, and then forget about it. Subsequent SSL_write()
and SSL_read() calls will finish the handshake.

The SSL_set_session_id_context() call is unnecessary too. We only have
one SSL context, and the SSL session was created with that to begin with.

272923a0

12 Feb, 2015 6 commits

pg_upgrade: improve checksum mismatch error message · dc01efa5
Bruce Momjian authored Feb 11, 2015
```
Patch by Greg Sabino Mullane, slight adjustments by me
```
dc01efa5

pg_upgrade: quote directory names in delete_old_cluster script · 056764b1

Bruce Momjian authored Feb 11, 2015

This allows the delete script to properly function when special
characters appear in directory paths, e.g. spaces.

Backpatch through 9.0

056764b1

pg_upgrade: preserve freeze info for postgres/template1 dbs · 866f3017

Bruce Momjian authored Feb 11, 2015

pg_database.datfrozenxid and pg_database.datminmxid were not preserved
for the 'postgres' and 'template1' databases.  This could cause missing
clog file errors on access to user tables and indexes after upgrades in
these databases.

Backpatch through 9.0

866f3017

Fix typo in logicaldecoding.sgml. · 8785e6e3
Andres Freund authored Feb 12, 2015
```
Author: Tatsuo Ishii

Backpatch to 9.4, where logicaldecoding was introduced.
```
8785e6e3

Fix missing PQclear() in libpqrcv_endstreaming(). · 4f38a281

Tom Lane authored Feb 11, 2015

This omission leaked one PGresult per WAL streaming cycle, which possibly
would never be enough to notice in the real world, but it's still a leak.

Per Coverity. Back-patch to 9.3 where the error was introduced.

4f38a281

Fix minor memory leak in ident_inet(). · 58146d35

Tom Lane authored Feb 11, 2015

We'd leak the ident_serv data structure if the second pg_getaddrinfo_all
(the one for the local address) failed.  This is not of great consequence
because a failure return here just leads directly to backend exit(), but
if this function is going to try to clean up after itself at all, it should
not have such holes in the logic.  Try to fix it in a future-proof way by
having all the failure exits go through the same cleanup path, rather than
"optimizing" some of them.

Per Coverity.  Back-patch to 9.2, which is as far back as this patch
applies cleanly.

58146d35

11 Feb, 2015 3 commits

Fix more memory leaks in failure path in buildACLCommands. · 9179444d

Tom Lane authored Feb 11, 2015

We already had one go at this issue in commit d73b7f97, but we
failed to notice that buildACLCommands also leaked several PQExpBuffers
along with a simply malloc'd string. This time let's try to make the
fix a bit more future-proof by eliminating the separate exit path.

It's still not exactly critical because pg_dump will curl up and die on
failure; but since the amount of the potential leak is now several KB,
it seems worth back-patching as far as 9.2 where the previous fix landed.

Per Coverity, which evidently is smarter than clang's static analyzer.

9179444d

Fix pg_dump's heuristic for deciding which casts to dump. · 9feefedf

Tom Lane authored Feb 10, 2015

Back in 2003 we had a discussion about how to decide which casts to dump.
At the time pg_dump really only considered an object's containing schema
to decide what to dump (ie, dump whatever's not in pg_catalog), and so
we chose a complicated idea involving whether the underlying types were to
be dumped (cf commit a6790ce8). But users
are allowed to create casts between built-in types, and we failed to dump
such casts. Let's get rid of that heuristic, which has accreted even more
ugliness since then, in favor of just looking at the cast's OID to decide
if it's a built-in cast or not.

In passing, also fix some really ancient code that supposed that it had to
manufacture a dependency for the cast on its cast function; that's only
true when dumping from a pre-7.3 server. This just resulted in some wasted
cycles and duplicate dependency-list entries with newer servers, but we
might as well improve it.

Per gripes from a number of people, most recently Greg Sabino Mullane.
Back-patch to all supported branches.

9feefedf

Fix GEQO to not assume its join order heuristic always works. · 1a179f36

Tom Lane authored Feb 10, 2015

Back in commit 400e2c93 I rewrote GEQO's
gimme_tree function to improve its heuristic for modifying the given tour
into a legal join order. In what can only be called a fit of hubris,
I supposed that this new heuristic would *always* find a legal join order,
and ripped out the old logic that allowed gimme_tree to sometimes fail.

The folly of this is exposed by bug #12760, in which the "greedy" clumping
behavior of merge_clump() can lead it into a dead end which could only be
recovered from by un-clumping. We have no code for that and wouldn't know
exactly what to do with it if we did. Rather than try to improve the
heuristic rules still further, let's just recognize that it *is* a
heuristic and probably must always have failure cases. So, put back the
code removed in the previous commit to allow for failure (but comment it
a bit better this time).

It's possible that this code was actually fully correct at the time and
has only been broken by the introduction of LATERAL. But having seen this
example I no longer have much faith in that proposition, so back-patch to
all supported branches.

1a179f36

10 Feb, 2015 2 commits

Fixed array handling in ecpg. · 1f393fc9

Michael Meskes authored Feb 10, 2015

When ecpg was rewritten to the new protocol version not all variable types
were corrected. This patch rewrites the code for these types to fix that. It
also fixes the documentation to correctly tell the status of array handling.

1f393fc9

Speed up CRC calculation using slicing-by-8 algorithm. · 025c0242

Heikki Linnakangas authored Feb 10, 2015

This speeds up WAL generation and replay. The new algorithm is
significantly faster with large inputs, like full-page images or when
inserting wide rows. It is slower with tiny inputs, i.e. less than 10 bytes
or so, but the speedup with longer inputs more than make up for that. Even
small WAL records at least have 24 byte header in the front.

The output is identical to the current byte-at-a-time computation, so this
does not affect compatibility. The new algorithm is only used for the
CRC-32C variant, not the legacy version used in tsquery or the
"traditional" CRC-32 used in hstore and ltree. Those are not as performance
critical, and are usually only applied over small inputs, so it seems
better to not carry around the extra lookup tables to speed up those rare
cases.

Abhijit Menon-Sen

025c0242

09 Feb, 2015 4 commits

Fix MSVC build. · cc761b17

Heikki Linnakangas authored Feb 09, 2015

When I moved pg_crc.c from src/port to src/common, I forgot to modify MSVC
build script accordingly.

cc761b17

Minor cleanup/code review for "indirect toast" stuff. · bc4de01d

Tom Lane authored Feb 09, 2015

Fix some issues I noticed while fooling with an extension to allow an
additional kind of toast pointer. Much of this is just comment
improvement, but there are a couple of actual bugs, which might or might
not be reachable today depending on what can happen during logical
decoding. An example is that toast_flatten_tuple() failed to cover the
possibility of an indirection pointer in its input. Back-patch to 9.4
just in case that is reachable now.

In HEAD, also correct some really minor issues with recent compression
reorganization, such as dangerously underparenthesized macros.

bc4de01d

Move pg_crc.c to src/common, and remove pg_crc_tables.h · c619c235

Heikki Linnakangas authored Feb 09, 2015

To get CRC functionality in a client program, you now need to link with
libpgcommon instead of libpgport. The CRC code has nothing to do with
portability, so libpgcommon is a better home. (libpgcommon didn't exist
when pg_crc.c was originally moved to src/port.)

Remove the possibility to get CRC functionality by just #including
pg_crc_tables.h. I'm not aware of any extensions that actually did that and
couldn't simply link with libpgcommon.

This also moves the pg_crc.h header file from src/include/utils to
src/include/common, which will require changes to any external programs
that currently does #include "utils/pg_crc.h". That seems acceptable, as
include/common is clearly the right home for it now, and the change needed
to any such programs is trivial.

c619c235

Move pg_lzcompress.c to src/common. · 40bede54

Fujii Masao authored Feb 09, 2015

The meta data of PGLZ symbolized by PGLZ_Header is removed, to make
the compression and decompression code independent on the backend-only
varlena facility. PGLZ_Header is being used to store some meta data
related to the data being compressed like the raw length of the uncompressed
record or some varlena-related data, making it unpluggable once PGLZ is
stored in src/common as it contains some backend-only code paths with
the management of varlena structures. The APIs of PGLZ are reworked
at the same time to do only compression and decompression of buffers
without the meta-data layer, simplifying its use for a more general usage.

On-disk format is preserved as well, so there is no incompatibility with
previous major versions of PostgreSQL for TOAST entries.

Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. Especially this commit is required
for upcoming WAL compression feature so that the WAL reader facility can
decompress the WAL data by using pglz_decompress.

Michael Paquier, reviewed by me.

40bede54

07 Feb, 2015 2 commits

Check DCH_MAX_ITEM_SIZ limits with <=, not <. · 237795a7

Noah Misch authored Feb 06, 2015

We reserve space for the full amount, not one less. The affected checks
deal with localized month and day names. Today's DCH_MAX_ITEM_SIZ value
would suffice for a 60-byte day name, while the longest known is the
49-byte mn_CN.utf-8 word for "Saturday." Thus, the upshot of this
change is merely to avoid misdirecting future readers of the code; users
are not expected to see errors either way.

237795a7

Assert(PqCommReadingMsg) in pq_peekbyte(). · a7a4adcf

Noah Misch authored Feb 06, 2015

Interrupting pq_recvbuf() can break protocol sync, so its callers all
deserve this assertion.  The one pq_peekbyte() caller suffices already.

a7a4adcf

06 Feb, 2015 1 commit

Report WAL flush, not insert, position in replication IDENTIFY_SYSTEM · ff16b40f

Heikki Linnakangas authored Feb 06, 2015

When beginning streaming replication, the client usually issues the
IDENTIFY_SYSTEM command, which used to return the current WAL insert
position. That's not suitable for the intended purpose of that field,
however. pg_receivexlog uses it to start replication from the reported
point, but if it hasn't been flushed to disk yet, it will fail. Change
IDENTIFY_SYSTEM to report the flush position instead.

Backpatch to 9.1 and above. 9.0 doesn't report any WAL position.

ff16b40f

05 Feb, 2015 1 commit

This routine was calling ecpg_alloc to allocate to memory but did not · 5ee5bc38

Michael Meskes authored Feb 05, 2015

actually check the returned pointer allocated, potentially NULL which
could be the result of a malloc call.

Issue noted by Coverity, fixed by Michael Paquier <michael@otacoo.com>

5ee5bc38

04 Feb, 2015 7 commits

Use a separate memory context for GIN scan keys. · d88976cf

Heikki Linnakangas authored Feb 04, 2015

It was getting tedious to track and release all the different things that
form a scan key. We were leaking at least the queryCategories array, and
possibly more, on a rescan. That was visible if a GIN index was used in a
nested loop join. This also protects from leaks in extractQuery method.

No backpatching, given the lack of complaints from the field. Maybe later,
after this has received more field testing.

d88976cf

Fix reference-after-free when waiting for another xact due to constraint. · 57fe2468

Heikki Linnakangas authored Feb 04, 2015

If an insertion or update had to wait for another transaction to finish,
because there was another insertion with conflicting key in progress,
we would pass a just-free'd item pointer to XactLockTableWait().

All calls to XactLockTableWait() and MultiXactIdWait() had similar issues.
Some passed a pointer to a buffer in the buffer cache, after already
releasing the lock. The call in EvalPlanQualFetch had already released the
pin too. All but the call in execUtils.c would merely lead to reporting a
bogus ctid, however (or an assertion failure, if enabled).

All the callers that passed HeapTuple->t_data->t_ctid were slightly bogus
anyway: if the tuple was updated (again) in the same transaction, its ctid
field would point to the next tuple in the chain, not the tuple itself.

Backpatch to 9.4, where the 'ctid' argument to XactLockTableWait was added
(in commit f88d4cfc)

57fe2468

pgcrypto: Code cleanup for decrypt_internal. · 370b3a46

Robert Haas authored Feb 04, 2015

Remove some unnecessary null-tests, and replace a goto-label construct
with an "if" block.

Michael Paquier, reviewed by me.

370b3a46

Fix memory leaks on OOM in ecpg. · c31b5d9d

Heikki Linnakangas authored Feb 04, 2015

These are fairly obscure cases, but let's keep Coverity happy.

Michael Paquier with some further fixes by me.

c31b5d9d

Add missing float.h include to snprintf.c. · ff8ca3b0

Andres Freund authored Feb 04, 2015

On windows _isnan() (which isnan() is redirected to in port/win32.h)
is declared in float.h, not math.h.

Per buildfarm animal currawong.

Backpatch to all supported branches.

ff8ca3b0

doc: Fix markup · c036edb7
Fujii Masao authored Feb 04, 2015
```
Ian Barwick
```
c036edb7

Add dummy PQsslAttributes function for non-SSL builds. · 302262d5

Heikki Linnakangas authored Feb 04, 2015

All the other new SSL information functions had dummy versions in
be-secure.c, but I missed PQsslAttributes(). Oops. Surprisingly, the linker
did not complain about the missing function on most platforms represented in
the buildfarm, even though it is exported, except for a few Windows systems.

302262d5

03 Feb, 2015 12 commits

Remove ill-conceived Assertion in ProcessClientWriteInterrupt(). · 3a54f4a4

Andres Freund authored Feb 03, 2015

It's perfectly fine to have blocked interrupts when
ProcessClientWriteInterrupt() is called. In fact it's commonly the
case when emitting error reports. And we deal with that correctly.

Even if that'd not be the case, it'd be a bad location for such a
assertion. Because ProcessClientWriteInterrupt() is only called when
the socket is blocked it's hard to hit.

Per Heikki and buildfarm animals nightjar and dunlin.

3a54f4a4

Remove remnants of ImmediateInterruptOK handling. · 2505ce0b

Andres Freund authored Feb 03, 2015

Now that nothing sets ImmediateInterruptOK to true anymore, we can
remove all the supporting code.

Reviewed-By: Heikki Linnakangas

2505ce0b

Remove the option to service interrupts during PGSemaphoreLock(). · d0699571

Andres Freund authored Feb 03, 2015

The remaining caller (lwlocks) doesn't need that facility, and we plan
to remove ImmedidateInterruptOK entirely. That means that interrupts
can't be serviced race-free and portably anyway, so there's little
reason for keeping the feature.

Reviewed-By: Heikki Linnakangas

d0699571

Move deadlock and other interrupt handling in proc.c out of signal handlers. · 6753333f

Andres Freund authored Feb 03, 2015

Deadlock checking was performed inside signal handlers up to
now. While it's a remarkable feat to have made this work reliably,
it's quite complex to understand why that is the case. Partially it
worked due to the assumption that semaphores are signal safe - which
is not actually documented to be the case for sysv semaphores.

The reason we had to rely on performing this work inside signal
handlers is that semaphores aren't guaranteed to be interruptable by
signals on all platforms. But now that latches provide a somewhat
similar API, which actually has the guarantee of being interruptible,
we can avoid doing so.

Signalling between ProcSleep, ProcWakeup, ProcWaitForSignal and
ProcSendSignal is now done using latches. This increases the
likelihood of spurious wakeups. As spurious wakeup already were
possible and aren't likely to be frequent enough to be an actual
problem, this seems acceptable.

This change would allow for further simplification of the deadlock
checking, now that it doesn't have to run in a signal handler. But
even if I were motivated to do so right now, it would still be better
to do that separately. Such a cleanup shouldn't have to be reviewed a
the same time as the more fundamental changes in this commit.

There is one possible usability regression due to this commit. Namely
it is more likely than before that log_lock_waits messages are output
more than once.

Reviewed-By: Heikki Linnakangas

6753333f

Don't allow immediate interrupts during authentication anymore. · 6647248e

Andres Freund authored Feb 03, 2015

We used to handle authentication_timeout by setting
ImmediateInterruptOK to true during large parts of the authentication
phase of a new connection.  While that happens to work acceptably in
practice, it's not particularly nice and has ugly corner cases.

Previous commits converted the FE/BE communication to use latches and
implemented support for interrupt handling during both
send/recv. Building on top of that work we can get rid of
ImmediateInterruptOK during authentication, by immediately treating
timeouts during authentication as a reason to die. As die interrupts
are handled immediately during client communication that provides a
sensibly quick reaction time to authentication timeout.

Additionally add a few CHECK_FOR_INTERRUPTS() to some more complex
authentication methods. More could be added, but this already should
provides a reasonable coverage.

While it this overall increases the maximum time till a timeout is
reacted to, it greatly reduces complexity and increases
reliability. That seems like a overall win. If the increase proves to
be noticeable we can deal with those cases by moving to nonblocking
network code and add interrupt checking there.

Reviewed-By: Heikki Linnakangas

6647248e

Remove unused "m" field in LSEG. · cec916f3

Tom Lane authored Feb 03, 2015

This field has been unreferenced since 1998, and does not appear in lseg
values stored on disk (since sizeof(lseg) is only 32 bytes according to
pg_type). There was apparently some idea of maintaining it just in values
appearing in memory, but the bookkeeping required to make that work would
surely far outweigh the cost of recalculating the line's slope when needed.
Remove it to (a) simplify matters and (b) suppress some uninitialized-field
whining from Coverity.

cec916f3

Process 'die' interrupts while reading/writing from the client socket. · 4fe384bd

Andres Freund authored Feb 03, 2015

Up to now it was impossible to terminate a backend that was trying to
send/recv data to/from the client when the socket's buffer was already
full/empty. While the send/recv calls itself might have gotten
interrupted by signals on some platforms, we just immediately retried.

That could lead to situations where a backend couldn't be terminated ,
after a client died without the connection being closed, because it
was blocked in send/recv.

The problem was far more likely to be hit when sending data than when
reading. That's because while reading a command from the client, and
during authentication, we processed interrupts immediately . That
primarily left COPY FROM STDIN as being problematic for recv.

Change things so that that we process 'die' events immediately when
the appropriate signal arrives. We can't sensibly react to query
cancels at that point, because we might loose sync with the client as
we could be in the middle of writing a message.

We don't interrupt writes if the write buffer isn't full, as indicated
by write() returning EWOULDBLOCK, as that would lead to fewer error
messages reaching clients.

Per discussion with Kyotaro HORIGUCHI and Heikki Linnakangas

Discussion: 20140927191243.GD5423@alap3.anarazel.de

4fe384bd

Introduce and use infrastructure for interrupt processing during client reads. · 4f85fde8

Andres Freund authored Feb 03, 2015

Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers...  The required code was
fragile and verbose, and is likely to contain bugs.

That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.

Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.

As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.

We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.

This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort.  In addition to allowing getting
rid of ImmediateInterruptOK, that is.

Author: Andres Freund
Reviewed-By: Heikki Linnakangas

4f85fde8

Use a nonblocking socket for FE/BE communication and block using latches. · 387da188

Andres Freund authored Feb 03, 2015

This allows to introduce more elaborate handling of interrupts while
reading from a socket.  Currently some interrupt handlers have to do
significant work from inside signal handlers, and it's very hard to
correctly write code to do so.  Generic signal handler limitations,
combined with the fact that we can't safely jump out of a signal
handler while reading from the client have prohibited implementation
of features like timeouts for idle-in-transaction.

Additionally we use the latch code to wait in a couple places where we
previously only had waiting code on windows as other platforms just
busy looped.

This can increase the number of systemcalls happening during FE/BE
communication. Benchmarks so far indicate that the impact isn't very
high, and there's room for optimization in the latch code. The chance
of cleaning up the usage of latches gives us, seem to outweigh the
risk of small performance regressions.

This commit theoretically can't used without the next patch in the
series, as WaitLatchOrSocket is not defined to be fully signal
safe. As we already do that in some cases though, it seems better to
keep the commits separate, so they're easier to understand.

Author: Andres Freund
Reviewed-By: Heikki Linnakangas

387da188

Fix breakage in GEODEBUG debug code. · 778d498c

Tom Lane authored Feb 03, 2015

LINE doesn't have an "m" field (anymore anyway).  Also fix unportable
assumption that %x can print the result of pointer subtraction.

In passing, improve single_decode() in minor ways:
* Remove unnecessary leading-whitespace skip (strtod does that already).
* Make GEODEBUG message more intelligible.
* Remove entirely-useless test to see if strtod returned a silly pointer.
* Don't bother computing trailing-whitespace skip unless caller wants
  an ending pointer.

This has been broken since 261c7d4b.
Although it's only debug code, might as well fix the 9.4 branch too.

778d498c

Add API functions to libpq to interrogate SSL related stuff. · 91fa7b47

Heikki Linnakangas authored Feb 03, 2015

This makes it possible to query for things like the SSL version and cipher
used, without depending on OpenSSL functions or macros. That is a good
thing if we ever get another SSL implementation.

PQgetssl() still works, but it should be considered as deprecated as it
only works with OpenSSL. In particular, PQgetSslInUse() should be used to
check if a connection uses SSL, because as soon as we have another
implementation, PQgetssl() will return NULL even if SSL is in use.

91fa7b47

Refactor page compactifying code. · 809d9a26

Heikki Linnakangas authored Feb 03, 2015

The logic to compact away removed tuples from page was duplicated with
small differences in PageRepairFragmentation, PageIndexMultiDelete, and
PageIndexDeleteNoCompact. Put it into a common function.

Reviewed by Peter Geoghegan.

809d9a26