- 27 Apr, 2017 4 commits
-
-
Bruce Momjian authored
Reported-by: Amit Kapila
-
Simon Riggs authored
The bug fixed by 0874d4f3 caused us to question and rework the handling of subtransactions in 2PC during and at end of recovery. Patch adds checks and tests to ensure no further bugs. This effectively removes the temporary measure put in place by 546c13e1. Author: Simon Riggs Reviewed-by: Tom Lane, Michael Paquier Discussion: http://postgr.es/m/CANP8+j+vvXmruL_i2buvdhMeVv5TQu0Hm2+C5N+kdVwHJuor8w@mail.gmail.com
-
Simon Riggs authored
Tests for normal and prepared transactions Author: Nikhil Sontakke, placed in new test file by me
-
Peter Eisentraut authored
Author: Masahiko Sawada <sawada.mshk@gmail.com>
-
- 26 Apr, 2017 10 commits
-
-
Tom Lane authored
Previously, maybe_start_bgworker() would launch at most one bgworker process per call, on the grounds that the postmaster might otherwise neglect its other duties for too long. However, that seems overly conservative, especially since bad effects only become obvious when many hundreds of bgworkers need to be launched at once. On the other side of the coin is that the existing logic could result in substantial delay of bgworker launches, because ServerLoop isn't guaranteed to iterate immediately after a signal arrives. (My attempt to fix that by using pselect(2) encountered too many portability question marks, and in any case could not help on platforms without pselect().) One could also question the wisdom of using an O(N^2) processing method if the system is intended to support so many bgworkers. As a compromise, allow that function to launch up to 100 bgworkers per call (and in consequence, rename it to maybe_start_bgworkers). This will allow any normal parallel-query request for workers to be satisfied immediately during sigusr1_handler, avoiding the question of whether ServerLoop will be able to launch more promptly. There is talk of rewriting the postmaster to use a WaitEventSet to avoid the signal-response-delay problem, but I'd argue that this change should be kept even after that happens (if it ever does). Backpatch to 9.6 where parallel query was added. The issue exists before that, but previous uses of bgworkers typically aren't as sensitive to how quickly they get launched. Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us
-
Bruce Momjian authored
-
Stephen Frost authored
Our general rule for pg_get_X(oid) functions is to simply return NULL when passed an invalid or inappropriate OID. Teach pg_get_partkeydef to do this also, making it easier for users to use this function when querying against tables with both partitions and non-partitions (such as pg_class). As a concrete example, this makes pg_dump's life a little easier. Author: Amit Langote
-
Tom Lane authored
Smarter compilers can see that "slot" can't be used uninitialized, but some popular ones cannot. Noted by Jeff Janes.
-
Peter Eisentraut authored
WITH is optional for REFRESH PUBLICATION. Also, remove a spurious bracket and fix a punctuation. Author: Euler Taveira <euler@timbira.com.br>
-
Peter Eisentraut authored
Publisher relation can be incorrectly chosen, if there are more than one relation in different schemas with the same name. Author: Euler Taveira <euler@timbira.com.br>
-
Peter Eisentraut authored
Author: Euler Taveira <euler@timbira.com.br>
-
Fujii Masao authored
Author: Masahiko Sawada
-
Peter Eisentraut authored
The code was originally written with assumption that launcher is the only process starting the worker. However that hasn't been true since commit 7c4f5240 which failed to modify the worker management code adequately. This patch adds an in_use field to the LogicalRepWorker struct to indicate whether the worker slot is being used and uses proper locking everywhere this flag is set or read. However if the parent process dies while the new worker is starting and the new worker fails to attach to shared memory, this flag would never get cleared. We solve this rare corner case by adding a sort of garbage collector for in_use slots. This uses another field in the LogicalRepWorker struct named launch_time that contains the time when the worker was started. If any request to start a new worker does not find free slot, we'll check for workers that were supposed to start but took too long to actually do so, and reuse their slot. In passing also fix possible race conditions when stopping a worker that hasn't finished starting yet. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Fujii Masao <masao.fujii@gmail.com>
-
Bruce Momjian authored
Reported-by: Amit Kapila
-
- 25 Apr, 2017 22 commits
-
-
Stephen Frost authored
There is no need to forbid ALTER TABLE ONLY on partitioned tables, when no partitions exist yet. This can be handy for users who are building up their partitioned table independently and will create actual partitions later. In addition, this is how pg_dump likes to operate in certain instances. Author: Amit Langote, with some error message word-smithing by me
-
Bruce Momjian authored
Reported-by: Tels
-
Peter Eisentraut authored
Otherwise one would have to wait up to DEFAULT_NAPTIME_PER_CYCLE until the subscription worker is considered for starting. There is a small race condition: If one enables a subscription right after disabling it, the launcher might not have registered the stopping when receiving the wakeup signal for the re-enabling. The start will then not happen right away but after the full cycle time. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
-
Bruce Momjian authored
Reported-by: Tom Lane
-
Bruce Momjian authored
Reported-by: Andres Freund
-
Bruce Momjian authored
Reported-by: Tels, Andres Freund
-
Bruce Momjian authored
-
Fujii Masao authored
In quorum-based synchronous replication, all the standbys listed in synchronous_standby_names equally have chances to be chosen as synchronous standbys. So they should have the same priority. However, previously, quorum standbys whose names appear earlier in the list were given higher priority values though the difference of those priority values didn't affect the selection of synchronous standbys. Users could see those "meaningless" priority values in pg_stat_replication and this was confusing. This commit gives all the quorum synchronous standbys the same highest priority, i.e., 1, in order to remove such confusion. Author: Fujii Masao Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi Discussion: http://postgr.es/m/CAHGQGwEKOw=SmPLxJzkBsH6wwDBgOnVz46QjHbtsiZ-d-2RGUg@mail.gmail.com
-
Bruce Momjian authored
Reported-by: Michael Paquier, Felix Gerzaguet
-
Bruce Momjian authored
Reported-by: David Rowley, Amit Langote, Ashutosh Bapat
-
Robert Haas authored
Commit 5dfc1981 removed the only existing caller of hash_freeze, but left behind a comment indicating that hash_freeze was still used. Adjust. Kyotaro Horiguchi Discussion: http://postgr.es/m/20170424.165541.230634914.horiguchi.kyotaro@lab.ntt.co.jp
-
Fujii Masao authored
This commit also fixes copyright line missed by the automated script. Author: Masahiko Sawada
-
Bruce Momjian authored
Reported-by: Amit Kapila
-
Bruce Momjian authored
The requirement to rebuild pg_upgrade-ed hash indexes was kept in the incompatibilities section. Reported-by: Amit Kapila
-
Bruce Momjian authored
Reported-by: Amit Kapila
-
Bruce Momjian authored
-
Bruce Momjian authored
Reported-by: Takayuki Tsunakawa
-
Peter Eisentraut authored
Objects in an extension are shippable to a foreign server if the extension is part of the foreign server definition's shippable extensions list. But this was not properly considered in some cases when checking whether a join condition can be pushed to a foreign server and the join condition uses an object from a shippable extension. So the join would never be pushed down in those cases. So, the list of extensions needs to be made available in fpinfo of the relation being considered to be pushed down before any expressions are assessed for being shippable. Fix foreign_join_ok() to do that for a join relation. The code to save FDW options in fpinfo is scattered at multiple places. Bring all of that together into functions apply_server_options(), apply_table_options(), and merge_fdw_options(). David Rowley and Ashutosh Bapat, per report from David Rowley
-
Bruce Momjian authored
Reported-by: Takayuki Tsunakawa
-
Bruce Momjian authored
-
Bruce Momjian authored
Reported-by: Andreas Karlsson
-
Bruce Momjian authored
-
- 24 Apr, 2017 4 commits
-
-
Bruce Momjian authored
-
Tom Lane authored
This reverts commit 81069a9e. Buildfarm results suggest that some platforms have versions of pselect(2) that are not merely non-atomic, but flat out non-functional. Revert the use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART patch as the source of trouble). If it's so, we should probably look into blacklisting specific platforms that have broken pselect. Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
-
Tom Lane authored
Traditionally we've unblocked signals, called select(2), and then blocked signals again. The code expects that the select() will be cancelled with EINTR if an interrupt occurs; but there's a race condition, which is that an already-pending signal will be delivered as soon as we unblock, and then when we reach select() there will be nothing preventing it from waiting. This can result in a long delay before we perform any action that ServerLoop was supposed to have taken in response to the signal. As with the somewhat-similar symptoms fixed by commit 89390208, the main practical problem is slow launching of parallel workers. The window for trouble is usually pretty short, corresponding to one iteration of ServerLoop; but it's not negligible. To fix, use pselect(2) in place of select(2) where available, as that's designed to solve exactly this problem. Where not available, we continue to use the old way, and are no worse off than before. pselect(2) has been required by POSIX since about 2001, so most modern platforms should have it. A bigger portability issue is that some implementations are said to be non-atomic, ie pselect() isn't really any different from unblock/select/reblock. Still, we're no worse off than before on such a platform. There is talk of rewriting the postmaster to use a WaitEventSet and not do signal response work in signal handlers, at which point this could be reverted, since we'd be using a self-pipe to solve the race condition. But that's not happening before v11 at the earliest. Back-patch to 9.6. The problem exists much further back, but the worst symptom arises only in connection with parallel query, so it does not seem worth taking any portability risks in older branches. Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
-
Tom Lane authored
The postmaster keeps signals blocked everywhere except while waiting for something to happen in ServerLoop(). The code expects that the select(2) will be cancelled with EINTR if an interrupt occurs; without that, followup actions that should be performed by ServerLoop() itself will be delayed. However, some platforms interpret the SA_RESTART signal flag as meaning that they should restart rather than cancel the select(2). Worse yet, some of them restart it with the original timeout delay, meaning that a steady stream of signal interrupts can prevent ServerLoop() from iterating at all if there are no incoming connection requests. Observable symptoms of this, on an affected platform such as HPUX 10, include extremely slow parallel query startup (possibly as much as 30 seconds) and failure to update timestamps on the postmaster's sockets and lockfiles when no new connections arrive for a long time. We can fix this by running the postmaster's signal handlers without SA_RESTART. That would be quite a scary change if the range of code where signals are accepted weren't so tiny, but as it is, it seems safe enough. (Note that postmaster children do, and must, reset all the handlers before unblocking signals; so this change should not affect any child process.) There is talk of rewriting the postmaster to use a WaitEventSet and not do signal response work in signal handlers, at which point it might be appropriate to revert this patch. But that's not happening before v11 at the earliest. Back-patch to 9.6. The problem exists much further back, but the worst symptom arises only in connection with parallel query, so it does not seem worth taking any portability risks in older branches. Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
-