- 10 Jul, 2017 4 commits
-
-
Heikki Linnakangas authored
Also fix two other issues, while we're at it: * In error message on connection failure, if multiple network addresses were given as the host option, as in "host=127.0.0.1,127.0.0.2", the error message printed the address twice. * If there were many more ports than hostnames, the error message would always claim that there was one port too many, even if there was more than one. For example, if you gave 2 hostnames and 5 ports, the error message claimed that you gave 2 hostnames and 3 ports. Discussion: https://www.postgresql.org/message-id/10badbc6-4d5a-a769-623a-f7ada43e14dd@iki.fi
-
Tom Lane authored
Original wording had confusingly vague antecedent for "they", so replace that with a more repetitive but clearer formulation. In passing, make the link to the installation requirements section more specific. Per gripe from Martin Mai, though this is not the fix he initially proposed. Discussion: https://postgr.es/m/CAN_NWRu-cWuNaiXUjV3m4H-riWURuPW=j21bSaLADs6rjjzXgQ@mail.gmail.com
-
Tom Lane authored
Improve many item descriptions, improve markup, relocate some items that seemed to be in the wrong section.
-
- 09 Jul, 2017 3 commits
-
-
Tom Lane authored
-
Tom Lane authored
Thinko in commit a892234f. Vik Fearing Discussion: https://postgr.es/m/b6aaa23d-e26f-6404-a30d-e89431492d5d@2ndquadrant.com
-
Noah Misch authored
It generates an empty file, so libpq.dll advertises no version information. Commit facde2a9 mistranslated "print O;" in this one place.
-
- 08 Jul, 2017 1 commit
-
- 07 Jul, 2017 2 commits
-
-
Alvaro Herrera authored
Noticed while reviewing code.
-
Magnus Hagander authored
Author: Masahiko Sawada <sawada.mshk@gmail.com>
-
- 06 Jul, 2017 4 commits
-
-
Teodor Sigaev authored
Fix oversight in 3b97e682 bug fix. Bitwise AND is used instead of OR and it cleans all bits in t_infomask heap tuple field. Backpatch to 9.3
-
Dean Rasheed authored
partition_rbound_cmp() is intended to compare range partition bounds in a way such that if all the bound values are equal but one is an upper bound and one is a lower bound, the upper bound is treated as smaller than the lower bound. This particular ordering is required by RelationBuildPartitionDesc() when building the PartitionBoundInfoData, so that it can consistently keep only the upper bounds when upper and lower bounds coincide. Update the function comment to make that clearer. Also, fix a (currently unreachable) corner-case bug -- if the bound values coincide and they contain unbounded values, fall through to the lower-vs-upper comparison code, rather than immediately returning 0. Currently it is not possible to define coincident upper and lower bounds containing unbounded columns, but that may change in the future, so code defensively. Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
-
Dean Rasheed authored
The previous logic, whilst not actually wrong, was overly complex and involved doing two binary searches, where only one was really necessary. This simplifies that logic and improves the comments. One visible change is that if the new partition overlaps multiple existing partitions, the error message now always reports the overlap with the first existing partition (the one with the lowest bounds). The old code would sometimes report the clash with the first partition and sometimes with the last one. Original patch idea from Amit Langote, substantially rewritten by me. Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
-
Tom Lane authored
Buildfarm members hornet and sungazer have shown multiple instances of "Failed test 'xmin of non-cascaded slot with hs feedback has changed'". The reason seems to be that the test is checking the current xmin of the master server's replication slot against a past xmin of the first slave server's replication slot. Even though the latter slot is downstream of the former, it's possible for its reported xmin to be ahead of the former's reported xmin, because those numbers are updated whenever the respective downstream walreceiver feels like it (see logic in WalReceiverMain). Instrumenting this test shows that indeed the slave slot's xmin does often advance before the master's does, especially if an autovacuum transaction manages to occur during the relevant window. If we happen to capture such an advanced xmin as $xmin, then the subsequent wait_slot_xmins call can fall through before the master's xmin has advanced at all, and then if it advances before the get_slot_xmins call, we can get the observed failure. Yeah, that's a bit of a long chain of deduction, but it's hard to explain any other way how the test can get past an "xmin <> '$xmin'" check only to have the next query find that xmin does equal $xmin. Fix by keeping separate images of the master and slave slots' xmins and testing their has-xmin-advanced conditions independently.
-
- 05 Jul, 2017 5 commits
-
-
Peter Eisentraut authored
Since pg_ctl promote already waits for recovery to end, these calls are obsolete.
-
Peter Eisentraut authored
If an operation being waited for does not complete within the timeout, then exit with a nonzero exit status. This was previously handled inconsistently.
-
Peter Eisentraut authored
WRITE_CHAR_FIELD() didn't do any escaping, so that for example a zero byte would cause the whole output string to be truncated. To fix, pass the char through outToken(), so it is escaped like a string. Adjust the reading side to handle this.
-
Peter Eisentraut authored
Update the documentation for \pset to mention columns|linestyle|pager_min_lines. Add various mentions of \pset command equivalences that were previously inconsistent. Author: Дилян Палаузов <dpa-postgres@aegee.org>
- 04 Jul, 2017 2 commits
-
-
Peter Eisentraut authored
Reported-by: Константин Евтеев <konst583@gmail.com> Bug: #14699
-
Peter Eisentraut authored
This avoids "tuple concurrently updated" errors when a ALTER or DROP SUBSCRIPTION writes to pg_subscription_rel at the same time as a worker. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
-
- 03 Jul, 2017 4 commits
-
-
Magnus Hagander authored
Author: Michael Paquier <michael.paquier@gmail.com>
-
Heikki Linnakangas authored
If the client closes an SSL connection, treat it the same as EOF on a non-SSL connection. In particular, don't write a message in the log about that. Michael Paquier. Discussion: https://www.postgresql.org/message-id/CAB7nPqSfyVV42Q2acFo%3DvrvF2gxoZAMJLAPq3S3KkjhZAYi7aw@mail.gmail.com
-
Heikki Linnakangas authored
Previously, gen_random_uuid() would fall back to a weak random number generator, unlike gen_random_bytes() which would just fail. And this was not made very clear in the docs. For consistency, also make gen_random_uuid() fail outright, if compiled with --disable-strong-random. Re-word the error message you get with --disable-strong-random. It is also used by pgp functions that require random salts, and now also gen_random_uuid(). Reported by Radek Slupik. Discussion: https://www.postgresql.org/message-id/20170101232054.10135.50528@wrigleys.postgresql.org
-
Tom Lane authored
Since reducing pg_ctl's reaction time in commit c61559ec, some slower buildfarm members have shown erratic failures in this test. The reason turns out to be that the test assumes synchronous replication (because it does not provide any lag time for a commit to replicate before shutting down the servers), but it had only enabled sync rep in one direction. The observed symptoms correspond to failure to replicate the last committed transaction in the other direction, which can be expected to happen if the shutdown command is issued soon enough and we are providing no synchronous-commit guarantees. Fix that, and add a bit more paranoid state checking at the bottom of the script. Michael Paquier and myself Discussion: https://postgr.es/m/908.1498965681@sss.pgh.pa.us
-
- 02 Jul, 2017 5 commits
-
-
Tom Lane authored
By default, Perl's split() function drops trailing empty fields, which is not what we want here. Oversight in commit fb093e4c. We'd managed to miss it thus far thanks to the very limited usage of this function. Discussion: https://postgr.es/m/14837.1499029831@sss.pgh.pa.us
-
Tom Lane authored
The original coding here was very confusing, because it named the two servers it set up "master" and "slave" even though it swapped their replication roles multiple times. At any given point in the script it was very unobvious whether "$node_master" actually referred to the server named "master" or the other one. Instead, pick arbitrary names for the two servers --- I used "london" and "paris" --- and distinguish those permanent names from the nonce references $cur_master and $cur_slave. Add logging to help distinguish which is which at any given point. Also, use distinct data and transaction names to make all the prepared transactions easily distinguishable in the postmaster logs. (There was one place where we intentionally tested that the server could cope with re-use of a transaction name, but it seems like one place is sufficient for that purpose.) Also, add checks at the end to make sure that all the transactions that were supposed to be committed did survive. Discussion: https://postgr.es/m/28238.1499010855@sss.pgh.pa.us
-
Tom Lane authored
Add an optional "expected" argument to override the default assumption that we're waiting for the query to return "t". This allows replacing a handwritten polling loop in recovery/t/007_sync_rep.pl with use of poll_query_until(); AFAICS that's the only remaining ad-hoc polling loop in our TAP tests. Change poll_query_until() to probe ten times per second not once per second. Like some similar changes I've been making recently, the one-second interval seems to be rooted in ancient traditions rather than the actual likely wait duration on modern machines. I'd consider reducing it further if there were a convenient way to spawn just one psql for the whole loop rather than one per probe attempt. Discussion: https://postgr.es/m/12486.1498938782@sss.pgh.pa.us
-
Peter Eisentraut authored
Update the documentation a bit to include that logical replication as well as other and third-party replication clients can participate in synchronous replication.
-
Peter Eisentraut authored
The simple calculations done to estimate the size of the output buffers for ucnv_fromUChars() and ucnv_toUChars() could overflow int32_t for large strings. To avoid that, go the long way and run the function first without an output buffer to get the correct output buffer size requirement.
-
- 01 Jul, 2017 4 commits
-
-
Tom Lane authored
Several callers of PostgresNode::poll_query_until() neglected to check for failure; I do not think that's optional. Also, rewrite one place that had reinvented poll_query_until() for no very good reason.
-
Tom Lane authored
The regression tests contain numerous cases where we do some activity on a master server and then wait till the slave has ack'd flushing its copy of that transaction. Because WAL flush on the slave is asynchronous to the logicalrep worker process, the worker cannot send such a feedback message during the LogicalRepApplyLoop iteration where it processes the last data from the master. In the previous coding, the feedback message would come out only when the loop's WaitLatchOrSocket call returned WL_TIMEOUT. That requires one full second of delay (NAPTIME_PER_CYCLE); and to add insult to injury, it could take more than that if the WaitLatchOrSocket was interrupted a few times by latch-setting events. In reality we can expect the slave's walwriter process to have flushed the WAL data after, more or less, WalWriterDelay (typically 200ms). Hence, if there are unacked transactions pending, make the wait delay only that long rather than the full NAPTIME_PER_CYCLE. Also, move one of the send_feedback() calls into the loop main line, so that we'll check for the need to send feedback even if we were woken by a latch event and not either socket data or timeout. It's not clear how much this matters for production purposes, but it's definitely helpful for testing. Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
-
Tom Lane authored
When waiting for a logical replication worker process to start or stop, we have to busy-wait until we see it add or remove itself from the LogicalRepWorker slot in shared memory. Those loops were using a one-second delay between checks, but on any reasonably modern machine, it doesn't take more than a couple of msec for a worker to spawn or shut down. Reduce the loop delays to 10ms to avoid wasting quite so much time in the related regression tests. In principle, a better solution would be to fix things so that the waiting process can be awakened via its latch at the right time. But that seems considerably more invasive, which is undesirable for a post-beta fix. Worker start/stop performance likely isn't of huge interest anyway for production purposes, so we might not ever get around to it. In passing, rearrange the second wait loop in logicalrep_worker_stop() so that the lock is held at the top of the loop, thus saving one lock acquisition/release per call, and making it look more like the other loop. Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
-
Peter Eisentraut authored
The bug would previously prevent the update of any column in a table with identity columns, rather than just the actual identity column. Reported-by: zam6ak@gmail.com Bug: #14718
-
- 30 Jun, 2017 6 commits
-
-
Alvaro Herrera authored
In WAL receiver and WAL server, some accesses to their corresponding shared memory control structs were done without holding any kind of lock, which could lead to inconsistent and possibly insecure results. In walsender, fix by clarifying the locking rules and following them correctly, as documented in the new comment in walsender_private.h; namely that some members can be read in walsender itself without a lock, because the only writes occur in the same process. The rest of the struct requires spinlock for accesses, as usual. In walreceiver, fix by always holding spinlock while accessing the struct. While there is potentially a problem in all branches, it is minor in stable ones. This only became a real problem in pg10 because of quorum commit in synchronous replication (commit 3901fd70), and a potential security problem in walreceiver because a superuser() check was removed by default monitoring roles (commit 25fff407). Thus, no backpatch. In passing, clean up some leftover braces which were used to create unconditional blocks. Once upon a time these were used for volatile-izing accesses to those shmem structs, which is no longer required. Many other occurrences of this pattern remain. Author: Michaël Paquier Reported-by: Michaël Paquier Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi, Thomas Munro, Robert Haas Discussion: https://postgr.es/m/CAB7nPqTWYqtzD=LN_oDaf9r-hAjUEPAy0B9yRkhcsLdRN8fzrw@mail.gmail.com
-
Peter Eisentraut authored
('foo') is not a Python tuple: it is a string wrapped in parentheses. A valid 1-element Python tuple is ('foo',). Author: Daniele Varrazzo <daniele.varrazzo@gmail.com>
-
Peter Eisentraut authored
Author: Masahiko Sawada <sawada.mshk@gmail.com>
-
Tom Lane authored
When a sync worker is waiting for the associated apply worker to notice that it's in SYNCWAIT state, wait_for_worker_state_change() would just patiently wait for that to happen. This generally required waiting for the 1-second timeout in LogicalRepApplyLoop to elapse. Kicking the worker via its latch makes things significantly snappier. While at it, fix race conditions that could potentially result in crashes: we can *not* call logicalrep_worker_wakeup_ptr() once we've released the LogicalRepWorkerLock, because worker->proc might've been reset to NULL after we do that (indeed, there's no really solid reason to believe that the LogicalRepWorker slot even belongs to the same worker anymore). In logicalrep_worker_wakeup(), we can just move the wakeup inside the lock scope. In process_syncing_tables_for_apply(), a bit more code rearrangement is needed. Also improve some nearby comments.
-
Peter Eisentraut authored
Author: Albe Laurenz <laurenz.albe@wien.gv.at>
-
Peter Eisentraut authored
Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
-