- 11 Apr, 2016 2 commits
-
-
Andres Freund authored
Previously we used a spinlock, in addition to the atomically manipulated ->state field, to protect the wait queue. But it's pretty simple to instead perform the locking using a flag in state. Due to 6150a1b0, BufferDescs on platforms (like PPC) with > 1 byte spinlocks increased their size above 64 bytes. As 64 bytes is the size we pad allocated BufferDescs to, this can increase false sharing, causing performance problems in turn. Together with the previous commit this reduces the size to <= 64 bytes on all common platforms. Author: Andres Freund Discussion: CAA4eK1+ZeB8PMwwktf+3bRS0Pt4Ux6Rs6Aom0uip8c6shJWmyg@mail.gmail.com 20160327121858.zrmrjegmji2ymnvr@alap3.anarazel.de
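A minimal sketch of locking through a flag bit in an atomic state word, assuming C11 atomics rather than PostgreSQL's own atomics layer. The names `wq_try_lock`, `wq_lock`, `wq_unlock` and the bit position `WQ_FLAG_LOCKED` are hypothetical and not taken from the commit; a real implementation would also back off while spinning instead of busy-waiting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real layout is defined in the PostgreSQL sources. */
#define WQ_FLAG_LOCKED (UINT32_C(1) << 28)

/* Try once to set the lock bit; returns true if this caller acquired it. */
bool
wq_try_lock(_Atomic uint32_t *state)
{
    uint32_t old = atomic_fetch_or(state, WQ_FLAG_LOCKED);

    return (old & WQ_FLAG_LOCKED) == 0;
}

/* Spin until the lock bit is acquired (no delay logic in this sketch). */
void
wq_lock(_Atomic uint32_t *state)
{
    while (!wq_try_lock(state))
    {
        /* Wait with plain reads until the bit looks free, then retry. */
        while (atomic_load(state) & WQ_FLAG_LOCKED)
            ;
    }
}

/* Clear the lock bit; all other bits of the state word are left untouched. */
void
wq_unlock(_Atomic uint32_t *state)
{
    uint32_t old = atomic_fetch_and(state, ~WQ_FLAG_LOCKED);

    assert(old & WQ_FLAG_LOCKED);   /* caller must have held the lock */
    (void) old;
}
```

Because the flag lives in the same word as the rest of the state, no separate spinlock field is needed, which is what lets the struct shrink.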
-
Andres Freund authored
Pinning/Unpinning a buffer is a very frequent operation, especially in read-mostly cache resident workloads. Benchmarking shows that in various scenarios the spinlock protecting a buffer header's state becomes a significant bottleneck. The problem can be reproduced with pgbench -S on larger machines, but can be considerably worse for queries which touch the same buffers over and over at a high frequency (e.g. nested loops over a small inner table). To allow atomic operations to be used, cram BufferDesc's flags, usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable; that allows manipulating them together using 32bit compare-and-swap operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could be lifted by using a 64bit field, but it's not a realistic configuration atm). As not all operations can easily be implemented in a lock-free manner, implement the previous buf_hdr_lock via a flag bit in the atomic variable. That way we can continue to lock the header in places where it's needed, but can get away without acquiring it in the more frequent hot-paths. There are some additional operations which can be done without the lock, but aren't in this patch; but the most important places are covered. As bufmgr.c now essentially re-implements spinlocks, abstract the delay logic from s_lock.c into something more generic. It already has two users, and more are coming up; there's a follow-up patch for lwlock.c at least. This patch is based on a proof-of-concept written by me, which Alexander Korotkov made into a fully working patch; the committed version is again revised by me. Benchmarking and testing has, amongst others, been provided by Dilip Kumar, Alexander Korotkov, Robert Haas. On a large x86 system improvements for readonly pgbench, with a high client count, of a factor of 8 have been observed. Author: Alexander Korotkov and Andres Freund Discussion: 2400449.GjM57CE0Yg@dinodell
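The packing described above can be sketched roughly as follows, assuming C11 atomics. The 18-bit refcount follows from the MAX_BACKENDS limit of 2^18-1 mentioned in the message; the other field widths, the usage-count cap, the lock bit position, and all `hdr_*` / `HDR_*` names are assumptions made for illustration, not the committed layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: bits 0-17 refcount, bits 18-21 usage count, bit 22 header lock. */
#define HDR_REFCOUNT_ONE   UINT32_C(1)
#define HDR_REFCOUNT_MASK  ((UINT32_C(1) << 18) - 1)
#define HDR_USAGE_SHIFT    18
#define HDR_USAGE_ONE      (UINT32_C(1) << HDR_USAGE_SHIFT)
#define HDR_USAGE_MASK     (UINT32_C(0xF) << HDR_USAGE_SHIFT)
#define HDR_MAX_USAGE      5                      /* hypothetical cap */
#define HDR_FLAG_LOCKED    (UINT32_C(1) << 22)    /* hypothetical bit */

uint32_t hdr_refcount(uint32_t s) { return s & HDR_REFCOUNT_MASK; }
uint32_t hdr_usage(uint32_t s) { return (s & HDR_USAGE_MASK) >> HDR_USAGE_SHIFT; }

/*
 * Pin: bump the refcount and the (capped) usage count with one
 * compare-and-swap, without ever taking the header lock ourselves.
 * Returns the state value that was installed.
 */
uint32_t
hdr_pin(_Atomic uint32_t *state)
{
    uint32_t old = atomic_load(state);

    for (;;)
    {
        uint32_t desired;

        if (old & HDR_FLAG_LOCKED)
        {
            /* Someone holds the header lock: re-read and try again. */
            old = atomic_load(state);
            continue;
        }

        assert(hdr_refcount(old) < HDR_REFCOUNT_MASK);
        desired = old + HDR_REFCOUNT_ONE;
        if (hdr_usage(old) < HDR_MAX_USAGE)
            desired += HDR_USAGE_ONE;

        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(state, &old, desired))
            return desired;
    }
}

/* Unpin: a single atomic subtraction is enough. */
uint32_t
hdr_unpin(_Atomic uint32_t *state)
{
    uint32_t old = atomic_fetch_sub(state, HDR_REFCOUNT_ONE);

    assert(hdr_refcount(old) > 0);
    return old - HDR_REFCOUNT_ONE;
}
```

Operations that cannot be expressed as one compare-and-swap would still set the lock bit first, which is why the pin loop waits while that bit is set.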
-
- 10 Apr, 2016 3 commits
-
-
Tom Lane authored
Originally, this test created a 100000-row test table, which made it run rather slowly compared to other contrib tests. Investigation with gcov showed that we got no further improvement in code coverage after the first 700 or so rows, making the large table 99% a waste of time. Cut it back to 2000 rows to fix the runtime problem and still leave some headroom for testing behaviors that may appear later. A closer look at the gcov results showed that the main coverage omissions in contrib/bloom occurred because the test never filled more than one entry in the notFullPage array; which is unsurprising because it exercised index cleanup only in the scenario of complete table deletion, allowing every page in the index to become deleted rather than not-full. Add testing that allows the not-full path to be exercised as well. Also, test the amvalidate function, because blvalidate.c had zero coverage without that, and besides it's a good idea to check for mistakes in the bloom opclass definitions.
-
Alvaro Herrera authored
I used the wrong variable here. Doesn't make a difference today because the only plausible caller passes a non-NULL variable, but someday it will be wrong, and even today's correctness is subtle: the caller that does pass a NULL is never invoked because of object type constraints. Surely not a condition to rely on. Noted by Coverity
-
Tom Lane authored
Since we're requiring pages handled by generic_xlog.c to be standard format, specify REGBUF_STANDARD when doing a full-page image, so that xloginsert.c can compress out the "hole" between pd_lower and pd_upper. Given the current API in which this path will be taken only for a newly initialized page, the hole is likely to be particularly large in such cases, so that this oversight could easily be performance-significant. I don't notice any particular change in the runtime of contrib/bloom's regression test, though.
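To illustrate what compressing out the "hole" means, here is a rough sketch under the assumption of a fixed 8192-byte page whose unused region runs from `lower` to `upper`. The functions `page_compress_hole` and `page_restore_hole` and the constant `SKETCH_PAGE_SIZE` are hypothetical; this is not the xloginsert.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192   /* assumed page size for this sketch */

/*
 * Copy a page image into 'out' while skipping the unused bytes in
 * [lower, upper).  Returns the number of bytes written.
 */
size_t
page_compress_hole(const char *page, size_t lower, size_t upper, char *out)
{
    assert(lower <= upper && upper <= SKETCH_PAGE_SIZE);

    memcpy(out, page, lower);
    memcpy(out + lower, page + upper, SKETCH_PAGE_SIZE - upper);
    return lower + (SKETCH_PAGE_SIZE - upper);
}

/* Rebuild the full page from a compressed image, zero-filling the hole. */
void
page_restore_hole(const char *in, size_t lower, size_t upper, char *page)
{
    assert(lower <= upper && upper <= SKETCH_PAGE_SIZE);

    memcpy(page, in, lower);
    memset(page + lower, 0, upper - lower);
    memcpy(page + upper, in + lower, SKETCH_PAGE_SIZE - upper);
}
```

For a freshly initialized page almost everything between the header and the end of the page is hole, so the stored image can be a small fraction of the page size.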
-
- 09 Apr, 2016 9 commits
-
-
Tom Lane authored
Make the inner comparison loops of computeDelta() as tight as possible by pulling considerations of valid and invalid ranges out of the inner loops, and extending a match or non-match detection as far as possible before deciding what to do next. To keep this tractable, give up the possibility of merging fragments across the pd_lower to pd_upper gap. The fraction of pages where that could happen (ie, there are 4 or fewer bytes in the gap, *and* data changes immediately adjacent to it on both sides) is too small to be worth spending cycles on. Also, avoid two BLCKSZ-length memcpy()s by computing the delta before moving data into the target buffer, instead of after. This doesn't save nearly as many cycles as being tenser about computeDelta(), but it still seems worth doing. On my machine, this patch cuts a full 40% off the runtime of contrib/bloom's regression test.
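The general shape of such a run-scanning loop might look like the sketch below. It is an illustration of the technique only, not the actual computeDelta(): the `DeltaFragment` struct, the function `delta_scan`, and the `SKETCH_MIN_MATCH` threshold (a matching run this short is absorbed into the surrounding fragment) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    size_t offset;   /* where the differing run starts */
    size_t length;   /* how many bytes it covers */
} DeltaFragment;

/* Hypothetical threshold: shorter matches are not worth a new fragment. */
#define SKETCH_MIN_MATCH 4

/* Scan two equal-length images and record the differing byte ranges. */
size_t
delta_scan(const unsigned char *cur, const unsigned char *target, size_t len,
           DeltaFragment *frags, size_t max_frags)
{
    size_t nfrags = 0;
    size_t i = 0;

    while (i < len)
    {
        size_t start;
        size_t end;

        /* Tight loop: skip over bytes that already match. */
        while (i < len && cur[i] == target[i])
            i++;
        if (i >= len)
            break;
        start = i;

        for (;;)
        {
            size_t match_start;

            /* Extend the mismatch as far as possible. */
            while (i < len && cur[i] != target[i])
                i++;
            end = i;

            /* Then extend the following match as far as possible. */
            match_start = i;
            while (i < len && cur[i] == target[i])
                i++;

            /* Close the fragment at end of data or after a long match. */
            if (i >= len || i - match_start > SKETCH_MIN_MATCH)
                break;
            /* Otherwise the short match is absorbed; keep extending. */
        }

        assert(nfrags < max_frags);
        frags[nfrags].offset = start;
        frags[nfrags].length = end - start;
        nfrags++;
    }
    return nfrags;
}
```

Each inner loop does nothing but compare bytes and advance, and the decision about what to do next is made only once a run has been extended as far as it goes.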
-
Tom Lane authored
Per buildfarm. Pavel Stehule
-
Tom Lane authored
This routine is unsafe as implemented, because it invalidates the page image pointers returned by previous GenericXLogRegister() calls. Rather than complicate the API or the implementation to avoid that, let's just get rid of it; the use-case for having it seems much too thin to justify a lot of work here. While at it, do some wordsmithing on the SGML docs for generic WAL.
-
Tom Lane authored
That routine is dangerous, and unnecessary once we get rid of this one caller. In passing, fix failure to clean up temp memory context, or switch back to caller's context, during slowest exit path.
-
Tom Lane authored
Improve commentary, use more specific names for the delta fields, const-ify pointer arguments where possible, avoid assuming that initializing only the first element of a local array will guarantee that the remaining elements end up as we need them. (I think that code in generic_redo actually worked, but only because InvalidBuffer is zero; this is a particularly ugly way of depending on that ...)
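The array-initialization pitfall can be shown in a few lines of C. In this sketch the sentinel `SLOT_INVALID` is deliberately non-zero (a hypothetical stand-in; InvalidBuffer itself is zero, which is why the original code happened to work), and the helper names are made up for illustration.

```c
#include <assert.h>

#define SLOT_COUNT   4
#define SLOT_INVALID (-1)   /* hypothetical non-zero sentinel */

/* Set every element explicitly instead of relying on an initializer. */
void
slots_init(int *slots, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        slots[i] = SLOT_INVALID;
}

/* Count how many elements currently hold the sentinel. */
int
slots_count_invalid(const int *slots, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
    {
        if (slots[i] == SLOT_INVALID)
            count++;
    }
    return count;
}
```

Writing `int slots[SLOT_COUNT] = { SLOT_INVALID };` sets only the first element to the sentinel; C zero-initializes the rest, so that form is correct only when the sentinel happens to be zero.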
-
Tom Lane authored
This code desperately needs some micro-optimization, and I'd like it to be formatted a bit more nicely while I work on it.
-
Kevin Grittner authored
-
Kevin Grittner authored
Inclusion of multiple macros inside another macro was pushing MSVC past its size limit. Reported by buildfarm.
-
Alvaro Herrera authored
It cannot run in the same parallel group as misc, because it creates a table which is unpredictably visible in that test. Per buildfarm member crake.
-
- 08 Apr, 2016 26 commits
-
-
Alvaro Herrera authored
\crosstabview is a completely different way to display results from a query: instead of a vertical display of rows, the data values are placed in a grid where the column and row headers come from the data itself, similar to a spreadsheet. The sort order of the horizontal header can be specified by using another column in the query, and the vertical header determines its ordering from the order in which they appear in the query. This only allows displaying a single value in each cell. If more than one value correspond to the same cell, an error is thrown. Merging of values can be done in the query itself, if necessary. This may be revisited in the future. Author: Daniel Verité Reviewed-by: Pavel Stehule, Dean Rasheed
-
Kevin Grittner authored
The buildfarm showed failure for Windows MSVC builds due to this omission. This might not be the only problem with the Makefile for this feature, but hopefully this will get it past the immediate problem. Fix suggested by Tom Lane
-
Andres Freund authored
Previously bcac23de exposed a subset of support functions, namely the ones Kaigai found useful. In 20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de I mentioned that some functions needed to use the facility in an external project are missing. To avoid having to add functions piecemeal, add all the functions which are used to define READ_* and WRITE_* macros; users of the extensible node functionality are likely to need these. Additionally expose outDatum(), which doesn't have its own WRITE_ macro, as it needs information from the embedding struct. Discussion: 20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de
-
Stephen Frost authored
This creates an initial set of default roles which administrators may use to grant access to, historically, superuser-only functions. Using these roles instead of granting superuser access reduces the number of superuser roles required for a system. Documentation for each of the default roles has been added to user-manag.sgml. Bump catversion to 201604082, as we had a commit that bumped it to 201604081 and another that set it back to 201604071... Reviews by José Luis Tallón and Robert Haas
-
Stephen Frost authored
This will prevent users from creating roles which begin with "pg_" and will check for those roles before allowing an upgrade using pg_upgrade. This will allow for default roles to be provided at initdb time. Reviews by José Luis Tallón and Robert Haas
-
Stephen Frost authored
Now that 'dump' is a bitmap, we can't simply set it to 'true'. Noticed while debugging the prior issue.
-
Kevin Grittner authored
This feature is controlled by a new old_snapshot_threshold GUC. A value of -1 disables the feature, and that is the default. The value of 0 is just intended for testing. Above that it is the number of minutes a snapshot can reach before pruning and vacuum are allowed to remove dead tuples which the snapshot would otherwise protect. The xmin associated with a transaction ID does still protect dead tuples. A connection which is using an "old" snapshot does not get an error unless it accesses a page modified recently enough that it might not be able to produce accurate results. This is similar to the Oracle feature, and we use the same SQLSTATE and error message for compatibility.
-
Kevin Grittner authored
This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
-
Stephen Frost authored
Pretty sure I removed this based on some incorrect thinking that it was no longer possible to reach this point for a table which will not be dumped, but that's clearly wrong. Pointed out on IRC by Erik Rijkers.
-
Teodor Sigaev authored
It's not ready yet; revert two commits: 690c5435 (unstable test output) and 386e3d76 (the patch itself).
-
Magnus Hagander authored
These parameters are available for SSPI authentication only, to make it behave more like "normal gssapi" while still making it possible to maintain compatibility. compat_realm is on by default, but can be turned off to make the authentication use the full Kerberos realm instead of the NetBIOS name. upn_username is off by default, and can be turned on to return the user's Kerberos UPN rather than the SAM-compatible name (a user in Active Directory can have both a legacy SAM-compatible username and a new Kerberos one; normally they are the same, but not always). Author: Christian Ullrich Reviewed by: Robbie Harwood, Alvaro Herrera, me
-
Teodor Sigaev authored
Found during investigation of failure of skink buildfarm member and its valgrind report. Backpatch to all supported branches
-
Peter Eisentraut authored
OpenSSL has an unfortunate tendency to mix per-session state error handling with per-thread error handling. This can cause problems when programs that link to libpq with OpenSSL enabled have some other use of OpenSSL; without care, one caller of OpenSSL may cause problems for the other caller. Backend code might similarly be affected, for example when a third party extension independently uses OpenSSL without taking the appropriate precautions. To fix, don't trust other users of OpenSSL to clear the per-thread error queue. Instead, clear the entire per-thread queue ahead of certain I/O operations when it appears that there might be trouble (these I/O operations mostly need to call SSL_get_error() to check for success, which relies on the queue being empty). This is slightly aggressive, but it's pretty clear that the other callers have a very dubious claim to ownership of the per-thread queue. Do this in both frontend and backend code. Finally, be more careful about clearing our own error queue, so as to not cause these problems ourselves. It's possible that control previously did not always reach SSLerrmessage(), where ERR_get_error() was supposed to be called to clear the queue's earliest code. Make sure ERR_get_error() is always called, so as to spare other users of OpenSSL the possibility of similar problems caused by libpq (as opposed to problems caused by a third party OpenSSL library like PHP's OpenSSL extension). Again, do this in both frontend and backend code. See bug #12799 and https://bugs.php.net/bug.php?id=68276 Based on patches by Dave Vitek and Peter Eisentraut. From: Peter Geoghegan <pg@bowt.ie>
-
Tom Lane authored
Create a "bsd" auth method that works the same as "password" so far as clients are concerned, but calls the BSD Authentication service to check the password. This is currently only available on OpenBSD. Marisa Emerson, reviewed by Thomas Munro
-
Robert Haas authored
This allows parallel aggregation to use them. It may seem surprising that we use float8_combine for both float4_accum and float8_accum transition functions, but that's because those functions differ only in the type of the non-transition-state argument. Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
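A rough sketch of why one combine function can serve both transition functions: the transition state is the same in both cases, and only the type of the incoming value differs. The struct `AggState3`, the function names, and the choice of (count, sum, sum of squares) as the state are assumptions for illustration, not the actual float8_combine code.

```c
#include <assert.h>

/* Shared transition state: count, sum, and sum of squares. */
typedef struct
{
    double n;
    double sum;
    double sumsq;
} AggState3;

/* Transition step for single-precision input; the state stays double. */
void
agg_accum_f4(AggState3 *state, float value)
{
    double x = (double) value;

    state->n += 1.0;
    state->sum += x;
    state->sumsq += x * x;
}

/* Transition step for double-precision input; same state layout. */
void
agg_accum_f8(AggState3 *state, double value)
{
    state->n += 1.0;
    state->sum += value;
    state->sumsq += value * value;
}

/* Merge two partial states, e.g. from two parallel workers. */
AggState3
agg_combine(AggState3 a, AggState3 b)
{
    AggState3 result;

    result.n = a.n + b.n;
    result.sum = a.sum + b.sum;
    result.sumsq = a.sumsq + b.sumsq;
    return result;
}

/* Final step for an average over the combined state. */
double
agg_avg(AggState3 state)
{
    assert(state.n > 0.0);
    return state.sum / state.n;
}
```

Since `agg_combine` only ever sees the state struct, it does not need to know which of the two transition functions produced it.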
-
Teodor Sigaev authored
Just forgot to add it in 1ec4c7c0
-
Teodor Sigaev authored
As noticed by Tom Lane, changing the operation numbers in commit bb140506 causes an on-disk format incompatibility. Revert to the previous numbering, which is the reason for adding a special array to store operation priorities. This also reverts the ordering of tsquery to the previous one. Author: Dmitry Ivanov
-
Andrew Dunstan authored
-
Teodor Sigaev authored
Now indexes (but only B-tree for now) can contain "extra" column(s) which don't participate in the index structure; they are just stored in leaf tuples. This allows an index-only scan to use a single index instead of two or more indexes. Author: Anastasia Lubennikova with minor editorializing by me Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
-
Peter Eisentraut authored
see also ce8d7bb6
-
Andrew Dunstan authored
Most of what is produced by the detailed verbosity level is of no interest at all, so switch to the normal level for more usable output. Christian Ullrich Backpatch to all live branches
-
Peter Eisentraut authored
-
Tom Lane authored
Don't try to examine S_ISLNK(st.st_mode) after a failed lstat(). It's undefined. Also, if the lstat() reported ENOENT, we do not wish that to be a hard error, but the code might nonetheless treat it as one (giving an entirely misleading error message, too) depending on luck-of-the-draw as to what S_ISLNK() returned. Don't throw error for ENOENT from rmdir(), either. (We're not really expecting ENOENT because we just stat'd the file successfully; but if we're going to allow ENOENT in the symlink code path, surely the directory code path should too.) Generate an appropriate errcode for its-the-wrong-type-of-file complaints. (ERRCODE_SYSTEM_ERROR doesn't seem appropriate, and failing to write errcode() around it certainly doesn't work, and not writing an errcode at all is not per project policy.) Valgrind noticed the undefined S_ISLNK result; the other problems emerged while reading the code in the area. All of this appears to have been introduced in 8f15f74a. Back-patch to 9.5 where that commit appeared.
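The pattern described here can be sketched as follows, assuming a POSIX system. The function `remove_link_or_dir` and the `RemoveResult` enum are hypothetical and much simpler than the tablespace code being fixed; the point is only the order of the checks.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

typedef enum
{
    REMOVE_GONE,        /* path did not exist; not treated as an error */
    REMOVE_DONE,        /* directory or symlink was removed */
    REMOVE_WRONG_TYPE,  /* exists, but is neither a directory nor a symlink */
    REMOVE_ERROR        /* some other failure; errno holds the cause */
} RemoveResult;

RemoveResult
remove_link_or_dir(const char *path)
{
    struct stat st;

    assert(path != NULL);

    if (lstat(path, &st) < 0)
    {
        /* 'st' is undefined here, so st.st_mode must not be examined. */
        return (errno == ENOENT) ? REMOVE_GONE : REMOVE_ERROR;
    }

    if (S_ISDIR(st.st_mode))
    {
        /* Tolerate ENOENT: the directory may vanish after the lstat(). */
        if (rmdir(path) < 0 && errno != ENOENT)
            return REMOVE_ERROR;
        return REMOVE_DONE;
    }

    if (S_ISLNK(st.st_mode))
    {
        if (unlink(path) < 0 && errno != ENOENT)
            return REMOVE_ERROR;
        return REMOVE_DONE;
    }

    return REMOVE_WRONG_TYPE;
}
```

The type tests are reached only after `lstat()` has succeeded, and the wrong-type case gets its own result so a caller can report it with an appropriate error code.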
-
Robert Haas authored
David Rowley, reviewed by Tomas Vondra
-
Teodor Sigaev authored
This patch adds a new, richer way to emit an error message or exception from PL/Python code. Author: Pavel Stehule Reviewers: Catalin Iacob, Peter Eisentraut, Jim Nasby
-