- 22 Feb, 2016 3 commits
-
-
Tom Lane authored
This patch introduces "pg_blocking_pids(int) returns int[]", which returns the PIDs of any sessions that are blocking the session with the given PID. Historically people have obtained such information using a self-join on the pg_locks view, but it's unreasonably tedious to do it that way with any modicum of correctness, and the addition of parallel queries has pretty much broken that approach altogether. (Given some more columns in the view than there are today, you could imagine handling parallel-query cases with a 4-way join; but ugh.)

The new function has the following behaviors that are painful or impossible to get right via pg_locks:

1. Correctly understands which lock modes block which other ones.
2. In soft-block situations (two processes both waiting for conflicting lock modes), only the one that's in front in the wait queue is reported to block the other.
3. In parallel-query cases, reports all sessions blocking any member of the given PID's lock group, and reports a session by naming its leader process's PID, which will be the pg_backend_pid() value visible to clients.

The motivation for doing this right now is mostly to fix the isolation tests. Commit 38f8bdca lobotomized isolationtester's is-it-waiting query by removing its ability to recognize nonconflicting lock modes, as a crude workaround for the inability to handle soft-block situations properly. But even without the lock mode tests, the old query was excessively slow, particularly in CLOBBER_CACHE_ALWAYS builds; some of our buildfarm animals fail the new deadlock-hard test because the deadlock timeout elapses before they can probe the waiting status of all eight sessions. Replacing the pg_locks self-join with use of pg_blocking_pids() is not only much more correct, but a lot faster: I measure it at about 9X faster in a typical dev build with Asserts, and 3X faster in CLOBBER_CACHE_ALWAYS builds. That should provide enough headroom for the slower CLOBBER_CACHE_ALWAYS animals to pass the test, without having to lengthen deadlock_timeout yet more and thus slow down the test for everyone else.
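A minimal SQL sketch of how a monitoring session might use the new function; the join against pg_stat_activity is an illustration, not part of the commit:

    -- list every waiting backend together with the PIDs blocking it
    SELECT pid, query, pg_blocking_pids(pid) AS blocked_by
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0;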
-
Tom Lane authored
We don't really need this field, because it's either zero or redundant with PGPROC.pid. The use of zero to mark "not a group leader" is not necessary since we can just as well test whether lockGroupLeader is NULL. This does not save very much, either as to code or data, but the simplification seems worthwhile anyway.
-
Andres Freund authored
In 4b4b680c I accidentally used sizeof(PrivateRefCountArray) instead of sizeof(PrivateRefCountEntry) when creating the refcount overflow hashtable. As the former is bigger than the latter, this luckily only resulted in slightly increased memory usage when many buffers are pinned in a backend. Reported-By: Takashi Horikawa Discussion: 73FA3881462C614096F815F75628AFCD035A48C3@BPXM01GP.gisp.nec.co.jp Backpatch: 9.5, where the new ref count infrastructure was introduced
-
- 21 Feb, 2016 7 commits
-
-
Tom Lane authored
The "Session Information Functions" table seems to be sorted mostly alphabetically (although it's not perfect), which would be all right if it didn't lead to some related functions being described in a pretty nonintuitive order. Also, the prose discussions after the table were in an order that hardly matched the table at all. Rearrange to make things a bit easier to follow.
-
Tom Lane authored
Coverity griped about use of unchecked strcpy() into a local variable. There's unlikely to be any actual bug there, since no caller would be passing a path longer than MAXPGPATH, but nonetheless use of strlcpy() seems preferable. While at it, get rid of unmaintainable separation between list of field names and list of field values in favor of initializing them in parallel. And we might as well declare get_configdata()'s path argument as const char *, even though no current caller needs that.
-
Andrew Dunstan authored
Some over-eager copy-and-pasting on my part resulted in a nonsense result being returned in this case. I have adopted the same pattern for handling this case as is used in the one argument form of the function, i.e. we just skip over the code that adds values to the object. Diagnosis and patch from Michael Paquier, although not quite his solution. Fixes bug #13936. Backpatch to 9.5 where jsonb_object was introduced.
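A quick sketch of the corrected behavior of the two-argument form called with empty arrays (the result shown is what one would expect, not output verified against this build):

    SELECT jsonb_object('{}'::text[], '{}'::text[]);
    -- expected result: {}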
-
Robert Haas authored
Spotted by Tom Lane.
-
Robert Haas authored
Reflow text in lock manager README so that it fits within 80 columns. Correct some mistakes. Expand the README to explain not only why group locking exists but also the data structures that support it. Improve comments related to group locking in several files. Change the name of a macro argument for improved clarity. Most of these problems were reported by Tom Lane, but I found a few of them myself. Robert Haas and Tom Lane
-
Robert Haas authored
list_concat(list_concat(a, b), c) destructively changes both a and b; to avoid such perils, copy lists of remote_conds before incorporating them into larger lists via list_concat(). Ashutosh Bapat, per a report from Etsuro Fujita
-
Tatsuo Ishii authored
With suggestions from Tom Lane.
-
- 20 Feb, 2016 5 commits
-
-
Dean Rasheed authored
Not all compilers support "long long" and the "LL" integer literal suffix, so use a cast to int64 instead.
-
Dean Rasheed authored
Commit 53874c52 broke various 32-bit buildfarm machines because it incorrectly used an 'L' suffix for what needed to be a 64-bit literal. Thanks to Michael Paquier for helping to diagnose this.
-
Dean Rasheed authored
This will parse strings in the format produced by pg_size_pretty() and return sizes in bytes. This allows queries to be written with clauses like "pg_total_relation_size(oid) > pg_size_bytes('10 GB')". Author: Pavel Stehule with various improvements by Vitaly Burovoy Discussion: http://www.postgresql.org/message-id/CAFj8pRD-tGoDKnxdYgECzA4On01_uRqPrwF-8LdkSE-6bDHp0w@mail.gmail.com Reviewed-by: Vitaly Burovoy, Oleksandr Shulgin, Kyotaro Horiguchi, Michael Paquier and Robert Haas
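For instance (a sketch based on the description above; treating 'GB' as 1024^3 bytes is an assumption about the parsing rules rather than something stated in the commit):

    SELECT pg_size_bytes('10 GB');   -- 10737418240 under that assumption

    -- the motivating use case from the commit message
    SELECT oid::regclass AS relation, pg_total_relation_size(oid) AS bytes
    FROM pg_class
    WHERE pg_total_relation_size(oid) > pg_size_bytes('10 GB');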
-
Peter Eisentraut authored
Prevent wrapping of the element content to avoid confusing line breaks between hyphens.
-
Noah Misch authored
Architecture reference material specifies this order, and s_lock.h inline assembly agrees. The former order failed to provide mutual exclusion to lwlock.c and perhaps to other clients. The two xlc buildfarm members, hornet and mandrill, have failed sixteen times with duplicate key errors involving pg_class_oid_index or pg_type_oid_index. Back-patch to 9.5, where commit b64d92f1 introduced atomics. Reviewed by Andres Freund and Tom Lane.
-
- 19 Feb, 2016 3 commits
-
-
Simon Riggs authored
StartupSUBTRANS() incorrectly handled cases near the max pageid in the subtrans data structure, which in some cases could lead to errors in startup for Hot Standby. This patch wraps the pageids correctly, avoiding any such errors. Identified by exhaustive crash testing by Jeff Janes. Jeff Janes
-
Peter Eisentraut authored
It was using %u to read, into a signed integer variable, a string that was earlier produced by snprintf with %d. This seems to work in practice but is incorrect. Found by cppcheck.
-
Tom Lane authored
Up to now, there's been an assumption that all Paths for a given relation compute the same output column set (targetlist). However, there are good reasons to remove that assumption. For example, an indexscan on an expression index might be able to return the value of an expensive function "for free". While we have the ability to generate such a plan today in simple cases, we don't have a way to model that it's cheaper than a plan that computes the function from scratch, nor a way to create such a plan in join cases (where the function computation would normally happen at the topmost join node). Also, we need this so that we can have Paths representing post-scan/join steps, where the targetlist may well change from one step to the next. Therefore, invent a "struct PathTarget" representing the columns we expect a plan step to emit. It's convenient to include the output tuple width and tlist evaluation cost in this struct, and there will likely be additional fields in future. While Path nodes that actually do have custom outputs will need their own PathTargets, it will still be true that most Paths for a given relation will compute the same tlist. To reduce the overhead added by this patch, keep a "default PathTarget" in RelOptInfo, and allow Paths that compute that column set to just point to their parent RelOptInfo's reltarget. (In the patch as committed, actually every Path is like that, since we do not yet have any cases of custom PathTargets.) I took this opportunity to provide some more-honest costing of PlaceHolderVar evaluation. Up to now, the assumption that "scan/join reltargetlists have cost zero" was applied not only to Vars, where it's reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval cost of a PlaceHolderVar's expression to the first plan level where it can be computed, by including it in the PathTarget cost field and adding that to the cost estimates for Paths. This isn't perfect yet but it's much better than before, and there is a way forward to improve it more. This costing change affects the join order chosen for a couple of the regression tests, changing expected row ordering.
-
- 18 Feb, 2016 3 commits
-
-
Bruce Momjian authored
Suppress creation of the pg_upgrade delete script when the new data directory is inside the old data directory. Reported-by: IRC Backpatch-through: 9.3, where delete script tests were added
-
Tom Lane authored
Dead or half-dead index leaf pages were incorrectly reported as live, as a consequence of a code rearrangement I made (during a moment of severe brain fade, evidently) in commit d287818e. The index metapage was not counted in index_size, causing that result to not agree with the actual index size on-disk. Index root pages were not counted in internal_pages, which is inconsistent compared to the case of a root that's also a leaf (one-page index), where the root would be counted in leaf_pages. Aside from that inconsistency, this could lead to additional transient discrepancies between the reported page counts and index_size, since it's possible for pgstatindex's scan to see zero or multiple pages marked as BTP_ROOT, if the root moves due to a split during the scan. With these fixes, index_size will always be exactly one page more than the sum of the displayed page counts. Also, the index_size result was incorrectly documented as being measured in pages; it's always been measured in bytes. (While fixing that, I couldn't resist doing some small additional wordsmithing on the pgstattuple docs.) Including the metapage causes the reported index_size to not be zero for an empty index. To preserve the desired property that the pgstattuple regression test results are platform-independent (ie, BLCKSZ configuration independent), scale the index_size result in the regression tests. The documentation issue was reported by Otsuka Kenji, and the inconsistent root page counting by Peter Geoghegan; the other problems noted by me. Back-patch to all supported branches, because this has been broken for a long time.
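The page-accounting invariant described above can be spot-checked with a query along these lines (a sketch; 'some_btree_index' is a placeholder, and the column names are those of the pgstattuple extension as I recall them):

    SELECT index_size,
           (internal_pages + leaf_pages + empty_pages + deleted_pages + 1)
             * current_setting('block_size')::bigint AS accounted_bytes
    FROM pgstatindex('some_btree_index');
    -- after this fix, index_size should equal accounted_bytes
    -- (the "+ 1" is the metapage)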
-
Peter Eisentraut authored
The old phrasing was awkward if a replication slot is activated and deactivated repeatedly.
-
- 17 Feb, 2016 4 commits
-
-
Joe Conway authored
In commit a5c43b88 the behavior of command-line pg_config was inadvertently changed to include the config name when specific configs are requested, similar to when none are requested and all are emitted. This breaks scripts that expect to use pg_config for, e.g., PGXS. Revert to the previous behavior.
-
Joe Conway authored
Move and refactor the underlying code for the pg_config client application to src/common in support of sharing it with a new system information SRF called pg_config(), which makes the same information available via SQL. Additionally wrap the SRF with a new system view, also called pg_config. Patch by me with extensive input and review by Michael Paquier and additional review by Alvaro Herrera.
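A minimal sketch of the new SQL-level access (the column names are assumed from the pg_config client's output, not quoted from the commit):

    -- through the view
    SELECT name, setting FROM pg_config WHERE name IN ('BINDIR', 'VERSION');
    -- or directly through the SRF
    SELECT * FROM pg_config();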
-
Robert Haas authored
When processing ordered aggregates following a sort that could make use of the abbreviated key optimization, only call the equality operator to compare successive pairs of tuples when their abbreviated keys were not equal. Peter Geoghegan, reviewed by Andreas Karlsson and by me.
-
Tom Lane authored
A function name that's double-quoted in SQL can contain almost any characters, but we were using that name directly as part of the name generated for the Python-level function, and Python doesn't like anything that isn't pretty much a standard identifier. To fix, replace anything that isn't an ASCII letter or digit with an underscore in the generated name. This doesn't create any risk of duplicate Python function names because we were already appending the function OID to the generated name to ensure uniqueness. Per bug #13960 from Jim Nasby. Patch by Jim Nasby, modified a bit by me. Back-patch to all supported branches.
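For example, a quoted name like the one below previously yielded an invalid Python-level identifier; "my func #1" is purely hypothetical:

    CREATE FUNCTION "my func #1"(a integer) RETURNS integer
    AS $$ return a + 1 $$ LANGUAGE plpythonu;

    SELECT "my func #1"(41);   -- 42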
-
- 16 Feb, 2016 7 commits
-
-
Tom Lane authored
Clarify the description of which transactions will block a CREATE INDEX CONCURRENTLY command from proceeding, and mention that the index might still not be usable after CREATE INDEX completes. (This happens if the index build detected broken HOT chains, so that pg_index.indcheckxmin gets set, and there are open old transactions preventing the xmin horizon from advancing past the index's initial creation. I didn't want to explain what broken HOT chains are, though, so I omitted an explanation of exactly when old transactions prevent the index from being used.) Per discussion with Chris Travers. Back-patch to all supported branches, since the same text appears in all of them.
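One way to see whether a concurrently built index is still held back in this way is to look at pg_index (a sketch, not part of the commit; foo and bar are hypothetical):

    CREATE INDEX CONCURRENTLY foo_bar_idx ON foo (bar);

    -- if indcheckxmin is true, sufficiently old open transactions may still
    -- be unable to use the index even though indisvalid is true
    SELECT indisvalid, indcheckxmin
    FROM pg_index
    WHERE indexrelid = 'foo_bar_idx'::regclass;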
-
Bruce Momjian authored
Reported-by: Tatsuo Ishii Backpatch-through: 9.5
-
Michael Meskes authored
-
Michael Meskes authored
-
Tatsuo Ishii authored
Change "In this case" to "In the example above" to clarify what it actually refers to.
-
Fujii Masao authored
In runtime.sgml, the old formulas for calculating the reasonable values of SEMMNI and SEMMNS were incorrect: they did not count the semaphores needed by the checkpointer process (introduced in 9.2) and the background worker processes (introduced in 9.3). This commit fixes those formulas so that they account for the semaphores those processes need. Report and patch by Kyotaro Horiguchi. Only the patch for 9.3 was modified by me. Back-patch to 9.2 where the checkpointer process was added and the number of needed semaphores was increased. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Backpatch: 9.2 Discussion: http://www.postgresql.org/message-id/20160203.125119.66820697.horiguchi.kyotaro@lab.ntt.co.jp
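As a rough cross-check, the number of semaphore sets a server needs can be estimated from SQL. This assumes the post-fix formula is approximately ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16); runtime.sgml remains the authoritative source:

    SELECT ceil((current_setting('max_connections')::int
               + current_setting('autovacuum_max_workers')::int
               + current_setting('max_worker_processes')::int + 5) / 16.0)
           AS semaphore_sets_needed;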
-
Joe Conway authored
In commit 7b4bfc87 the DATA and DESCR entries for the new row_security_active() function were inadvertently put after the PROVOLATILE defines rather than before, where they should have been placed. Move them up where they belong. Backpatch to 9.5 where the new entries were introduced.
-
- 15 Feb, 2016 8 commits
-
-
Andres Freund authored
Commit 4de82f7d increased the WAL flush rate, mainly to increase the likelihood that hint bits can be set quickly. More quickly set hint bits can reduce contention around the clog et al. But unfortunately the increased flush rate can have a significant negative performance impact, I have measured up to a factor of ~4. The reason for this slowdown is that if there are independent writes to the underlying devices, for example because shared buffers is a lot smaller than the hot data set, or because a checkpoint is ongoing, the fdatasync() calls force cache flushes to be emitted to the storage. This is achieved by flushing WAL only if the last flush was longer than wal_writer_delay ago, or if more than wal_writer_flush_after (new GUC) unflushed blocks are pending. Based on some tests the default for wal_writer_flush_after is 1MB, which seems to work well on both SSDs and rotational media. To avoid negative performance impact due to 4de82f7d, an earlier commit (db76b1ef) made SetHintBits() more likely to succeed, preventing performance regressions in the pgbench tests I performed. Discussion: 20160118163908.GW10941@awork2.anarazel.de
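The new knob can be adjusted at run time; a sketch (the value shown is just the default mentioned above):

    -- flush pending WAL once this much has accumulated in the WAL writer,
    -- rather than after every wakeup; takes effect on configuration reload
    ALTER SYSTEM SET wal_writer_flush_after = '1MB';
    SELECT pg_reload_conf();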
-
Alvaro Herrera authored
The original code wasn't careful to test the file descriptor returned by PQsocket() for an invalid socket. If an invalid socket did turn up, that would amount to calling FD_ISSET with fd = -1, whereby undefined behavior can be invoked. To fix, test the file descriptor for validity and stop further processing if that fails. Problem noticed by Coverity. There is an existing FD_ISSET callsite that does check for invalid sockets beforehand, but the error message reported by it was strerror(errno); in testing the aforementioned change, that turns out to result in "bad socket: Success", which isn't terribly helpful. Instead use PQerrorMessage() in both places, which is more likely to contain a useful error message. Backpatch-through: 9.1.
-
Tom Lane authored
Reportedly, some compilers warn about tests like "c < 0" if c is unsigned, and hence complain about the character range checks I added in commit 3bb3f42f. This is a bit of a pain since the regex library doesn't really want to assume that chr is unsigned. However, since any such reconfiguration would involve manual edits of regcustom.h anyway, we can put it on the shoulders of whoever wants to do that to adjust this new range-checking macro correctly. Per gripes from Coverity and Andres.
-
Andres Freund authored
Previously we only allowed SetHintBits() to succeed if the commit LSN of the last transaction touching the page has already been flushed to disk. We can't generally change the LSN of the page, because we don't necessarily have the required locks on the page. But the required LSN interlock does not mean the commit record has to be flushed immediately, it just requires that the commit record will be flushed before the page is written out. Therefore if the buffer LSN is newer than the commit LSN, the hint bit can be safely set. In a number of scenarios (e.g. pgbench) this noticeably increases the number of hint bits that are set. But more importantly it also keeps the success rate up when flushing WAL less frequently. That was the original reason for commit 4de82f7d, which has negative performance consequences in a number of scenarios. This will allow a followup commit to reduce the flush rate. Discussion: 20160118163908.GW10941@awork2.anarazel.de
-
Joe Conway authored
Looks like this patch went in after Copyright messages were updated for 2016 and it missed the boat. Fixed.
-
Fujii Masao authored
In the REFRESH MATERIALIZED VIEW command, the CONCURRENTLY option is allowed only if there is at least one unique index with no WHERE clause on one or more columns of the matview. Previously, concurrent refresh checked the existence of a unique index on the matview after filling the data into the new snapshot, i.e., after calling refresh_matview_datafill(). So, when there was no unique index, we could have to wait a long time before that was detected and the error reported, which was a waste of time. To eliminate that wasted time, this commit changes concurrent refresh so that it checks the existence of a unique index at the beginning of the refresh operation, i.e., before starting any time-consuming jobs. If the CONCURRENTLY option is not allowed due to lack of a unique index, concurrent refresh can now detect that immediately and emit an error. Author: Masahiko Sawada Reviewed-by: Michael Paquier, Fujii Masao
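A minimal illustration of the requirement (mv_orders and orders are hypothetical names):

    CREATE MATERIALIZED VIEW mv_orders AS SELECT id, total FROM orders;

    -- now fails immediately, before any data is recomputed,
    -- because there is no unique index yet
    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_orders;

    CREATE UNIQUE INDEX ON mv_orders (id);
    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_orders;   -- OK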
-
Magnus Hagander authored
-
Noah Misch authored
-