- 08 Apr, 2020 21 commits
-
-
Andrew Dunstan authored
1. Tell Msys2 not to mangle the tablespace map parameter 2. If rmdir doesn't work, fall back to trying unlink on the entry in pg_tblspc. Discussion: https://postgr.es/m/7330a7c7-ce5f-9769-39a1-bdb0b32bb4a6@2ndQuadrant.com
-
Peter Eisentraut authored
-
Tomas Vondra authored
The test never did ANALYZE on the test table, so the plans depended on various defaults (e.g. number of groups being 200). This worked most of the time, but with CLOBBER_CACHE_ALWAYS the autoanalyze often managed to build accurate stats, changing the plan. Fixed by increasing the size of test tables a bit, making the Sort a bit more expensive than Incremental Sort. The tests were constructed to test transitions in the Incremental Sort algorithm, and this change does not break that. Reviewed-by: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
-
Tom Lane authored
Repair an oversight in commit 8728b2c7: if we're postponing restore of event triggers to the end, we must also postpone restoring any comments on them, since of course we cannot create the comments first. (This opens yet another opportunity for an event trigger to bollix the restore, but there's no help for that.) Per bug #16346 from Alexander Lakhin. Like the previous commit, back-patch to all supported branches. Hamid Akhtar and Tom Lane Discussion: https://postgr.es/m/16346-6210ad7a0ea81be1@postgresql.org
-
Thomas Munro authored
GetWalRcvWriteRecPtr() previously reported the latest *flushed* location. Adopt the conventional terminology used elsewhere in the tree by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related variables that used the term "received". Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest *written* value. This will allow later patches to use the value for non-data-integrity purposes, without having to wait for the flush pointer to advance. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
-
Peter Eisentraut authored
To control whether partition changes are replicated using their own identity and schema or an ancestor's, add a new parameter that can be set per publication named 'publish_via_partition_root'. This allows replicating a partitioned table into a different partition structure on the subscriber. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
-
Alexander Korotkov authored
0f5ca02f introduces 3 new keywords. It appears to be too much for relatively small feature. Given now we past feature freeze, it's already late for discussion of the new syntax. So, revert. Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
-
David Rowley authored
2nd pass of modifying various places which obtain the next power of 2 of a number and make them use the new functions added in f0705bb6. In passing, also modify num_combinations(). This can be implemented using simple bitshifting rather than looping. Reviewed-by: John Naylor Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
-
Michael Paquier authored
Attempting to use a COLLATE clause with a type that it not collatable in a partition bound expression could crash the server. This commit fixes the code by adding more checks similar to what is done when computing index or partition attributes by making sure that there is a collation iff the type is collatable. Backpatch down to 12, as 7c079d74 introduced this problem. Reported-by: Alexander Lakhin Author: Dmitry Dolgov Discussion: https://postgr.es/m/16325-809194cf742313ab@postgresql.org Backpatch-through: 12
-
David Rowley authored
First pass of modifying various places that obtain the next power of 2 of a number and make them use the new functions added in pg_bitutils.h instead. This also removes the _hash_log2() function. There are no longer any callers in core. Other users can swap their _hash_log2(n) call to make use of pg_ceil_log2_32(n). Author: David Fetter, with some minor adjustments by me Reviewed-by: John Naylor, Jesse Zhang Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
-
Jeff Davis authored
If the memory context's maxBlockSize is too big, a single block allocation can suddenly exceed work_mem. For Hash Aggregation, this can mean spilling to disk too early or reporting a confusing memory usage number for EXPLAN ANALYZE. Introduce CreateWorkExprContext(), which is like CreateExprContext(), except that it creates the AllocSet with a maxBlockSize that is reasonable in proportion to work_mem. Right now, CreateWorkExprContext() is only used by Hash Aggregation, but it may be generally useful in the future. Discussion: https://postgr.es/m/412a3fbf306f84d8d78c4009e11791867e62b87c.camel@j-davis.com
-
David Rowley authored
There are many areas in the code where we need to determine the next highest power of 2 of a given number. We tend to always do that in an ad-hoc way each time, generally with some tight for loop which performs a bitshift left once per loop and goes until it finds a number above the given number. Here we add two generic functions which make use of the existing pg_leftmost_one_pos* functions which, when available, will allow us to calculate the next power of 2 without any looping. Here we don't add any code which uses these new functions. That will be done in follow-up commits. Author: David Fetter, with some minor adjustments by me Reviewed-by: John Naylor, Jesse Zhang Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
-
Tom Lane authored
In commit 4dbcb3f8 I removed some code from parse_coerce.c, and also removed some apparently-no-longer-needed #includes. But removing datum.h broke some not-compiled-by-default code. Discussion: https://postgr.es/m/20200407205436.pyjhddw5bn5upvsu@development
-
Alvaro Herrera authored
Trying to ensure that a slot's restart_lsn or amount of reserved bytes exactly match some specific values seems unnecessary, and fragile as shown by failures in multiple buildfarm members. Discussion: https://postgr.es/m/20200407232602.GA21559@alvherre.pgsql
-
Thomas Munro authored
Provide PrefetchSharedBuffer(), a variant that takes SMgrRelation, for use in recovery. Rename LocalPrefetchBuffer() to PrefetchLocalBuffer() for consistency. Add a return value to all of these. In recovery, tolerate and report missing files, so we can handle relations unlinked before crash recovery began. Also report cache hits and misses, so that callers can do faster buffer lookups and better I/O accounting. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
-
Tom Lane authored
This case didn't work because columns merged by FULL JOIN USING are represented in the parse tree by COALESCE expressions, and the logic for recognizing a partitionable join failed to match upper-level join clauses to such expressions. To fix, synthesize suitable COALESCE expressions and add them to the nullable_partexprs lists. This is pretty ugly and brute-force, but it gets the job done. (I have ambitions of rethinking the way outer-join output Vars are represented, so maybe that will provide a cleaner solution someday. For now, do this.) Amit Langote, reviewed by Justin Pryzby, Richard Guo, and myself Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
-
Etsuro Fujita authored
Previously, the partitionwise join technique only allowed partitionwise join when input partitioned tables had exactly the same partition bounds. This commit extends the technique to some cases when the tables have different partition bounds, by using an advanced partition-matching algorithm introduced by this commit. For both the input partitioned tables, the algorithm checks whether every partition of one input partitioned table only matches one partition of the other input partitioned table at most, and vice versa. In such a case the join between the tables can be broken down into joins between the matching partitions, so the algorithm produces the pairs of the matching partitions, plus the partition bounds for the join relation, to allow partitionwise join for computing the join. Currently, the algorithm works for list-partitioned and range-partitioned tables, but not hash-partitioned tables. See comments in partition_bounds_merge(). Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by Dmitry Dolgov and Amul Sul, with additional review at various points by Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote, Justin Pryzby, and Tomas Vondra Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
-
Tom Lane authored
Our documentation describes four allowed input syntaxes for circles, but the regression tests tried only three ... with predictable consequences. Remarkably, this has been wrong since the circle datatype was added in 1997, but nobody noticed till now. David Zhang, with some help from me Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
-
Andres Freund authored
The goal of separating hotly accessed per-backend data from PGPROC into PGXACT is to make accesses fast (GetSnapshotData() in particular). But delayChkpt is not actually accessed frequently; only when starting a checkpoint. As it is frequently modified (multiple times in the course of a single transaction), storing it in the same cacheline as hotly accessed data unnecessarily dirties a contended cacheline. Therefore move delayChkpt to PGPROC. This is part of a larger series of patches intending to improve GetSnapshotData() scalability. It is committed and pushed separately, as it is independently beneficial (small but measurable win, limited by the other frequent modifications of PGXACT). Author: Andres Freund Reviewed-By: Robert Haas, Thomas Munro, David Rowley Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
-
Tomas Vondra authored
SLRU page hits were tracked only in SimpleLruReadPage, but that's not enough because we may hit the page in SimpleLruReadPage_ReadOnly in which case we don't call SimpleLruReadPage at all. Reported-by: Kuntal Ghosh Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development
-
Andres Freund authored
Before the fix every 2PC commit/abort leaked a file descriptor. As the files are opened using BasicOpenFile(), that quickly leads to the backend running out of file descriptors. Once enough 2PC abort/commit have caused enough FDs to leak, any IO in the backend will fail with "Too many open files", as BasicOpenFilePerm() will have triggered all open files known to fd.c to be closed. The leak causing the problem at hand is a consequence of 0dc8ead4, but is only exascerbated by it. Previously most XLogPageReadCB callbacks used static variables to cache one open file, but after the commit the cache is private to each XLogReader instance. There never was infrastructure to close FDs at the time of XLogReaderFree, but the way XLogReader was used limited the leak to one FD. This commit just closes the during XLogReaderFree() if the FD is stored in XLogReaderState.seg.ws_segno. This may not be the way to solve this medium/long term, but at least unbreaks 2PC. Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
-
- 07 Apr, 2020 14 commits
-
-
Alvaro Herrera authored
Food for the gods must always be found somehow, even when the land starves.
-
Peter Geoghegan authored
Since heap TID is supposed to be just another key attribute to the implementation, it doesn't make much sense to have separate BTreeTupleSetNAtts() and BTreeTupleSetAltHeapTID() functions. Merge the two functions together. This slightly simplifies _bt_truncate().
-
Alvaro Herrera authored
Replication slots are useful to retain data that may be needed by a replication system. But experience has shown that allowing them to retain excessive data can lead to the primary failing because of running out of space. This new feature allows the user to configure a maximum amount of space to be reserved using the new option max_slot_wal_keep_size. Slots that overrun that space are invalidated at checkpoint time, enabling the storage to be released. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
-
Tom Lane authored
We invented \gx to allow the "\pset expanded" flag to be forced on for the duration of one command output, but that turns out to not be nearly enough to satisfy the demand for variant output formats. Hence, make it possible to change any pset option(s) for the duration of a single command output, by writing "option=value ..." inside parentheses, for example \g (format=csv csv_fieldsep='\t') somefile \gx can now be understood as a shorthand for including expanded=on inside the parentheses. Patch by me, expanding on a proposal by Pavel Stehule Discussion: https://postgr.es/m/CAFj8pRBx9OnBPRJVtfA5ycUpySge-XootAXAsv_4rrkHxJ8eRg@mail.gmail.com
-
Alexander Korotkov authored
This commit adds following optional clause to BEGIN and START TRANSACTION commands. WAIT FOR LSN lsn [ TIMEOUT timeout ] New clause pospones transaction start till given lsn is applied on standby. This clause allows user be sure, that changes previously made on primary would be visible on standby. New shared memory struct is used to track awaited lsn per backend. Recovery process wakes up backend once required lsn is applied. Author: Ivan Kartyshov, Anna Akenteva Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs Reviewed-by: Amit Kapila, Alexander Korotkov Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
-
Alvaro Herrera authored
WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL standard's spelling of LIMIT), where you additionally get rows that compare equal to the last of those N rows by the columns in the mandatory ORDER BY clause. There was a proposal by Andrew Gierth to implement this functionality in a more powerful way that would yield more features, but the other patch had not been finished at this time, so we decided to use this one for now in the spirit of incremental development. Author: Surafel Temesgen <surafel3000@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk
-
Tom Lane authored
Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. Lift that restriction by redeclaring the bit number arguments as int8 (which requires a catversion bump, hence is not back-patchable). The similarly-named functions for bit/varbit don't really have a problem because we restrict those types to at most VARBITMAXLEN bits; hence leave them alone. While here, extend the encode/decode functions in utils/adt/encode.c to allow dealing with values wider than 1GB. This is not a live bug or restriction in current usage, because no input could be more than 1GB, and since none of the encoders can expand a string more than 4X, the result size couldn't overflow uint32. But it might be desirable to support more in future, so make the input length values size_t and the potential-output-length values uint64. Also add some test cases to improve the miserable code coverage of these functions. Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
-
Tomas Vondra authored
Reported-by: Thomas Munro
-
Tomas Vondra authored
Some places still used "Maximum" instead of "Peak" when displaying info about sort space, so fix that. Also, add a comment clarifying why it's correct to check the number of full/prefix sort groups. Author: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
-
Fujii Masao authored
Previously when there were multiple timelines listed in the history file of the recovery target timeline, archive recovery searched all of them, starting from the newest timeline to the oldest one, to find the segment to read. That is, archive recovery had to continuously fail scanning the segment until it reached the timeline that the segment belonged to. These scans for non-existent segment could be harmful on the recovery performance especially when archival area was located on the remote storage and each scan could take a long time. To address the issue, this commit changes archive recovery so that it skips scanning the timeline that the segment to read doesn't belong to. Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
-
Tomas Vondra authored
Commit d2d8a229 introduced Incremental Sort, but it was considered only in create_ordered_paths() as an alternative to regular Sort. There are many other places that require sorted input and might benefit from considering Incremental Sort too. This patch modifies a number of those places, but not all. The concern is that just adding Incremental Sort to any place that already adds Sort may increase the number of paths considered, negatively affecting planning time, without any benefit. So we've taken a more conservative approach, based on analysis of which places do affect a set of queries that did seem practical. This means some less common queries may not benefit from Incremental Sort yet. Author: Tomas Vondra Reviewed-by: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
-
Tom Lane authored
It turns out that the code did indeed rely on a zeroed TuplesortInstrumentation.sortMethod field to indicate "this worker never did anything", although it seems the issue only comes up during certain race-condition-y cases. Hence, rearrange the TuplesortMethod enum to restore SORT_TYPE_STILL_IN_PROGRESS to having the value zero, and add some comments reinforcing that that isn't optional. Also future-proof a loop over the possible values of the enum. sizeof(bits32) happened to be the correct limit value, but only by purest coincidence. Per buildfarm and local investigation. Discussion: https://postgr.es/m/12222.1586223974@sss.pgh.pa.us
-
Thomas Munro authored
The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to users as int8. Now that we have an SQL type xid8 for FullTransactionId, define a new set of functions including pg_current_xact_id() and pg_current_snapshot() based on that. Keep the old functions around too, for now. It's a bit sneaky to use the same C functions for both, but since the binary representation is identical except for the signedness of the type, and since older functions are the ones using the wrong signedness, and since we'll presumably drop the older ones after a reasonable period of time, it seems reasonable to switch to FullTransactionId internally and share the code for both. Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com> Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com> Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
-
Thomas Munro authored
Similar to xid, but 64 bits wide. This new type is suitable for use in various system views and administration functions. Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com> Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com> Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
-
- 06 Apr, 2020 5 commits
-
-
Tomas Vondra authored
Per report from lapwing.
-
Tomas Vondra authored
The last test in incremental_sort suite prints a parallel plan, but some of the buildfarm animals have custom max_parallel_workers_per_gather values, causing failures. Fixed by setting the GUC to an explicit value. Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
-
Peter Geoghegan authored
An assertion added by commit 0d861bbb checked that _bt_killitems() only processes a BTScanPosItem whose heap TID is contained in a posting list tuple when its page offset number still matches what is on the page (i.e. when it matches the posting list tuple's current offset number). This was only correct in the common case where the page can't have changed since we first read it. It was not correct in cases where we don't drop the buffer pin (and don't need to verify the page hasn't changed using its LSN). The latter category includes scans involving unlogged tables, and scans that use a non-MVCC snapshot, per the logic originally introduced by commit 2ed5b87f. The assertion still seems helpful. Fix it by taking cases where the page may have been concurrently modified into account. Reported-By: Anastasia Lubennikova, Alexander Lakhin Discussion: https://postgr.es/m/c4e38e9a-0f9c-8e53-e639-adf343f94472@postgrespro.ru
-
Tomas Vondra authored
When executed with force_parallel_mode=regress, the function was exiting too early and thus failed to print the worker stats. Fixed by making it more like show_sort_info. Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
-
Tomas Vondra authored
Incremental Sort is an optimized variant of multikey sort for cases when the input is already sorted by a prefix of the requested sort keys. For example when the relation is already sorted by (key1, key2) and we need to sort it by (key1, key2, key3) we can simply split the input rows into groups having equal values in (key1, key2), and only sort/compare the remaining column key3. This has a number of benefits: - Reduced memory consumption, because only a single group (determined by values in the sorted prefix) needs to be kept in memory. This may also eliminate the need to spill to disk. - Lower startup cost, because Incremental Sort produce results after each prefix group, which is beneficial for plans where startup cost matters (like for example queries with LIMIT clause). We consider both Sort and Incremental Sort, and decide based on costing. The implemented algorithm operates in two different modes: - Fetching a minimum number of tuples without check of equality on the prefix keys, and sorting on all columns when safe. - Fetching all tuples for a single prefix group and then sorting by comparing only the remaining (non-prefix) keys. We always start in the first mode, and employ a heuristic to switch into the second mode if we believe it's beneficial - the goal is to minimize the number of unnecessary comparions while keeping memory consumption below work_mem. This is a very old patch series. The idea was originally proposed by Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the patch was taken over by James Coleman, who wrote and rewrote most of the current code. There were many reviewers/contributors since 2013 - I've done my best to pick the most active ones, and listed them in this commit message. Author: James Coleman, Alexander Korotkov Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
-