- 07 Mar, 2019 9 commits
-
-
Alvaro Herrera authored
Correctly process nodes of more types than previously. In some cases, nodes were being ignored (nothing was output); in other cases, trying to return them resulted in errors about unrecognized nodes. In yet other cases, necessary escaping (of XML special characters) was not being done. Fix all those (as far as the authors could find) and add regression tests cases verifying the new behavior. I (Álvaro) was of two minds about backpatching these changes. They do seem bugfixes that would benefit most users of the affected functions; but on the other hand it would change established behavior in minor releases, so it seems prudent not to. Authors: Pavel Stehule, Markus Winand, Chapman Flack Discussion: https://postgr.es/m/CAFj8pRA6J25CtAZ2TuRvxK3gat7-bBUYh0rfE2yM7Hj9GD14Dg@mail.gmail.com https://postgr.es/m/8BDB0627-2105-4564-AA76-7849F028B96E@winand.at The elephant in the room as pointed out by Chapman Flack, not fixed in this commit, is that we still have XMLTABLE operating on XPath 1.0 instead of the standard-mandated XQuery (or even its subset XPath 2.0). Fixing that is a major undertaking, however.
-
Tom Lane authored
When we introduced separate ProjectSetPath nodes for application of set-returning functions in v10, we inadvertently broke some cases where we're supposed to recognize that the result of a subquery is known to be empty (contain zero rows). That's because IS_DUMMY_REL was just looking for a childless AppendPath without allowing for a ProjectSetPath being possibly stuck on top. In itself, this didn't do anything much worse than produce slightly worse plans for some corner cases. Then in v11, commit 11cf92f6 rearranged things to allow the scan/join targetlist to be applied directly to partial paths before they get gathered. But it inserted a short-circuit path for dummy relations that was a little too short: it failed to insert a ProjectSetPath node at all for a targetlist containing set-returning functions, resulting in bogus "set-valued function called in context that cannot accept a set" errors, as reported in bug #15669 from Madelaine Thibaut. The best way to fix this mess seems to be to reimplement IS_DUMMY_REL so that it drills down through any ProjectSetPath nodes that might be there (and it seems like we'd better allow for ProjectionPath as well). While we're at it, make it look at rel->pathlist not cheapest_total_path, so that it gives the right answer independently of whether set_cheapest has been done lately. That dependency looks pretty shaky in the context of code like apply_scanjoin_target_to_paths, and even if it's not broken today it'd certainly bite us at some point. (Nastily, unsafe use of the old coding would almost always work; the hazard comes down to possibly looking through a dangling pointer, and only once in a blue moon would you find something there that resulted in the wrong answer.) It now looks like it was a mistake for IS_DUMMY_REL to be a macro: if there are any extensions using it, they'll continue to use the old inadequate logic until they're recompiled, after which they'll fail to load into server versions predating this fix. Hopefully there are few such extensions. Having fixed IS_DUMMY_REL, the special path for dummy rels in apply_scanjoin_target_to_paths is unnecessary as well as being wrong, so we can just drop it. Also change a few places that were testing for partitioned-ness of a planner relation but not using IS_PARTITIONED_REL for the purpose; that seems unsafe as well as inconsistent, plus it required an ugly hack in apply_scanjoin_target_to_paths. In passing, save a few cycles in apply_scanjoin_target_to_paths by skipping processing of pre-existing paths for partitioned rels, and do some cosmetic cleanup and comment adjustment in that function. I renamed IS_DUMMY_PATH to IS_DUMMY_APPEND with the intention of breaking any code that might be using it, since in almost every case that would be wrong; IS_DUMMY_REL is what to be using instead. In HEAD, also make set_dummy_rel_pathlist static (since it's no longer used from outside allpaths.c), and delete is_dummy_plan, since it's no longer used anywhere. Back-patch as appropriate into v11 and v10. Tom Lane and Julien Rouhaud Discussion: https://postgr.es/m/15669-02fb3296cca26203@postgresql.org
-
Robert Haas authored
We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened. To make this safe, we have to cope with (at least) three separate problems. First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there. Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if lookup up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it. For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock. Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions. This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start. Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review. Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
-
Alvaro Herrera authored
-
Alvaro Herrera authored
This clause is used to indicate the passing mode of a XML document, but we were doing it wrong: we accepted BY REF and ignored it, and rejected BY VALUE as a syntax error. The reality, however, is that documents are always passed BY VALUE, so rejecting that clause was silly. Change things so that we accept BY VALUE. BY REF continues to be accepted, and continues to be ignored. Author: Chapman Flack Reviewed-by: Pavel Stehule Discussion: https://postgr.es/m/5C297BB7.9070509@anastigmatix.net
-
Alvaro Herrera authored
Per buildfarm
-
Alvaro Herrera authored
This is useful to speed up loading data in a different database engine. Authors: Surafel Temesgen and David Rowley. Lightly edited by Álvaro. Reviewed-by: Fabien Coelho Discussion: https://postgr.es/m/CALAY4q9kumSdnRBzvRJvSRf2+BH20YmSvzqOkvwpEmodD-xv6g@mail.gmail.com
-
Thomas Munro authored
-
Thomas Munro authored
Before commit 3fa2bb31 this type appeared in the catalogs to select which of several block storage mechanisms each relation used. New features under development propose to revive the concept of different block storage managers for new kinds of data accessed via bufmgr.c, but don't need to put references to them in the catalogs. So, avoid useless maintenance work on this type by dropping it. Update some regression tests that were referencing it where any type would do. Discussion: https://postgr.es/m/CA%2BhUKG%2BDE0mmiBZMtZyvwWtgv1sZCniSVhXYsXkvJ_Wo%2B83vvw%40mail.gmail.com
-
- 06 Mar, 2019 12 commits
-
-
Andres Freund authored
Until now the the slot to store the conflicting tuple, and the result of the ON CONFLICT SET, where reused between partitions. That necessitated changing slots descriptor when switching partitions. Besides the overhead of switching descriptors on a slot (which requires memory allocations and prevents JITing), that's importantly also problematic for tableam. There individual partitions might belong to different tableams, needing different kinds of slots. In passing also fix ExecOnConflictUpdate to clear the existing slot at exit. Otherwise that slot could continue to hold a pin till the query ends, which could be far too long if the input data set is large, and there's no further conflicts. While previously also problematic, it's now more important as there will be more such slots when partitioned. Author: Andres Freund Reviewed-By: Robert Haas, David Rowley Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
-
Andres Freund authored
In a complete brown paper bag moment, I forgot to include equalfuncs in my previous fix of copy/out/readfuncs. Thanks Tom for noticing. Discussion: https://postgr.es/m/1659.1551903210@sss.pgh.pa.us
-
Andrew Dunstan authored
This will stop logging cases where, for example, a monitor opens a connection and immediately closes it. If the packet contains any data an incomplete packet will still be logged. Author: Tom Lane Discussion: https://postgr.es/m/a1379a72-2958-1ed0-ef51-09a21219b155@2ndQuadrant.com
-
Andres Freund authored
This includes a catversion bump, as IntoClause is theoretically speaking part of storable rules. In practice I don't think that can happen, but there's no reason to be stingy here. Per buildfarm member calliphoridae.
-
Andres Freund authored
Per buildfarm. This time I hopefully actually made sure to get all the cases...
-
Andres Freund authored
Per buildfarm.
-
Andres Freund authored
This adds pg_dump support for table AMs in a similar manner to how tablespaces are handled. That is, instead of specifying the AM for every CREATE TABLE etc, emit SET default_table_access_method statements. That makes it easier to change the AM for all/most tables in a dump, and allows restore to succeed even if some AM is not available. This increases the dump archive version, as a tables/matview's AM needs to be tracked therein. Author: Dimitri Dolgov, Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
-
Andres Freund authored
This introduces the concept of table access methods, i.e. CREATE ACCESS METHOD ... TYPE TABLE and CREATE TABLE ... USING (storage-engine). No table access functionality is delegated to table AMs as of this commit, that'll be done in following commits. Subsequent commits will incrementally abstract table access functionality to be routed through table access methods. That change is too large to be reviewed & committed at once, so it'll be done incrementally. Docs will be updated at the end, as adding them incrementally would likely make them less coherent, and definitely is a lot more work, without a lot of benefit. Table access methods are specified similar to index access methods, i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with callbacks. In contrast to index AMs that struct needs to live as long as a backend, typically that's achieved by just returning a pointer to a constant struct. Psql's \d+ now displays a table's access method. That can be disabled with HIDE_TABLEAM=true, which is mainly useful so regression tests can be run against different AMs. It's quite possible that this behaviour still needs to be fine tuned. For now it's not allowed to set a table AM for a partitioned table, as we've not resolved how partitions would inherit that. Disallowing allows us to introduce, if we decide that's the way forward, such a behaviour without a compatibility break. Catversion bumped, to add the heap table AM and references to it. Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
-
Andres Freund authored
I broke/typoed this in 4da597ed. Astonishingly this mostly doesn't cause breakage, except when trying to change the tuple descriptor of a slot (because TTS_FLAG_FIXED is assumed to be set). Author: Andres Freund
-
Robert Haas authored
Antonin Houska Discussion: http://postgr.es/m/30413.1551870730@localhost
-
Andrew Dunstan authored
The original 200 default value was set back in f425b605 when the cost delay settings were first added. Hardware has improved quite a bit since then and we've also made improvements such as sorting buffers during checkpoints (9cd00c45) which should result in less random writes. This low default value was reportedly causing problems with badly configured servers and in the absence of a native method to remove excessive bloat from tables without incurring an AccessExclusiveLock, this often made cleaning up the damage caused by badly configured auto-vacuums difficult. It seems more likely that someone will notice that auto-vacuum is running too quickly than too slowly, so let's go all out and multiple the default value for the setting by 10. With the default vacuum_cost_page_dirty and autovacuum_vacuum_cost_delay (assuming a page size of 8192 bytes), this allows autovacuum a theoretical maximum dirty write rate of around 39MB/s instead of just 3.9MB/s. Author: David Rowley Discussion: https://postgr.es/m/CAKJS1f_YbXC2qTMPyCbmsPiKvZYwpuQNQMohiRXLj1r=8_rYvw@mail.gmail.com
-
Michael Paquier authored
This was missing since 803b1301, which has introduced the option for the user-facing VACUUM and ANALYZE. Author: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoD2TMdTxRhZ7WSp940V82_OAyPmgHnbi25UbbArLgA92Q@mail.gmail.com
-
- 05 Mar, 2019 4 commits
-
-
Andrew Dunstan authored
This test fails if the containing directory contains a funny character such as a space or some perl metacharacter. To avoid that, we check for files names using readdir and a regex, rather than using a glob pattern. Discussion: https://postgr.es/m/CAM6_UM6dGdU39PKAC24T+HD9ouy0jLN9vH6163K8QEEzr__iZw@mail.gmail.com Author: Fabien COELHO Reviewed-by: Raúl Marín Rodríguez
-
Peter Eisentraut authored
The original commit appears to have accidentally introduced a duplicate definition. Keep only one of them.
-
Heikki Linnakangas authored
Scanning an index in physical order is faster than walking it in logical order, because sequential I/O is faster than random I/O. The idea and code structure is borrowed from B-tree vacuum code. Patch by Andrey Borodin, with changes by me. Based on early work by Konstantin Kuznetsov, although the patch has been rewritten multiple times since his original version. Discussion: https://www.postgresql.org/message-id/1B9FAC6F-FA19-4A24-8C1B-F4F574844892%40yandex-team.ru
-
Peter Geoghegan authored
The nbtree README claims that VACUUM can never finish interrupted page splits by design. That isn't entirely accurate, though. Note an exception to the general rule. Discussion: https://postgr.es/m/CAH2-Wz=_Xvv8byzK_LvY4ci76OgsHCQzoKF7We8yG9waO7j6rA@mail.gmail.com
-
- 04 Mar, 2019 9 commits
-
-
Andrew Dunstan authored
For some reason the dump test with names with high bits set fails on Msys2 (although not Msys1). Disable the tests for now, so that other tests can run.
-
Andrew Dunstan authored
This is the only test that fails when run as an admin user. The reason is that when Postgres is started via pg_ctl its admin privileges are lowered. However, this test called 'postgres -D datadir' directly, resulting in a failure. Replace that by calling pg_ctl and then checking the result for the expected failure, and the logfile for the expected error message.
-
Peter Geoghegan authored
Commit 2c03216d, which revamped the WAL record format, failed to update a comment referencing the old API. Update the comment.
-
Andrew Dunstan authored
Currently only frogmouth in the buildfarm uses the 32bit params, and it's not able to build past release 10, so put those last, saving substantial configure time on more modern systems. Even if we get a modern 32 bit Windows system at some stage we should probably prefer the 64 bit interface here these days.
-
Alvaro Herrera authored
Adds another introspection feature for partitioning, necessary for further psql patches. Reviewed-by: Michaël Paquier Discussion: https://postgr.es/m/20190226222757.GA31622@alvherre.pgsql
-
Alvaro Herrera authored
It's worth immortalizing this behavior, per discussion. Discussion: https://postgr.es/m/20190228193203.GA26151@alvherre.pgsql
-
Peter Eisentraut authored
This was no longer useful since the latch functions use memory barriers already, which are also compiler barriers, and volatile does not help with cross-process access. Discussion: https://www.postgresql.org/message-id/flat/20190218202511.qsfpuj5sy4dbezcw%40alap3.anarazel.de#18783c27d73e9e40009c82f6e0df0974
-
Michael Paquier authored
The implementation of readdir() in src/port/ which gets used by MSVC has been added in 399a36a7, and since the beginning it considers all errors on the first file lookup as ENOENT, setting errno accordingly and letting the routine caller think that the directory is empty. While this is normally enough for the case of the backend, this can confuse callers of this routine on Windows as all errors would map to the same behavior. So, for example, even permission errors would be thought as having an empty directory, while there could be contents in it. This commit changes the error handling so as readdir() gets a behavior similar to native implementations: force errno=0 when seeing ERROR_FILE_NOT_FOUND as error and consider other errors as plain failures. While looking at the patch, I noticed that MinGW does not enforce errno=0 when looking at the first file, but it gets enforced on the next file lookups. A comment related to that was incorrect in the code. Reported-by: Yuri Kurenkov Diagnosed-by: Yuri Kurenkov, Grigory Smolkin Author: Konstantin Knizhnik Reviewed-by: Andrew Dunstan, Michael Paquier Discussion: https://postgr.es/m/2cad7829-8d66-e39c-b937-ac825db5203d@postgrespro.ru Backpatch-through: 9.4
-
Andrew Dunstan authored
-
- 03 Mar, 2019 4 commits
-
-
Andrew Dunstan authored
The test crashes and burns quite badly, for some reason, but even if it didn't it wouldn't work, since Windows doesn't let you rename a file held by a running process.
-
Tom Lane authored
StoreIndexTuple was a loop over index_getattr, which is O(N^2) if the index columns are variable-width, and the performance impact is already quite visible at ten columns. The obvious move is to replace that with a call to index_deform_tuple ... but that's *also* a loop over index_getattr. Improve it to be essentially a clone of heap_deform_tuple. (There are a few other places that loop over all index columns with index_getattr, and perhaps should be changed likewise, but most of them don't seem performance-critical. Anyway, the rest would mostly only be interested in the index key columns, which there aren't likely to be so many of. Wide index tuples are a new thing with INCLUDE.) Konstantin Knizhnik Discussion: https://postgr.es/m/e06b2d27-04fc-5c0e-bb8c-ecd72aa24959@postgrespro.ru
-
Andrew Dunstan authored
Commit f092de05 added a test for pg_dumpall --exclude-database including the wildcard pattern '*dump*' which matches some files in the source directory. The test library on msys uses the shell which expands this and thus the program gets incorrect arguments. This doesn't happen if the pattern doesn't match any files, so here the pattern is set to '*dump_test*' which is such a pattern. Per buildfarm animal jacana.
-
Dean Rasheed authored
Previously, rewriteTargetListIU() generated a list of attribute numbers from the targetlist, which were passed to rewriteValuesRTE(), which expected them to contain the same number of entries as there are columns in the VALUES RTE, and to be in the same order. That was fine when the target relation was a table, but for an updatable view it could be broken in at least three different ways --- rewriteTargetListIU() could insert additional targetlist entries for view columns with defaults, the view columns could be in a different order from the columns of the underlying base relation, and targetlist entries could be merged together when assigning to elements of an array or composite type. As a result, when recursing to the base relation, the list of attribute numbers generated from the rewritten targetlist could no longer be relied upon to match the columns of the VALUES RTE. We got away with that prior to 41531e42 because it used to always be the case that rewriteValuesRTE() did nothing for the underlying base relation, since all DEFAULTS had already been replaced when it was initially invoked for the view, but that was incorrect because it failed to apply defaults from the base relation. Fix this by examining the targetlist entries more carefully and picking out just those that are simple Vars referencing the VALUES RTE. That's sufficient for the purposes of rewriteValuesRTE(), which is only responsible for dealing with DEFAULT items in the VALUES RTE. Any DEFAULT item in the VALUES RTE that doesn't have a matching simple-Var-assignment in the targetlist is an error which we complain about, but in theory that ought to be impossible. Additionally, move this code into rewriteValuesRTE() to give a clearer separation of concerns between the 2 functions. There is no need for rewriteTargetListIU() to know about the details of the VALUES RTE. While at it, fix the comment for rewriteValuesRTE() which claimed that it doesn't support array element and field assignments --- that hasn't been true since a3c7a993 (9.6 and later). Back-patch to all supported versions, with minor differences for the pre-9.6 branches, which don't support array element and field assignments to the same column in multi-row VALUES lists. Reviewed by Amit Langote. Discussion: https://postgr.es/m/15623-5d67a46788ec8b7f@postgresql.org
-
- 02 Mar, 2019 2 commits
-
-
Michael Paquier authored
This changes the partition functions so as tables and indexes which are not part of partition trees are handled the same way as what is done for undefined objects and unsupported relkinds: pg_partition_tree() returns no rows and pg_partition_root() returns a NULL result. Hence, partitioned tables, partitioned indexes and relations whose flag pg_class.relispartition is set are considered as valid objects to process. Previously, tables and indexes not included in a partition tree were processed the same way as a partition or a partitioned table, which caused the functions to return inconsistent results for inherited tables, especially when inheriting from multiple tables. Reported-by: Álvaro Herrera Author: Amit Langote, Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20190228193203.GA26151@alvherre.pgsql
-
Andres Freund authored
Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
-