- 24 Mar, 2019 1 commit
-
-
Andres Freund authored
This adds new, required, table AM callbacks for insert/delete/update and lock_tuple. To be able to use those reasonably, the EvalPlanQual mechanism had to be adapted, moving more logic into the AM.

Previously both delete/update/lock call-sites and the EPQ mechanism had to have awareness of the specific tuple format to be able to fetch the latest version of a tuple. Obviously that needs to be abstracted away. To do so, move the logic that finds the latest row version into the AM. lock_tuple has a new flag argument, TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last version, rather than the current one. It'd have been possible to do so via a separate callback as well, but finding the last version usually also necessitates locking the newest version, making it sensible to combine the two. This replaces the previous use of EvalPlanQualFetch().

Additionally HeapTupleUpdated, which previously signaled either a concurrent update or delete, is now split into two, to avoid callers needing AM-specific knowledge to differentiate.

The move of finding the latest row version into tuple_lock means that encountering a row concurrently moved into another partition will now raise an error about "tuple to be locked" rather than "tuple to be updated/deleted" - which is accurate, as that always happens when locking rows. While possibly slightly less helpful for users, it seems like an acceptable trade-off.

As part of this commit HTSU_Result has been renamed to TM_Result, and its members have been expanded to differentiate between updating and deleting. HeapUpdateFailureData has been renamed to TM_FailureData.

The interface to speculative insertion is changed so nodeModifyTable.c does not have to set the speculative token itself anymore. Instead there's a version of tuple_insert, tuple_insert_speculative, that performs the speculative insertion (without requiring a flag to signal that fact), and the speculative insertion is either made permanent with table_complete_speculative(succeeded = true) or aborted with succeeded = false.

Note that multi_insert is not yet routed through tableam, nor is COPY. Changing multi_insert requires changes to copy.c that are large enough to better be done separately. Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED VIEW are also only going to be adjusted in a later commit.

Author: Andres Freund and Haribabu Kommi
Discussion:
   https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
   https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
   https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
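Speculative insertion is the executor mechanism behind INSERT ... ON CONFLICT, so a statement along these lines exercises the new tuple_insert_speculative / table_complete_speculative path (a minimal sketch; the table and column names are hypothetical, not from the commit):

    -- Hypothetical table; the primary key acts as the conflict arbiter.
    CREATE TABLE kv (id int PRIMARY KEY, v text);

    -- The new row is inserted speculatively: completed with
    -- succeeded = true if no conflict is found, aborted with
    -- succeeded = false (and turned into an UPDATE) otherwise.
    INSERT INTO kv VALUES (1, 'a')
    ON CONFLICT (id) DO UPDATE SET v = EXCLUDED.v;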
-
- 23 Mar, 2019 8 commits
-
-
Tom Lane authored
In combination with the previous commit, this ensures that valid XML data can always be dumped and reloaded, whether it is "document" or "content". Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
-
Tom Lane authored
Previously we were using the SQL:2003 definition, which doesn't allow this, but that creates a serious dump/restore gotcha: there is no setting of xmloption that will allow all valid XML data. Hence, switch to the 2006 definition.

Since libxml doesn't accept <!DOCTYPE> directives in the mode we use for CONTENT parsing, the implementation is to detect <!DOCTYPE> in the input and switch to DOCUMENT parsing mode. This should not cost much, because <!DOCTYPE> should be close to the front of the input if it's there at all. It's possible that this causes the error messages for malformed input to be slightly different than they were before, if said input includes <!DOCTYPE>; but that does not seem like a big problem.

In passing, buy back a few cycles in parsing of large XML documents by not doing strlen() of the whole input in parse_xml_decl().

Back-patch because dump/restore failures are not nice. This change shouldn't break any cases that worked before, so it seems safe to back-patch.

Chapman Flack (revised a bit by me)
Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
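As an illustration (not part of the commit), a value with a <!DOCTYPE> directive is now accepted even when the XML option is CONTENT; previously it was rejected:

    SET XML OPTION CONTENT;

    -- Accepted after this commit: the <!DOCTYPE> near the front of the
    -- input triggers a switch to DOCUMENT parsing mode internally.
    SELECT '<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]><greeting>hello</greeting>'::xml;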
-
Peter Geoghegan authored
Suppress 3 lines of unstable DETAIL output from a DROP ROLE statement in event_trigger.sql. This is further cleanup for commit dd299df8. Note that the event_trigger test instability issue is very similar to the recently suppressed foreign_data test instability issue. Both issues involve DETAIL output for a DROP ROLE statement that needed to be changed as part of dd299df8. Per buildfarm member macaque.
-
Peter Geoghegan authored
Teach nbtree forward index scans to check the high key before moving to the right sibling page in the hope of finding that it isn't actually necessary to do so. The new check may indicate that the scan definitely cannot find matching tuples to the right, ending the scan immediately. We already opportunistically force a similar "continuescan orientated" key check of the final non-pivot tuple when it's clear that it cannot be returned to the scan due to being dead-to-all. The new high key check is complementary.

The new approach for forward scans is more effective than checking the final non-pivot tuple, especially with composite indexes and non-unique indexes. The improvements to the logic for picking a split point added by commit fab25024 make it likely that relatively dissimilar high keys will appear on a page. A distinguishing key value that can only appear on non-pivot tuples on the right sibling page will often be present in leaf page high keys.

Since forcing the final item to be key checked no longer makes any difference in the case of forward scans, the existing extra key check is now only used for backwards scans. Backward scans continue to opportunistically check the final non-pivot tuple, which is actually the first non-pivot tuple on the page (not the last).

Note that even pg_upgrade'd v3 indexes make use of this optimization.

Author: Peter Geoghegan, Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WzkOmUduME31QnuTFpimejuQoiZ-HOf0pOWeFZNhTMctvA@mail.gmail.com
-
Michael Paquier authored
This makes the code more consistent with the surroundings. Author: Fabrízio de Royes Mello Discussion: https://postgr.es/m/CAFcNs+pXb_35r5feMU3-dWsWxXU=Yjq+spUsthFyGFbT0QcaKg@mail.gmail.com
-
Tom Lane authored
gcc is a bit pickier about this than perhaps it should be. Discussion: https://postgr.es/m/E1h6zzT-0003ft-DD@gemulon.postgresql.org
-
Andres Freund authored
Previously there was basically no coverage for UPDATEs encountering deleted rows, and no coverage for DELETE having to perform EPQ. That's problematic for an upcoming commit in which EPQ is taught to integrate with tableams. Also, there was no test for an UPDATE encountering a row that was UPDATEd into another partition.

Author: Andres Freund
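For context, a sketch of the kind of scenario these tests cover (hypothetical table; EPQ applies under READ COMMITTED when a DML statement hits a concurrently modified row):

    -- Hypothetical setup.
    CREATE TABLE counters (id int PRIMARY KEY, n int);
    INSERT INTO counters VALUES (1, 0);

    -- Session 1:
    BEGIN;
    UPDATE counters SET n = n + 1 WHERE id = 1;  -- new row version, uncommitted

    -- Session 2 (READ COMMITTED) blocks on the row; once session 1
    -- commits, EPQ re-evaluates the qual against the latest version:
    DELETE FROM counters WHERE id = 1 AND n = 0;  -- deletes nothing: n is now 1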
- 22 Mar, 2019 18 commits
-
-
Michael Paquier authored
This is an option consistent with what pg_dump, pg_rewind and pg_basebackup provide, useful for limiting the I/O effort when testing things; it is not to be used in a production environment.

Author: Michael Paquier
Reviewed-by: Michael Banck, Fabien Coelho, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
-
Peter Eisentraut authored
This reverts commit 4e274a04. These files aren't actually built anymore since 550b9d26.
-
Michael Paquier authored
An offline cluster can now work with more modes in pg_checksums:

- --enable enables checksums in a cluster, updating all blocks with a correct checksum, and updating the control file at the end.
- --disable disables checksums in a cluster, updating only the control file.
- --check is an extra option able to verify checksums for a cluster, and the default used if no mode is specified.

When running --enable or --disable, the data folder gets fsync'd for durability, and then it is followed by a control file update and flush to keep the operation consistent should the tool be interrupted, killed or the host unplugged. If no mode is specified in the options, then --check is used for compatibility with older versions of pg_checksums (named pg_verify_checksums in v11 where it was introduced).

Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Magnus Hagander, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
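A hedged usage sketch (the data directory path is a placeholder): with the cluster cleanly shut down, run the tool, then confirm the state from a fresh session via the read-only data_checksums parameter:

    -- With the server stopped, from a shell (shown as comments here):
    --   pg_checksums --enable -D /path/to/data
    --   pg_checksums --check  -D /path/to/data
    -- After restarting, the cluster-wide state is visible from SQL:
    SHOW data_checksums;  -- reports "on" once --enable has completed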
-
Peter Eisentraut authored
We need to set the database to UTF8 encoding so that the test can use Unicode escapes.
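For reference (an illustration, not from the commit), Unicode escapes for non-ASCII characters are rejected under a non-UTF8 server encoding, which is why the test database must be UTF8:

    -- The documentation's Cyrillic example; errors out under a non-UTF8
    -- server encoding, works once the database uses UTF8.
    SELECT U&'\0441\043B\043E\043D' AS word;  -- 'слон'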
-
Peter Eisentraut authored
-
Tom Lane authored
Postpone most of the effort of constructing PartitionedRelPruneInfos until after we have found out whether run-time pruning is needed at all. This costs very little duplicated effort (basically just an extra find_base_rel() call per partition) and saves quite a bit when we can't do run-time pruning. Also, merge the first loop (for building relid_subpart_map) into the second loop, since we don't need the map to be valid during that loop. Amit Langote Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
-
Peter Geoghegan authored
This is almost a straight revert of commit fff518d0, which itself was a revert of 7d3bf73a. It turns out that commit 8aa9dd74, which sorted dependent objects before deletion in DROP OWNED BY, was not sufficient to make all remaining unstable DETAIL output stable. Unstable DETAIL output from DROP ROLE was not affected, because that happens to use a different code path. It doesn't seem worthwhile to fix the other code path at this time. Discussion: https://postgr.es/m/6226.1553274783@sss.pgh.pa.us
-
Tom Lane authored
I (tgl) remain dubious that it's a good idea for PartitionDirectory to hold a pin on a relcache entry throughout planning, rather than copying the data or using some kind of refcount scheme. However, it's certainly the responsibility of the PartitionDirectory code to ensure that what it's handing back is a stable data structure, not that of its caller. So this is a pretty clear oversight in commit 898e5e32, and one that can cost a lot of performance when there are many partitions. Amit Langote (extracted from a much larger patch set) Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
-
Heikki Linnakangas authored
There were more large constants that needed UINT64CONST. And one variable was declared as "int" when it needed to be uint64. These bugs were only visible on 32-bit systems; clearly I should've tested on one, given that this code does a lot of work with 64-bit integers.

Also, in the "huge distances" test, the code created some values with random distances between them, but the test logic didn't take into account the possibility that the random distance was exactly 1. That never actually happens with the seed we're using, but let's be tidy.
-
Peter Eisentraut authored
Change the tests to use old-style ICU locale specifications so that they can run on older ICU versions.
-
Heikki Linnakangas authored
Use UINT64CONST for large constants.
-
Heikki Linnakangas authored
Use UINT64_FORMAT for printing uint64s.
-
Heikki Linnakangas authored
Buildfarm member 'woodlouse' failed one of the tests, and I'm not sure which test failed. Better to print the names of the tests, so that it will appear in the regression.diffs on failure.
-
Heikki Linnakangas authored
We mustn't assume that the IndexVacuumInfo pointer passed to bulkdelete() stage is still valid in the vacuumcleanup() stage. Per very pink buildfarm.
-
Heikki Linnakangas authored
To do this, we scan GiST two times. In the first pass we make note of empty leaf pages and internal pages. In the second pass we scan through internal pages, looking for downlinks to the empty pages.

Deleting internal pages is still not supported: as in nbtree, the last child of an internal page is never deleted. That means that if you have a workload where new keys are always inserted into a different area than the one where old keys are removed, the index will still grow without bound. But the rate of growth will be an order of magnitude slower than before.

Author: Andrey Borodin
Discussion: https://www.postgresql.org/message-id/B1E4DF12-6CD3-4706-BDBD-BF3283328F60@yandex-team.ru
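A minimal sketch of the kind of workload that benefits (hypothetical names, not from the commit):

    -- Hypothetical GiST-indexed table.
    CREATE TABLE pts (p point);
    CREATE INDEX pts_gist ON pts USING gist (p);

    -- Delete a whole region of keys, then vacuum: leaf pages emptied by
    -- the deletions can now be unlinked and recycled rather than merely
    -- being left around for reuse.
    DELETE FROM pts WHERE p <@ box '((0,0),(100,100))';
    VACUUM pts;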
-
Heikki Linnakangas authored
The set is implemented as a B-tree, with a compact representation at leaf items, using the Simple-8b algorithm, so that clusters of nearby values use less memory.

The IntegerSet isn't used for anything yet, aside from the test code, but we have two patches in the works that would benefit from this: a patch to allow GiST vacuum to delete empty pages, and a patch to reduce heap VACUUM's memory usage, by storing the list of dead TIDs more efficiently and lifting the 1 GB limit on its size.

This includes a unit test module, in src/test/modules/test_integerset. It can be used to verify correctness, as a regression test, but if you run it manually, it can also print memory usage and execution time of some of the tests.

Author: Heikki Linnakangas, Andrey Borodin
Reviewed-by: Julien Rouhaud
Discussion: https://www.postgresql.org/message-id/b5e82599-1966-5783-733c-1a947ddb729f@iki.fi
-
Peter Eisentraut authored
This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider.

The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support this new flag.
- Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information.
- String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
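A short usage sketch; the locale string follows the PostgreSQL documentation's case-insensitive example:

    -- ICU collation that compares at strength level 2 (ignoring case
    -- differences) and is declared nondeterministic.
    CREATE COLLATION case_insensitive
        (provider = icu, locale = 'und-u-ks-level2', deterministic = false);

    SELECT 'ABC' = 'abc' COLLATE case_insensitive;  -- true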
-
Michael Paquier authored
Trying to call the function with the top-most parent of a partition tree was leading to a crash. In this case the correct result is to return the top-most parent itself. Reported-by: Álvaro Herrera Author: Michael Paquier Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20190322032612.GA323@alvherre.pgsql
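The function isn't named above; assuming for illustration that it is pg_partition_root() (an inference from the described behavior, not something stated in the message), the fixed case looks like this, with hypothetical table names:

    -- Hypothetical two-level partition tree.
    CREATE TABLE parent (id int) PARTITION BY RANGE (id);
    CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (10);

    -- Calling with the top-most parent now returns the parent itself
    -- instead of crashing.
    SELECT pg_partition_root('parent');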
-
- 21 Mar, 2019 5 commits
-
-
Peter Geoghegan authored
This should be superseded by commit 8aa9dd74.
-
Alvaro Herrera authored
-
Alvaro Herrera authored
When DefineIndex recurses to create constraints on partitions, it needs to use the value returned by index_constraint_create to set up partition dependencies. However, in the course of fixing the DEPENDENCY_INTERNAL_AUTO mess, commit 1d92a0c9 introduced some code to that function that clobbered the return value, causing the recorded OID to be of the wrong object. Close examination of pg_depend after creating the tables leads to indescribable objects :-(

My sin (in commit bdc3d7fa, while preparing for DDL deparsing in event triggers) was to use a variable name for the return value that's typically used for throwaway objects in dependency-setting calls ("referenced"). Fix by changing the variable names to match extended practice (the return value is "myself" rather than "referenced").

The pg_upgrade test notices the problem (in an indirect way: the pg_dump outputs are in different order), but only if you create the objects in a specific way that wasn't being used in the existing tests. Add a stanza to leave some objects around that shows the bug.

Catversion bump because preexisting databases might have bogus pg_depend entries.

Discussion: https://postgr.es/m/20190318204235.GA30360@alvherre.pgsql
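For anyone wanting to examine the recorded dependencies directly, a generic inspection query (not from the commit):

    -- Show what each constraint's dependency rows point at.
    SELECT classid::regclass AS catalog, objid,
           refclassid::regclass AS ref_catalog, refobjid, deptype
    FROM pg_depend
    WHERE classid = 'pg_constraint'::regclass;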
-
Tom Lane authored
These commands allow the argument type list to be omitted if there is just one object that matches by name. However, if that syntax was used with DROP IF EXISTS and there was more than one match, you got a "function ... does not exist, skipping" notice message rather than a truthful complaint about the ambiguity. This was basically due to poor factorization and a rats-nest of logic, so refactor the relevant lookup code to make it cleaner.

Note that this amounts to narrowing the scope of which sorts of error conditions IF EXISTS will bypass. Per discussion, we only intend it to skip no-such-object cases, not multiple-possible-matches cases. Per bug #15572 from Ash Marath.

Although this definitely seems like a bug, it's not clear that people would thank us for changing the behavior in minor releases, so no back-patch.

David Rowley, reviewed by Julien Rouhaud and Pavel Stehule
Discussion: https://postgr.es/m/15572-ed1b9ed09503de8a@postgresql.org
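An illustrative reproduction (hypothetical function names), showing the post-fix behavior:

    CREATE FUNCTION ambig(int)  RETURNS int  LANGUAGE sql AS 'SELECT $1';
    CREATE FUNCTION ambig(text) RETURNS text LANGUAGE sql AS 'SELECT $1';

    -- Argument list omitted with two candidates: previously this issued
    -- "function ambig does not exist, skipping"; it now raises an error
    -- about the name not being unique.
    DROP FUNCTION IF EXISTS ambig;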
-
Thomas Munro authored
LDAP servers can be advertised on a network with RFC 2782 DNS SRV records. The OpenLDAP command-line tools automatically try to find servers that way, if no server name is provided by the user. Teach PostgreSQL to do the same using OpenLDAP's support functions, when building with OpenLDAP. For now, we assume that HAVE_LDAP_INITIALIZE (an OpenLDAP extension available since OpenLDAP 2.0 and also present in Apple LDAP) implies that you also have ldap_domain2hostlist() (which arrived in the same OpenLDAP version and is also present in Apple LDAP). Author: Thomas Munro Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/CAEepm=2hAnSfhdsd6vXsM6VZVN0br-FbAZ-O+Swk18S5HkCP=A@mail.gmail.com
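A hedged configuration sketch (pg_hba.conf, hypothetical domain): omitting ldapserver is what triggers the DNS SRV lookup on an OpenLDAP build:

    # pg_hba.conf: no ldapserver given, so servers are discovered via
    # RFC 2782 SRV records derived from the base DN (here, example.net).
    host all all 0.0.0.0/0 ldap ldapbasedn="dc=example,dc=net" ldapsearchattribute=uid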
-
- 20 Mar, 2019 8 commits
-
-
Tom Lane authored
This finishes a task we left undone in commit f1ad067f, by extending the delete-in-descending-OID-order rule to deletions triggered by DROP OWNED BY. We've coped with machine-dependent deletion orders one time too many, and the new issues caused by Peter G's recent nbtree hacking seem like the last straw. Discussion: https://postgr.es/m/E1h6eep-0001Mw-Vd@gemulon.postgresql.org
-
Alvaro Herrera authored
This new function simplifies some existing coding and supports future patches.

Discussion: https://postgr.es/m/201901222145.t6wws6t6vrcu@alvherre.pgsql
Reviewed-by: Amit Langote, Jesper Pedersen
-
Peter Geoghegan authored
Cleanup from commit dd299df8. Per complaint from Tom Lane.
-
Peter Geoghegan authored
Unstable sort order related to changes to nbtree from commit dd299df8 can cause two lines of DETAIL output to be in opposite-of-expected order. Suppress the output using the same VERBOSITY hack that is used elsewhere in the foreign_data tests. Note that the same foreign_data.out DETAIL output was mechanically updated by commit dd299df8. Only a few such changes were required, though. Per buildfarm member batfish. Discussion: https://postgr.es/m/CAH2-WzkCQ_MtKeOpzozj7QhhgP1unXsK8o9DMAFvDqQFEPpkYQ@mail.gmail.com
-
Alvaro Herrera authored
I unnecessarily removed this check in 3de241db because I misunderstood what the final representation of constraints across a partitioning hierarchy was to be. Put it back (in both branches). Discussion: https://postgr.es/m/201901222145.t6wws6t6vrcu@alvherre.pgsql
-
Peter Geoghegan authored
Teach contrib/amcheck's bt_index_parent_check() function to take advantage of the uniqueness property of heapkeyspace indexes in support of a new verification option: non-pivot tuples (non-highkey tuples on the leaf level) can optionally be re-found using a new search for each, that starts from the root page. If a tuple cannot be re-found, report that the index is corrupt.

The new "rootdescend" verification option is exhaustive, and can therefore make a call to bt_index_parent_check() take a lot longer. Re-finding tuples during verification is mostly intended as an option for backend developers, since the corruption scenarios that it alone is uniquely capable of detecting seem fairly far-fetched.

For example, "rootdescend" verification is much more likely to detect corruption of the least significant byte of a key from a pivot tuple in the root page of a B-Tree that already has at least three levels. Typically, only a few tuples on a cousin leaf page are at risk of "getting overlooked" by index scans in this scenario. The corrupt key in the root page is only slightly corrupt: corrupt enough to give wrong answers to some queries, and yet not corrupt enough to allow the problem to be detected without verifying agreement between the leaf page and the root page, skipping at least one internal page level. The existing bt_index_parent_check() checks never cross more than a single level.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-Wz=yTWnVu+HeHGKb2AGiADL9eprn-cKYAto4MkKOuiGtRQ@mail.gmail.com
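Invocation sketch (index name hypothetical); the new boolean argument enables the exhaustive root-descent pass:

    CREATE EXTENSION IF NOT EXISTS amcheck;

    -- rootdescend is the new option; heapallindexed is the pre-existing
    -- heap/index agreement check.
    SELECT bt_index_parent_check('my_unique_idx'::regclass,
                                 heapallindexed => true,
                                 rootdescend => true);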
-
Peter Geoghegan authored
Teach nbtree to give some consideration to how "distinguishing" candidate leaf page split points are. This should not noticeably affect the balance of free space within each half of the split, while still making suffix truncation truncate away significantly more attributes on average.

The logic for choosing a leaf split point now uses a fallback mode in the case where the page is full of duplicates and it isn't possible to find even a minimally distinguishing split point. When the page is full of duplicates, the split should pack the left half very tightly, while leaving the right half mostly empty. Our assumption is that logical duplicates will almost always be inserted in ascending heap TID order with v4 indexes. This strategy leaves most of the free space on the half of the split that will likely be where future logical duplicates of the same value need to be placed.

The number of cycles added is not very noticeable. This is important because deciding on a split point takes place while at least one exclusive buffer lock is held. We avoid using authoritative insertion scankey comparisons to save cycles, unlike suffix truncation proper. We use a faster binary comparison instead.

Note that even pg_upgrade'd v3 indexes make use of these optimizations. Benchmarking has shown that even v3 indexes benefit, despite the fact that suffix truncation will only truncate non-key attributes in INCLUDE indexes. Grouping relatively similar tuples together is beneficial in and of itself, since it reduces the number of leaf pages that must be accessed by subsequent index scans.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WzmmoLNQOj9mAD78iQHfWLJDszHEDrAzGTUMG3mVh5xWPw@mail.gmail.com
-
Peter Geoghegan authored
Make nbtree treat all index tuples as having a heap TID attribute. Index searches can distinguish duplicates by heap TID, since heap TID is always guaranteed to be unique. This general approach has numerous benefits for performance, and is prerequisite to teaching VACUUM to perform "retail index tuple deletion".

Naively adding a new attribute to every pivot tuple has unacceptable overhead (it bloats internal pages), so suffix truncation of pivot tuples is added. This will usually truncate away the "extra" heap TID attribute from pivot tuples during a leaf page split, and may also truncate away additional user attributes. This can increase fan-out, especially in a multi-column index. Truncation can only occur at the attribute granularity, which isn't particularly effective, but works well enough for now. A future patch may add support for truncating "within" text attributes by generating truncated key values using new opclass infrastructure.

Only new indexes (BTREE_VERSION 4 indexes) will have insertions that treat heap TID as a tiebreaker attribute, or will have pivot tuples undergo suffix truncation during a leaf page split (on-disk compatibility with versions 2 and 3 is preserved). Upgrades to version 4 cannot be performed on-the-fly, unlike upgrades from version 2 to version 3. contrib/amcheck continues to work with version 2 and 3 indexes, while also enforcing stricter invariants when verifying version 4 indexes. These stricter invariants are the same invariants described by "3.1.12 Sequencing" from the Lehman and Yao paper.

A later patch will enhance the logic used by nbtree to pick a split point. This patch is likely to negatively impact performance without smarter choices around the precise point to split leaf pages at. Making these two mostly-distinct sets of enhancements into distinct commits seems like it might clarify their design, even though neither commit is particularly useful on its own.

The maximum allowed size of new tuples is reduced by an amount equal to the space required to store an extra MAXALIGN()'d TID in a new high key during leaf page splits. The user-facing definition of the "1/3 of a page" restriction is already imprecise, and so does not need to be revised. However, there should be a compatibility note in the v12 release notes.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas, Alexander Korotkov
Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
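To see which on-disk version a given B-Tree index uses, pageinspect's bt_metap() reports it (a usage sketch; the index name is hypothetical):

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    -- "version" is 4 for indexes built after this commit; pg_upgrade'd
    -- v2/v3 indexes keep their old version until reindexed.
    SELECT version FROM bt_metap('my_idx');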
-