1. 07 Apr, 2018 16 commits
    • Support partition pruning at execution time · 499be013
      Alvaro Herrera authored
      Existing partition pruning is only able to work at plan time, for query
      quals that appear in the parsed query.  This is good but limiting, as
      there can be parameters whose values only become known later that can
      be used to further prune partitions.
      
      This commit adds support for pruning subnodes of Append which cannot
      possibly contain any matching tuples, during execution, by evaluating
      Params to determine the minimum set of subnodes that can possibly match.
      We support more than just simple Params in WHERE clauses. Support
      additionally includes:
      
      1. Parameterized Nested Loop Joins: The parameter from the outer side of the
         join can be used to determine the minimum set of inner side partitions to
         scan.
      
      2. Initplans: Once an initplan has been executed we can then determine which
         partitions match the value from the initplan.
      
      Partition pruning is performed in two ways.  When Params external to the plan
      are found to match the partition key we attempt to prune away unneeded Append
      subplans during the initialization of the executor.  This allows us to bypass
      the initialization of non-matching subplans meaning they won't appear in the
      EXPLAIN or EXPLAIN ANALYZE output.
      
      For parameters whose value is only known during actual execution, the
      pruning of these subplans must wait.  Subplans which are eliminated
      during this stage of pruning are still visible in the EXPLAIN output.
      In order to determine if pruning has actually taken place, the
      EXPLAIN ANALYZE output must be viewed.  If a certain Append subplan was
      never executed due to the elimination of the partition then the
      execution timing area will state "(never executed)".  In other cases,
      for example with parameterized nested loops, the number of loops stated
      in the EXPLAIN ANALYZE output for certain subplans may appear lower than
      for others due to the subplan having been scanned fewer times.  This is
      due to the list of matching subnodes having to be re-evaluated whenever
      a parameter which was found to match the partition key changes.
      
      This commit required some additional infrastructure that permits the
      building of a data structure which is able to perform the translation of
      the matching partition IDs, as returned by get_matching_partitions, into
      the list index of a subpaths list, as exists in node types such as
      Append, MergeAppend and ModifyTable.  This allows us to translate a list
      of clauses into a Bitmapset of all the subpath indexes which must be
      included to satisfy the clause list.
      
      Author: David Rowley, based on an earlier effort by Beena Emerson
      Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
      Jesper Pedersen
      Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
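
The translation step described above, from matching partition indexes to the indexes of the subplans that must be scanned, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from this commit: a plain 64-bit mask stands in for a Bitmapset, and the names demo_parts_to_subplans and subplan_map are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: translate a set of matching partition indexes into
 * the set of subplan indexes that must be scanned.
 *
 * matching_parts: bit p is set when partition p may contain matching tuples.
 * subplan_map:    subplan_map[p] is the subplan index for partition p, or -1
 *                 when the partition has no subplan (assumed here to mean it
 *                 was already pruned at plan time).
 */
static uint64_t
demo_parts_to_subplans(uint64_t matching_parts, const int *subplan_map,
                       int nparts)
{
    uint64_t    subplans = 0;

    assert(nparts >= 0 && nparts <= 64);

    for (int p = 0; p < nparts; p++)
    {
        if ((matching_parts >> p) & 1)
        {
            int         s = subplan_map[p];

            /* Skip partitions that have no corresponding subplan. */
            if (s >= 0 && s < 64)
                subplans |= (uint64_t) 1 << s;
        }
    }
    return subplans;
}
```

With a map of {2, -1, 0, 1}, partitions 0, 2 and 3 translate to subplans 2, 0 and 1, while partition 1 contributes nothing.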
    • Add bms_prev_member function · 5c067521
      Alvaro Herrera authored
      This works very much like the existing bms_next_member function, only it
      traverses through the Bitmapset in the opposite direction from the most
      significant bit down to the least significant bit.  A special prevbit value of
      -1 may be used to have the function determine the most significant bit.  This
      is useful for starting a loop.  When there are no members less than prevbit,
      the function returns -2 to indicate there are no more members.
      
      Author: David Rowley
      Discussion: https://postgr.es/m/CAKJS1f-K=3d5MDASNYFJpUpc20xcBnAwNC1-AOeunhn0OtkWbQ@mail.gmail.com
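
The calling convention described above (-1 to start from the top, -2 when no members remain) can be illustrated with a small standalone sketch. It is not the PostgreSQL implementation: SimpleBitmap, bitmap_add and bitmap_prev_member are hypothetical names, the set is a fixed-size array, and the scan is a plain bit-by-bit loop chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_BITS_PER_WORD 64
#define DEMO_NWORDS 4

/* Simplified stand-in for a Bitmapset: a fixed array of 64-bit words. */
typedef struct SimpleBitmap
{
    uint64_t    words[DEMO_NWORDS];
} SimpleBitmap;

static void
bitmap_add(SimpleBitmap *b, int x)
{
    assert(x >= 0 && x < DEMO_NWORDS * DEMO_BITS_PER_WORD);
    b->words[x / DEMO_BITS_PER_WORD] |= (uint64_t) 1 << (x % DEMO_BITS_PER_WORD);
}

/*
 * Return the largest member smaller than prevbit, or -2 if there is none.
 * Passing prevbit == -1 starts the search above the highest possible bit.
 */
static int
bitmap_prev_member(const SimpleBitmap *b, int prevbit)
{
    if (prevbit == -1)
        prevbit = DEMO_NWORDS * DEMO_BITS_PER_WORD;

    for (int bit = prevbit - 1; bit >= 0; bit--)
    {
        uint64_t    mask = (uint64_t) 1 << (bit % DEMO_BITS_PER_WORD);

        if (b->words[bit / DEMO_BITS_PER_WORD] & mask)
            return bit;
    }
    return -2;
}

/* Loop idiom: visit all members from highest to lowest. */
static void
bitmap_print_descending(const SimpleBitmap *b)
{
    int         i = -1;

    while ((i = bitmap_prev_member(b, i)) >= 0)
        printf("%d\n", i);
}
```

Because -2 is negative, the loop condition `>= 0` ends the iteration without any separate emptiness check.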
    • Raise error when affecting tuple moved into different partition. · f16241be
      Andres Freund authored
      When an update moves a row between partitions (supported since
      2f178441), our normal logic for following update chains in READ
      COMMITTED mode doesn't work anymore. Cross partition updates are
      modeled as a delete from the old and an insert into the new
      partition. No ctid chain exists across partitions, and there's no
      convenient space to introduce that link.
      
      Not throwing an error in a partitioned context when one would have
      been thrown without partitioning is obviously problematic. This commit
      introduces infrastructure to detect when a tuple has been moved, not
      just plainly deleted. That allows an error to be thrown when
      encountering a deletion that's actually a move, while attempting to
      follow a ctid chain.
      
      The row deleted as part of a cross partition update is marked by
      pointing its t_ctid to an invalid block, instead of self as a normal
      update would.  That was deemed to be the least invasive and most
      future proof way to represent the knowledge, given how few infomask
      bits there are to be recycled (there are also some locking issues with
      using infomask bits).
      
      External code following ctid chains should be updated to check for
      moved tuples. The most likely consequence of not doing so is a missed
      error.
      
      Author: Amul Sul, editorialized by me
      Reviewed-By: Amit Kapila, Pavan Deolasee, Andres Freund, Robert Haas
      Discussion: http://postgr.es/m/CAAJ_b95PkwojoYfz0bzXU8OokcTVGzN6vYGCNVUukeUDrnF3dw@mail.gmail.com
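
For code that follows ctid chains, the check described above might look roughly like the sketch below. It is only an illustration of the idea of marking a moved row with an invalid block number: the types and names (DemoTid, DemoTuple, demo_mark_moved, demo_next_step) are hypothetical, and the all-ones invalid block value is an assumption of this sketch, not a statement about the actual macros this commit adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: an all-ones block number is never valid. */
#define DEMO_INVALID_BLOCK 0xFFFFFFFFu

/* Hypothetical, simplified tuple identifier: block number plus offset. */
typedef struct DemoTid
{
    uint32_t    block;
    uint16_t    offset;
} DemoTid;

/* Hypothetical tuple header: its own location and its forward link. */
typedef struct DemoTuple
{
    DemoTid     self;
    DemoTid     ctid;
} DemoTuple;

typedef enum DemoChainStep
{
    DEMO_CHAIN_END,             /* ctid points to self: nothing to follow */
    DEMO_CHAIN_FOLLOW,          /* ctid points to a newer row version */
    DEMO_CHAIN_MOVED            /* row was moved to another partition */
} DemoChainStep;

/* Mark a deleted row as moved by pointing its ctid at an invalid block. */
static void
demo_mark_moved(DemoTuple *tup)
{
    tup->ctid.block = DEMO_INVALID_BLOCK;
    tup->ctid.offset = 0;
}

/* Decide what a chain follower should do next; check for a move first. */
static DemoChainStep
demo_next_step(const DemoTuple *tup)
{
    if (tup->ctid.block == DEMO_INVALID_BLOCK)
        return DEMO_CHAIN_MOVED;
    if (tup->ctid.block == tup->self.block &&
        tup->ctid.offset == tup->self.offset)
        return DEMO_CHAIN_END;
    return DEMO_CHAIN_FOLLOW;
}
```

A caller that receives DEMO_CHAIN_MOVED would raise an error instead of treating the row as plainly deleted.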
    • Indexes with INCLUDE columns and their support in B-tree · 8224de4f
      Teodor Sigaev authored
      This patch introduces the INCLUDE clause to index definitions.  This clause
      specifies a list of columns which will be included as a non-key part in
      the index.  The INCLUDE columns exist solely to allow more queries to
      benefit from index-only scans.  Also, such columns don't need to have
      appropriate operator classes.  Expressions are not supported as INCLUDE
      columns since they cannot be used in index-only scans.
      
      Index access methods supporting INCLUDE are indicated by the amcaninclude
      flag in IndexAmRoutine.  For now, only B-tree indexes support the INCLUDE
      clause.
      
      In B-tree indexes INCLUDE columns are truncated from pivot index tuples
      (tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
      might now have a variable number of attributes.  This patch also provides a
      generic facility to support that: pivot tuples store the number of their
      attributes in t_tid.ip_posid.  The free 13th bit of t_info is used to
      indicate that.  This facility will simplify further support of index suffix
      truncation.  The changes above are backward-compatible; pg_upgrade doesn't
      need special handling of B-tree indexes for that.
      
      Bump catalog version
      
      Author: Anastasia Lubennikova with contributions by Alexander Korotkov and me
      Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
                   David Rowley, Alexander Korotkov
      Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
    • Make test of json(b)_to_tsvector language-independent · 01bb8516
      Teodor Sigaev authored
      Missed in commit 1c1791e0
    • Add json(b)_to_tsvector function · 1c1791e0
      Teodor Sigaev authored
      Jsonb has a complex nature, so there isn't a best-for-everything way to
      convert it to tsvector for full text search. The current
      to_tsvector(json(b)) converts only string values, but it's possible to index
      keys, numerics and even boolean values. To solve that, json(b)_to_tsvector
      has a second required argument containing a list of the desired types of
      json fields. The second argument is a jsonb scalar or array right now, with
      the possibility of adding new options in the future.
      
      Bump catalog version
      
      Author: Dmitry Dolgov with some editorization by me
      Reviewed by: Teodor Sigaev
      Discussion: https://www.postgresql.org/message-id/CA+q6zcXJQbS1b4kJ_HeAOoOc=unfnOrUEL=KGgE32QKDww7d8g@mail.gmail.com
    • Fix timing issue in new subscription truncate test · 529ab7bd
      Peter Eisentraut authored
      We need to wait for the initial sync of all subscriptions.  On
      some (faster?) machines, this didn't make a difference, but
      the (slower?) buildfarm machines are upset.
    • Deactivate flapping checksum isolation tests. · bf75fe47
      Andres Freund authored
      They've been broken for days, and prevent other tests from being
      run. The plan is to revert their addition later.
      
      Discussion: https://postgr.es/m/20180407162252.wfo5aorjrjw2n5ws@alap3.anarazel.de
    • Logical replication support for TRUNCATE · 039eb6e9
      Peter Eisentraut authored
      Update the built-in logical replication system to make use of the
      previously added logical decoding for TRUNCATE support.  Add the
      required truncate callback to pgoutput and a new logical replication
      protocol message.
      
      Publications get a new attribute to determine whether to replicate
      truncate actions.  When updating a publication via pg_dump from an older
      version, this is not set, thus preserving the previous behavior.
      
      Author: Simon Riggs <simon@2ndquadrant.com>
      Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
      Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
      Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
    • Logical decoding of TRUNCATE · 5dfd1e5a
      Peter Eisentraut authored
      Add a new WAL record type for TRUNCATE, which is only used when
      wal_level >= logical.  (For physical replication, TRUNCATE is already
      replicated via SMGR records.)  Add new callback for logical decoding
      output plugins to receive TRUNCATE actions.
      
      Author: Simon Riggs <simon@2ndquadrant.com>
      Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
      Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
      Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reviewed-by: Andres Freund <andres@anarazel.de>
      Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
    • Predicate locking in hash indexes. · b508a56f
      Teodor Sigaev authored
      Hash index searches acquire predicate locks on the primary
      page of a bucket. They acquire a lock on both the old and new buckets
      for scans that happen concurrently with page splits. During a bucket
      split, a predicate lock is copied from the primary page of an old
      bucket to the primary page of a new bucket.
      
      Author: Shubham Barai, Amit Kapila
      Reviewed by: Amit Kapila, Alexander Korotkov, Thomas Munro
      Discussion: https://www.postgresql.org/message-id/flat/CALxAEPvNsM2GTiXdRgaaZ1Pjd1bs+sxfFsf7Ytr+iq+5JJoYXA@mail.gmail.com
    • Document partprune.c a little better · 971d7ddb
      Alvaro Herrera authored
      Author: Amit Langote
      Reviewed-by: Álvaro Herrera, David Rowley
      Discussion: https://postgr.es/m/CA+HiwqGzq4D6z=8R0AP+XhbTFCQ-4Ct+t2ekqjE9Fpm84_JUGg@mail.gmail.com
    • Blindly attempt to fix sepgsql tests broken due to 9fdb675f. · 4f813c72
      Andres Freund authored
      The failure appears to solely be caused by the changed partition
      pruning logic.
      
      Author: Andres Freund
      Discussion: https://postgr.es/m/20180406210330.wmqw42wqgiicktli@alap3.anarazel.de
    • Attempt to fix endianness issues in new hash partition test. · 40e42e10
      Andres Freund authored
      The tests added as part of 9fdb675f yield differing results
      depending on endianness, causing buildfarm failures. As the differences
      are expected, split the hash partitioning tests into a different file
      and maintain alternative output. The separate file is so the amount of
      duplicated output is reduced.
      
      David produced the alternative output without a machine to test on, so
      it's possible this'll require a buildfarm cycle or two to get right.
      
      Author: David Rowley
      Discussion: https://postgr.es/m/CAKJS1f-6f4c2Qhuipe-GY7BKmFd0FMBobRnLS7hVCoAmTszsBg@mail.gmail.com
    • Fix and improve pg_atomic_flag fallback implementation. · 8c3debbb
      Andres Freund authored
      The atomics fallback implementation for pg_atomic_flag was broken,
      returning the inverted value from pg_atomic_test_set_flag().  This was
      unnoticed because a) atomic flags were unused until recently and b) the
      test code wasn't run when the fallback implementation was in
      use (because it didn't allow testing of some edge cases).
      
      Fix the bug, and improve the fallback so it has the same behaviour as
      the non-fallback implementation in the problematic edge cases. That
      breaks ABI compatibility in the back branches when fallbacks are in
      use, but given they were broken until now...
      
      Author: Andres Freund
      Reported-by: Daniel Gustafsson
      Discussion:
          https://postgr.es/m/FB948276-7B32-4B77-83E6-D00167F8EEB4@yesql.se
          https://postgr.es/m/20180406233854.uni2h3mbnveczl32@alap3.anarazel.de
      Backpatch: 9.5-, where the atomics abstraction was introduced.
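
To make the "inverted value" problem concrete, here is a minimal sketch of the return convention a test-and-set flag is generally expected to follow. It is not the fallback implementation fixed by this commit: it is built on C11 atomic_flag, the demo_* names are hypothetical, and it assumes the convention that the call returns true when it changed the flag from clear to set.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wrapper type around a C11 atomic flag. */
typedef struct demo_flag
{
    atomic_flag value;
} demo_flag;

/* Put the flag into the clear state. */
static void
demo_init_flag(demo_flag *f)
{
    atomic_flag_clear(&f->value);
}

/*
 * Try to set the flag.  Returns true if this call set it (it was clear
 * before), false if it was already set.  atomic_flag_test_and_set()
 * returns the previous state, so the result has to be negated; forgetting
 * the negation yields exactly an inverted return value.
 */
static bool
demo_test_set_flag(demo_flag *f)
{
    return !atomic_flag_test_and_set(&f->value);
}

/* Clear the flag so that a later test-and-set can succeed again. */
static void
demo_clear_flag(demo_flag *f)
{
    atomic_flag_clear(&f->value);
}
```

Under this convention the first demo_test_set_flag call on a clear flag succeeds, a second call fails, and a call after demo_clear_flag succeeds again.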
    • Doc: fix broken markup. · eb2a0e00
      Tom Lane authored
      Commit 3d956d95 was apparently not checked against HEAD's doc toolchain.
      Per buildfarm.
  2. 06 Apr, 2018 18 commits
  3. 05 Apr, 2018 6 commits