1. 08 Apr, 2016 2 commits
    • Revert CREATE INDEX ... INCLUDING ... · 8b99edef
      Teodor Sigaev authored
      It's not ready yet; revert the two commits:
      690c5435 - unstable test output
      386e3d76 - patch itself
    • CREATE INDEX ... INCLUDING (column[, ...]) · 386e3d76
      Teodor Sigaev authored
      Indexes (B-tree only, for now) can now contain "extra" columns that
      don't participate in the index structure; they are just stored in leaf
      tuples.  This makes it possible to satisfy an index-only scan with a
      single index instead of two or more indexes.
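
      A sketch using this commit's syntax (table and column names
      hypothetical; note the commit was reverted, see the entry above):

      	CREATE TABLE orders (id int, customer int, amount numeric);
      	CREATE UNIQUE INDEX orders_id_idx ON orders (id) INCLUDING (customer);
      	-- "customer" is available to index-only scans, but is not a key column
      	SELECT id, customer FROM orders WHERE id = 42;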
      
      Author: Anastasia Lubennikova with minor editorializing by me
      Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
  2. 07 Apr, 2016 1 commit
    • In pg_dump, include pg_catalog and extension ACLs, if changed · 23f34fa4
      Stephen Frost authored
      Now that all of the infrastructure exists, add in the ability to
      dump out the ACLs of the objects inside of pg_catalog or the ACLs
      for objects which are members of extensions, but only if they have
      been changed from their original values.
      
      The original values are tracked in pg_init_privs.  When pg_dump'ing
      9.6-and-above databases, we will dump out the ACLs for all objects
      in pg_catalog and the ACLs for all extension members, where the ACL
      has been changed from the original value which was set during either
      initdb or CREATE EXTENSION.
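
      For instance (role name hypothetical), after a change such as

      	GRANT EXECUTE ON FUNCTION pg_ls_dir(text) TO monitor;

      the ACL on pg_ls_dir() differs from the value recorded in
      pg_init_privs, so pg_dump will now emit it.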
      
      This should not change dumps against pre-9.6 databases.
      
      Reviews by Alexander Korotkov, Jose Luis Tallon
  3. 25 Mar, 2016 1 commit
    • Fix DROP OPERATOR to reset oprcom/oprnegate links to the dropped operator. · c94959d4
      Tom Lane authored
      This avoids leaving dangling links in pg_operator, which, while
      fairly harmless, are unsightly.
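
      A sketch of the failure mode being fixed (operator names made up):

      	CREATE OPERATOR <<< (PROCEDURE = int4lt, LEFTARG = int,
      	                     RIGHTARG = int, COMMUTATOR = >>>);
      	CREATE OPERATOR >>> (PROCEDURE = int4gt, LEFTARG = int,
      	                     RIGHTARG = int, COMMUTATOR = <<<);
      	DROP OPERATOR >>> (int, int);
      	-- previously <<<'s oprcom still pointed at the dropped operator;
      	-- with this fix it is reset to zero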
      
      While we're at it, simplify OperatorUpd, which went through
      heap_modify_tuple for no very good reason considering it had already made
      a tuple copy it could just scribble on.
      
      Roma Sokolov, reviewed by Tomas Vondra, additional hacking by Robert Haas
      and myself.
  4. 24 Mar, 2016 1 commit
  5. 05 Feb, 2016 1 commit
    • Add num_nulls() and num_nonnulls() to count NULL arguments. · 6819514f
      Tom Lane authored
      An example use-case is "CHECK(num_nonnulls(a,b,c) = 1)" to assert that
      exactly one of a,b,c isn't NULL.  The functions are variadic, so they
      can also be pressed into service to count the number of null or nonnull
      elements in an array.
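
      For example:

      	SELECT num_nonnulls(1, NULL, 'two');           -- 2
      	SELECT num_nulls(VARIADIC ARRAY[1, NULL, 3]);  -- 1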
      
      Marko Tiikkaja, reviewed by Pavel Stehule
  6. 27 Nov, 2015 1 commit
  7. 06 Nov, 2015 1 commit
  8. 08 Oct, 2015 1 commit
    • Factor out encoding specific tests for json · b6363772
      Andrew Dunstan authored
      This lets us remove the large alternative results files for the main
      json and jsonb tests, which makes modifying those tests simpler for
      committers and patch submitters.
      
      Backpatch to 9.4 for jsonb and 9.3 for json.
  9. 21 Aug, 2015 1 commit
    • In AlterRole, make bypassrls an int · 7ec8296e
      Stephen Frost authored
      When reworking bypassrls in AlterRole to operate the same way the other
      attribute handling is done, I missed that the variable was incorrectly a
      bool rather than an int.  This meant that on platforms with an unsigned
      char, we could end up with incorrect behavior during ALTER ROLE.
      
      Pointed out by Andres, thanks to tests he did changing our bool to
      the one from stdbool.h, which showed this and a number of other
      issues.
      
      Add regression tests to test CREATE/ALTER role for the various role
      attributes.  Arrange to leave roles behind for testing pg_dumpall, but
      none which have the LOGIN attribute.
      
      Back-patch to 9.5 where the AlterRole bug exists.
  10. 14 Jul, 2015 1 commit
  11. 16 May, 2015 1 commit
    • Support GROUPING SETS, CUBE and ROLLUP. · f3d31185
      Andres Freund authored
      This SQL-standard functionality allows data to be aggregated by
      several different GROUP BY clauses at once.  Each grouping set returns
      rows in which the columns grouped by the other sets are set to NULL.
      
      This could previously be achieved by doing each grouping as a separate
      query, conjoined by UNION ALLs. Besides being considerably more concise,
      grouping sets will in many cases be faster, requiring only one scan over
      the underlying data.
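
      For example (hypothetical table), the following query:

      	SELECT brand, size, sum(sales)
      	FROM items_sold
      	GROUP BY GROUPING SETS ((brand), (size), ());

      is roughly equivalent to a UNION ALL of three separately grouped
      queries, but needs only one scan of items_sold.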
      
      The current implementation of grouping sets only supports using sorting
      for input. Individual sets that share a sort order are computed in one
      pass. If there are sets that don't share a sort order, additional sort &
      aggregation steps are performed. These additional passes are sourced by
      the previous sort step; thus avoiding repeated scans of the source data.
      
      The code is structured in a way that adding support for purely using
      hash aggregation or a mix of hashing and sorting is possible. Sorting
      was chosen to be supported first, as it is the most generic method of
      implementation.
      
      Instead of, as in earlier versions of the patch, representing the
      chain of sort and aggregation steps as full-blown planner and executor
      nodes, all but the first sort are performed inside the aggregation
      node itself.  This avoids the need for unusual gymnastics to handle
      returning aggregated and non-aggregated tuples from underlying nodes,
      as well as having to shut down underlying nodes early to limit memory
      usage.  The optimizer still builds a Sort/Agg node to describe each
      phase, but these are not part of the plan tree; they are additional
      data for the aggregation node, and a convenient, preexisting way to
      describe aggregation and sorting.  The first (and possibly only) sort
      step is still performed as a separate execution step.  That retains
      similarity with existing GROUP BY plans, makes rescans fairly simple,
      avoids very deep plans (which lead to slow EXPLAINs), and makes it
      easy to skip the sorting step when the underlying data is already
      sorted by other means.
      
      A somewhat ugly side of this patch is having to deal with a grammar
      ambiguity between the new CUBE keyword and the cube extension/functions
      named cube (and rollup). To avoid breaking existing deployments of the
      cube extension it has not been renamed, neither has cube been made a
      reserved keyword. Instead precedence hacking is used to make GROUP BY
      cube(..) refer to the CUBE grouping sets feature, and not the function
      cube(). To actually group by a function cube(), unlikely as that might
      be, the function name has to be quoted.
      
      Needs a catversion bump because stored rules may change.
      
      Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
      Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
          Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
      Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
  12. 15 May, 2015 1 commit
    • TABLESAMPLE, SQL Standard and extensible · f6d208d6
      Simon Riggs authored
      Add a TABLESAMPLE clause to SELECT statements that allows the
      user to specify random BERNOULLI sampling or block-level
      SYSTEM sampling.  The implementation allows extensible
      sampling functions to be written, using a standard API.
      The basic version follows the SQL standard exactly.  Concrete
      use cases for the sampling API follow in later commits.
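
      Example usage (table name hypothetical; the argument is a sampling
      percentage):

      	SELECT * FROM orders TABLESAMPLE SYSTEM (1);
      	SELECT * FROM orders TABLESAMPLE BERNOULLI (10) REPEATABLE (42);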
      
      Petr Jelinek
      
      Reviewed by Michael Paquier and Simon Riggs
  13. 08 May, 2015 1 commit
    • Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE. · 168d5805
      Andres Freund authored
      The newly added ON CONFLICT clause allows specifying an alternative to
      raising a unique or exclusion constraint violation error when
      inserting.  ON CONFLICT refers to constraints that can either be
      specified using an inference clause (by specifying the columns of a
      unique constraint) or by naming a unique or exclusion constraint.  DO
      NOTHING avoids the constraint violation, without touching the
      pre-existing row.  DO UPDATE SET ... [WHERE ...] updates the
      pre-existing tuple, and has access to both the tuple proposed for
      insertion and the existing tuple; the optional WHERE clause can be
      used to prevent an update from being executed.  The UPDATE SET and
      WHERE clauses have access to the tuple proposed for insertion using
      the "magic" EXCLUDED alias, and to the pre-existing tuple using the
      table name or its alias.
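
      A minimal sketch (table name hypothetical) showing both the EXCLUDED
      alias and the optional WHERE clause:

      	CREATE TABLE counters (key text PRIMARY KEY, value int);

      	INSERT INTO counters AS c (key, value)
      	VALUES ('hits', 1)
      	ON CONFLICT (key) DO UPDATE
      	    SET value = c.value + EXCLUDED.value
      	    WHERE c.value < 1000;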
      
      This feature is often referred to as upsert.
      
      This is implemented using a new infrastructure called "speculative
      insertion". It is an optimistic variant of regular insertion that first
      does a pre-check for existing tuples and then attempts an insert.  If a
      violating tuple was inserted concurrently, the speculatively inserted
      tuple is deleted and a new attempt is made.  If the pre-check finds a
      matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
      If the insertion succeeds without detecting a conflict, the tuple is
      deemed inserted.
      
      To handle the possible ambiguity between the EXCLUDED alias and a
      table named excluded, and for convenience with long relation names,
      INSERT INTO can now alias its target table.
      
      Bumps catversion as stored rules change.
      
      Author: Peter Geoghegan, with significant contributions from Heikki
          Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
      Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
          Dean Rasheed, Stephen Frost and many others.
  14. 09 Mar, 2015 1 commit
    • Allow CURRENT/SESSION_USER to be used in certain commands · 31eae602
      Alvaro Herrera authored
      Commands such as ALTER USER, ALTER GROUP, ALTER ROLE, GRANT, and the
      various ALTER OBJECT / OWNER TO, as well as ad-hoc clauses related to
      roles such as the AUTHORIZATION clause of CREATE SCHEMA, the FOR clause
      of CREATE USER MAPPING, and the FOR ROLE clause of ALTER DEFAULT
      PRIVILEGES can now take the keywords CURRENT_USER and SESSION_USER as
      user specifiers in place of an explicit user name.
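
      For example (table name hypothetical):

      	ALTER TABLE mytab OWNER TO CURRENT_USER;
      	GRANT SELECT ON mytab TO SESSION_USER;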
      
      This commit also fixes some quite ugly handling of special standards-
      mandated syntax in CREATE USER MAPPING, which in particular would fail
      to work in the presence of a role named "current_user".
      
      The special role specifiers PUBLIC and NONE also have more consistent
      handling now.
      
      Also take the opportunity to add location tracking to user specifiers.
      
      Authors: Kyotaro Horiguchi.  Heavily reworked by Álvaro Herrera.
      Reviewed by: Rushabh Lathia, Adam Brightwell, Marti Raudsepp.
  15. 08 Jan, 2015 1 commit
    • Move rowsecurity event trigger test · c219cbfe
      Stephen Frost authored
      The event trigger test for rowsecurity can cause problems for other
      tests which are run in parallel with it.  Instead of running that test
      in the rowsecurity set, move it to the event_trigger set, which runs
      isolated from other tests.
      
      Also reverts 7161b082, which moved rowsecurity into its own test group.
      That's no longer necessary, now that the event trigger test is gone from
      the rowsecurity set of tests.
      
      Pointed out by Tom.
  16. 31 Dec, 2014 1 commit
  17. 23 Dec, 2014 1 commit
    • Add SQL-callable pg_get_object_address · d7ee82e5
      Alvaro Herrera authored
      This allows access to get_object_address from SQL, which is useful to
      obtain OID addressing information from data equivalent to that emitted
      by the parser.  This is necessary infrastructure for a project to let
      replication systems propagate object-dropping events to remote servers,
      where the schema might be different from that of the server originating
      the DROP.
      
      This patch also adds support for OBJECT_DEFAULT to get_object_address;
      that is, it is now possible to refer to a column's default value.
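
      Example (table name hypothetical):

      	SELECT * FROM pg_get_object_address('table', '{public,mytab}', '{}');
      	-- returns the classid, objid and objsubid of the table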
      
      Catalog version bumped due to the new function.
      
      Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres
      Freund, Abhijit Menon-Sen, Adam Brightwell.
  18. 07 Dec, 2014 1 commit
    • Event Trigger for table_rewrite · 618c9430
      Simon Riggs authored
      Generate a table_rewrite event when ALTER TABLE
      attempts to rewrite a table.  Provide helper
      functions to identify the table and the reason.
      
      Intended use case is to help assess or to react
      to schema changes that might hold exclusive locks
      for long periods.
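
      A sketch using the helper functions (trigger and function names
      made up):

      	CREATE FUNCTION log_rewrite() RETURNS event_trigger
      	LANGUAGE plpgsql AS $$
      	BEGIN
      	    RAISE NOTICE 'rewriting % (reason code %)',
      	        pg_event_trigger_table_rewrite_oid()::regclass,
      	        pg_event_trigger_table_rewrite_reason();
      	END;
      	$$;

      	CREATE EVENT TRIGGER log_rewrites ON table_rewrite
      	    EXECUTE PROCEDURE log_rewrite();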
      
      Dimitri Fontaine, triggering an edit by Simon Riggs
      
      Reviewed in detail by Michael Paquier
  19. 19 Nov, 2014 1 commit
    • Add test cases for indexam operations not currently covered. · 88fc7192
      Heikki Linnakangas authored
      That includes VACUUM on GIN, GiST and SP-GiST indexes, and B-tree
      indexes large enough to cause page deletions.  Plus some other
      special cases.
      
      After this patch, the regression tests generate all different WAL record
      types. Not all branches within the redo functions are covered, but it's a
      step forward.
  20. 07 Nov, 2014 1 commit
    • BRIN: Block Range Indexes · 7516f525
      Alvaro Herrera authored
      BRIN is a new index access method intended to accelerate scans of very
      large tables, without the maintenance overhead of btrees or other
      traditional indexes.  It works by maintaining "summary" data about
      block ranges.  Bitmap index scans work by reading each summary tuple
      and comparing it with the query quals; all pages in the range are
      returned in a lossy TID bitmap if the quals are consistent with the
      values in the summary tuple, otherwise not.  Normal index scans are
      not supported because these indexes do not store TIDs.
      
      As new tuples are added into the index, the summary information is
      updated (if the block range in which the tuple is added is already
      summarized) or not; in the latter case, a subsequent pass of VACUUM or
      the brin_summarize_new_values() function will create the summary
      information.
      
      For data types with natural 1-D sort orders, the summary info consists
      of the maximum and the minimum values of each indexed column within each
      page range.  We call this type of operator class "Minmax", and we
      supply a bunch of them for most data types with B-tree opclasses.
      Since the BRIN code is generalized, other approaches are possible for
      things such as arrays, geometric types, ranges, etc; even for things
      such as enum types we could do something different than minmax with
      better results.  In this commit I only include minmax.
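
      For example (table and column hypothetical):

      	CREATE INDEX events_created_brin ON events USING brin (created_at);
      	SELECT brin_summarize_new_values('events_created_brin'::regclass);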
      
      Catalog version bumped due to new builtin catalog entries.
      
      There's more that could be done here, but this is a good step forwards.
      
      Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
      with contribution by Heikki Linnakangas.
      
      Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
      Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.
      
      PS:
        The research leading to these results has received funding from the
        European Union's Seventh Framework Programme (FP7/2007-2013) under
        grant agreement n° 318633.
  21. 01 Oct, 2014 1 commit
    • Fix some more problems with nested append relations. · 5a6c168c
      Tom Lane authored
      As of commit a87c7291 (which later got backpatched as far as 9.1),
      we're explicitly supporting the notion that append relations can be
      nested; this can occur when UNION ALL constructs are nested, or when
      a UNION ALL contains a table with inheritance children.
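
      For instance (hypothetical tables, where child_tab inherits from
      parent_tab), the UNION ALL member parent_tab expands into a nested
      append relation of its own:

      	SELECT x FROM parent_tab
      	UNION ALL
      	SELECT x FROM other_tab;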
      
      Bug #11457 from Nelson Page, as well as an earlier report from Elvis
      Pranskevichus, showed that there were still nasty bugs associated with such
      cases: in particular the EquivalenceClass mechanism could try to generate
      "join" clauses connecting an appendrel child to some grandparent appendrel,
      which would result in assertion failures or bogus plans.
      
      Upon investigation I concluded that all current callers of
      find_childrel_appendrelinfo() need to be fixed to explicitly consider
      multiple levels of parent appendrels.  The most complex fix was in
      processing of "broken" EquivalenceClasses, which are ECs for which we have
      been unable to generate all the derived equality clauses we would like to
      because of missing cross-type equality operators in the underlying btree
      operator family.  That code path is more or less entirely untested by
      the regression tests to date, because no standard opfamilies have such
      holes in them.  So I wrote a new regression test script to try to exercise
      it a bit, which turned out to be quite a worthwhile activity as it exposed
      existing bugs in all supported branches.
      
      The present patch is essentially the same as far back as 9.2, which is
      where parameterized paths were introduced.  In 9.0 and 9.1, we only need
      to back-patch a small fragment of commit 5b7b5518, which fixes failure to
      propagate out the original WHERE clauses when a broken EC contains constant
      members.  (The regression test case results show that these older branches
      are noticeably stupider than 9.2+ in terms of the quality of the plans
      generated; but we don't really care about plan quality in such cases,
      only that the plan not be outright wrong.  A more invasive fix in the
      older branches would not be a good idea anyway from a plan-stability
      standpoint.)
  22. 19 Sep, 2014 1 commit
    • Row-Level Security Policies (RLS) · 491c029d
      Stephen Frost authored
      Building on the updatable security-barrier views work, add the
      ability to define policies on tables to limit the set of rows
      which are returned from a query and which are allowed to be added
      to a table.  Expressions defined by the policy for filtering are
      added to the security barrier quals of the query, while expressions
      defined to check records being added to a table are added to the
      with-check options of the query.
      
      New top-level commands are CREATE/ALTER/DROP POLICY and are
      controlled by the table owner.  Row security can be enabled
      and disabled by the owner on a per-table basis using
      ALTER TABLE .. ENABLE/DISABLE ROW SECURITY.
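
      A minimal sketch (table and column hypothetical):

      	ALTER TABLE accounts ENABLE ROW SECURITY;
      	CREATE POLICY account_owner ON accounts
      	    USING (owner = current_user)
      	    WITH CHECK (owner = current_user);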
      
      Per discussion, ROW SECURITY is disabled on tables by default and
      must be enabled for policies on the table to be used.  If no
      policies exist on a table with ROW SECURITY enabled, a default-deny
      policy is used and no records will be visible.
      
      By default, row security is applied at all times except for the
      table owner and the superuser.  A new GUC, row_security, is added
      which can be set to ON, OFF, or FORCE.  When set to FORCE, row
      security will be applied even for the table owner and superusers.
      When set to OFF, row security will be disabled when allowed and an
      error will be thrown if the user does not have rights to bypass row
      security.
      
      Per discussion, pg_dump sets row_security = OFF by default to ensure
      that exports and backups will have all data in the table or will
      error if there are insufficient privileges to bypass row security.
      A new option has been added to pg_dump, --enable-row-security, to
      ask pg_dump to export with row security enabled.
      
      A new role capability, BYPASSRLS, which can only be set by the
      superuser, is added to allow other users to be able to bypass row
      security using row_security = OFF.
      
      Many thanks to the various individuals who have helped with the
      design, particularly Robert Haas for his feedback.
      
      Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean
      Rasheed, with additional changes and rework by me.
      
      Reviewers have included all of the above, Greg Smith,
      Jeff McCormick, and Robert Haas.
  23. 08 Apr, 2014 1 commit
    • Add new to_reg* functions for error-free OID lookups. · 0886fc6a
      Robert Haas authored
      These functions won't throw an error if the object doesn't exist,
      or if (for functions and operators) there's more than one matching
      object.
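
      For example:

      	SELECT to_regclass('no_such_table');  -- NULL instead of an error
      	SELECT to_regproc('max');             -- NULL: multiple matches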
      
      Yugo Nagata and Nozomi Anzai, reviewed by Amit Khandekar, Marti
      Raudsepp, Amit Kapila, and me.
  24. 23 Mar, 2014 1 commit
    • Introduce jsonb, a structured format for storing json. · d9134d0a
      Andrew Dunstan authored
      The new format accepts exactly the same data as the json type.  However,
      it is stored in a format that does not require reparsing the original
      text in order to process it, making it much more suitable for indexing
      and other operations.  Insignificant whitespace is discarded, and the
      order of object keys is not preserved.  Nor are duplicate object keys
      kept; only the later value for a given key is stored.
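
      For example:

      	SELECT '{"a": 1, "a": 2}'::jsonb;  -- {"a": 2}: the later value wins
      	SELECT '{"b": 1, "a": 2}'::jsonb;  -- {"a": 2, "b": 1}: key order changes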
      
      The new type has all the functions and operators that the json type has,
      with the exception of the json generation functions (to_json, json_agg etc.)
      and with identical semantics. In addition, there are operator classes for
      hash and btree indexing, and two classes for GIN indexing, that have no
      equivalent in the json type.
      
      This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
      was intended to provide similar facilities to a nested hstore type, but which
      in the end proved to have some significant compatibility issues.
      
      Authors: Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
      Review: Andres Freund
  25. 19 Feb, 2014 1 commit
  26. 08 Nov, 2013 1 commit
    • Add the notion of REPLICA IDENTITY for a table. · 07cacba9
      Robert Haas authored
      Pending patches for logical replication will use this to determine
      which columns of a tuple ought to be considered as its candidate key.
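
      Syntax sketch (table and index names hypothetical):

      	ALTER TABLE mytab REPLICA IDENTITY FULL;
      	ALTER TABLE mytab REPLICA IDENTITY USING INDEX mytab_key_idx;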
      
      Andres Freund, with minor, mostly cosmetic adjustments by me
  27. 10 Oct, 2013 1 commit
  28. 15 Jul, 2013 1 commit
  29. 03 Jul, 2013 1 commit
  30. 02 Jul, 2013 1 commit
    • Add support for multiple kinds of external toast datums. · 36820250
      Robert Haas authored
      To that end, support tags rather than lengths for external datums.
      As an example of how this can be used, add support for "indirect"
      tuples, which point to some externally allocated memory containing
      a toast tuple.  Similar infrastructure could be used for other
      purposes, including, perhaps, support for alternative compression
      algorithms.
      
      Andres Freund, reviewed by Hitoshi Harada and myself
  31. 04 Mar, 2013 1 commit
    • Add materialized view relations. · 3bf3ab8c
      Kevin Grittner authored
      A materialized view has a rule just like a view, and a heap and
      other physical properties like a table.  The rule is only used to
      populate the table; references in queries refer to the
      materialized data.
      
      This is a minimal implementation, but should still be useful in
      many cases.  Currently data is only populated "on demand" by the
      CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
      It is expected that future releases will add incremental updates
      with various timings, and that a more refined concept of defining
      what is "fresh" data will be developed.  At some point it may even
      be possible to have queries use a materialized view in place of
      references to underlying tables, but that requires the other
      above-mentioned features to be working first.
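
      For example (source table hypothetical):

      	CREATE MATERIALIZED VIEW order_totals AS
      	    SELECT customer_id, sum(amount) AS total
      	    FROM orders GROUP BY customer_id;

      	REFRESH MATERIALIZED VIEW order_totals;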
      
      Much of the documentation work by Robert Haas.
      Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
      Security review by KaiGai Kohei, with a decision on how best to
      implement sepgsql still pending.
  32. 02 Feb, 2013 1 commit
  33. 08 Dec, 2012 1 commit
    • Support automatically-updatable views. · a99c42f2
      Tom Lane authored
      This patch makes "simple" views automatically updatable, without the need
      to create either INSTEAD OF triggers or INSTEAD rules.  "Simple" views
      are those classified as updatable according to SQL-92 rules.  The rewriter
      transforms INSERT/UPDATE/DELETE commands on such views directly into an
      equivalent command on the underlying table, which will generally have
      noticeably better performance than is possible with either triggers or
      user-written rules.  A view that has INSTEAD OF triggers or INSTEAD rules
      continues to operate the same as before.
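
      For example (names hypothetical), this now works without any trigger
      or rule:

      	CREATE TABLE items (id int PRIMARY KEY, name text);
      	CREATE VIEW items_v AS SELECT id, name FROM items;
      	INSERT INTO items_v VALUES (1, 'widget');
      	UPDATE items_v SET name = 'gadget' WHERE id = 1;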
      
      For the moment, security_barrier views are not considered simple.
      Also, we do not support WITH CHECK OPTION.  These features may be
      added in future.
      
      Dean Rasheed, reviewed by Amit Kapila
  34. 15 Oct, 2012 1 commit
  35. 29 Sep, 2012 1 commit
    • Add alternative expected output for alter_generic · 811ca130
      Alvaro Herrera authored
      The original (and only) expected file failed to consider machines
      without non-default collation support.  Per buildfarm.
      
      Also, move the test to another parallel group; the one it was originally
      put in is already full according to comments in the schedule file.  Per
      note from Tom Lane.
  36. 28 Sep, 2012 1 commit
  37. 18 Jul, 2012 1 commit
    • Syntax support and documentation for event triggers. · 3855968f
      Robert Haas authored
      They don't actually do anything yet; that will get fixed in a
      follow-on commit.  But this gets the basic infrastructure in place,
      including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
      SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
      pg_dump and psql support; and documentation for the anticipated
      initial feature set.
      
      Dimitri Fontaine, with review and a bunch of additional hacking by me.
      Thom Brown extensively reviewed earlier versions of this patch set,
      but there's not a whole lot of that code left in this commit, as it
      turns out.
  38. 20 Feb, 2012 1 commit
    • Fix regex back-references that are directly quantified with *. · 5223f96d
      Tom Lane authored
      The syntax "\n*", that is a backref with a * quantifier directly applied
      to it, has never worked correctly in Spencer's library.  This has been an
      open bug in the Tcl bug tracker since 2005:
      https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894
      
      The core of the problem is in parseqatom(), which first changes "\n*" to
      "\n+|" and then applies repeat() to the NFA representing the backref atom.
      repeat() thinks that any arc leading into its "rp" argument is part of the
      sub-NFA to be repeated.  Unfortunately, since parseqatom() already created
      the arc that was intended to represent the empty bypass around "\n+", this
      arc gets moved too, so that it now leads into the state loop created by
      repeat().  Thus, what was supposed to be an "empty" bypass gets turned into
      something that represents zero or more repetitions of the NFA representing
      the backref atom.  In the original example, in place of
      	^([bc])\1*$
      we now have something that acts like
      	^([bc])(\1+|[bc]*)$
      At runtime, the branch involving the actual backref fails, as it's supposed
      to, but then the other branch succeeds anyway.
      
      We could no doubt fix this by some rearrangement of the operations in
      parseqatom(), but that code is plenty ugly already, and what's more the
      whole business of converting "x*" to "x+|" probably needs to go away to fix
      another problem I'll mention in a moment.  Instead, this patch suppresses
      the *-conversion when the target is a simple backref atom, leaving the case
      of m == 0 to be handled at runtime.  This makes the patch in regcomp.c a
      one-liner, at the cost of having to tweak cbrdissect() a little.  In the
      event I went a bit further than that and rewrote cbrdissect() to check all
      the string-length-related conditions before it starts comparing characters.
      It seems a bit stupid to possibly iterate through many copies of an
      n-character backreference, only to fail at the end because the target
      string's length isn't a multiple of n --- we could have found that out
      before starting.  The existing coding could only be a win if integer
      division is hugely expensive compared to character comparison, but I don't
      know of any modern machine where that might be true.
      
      This does not fix all the problems with quantified back-references.  In
      particular, the code is still broken for back-references that appear within
      a larger expression that is quantified (so that direct insertion of the
      quantification limits into the BACKREF node doesn't apply).  I think fixing
      that will take some major surgery on the NFA code, specifically introducing
      an explicit iteration node type instead of trying to transform iteration
      into concatenation of modified regexps.
      
      Back-patch to all supported branches.  In HEAD, also add a regression test
      case for this.  (It may seem a bit silly to create a regression test file
      for just one test case; but I'm expecting that we will soon import a whole
      bunch of regex regression tests from Tcl, so might as well create the
      infrastructure now.)
  39. 15 Feb, 2012 1 commit