1. 16 Jan, 2020 4 commits
    • Tom Lane's avatar
      Update header comments for wchar.c and encnames.c. · 3d4cb5d6
      Tom Lane authored
      Bring these into common style (including having proper copyright
      notices) and adjust their self-declaration of where they live.
      
      Discussion: https://postgr.es/m/CA+TgmoYO8oq-iy8E02rD8eX25T-9SmyxKWqqks5OMHxKvGXpXQ@mail.gmail.com
      3d4cb5d6
    • Tom Lane's avatar
      Move wchar.c and encnames.c to src/common/. · e6afa891
      Tom Lane authored
      Formerly, various frontend directories symlinked these two sources
      and then built them locally.  That's an ancient, ugly hack, and
      we now have a much better way: put them into libpgcommon.
      So do that.  (The immediate motivation for this is the prospect
      of having to introduce still more symlinking if we don't.)
      
      This commit moves these two files absolutely verbatim, for ease of
      reviewing the git history.  There's some follow-on work to be done
      that will modify them a bit.
      
      Robert Haas, Tom Lane
      
      Discussion: https://postgr.es/m/CA+TgmoYO8oq-iy8E02rD8eX25T-9SmyxKWqqks5OMHxKvGXpXQ@mail.gmail.com
      e6afa891
    • Robert Haas's avatar
      Fix problems with "read only query" checks, and refactor the code. · 2eb34ac3
      Robert Haas authored
      Previously, check_xact_readonly() was responsible for determining
      which types of queries could not be run in a read-only transaction,
      standard_ProcessUtility() was responsibility for prohibiting things
      which were allowed in read only transactions but not in recovery, and
      utility commands were basically prohibited in bulk in parallel mode by
      calls to CommandIsReadOnly() in functions.c and spi.c.  This situation
      was confusing and error-prone. Accordingly, move all the checks to a
      new function ClassifyUtilityCommandAsReadOnly(), which determines the
      degree to which a given statement is read only.
      
      In the old code, check_xact_readonly() inadvertently failed to handle
      several statement types that actually should have been prohibited,
      specifically T_CreatePolicyStmt, T_AlterPolicyStmt, T_CreateAmStmt,
      T_CreateStatsStmt, T_AlterStatsStmt, and T_AlterCollationStmt.  As a
      result, thes statements were erroneously allowed in read only
      transactions, parallel queries, and standby operation. Generally, they
      would fail anyway due to some lower-level error check, but we
      shouldn't rely on that.  In the new code structure, future omissions
      of this type should cause ClassifyUtilityCommandAsReadOnly() to
      complain about an unrecognized node type.
      
      As a fringe benefit, this means we can allow certain types of utility
      commands in parallel mode, where it's safe to do so. This allows
      ALTER SYSTEM, CALL, DO, CHECKPOINT, COPY FROM, EXPLAIN, and SHOW.
      It might be possible to allow additional commands with more work
      and thought.
      
      Along the way, document the thinking process behind the current set
      of checks, as per discussion especially with Peter Eisentraut. There
      is some interest in revising some of these rules, but that seems
      like a job for another patch.
      
      Patch by me, reviewed by Tom Lane, Stephen Frost, and Peter
      Eisentraut.
      
      Discussion: http://postgr.es/m/CA+TgmoZ_rLqJt5sYkvh+JpQnfX0Y+B2R+qfi820xNih6x-FQOQ@mail.gmail.com
      2eb34ac3
    • Tom Lane's avatar
      Minor code beautification in regexp.c. · 0db7c670
      Tom Lane authored
      Remove duplicated code (apparently introduced by commit c8ea87e4).
      Also get rid of some PG_USED_FOR_ASSERTS_ONLY variables we don't
      really need to have.
      
      Li Japin, Tom Lane
      
      Discussion: https://postgr.es/m/PS1PR0601MB3770A5595B6E5E3FD6F35724B6360@PS1PR0601MB3770.apcprd06.prod.outlook.com
      0db7c670
  2. 15 Jan, 2020 5 commits
    • Tom Lane's avatar
      Restructure ALTER TABLE execution to fix assorted bugs. · 1281a5c9
      Tom Lane authored
      We've had numerous bug reports about how (1) IF NOT EXISTS clauses in
      ALTER TABLE don't behave as-expected, and (2) combining certain actions
      into one ALTER TABLE doesn't work, though executing the same actions as
      separate statements does.  This patch cleans up all of the cases so far
      reported from the field, though there are still some oddities associated
      with identity columns.
      
      The core problem behind all of these bugs is that we do parse analysis
      of ALTER TABLE subcommands too soon, before starting execution of the
      statement.  The root of the bugs in group (1) is that parse analysis
      schedules derived commands (such as a CREATE SEQUENCE for a serial
      column) before it's known whether the IF NOT EXISTS clause should cause
      a subcommand to be skipped.  The root of the bugs in group (2) is that
      earlier subcommands may change the catalog state that later subcommands
      need to be parsed against.
      
      Hence, postpone parse analysis of ALTER TABLE's subcommands, and do
      that one subcommand at a time, during "phase 2" of ALTER TABLE which
      is the phase that does catalog rewrites.  Thus the catalog effects
      of earlier subcommands are already visible when we analyze later ones.
      (The sole exception is that we do parse analysis for ALTER COLUMN TYPE
      subcommands during phase 1, so that their USING expressions can be
      parsed against the table's original state, which is what we need.
      Arguably, these bugs stem from falsely concluding that because ALTER
      COLUMN TYPE must do early parse analysis, every other command subtype
      can too.)
      
      This means that ALTER TABLE itself must deal with execution of any
      non-ALTER-TABLE derived statements that are generated by parse analysis.
      Add a suitable entry point to utility.c to accept those recursive
      calls, and create a struct to pass through the information needed by
      the recursive call, rather than making the argument lists of
      AlterTable() and friends even longer.
      
      Getting this to work correctly required a little bit of fiddling
      with the subcommand pass structure, in particular breaking up
      AT_PASS_ADD_CONSTR into multiple passes.  But otherwise it's mostly
      a pretty straightforward application of the above ideas.
      
      Fixing the residual issues for identity columns requires refactoring of
      where the dependency link from an identity column to its sequence gets
      set up.  So that seems like suitable material for a separate patch,
      especially since this one is pretty big already.
      
      Discussion: https://postgr.es/m/10365.1558909428@sss.pgh.pa.us
      1281a5c9
    • Alvaro Herrera's avatar
      Report progress of ANALYZE commands · a166d408
      Alvaro Herrera authored
      This uses the progress reporting infrastructure added by c16dc1ac,
      adding support for ANALYZE.
      Co-authored-by: default avatarÁlvaro Herrera <alvherre@alvh.no-ip.org>
      Co-authored-by: default avatarTatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
      Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi,
      	Vignesh C, Amit Langote
      a166d408
    • Peter Eisentraut's avatar
      Remove libpq.rc, use win32ver.rc for libpq · 16a4a3d5
      Peter Eisentraut authored
      For historical reasons, libpq used a separate libpq.rc file for the
      Windows builds while all other components use a common file
      win32ver.rc.  With a bit of tweaking, the libpq build can also use the
      win32ver.rc file.  This removes a bit of duplicative code.
      Reviewed-by: default avatarKyotaro Horiguchi <horikyota.ntt@gmail.com>
      Reviewed-by: default avatarMichael Paquier <michael@paquier.xyz>
      Discussion: https://www.postgresql.org/message-id/flat/ad505e61-a923-e114-9f38-9867d161073f@2ndquadrant.com
      16a4a3d5
    • Michael Paquier's avatar
      Fix buggy logic in isTempNamespaceInUse() · ac5bdf62
      Michael Paquier authored
      The logic introduced in this routine as of 246a6c8f would report an
      incorrect result when a session calls it to check if the temporary
      namespace owned by the session is in use or not.  It is possible to
      optimize more the routine in this case to avoid a PGPROC lookup, but
      let's keep the logic simple.  As this routine is used only by autovacuum
      for now, there were no live bugs, still let's be correct for any future
      code involving it.
      
      Author: Michael Paquier
      Reviewed-by: Julien Rouhaud
      Discussion: https://postgr.es/m/20200113093703.GA41902@paquier.xyz
      Backpatch-through: 11
      ac5bdf62
    • Amit Kapila's avatar
      Introduce IndexAM fields for parallel vacuum. · 4d8a8d0c
      Amit Kapila authored
      Introduce new fields amusemaintenanceworkmem and amparallelvacuumoptions
      in IndexAmRoutine for parallel vacuum.  The amusemaintenanceworkmem tells
      whether a particular IndexAM uses maintenance_work_mem or not.  This will
      help in controlling the memory used by individual workers as otherwise,
      each worker can consume memory equal to maintenance_work_mem.  The
      amparallelvacuumoptions tell whether a particular IndexAM participates in
      a parallel vacuum and if so in which phase (bulkdelete, vacuumcleanup) of
      vacuum.
      
      Author: Masahiko Sawada and Amit Kapila
      Reviewed-by: Dilip Kumar, Amit Kapila, Tomas Vondra and Robert Haas
      Discussion:
      https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
      https://postgr.es/m/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com
      4d8a8d0c
  3. 14 Jan, 2020 9 commits
  4. 13 Jan, 2020 8 commits
    • Tom Lane's avatar
      Reduce size of backend scanner's tables. · 7f380c59
      Tom Lane authored
      Previously, the core scanner's yy_transition[] array had 37045 elements.
      Since that number is larger than INT16_MAX, Flex generated the array to
      contain 32-bit integers.  By reimplementing some of the bulkier scanner
      rules, this patch reduces the array to 20495 elements.  The much smaller
      total length, combined with the consequent use of 16-bit integers for
      the array elements reduces the binary size by over 200kB.  This was
      accomplished in two ways:
      
      1. Consolidate handling of quote continuations into a new start condition,
      rather than duplicating that logic for five different string types.
      
      2. Treat Unicode strings and identifiers followed by a UESCAPE sequence
      as three separate tokens, rather than one.  The logic to de-escape
      Unicode strings is moved to the filter code in parser.c, which already
      had the ability to provide special processing for token sequences.
      While we could have implemented the conversion in the grammar, that
      approach was rejected for performance and maintainability reasons.
      
      Performance in microbenchmarks of raw parsing seems equal or slightly
      faster in most cases, and it's reasonable to expect that in real-world
      usage (with more competition for the CPU cache) there will be a larger
      win.  The exception is UESCAPE sequences; lexing those is about 10%
      slower, primarily because the scanner now has to be called three times
      rather than one.  This seems acceptable since that feature is very
      rarely used.
      
      The psql and epcg lexers are likewise modified, primarily because we
      want to keep them all in sync.  Since those lexers don't use the
      space-hogging -CF option, the space savings is much less, but it's
      still good for perhaps 10kB apiece.
      
      While at it, merge the ecpg lexer's handling of C-style comments used
      in SQL and in C.  Those have different rules regarding nested comments,
      but since we already have the ability to keep track of the previous
      start condition, we can use that to handle both cases within a single
      start condition.  This matches the core scanner more closely.
      
      John Naylor
      
      Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com
      7f380c59
    • Peter Eisentraut's avatar
      Fix base backup with database OIDs larger than INT32_MAX · 259bbe17
      Peter Eisentraut authored
      The use of pg_atoi() for parsing a string into an Oid fails for values
      larger than INT32_MAX, since OIDs are unsigned.  Instead, use
      atooid().  While this has less error checking, the contents of the
      data directory are expected to be trustworthy, so we don't need to go
      out of our way to do full error checking.
      
      Discussion: https://www.postgresql.org/message-id/flat/dea47fc8-6c89-a2b1-07e3-754ff1ab094b%402ndquadrant.com
      259bbe17
    • Amit Kapila's avatar
      Fix typo. · 23d0dfa8
      Amit Kapila authored
      Reported-by: Antonin Houska
      Author: Antonin Houska
      Backpatch-through: 11, where it was introduced
      Discussion: https://postgr.es/m/2246.1578900133@antos
      23d0dfa8
    • Michael Paquier's avatar
      Fix comment in heapam.c · 7689d907
      Michael Paquier authored
      Improvement per suggestion from Tom Lane.
      
      Author: Daniel Gustafsson
      Discussion: https://postgr.es/m/FED18699-4270-4778-8DA8-10F119A5ECF3@yesql.se
      7689d907
    • Andrew Dunstan's avatar
      cebf9d6e
    • Amit Kapila's avatar
      Delete empty pages in each pass during GIST VACUUM. · 4e514c61
      Amit Kapila authored
      Earlier, we use to postpone deleting empty pages till the second stage of
      vacuum to amortize the cost of scanning internal pages.  However, that can
      sometimes (say vacuum is canceled or errored between first and second
      stage) delay the pages to be recycled.
      
      Another thing is that to facilitate deleting empty pages in the second
      stage, we need to share the information about internal and empty pages
      between different stages of vacuum.  It will be quite tricky to share this
      information via DSM which is required for the upcoming parallel vacuum
      patch.
      
      Also, it will bring the logic to reclaim deleted pages closer to nbtree
      where we delete empty pages in each pass.
      
      Overall, the advantages of deleting empty pages in each pass outweigh the
      advantages of postponing the same.
      
      Author: Dilip Kumar, with changes by Amit Kapila
      Reviewed-by: Sawada Masahiko and Amit Kapila
      Discussion: https://postgr.es/m/CAA4eK1LGr+MN0xHZpJ2dfS8QNQ1a_aROKowZB+MPNep8FVtwAA@mail.gmail.com
      4e514c61
    • Tomas Vondra's avatar
      Apply multiple multivariate MCV lists when possible · eae056c1
      Tomas Vondra authored
      Until now we've only used a single multivariate MCV list per relation,
      covering the largest number of clauses. So for example given a query
      
          SELECT * FROM t WHERE a = 1 AND b =1 AND c = 1 AND d = 1
      
      and extended statistics on (a,b) and (c,d), we'd only pick and use one
      of them. This commit improves this by repeatedly picking and applying
      the best statistics (matching the largest number of remaining clauses)
      until no additional statistics is applicable.
      
      This greedy algorithm is simple, but may not be optimal. A different
      choice of statistics may leave fewer clauses unestimated and/or give
      better estimates for some other reason.
      
      This can however happen only when there are overlapping statistics, and
      selecting one makes it impossible to use the other. E.g. with statistics
      on (a,b), (c,d), (b,c,d), we may pick either (a,b) and (c,d) or (b,c,d).
      But it's not clear which option is the best one.
      
      We however assume cases like this are rare, and the easiest solution is
      to define statistics covering the whole group of correlated columns. In
      the future we might support overlapping stats, using some of the clauses
      as conditions (in conditional probability sense).
      
      Author: Tomas Vondra
      Reviewed-by: Mark Dilger, Kyotaro Horiguchi
      Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
      eae056c1
    • Tomas Vondra's avatar
      Apply all available functional dependencies · aaa67618
      Tomas Vondra authored
      When considering functional dependencies during selectivity estimation,
      it's not necessary to bother with selecting the best extended statistic
      object and then use just dependencies from it. We can simply consider
      all applicable functional dependencies at once.
      
      This means we need to deserialie all (applicable) dependencies before
      applying them to the clauses. This is a bit more expensive than picking
      the best statistics and deserializing dependencies for it. To minimize
      the additional cost, we ignore statistics that are not applicable.
      
      Author: Tomas Vondra
      Reviewed-by: Mark Dilger
      Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
      aaa67618
  5. 12 Jan, 2020 2 commits
    • Tom Lane's avatar
      Fix edge-case crashes and misestimation in range containment selectivity. · 652686a3
      Tom Lane authored
      When estimating the selectivity of "range_var <@ range_constant" or
      "range_var @> range_constant", if the upper (or respectively lower)
      bound of the range_constant was above the last bin of the range_var's
      histogram, the code would access uninitialized memory and potentially
      crash (though it seems the probability of a crash is quite low).
      Handle the endpoint cases explicitly to fix that.
      
      While at it, be more paranoid about the possibility of getting NaN
      or other silly results from the range type's subdiff function.
      And improve some comments.
      
      Ordinarily we'd probably add a regression test case demonstrating
      the bug in unpatched code.  But it's too hard to get it to crash
      reliably because of the uninitialized-memory dependence, so skip that.
      
      Per bug #16122 from Adam Scott.  It's been broken from the beginning,
      apparently, so backpatch to all supported branches.
      
      Diagnosis by Michael Paquier, patch by Andrey Borodin and Tom Lane.
      
      Discussion: https://postgr.es/m/16122-eb35bc248c806c15@postgresql.org
      652686a3
    • Michael Paquier's avatar
      Remove incorrect assertion for INSERT in logical replication's publisher · 1088729e
      Michael Paquier authored
      On the publisher, it was assumed that an INSERT change cannot happen for
      a relation with no replica identity.  However this is true only for a
      change that needs references to old rows, aka UPDATE or DELETE, so
      trying to use logical replication with a relation that has no replica
      identity led to an assertion failure in the publisher when issuing an
      INSERT.  This commit removes the incorrect assertion, and adds more
      regression tests to provide coverage for relations without replica
      identity.
      
      Reported-by: Neha Sharma
      Author: Dilip Kumar, Michael Paquier
      Reviewed-by: Andres Freund
      Discussion: https://postgr.es/m/CANiYTQsL1Hb8_Km08qd32svrqNumXLJeoGo014O7VZymgOhZEA@mail.gmail.com
      Backpatch-through: 10
      1088729e
  6. 11 Jan, 2020 4 commits
  7. 10 Jan, 2020 5 commits
  8. 09 Jan, 2020 3 commits
    • Tom Lane's avatar
      Skip tab-completion tests if envar SKIP_READLINE_TESTS is defined. · e7ee4331
      Tom Lane authored
      Experience so far suggests that getting these tests to pass on
      all libedit versions that are out there may be impossible, or
      require dumbing down the tests to the point of uselessness.
      So we need to provide a way to skip them when the user knows they'll
      fail.  An environment variable is probably the most convenient way
      to deal with this; it's easy for, e.g., a buildfarm animal's
      configuration to set up.
      
      Discussion: https://postgr.es/m/9594.1578586797@sss.pgh.pa.us
      e7ee4331
    • Tom Lane's avatar
      Reconsider the representation of join alias Vars. · 9ce77d75
      Tom Lane authored
      The core idea of this patch is to make the parser generate join alias
      Vars (that is, ones with varno pointing to a JOIN RTE) only when the
      alias Var is actually different from any raw join input, that is a type
      coercion and/or COALESCE is necessary to generate the join output value.
      Otherwise just generate varno/varattno pointing to the relevant join
      input column.
      
      In effect, this means that the planner's flatten_join_alias_vars()
      transformation is already done in the parser, for all cases except
      (a) columns that are merged by JOIN USING and are transformed in the
      process, and (b) whole-row join Vars.  In principle that would allow
      us to skip doing flatten_join_alias_vars() in many more queries than
      we do now, but we don't have quite enough infrastructure to know that
      we can do so --- in particular there's no cheap way to know whether
      there are any whole-row join Vars.  I'm not sure if it's worth the
      trouble to add a Query-level flag for that, and in any case it seems
      like fit material for a separate patch.  But even without skipping the
      work entirely, this should make flatten_join_alias_vars() faster,
      particularly where there are nested joins that it previously had to
      flatten recursively.
      
      An essential part of this change is to replace Var nodes'
      varnoold/varoattno fields with varnosyn/varattnosyn, which have
      considerably more tightly-defined meanings than the old fields: when
      they differ from varno/varattno, they identify the Var's position in
      an aliased JOIN RTE, and the join alias is what ruleutils.c should
      print for the Var.  This is necessary because the varno change
      destroyed ruleutils.c's ability to find the JOIN RTE from the Var's
      varno.
      
      Another way in which this change broke ruleutils.c is that it's no
      longer feasible to determine, from a JOIN RTE's joinaliasvars list,
      which join columns correspond to which columns of the join's immediate
      input relations.  (If those are sub-joins, the joinaliasvars entries
      may point to columns of their base relations, not the sub-joins.)
      But that was a horrid mess requiring a lot of fragile assumptions
      already, so let's just bite the bullet and add some more JOIN RTE
      fields to make it more straightforward to figure that out.  I added
      two integer-List fields containing the relevant column numbers from
      the left and right input rels, plus a count of how many merged columns
      there are.
      
      This patch depends on the ParseNamespaceColumn infrastructure that
      I added in commit 5815696b.  The biggest bit of code change is
      restructuring transformFromClauseItem's handling of JOINs so that
      the ParseNamespaceColumn data is propagated upward correctly.
      
      Other than that and the ruleutils fixes, everything pretty much
      just works, though some processing is now inessential.  I grabbed
      two pieces of low-hanging fruit in that line:
      
      1. In find_expr_references, we don't need to recurse into join alias
      Vars anymore.  There aren't any except for references to merged USING
      columns, which are more properly handled when we scan the join's RTE.
      This change actually fixes an edge-case issue: we will now record a
      dependency on any type-coercion function present in a USING column's
      joinaliasvar, even if that join column has no references in the query
      text.  The odds of the missing dependency causing a problem seem quite
      small: you'd have to posit somebody dropping an implicit cast between
      two data types, without removing the types themselves, and then having
      a stored rule containing a whole-row Var for a join whose USING merge
      depends on that cast.  So I don't feel a great need to change this in
      the back branches.  But in theory this way is more correct.
      
      2. markRTEForSelectPriv and markTargetListOrigin don't need to recurse
      into join alias Vars either, because the cases they care about don't
      apply to alias Vars for USING columns that are semantically distinct
      from the underlying columns.  This removes the only case in which
      markVarForSelectPriv could be called with NULL for the RTE, so adjust
      the comments to describe that hack as being strictly internal to
      markRTEForSelectPriv.
      
      catversion bump required due to changes in stored rules.
      
      Discussion: https://postgr.es/m/7115.1577986646@sss.pgh.pa.us
      9ce77d75
    • Robert Haas's avatar
      Add pg_shmem_allocations view. · ed10f32e
      Robert Haas authored
      This tells you about allocations that have been made from the main
      shared memory segment. The original patch also tried to show information
      about dynamic shared memory allocation as well, but I decided to
      leave that problem for another time.
      
      Andres Freund and Robert Haas, reviewed by Michael Paquier, Marti
      Raudsepp, Tom Lane, Álvaro Herrera, and Kyotaro Horiguchi.
      
      Discussion: http://postgr.es/m/20140504114417.GM12715@awork2.anarazel.de
      ed10f32e