1. 20 Dec, 2018 9 commits
    • Alexander Korotkov's avatar
      Check for conflicting queries during replay of gistvacuumpage() · c952eae5
      Alexander Korotkov authored
      013ebc0a implements so-called GiST microvacuum.  That is gistgettuple() marks
      index tuples as dead when kill_prior_tuple is set.  Later, when new tuple
      insertion claims page space, those dead index tuples are physically deleted
      from page.  When this deletion is replayed on standby, it might conflict with
      read-only queries.  But 013ebc0a doesn't handle this.  That may lead to
      disappearance of some tuples from read-only snapshots on standby.
      
      This commit implements resolving of conflicts between replay of GiST microvacuum
      and standby queries.  On the master we implement new WAL record type
      XLOG_GIST_DELETE, which comprises necessary information.  On stable releases
      we've to be tricky to keep WAL compatibility.  Information required for conflict
      processing is just appended to data of XLOG_GIST_PAGE_UPDATE record.  So,
      PostgreSQL version, which doesn't know about conflict processing, will just
      ignore that.
      
      Reported-by: Andres Freund
      Diagnosed-by: Andres Freund
      Discussion: https://postgr.es/m/20181212224524.scafnlyjindmrbe6%40alap3.anarazel.de
      Author: Alexander Korotkov
      Backpatch-through: 9.6
      c952eae5
    • Tom Lane's avatar
      Base information_schema.sql_identifier domain on name, not varchar. · 7c15cef8
      Tom Lane authored
      The SQL spec says that sql_identifier is a domain over varchar,
      but it also says that that domain is supposed to represent the set
      of valid identifiers for the implementation, in particular applying
      a length limit matching the implementation's identifier length limit.
      We were declaring sql_identifier as just "character varying", thus
      duplicating what the spec says about base type, but entirely failing
      at the rest of it.
      
      Instead, let's declare sql_identifier as a domain over type "name".
      (We can drop the COLLATE "C" added by commit 6b0faf72, since that's
      now implicit in "name".)  With the recent improvements to name's
      comparison support, there's not a lot of functional difference between
      name and varchar.  So although in principle this is a spec deviation,
      it's a pretty minor one.  And correctly enforcing PG's name length limit
      is a good thing; on balance this seems closer to the intent of the spec
      than what we had.
      
      But that's all just language-lawyering.  The *real* reason to do this is
      that it makes sql_identifier columns exposed by information_schema views
      be just direct representations of the underlying "name" catalog columns,
      eliminating a semantic mismatch that was disastrous for performance of
      typical queries on the information_schema.  In combination with the
      recent change to allow dropping no-op CoerceToDomain nodes, this allows
      (for example) queries such as
      
          select ... from information_schema.tables where table_name = 'foo';
      
      to produce an indexscan rather than a seqscan on pg_class.
      
      Discussion: https://postgr.es/m/CAFj8pRBUCX4LZ2rA2BbEkdD6NN59mgx+BLo1gO08Wod4RLtcTg@mail.gmail.com
      7c15cef8
    • Tom Lane's avatar
      Avoid producing over-length specific_name outputs in information_schema. · 5bbee34d
      Tom Lane authored
      information_schema output columns that are declared as being type
      sql_identifier are supposed to conform to the implementation's rules
      for valid identifiers, in particular the identifier length limit.
      Several places potentially violated this limit by concatenating a
      function's name and OID.  (The OID is added to ensure name uniqueness
      within a schema, since the spec doesn't expect function name overloading.)
      
      Simply truncating the concatenation result to fit in "name" won't do,
      since losing part of the OID might wind up giving non-unique results.
      Instead, let's truncate the function name as necessary.
      
      The most practical way to do that is to do it in a C function; the
      information_schema.sql script doesn't have easy access to the value
      of NAMEDATALEN, nor does it have an easy way to truncate on the basis
      of resulting byte-length rather than number of characters.
      
      (There are still a couple of places that cast concatenation results to
      sql_identifier, but as far as I can see they are guaranteed not to produce
      over-length strings, at least with the normal value of NAMEDATALEN.)
      
      Discussion: https://postgr.es/m/23817.1545283477@sss.pgh.pa.us
      5bbee34d
    • Alvaro Herrera's avatar
      Fix lock level used for partition when detaching it · 7b14bcc0
      Alvaro Herrera authored
      For probably bogus reasons, we acquire only AccessShareLock on the
      partition when we try to detach it from its parent partitioned table.
      This can cause ugly things to happen if another transaction is doing
      any sort of DDL to the partition concurrently.
      
      Upgrade that lock to ShareUpdateExclusiveLock, which per discussion
      seems to be the minimum needed.
      
      Reported by Robert Haas.
      
      Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
      7b14bcc0
    • Tom Lane's avatar
      Doc: fix ancient mistake in search_path documentation. · 42bdf853
      Tom Lane authored
      "$user" in a search_path string is replaced by CURRENT_USER not
      SESSION_USER.  (It actually was SESSION_USER in the initial implementation,
      but we changed it shortly later, and evidently forgot to fix the docs to
      match.)
      
      Noted by antonov@stdpr.ru
      
      Discussion: https://postgr.es/m/159151fb45d490c8d31ea9707e9ba99d@stdpr.ru
      42bdf853
    • Tom Lane's avatar
      Make bitmapset.c use 64-bit bitmap words on 64-bit machines. · 216af5ee
      Tom Lane authored
      Using the full width of the CPU's native word size shouldn't cost
      anything in typical cases.  When working with large bitmapsets,
      this halves the number of operations needed for many common BMS
      operations.  On the right sort of test case, a measurable improvement
      is obtained.
      
      David Rowley
      
      Discussion: https://postgr.es/m/CAKJS1f9EGBd2h-VkXvb=51tf+X46zMX5T8h-KYgXEV_u2zmLUw@mail.gmail.com
      216af5ee
    • Alvaro Herrera's avatar
      DETACH PARTITION: hold locks on indexes until end of transaction · 0c237715
      Alvaro Herrera authored
      When a partition is detached from its parent, we acquire locks on all
      attached indexes to also detach them ... but we release those locks
      immediately.  This is a violation of the policy of keeping locks on user
      objects to the end of the transaction.  Bug introduced in 8b08f7d4.
      
      It's unclear that there are any ill effects possible, but it's clearly
      wrong nonetheless.  It's likely that bad behavior *is* possible, but
      mostly because the relation that the index is for is only locked with
      AccessShareLock, which is an older bug that shall be fixed separately.
      
      While touching that line of code, close the index opened with
      index_open() using index_close() instead of relation_close().
      No difference in practice, but let's be consistent.
      
      Unearthed by Robert Haas.
      
      Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
      0c237715
    • Michael Paquier's avatar
      Add more tab completion for CREATE TABLE in psql · 4cba9c2a
      Michael Paquier authored
      The following completion patterns are added:
      - CREATE TABLE <name> with '(', OF or PARTITION OF.
      - CREATE TABLE <name> OF with list of composite types.
      - CREATE TABLE name (...) with PARTITION OF, WITH, TABLESPACE, ON
      COMMIT (depending on the presence of a temporary table).
      - CREATE TABLE ON COMMIT with actions (only for temporary tables).
      
      Author: Dagfinn Ilmari Mannsåker
      Discussion: https://postgr.es/m/d8j1s77kdbb.fsf@dalvik.ping.uio.no
      4cba9c2a
    • Greg Stark's avatar
      Fix ADD IF NOT EXISTS used in conjunction with ALTER TABLE ONLY · 1075dfda
      Greg Stark authored
      The flag for IF NOT EXISTS was only being passed down in the normal
      recursing case. It's been this way since originally added in 9.6 in
      commit 2cd40adb so backpatch back to 9.6.
      1075dfda
  2. 19 Dec, 2018 8 commits
    • Tom Lane's avatar
      Add text-vs-name cross-type operators, and unify name_ops with text_ops. · 2ece7c07
      Tom Lane authored
      Now that name comparison has effectively the same behavior as text
      comparison, we might as well merge the name_ops opfamily into text_ops,
      allowing cross-type comparisons to be processed without forcing a
      datatype coercion first.  We need do little more than add cross-type
      operators to make the opfamily complete, and fix one or two places
      in the planner that assumed text_ops was a single-datatype opfamily.
      
      I chose to unify hash name_ops into hash text_ops as well, since the
      types have compatible hashing semantics.  This allows marking the
      new cross-type equality operators as oprcanhash.
      
      (Note: this doesn't remove the name_ops opclasses, so there's no
      breakage of index definitions.  Those opclasses are just reparented
      into the text_ops opfamily.)
      
      Discussion: https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
      2ece7c07
    • Tom Lane's avatar
      Make type "name" collation-aware. · 586b98fd
      Tom Lane authored
      The "name" comparison operators now all support collations, making them
      functionally equivalent to "text" comparisons, except for the different
      physical representation of the datatype.  They do, in fact, mostly share
      the varstr_cmp and varstr_sortsupport infrastructure, which has been
      slightly enlarged to handle the case.
      
      To avoid changes in the default behavior of the datatype, set name's
      typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that
      by default comparisons to a name value will continue to use strcmp
      semantics.  (This would have been the case for system catalog columns
      anyway, because of commit 6b0faf72, but doing this makes it true for
      user-created name columns as well.  In particular, this avoids
      locale-dependent changes in our regression test results.)
      
      In consequence, tweak a couple of places that made assumptions about
      collatable base types always having typcollation DEFAULT_COLLATION_OID.
      I have not, however, attempted to relax the restriction that user-
      defined collatable types must have that.  Hence, "name" doesn't
      behave quite like a user-defined type; it acts more like a domain
      with COLLATE "C".  (Conceivably, if we ever get rid of the need for
      catalog name columns to be fixed-length, "name" could actually become
      such a domain over text.  But that'd be a pretty massive undertaking,
      and I'm not volunteering.)
      
      Discussion: https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
      586b98fd
    • Alvaro Herrera's avatar
      Remove function names from error messages · 68f6f2b7
      Alvaro Herrera authored
      They are not necessary, and having them there gives useless work for
      translators.
      68f6f2b7
    • Tom Lane's avatar
      Small improvements for allocation logic in ginHeapTupleFastCollect(). · c6e394c1
      Tom Lane authored
      Avoid repetitive calls to repalloc() when the required size of the
      collector array grows more than 2x in one call.  Also ensure that the
      array size is a power of 2 (since palloc will probably consume a power
      of 2 anyway) and doesn't start out very small (which'd likely just lead
      to extra repallocs).
      
      David Rowley, tweaked a bit by me
      
      Discussion: https://postgr.es/m/CAKJS1f8vn-iSBE8PKeVHrnhvyjRNYCxguPFFY08QLYmjWG9hPQ@mail.gmail.com
      c6e394c1
    • Tom Lane's avatar
    • Peter Geoghegan's avatar
      Remove obsolete nbtree duplicate entries comment. · 61a4480a
      Peter Geoghegan authored
      Remove a comment from the Berkeley days claiming that nbtree must
      disambiguate duplicate keys within _bt_moveright().  There is no special
      care taken around duplicates within _bt_moveright(), at least since
      commit 9e85183b removed inscrutable _bt_moveright() code to handle
      pages full of duplicates.
      61a4480a
    • Peter Geoghegan's avatar
      Correct obsolete nbtree recovery comments. · 60f3cc95
      Peter Geoghegan authored
      Commit 40dae7ec, which made the handling of interrupted nbtree page
      splits more robust, removed an nbtree-specific end-of-recovery cleanup
      step.  This meant that it was no longer possible to complete an
      interrupted page split during recovery.  However, a reference to
      recovery as a reason for using a NULL stack while inserting into a
      parent page was missed.  Remove the reference.
      
      Remove a similar obsolete reference to recovery that was introduced much
      more recently, as part of the btree fastpath optimization enhancement
      that made it into Postgres 11 (commit 2b272734, and follow-up commits).
      
      Backpatch: 11-, where the fastpath optimization was introduced.
      60f3cc95
    • Tatsuo Ishii's avatar
      3cab5487
  3. 18 Dec, 2018 7 commits
    • Tom Lane's avatar
      Make collation-aware system catalog columns use "C" collation. · 6b0faf72
      Tom Lane authored
      Up to now we allowed text columns in system catalogs to use collation
      "default", but that isn't really safe because it might mean something
      different in template0 than it means in a database cloned from template0.
      In particular, this could mean that cloned pg_statistic entries for such
      columns weren't entirely valid, possibly leading to bogus planner
      estimates, though (probably) not any outright failures.
      
      In the wake of commit 5e092800, a better solution is available: if we
      label such columns with "C" collation, then their pg_statistic entries
      will also use that collation and hence will be valid independently of
      the database collation.
      
      This also provides a cleaner solution for indexes on such columns than
      the hack added by commit 0b28ea79: the indexes will naturally inherit
      "C" collation and don't have to be forced to use text_pattern_ops.
      
      Also, with the planned improvement of type "name" to be collation-aware,
      this policy will apply cleanly to both text and name columns.
      
      Because of the pg_statistic angle, we should also apply this policy
      to the tables in information_schema.  This patch does that by adjusting
      information_schema's textual domain types to specify "C" collation.
      That has the user-visible effect that order-sensitive comparisons to
      textual information_schema view columns will now use "C" collation
      by default.  The SQL standard says that the collation of those view
      columns is implementation-defined, so I think this is legal per spec.
      At some point this might allow for translation of such comparisons
      into indexable conditions on the underlying "name" columns, although
      additional work will be needed before that can happen.
      
      Discussion: https://postgr.es/m/19346.1544895309@sss.pgh.pa.us
      6b0faf72
    • Tom Lane's avatar
      Update sepgsql regression test results for commit ca410302. · b2d9e177
      Tom Lane authored
      Per buildfarm.
      b2d9e177
    • Tom Lane's avatar
      Fix ancient thinko in mergejoin cost estimation. · d364e881
      Tom Lane authored
      "rescanratio" was computed as 1 + rescanned-tuples / total-inner-tuples,
      which is sensible if it's to be multiplied by total-inner-tuples or a cost
      value corresponding to scanning all the inner tuples.  But in reality it
      was (mostly) multiplied by inner_rows or a related cost, numbers that take
      into account the possibility of stopping short of scanning the whole inner
      relation thanks to a limited key range in the outer relation.  This'd
      still make sense if we could expect that stopping short would result in a
      proportional decrease in the number of tuples that have to be rescanned.
      It does not, however.  The argument that establishes the validity of our
      estimate for that number is independent of whether we scan all of the inner
      relation or stop short, and experimentation also shows that stopping short
      doesn't reduce the number of rescanned tuples.  So the correct calculation
      is 1 + rescanned-tuples / inner_rows, and we should be sure to multiply
      that by inner_rows or a corresponding cost value.
      
      Most of the time this doesn't make much difference, but if we have
      both a high rescan rate (due to lots of duplicate values) and an outer
      key range much smaller than the inner key range, then the error can
      be significant, leading to a large underestimate of the cost associated
      with rescanning.
      
      Per report from Vijaykumar Jain.  This thinko appears to go all the way
      back to the introduction of the rescan estimation logic in commit
      70fba704, so back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/CAE7uO5hMb_TZYJcZmLAgO6iD68AkEK6qCe7i=vZUkCpoKns+EQ@mail.gmail.com
      d364e881
    • Michael Paquier's avatar
      Include partitioned indexes to system view pg_indexes · f94cec64
      Michael Paquier authored
      pg_tables already includes partitioned tables, so for consistency
      pg_indexes should show partitioned indexes.
      
      Author: Suraj Kharage
      Reviewed-by: Amit Langote, Michael Paquier
      Discussion: https://postgr.es/m/CAF1DzPVrYo4XNTEnc=PqVp6aLJc7LFYpYR4rX=_5pV=wJ2KdZg@mail.gmail.com
      f94cec64
    • Michael Paquier's avatar
      Tweak description comments in tests for partition functions · 3e514c12
      Michael Paquier authored
      The new wording is more generic and fixes one grammar mistake and one
      typo on the way.
      
      Per discussion between Amit Langote and me.
      
      Discussion: https://postgr.es/m/20181217064028.GJ31474@paquier.xyz
      3e514c12
    • Michael Paquier's avatar
      Update project link of pgBadger in documentation · ed6f15eb
      Michael Paquier authored
      The project has moved to a new place.
      
      Reported-by: Peter Neave
      Discussion: https://postgr.es/m/154474118231.5066.16352227860913505754@wrigleys.postgresql.org
      ed6f15eb
    • Michael Paquier's avatar
      Include ALTER INDEX SET STATISTICS in pg_dump · e4fca461
      Michael Paquier authored
      The new grammar pattern of ALTER INDEX SET STATISTICS able to use column
      numbers on top of the existing column names introduced by commit 5b6d13ee
      forgot to add support for the feature in pg_dump, so defining statistics
      on index columns was missing from the dumps, potentially causing silent
      planning problems with a subsequent restore.
      
      pg_dump ought to not use column names in what it generates as these are
      automatically generated by the server and could conflict with real
      relation attributes with matching patterns.  "expr" and "exprN", N
      incremented automatically after the creation of the first one, are used
      as default attribute names for index expressions, and that could easily
      match what is defined in other relations, causing the dumps to fail if
      some of those attributes are renamed at some point.  So to avoid any
      problems, the new grammar with column numbers gets used.
      
      Reported-by: Ronan Dunklau
      Author: Michael Paquier
      Reviewed-by: Tom Lane, Adrien Nayrat, Amul Sul
      Discussion: https://postgr.es/m/CAARsnT3UQ4V=yDNW468w8RqHfYiY9mpn2r_c5UkBJ97NAApUEw@mail.gmail.com
      Backpatch-through: 11, where the new syntax has been introduced.
      e4fca461
  4. 17 Dec, 2018 7 commits
  5. 16 Dec, 2018 2 commits
    • Tom Lane's avatar
      Make error handling in parallel pg_upgrade less bogus. · 16fda4b8
      Tom Lane authored
      reap_child() basically ignored the possibility of either an error in
      waitpid() itself or a child process failure on signal.  We don't really
      need to do more than report and crash hard, but proceeding as though
      nothing is wrong is definitely Not Acceptable.  The error report for
      nonzero child exit status was pretty off-point, as well.
      
      Noted while fooling around with child-process failure detection
      logic elsewhere.  It's been like this a long time, so back-patch to
      all supported branches.
      16fda4b8
    • Tom Lane's avatar
      Improve detection of child-process SIGPIPE failures. · ade2d61e
      Tom Lane authored
      Commit ffa4cbd6 added logic to detect SIGPIPE failure of a COPY child
      process, but it only worked correctly if the SIGPIPE occurred in the
      immediate child process.  Depending on the shell in use and the
      complexity of the shell command string, we might instead get back
      an exit code of 128 + SIGPIPE, representing a shell error exit
      reporting SIGPIPE in the child process.
      
      We could just hack up ClosePipeToProgram() to add the extra case,
      but it seems like this is a fairly general issue deserving a more
      general and better-documented solution.  I chose to add a couple
      of functions in src/common/wait_error.c, which is a natural place
      to know about wait-result encodings, that will test for either a
      specific child-process signal type or any child-process signal failure.
      Then, adjust other places that were doing ad-hoc tests of this type
      to use the common functions.
      
      In RestoreArchivedFile, this fixes a race condition affecting whether
      the process will report an error or just silently proc_exit(1): before,
      that depended on whether the intermediate shell got SIGTERM'd itself
      or reported a child process failing on SIGTERM.
      
      Like the previous patch, back-patch to v10; we could go further
      but there seems no real need to.
      
      Per report from Erik Rijkers.
      
      Discussion: https://postgr.es/m/f3683f87ab1701bea5d86a7742b22432@xs4all.nl
      ade2d61e
  6. 14 Dec, 2018 1 commit
    • Tom Lane's avatar
      Make pg_statistic and related code account more honestly for collations. · 5e092800
      Tom Lane authored
      When we first put in collations support, we basically punted on teaching
      pg_statistic, ANALYZE, and the planner selectivity functions about that.
      They've just used DEFAULT_COLLATION_OID independently of the actual
      collation of the data.  It's time to improve that, so:
      
      * Add columns to pg_statistic that record the specific collation associated
      with each statistics slot.
      
      * Teach ANALYZE to use the column's actual collation when comparing values
      for statistical purposes, and record this in the appropriate slot.  (Note
      that type-specific typanalyze functions are now expected to fill
      stats->stacoll with the appropriate collation, too.)
      
      * Teach assorted selectivity functions to use the actual collation of
      the stats they are looking at, instead of just assuming it's
      DEFAULT_COLLATION_OID.
      
      This should give noticeably better results in selectivity estimates for
      columns with nondefault collations, at least for query clauses that use
      that same collation (which would be the default behavior in most cases).
      It's still true that comparisons with explicit COLLATE clauses different
      from the stored data's collation won't be well-estimated, but that's no
      worse than before.  Also, this patch does make the first step towards
      doing better with that, which is that it's now theoretically possible to
      collect stats for a collation other than the column's own collation.
      
      Patch by me; thanks to Peter Eisentraut for review.
      
      Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
      5e092800
  7. 13 Dec, 2018 6 commits
    • Michael Paquier's avatar
      Introduce new extended routines for FDW and foreign server lookups · 8fb569e9
      Michael Paquier authored
      The cache lookup routines for foreign-data wrappers and foreign servers
      are extended with an extra argument to handle a set of flags.  The only
      value which can be used now is to indicate if a missing object should
      result in an error or not, and are designed to be extensible on need.
      Those new routines are added into the existing set of user-visible
      FDW APIs and documented in consequence.  They will be used for future
      patches to improve the SQL interface for object addresses.
      
      Author: Michael Paquier
      Reviewed-by: Álvaro Herrera
      Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
      8fb569e9
    • Andres Freund's avatar
      Create a separate oid range for oids assigned by genbki.pl. · 09568ec3
      Andres Freund authored
      The changes I made in 578b2297 assigned oids below
      FirstBootstrapObjectId to objects in include/catalog/*.dat files that
      did not have an oid assigned, starting at the max oid explicitly
      assigned.  Tom criticized that for mainly two reasons:
      1) It's not clear which values are manually and which explicitly
         assigned.
      2) The space below FirstBootstrapObjectId gets pretty crowded, and
         some PostgreSQL forks have used oids >= 9000 for their own objects,
         to avoid conflicting.
      
      Thus create a new range for objects not assigned explicit oids, but
      assigned by genbki.pl. For now 1-9999 is for explicitly assigned oids,
      FirstGenbkiObjectId (10000) to FirstBootstrapObjectId (1200) -1 is for
      genbki.pl assigned oids, and < FirstNormalObjectId (16384) is for oids
      assigned during bootstrap.  It's possible that we'll have to adjust
      these boundaries, but there's some headroom for now.
      
      Add a note suggesting that oids in forks should be assigned in the
      9000-9999 range.
      
      Catversion bump for obvious reasons.
      
      Per complaint from Tom Lane.
      
      Author: Andres Freund
      Discussion: https://postgr.es/m/16845.1544393682@sss.pgh.pa.us
      09568ec3
    • Tom Lane's avatar
      Fix bogus logic for skipping unnecessary partcollation dependencies. · 84d51488
      Tom Lane authored
      The idea here is to not call recordDependencyOn for the default collation,
      since we know that's pinned.  But what the code actually did was to record
      the partition key's dependency on the opclass twice, instead.
      
      Evidently introduced by sloppy coding in commit 2186b608.  Back-patch
      to v10 where that came in.
      84d51488
    • Tom Lane's avatar
      Drop no-op CoerceToDomain nodes from expressions at planning time. · 04fe805a
      Tom Lane authored
      If a domain has no constraints, then CoerceToDomain doesn't really do
      anything and can be simplified to a RelabelType.  This not only
      eliminates cycles at execution, but allows the planner to optimize better
      (for instance, match the coerced expression to an index on the underlying
      column).  However, we do have to support invalidating the plan later if
      a constraint gets added to the domain.  That's comparable to the case of
      a change to a SQL function that had been inlined into a plan, so all the
      necessary logic already exists for plans depending on functions.  We
      need only duplicate or share that logic for domains.
      
      ALTER DOMAIN ADD/DROP CONSTRAINT need to be taught to send out sinval
      messages for the domain's pg_type entry, since those operations don't
      update that row.  (ALTER DOMAIN SET/DROP NOT NULL do update that row,
      so no code change is needed for them.)
      
      Testing this revealed what's really a pre-existing bug in plpgsql:
      it caches the SQL-expression-tree expansion of type coercions and
      had no provision for invalidating entries in that cache.  Up to now
      that was only a problem if such an expression had inlined a SQL
      function that got changed, which is unlikely though not impossible.
      But failing to track changes of domain constraints breaks an existing
      regression test case and would likely cause practical problems too.
      
      We could fix that locally in plpgsql, but what seems like a better
      idea is to build some generic infrastructure in plancache.c to store
      standalone expressions and track invalidation events for them.
      (It's tempting to wonder whether plpgsql's "simple expression" stuff
      could use this code with lower overhead than its current use of the
      heavyweight plancache APIs.  But I've left that idea for later.)
      
      Other stuff fixed in passing:
      
      * Allow estimate_expression_value() to drop CoerceToDomain
      unconditionally, effectively assuming that the coercion will succeed.
      This will improve planner selectivity estimates for cases involving
      estimatable expressions that are coerced to domains.  We could have
      done this independently of everything else here, but there wasn't
      previously any need for eval_const_expressions_mutator to know about
      CoerceToDomain at all.
      
      * Use a dlist for plancache.c's list of cached plans, rather than a
      manually threaded singly-linked list.  That eliminates a potential
      performance problem in DropCachedPlan.
      
      * Fix a couple of inconsistencies in typecmds.c about whether
      operations on domains drop RowExclusiveLock on pg_type.  Our common
      practice is that DDL operations do drop catalog locks, so standardize
      on that choice.
      
      Discussion: https://postgr.es/m/19958.1544122124@sss.pgh.pa.us
      04fe805a
    • Alexander Korotkov's avatar
      Prevent GIN deleted pages from being reclaimed too early · 52ac6cd2
      Alexander Korotkov authored
      When GIN vacuum deletes a posting tree page, it assumes that no concurrent
      searchers can access it, thanks to ginStepRight() locking two pages at once.
      However, since 9.4 searches can skip parts of posting trees descending from the
      root.  That leads to the risk that page is deleted and reclaimed before
      concurrent search can access it.
      
      This commit prevents the risk of above by waiting for every transaction, which
      might wait to reference this page, to finish.  Due to binary compatibility
      we can't change GinPageOpaqueData to store corresponding transaction id.
      Instead we reuse page header pd_prune_xid field, which is unused in index pages.
      
      Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
      Author: Andrey Borodin, Alexander Korotkov
      Reviewed-by: Alexander Korotkov
      Backpatch-through: 9.4
      52ac6cd2
    • Alexander Korotkov's avatar
      Prevent deadlock in ginRedoDeletePage() · c6ade7a8
      Alexander Korotkov authored
      On standby ginRedoDeletePage() can work concurrently with read-only queries.
      Those queries can traverse posting tree in two ways.
      1) Using rightlinks by ginStepRight(), which locks the next page before
         unlocking its left sibling.
      2) Using downlinks by ginFindLeafPage(), which locks at most one page at time.
      
      Original lock order was: page, parent, left sibling.  That lock order can
      deadlock with ginStepRight().  In order to prevent deadlock this commit changes
      lock order to: left sibling, page, parent.  Note, that position of parent in
      locking order seems insignificant, because we only lock one page at time while
      traversing downlinks.
      
      Reported-by: Chen Huajun
      Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin
      Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
      Author: Alexander Korotkov
      Backpatch-through: 9.4
      c6ade7a8