1. 20 Mar, 2019 4 commits
    • Make heap TID a tiebreaker nbtree index column. · dd299df8
      Peter Geoghegan authored
      Make nbtree treat all index tuples as having a heap TID attribute.
      Index searches can distinguish duplicates by heap TID, since heap TID is
      always guaranteed to be unique.  This general approach has numerous
      benefits for performance, and is prerequisite to teaching VACUUM to
      perform "retail index tuple deletion".
      
      Naively adding a new attribute to every pivot tuple has unacceptable
      overhead (it bloats internal pages), so suffix truncation of pivot
      tuples is added.  This will usually truncate away the "extra" heap TID
      attribute from pivot tuples during a leaf page split, and may also
      truncate away additional user attributes.  This can increase fan-out,
      especially in a multi-column index.  Truncation can only occur at the
      attribute granularity, which isn't particularly effective, but works
      well enough for now.  A future patch may add support for truncating
      "within" text attributes by generating truncated key values using new
      opclass infrastructure.
      
      Only new indexes (BTREE_VERSION 4 indexes) will have insertions that
      treat heap TID as a tiebreaker attribute, or will have pivot tuples
      undergo suffix truncation during a leaf page split (on-disk
      compatibility with versions 2 and 3 is preserved).  Upgrades to version
      4 cannot be performed on-the-fly, unlike upgrades from version 2 to
      version 3.  contrib/amcheck continues to work with version 2 and 3
      indexes, while also enforcing stricter invariants when verifying version
      4 indexes.  These stricter invariants are the same invariants described
      by "3.1.12 Sequencing" from the Lehman and Yao paper.
      
      A later patch will enhance the logic used by nbtree to pick a split
      point.  Without smarter choices about precisely where to split leaf
      pages, this patch is likely to hurt performance.  Making these two
      mostly-distinct sets of enhancements into separate commits seems like
      it might clarify their design, even though neither commit is
      particularly useful on its own.
      
      The maximum allowed size of new tuples is reduced by an amount equal to
      the space required to store an extra MAXALIGN()'d TID in a new high key
      during leaf page splits.  The user-facing definition of the "1/3 of a
      page" restriction is already imprecise, and so does not need to be
      revised.  However, there should be a compatibility note in the v12
      release notes.
      
      Author: Peter Geoghegan
      Reviewed-By: Heikki Linnakangas, Alexander Korotkov
      Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
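      As a rough illustration of the two ideas above, here is a standalone
      toy model (invented names and layout, not the actual nbtree code):
      comparisons fall back to the heap TID once all user attributes are
      equal, and a pivot built during a leaf split keeps only the attribute
      prefix needed to separate the last tuple on the left from the first
      tuple on the right.

        #include <stdio.h>

        typedef struct
        {
            int  key[3];   /* user attributes */
            int  natts;    /* attributes present (pivots may keep fewer) */
            long htid;     /* heap TID: always unique, the final tiebreaker */
        } ToyTuple;

        /* Compare user attributes first; heap TID breaks any remaining tie. */
        static int
        toy_compare(const ToyTuple *a, const ToyTuple *b)
        {
            for (int i = 0; i < 3; i++)
                if (a->key[i] != b->key[i])
                    return (a->key[i] < b->key[i]) ? -1 : 1;
            return (a->htid > b->htid) - (a->htid < b->htid);
        }

        /*
         * Build a pivot for a leaf split: keep only as many attributes as
         * are needed to distinguish lastleft from firstright; the heap TID
         * (and everything after the first distinguishing attribute) is
         * truncated away.
         */
        static ToyTuple
        toy_truncate(const ToyTuple *lastleft, const ToyTuple *firstright)
        {
            ToyTuple pivot = *firstright;

            for (int i = 0; i < 3; i++)
            {
                if (lastleft->key[i] != firstright->key[i])
                {
                    pivot.natts = i + 1;   /* first distinguishing attribute */
                    pivot.htid = 0;        /* "extra" heap TID truncated away */
                    return pivot;
                }
            }
            pivot.natts = 3;               /* all attributes equal: keep TID */
            return pivot;
        }

        int
        main(void)
        {
            ToyTuple l = {{1, 7, 4}, 3, 100};
            ToyTuple r = {{1, 9, 2}, 3, 200};

            printf("cmp %d, pivot keeps %d of 3 attributes\n",
                   toy_compare(&l, &r), toy_truncate(&l, &r).natts); /* -1, 2 */
            return 0;
        }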
    • Refactor nbtree insertion scankeys. · e5adcb78
      Peter Geoghegan authored
      Use dedicated struct to represent nbtree insertion scan keys.  Having a
      dedicated struct makes the difference between search type scankeys and
      insertion scankeys a lot clearer, and simplifies the signature of
      several related functions.  This is based on a suggestion by Andrey
      Lepikhov.
      
      Streamline how unique index insertions cache binary search progress.
      Cache the state of in-progress binary searches within _bt_check_unique()
      for later instead of having callers avoid repeating the binary search in
      an ad-hoc manner.  This makes it easy to add a new optimization:
      _bt_check_unique() now falls out of its loop immediately in the common
      case where it's already clear that there couldn't possibly be a
      duplicate.
      
      The new _bt_check_unique() scheme makes it a lot easier to manage cached
      binary search effort afterwards, from within _bt_findinsertloc().  This
      is needed for the upcoming patch to make nbtree tuples unique by
      treating heap TID as a final tiebreaker column.  Unique key binary
      searches need to restore lower and upper bounds.  They cannot simply
      continue to use the >= lower bound as the offset to insert at, because
      the heap TID tiebreaker column must be used in comparisons for the
      restored binary search (unlike the original _bt_check_unique() binary
      search, where scankey's heap TID column must be omitted).
      
      Author: Peter Geoghegan, Heikki Linnakangas
      Reviewed-By: Heikki Linnakangas, Andrey Lepikhov
      Discussion: https://postgr.es/m/CAH2-WzmE6AhUdk9NdWBf4K3HjWXZBX3+umC7mH7+WDrKcRtsOw@mail.gmail.com
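      A standalone sketch of the caching scheme (toy structures, not the
      real page machinery): the unique-check search compares on the key
      alone and saves the window of key-equal items, and the insert-location
      search then restores those bounds and finishes with the heap TID
      included in the comparison.

        #include <stdbool.h>
        #include <stdio.h>

        /* A toy "index item": one key attribute plus a heap-TID tiebreaker. */
        typedef struct { int key; long htid; } ToyItem;

        /* Hypothetical insertion state carrying cached search bounds. */
        typedef struct
        {
            bool bounds_valid;
            int  low, high;     /* window known to contain the insert point */
        } ToyBounds;

        /*
         * Pass 1 (unique check): binary search on the key only, ignoring
         * the TID, caching the window of key-equal items for later.
         */
        static void
        toy_search_key_only(const ToyItem *it, int n, int key, ToyBounds *b)
        {
            int low = 0, high = n;

            while (low < high)          /* first offset with item key >= key */
            {
                int mid = low + (high - low) / 2;

                if (it[mid].key < key) low = mid + 1; else high = mid;
            }
            b->low = low;
            high = n;
            while (low < high)          /* first offset with item key > key */
            {
                int mid = low + (high - low) / 2;

                if (it[mid].key <= key) low = mid + 1; else high = mid;
            }
            b->high = low;
            b->bounds_valid = true;
        }

        /*
         * Pass 2 (find insert location): restore the cached bounds and
         * finish the search with the heap TID included.
         */
        static int
        toy_search_with_tid(const ToyItem *it, int n, int key, long htid,
                            const ToyBounds *b)
        {
            int low = b->bounds_valid ? b->low : 0;
            int high = b->bounds_valid ? b->high : n;

            while (low < high)
            {
                int mid = low + (high - low) / 2;

                if (it[mid].key < key ||
                    (it[mid].key == key && it[mid].htid < htid))
                    low = mid + 1;
                else
                    high = mid;
            }
            return low;
        }

        int
        main(void)
        {
            ToyItem page[] = {{1, 10}, {2, 11}, {2, 14}, {2, 20}, {3, 9}};
            ToyBounds b = {false, 0, 0};

            toy_search_key_only(page, 5, 2, &b);
            printf("cached bounds: [%d, %d)\n", b.low, b.high);   /* [1, 4) */
            printf("insert offset: %d\n",
                   toy_search_with_tid(page, 5, 2, 15, &b));      /* 3 */
            return 0;
        }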
    • Get rid of jsonpath_gram.h and jsonpath_scanner.h · 550b9d26
      Alexander Korotkov authored
      Jsonpath grammar and scanner are both quite small.  It isn't worth the
      complexity to compile them separately.  This commit makes the grammar
      and scanner be compiled at once.  Therefore, jsonpath_gram.h and
      jsonpath_scanner.h are no longer needed.
      This commit also does some reorganization of code in jsonpath_gram.y.
      
      Discussion: https://postgr.es/m/d47b2023-3ecb-5f04-d253-d557547cf74f%402ndQuadrant.com
    • Remove ambiguity for jsonb_path_match() and jsonb_path_exists() · 641fde25
      Alexander Korotkov authored
      There are 2-argument and 4-argument versions of jsonb_path_match() and
      jsonb_path_exists().  But the 4-argument versions have optional 3rd
      and 4th arguments, which leads to ambiguity.  At the same time, the
      2-argument versions are needed only for the @@ and @? operators.  So,
      rename the 2-argument versions to remove the ambiguity.
      
      Catversion is bumped.
  2. 19 Mar, 2019 11 commits
    • Restructure libpq's handling of send failures. · 1f39a1c0
      Tom Lane authored
      Originally, if libpq got a failure (e.g., ECONNRESET) while trying to
      send data to the server, it would just report that and wash its hands
      of the matter.  It was soon found that that wasn't a very pleasant way
      of coping with server-initiated disconnections, so we introduced a hack
      (pqHandleSendFailure) in the code that sends queries to make it peek
      ahead for server error reports before reporting the send failure.
      
      It now emerges that related cases can occur during connection setup;
      in particular, as of TLS 1.3 it's unsafe to assume that SSL connection
      failures will be reported by SSL_connect rather than during our first
      send attempt.  We could have fixed that in a hacky way by applying
      pqHandleSendFailure after a startup packet send failure, but
      (a) pqHandleSendFailure explicitly disclaims suitability for use in any
      state except query startup, and (b) the problem still potentially exists
      for other send attempts in libpq.
      
      Instead, let's fix this in a more general fashion by eliminating
      pqHandleSendFailure altogether, and instead arranging to postpone
      all reports of send failures in libpq until after we've made an
      attempt to read and process server messages.  The send failure won't
      be reported at all if we find a server message or detect input EOF.
      
      (Note: this removes one of the reasons why libpq typically overwrites,
      rather than appending to, conn->errorMessage: pqHandleSendFailure needed
      that behavior so that the send failure report would be replaced if we
      got a server message or read failure report.  Eventually I'd like to get
      rid of that overwrite behavior altogether, but today is not that day.
      For the moment, pqSendSome is assuming that its callees will
      overwrite, not append to, conn->errorMessage.)
      
      Possibly this change should get back-patched someday; but it needs
      testing first, so let's not consider that till after v12 beta.
      
      Discussion: https://postgr.es/m/CAEepm=2n6Nv+5tFfe8YnkUm1fXgvxR0Mm1FoD+QKG-vLNGLyKg@mail.gmail.com
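      A minimal standalone model of the new reporting order (invented types
      and names, not libpq's actual code): the send path stashes the failure
      instead of surfacing it, and a report is produced only after
      attempting to read, preferring a server message or input EOF over the
      raw send error.

        #include <stdbool.h>
        #include <stdio.h>

        /* Hypothetical connection state; not libpq's real PGconn. */
        typedef struct
        {
            char errbuf[256];
            bool send_failed;       /* a send error is pending, unreported */
        } ToyConn;

        /* Send path: remember the failure rather than reporting it now. */
        static void
        toy_send(ToyConn *conn, bool io_ok)
        {
            if (!io_ok)
            {
                snprintf(conn->errbuf, sizeof(conn->errbuf),
                         "could not send data to server");
                conn->send_failed = true;
            }
        }

        /*
         * Read path: a server error message (or even plain EOF) explains
         * the breakage better than the send error, so it takes precedence;
         * the stashed send failure is reported only if neither shows up.
         */
        static const char *
        toy_read_and_report(ToyConn *conn, const char *server_msg,
                            bool got_eof)
        {
            if (server_msg != NULL)
                snprintf(conn->errbuf, sizeof(conn->errbuf), "%s",
                         server_msg);
            else if (got_eof)
                snprintf(conn->errbuf, sizeof(conn->errbuf),
                         "server closed the connection unexpectedly");
            conn->send_failed = false;   /* one way or another, reported */
            return conn->errbuf;
        }

        int
        main(void)
        {
            ToyConn conn = {"", false};

            /* e.g. TLS 1.3: handshake failure surfaces on our first send */
            toy_send(&conn, false);
            puts(toy_read_and_report(&conn,
                                     "FATAL:  password authentication failed",
                                     false));
            return 0;
        }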
    • Rename typedef in jsonpath_gram.y from "string" to "JsonPathString" · 5e28b778
      Alexander Korotkov authored
      Reason is the same as in 75c57058.
    • Tweak nbtsearch.c function prototype order. · 1009920a
      Peter Geoghegan authored
      nbtsearch.c's static function prototypes were slightly out of order.
      Make the order consistent with static function definition order.
    • Make checkpoint requests more robust. · 0dfe3d0e
      Tom Lane authored
      Commit 6f6a6d8b introduced a delay of up to 2 seconds if we're trying
      to request a checkpoint but the checkpointer hasn't started yet (or,
      much less likely, our kill() call fails).  However buildfarm experience
      shows that that's not quite enough for slow or heavily-loaded machines.
      There's no good reason to assume that the checkpointer won't start
      eventually, so we may as well make the timeout much longer, say 60 sec.
      
      However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad
      idea to be waiting at all, much less for as long as 60 sec.  We can
      remove the need for that, and make this whole thing more robust, by
      adjusting the code so that the existence of a pending checkpoint
      request is clear from the contents of shared memory, and making sure
      that the checkpointer process will notice it at startup even if it did
      not get a signal.  In this way there's no need for a non-CHECKPOINT_WAIT
      call to wait at all; if it can't send the signal, it can nonetheless
      assume that the checkpointer will eventually service the request.
      
      A potential downside of this change is that "kill -INT" on the checkpointer
      process is no longer enough to trigger a checkpoint, should anyone be
      relying on something so hacky.  But there's no obvious reason to do it
      like that rather than issuing a plain old CHECKPOINT command, so we'll
      assume that nobody is.  There doesn't seem to be a way to preserve this
      undocumented quasi-feature without introducing race conditions.
      
      Since a principal reason for messing with this is to prevent intermittent
      buildfarm failures, back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us
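      A toy model of the arrangement (the real request state lives in the
      checkpointer's shared-memory struct): the flag is durable in shared
      memory before any signalling happens, so a checkpointer that starts
      late still finds and services the request, and a non-CHECKPOINT_WAIT
      caller never needs to wait.

        #include <stdbool.h>
        #include <stdio.h>

        #define TOY_CHECKPOINT_WAIT 0x01

        typedef struct
        {
            bool ckpt_requested;    /* survives missed or failed signals */
            int  checkpointer_pid;  /* 0 until the checkpointer started */
        } ToyShmem;

        static ToyShmem shmem;

        /* Requesting side: make the request durable before signalling. */
        static void
        toy_request_checkpoint(int flags)
        {
            shmem.ckpt_requested = true;

            if (shmem.checkpointer_pid != 0)
                printf("signalled checkpointer (pid %d)\n",
                       shmem.checkpointer_pid);
            else if (flags & TOY_CHECKPOINT_WAIT)
                printf("waiting (up to 60s) for checkpointer to start\n");
            else
                printf("no checkpointer yet; it will see the flag\n");
        }

        /* Checkpointer side: check for pending requests at startup. */
        static void
        toy_checkpointer_main(void)
        {
            shmem.checkpointer_pid = 4242;      /* hypothetical pid */
            if (shmem.ckpt_requested)
            {
                shmem.ckpt_requested = false;
                printf("servicing request found at startup\n");
            }
        }

        int
        main(void)
        {
            toy_request_checkpoint(0);  /* non-waiting caller returns */
            toy_checkpointer_main();    /* request serviced anyway */
            return 0;
        }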
    • Reorder LOCALLOCK structure members to compact the size · 28988a84
      Peter Eisentraut authored
      Save 8 bytes (on x86-64) by filling up padding holes.
      
      Author: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>
      Discussion: https://www.postgresql.org/message-id/20190219001639.ft7kxir2iz644alf@alap3.anarazel.de
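      Not the LOCALLOCK layout itself, but a self-contained demonstration of
      the technique: on a typical LP64 machine, placing the 8-byte members
      ahead of the 4-byte ones removes two 4-byte padding holes and saves
      8 bytes.

        #include <stdint.h>
        #include <stdio.h>

        struct Padded           /* 4 + (4 pad) + 8 + 4 + (4 pad) + 8 = 32 */
        {
            int32_t a;
            int64_t b;
            int32_t c;
            int64_t d;
        };

        struct Compact          /* 8 + 8 + 4 + 4 = 24: same members */
        {
            int64_t b;
            int64_t d;
            int32_t a;
            int32_t c;
        };

        int
        main(void)
        {
            printf("padded:  %zu bytes\n", sizeof(struct Padded));
            printf("compact: %zu bytes\n", sizeof(struct Compact));
            return 0;
        }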
    • Rename typedef in jsonpath_scan.l from "keyword" to "JsonPathKeyword" · 75c57058
      Alexander Korotkov authored
      A typedef name should be unique and should not collide with variable
      names across all the sources.  That makes both pg_indent and debuggers
      happy.
      
      Discussion: https://postgr.es/m/23865.1552936099%40sss.pgh.pa.us
    • Ignore attempts to add TOAST table to shared or catalog tables · 590a8702
      Peter Eisentraut authored
      Running ALTER TABLE on any table will check if a TOAST table needs to be
      added.  On shared tables, this would previously fail, thus effectively
      disabling ALTER TABLE for those tables.  On (non-shared) system
      catalogs, on the other hand, it would add a TOAST table, even though we
      don't really want TOAST tables on some system catalogs.  In some cases,
      it would also fail with an error "AccessExclusiveLock required to add
      toast table.", depending on what locks the ALTER TABLE actions had
      already taken.
      
      So instead, just ignore attempts to add TOAST tables to such tables,
      outside of bootstrap mode, pretending they don't need one.
      
      This allows running ALTER TABLE on such tables without messing up the
      TOAST situation.  Legitimate uses for ALTER TABLE on system catalogs
      include setting reloptions (say, fillfactor or autovacuum settings).
      
      (All this still requires allow_system_table_mods, which is independent
      of this.)
      
      Discussion: https://www.postgresql.org/message-id/flat/e49f825b-fb25-0bc8-8afc-d5ad895c7975@2ndquadrant.com
    • Fix whitespace · e537ac51
      Peter Eisentraut authored
    • Fix bug in support for collation attributes on older ICU versions · 1f050c08
      Peter Eisentraut authored
      Unrecognized attribute names are supposed to be ignored.  But the code
      would error out on an unrecognized attribute value even if it did not
      recognize the attribute name.  So unrecognized attributes wouldn't
      really be ignored unless the value happened to be one that matched a
      recognized value.  This would break some important cases where the
      attribute would be processed by ucol_open() directly.  Fix that and
      add a test case.
      
      The restructured code should also avoid compiler warnings about
      initializing a UColAttribute value to -1, because the type might be an
      unsigned enum.  (reported by Andres Freund)
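      A standalone sketch of the corrected control flow (toy names; the
      real code maps ICU locale keywords onto UColAttribute settings): the
      attribute name is checked first, so a value can only be rejected for
      a name that was actually recognized.

        #include <stdio.h>
        #include <string.h>

        /* Returns 0 on success or ignore; -1 only for a bad value of a
         * recognized attribute name. */
        static int
        toy_apply_attribute(const char *name, const char *value)
        {
            if (strcmp(name, "colstrength") != 0)
                return 0;       /* unrecognized name: ignored entirely,
                                 * without ever looking at the value */

            if (strcmp(value, "primary") == 0 ||
                strcmp(value, "secondary") == 0)
            {
                printf("setting %s = %s\n", name, value);
                return 0;
            }
            return -1;          /* recognized name, unrecognized value */
        }

        int
        main(void)
        {
            toy_apply_attribute("someicuknob", "whatever");  /* ignored */
            toy_apply_attribute("colstrength", "primary");   /* applied */
            if (toy_apply_attribute("colstrength", "bogus") < 0)
                puts("error: invalid value for colstrength");
            return 0;
        }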
    • Fix copyfuncs/equalfuncs support for VacuumStmt. · 53680c11
      Robert Haas authored
      Commit 6776142a failed to do this,
      and the buildfarm broke.
      
      Patch by me, per advice from Tom Lane and Michael Paquier.
      
      Discussion: http://postgr.es/m/13988.1552960403@sss.pgh.pa.us
    • Implement OR REPLACE option for CREATE AGGREGATE. · 01bde4fa
      Andrew Gierth authored
      Aggregates have acquired a dozen or so optional attributes in recent
      years for things like parallel query and moving-aggregate mode; the
      lack of an OR REPLACE option to add or change these for an existing
      agg makes extension upgrades gratuitously hard. Rectify.
  3. 18 Mar, 2019 12 commits
    • Fix memory leak in printtup.c. · f2004f19
      Tom Lane authored
      Commit f2dec34e changed things so that printtup's output stringinfo
      buffer was allocated outside the per-row temporary context, not inside
      it.  This creates a need to free that buffer explicitly when the temp
      context is freed, but that was overlooked.  In most cases, this is all
      happening inside a portal or executor context that will go away shortly
      anyhow, but that's not always true.  Notably, the stringinfo ends up
      getting leaked when JDBC uses row-at-a-time fetches.  For a query
      that returns wide rows, that adds up after a while.
      
      Per bug #15700 from Matthias Otterbach.  Back-patch to v11 where the
      faulty code was added.
      
      Discussion: https://postgr.es/m/15700-8c408321a87d56bb@postgresql.org
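      A toy model of why the buffer leaked (PostgreSQL's memory contexts
      track every chunk; this one-chunk arena only models the point):
      resetting the per-row context frees just what was allocated inside
      it, so a long-lived buffer allocated elsewhere needs an explicit
      free.

        #include <stdio.h>
        #include <stdlib.h>

        /* One-chunk stand-in for a resettable memory context. */
        typedef struct { void *chunk; } ToyContext;

        static void *
        ctx_alloc(ToyContext *ctx, size_t n)
        {
            ctx->chunk = malloc(n);
            return ctx->chunk;
        }

        static void
        ctx_reset(ToyContext *ctx)
        {
            free(ctx->chunk);       /* frees only what THIS context owns */
            ctx->chunk = NULL;
        }

        int
        main(void)
        {
            ToyContext percall = {NULL};    /* reset once per row */
            char *buf = malloc(1024);       /* allocated OUTSIDE it */

            for (int row = 0; row < 3; row++)
            {
                char *scratch = ctx_alloc(&percall, 256);

                snprintf(scratch, 256, "row %d formatted via %p",
                         row, (void *) buf);
                ctx_reset(&percall);        /* scratch freed; buf is not */
            }
            free(buf);      /* the fix: free the buffer explicitly */
            return 0;
        }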
    • Fix typos in sgml docs about RefetchForeignRow(). · 11180a50
      Andres Freund authored
      I screwed this up in ad0bda5d.
      
      Reported-By: Jie Zhang, Michael Paquier, Etsuro Fujita
      Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2DA203@G08CNEXMBPEKD02.g08.fujitsu.local
    • Remove leftover reference to oid column. · 7571ce6f
      Andres Freund authored
      I (Andres) missed this in 578b2297.
      
      Author: John Naylor
      Discussion: https://postgr.es/m/CACPNZCtd+ckUgibRFs9KewK4Yr5rj3Oipefquupw+XJZebFhrA@mail.gmail.com
    • Don't auto-restart per-database autoprewarm workers. · 1459e84c
      Robert Haas authored
      We should try to prewarm each database only once.  Otherwise, if
      prewarming fails for some reason, it will just keep retrying in an
      infinite loop.  This can happen if, for example, the database has been
      dropped.  The existing code was intended to implement the try-once
      behavior, but failed to do so because it neglected to set
      worker.bgw_restart_time to BGW_NEVER_RESTART.
      
      Mithun Cy, per a report from Hans Buschmann
      
      Discussion: http://postgr.es/m/CA+hUKGKpQJCWcgyy3QTC9vdn6uKAR_8r__A-MMm2GYfj45caag@mail.gmail.com
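      For reference, a hedged fragment showing how such a worker is
      registered with the background-worker API (the library and function
      names below are placeholders); the fix boils down to setting
      bgw_restart_time as shown rather than leaving a restart interval in
      place.

        #include "postgres.h"
        #include "postmaster/bgworker.h"

        static void
        register_once_only_worker(void)
        {
            BackgroundWorker worker;
            BackgroundWorkerHandle *handle;

            memset(&worker, 0, sizeof(worker));
            worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                BGWORKER_BACKEND_DATABASE_CONNECTION;
            worker.bgw_start_time = BgWorkerStart_ConsistentState;
            worker.bgw_restart_time = BGW_NEVER_RESTART;    /* try once,
                                                             * never
                                                             * auto-restart */
            snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
            snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");
            snprintf(worker.bgw_name, BGW_MAXLEN, "toy once-only worker");

            if (!RegisterDynamicBackgroundWorker(&worker, &handle))
                elog(LOG, "could not register background worker");
        }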
    • Revise parse tree representation for VACUUM and ANALYZE. · 6776142a
      Robert Haas authored
      Like commit f41551f6, this aims
      to make it easier to add non-Boolean options to VACUUM (or, in
      this case, to ANALYZE).  Instead of building up a bitmap of
      options directly in the parser, build up a list of DefElem
      objects and let ExecVacuum() sort it out; right now, we make
      no use of the fact that a DefElem can carry an associated value,
      but it will be easy to make that change in the future.
      
      Masahiko Sawada
      
      Discussion: http://postgr.es/m/CAD21AoATE4sn0jFFH3NcfUZXkU2BMbjBWB_kDj-XWYA-LXDcQA@mail.gmail.com
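      A standalone sketch of the new shape (toy types standing in for List
      and DefElem): the parser hands over name/value pairs, and
      ExecVacuum-style code folds them into the options bitmap, leaving
      room for non-Boolean values later.

        #include <stdio.h>
        #include <string.h>

        #define TOYVAC_FULL    0x01
        #define TOYVAC_VERBOSE 0x02

        /* Toy stand-in for a parser DefElem: a name plus an optional
         * value (unused today, ready for non-Boolean options). */
        typedef struct ToyDefElem
        {
            const char *defname;
            const char *value;          /* NULL means boolean option */
            struct ToyDefElem *next;
        } ToyDefElem;

        /* ExecVacuum-style folding: build the bitmap from the list. */
        static int
        toy_exec_vacuum(const ToyDefElem *opt)
        {
            int options = 0;

            for (; opt != NULL; opt = opt->next)
            {
                if (strcmp(opt->defname, "full") == 0)
                    options |= TOYVAC_FULL;
                else if (strcmp(opt->defname, "verbose") == 0)
                    options |= TOYVAC_VERBOSE;
                else
                    fprintf(stderr, "unrecognized option: %s\n",
                            opt->defname);
            }
            return options;
        }

        int
        main(void)
        {
            ToyDefElem verbose = {"verbose", NULL, NULL};
            ToyDefElem full = {"full", NULL, &verbose};

            printf("options bitmap: 0x%02x\n", toy_exec_vacuum(&full));
            return 0;
        }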
    • Fold vacuum's 'int options' parameter into VacuumParams. · f41551f6
      Robert Haas authored
      Many places need both, so this allows a few functions to take one
      fewer parameter.  More importantly, as soon as we add a VACUUM
      option that takes a non-Boolean parameter, we need to replace
      'int options' with a struct, and it seems better to think
      of adding more fields to VacuumParams rather than passing around
      both VacuumParams and a separate struct as well.
      
      Patch by me, reviewed by Masahiko Sawada
      
      Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com
    • Fix optimization of foreign-key on update actions · 1ffa59a8
      Peter Eisentraut authored
      In RI_FKey_pk_upd_check_required(), we check among other things
      whether the old and new key are equal, so that we don't need to run
      cascade actions when nothing has actually changed.  This was using the
      equality operator.  But the effect of this is that if a value in the
      primary key is changed to one that "looks" different but compares as
      equal, the update is not propagated.  (Examples are float -0 and 0 and
      case-insensitive text.)  This appears to violate the SQL standard, and
      it also behaves inconsistently if, in a multicolumn key, another
      column is also updated in a way that causes the row to compare as not
      equal.
      
      To fix, if we are looking at the PK table in ri_KeysEqual(), then do a
      bytewise comparison similar to record_image_eq() instead of using the
      equality operators.  This only makes a difference for ON UPDATE
      CASCADE, but for consistency we treat all changes to the PK the same.  For
      the FK table, we continue to use the equality operators.
      
      Discussion: https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a@2ndquadrant.com
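      A self-contained demonstration with one of the examples above, float
      -0 and 0: the equality operator reports no change, while a
      record_image_eq()-style bytewise comparison notices the different
      stored representation.

        #include <stdio.h>
        #include <string.h>

        int
        main(void)
        {
            double zero = 0.0;
            double negzero = -0.0;

            /* The equality operator says nothing changed... */
            printf("==    : %s\n", zero == negzero ? "equal" : "different");

            /* ...but a bytewise check sees the new representation, so
             * the ON UPDATE action would not be skipped. */
            printf("memcmp: %s\n",
                   memcmp(&zero, &negzero, sizeof(double)) == 0
                   ? "equal" : "different");
            return 0;
        }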
    • Remove unused macro · fb580653
      Peter Eisentraut authored
      It has never been used.
    • Revert 4178d8b9 · a0478b69
      Alexander Korotkov authored
      It was agreed that it worsened code readability.
      
      Discussion: https://postgr.es/m/ecfcfb5f-3233-eaa9-0c83-07056fb49a83%402ndquadrant.com
    • Refactor more code logic to update the control file · 8b938d36
      Michael Paquier authored
      ce6afc68 began the refactoring work by plugging pg_rewind into a
      central routine to update the control file, but left around two extra
      copies of the logic, one in xlog.c for the backend and one in
      pg_resetwal.c.  By adding an extra option to the central routine in
      controldata_utils.c to control whether the control file should be
      flushed, it proves straightforward to make xlog.c and pg_resetwal.c
      use the central code path, at the cost of moving the wait event
      tracking there.  Hence, there is now only one central code path to
      update the control file, shaving the duplicated code.
      
      This refactoring actually fixes a problem in pg_resetwal.  Previously,
      the control file was first removed and then recreated.  So if a crash
      happened between the moment the file was removed and the moment it was
      recreated, the data folder could have been left without any control
      file.
      
      Author: Fabien Coelho
      Reviewed-by: Michael Paquier
      Discussion: https://postgr.es/m/alpine.DEB.2.21.1903170935210.2506@lancre
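      A hedged standalone model of such a central routine (plain POSIX
      calls; the real function in controldata_utils.c also computes a CRC
      and reports errors properly): one code path overwrites the file in
      place, with the flush optional per caller, so there is never a moment
      where the control file has been removed but not yet recreated.

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        static int
        toy_update_control_file(const char *path, const void *data,
                                size_t len, int do_sync)
        {
            /* overwrite in place: no unlink-then-recreate window */
            int fd = open(path, O_WRONLY | O_CREAT, 0600);

            if (fd < 0)
                return -1;
            if (write(fd, data, len) != (ssize_t) len ||
                (do_sync && fsync(fd) != 0))
            {
                close(fd);
                return -1;
            }
            return close(fd);
        }

        int
        main(void)
        {
            const char buf[] = "toy control data";

            return toy_update_control_file("toy_control", buf,
                                           sizeof(buf), 1) == 0 ? 0 : 1;
        }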
    • Fix pg_rewind when rewinding new database with tables included · a7eadaaa
      Michael Paquier authored
      This fixes an issue introduced by 266b6acb, which added filters to
      exclude file patterns on the target and source data directories to
      reduce the number of files transferred.  Filters get applied to both
      the target and source data files, and they include pg_internal.init,
      which is present for each database once relations are created in it.
      However, if the target differed from the source by at least one new
      database containing relations, the rewind would fail: because of the
      exclusion filters applied to the target files, pg_internal.init was
      still present in the target database folder, while that folder should
      have been completely emptied before being deleted.
      
      Applying exclusion filters on the source files is fine, because this way
      the amount of data copied from the source to the target is reduced.  And
      actually, not applying the filters on the target is what pg_rewind
      should do, because this causes such files to be automatically removed
      during the rewind on the target.  Exclusion filters apply to paths which
      are removed or recreated automatically at startup, so removing all those
      files on the target during the rewind is a win.
      
      The existing set of TAP tests already stresses the rewind of
      databases, but it did not include any tables in those newly-created
      databases.  Creating extra tables in this case is enough to reproduce
      the failure, so the existing tests are extended to close the gap.
      
      Reported-by: Mithun Cy
      Author: Michael Paquier
      Discussion: https://postgr.es/m/CADq3xVYt6_pO7ZzmjOqPgY9HWsL=kLd-_tNyMtdfjKqEALDyTA@mail.gmail.com
      Backpatch-through: 11
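      A toy model of the corrected behavior (illustrative path list, not
      pg_rewind's full filter table): exclusion filters are consulted only
      when copying from the source, while target-side files are processed
      without them, so leftovers such as pg_internal.init get removed.

        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        static const char *excluded_tails[] = {
            "pg_internal.init",
            "postmaster.pid",
        };

        static bool
        toy_is_excluded(const char *path)
        {
            for (size_t i = 0;
                 i < sizeof(excluded_tails) / sizeof(*excluded_tails);
                 i++)
            {
                size_t plen = strlen(path);
                size_t tlen = strlen(excluded_tails[i]);

                if (plen >= tlen &&
                    strcmp(path + plen - tlen, excluded_tails[i]) == 0)
                    return true;
            }
            return false;
        }

        int
        main(void)
        {
            const char *path = "base/16384/pg_internal.init";

            /* Source side: excluded files are simply not copied. */
            if (!toy_is_excluded(path))
                printf("copy from source: %s\n", path);

            /* Target side: no filter, so the stale file is removed. */
            printf("remove from target: %s\n", path);
            return 0;
        }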
    • Error out in pg_checksums on incompatible block size · fa339565
      Michael Paquier authored
      pg_checksums is compiled with a given block size and has a hard
      dependency on it, per the way checksums are calculated via
      checksum_impl.h.  Trying to use the tool on a data folder that does
      not use the same block size would result in incorrect checksum
      calculations and/or block read errors, making it look as though the
      data folder is corrupted.  This is harmless as checksums are only
      checked for now, but it is very confusing for the user, so issue an
      error properly if the block size used at compilation and the block
      size used in the data folder do not match.
      
      Reported-by: Sergei Kornilov
      Author: Michael Banck, Michael Paquier
      Reviewed-by: Fabien Coelho, Magnus Hagander
      Discussion: https://postgr.es/m/20190317054657.GA3357@paquier.xyz
      Backpatch-through: 11
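      A standalone sketch of the check (the real tool reads the cluster's
      block size from pg_control via get_controlfile()): compare the data
      folder's block size against the compiled-in BLCKSZ and bail out
      before computing any checksums.

        #include <stdio.h>
        #include <stdlib.h>

        #define TOY_BLCKSZ 8192     /* block size compiled into the tool */

        static void
        toy_check_block_size(unsigned cluster_blcksz)
        {
            if (cluster_blcksz != TOY_BLCKSZ)
            {
                fprintf(stderr,
                        "error: data folder block size %u does not match "
                        "compiled-in block size %u\n",
                        cluster_blcksz, (unsigned) TOY_BLCKSZ);
                exit(1);
            }
        }

        int
        main(void)
        {
            toy_check_block_size(8192);     /* fine */
            toy_check_block_size(16384);    /* errors out instead of
                                             * reporting bogus checksum
                                             * failures */
            return 0;
        }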
  4. 17 Mar, 2019 6 commits
  5. 16 Mar, 2019 7 commits