1. 30 Apr, 2017 2 commits
    • Fix possible null pointer dereference or invalid warning message. · 12d11432
      Tom Lane authored
      Thinko in commit de438971: this warning message references the wrong
      "LogicalRepWorker *" variable.  This would often result in a core dump,
      but if it didn't, the message would show the wrong subscription OID.
      
      In passing, adjust the message text to format a subscription OID
      similarly to how that's done elsewhere in the function; and fix
      grammatical issues in some nearby messages.
      
      Per Coverity testing.
    • Micro-optimize some slower queries in the opr_sanity regression test. · c2384421
      Tom Lane authored
      Convert the binary_coercible() and physically_coercible() functions from
      SQL to plpgsql.  It's not that plpgsql is inherently better at doing
      queries; if you simply convert the previous single SQL query into one
      RETURN expression, it's no faster.  The problem with the existing code
      is that it fools the plancache into deciding that it's worth re-planning
      the query every time, since constant-folding with a concrete value for $2
      allows elimination of at least one sub-SELECT.  In reality that's using the
      planner to do the equivalent of a few runtime boolean tests, causing the
      function to run much slower than it should.  Splitting the AND/OR logic
      into separate plpgsql statements allows each if-expression to acquire a
      static plan.
      
      Also, get rid of some uses of obj_description() in favor of explicitly
      joining to pg_description, allowing the joins to be optimized better.
      (Someday we might improve the SQL-function-inlining logic enough that
      this happens automatically, but today is not that day.)
      
      Together, these changes reduce the runtime of the opr_sanity regression
      test by about a factor of two on one of my slower machines.  They don't
      seem to help as much on a fast machine, but this should at least benefit
      the buildfarm.
  2. 28 Apr, 2017 9 commits
  3. 27 Apr, 2017 12 commits
    • Don't build full initial logical decoding snapshot if NOEXPORT_SNAPSHOT. · ab9c4338
      Andres Freund authored
      Earlier commits (56e19d93 and 2bef06d5) made it cheaper to
      create a logical slot if not exporting the initial snapshot.  With
      NOEXPORT_SNAPSHOT specified, we can now skip that overhead too, not
      just when creating a slot via SQL (which can't export snapshots).  As
      NOEXPORT_SNAPSHOT has only recently been introduced, this shouldn't be
      backpatched.
    • Don't use on-disk snapshots for exported logical decoding snapshot. · 56e19d93
      Andres Freund authored
      Logical decoding stores historical snapshots on disk, so that logical
      decoding can restart without having to reconstruct a snapshot from
      scratch (for which the resources are not guaranteed to be present
      anymore).  These serialized snapshots were also used when creating a
      new slot via the walsender interface, which can export a "full"
      snapshot (i.e. one that can read all tables, not just catalog ones).
      
      The problem is that the serialized snapshots are only useful for
      catalogs and not for normal user tables.  Thus the use of such a
      serialized snapshot could result in an inconsistent snapshot being
      exported, which could lead to queries returning wrong data.  This
      would only happen if logical slots are created while another logical
      slot already exists.
      
      Author: Petr Jelinek
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
      Backport: 9.4, where logical decoding was introduced.
    • Avoid slow shutdown of pg_basebackup. · 7834d20b
      Tom Lane authored
      pg_basebackup's child process did not pay any attention to the pipe
      from its parent while waiting for input from the source server.
      If no server data was arriving, it would only wake up and check the
      pipe every standby_message_timeout or so.  This creates a problem
      since the parent process might determine and send the desired stop
      position only after the server has reached end-of-WAL and stopped
      sending data.  In the src/test/recovery regression tests, the timing
      is repeatably such that it takes nearly 10 seconds for the child
      process to realize that it should shut down.  It's not clear how
      often that would happen in real-world cases, but it sure seems like
      a bug --- and if the user turns off standby_message_timeout or sets
      it very large, the delay could be a lot worse.
      
      To fix, expand the StreamCtl API to allow the pipe input FD to be
      passed down to the low-level wait routine, and watch both sockets
      when sleeping.
      
      (Note: AFAICS this issue doesn't affect the Windows port, since
      it doesn't rely on a pipe to transfer the stop position to the
      child thread.)
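The fix amounts to waiting on both file descriptors at once. A minimal Python sketch of the idea (not the actual StreamCtl C code; the function name and fds here are illustrative):

```python
import os
import select

def wait_for_input(server_fd, pipe_fd, timeout_s):
    # Sleep until either the server connection or the parent's pipe is
    # readable; previously only server_fd was watched, so a stop request
    # on the pipe went unnoticed until the timeout expired.
    readable, _, _ = select.select([server_fd, pipe_fd], [], [], timeout_s)
    return readable

if __name__ == "__main__":
    # A silent "server" fd and a parent pipe that becomes readable.
    server_r, _server_w = os.pipe()
    pipe_r, pipe_w = os.pipe()
    os.write(pipe_w, b"stop")          # parent announces the stop position
    ready = wait_for_input(server_r, pipe_r, 1.0)
    print(pipe_r in ready)             # wakes immediately: True
```

With only the server fd in the wait set, this call would instead block for the full timeout.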
      
      Discussion: https://postgr.es/m/6456.1493263884@sss.pgh.pa.us
    • Fix bug so logical rep launcher correctly saves time of last worker startup. · 9f11fcec
      Fujii Masao authored
      Previously the logical replication launcher stored the timestamp of
      the last worker startup in the local variable "last_start_time",
      in order to check whether wal_retrieve_retry_interval has elapsed
      since that startup. If it has, the launcher scans pg_subscription
      and starts a new worker if necessary. This limits worker startups
      to once per wal_retrieve_retry_interval.
      
      The bug was that the variable "last_start_time" was declared and
      initialized to 0 at the beginning of the launcher's main loop, so
      even if it was set to the last startup timestamp later in the loop,
      it was reset to 0 on the next iteration. Therefore the launcher
      could not correctly check whether wal_retrieve_retry_interval had
      elapsed since the last startup.
      
      This patch moves the variable "last_start_time" outside the main loop
      so that it will not be reset.
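The scoping bug is easy to reproduce in miniature. A hedged Python sketch (variable and constant names are illustrative, not the launcher's actual C identifiers):

```python
WAL_RETRIEVE_RETRY_INTERVAL = 5.0  # assumed interval, in seconds

def launcher_loop(now, iterations):
    # Correct placement: initialized once, *outside* the loop, so the
    # timestamp of the last worker start survives across iterations.
    last_start_time = 0.0
    started = 0
    for _ in range(iterations):
        # The buggy version re-declared `last_start_time = 0.0` here,
        # so the elapsed-time check always saw "interval elapsed" and
        # a worker was started on every iteration.
        if now - last_start_time >= WAL_RETRIEVE_RETRY_INTERVAL:
            last_start_time = now   # remember when we started a worker
            started += 1
    return started

print(launcher_loop(100.0, 10))  # 1: only the first iteration starts a worker
```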
      
      Reviewed-by: Petr Jelinek
      Discussion: http://postgr.es/m/CAHGQGwGJrPO++XM4mFENAwpy1eGXKsGdguYv43GUgLgU-x8nTQ@mail.gmail.com
    • Cope with glibc too old to have epoll_create1(). · 82ebbeb0
      Tom Lane authored
      Commit fa31b6f4 supposed that we didn't have to worry about that
      anymore, but it seems that RHEL5 is like that, and that's still
      a supported platform.  Put back the prior coding under an #ifdef,
      adding an explicit fcntl() to retain the desired CLOEXEC property.
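The fallback pattern — create the descriptor, then set close-on-exec explicitly since it couldn't be requested atomically at creation — can be sketched in Python's fcntl terms (a conceptual illustration, not the latch.c C code; the pipe stands in for the epoll descriptor):

```python
import fcntl
import os

def set_cloexec(fd):
    # Fallback path: set FD_CLOEXEC explicitly when the descriptor could
    # not be created with the flag atomically (no epoll_create1()).
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

r, w = os.pipe()
os.set_inheritable(r, True)    # simulate an fd created without CLOEXEC
set_cloexec(r)
print(bool(fcntl.fcntl(r, fcntl.F_GETFD) & fcntl.FD_CLOEXEC))  # True
```

Note the non-atomic window between creation and the fcntl() call, which is exactly why epoll_create1(EPOLL_CLOEXEC) is preferred where available.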
      
      Discussion: https://postgr.es/m/12307.1493325329@sss.pgh.pa.us
    • Preserve required !catalog tuples while computing initial decoding snapshot. · 2bef06d5
      Andres Freund authored
      The logical decoding machinery already preserved all the required
      catalog tuples, which is sufficient in the course of normal logical
      decoding, but did not guarantee that non-catalog tuples were preserved
      during computation of the initial snapshot when creating a slot over
      the replication protocol.
      
      This could cause a corrupted initial snapshot to be exported.  The
      time window for issues is usually not terribly large, but on a busy
      server it's perfectly possible to hit it.  Ongoing decoding is not
      affected by this bug.
      
      To avoid increased overhead for the SQL API, only retain additional
      tuples when a logical slot is being created over the replication
      protocol.  To do so this commit changes the signature of
      CreateInitDecodingContext(), but it seems unlikely that it's being
      used in an extension, so that's probably ok.
      
      In a drive-by fix, fix handling of
      ReplicationSlotsComputeRequiredXmin's already_locked argument, which
      should only apply to ProcArrayLock, not ReplicationSlotControlLock.
      
      Reported-By: Erik Rijkers
      Analyzed-By: Petr Jelinek
      Author: Petr Jelinek, heavily editorialized by Andres Freund
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
      Backport: 9.4, where logical decoding was introduced.
    • Make latch.c more paranoid about child-process cases. · fa31b6f4
      Tom Lane authored
      Although the postmaster doesn't currently create a self-pipe or any
      latches, there's discussion of it doing so in future.  It's also
      conceivable that a shared_preload_libraries extension would try to
      create such a thing in the postmaster process today.  In that case
      the self-pipe FDs would be inherited by forked child processes.
      latch.c was entirely unprepared for such a case and could suffer an
      assertion failure, or worse, try to use the inherited pipe if somebody
      called WaitLatch without having called InitializeLatchSupport in that
      process.  Make it keep track of whether InitializeLatchSupport has been
      called in the *current* process, and do the right thing if state has
      been inherited from a parent.
      
      Apply FD_CLOEXEC to file descriptors created in latch.c (the self-pipe,
      as well as epoll event sets).  This ensures that child processes spawned
      in backends, the archiver, etc cannot accidentally or intentionally mess
      with these FDs.  It also ensures that we end up with the right state
      for the self-pipe in EXEC_BACKEND processes, which otherwise wouldn't
      know to close the postmaster's self-pipe FDs.
      
      Back-patch to 9.6, mainly to keep latch.c looking similar in all branches
      it exists in.
      
      Discussion: https://postgr.es/m/8322.1493240739@sss.pgh.pa.us
    • doc: PG10 release note typo fix · a311d2a0
      Bruce Momjian authored
      Reported-by: daniel.westermann
    • doc PG10rel: adjust hash index commits and add parallel subquery · f8ab08ad
      Bruce Momjian authored
      Reported-by: Amit Kapila
    • Rework handling of subtransactions in 2PC recovery · 49e92815
      Simon Riggs authored
      The bug fixed by 0874d4f3
      caused us to question and rework the handling of
      subtransactions in 2PC during and at the end of recovery.
      This patch adds checks and tests to guard against further bugs.
      
      This effectively removes the temporary measure put in place
      by 546c13e1.
      
      Author: Simon Riggs
      Reviewed-by: Tom Lane, Michael Paquier
      Discussion: http://postgr.es/m/CANP8+j+vvXmruL_i2buvdhMeVv5TQu0Hm2+C5N+kdVwHJuor8w@mail.gmail.com
    • Additional tests for subtransactions in recovery · 0352c15e
      Simon Riggs authored
      Tests for normal and prepared transactions
      
      Author: Nikhil Sontakke, placed in new test file by me
    • Fix typo in comment · 6c9bd27a
      Peter Eisentraut authored
      Author: Masahiko Sawada <sawada.mshk@gmail.com>
  4. 26 Apr, 2017 10 commits
    • Allow multiple bgworkers to be launched per postmaster iteration. · aa1351f1
      Tom Lane authored
      Previously, maybe_start_bgworker() would launch at most one bgworker
      process per call, on the grounds that the postmaster might otherwise
      neglect its other duties for too long.  However, that seems overly
      conservative, especially since bad effects only become obvious when
      many hundreds of bgworkers need to be launched at once.  On the other
      side of the coin is that the existing logic could result in substantial
      delay of bgworker launches, because ServerLoop isn't guaranteed to
      iterate immediately after a signal arrives.  (My attempt to fix that
      by using pselect(2) encountered too many portability question marks,
      and in any case could not help on platforms without pselect().)
      One could also question the wisdom of using an O(N^2) processing
      method if the system is intended to support so many bgworkers.
      
      As a compromise, allow that function to launch up to 100 bgworkers
      per call (and in consequence, rename it to maybe_start_bgworkers).
      This will allow any normal parallel-query request for workers
      to be satisfied immediately during sigusr1_handler, avoiding the
      question of whether ServerLoop will be able to launch more promptly.
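The compromise is just a cap on work done per invocation. A minimal Python sketch of the control flow (names are illustrative; the real function walks the registered-worker list in C):

```python
MAX_BGWORKERS_TO_LAUNCH = 100  # the per-call cap chosen by this commit

def maybe_start_bgworkers(pending):
    # Launch queued workers, but at most 100 per call, so the postmaster
    # still returns promptly to its other duties instead of draining an
    # arbitrarily long queue in one go.
    launched = []
    while pending and len(launched) < MAX_BGWORKERS_TO_LAUNCH:
        launched.append(pending.pop(0))
    return launched
```

A typical parallel-query request needs only a handful of workers, so it is satisfied in a single call; only pathological backlogs span multiple postmaster iterations.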
      
      There is talk of rewriting the postmaster to use a WaitEventSet to
      avoid the signal-response-delay problem, but I'd argue that this change
      should be kept even after that happens (if it ever does).
      
      Backpatch to 9.6 where parallel query was added.  The issue exists
      before that, but previous uses of bgworkers typically aren't as
      sensitive to how quickly they get launched.
      
      Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us
    • fda4fec5
      Bruce Momjian authored
    • pg_get_partkeydef: return NULL for non-partitions · 0c76c246
      Stephen Frost authored
      Our general rule for pg_get_X(oid) functions is to simply return NULL
      when passed an invalid or inappropriate OID.  Teach pg_get_partkeydef to
      do this also, making it easier for users to use this function when
      querying against tables with both partitions and non-partitions (such as
      pg_class).
      
      As a concrete example, this makes pg_dump's life a little easier.
      
      Author: Amit Langote
    • Silence compiler warning induced by commit de438971. · 49da0067
      Tom Lane authored
      Smarter compilers can see that "slot" can't be used uninitialized,
      but some popular ones cannot.  Noted by Jeff Janes.
    • doc: ALTER SUBSCRIPTION documentation fixes · e315346d
      Peter Eisentraut authored
      WITH is optional for REFRESH PUBLICATION.  Also, remove a spurious
      bracket and fix some punctuation.
      
      Author: Euler Taveira <euler@timbira.com.br>
    • Fix query that gets remote relation info · 61ecc90b
      Peter Eisentraut authored
      The publisher relation could be chosen incorrectly if there is more
      than one relation with the same name in different schemas.
      
      Author: Euler Taveira <euler@timbira.com.br>
    • Spelling fixes in code comments · e495c168
      Peter Eisentraut authored
      Author: Euler Taveira <euler@timbira.com.br>
    • Fix typo in comment. · 1f8b0601
      Fujii Masao authored
      Author: Masahiko Sawada
    • Fix various concurrency issues in logical replication worker launching · de438971
      Peter Eisentraut authored
      The code was originally written with the assumption that the launcher
      is the only process starting workers.  However, that hasn't been true
      since commit 7c4f5240, which failed to modify the worker management
      code adequately.
      
      This patch adds an in_use field to the LogicalRepWorker struct to
      indicate whether the worker slot is being used and uses proper locking
      everywhere this flag is set or read.
      
      However, if the parent process dies while the new worker is starting
      and the new worker fails to attach to shared memory, this flag would
      never get cleared.  We solve this rare corner case by adding a sort
      of garbage collector for in_use slots.  This uses another field in
      the LogicalRepWorker struct, launch_time, that records when the
      worker was started.  If a request to start a new worker does not
      find a free slot, we check for workers that were supposed to start
      but took too long to actually do so, and reuse their slot.
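The slot-reclamation logic can be sketched as follows — a hedged Python illustration of the in_use/launch_time scheme, not the actual C struct or locking (the grace-period constant and function names are assumptions):

```python
from dataclasses import dataclass

LAUNCH_GRACE_PERIOD_S = 10.0  # assumed attach timeout, for illustration

@dataclass
class WorkerSlot:
    in_use: bool = False      # slot claimed for a worker
    attached: bool = False    # worker actually attached to shared memory
    launch_time: float = 0.0  # when the worker was started

def claim_slot(slots, now):
    # First look for a genuinely free slot.
    for slot in slots:
        if not slot.in_use:
            slot.in_use, slot.launch_time = True, now
            return slot
    # None free: "garbage collect" a slot whose worker was launched long
    # ago but never attached (e.g. its parent died during startup).
    for slot in slots:
        if not slot.attached and now - slot.launch_time > LAUNCH_GRACE_PERIOD_S:
            slot.in_use, slot.launch_time = True, now  # reuse the stale slot
            return slot
    return None  # no slot available
```

In the real code these checks are of course done under the lock protecting the worker array.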
      
      In passing also fix possible race conditions when stopping a worker that
      hasn't finished starting yet.
      
      Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
      Reported-by: Fujii Masao <masao.fujii@gmail.com>
    • doc PG10: add Rafia Sabih to parallel index scan item · 309191f6
      Bruce Momjian authored
      Reported-by: Amit Kapila
  5. 25 Apr, 2017 7 commits