1. 09 Oct, 2017 3 commits
  2. 08 Oct, 2017 3 commits
    • Andres Freund's avatar
      Reduce memory usage of targetlist SRFs. · 84ad4b03
      Andres Freund authored
      Previously nodeProjectSet only released memory once per input tuple,
      rather than once per returned tuple. If the computation of an
      individual returned tuple requires a lot of memory, that can lead to
      problems.
      
      Instead change things so that the expression context can be reset once
      per output tuple, which requires a new memory context to store SRF
      arguments in.
      
      This is a longstanding issue, but was hard to fix before 9.6, due to
      the way tSRFs where evaluated. But it's fairly easy to fix now. We
      could backpatch this into 10, but given there've been fewc omplaints
      that doesn't seem worth the risk so far.
      
      Reported-By: Lucas Fairchild
      Author: Andres Freund, per discussion with Tom Lane
      Discussion: https://postgr.es/m/4514.1507318623@sss.pgh.pa.us
      84ad4b03
    • Tom Lane's avatar
      Increase distance between flush requests during bulk file copies. · 643c27e3
      Tom Lane authored
      copy_file() reads and writes data 64KB at a time (with default BLCKSZ),
      and historically has issued a pg_flush_data request after each write.
      This turns out to interact really badly with macOS's new APFS file
      system: a large file copy takes over 100X longer than it ought to on
      APFS, as reported by Brent Dearth.  While that's arguably a macOS bug,
      it's not clear whether Apple will do anything about it in the near
      future, and in any case experimentation suggests that issuing flushes
      a bit less often can be helpful on other platforms too.
      
      Hence, rearrange the logic in copy_file() so that flush requests are
      issued once per N writes rather than every time through the loop.
      I set the FLUSH_DISTANCE to 32MB on macOS (any less than that still
      results in a noticeable speed degradation on APFS), but 1MB elsewhere.
      In limited testing on Linux and FreeBSD, this seems slightly faster
      than the previous code, and certainly no worse.  It helps noticeably
      on macOS even with the older HFS filesystem.
      
      A simpler change would have been to just increase the size of the
      copy buffer without changing the loop logic, but that seems likely
      to trash the processor cache without really helping much.
      
      Back-patch to 9.6 where we introduced msync() as an implementation
      option for pg_flush_data().  The problem seems specific to APFS's
      mmap/msync support, so I don't think we need to go further back.
      
      Discussion: https://postgr.es/m/CADkxhTNv-j2jw2g8H57deMeAbfRgYBoLmVuXkC=YCFBXRuCOww@mail.gmail.com
      643c27e3
    • Tom Lane's avatar
      Reduce "X = X" to "X IS NOT NULL", if it's easy to do so. · 8ec5429e
      Tom Lane authored
      If the operator is a strict btree equality operator, and X isn't volatile,
      then the clause must yield true for any non-null value of X, or null if X
      is null.  At top level of a WHERE clause, we can ignore the distinction
      between false and null results, so it's valid to simplify the clause to
      "X IS NOT NULL".  This is a useful improvement mainly because we'll get
      a far better selectivity estimate in most cases.
      
      Because such cases seldom arise in well-written queries, it is unappetizing
      to expend a lot of planner cycles looking for them ... but it turns out
      that there's a place we can shoehorn this in practically for free, because
      equivclass.c already has to detect and reject candidate equivalences of the
      form X = X.  That doesn't catch every place that it would be valid to
      simplify to X IS NOT NULL, but it catches the typical case.  Working harder
      doesn't seem justified.
      
      Patch by me, reviewed by Petr Jelinek
      
      Discussion: https://postgr.es/m/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com
      8ec5429e
  3. 07 Oct, 2017 3 commits
  4. 06 Oct, 2017 10 commits
    • Tom Lane's avatar
      Fix crash when logical decoding is invoked from a PL function. · 1518d078
      Tom Lane authored
      The logical decoding functions do BeginInternalSubTransaction and
      RollbackAndReleaseCurrentSubTransaction to clean up after themselves.
      It turns out that AtEOSubXact_SPI has an unrecognized assumption that
      we always need to cancel the active SPI operation in the SPI context
      that surrounds the subtransaction (if there is one).  That's true
      when the RollbackAndReleaseCurrentSubTransaction call is coming from
      the SPI-using function itself, but not when it's happening inside
      some unrelated function invoked by a SPI query.  In practice the
      affected callers are the various PLs.
      
      To fix, record the current subtransaction ID when we begin a SPI
      operation, and clean up only if that ID is the subtransaction being
      canceled.
      
      Also, remove AtEOSubXact_SPI's assertion that it must have cleaned
      up the surrounding SPI context's active tuptable.  That's proven
      wrong by the same test case.
      
      Also clarify (or, if you prefer, reinterpret) the calling conventions
      for _SPI_begin_call and _SPI_end_call.  The memory context cleanup
      in the latter means that these have always had the flavor of a matched
      resource-management pair, but they weren't documented that way before.
      
      Per report from Ben Chobot.
      
      Back-patch to 9.4 where logical decoding came in.  In principle,
      the SPI changes should go all the way back, since the problem dates
      back to commit 7ec1c5a8.  But given the lack of field complaints
      it seems few people are using internal subtransactions in this way.
      So I don't feel a need to take any risks in 9.2/9.3.
      
      Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com
      1518d078
    • Robert Haas's avatar
      Copy information from the relcache instead of pointing to it. · 45866c75
      Robert Haas authored
      We have the relations continuously locked, but not open, so relcache
      pointers are not guaranteed to be stable.  Per buildfarm member
      prion.
      
      Ashutosh Bapat.  I fixed a typo.
      
      Discussion: http://postgr.es/m/CAFjFpRcRBqoKLZSNmRsjKr81uEP=ennvqSQaXVCCBTXvJ2rW+Q@mail.gmail.com
      45866c75
    • Tom Lane's avatar
      Fix intra-query memory leakage in nodeProjectSet.c. · a1c2c430
      Tom Lane authored
      Both ExecMakeFunctionResultSet() and evaluation of simple expressions
      need to be done in the per-tuple memory context, not per-query, else
      we leak data until end of query.  This is a consideration that was
      missed while refactoring code in the ProjectSet patch (note that in
      pre-v10, ExecMakeFunctionResult is called in the per-tuple context).
      
      Per bug #14843 from Ben M.  Diagnosed independently by Andres and myself.
      
      Discussion: https://postgr.es/m/20171005230321.28561.15927@wrigleys.postgresql.org
      a1c2c430
    • Tom Lane's avatar
      Fix access-off-end-of-array in clog.c. · 6b87416c
      Tom Lane authored
      Sloppy loop coding in set_status_by_pages() resulted in fetching one array
      element more than it should from the subxids[] array.  The odds of this
      resulting in SIGSEGV are pretty small, but we've certainly seen that happen
      with similar mistakes elsewhere.  While at it, we can get rid of an extra
      TransactionIdToPage() calculation per loop.
      
      Per report from David Binderman.  Back-patch to all supported branches,
      since this code is quite old.
      
      Discussion: https://postgr.es/m/HE1PR0802MB2331CBA919CBFFF0C465EB429C710@HE1PR0802MB2331.eurprd08.prod.outlook.com
      6b87416c
    • Peter Eisentraut's avatar
      Support coverage on vpath builds · c3d9a660
      Peter Eisentraut authored
      A few paths needed to be tweaked so everything looks into the
      appropriate directories.
      Reviewed-by: default avatarMichael Paquier <michael.paquier@gmail.com>
      c3d9a660
    • Peter Eisentraut's avatar
      Run coverage commands quietly · 52e1b1b0
      Peter Eisentraut authored
      They are very chatty by default, but the output doesn't seem all that
      useful for normal operation.
      Reviewed-by: default avatarMichael Paquier <michael.paquier@gmail.com>
      52e1b1b0
    • Peter Eisentraut's avatar
      Remove coverage details view · c0112363
      Peter Eisentraut authored
      This is only useful if we name the different tests, which we don't do at
      the moment.
      Reviewed-by: default avatarMichael Paquier <michael.paquier@gmail.com>
      c0112363
    • Tom Lane's avatar
      #ifdef out some dead code in psql/mainloop.c. · 3620569f
      Tom Lane authored
      This pg_send_history() call is unreachable, since the block it's in
      is currently only entered in !cur_cmd_interactive mode.  But rather
      than just delete it, make it #ifdef NOT_USED, in hopes that we'll
      remember to enable it if we ever change that decision.
      
      Per report from David Binderman.  Since this is basically cosmetic,
      I see no great need to back-patch.
      
      Discussion: https://postgr.es/m/HE1PR0802MB233122B61F00A15E035C83BE9C710@HE1PR0802MB2331.eurprd08.prod.outlook.com
      3620569f
    • Alvaro Herrera's avatar
      Fix traversal of half-frozen update chains · a5736bf7
      Alvaro Herrera authored
      When some tuple versions in an update chain are frozen due to them being
      older than freeze_min_age, the xmax/xmin trail can become broken.  This
      breaks HOT (and probably other things).  A subsequent VACUUM can break
      things in more serious ways, such as leaving orphan heap-only tuples
      whose root HOT redirect items were removed.  This can be seen because
      index creation (or REINDEX) complain like
        ERROR:  XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t"
      
      Because of relfrozenxid contraints, we cannot avoid the freezing of the
      early tuples, so we must cope with the results: whenever we see an Xmin
      of FrozenTransactionId, consider it a match for whatever the previous
      Xmax value was.
      
      This problem seems to have appeared in 9.3 with multixact changes,
      though strictly speaking it seems unrelated.
      
      Since 9.4 we have commit 37484ad2 "Change the way we mark tuples as
      frozen", so the fix is simple: just compare the raw Xmin (still stored
      in the tuple header, since freezing merely set an infomask bit) to the
      Xmax.  But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so
      the original value is lost and we have nothing to compare the Xmax with.
      To cope with that case we need to compare the Xmin with FrozenXid,
      assume it's a match, and hope for the best.  Sadly, since you can
      pg_upgrade a 9.3 instance containing half-frozen pages to newer
      releases, we need to keep the old check in newer versions too, which
      seems a bit brittle; I hope we can somehow get rid of that.
      
      I didn't optimize the new function for performance.  The new coding is
      probably a bit slower than before, since there is a function call rather
      than a straight comparison, but I'd rather have it work correctly than
      be fast but wrong.
      
      This is a followup after 20b65522 fixed a few related problems.
      Apparently, in 9.6 and up there are more ways to get into trouble, but
      in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so
      there must be a separate bug.
      
      Reported-by: Peter Geoghegan
      Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood,
      	Yi Wen Wong, Álvaro
      Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
      a5736bf7
    • Robert Haas's avatar
      Basic partition-wise join functionality. · f49842d1
      Robert Haas authored
      Instead of joining two partitioned tables in their entirety we can, if
      it is an equi-join on the partition keys, join the matching partitions
      individually.  This involves teaching the planner about "other join"
      rels, which are related to regular join rels in the same way that
      other member rels are related to baserels.  This can use significantly
      more CPU time and memory than regular join planning, because there may
      now be a set of "other" rels not only for every base relation but also
      for every join relation.  In most practical cases, this probably
      shouldn't be a problem, because (1) it's probably unusual to join many
      tables each with many partitions using the partition keys for all
      joins and (2) if you do that scenario then you probably have a big
      enough machine to handle the increased memory cost of planning and (3)
      the resulting plan is highly likely to be better, so what you spend in
      planning you'll make up on the execution side.  All the same, for now,
      turn this feature off by default.
      
      Currently, we can only perform joins between two tables whose
      partitioning schemes are absolutely identical.  It would be nice to
      cope with other scenarios, such as extra partitions on one side or the
      other with no match on the other side, but that will have to wait for
      a future patch.
      
      Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
      Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
      Khandekar, and by me.  A few final adjustments by me.
      
      Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
      Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
      f49842d1
  5. 05 Oct, 2017 10 commits
  6. 04 Oct, 2017 5 commits
  7. 03 Oct, 2017 3 commits
    • Tom Lane's avatar
      Allow multiple tables to be specified in one VACUUM or ANALYZE command. · 11d8d72c
      Tom Lane authored
      Not much to say about this; does what it says on the tin.
      
      However, formerly, if there was a column list then the ANALYZE action was
      implied; now it must be specified, or you get an error.  This is because
      it would otherwise be a bit unclear what the user meant if some tables
      have column lists and some don't.
      
      Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some
      editorialization by me
      
      Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com
      11d8d72c
    • Tom Lane's avatar
      Fix race condition with unprotected use of a latch pointer variable. · 45f9d086
      Tom Lane authored
      Commit 597a87cc introduced a latch pointer variable to replace use
      of a long-lived shared latch in the shared WalRcvData structure.
      This was not well thought out, because there are now hazards of the
      pointer variable changing while it's being inspected by another
      process.  This could obviously lead to a core dump in code like
      
      	if (WalRcv->latch)
      		SetLatch(WalRcv->latch);
      
      and there's a more remote risk of a torn read, if we have any
      platforms where reading/writing a pointer is not atomic.
      
      An actual problem would occur only if the walreceiver process
      exits (gracefully) while the startup process is trying to
      signal it, but that seems well within the realm of possibility.
      
      To fix, treat the pointer variable (not the referenced latch)
      as being protected by the WalRcv->mutex spinlock.  There
      remains a race condition that we could apply SetLatch to a
      process latch that no longer belongs to the walreceiver, but
      I believe that's harmless: at worst it'd cause an extra wakeup
      of the next process to use that PGPROC structure.
      
      Back-patch to v10 where the faulty code was added.
      
      Discussion: https://postgr.es/m/22735.1507048202@sss.pgh.pa.us
      45f9d086
    • Alvaro Herrera's avatar
      Fix coding rules violations in walreceiver.c · 89e434b5
      Alvaro Herrera authored
      1. Since commit b1a9bad9 we had pstrdup() inside a
      spinlock-protected critical section; reported by Andreas Seltenreich.
      Turn those into strlcpy() to stack-allocated variables instead.
      Backpatch to 9.6.
      
      2. Since commit 9ed551e0 we had a pfree() uselessly inside a
      spinlock-protected critical section.  Tom Lane noticed in code review.
      Move down.  Backpatch to 9.6.
      
      3. Since commit 64233902 we had GetCurrentTimestamp() (a kernel
      call) inside a spinlock-protected critical section.  Tom Lane noticed in
      code review.  Move it up.  Backpatch to 9.2.
      
      4. Since commit 1bb25580 we did elog(PANIC) while holding spinlock.
      Tom Lane noticed in code review.  Release spinlock before dying.
      Backpatch to 9.2.
      
      Discussion: https://postgr.es/m/87h8vhtgj2.fsf@ansel.ydns.eu
      89e434b5
  8. 02 Oct, 2017 3 commits