1. 13 Apr, 2016 5 commits
    • Avoid atomic operation in MarkLocalBufferDirty(). · 6b93fcd1
      Andres Freund authored
      The recent patch to make Pin/UnpinBuffer lockfree in the hot
      path (48354581) accidentally used pg_atomic_fetch_or_u32() in
      MarkLocalBufferDirty(). Other code operating on local buffers was
      careful to only use pg_atomic_read/write_u32, which just read/write from
      memory, to avoid unnecessary overhead.
      
      On its own that'd just make MarkLocalBufferDirty() slightly less
      efficient, but in addition InitLocalBuffers() doesn't call
      pg_atomic_init_u32() - thus the spinlock fallback for the atomic
      operations isn't initialized. That in turn caused, as reported by Tom,
      buildfarm animal gaur to fail.  As those errors are actually useful
      against this type of error, continue to omit - intentionally this time -
      initialization of the atomic variable.
      
      In addition, add an explicit note about only using pg_atomic_read/write
      on local buffers' state to BufferDesc's description.
      
      Reported-By: Tom Lane
      Discussion: 1881.1460431476@sss.pgh.pa.us
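The difference between the two access styles described in this commit can be sketched with C11 atomics standing in for PostgreSQL's pg_atomic_* API. This is only an illustration under that assumption: `SKETCH_DIRTY`, `mark_dirty_shared()` and `mark_dirty_local()` are hypothetical names, not PostgreSQL code. A local buffer is touched by a single backend, so a plain read followed by a plain write produces the same result as an atomic read-modify-write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_DIRTY 0x1u   /* hypothetical flag bit, not PostgreSQL's BM_DIRTY */

/* Shared-buffer style: one atomic read-modify-write, safe under concurrency. */
void mark_dirty_shared(_Atomic uint32_t *state)
{
    atomic_fetch_or(state, SKETCH_DIRTY);
}

/* Local-buffer style: a separate read and write. This is only valid when
 * a single process ever touches the variable. */
void mark_dirty_local(_Atomic uint32_t *state)
{
    uint32_t v = atomic_load_explicit(state, memory_order_relaxed);

    atomic_store_explicit(state, v | SKETCH_DIRTY, memory_order_relaxed);
}
```

Both functions leave the same value behind; the second simply avoids the cost of a read-modify-write primitive where no concurrent writer can exist.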
    • Widen amount-to-flush arguments of FileWriteback and callers. · 95ef43c4
      Tom Lane authored
      It's silly to define these counts as narrower than they might someday
      need to be.  Also, I believe that the BLCKSZ * nflush calculation in
      mdwriteback was capable of overflowing an int.
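The overflow concern can be illustrated with a minimal sketch. It assumes a block size of 8192 bytes (PostgreSQL's default BLCKSZ) and uses `int64_t` purely for illustration; the names `SKETCH_BLCKSZ` and `flush_length_bytes()` are hypothetical, and the types chosen in the actual commit may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192   /* assumed block size for this sketch */

/* Widen one operand before multiplying, so the product is computed in
 * 64 bits instead of overflowing a 32-bit int. */
int64_t flush_length_bytes(int64_t nblocks)
{
    return (int64_t) SKETCH_BLCKSZ * nblocks;
}
```

With 300000 blocks the product is 2457600000 bytes, which exceeds INT_MAX on platforms with a 32-bit int.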
    • Fix assorted portability issues with using msync() for data flushing. · fa11a09f
      Tom Lane authored
      Commit 428b1d6b introduced the use of
      msync() for flushing dirty data from the kernel's file buffers.  Several
      portability issues were overlooked, though:
      
      * Not all implementations of mmap() think that nbytes == 0 means "map
      the whole file".  To fix, use lseek() to find out the true length.
      Fix callers of pg_flush_data to be aware that nbytes == 0 may result
      in trashing the file's seek position.
      
      * Not all implementations of mmap() will accept partial-page mmap
      requests.  To fix, round down the length request to whatever sysconf()
      says the page size is.  (I think this is OK from a portability standpoint,
      because sysconf() is required by SUS v2, and we aren't trying to compile
      this part on Windows anyway.  Buildfarm should let us know if not.)
      
      * On 32-bit machines, the file size might exceed the available free
      address space, or even exceed what will fit in size_t.  Check for
      the latter explicitly to avoid passing a false request size to mmap().
      If mmap fails, silently fall through to the next implementation method,
      rather than bleating to the postmaster log and giving up.
      
      * mmap'ing directories fails on some platforms, and even if it works,
      msync'ing the directory is quite unlikely to help, as for that matter are
      the other flush implementations.  In pre_sync_fname(), just skip flush
      attempts on directories.
      
      In passing, copy-edit the comments a bit.
      
      Stas Kelvich and myself
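Two of the checks listed in this commit, rounding the request down to the page size reported by sysconf() and refusing lengths that do not fit in size_t, might look roughly like the following. This is a sketch under stated assumptions, not the committed code: the function names are hypothetical, and returning 0 stands in for "skip mmap and fall through to the next flush method".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Round a byte count down to a whole number of pages. */
size_t round_down_to_page(size_t nbytes, size_t pagesize)
{
    return nbytes - (nbytes % pagesize);
}

/* Return the length that could be passed to mmap(), or 0 if the request
 * should be skipped (negative length, too large for size_t, or unknown
 * page size). */
size_t mappable_length(off_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);

    if (pagesize <= 0 || len < 0)
        return 0;
    if ((uintmax_t) len > (uintmax_t) SIZE_MAX)
        return 0;
    return round_down_to_page((size_t) len, (size_t) pagesize);
}
```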
    • Improve documentation for \crosstabview. · 85e00470
      Tom Lane authored
      Fix misleading syntax summary (there cannot be a space between colH and
      scolH).  Provide a link from the existing crosstab() function's
      documentation to \crosstabview.  Copy-edit the command's description.
      
      Christoph Berg and Tom Lane
    • Use PG_INT32_MIN instead of reiterating the constant. · cbb2a812
      Robert Haas authored
      Makes no difference, but it's cleaner this way.
      
      Michael Paquier
  2. 12 Apr, 2016 13 commits
    • Provide errno-translation wrappers around bind() and listen() on Windows. · d1b7d487
      Tom Lane authored
      I've seen one too many "could not bind IPv4 socket: No error" log entries
      from the Windows buildfarm members.  Per previous discussion, this is
      likely caused by the fact that we're doing nothing to translate
      WSAGetLastError() to errno.  Put in a wrapper layer to do that.
      
      If this works as expected, it should get back-patched, but let's see what
      happens in the buildfarm first.
      
      Discussion: <4065.1452450340@sss.pgh.pa.us>
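The idea of a translation layer can be sketched as a small mapping function. This is a portable illustration only: the Winsock codes are reproduced as numeric constants so the example compiles anywhere, the names prefixed with `SKETCH_` and `sketch_` are hypothetical, and the fallback to EINVAL is an assumption rather than what the commit does.

```c
#include <assert.h>
#include <errno.h>

/* Winsock error codes, written out numerically for portability. */
#define SKETCH_WSAEACCES        10013
#define SKETCH_WSAEINVAL        10022
#define SKETCH_WSAEADDRINUSE    10048
#define SKETCH_WSAEADDRNOTAVAIL 10049

/* Map a Winsock error code to the corresponding errno value. */
int sketch_wsa_to_errno(int wsa_error)
{
    switch (wsa_error)
    {
        case SKETCH_WSAEACCES:
            return EACCES;
        case SKETCH_WSAEINVAL:
            return EINVAL;
        case SKETCH_WSAEADDRINUSE:
            return EADDRINUSE;
        case SKETCH_WSAEADDRNOTAVAIL:
            return EADDRNOTAVAIL;
        default:
            return EINVAL;      /* hypothetical fallback for unmapped codes */
    }
}
```

A wrapper around bind() would then, on failure, set errno from the translated result of WSAGetLastError(), so that the usual error-message machinery reports something more useful than "No error".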
    • Fix costing for parallel aggregation. · deb71fa9
      Robert Haas authored
      The original patch kind of ignored the fact that we were doing something
      different from a costing point of view, but nobody noticed.  This patch
      fixes that oversight.
      
      David Rowley
    • Remove unused function GetOldestWALSendPointer from walsender code. · 46d73e0d
      Fujii Masao authored
      That unused function was introduced as a sample because synchronous
      replication or replication monitoring tools might need it in the future.
      Recently commit 989be081 added the function SyncRepGetOldestSyncRecPtr
      which provides almost the same functionality for multiple synchronous
      standbys feature. So it's time to remove that unused sample function.
      This commit does that.
    • Redefine create_upper_paths_hook as being invoked once per upper relation. · f1f01de1
      Tom Lane authored
      Per discussion, this gives potential users of the hook more flexibility,
      because they can build custom Paths that implement only one stage of
      upper processing atop core-provided Paths for earlier stages.
    • Improve coding of column-name parsing in psql's new crosstabview.c. · 7a5f8b5c
      Tom Lane authored
      Coverity complained about this code, not without reason because it was
      rather messy.  Adjust it to not scribble on the passed string; that adds
      one malloc/free cycle per column name, which is going to be insignificant
      in context.  We can actually const-ify both the string argument and the
      PGresult.
      
      Daniel Verité, with some further cleanup by me
    • Avoid extra locks in GetSnapshotData if old_snapshot_threshold < 0 · 2201d801
      Kevin Grittner authored
      On a big NUMA machine with 1000 connections in saturation load
      there was a performance regression due to spinlock contention, for
      acquiring values which were never used.  Just fill with dummy
      values if we're not going to use them.
      
      This patch has not been benchmarked yet on a big NUMA machine, but
      it seems like a good idea on general principle, and it seemed to
      prevent an apparent 2.2% regression on a single-socket i7 box
      running 200 connections at saturation load.
    • Improve API of GenericXLogRegister(). · 5713f039
      Tom Lane authored
      Rename this function to GenericXLogRegisterBuffer() to make it clearer
      what it does, and leave room for other sorts of "register" actions in
      future.  Also, replace its "bool isNew" argument with an integer flags
      argument, so as to allow adding more flags in future without an API
      break.
      
      Alexander Korotkov, adjusted slightly by me
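The API pattern described here, replacing a boolean parameter with an integer flags word, can be shown in miniature. The flag names and the function below are hypothetical and only illustrate why a flags argument leaves room for growth; they are not the GenericXLogRegisterBuffer() signature.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_FULL_IMAGE  0x0001   /* hypothetical: takes over the old "isNew" meaning */
#define SKETCH_FUTURE_FLAG 0x0002   /* hypothetical: a later addition, no API break */

/* Callers pass a bitmask; new bits can be added without changing the
 * function signature or existing call sites. */
bool sketch_wants_full_image(int flags)
{
    return (flags & SKETCH_FULL_IMAGE) != 0;
}
```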
    • In generic WAL application and replay, ensure page "hole" is always zero. · bdf7db81
      Tom Lane authored
      The previous coding could allow the contents of the "hole" between pd_lower
      and pd_upper to diverge during replay from what it had been when the update
      was originally applied.  This would pose a problem if checksums were in
      use, and in any case would complicate forensic comparisons between master
      and slave servers.  So force the "hole" to contain zeroes, both at initial
      application of a generically-logged action, and at replay.
      
      Alexander Korotkov, adjusted slightly by me
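Forcing the hole to zeroes amounts to clearing the bytes between the two offsets. The sketch below assumes a tiny 64-byte page and plain offsets passed as arguments; `SKETCH_PAGE_SIZE` and `zero_page_hole()` are hypothetical names, and the real code reads pd_lower and pd_upper from the page header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64   /* tiny page, for illustration only */

/* Clear the unused region [lower, upper) so that the page contents are
 * identical wherever the same change is applied. */
void zero_page_hole(unsigned char *page, size_t lower, size_t upper)
{
    if (lower < upper && upper <= SKETCH_PAGE_SIZE)
        memset(page + lower, 0, upper - lower);
}
```

Applying the same step both at original application and at replay is what keeps the two copies byte-for-byte comparable.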
    • Add page id to bloom index · 813b456e
      Teodor Sigaev authored
      Added to ensure that bloom index pages can be distinguished from other pages
      by pg_filedump. Because there weren't any public/production versions before,
      this change doesn't attempt to address any compatibility issues.
      
      Per notice from Tom Lane
    • Remove unnecessary definition of _WIN64 in libpq/win32.mak. · e7bcde8c
      Tom Lane authored
      In commit b0e40d18, I should have just
      removed the /D switch defining WIN64.  The reason the code worked before
      is that all Windows64 compilers automatically predefine _WIN64.  Perhaps
      at one time we had code that depended on WIN64 being defined, but it's
      long gone, and we should not encourage any reappearance.  Per discussion
      with Christian Ullrich.
    • Correct copyright for newly added genericdesc.c · cd13471f
      Stephen Frost authored
      It's 2016 these days (no, not entirely sure how we got here either).
      
      Pointed out by Amit Langote
    • Fix whitespace · 70715e6a
      Peter Eisentraut authored
    • Fix _SPI_execute_plan() for CREATE TABLE IF NOT EXISTS foo AS ... · 39c283e4
      Tom Lane authored
      When IF NOT EXISTS was added to CREATE TABLE AS, this logic didn't get
      the memo, possibly resulting in an Assert failure.  It looks like there
      would have been no ill effects in a non-Assert build, though.  Back-patch
      to 9.5 where the IF NOT EXISTS option was added.
      
      Stas Kelvich
  3. 11 Apr, 2016 18 commits
    • Fix two places that thought Windows64 is indicated by WIN64 macro. · b0e40d18
      Tom Lane authored
      Everyplace else thinks it's _WIN64, so make these places fall in line.
      
      The pg_regress.c usage is not going to result in any change in behavior,
      only suppressing (or not) a compiler warning about downcasting HANDLEs.
      So there seems no need for back-patching there.
      
      The libpq/win32.mak usage might represent an actual bug, if anyone were
      using this script to build for Windows64, which perhaps nobody is.
      Given the lack of field complaints, no back-patch here either.
      
      pg_regress.c problem found by Christian Ullrich, the other by me.
    • Fix freshly-introduced PL/Python portability bug. · 1d2f9de3
      Tom Lane authored
      It turns out that those PyErr_Clear() calls I removed from plpy_elog.c
      in 7e3bb080 et al were not quite as random as they appeared: they
      mask a Python 2.3.x bug.  (Specifically, it turns out that PyType_Ready()
      can fail if the error indicator is set on entry, and PLy_traceback's fetch
      of frame.f_code may be the first operation in a session that requires the
      "frame" type to be readied.  Ick.)  Put back the clear call, but in a more
      centralized place closer to what it's protecting, and this time with a
      comment warning what it's really for.
      
      Per buildfarm member prairiedog.  Although prairiedog was only failing
      on HEAD, it seems clearly possible for this to occur in older branches
      as well, so back-patch to 9.2 the same as the previous patch.
    • Use static inline function for BufferGetPage() · a6f6b781
      Kevin Grittner authored
      I was initially concerned that some of the hundreds of
      references to BufferGetPage() where the literal
      BGP_NO_SNAPSHOT_TEST were passed might not optimize as well as a
      macro, leading to some hard-to-find performance regressions in
      corner cases.  Inspection of disassembled code has shown identical
      code at all inspected locations, and the size difference doesn't
      amount to even one byte per such call.  So make it readable.
      
      Per gripes from Álvaro Herrera and Tom Lane
    • Make oldSnapshotControl a pointer to a volatile structure · 80647bf6
      Kevin Grittner authored
      It was incorrectly declared as a volatile pointer to a non-volatile
      structure.  Eliminate the OldSnapshotControl struct definition; it
      is really not needed.  Pointed out by Tom Lane.
      
      While at it, add OldSnapshotControlData to pgindent's list of
      structures.
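The distinction this commit corrects comes down to where the `volatile` qualifier sits in the declaration. The sketch below uses a hypothetical `SketchControl` struct rather than the real OldSnapshotControlData, and only illustrates the two spellings.

```c
#include <assert.h>

typedef struct SketchControl
{
    int threshold;
} SketchControl;

SketchControl sketch_storage = {7};

/* Volatile pointer to a non-volatile struct: only the pointer variable
 * itself is volatile; reads of the struct's fields may still be cached. */
SketchControl *volatile ptr_is_volatile = &sketch_storage;

/* Pointer to a volatile struct: every access through the pointer is a
 * volatile access, which is what shared state usually needs. */
volatile SketchControl *data_is_volatile = &sketch_storage;

int read_threshold(void)
{
    return data_is_volatile->threshold;
}
```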
    • Fix whitespace · d8ed83cd
      Peter Eisentraut authored
    • Prefix RLS regression test roles with 'regress_' · 6c7b0388
      Stephen Frost authored
      To avoid any possible overlap with existing roles on a system when
      doing a 'make installcheck', use role names which start with
      'regress_'.
      
      Pointed out by Tom.
    • Peter Eisentraut's avatar
      29ca231b
    • Fix missing "volatile" in PLy_output(). · 81ba9348
      Tom Lane authored
      Commit 5c3c3cd0 plastered "volatile" on a bunch of variables
      in PLy_output(), but removed the one that actually mattered, ie the
      one on "oldcontext".  This allows some versions of clang to generate
      code in which "oldcontext" has been trashed when control reaches the
      PG_CATCH block.  Per buildfarm member tick.
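PG_TRY/PG_CATCH are built on sigsetjmp()/siglongjmp(), and the general hazard can be reproduced with plain setjmp()/longjmp(). The sketch below is not the PLy_output() code; `sketch_raise()` and `saved_value_after_jump()` are hypothetical names. It shows a local variable that is read after the jump and is therefore declared volatile, since the C standard leaves non-volatile locals modified between setjmp() and longjmp() indeterminate.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf sketch_env;

/* Stand-in for an error being thrown: jump back to the setjmp() site. */
void sketch_raise(void)
{
    longjmp(sketch_env, 1);
}

int saved_value_after_jump(void)
{
    volatile int saved = 0;     /* volatile: must survive the longjmp */

    if (setjmp(sketch_env) == 0)
    {
        saved = 42;
        sketch_raise();         /* does not return here */
    }

    /* Reached through longjmp(); "saved" is well-defined only because
     * it is declared volatile. */
    return saved;
}
```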
    • cpluspluscheck: Update include path · ee5dbc81
      Peter Eisentraut authored
      Some things in src/include/fe_utils require libpq headers, so add
      libpq's include path to the command line used here.
    • Fujii Masao's avatar
    • Use ereport(ERROR) instead of Assert() to emit syncrep_parser error. · 0038c1e2
      Fujii Masao authored
      The existing code would either Assert or generate an invalid
      SyncRepConfig variable, neither of which is desirable. A regular
      error should be thrown instead.
      
      This commit silences a compiler warning in non-assertion-enabled builds.
      
      Per report from Jeff Janes.
      Suggested fix by Tom Lane.
    • Fix poorly thought-through code from commit 5c3c3cd0. · f73b2bbb
      Tom Lane authored
      It's not entirely clear to me whether PyString_AsString can return
      null (looks like the answer might vary between Python 2 and 3).
      But in any case, this code's attempt to cope with the possibility
      was quite broken, because pstrdup() neither allows a null argument
      nor ever returns a null.
      
      Moreover, the code below this point assumes that "message" is a
      palloc'd string, which would not be the case for a dgettext result.
      
      Fix both problems by doing the pstrdup step separately.
    • pg_dump: add missing "destroyPQExpBuffer(query)" in dumpForeignServer(). · 074050f1
      Tom Lane authored
      Coverity complained about this resource leak (why now, I don't know,
      since it's been like that a long time).  Our general policy in pg_dump
      is that PQExpBuffers are worth cleaning up, so do it here too.  But
      don't bother with a back-patch, because it seems unlikely that very
      many databases contain enough FOREIGN SERVER objects to notice.
    • Add comment about intentional fallthrough in switch. · 1630f5b9
      Tom Lane authored
      Coverity complained about an apparent missing "break" in a switch
      added by bb140506.  The human-readable comments are pretty
      clear that this is intentional, but add a standard /* FALL THRU */
      comment to make it clear to tools too.
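For readers unfamiliar with the convention, an intentional fall-through marked for static-analysis tools looks like the following. The function and its case values are hypothetical and unrelated to the switch touched by this commit.

```c
#include <assert.h>

/* Case 2 deliberately continues into case 1; the comment tells both
 * human readers and tools that the missing "break" is intended. */
int sketch_weight(int kind)
{
    int weight = 0;

    switch (kind)
    {
        case 2:
            weight += 10;
            /* FALL THRU */
        case 1:
            weight += 1;
            break;
        default:
            break;
    }
    return weight;
}
```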
    • Clean up foreign-key caching code in planner. · 5306df28
      Tom Lane authored
      Coverity complained that the code added by 015e8894 lacked an
      error check for SearchSysCache1 failures, which it should have.  But
      the code was pretty duff in other ways too, including failure to think
      about whether it could really cope with arrays of different lengths.
    • Fix access-to-already-freed-memory issue in plpython's error handling. · 7e3bb080
      Tom Lane authored
      PLy_elog() could attempt to access strings that Python had already freed,
      because the strings that PLy_get_spi_error_data() returns are simply
      pointers into storage associated with the error "val" PyObject.  That's
      fine at the instant PLy_get_spi_error_data() returns them, but just after
      that PLy_traceback() intentionally releases the only refcount on that
      object, allowing it to be freed --- so that the strings we pass to
      ereport() are dangling pointers.
      
      In principle this could result in garbage output or a coredump.  In
      practice, I think the risk is pretty low, because there are no Python
      operations between where we decrement that refcount and where we use the
      strings (and copy them into PG storage), and thus no reason for Python
      to recycle the storage.  Still, it's clearly hazardous, and it leads to
      Valgrind complaints when running under a Valgrind that hasn't been
      lobotomized to ignore Python memory allocations.
      
      The code was a mess anyway: we fetched the error data out of Python
      (clearing Python's error indicator) with PyErr_Fetch, examined it, pushed
      it back into Python with PyErr_Restore (re-setting the error indicator),
      then immediately pulled it back out with another PyErr_Fetch.  Just to
      confuse matters even more, there were some gratuitous-and-yet-hazardous
      PyErr_Clear calls in the "examine" step, and we didn't get around to doing
      PyErr_NormalizeException until after the second PyErr_Fetch, making it even
      less clear which object was being manipulated where and whether we still
      had a refcount on it.  (If PyErr_NormalizeException did substitute a
      different "val" object, it's possible that the problem could manifest for
      real, because then we'd be doing assorted Python stuff with no refcount
      on the object we have string pointers into.)
      
      So, rearrange all that into some semblance of sanity, and don't decrement
      the refcount on the Python error objects until the end of PLy_elog().
      In HEAD, I failed to resist the temptation to reformat some messy bits
      from 5c3c3cd0 along the way.
      
      Back-patch as far as 9.2, because the code is substantially the same
      that far back.  I believe that 9.1 has the bug as well; but the code
      around it is rather different and I don't want to take a chance on
      breaking something for what seems a low-probability problem.
    • Avoid the use of a separate spinlock to protect a LWLock's wait queue. · 008608b9
      Andres Freund authored
      Previously we used a spinlock, in addition to the atomically manipulated
      ->state field, to protect the wait queue. But it's pretty simple to
      instead perform the locking using a flag in state.
      
      Due to 6150a1b0, BufferDescs on platforms (like PPC) with > 1 byte
      spinlocks grew to more than 64 bytes. As 64 bytes is the size
      we pad allocated BufferDescs to, this can increase false sharing,
      causing performance problems in turn. Together with the previous commit
      this reduces the size to <= 64 bytes on all common platforms.
      
      Author: Andres Freund
      Discussion: CAA4eK1+ZeB8PMwwktf+3bRS0Pt4Ux6Rs6Aom0uip8c6shJWmyg@mail.gmail.com
          20160327121858.zrmrjegmji2ymnvr@alap3.anarazel.de
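Locking via a flag bit in an atomic state word, as this commit describes, can be sketched with C11 atomics. Everything named `Sketch...` or `sketch_...` is hypothetical, the bit position is arbitrary, and the busy loop omits the backoff a real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_LOCKED (1u << 28)   /* hypothetical "wait list locked" bit */

typedef struct SketchLock
{
    _Atomic uint32_t state;     /* lock state and flag bits share one word */
} SketchLock;

/* Try to set the flag; succeeds only if it was previously clear. */
bool sketch_try_lock_waitlist(SketchLock *lock)
{
    uint32_t old = atomic_fetch_or(&lock->state, SKETCH_FLAG_LOCKED);

    return (old & SKETCH_FLAG_LOCKED) == 0;
}

/* Spin until the flag is acquired (no backoff in this sketch). */
void sketch_lock_waitlist(SketchLock *lock)
{
    while (!sketch_try_lock_waitlist(lock))
        ;
}

/* Clear the flag, leaving the other bits of the state word untouched. */
void sketch_unlock_waitlist(SketchLock *lock)
{
    atomic_fetch_and(&lock->state, ~SKETCH_FLAG_LOCKED);
}
```

Because the flag lives in the existing state word, no separate spinlock field is needed, which is where the size saving comes from.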
    • Allow Pin/UnpinBuffer to operate in a lockfree manner. · 48354581
      Andres Freund authored
      Pinning/Unpinning a buffer is a very frequent operation, especially in
      read-mostly cache-resident workloads. Benchmarking shows that in various
      scenarios the spinlock protecting a buffer header's state becomes a
      significant bottleneck. The problem can be reproduced with pgbench -S on
      larger machines, but can be considerably worse for queries which touch
      the same buffers over and over at a high frequency (e.g. nested loops
      over a small inner table).
      
      To allow atomic operations to be used, cram BufferDesc's flags,
      usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
      that allows them to be manipulated together using 32bit compare-and-swap
      operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
      be lifted by using a 64bit field, but it's not a realistic configuration
      atm).
      
      As not all operations can easily be implemented in a lockfree manner,
      implement the previous buf_hdr_lock via a flag bit in the atomic
      variable. That way we can continue to lock the header in places where
      it's needed, but can get away without acquiring it in the more frequent
      hot-paths.  There are some additional operations which can be done without
      the lock but aren't in this patch; the most important places are
      covered, though.
      
      As bufmgr.c now essentially re-implements spinlocks, abstract the delay
      logic from s_lock.c into something more generic. It already has two
      users, and more are coming up; there's a follow-up patch for lwlock.c at
      least.
      
      This patch is based on a proof-of-concept written by me, which Alexander
      Korotkov made into a fully working patch; the committed version is again
      revised by me.  Benchmarking and testing has, amongst others, been
      provided by Dilip Kumar, Alexander Korotkov, Robert Haas.
      
      On a large x86 system improvements for readonly pgbench, with a high
      client count, of a factor of 8 have been observed.
      
      Author: Alexander Korotkov and Andres Freund
      Discussion: 2400449.GjM57CE0Yg@dinodell
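The packing scheme described in this commit can be sketched with C11 atomics. The 18-bit reference count follows from the stated MAX_BACKENDS limit of 2^18-1; the position of the lock flag and all `SKETCH_`/`sketch_` names are assumptions made for illustration, not the layout used in bufmgr.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REFCOUNT_BITS 18
#define SKETCH_REFCOUNT_ONE  1u
#define SKETCH_REFCOUNT_MASK ((1u << SKETCH_REFCOUNT_BITS) - 1)
#define SKETCH_LOCKED        (1u << 22)   /* hypothetical header-lock flag bit */

/* Extract the reference count from the packed state word. */
uint32_t sketch_refcount(uint32_t state)
{
    return state & SKETCH_REFCOUNT_MASK;
}

/* Pin without taking a lock: bump the refcount with compare-and-swap.
 * Returns false if the header is currently locked, so the caller can
 * wait and retry. */
bool sketch_try_pin(_Atomic uint32_t *state)
{
    uint32_t old = atomic_load(state);

    for (;;)
    {
        if (old & SKETCH_LOCKED)
            return false;
        /* On failure, "old" is refreshed with the current value. */
        if (atomic_compare_exchange_weak(state, &old, old + SKETCH_REFCOUNT_ONE))
            return true;
    }
}

/* Unpin: drop one reference. */
void sketch_unpin(_Atomic uint32_t *state)
{
    atomic_fetch_sub(state, SKETCH_REFCOUNT_ONE);
}
```

Because the flags and counters share one word, a single compare-and-swap both checks the lock bit and updates the count, which is what removes the spinlock from the hot path.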
  4. 10 Apr, 2016 3 commits
    • Improve contrib/bloom regression test using code coverage info. · cf223c3b
      Tom Lane authored
      Originally, this test created a 100000-row test table, which made it
      run rather slowly compared to other contrib tests.  Investigation with
      gcov showed that we got no further improvement in code coverage after
      the first 700 or so rows, making the large table 99% a waste of time.
      Cut it back to 2000 rows to fix the runtime problem and still leave
      some headroom for testing behaviors that may appear later.
      
      A closer look at the gcov results showed that the main coverage
      omissions in contrib/bloom occurred because the test never filled more
      than one entry in the notFullPage array, which is unsurprising because
      it exercised index cleanup only in the scenario of complete table
      deletion, allowing every page in the index to become deleted rather
      than not-full.  Add testing that allows the not-full path to be
      exercised as well.
      
      Also, test the amvalidate function, because blvalidate.c had zero
      coverage without that, and besides it's a good idea to check for
      mistakes in the bloom opclass definitions.
    • Fix possible NULL dereference in ExecAlterObjectDependsStmt · bd905a0d
      Alvaro Herrera authored
      I used the wrong variable here.  Doesn't make a difference today because
      the only plausible caller passes a non-NULL variable, but someday it
      will be wrong, and even today's correctness is subtle: the caller that
      does pass a NULL is never invoked because of object type constraints.
      Surely not a condition to rely on.
      
      Noted by Coverity
    • Further minor improvement in generic_xlog.c: always say REGBUF_STANDARD. · 660d5fb8
      Tom Lane authored
      Since we're requiring pages handled by generic_xlog.c to be standard
      format, specify REGBUF_STANDARD when doing a full-page image, so that
      xloginsert.c can compress out the "hole" between pd_lower and pd_upper.
      Given the current API in which this path will be taken only for a newly
      initialized page, the hole is likely to be particularly large in such
      cases, so that this oversight could easily be performance-significant.
      I don't notice any particular change in the runtime of contrib/bloom's
      regression test, though.
  5. 09 Apr, 2016 1 commit
    • Micro-optimize GenericXLogFinish(). · 68689c66
      Tom Lane authored
      Make the inner comparison loops of computeDelta() as tight as possible by
      pulling considerations of valid and invalid ranges out of the inner loops,
      and extending a match or non-match detection as far as possible before
      deciding what to do next.  To keep this tractable, give up the possibility
      of merging fragments across the pd_lower to pd_upper gap.  The fraction of
      pages where that could happen (ie, there are 4 or fewer bytes in the gap,
      *and* data changes immediately adjacent to it on both sides) is too small
      to be worth spending cycles on.
      
      Also, avoid two BLCKSZ-length memcpy()s by computing the delta before
      moving data into the target buffer, instead of after.  This doesn't save
      nearly as many cycles as being tenser about computeDelta(), but it still
      seems worth doing.
      
      On my machine, this patch cuts a full 40% off the runtime of
      contrib/bloom's regression test.