  1. 25 Dec, 2014 8 commits
    • Blindly fix a dtrace probe in lwlock.c for a removed local variable. · 740a4ec7
      Andres Freund authored
      Per buildfarm member locust.
      740a4ec7
    • Temporarily revert "Move pg_lzcompress.c to src/common." · 966115c3
      Tom Lane authored
      This reverts commit 60838df9.
      That change needs a bit more thought to be workable.  In view of
      the potentially machine-dependent stuff that went in today,
      we need all of the buildfarm to be testing those other changes.
      966115c3
    • Lockless StrategyGetBuffer clock sweep hot path. · d72731a7
      Andres Freund authored
      StrategyGetBuffer() has proven to be a bottleneck in a number of
      buffer acquisition heavy workloads. To some degree this has already
      been alleviated by 5d7962c6, but it still can be quite a heavy
      bottleneck.  The problem is that in unfortunate usage patterns a
      single StrategyGetBuffer() call will have to look at a large number of
      buffers - in turn making it likely that the process will be put to
      sleep while still holding the spinlock.
      
      Replace most of the usage of the buffer_strategy_lock spinlock for the
      clock sweep with an atomic nextVictimBuffer variable. That variable,
      modulo NBuffers, is the current hand of the clock sweep. The buffer
      clock sweep then only needs to acquire the spinlock after a
      wraparound, and even then only in the process that did the wrapping
      around. That alleviates nearly all the contention on the relevant
      spinlock, although significant contention on the cacheline can still
      exist.
      
      Reviewed-By: Robert Haas and Amit Kapila
      
      Discussion: 20141010160020.GG6670@alap3.anarazel.de,
          20141027133218.GA2639@awork2.anarazel.de
      d72731a7
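      The clock-hand scheme described in this commit message can be sketched
      roughly as follows. This is a minimal illustration using C11 atomics,
      not the code from the commit: all Sketch*/sketch_* names, the pool size
      of 8, and the atomic_flag standing in for the spinlock are assumptions
      made for the example.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NBUFFERS 8u  /* hypothetical pool size, not a real default */

typedef struct {
    atomic_uint next_victim;     /* ever-increasing counter; hand = counter % N */
    atomic_flag lock;            /* stands in for the strategy spinlock */
    unsigned    complete_passes; /* only modified while holding lock */
} SketchClockSweep;

static void sketch_sweep_init(SketchClockSweep *cs)
{
    atomic_init(&cs->next_victim, 0u);
    atomic_flag_clear(&cs->lock);
    cs->complete_passes = 0;
}

/* Advance the clock hand by one buffer and return the index to inspect. */
static unsigned sketch_sweep_tick(SketchClockSweep *cs)
{
    /* The common case is a single atomic add, with no lock taken. */
    unsigned victim = atomic_fetch_add(&cs->next_victim, 1u);

    if (victim >= SKETCH_NBUFFERS) {
        victim %= SKETCH_NBUFFERS;
        if (victim == 0) {
            /* Only the caller whose tick wrapped the hand takes the lock. */
            while (atomic_flag_test_and_set(&cs->lock))
                ;  /* spin */
            cs->complete_passes++;
            atomic_flag_clear(&cs->lock);
        }
    }
    assert(victim < SKETCH_NBUFFERS);
    return victim;
}
```

      Unlike a production implementation, this sketch never folds the counter
      back into range; it relies on the pool size being a power of two so that
      the modulo stays consistent when the unsigned counter overflows.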
    • Improve LWLock scalability. · ab5194e6
      Andres Freund authored
      The old LWLock implementation had the problem that concurrent lock
      acquisitions required exclusively acquiring a spinlock. Often that
      could lead to acquirers waiting behind the spinlock, even if the
      actual LWLock was free.
      
      The new implementation doesn't acquire the spinlock when acquiring the
      lock itself. Instead the new atomic operations are used to atomically
      manipulate the state. Only the waitqueue, used solely in the slow
      path, is still protected by the spinlock. Check lwlock.c's header for
      an explanation of the algorithm used.
      
      For some common workloads on larger machines this can yield
      significant performance improvements, particularly in read-mostly
      workloads.
      
      Reviewed-By: Amit Kapila and Robert Haas
      Author: Andres Freund
      
      Discussion: 20130926225545.GB26663@awork2.anarazel.de
      ab5194e6
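      As a rough illustration of acquiring a shared/exclusive lock by
      atomically manipulating a state word instead of taking a spinlock, the
      sketch below uses a C11 compare-and-swap loop. It covers only a
      fast-path attempt; the wait queue is left out. The bit layout, the
      SKETCH_* constants and the function names are assumptions for this
      example, and the authoritative description remains lwlock.c's header.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VAL_EXCLUSIVE (1u << 24) /* hypothetical: one exclusive holder */
#define SKETCH_VAL_SHARED    1u         /* hypothetical: each shared holder adds 1 */

typedef struct {
    _Atomic uint32_t state; /* 0 means the lock is free */
} SketchLock;

/* Try to take the lock without blocking; returns true on success. */
static bool sketch_try_lock(SketchLock *l, bool exclusive)
{
    uint32_t old = atomic_load(&l->state);

    for (;;) {
        /* Exclusive needs a completely free lock; shared only needs
         * that no exclusive holder is present. */
        bool is_free = exclusive ? (old == 0)
                                 : ((old & SKETCH_VAL_EXCLUSIVE) == 0);
        if (!is_free)
            return false; /* a real lock would queue and sleep here */

        uint32_t desired =
            old + (exclusive ? SKETCH_VAL_EXCLUSIVE : SKETCH_VAL_SHARED);

        /* On failure, old is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&l->state, &old, desired))
            return true;
    }
}

static void sketch_unlock(SketchLock *l, bool exclusive)
{
    atomic_fetch_sub(&l->state,
                     exclusive ? SKETCH_VAL_EXCLUSIVE : SKETCH_VAL_SHARED);
}
```

      Two shared acquirers never serialize on a separate lock here; they only
      contend on the cacheline holding the state word.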
    • Convert the PGPROC->lwWaitLink list into a dlist instead of open coding it. · 7882c3b0
      Andres Freund authored
      Besides being shorter and much easier to read, it changes the logic in
      LWLockRelease() to release all shared lockers when waking up any. This
      can yield some significant performance improvements - and the fairness
      isn't really much worse than before, as we always allowed new shared
      lockers to jump the queue.
      7882c3b0
    • Add capability to suppress CONTEXT: messages to elog machinery. · 570bd2b3
      Andres Freund authored
      Hiding context messages usually is not a good idea - except for rather
      verbose debugging/development tools like LOG_DEBUG. There the
      repeated context messages just bloat the log without adding
      information.
      570bd2b3
    • Remove duplicate include of slot.h. · 4a559319
      Fujii Masao authored
      Back-patch to 9.4, where this problem was added.
      4a559319
    • Move pg_lzcompress.c to src/common. · 60838df9
      Fujii Masao authored
      Exposing the compression and decompression APIs of pglz makes it possible
      for extensions and contrib modules to use them. pglz_decompress contained a call
      to elog to emit an error message in case of corrupted data. This function
      is changed to return a status code so that its callers can raise the error instead.
      
      This commit is required for the upcoming WAL compression feature so that
      the WAL reader facility can decompress WAL data using pglz_decompress.
      
      Michael Paquier
      60838df9
  2. 24 Dec, 2014 5 commits
  3. 23 Dec, 2014 6 commits
    • Remove failing collation case from object_address regression test. · 3e227535
      Tom Lane authored
      Per buildfarm, this test case does not yield consistent results.
      I don't think it's useful enough to figure out a workaround, either.
      3e227535
    • Revert "Use a bitmask to represent role attributes" · a609d967
      Alvaro Herrera authored
      This reverts commit 1826987a.
      
      The overall design was deemed unacceptable, in discussion following the
      previous commit message; we might find some parts of it still
      salvageable, but I don't want to be on the hook for fixing it, so let's
      wait until we have a new patch.
      a609d967
    • Add SQL-callable pg_get_object_address · d7ee82e5
      Alvaro Herrera authored
      This allows access to get_object_address from SQL, which is useful to
      obtain OID addressing information from data equivalent to that emitted
      by the parser.  This is necessary infrastructure for a project to let
      replication systems propagate object dropping events to remote servers,
      where the schema might be different from that of the server originating the
      DROP.
      
      This patch also adds support for OBJECT_DEFAULT to get_object_address;
      that is, it is now possible to refer to a column's default value.
      
      Catalog version bumped due to the new function.
      
      Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres
      Freund, Abhijit Menon-Sen, Adam Brightwell.
      d7ee82e5
    • Use a bitmask to represent role attributes · 1826987a
      Alvaro Herrera authored
      The previous representation, using a boolean column for each attribute,
      would not scale well as we add further attributes.

      Extra auxiliary functions are added to go along with this change, to
      make up for the lost convenience of access of the old representation.
      
      Catalog version bumped due to change in catalogs and the new functions.
      
      Author: Adam Brightwell, minor tweaks by Álvaro
      Reviewed by: Stephen Frost, Andres Freund, Álvaro Herrera
      1826987a
    • get_object_address: separate domain constraints from table constraints · 7eca575d
      Alvaro Herrera authored
      Apart from enabling comments on domain constraints, this enables a
      future project to replicate object dropping to remote servers: with the
      current mechanism there's no way to distinguish between the two types of
      constraints, so there's no way to know what to drop.
      
      Also added support for the domain constraint comments in psql's \dd and
      pg_dump.
      
      Catalog version bumped due to the change in ObjectType enum.
      7eca575d
    • Change local_preload_libraries to PGC_USERSET · 584e35d1
      Peter Eisentraut authored
      This allows it to be used with ALTER ROLE SET.
      
      Although the old setting of PGC_BACKEND prevented changes after session
      start, after discussion it was deemed more useful to allow ALTER ROLE SET
      instead and just document that changes during a session have no effect.
      This is similar to how session_preload_libraries works already.
      
      An alternative would be to change things to allow PGC_BACKEND and
      PGC_SU_BACKEND settings to be changed by ALTER ROLE SET.  But that might
      need further research (e.g., log_connections would probably not work).
      
      based on patch by Kyotaro Horiguchi
      584e35d1
  4. 22 Dec, 2014 5 commits
  5. 21 Dec, 2014 2 commits
    • Docs: clarify treatment of variadic functions with zero variadic arguments. · 699300a1
      Tom Lane authored
      Explain that you have to use "VARIADIC ARRAY[]" to pass an empty array
      to a variadic parameter position.  This was already implicit in the text
      but it seems better to spell it out.
      
      Per a suggestion from David Johnston, though I didn't use his proposed
      wording.  Back-patch to all supported branches.
      699300a1
    • Fix file descriptor leak at end of recovery. · 2ef6c66a
      Heikki Linnakangas authored
      XLogFileInit() returns a file descriptor, which needs to be closed. The leak
      was short-lived, since the startup process exits shortly afterwards, but it
      was clearly a bug, nevertheless.
      
      Per Coverity report.
      2ef6c66a
  6. 20 Dec, 2014 1 commit
  7. 19 Dec, 2014 4 commits
    • pg_event_trigger_dropped_objects: add behavior flags · 0ee98d1c
      Alvaro Herrera authored
      Add "normal" and "original" flags as output columns to the
      pg_event_trigger_dropped_objects() function.  With this it's possible to
      distinguish which objects, among those listed, need to be explicitly
      referenced when trying to replicate a deletion.
      
      This is necessary so that the list of objects can be pruned to the
      minimum necessary to replicate the DROP command in a remote server that
      might have a slightly different schema (for instance, TOAST tables and
      constraints with different names and such).
      
      Catalog version bumped due to change of function definition.
      
      Reviewed by: Abhijit Menon-Sen, Stephen Frost, Heikki Linnakangas,
      Robert Haas.
      0ee98d1c
    • Fix timestamp in end-of-recovery WAL records. · 5c805d0a
      Heikki Linnakangas authored
      We used time(NULL) to set a TimestampTz field, which gave bogus results.
      Noticed while looking at pg_xlogdump output.

      Backpatch to 9.3 and above, where fast promotion was introduced.
      5c805d0a
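      To see why a raw time() value is bogus in such a field, the sketch below
      converts a Unix timestamp into microseconds since 2000-01-01 00:00:00
      UTC, which is how an integer-timestamp build represents TimestampTz.
      It assumes a POSIX time_t (seconds since 1970) and an integer-timestamp
      build; the function and constant names are hypothetical, not
      PostgreSQL's own helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Seconds between 1970-01-01 and 2000-01-01 (10957 days of 86400 seconds). */
#define SKETCH_EPOCH_DIFF_SECS INT64_C(946684800)
#define SKETCH_USECS_PER_SEC   INT64_C(1000000)

/* Convert seconds since the Unix epoch to microseconds since 2000-01-01. */
static int64_t sketch_unix_to_pg_usecs(time_t t)
{
    return ((int64_t) t - SKETCH_EPOCH_DIFF_SECS) * SKETCH_USECS_PER_SEC;
}
```

      Storing time(NULL) directly skips both the epoch shift and the
      unit change, so the value would be read back as a moment only minutes
      after midnight on 2000-01-01.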
    • Prevent potentially hazardous compiler/cpu reordering during lwlock release. · 37de8de9
      Andres Freund authored
      In LWLockRelease() (and in 9.4+ LWLockUpdateVar()) we release enqueued
      waiters using PGSemaphoreUnlock(). As there are other sources of such
      unlocks, backends only wake up if MyProc->lwWaiting is set to false,
      which is only done in the aforementioned functions.
      
      Before this commit there were dangers because the store to lwWaiting
      could become visible before the store to lwWaitLink. This could
      happen both due to compiler reordering (on most compilers) and, on some
      platforms, due to the CPU reordering stores.
      
      The possible consequence of this is that a backend stops waiting
      before lwWaitLink is set to NULL. If that backend then tries to
      acquire another lock and has to wait there the list could become
      corrupted once the lwWaitLink store is finally performed.
      
      Add a write memory barrier to prevent that issue.
      
      Unfortunately the barrier support was only added in 9.2. Given
      that the issue has not knowingly been observed in practice, it seems
      sufficient to prohibit compiler reordering using volatile for 9.0 and
      9.1. Actual problems due to compiler reordering are more likely
      anyway.
      
      Discussion: 20140210134625.GA15246@awork2.anarazel.de
      37de8de9
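      The ordering requirement described in this commit message can be
      sketched with C11 atomics: the link must be cleared before the waiter is
      told it may stop waiting. In this illustration a release fence stands in
      for the write memory barrier the commit adds; the SketchWaiter type and
      its field names are assumptions that only loosely mirror lwWaitLink and
      lwWaiting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchWaiter {
    struct SketchWaiter *_Atomic wait_link; /* next waiter in the queue */
    atomic_bool                  waiting;   /* waiter sleeps while true */
} SketchWaiter;

/* Detach a waiter from the queue and mark it as allowed to proceed. */
static void sketch_wake(SketchWaiter *w)
{
    /* 1. Unlink first. */
    atomic_store_explicit(&w->wait_link, (SketchWaiter *) NULL,
                          memory_order_relaxed);

    /* 2. Keep the unlink from being reordered after the flag store,
     *    by the compiler or by the CPU. */
    atomic_thread_fence(memory_order_release);

    /* 3. Only now let the waiter observe that it may continue. */
    atomic_store_explicit(&w->waiting, false, memory_order_relaxed);
}
```

      Without step 2, a waiter could see waiting == false, queue itself on
      another lock, and then have its link overwritten by the delayed store
      from step 1.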
    • Define Assert() et al to ((void)0) to avoid pedantic warnings. · 9959abb0
      Andres Freund authored
      gcc's -Wempty-body warns about the current usage when compiling
      postgres without --enable-cassert.
      9959abb0
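      The difference between the two macro styles can be shown with a small
      sketch. A disabled assertion that expands to nothing leaves a bare ";"
      as the body of an "if", which is what -Wempty-body complains about,
      while ((void)0) leaves a real expression statement. The SKETCH_* macros
      and sketch_half() are hypothetical names for this example only.

```c
#include <assert.h> /* the standard assert() also expands to ((void)0) under NDEBUG */

/* Disabled assertion, empty style: "if (c) SKETCH_ASSERT_EMPTY(x);"
 * becomes "if (c) ;". Shown for contrast and not used below. */
#define SKETCH_ASSERT_EMPTY(cond)

/* Disabled assertion, expression style: "if (c) SKETCH_ASSERT(x);"
 * becomes "if (c) ((void)0);". The condition is not evaluated. */
#define SKETCH_ASSERT(cond) ((void)0)

static int sketch_half(int x)
{
    if (x > 0)
        SKETCH_ASSERT(x / 2 < x); /* body is an expression, not an empty statement */
    return x / 2;
}
```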
  8. 18 Dec, 2014 9 commits
    • Improve documentation about CASE and constant subexpressions. · 5b516835
      Tom Lane authored
      The possibility that constant subexpressions of a CASE might be evaluated
      at planning time was touched on in 9.17.1 (CASE expressions), but it really
      ought to be explained in 4.2.14 (Expression Evaluation Rules) which is the
      primary discussion of such topics.  Add text and an example there, and
      revise the <note> under CASE to link there.
      
      Back-patch to all supported branches, since it's acted like this for a
      long time (though 9.2+ is probably worse because of its more aggressive
      use of constant-folding via replanning of nominally-prepared statements).
      Pre-9.4, also back-patch text added in commit 0ce627d4 about CASE versus
      aggregate functions.
      
      Tom Lane and David Johnston, per discussion of bug #12273.
      5b516835
    • Use %u to print out BlockNumber variables · cd6e6657
      Alvaro Herrera authored
      Per Tom Lane
      cd6e6657
    • Have VACUUM log number of skipped pages due to pins · 35192f06
      Alvaro Herrera authored
      Author: Jim Nasby, some kibitzing by Heikki Linnakangas.
      Discussion leading to current behavior and precise wording fueled by
      thoughts from Robert Haas and Andres Freund.
      35192f06
    • Improve hash_create's API for selecting simple-binary-key hash functions. · 4a14f13a
      Tom Lane authored
      Previously, if you wanted anything besides C-string hash keys, you had to
      specify a custom hashing function to hash_create().  Nearly all such
      callers were specifying tag_hash or oid_hash; which is tedious, and rather
      error-prone, since a caller could easily miss the opportunity to optimize
      by using hash_uint32 when appropriate.  Replace this with a design whereby
      callers using simple binary-data keys just specify HASH_BLOBS and don't
      need to mess with specific support functions.  hash_create() itself will
      take care of optimizing when the key size is four bytes.
      
      This nets out saving a few hundred bytes of code space, and offers
      a measurable performance improvement in tidbitmap.c (which was not
      exploiting the opportunity to use hash_uint32 for its 4-byte keys).
      There might be some wins elsewhere too; I didn't analyze closely.
      
      In future we could look into offering a similar optimized hashing function
      for 8-byte keys.  Under this design that could be done in a centralized
      and machine-independent fashion, whereas getting it right for keys of
      platform-dependent sizes would've been notationally painful before.
      
      For the moment, the old way still works fine, so as not to break source
      code compatibility for loadable modules.  Eventually we might want to
      remove tag_hash and friends from the exported API altogether, since there's
      no real need for them to be explicitly referenced from outside dynahash.c.
      
      Teodor Sigaev and Tom Lane
      4a14f13a
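      The idea of letting the table-creation code pick a hash function from
      the key size, rather than having each caller name one, can be sketched
      as below. This is not dynahash.c code: the two hash functions are simple
      stand-ins (a multiplicative mix and FNV-1a), and every sketch_* name is
      an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t (*sketch_hash_fn)(const void *key, size_t keysize);

/* Cheap path for 4-byte keys: one multiply (stand-in, not hash_uint32). */
static uint32_t sketch_hash_u32(const void *key, size_t keysize)
{
    uint32_t k;

    (void) keysize;
    assert(keysize == sizeof k);
    memcpy(&k, key, sizeof k);
    return k * 2654435761u;
}

/* General path for arbitrary binary keys: FNV-1a (stand-in, not tag_hash). */
static uint32_t sketch_hash_bytes(const void *key, size_t keysize)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < keysize; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Callers only say "binary key of this size"; the choice is made centrally. */
static sketch_hash_fn sketch_pick_blob_hash(size_t keysize)
{
    return keysize == sizeof(uint32_t) ? sketch_hash_u32 : sketch_hash_bytes;
}
```

      Centralizing the choice is what would let an optimized 8-byte variant be
      added later in one place, as the commit message suggests.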
    • Change how first WAL segment on new timeline after promotion is created. · ba94518a
      Heikki Linnakangas authored
      Two changes:
      
      1. When copying a WAL segment from old timeline to create the first segment
      on the new timeline, only copy up to the point where the timeline switch
      happens, and zero-fill the rest. This avoids corner cases where we might
      think that the copied WAL from the previous timeline belongs to the new
      timeline.
      
      2. If the timeline switch happens at a segment boundary, don't copy the
      whole old segment to the new timeline. It's pointless, because it's 100%
      identical to the old segment.
      ba94518a
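      The first of the two changes above, copying only up to the switch point
      and zero-filling the remainder, can be sketched on in-memory buffers.
      The function name, its parameters and the use of plain byte arrays
      instead of segment files are assumptions for illustration; the second
      change is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill dst (seg_size bytes) with the first switch_off bytes of src,
 * followed by zeros, so nothing past the switch point is carried over.
 */
static void sketch_copy_segment(unsigned char *dst, const unsigned char *src,
                                size_t seg_size, size_t switch_off)
{
    assert(switch_off <= seg_size);
    memcpy(dst, src, switch_off);                       /* old-timeline data */
    memset(dst + switch_off, 0, seg_size - switch_off); /* zero-fill the rest */
}
```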
    • Add memory barriers for PgBackendStatus.st_changecount protocol. · 38628db8
      Fujii Masao authored
      The st_changecount protocol needs memory barriers to ensure that
      the apparent order of execution is as intended. Otherwise,
      for example, the CPU might rearrange the code so that st_changecount
      is incremented twice before the modification on a machine with
      weak memory ordering. This surprising result can lead to bugs.
      
      This commit introduces macros to load and store st_changecount
      with memory barriers. These are called before and after
      PgBackendStatus entries are modified or copied into private memory,
      in order to prevent the CPU from reordering PgBackendStatus accesses.
      
      Per discussion on pgsql-hackers, we decided not to back-patch this
      to 9.4 or before until we get an actual bug report about this.
      
      Patch by me. Review by Robert Haas.
      38628db8
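      A change-count protocol of this kind can be sketched with C11 atomics as
      follows. The writer makes the counter odd, updates the payload, then
      makes it even again; a reader retries until it sees the same even value
      before and after copying. The struct, its two payload fields and the
      fences standing in for the commit's barrier macros are assumptions for
      illustration, and a single writer per entry is assumed.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_int changecount; /* odd while an update is in progress */
    atomic_int a;           /* payload fields that must be read together */
    atomic_int b;
} SketchEntry;

static void sketch_entry_init(SketchEntry *e)
{
    atomic_init(&e->changecount, 0);
    atomic_init(&e->a, 0);
    atomic_init(&e->b, 0);
}

/* Single writer: bump the count, fence, modify, fence, bump again. */
static void sketch_entry_write(SketchEntry *e, int a, int b)
{
    int c = atomic_load_explicit(&e->changecount, memory_order_relaxed);

    atomic_store_explicit(&e->changecount, c + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&e->a, a, memory_order_relaxed);
    atomic_store_explicit(&e->b, b, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&e->changecount, c + 2, memory_order_relaxed);
}

/* Reader: retry until the copy was taken between two equal, even counts. */
static void sketch_entry_read(SketchEntry *e, int *a, int *b)
{
    for (;;) {
        int before = atomic_load_explicit(&e->changecount, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        *a = atomic_load_explicit(&e->a, memory_order_relaxed);
        *b = atomic_load_explicit(&e->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        int after = atomic_load_explicit(&e->changecount, memory_order_relaxed);

        if (before == after && (before & 1) == 0)
            return;
    }
}
```

      If the fences were missing, both increments could become visible before
      the payload stores, and a reader could accept a half-updated copy.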
    • Ensure variables live across calls in generate_series(numeric, numeric). · 19e065c0
      Fujii Masao authored
      In generate_series_step_numeric(), the variables "start_num"
      and "stop_num" may be freed before the next call, so they
      should be stored in a location that survives across calls.
      Previously they were not, which could cause incorrect
      behavior of generate_series(numeric, numeric). This commit fixes
      this problem by copying them into multi_call_memory_ctx.
      
      Andrew Gierth
      19e065c0
    • Update .gitignore for config.cache. · ccf292cd
      Fujii Masao authored
      Also add a comment about why regression.* aren't listed in .gitignore.
      
      Jim Nasby
      ccf292cd
    • Adjust valgrind suppression to the changes in 2c03216d. · 72950dc1
      Andres Freund authored
      CRC computation is now done in XLogRecordAssemble.
      72950dc1