1. 01 Mar, 2017 1 commit
    • Andres Freund's avatar
      Reduce size of common allocation header. · 7e3aa03b
      Andres Freund authored
      The new slab allocator needs different per-allocation information than
      the classical aset.c.  The definition in 58b25e98 wasn't sufficiently
      careful on 32 platforms with 8 byte alignment, leading to buildfarm
      failures.  That's not entirely easy to fix by just adjusting the
      definition.
      
      As slab.c doesn't actually need the size part(s) of the common header,
      all chunks are equally sized after all, it seems better to instead
      reduce the header to the part needed by all allocators, namely which
      context an allocation belongs to. That has the advantage of reducing
      the overhead of slab allocations, and also allows for more flexibility
      in future allocators.
      
      To avoid spreading the logic about accessing a chunk's context around,
      centralize it in GetMemoryChunkContext(), which allows to delete a
      good number of lines.
      
      A followup commit will revise the mmgr/README portion about
      StandardChunkHeader, and more.
      
      Author: Andres Freund
      Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
      7e3aa03b
  2. 28 Feb, 2017 2 commits
  3. 27 Feb, 2017 8 commits
    • Tom Lane's avatar
      Allow index AMs to return either HeapTuple or IndexTuple format during IOS. · 9b88f27c
      Tom Lane authored
      Previously, only IndexTuple format was supported for the output data of
      an index-only scan.  This is fine for btree, which is just returning a
      verbatim index tuple anyway.  It's not so fine for SP-GiST, which can
      return reconstructed data that's much larger than a page.
      
      To fix, extend the index AM API so that index-only scan data can be
      returned in either HeapTuple or IndexTuple format.  There's other ways
      we could have done it, but this way avoids an API break for index AMs
      that aren't concerned with the issue, and it costs little except a couple
      more fields in IndexScanDescs.
      
      I changed both GiST and SP-GiST to use the HeapTuple method.  I'm not
      very clear on whether GiST can reconstruct data that's too large for an
      IndexTuple, but that seems possible, and it's not much of a code change to
      fix.
      
      Per a complaint from Vik Fearing.  Reviewed by Jason Li.
      
      Discussion: https://postgr.es/m/49527f79-530d-0bfe-3dad-d183596afa92@2ndquadrant.fr
      9b88f27c
    • Robert Haas's avatar
      hash: Refactor overflow page allocation. · 30df93f6
      Robert Haas authored
      As with commit b0f18cb7, the goal
      here is to move all of the related page modifications to a single
      section of code, in preparation for adding write-ahead logging.
      
      Amit Kapila, with slight changes by me.  The larger patch series
      of which this is a part has been reviewed and tested by Álvaro
      Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper
      Pedersen, all of whom should also have been credited in the
      previous commit message.
      30df93f6
    • Robert Haas's avatar
      hash: Refactor bucket squeeze code. · b0f18cb7
      Robert Haas authored
      In preparation for adding write-ahead logging to hash indexes,
      refactor _hash_freeovflpage and _hash_squeezebucket so that all
      related page modifications happen in a single section of code.  The
      previous coding assumed that it would be fine to move tuples one at a
      time, and also that the various operations involved in freeing an
      overflow page didn't necessarily all need to be done together, all
      of which is true if you don't care about write-ahead logging.
      
      Amit Kapila, with slight changes by me.
      b0f18cb7
    • Tom Lane's avatar
      Remove PL/Tcl's "module" facility. · 817f2a58
      Tom Lane authored
      PL/Tcl has long had a facility whereby Tcl code could be autoloaded from
      a database table named "pltcl_modules".  However, nobody is using it, as
      evidenced by the recent discovery that it's never been fixed to work with
      standard_conforming_strings turned on.  Moreover, it's rather shaky from
      a security standpoint, and the table design is very old and crufty (partly
      because it dates from before we had TOAST).  A final problem is that
      because the table-population scripts depend on the Tcl client library
      Pgtcl, which we removed from the core distribution in 2004, it's
      impossible to create a self-contained regression test for the feature.
      Rather than try to surmount these problems, let's just remove it.
      
      A follow-on patch will provide a way to execute user-defined
      initialization code, similar to features that exist in plperl and plv8.
      With that, it will be possible to implement this feature or similar ones
      entirely in userspace, which is where it belongs.
      
      Discussion: https://postgr.es/m/22067.1488046447@sss.pgh.pa.us
      817f2a58
    • Peter Eisentraut's avatar
      chomp PQerrorMessage() in backend uses · 2ed193c9
      Peter Eisentraut authored
      PQerrorMessage() returns an error message with a trailing newline, but
      in backend use (dblink, postgres_fdw, libpqwalreceiver), we want to have
      the error message without that for emitting via ereport().  To simplify
      that, add a function pchomp() that returns a pstrdup'ed string with the
      trailing newline characters removed.
      2ed193c9
    • Andres Freund's avatar
      Use the new "Slab" context for some allocations in reorderbuffer.h. · 9fab40ad
      Andres Freund authored
      Note that this change alone does not yet fully address the performance
      problems triggering this work, a large portion of the slowdown is
      triggered by the tuple allocator, which isn't converted to the new
      allocator.  It would be possible to do so, but using evenly sized
      objects, like both the current implementation in reorderbuffer.c and
      slab.c, wastes a fair amount of memory.  A later patch by Tomas will
      introduce a better approach.
      
      Author: Tomas Vondra
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
      9fab40ad
    • Andres Freund's avatar
      Add "Slab" MemoryContext implementation for efficient equal-sized allocations. · 58b25e98
      Andres Freund authored
      The default general purpose aset.c style memory context is not a great
      choice for allocations that are all going to be evenly sized,
      especially when those objects aren't small, and have varying
      lifetimes.  There tends to be a lot of fragmentation, larger
      allocations always directly go to libc rather than have their cost
      amortized over several pallocs.
      
      These problems lead to the introduction of ad-hoc slab allocators in
      reorderbuffer.c. But it turns out that the simplistic implementation
      leads to problems when a lot of objects are allocated and freed, as
      aset.c is still the underlying implementation. Especially freeing can
      easily run into O(n^2) behavior in aset.c.
      
      While the O(n^2) behavior in aset.c can, and probably will, be
      addressed, custom allocators for this behavior are more efficient
      both in space and time.
      
      This allocator is for evenly sized allocations, and supports both
      cheap allocations and freeing, without fragmenting significantly.  It
      does so by allocating evenly sized blocks via malloc(), and carves
      them into chunks that can be used for allocations.  In order to
      release blocks to the OS as early as possible, chunks are allocated
      from the fullest block that still has free objects, increasing the
      likelihood of a block being entirely unused.
      
      A subsequent commit uses this in reorderbuffer.c, but a further
      allocator is needed to resolve the performance problems triggering
      this work.
      
      There likely are further potentialy uses of this allocator besides
      reorderbuffer.c.
      
      There's potential further optimizations of the new slab.c, in
      particular the array of freelists could be replaced by a more
      intelligent structure - but for now this looks more than good enough.
      
      Author: Tomas Vondra, editorialized by Andres Freund
      Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
      Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
      58b25e98
    • Andres Freund's avatar
      Make useful infrastructure from aset.c generally available. · bfd12ccc
      Andres Freund authored
      An upcoming patch introduces a new type of memory context. To avoid
      duplicating debugging infrastructure within aset.c, move useful pieces
      to memdebug.[ch].
      
      While touching aset.c, fix printf format code in AllocFree* debug
      macros.
      
      Author: Tomas Vondra
      Reviewed-By: Andres Freund
      Discussion: https://postgr.es/m/b3b2245c-b37a-e1e5-ebc4-857c914bc747@2ndquadrant.com
      bfd12ccc
  4. 26 Feb, 2017 5 commits
  5. 25 Feb, 2017 4 commits
    • Tom Lane's avatar
      Put back #include <windows.h> in dirmod.c. · 285ca261
      Tom Lane authored
      I removed this in commit 9e3755ec, reasoning that the win32.h
      port-specific header file included by c.h would have provided it.
      However, that's only true on native win32 builds, not Cygwin builds.
      
      It may be that some of the other <windows.h> inclusions also need
      to be put back on the same grounds; but this is the only one that
      is clearly meant to be included #ifdef __CYGWIN__, so maybe this is
      the extent of the problem.  Awaiting further buildfarm results.
      285ca261
    • Tom Lane's avatar
      Remove some configure header-file checks that we weren't really using. · 2bd7f857
      Tom Lane authored
      We had some AC_CHECK_HEADER tests that were really wastes of cycles,
      because the code proceeded to #include those headers unconditionally
      anyway, in all or a large majority of cases.  The lack of complaints
      shows that those headers are available on every platform of interest,
      so we might as well let configure run a bit faster by not probing
      those headers at all.
      
      I suspect that some of the tests I left alone are equally useless, but
      since all the existing #includes of the remaining headers are properly
      guarded, I didn't touch them.
      2bd7f857
    • Tom Lane's avatar
      Remove useless duplicate inclusions of system header files. · 9e3755ec
      Tom Lane authored
      c.h #includes a number of core libc header files, such as <stdio.h>.
      There's no point in re-including these after having read postgres.h,
      postgres_fe.h, or c.h; so remove code that did so.
      
      While at it, also fix some places that were ignoring our standard pattern
      of "include postgres[_fe].h, then system header files, then other Postgres
      header files".  While there's not any great magic in doing it that way
      rather than system headers last, it's silly to have just a few files
      deviating from the general pattern.  (But I didn't attempt to enforce this
      globally, only in files I was touching anyway.)
      
      I'd be the first to say that this is mostly compulsive neatnik-ism,
      but over time it might save enough compile cycles to be useful.
      9e3755ec
    • Bruce Momjian's avatar
      pg_upgrade docs: clarify instructions on standby extensions · 5639cedd
      Bruce Momjian authored
      Previously the pg_upgrade standby upgrade instructions said not to
      execute pgcrypto.sql, but it should have referenced the extension
      command "CREATE EXTENSION pgcrypto".  This patch makes that doc change.
      
      Reported-by: a private bug report
      
      Backpatch-through: 9.4, where standby instructions were added
      5639cedd
  6. 24 Feb, 2017 4 commits
  7. 23 Feb, 2017 6 commits
    • Tom Lane's avatar
      Consistently declare timestamp variables as TimestampTz. · c29aff95
      Tom Lane authored
      Twiddle the replication-related code so that its timestamp variables
      are declared TimestampTz, rather than the uninformative "int64" that
      was previously used for meant-to-be-always-integer timestamps.
      This resolves the int64-vs-TimestampTz declaration inconsistencies
      introduced by commit 7c030783, though in the opposite direction to
      what was originally suggested.
      
      This required including datatype/timestamp.h in a couple more places
      than before.  I decided it would be a good idea to slim down that
      header by not having it pull in <float.h> etc, as those headers are
      no longer at all relevant to its purpose.  Unsurprisingly, a small number
      of .c files turn out to have been depending on those inclusions, so add
      them back in the .c files as needed.
      
      Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
      Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
      c29aff95
    • Tom Lane's avatar
      Remove now-dead code for !HAVE_INT64_TIMESTAMP. · b9d092c9
      Tom Lane authored
      This is a basically mechanical removal of #ifdef HAVE_INT64_TIMESTAMP
      tests and the negative-case controlled code.
      
      Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
      b9d092c9
    • Tom Lane's avatar
      Remove pg_control's enableIntTimes field. · d28aafb6
      Tom Lane authored
      We don't need it any more.
      
      pg_controldata continues to report that date/time type storage is
      "64-bit integers", but that's now a hard-wired behavior not something
      it sees in the data.  This avoids breaking pg_upgrade, and perhaps other
      utilities that inspect pg_control this way.  Ditto for pg_resetwal.
      
      I chose to remove the "bigint_timestamps" output column of
      pg_control_init(), though, as that function hasn't been around long
      and probably doesn't have ossified users.
      
      Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
      d28aafb6
    • Tom Lane's avatar
      De-support floating-point timestamps. · b6aa17e0
      Tom Lane authored
      Per discussion, the time has come to do this.  The handwriting has been
      on the wall at least since 9.0 that this would happen someday, whenever
      it got to be too much of a burden to support the float-timestamp option.
      The triggering factor now is the discovery that there are multiple bugs
      in the code that attempts to implement use of integer timestamps in the
      replication protocol even when the server is built for float timestamps.
      The internal float timestamps leak into the protocol fields in places.
      While we could fix the identified bugs, there's a very high risk of
      introducing more.  Trying to build a wall that would positively prevent
      mixing integer and float timestamps is more complexity than we want to
      undertake to maintain a long-deprecated option.  The fact that these
      bugs weren't found through testing also indicates a lack of interest
      in float timestamps.
      
      This commit disables configure's --disable-integer-datetimes switch
      (it'll still accept --enable-integer-datetimes, though), removes direct
      references to USE_INTEGER_DATETIMES, and removes discussion of float
      timestamps from the user documentation.  A considerable amount of code is
      rendered dead by this, but removing that will occur as separate mop-up.
      
      Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
      b6aa17e0
    • Peter Eisentraut's avatar
      Fix logical replication with different encodings · c3368f91
      Peter Eisentraut authored
      reported by Shinoda, Noriyoshi <noriyoshi.shinoda@hpe.com>; partial
      patch by Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
      c3368f91
    • Peter Eisentraut's avatar
      Remove deprecated COMMENT ON RULE syntax · e8d016d8
      Peter Eisentraut authored
      This was only used for allowing upgrades from pre-7.3 instances, which
      was a long time ago.
      e8d016d8
  8. 22 Feb, 2017 6 commits
  9. 21 Feb, 2017 4 commits
    • Tom Lane's avatar
      Suppress unused-variable warning. · c56ac291
      Tom Lane authored
      Rearrange so we don't have an unused variable in disable-cassert case.
      
      Discussion: https://postgr.es/m/CAMkU=1x63f2QyFTeas83xJqD+Hm1PBuok1LrzYzS-OngDzYOVA@mail.gmail.com
      c56ac291
    • Tom Lane's avatar
      Fix sloppy handling of corner-case errors in fd.c. · f97de05a
      Tom Lane authored
      Several places in fd.c had badly-thought-through handling of error returns
      from lseek() and close().  The fact that those would seldom fail on valid
      FDs is probably the reason we've not noticed this up to now; but if they
      did fail, we'd get quite confused.
      
      LruDelete and LruInsert actually just Assert'd that lseek never fails,
      which is pretty awful on its face.
      
      In LruDelete, we indeed can't throw an error, because that's likely to get
      called during error abort and so throwing an error would probably just lead
      to an infinite loop.  But by the same token, throwing an error from the
      close() right after that was ill-advised, not to mention that it would've
      left the LRU state corrupted since we'd already unlinked the VFD from the
      list.  I also noticed that really, most of the time, we should know the
      current seek position and it shouldn't be necessary to do an lseek here at
      all.  As patched, if we don't have a seek position and an lseek attempt
      doesn't give us one, we'll close the file but then subsequent re-open
      attempts will fail (except in the somewhat-unlikely case that a
      FileSeek(SEEK_SET) call comes between and allows us to re-establish a known
      target seek position).  This isn't great but it won't result in any state
      corruption.
      
      Meanwhile, having an Assert instead of an honest test in LruInsert is
      really dangerous: if that lseek failed, a subsequent read or write would
      read or write from the start of the file, not where the caller expected,
      leading to data corruption.
      
      In both LruDelete and FileClose, if close() fails, just LOG that and mark
      the VFD closed anyway.  Possibly leaking an FD is preferable to getting
      into an infinite loop or corrupting the VFD list.  Besides, as far as I can
      tell from the POSIX spec, it's unspecified whether or not the file has been
      closed, so treating it as still open could be the wrong thing anyhow.
      
      I also fixed a number of other places that were being sloppy about
      behaving correctly when the seekPos is unknown.
      
      Also, I changed FileSeek to return -1 with EINVAL for the cases where it
      detects a bad offset, rather than throwing a hard elog(ERROR).  It seemed
      pretty inconsistent that some bad-offset cases would get a failure return
      while others got elog(ERROR).  It was missing an offset validity check for
      the SEEK_CUR case on a closed file, too.
      
      Back-patch to all supported branches, since all this code is fundamentally
      identical in all of them.
      
      Discussion: https://postgr.es/m/2982.1487617365@sss.pgh.pa.us
      f97de05a
    • Alvaro Herrera's avatar
      Add tests for two-phase commit · 30820982
      Alvaro Herrera authored
      There's some ongoing performance work on this area, so let's make sure
      we don't break things.
      
      Extracted from a larger patch originally by Stas Kelvich.
      
      Authors: Stas Kelvich, Nikhil Sontakke, Michael Paquier
      Discussion: https://postgr.es/m/CAMGcDxfsuLLOg=h5cTg3g77Jjk-UGnt=RW7zK57zBSoFsapiWA@mail.gmail.com
      30820982
    • Peter Eisentraut's avatar
      Fix whitespace · 74321d87
      Peter Eisentraut authored
      74321d87