  1. 20 Nov, 2014 8 commits
    • Initial code review for CustomScan patch. · a34fa8ee
      Tom Lane authored
      Get rid of the pernicious entanglement between planner and executor headers
      introduced by commit 0b03e595.
      
      Also, rearrange the CustomFoo struct/typedef definitions so that all the
      typedef names are seen as used by the compiler.  Without this pgindent
      will mess things up a bit, which is not so important perhaps, but it also
      removes a bizarre discrepancy between the declaration arrangement used for
      CustomExecMethods and that used for CustomScanMethods and
      CustomPathMethods.
      
      Clean up the commentary around ExecSupportsMarkRestore to reflect the
      rather large change in its API.
      
      Const-ify register_custom_path_provider's argument.  This necessitates
      casting away const in the function, but that seems better than forcing
      callers of the function to do so (or else not const-ify their method
      pointer structs, which was sort of the whole point).
      
      De-export fix_expr_common.  I don't like the exporting of fix_scan_expr
      or replace_nestloop_params either, but this one surely has got little
      excuse.
      a34fa8ee
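The const-ification described above can be sketched in a few lines. This is a minimal illustration only, not the PostgreSQL source: `ProviderMethods`, `register_provider`, `providers` and `example_methods` are hypothetical names, and the registry layout is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method table; a stand-in for a provider's callback struct. */
typedef struct ProviderMethods
{
    const char *name;
} ProviderMethods;

#define MAX_PROVIDERS 8

/* Assumed registry that stores plain (non-const) pointers. */
static ProviderMethods *providers[MAX_PROVIDERS];
static int  n_providers = 0;

/*
 * Accept a const pointer so that callers can keep their tables const, and
 * cast away const in exactly one place.  The registry must never write
 * through the stored pointer.
 */
void
register_provider(const ProviderMethods *methods)
{
    assert(methods != NULL && n_providers < MAX_PROVIDERS);
    providers[n_providers++] = (ProviderMethods *) methods;
}

/* A caller can declare its method table const and pass it without a cast. */
static const ProviderMethods example_methods = {"example_provider"};
```

Calling `register_provider(&example_methods)` compiles without a cast at the call site; the single cast lives inside the registration function, which is the trade-off the commit message argues for.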
    • Fix another oversight in CustomScan patch. · 081a6048
      Tom Lane authored
      execCurrent.c's search_plan_tree() must recognize a CustomScan on the
      target relation.  This would only be helpful for custom providers that
      support CurrentOfExpr quals, which is probably a bit far-fetched, but
      it's not impossible I think.  But even without assuming that, we need
      to recognize a scanned-relation match so that we will properly throw
      an error if the desired relation is being scanned with both a CustomScan
      and a regular scan (i.e., a self-join).
      
      Also recognize ForeignScanState for similar reasons.  Supporting WHERE
      CURRENT OF on a foreign table is probably even more far-fetched than
      it is for custom scans, but I think in principle you could do it with
      postgres_fdw (or another FDW that supports the ctid column).  This
      would be a back-patchable bug fix if existing FDWs handled CurrentOfExpr,
      but I doubt any do so I won't bother back-patching.
      081a6048
    • Fix another oversight in CustomScan patch. · 03e574af
      Tom Lane authored
      disuse_physical_tlist() must work for all plan types handled by
      create_scan_plan().
      03e574af
    • Remove no-longer-needed phony typedefs in genbki.h. · c5111ea9
      Tom Lane authored
      Now that we have a policy of hiding varlena catalog fields behind
      "#ifdef CATALOG_VARLEN", there is no need for their type names to be
      acceptable to the C compiler.  And experimentation shows that it does
      not matter to pgindent either.  (If it did, we'd have problems anyway,
      since these typedefs are unreferenced so far as the C compiler is
      concerned, and find_typedef fails to identify such typedefs.)
      
      Hence, remove the phony typedefs that genbki.h provided to make
      some varlena field definitions compilable.
      
      In passing, rearrange #define's into what seemed a more logical order.
      c5111ea9
    • Add missing case for CustomScan. · f9e0255c
      Tom Lane authored
      Per KaiGai Kohei.
      
      In passing improve formatting of some code added in commit 30d7ae3c,
      because otherwise pgindent will make a mess of it.
      f9e0255c
    • Silence compiler warning about variable being used uninitialized. · f4640421
      Heikki Linnakangas authored
      It's a false positive - the variable is only used when 'onleft' is true,
      and it is initialized in that case. But the compiler doesn't necessarily
      see that.
      f4640421
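A false positive of this kind typically has the shape sketched below. The function `pick_offset` and its arithmetic are hypothetical, and initializing the variable at its declaration is only one common way to silence such a warning; the commit message does not say which fix was applied.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * 'offset' is assigned only when 'onleft' is true and read only when
 * 'onleft' is true, so it is never used uninitialized.  A compiler that
 * cannot correlate the two conditions may still warn about it.
 */
int
pick_offset(bool onleft, int base)
{
    int         offset = 0;     /* redundant; present only to placate the compiler */

    if (onleft)
        offset = base + 1;

    /* ... unrelated work would sit here in real code ... */

    if (onleft)
    {
        assert(offset == base + 1);     /* the invariant the compiler cannot see */
        return offset;
    }
    return -1;
}
```

The extra initialization is a dead store on the `onleft` path, which is generally considered an acceptable price for a warning-free build.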
    • Revamp the WAL record format. · 2c03216d
      Heikki Linnakangas authored
      Each WAL record now carries information about the modified relation and
      block(s) in a standardized format. That makes it easier to write tools that
      need that information, like pg_rewind, prefetching the blocks to speed up
      recovery, etc.
      
      There's a whole new API for building WAL records, replacing the XLogRecData
      chains used previously. The new API consists of XLogRegister* functions,
      which are called for each buffer and chunk of data that is added to the
      record. The new API also gives more control over when a full-page image is
      written, by passing flags to the XLogRegisterBuffer function.
      
      This also simplifies the XLogReadBufferForRedo() calls. The function can dig
      the relation and block number from the WAL record, so they no longer need to
      be passed as arguments.
      
      For the convenience of redo routines, XLogReader now dissects each WAL record
      after reading it, copying the main data part and the per-block data into
      MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
      but the redo routines can assume that the pointers returned by XLogRecGet*
      functions are. Redo routines are now passed the XLogReaderState, which
      contains the record in the already-dissected format, instead of the plain
      XLogRecord.
      
      The new record format also makes the fixed size XLogRecord header smaller,
      by removing the xl_len field. The length of the "main data" portion is now
      stored at the end of the WAL record, and there's a separate header after
      XLogRecord for it. The alignment padding at the end of XLogRecord is also
      removed. This compensates for the fact that the new format would otherwise
      be more bulky than the old format.
      
      Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
      Fujii Masao.
      2c03216d
    • Fix suggested layout for PGXS makefile · 8dc626de
      Peter Eisentraut authored
      Custom rules must come after pgxs inclusion, not before, because any
      rule added before pgxs will break the default 'all' target.
      
      Author: Cédric Villemain <cedric@2ndquadrant.fr>
      8dc626de
  2. 19 Nov, 2014 6 commits
    • Improve documentation's description of JOIN clauses. · 8372304e
      Tom Lane authored
      In bug #12000, Andreas Kunert complained that the documentation was
      misleading in saying "FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2".
      That's correct as far as it goes, but the equivalence doesn't hold when
      you consider three or more tables, since JOIN binds more tightly than
      comma.  I added a <note> to explain this, and ended up rearranging some
      of the existing text so that the note would make sense in context.
      
      In passing, rewrite the description of JOIN USING, which was unnecessarily
      vague, and hadn't been helped any by somebody's reliance on markup as a
      substitute for clear writing.  (Mostly this involved reintroducing a
      concrete example that was unaccountably removed by commit 032f3b7e.)
      
      Back-patch to all supported branches.
      8372304e
    • Add test cases for indexam operations not currently covered. · 88fc7192
      Heikki Linnakangas authored
      That includes VACUUM on GIN, GiST and SP-GiST indexes, and B-tree indexes
      large enough to cause page deletions in B-tree. Plus some other special
      cases.
      
      After this patch, the regression tests generate all the different WAL record
      types. Not all branches within the redo functions are covered, but it's a
      step forward.
      88fc7192
    • Avoid file descriptor leak in pg_test_fsync. · a0165553
      Robert Haas authored
      This can cause problems on Windows, where files that are still open
      can't be unlinked.
      
      Jeff Janes
      a0165553
    • Fix bug in the test of file descriptor of current WAL file in pg_receivexlog. · d5f4df72
      Fujii Masao authored
      In pg_receivexlog, in order to check whether the current WAL file is
      open or not, its file descriptor has to be checked against -1,
      the invalid value. But, oops, 7900e94 added an incorrect test that
      checks the descriptor against 1. This commit fixes that bug.
      
      Back-patch to 9.4 where the bug was added.
      
      Spotted by Magnus Hagander
      d5f4df72
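The difference between the two tests can be sketched as follows. The helper names and the `INVALID_FD` macro are hypothetical and do not come from pg_receivexlog; the only fact relied on is the POSIX convention that -1 denotes an invalid descriptor while 1 is a valid one (normally standard output).

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_FD (-1)         /* hypothetical name for the "no file open" marker */

/* Correct test: a file counts as open unless the descriptor is -1. */
bool
fd_is_open(int fd)
{
    return fd != INVALID_FD;
}

/* Mistaken test of the kind described above: compares against 1 instead. */
bool
fd_is_open_buggy(int fd)
{
    return fd != 1;
}

/* Shows where the two tests disagree. */
void
check_fd_tests(void)
{
    assert(!fd_is_open(INVALID_FD));        /* closed file seen as closed */
    assert(fd_is_open(5));                  /* ordinary descriptor seen as open */
    assert(fd_is_open_buggy(INVALID_FD));   /* bug: closed file seen as open */
    assert(!fd_is_open_buggy(1));           /* bug: valid descriptor 1 seen as closed */
}
```

With the mistaken comparison, a closed file is treated as open, so any code guarded by the test would operate on an invalid descriptor.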
    • Fix pg_receivexlog --slot so that it doesn't prevent the server shutdown. · f66c20b3
      Fujii Masao authored
      When pg_receivexlog --slot is connected to the server and the server
      shuts down, walsender keeps waiting for the last WAL record to be
      replicated and flushed by pg_receivexlog. But previously pg_receivexlog
      issued a sync command only when the WAL file was switched. So there was
      a case where the last WAL record was never flushed and walsender had to
      keep waiting indefinitely. This caused the server shutdown to get stuck.
      
      pg_recvlogical handles this problem by calling fsync() when it receives
      a request for an immediate reply from the server. That is, at shutdown,
      walsender sends the request, pg_recvlogical receives it, flushes the last
      WAL record, and sends the flush location back to the server. Since
      walsender can see that the last WAL record is successfully flushed, it can
      exit cleanly.
      
      This commit adds to pg_receivexlog the same logic that pg_recvlogical
      uses.
      
      Back-patch to 9.4 where pg_receivexlog was changed so that it can use
      the replication slot.
      
      Original patch by Michael Paquier, rewritten by me.
      Bug report by Furuya Osamu.
      f66c20b3
    • Don't require bleeding-edge timezone data in timestamptz regression test. · 8d7af8fb
      Tom Lane authored
      The regression test cases added in commits b2cbced9 et al depended in part
      on the Russian timezone offset changes of Oct 2014.  While this is of no
      particular concern for a default Postgres build, it was possible for a
      build using --with-system-tzdata to fail the tests if the system tzdata
      database wasn't au courant.  Bjorn Munch and Christoph Berg both complained
      about this while packaging 9.4rc1, so we probably shouldn't insist on the
      system tzdata being up-to-date.  Instead, make an equivalent test using a
      zone change that occurred in Venezuela in 2007.  With this patch, the
      regression tests should pass using any tzdata set from 2012 or later.
      (I can't muster much sympathy for somebody using --with-system-tzdata
      on a machine whose system tzdata is more than three years out-of-date.)
      8d7af8fb
  3. 18 Nov, 2014 4 commits
    • Update comments in find_typedef. · 7aa8d9e5
      Tom Lane authored
      These comments don't seem to have been touched in a long time.  Make them
      describe the current implementation rather than what was here last century,
      and be a bit more explicit about the unreferenced-typedefs issue.
      7aa8d9e5
    • Fix some bogus direct uses of realloc(). · 8b13e5c6
      Tom Lane authored
      pg_dump/parallel.c was using realloc() directly with no error check.
      While the odds of an actual failure here seem pretty low, Coverity
      complains about it, so fix by using pg_realloc() instead.
      
      While looking for other instances, I noticed a couple of places in
      psql that hadn't gotten the memo about the availability of pg_realloc.
      These aren't bugs, since they did have error checks, but verbosely
      inconsistent code is not a good thing.
      
      Back-patch as far as 9.3.  9.2 did not have pg_dump/parallel.c, nor
      did it have pg_realloc available in all frontend code.
      8b13e5c6
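The idea behind replacing bare realloc() calls with a checked wrapper can be sketched as below. `checked_realloc` and `grow_int_array` are hypothetical stand-ins written for this illustration; this is not the pg_realloc source, only an assumed wrapper in the same spirit (report the failure and exit rather than return NULL).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical checked wrapper around realloc().  It never returns NULL:
 * on allocation failure it prints a message and exits.
 */
void *
checked_realloc(void *ptr, size_t size)
{
    void       *tmp;

    /* realloc(ptr, 0) behaves differently across platforms; avoid it */
    if (size == 0)
        size = 1;

    tmp = realloc(ptr, size);
    if (tmp == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return tmp;
}

/* Grow (or create, when arr is NULL) an int array holding new_len elements. */
int *
grow_int_array(int *arr, size_t new_len)
{
    assert(new_len > 0);
    return checked_realloc(arr, new_len * sizeof(int));
}
```

Because the wrapper cannot return NULL, a caller may assign the result straight back to the original pointer. With a bare realloc(), that assignment would lose the old block on failure, and dereferencing the result without a check is the kind of problem Coverity flags.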
    • Reduce btree scan overhead for < and > strategies · 606c0123
      Simon Riggs authored
      For <, <=, > and >= strategies, mark the first scan key
      as already matched if scanning in an appropriate direction.
      If the index tuple contains no nulls, we can skip the first
      re-check for each tuple.
      
      Author: Rajeev Rastogi
      Reviewer: Haribabu Kommi
      Rework of the code and comments by Simon Riggs
      606c0123
    • Remove obsolete debugging option, RTDEBUG. · dedae6c2
      Heikki Linnakangas authored
      The r-tree AM that used it was removed back in 2005.
      
      Peter Geoghegan
      dedae6c2
  4. 17 Nov, 2014 7 commits
    • Add pg_dump --snapshot option · be1cc8f4
      Simon Riggs authored
      Allows pg_dump to use a snapshot previously defined by a concurrent
      session that has either used pg_export_snapshot() or obtained a
      snapshot when creating a logical slot. When this option is used with
      parallel pg_dump, the snapshot defined by this option is used and no
      new snapshot is taken.
      
      Simon Riggs and Michael Paquier
      be1cc8f4
    • Tom Lane's avatar
      83205404
    • Add --synchronous option to pg_receivexlog, for more reliable WAL writing. · c4f99d20
      Fujii Masao authored
      Previously pg_receivexlog flushed WAL data only when the WAL file was switched.
      Then 3dad73e7 added the -F option to pg_receivexlog so that users could control
      how frequently sync commands were issued to WAL files. It also allowed users
      to make pg_receivexlog flush WAL data immediately after writing by
      specifying 0 for the -F option. However, feedback messages were not sent back
      immediately even after the flush location was updated. So even if WAL data
      was flushed in real time, the server could not see that for a while.
      
      This commit removes the -F option from pg_receivexlog and adds --synchronous.
      If --synchronous is specified, pg_receivexlog, like the standby's WAL receiver,
      flushes WAL data as soon as there is WAL data which has not been flushed yet.
      Then it sends back a feedback message identifying the latest flush location
      to the server. This option is useful, for example, to make pg_receivexlog
      behave as a sync standby by using a replication slot.
      
      Original patch by Furuya Osamu, heavily rewritten by me.
      Reviewed by Heikki Linnakangas, Alvaro Herrera and Sawada Masahiko.
      c4f99d20
    • Update time zone data files to tzdata release 2014j. · bc241488
      Tom Lane authored
      DST law changes in the Turks & Caicos Islands (America/Grand_Turk) and
      in Fiji.  New zone Pacific/Bougainville for portions of Papua New Guinea.
      Historical changes for Korea and Vietnam.
      bc241488
    • Fix WAL-logging of B-tree "unlink halfdead page" operation. · c73669c0
      Heikki Linnakangas authored
      There was some confusion on how to record the case that the operation
      unlinks the last non-leaf page in the branch being deleted.
      _bt_unlink_halfdead_page set the "topdead" field in the WAL record to
      the leaf page, but the redo routine assumed that it would be an invalid
      block number in that case. This commit fixes _bt_unlink_halfdead_page to
      do what the redo routine expected.
      
      This code is new in 9.4, so backpatch there.
      c73669c0
    • Fix relpersistence setting in reindex_index · 0f9692b4
      Alvaro Herrera authored
      Buildfarm members with CLOBBER_CACHE_ALWAYS advised us that commit
      85b506bb was mistaken in setting the relpersistence value of the
      index directly in the relcache entry, within reindex_index.  The reason
      for the failure is that an invalidation message that comes after mucking
      with the relcache entry directly, but before writing it to the catalogs,
      would cause the entry to become rebuilt in place from catalogs with the
      old contents, losing the update.
      
      Fix by passing the correct persistence value to
      RelationSetNewRelfilenode instead; this routine also writes the updated
      tuple to pg_class, avoiding the problem.  Suggested by Tom Lane.
      0f9692b4
    • Translation updates · 7466a1b7
      Peter Eisentraut authored
      7466a1b7
  5. 16 Nov, 2014 2 commits
  6. 15 Nov, 2014 7 commits
    • Emit msg re skipping ANALYZE for absent inh tree · 0f66d212
      Simon Riggs authored
      When checking a table that has an inheritance tree marked,
      if no child tables remain, we skip ANALYZE. This patch emits
      a message to show that the action has been skipped.
      
      Author: Etsuro Fujita
      Reviewer: Furuya Osamu
      0f66d212
    • Get rid of SET LOGGED indexes persistence kludge · 85b506bb
      Alvaro Herrera authored
      This removes ATChangeIndexesPersistence() introduced by f41872d0
      which was too ugly to live for long.  Instead, the correct persistence
      marking is passed all the way down to reindex_index, so that the
      transient relation built to contain the index relfilenode can
      get marked correctly right from the start.
      
      Author: Fabrízio de Royes Mello
      Review and editorialization by Michael Paquier and Álvaro Herrera
      85b506bb
    • Remove unused InhPaths · e4d1e264
      Alvaro Herrera authored
      Allegedly, the last remaining usages of that struct were removed by
      0e99be1c.
      
      Author: Peter Geoghegan
      e4d1e264
    • Alvaro Herrera's avatar
    • Fix initdb --sync-only to also sync tablespaces. · 522c85a6
      Andres Freund authored
      630cd144 added initdb --sync-only, for use by pg_upgrade, by just
      exposing the existing fsync code. That's wrong, because initdb so far
      had absolutely no reason to deal with tablespaces.
      
      Fix --sync-only by additionally explicitly syncing each of the
      tablespaces.
      
      Backpatch to 9.3 where --sync-only was introduced.
      
      Abhijit Menon-Sen and Andres Freund
      522c85a6
    • Sync unlogged relations to disk after they have been reset. · 98ec7fd9
      Andres Freund authored
      Unlogged relations are only reset when performing an unclean
      restart. That means they have to be synced to disk during clean
      shutdowns. During normal processing that's achieved by registering a
      buffer's file to be fsynced at the next checkpoint when flushed. But
      ResetUnloggedRelations() doesn't go through the buffer manager, so
      nothing will force reset relations to disk before the next shutdown
      checkpoint.
      
      So just make ResetUnloggedRelations() fsync the newly created main
      forks to disk.
      
      Discussion: 20140912112246.GA4984@alap3.anarazel.de
      
      Backpatch to 9.1 where unlogged tables were introduced.
      
      Abhijit Menon-Sen and Andres Freund
      98ec7fd9
    • Ensure unlogged tables are reset even if crash recovery errors out. · d3586fc8
      Andres Freund authored
      Unlogged relations are reset at the end of crash recovery as they're
      only synced to disk during a proper shutdown. Unfortunately that and
      later steps can fail, e.g. due to running out of space. This reset
      was, up to now, performed after marking the database as having finished
      crash recovery successfully. Because out-of-space errors trigger a crash
      restart, that could lead to a situation in which not all unlogged
      relations are reset.
      
      Once that happened, usage of unlogged relations could yield errors like
      "could not open file "...": No such file or directory". Luckily,
      clusters that show the problem can be fixed by performing an immediate
      shutdown and starting the database again.
      
      To fix, just call ResetUnloggedRelations(UNLOGGED_RELATION_INIT)
      earlier, before marking the database as having successfully recovered.
      
      Discussion: 20140912112246.GA4984@alap3.anarazel.de
      
      Backpatch to 9.1 where unlogged tables were introduced.
      
      Abhijit Menon-Sen and Andres Freund
      d3586fc8
  7. 14 Nov, 2014 6 commits
    • Document evaluation-order considerations for aggregate functions. · 0ce627d4
      Tom Lane authored
      The SELECT reference page didn't really address the question of when
      aggregate function evaluation occurs, nor did the "expression evaluation
      rules" documentation mention that CASE can't be used to control whether
      an aggregate gets evaluated or not.  Improve that.
      
      Per discussion of bug #11661.  Original text by Marti Raudsepp and Michael
      Paquier, rewritten significantly by me.
      0ce627d4
    • Clean up includes from RLS patch · 80eacaa3
      Stephen Frost authored
      The initial patch for RLS mistakenly included headers associated with
      the executor and planner bits in rewrite/rowsecurity.h.  Per policy and
      general good sense, executor headers should not be included in planner
      headers or vice versa.
      
      The include of execnodes.h was a mistaken holdover from previous
      versions, while the include of relation.h was used for Relation's
      definition, which should have been coming from utils/relcache.h.  This
      patch cleans these issues up, adds comments to the RowSecurityPolicy
      struct and the RowSecurityConfigType enum, and changes Relation->rsdesc
      to Relation->rd_rsdesc to follow Relation field naming convention.
      
      Additionally, utils/rel.h was including rewrite/rowsecurity.h, which
      wasn't a great idea since that was pulling in things not really needed
      in utils/rel.h (which gets included in quite a few places).  Instead,
      use 'struct RowSecurityDesc' for the rd_rsdesc field and add comments
      explaining why.
      
      Lastly, add an include into access/nbtree/nbtsort.c for
      utils/sortsupport.h, which was evidently missed due to the above mess.
      
      Pointed out by Tom in 16970.1415838651@sss.pgh.pa.us; note that the
      concerns regarding a similar situation in the custom-path commit still
      need to be addressed.
      80eacaa3
    • Document BRIN's pages_per_range in CREATE INDEX · 79172a58
      Alvaro Herrera authored
      Author: Michael Paquier
      79172a58
    • Revert change to ALTER TABLESPACE summary. · 155c0f24
      Stephen Frost authored
      When ALTER TABLESPACE MOVE ALL was changed to be ALTER TABLE ALL IN
      TABLESPACE, the ALTER TABLESPACE summary should have been adjusted back
      to its original definition.
      
      Patch by Thom Brown (thanks!).
      155c0f24
    • Reduce disk footprint of brin regression test · 86cf9a56
      Alvaro Herrera authored
      Per complaint from Tom.
      
      While at it, throw in some extra tests for nulls as well, and make sure
      that the set of data we insert on the second round is not identical to
      the first one.  Both measures are intended to improve coverage of the
      test.
      
      Also uncomment the ON COMMIT DROP clause on the CREATE TEMP TABLE
      commands.  This doesn't have any effect for someone examining the
      regression database after the tests are done, but it reduces clutter for
      those that execute the script directly.
      86cf9a56
    • Allow interrupting GetMultiXactIdMembers · 51f9ea25
      Alvaro Herrera authored
      This function has a loop which can lead to uninterruptible process
      "stalls" (actually infinite loops) when some bugs are triggered.  Avoid
      that unpleasant situation by adding a check for interrupts in a place
      that shouldn't degrade performance in the normal case.
      
      Backpatch to 9.3.  Older branches have an identical loop here, but the
      aforementioned bugs are only a problem starting in 9.3 so there doesn't
      seem to be any point in backpatching any further.
      51f9ea25
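The general technique of making a potentially endless loop interruptible can be sketched as follows. This is not the GetMultiXactIdMembers code: `walk_chain`, `interrupt_pending` and the chain layout are hypothetical, and a plain flag poll stands in for whatever interrupt check the real function uses.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag that a signal handler would set to request cancellation. */
static volatile sig_atomic_t interrupt_pending = 0;

/*
 * Follow a chain of indexes until a negative entry ends it.  A corrupted
 * chain may contain a cycle, so the flag is polled once per iteration; the
 * check is a single read and costs almost nothing in the normal case.
 * Returns false if the walk was abandoned because of a pending interrupt.
 */
bool
walk_chain(const int *next, int start, int *steps_out)
{
    int         cur = start;
    int         steps = 0;

    assert(next != NULL && steps_out != NULL);

    while (cur >= 0)
    {
        if (interrupt_pending)
        {
            *steps_out = steps;
            return false;       /* give up instead of spinning forever */
        }
        cur = next[cur];
        steps++;
    }
    *steps_out = steps;
    return true;
}
```

Without the check, a chain that loops back on itself would keep the process spinning with no way to cancel it short of killing it.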