1. 20 Sep, 2010 1 commit
  2. 11 Sep, 2010 1 commit
    • Joe Conway authored · 5eb15c99
      SERIALIZABLE transactions are actually implemented beneath the covers with
      transaction snapshots, i.e. a snapshot registered at the beginning of
      a transaction. Change variable naming and comments to reflect this reality
      in preparation for a future, truly serializable mode, e.g.
      Serializable Snapshot Isolation (SSI).
      
      For the moment transaction snapshots are still used to implement
      SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin
      Grittner and Dan Ports with review and some minor wording changes by me.
  3. 26 Aug, 2010 1 commit
    • Tom Lane authored · db2d9c60
      Fix ExecMakeTableFunctionResult to verify that all rows returned by a SRF
      returning "record" actually do have the same rowtype.  This is needed because
      the parser can't realistically enforce that they will all have the same typmod,
      as seen in a recent example from David Wheeler.
      
      Back-patch to 8.0, which is as far back as we have the notion of RECORD
      subtypes being distinguished by typmod.  Wheeler's example depends on
      8.4-and-up features, but I suspect there may be ways to provoke similar
      failures before 8.4.
  4. 18 Aug, 2010 1 commit
    • Tom Lane authored · 3573c834
      Reset the per-output-tuple exprcontext each time through the main loop in
      ExecModifyTable().  This avoids memory leakage when trigger functions leave
      junk behind in that context (as they more or less must).  Problem and solution
      identified by Dean Rasheed.
      
      I'm a bit concerned about the longevity of this solution --- once a plan can
      have multiple ModifyTable nodes, we are very possibly going to have to do
      something different.  But it should hold up for 9.0.
  5. 05 Aug, 2010 1 commit
    • Robert Haas authored · 2a6ef344
      Standardize get_whatever_oid functions for object types with
      unqualified names.
      
      - Add a missing_ok parameter to get_tablespace_oid.
      - Avoid duplicating get_tablespace_oid guts in objectNamesToOids.
      - Add a missing_ok parameter to get_database_oid.
      - Replace get_roleid and get_role_checked with get_role_oid.
      - Add get_namespace_oid, get_language_oid, get_am_oid.
      - Refactor existing code to use new interfaces.
      
      Thanks to KaiGai Kohei for the review.
  6. 28 Jul, 2010 2 commits
    • Tom Lane authored · 77c75076
      Fix oversight in new EvalPlanQual logic: the second loop over the ExecRowMark
      list in ExecLockRows() forgot to allow for the possibility that some of the
      rowmarks are for child tables that aren't relevant to the current row.
      Per report from Kenichiro Tanaka.
    • Tom Lane authored · 133924e1
      Fix potential failure when hashing the output of a subplan that produces
      a pass-by-reference datatype with a nontrivial projection step.
      We were using the same memory context for the projection operation as for
      the temporary context used by the hashtable routines in execGrouping.c.
      However, the hashtable routines feel free to reset their temp context at
      any time, which'd lead to destroying input data that was still needed.
      Report and diagnosis by Tao Ma.
      
      Back-patch to 8.1, where the problem was introduced by the changes that
      allowed us to work with "virtual" tuples instead of materializing intermediate
      tuple values everywhere.  The earlier code looks quite similar, but it doesn't
      suffer the problem because the data gets copied into another context as a
      result of having to materialize ExecProject's output tuple.
  7. 25 Jul, 2010 1 commit
  8. 22 Jul, 2010 1 commit
    • Robert Haas authored · b8c6c71d
      Centralize DML permissions-checking logic.
      Remove bespoke code in DoCopy and RI_Initial_Check, which now instead
      call ExecCheckRTPerms with a manufactured RangeTblEntry.
      This is intended to make it feasible for an enhanced security provider
      to actually make use of ExecutorCheckPerms_hook, but also has the
      advantage that RI_Initial_Check can allow use of the fast-path when
      column-level but not table-level permissions are present.
      
      KaiGai Kohei.  Reviewed (in an earlier version) by Stephen Frost, and by me.
      Some further changes to the comments by me.
  9. 16 Jul, 2010 1 commit
    • Tom Lane authored · e11cfa87
      Remove a sanity check in the exclusion-constraint code that prevented users
      from defining non-self-conflicting constraints.
      
      Jeff Davis
      
      Note: I (tgl) objected to removing this check in 9.0 on the grounds that it
      was an important sanity check in new, poorly tested code.  However, it should
      be all right to remove it for 9.1, since we'll get field testing from the
      9.0 branch.
  10. 12 Jul, 2010 1 commit
    • Tom Lane authored · 53e75768
      Make NestLoop plan nodes pass outer-relation variables into their inner
      relation using the general PARAM_EXEC executor parameter mechanism, rather
      than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
      The previous method was hard to understand and could never be extended to
      handle parameters coming from multiple join levels.  This patch doesn't
      change the set of possible plans nor have any significant performance effect,
      but it's necessary infrastructure for future generalization of the concept
      of an inner indexscan plan.
      
      ExecReScan's second parameter is now unused, so it's removed.
  11. 09 Jul, 2010 1 commit
  12. 06 Jul, 2010 1 commit
  13. 29 May, 2010 1 commit
  14. 28 May, 2010 1 commit
    • Tom Lane authored · f39d57b8
      Rejigger mergejoin logic so that a tuple with a null in the first merge column
      is treated like end-of-input, if nulls sort last in that column and we are not
      doing outer-join filling for that input.  In such a case, the tuple cannot
      join to anything from the other input (because we assume mergejoinable
      operators are strict), and neither can any tuple following it in the sort
      order.  If we're not interested in doing outer-join filling we can just
      pretend the tuple and its successors aren't there at all.  This can save a
      great deal of time in situations where there are many nulls in the join
      column, as in a recent example from Scott Marlowe.  Also, since the planner
      tends to not count nulls in its mergejoin scan selectivity estimates, this
      is an important fix to make the runtime behavior more like the estimate.
      
      I regard this as an omission in the patch I wrote years ago to teach mergejoin
      that tuples containing nulls aren't joinable, so I'm back-patching it.  But
      only to 8.3 --- in older versions, we didn't have a solid notion of whether
      nulls sort high or low, so attempting to apply this optimization could break
      things.
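
The early-exit idea described in this commit message can be illustrated with a minimal sketch. This is not the executor's C code: the function name `merge_join`, the use of plain Python lists of keys, and `None` standing in for SQL NULL are all assumptions made for illustration. It covers only the inner-join case (no outer-join filling) with nulls sorting last.

```python
def merge_join(outer, inner):
    """Inner merge join over two key lists sorted ascending with None (NULL) last.

    Hypothetical illustration only; returns the list of matching key pairs.
    """
    results = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        a, b = outer[i], inner[j]
        # A NULL key cannot join to anything (mergejoinable operators are
        # assumed strict), and with nulls sorted last every later key on that
        # side is NULL too, so treat the first NULL as end-of-input.
        if a is None or b is None:
            break
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Emit every pairing for this run of equal keys.
            j_start = j
            while i < len(outer) and outer[i] == a:
                j = j_start
                while j < len(inner) and inner[j] == a:
                    results.append((outer[i], inner[j]))
                    j += 1
                i += 1
    return results
```

With many trailing NULLs on either side, the loop stops at the first one instead of stepping through each of them, which is the saving the message describes.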
  15. 28 Apr, 2010 1 commit
    • Heikki Linnakangas authored · 9b8a7332
      Introduce wal_level GUC to explicitly control if information needed for
      archival or hot standby should be WAL-logged, instead of deducing that from
      other options like archive_mode. This replaces recovery_connections GUC in
      the primary, where it now has no effect, but it's still used in the standby
      to enable/disable hot standby.
      
      Remove the WAL-logging of "unlogged operations", like creating an index
      without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
      the wal_level setting and the settings that affect how much shared memory a
      hot standby server needs to track master transactions (max_connections,
      max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
      change, at server restart, write a WAL record noting the new settings and
      update pg_control. This allows us to notice the change in those settings in
      the standby at the right moment; they used to be included in checkpoint
      records, but that meant that a changed value was not reflected in the
      standby until the first checkpoint after the change.
      
      Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
      the sequence it used to follow, before hot standby and subsequent patches
      changed it to 0x9003.
  16. 21 Mar, 2010 1 commit
  17. 19 Mar, 2010 1 commit
    • Tom Lane authored · a836abe9
      Modify error context callback functions to not assume that they can fetch
      catalog entries via SearchSysCache and related operations.  Although, at the
      time that these callbacks are called by elog.c, we have not officially aborted
      the current transaction, it still seems rather risky to initiate any new
      catalog fetches.  In all these cases the needed information is readily
      available in the caller and so it's just a matter of a bit of extra notation
      to pass it to the callback.
      
      Per crash report from Dennis Koegel.  I've concluded that the real fix for
      his problem is to clear the error context stack at entry to proc_exit, but
      it still seems like a good idea to make the callbacks a bit less fragile
      for other cases.
      
      Backpatch to 8.4.  We could go further back, but the patch doesn't apply
      cleanly.  In the absence of proof that this fixes something and isn't just
      paranoia, I'm not going to expend the effort.
  18. 26 Feb, 2010 1 commit
  19. 20 Feb, 2010 1 commit
    • Tom Lane authored · 05d8a561
      Clean up handling of XactReadOnly and RecoveryInProgress checks.
      Add some checks that seem logically necessary, in particular let's make
      real sure that HS slave sessions cannot create temp tables.  (If they did
      they would think that temp tables belonging to the master's session with
      the same BackendId were theirs.  We *must* not allow myTempNamespace to
      become set in a slave session.)
      
      Change setval() and nextval() so that they are only allowed on temp sequences
      in a read-only transaction.  This seems consistent with what we allow for
      table modifications in read-only transactions.  Since an HS slave can't have a
      temp sequence, this also provides a nicer cure for the setval PANIC reported
      by Erik Rijkers.
      
      Make the error messages more uniform, and have them mention the specific
      command being complained of.  This seems worth the trifling amount of extra
      code, since people are likely to see such messages a lot more than before.
  20. 18 Feb, 2010 1 commit
    • Tom Lane authored · 11d5ba97
      Fix ExecEvalArrayRef to pass down the old value of the array element or slice
      being assigned to, in case the expression to be assigned is a FieldStore that
      would need to modify that value.  The need for this was foreseen some time
      ago, but not implemented then because we did not have arrays of composites.
      Now we do, but the point evidently got overlooked in that patch.  Net result
      is that updating a field of an array element doesn't work right, as
      illustrated if you try the new regression test on an unpatched backend.
      Noted while experimenting with EXPLAIN VERBOSE, which has also got some issues
      in this area.
      
      Backpatch to 8.3, where arrays of composites were introduced.
  21. 14 Feb, 2010 1 commit
    • Robert Haas authored · e26c539e
      Wrap calls to SearchSysCache and related functions using macros.
      The purpose of this change is to eliminate the need for every caller
      of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
      GetSysCacheOid, and SearchSysCacheList to know the maximum number
      of allowable keys for a syscache entry (currently 4).  This will
      make it far easier to increase the maximum number of keys in a
      future release should we choose to do so, and it makes the code
      shorter, too.
      
      Design and review by Tom Lane.
  22. 12 Feb, 2010 1 commit
    • Tom Lane authored · ec4be2ee
      Extend the set of frame options supported for window functions.
      This patch allows the frame to start from CURRENT ROW (in either RANGE or
      ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING
      start and end points.  (RANGE value PRECEDING/FOLLOWING isn't there yet ---
      the grammar works, but that's all.)
      
      Hitoshi Harada, reviewed by Pavel Stehule
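
As a rough illustration of what a ROWS-mode frame with n PRECEDING and n FOLLOWING bounds selects, here is a small sketch. It is a simulation, not the window-function executor: the helper name `rows_frame_sum` and the choice of `sum` as the aggregate are assumptions, and the input is taken to be a single partition that is already in window order.

```python
def rows_frame_sum(values, preceding, following):
    """Sum each row's frame: `preceding` rows before through `following` rows after.

    Hypothetical helper; preceding=0 corresponds to a frame starting at
    CURRENT ROW. Frames are clipped at the partition boundaries.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - preceding)                  # clip at partition start
        end = min(len(values), i + following + 1)      # clip at partition end
        out.append(sum(values[start:end]))
    return out
```

For example, `rows_frame_sum([1, 2, 3, 4], 1, 1)` plays the role of a sum over `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`, and `rows_frame_sum([1, 2, 3, 4], 0, 1)` the role of a frame that starts at `CURRENT ROW`.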
  23. 09 Feb, 2010 1 commit
    • Tom Lane authored · cbe9d6be
      Fix up rickety handling of relation-truncation interlocks.
      Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
      relation entries, so that they will get reset to InvalidBlockNumber whenever
      an smgr-level flush happens.  Because we now send smgr invalidation messages
      immediately (not at end of transaction) when a relation truncation occurs,
      this ensures that other backends will reset their values before they next
      access the relation.  We no longer need the unreliable assumption that a
      VACUUM that's doing a truncation will hold its AccessExclusive lock until
      commit --- in fact, we can intentionally release that lock as soon as we've
      completed the truncation.  This patch therefore reverts (most of) Alvaro's
      patch of 2009-11-10, as well as my marginal hacking on it yesterday.  We can
      also get rid of assorted no-longer-needed relcache flushes, which are far more
      expensive than an smgr flush because they kill a lot more state.
      
      In passing this patch fixes smgr_redo's failure to perform visibility-map
      truncation, and cleans up some rather dubious assumptions in freespace.c and
      visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
      date.
  24. 08 Feb, 2010 2 commits
    • Tom Lane authored · d5768dce
      Create an official API function for C functions to use to check if they are
      being called as aggregates, and to get the aggregate transition state memory
      context if needed.  Use it instead of poking directly into AggState and
      WindowAggState in places that shouldn't know so much.
      
      We should have done this in 8.4, probably, but better late than never.
      
      Revised version of a patch by Hitoshi Harada.
    • Tom Lane authored · 0a469c87
      Remove old-style VACUUM FULL (which was known for a little while as
      VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
      Per discussion, the use case for this method of vacuuming is no longer large
      enough to justify maintaining it; not to mention that we don't wish to invest
      the work that would be needed to make it play nicely with Hot Standby.
      
      Aside from the code directly related to old-style VACUUM FULL, this commit
      removes support for certain WAL record types that could only be generated
      within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
      nontransactional generation of cache invalidation sinval messages (the last
      being the sticking point for Hot Standby).
      
      We still have to retain all code that copes with finding HEAP_MOVED_OFF and
      HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
      as we want to support in-place update from pre-9.0 databases.
  25. 07 Feb, 2010 1 commit
    • Tom Lane authored · b9b8831a
      Create a "relation mapping" infrastructure to support changing the relfilenodes
      of shared or nailed system catalogs.  This has two key benefits:
      
      * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
      
      * We no longer have to use an unsafe reindex-in-place approach for reindexing
        shared catalogs.
      
      CLUSTER on nailed catalogs now works too, although I left it disabled on
      shared catalogs because the resulting pg_index.indisclustered update would
      only be visible in one database.
      
      Since reindexing shared system catalogs is now fully transactional and
      crash-safe, the former special cases in REINDEX behavior have been removed;
      shared catalogs are treated the same as non-shared.
      
      This commit does not do anything about the recently-discussed problem of
      deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
      concurrent queries; will address that in a separate patch.  As a stopgap,
      parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
      such failures during the regression tests.
  26. 03 Feb, 2010 1 commit
  27. 01 Feb, 2010 1 commit
  28. 31 Jan, 2010 1 commit
    • Tom Lane authored · 034fffbf
      Fix memory leak created by deferrable-index-constraints patches.
      We need to free the OID list returned by ExecInsertIndexTuples to avoid
      a query-lifespan memory leak.  When many rows require rechecking, this
      can be a significant leak --- it's even more than the space used for the
      queued trigger events.
      
      Dean Rasheed
  29. 28 Jan, 2010 1 commit
  30. 15 Jan, 2010 1 commit
    • Heikki Linnakangas authored · 40f908bd
      Introduce Streaming Replication.
      This includes two new kinds of postmaster processes, walsenders and
      walreceiver. Walreceiver is responsible for connecting to the primary server
      and streaming WAL to disk, while walsender runs in the primary server and
      streams WAL from disk to the client.
      
      Documentation still needs work, but the basics are there. We will probably
      pull the replication section to a new chapter later on, as well as the
      sections describing file-based replication. But let's do that as a separate
      patch, so that it's easier to see what has been added/changed. This patch
      also adds a new section to the chapter about FE/BE protocol, documenting the
      protocol used by walsender/walreceiver.
      
      Bump catalog version because of two new functions,
      pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
      monitoring the progress of replication.
      
      Fujii Masao, with additional hacking by me
  31. 11 Jan, 2010 1 commit
    • Tom Lane authored · 292176a1
      Improve ExecEvalVar's handling of whole-row variables in cases where the
      rowtype contains dropped columns.  Sometimes the input tuple will be formed
      from a select targetlist in which dropped columns are filled with a NULL
      of an arbitrary type (the planner typically uses INT4, since it can't tell
      what type the dropped column really was).  So we need to relax the rowtype
      compatibility check to not insist on physical compatibility if the actual
      column value is NULL.
      
      In principle we might need to do this for functions returning composite
      types, too (see tupledesc_match()).  In practice there doesn't seem to be
      a bug there, probably because the function will be using the same cached
      rowtype descriptor as the caller.  Fixing that code path would require
      significant rearrangement, so I left it alone for now.
      
      Per complaint from Filip Rembialkowski.
  32. 09 Jan, 2010 1 commit
    • Tom Lane authored · 85113bcf
      Make ExecEvalFieldSelect throw a more intelligible error if it's asked to
      extract a system column, and remove a couple of lines that are useless
      in light of the fact that we aren't ever going to support this case.  There
      isn't much point in trying to make this work because a tuple Datum does
      not carry many of the system columns.  Per experimentation with a case
      reported by Dean Rasheed; we'll have to fix his problem somewhere else.
  33. 08 Jan, 2010 1 commit
    • Tom Lane authored · 217dc525
      Fix oversight in EvalPlanQualFetch: after failing to lock a tuple because
      someone else has just updated it, we have to set priorXmax to that tuple's
      xmax (ie, the XID of the other xact that updated it) before looping back to
      examine the next tuple.  Obviously, the next tuple in the update chain should
      have that XID as its xmin, not the same xmin as the preceding tuple that we
      had been trying to lock.  The mismatch would cause the EvalPlanQual logic to
      decide that the tuple chain ended in a deletion, when actually there was a
      live tuple that should have been found.
      
      I inserted this error when recently adding logic to EvalPlanQual to make it
      lock tuples before returning them (as opposed to the old method in which the
      lock would occur much later, causing a great deal of work to be wasted if we
      only then discover someone else updated it).  Sigh.  Per today's report from
      Takahiro Itagaki of inconsistent results during pgbench runs.
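
The role of priorXmax when walking an update chain can be shown with a heavily simplified model. Everything here is an assumption for illustration: the `Version` record, the dictionary used as a "heap", and the `update_prior` switch that reproduces the oversight. The real EvalPlanQualFetch also deals with locking, snapshots, and visibility, none of which is modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Version:
    xmin: int                 # XID that created this tuple version
    xmax: Optional[int]       # XID that updated/deleted it, None if still live
    next_tid: Optional[int]   # successor version, None if there is none


def find_latest(heap: Dict[int, Version], tid: Optional[int],
                prior_xmax: int, update_prior: bool = True) -> Optional[int]:
    """Follow the update chain from `tid`; return the live version's id or None.

    Hypothetical sketch. Each version's xmin must equal the xmax of the
    version we arrived from; otherwise the chain is treated as ended.
    """
    while tid is not None:
        v = heap[tid]
        if v.xmin != prior_xmax:
            return None           # mismatch: looks like the chain ended in a deletion
        if v.xmax is None:
            return tid            # found the live version
        if update_prior:
            prior_xmax = v.xmax   # the step the fix adds before looping
        tid = v.next_tid
    return None
```

With `update_prior=False` the second hop compares the next version's xmin against a stale XID and wrongly reports that no live tuple exists, which matches the symptom described in the message.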
  34. 06 Jan, 2010 1 commit
    • Bruce Momjian authored · f98fbc78
      Preserve relfilenodes:
      Add support to pg_dump --binary-upgrade to preserve all relfilenodes,
      for use by pg_migrator.
  35. 05 Jan, 2010 1 commit
    • Tom Lane authored · 90f4c2d9
      Add support for doing FULL JOIN ON FALSE. While this is really a rather
      peculiar variant of UNION ALL, and so wouldn't likely get written directly
      as-is, it's possible for it to arise as a result of simplification of
      less-obviously-silly queries.  In particular, now that we can do flattening
      of subqueries that have constant outputs and are underneath an outer join,
      it's possible for the case to result from simplification of queries of the
      type exhibited in bug #5263.  Back-patch to 8.4 to avoid a functionality
      regression for this type of query.
  36. 04 Jan, 2010 1 commit
    • Tom Lane authored · 40608e7f
      When estimating the selectivity of an inequality "column > constant" or
      "column < constant", and the comparison value is in the first or last
      histogram bin or outside the histogram entirely, try to fetch the actual
      column min or max value using an index scan (if there is an index on the
      column).  If successful, replace the lower or upper histogram bound with
      that value before carrying on with the estimate.  This limits the
      estimation error caused by moving min/max values when the comparison
      value is close to the min or max.  Per a complaint from Josh Berkus.
      
      It is tempting to consider using this mechanism for mergejoinscansel as well,
      but that would inject index fetches into main-line join estimation not just
      endpoint cases.  I'm refraining from that until we can get a better handle
      on the costs of doing this type of lookup.
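
The effect of swapping in the actual column maximum can be sketched numerically. This is a toy model, not the planner's selectivity code: the function names, the assumption of an equi-depth histogram with linear interpolation inside a bin, and the `actual_min`/`actual_max` arguments (standing in for values that would be fetched through an index) are all illustrative.

```python
def frac_greater(bounds, c):
    """Estimated fraction of rows with column > c from equi-depth histogram bounds."""
    nbins = len(bounds) - 1
    if c < bounds[0]:
        return 1.0
    if c >= bounds[-1]:
        return 0.0
    for i in range(nbins):
        lo, hi = bounds[i], bounds[i + 1]
        if lo <= c < hi:
            within = (hi - c) / (hi - lo)      # share of this bin above c
            return (within + (nbins - 1 - i)) / nbins
    return 0.0


def adjust_endpoints(bounds, c, actual_min, actual_max):
    """Replace an endpoint with the actual extreme when c is in an end bin or outside."""
    adjusted = list(bounds)
    if c < bounds[1]:
        adjusted[0] = actual_min       # first bin, or below the histogram
    if c >= bounds[-2]:
        adjusted[-1] = actual_max      # last bin, or above the histogram
    return adjusted
```

For instance, with stale bounds `[0, 10, 20, 30]` and a comparison value of 50, the unadjusted estimate is zero; if the column's maximum has since moved to 60, the adjusted bounds give a small but nonzero fraction instead.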
  37. 02 Jan, 2010 2 commits