1. 12 Mar, 2014 1 commit
    • Only WAL-log the modified portion in an UPDATE, if possible. · a3115f0d
      Heikki Linnakangas authored
      When a row is updated, and the new tuple version is put on the same page as
      the old one, only WAL-log the part of the new tuple that's not identical to
      the old. This saves significantly on the amount of WAL that needs to be
      written, in the common case that most fields are not modified.
      
      Amit Kapila, with a lot of back and forth with me, Robert Haas, and others.
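
      One way to picture "only the part that's not identical" is a shared
      prefix/suffix encoding of the new tuple against the old one. The sketch
      below is an illustration under that assumption, not the PostgreSQL code;
      the function names and the byte-string tuples are hypothetical.

```python
def common_prefix_len(old: bytes, new: bytes) -> int:
    """Number of leading bytes that old and new share."""
    limit = min(len(old), len(new))
    i = 0
    while i < limit and old[i] == new[i]:
        i += 1
    return i


def common_suffix_len(old: bytes, new: bytes, prefix_len: int) -> int:
    """Number of trailing bytes shared, never overlapping the prefix."""
    limit = min(len(old), len(new)) - prefix_len
    i = 0
    while i < limit and old[len(old) - 1 - i] == new[len(new) - 1 - i]:
        i += 1
    return i


def encode_update(old: bytes, new: bytes):
    """Return (prefix_len, suffix_len, middle): all that needs logging."""
    p = common_prefix_len(old, new)
    s = common_suffix_len(old, new, p)
    return p, s, new[p:len(new) - s]


def decode_update(old: bytes, record) -> bytes:
    """Rebuild the new tuple from the old tuple plus the logged record."""
    p, s, middle = record
    return old[:p] + middle + old[len(old) - s:]


old_tuple = b"id=42|name=alice|city=Oslo"
new_tuple = b"id=42|name=alicia|city=Oslo"
record = encode_update(old_tuple, new_tuple)
print(record)
```

      Replay can only rebuild the new version if the old one is available on
      the same page, which matches the same-page condition in the commit
      message.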
  2. 03 Mar, 2014 1 commit
    • Introduce logical decoding. · b89e1510
      Robert Haas authored
      This feature, building on previous commits, allows the write-ahead log
      stream to be decoded into a series of logical changes; that is,
      inserts, updates, and deletes and the transactions which contain them.
      It is capable of handling decoding even across changes to the schema
      of the affected tables.  The output format is controlled by a
      so-called "output plugin"; an example is included.  To make use of
      this in a real replication system, the output plugin will need to be
      modified to produce output in the format appropriate to that system,
      and to perform filtering.
      
      Currently, information can be extracted from the logical decoding
      system only via SQL; future commits will add the ability to stream
      changes via walsender.
      
      Andres Freund, with review and other contributions from many other
      people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Geoghegan,
      Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Michael
      Paquier, Simon Riggs, Craig Ringer, and Steve Singer.
  3. 01 Feb, 2014 1 commit
    • Introduce replication slots. · 858ec118
      Robert Haas authored
      Replication slots are a crash-safe data structure which can be created
      on either a master or a standby to prevent premature removal of
      write-ahead log segments needed by a standby, as well as (with
      hot_standby_feedback=on) pruning of tuples whose removal would cause
      replication conflicts.  Slots have some advantages over existing
      techniques, as explained in the documentation.
      
      In a few places, we refer to the type of replication slots introduced
      by this patch as "physical" slots, because forthcoming patches for
      logical decoding will also have slots, but with somewhat different
      properties.
      
      Andres Freund and Robert Haas
  4. 25 Jan, 2014 1 commit
    • Add recovery_target='immediate' option. · 71c6a8e3
      Heikki Linnakangas authored
      This allows ending recovery as soon as a consistent state has been reached.
      Without this, there was no easy way to e.g. restore an online backup without
      replaying any extra WAL after the backup ended.
      
      MauMau and me.
  5. 14 Jan, 2014 1 commit
    • Fix multiple bugs in index page locking during hot-standby WAL replay. · 061b079f
      Tom Lane authored
      In ordinary operation, VACUUM must be careful to take a cleanup lock on
      each leaf page of a btree index; this ensures that no indexscans could
      still be "in flight" to heap tuples due to be deleted.  (Because of
      possible index-tuple motion due to concurrent page splits, it's not enough
      to lock only the pages we're deleting index tuples from.)  In Hot Standby,
      the WAL replay process must likewise lock every leaf page.  There were
      several bugs in the code for that:
      
      * The replay scan might come across unused, all-zero pages in the index.
      While btree_xlog_vacuum itself did the right thing (ie, nothing) with
      such pages, xlogutils.c supposed that such pages must be corrupt and
      would throw an error.  This accounts for various reports of replication
      failures with "PANIC: WAL contains references to invalid pages".  To
      fix, add a ReadBufferMode value that instructs XLogReadBufferExtended
      not to complain when we're doing this.
      
      * btree_xlog_vacuum performed the extra locking if standbyState ==
      STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up
      for hot standby queries until the database has reached consistency, and
      we don't want to do the extra locking till then either, for fear of reading
      corrupted pages (which bufmgr.c would complain about).  Fix by exporting a
      new function from xlog.c that will report whether we're actually in hot
      standby replay mode.
      
      * To ensure full coverage of the index in the replay scan, btvacuumscan
      would emit a dummy WAL record for the last page of the index, if no
      vacuuming work had been done on that page.  However, if the last page
      of the index is all-zero, that would result in corruption of said page,
      since the functions called on it weren't prepared to handle that case.
      There's no need to lock any such pages, so change the logic to target
      the last normal leaf page instead.
      
      The first two of these bugs were diagnosed by Andres Freund, the other one
      by me.  Fixes based on ideas from Heikki Linnakangas and myself.
      
      This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
  6. 07 Jan, 2014 1 commit
  7. 02 Jan, 2014 1 commit
  8. 20 Dec, 2013 1 commit
  9. 13 Dec, 2013 1 commit
    • Add GUC to enable WAL-logging of hint bits, even with checksums disabled. · 50e54709
      Heikki Linnakangas authored
      WAL records of hint bit updates are useful to tools that want to examine
      which pages have been modified. In particular, this is required to make
      the pg_rewind tool safe (without checksums).
      
      This can also be used to test how much extra WAL-logging would occur if
      you enabled checksums, without actually enabling them (which you can't
      currently do without re-initdb'ing).
      
      Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with
      further changes by me.
  10. 11 Dec, 2013 1 commit
    • Add new wal_level, logical, sufficient for logical decoding. · e55704d8
      Robert Haas authored
      When wal_level=logical, we'll log columns from the old tuple as
      configured by the REPLICA IDENTITY facility added in commit
      07cacba9.  This makes it possible for
      a properly-configured logical replication solution to correctly
      follow table updates even if they change the chosen key columns,
      or, with REPLICA IDENTITY FULL, even if the table has no key at
      all.  Note that updates which do not modify the replica identity
      column won't log anything extra, making the choice of a good key
      (i.e. one that will rarely be changed) important to performance
      when wal_level=logical is configured.
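
      The rule described above can be sketched as a small decision function.
      This is an illustration only: the function name, the string values for
      the identity setting, and the dict-based rows are hypothetical, and
      treating FULL as "always log the whole old row" is an assumption rather
      than something this commit message states.

```python
def old_columns_to_log(replica_identity, key_columns, old_row, new_row):
    """Return the old-tuple columns that would be logged in addition.

    replica_identity: "key" (a chosen key) or "full" (hypothetical labels).
    key_columns: names of the columns forming the replica identity key.
    """
    if replica_identity == "full":
        # Assumption: with FULL, the entire old row is logged.
        return dict(old_row)
    key_changed = any(old_row[c] != new_row[c] for c in key_columns)
    if key_changed:
        # The consumer needs the old key to locate the row being updated.
        return {c: old_row[c] for c in key_columns}
    # Key untouched: nothing extra is logged.
    return {}


old = {"id": 1, "name": "a", "city": "x"}
print(old_columns_to_log("key", ["id"], old, {"id": 1, "name": "b", "city": "x"}))
print(old_columns_to_log("key", ["id"], old, {"id": 2, "name": "a", "city": "x"}))
```

      The first call logs nothing extra and the second logs only the old key,
      which is why a rarely-changed key keeps the WAL overhead low.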
      
      Each insert, update, or delete to a catalog table will also log
      the CMIN and/or CMAX values stamped by the current transaction.
      This is necessary because logical decoding will require access to
      historical snapshots of the catalog in order to decode some data
      types, and the CMIN/CMAX values that we may need in order to judge
      row visibility may have been overwritten by the time we need them.
      
      Andres Freund, reviewed in various versions by myself, Heikki
      Linnakangas, KONDO Mitsumasa, and many others.
  11. 08 Jul, 2013 1 commit
    • Improve scalability of WAL insertions. · 9a20a9b2
      Heikki Linnakangas authored
      This patch replaces WALInsertLock with a number of WAL insertion slots,
      allowing multiple backends to insert WAL records to the WAL buffers
      concurrently. This is particularly useful for parallel loading of large amounts
      of data on a system with many CPUs.
      
      This has one user-visible change: switching to a new WAL segment with
      pg_switch_xlog() now fills the remaining unused portion of the segment with
      zeros. This potentially adds some overhead, but it has been a very common
      practice by DBAs to clear the "tail" of the segment with an external
      pg_clearxlogtail utility anyway, to make the WAL files compress better.
      With this patch, it's no longer necessary to do that.
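
      The compression benefit is easy to reproduce in miniature. The sketch
      below is not a measurement of real WAL: the segment is scaled down, and
      both the written WAL and the stale tail of a reused segment are modeled
      as random bytes, which is an assumption chosen to make the contrast
      obvious.

```python
import random
import zlib

SEGMENT_SIZE = 256 * 1024   # scaled down from 16 MB to keep the demo fast
USED_BYTES = 64 * 1024      # WAL written before the forced segment switch

rng = random.Random(0)      # fixed seed so the run is repeatable


def random_bytes(n: int) -> bytes:
    return bytes(rng.getrandbits(8) for _ in range(n))


wal_data = random_bytes(USED_BYTES)
stale_tail = random_bytes(SEGMENT_SIZE - USED_BYTES)  # leftover old contents
zero_tail = bytes(SEGMENT_SIZE - USED_BYTES)          # tail filled with zeros

size_with_stale_tail = len(zlib.compress(wal_data + stale_tail))
size_with_zero_tail = len(zlib.compress(wal_data + zero_tail))
print(size_with_stale_tail, size_with_zero_tail)
```

      The zero-filled segment compresses to roughly the size of the used
      portion, while the segment with a stale tail barely compresses at all.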
      
      This patch adds a new GUC, xloginsert_slots, to tune the number of WAL
      insertion slots. Performance testing suggests that the default, 8, works
      pretty well for all kinds of workloads, but I left the GUC in place to allow
      others with different hardware to test that easily. We might want to remove
      that before release.
      
      Reviewed by Andres Freund.
  12. 17 Jun, 2013 1 commit
    • Add buffer_std flag to MarkBufferDirtyHint(). · b8fd1a09
      Jeff Davis authored
      MarkBufferDirtyHint() writes WAL, and should know if it's got a
      standard buffer or not. Currently, the only callers where buffer_std
      is false are related to the FSM.
      
      In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive.
      
      Back-patch to 9.3.
  13. 29 May, 2013 1 commit
  14. 22 Mar, 2013 1 commit
    • Allow I/O reliability checks using 16-bit checksums · 96ef3b8f
      Simon Riggs authored
      Checksums are set immediately prior to flush out of shared buffers
      and checked when pages are read in again. Hint bit setting will
      require full page write when block is dirtied, which causes various
      infrastructure changes. Extensive comments, docs and README.
      
      WARNING message thrown if checksum fails on non-all zeroes page;
      ERROR thrown but can be disabled with ignore_checksum_failure = on.
      
      Feature enabled by an initdb option, since transition from option off
      to option on is long and complex and has not yet been implemented.
      Default is not to use checksums.
      
      Checksum used is WAL CRC-32 truncated to 16 bits.
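
      As a rough illustration of truncating a 32-bit CRC to 16 bits, the
      sketch below uses zlib.crc32 as a stand-in. Whether zlib's CRC-32
      variant matches the WAL CRC, and whether "truncated" means keeping the
      low 16 bits, are both assumptions; page_checksum16 and page_is_valid
      are hypothetical names.

```python
import zlib


def page_checksum16(page: bytes) -> int:
    """Keep only the low 16 bits of a CRC-32 computed over the page."""
    return zlib.crc32(page) & 0xFFFF


def page_is_valid(page: bytes, stored_checksum: int) -> bool:
    """Check done when a page is read back in."""
    return page_checksum16(page) == stored_checksum


page = bytes(8192)                  # an 8 kB page of zeros, for the demo
stored = page_checksum16(page)      # set just before the page is flushed
print(hex(stored), page_is_valid(page, stored))
```

      A 16-bit value cannot catch every corruption a full CRC-32 would, so
      this trades some detection strength for space in the page header.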
      
      Simon Riggs, Jeff Davis, Greg Smith
      Wide input and assistance from many community members. Thank you.
  15. 11 Feb, 2013 1 commit
    • Support unlogged GiST index. · 62401db4
      Heikki Linnakangas authored
      The reason this wasn't supported before was that GiST indexes need an
      increasing sequence to detect concurrent page-splits. In a regular WAL-
      logged GiST index, the LSN of the page-split record is used for that
      purpose, and in a temporary index, we can get away with a backend-local
      counter. Neither of those methods works for an unlogged relation.
      
      To provide such an increasing sequence of numbers, create a "fake LSN"
      counter that is saved and restored across shutdowns. On recovery, unlogged
      relations are blown away, so the counter doesn't need to survive that
      either.
      
      Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
  16. 17 Jan, 2013 1 commit
    • Make pg_receivexlog and pg_basebackup -X stream work across timeline switches. · 0b632913
      Heikki Linnakangas authored
      This mirrors the changes done earlier to the server in standby mode. When
      receivelog reaches the end of a timeline, as reported by the server, it
      fetches the timeline history file of the next timeline, and restarts
      streaming from the new timeline by issuing a new START_STREAMING command.
      
      When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the
      last segment on the old timeline. This helps you to tell apart a partial
      segment left in the directory because of a timeline switch, and a completed
      segment. If you just follow a single server, it won't make a difference, but
      it can be significant in more complicated scenarios where new WAL is still
      generated on the old timeline.
      
      This includes two small changes to the streaming replication protocol:
      First, when you reach the end of timeline while streaming, the server now
      sends the TLI of the next timeline in the server's history to the client.
      pg_receivexlog uses that as the next timeline, so that it doesn't need to
      parse the timeline history file like a standby server does. Second, when
      BASE_BACKUP command sends the begin and end WAL positions, it now also sends
      the timeline IDs corresponding to the positions.
  17. 03 Jan, 2013 1 commit
    • Tolerate timeline switches while "pg_basebackup -X fetch" is running. · b0daba57
      Heikki Linnakangas authored
      If you take a base backup from a standby server with "pg_basebackup -X
      fetch", and the timeline switches while the backup is being taken, the
      backup used to fail with an error "requested WAL segment %s has already
      been removed". This is because the server-side code that sends over the
      required WAL files would not construct the WAL filename with the correct
      timeline after a switch.
      
      Fix that by using readdir() to scan pg_xlog for all the WAL segments in the
      range, regardless of timeline.
      
      Also, include all timeline history files in the backup, if taken with
      "-X fetch". That fixes another related bug: If a timeline switch happened
      just before the backup was initiated in a standby, the WAL segment
      containing the initial checkpoint record contains WAL from the older
      timeline too. Recovery will not accept that without a timeline history file
      that lists the older timeline.
      
      Backpatch to 9.2. Versions prior to that were not affected as you could not
      take a base backup from a standby before 9.2.
  18. 01 Jan, 2013 1 commit
  19. 21 Dec, 2012 1 commit
  20. 20 Dec, 2012 1 commit
    • Follow TLI of last replayed record, not recovery target TLI, in walsenders. · af275a12
      Heikki Linnakangas authored
      Most of the time, the last replayed record comes from the recovery target
      timeline, but there is a corner case where it makes a difference. When
      the startup process scans for a new timeline, and decides to change recovery
      target timeline, there is a window where the recovery target TLI has already
      been bumped, but there are no WAL segments from the new timeline in pg_xlog
      yet. For example, if we have just replayed up to point 0/30002D8, on
      timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog
      that contains the WAL up to that point. When recovery switches recovery
      target timeline to 2, a walsender can immediately try to read WAL from
      0/30002D8, from timeline 2, so it will try to open WAL file
      000000020000000000000003. However, that doesn't exist yet - the startup
      process hasn't copied that file from the archive yet nor has the walreceiver
      streamed it yet, so walsender fails with error "requested WAL segment
      000000020000000000000003 has already been removed". That's harmless, in that
      the standby will try to reconnect later and by that time the segment is
      already created, but error messages that should be ignored are not good.
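
      The file names in this example follow directly from the timeline ID and
      the WAL position. A minimal sketch, assuming the default 16 MB segment
      size; parse_lsn and wal_file_name are illustrative helper names, not
      PostgreSQL functions.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024                      # assumed default
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE    # 256


def parse_lsn(text: str) -> int:
    """Turn an 'X/Y' hexadecimal position into a 64-bit byte offset."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: int) -> str:
    """Name of the segment file holding this position on this timeline."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


position = parse_lsn("0/30002D8")
print(wal_file_name(1, position))   # file that exists in pg_xlog
print(wal_file_name(2, position))   # file the walsender tried to open
```

      Only the first eight hex digits differ between the two names, which is
      why bumping the target timeline early is enough to make the walsender
      look for a file that has not been created yet.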
      
      To fix that, have walsender track the TLI of the last replayed record,
      instead of the recovery target timeline. That way walsender will not try to
      read anything from timeline 2, until the WAL segment has been created and at
      least one record has been replayed from it. The recovery target timeline is
      now xlog.c's internal affair, it doesn't need to be exposed in shared memory
      anymore.
      
      This fixes the error reported by Thom Brown. depesz reported the same error message,
      but I'm not sure if this fixes his scenario.
  21. 13 Dec, 2012 1 commit
    • Allow a streaming replication standby to follow a timeline switch. · abfd192b
      Heikki Linnakangas authored
      Before this patch, streaming replication would refuse to start replicating
      if the timeline in the primary doesn't exactly match the standby. The
      situation where it doesn't match is when you have a master, and two
      standbys, and you promote one of the standbys to become new master.
      Promoting bumps up the timeline ID, and after that bump, the other standby
      would refuse to continue.
      
      There's significantly more timeline related logic in streaming replication
      now. First of all, when a standby connects to primary, it will ask the
      primary for any timeline history files that are missing from the standby.
      The missing files are sent using a new replication command TIMELINE_HISTORY,
      and stored in standby's pg_xlog directory. Using the timeline history files,
      the standby can follow the latest timeline present in the primary
      (recovery_target_timeline='latest'), just as it can follow new timelines
      appearing in an archive directory.
      
      START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
      timeline to stream WAL from. This allows the standby to request the primary
      to send over WAL that precedes the promotion. The replication protocol is
      changed slightly (in a backwards-compatible way although there's little hope
      of streaming replication working across major versions anyway), to allow
      replication to stop when the end of timeline is reached, putting the walsender
      back into accepting a replication command.
      
      Many thanks to Amit Kapila for testing and reviewing various versions of
      this patch.
  22. 13 Nov, 2012 1 commit
    • Fix multiple problems in WAL replay. · 3bbf668d
      Tom Lane authored
      Most of the replay functions for WAL record types that modify more than
      one page failed to ensure that those pages were locked correctly to ensure
      that concurrent queries could not see inconsistent page states.  This is
      a hangover from coding decisions made long before Hot Standby was added,
      when it was hardly necessary to acquire buffer locks during WAL replay
      at all, let alone hold them for carefully-chosen periods.
      
      The key problem was that RestoreBkpBlocks was written to hold lock on each
      page restored from a full-page image for only as long as it took to update
      that page.  This was guaranteed to break any WAL replay function in which
      there was any update-ordering constraint between pages, because even if the
      nominal order of the pages is the right one, any mixture of full-page and
      non-full-page updates in the same record would result in out-of-order
      updates.  Moreover, it wouldn't work for situations where there's a
      requirement to maintain lock on one page while updating another.  Failure
      to honor an update ordering constraint in this way is thought to be the
      cause of bug #7648 from Daniel Farina: what seems to have happened there
      is that a btree page being split was rewritten from a full-page image
      before the new right sibling page was written, and because lock on the
      original page was not maintained it was possible for hot standby queries to
      try to traverse the page's right-link to the not-yet-existing sibling page.
      
      To fix, get rid of RestoreBkpBlocks as such, and instead create a new
      function RestoreBackupBlock that restores just one full-page image at a
      time.  This function can be invoked by WAL replay functions at the points
      where they would otherwise perform non-full-page updates; in this way, the
      physical order of page updates remains the same no matter which pages are
      replaced by full-page images.  We can then further adjust the logic in
      individual replay functions if it is necessary to hold buffer locks
      for overlapping periods.  A side benefit is that we can simplify the
      handling of concurrency conflict resolution by moving that code into the
      record-type-specific functions; there's no more need to contort the code
      layout to keep conflict resolution in front of the RestoreBkpBlocks call.
      
      In connection with that, standardize on zero-based numbering rather than
      one-based numbering for referencing the full-page images.  In HEAD, I
      removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4.  They are
      still there in the header files in previous branches, but are no longer
      used by the code.
      
      In addition, fix some other bugs identified in the course of making these
      changes:
      
      spgRedoAddNode could fail to update the parent downlink at all, if the
      parent tuple is in the same page as either the old or new split tuple and
      we're not doing a full-page image: it would get fooled by the LSN having
      been advanced already.  This would result in permanent index corruption,
      not just transient failure of concurrent queries.
      
      Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
      tail page as a candidate for a full-page image; in the worst case this
      could result in torn-page corruption.
      
      heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
      exclusive lock: it did the former in the normal path but the latter for a
      full-page image.  A plain exclusive lock seems sufficient, so change to
      that.
      
      Also, remove gistRedoPageDeleteRecord(), which has been dead code since
      VACUUM FULL was rewritten.
      
      Back-patch to 9.0, where hot standby was introduced.  Note however that 9.0
      had a significantly different WAL-logging scheme for GIST index updates,
      and it doesn't appear possible to make that scheme safe for concurrent hot
      standby queries, because it can leave inconsistent states in the index even
      between WAL records.  Given the lack of complaints from the field, we won't
      work too hard on fixing that branch.
  23. 05 Sep, 2012 1 commit
    • Fix bugs in cascading replication with recovery_target_timeline='latest' · c4c22747
      Heikki Linnakangas authored
      The cascading replication code assumed that the current RecoveryTargetTLI
      never changes, but that's not true with recovery_target_timeline='latest'.
      The obvious upshot of that is that RecoveryTargetTLI in shared memory needs
      to be protected by a lock. A less obvious consequence is that when a
      cascading standby is connected, and the standby switches to a new target
      timeline after scanning the archive, it will continue to stream WAL to the
      cascading standby, but from a wrong file, ie. the file of the previous
      timeline. For example, if the standby is currently streaming from the middle
      of file 000000010000000000000005, and the timeline changes, the standby
      will continue to stream from that file. However, the WAL on the new
      timeline is in file 000000020000000000000005, so the standby sends garbage
      from 000000010000000000000005 to the cascading standby, instead of the
      correct WAL from file 000000020000000000000005.
      
      This also fixes a related bug where a partial WAL segment is restored from
      the archive and streamed to a cascading standby. The code assumed that when
      a WAL segment is copied from the archive, it can immediately be fully
      streamed to a cascading standby. However, if the segment is only partially
      filled, ie. has the right size, but only N first bytes contain valid WAL,
      that's not safe. That can happen if a partial WAL segment is manually copied
      to the archive, or if a partial WAL segment is archived because a server is
      started up on a new timeline within that segment. The cascading standby will
      get confused if the WAL it received is not valid, and will get stuck until
      it's restarted. This patch fixes that problem by not allowing WAL restored
      from the archive to be streamed to a cascading standby until it's been
      replayed, and thus validated.
  24. 24 Jun, 2012 2 commits
    • Allow WAL record header to be split across pages. · 061e7efb
      Heikki Linnakangas authored
      This saves a few bytes of WAL space, but the real motivation is to make it
      predictable how much WAL space a record requires, as it no longer depends
      on whether we need to waste the last few bytes at end of WAL page because
      the header doesn't fit.
      
      The total length field of WAL record, xl_tot_len, is moved to the beginning
      of the WAL record header, so that it is still always found on the first page
      where a WAL record begins.
      
      Bump WAL version number again as this is an incompatible change.
    • Don't waste the last segment of each 4GB logical log file. · dfda6eba
      Heikki Linnakangas authored
      The comments claimed that wasting the last segment made it easier to do
      calculations with XLogRecPtrs, because you don't have problems representing
      last-byte-position-plus-1 that way. In my experience, however, it only made
      things more complicated, because there were two ways to represent the
      boundary at the beginning of a logical log file: xlogid = n+1 and xrecoff = 0,
      or xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were
      picky about which representation was used.
      
      Also, use a 64-bit segment number instead of the log/seg combination, to
      point to a certain WAL segment. We assume that all platforms have a working
      64-bit integer type nowadays.
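
      The ambiguity in the old scheme, and how a plain 64-bit position avoids
      it, can be shown with a little arithmetic. This is a sketch assuming
      16 MB segments; old_linear_position and segment_number are hypothetical
      helpers, not functions from the source tree.

```python
SEG_SIZE = 16 * 1024 * 1024
FOUR_GB = 1 << 32

# Old scheme: the last segment of each 4 GB logical file was never used,
# so a logical file only held 4GB - SEG_SIZE bytes of WAL.
OLD_LOG_FILE_SIZE = FOUR_GB - SEG_SIZE


def old_linear_position(xlogid: int, xrecoff: int) -> int:
    """Byte position implied by an old-style (xlogid, xrecoff) pair."""
    return xlogid * OLD_LOG_FILE_SIZE + xrecoff


def segment_number(lsn: int) -> int:
    """New scheme: one 64-bit segment number, no log/seg split."""
    return lsn // SEG_SIZE


n = 7
first_form = old_linear_position(n + 1, 0)
second_form = old_linear_position(n, FOUR_GB - SEG_SIZE)
print(first_form == second_form)   # two spellings of one boundary

print(segment_number(FOUR_GB - 1), segment_number(FOUR_GB))
```

      With every segment in use, each byte position has exactly one
      representation and segment numbers simply keep counting across the
      4 GB mark.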
      
      This is an incompatible change in WAL format, so bumping WAL version number.
  25. 09 May, 2012 2 commits
    • Fix an issue in recent walwriter hibernation patch. · acd4c7d5
      Tom Lane authored
      Users of asynchronous-commit mode expect there to be a guaranteed maximum
      delay before an async commit's WAL records get flushed to disk.  The
      original version of the walwriter hibernation patch broke that.  Add an
      extra shared-memory flag to allow async commits to kick the walwriter out
      of hibernation mode, without adding any noticeable overhead in cases where
      no action is needed.
    • Reduce idle power consumption of walwriter and checkpointer processes. · 5461564a
      Tom Lane authored
      This patch modifies the walwriter process so that, when it has not found
      anything useful to do for many consecutive wakeup cycles, it extends its
      sleep time to reduce the server's idle power consumption.  It reverts to
      normal as soon as it's done any successful flushes.  It's still true that
      during any async commit, backends check for completed, unflushed pages of
      WAL and signal the walwriter if there are any; so that in practice the
      walwriter can get awakened and returned to normal operation sooner than the
      sleep time might suggest.
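
      The sleep policy described above amounts to a small state machine. The
      sketch below only illustrates that idea: the class name, the threshold
      of idle cycles, and the multiplier are hypothetical values, and the
      latch-based early wakeup by backends is left out.

```python
class WalWriterSleepPolicy:
    """Pick the next sleep time from how long the writer has been idle."""

    def __init__(self, normal_delay_ms=200, idle_cycles_before_hibernate=50,
                 hibernate_factor=25):
        self.normal_delay_ms = normal_delay_ms
        self.threshold = idle_cycles_before_hibernate   # hypothetical value
        self.hibernate_factor = hibernate_factor        # hypothetical value
        self.idle_cycles = 0

    def next_sleep_ms(self, did_flush: bool) -> int:
        if did_flush:
            # Any successful flush returns the writer to normal operation.
            self.idle_cycles = 0
        elif self.idle_cycles < self.threshold:
            self.idle_cycles += 1
        if self.idle_cycles >= self.threshold:
            return self.normal_delay_ms * self.hibernate_factor
        return self.normal_delay_ms


policy = WalWriterSleepPolicy(idle_cycles_before_hibernate=3)
idle_sleeps = [policy.next_sleep_ms(did_flush=False) for _ in range(4)]
after_flush = policy.next_sleep_ms(did_flush=True)
print(idle_sleeps, after_flush)
```

      A longer sleep while idle means fewer wakeups, which is where the
      power saving comes from.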
      
      Also, improve the checkpointer so that it uses a latch and a computed delay
      time to not wake up at all except when it has something to do, replacing a
      previous hardcoded 0.5 sec wakeup cycle.  This also is primarily useful for
      reducing the server's power consumption when idle.
      
      In passing, get rid of the dedicated latch for signaling the walwriter in
      favor of using its procLatch, since that comports better with possible
      generic signal handlers using that latch.  Also, fix a pre-existing bug
      with failure to save/restore errno in walwriter's signal handlers.
      
      Peter Geoghegan, somewhat simplified by Tom
  26. 25 Jan, 2012 1 commit
  27. 11 Jan, 2012 1 commit
  28. 01 Jan, 2012 1 commit
  29. 31 Dec, 2011 1 commit
  30. 12 Dec, 2011 1 commit
    • Move BKP_REMOVABLE bit from individual WAL records to WAL page headers. · 2dd9322b
      Tom Lane authored
      Removing this bit from xl_info allows us to restore the old limit of four
      (not three) separate pages touched by a WAL record, which is needed for the
      upcoming SP-GiST feature, and will likely be useful elsewhere in future.
      
      When we implemented XLR_BKP_REMOVABLE in 2007, we had to do it like that
      because no special WAL-visible action was taken when starting a backup.
      However, now we force a segment switch when starting a backup, so a
      compressing WAL archiver (such as pglesslog) that uses the state shown in
      the current page header will not be fooled as to removability of backup
      blocks.  The only downside is that the archiver will not return to
      compressing mode for up to one WAL page after the backup is over, which is
      a small price to pay for getting back the extra xl_info bit.  In any case
      the archiver could look for XLOG_BACKUP_END records if it thought it was
      worth the trouble to do so.
      
      Bump XLOG_PAGE_MAGIC since this is effectively a change in WAL format.
  31. 09 Dec, 2011 1 commit
    • Don't set reachedMinRecoveryPoint during crash recovery. · 9f0d2bdc
      Heikki Linnakangas authored
      In crash recovery, we don't reach consistency before replaying all of the
      WAL. Rename the variable to reachedConsistency, to make its intention
      clearer.
      
      In master, that was an active bug because of the recent patch to
      immediately PANIC if a reference to a missing page is found in WAL after
      reaching consistency, as Tom Lane's test case demonstrated. In 9.1 and 9.0,
      the only consequence was a misleading "consistent recovery state reached at
      %X/%X" message in the log at the beginning of crash recovery (the database
      is not consistent at that point yet). In 8.4, the log message was not
      printed in crash recovery, even though there was a similar
      reachedMinRecoveryPoint local variable that was also set early. So,
      backpatch to 9.1 and 9.0.
  32. 02 Dec, 2011 1 commit
    • During recovery, if we reach consistent state and still have entries in the invalid-page hash table, PANIC immediately. · 1e616f63
      Heikki Linnakangas authored
      Immediate PANIC is much better than waiting for end-of-recovery, which is
      what we did before, because the end-of-recovery might not come until months
      later if this is a standby server.
      
      Also refrain from creating a restartpoint if there are invalid-page entries
      in the hash table. Restarting recovery from such a restartpoint would not
      see the invalid references, and wouldn't be able to cross-check them when
      consistency is reached. That wouldn't matter when things are going smoothly,
      but the more sanity checks you have the better.
      
      Fujii Masao
  33. 13 Nov, 2011 1 commit
    • Wakeup WALWriter as needed for asynchronous commit performance. · 4de82f7d
      Simon Riggs authored
      Previously we waited for wal_writer_delay before flushing WAL. Now
      we also wake WALWriter as soon as a WAL buffer page has filled.
      Significant effect observed on performance of asynchronous commits
      by Robert Haas, attributed to the ability to set hint bits on tuples
      earlier and so reducing contention caused by clog lookups.
  34. 04 Nov, 2011 1 commit
  35. 02 Nov, 2011 1 commit
  36. 09 Sep, 2011 1 commit
    • Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. · a7801b62
      Tom Lane authored
      As per my recent proposal, this refactors things so that these typedefs and
      macros are available in a header that can be included in frontend-ish code.
      I also changed various headers that were undesirably including
      utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
      this showed that half the system was getting utils/timestamp.h by way of
      xlog.h.
      
      No actual code changes here, just header refactoring.
  37. 04 Sep, 2011 2 commits
    • Clean up the #include mess a little. · 1609797c
      Tom Lane authored
      walsender.h should depend on xlog.h, not vice versa.  (Actually, the
      inclusion was circular until a couple hours ago, which was even sillier;
      but Bruce broke it in the expedient rather than logically correct
      direction.)  Because of that poor decision, plus blind application of
      pgrminclude, we had a situation where half the system was depending on
      xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
      the header inclusion, and manually revert a lot of what pgrminclude had
      done so things build again.
      
      This episode reinforces my feeling that pgrminclude should not be run
      without adult supervision.  Inclusion changes in header files in particular
      need to be reviewed with great care.  More generally, it'd be good if we
      had a clearer notion of module layering to dictate which headers can sanely
      include which others ... but that's a big task for another day.
    • Move AllowCascadeReplication() define from xlog.h to replication include file. · 85e6e166
      Bruce Momjian authored
      
      Per suggestion from Alvaro.