1. 29 Apr, 2016 1 commit
    • Remember asking for feedback during walsender shutdown. · 59455018
      Andres Freund authored
      Since 5a991ef8 we're explicitly asking for feedback from the receiving
      side when shutting down walsender, if there's not yet replicated
      data.
      
      Unfortunately we didn't remember (i.e. set waiting_for_ping_response to
      true) having asked for feedback, leading to scenarios in which replies
      were requested at a high frequency.
      
      I can't reproduce this problem on my laptop; I think that's because the
      problem requires a significant TCP window to manifest due to the
      !pq_is_send_pending() condition. But since this clearly is a bug, let's
      fix it.  There's quite possibly more wrong than just this though.
      
      While fiddling with WalSndDone(), I rewrote a hard to understand comment
      about looking at the flush vs. the write position.
      
      Reported-By: Nick Cleaton, Magnus Hagander
      Author: Nick Cleaton
      Discussion: CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com
          CABUevExsjROqDcD0A2rnJ6HK6FuKGyewJr3PL12pw85BHFGS2Q@mail.gmail.com
      Backpatch: 9.4, where 5a991ef8 introduced the use of feedback messages
          during shutdown.
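
The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the walsender code: `SenderState`, `request_feedback_once`, `reply_received`, and the `requests_sent` counter are hypothetical names, and only `waiting_for_ping_response` comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sender's per-connection state. */
typedef struct
{
    bool waiting_for_ping_response; /* true once feedback has been requested */
    int  requests_sent;             /* counts requests, for illustration only */
} SenderState;

/* Request feedback at most once until a reply arrives. */
static void
request_feedback_once(SenderState *st)
{
    assert(st != NULL);
    if (st->waiting_for_ping_response)
        return;                             /* already asked; don't ask again */
    st->requests_sent++;                    /* stands in for sending a keepalive */
    st->waiting_for_ping_response = true;   /* the step the bug left out */
}

/* A reply from the receiver clears the flag so a later request is allowed. */
static void
reply_received(SenderState *st)
{
    assert(st != NULL);
    st->waiting_for_ping_response = false;
}
```

Without the assignment marked above, every pass through the shutdown loop would send another request, which matches the high-frequency replies the commit describes.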
  2. 27 Apr, 2016 1 commit
    • Clean up parsing of synchronous_standby_names GUC variable. · 4c804fbd
      Tom Lane authored
      Commit 989be081 added a flex/bison lexer/parser to interpret
      synchronous_standby_names.  It was done in a pretty crufty way, though,
      making assorted end-use sites responsible for calling the parser at the
      right times.  That was not only vulnerable to errors of omission, but made
      it possible that lexer/parser errors occur at very undesirable times,
      and created memory leakages even if there was no error.
      
      Instead, perform the parsing once during check_synchronous_standby_names
      and let guc.c manage the resulting data.  To do that, we have to flatten
      the parsed representation into a single hunk of malloc'd memory, but that
      is not very hard.
      
      While at it, work a little harder on making useful error reports for
      parsing problems; the previous code felt that "synchronous_standby_names
      parser returned 1" was an appropriate user-facing error message.  (To
      be fair, it did also log a syntax error message, but separately from the
      GUC problem report, which is at best confusing.)  It had some outright
      bugs in the face of invalid input, too.
      
      I (tgl) also concluded that we need to restrict unquoted names in
      synchronous_standby_names to be just SQL identifiers.  The previous coding
      would accept darn near anything, which (1) makes the quoting convention
      both nearly-unnecessary and formally ambiguous, (2) makes it very hard to
      understand what is a syntax error and what is a creative interpretation of
      the input as a standby name, and (3) makes it impossible to further extend
      the syntax in future without a compatibility break.  I presume that we're
      intending future extensions of the syntax, else this parsing infrastructure
      is massive overkill, so (3) is an important objection.  Since we've taken
      a compatibility hit for non-identifier names with this change anyway, we
      might as well lock things down now and insist that users use double quotes
      for standby names that aren't identifiers.
      
      Kyotaro Horiguchi and Tom Lane
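
To make the "unquoted names must be SQL identifiers" rule concrete, here is a rough sketch of such a check. It is an ASCII-only approximation written for illustration; `is_plain_identifier` is a hypothetical helper, and the commit message does not say that the real lexer accepts exactly this character set.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if name could be written without double quotes under an
 * ASCII-only approximation of SQL identifier rules: a letter or underscore
 * first, then letters, digits, underscores, or dollar signs.
 */
static bool
is_plain_identifier(const char *name)
{
    assert(name != NULL);

    unsigned char first = (unsigned char) name[0];

    if (!(isalpha(first) || first == '_'))
        return false;           /* this also rejects the empty string */

    for (size_t i = 1; name[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) name[i];

        if (!(isalnum(c) || c == '_' || c == '$'))
            return false;
    }
    return true;
}
```

Under this approximation a name such as `node-a` would have to be written as `"node-a"` in synchronous_standby_names.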
  3. 12 Apr, 2016 1 commit
    • Remove unused function GetOldestWALSendPointer from walsender code. · 46d73e0d
      Fujii Masao authored
      That unused function was introduced as a sample because synchronous
      replication or replication monitoring tools might need it in the future.
      Recently commit 989be081 added the function SyncRepGetOldestSyncRecPtr,
      which provides almost the same functionality for the multiple synchronous
      standbys feature. So it's time to remove that unused sample function.
      This commit does that.
  4. 06 Apr, 2016 1 commit
    • Support multiple synchronous standby servers. · 989be081
      Fujii Masao authored
      Previously synchronous replication offered only the ability to confirm
      that all changes made by a transaction had been transferred to at most
      one synchronous standby server.
      
      This commit extends synchronous replication so that it supports multiple
      synchronous standby servers. It enables users to consider one or more
      standby servers as synchronous, and increase the level of transaction
      durability by ensuring that transaction commits wait for replies from
      all of those synchronous standbys.
      
      Multiple synchronous standby servers are configured in
      synchronous_standby_names, which is extended to support the new syntax
      'num_sync ( standby_name [ , ... ] )', where num_sync specifies
      the number of synchronous standbys that transaction commits need to
      wait for replies from and standby_name is the name of a standby
      server.
      
      The syntax of 'standby_name [ , ... ]', which was used in 9.5 and earlier,
      is still supported. It is equivalent to the new syntax with num_sync=1.
      
      This commit doesn't include the "quorum commit" feature, which was discussed
      on pgsql-hackers. Synchronous standbys are chosen based on their priorities.
      synchronous_standby_names determines the priority of each standby for
      being chosen as a synchronous standby. The standbys whose names appear
      earlier in the list are given higher priority and will be considered as
      synchronous. Other standby servers appearing later in this list
      represent potential synchronous standbys.
      
      The regression test for multiple synchronous standbys is not included
      in this commit. It should come later.
      
      Authors: Sawada Masahiko, Beena Emerson, Michael Paquier, Fujii Masao
      Reviewed-By: Kyotaro Horiguchi, Amit Kapila, Robert Haas, Simon Riggs,
      Amit Langote, Thomas Munro, Sameer Thakur, Suraj Kharage, Abhijit Menon-Sen,
      Rajeev Rastogi
      
      Many thanks to the various individuals who were involved in
      discussing and developing this feature.
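
The priority rule can be illustrated with a small sketch: walk the configured list in order and take the first num_sync entries that are usable. The function name `choose_sync_standbys` and the notion of a per-name "connected" flag are assumptions made for the example; the commit message only states that earlier names get higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * names_connected[i] says whether the i-th name in the configured list
 * (earlier means higher priority) currently has a connected standby.
 * Writes the list indexes of the chosen synchronous standbys into chosen[]
 * and returns how many were chosen (at most num_sync).
 */
static size_t
choose_sync_standbys(const bool *names_connected, size_t n_names,
                     size_t num_sync, size_t *chosen)
{
    size_t n_chosen = 0;

    assert(names_connected != NULL && chosen != NULL);

    for (size_t i = 0; i < n_names && n_chosen < num_sync; i++)
    {
        if (names_connected[i])
            chosen[n_chosen++] = i;     /* highest-priority usable entry first */
    }
    return n_chosen;
}
```

With a setting like '2 (s1, s2, s3, s4)' and s2 disconnected, this sketch picks s1 and s3, leaving s4 as a potential synchronous standby.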
  5. 10 Mar, 2016 1 commit
    • Provide much better wait information in pg_stat_activity. · 53be0b1a
      Robert Haas authored
      When a process is waiting for a heavyweight lock, we will now indicate
      the type of heavyweight lock for which it is waiting.  Also, you can
      now see when a process is waiting for a lightweight lock - in which
      case we will indicate the individual lock name or the tranche, as
      appropriate - or for a buffer pin.
      
      Amit Kapila, Ildus Kurbangaliev, reviewed by me.  Lots of helpful
      discussion and suggestions by many others, including Alexander
      Korotkov and Vladimir Borodin.
  6. 02 Jan, 2016 1 commit
  7. 13 Dec, 2015 2 commits
  8. 27 Oct, 2015 1 commit
    • Measure string lengths only once · 0cd836a4
      Alvaro Herrera authored
      Bernd Helmle complained that CreateReplicationSlot() was assigning the
      same value to the same variable twice, so we could remove one of them.
      Code inspection reveals that we can actually remove both assignments:
      according to the author the assignment was there for beauty of the
      strlen line only, and another possible fix to that is to put the strlen
      in its own line, so do that.
      
      To be consistent within the file, refactor all duplicated strlen()
      calls, which is what we do elsewhere in the backend anyway.  In
      basebackup.c, snprintf already returns the right length; no need for
      strlen afterwards.
      
      Backpatch to 9.4, where replication slots were introduced, to keep code
      identical.  Some of this is older, but the patch doesn't apply cleanly
      and it's only of cosmetic value anyway.
      
      Discussion: http://www.postgresql.org/message-id/BE2FD71DEA35A2287EA5F018@eje.credativ.lan
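
As a small illustration of the snprintf point: the return value already gives the formatted length, so a following strlen() is redundant. The helper `format_slot_name` and the "slot_%d" format are hypothetical and not taken from basebackup.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats "slot_<id>" into buf and returns its length, taken from
 * snprintf's return value instead of a second pass with strlen().
 * Returns -1 if formatting failed or the output did not fit.
 */
static int
format_slot_name(char *buf, size_t bufsize, int id)
{
    int len;

    assert(buf != NULL && bufsize > 0);

    len = snprintf(buf, bufsize, "slot_%d", id);
    if (len < 0 || (size_t) len >= bufsize)
        return -1;              /* error or truncation */
    return len;                 /* same value strlen(buf) would give */
}
```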
  9. 06 Oct, 2015 1 commit
    • Remove more volatile qualifiers. · 8f6bb851
      Robert Haas authored
      Prior to commit 0709b7ee, access to
      variables within a spinlock-protected critical section had to be done
      through a volatile pointer, but that should no longer be necessary.
      This continues work begun in df4077cd
      and 6ba4ecbf.
      
      Thomas Munro and Michael Paquier
  10. 06 Sep, 2015 1 commit
    • Add ability to reserve WAL upon slot creation via replication protocol. · c314ead5
      Andres Freund authored
      Since 6fcd8851 it is possible to immediately reserve WAL when creating a
      slot via pg_create_physical_replication_slot(). Extend the replication
      protocol to allow that as well.
      
      Although, in contrast to the SQL interface, it is possible to update the
      reserved location via the replication interface, it is still useful
      to be able to reserve upon creation there. Otherwise the logic in
      ReplicationSlotReserveWal() has to be repeated in slot-employing
      clients.
      
      Author: Michael Paquier
      Discussion: CAB7nPqT0Wc1W5mdYGeJ_wbutbwNN+3qgrFR64avXaQCiJMGaYA@mail.gmail.com
  11. 15 Aug, 2015 1 commit
  12. 11 Aug, 2015 1 commit
  13. 24 May, 2015 1 commit
  14. 12 Apr, 2015 1 commit
  15. 26 Mar, 2015 1 commit
    • Tweak __attribute__-wrapping macros for better pgindent results. · 785941cd
      Tom Lane authored
      This improves on commit bbfd7eda by
      making two simple changes:
      
      * pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
      Likewise pg_attribute_unused(), pg_attribute_packed().  This reduces
      pgindent's tendency to misformat declarations involving them.
      
      * attributes are now always attached to function declarations, not
      definitions.  Previously some places were taking creative shortcuts,
      which were not merely candidates for bad misformatting by pgindent
      but often were outright wrong anyway.  (It does little good to put a
      noreturn annotation where callers can't see it.)  In any case, if
      we would like to believe that these macros can be used with non-gcc
      compilers, we should avoid gratuitous variance in usage patterns.
      
      I also went through and manually improved the formatting of a lot of
      declarations, and got rid of excessively repetitive (and now obsolete
      anyway) comments informing the reader what pg_attribute_printf is for.
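
The two conventions above (parenthesized macros, attributes on declarations only) might look like this in practice. The `demo_` macros and functions are hypothetical stand-ins written for this sketch; they are not the definitions from c.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrappers; they expand to nothing where the attribute is unknown. */
#if defined(__GNUC__)
#define demo_attribute_printf(f, a) __attribute__((format(printf, f, a)))
#define demo_attribute_noreturn() __attribute__((noreturn))
#else
#define demo_attribute_printf(f, a)
#define demo_attribute_noreturn()
#endif

/* Attributes are attached to the declarations, where every caller sees them. */
int demo_format(char *buf, size_t size, const char *fmt, ...)
            demo_attribute_printf(3, 4);
void demo_fail(const char *msg) demo_attribute_noreturn();

/* The definitions themselves carry no attributes. */
int
demo_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int     len;

    va_start(args, fmt);
    len = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return len;
}

void
demo_fail(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    abort();
}
```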
  16. 11 Mar, 2015 1 commit
    • Add macros wrapping all usage of gcc's __attribute__. · bbfd7eda
      Andres Freund authored
      Until now __attribute__() was defined to be empty for all compilers but
      gcc. That's problematic because it prevents using it in other compilers;
      which is necessary e.g. for atomics portability.  It's also just
      generally dubious to do so in a header as widely included as c.h.
      
      Instead add pg_attribute_format_arg, pg_attribute_printf,
      pg_attribute_noreturn macros which are implemented in the compilers that
      understand them. Also add pg_attribute_aligned and pg_attribute_packed,
      but don't provide fallbacks, since they can affect functionality.
      
      This means that external code that, possibly unwittingly, relied on
      __attribute__ defined to be empty on !gcc compilers may now run into
      warnings or errors on those compilers. But there shouldn't be many
      occurrences of that and it's hard to work around...
      
      Discussion: 54B58BA3.8040302@ohmu.fi
      Author: Oskari Saarenmaa, with some minor changes by me.
  17. 06 Feb, 2015 1 commit
    • Report WAL flush, not insert, position in replication IDENTIFY_SYSTEM · ff16b40f
      Heikki Linnakangas authored
      When beginning streaming replication, the client usually issues the
      IDENTIFY_SYSTEM command, which used to return the current WAL insert
      position. That's not suitable for the intended purpose of that field,
      however. pg_receivexlog uses it to start replication from the reported
      point, but if it hasn't been flushed to disk yet, it will fail. Change
      IDENTIFY_SYSTEM to report the flush position instead.
      
      Backpatch to 9.1 and above. 9.0 doesn't report any WAL position.
  18. 02 Feb, 2015 1 commit
    • Be more careful to not lose sync in the FE/BE protocol. · 2b3a8b20
      Heikki Linnakangas authored
      If any error occurred while we were in the middle of reading a protocol
      message from the client, we could lose sync, and incorrectly try to
      interpret a part of another message as a new protocol message. That will
      usually lead to an "invalid frontend message" error that terminates the
      connection. However, this is a security issue because an attacker might
      be able to deliberately cause an error, inject a Query message in what's
      supposed to be just user data, and have the server execute it.
      
      We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other
      operations that could ereport(ERROR) in the middle of processing a message,
      but a query cancel interrupt or statement timeout could nevertheless cause
      it to happen. Also, the V2 fastpath and COPY handling were not so careful.
      It's very difficult to recover in the V2 COPY protocol, so we will just
      terminate the connection on error. In practice, that's what happened
      previously anyway, as we lost protocol sync.
      
      To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set
      whenever we're in the middle of reading a message. When it's set, we cannot
      safely ERROR out and continue running, because we might've read only part
      of a message. PqCommReadingMsg acts somewhat similarly to critical sections
      in that if an error occurs while it's set, the error handler will force the
      connection to be terminated, as if the error was FATAL. It's not
      implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted
      to PANIC in critical sections, because we want to be able to use
      PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes
      advantage of that to prevent an OOM error from terminating the connection.
      
      To prevent unnecessary connection terminations, add a holdoff mechanism
      similar to HOLD/RESUME_INTERRUPTS() that can be used to hold off query cancel
      interrupts, but still allow die interrupts. The rules on which interrupts
      are processed when are now a bit more complicated, so refactor
      ProcessInterrupts() and the calls to it in signal handlers so that the
      signal handlers always call it if ImmediateInterruptOK is set, and
      ProcessInterrupts() can decide to not do anything if the other conditions
      are not met.
      
      Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund.
      Backpatch to all supported versions.
      
      Security: CVE-2015-0244
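
A toy model of the "reading a message" flag may help. It only captures the basic rule that an error raised mid-message cannot be treated as an ordinary ERROR; it deliberately leaves out the PG_TRY/CATCH recovery path and the interrupt holdoff mechanism described above. All `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { DEMO_ERROR, DEMO_FATAL } DemoSeverity;

/* Set while a protocol message has been only partly read (cf. PqCommReadingMsg). */
static bool demo_reading_msg = false;

static void demo_start_read(void) { demo_reading_msg = true; }
static void demo_end_read(void)   { demo_reading_msg = false; }

/*
 * Decide how an error must be treated. If it strikes mid-message, the
 * position in the byte stream is unknown, so the connection must be closed
 * rather than risk interpreting user data as a new protocol message.
 */
static DemoSeverity
demo_effective_severity(void)
{
    return demo_reading_msg ? DEMO_FATAL : DEMO_ERROR;
}
```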
  19. 17 Jan, 2015 1 commit
    • Replace walsender's latch with the general shared latch. · ff44fba4
      Andres Freund authored
      Relying on the normal shared latch simplifies interrupt/signal
      handling because we can rely on all signal handlers setting the proc
      latch. That in turn allows us to avoid the use of
      ImmediateInterruptOK, which arguably isn't correct because
      WaitLatchOrSocket isn't declared to be immediately interruptible.
      
      Also change sections that wait on the walsender's latch to notice
      interrupts quicker/more reliably and make them more consistent with
      each other.
      
      This is part of a larger "get rid of ImmediateInterruptOK" series.
      
      Discussion: 20150115020335.GZ5245@awork2.anarazel.de
  20. 06 Jan, 2015 1 commit
  21. 25 Dec, 2014 1 commit
  22. 16 Dec, 2014 1 commit
  23. 12 Dec, 2014 1 commit
  24. 20 Nov, 2014 1 commit
    • Revamp the WAL record format. · 2c03216d
      Heikki Linnakangas authored
      Each WAL record now carries information about the modified relation and
      block(s) in a standardized format. That makes it easier to write tools that
      need that information, like pg_rewind, prefetching the blocks to speed up
      recovery, etc.
      
      There's a whole new API for building WAL records, replacing the XLogRecData
      chains used previously. The new API consists of XLogRegister* functions,
      which are called for each buffer and chunk of data that is added to the
      record. The new API also gives more control over when a full-page image is
      written, by passing flags to the XLogRegisterBuffer function.
      
      This also simplifies the XLogReadBufferForRedo() calls. The function can dig
      the relation and block number from the WAL record, so they no longer need to
      be passed as arguments.
      
      For the convenience of redo routines, XLogReader now dissects each WAL record
      after reading it, copying the main data part and the per-block data into
      MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
      but the redo routines can assume that the pointers returned by XLogRecGet*
      functions are. Redo routines are now passed the XLogReaderState, which
      contains the record in the already-dissected format, instead of the plain
      XLogRecord.
      
      The new record format also makes the fixed size XLogRecord header smaller,
      by removing the xl_len field. The length of the "main data" portion is now
      stored at the end of the WAL record, and there's a separate header after
      XLogRecord for it. The alignment padding at the end of XLogRecord is also
      removed. This compensates for the fact that the new format would otherwise
      be more bulky than the old format.
      
      Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
      Fujii Masao.
  25. 20 Oct, 2014 1 commit
  26. 12 Sep, 2014 1 commit
    • Add GUC to enable logging of replication commands. · 4ad2a548
      Fujii Masao authored
      Previously replication commands like IDENTIFY_SYSTEM were not logged
      even when log_statement is set to all. Some users who want to audit
      all types of statements were not satisfied with this situation. To
      address the problem, this commit adds a new GUC, log_replication_commands.
      If it's enabled, all replication commands are logged in the server log.
      
      There are several ways to enable that logging. For example,
      we could extend log_statement so that replication commands are logged
      when it's set to all. But per discussion in the community, we reached
      the consensus to add a separate GUC for that.
      
      Reviewed by Ian Barwick, Robert Haas and Heikki Linnakangas.
  27. 12 Aug, 2014 1 commit
    • Be less aggressive in asking for feedback of logical walsender clients. · 41d5f8ad
      Andres Freund authored
      When doing logical decoding using START_LOGICAL_REPLICATION in a
      walsender process, the walsender sometimes sent keepalive messages
      too frequently, asking for feedback every time.
      
      WalSndWaitForWal() sends out keepalive messages when it's waiting for
      new WAL to be generated locally when it sees that the remote side
      hasn't yet flushed WAL up to the local position. That generally is
      good but causes problems if the remote side only writes but doesn't
      flush changes yet. So check for both remote write and flush position.
      
      Additionally we've asked for feedback to the keepalive message which
      isn't warranted when waiting for WAL in contrast to preventing
      timeouts because of wal_sender_timeout.
      
      Complaint and patch by Steve Singer.
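
The changed condition can be sketched as a predicate over three positions. `DemoLsn` and `demo_keepalive_needed` are hypothetical names, and the sketch assumes positions compare as plain integers; it is not the code in WalSndWaitForWal().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoLsn;   /* hypothetical stand-in for a WAL position */

/*
 * While waiting for new local WAL, a keepalive is only worthwhile if the
 * remote side has neither written nor flushed up to what was sent. A
 * receiver that writes without flushing therefore no longer triggers one.
 */
static bool
demo_keepalive_needed(DemoLsn sent, DemoLsn remote_write, DemoLsn remote_flush)
{
    return remote_write < sent && remote_flush < sent;
}
```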
  28. 12 Jun, 2014 1 commit
  29. 28 May, 2014 1 commit
    • Don't pay heed to wal_sender_timeout while creating a decoding slot. · 21d48d66
      Andres Freund authored
      Sometimes CREATE_REPLICATION_SLOT ... LOGICAL ... needs to wait for
      further WAL using WalSndWaitForWal(). That used to always respect
      wal_sender_timeout and kill the session when waiting long enough
      because no feedback/ping messages can be sent while the slot is still
      being created.
      Introduce the notion that last_reply_timestamp = 0 means that the
      walsender currently doesn't need timeout processing to avoid that
      problem. Use that notion for CREATE_REPLICATION_SLOT ... LOGICAL.
      
      Bugreport and initial patch by Steve Singer, revised by me.
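
The sentinel convention could be sketched like this. The function `demo_timed_out` and the millisecond units are assumptions for the example; only the meaning of last_reply_timestamp = 0 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the connection should be considered timed out.
 * last_reply_ms == 0 is the sentinel meaning "no timeout processing is
 * needed right now", as during slot creation.
 */
static bool
demo_timed_out(int64_t now_ms, int64_t last_reply_ms, int64_t timeout_ms)
{
    if (timeout_ms <= 0)
        return false;   /* assumption: a non-positive timeout disables the check */
    if (last_reply_ms == 0)
        return false;   /* sentinel: timeout handling is switched off */
    return now_ms - last_reply_ms >= timeout_ms;
}
```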
  30. 22 May, 2014 1 commit
  31. 06 May, 2014 1 commit
    • pgindent run for 9.4 · 0a783200
      Bruce Momjian authored
      This includes removing tabs after periods in C comments, which was
      applied to back branches, so this change should not affect backpatching.
  32. 17 Mar, 2014 1 commit
    • Fix bug in clean shutdown of walsender that pg_receivexlog is connecting to. · 5c6d9fc4
      Fujii Masao authored
      On clean shutdown, walsender waits for all WAL to be replicated to a standby,
      and exits. It determined whether that replication had been completed by
      checking whether its sent location had been equal to a standby's flush
      location. Unfortunately this condition never becomes true when a client
      such as pg_receivexlog, which always returns an invalid flush location, is
      connected to walsender, so walsender waits forever.
      
      This commit changes walsender so that it just checks a standby's write
      location if a flush location is invalid.
      
      Back-patch to 9.1 where enough infrastructure for this exists.
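
A sketch of the adjusted completion check follows. `DemoLsn`, `DEMO_INVALID_LSN`, and `demo_all_replicated` are hypothetical names, and representing "invalid" as zero is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoLsn;                   /* hypothetical WAL position type */
#define DEMO_INVALID_LSN ((DemoLsn) 0)      /* assumption: 0 means "not reported" */

/*
 * On clean shutdown, decide whether everything sent has been replicated.
 * If the client never reports a flush position, fall back to its write
 * position instead of waiting forever.
 */
static bool
demo_all_replicated(DemoLsn sent, DemoLsn remote_write, DemoLsn remote_flush)
{
    DemoLsn replicated = (remote_flush == DEMO_INVALID_LSN)
        ? remote_write
        : remote_flush;

    return sent == replicated;
}
```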
  33. 10 Mar, 2014 1 commit
    • Allow logical decoding via the walsender interface. · 5a991ef8
      Robert Haas authored
      In order for this to work, walsenders need the optional ability to
      connect to a database, so the "replication" keyword now allows true
      or false, for backward-compatibility, and the new value "database"
      (which causes the "dbname" parameter to be respected).
      
      walsender needs to loop not only when idle but also when sending
      decoded data to the user and when waiting for more xlog data to decode.
      This means that there are now three separate loops inside walsender.c;
      although some refactoring has been done here, this is still a bit ugly.
      
      Andres Freund, with contributions from Álvaro Herrera, and further
      review by me.
  34. 06 Mar, 2014 1 commit
    • Send keepalives from walsender even when busy sending WAL. · 94ae6ba7
      Heikki Linnakangas authored
      If walsender doesn't hear from the client for the time specified by
      wal_sender_timeout, it will conclude the connection or client is dead, and
      disconnect. When half of wal_sender_timeout has elapsed, it sends a ping
      to the client, leaving it the remaining half of wal_sender_timeout to
      respond. However, it only checked if half of wal_sender_timeout had elapsed
      when it was about to sleep, so if it was busy sending WAL to the client for
      long enough, it would not send the ping request in time. Then the client
      would not know it needs to send a reply, and the walsender will disconnect
      even though the client is still alive. Fix that.
      
      Andres Freund, reviewed by Robert Haas, and some further changes by me.
      Backpatch to 9.3. Earlier versions relied on the client to send the
      keepalives on its own, and hence didn't have this problem.
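
The half-timeout rule, checked on every pass of a busy send loop rather than only before sleeping, can be simulated as below. The functions, the millisecond clock, and the fixed per-chunk send time are all hypothetical; this is a sketch of the idea, not the walsender loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A ping is due once half of the timeout has passed since the last reply. */
static bool
demo_ping_due(int64_t now_ms, int64_t last_reply_ms, int64_t timeout_ms,
              bool waiting_for_ping_response)
{
    if (timeout_ms <= 0 || waiting_for_ping_response)
        return false;
    return now_ms >= last_reply_ms + timeout_ms / 2;
}

/*
 * Simulate a sender that is busy for n_chunks iterations and never sleeps.
 * The ping check runs on every iteration. Returns the number of pings sent.
 */
static int
demo_busy_send(int n_chunks, int64_t ms_per_chunk, int64_t timeout_ms)
{
    const int64_t last_reply = 1000;    /* arbitrary start time */
    int64_t now = last_reply;
    bool    waiting = false;
    int     pings = 0;

    for (int i = 0; i < n_chunks; i++)
    {
        now += ms_per_chunk;            /* time spent sending one chunk of WAL */
        if (demo_ping_due(now, last_reply, timeout_ms, waiting))
        {
            pings++;                    /* stands in for sending the ping */
            waiting = true;             /* don't ask again until a reply arrives */
        }
    }
    return pings;
}
```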
  35. 04 Mar, 2014 1 commit
    • Error out on send failure in walsender loop. · 7558cc95
      Heikki Linnakangas authored
      I changed the loop in 9.3 to use "goto send_failure" instead of "break" on
      errors, but I missed this one case. It was a relatively harmless bug: if
      the flush fails once it will most likely fail again as soon as we try to
      flush the output again. But it's a bug nevertheless.
      
      Report and fix by Andres Freund.
  36. 03 Mar, 2014 1 commit
    • Introduce logical decoding. · b89e1510
      Robert Haas authored
      This feature, building on previous commits, allows the write-ahead log
      stream to be decoded into a series of logical changes; that is,
      inserts, updates, and deletes and the transactions which contain them.
      It is capable of handling decoding even across changes to the schema
      of the affected tables.  The output format is controlled by a
      so-called "output plugin"; an example is included.  To make use of
      this in a real replication system, the output plugin will need to be
      modified to produce output in the format appropriate to that system,
      and to perform filtering.
      
      Currently, information can be extracted from the logical decoding
      system only via SQL; future commits will add the ability to stream
      changes via walsender.
      
      Andres Freund, with review and other contributions from many other
      people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Geoghegan,
      Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao,
      Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.
  37. 24 Feb, 2014 1 commit
  38. 02 Feb, 2014 1 commit
  39. 01 Feb, 2014 1 commit
    • Fix some more bugs in signal handlers and process shutdown logic. · 214c7a4f
      Tom Lane authored
      WalSndKill was doing things exactly backwards: it should first clear
      MyWalSnd (to stop signal handlers from touching MyWalSnd->latch),
      then disown the latch, and only then mark the WalSnd struct unused by
      clearing its pid field.
      
      Also, WalRcvSigUsr1Handler and worker_spi_sighup failed to preserve
      errno, which is surely a requirement for any signal handler.
      
      Per discussion of recent buildfarm failures.  Back-patch as far
      as the relevant code exists.
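
The errno requirement can be shown with a minimal POSIX handler. `demo_sigusr1_handler` and `demo_got_signal` are hypothetical names, and the deliberately failing close(-1) merely stands in for whatever work a real handler does that might change errno.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t demo_got_signal = 0;

/*
 * A signal can arrive between a failed system call and the caller's check
 * of errno, so the handler must leave errno exactly as it found it.
 */
static void
demo_sigusr1_handler(int signo)
{
    int save_errno = errno;     /* remember the interrupted code's errno */

    (void) signo;
    demo_got_signal = 1;
    (void) close(-1);           /* fails and overwrites errno with EBADF */

    errno = save_errno;         /* restore before returning */
}
```

In a real program the handler would be installed with sigaction(); calling it directly, as a test might, is enough to see that errno survives.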