1. 29 Dec, 2018 4 commits
    • Tom Lane's avatar
      Use pg_strong_random() to select each server process's random seed. · 4203842a
      Tom Lane authored
      Previously we just set the seed based on process ID and start timestamp.
      Both those values are directly available within the session, and can
      be found out or guessed by other users too, making the session's series
      of random(3) values fairly predictable.  Up to now, our backend-internal
      uses of random(3) haven't seemed security-critical, but commit 88bdbd3f
      added one that potentially is: when using log_statement_sample_rate, a
      user might be able to predict which of his SQL statements will get logged.
      
      To improve this situation, upgrade the per-process seed initialization
      method to use pg_strong_random() if available, greatly reducing the
      predictability of the initial seed value.  This adds a few tens of
      microseconds to process start time, but since backend startup time is
      at least a couple of milliseconds, that seems an acceptable price.
      
      This means that pg_strong_random() needs to be able to run without
      reliance on any backend infrastructure, since it will be invoked
      before any of that is up.  It was safe for that already, but adjust
      comments and #include commands to make it clearer.
      
      Discussion: https://postgr.es/m/3859.1545849900@sss.pgh.pa.us
      4203842a
    • Tom Lane's avatar
      Use a separate random seed for SQL random()/setseed() functions. · 6645ad6b
      Tom Lane authored
      Previously, the SQL random() function depended on libc's random(3),
      and setseed() invoked srandom(3).  This results in interference between
      these functions and backend-internal uses of random(3).  We'd never paid
      too much mind to that, but in the wake of commit 88bdbd3f which added
      log_statement_sample_rate, the interference arguably has a security
      consequence: if log_statement_sample_rate is active then an unprivileged
      user could probably control which if any of his SQL commands get logged,
      by issuing setseed() at the right times.  That seems bad.
      
      To fix this reliably, we need random() and setseed() to use their own
      private random state variable.  Standard random(3) isn't amenable to such
      usage, so let's switch to pg_erand48().  It's hard to say whether that's
      more or less "random" than any particular platform's version of random(3),
      but it does have a wider seed value and a longer period than are required
      by POSIX, so we can hope that this isn't a big downgrade.  Also, we should
      now have uniform behavior of random() across platforms, which is worth
      something.
      
      While at it, upgrade the per-process seed initialization method to use
      pg_strong_random() if available, greatly reducing the predictability
      of the initial seed value.  (I'll separately do something similar for
      the internal uses of random().)
      
      In addition to forestalling the possible security problem, this has a
      benefit in the other direction, which is that we can now document
      setseed() as guaranteeing a reproducible sequence of random() values.
      Previously, because of the possibility of internal calls of random(3),
      we could not promise any such thing.
      
      Discussion: https://postgr.es/m/3859.1545849900@sss.pgh.pa.us
      6645ad6b
    • Peter Eisentraut's avatar
      1a4eba4e
    • Peter Eisentraut's avatar
      Remove redundant translation markers · e3299d36
      Peter Eisentraut authored
      psql_error() already handles that itself.
      e3299d36
  2. 28 Dec, 2018 7 commits
    • Michael Paquier's avatar
      Improve description of DEFAULT_XLOG_SEG_SIZE in pg_config.h · 0a5a493f
      Michael Paquier authored
      This was incorrectly referring to --walsegsize, and its description is
      rewritten in a clearer way.
      
      Author: Ian Barwick, Tom Lane
      Reviewed-by: Álvaro Herrera, Michael Paquier
      Discussion: https://postgr.es/m/08534fc6-119a-c498-254e-d5acc4e6bf85@2ndquadrant.com
      0a5a493f
    • Tom Lane's avatar
      Marginal performance hacking in erand48.c. · 6b9bba2d
      Tom Lane authored
      Get rid of the multiplier and addend variables in favor of hard-wired
      constants.  Do the multiply-and-add using uint64 arithmetic, rather
      than manually combining several narrower multiplications and additions.
      Make _dorand48 return the full-width new random value, and have its
      callers use that directly (after suitable masking) rather than
      reconstructing what they need from the unsigned short[] representation.
      
      On my machine, this is good for a nearly factor-of-2 speedup of
      pg_erand48(), probably mostly from needing just one call of ldexp()
      rather than three.  The wins for the other functions are smaller
      but measurable.  While none of the existing call sites are really
      performance-critical, a cycle saved is a cycle earned; and besides
      the machine code is smaller this way (at least on x86_64).
      
      Patch by me, but the original idea to optimize this by switching
      to int64 arithmetic is from Fabien Coelho.
      
      Discussion: https://postgr.es/m/1551.1546018192@sss.pgh.pa.us
      6b9bba2d
    • Tom Lane's avatar
      Fix latent problem with pg_jrand48(). · e0904664
      Tom Lane authored
      POSIX specifies that jrand48() returns a signed 32-bit value (in the
      range [-2^31, 2^31)), but our code was returning an unsigned 32-bit
      value (in the range [0, 2^32)).  This doesn't actually matter to any
      existing call site, because they all cast the "long" result to int32
      or uint32; but it will doubtless bite somebody in the future.
      To fix, cast the arithmetic result to int32 explicitly before the
      compiler widens it to long (if widening is needed).
      
      While at it, upgrade this file's far-short-of-project-style comments.
      Had there been some peer pressure to document pg_jrand48() properly,
      maybe this thinko wouldn't have gotten committed to begin with.
      
      Backpatch to v10 where pg_jrand48() was added, just in case somebody
      back-patches a fix that uses it and depends on the standard behavior.
      
      Discussion: https://postgr.es/m/17235.1545951602@sss.pgh.pa.us
      e0904664
    • Alvaro Herrera's avatar
      Fix thinko in previous commit · 4ed6c071
      Alvaro Herrera authored
      4ed6c071
    • Alvaro Herrera's avatar
      Rewrite ExecPartitionCheckEmitError for clarity · e8b0e6b8
      Alvaro Herrera authored
      The original was hard to follow and failed to comply with DRY principle.
      
      Discussion: https://postgr.es/m/20181206222221.g5witbsklvqthjll@alvherre.pgsql
      e8b0e6b8
    • Michael Paquier's avatar
      Clarify referential actions in docs of CREATE/ALTER TABLE · f7ea1a42
      Michael Paquier authored
      The documentation of ON DELETE and ON UPDATE uses the term "action",
      which is also used on the ALTER TABLE page for other purposes.  This
      commit renames the term to "referential_action", which is more
      consistent with the SQL specification.  The new term is now used on the
      documentation of both CREATE TABLE and ALTER TABLE for consistency.
      
      Reported-by: Brigitte Blanc-Lafay
      Author: Lætitia Avrot
      Reviewed-by: Álvaro Herrera
      Discussion: https://postgr.es/m/CAB_COdiHEVVs0uB+uYCjjYUwQ4YFFekppq+Xqv6qAM8+cd42gA@mail.gmail.com
      f7ea1a42
    • Alexander Korotkov's avatar
      Reduce length of GIN predicate locking isolation test suite · 0c6f4f92
      Alexander Korotkov authored
      Isolation test suite of GIN predicate locking was criticized for being too slow,
      especially under Valgrind.  This commit is intended to accelerate it.  Tests are
      simplified in the following ways.
      
        1) Amount of data is reduced.  We're now close to the minimal amount of data,
           which produces at least one posting tree and at least two pages of entry
           tree.
        2) Three isolation tests are merged into one.
        3) Only one tuple is queried from posting tree.  So, locking of index is the
           same, but tuple locks are not propagated to relation lock.  Also, it is
           faster.
        4) Test cases itself are simplified.  Now each test case run just one INSERT
           and one SELECT involving GIN, which either conflict or not.
      
      Discussion: https://postgr.es/m/20181204000740.ok2q53nvkftwu43a%40alap3.anarazel.de
      Reported-by: Andres Freund
      Tested-by: Andrew Dunstan
      Author: Alexander Korotkov
      Backpatch-through: 11
      0c6f4f92
  3. 27 Dec, 2018 4 commits
  4. 26 Dec, 2018 2 commits
    • Tom Lane's avatar
      Fix failure to check for open() or fsync() failures. · 8528e3d8
      Tom Lane authored
      While it seems OK to not be concerned about fsync() failure for a
      pre-existing signal file, it's not OK to not even check for open()
      failure.  This at least causes complaints from static analyzers,
      and I think on some platforms passing -1 to fsync() or close() might
      trigger assertion-type failures.  Also add (void) casts to make clear
      that we're ignoring fsync's result intentionally.
      
      Oversights in commit 2dedf4d9, noted by Coverity.
      8528e3d8
    • Tom Lane's avatar
      Fix portability failure introduced in commits d2b0b60e et al. · e9fcfed3
      Tom Lane authored
      I made a frontend fprintf() format use %m, forgetting that that's only
      safe in HEAD not the back branches; prior to 96bf88d5 and d6c55de1,
      it would work on glibc platforms but not elsewhere.  Revert to using
      %s ... strerror(errno) as the code did before.
      
      We could have left HEAD as-is, but for code consistency across branches,
      I chose to apply this patch there too.
      
      Per Coverity and a few buildfarm members.
      e9fcfed3
  5. 25 Dec, 2018 1 commit
  6. 24 Dec, 2018 1 commit
    • Michael Paquier's avatar
      Prioritize history files when archiving · b981df4c
      Michael Paquier authored
      At the end of recovery for the post-promotion process, a new history
      file is created followed by the last partial segment of the previous
      timeline.  Based on the timing, the archiver would first try to archive
      the last partial segment and then the history file.  This can delay the
      detection of a new timeline taken, particularly depending on the time it
      takes to transfer the last partial segment as it delays the moment the
      history file of the new timeline gets archived.  This can cause promoted
      standbys to use the same timeline as one already taken depending on the
      circumstances if multiple instances look at archives at the same
      location.
      
      This commit changes the order of archiving so as history files are
      archived in priority over other file types, which reduces the likelihood
      of the same timeline being taken (still not reducing the window to
      zero), and it makes the archiver behave more consistently with the
      startup process doing its post-promotion business.
      
      Author: David Steele
      Reviewed-by: Michael Paquier, Kyotaro Horiguchi
      Discussion: https://postgr.es/m/929068cf-69e1-bba2-9dc0-e05986aed471@pgmasters.net
      Backpatch-through: 9.5
      b981df4c
  7. 23 Dec, 2018 2 commits
  8. 22 Dec, 2018 3 commits
  9. 20 Dec, 2018 9 commits
    • Alexander Korotkov's avatar
      Check for conflicting queries during replay of gistvacuumpage() · c952eae5
      Alexander Korotkov authored
      013ebc0a implements so-called GiST microvacuum.  That is gistgettuple() marks
      index tuples as dead when kill_prior_tuple is set.  Later, when new tuple
      insertion claims page space, those dead index tuples are physically deleted
      from page.  When this deletion is replayed on standby, it might conflict with
      read-only queries.  But 013ebc0a doesn't handle this.  That may lead to
      disappearance of some tuples from read-only snapshots on standby.
      
      This commit implements resolving of conflicts between replay of GiST microvacuum
      and standby queries.  On the master we implement new WAL record type
      XLOG_GIST_DELETE, which comprises necessary information.  On stable releases
      we've to be tricky to keep WAL compatibility.  Information required for conflict
      processing is just appended to data of XLOG_GIST_PAGE_UPDATE record.  So,
      PostgreSQL version, which doesn't know about conflict processing, will just
      ignore that.
      
      Reported-by: Andres Freund
      Diagnosed-by: Andres Freund
      Discussion: https://postgr.es/m/20181212224524.scafnlyjindmrbe6%40alap3.anarazel.de
      Author: Alexander Korotkov
      Backpatch-through: 9.6
      c952eae5
    • Tom Lane's avatar
      Base information_schema.sql_identifier domain on name, not varchar. · 7c15cef8
      Tom Lane authored
      The SQL spec says that sql_identifier is a domain over varchar,
      but it also says that that domain is supposed to represent the set
      of valid identifiers for the implementation, in particular applying
      a length limit matching the implementation's identifier length limit.
      We were declaring sql_identifier as just "character varying", thus
      duplicating what the spec says about base type, but entirely failing
      at the rest of it.
      
      Instead, let's declare sql_identifier as a domain over type "name".
      (We can drop the COLLATE "C" added by commit 6b0faf72, since that's
      now implicit in "name".)  With the recent improvements to name's
      comparison support, there's not a lot of functional difference between
      name and varchar.  So although in principle this is a spec deviation,
      it's a pretty minor one.  And correctly enforcing PG's name length limit
      is a good thing; on balance this seems closer to the intent of the spec
      than what we had.
      
      But that's all just language-lawyering.  The *real* reason to do this is
      that it makes sql_identifier columns exposed by information_schema views
      be just direct representations of the underlying "name" catalog columns,
      eliminating a semantic mismatch that was disastrous for performance of
      typical queries on the information_schema.  In combination with the
      recent change to allow dropping no-op CoerceToDomain nodes, this allows
      (for example) queries such as
      
          select ... from information_schema.tables where table_name = 'foo';
      
      to produce an indexscan rather than a seqscan on pg_class.
      
      Discussion: https://postgr.es/m/CAFj8pRBUCX4LZ2rA2BbEkdD6NN59mgx+BLo1gO08Wod4RLtcTg@mail.gmail.com
      7c15cef8
    • Tom Lane's avatar
      Avoid producing over-length specific_name outputs in information_schema. · 5bbee34d
      Tom Lane authored
      information_schema output columns that are declared as being type
      sql_identifier are supposed to conform to the implementation's rules
      for valid identifiers, in particular the identifier length limit.
      Several places potentially violated this limit by concatenating a
      function's name and OID.  (The OID is added to ensure name uniqueness
      within a schema, since the spec doesn't expect function name overloading.)
      
      Simply truncating the concatenation result to fit in "name" won't do,
      since losing part of the OID might wind up giving non-unique results.
      Instead, let's truncate the function name as necessary.
      
      The most practical way to do that is to do it in a C function; the
      information_schema.sql script doesn't have easy access to the value
      of NAMEDATALEN, nor does it have an easy way to truncate on the basis
      of resulting byte-length rather than number of characters.
      
      (There are still a couple of places that cast concatenation results to
      sql_identifier, but as far as I can see they are guaranteed not to produce
      over-length strings, at least with the normal value of NAMEDATALEN.)
      
      Discussion: https://postgr.es/m/23817.1545283477@sss.pgh.pa.us
      5bbee34d
    • Alvaro Herrera's avatar
      Fix lock level used for partition when detaching it · 7b14bcc0
      Alvaro Herrera authored
      For probably bogus reasons, we acquire only AccessShareLock on the
      partition when we try to detach it from its parent partitioned table.
      This can cause ugly things to happen if another transaction is doing
      any sort of DDL to the partition concurrently.
      
      Upgrade that lock to ShareUpdateExclusiveLock, which per discussion
      seems to be the minimum needed.
      
      Reported by Robert Haas.
      
      Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
      7b14bcc0
    • Tom Lane's avatar
      Doc: fix ancient mistake in search_path documentation. · 42bdf853
      Tom Lane authored
      "$user" in a search_path string is replaced by CURRENT_USER not
      SESSION_USER.  (It actually was SESSION_USER in the initial implementation,
      but we changed it shortly later, and evidently forgot to fix the docs to
      match.)
      
      Noted by antonov@stdpr.ru
      
      Discussion: https://postgr.es/m/159151fb45d490c8d31ea9707e9ba99d@stdpr.ru
      42bdf853
    • Tom Lane's avatar
      Make bitmapset.c use 64-bit bitmap words on 64-bit machines. · 216af5ee
      Tom Lane authored
      Using the full width of the CPU's native word size shouldn't cost
      anything in typical cases.  When working with large bitmapsets,
      this halves the number of operations needed for many common BMS
      operations.  On the right sort of test case, a measurable improvement
      is obtained.
      
      David Rowley
      
      Discussion: https://postgr.es/m/CAKJS1f9EGBd2h-VkXvb=51tf+X46zMX5T8h-KYgXEV_u2zmLUw@mail.gmail.com
      216af5ee
    • Alvaro Herrera's avatar
      DETACH PARTITION: hold locks on indexes until end of transaction · 0c237715
      Alvaro Herrera authored
      When a partition is detached from its parent, we acquire locks on all
      attached indexes to also detach them ... but we release those locks
      immediately.  This is a violation of the policy of keeping locks on user
      objects to the end of the transaction.  Bug introduced in 8b08f7d4.
      
      It's unclear that there are any ill effects possible, but it's clearly
      wrong nonetheless.  It's likely that bad behavior *is* possible, but
      mostly because the relation that the index is for is only locked with
      AccessShareLock, which is an older bug that shall be fixed separately.
      
      While touching that line of code, close the index opened with
      index_open() using index_close() instead of relation_close().
      No difference in practice, but let's be consistent.
      
      Unearthed by Robert Haas.
      
      Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
      0c237715
    • Michael Paquier's avatar
      Add more tab completion for CREATE TABLE in psql · 4cba9c2a
      Michael Paquier authored
      The following completion patterns are added:
      - CREATE TABLE <name> with '(', OF or PARTITION OF.
      - CREATE TABLE <name> OF with list of composite types.
      - CREATE TABLE name (...) with PARTITION OF, WITH, TABLESPACE, ON
      COMMIT (depending on the presence of a temporary table).
      - CREATE TABLE ON COMMIT with actions (only for temporary tables).
      
      Author: Dagfinn Ilmari Mannsåker
      Discussion: https://postgr.es/m/d8j1s77kdbb.fsf@dalvik.ping.uio.no
      4cba9c2a
    • Greg Stark's avatar
      Fix ADD IF NOT EXISTS used in conjunction with ALTER TABLE ONLY · 1075dfda
      Greg Stark authored
      The flag for IF NOT EXISTS was only being passed down in the normal
      recursing case. It's been this way since originally added in 9.6 in
      commit 2cd40adb so backpatch back to 9.6.
      1075dfda
  10. 19 Dec, 2018 7 commits
    • Tom Lane's avatar
      Add text-vs-name cross-type operators, and unify name_ops with text_ops. · 2ece7c07
      Tom Lane authored
      Now that name comparison has effectively the same behavior as text
      comparison, we might as well merge the name_ops opfamily into text_ops,
      allowing cross-type comparisons to be processed without forcing a
      datatype coercion first.  We need do little more than add cross-type
      operators to make the opfamily complete, and fix one or two places
      in the planner that assumed text_ops was a single-datatype opfamily.
      
      I chose to unify hash name_ops into hash text_ops as well, since the
      types have compatible hashing semantics.  This allows marking the
      new cross-type equality operators as oprcanhash.
      
      (Note: this doesn't remove the name_ops opclasses, so there's no
      breakage of index definitions.  Those opclasses are just reparented
      into the text_ops opfamily.)
      
      Discussion: https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
      2ece7c07
    • Tom Lane's avatar
      Make type "name" collation-aware. · 586b98fd
      Tom Lane authored
      The "name" comparison operators now all support collations, making them
      functionally equivalent to "text" comparisons, except for the different
      physical representation of the datatype.  They do, in fact, mostly share
      the varstr_cmp and varstr_sortsupport infrastructure, which has been
      slightly enlarged to handle the case.
      
      To avoid changes in the default behavior of the datatype, set name's
      typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that
      by default comparisons to a name value will continue to use strcmp
      semantics.  (This would have been the case for system catalog columns
      anyway, because of commit 6b0faf72, but doing this makes it true for
      user-created name columns as well.  In particular, this avoids
      locale-dependent changes in our regression test results.)
      
      In consequence, tweak a couple of places that made assumptions about
      collatable base types always having typcollation DEFAULT_COLLATION_OID.
      I have not, however, attempted to relax the restriction that user-
      defined collatable types must have that.  Hence, "name" doesn't
      behave quite like a user-defined type; it acts more like a domain
      with COLLATE "C".  (Conceivably, if we ever get rid of the need for
      catalog name columns to be fixed-length, "name" could actually become
      such a domain over text.  But that'd be a pretty massive undertaking,
      and I'm not volunteering.)
      
      Discussion: https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
      586b98fd
    • Alvaro Herrera's avatar
      Remove function names from error messages · 68f6f2b7
      Alvaro Herrera authored
      They are not necessary, and having them there gives useless work for
      translators.
      68f6f2b7
    • Tom Lane's avatar
      Small improvements for allocation logic in ginHeapTupleFastCollect(). · c6e394c1
      Tom Lane authored
      Avoid repetitive calls to repalloc() when the required size of the
      collector array grows more than 2x in one call.  Also ensure that the
      array size is a power of 2 (since palloc will probably consume a power
      of 2 anyway) and doesn't start out very small (which'd likely just lead
      to extra repallocs).
      
      David Rowley, tweaked a bit by me
      
      Discussion: https://postgr.es/m/CAKJS1f8vn-iSBE8PKeVHrnhvyjRNYCxguPFFY08QLYmjWG9hPQ@mail.gmail.com
      c6e394c1
    • Tom Lane's avatar
    • Peter Geoghegan's avatar
      Remove obsolete nbtree duplicate entries comment. · 61a4480a
      Peter Geoghegan authored
      Remove a comment from the Berkeley days claiming that nbtree must
      disambiguate duplicate keys within _bt_moveright().  There is no special
      care taken around duplicates within _bt_moveright(), at least since
      commit 9e85183b removed inscrutable _bt_moveright() code to handle
      pages full of duplicates.
      61a4480a
    • Peter Geoghegan's avatar
      Correct obsolete nbtree recovery comments. · 60f3cc95
      Peter Geoghegan authored
      Commit 40dae7ec, which made the handling of interrupted nbtree page
      splits more robust, removed an nbtree-specific end-of-recovery cleanup
      step.  This meant that it was no longer possible to complete an
      interrupted page split during recovery.  However, a reference to
      recovery as a reason for using a NULL stack while inserting into a
      parent page was missed.  Remove the reference.
      
      Remove a similar obsolete reference to recovery that was introduced much
      more recently, as part of the btree fastpath optimization enhancement
      that made it into Postgres 11 (commit 2b272734, and follow-up commits).
      
      Backpatch: 11-, where the fastpath optimization was introduced.
      60f3cc95