  1. 16 May, 2015 3 commits
    • Add docs for tablesample system_time() · f941d033
      Simon Riggs authored
    • Support GROUPING SETS, CUBE and ROLLUP. · f3d31185
      Andres Freund authored
      This SQL standard functionality allows aggregating data by different
      GROUP BY clauses at once. Each grouping set returns rows in which the
      columns that are grouped by only in other sets are set to NULL.
      
      This could previously be achieved by doing each grouping as a separate
      query, conjoined by UNION ALLs. Besides being considerably more concise,
      grouping sets will in many cases be faster, requiring only one scan over
      the underlying data.
      
      The current implementation of grouping sets only supports using sorting
      for input. Individual sets that share a sort order are computed in one
      pass. If there are sets that don't share a sort order, additional sort &
      aggregation steps are performed. These additional passes are sourced by
      the previous sort step, thus avoiding repeated scans of the source data.
      
      The code is structured in a way that adding support for purely using
      hash aggregation or a mix of hashing and sorting is possible. Sorting
      was chosen to be supported first, as it is the most generic method of
      implementation.
      
      Instead of, as in earlier versions of the patch, representing the
      chain of sort and aggregation steps as full blown planner and executor
      nodes, all but the first sort are performed inside the aggregation node
      itself. This avoids the need to do some unusual gymnastics to handle
      having to return aggregated and non-aggregated tuples from underlying
      nodes, as well as having to shut down underlying nodes early to limit
      memory usage.  The optimizer still builds Sort/Agg nodes to describe each
      phase; they are not part of the plan tree, but rather additional
      data for the aggregation node. They're a convenient and preexisting way
      to describe aggregation and sorting.  The first (and possibly only) sort
      step is still performed as a separate execution step. That retains
      similarity with existing group by plans, makes rescans fairly simple,
      avoids very deep plans (leading to slow explains) and makes it easy to
      avoid the sorting step if the underlying data is sorted by other means.
      
      A somewhat ugly side of this patch is having to deal with a grammar
      ambiguity between the new CUBE keyword and the cube extension/functions
      named cube (and rollup). To avoid breaking existing deployments of the
      cube extension it has not been renamed, neither has cube been made a
      reserved keyword. Instead precedence hacking is used to make GROUP BY
      cube(..) refer to the CUBE grouping sets feature, and not the function
      cube(). To actually group by a function cube(), unlikely as that might
      be, the function name has to be quoted.
      
      Needs a catversion bump because stored rules may change.
      
      Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
      Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
          Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
      Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
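
To illustrate the semantics described in this commit message, here is a minimal Python sketch (not PostgreSQL code). The names `rollup`, `cube` and `grouping_sets` and the sample rows are assumptions made for illustration. The sketch aggregates every grouping set during a single scan of the input and sets columns that are not part of the current set to None, mirroring SQL NULL. Unlike the committed implementation, which is sort-based, it accumulates into dictionaries.

```python
from itertools import combinations


def rollup(columns):
    """Grouping sets for ROLLUP(a, b, ...): every prefix, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube(columns):
    """Grouping sets for CUBE(a, b, ...): every subset of the columns."""
    return [subset
            for n in range(len(columns), -1, -1)
            for subset in combinations(columns, n)]


def grouping_sets(rows, columns, sets, measure):
    """Sum `measure` per grouping set while scanning `rows` only once."""
    totals = [{} for _ in sets]
    for row in rows:  # one scan over the underlying data
        for acc, gset in zip(totals, sets):
            # Columns not grouped by in this set become None (SQL NULL).
            key = tuple(row[c] if c in gset else None for c in columns)
            acc[key] = acc.get(key, 0) + row[measure]
    return [(key, total) for acc in totals for key, total in acc.items()]


# Hypothetical sample data.
rows = [
    {"brand": "a", "size": "L", "sales": 10},
    {"brand": "a", "size": "M", "sales": 20},
    {"brand": "b", "size": "L", "sales": 5},
]
result = grouping_sets(rows, ["brand", "size"],
                       [("brand",), ("size",), ()], "sales")
```

The result has the same rows that three separate GROUP BY queries joined by UNION ALL would produce: totals per brand, totals per size, and one grand total.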
    • Add docs for tablesample system_rows() · 6e4415c6
      Simon Riggs authored
  2. 15 May, 2015 32 commits
    • Update time zone data files to tzdata release 2015d. · 9d366c1f
      Tom Lane authored
      DST law changes in Egypt, Mongolia, Palestine.
      Historical corrections for Canada and Chile.
      Revised zone abbreviation for America/Adak (HST/HDT not HAST/HADT).
    • Add BRIN infrastructure for "inclusion" opclasses · b0b7be61
      Alvaro Herrera authored
      This lets BRIN be used with R-Tree-like indexing strategies.
      
      Also provided are operator classes for range types, box and inet/cidr.
      The infrastructure provided here should be sufficient to create operator
      classes for similar datatypes; for instance, opclasses for PostGIS
      geometries should be doable, though we didn't try to implement one.
      
      (A box/point opclass was also submitted, but we ripped it out before
      commit because the handling of floating point comparisons in existing
      code is inconsistent and would generate corrupt indexes.)
      
      Author: Emre Hasegeli.  Cosmetic changes by me
      Review: Andreas Karlsson
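
As a rough illustration of the "inclusion" idea, the Python sketch below summarises each block range by the union (bounding interval) of the values stored in it, then uses an overlap test to decide which block ranges a query has to visit. It is a toy under stated assumptions: values are closed integer intervals, a fixed number of values stands in for a block range, and the names `build_summaries` and `candidate_ranges` are hypothetical, not PostgreSQL functions.

```python
def build_summaries(values, values_per_range):
    """Summarise consecutive chunks of (lo, hi) intervals by their union."""
    summaries = []
    for start in range(0, len(values), values_per_range):
        chunk = values[start:start + values_per_range]
        lo = min(v[0] for v in chunk)
        hi = max(v[1] for v in chunk)
        summaries.append((start, lo, hi))
    return summaries


def candidate_ranges(summaries, query):
    """Return start offsets of chunks whose summary overlaps the query."""
    q_lo, q_hi = query
    return [start for start, lo, hi in summaries if lo <= q_hi and q_lo <= hi]


# Hypothetical sample data: six interval values, two per block range.
values = [(1, 3), (2, 5), (10, 12), (11, 15), (30, 31), (40, 45)]
summaries = build_summaries(values, 2)
```

A query for (32, 39) still selects the last chunk although neither (30, 31) nor (40, 45) overlaps it. The summary is lossy, so candidate chunks must be rechecked row by row.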
    • Improve test for CONVERT() with GB18030 <-> UTF8. · 199f5973
      Tom Lane authored
      Add a bit of coverage of high code points.
      
      Arjen Nienhuis
    • Move strategy numbers to include/access/stratnum.h · 26df7066
      Alvaro Herrera authored
      For upcoming BRIN opclasses, it's convenient to have strategy numbers
      defined in a single place.  Since there's nothing appropriate, create
      it.  The StrategyNumber typedef now lives there, as well as existing
      strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
      gist.h).  skey.h is forced to include stratnum.h because of the
      StrategyNumber typedef, but gist.h is not; extensions that currently
      rely on gist.h for rtree strategy numbers might need to add a new
      
      A few .c files can stop including skey.h and/or gist.h, which is a nice
      side benefit.
      
      Per discussion:
      https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org
      
      Authored by Emre Hasegeli and Álvaro.
      
      (It's not clear to me why bootscanner.l has any #include lines at all.)
    • Simon Riggs's avatar
      1e98fa0b
    • Fix uninitialized variable. · 66493dd7
      Tom Lane authored
      Per compiler warnings.
    • Tablesample method API docs · 910baf0a
      Simon Riggs authored
      Petr Jelinek
    • Add to contrib/Makefile · df259759
      Simon Riggs authored
    • contrib/tsm_system_time · 56e121a5
      Simon Riggs authored
    • contrib/tsm_system_rows · 4d40494b
      Simon Riggs authored
    • TABLESAMPLE system_time(limit) · 149f6f15
      Simon Riggs authored
      Contrib module implementing a tablesample method
      that allows you to limit the sample by a hard time
      limit.
      
      Petr Jelinek
      
      Reviewed by Michael Paquier, Amit Kapila and
      Simon Riggs
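
The idea of a hard time limit can be sketched in a few lines of Python. This is a simplified model, not the contrib module's algorithm: it assumes the table is a list of blocks, visits whole blocks in random order, and stops once the limit (in milliseconds) has elapsed. The function name `system_time_sample` and the injectable `clock` parameter are hypothetical and exist only to make the sketch testable.

```python
import random
import time


def system_time_sample(blocks, limit_ms, rng, clock=time.monotonic):
    """Read whole blocks in random order until limit_ms milliseconds pass."""
    deadline = clock() + limit_ms / 1000.0
    order = list(range(len(blocks)))
    rng.shuffle(order)
    sample = []
    for block_no in order:
        if clock() >= deadline:
            break  # time budget used up: return what we have so far
        sample.extend(blocks[block_no])
    return sample
```

The sample size depends on how fast blocks can be read, so repeated runs may return different numbers of rows.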
    • TABLESAMPLE system_rows(limit) · 9689290f
      Simon Riggs authored
      Contrib module implementing a tablesample method
      that allows you to limit the sample by a hard row
      limit.
      
      Petr Jelinek
      
      Reviewed by Michael Paquier, Amit Kapila and
      Simon Riggs
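
A hard row limit can be modelled the same way. The Python sketch below is a simplified illustration rather than the contrib module's algorithm: it assumes the table is a list of blocks, reads blocks in random order, and stops as soon as the requested number of rows has been collected. The name `system_rows_sample` is hypothetical.

```python
import random


def system_rows_sample(blocks, limit_rows, rng):
    """Read blocks in random order, stopping once limit_rows rows are collected."""
    order = list(range(len(blocks)))
    rng.shuffle(order)
    sample = []
    for block_no in order:
        for row in blocks[block_no]:
            if len(sample) >= limit_rows:
                return sample
            sample.append(row)
    return sample  # the table had fewer rows than the limit
```

Because rows are taken block by block, the sample is clustered: it is cheap to produce but not a uniformly random selection of individual rows.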
    • Extend GB18030 encoding conversion to cover full Unicode range. · 8d3e0906
      Tom Lane authored
      Our previous code for GB18030 <-> UTF8 conversion only covered Unicode code
      points up to U+FFFF, but the actual spec defines conversions for all code
      points up to U+10FFFF.  That would be rather impractical as a lookup table,
      but fortunately there is a simple algorithmic conversion between the
      additional code points and the equivalent GB18030 byte patterns.  Make use
      of the just-added callback facility in LocalToUtf/UtfToLocal to perform the
      additional conversions.
      
      Having created the infrastructure to do that, we can use the same code to
      map certain linearly-related subranges of the Unicode space below U+FFFF,
      allowing removal of the corresponding lookup table entries.  This more
      than halves the lookup table size, which is a substantial savings;
      utf8_and_gb18030.so drops from nearly a megabyte to about half that.
      
      In support of doing that, replace ISO10646-GB18030.TXT with the data file
      gb-18030-2000.xml (retrieved from
      http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/ )
      in which these subranges have been deleted from the simple lookup entries.
      
      Per bug #12845 from Arjen Nienhuis.  The conversion code added here is
      based on his proposed patch, though I whacked it around rather heavily.
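
The "simple algorithmic conversion" for code points above U+FFFF can be sketched in Python as follows. This is an illustration of the GB18030 four-byte arithmetic, not the C code from the commit, and the function names are hypothetical. Supplementary code points map linearly onto four-byte sequences starting at 0x90 0x30 0x81 0x30, where the second and fourth bytes count in base 10 and the third in base 126.

```python
def supplementary_to_gb18030(code_point):
    """Map U+10000..U+10FFFF to its four-byte GB18030 sequence."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    linear = code_point - 0x10000
    linear, b4 = divmod(linear, 10)    # fourth byte: 0x30..0x39
    linear, b3 = divmod(linear, 126)   # third byte:  0x81..0xFE
    b1, b2 = divmod(linear, 10)        # second byte: 0x30..0x39
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


def gb18030_to_supplementary(data):
    """Inverse mapping for four-byte sequences from 0x90308130 upward."""
    b1, b2, b3, b4 = data
    linear = ((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)
    return 0x10000 + linear * 10 + (b4 - 0x30)
```

The output can be cross-checked against Python's built-in `gb18030` codec, for example by comparing with `chr(0x1F600).encode("gb18030")`.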
    • doc: CREATE FOREIGN TABLE now allows CHECK ( ... ) NO INHERIT · 92edba26
      Robert Haas authored
      Etsuro Fujita
    • TABLESAMPLE, SQL Standard and extensible · f6d208d6
      Simon Riggs authored
      Add a TABLESAMPLE clause to SELECT statements that allows the
      user to specify random BERNOULLI sampling or block-level
      SYSTEM sampling. The implementation allows extensible
      sampling functions to be written, using a standard API.
      The basic version follows the SQL Standard exactly.
      Concrete use cases for the sampling API follow in later
      commits.
      
      Petr Jelinek
      
      Reviewed by Michael Paquier and Simon Riggs
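
The difference between the two standard sampling methods can be shown with a small Python model. It is a sketch under the assumption that a table is a list of blocks, each holding a list of rows. The function names are hypothetical and the code is not PostgreSQL's implementation. BERNOULLI decides row by row, SYSTEM decides block by block.

```python
import random


def bernoulli_sample(blocks, percent, rng):
    """Row-level sampling: keep each row independently with probability percent/100."""
    p = percent / 100.0
    return [row for block in blocks for row in block if rng.random() < p]


def system_sample(blocks, percent, rng):
    """Block-level sampling: keep or skip each block as a whole."""
    p = percent / 100.0
    return [row for block in blocks if rng.random() < p for row in block]
```

SYSTEM sampling reads fewer blocks for the same percentage, at the cost of returning clustered rows.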
    • Silence another create_index regression test failure. · 11a83bbe
      Heikki Linnakangas authored
      More platform differences in the less-significant digits in output.
      
      Per buildfarm member rover_firefly, still.
    • Fix outdated src/test/mb/ tests, and add a GB18030 test. · 07af5238
      Tom Lane authored
      The expected-output files for these tests were broken by the recent
      addition of a warning for hash indexes.  Update them.
      
      Also add a test case for GB18030 encoding, similar to the other ones.
      This is a pretty weak test, but it's better than nothing.
    • Fix docs build. Oops. · 8b0f105d
      Heikki Linnakangas authored
    • Add archive_mode='always' option. · ffd37740
      Heikki Linnakangas authored
      In 'always' mode, the standby independently archives all files it receives
      from the primary.
      
      Original patch by Fujii Masao, docs and review by me.
    • docs: consistently uppercase index method and add spacing · f6d65f0c
      Bruce Momjian authored
      Consistently uppercase index method names, e.g. GIN, and add a space between
      the index method name and the parentheses enclosing the column names.
    • Silence create_index regression test failure. · 9feaba28
      Heikki Linnakangas authored
      The expected output contained some floating point values which might get
      rounded slightly differently on different platforms. The exact output isn't
      very interesting in this test, so just round it.
      
      Per buildfarm member rover_firefly.
    • Fix datatype confusion with the new lossy GiST distance functions. · 98edd617
      Heikki Linnakangas authored
      We can only support a lossy distance function when the distance function's
      datatype is comparable with the original ordering operator's datatype.
      The distance function always returns a float8, so we are limited to float8,
      and float4 (by a hard-coded cast of the float8 to float4).
      
      In light of this limitation, it seems like a good idea to have a separate
      'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy
      distance function, it still works with lossy quals. There are no cases like
      that with the built-in or contrib opclasses, but it's plausible.
      
      There was a hidden assumption that the ORDER BY values returned by GiST
      match the original ordering operator's return type, but there are plenty
      of examples where that's not true, e.g. in btree_gist and pg_trgm. As long
      as the distance function is not lossy, we can tolerate that and just not
      return the distance to the executor (or rather, always return NULL). The
      executor doesn't need the distances if there are no lossy results.
      
      There was another little bug: the recheck variable was not initialized
      before calling the distance function. That revealed the bigger issue,
      as the executor tried to reorder tuples that didn't need reordering, and
      that failed because of the datatype mismatch.
    • Fix insufficiently-paranoid GB18030 encoding verifier. · a868931f
      Tom Lane authored
      The previous coding effectively only verified that the second byte of a
      multibyte character was in the expected range; moreover, it wasn't careful
      to make sure that the second byte even exists in the buffer before touching
      it.  The latter seems unlikely to cause any real problems in the field
      (in particular, it could never be a problem with null-terminated input),
      but it's still a bug.
      
      Since GB18030 is not a supported backend encoding, the only thing we'd
      really be doing with GB18030 text is converting it to UTF8 in LocalToUtf,
      which would fail anyway on any invalid character for lack of a match in
      its lookup table.  So the only user-visible consequence of this change
      should be that you'll get "invalid byte sequence for encoding" rather than
      "character has no equivalent" for malformed GB18030 input.  However,
      impending changes to the GB18030 conversion code will require these tighter
      up-front checks to avoid producing bogus results.
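
What a stricter verifier has to check can be sketched in Python. This is an illustration based on the byte ranges of the GB18030 standard, not a transcription of the C function. The name `gb18030_char_len` and the convention of returning -1 for invalid input are assumptions. The sketch checks that enough bytes remain in the buffer before reading them, and it validates every byte of a multibyte character, not only the second.

```python
def gb18030_char_len(buf, pos):
    """Return the byte length of the character at buf[pos], or -1 if invalid."""
    remaining = len(buf) - pos
    if remaining <= 0:
        return -1
    b1 = buf[pos]
    if b1 < 0x80:
        return 1                      # ASCII
    if not 0x81 <= b1 <= 0xFE or remaining < 2:
        return -1                     # bad lead byte, or second byte missing
    b2 = buf[pos + 1]
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
        return 2                      # two-byte character
    if 0x30 <= b2 <= 0x39:
        if remaining < 4:
            return -1                 # truncated four-byte character
        b3, b4 = buf[pos + 2], buf[pos + 3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return 4
    return -1
```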
    • Remove useless pg_audit.conf · aff27e33
      Stephen Frost authored
      No need to have pg_audit.conf any longer since the regression tests are
      just loading the module at the start of each session (to simulate being
      in shared_preload_libraries, which isn't something we can actually make
      happen on the buildfarm itself, it seems).
      
      Pointed out by Tom
    • Support --verbose option in reindexdb. · 458a0770
      Fujii Masao authored
      Sawada Masahiko, reviewed by Fabrízio Mello
    • Allow GiST distance function to return merely a lower-bound. · 35fcb1b3
      Heikki Linnakangas authored
      The distance function can now set *recheck = true, like index quals. The
      executor will then re-check the ORDER BY expressions, and use a queue to
      reorder the results on the fly.
      
      This makes it possible to do kNN-searches on polygons and circles, which
      don't store the exact value in the index, but just a bounding box.
      
      Alexander Korotkov and me
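
The reordering step can be illustrated with a small Python sketch. It is a model of the idea, not the executor code, and `knn_reorder` and the sample data are hypothetical. It assumes the index returns items in non-decreasing order of a lower-bound distance. The exact distance is recomputed for each item and held in a priority queue. An item is emitted once its exact distance is no larger than the lower bound of the next index result, because nothing that comes later can be closer.

```python
import heapq


def knn_reorder(index_stream, exact_distance):
    """Yield (item, exact distance) in true distance order.

    index_stream yields (lower_bound, item) pairs sorted by lower_bound.
    """
    queue = []
    seq = 0  # tie-breaker so items themselves are never compared
    for lower_bound, item in index_stream:
        # Queued items at or below this bound cannot be overtaken any more.
        while queue and queue[0][0] <= lower_bound:
            dist, _, ready = heapq.heappop(queue)
            yield ready, dist
        heapq.heappush(queue, (exact_distance(item), seq, item))
        seq += 1
    while queue:  # index exhausted: drain the rest in order
        dist, _, ready = heapq.heappop(queue)
        yield ready, dist


# Hypothetical data: bounding-box lower bounds and exact distances.
stream = [(1.0, "a"), (2.0, "b"), (2.5, "c")]
exact = {"a": 3.0, "b": 2.2, "c": 2.6}
result = list(knn_reorder(stream, exact.get))
```

Here the index returns "a" first because its bounding box is nearest, but after rechecking the final order is b, c, a.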
    • Support VERBOSE option in REINDEX command. · ecd222e7
      Fujii Masao authored
      When this option is specified, a progress report is printed as each index
      is reindexed.
      
      Per discussion, we agreed on the following syntax for the extensibility of
      the options.
      
          REINDEX (flexible options) { INDEX | ... } name
      
      Sawada Masahiko.
      Reviewed by Robert Haas, Fabrízio Mello, Alvaro Herrera, Kyotaro Horiguchi,
      Jim Nasby and me.
      
      Discussion: CAD21AoA0pK3YcOZAFzMae+2fcc3oGp5zoRggDyMNg5zoaWDhdQ@mail.gmail.com
    • Honor traditional SGML NAMELEN limit. · 4b8f797f
      Tom Lane authored
      We've conformed to this limit in the past, so might as well continue to.
      
      Aaron Swenson
    • Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions. · 7730f48e
      Tom Lane authored
      Until now, these functions have only supported encoding conversions using
      lookup tables, which is fine as long as there are not too many code points
      to convert.  However, GB18030 expects all 1.1 million Unicode code points
      to be convertible, which would require a ridiculously-sized lookup table.
      Fortunately, a large fraction of those conversions can be expressed through
      arithmetic, i.e., the conversions are one-to-one in certain defined ranges.
      To support that, provide a callback function that is used after consulting
      the lookup tables.  (This patch doesn't actually change anything about the
      GB18030 conversion behavior; it just provides infrastructure for fixing it.)
      
      Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway,
      take the opportunity to rearrange their argument lists into what seems
      to me a saner order.  And beautify the call sites by using lengthof()
      instead of error-prone sizeof() arithmetic.
      
      In passing, also mark all the lookup tables used by these calls "const".
      This moves an impressive amount of stuff into the text segment, at least
      on my machine, and is safer anyhow.
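
The control flow described above, a table lookup followed by an optional callback, can be sketched in Python. The names `convert_code` and `linear_range` and all table values are made up for illustration. The real functions are C and have different signatures.

```python
def convert_code(code, lookup_table, algorithmic=None):
    """Try the lookup table first, then the optional algorithmic callback."""
    mapped = lookup_table.get(code)
    if mapped is not None:
        return mapped
    if algorithmic is not None:
        mapped = algorithmic(code)
        if mapped is not None:
            return mapped
    raise ValueError(f"no equivalent for code {code:#x}")


# Toy data: one table entry plus a linear range handled arithmetically.
table = {0x41: 0x141}


def linear_range(code):
    """Hypothetical one-to-one mapping for the range 0x1000..0x1FFF."""
    if 0x1000 <= code <= 0x1FFF:
        return code + 0x8000
    return None
```

With this split, the table only needs entries for irregular mappings, while whole linear ranges are covered by a few lines of arithmetic.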
    • Separate block sampling functions · 83e176ec
      Simon Riggs authored
      Refactoring ahead of tablesample patch
      
      Requested and reviewed by Michael Paquier
      
      Petr Jelinek
    • pg_upgrade: make controldata checks more consistent · 5a3022fd
      Bruce Momjian authored
      Also add missing float8_pass_by_value check.
    • Add pg_settings.pending_restart column · a486e357
      Peter Eisentraut authored
      with input from David G. Johnston, Robert Haas, Michael Paquier
  3. 14 May, 2015 5 commits