1. 07 Dec, 2016 6 commits
    • Tom Lane's avatar
      Handle empty or all-blank PAGER setting more sanely in psql. · 18f8f784
      Tom Lane authored
      If the PAGER environment variable is set but contains an empty string,
      psql would pass it to "sh" which would silently exit, causing whatever
      query output we were printing to vanish entirely.  This is quite
      mystifying; it took a long time for us to figure out that this was the
      cause of Joseph Brenner's trouble report.  Rather than allowing that
      to happen, we should treat this as another way to specify "no pager".
      (We could alternatively treat it as selecting the default pager, but
      it seems more likely that the former is what the user meant to achieve
      by setting PAGER this way.)
      
      Nonempty, but all-white-space, PAGER values have the same behavior, and
      it's pretty easy to test for that, so let's handle that case the same way.
      
      Most other cases of faulty PAGER values will result in the shell printing
      some kind of complaint to stderr, which should be enough to diagnose the
      problem, so we don't need to work harder than this.  (Note that there's
      been an intentional decision not to be very chatty about apparent failure
      returns from the pager process, since that may happen if, eg, the user
      quits the pager with control-C or some such.  I'd just as soon not start
      splitting hairs about which exit codes might merit making our own report.)
      
      libpq's old PQprint() function was already on board with ignoring empty
      PAGER values, but for consistency, make it ignore all-white-space values
      as well.
      
      It's been like this a long time, so back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/CAFfgvXWLOE2novHzYjmQK8-J6TmHz42G8f3X0SORM44+stUGmw@mail.gmail.com
      18f8f784
    • Heikki Linnakangas's avatar
      Fix query cancellation. · 81f2e514
      Heikki Linnakangas authored
      In commit fe0a0b59, the datatype used for MyCancelKey and other variables
      that store cancel keys were changed from long to uint32, but I missed this
      one. That broke query cancellation on platforms where long is wider than 32
      bits.
      
      Report by Andres Freund, fix by Michael Paquier.
      81f2e514
    • Heikki Linnakangas's avatar
      Fix whitespace. · 9790b87f
      Heikki Linnakangas authored
      Thomas Munro
      9790b87f
    • Stephen Frost's avatar
      Silence compiler warnings · d97b14dd
      Stephen Frost authored
      Rearrange a bit of code to ensure that 'mode' in LWLockRelease is
      obviously always set, which seems a bit cleaner and avoids a compiler
      warning (thanks to Robert for the suggestion!).
      
      In GetCachedPlan(), initialize 'plan' to silence a compiler warning, but
      also add an Assert() to make sure we don't ever actually fall through
      with 'plan' still being set to NULL, since we are about to dereference
      it.
      
      Neither of these appear to be live bugs but at least gcc
      5.4.0-6ubuntu1~16.04.4 doesn't quite have the smarts to realize that.
      
      Discussion: https://www.postgresql.org/message-id/20161129152102.GR13284%40tamriel.snowman.net
      d97b14dd
    • Tom Lane's avatar
      Fix unsafe assumption that struct timeval.tv_sec is a "long". · 0645dacc
      Tom Lane authored
      It typically is a "long", but it seems possible that on some platforms
      it wouldn't be.  In any case, this silences a compiler warning on
      OpenBSD (cf buildfarm member curculio).
      
      While at it, use snprintf not sprintf.  This format string couldn't
      possibly overrun the supplied buffer, but that doesn't seem like
      a good reason not to use the safer style.
      
      Oversight in commit f828654e.  Back-patch to 9.6 where that came in.
      0645dacc
    • Tom Lane's avatar
      Put AC_MSG_RESULT() call in the right place. · c648f058
      Tom Lane authored
      Thinko in ecb0d20a --- this needs to go one level further out in
      the "if" nest.  As it stood, nothing got printed in the case of
      selecting named POSIX semaphores.  Cosmetic issue only, but a bug.
      c648f058
  2. 06 Dec, 2016 4 commits
    • Robert Haas's avatar
      Fix interaction of parallel query with prepared statements. · 4212cb73
      Robert Haas authored
      Previously, a prepared statement created via a Parse message could get
      a parallel plan, but one created with a PREPARE statement could not.
      This state of affairs was due to confusion on my (rhaas) part: I
      erroneously believed that a CREATE TABLE .. AS EXECUTE statement could
      only be performed with a prepared statement by PREPARE, but in fact
      one created by a Prepare message works just as well.  Therefore, it
      makes no sense to allow parallel query in one case but not the other.
      
      To fix, allow parallel query with all prepared statements, but run
      the parallel plan serially (i.e. without workers) in the case of
      CREATE TABLE .. AS EXECUTE.  Also, document this.
      
      Amit Kapila and Tobias Bussman, plus an extra sentence of
      documentation by me.
      4212cb73
    • Stephen Frost's avatar
      Bump catversion for restrictive RLS changes · cb9dcbc1
      Stephen Frost authored
      Mea culpa.
      
      Pointed out by Andres.
      cb9dcbc1
    • Fujii Masao's avatar
      Improve documentation about pg_stat_replication view. · dfe530a0
      Fujii Masao authored
      Add the descriptions of possible values in "state" and "sync_state" columns
      of pg_stat_replication view.
      
      Author: Michael Paquier, slightly modified by me
      Discussion: <CAB7nPqT7APWrvPFZrcjKEHoq4=g3z2ErxtTdojSf+sDALzuemA@mail.gmail.com>
      dfe530a0
    • Tom Lane's avatar
      Remove extraneous semicolon from uses of relptr_declare(). · 3ebf2b45
      Tom Lane authored
      If we're going to write a semicolon after calls of relptr_declare(),
      then we don't need one inside the macro, and removing it suppresses
      "empty declaration" warnings from pickier compilers (eg pademelon).
      
      While at it, we might as well use relptr() inside relptr_declare(),
      because otherwise that macro would likely go unused altogether.
      
      Also improve the comment, which I for one found unclear,
      and provide a specific example of intended usage.
      3ebf2b45
  3. 05 Dec, 2016 14 commits
    • Heikki Linnakangas's avatar
      Fix typo in new message in configure. · 44a977f5
      Heikki Linnakangas authored
      Remove spurious "of", and reformat to fit on a 80 chars wide line.
      44a977f5
    • Robert Haas's avatar
      Ensure gatherstate->nextreader is properly initialized. · 53c7cff7
      Robert Haas authored
      The previously code worked OK as long as a Gather node was never
      rescanned, or if it was rescanned, as long as it got at least as
      many workers on rescan as it had originally.  But if the number
      of workers ever decreased on a rescan, then it could crash.
      
      Andreas Seltenreich
      53c7cff7
    • Stephen Frost's avatar
      Add support for restrictive RLS policies · 093129c9
      Stephen Frost authored
      We have had support for restrictive RLS policies since 9.5, but they
      were only available through extensions which use the appropriate hooks.
      This adds support into the grammer, catalog, psql and pg_dump for
      restrictive RLS policies, thus reducing the cases where an extension is
      necessary.
      
      In passing, also move away from using "AND"d and "OR"d in comments.
      As pointed out by Alvaro, it's not really appropriate to attempt
      to make verbs out of "AND" and "OR", so reword those comments which
      attempted to.
      
      Reviewed By: Jeevan Chalke, Dean Rasheed
      Discussion: https://postgr.es/m/20160901063404.GY4028@tamriel.snowman.net
      093129c9
    • Robert Haas's avatar
      dsa: Cope with the possibility that SIZE_MAX is not defined. · 2bbdc687
      Robert Haas authored
      Per buildfarm member gaur and Tom Lane.
      2bbdc687
    • Robert Haas's avatar
      libpq: Fix another bug in 721f7bd3. · a0ae54df
      Robert Haas authored
      If we failed to connect to one or more hosts, and then afterwards we
      find one that fails to be read-write, the latter error message was
      clobbering any earlier ones.  Repair.
      
      Mithun Cy, slightly revised by me.
      a0ae54df
    • Robert Haas's avatar
      Fix race introduced by 6d46f478. · 2f4193c3
      Robert Haas authored
      It's possible for the metapage contents to change after we release
      the lock, so we must read them before releasing the lock.
      
      Amit Kapila.  Submitted in response to a trouble report from
      Andreas Seltenreich, though it is not certain this fixes the
      problem.
      2f4193c3
    • Robert Haas's avatar
      Assorted documentation improvements for max_parallel_workers. · 0e50af24
      Robert Haas authored
      Commit b460f5d6 overlooked a few bits
      of documentation that seem like they should mention the new setting.
      0e50af24
    • Robert Haas's avatar
      Reduce the default for max_worker_processes back to 8. · 2b959d49
      Robert Haas authored
      Commit b460f5d6 -- at my suggestion --
      increased the default value of max_worker_processes from 8 to 16, on
      the theory that this would be harmless and convenient for users.
      Unfortunately, this caused some buildfarm machines with low connection
      limits to start failing, so apparently it's not harmless after all.
      2b959d49
    • Robert Haas's avatar
      Fix more DSA problems uncovered by the buildfarm. · 88f626f8
      Robert Haas authored
      On 32-bit systems, don't try to use 64-bit DSA pointers, because the
      computation of DSA_MAX_SEGMENT_SIZE overflows Size.
      
      Cast 1 to Size before shifting it, so that the compiler doesn't
      produce a result of the wrong width.
      
      In passing, change one use of size_t to Size.
      88f626f8
    • Robert Haas's avatar
      Try to fix some DSA-related compiler warnings. · 670b3bc8
      Robert Haas authored
      Commit 13df76a5 was overconfident
      about how portable %016lx is.  Some compilers complain because they
      need %016llx, while platforms where DSA pointers are only 32 bits
      get unhappy about using a 64-bit format for a 32-bit quantity.
      
      Thomas Munro, per an off-list suggestion from me.
      670b3bc8
    • Heikki Linnakangas's avatar
      Fix creation of stand-alone INSTALL.html file. · 7dd8eb39
      Heikki Linnakangas authored
      I missed the notice at the top of the file, that plain xref must not be
      used in installation.sgml.
      
      Per buildfarm member guaibasaurus.
      7dd8eb39
    • Fujii Masao's avatar
      Fix typo in docs. · daac8e30
      Fujii Masao authored
      Reported-by: Darko Prelec
      daac8e30
    • Heikki Linnakangas's avatar
      Replace PostmasterRandom() with a stronger source, second attempt. · fe0a0b59
      Heikki Linnakangas authored
      This adds a new routine, pg_strong_random() for generating random bytes,
      for use in both frontend and backend. At the moment, it's only used in
      the backend, but the upcoming SCRAM authentication patches need strong
      random numbers in libpq as well.
      
      pg_strong_random() is based on, and replaces, the existing implementation
      in pgcrypto. It can acquire strong random numbers from a number of sources,
      depending on what's available:
      
      - OpenSSL RAND_bytes(), if built with OpenSSL
      - On Windows, the native cryptographic functions are used
      - /dev/urandom
      
      Unlike the current pgcrypto function, the source is chosen by configure.
      That makes it easier to test different implementations, and ensures that
      we don't accidentally fall back to a less secure implementation, if the
      primary source fails. All of those methods are quite reliable, it would be
      pretty surprising for them to fail, so we'd rather find out by failing
      hard.
      
      If no strong random source is available, we fall back to using erand48(),
      seeded from current timestamp, like PostmasterRandom() was. That isn't
      cryptographically secure, but allows us to still work on platforms that
      don't have any of the above stronger sources. Because it's not very secure,
      the built-in implementation is only used if explicitly requested with
      --disable-strong-random.
      
      This replaces the more complicated Fortuna algorithm we used to have in
      pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom,
      so it doesn't seem worth the maintenance effort to keep that. pgcrypto
      functions that require strong random numbers will be disabled with
      --disable-strong-random.
      
      Original patch by Magnus Hagander, tons of further work by Michael Paquier
      and me.
      
      Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com
      Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com
      fe0a0b59
    • Fujii Masao's avatar
      Fix incorrect output from gin_desc(). · 5dc851af
      Fujii Masao authored
      Previously gin_desc() displayed incorrect output "unknown action 0"
      for XLOG_GIN_INSERT and XLOG_GIN_VACUUM_DATA_LEAF_PAGE records with
      valid actions. The cause of this problem was that gin_desc() wrongly
      used XLogRecGetData() to extract data from those records.
      Since they were registered by XLogRegisterBufData(), gin_desc() should
      have used XLogRecGetBlockData(), instead, like gin_redo().
      Also there were other differences about how to treat XLOG_GIN_INSERT
      record between gin_desc() and gin_redo().
      
      This commit fixes gin_desc() routine so that it treats those records
      in the same way as gin_redo().
      
      Batch-patch to 9.5 where WAL record format was revamped and
      XLogRegisterBufData() was added.
      
      Reported-By: Andres Freund
      Reviewed-By: Tom Lane
      Discussion: <20160509194645.7lewnpw647zegx2m@alap3.anarazel.de>
      5dc851af
  4. 04 Dec, 2016 3 commits
    • Tom Lane's avatar
      Don't mess up pstate->p_next_resno in transformOnConflictClause(). · 38507232
      Tom Lane authored
      transformOnConflictClause incremented p_next_resno while generating the
      phony targetlist for the EXCLUDED pseudo-rel.  Then that field got
      incremented some more during transformTargetList, possibly leading to
      free_parsestate concluding that we'd overrun the allowed length of a tlist,
      as reported by Justin Pryzby.
      
      We could fix this by resetting p_next_resno to 1 after using it for the
      EXCLUDED pseudo-rel tlist, but it seems easier and less coupled to other
      places if we just don't use that field at all in this loop.  (Note that
      this doesn't change anything about the resnos that end up appearing in
      the main target list, because those are all replaced with target-column
      numbers by updateTargetListEntry.)
      
      In passing, fix incorrect type OID assigned to the whole-row Var for
      "EXCLUDED.*" (somehow this escaped having any bad consequences so far,
      but it's certainly wrong); remove useless assignment to var->location;
      pstrdup the column names in case of a relcache flush; and improve
      nearby comments.
      
      Back-patch to 9.5 where ON CONFLICT was introduced.
      
      Report: https://postgr.es/m/20161204163237.GA8030@telsasoft.com
      38507232
    • Noah Misch's avatar
      Document recipe for testing compatibility with old Perl. · d61aa6ae
      Noah Misch authored
      Craig Ringer, reviewed by Kyotaro HORIGUCHI and Michael Paquier.
      d61aa6ae
    • Noah Misch's avatar
      Make pgwin32_putenv() probe every known CRT, regardless of compiler. · 54aa6ccf
      Noah Misch authored
      This extends to MinGW builds the provision for MSVC-built libraries to
      see putenv() effects.  Doing so repairs, for example, the handling of
      the krb_server_keyfile parameter when linked with MSVC-built MIT
      Kerberos.  Like the previous commit, no back-patch.
      54aa6ccf
  5. 03 Dec, 2016 4 commits
    • Noah Misch's avatar
      Make pgwin32_putenv() follow DLL loading and unloading. · 202dbdbe
      Noah Misch authored
      Until now, the first putenv() call of a given postgres.exe process would
      cache the set of loaded CRTs.  If a CRT unloaded after that call, the
      next putenv() would crash.  That risk was largely theoretical, because
      the first putenv() precedes all PostgreSQL-initiated module loading.
      However, this might explain bad interactions with antivirus and other
      software that injects threads asynchronously.  If an additional CRT
      loaded after the first putenv(), pgwin32_putenv() would not discover it.
      That CRT would have all environment changes predating its load, but it
      would not receive later PostgreSQL-initiated changes.  An additional CRT
      loading concurrently with the first putenv() might miss that change in
      addition to missing later changes.  Fix all those problems.  This
      removes the cache mechanism from pgwin32_putenv(); the cost, less than
      100 μs per backend startup, is negligible.
      
      No resulting misbehavior was known to be user-visible given the core
      distribution alone, but one can readily construct an affected extension
      module.  No back-patch given the lack of complaints and the potential
      for behavior changes in non-PostgreSQL code running in the backend.
      
      Christian Ullrich, reviewed by Michael Paquier.
      202dbdbe
    • Noah Misch's avatar
      Make pgwin32_putenv() visit debug CRTs. · 95b9b8a3
      Noah Misch authored
      This has no effect in the most conventional case, where no relevant DLL
      uses a debug build.  For an example where it does matter, given a debug
      build of MIT Kerberos, the krb_server_keyfile parameter usually had no
      effect.  Since nobody wants a Heisenbug, back-patch to 9.2 (all
      supported versions).
      
      Christian Ullrich, reviewed by Michael Paquier.
      95b9b8a3
    • Noah Misch's avatar
      Remove wrong CloseHandle() call. · b37da1e8
      Noah Misch authored
      In accordance with its own documentation, invoke CloseHandle() only when
      directed in the documentation for the function that furnished the
      handle.  GetModuleHandle() does not so direct.  We have been issuing
      this call only in the rare event that a CRT DLL contains no "_putenv"
      symbol, so lack of bug reports is uninformative.  Back-patch to 9.2 (all
      supported versions).
      
      Christian Ullrich, reviewed by Michael Paquier.
      b37da1e8
    • Noah Misch's avatar
      Refine win32env.c cosmetics. · a9d9208c
      Noah Misch authored
      Replace use of plain 0 as a null pointer constant.  In comments, update
      terminology and lessen redundancy.  Back-patch to 9.2 (all supported
      versions) for the convenience of back-patching the next two commits.
      
      Christian Ullrich and Noah Misch, reviewed (in earlier versions) by
      Michael Paquier.
      a9d9208c
  6. 02 Dec, 2016 9 commits
    • Tom Lane's avatar
      Fix broken wait-for-previous-process-to-exit loop in regression test. · 19fcc005
      Tom Lane authored
      Must do pg_stat_clear_snapshot() inside test's loop, or our snapshot of
      pg_stat_activity will never change :-(.  Thinko in b3427dad -- evidently
      my workstation never really iterated the loop in testing.  Per buildfarm.
      19fcc005
    • Robert Haas's avatar
      Fix thinko in b3427dad. · 767a9039
      Robert Haas authored
      767a9039
    • Tom Lane's avatar
      Delete deleteWhatDependsOn() in favor of more performDeletion() flag bits. · b3427dad
      Tom Lane authored
      deleteWhatDependsOn() had grown an uncomfortably large number of
      assumptions about what it's used for.  There are actually only two minor
      differences between what it does and what a regular performDeletion() call
      can do, so let's invent additional bits in performDeletion's existing flags
      argument that specify those behaviors, and get rid of deleteWhatDependsOn()
      as such.  (We'd probably have done it this way from the start, except that
      performDeletion didn't originally have a flags argument, IIRC.)
      
      Also, add a SKIP_EXTENSIONS flag bit that prevents ever recursing to an
      extension, and use that when dropping temporary objects at session end.
      This provides a more general solution to the problem addressed in a hacky
      way in commit 08dd23ce: if an extension script creates temp objects and
      forgets to remove them again, the whole extension went away when its
      contained temp objects were deleted.  The previous solution only covered
      temp relations, but this solves it for all object types.
      
      These changes require minor additions in dependency.c to pass the flags
      to subroutines that previously didn't get them, but it's still a net
      savings of code, and it seems cleaner than before.
      
      Having done this, revert the special-case code added in 08dd23ce that
      prevented addition of pg_depend records for temp table extension
      membership, because that caused its own oddities: dropping an extension
      that had created such a table didn't automatically remove the table,
      leading to a failure if the table had another dependency on the extension
      (such as use of an extension data type), or to a duplicate-name failure if
      you then tried to recreate the extension.  But we keep the part that
      prevents the pg_temp_nnn schema from becoming an extension member; we never
      want that to happen.  Add a regression test case covering these behaviors.
      
      Although this fixes some arguable bugs, we've heard few field complaints,
      and any such problems are easily worked around by explicitly dropping temp
      objects at the end of extension scripts (which seems like good practice
      anyway).  So I won't risk a back-patch.
      
      Discussion: https://postgr.es/m/e51f4311-f483-4dd0-1ccc-abec3c405110@BlueTreble.com
      b3427dad
    • Robert Haas's avatar
      Introduce dynamic shared memory areas. · 13df76a5
      Robert Haas authored
      Programmers discovered decades ago that it was useful to have a simple
      interface for allocating and freeing memory, which is why malloc() and
      free() were invented.  Unfortunately, those handy tools don't work
      with dynamic shared memory segments because those are specific to
      PostgreSQL and are not necessarily mapped at the same address in every
      cooperating process.  So invent our own allocator instead.  This makes
      it possible for processes cooperating as part of parallel query
      execution to allocate and free chunks of memory without having to
      reserve them prior to the start of execution.  It could also be used
      for longer lived objects; for example, we could consider storing data
      for pg_stat_statements or the stats collector in shared memory using
      these interfaces, rather than writing them to files.  Basically,
      anything that needs shared memory but can't predict in advance how
      much it's going to need might find this useful.
      
      Thomas Munro and Robert Haas.  The original code (of mine) on which
      Thomas based his work was actually designed to be a new backend-local
      memory allocator for PostgreSQL, but that hasn't gone anywhere - or
      not yet, anyway.  Thomas took that work and performed major
      refactoring and extensive modifications to make it work with dynamic
      shared memory, including the addition of appropriate locking.
      
      Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
      Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
      13df76a5
    • Robert Haas's avatar
      Management of free memory pages. · 13e14a78
      Robert Haas authored
      This is intended as infrastructure for a full-fledged allocator for
      dynamic shared memory.  The interface looks a bit like a real
      allocator, but only supports allocating and freeing memory in
      multiples of the 4kB page size.  Further, to free memory, you must
      know the size of the span you wish to free, in pages.  While these are
      make it unsuitable as an allocator in and of itself, it still serves
      as very useful scaffolding for a full-fledged allocator.
      
      Robert Haas and Thomas Munro.  This code is mostly the same as my 2014
      submission, but Thomas fixed quite a few bugs and made some changes to
      the interface.
      
      Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
      Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
      13e14a78
    • Robert Haas's avatar
      Add a crude facility for dealing with relative pointers. · fbc1c12a
      Robert Haas authored
      C doesn't have any sort of built-in understanding of a pointer
      relative to some arbitrary base address, but dynamic shared memory
      segments can be mapped at different addresses in different processes,
      so any sort of shared data structure stored within a dynamic shared
      memory segment can't use absolute pointers.  We could use something
      like Size to represent a relative pointer, but then the compiler
      provides no type-checking.  Use stupid macro tricks to get some
      type-checking.
      
      Patch originally by me.  Concept suggested by Andres Freund.  Recently
      resubmitted as part of Thomas Munro's work on dynamic shared memory
      allocation.
      
      Discussion: 20131205144434.GG12398@alap2.anarazel.de
      Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
      fbc1c12a
    • Robert Haas's avatar
      Clarify that pg_stat_activity.query has a length limit. · e63d4149
      Robert Haas authored
      There was always documentation of the GUC that controlled what the
      limit actually was, but previously the documentation of the field
      itself made no mention of that limit.
      
      Ian Barwick
      e63d4149
    • Alvaro Herrera's avatar
      Fix outdated comments · 5e5986b6
      Alvaro Herrera authored
      Commit 597a87cc neglected to update some comments; fix.
      
      Report and patch by Thomas Munro.
      Reviewed by Petr Jelínek.
      5e5986b6
    • Robert Haas's avatar
      Add max_parallel_workers GUC. · b460f5d6
      Robert Haas authored
      Increase the default value of the existing max_worker_processes GUC
      from 8 to 16, and add a new max_parallel_workers GUC with a maximum
      of 8.  This way, even if the maximum amount of parallel query is
      happening, there is still room for background workers that do other
      things, as originally envisioned when max_worker_processes was added.
      
      Julien Rouhaud, reviewed by Amit Kapila and by revised by me.
      b460f5d6