Commits · 4823621db312a0597c40686c4c94d47428889fef · Abuhujair Javed / Postgres FD Implementation

15 Jan, 2021 6 commits

Improve our heuristic for selecting PG_SYSROOT on macOS. · 4823621d

Tom Lane authored Jan 15, 2021

In cases where Xcode is newer than the underlying macOS version,
asking xcodebuild for the SDK path will produce a pointer to the
SDK shipped with Xcode, which may end up building code that does
not work on the underlying macOS version. It appears that in
such cases, xcodebuild's answer also fails to match the default
behavior of Apple's compiler: assuming one has installed Xcode's
"command line tools", there will be an SDK for the OS's own version
in /Library/Developer/CommandLineTools, and the compiler will
default to using that. This is all pretty poorly documented,
but experimentation suggests that "xcrun --show-sdk-path" gives
the sysroot path that the compiler is actually using, at least
in some cases. Hence, try that first, but revert to xcodebuild
if xcrun fails (in very old Xcode, it is missing or lacks the
--show-sdk-path switch).

Also, "xcrun --show-sdk-path" may give a path that is valid but lacks
any OS version identifier. We don't really want that, since most
of the motivation for wiring -isysroot into the build flags at all
is to ensure that all parts of a PG installation are built against
the same SDK, even when considering extensions built later and/or on
a different machine. Insist on finding "N.N" in the directory name
before accepting the result. (Adding "--sdk macosx" to the xcrun
call seems to produce the same answer as xcodebuild, but usually
more quickly because it's cached, so we also try that as a fallback.)

The core reason why we don't want to use Xcode's default SDK in cases
like this is that Apple's technology for introducing new syscalls
does not play nice with Autoconf: for example, configure will think
that preadv/pwritev exist when using a Big Sur SDK, even when building
on an older macOS version where they don't exist. It'd be nice to
have a better solution to that problem, but this patch doesn't attempt
to fix that.

Per report from Sergey Shinderuk. Back-patch to all supported versions.

Discussion: https://postgr.es/m/ed3b8e5d-0da8-6ebd-fd1c-e0ac80a4b204@postgrespro.ru

4823621d

Avoid spurious wait in concurrent reindex · f9900df5

Alvaro Herrera authored Jan 15, 2021

This is like commit c98763bf, but for REINDEX CONCURRENTLY. To wit:
this flags indicates that the current process is safe to ignore for the
purposes of waiting for other snapshots, when doing CREATE INDEX
CONCURRENTLY and REINDEX CONCURRENTLY. This helps two processes doing
either of those things not deadlock, and also avoids spurious waits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql

f9900df5

Fix calculation of how much shared memory is required to store a TOC. · 2ad78a87

Fujii Masao authored Jan 15, 2021

Commit ac883ac4 refactored shm_toc_estimate() but changed its calculation
of shared memory size for TOC incorrectly. Previously this could cause too
large memory to be allocated.

Back-patch to v11 where the bug was introduced.

Author: Takayuki Tsunakawa
Discussion: https://postgr.es/m/TYAPR01MB2990BFB73170E2C4921E2C4DFEA80@TYAPR01MB2990.jpnprd01.prod.outlook.com

2ad78a87

Remove PG_SHA*_DIGEST_STRING_LENGTH from sha2.h · ccf4e277

Michael Paquier authored Jan 15, 2021

The last reference to those variables has been removed in aef8948f, so
this cleans up a bit the code.

Discussion: https://postgr.es/m/X//ggAqmTtt+3t7X@paquier.xyz

ccf4e277

Fix O(N^2) stat() calls when recycling WAL segments · 5ae15729

Michael Paquier authored Jan 15, 2021

The counter tracking the last segment number recycled was getting
initialized when recycling one single segment, while it should be used
across a full cycle of segments recycled to prevent useless checks
related to entries already recycled.

This performance issue has been introduced by b2a5545b, and it was first
implemented in 61b86142.

No backpatch is done per the lack of field complaints.

Reported-by: Andres Freund, Thomas Munro
Author: Michael Paquier
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170621211016.eln6cxxp3jrv7m4m@alap3.anarazel.de
Discussion: https://postgr.es/m/CA+hUKG+DRiF9z1_MU4fWq+RfJMxP7zjoptfcmuCFPeO4JM2iVg@mail.gmail.com

5ae15729

postgres_fdw: Save foreign server OID in connection cache entry. · 5e5f4fcd

Fujii Masao authored Jan 15, 2021

The foreign server OID stored in the connection cache entry is used as
a lookup key to directly get the server name.

Previously since the connection cache entry did not have the server OID,
postgres_fdw had to get the server OID at first from user mapping before
getting the server name. So if the corresponding user mapping was dropped,
postgres_fdw could raise the error "cache lookup failed for user mapping"
while looking up user mapping and fail to get the server name even though
the server had not been dropped yet.

Author: Bharath Rupireddy
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CALj2ACVRZPUB7ZwqLn-6DY8C_UmPs6084gSpHA92YBv++1AJXA@mail.gmail.com

5e5f4fcd

14 Jan, 2021 7 commits

pg_dump: label PUBLICATION TABLE ArchiveEntries with an owner. · 8e396a77

Tom Lane authored Jan 14, 2021

This is the same fix as commit 9eabfe30 applied to INDEX ATTACH
entries, but for table-to-publication attachments.  As in that
case, even though the backend doesn't record "ownership" of the
attachment, we still ought to label it in the dump archive with
the role name that should run the ALTER PUBLICATION command.
The existing behavior causes the ALTER to be done by the original
role that started the restore; that will usually work fine, but
there may be corner cases where it fails.

The bulk of the patch is concerned with changing struct
PublicationRelInfo to include a pointer to the associated
PublicationInfo object, so that we can get the owner's name
out of that when the time comes.  While at it, I rewrote
getPublicationTables() to do just one query of pg_publication_rel,
not one per table.

Back-patch to v10 where this code was introduced.

Discussion: https://postgr.es/m/1165710.1610473242@sss.pgh.pa.us

8e396a77

Prevent drop of tablespaces used by partitioned relations · ebfe2dbd

Alvaro Herrera authored Jan 14, 2021

When a tablespace is used in a partitioned relation (per commits
ca410302 in pg12 for tables and 33e6c34c3267 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems.  One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.

Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace if we see that the tablespace is referenced from any
partitioned relations.

Backpatch this to 11, where this problem has been latent all along.  We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.

Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
correct.

It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace.  Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.

This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>

ebfe2dbd

Stabilize timeline switch regression test. · 424d7a9b

Fujii Masao authored Jan 14, 2021

Commit fef5b47f added the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.
But the buildfarm member florican reported that this test failed because
the requested WAL segment was removed and replication failed. This is a
timing issue. Since neither replication slot is used nor wal_keep_size is set
in the test, checkpoint could remove the WAL segment that's still necessary
for replication.

This commit stabilizes the test by setting wal_keep_size.

Back-patch to v13 where the regression test that this commit stabilizes
was added.

Author: Fujii Masao
Discussion: https://postgr.es/m/X//PsenxcC50jDzX@paquier.xyz

424d7a9b

Improve tab-completion for CLOSE, DECLARE, FETCH and MOVE. · 3f238b88

Fujii Masao authored Jan 14, 2021

This commit makes CLOSE, FETCH and MOVE commands tab-complete the list of
cursors. Also this commit makes DECLARE command tab-complete the options.

Author: Shinya Kato, Sawada Masahiko, tweaked by Fujii Masao
Reviewed-by: Shinya Kato, Sawada Masahiko, Fujii Masao
Discussion: https://postgr.es/m/b0e4c5c53ef84c5395524f5056fc71f0@MP-MSGSS-MBX001.msg.nttdata.co.jp

3f238b88

Minor header cleanup for the new iovec code. · fb29ab26

Thomas Munro authored Jan 14, 2021

Remove redundant function declaration and improve header comment in
pg_iovec.h.  Move the new declaration in fd.h next to a group of more
similar functions.

fb29ab26

Ensure that a standby is able to follow a primary on a newer timeline. · fef5b47f

Fujii Masao authored Jan 14, 2021

Commit 709d003f refactored WAL-reading code, but accidentally caused
WalSndSegmentOpen() to fail to follow a timeline switch while reading from
a historic timeline. This issue caused a standby to fail to follow a primary
on a newer timeline when WAL archiving is enabled.

If there is a timeline switch within the segment, WalSndSegmentOpen() should
read from the WAL segment belonging to the new timeline. But previously
since it failed to follow a timeline switch, it tried to read the WAL segment
with old timeline. When WAL archiving is enabled, that WAL segment with
old timeline doesn't exist because it's renamed to .partial. This leads
a primary to have tried to read non-existent WAL segment, and which caused
replication to faill with the error "ERROR: requested WAL segment ... has
already been removed".

This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
switch, to ensure that a standby is able to follow a primary on a newer
timeline even when WAL archiving is enabled.

This commit also adds the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.

Back-patch to v13 where the bug was introduced.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi, tweaked by Fujii Masao
Reviewed-by: Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com

fef5b47f

Rework refactoring of hex and encoding routines · aef8948f

Michael Paquier authored Jan 14, 2021

This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected.  Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing.  This is moved to src/common/ and
it gains the same overflow checks as the decoding part.

On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries.  Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.

While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection.  The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code.  Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks.  The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.

Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us

aef8948f

13 Jan, 2021 18 commits

Move our p{read,write}v replacements into their own files. · 0d56acfb

Thomas Munro authored Jan 14, 2021

macOS's ranlib issued a warning about an empty pread.o file with the
previous arrangement, on systems new enough to require no replacement
functions. Let's go back to using configure's AC_REPLACE_FUNCS system
to build and include each .o in the library only if it's needed, which
requires moving the *v() functions to their own files.

Also move the _with_retry() wrapper to a more permanent home.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us

0d56acfb

Mark inet_server_addr() and inet_server_port() as parallel-restricted. · 5a6f9bce

Tom Lane authored Jan 13, 2021

These need to be PR because they access the MyProcPort data structure,
which doesn't get copied to parallel workers.  The very similar
functions inet_client_addr() and inet_client_port() are already
marked PR, but somebody missed these.

Although this is a pre-existing bug, we can't readily fix it in the back
branches since we can't force initdb.  Given the small usage of these
two functions, and the even smaller likelihood that they'd get pushed to
a parallel worker anyway, it doesn't seem worth the trouble to suggest
that DBAs should fix it manually.

Masahiko Sawada

Discussion: https://postgr.es/m/CAD21AoAT4aHP0Uxq91qpD7NL009tnUYQe-b14R3MnSVOjtE71g@mail.gmail.com

5a6f9bce

Run reformat-dat-files to declutter the catalog data files. · 8b411b8f

Tom Lane authored Jan 13, 2021

Things had gotten pretty messy here, apparently mostly but not
entirely the fault of the multirange patch.  No functional changes.

8b411b8f

Doc, more or less: uncomment tutorial example that was fixed long ago. · dce62490

Tom Lane authored Jan 13, 2021

Reverts a portion of commit 344190b7.  Apparently, back in the
twentieth century we had some issues with multi-statement SQL
functions, but they've worked fine for a long time.

Daniel Westermann

Discussion: https://postgr.es/m/GVAP278MB04242DCBF5E31F528D53FA18D2A90@GVAP278MB0424.CHEP278.PROD.OUTLOOK.COM

dce62490

Call out vacuum considerations in create index docs · 93c39f98

Alvaro Herrera authored Jan 13, 2021

Backpatch to pg12, which is as far as it goes without conflicts.

Author: James Coleman <jtc331@gmail.com>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CAAaqYe9oEfbz7AxXq7OX+FFVi5w5p1e_Of8ON8ZnKO9QqBfmjg@mail.gmail.com

93c39f98

Disallow a digit as the first character of a variable name in pgbench. · c21ea4d5

Tom Lane authored Jan 13, 2021

The point of this restriction is to avoid trying to substitute variables
into timestamp literal values, which may contain strings like '12:34'.

There is a good deal more that should be done to reduce pgbench's
tendency to substitute where it shouldn't.  But this is sufficient to
solve the case complained of by Jaime Soler, and it's simple enough
to back-patch.

Back-patch to v11; before commit 9d36a386, pgbench had a slightly
different definition of what a variable name is, and anyway it seems
unwise to change long-stable branches for this.

Fabien Coelho

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2006291740420.805678@pseudo

c21ea4d5

Fix test failure with wal_level=minimal. · 5abca4b1

Heikki Linnakangas authored Jan 13, 2021

The newly-added gist pageinspect test prints the LSNs of GiST pages,
expecting them all to be 1 (GistBuildLSN). But with wal_level=minimal,
they got updated by the whole-relation WAL-logging at commit. Fix by
wrapping the problematic tests in the same transaction with the CREATE
INDEX.

Per buildfarm failure on thorntail.

Discussion: https://www.postgresql.org/message-id/3B4F97E5-40FB-4142-8CAA-B301CDFBF982%40iki.fi

5abca4b1

Doc: clarify behavior of back-half options in pg_dump. · 06ed235a

Tom Lane authored Jan 13, 2021

Options that change how the archive data is converted to SQL text
are ignored when dumping to archive formats. The documentation
previously said "not meaningful", which is not helpful.

Discussion: https://postgr.es/m/161052021249.12228.9598689907884726185@wrigleys.postgresql.org

06ed235a

Enhance nbtree index tuple deletion. · d168b666

Peter Geoghegan authored Jan 13, 2021

Teach nbtree and heapam to cooperate in order to eagerly remove
duplicate tuples representing dead MVCC versions. This is "bottom-up
deletion". Each bottom-up deletion pass is triggered lazily in response
to a flood of versions on an nbtree leaf page. This usually involves a
"logically unchanged index" hint (these are produced by the executor
mechanism added by commit 9dc718bd).

The immediate goal of bottom-up index deletion is to avoid "unnecessary"
page splits caused entirely by version duplicates. It naturally has an
even more useful effect, though: it acts as a backstop against
accumulating an excessive number of index tuple versions for any given
_logical row_. Bottom-up index deletion complements what we might now
call "top-down index deletion": index vacuuming performed by VACUUM.
Bottom-up index deletion responds to the immediate local needs of
queries, while leaving it up to autovacuum to perform infrequent clean
sweeps of the index. The overall effect is to avoid certain
pathological performance issues related to "version churn" from UPDATEs.

The previous tableam interface used by index AMs to perform tuple
deletion (the table_compute_xid_horizon_for_tuples() function) has been
replaced with a new interface that supports certain new requirements.
Many (perhaps all) of the capabilities added to nbtree by this commit
could also be extended to other index AMs. That is left as work for a
later commit.

Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
to consider extra index tuples (that are not LP_DEAD-marked) for
deletion in passing. This increases the number of index tuples deleted
significantly in many cases. The LP_DEAD deletion process (which is now
called "simple deletion" to clearly distinguish it from bottom-up
deletion) won't usually need to visit any extra table blocks to check
these extra tuples. We have to visit the same table blocks anyway to
generate a latestRemovedXid value (at least in the common case where the
index deletion operation's WAL record needs such a value).

Testing has shown that the "extra tuples" simple deletion enhancement
increases the number of index tuples deleted with almost any workload
that has LP_DEAD bits set in leaf pages. That is, it almost never fails
to delete at least a few extra index tuples. It helps most of all in
cases that happen to naturally have a lot of delete-safe tuples. It's
not uncommon for an individual deletion operation to end up deleting an
order of magnitude more index tuples compared to the old naive approach
(e.g., custom instrumentation of the patch shows that this happens
fairly often when the regression tests are run).

Add a further enhancement that augments simple deletion and bottom-up
deletion in indexes that make use of deduplication: Teach nbtree's
_bt_delitems_delete() function to support granular TID deletion in
posting list tuples. It is now possible to delete individual TIDs from
posting list tuples provided the TIDs have a tableam block number of a
table block that gets visited as part of the deletion process (visiting
the table block can be triggered directly or indirectly). Setting the
LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
but that matters much less now that deletion only needs to start out
with the right _general_ idea about which index tuples are deletable.

Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.

No bump in BTREE_VERSION, since there are no changes to the on-disk
representation of nbtree indexes. Indexes built on PostgreSQL 12 or
PostgreSQL 13 will automatically benefit from bottom-up index deletion
(i.e. no reindexing required) following a pg_upgrade. The enhancement
to simple deletion is available with all B-Tree indexes following a
pg_upgrade, no matter what PostgreSQL version the user upgrades from.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com

d168b666

Pass down "logically unchanged index" hint. · 9dc718bd

Peter Geoghegan authored Jan 13, 2021

Add an executor aminsert() hint mechanism that informs index AMs that
the incoming index tuple (the tuple that accompanies the hint) is not
being inserted by execution of an SQL statement that logically modifies
any of the index's key columns.

The hint is received by indexes when an UPDATE takes place that does not
apply an optimization like heapam's HOT (though only for indexes where
all key columns are logically unchanged). Any index tuple that receives
the hint on insert is expected to be a duplicate of at least one
existing older version that is needed for the same logical row. Related
versions will typically be stored on the same index page, at least
within index AMs that apply the hint.

Recognizing the difference between MVCC version churn duplicates and
true logical row duplicates at the index AM level can help with cleanup
of garbage index tuples. Cleanup can intelligently target tuples that
are likely to be garbage, without wasting too many cycles on less
promising tuples/pages (index pages with little or no version churn).

This is infrastructure for an upcoming commit that will teach nbtree to
perform bottom-up index deletion. No index AM actually applies the hint
just yet.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com

9dc718bd

Log long wait time on recovery conflict when it's resolved. · 39b03690

Fujii Masao authored Jan 13, 2021

This is a follow-up of the work done in commit 0650ff23. This commit
extends log_recovery_conflict_waits so that a log message is produced
also when recovery conflict has already been resolved after deadlock_timeout
passes, i.e., when the startup process finishes waiting for recovery
conflict after deadlock_timeout. This is useful in investigating how long
recovery conflicts prevented the recovery from applying WAL.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Bertrand Drouvot
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com

39b03690

Fix portability issues in the new gist pageinspect test. · 6ecaaf81

Heikki Linnakangas authored Jan 13, 2021

1. The raw bytea representation of the point-type keys used in the test
depends on endianess. Remove the raw key_data column from the test.

2. The items stored on non-leftmost gist page depends on how many items
git on the other pages. This showed up as a failure on 32-bit i386
systems. To fix, only test the gist_page_items() function on the
leftmost leaf page.

Per Andrey Borodin and the buildfarm.

Discussion: https://www.postgresql.org/message-id/9FCEC1DC-86FB-4A57-88EF-DD13663B36AF%40yandex-team.ru

6ecaaf81

Remove incorrect markup · e6eeb8d7

Magnus Hagander authored Jan 13, 2021

Seems 737d69ff made a copy/paste or automation error resulting in two
extra right-parenthesis.

Reported-By: Michael Vastola
Backpatch-through: 13
Discussion: https://postgr.es/m/161051035421.12224.1741822783166533529@wrigleys.postgresql.org

e6eeb8d7

Add functions to 'pageinspect' to inspect GiST indexes. · 756ab291

Heikki Linnakangas authored Jan 13, 2021

Author: Andrey Borodin and me
Discussion: https://www.postgresql.org/message-id/3E4F9093-A1B5-4DF8-A292-0B48692E3954%40yandex-team.ru

756ab291

Don't use elog() in src/port/pwrite.c. · df10ac62

Thomas Munro authored Jan 13, 2021

Nothing broke because of this oversight yet, but it would fail to link
if we tried to use pg_pwrite() in frontend code on a system that lacks
pwrite(). Use an assertion instead. Also pgindent while here.

Discussion: https://postgr.es/m/CA%2BhUKGL57RvoQsS35TVPnQoPYqbtBixsdRhynB8NpcUKpHTTtg%40mail.gmail.com

df10ac62

Fix memory leak in SnapBuildSerialize. · ee1b38f6

Amit Kapila authored Jan 13, 2021

The memory for the snapshot was leaked while serializing it to disk during
logical decoding. This memory will be freed only once walsender stops
streaming the changes. This can lead to a huge memory increase when master
logs Standby Snapshot too frequently say when the user is trying to create
many replication slots.

Reported-by: funnyxj.fxj@alibaba-inc.com
Diagnosed-by: funnyxj.fxj@alibaba-inc.com
Author: Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/033ab54c-6393-42ee-8ec9-2b399b5d8cde.funnyxj.fxj@alibaba-inc.com

ee1b38f6

Optimize DropRelFileNodesAllBuffers() for recovery. · bea449c6

Amit Kapila authored Jan 13, 2021

Similar to commit d6ad34f3, this patch optimizes
DropRelFileNodesAllBuffers() by avoiding the complete buffer pool scan and
instead find the buffers to be invalidated by doing lookups in the
BufMapping table.

This optimization helps operations where the relation files need to be
removed like Truncate, Drop, Abort of Create Table, etc.

Author: Kirk Jamison
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com

bea449c6

Fix routine name in comment of catcache.c · fce7d0e6

Michael Paquier authored Jan 13, 2021

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACUDXLAkf_XxQO9tAUtnTNGi3Lmd8fANd+vBJbcHn1HoWA@mail.gmail.com

fce7d0e6

12 Jan, 2021 8 commits

Invent struct ReindexIndexInfo · c6c4b373

Alvaro Herrera authored Jan 12, 2021

This struct is used by ReindexRelationConcurrently to keep track of the
relations to process.  This saves having to obtain some data repeatedly,
and has future uses as well.
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql

c6c4b373

pg_dump: label INDEX ATTACH ArchiveEntries with an owner. · 9eabfe30

Tom Lane authored Jan 12, 2021

Although a partitioned index's attachment to its parent doesn't
have separate ownership, the ArchiveEntry for it needs to be
marked with an owner anyway, to ensure that the ALTER command
is run by the appropriate role when restoring with
--use-set-session-authorization.  Without this, the ALTER will
be run by the role that started the restore session, which will
usually work but it's formally the wrong thing.

Back-patch to v11 where this type of ArchiveEntry was added.
In HEAD, add equivalent commentary to the just-added TABLE ATTACH
case, which I'd made do the right thing already.

Discussion: https://postgr.es/m/1094034.1610418498@sss.pgh.pa.us

9eabfe30

Doc: fix description of privileges needed for ALTER PUBLICATION. · cc865c0f

Tom Lane authored Jan 12, 2021

Adding a table to a publication requires ownership of the table
(in addition to ownership of the publication).  This was mentioned
nowhere.

cc865c0f

Fix thinko in comment · a3e51a36

Alvaro Herrera authored Jan 12, 2021

This comment has been wrong since its introduction in commit
2c03216d.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com

a3e51a36

Fix relation descriptor leak. · 044aa9e7

Amit Kapila authored Jan 12, 2021

We missed closing the relation descriptor while sending changes via the
root of partitioned relations during logical replication.

Author: Amit Langote and Mark Zhao
Reviewed-by: Amit Kapila and Ashutosh Bapat
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
Discussion: https://postgr.es/m/tencent_6E296D2F7D70AFC90D83353B69187C3AA507@qq.com

044aa9e7

Optimize DropRelFileNodeBuffers() for recovery. · d6ad34f3

Amit Kapila authored Jan 12, 2021

The recovery path of DropRelFileNodeBuffers() is optimized so that
scanning of the whole buffer pool can be avoided when the number of
blocks to be truncated in a relation is below a certain threshold. For
such cases, we find the buffers by doing lookups in BufMapping table.
This improves the performance by more than 100 times in many cases
when several small tables (tested with 1000 relations) are truncated
and where the server is configured with a large value of shared
buffers (greater than equal to 100GB).

This optimization helps cases (a) when vacuum or autovacuum truncated off
any of the empty pages at the end of a relation, or (b) when the relation is
truncated in the same transaction in which it was created.

This commit introduces a new API smgrnblocks_cached which returns a cached
value for the number of blocks in a relation fork. This helps us to determine
the exact size of relation which is required to apply this optimization. The
exact size is required to ensure that we don't leave any buffer for the
relation being dropped as otherwise the background writer or checkpointer
can lead to a PANIC error while flushing buffers corresponding to files that
don't exist.

Author: Kirk Jamison based on ideas by Amit Kapila
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com

d6ad34f3

Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry. · 9a4c0e36

Tom Lane authored Jan 11, 2021

Previously, we emitted the ATTACH PARTITION command as part of
the child table's ArchiveEntry.  This was a poor choice since it
complicates restoring the partition as a standalone table; you have
to ignore the error from the ATTACH, which isn't even an option when
restoring direct-to-database with pg_restore.  (pg_restore will issue
the whole ArchiveEntry as one PQexec, so that any error rolls back
the table creation as well.)  Hence, separate it out as its own
ArchiveEntry, as indeed we already did for index ATTACH PARTITION
commands.

Justin Pryzby

Discussion: https://postgr.es/m/20201023052940.GE9241@telsasoft.com

9a4c0e36

Make pg_dump's table of object-type priorities more maintainable. · d5ab79d8

Tom Lane authored Jan 11, 2021

Wedging a new object type into this table has historically required
manually renumbering a lot of existing entries. (Although it appears
that some people got lazy and re-used the priority level of an
existing object type, even if it wasn't particularly related.)
We can let the compiler do the counting by inventing an enum type that
lists the desired priority levels in order. Now, if you want to add
or remove a priority level, that's a one-liner.

This patch is not purely cosmetic, because I split apart the priorities
of DO_COLLATION and DO_TRANSFORM, as well as those of DO_ACCESS_METHOD
and DO_OPERATOR, which look to me to have been merged out of expediency
rather than because it was a good idea. Shell types continue to be
sorted interchangeably with full types, and opclasses interchangeably
with opfamilies.

d5ab79d8

11 Jan, 2021 1 commit

Fix function prototypes in dependency.h. · f315205f

Thomas Munro authored Jan 12, 2021

Commit 257836a7 accidentally deleted a couple of
redundant-but-conventional "extern" keywords on function prototypes.
Put them back.
Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

f315205f