Commits · b8b2e3b2deeaab19715af063fc009b7c230b2336 · Abuhujair Javed / Postgres FD Implementation

24 Jun, 2012 7 commits

Replace int2/int4 in C code with int16/int32 · b8b2e3b2

Peter Eisentraut authored 12 years ago

The latter was already the dominant use, and it's preferable because
in C the convention is that intXX means XX bits. Therefore, allowing
mixed use of int2, int4, int8, int16, int32 is obviously confusing.

Remove the typedefs for int2 and int4 for now. They don't seem to be
widely used outside of the PostgreSQL source tree, and the few uses
can probably be cleaned up by the time this ships.
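
A minimal illustration of the bit-counting convention. It builds the names on `<stdint.h>`, which is an assumption of this sketch and not how PostgreSQL's own headers define them:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Illustrative typedefs, not PostgreSQL's actual definitions:
 * the suffix counts bits, so int16 is 16 bits and int32 is 32 bits. */
typedef int16_t int16;
typedef int32_t int32;
typedef int64_t int64;

/* Compile-time checks that each name matches its bit width. */
_Static_assert(sizeof(int16) * CHAR_BIT == 16, "int16 must be 16 bits");
_Static_assert(sizeof(int32) * CHAR_BIT == 32, "int32 must be 32 bits");
_Static_assert(sizeof(int64) * CHAR_BIT == 64, "int64 must be 64 bits");

/* The removed names counted bytes instead: int2 meant a 2-byte (16-bit)
 * integer and int4 a 4-byte (32-bit) one, which is what made mixing
 * them with the bit-counting names confusing. */
```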

b8b2e3b2

Use UINT64CONST for 64-bit integer constants. · 0687a260

Heikki Linnakangas authored 12 years ago

Peter Eisentraut advised me that UINT64CONST is the proper way to do that,
not the LL suffix.

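
A simplified sketch of such a macro, assuming a compiler that accepts the `ULL` suffix. The definition PostgreSQL actually uses is selected at configure time, so treat this form as illustrative; the point is that call sites say "64-bit unsigned constant" once and the macro supplies whatever suffix the platform needs:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t uint64;

/* Simplified: paste the ULL suffix onto the literal and cast to uint64.
 * A platform without ULL support could instead define it as ((uint64) x). */
#define UINT64CONST(x) ((uint64) x##ULL)

/* A 64-bit constant that does not fit in 32 bits. */
static const uint64 four_gb = UINT64CONST(0x100000000);
```
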
0687a260

Use LL suffix for 64-bit constants. · 96ff85e2

Heikki Linnakangas authored 12 years ago

Per warning from buildfarm member 'locust'. At least I think this is what's
making it upset.

96ff85e2

Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4

Heikki Linnakangas authored 12 years ago

This simplifies code that needs to do arithmetic on XLogRecPtrs.

To avoid changing on-disk format of data pages, the LSN on data pages is
still stored in the old format. That should keep pg_upgrade happy. However,
we have XLogRecPtrs embedded in the control file, and in the structs that
are sent over the replication protocol, so this change breaks compatibility
between pg_basebackup and the server. I didn't do anything about this in this
patch; per discussion on -hackers, the right thing to do would be to change the
replication protocol to be architecture-independent, so that you could use
a newer version of pg_receivexlog, for example, against an older server
version.
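
A sketch of why a plain 64-bit integer simplifies arithmetic, and of the conversion still needed for the old on-page LSN format. The struct `PageLSN` and both helper functions are hypothetical names chosen for illustration, not the patch's code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t uint32;
typedef uint64_t uint64;

/* New in-memory representation: one 64-bit byte position. */
typedef uint64 XLogRecPtr;

/* Old two-part layout, still used for the LSN stored on data pages. */
typedef struct
{
    uint32 xlogid;    /* high 32 bits: logical log file number */
    uint32 xrecoff;   /* low 32 bits: byte offset within that file */
} PageLSN;

/* Widen the on-page form to the 64-bit form. */
static XLogRecPtr page_lsn_to_ptr(PageLSN p)
{
    return ((uint64) p.xlogid << 32) | p.xrecoff;
}

/* Split the 64-bit form back into the on-page layout. */
static PageLSN ptr_to_page_lsn(XLogRecPtr ptr)
{
    PageLSN p;

    p.xlogid = (uint32) (ptr >> 32);
    p.xrecoff = (uint32) ptr;
    return p;
}
```

With the two-part form, advancing a position across a 4 GB boundary needed explicit carry handling; with an integer it is ordinary addition and subtraction, and only the code that reads or writes page LSNs has to convert.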

0ab9d1c4

Allow WAL record header to be split across pages. · 061e7efb

Heikki Linnakangas authored 12 years ago

This saves a few bytes of WAL space, but the real motivation is to make it
predictable how much WAL space a record requires, as it no longer depends
on whether we need to waste the last few bytes at end of WAL page because
the header doesn't fit.

The total length field of WAL record, xl_tot_len, is moved to the beginning
of the WAL record header, so that it is still always found on the first page
where a WAL record begins.
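
A sketch of how a reader can rely on the total length being on the first page even when the rest of the header continues on the next one. The `RecordHeader` layout and helper names are hypothetical, and the sketch assumes, as the change requires, that the first page always holds at least the 4-byte length field:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t uint32;

/* Hypothetical record header: only the position of xl_tot_len matters. */
typedef struct
{
    uint32 xl_tot_len;   /* total record length, placed first */
    uint32 xl_xid;       /* remaining fields are illustrative */
    uint32 xl_len;
    uint32 xl_prev_lo;
} RecordHeader;

_Static_assert(offsetof(RecordHeader, xl_tot_len) == 0,
               "total length must be the first field");

/* Read the total length using only the header bytes on the first page. */
static uint32 read_tot_len(const unsigned char *first_frag, size_t first_len)
{
    uint32 tot_len;

    assert(first_len >= sizeof(uint32));   /* assumption stated above */
    memcpy(&tot_len, first_frag, sizeof(uint32));
    return tot_len;
}

/* Reassemble a header split across two pages at first_len bytes. */
static RecordHeader assemble_header(const unsigned char *first_frag,
                                    size_t first_len,
                                    const unsigned char *second_frag)
{
    RecordHeader hdr;
    unsigned char *dst = (unsigned char *) &hdr;

    assert(first_len <= sizeof(RecordHeader));
    memcpy(dst, first_frag, first_len);
    memcpy(dst + first_len, second_frag, sizeof(RecordHeader) - first_len);
    return hdr;
}
```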

Bump WAL version number again as this is an incompatible change.

061e7efb

Move WAL continuation record information to WAL page header. · 20ba5ca6

Heikki Linnakangas authored 12 years ago

The continuation record only contained one field, xl_rem_len, so it makes
things simpler to just include it in the WAL page header. This wastes four
bytes on pages that don't begin with a continuation from the previous page, plus
four bytes on every page, because of padding.

The motivation for this is to make it easier to calculate how much space a
WAL record needs. Before this patch, it depended on how many page boundaries
the record crossed. The motivation for that, in turn, is to separate the
allocation of space in the WAL from the copying of the record data to the
allocated space. Keeping the calculation of space required simple helps to
keep the critical section of allocating the space from WAL short. But that's
not included in this patch yet.
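
With the continuation length folded into a fixed-size page header, the space a record consumes depends only on its length, not on where it lands. A sketch under stated assumptions: 8 kB pages, a uniform 24-byte page header (ignoring the longer header at the start of each segment), and hypothetical helper names. Positions are tracked in "usable bytes" that skip page headers, so reserving space reduces to a single addition, the kind of short critical section the message anticipates:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t uint64;

/* Assumed sizes for illustration only. */
#define WAL_PAGE_SIZE   8192
#define PAGE_HDR_SIZE   24
#define USABLE_PER_PAGE (WAL_PAGE_SIZE - PAGE_HDR_SIZE)

/* Convert a position counted in usable bytes (page headers excluded)
 * into a physical byte position in the WAL stream. */
static uint64 usable_to_physical(uint64 usable_pos)
{
    uint64 page = usable_pos / USABLE_PER_PAGE;
    uint64 offset = usable_pos % USABLE_PER_PAGE;

    return page * WAL_PAGE_SIZE + PAGE_HDR_SIZE + offset;
}

/* Reserve space for a record: the cursor advances by the record length
 * alone, however many page boundaries the record ends up crossing.
 * Returns the usable-byte position where the record starts. */
static uint64 reserve(uint64 *cursor, uint64 record_len)
{
    uint64 start = *cursor;

    *cursor += record_len;
    return start;
}
```

Copying the record data into the reserved range can then happen outside the critical section, converting positions with `usable_to_physical()` as needed.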

Bump WAL version number again, as this is an incompatible change.

20ba5ca6

Don't waste the last segment of each 4GB logical log file. · dfda6eba

Heikki Linnakangas authored 12 years ago

The comments claimed that wasting the last segment made it easier to do
calculations with XLogRecPtrs, because you don't have problems representing
last-byte-position-plus-1 that way. In my experience, however, it only made
things more complicated, because there were two ways to represent the
boundary at the beginning of a logical log file: xlogid = n+1 and xrecoff = 0,
or xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were
picky about which representation was used.

Also, use a 64-bit segment number instead of the log/seg combination, to
point to a certain WAL segment. We assume that all platforms have a working
64-bit integer type nowadays.
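
A sketch of segment arithmetic with a single 64-bit position, assuming the default 16 MB segment size and hypothetical function names. Because no segment is skipped, the 4 GB boundary has exactly one representation:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t uint64;
typedef uint64 XLogRecPtr;
typedef uint64 XLogSegNo;

/* Assumed segment size: the 16 MB default. */
#define SEG_SIZE ((uint64) 16 * 1024 * 1024)

/* Segment that contains a given WAL position. */
static XLogSegNo ptr_to_segno(XLogRecPtr ptr)
{
    return ptr / SEG_SIZE;
}

/* First byte of a given segment. */
static XLogRecPtr segno_to_ptr(XLogSegNo segno)
{
    return segno * SEG_SIZE;
}
```

With 16 MB segments there are 256 per 4 GB; the last of them (number 255 in the first 4 GB) is now used rather than skipped.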

This is an incompatible change in WAL format, so bumping WAL version number.

dfda6eba

21 Jun, 2012 2 commits

Fix memory leak in ARRAY(SELECT ...) subqueries. · d14241c2

Tom Lane authored 12 years ago

Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which
I think can only happen if the sub-select is embedded in a larger,
correlated subquery) would leak memory for the duration of the query,
due to not reclaiming the array generated in the previous execution.
Per bug #6698 from Armando Miraglia. Diagnosis and fix idea by Heikki,
patch itself by me.

This has been like this all along, so back-patch to all supported versions.

d14241c2

Add a small cache of locks owned by a resource owner in ResourceOwner. · eeb6f37d

Heikki Linnakangas authored 12 years ago

This speeds up reassigning locks to the parent owner, when the transaction
holds a lot of locks, but only a few of them belong to the current resource
owner. This particularly helps pg_dump when dumping a large number of
objects.

The cache can hold up to 15 locks in each resource owner. After that, the
cache is marked as overflowed, and we fall back to the old method of
scanning the whole local lock table. The tradeoff here is that the cache has
to be scanned whenever a lock is released, so if the cache is too large,
lock release becomes more expensive. 15 seems enough to cover pg_dump, and
doesn't have much impact on lock release.
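
A sketch of a bounded per-owner cache with an overflow flag. Locks are plain `int` identifiers here and all names are hypothetical; only the limit of 15 and the fall-back-to-full-scan behavior come from the message:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CACHED_LOCKS 15   /* limit taken from the commit message */

/* Hypothetical per-owner cache; locks are plain int identifiers. */
typedef struct
{
    int  nlocks;                    /* entries currently cached */
    bool overflowed;                /* set once a 16th lock was remembered */
    int  locks[MAX_CACHED_LOCKS];
} OwnerLockCache;

static void remember_lock(OwnerLockCache *c, int lock)
{
    if (c->overflowed)
        return;                     /* already relying on the full scan */
    if (c->nlocks < MAX_CACHED_LOCKS)
        c->locks[c->nlocks++] = lock;
    else
        c->overflowed = true;       /* stop tracking; fall back later */
}

/* Runs on every lock release, which is why the cache must stay small. */
static void forget_lock(OwnerLockCache *c, int lock)
{
    if (c->overflowed)
        return;
    for (int i = 0; i < c->nlocks; i++)
    {
        if (c->locks[i] == lock)
        {
            c->locks[i] = c->locks[--c->nlocks];   /* swap-remove */
            return;
        }
    }
}

/* Move the child's locks to its parent.  Returns how many lock entries
 * had to be examined: just the cached ones, or the whole local lock
 * table (total_local_locks) once the cache has overflowed. */
static int reassign_to_parent(OwnerLockCache *child, OwnerLockCache *parent,
                              int total_local_locks)
{
    int examined;

    if (child->overflowed)
    {
        parent->overflowed = true;  /* parent can no longer list its locks */
        examined = total_local_locks;
    }
    else
    {
        for (int i = 0; i < child->nlocks; i++)
            remember_lock(parent, child->locks[i]);
        examined = child->nlocks;
    }
    child->nlocks = 0;
    child->overflowed = false;
    return examined;
}
```

The tradeoff in the message is visible here: `forget_lock()` is a linear scan on every release, so a larger limit would make each release slower.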

Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.

eeb6f37d

20 Jun, 2012 1 commit

Improve tests for whether we can skip queueing RI enforcement triggers. · cfa0f425

Tom Lane authored 12 years ago

During an update of a PK row, we can skip firing the RI trigger if any old
key value is NULL, because then the row could not have had any matching
rows in the FK table. Conversely, during an update of an FK row, the
outcome is determined if any new key value is NULL. In either case it
becomes unnecessary to compare individual key values.
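
The two skip tests can be sketched as follows. The `KeyCol` type and function names are hypothetical; real trigger code works on tuples and per-column null flags:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical key column: a value plus its null flag. */
typedef struct
{
    int  value;
    bool isnull;
} KeyCol;

static bool any_key_null(const KeyCol *key, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
        if (key[i].isnull)
            return true;
    return false;
}

/* PK update: if any old key column was NULL, no FK row could have
 * matched, so the RI trigger need not be queued. */
static bool pk_update_can_skip(const KeyCol *old_key, int nkeys)
{
    return any_key_null(old_key, nkeys);
}

/* FK update: if any new key column is NULL, the outcome is already
 * determined without comparing individual key values. */
static bool fk_update_outcome_determined(const KeyCol *new_key, int nkeys)
{
    return any_key_null(new_key, nkeys);
}
```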

This patch was inspired by discussion of Vik Reykja's patch to use IS NOT
DISTINCT semantics for the key comparisons. As it turned out, there was no
need for that, so this patch looks nothing like his, but he should still get
credit for having re-opened consideration of the trigger skip logic.

cfa0f425

18 Jun, 2012 1 commit

Refer to the default foreign key match style as MATCH SIMPLE internally. · f5297bdf

Tom Lane authored 12 years ago

Previously we followed the SQL92 wording, "MATCH <unspecified>", but since
SQL99 there's been a less awkward way to refer to the default style.

In addition to the code changes, pg_constraint.confmatchtype now stores
this match style as 's' (SIMPLE) rather than 'u' (UNSPECIFIED). This
doesn't affect pg_dump or psql because they use pg_get_constraintdef()
to reconstruct foreign key definitions. But other client-side code might
examine that column directly, so this change will have to be marked as
an incompatibility in the 9.3 release notes.
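
Client code that reads `confmatchtype` directly may want to accept both the old and new codes. A hypothetical helper, assuming the 'f' (FULL) and 'p' (PARTIAL) codes are unaffected by this change:

```c
#include <assert.h>
#include <stddef.h>

/* Decode pg_constraint.confmatchtype for display, accepting both the
 * new code for the default style and the one used before this change. */
static const char *match_type_name(char confmatchtype)
{
    switch (confmatchtype)
    {
        case 'f':
            return "FULL";
        case 'p':
            return "PARTIAL";
        case 's':                   /* new code: SIMPLE */
        case 'u':                   /* old code: UNSPECIFIED, same style */
            return "SIMPLE";
        default:
            return NULL;            /* unknown code */
    }
}
```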

f5297bdf

17 Jun, 2012 1 commit

Fix stats collector to recover nicely when system clock goes backwards. · 9e18eacb

Tom Lane authored 12 years ago

Formerly, if the system clock went backwards, the stats collector would
fail to update the stats file any more until the clock reading again
exceeded whatever timestamp was last written into the stats file. Such
glitches in the clock's behavior are not terribly unlikely on machines
not using NTP. Such a scenario has been observed to cause regression test
failures in the buildfarm, and it could have bad effects on the behavior
of autovacuum, so it seems prudent to install some defenses.

We could directly detect the clock going backwards by adding
GetCurrentTimestamp calls in the stats collector's main loop, but that
would hurt performance on platforms where GetCurrentTimestamp is expensive.
To minimize the performance hit in normal cases, adopt a more complicated
scheme wherein backends check for clock skew when reading the stats file,
and if they see it, signal the stats collector by sending an extra stats
inquiry message. The stats collector does an extra GetCurrentTimestamp
only when it receives an inquiry with an apparently out-of-order
timestamp.

To avoid unnecessary GetCurrentTimestamp calls, expand the inquiry messages
to carry the backend's current clock reading as well as its stats cutoff
time. The latter, being intentionally slightly in-the-past, would trigger
more clock rechecks than we need if it were used for this purpose.
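
A sketch of the collector-side check, using a fake clock so the logic can be exercised. The message layout, field names, and the reset rule (forget the future-dated write so that the pending request forces a rewrite) are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;   /* microseconds; illustrative */

/* Hypothetical inquiry message carrying both timestamps. */
typedef struct
{
    TimestampTz clock_time;    /* backend's current clock reading */
    TimestampTz cutoff_time;   /* oldest acceptable stats file time */
} StatInquiry;

typedef struct
{
    TimestampTz last_statwrite;    /* timestamp written into the file */
    TimestampTz last_statrequest;  /* newest cutoff requested so far */
    int         clock_rechecks;    /* extra clock readings taken */
} Collector;

/* Stand-in for GetCurrentTimestamp(), controllable for testing. */
static TimestampTz fake_now;
static TimestampTz get_current_timestamp(void) { return fake_now; }

static void handle_inquiry(Collector *c, const StatInquiry *msg)
{
    if (msg->cutoff_time > c->last_statrequest)
        c->last_statrequest = msg->cutoff_time;

    /* A backend clock earlier than our last write is out of order:
     * only then pay for an extra clock reading. */
    if (msg->clock_time < c->last_statwrite)
    {
        TimestampTz now = get_current_timestamp();

        c->clock_rechecks++;
        if (now < c->last_statwrite)
        {
            /* Our clock went backwards too: forget the future-dated
             * write so that the pending request forces a new file. */
            c->last_statwrite = c->last_statrequest - 1;
        }
    }
}

static bool need_write(const Collector *c)
{
    return c->last_statrequest > c->last_statwrite;
}
```

In the common case, where the backend's clock is ahead of the last write, the collector never reads the clock at all.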

We might want to backpatch this change at some point, but let's let it
shake out in the buildfarm for a while first.

9e18eacb

15 Jun, 2012 1 commit

Improve reporting of permission errors for array types · 15b1918e

Peter Eisentraut authored 12 years ago

Because permissions are assigned to element types, not array types,
complaining about permission denied on an array type would be
misleading to users.  So adjust the reporting to refer to the element
type instead.

In order not to duplicate the required logic in two dozen places,
refactor the permission denied reporting for types a bit.
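
A sketch of the refactoring idea: one helper decides which type to name, and one formatter builds the message. The `TypeDesc` type and function names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type descriptor: elem is non-NULL for array types. */
typedef struct TypeDesc
{
    const char            *name;
    const struct TypeDesc *elem;
} TypeDesc;

/* Privileges live on the element type, so report that type. */
static const TypeDesc *type_for_acl_report(const TypeDesc *t)
{
    return t->elem != NULL ? t->elem : t;
}

/* One shared formatter instead of duplicated logic at each call site.
 * Returns the snprintf result (length of the full message). */
static int format_type_permission_error(char *buf, size_t len,
                                        const TypeDesc *t)
{
    return snprintf(buf, len, "permission denied for type %s",
                    type_for_acl_report(t)->name);
}
```

For an array type `_mytype` whose `elem` points at `mytype`, the message names `mytype`, the type on which privileges are actually granted.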

Pointed out by Yeb Havinga during the review of the type privilege
feature.

15b1918e

14 Jun, 2012 5 commits

New SQL functions pg_backup_in_progress() and pg_backup_start_time() · 68de499b

Robert Haas authored 12 years ago

Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
Marco Nenciarini.  Stylistic cleanup and OID fixes by me.

68de499b

Add new function log_newpage_buffer. · 6cd015be

Robert Haas authored 12 years ago

When I implemented the ginbuildempty() function as part of
implementing unlogged tables, I falsified the note in the header
comment for log_newpage.  Although we could fix that up by changing
the comment, it seems cleaner to add a new function which is
specifically intended to handle this case.  So do that.

6cd015be

Remove misplaced sanity check from heap_create(). · a475c603

Robert Haas authored 12 years ago

Even when allow_system_table_mods is not set, we allow creation of any
type of SQL object in pg_catalog, except for relations. And you can
get relations into pg_catalog, too, by initially creating them in some
other schema and then moving them with ALTER ... SET SCHEMA. So this
restriction, which prevents relations (only) from being created in
pg_catalog directly, is fairly pointless. If we need a safety mechanism
for this, it should be placed further upstream, so that it affects all
SQL objects uniformly, and picks up both CREATE and SET SCHEMA.

For now, just rip it out, per discussion with Tom Lane.

a475c603

Remove RELKIND_UNCATALOGED. · d2c86a1c

Robert Haas authored 12 years ago

This may have been important at some point in the past, but it no
longer does anything useful.

Review by Tom Lane.

d2c86a1c

Stamp HEAD as 9.3devel. · bed88fce

Tom Lane authored 12 years ago

Let the hacking begin ...

bed88fce

10 Jun, 2012 2 commits

Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest. · 927d61ee

Bruce Momjian authored 12 years ago

927d61ee

Make include files work without having to include other ones first · 8570114d

Peter Eisentraut authored 12 years ago

8570114d

07 Jun, 2012 1 commit

Scan the buffer pool just once, not once per fork, during relation drop. · ece01aae

Tom Lane authored 12 years ago

This provides a speedup of about 4X when NBuffers is large enough.
There is also a useful reduction in sinval traffic, since we
only do CacheInvalidateSmgr() once, not once per fork.
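
A sketch of the single-pass approach, with a hypothetical, heavily simplified buffer descriptor. The per-fork version would run this loop once per fork; here a single pass matches buffers of any fork:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer descriptor. */
typedef struct
{
    int  rel;      /* relation identifier */
    int  fork;     /* fork number */
    bool valid;
} BufferDesc;

/* Invalidate every buffer belonging to the relation, whatever its fork,
 * in one scan of the pool.  Returns the number of buffers dropped. */
static int drop_relation_buffers(BufferDesc *pool, int nbuffers, int rel)
{
    int dropped = 0;

    for (int i = 0; i < nbuffers; i++)
    {
        if (pool[i].valid && pool[i].rel == rel)
        {
            pool[i].valid = false;   /* fork is deliberately not checked */
            dropped++;
        }
    }
    return dropped;
}
```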

Simon Riggs, reviewed and somewhat revised by Tom Lane

ece01aae

01 Jun, 2012 1 commit

After any checkpoint, close all smgr file handles in bgwriter · 055c352a

Simon Riggs authored 12 years ago

055c352a

31 May, 2012 2 commits

Stamp 9.2beta2. · 4bec93ac

Tom Lane authored 12 years ago

4bec93ac

Force PL and range-type support functions to be owned by a superuser. · ad0009e7

Tom Lane authored 12 years ago

We allow non-superusers to create procedural languages (with restrictions)
and range datatypes. Previously, the automatically-created support
functions for these objects ended up owned by the creating user. This
represents a rather considerable security hazard, because the owning user
might be able to alter a support function's definition in such a way as to
crash the server, inject trojan-horse SQL code, or even execute arbitrary
C code directly. It appears that right now the only actually exploitable
problem is the infinite-recursion bug fixed in the previous patch for
CVE-2012-2655. However, it's not hard to imagine that future additions of
more ALTER FUNCTION capability might unintentionally open up new hazards.
To forestall future problems, cause these support functions to be owned by
the bootstrap superuser, not the user creating the parent object.

ad0009e7

30 May, 2012 2 commits

Expand the allowed range of timezone offsets to +/-15:59:59 from Greenwich. · cd0ff9c0

Tom Lane authored 12 years ago

We used to only allow offsets less than +/-13 hours, then it was +/-14,
then it was +/-15.  That's still not good enough though, as per today's bug
report from Patric Bechtel.  This time I actually looked through the Olson
timezone database to find the largest offsets used anywhere.  The winners
are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at
+15:13:42 until 1867.  So we'd better allow offsets less than +/-16 hours.

Given the history, we are way overdue to have some greppable #define
symbols controlling this, so make some ... and also remove an obsolete
comment that didn't get fixed the last time.
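
A sketch of greppable limit symbols and the range check. The symbol and function names are illustrative; the limit itself (strictly less than 16 hours) comes from the message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SECS_PER_HOUR   3600
/* Illustrative greppable limits: offsets must stay under 16 hours. */
#define MAX_TZDISP_HOUR 15
#define TZDISP_LIMIT    ((MAX_TZDISP_HOUR + 1) * SECS_PER_HOUR)

/* Accept offsets strictly less than the limit in either direction. */
static bool tz_offset_ok(int offset_secs)
{
    return abs(offset_secs) < TZDISP_LIMIT;
}
```

Both historical extremes cited above, -15:56:00 (-57360 s) and +15:13:42 (54822 s), pass this check, as does 15:59:59, while exactly 16:00:00 is rejected.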

Back-patch to all supported branches.

cd0ff9c0

Change the way parent pages are tracked during buffered GiST build. · d1996ed5

Heikki Linnakangas authored 12 years ago

We used to mimic the way a stack is constructed when descending the tree
during normal GiST inserts, but that was quite complicated during a buffered
build. It was also wrong: in GiST, the left-to-right relationships on
different levels might not match each other, so that when you know the
parent of a child page, you won't necessarily find the parent of the page to
the right of the child page by following the rightlinks at the parent level.
This sometimes led to "could not re-find parent" errors while building a
GiST index.

We now use a simple hash table to track the parent of every internal page.
Whenever a page is split, and downlinks are moved from one page to another,
we update the hash table accordingly. This is also better for performance
than the old method, as we never need to move right to re-find the parent
page, which could take a significant amount of time for buffers that were
created much earlier in the index build.
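
A sketch of the parent-tracking idea. For brevity it swaps the hash table for a direct-indexed array keyed by block number, so the size limit and all names are assumptions:

```c
#include <assert.h>

/* Hypothetical parent map for internal pages, indexed by block number
 * (a direct-indexed array stands in for the hash table). */
#define MAX_BLOCKS 64
#define NO_PARENT  (-1)

typedef struct
{
    int parent[MAX_BLOCKS];
} ParentMap;

static void parent_map_init(ParentMap *m)
{
    for (int i = 0; i < MAX_BLOCKS; i++)
        m->parent[i] = NO_PARENT;
}

static void set_parent(ParentMap *m, int child, int parent)
{
    assert(child >= 0 && child < MAX_BLOCKS);
    m->parent[child] = parent;
}

static int get_parent(const ParentMap *m, int child)
{
    assert(child >= 0 && child < MAX_BLOCKS);
    return m->parent[child];
}

/* After a split moves the downlinks for `children` to `new_parent`,
 * update their entries; no rightlink traversal is ever needed. */
static void move_downlinks(ParentMap *m, const int *children, int n,
                           int new_parent)
{
    for (int i = 0; i < n; i++)
        set_parent(m, children[i], new_parent);
}
```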

d1996ed5

18 May, 2012 1 commit

Fix bug in gistRelocateBuildBuffersOnSplit(). · 1d27dcf5

Heikki Linnakangas authored 12 years ago

When we create a temporary copy of the old node buffer on the stack, we mustn't
leak that into any of the long-lived data structures. Before this patch,
when we called gistPopItupFromNodeBuffer(), the copy got added to the array of
"loaded buffers". After gistRelocateBuildBuffersOnSplit() exits, the
pointer added to the loaded buffers array points to garbage. Often that goes
unnoticed, because when we go through the array of loaded buffers to unload
them, buffers with a NULL pageBuffer are ignored, and the garbage memory
often happens to contain NULL where pageBuffer would be.

This patch fixes that by explicitly marking the copy on the stack as
temporary, and by refraining from adding buffers marked as temporary to the
array of loaded buffers.
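
A sketch of the fix's rule, with hypothetical, pared-down structures: buffers flagged as temporary are never registered in the long-lived array, so no pointer to a stack copy can outlive the function that created it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down node buffer. */
typedef struct
{
    void *pageBuffer;   /* in-memory page, or NULL */
    bool  isTemp;       /* true for short-lived copies on the stack */
} NodeBuffer;

#define MAX_LOADED 16

/* Hypothetical long-lived build state. */
typedef struct
{
    NodeBuffer *loaded[MAX_LOADED];
    int         nloaded;
} BuildState;

/* Register a buffer whose page was brought into memory. */
static void note_loaded(BuildState *st, NodeBuffer *nb)
{
    if (nb->isTemp)
        return;         /* stack copy: must not leak into long-lived state */
    assert(st->nloaded < MAX_LOADED);
    st->loaded[st->nloaded++] = nb;
}
```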

While we're at it, initialize nodeBuffer->pageBlocknum to InvalidBlockNumber
and improve comments a bit. This isn't strictly necessary, but makes
debugging easier.

1d27dcf5

16 May, 2012 1 commit

Change COLLATION keyword category · be6d1c88

Peter Eisentraut authored 12 years ago

It was changed from unreserved to reserved as part of the COLLATION
FOR syntax, but it turns out that type_func_name_keyword is
sufficient.

be6d1c88

15 May, 2012 1 commit

Put back AC_REQUIRE([AC_STRUCT_TM]). · f667747b

Tom Lane authored 12 years ago

The BSD-ish members of the buildfarm all seem to think removing this
was a bad idea. It looks to me like it resulted in omitting the system
header inclusion necessary to detect the fields of struct tm correctly.

f667747b

14 May, 2012 3 commits

Remove unused AC_DEFINE symbols · ff4628f3

Peter Eisentraut authored 12 years ago

ENABLE_DTRACE            unused as of a7b7b07a
HAVE_ERR_SET_MARK        unused as of 4ed4b6c5
HAVE_FCVT                unused as of 4553e1d8
HAVE_STRUCT_SOCKADDR_UN  unused as of b4cea00a
HAVE_SYSCONF             unused as of f83356c7
TM_IN_SYS_TIME           never used, obsolescent per Autoconf documentation

ff4628f3

Update comments that became out-of-date with the PGXACT struct. · 9e4637bf

Heikki Linnakangas authored 12 years ago

When the "hot" members of PGPROC were split off to separate PGXACT structs,
many PGPROC fields referred to in comments were moved to PGXACT, but the
comments were neglected in the commit. Mostly this is just a search/replace
of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for
prepared transactions changed more, making some of the comments totally
bogus.

Noah Misch

9e4637bf

Remove leftovers of BeOS port · 64f09ca3

Peter Eisentraut authored 12 years ago

These should have been removed when the BeOS port was removed in
44f90212.

64f09ca3

11 May, 2012 2 commits

Prevent loss of init fork when truncating an unlogged table. · 1331cc6c

Robert Haas authored 12 years ago

Fixes bug #6635, reported by Akira Kurosawa.

1331cc6c

Ensure age() returns a stable value rather than the latest value · b06679e0

Simon Riggs authored 12 years ago

b06679e0

10 May, 2012 5 commits

Revert catalog bump; was post-beta1, and unnecessary. · ee24de40

Bruce Momjian authored 12 years ago

ee24de40

Update comment for 'name' data type to say 63 "bytes". · d2fe836c

Bruce Momjian authored 12 years ago

Catalog version bump so everyone has the same comment for beta1.

d2fe836c

Stamp 9.2beta1. · f70fa835

Tom Lane authored 12 years ago

f70fa835

Fix outdated comment. · 60a3dffb

Heikki Linnakangas authored 12 years ago

Multi-insert records observe XLOG_HEAP_INIT_PAGE flag too, as Andres Freund
pointed out.

60a3dffb

Improve control logic for bgwriter hibernation mode. · 6308ba05

Tom Lane authored 12 years ago

Commit 6d90eaaa added a hibernation mode
to the bgwriter to reduce the server's idle-power consumption. However,
its interaction with the detailed behavior of BgBufferSync's feedback
control loop wasn't very well thought out. That control loop depends
primarily on the rate of buffer allocation, not the rate of buffer
dirtying, so the hibernation mode has to be designed to operate only when
no new buffer allocations are happening. Also, the check for whether the
system is effectively idle was not quite right and would fail to detect
a constant low level of activity, thus allowing the bgwriter to go into
hibernation mode in a way that would let the cycle time vary quite a bit,
possibly further confusing the feedback loop. To fix, move the wakeup
support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
unless no buffer allocations have happened recently.

In addition, fix the delaying logic to remove the problem of possibly not
responding to signals promptly, which was basically caused by trying to use
the process latch's is_set flag for multiple purposes. I can't prove it
but I'm suspicious that that hack was responsible for the intermittent
"postmaster does not shut down" failures we've been seeing in the buildfarm
lately. In any case it did nothing to improve the readability or
robustness of the code.

In passing, express the hibernation sleep time as a multiplier on
BgWriterDelay, not a constant. I'm not sure whether there's any value in
exposing the longer sleep time as an independently configurable setting,
but we can at least make it act like this for little extra code.
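
A sketch of the revised control logic: the allocation path wakes a hibernating bgwriter, and the bgwriter hibernates only when no allocations happened during its last cycle, sleeping for a multiple of its normal delay. The multiplier value of 50 and all names are assumptions:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed multiplier on the normal delay while hibernating. */
#define HIBERNATE_FACTOR 50

typedef struct
{
    int  recent_allocs;   /* allocations since the bgwriter last looked */
    bool hibernating;
    bool woken;           /* stands in for setting the bgwriter's latch */
} BgState;

/* Allocation path (StrategyGetBuffer in the real code): count the
 * allocation and wake the bgwriter if it is hibernating. */
static void note_allocation(BgState *s)
{
    s->recent_allocs++;
    if (s->hibernating)
        s->woken = true;
}

/* End of a bgwriter cycle: hibernate only if nothing was allocated,
 * since the feedback loop is driven by the allocation rate.
 * Returns the next sleep time in milliseconds. */
static int next_sleep_ms(BgState *s, int bgwriter_delay_ms)
{
    s->hibernating = (s->recent_allocs == 0);
    s->recent_allocs = 0;
    s->woken = false;
    return s->hibernating ? bgwriter_delay_ms * HIBERNATE_FACTOR
                          : bgwriter_delay_ms;
}
```

Tying the decision to allocations rather than to buffer dirtying keeps a steady trickle of activity from letting the bgwriter drift into hibernation.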

6308ba05

09 May, 2012 1 commit

Rename BgWriterShmem/Request to CheckpointerShmem/Request · 8f28789b

Simon Riggs authored 12 years ago

8f28789b