Commit d8767702 authored by Bruce Momjian

Add another sequential scan email.

parent 43e740b3
@@ -345,7 +345,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
 Received: from localhost (majordom@localhost)
 by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
 Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
@@ -454,7 +454,7 @@ From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
 Received: from localhost (majordom@localhost)
 by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
 Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
@@ -1006,7 +1006,7 @@ From pgsql-general-owner+M2497@hub.org Fri Jun 16 18:31:03 2000
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
 by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04165
 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:31:01 -0400 (EDT)
-Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14 $) with ESMTP id RAA13110 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
+Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15 $) with ESMTP id RAA13110 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
 Received: from hub.org (majordom@localhost [127.0.0.1])
 by hub.org (8.10.1/8.10.1) with SMTP id e5GLDaM14477;
 Fri, 16 Jun 2000 17:13:36 -0400 (EDT)
@@ -3032,3 +3032,133 @@ Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
 Don't you know, in this new Dark Age, we're all light. --XTC
From cjs@cynic.net Wed Apr 24 23:19:23 2002
Return-path: <cjs@cynic.net>
Received: from angelic.cynic.net ([202.232.117.21])
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g3P3JM414917
for <pgman@candle.pha.pa.us>; Wed, 24 Apr 2002 23:19:22 -0400 (EDT)
Received: from localhost (localhost [127.0.0.1])
by angelic.cynic.net (Postfix) with ESMTP
id 1F36F870E; Thu, 25 Apr 2002 12:19:14 +0900 (JST)
Date: Thu, 25 Apr 2002 12:19:14 +0900 (JST)
From: Curt Sampson <cjs@cynic.net>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Sequential Scan Read-Ahead
In-Reply-To: <200204250156.g3P1ufh05751@candle.pha.pa.us>
Message-ID: <Pine.NEB.4.43.0204251118040.445-100000@angelic.cynic.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: OR

On Wed, 24 Apr 2002, Bruce Momjian wrote:
> > 1. Not all systems do readahead.
>
> If they don't, that isn't our problem. We expect it to be there, and if
> it isn't, the vendor/kernel is at fault.

It is your problem when another database kicks Postgres' ass
performance-wise.

And at that point, *you're* at fault. You're the one who's knowingly
decided to do things inefficiently.

Sorry if this sounds harsh, but this "Oh, someone else is to blame"
attitude gets me steamed. It's one thing to say, "We don't support
this." That's fine; there are often good reasons for that. It's a
completely different thing to say, "It's an unrelated entity's fault we
don't support this."

At any rate, relying on the kernel to guess how to optimise for
the workload will never work as well as having the software that
knows the workload do the optimisation.

The lack of support thing is no joke. Sure, lots of systems nowadays
support a unified buffer cache and read-ahead. But how many, besides
Solaris, support free-behind, which is also very important to avoid
blowing out your buffer cache when doing sequential reads? And who
supports read-ahead for reverse scans at all? (Or does Postgres
not do those anyway? I can see the support is there.)
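For illustration only: on systems that implement the POSIX.1-2001 `posix_fadvise()` call, an application can request read-ahead and do its own free-behind during a sequential scan. This is a minimal sketch. It assumes an 8K page and a hypothetical batch of 16 pages between `POSIX_FADV_DONTNEED` calls. The kernel is free to ignore both hints, and `scan_with_hints` is a made-up name, not Postgres code.

```c
#define _POSIX_C_SOURCE 200809L  /* expose posix_fadvise() */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define SCAN_PAGE_SIZE 8192     /* assumed 8K page */
#define FREE_BEHIND_PAGES 16    /* hypothetical: drop cached pages every 16 pages */

/* Sequentially scan `path`, hinting read-ahead up front and telling the
 * kernel to drop pages we have already consumed ("free-behind").
 * Both hints are advisory; their return values are ignored on purpose.
 * Returns the number of bytes read, or -1 on error. */
long long scan_with_hints(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Read-ahead hint for the whole file (len 0 means "to end of file"). */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char page[SCAN_PAGE_SIZE];
    long long total = 0;
    off_t released = 0;  /* bytes before this offset were already dropped */
    ssize_t n;

    while ((n = read(fd, page, sizeof page)) > 0) {
        total += n;
        /* ... hand the page to the executor here ... */

        if (total - released >= (long long) FREE_BEHIND_PAGES * SCAN_PAGE_SIZE) {
            /* Free-behind: a sequential scan will not revisit these pages. */
            (void) posix_fadvise(fd, released, (off_t) total - released,
                                 POSIX_FADV_DONTNEED);
            released = (off_t) total;
        }
    }

    close(fd);
    return n < 0 ? -1 : total;
}
```

Because `posix_fadvise()` is only advisory, this sketch reads correctly whether or not the OS acts on it. That is the point made above: the application cannot rely on the facility being there.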

And even when the facilities are there, you create problems by
using them. Look at the OS buffer cache, for example. Not only do
we lose efficiency by using two layers of caching, but, as people
have pointed out recently on the lists, the optimizer can't even
know how much or what is being cached, and thus can't make decisions
based on that.

> Yes, seek() in file will turn off read-ahead. Grabbing bigger chunks
> would help here, but if you have two people already reading from the
> same file, grabbing bigger chunks of the file may not be optimal.

Grabbing bigger chunks is always optimal, AFAICT, if they're not
*too* big and you use the data. A single 64K read takes only a little
longer than a single 8K read.
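A small sketch of the syscall side of the chunk-size argument: it counts how many `read()` calls a full scan needs at a given chunk size. It measures calls, not elapsed time. `count_read_calls` and the 64K upper bound are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CHUNK (64 * 1024)  /* largest chunk this sketch supports */

/* Scan the whole file with reads of `chunk` bytes and return how many
 * read() system calls that took, including the final call that returns 0
 * at end of file. Returns -1 on error or for an unsupported chunk size.
 * A real scan would split each chunk into 8K pages for the executor. */
long count_read_calls(const char *path, size_t chunk)
{
    static char buf[MAX_CHUNK];
    if (chunk == 0 || chunk > sizeof buf)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long calls = 0;
    ssize_t n;
    do {
        n = read(fd, buf, chunk);
        calls++;
    } while (n > 0);

    close(fd);
    return n < 0 ? -1 : calls;
}
```

Take a 1 MiB regular file on a local filesystem that returns full reads. Scanning it takes 129 calls with 8K reads and 17 calls with 64K reads, in each case counting the final zero-length read at end of file.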

> > 3. Even when the read-ahead does occur, you're still doing more
> > syscalls, and thus more expensive kernel/userland transitions, than
> > you have to.
>
> I would guess the performance impact is minimal.

If it were minimal, people wouldn't work so hard to build multi-level
thread systems, where multiple userland threads are scheduled on
top of kernel threads.

However, it does depend on how much CPU your particular application
is using. You may have it to spare.

> http://candle.pha.pa.us/mhonarc/todo.detail/performance/msg00009.html

Well, this message has some points in it that I feel are just incorrect.

1. It is *not* true that you have no idea where data is when
using a storage array or other similar system. While you
certainly ought not worry about things such as head positions
and so on, it's been a given for a long, long time that two
blocks that have close index numbers are going to be close
together in physical storage.

2. Raw devices are quite standard across Unix systems (except
in the unfortunate case of Linux, which I think has been
remedied, hasn't it?). They're very portable, and their write
semantics are just as well defined as a filesystem's, if not
better.

3. My observations of OS performance tuning over the past six
or eight years contradict the statement, "There's a considerable
cost in complexity and code in using 'raw' storage too, and
it's not a one off cost: as the technologies change, the 'fast'
way to do things will change and the code will have to be
updated to match." While some optimizations have been removed
over the years, the basic ones (order reads by block number,
do larger reads rather than smaller, cache the data) have
remained unchanged for a long, long time.

4. "Better to leave this to the OS vendor where possible, and
take advantage of the tuning they do." Well, sorry guys, but
have a look at the tuning they do. It hasn't changed in years,
except to remove now-unnecessary complexity related to really,
really old and slow disk devices, and to add a few things that
guess the workload but still do a worse job than if the workload
generator just did its own optimisations in the first place.
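Point 3 lists basic optimizations: order reads by block number and do larger reads. These can be sketched as a sort-and-merge of pending block requests. The names `coalesce_blocks` and `struct block_run`, and the `max_run` cap, are hypothetical. The sketch assumes, as point 1 argues, that nearby block numbers are nearby on disk, so each merged run can be fetched with one larger read.

```c
#include <stddef.h>
#include <stdlib.h>

/* One contiguous run of blocks: read `count` blocks starting at `start`. */
struct block_run {
    unsigned long start;
    unsigned long count;
};

static int cmp_block(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *) a;
    unsigned long y = *(const unsigned long *) b;
    return (x > y) - (x < y);
}

/* Sort the requested block numbers in place, skip duplicates, and merge
 * neighbours into runs of at most max_run blocks. `runs` must have room
 * for n entries. Returns the number of runs produced. */
size_t coalesce_blocks(unsigned long *blocks, size_t n,
                       struct block_run *runs, unsigned long max_run)
{
    if (n == 0)
        return 0;

    qsort(blocks, n, sizeof *blocks, cmp_block);

    size_t last = 0;  /* index of the run currently being extended */
    runs[0].start = blocks[0];
    runs[0].count = 1;

    for (size_t i = 1; i < n; i++) {
        struct block_run *cur = &runs[last];
        if (blocks[i] == blocks[i - 1])
            continue;  /* duplicate request for the same block */
        if (blocks[i] == cur->start + cur->count && cur->count < max_run) {
            cur->count++;  /* physically adjacent: extend the current run */
        } else {
            last++;
            runs[last].start = blocks[i];
            runs[last].count = 1;
        }
    }
    return last + 1;
}
```

For example, requests for blocks 7, 3, 4, 5, 20, 6, 21, 3, 40 with `max_run` = 8 collapse to three reads: blocks 3 to 7, blocks 20 to 21, and block 40.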

> http://candle.pha.pa.us/mhonarc/todo.detail/optimizer/msg00011.html

Well, as for this one, with statements like "Postgres does have control
over its buffer cache," I don't know what to say. You can interpret
the statement however you like, but in the end Postgres has very little
control at all over how data is moved between memory and disk.

BTW, please don't take me as saying that all control over physical
IO should be done by Postgres. I just think that Postgres could do
a better job of managing data transfer between disk and memory than
the OS can. The rest of the things (using raw partitions, read-ahead,
free-behind, etc.) just drop out of that one idea.
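Here is a toy model of the "Postgres manages the transfer itself" idea. If the database knows a scan is sequential, it can confine that scan to a few private frames so the scan never evicts the rest of the cache. That is free-behind done in the application rather than in the kernel. Everything here is an assumption for illustration, not a description of Postgres's buffer manager: `struct cache`, `NFRAMES`, LRU replacement, and the 4-frame ring.

```c
#include <stddef.h>

#define NFRAMES 64  /* hypothetical cache size, in pages */

/* Toy buffer cache: each frame holds one page number plus a "last used"
 * timestamp for LRU replacement. page == -1 marks an empty frame. */
struct cache {
    long page[NFRAMES];
    unsigned long last_used[NFRAMES];
    unsigned long tick;
};

void cache_init(struct cache *c)
{
    for (size_t i = 0; i < NFRAMES; i++) {
        c->page[i] = -1;
        c->last_used[i] = 0;
    }
    c->tick = 0;
}

/* Access page p. A hit anywhere in the cache just refreshes that frame.
 * On a miss, the least recently used frame within [first, first + nframes)
 * is replaced. Passing the whole cache gives plain LRU; passing a small
 * slice confines a sequential scan to its own ring of frames. */
void cache_access(struct cache *c, long p, size_t first, size_t nframes)
{
    if (nframes == 0 || first + nframes > NFRAMES)
        return;  /* invalid slice: ignored in this toy */

    c->tick++;
    for (size_t i = 0; i < NFRAMES; i++) {
        if (c->page[i] == p) {
            c->last_used[i] = c->tick;
            return;
        }
    }

    size_t victim = first;
    for (size_t i = first + 1; i < first + nframes; i++)
        if (c->last_used[i] < c->last_used[victim])
            victim = i;

    c->page[victim] = p;
    c->last_used[victim] = c->tick;
}

/* Count how many pages in [lo, hi] are currently cached. */
size_t count_resident(const struct cache *c, long lo, long hi)
{
    size_t n = 0;
    for (size_t i = 0; i < NFRAMES; i++)
        if (c->page[i] >= lo && c->page[i] <= hi)
            n++;
    return n;
}
```

In this model, 48 "hot" pages are loaded first and a 1000-page sequential scan follows. Run through the shared LRU, the scan evicts all of the hot pages. Confined to a 4-frame ring, the same scan leaves all 48 resident.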

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC