Commit 10c70b86 authored by Tom Lane

Add note about space usage of 'manual' approach to clustering, per
suggestion from Sergey Koposov.  Also some other minor editing.
parent 6fada498
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.37 2006/10/31 01:52:31 neilc Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.38 2006/11/04 19:03:51 tgl Exp $
 PostgreSQL documentation
 -->
@@ -108,8 +108,8 @@ CLUSTER
 If you are requesting a range of indexed values from a table, or a
 single indexed value that has multiple rows that match,
 <command>CLUSTER</command> will help because once the index identifies the
-heap page for the first row that matches, all other rows
-that match are probably already on the same heap page,
+table page for the first row that matches, all other rows
+that match are probably already on the same table page,
 and so you save disk accesses and speed up the query.
 </para>
@@ -137,30 +137,33 @@ CLUSTER
 <para>
 There is another way to cluster data. The
-<command>CLUSTER</command> command reorders the original table using
-the ordering of the index you specify. This can be slow
-on large tables because the rows are fetched from the heap
-in index order, and if the heap table is unordered, the
+<command>CLUSTER</command> command reorders the original table by
+scanning it using the index you specify. This can be slow
+on large tables because the rows are fetched from the table
+in index order, and if the table is disordered, the
 entries are on random pages, so there is one disk page
-retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache,
-but the majority of a big table will not fit in the cache.)
+retrieved for every row moved. (<productname>PostgreSQL</productname> has
+a cache, but the majority of a big table will not fit in the cache.)
 The other way to cluster a table is to use
 <programlisting>
 CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS
-SELECT <replaceable class="parameter">columnlist</replaceable> FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
+SELECT * FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
 </programlisting>
-which uses the <productname>PostgreSQL</productname> sorting code in
-the <literal>ORDER BY</literal> clause to create the desired order; this is usually much
-faster than an index scan for
-unordered data. You then drop the old table, use
+which uses the <productname>PostgreSQL</productname> sorting code
+to produce the desired order;
+this is usually much faster than an index scan for disordered data.
+Then you drop the old table, use
 <command>ALTER TABLE ... RENAME</command>
-to rename <replaceable class="parameter">newtable</replaceable> to the old name, and
-recreate the table's indexes. However, this approach does not preserve
+to rename <replaceable class="parameter">newtable</replaceable> to the
+old name, and recreate the table's indexes.
+The big disadvantage of this approach is that it does not preserve
 OIDs, constraints, foreign key relationships, granted privileges, and
 other ancillary properties of the table &mdash; all such items must be
-manually recreated.
+manually recreated. Another disadvantage is that this way requires a sort
+temporary file about the same size as the table itself, so peak disk usage
+is about three times the table size instead of twice the table size.
 </para>
 </refsect1>
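
Reviewer's sketch (not part of the commit): the four steps of the 'manual' approach described in the revised paragraph (sorted copy, drop, rename, recreate indexes) can be walked through roughly as below. This is only an illustration under stated assumptions: it runs against an in-memory SQLite database as a stand-in so that it is self-contained, not against PostgreSQL, and the names rebuild_sorted, events, events_customer_idx and the _sorted suffix are all hypothetical. It does not model PostgreSQL's disk usage, OIDs, constraints, or privileges, which the documentation text says are lost or must be recreated by hand.

```python
import sqlite3


def rebuild_sorted(conn, table, order_cols, index_sql):
    """Rewrite `table` in sorted order using the 'manual' approach.

    Identifiers are interpolated directly into SQL, so pass trusted
    names only. SQLite is used here purely as a stand-in engine.
    """
    tmp = f"{table}_sorted"  # hypothetical scratch-table name
    cur = conn.cursor()
    # 1. Copy the rows into a new table; the sorter produces the order.
    cur.execute(
        f"CREATE TABLE {tmp} AS SELECT * FROM {table} ORDER BY {order_cols}"
    )
    # 2. Drop the old table (its indexes are dropped with it).
    cur.execute(f"DROP TABLE {table}")
    # 3. Give the new table the old name.
    cur.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
    # 4. Recreate the indexes by hand; nothing carries over automatically.
    for stmt in index_sql:
        cur.execute(stmt)
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, customer INTEGER)")
    conn.execute("CREATE INDEX events_customer_idx ON events (customer)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, 30), (2, 10), (3, 20), (4, 10)],
    )
    rebuild_sorted(
        conn,
        "events",
        "customer, id",
        ["CREATE INDEX events_customer_idx ON events (customer)"],
    )
    # In SQLite, rowid follows insertion order, so it shows the new layout.
    rows = conn.execute(
        "SELECT customer, id FROM events ORDER BY rowid"
    ).fetchall()
    indexes = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'index' AND tbl_name = 'events'"
        )
    ]
    conn.close()
    return rows, indexes
```

Calling demo() returns the rows in their new storage order together with the index names that exist after the rebuild. The index is present only because step 4 recreated it, which is the point the documentation makes about ancillary properties not being preserved.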