Commit ab3bb9cf authored by Tom Lane

Add some real documentation about TOAST (finally). Combine this with
the old 'page' chapter and the recently added 'filelayout' chapter to
make a coherent chapter about PostgreSQL's physical storage layout.
parent 521e8888
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/diskusage.sgml,v 1.13 2004/12/28 19:08:58 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/diskusage.sgml,v 1.14 2005/01/10 00:04:38 tgl Exp $
 -->
 <chapter id="diskusage">
@@ -22,12 +22,12 @@ $PostgreSQL: pgsql/doc/src/sgml/diskusage.sgml,v 1.13 2004/12/28 19:08:58 tgl Ex
 stored. If the table has any columns with potentially-wide values,
 there is also a <acronym>TOAST</> file associated with the table,
 which is used to store values too wide to fit comfortably in the main
-table. There will be one index on the
+table (see <xref linkend="storage-toast">). There will be one index on the
 <acronym>TOAST</> table, if present. There may also be indexes associated
 with the base table. Each table and index is stored in a separate disk
 file &mdash; possibly more than one file, if the file would exceed one
 gigabyte. Naming conventions for these files are described in <xref
-linkend="file-layout">.
+linkend="storage-file-layout">.
 </para>
 <para>
......
<!--
$PostgreSQL: pgsql/doc/src/sgml/filelayout.sgml,v 1.2 2004/11/16 15:00:36 tgl Exp $
-->
<chapter id="file-layout">
<title>Database File Layout</title>
<abstract>
<para>
A description of the database physical storage layout.
</para>
</abstract>
<para>
This section provides an overview of the physical format used by
<productname>PostgreSQL</productname> databases.
</para>
<para>
All the data needed for a database cluster is stored within the cluster's data
directory, commonly referred to as <varname>PGDATA</> (after the name of the
environment variable that can be used to define it). A common location for
<varname>PGDATA</> is <filename>/var/lib/pgsql/data</>. Multiple clusters,
managed by different postmasters, can exist on the same machine.
</para>
<para>
The <varname>PGDATA</> directory contains several subdirectories and control
files, as shown in <xref linkend="pgdata-contents-table">. In addition to
these required items, the cluster configuration files
<filename>postgresql.conf</filename>, <filename>pg_hba.conf</filename>, and
<filename>pg_ident.conf</filename> are traditionally stored in
<varname>PGDATA</> (although beginning in
<productname>PostgreSQL</productname> 8.0 it is possible to keep them
elsewhere).
</para>
<table tocentry="1" id="pgdata-contents-table">
<title>Contents of <varname>PGDATA</></title>
<tgroup cols="2">
<thead>
<row>
<entry>Item</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><filename>PG_VERSION</></entry>
<entry>A file containing the major version number of <productname>PostgreSQL</productname></entry>
</row>
<row>
<entry><filename>base</></entry>
<entry>Subdirectory containing per-database subdirectories</entry>
</row>
<row>
<entry><filename>global</></entry>
<entry>Subdirectory containing cluster-wide tables, such as
<structname>pg_database</></entry>
</row>
<row>
<entry><filename>pg_clog</></entry>
<entry>Subdirectory containing transaction commit status data</entry>
</row>
<row>
<entry><filename>pg_subtrans</></entry>
<entry>Subdirectory containing subtransaction status data</entry>
</row>
<row>
<entry><filename>pg_tblspc</></entry>
<entry>Subdirectory containing symbolic links to tablespaces</entry>
</row>
<row>
<entry><filename>pg_xlog</></entry>
<entry>Subdirectory containing WAL (Write Ahead Log) files</entry>
</row>
<row>
<entry><filename>postmaster.opts</></entry>
<entry>A file recording the command-line options the postmaster was
last started with</entry>
</row>
<row>
<entry><filename>postmaster.pid</></entry>
<entry>A lock file recording the current postmaster PID and shared memory
segment ID (not present after postmaster shutdown)</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
For each database in the cluster there is a subdirectory within
<varname>PGDATA</><filename>/base</>, named after the database's OID in
<structname>pg_database</>. This subdirectory is the default location
for the database's files; in particular, its system catalogs are stored
there.
</para>
<para>
Each table and index is stored in a separate file, named after the table
or index's <firstterm>filenode</> number, which can be found in
<structname>pg_class</>.<structfield>relfilenode</>.
</para>
<caution>
<para>
Note that while a table's filenode often matches its OID, this is
<emphasis>not</> necessarily the case; some operations, like
<command>TRUNCATE</>, <command>REINDEX</>, <command>CLUSTER</> and some forms
of <command>ALTER TABLE</>, can change the filenode while preserving the OID.
Avoid assuming that filenode and table OID are the same.
</para>
</caution>
<para>
When a table or index exceeds 1Gb, it is divided into gigabyte-sized
<firstterm>segments</>. The first segment's file name is the same as the
filenode; subsequent segments are named filenode.1, filenode.2, etc.
This arrangement avoids problems on platforms that have file size limitations.
The contents of tables and indexes are discussed further in
<xref linkend="page">.
</para>
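The segment naming rule above can be sketched in a few lines of Python. This is an illustration only, not code from PostgreSQL: the helper name `segment_file_names` and the constant `SEGMENT_SIZE` are hypothetical, and the handling of a relation whose size is an exact multiple of one gigabyte is an assumption the text does not settle.

```python
from typing import List

# Assumed segment size: one gigabyte, as stated in the paragraph above.
SEGMENT_SIZE = 1024 ** 3


def segment_file_names(filenode: int, size_bytes: int,
                       segment_size: int = SEGMENT_SIZE) -> List[str]:
    """Return the file names expected for a relation of the given size.

    Hypothetical helper: the first segment is named after the filenode,
    and each later segment gets a ".N" suffix (filenode.1, filenode.2, ...).
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Integer ceiling division; even an empty relation has its first file.
    count = max(1, -(-size_bytes // segment_size))
    names = [str(filenode)]
    names.extend(f"{filenode}.{n}" for n in range(1, count))
    return names


if __name__ == "__main__":
    # A relation slightly larger than two gigabytes needs three files.
    for name in segment_file_names(16384, 2 * SEGMENT_SIZE + 1):
        print(name)
```

The filenode `16384` is an arbitrary example value; a real one would come from `pg_class.relfilenode`.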
<para>
A table that has columns with potentially large entries will have an
associated <firstterm>TOAST</> table, which is used for out-of-line storage of
field values that are too large to keep in the table rows proper.
<structname>pg_class</>.<structfield>reltoastrelid</> links from a table to
its TOAST table, if any.
</para>
<para>
Tablespaces make the scenario more complicated. Each user-defined tablespace
has a symbolic link inside the <varname>PGDATA</><filename>/pg_tblspc</>
directory, which points to the physical tablespace directory (as specified in
its <command>CREATE TABLESPACE</> command). The symbolic link is named after
the tablespace's OID. Inside the physical tablespace directory there is
a subdirectory for each database that has elements in the tablespace, named
after the database's OID. Tables within that directory follow the filenode
naming scheme. The <literal>pg_default</> tablespace is not accessed through
<filename>pg_tblspc</>, but corresponds to
<varname>PGDATA</><filename>/base</>. Similarly, the <literal>pg_global</>
tablespace is not accessed through <filename>pg_tblspc</>, but corresponds to
<varname>PGDATA</><filename>/global</>.
</para>
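As a rough sketch of the directory rules described above, the following Python function builds the expected path of a relation's first segment file. The function name `relation_path`, its parameters, and all OIDs used are hypothetical. It follows only the layout described in this chapter, assumes files in `global` are named by filenode in the same way, and does not account for layout details in other PostgreSQL versions.

```python
import posixpath
from typing import Optional


def relation_path(pgdata: str, db_oid: int, filenode: int,
                  tablespace: str = "pg_default",
                  tablespace_oid: Optional[int] = None) -> str:
    """Build the path of a relation's first segment file (illustrative only)."""
    if tablespace == "pg_global":
        # Cluster-wide tables: PGDATA/global/<filenode> (no per-database level).
        return posixpath.join(pgdata, "global", str(filenode))
    if tablespace == "pg_default":
        # Default location: PGDATA/base/<database OID>/<filenode>.
        return posixpath.join(pgdata, "base", str(db_oid), str(filenode))
    if tablespace_oid is None:
        raise ValueError("a user-defined tablespace needs its OID")
    # User-defined tablespace: reached through the symbolic link
    # PGDATA/pg_tblspc/<tablespace OID>, then <database OID>/<filenode>.
    return posixpath.join(pgdata, "pg_tblspc", str(tablespace_oid),
                          str(db_oid), str(filenode))


if __name__ == "__main__":
    print(relation_path("/var/lib/pgsql/data", 17142, 16384))
    print(relation_path("/var/lib/pgsql/data", 17142, 16384,
                        tablespace="fastdisk", tablespace_oid=16500))
```

The path under `pg_tblspc` goes through the symbolic link, so the file physically resides in the directory given to `CREATE TABLESPACE`.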
</chapter>
-<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.40 2004/12/03 05:50:18 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.41 2005/01/10 00:04:38 tgl Exp $ -->
 <!entity history SYSTEM "history.sgml">
 <!entity info SYSTEM "info.sgml">
@@ -75,15 +75,14 @@
 <!entity arch-dev SYSTEM "arch-dev.sgml">
 <!entity bki SYSTEM "bki.sgml">
 <!entity catalogs SYSTEM "catalogs.sgml">
-<!entity filelayout SYSTEM "filelayout.sgml">
 <!entity geqo SYSTEM "geqo.sgml">
 <!entity gist SYSTEM "gist.sgml">
 <!entity indexcost SYSTEM "indexcost.sgml">
 <!entity nls SYSTEM "nls.sgml">
-<!entity page SYSTEM "page.sgml">
 <!entity plhandler SYSTEM "plhandler.sgml">
 <!entity protocol SYSTEM "protocol.sgml">
 <!entity sources SYSTEM "sources.sgml">
+<!entity storage SYSTEM "storage.sgml">
 <!-- appendixes -->
 <!entity contacts SYSTEM "contacts.sgml">
......
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/lobj.sgml,v 1.35 2005/01/08 22:13:33 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/lobj.sgml,v 1.36 2005/01/10 00:04:38 tgl Exp $
 -->
 <chapter id="largeObjects">
@@ -51,9 +51,11 @@ $PostgreSQL: pgsql/doc/src/sgml/lobj.sgml,v 1.35 2005/01/08 22:13:33 tgl Exp $
 </para>
 <para>
-<indexterm><primary>TOAST</></>
-<indexterm><primary>sliced bread</><see>TOAST</></indexterm>
-<productname>PostgreSQL 7.1</productname> introduced a mechanism
+<indexterm>
+<primary>TOAST</primary>
+<secondary>versus large objects</secondary>
+</indexterm>
+<productname>PostgreSQL</productname> 7.1 introduced a mechanism
 (nicknamed <quote><acronym>TOAST</acronym></quote>) that allows
 data values to be much larger than single pages. This
 makes the large object facility partially obsolete. One
......
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.72 2004/12/30 03:13:56 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.73 2005/01/10 00:04:38 tgl Exp $
 -->
 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
@@ -237,8 +237,7 @@ $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.72 2004/12/30 03:13:56 tgl Exp
 &geqo;
 &indexcost;
 &gist;
-&filelayout;
-&page;
+&storage;
 &bki;
 </part>
......
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/alter_table.sgml,v 1.75 2005/01/04 00:39:53 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/alter_table.sgml,v 1.76 2005/01/10 00:04:43 tgl Exp $
 PostgreSQL documentation
 -->
@@ -153,10 +153,14 @@ where <replaceable class="PARAMETER">action</replaceable> is one of:
 inline, uncompressed. <literal>MAIN</literal> is for inline,
 compressible data. <literal>EXTERNAL</literal> is for external,
 uncompressed data, and <literal>EXTENDED</literal> is for external,
-compressed data. <literal>EXTENDED</literal> is the default for all
-data types that support it. Use of <literal>EXTERNAL</literal> will
+compressed data. <literal>EXTENDED</literal> is the default for most
+data types that support non-<literal>PLAIN</literal> storage.
+Use of <literal>EXTERNAL</literal> will
 make substring operations on <type>text</type> and <type>bytea</type>
-columns faster, at the penalty of increased storage space.
+columns faster, at the penalty of increased storage space. Note that
+<literal>SET STORAGE</> doesn't itself change anything in the table,
+it just sets the strategy to be pursued during future table updates.
+See <xref linkend="storage-toast"> for more information.
 </para>
 </listitem>
 </varlistentry>
......
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/xtypes.sgml,v 1.24 2005/01/08 22:13:38 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/xtypes.sgml,v 1.25 2005/01/10 00:04:38 tgl Exp $
 -->
 <sect1 id="xtypes">
@@ -232,10 +232,14 @@ CREATE TYPE complex (
 </para>
 <para>
+<indexterm>
+<primary>TOAST</primary>
+<secondary>and user-defined types</secondary>
+</indexterm>
 If the values of your data type might exceed a few hundred bytes in
 size (in internal form), you should make the data type
-TOAST-able.<indexterm><primary>TOAST</primary><secondary>and
-user-defined types</secondary></indexterm> To do this, the internal
+<acronym>TOAST</>-able (see <xref linkend="storage-toast">).
+To do this, the internal
 representation must follow the standard layout for variable-length
 data: the first four bytes must be an <type>int32</type> containing
 the total length in bytes of the datum (including itself). The C
......