Commit 8c72a7fa authored by Tom Lane

Update our documentation concerning where to create data directories.

Although initdb has long discouraged use of a filesystem mount-point
directory as a PG data directory, this point was covered nowhere in the
user-facing documentation.  Also, with the popularity of pg_upgrade,
we really need to recommend that the PG user own not only the data
directory but its parent directory too.  (Without a writable parent
directory, operations such as "mv data data.old" fail immediately.
pg_upgrade itself doesn't do that, but wrapper scripts for it often do.)
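Both points above (not using a mount point as the data directory, and giving the PostgreSQL user a writable parent directory) can be checked before running initdb. The following is a minimal sketch in POSIX shell, not part of PostgreSQL: the function names is_mount_point and check_data_dir_location are hypothetical. It assumes a stat(1) that accepts -c with %d (device number) and %i (inode number), as GNU coreutils and BusyBox do. It treats a directory as a mount point when its device number differs from its parent's, so a bind mount within a single file system is not detected.

```shell
#!/bin/sh
# Sketch of a pre-initdb location check (hypothetical helper names).
# Requires a stat(1) that supports -c, e.g. GNU coreutils or BusyBox.

# Succeed if directory $1 is the top of a mounted file system.
is_mount_point() {
    [ -d "$1" ] || return 1
    # A device number different from the parent's means a file system
    # is mounted on this directory.
    [ "$(stat -c %d -- "$1")" != "$(stat -c %d -- "$1/..")" ] && return 0
    # Sharing the parent's inode means this is "/", itself a mount point.
    [ "$(stat -c %i -- "$1")" = "$(stat -c %i -- "$1/..")" ]
}

# Check a planned data directory path against the advice above.
check_data_dir_location() {
    data=$1
    parent=$(dirname -- "$data")
    if [ ! -d "$parent" ]; then
        echo "$parent does not exist; create it and chown it to the PostgreSQL user" >&2
        return 1
    fi
    if [ ! -w "$parent" ]; then
        # Without write access here, "mv data data.old" fails at once.
        echo "$parent is not writable by $(id -un)" >&2
        return 1
    fi
    if is_mount_point "$parent"; then
        echo "$parent is a mount point; create a subdirectory and use that" >&2
        return 1
    fi
    if is_mount_point "$data"; then
        echo "$data is a mount point; do not use it as the data directory" >&2
        return 1
    fi
    return 0
}
```

For the layout in the documentation example, check_data_dir_location /usr/local/pgsql/data would be run as the PostgreSQL user after /usr/local/pgsql has been created and chowned. Running it as root is not informative, because the -w test always succeeds for root on a writable file system.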

Hence, adjust the "Creating a Database Cluster" section to address
these points.  I also took the liberty of wordsmithing the discussion
of NFS a bit.

These considerations aren't by any means new, so back-patch to all
supported branches.
parent 6d10f4e9
@@ -49,7 +49,7 @@
 <para>
 Before you can do anything, you must initialize a database storage
 area on disk. We call this a <firstterm>database cluster</firstterm>.
-(<acronym>SQL</acronym> uses the term catalog cluster.) A
+(The <acronym>SQL</acronym> standard uses the term catalog cluster.) A
 database cluster is a collection of databases that is managed by a
 single instance of a running database server. After initialization, a
 database cluster will contain a database named <literal>postgres</literal>,
@@ -65,7 +65,7 @@
 </para>
 <para>
-In file system terms, a database cluster will be a single directory
+In file system terms, a database cluster is a single directory
 under which all data will be stored. We call this the <firstterm>data
 directory</firstterm> or <firstterm>data area</firstterm>. It is
 completely up to you where you choose to store your data. There is no
@@ -109,15 +109,18 @@
 <para>
 <command>initdb</command> will attempt to create the directory you
-specify if it does not already exist. It is likely that it will not
-have the permission to do so (if you followed our advice and created
-an unprivileged account). In that case you should create the
-directory yourself (as root) and change the owner to be the
-<productname>PostgreSQL</productname> user. Here is how this might
-be done:
+specify if it does not already exist. Of course, this will fail if
+<command>initdb</command> does not have permissions to write in the
+parent directory. It's generally recommendable that the
+<productname>PostgreSQL</productname> user own not just the data
+directory but its parent directory as well, so that this should not
+be a problem. If the desired parent directory doesn't exist either,
+you will need to create it first, using root privileges if the
+grandparent directory isn't writable. So the process might look
+like this:
 <screen>
-root# <userinput>mkdir /usr/local/pgsql/data</userinput>
-root# <userinput>chown postgres /usr/local/pgsql/data</userinput>
+root# <userinput>mkdir /usr/local/pgsql</userinput>
+root# <userinput>chown postgres /usr/local/pgsql</userinput>
 root# <userinput>su postgres</userinput>
 postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
 </screen>
@@ -125,7 +128,9 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
 <para>
 <command>initdb</command> will refuse to run if the data directory
-looks like it has already been initialized.</para>
+exists and already contains files; this is to prevent accidentally
+overwriting an existing installation.
+</para>
 <para>
 Because the data directory contains all the data stored in the
@@ -178,8 +183,30 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
 locale setting. For details see <xref linkend="multibyte">.
 </para>
+<sect2 id="creating-cluster-mount-points">
+<title>Use of Secondary File Systems</title>
+<indexterm zone="creating-cluster-mount-points">
+<primary>file system mount points</primary>
+</indexterm>
+<para>
+Many installations create their database clusters on file systems
+(volumes) other than the machine's <quote>root</> volume. If you
+choose to do this, it is not advisable to try to use the secondary
+volume's topmost directory (mount point) as the data directory.
+Best practice is to create a directory within the mount-point
+directory that is owned by the <productname>PostgreSQL</productname>
+user, and then create the data directory within that. This avoids
+permissions problems, particularly for operations such
+as <application>pg_upgrade</>, and it also ensures clean failures if
+the secondary volume is taken offline.
+</para>
+</sect2>
 <sect2 id="creating-cluster-nfs">
-<title>Network File Systems</title>
+<title>Use of Network File Systems</title>
 <indexterm zone="creating-cluster-nfs">
 <primary>Network File Systems</primary>
@@ -188,22 +215,30 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
 <indexterm><primary>Network Attached Storage (<acronym>NAS</>)</><see>Network File Systems</></>
 <para>
-Many installations create database clusters on network file systems.
-Sometimes this is done directly via <acronym>NFS</>, or by using a
+Many installations create their database clusters on network file
+systems. Sometimes this is done via <acronym>NFS</>, or by using a
 Network Attached Storage (<acronym>NAS</>) device that uses
 <acronym>NFS</> internally. <productname>PostgreSQL</> does nothing
 special for <acronym>NFS</> file systems, meaning it assumes
-<acronym>NFS</> behaves exactly like locally-connected drives
-(<acronym>DAS</>, Direct Attached Storage). If client and server
-<acronym>NFS</> implementations have non-standard semantics, this can
+<acronym>NFS</> behaves exactly like locally-connected drives.
+If the client or server <acronym>NFS</> implementation does not
+provide standard file system semantics, this can
 cause reliability problems (see <ulink
 url="http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html"></ulink>).
 Specifically, delayed (asynchronous) writes to the <acronym>NFS</>
-server can cause reliability problems; if possible, mount
-<acronym>NFS</> file systems synchronously (without caching) to avoid
-this. Also, soft-mounting <acronym>NFS</> is not recommended.
-(Storage Area Networks (<acronym>SAN</>) use a low-level
-communication protocol rather than <acronym>NFS</>.)
+server can cause data corruption problems. If possible, mount the
+<acronym>NFS</> file system synchronously (without caching) to avoid
+this hazard. Also, soft-mounting the <acronym>NFS</> file system is
+not recommended.
+</para>
+<para>
+Storage Area Networks (<acronym>SAN</>) typically use communication
+protocols other than <acronym>NFS</>, and may or may not be subject
+to hazards of this sort. It's advisable to consult the vendor's
+documentation concerning data consistency guarantees.
+<productname>PostgreSQL</productname> cannot be more reliable than
+the file system it's using.
 </para>
 </sect2>
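The NFS guidance in the diff above (avoid soft mounts, mount synchronously) can be audited on a running Linux system. A minimal sketch, assuming the /proc/mounts layout (device, mount point, file system type, comma-separated options, dump, pass) and that a synchronous mount reports the generic sync option there; the function name audit_nfs_mounts is hypothetical and not part of PostgreSQL:

```shell
#!/bin/sh
# Hypothetical audit helper: flag NFS mounts that are soft-mounted or
# not mounted with sync. Reads a file in /proc/mounts format
# (default /proc/mounts) and returns 1 if any mount was flagged.
audit_nfs_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk '
        $3 == "nfs" || $3 == "nfs4" {
            n = split($4, opt, ",")
            is_soft = 0; is_sync = 0
            for (i = 1; i <= n; i++) {
                if (opt[i] == "soft") is_soft = 1
                if (opt[i] == "sync") is_sync = 1
            }
            if (is_soft)  { print $2 ": soft-mounted"; bad = 1 }
            if (!is_sync) { print $2 ": not mounted with sync"; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    ' "$mounts_file"
}
```

Called without arguments, it inspects the live mount table. A mount that follows the advice would typically be made with something like mount -t nfs -o hard,sync server:/export /mnt/pgnfs, where the server, export, and mount point are placeholders; hard is already the Linux NFS default, so the audit only flags an explicit soft.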