Commit d2bddc25 authored by Thomas Munro

Add huge_page_size setting for use on Linux.

This allows the huge page size to be set explicitly.  The default is 0,
meaning it will use the system default, as before.

Author: Odin Ugedal <odin@ugedal.com>
Discussion: https://postgr.es/m/20200608154639.20254-1-odin%40ugedal.com
parent d66b23b0
...@@ -1582,6 +1582,33 @@ include_dir 'conf.d' ...@@ -1582,6 +1582,33 @@ include_dir 'conf.d'
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry id="guc-huge-page-size" xreflabel="huge_page_size">
<term><varname>huge_page_size</varname> (<type>integer</type>)
<indexterm>
<primary><varname>huge_page_size</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Controls the size of huge pages, when they are enabled with
<xref linkend="guc-huge-pages"/>.
The default is zero (<literal>0</literal>).
When set to <literal>0</literal>, the default huge page size on the
system will be used.
</para>
<para>
Some commonly available page sizes on modern 64 bit server architectures include:
<literal>2MB</literal> and <literal>1GB</literal> (Intel and AMD), <literal>16MB</literal> and
<literal>16GB</literal> (IBM POWER), and <literal>64kB</literal>, <literal>2MB</literal>,
<literal>32MB</literal> and <literal>1GB</literal> (ARM). For more information
about usage and support, see <xref linkend="linux-huge-pages"/>.
</para>
<para>
Non-default settings are currently supported only on Linux.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-temp-buffers" xreflabel="temp_buffers"> <varlistentry id="guc-temp-buffers" xreflabel="temp_buffers">
<term><varname>temp_buffers</varname> (<type>integer</type>) <term><varname>temp_buffers</varname> (<type>integer</type>)
<indexterm> <indexterm>
......
...@@ -1391,13 +1391,14 @@ export PG_OOM_ADJUST_VALUE=0 ...@@ -1391,13 +1391,14 @@ export PG_OOM_ADJUST_VALUE=0
using large values of <xref linkend="guc-shared-buffers"/>. To use this using large values of <xref linkend="guc-shared-buffers"/>. To use this
feature in <productname>PostgreSQL</productname> you need a kernel feature in <productname>PostgreSQL</productname> you need a kernel
with <varname>CONFIG_HUGETLBFS=y</varname> and with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to adjust <varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the kernel setting <varname>vm.nr_hugepages</varname>. To estimate the the operating system to provide enough huge pages of the desired size.
number of huge pages needed, start <productname>PostgreSQL</productname> To estimate the number of huge pages needed, start
without huge pages enabled and check the <productname>PostgreSQL</productname> without huge pages enabled and check
postmaster's anonymous shared memory segment size, as well as the system's the postmaster's anonymous shared memory segment size, as well as the
huge page size, using the <filename>/proc</filename> file system. This might system's default and supported huge page sizes, using the
look like: <filename>/proc</filename> and <filename>/sys</filename> file systems.
This might look like:
<programlisting> <programlisting>
$ <userinput>head -1 $PGDATA/postmaster.pid</userinput> $ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
4170 4170
...@@ -1405,27 +1406,40 @@ $ <userinput>pmap 4170 | awk '/rw-s/ &amp;&amp; /zero/ {print $2}'</userinput> ...@@ -1405,27 +1406,40 @@ $ <userinput>pmap 4170 | awk '/rw-s/ &amp;&amp; /zero/ {print $2}'</userinput>
6490428K 6490428K
$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput> $ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
Hugepagesize: 2048 kB Hugepagesize: 2048 kB
$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
hugepages-1048576kB hugepages-2048kB
</programlisting> </programlisting>
In this example the default is 2MB, but you can also explicitly request
either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
Assuming <literal>2MB</literal> huge pages,
<literal>6490428</literal> / <literal>2048</literal> gives approximately <literal>6490428</literal> / <literal>2048</literal> gives approximately
<literal>3169.154</literal>, so in this example we need at <literal>3169.154</literal>, so in this example we need at
least <literal>3170</literal> huge pages, which we can set with: least <literal>3170</literal> huge pages. A larger setting would be
appropriate if other programs on the machine also need huge pages.
We can set this with:
<programlisting>
# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
</programlisting>
Don't forget to add this setting to <filename>/etc/sysctl.conf</filename>
so that it is reapplied after reboots. For non-default huge page sizes,
we can instead use:
<programlisting> <programlisting>
$ <userinput>sysctl -w vm.nr_hugepages=3170</userinput> # <userinput>echo 3170 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages</userinput>
</programlisting> </programlisting>
A larger setting would be appropriate if other programs on the machine It is also possible to provide these settings at boot time using
also need huge pages. Don't forget to add this setting kernel parameters such as <literal>hugepagesz=2M hugepages=3170</literal>.
to <filename>/etc/sysctl.conf</filename> so that it will be reapplied
after reboots.
</para> </para>
<para> <para>
Sometimes the kernel is not able to allocate the desired number of huge Sometimes the kernel is not able to allocate the desired number of huge
pages immediately, so it might be necessary to repeat the command or to pages immediately due to fragmentation, so it might be necessary
reboot. (Immediately after a reboot, most of the machine's memory to repeat the command or to reboot. (Immediately after a reboot, most of
should be available to convert into huge pages.) To verify the huge the machine's memory should be available to convert into huge pages.)
page allocation situation, use: To verify the huge page allocation situation for a given size, use:
<programlisting> <programlisting>
$ <userinput>grep Huge /proc/meminfo</userinput> $ <userinput>cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages</userinput>
</programlisting> </programlisting>
</para> </para>
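
Note on the sizing arithmetic in the documentation hunk above: the page count is the segment size divided by the huge page size, rounded up. A minimal sketch in C, assuming a hypothetical helper `huge_pages_needed` that is not part of this commit and takes both values in kB (the units printed by `pmap` and `/proc/meminfo`):

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the commit: number of huge pages needed
 * to back a shared memory segment, rounded up to whole pages.
 * Both arguments are in kB.
 */
static size_t
huge_pages_needed(size_t segment_kb, size_t huge_page_kb)
{
    if (huge_page_kb == 0)
        return 0;               /* invalid page size; nothing sensible to report */

    /* Integer ceiling division: a partial page still costs a full page. */
    return (segment_kb + huge_page_kb - 1) / huge_page_kb;
}
```

With the figures from the example, `huge_pages_needed(6490428, 2048)` gives 3170, matching the documentation. The same segment with 1GB pages (`1048576` kB) would need 7 pages.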
...@@ -1438,8 +1452,9 @@ $ <userinput>grep Huge /proc/meminfo</userinput> ...@@ -1438,8 +1452,9 @@ $ <userinput>grep Huge /proc/meminfo</userinput>
<para> <para>
The default behavior for huge pages in The default behavior for huge pages in
<productname>PostgreSQL</productname> is to use them when possible and <productname>PostgreSQL</productname> is to use them when possible, with
to fall back to normal pages when failing. To enforce the use of huge the system's default huge page size, and
to fall back to normal pages on failure. To enforce the use of huge
pages, you can set <xref linkend="guc-huge-pages"/> pages, you can set <xref linkend="guc-huge-pages"/>
to <literal>on</literal> in <filename>postgresql.conf</filename>. to <literal>on</literal> in <filename>postgresql.conf</filename>.
Note that with this setting <productname>PostgreSQL</productname> will fail to Note that with this setting <productname>PostgreSQL</productname> will fail to
......
...@@ -32,6 +32,7 @@ ...@@ -32,6 +32,7 @@
#endif #endif
#include "miscadmin.h" #include "miscadmin.h"
#include "port/pg_bitutils.h"
#include "portability/mem.h" #include "portability/mem.h"
#include "storage/dsm.h" #include "storage/dsm.h"
#include "storage/fd.h" #include "storage/fd.h"
...@@ -448,7 +449,7 @@ PGSharedMemoryAttach(IpcMemoryId shmId, ...@@ -448,7 +449,7 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
#ifdef MAP_HUGETLB #ifdef MAP_HUGETLB
/* /*
* Identify the huge page size to use. * Identify the huge page size to use, and compute the related mmap flags.
* *
* Some Linux kernel versions have a bug causing mmap() to fail on requests * Some Linux kernel versions have a bug causing mmap() to fail on requests
* that are not a multiple of the hugepage size. Versions without that bug * that are not a multiple of the hugepage size. Versions without that bug
...@@ -464,25 +465,13 @@ PGSharedMemoryAttach(IpcMemoryId shmId, ...@@ -464,25 +465,13 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* hugepage sizes, we might want to think about more invasive strategies, * hugepage sizes, we might want to think about more invasive strategies,
* such as increasing shared_buffers to absorb the extra space. * such as increasing shared_buffers to absorb the extra space.
* *
* Returns the (real or assumed) page size into *hugepagesize, * Returns the (real, assumed or config provided) page size into *hugepagesize,
* and the hugepage-related mmap flags to use into *mmap_flags. * and the hugepage-related mmap flags to use into *mmap_flags.
*
* Currently *mmap_flags is always just MAP_HUGETLB. Someday, on systems
* that support it, we might OR in additional bits to specify a particular
* non-default huge page size.
*/ */
static void static void
GetHugePageSize(Size *hugepagesize, int *mmap_flags) GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{ {
/* Size default_hugepagesize = 0;
* If we fail to find out the system's default huge page size, assume it
* is 2MB. This will work fine when the actual size is less. If it's
* more, we might get mmap() or munmap() failures due to unaligned
* requests; but at this writing, there are no reports of any non-Linux
* systems being picky about that.
*/
*hugepagesize = 2 * 1024 * 1024;
*mmap_flags = MAP_HUGETLB;
/* /*
* System-dependent code to find out the default huge page size. * System-dependent code to find out the default huge page size.
...@@ -491,6 +480,7 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags) ...@@ -491,6 +480,7 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
* nnnn kB". Ignore any failures, falling back to the preset default. * nnnn kB". Ignore any failures, falling back to the preset default.
*/ */
#ifdef __linux__ #ifdef __linux__
{ {
FILE *fp = AllocateFile("/proc/meminfo", "r"); FILE *fp = AllocateFile("/proc/meminfo", "r");
char buf[128]; char buf[128];
...@@ -505,7 +495,7 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags) ...@@ -505,7 +495,7 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{ {
if (ch == 'k') if (ch == 'k')
{ {
*hugepagesize = sz * (Size) 1024; default_hugepagesize = sz * (Size) 1024;
break; break;
} }
/* We could accept other units besides kB, if needed */ /* We could accept other units besides kB, if needed */
...@@ -515,6 +505,44 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags) ...@@ -515,6 +505,44 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
} }
} }
#endif /* __linux__ */ #endif /* __linux__ */
if (huge_page_size != 0)
{
/* If huge page size is requested explicitly, use that. */
*hugepagesize = (Size) huge_page_size * 1024;
}
else if (default_hugepagesize != 0)
{
/* Otherwise use the system default, if we have it. */
*hugepagesize = default_hugepagesize;
}
else
{
/*
* If we fail to find out the system's default huge page size, or no
* huge page size is requested explicitly, assume it is 2MB. This will
* work fine when the actual size is less. If it's more, we might get
* mmap() or munmap() failures due to unaligned requests; but at this
* writing, there are no reports of any non-Linux systems being picky
* about that.
*/
*hugepagesize = 2 * 1024 * 1024;
}
*mmap_flags = MAP_HUGETLB;
/*
* On recent enough Linux, also include the explicit page size, if
* necessary.
*/
#if defined(MAP_HUGE_MASK) && defined(MAP_HUGE_SHIFT)
if (*hugepagesize != default_hugepagesize)
{
int shift = pg_ceil_log2_64(*hugepagesize);
*mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
}
#endif
} }
#endif /* MAP_HUGETLB */ #endif /* MAP_HUGETLB */
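
Note on the new logic in GetHugePageSize(): the size is chosen in the order explicit setting, system default, 2MB fallback, and a non-default size is encoded into the mmap flags as its base-2 logarithm shifted by MAP_HUGE_SHIFT. The following standalone C sketch mirrors that logic under stated assumptions: the function names are hypothetical, `ceil_log2_u64` stands in for PostgreSQL's `pg_ceil_log2_64`, the fallback macro values are the ones Linux defines, and the flag bits are returned as `unsigned int` rather than OR-ed into an `int` together with MAP_HUGETLB as the commit does.

```c
#include <stddef.h>
#include <stdint.h>

/* Linux's values, assumed here so the sketch compiles without <sys/mman.h>. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_MASK
#define MAP_HUGE_MASK 0x3f
#endif

/* Smallest shift such that (1 << shift) >= n; stand-in for pg_ceil_log2_64. */
static int
ceil_log2_u64(uint64_t n)
{
    int shift = 0;

    while (shift < 63 && ((uint64_t) 1 << shift) < n)
        shift++;
    return shift;
}

/*
 * Pick the huge page size in bytes: the explicit setting (given in kB, as
 * the GUC is), else the system default, else an assumed 2MB.
 */
static size_t
choose_huge_page_size(int requested_kb, size_t system_default_bytes)
{
    if (requested_kb != 0)
        return (size_t) requested_kb * 1024;
    if (system_default_bytes != 0)
        return system_default_bytes;
    return (size_t) 2 * 1024 * 1024;
}

/*
 * Extra mmap flag bits that select a specific huge page size. Returns 0 when
 * the chosen size equals the system default, since plain MAP_HUGETLB already
 * requests that size.
 */
static unsigned int
huge_page_size_flag_bits(size_t size_bytes, size_t system_default_bytes)
{
    if (size_bytes == system_default_bytes)
        return 0;
    return ((unsigned int) ceil_log2_u64(size_bytes) & MAP_HUGE_MASK)
        << MAP_HUGE_SHIFT;
}
```

For example, requesting `huge_page_size = 1GB` on a system whose default is 2MB yields a size of 2^30 bytes and flag bits of `30 << 26`, which is the value Linux names MAP_HUGE_1GB. Leaving the setting at 0 on the same system yields 2MB and no extra bits.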
...@@ -583,7 +611,7 @@ CreateAnonymousSegment(Size *size) ...@@ -583,7 +611,7 @@ CreateAnonymousSegment(Size *size)
"(currently %zu bytes), reduce PostgreSQL's shared " "(currently %zu bytes), reduce PostgreSQL's shared "
"memory usage, perhaps by reducing shared_buffers or " "memory usage, perhaps by reducing shared_buffers or "
"max_connections.", "max_connections.",
*size) : 0)); allocsize) : 0));
} }
*size = allocsize; *size = allocsize;
......
...@@ -20,11 +20,14 @@ ...@@ -20,11 +20,14 @@
#include <float.h> #include <float.h>
#include <math.h> #include <math.h>
#include <limits.h> #include <limits.h>
#include <unistd.h> #ifndef WIN32
#include <sys/mman.h>
#endif
#include <sys/stat.h> #include <sys/stat.h>
#ifdef HAVE_SYSLOG #ifdef HAVE_SYSLOG
#include <syslog.h> #include <syslog.h>
#endif #endif
#include <unistd.h>
#include "access/commit_ts.h" #include "access/commit_ts.h"
#include "access/gin.h" #include "access/gin.h"
...@@ -198,6 +201,7 @@ static bool check_max_wal_senders(int *newval, void **extra, GucSource source); ...@@ -198,6 +201,7 @@ static bool check_max_wal_senders(int *newval, void **extra, GucSource source);
static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource source); static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource source);
static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source); static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source);
static bool check_maintenance_io_concurrency(int *newval, void **extra, GucSource source); static bool check_maintenance_io_concurrency(int *newval, void **extra, GucSource source);
static bool check_huge_page_size(int *newval, void **extra, GucSource source);
static void assign_pgstat_temp_directory(const char *newval, void *extra); static void assign_pgstat_temp_directory(const char *newval, void *extra);
static bool check_application_name(char **newval, void **extra, GucSource source); static bool check_application_name(char **newval, void **extra, GucSource source);
static void assign_application_name(const char *newval, void *extra); static void assign_application_name(const char *newval, void *extra);
...@@ -576,6 +580,7 @@ int ssl_renegotiation_limit; ...@@ -576,6 +580,7 @@ int ssl_renegotiation_limit;
* need to be duplicated in all the different implementations of pg_shmem.c. * need to be duplicated in all the different implementations of pg_shmem.c.
*/ */
int huge_pages; int huge_pages;
int huge_page_size;
/* /*
* These variables are all dummies that don't do anything, except in some * These variables are all dummies that don't do anything, except in some
...@@ -3381,6 +3386,17 @@ static struct config_int ConfigureNamesInt[] = ...@@ -3381,6 +3386,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, assign_tcp_user_timeout, show_tcp_user_timeout NULL, assign_tcp_user_timeout, show_tcp_user_timeout
}, },
{
{"huge_page_size", PGC_POSTMASTER, RESOURCES_MEM,
gettext_noop("The size of huge page that should be requested."),
NULL,
GUC_UNIT_KB
},
&huge_page_size,
0, 0, INT_MAX,
check_huge_page_size, NULL, NULL
},
/* End-of-list marker */ /* End-of-list marker */
{ {
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
...@@ -11565,6 +11581,20 @@ check_maintenance_io_concurrency(int *newval, void **extra, GucSource source) ...@@ -11565,6 +11581,20 @@ check_maintenance_io_concurrency(int *newval, void **extra, GucSource source)
return true; return true;
} }
static bool
check_huge_page_size(int *newval, void **extra, GucSource source)
{
#if !(defined(MAP_HUGE_MASK) && defined(MAP_HUGE_SHIFT))
/* Recent enough Linux only, for now. See GetHugePageSize(). */
if (*newval != 0)
{
GUC_check_errdetail("huge_page_size must be 0 on this platform.");
return false;
}
#endif
return true;
}
static void static void
assign_pgstat_temp_directory(const char *newval, void *extra) assign_pgstat_temp_directory(const char *newval, void *extra)
{ {
......
...@@ -122,6 +122,8 @@ ...@@ -122,6 +122,8 @@
# (change requires restart) # (change requires restart)
#huge_pages = try # on, off, or try #huge_pages = try # on, off, or try
# (change requires restart) # (change requires restart)
#huge_page_size = 0 # zero for system default
# (change requires restart)
#temp_buffers = 8MB # min 800kB #temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature #max_prepared_transactions = 0 # zero disables the feature
# (change requires restart) # (change requires restart)
......
...@@ -44,6 +44,7 @@ typedef struct PGShmemHeader /* standard header for all Postgres shmem */ ...@@ -44,6 +44,7 @@ typedef struct PGShmemHeader /* standard header for all Postgres shmem */
/* GUC variables */ /* GUC variables */
extern int shared_memory_type; extern int shared_memory_type;
extern int huge_pages; extern int huge_pages;
extern int huge_page_size;
/* Possible values for huge_pages */ /* Possible values for huge_pages */
typedef enum typedef enum
......