    Use O_DIRECT if available when using O_SYNC for wal_sync_method. · c34bb005
    Bruce Momjian authored
    Also, write multiple WAL buffers out in one write() operation.
    
    ITAGAKI Takahiro
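The first change in the title, requesting O_DIRECT together with O_SYNC when the platform provides it, can be sketched roughly as below. This is a minimal illustration, not the code in xlog.c. `wal_open_flags()`, `alloc_wal_buffer()`, `SKETCH_O_DIRECT` and the 4096-byte `SKETCH_DIO_ALIGN` are hypothetical names and values. The sketch also assumes that, as on Linux, O_DIRECT I/O needs the buffer address and length to be suitably aligned; the real requirement depends on the OS, filesystem and device.

```c
#define _GNU_SOURCE           /* glibc only exposes O_DIRECT with this */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>

/* "If available": use O_DIRECT only when the platform defines it. */
#ifdef O_DIRECT
#define SKETCH_O_DIRECT O_DIRECT
#else
#define SKETCH_O_DIRECT 0     /* fall back to plain O_SYNC */
#endif

/* Hypothetical alignment for direct-I/O buffers (often 512 or 4096). */
#define SKETCH_DIO_ALIGN 4096

/* open() flags for a WAL file; use_open_sync models wal_sync_method = open_sync. */
static int wal_open_flags(int use_open_sync)
{
    int flags = O_RDWR;

    if (use_open_sync)
        flags |= O_SYNC | SKETCH_O_DIRECT;  /* O_DIRECT only alongside O_SYNC */
    return flags;
}

/* Direct I/O usually needs aligned buffers, so allocate them with
 * posix_memalign(). Returns NULL if the allocation fails. */
static void *alloc_wal_buffer(size_t nbytes)
{
    void *buf = NULL;

    assert(nbytes % SKETCH_DIO_ALIGN == 0);  /* the length must be aligned too */
    if (posix_memalign(&buf, SKETCH_DIO_ALIGN, nbytes) != 0)
        return NULL;
    return buf;
}
```

The flags would be passed to open(2) for each WAL segment. O_DIRECT skips the kernel page cache but not the drive's own write cache, which is why the measurements in the quoted mail distinguish writeback cache on and off.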
    
    ---------------------------------------------------------------------------
    
    > If we disable the writeback cache and use open_sync, the per-page writing
    > behavior of the WAL module results in poor performance. O_DIRECT is similar
    > to O_DSYNC (at least on Linux), so its benefit disappears behind the
    > slow disk rotation.
    >
    > In the current source, WAL is written as:
    >     for (i = 0; i < N; i++) { write(fd, &buffers[i], BLCKSZ); }
    > Is this intentional? Can we rewrite it as follows?
    >     write(fd, &buffers[0], N * BLCKSZ);
    >
    > To achieve this, I wrote a 'gather-write' patch (xlog.gw.diff).
    > Separately, I am also sending the fixed direct I/O patch (xlog.dio.diff).
    > The two patches are independent, so either or both can be applied.
    >
    >
    > I tested them on my machine; the results are as follows. They show that
    > direct I/O combined with gather-write is the best choice when the writeback
    > cache is off. Are these two patches worth applying together?
    >
    >
    > Values are pgbench throughput in transactions per second (tps).
    >
    > patch        | writeback cache | fsync=false | fdatasync | open_sync | fsync_direct | open_direct
    > -------------+-----------------+-------------+-----------+-----------+--------------+------------
    > direct io    | off             |       124.2 |     105.7 |      48.3 |         48.3 |        48.2
    > direct io    | on              |       129.1 |     112.3 |     114.1 |        142.9 |       144.5
    > gather-write | off             |       124.3 |     108.7 |     105.4 |        (N/A) |       (N/A)
    > both         | off             |       131.5 |     115.5 |     114.4 |        145.4 |       145.2
    >
    > - 20 runs of pgbench -s 100 -c 50 -t 200
    >    - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8)
    > - using 2 ATA disks:
    >    - hda (reiserfs) holds the system and WAL.
    >    - hdc (jfs) holds the database files; its writeback cache is always on.
    >
    > ---
    > ITAGAKI Takahiro
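The gather-write idea in the quoted mail (one write() of N * BLCKSZ bytes instead of N per-page calls) can be sketched as follows. It assumes the N pages are contiguous in memory; in a circular array of WAL buffers a single write could presumably only cover a run that does not wrap past the end of the array. `write_all()`, `write_per_page()`, `write_gathered()` and the `write_calls` counter are hypothetical helpers, and BLCKSZ is assumed to be 8192.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define BLCKSZ 8192  /* assumed WAL page size for this sketch */

/* Number of write() system calls issued, to compare the two strategies. */
static int write_calls = 0;

/* Write len bytes, retrying on short writes and EINTR; 0 on success, -1 on error. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        write_calls++;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Per-page behavior described in the mail: one write() per WAL page. */
static int write_per_page(int fd, const char *buffers, int npages)
{
    for (int i = 0; i < npages; i++)
        if (write_all(fd, buffers + (size_t) i * BLCKSZ, BLCKSZ) != 0)
            return -1;
    return 0;
}

/* Gather-write: contiguous pages go out in a single write() call. */
static int write_gathered(int fd, const char *buffers, int npages)
{
    assert(npages > 0);
    return write_all(fd, buffers, (size_t) npages * BLCKSZ);
}
```

With open_sync every write() returns only after the data reaches the disk, so with the writeback cache off each extra call can cost roughly a disk rotation. Collapsing N calls into one is consistent with the jump from 48.3 to 105.4 in the open_sync column between the direct io and gather-write rows.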