Commit 0d5f05cd authored by Peter Eisentraut

Allow multi-inserts during COPY into a partitioned table

CopyFrom allows multi-inserts to be used for non-partitioned tables, but
this was disabled for partitioned tables.  The reason for this appeared
to be that the tuple may not belong to the same partition as the
previous tuple did.  Not allowing multi-inserts here greatly slowed down
imports into partitioned tables.  These could take twice as long as a
copy to an equivalent non-partitioned table.  It seems wise to do
something about this, so this change allows the multi-inserts by
flushing the so-far inserted tuples to the partition when the next tuple
does not belong to the same partition, or when the buffer fills.  This
improves performance when the next tuple in the stream commonly belongs
to the same partition as the previous tuple.

In cases where the target partition changes on every tuple, using
multi-inserts slightly slows the performance.  To get around this we
track the average size of the batches that have been inserted and
adaptively enable or disable multi-inserts based on the size of the
batch.  Some testing was done and the regression only seems to exist
when the average size of the insert batch is close to 1, so let's just
enable multi-inserts when the average size is at least 1.3.  More
performance testing might reveal a better number for this, but since
the slowdown was only 1-2% it does not seem critical enough to spend too
much time calculating it.  In any case it may depend on other factors
rather than just the size of the batch.

Allowing multi-inserts for partitions required a bit of work around the
per-tuple memory contexts as we must flush the tuples when the next
tuple does not belong to the same partition.  In which case there is no
good time to reset the per-tuple context, as we've already built the new
tuple by this time.  In order to work around this we maintain two
per-tuple contexts and just switch between them every time the partition
changes and reset the old one.  This does mean that the first of each
batch of tuples is not allocated in the same memory context as the
others, but that does not matter since we only reset the context once
the previous batch has been inserted.

Author: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
parent b6d6488a
This diff is collapsed.
...@@ -774,7 +774,8 @@ HeapTuple ...@@ -774,7 +774,8 @@ HeapTuple
ConvertPartitionTupleSlot(TupleConversionMap *map, ConvertPartitionTupleSlot(TupleConversionMap *map,
HeapTuple tuple, HeapTuple tuple,
TupleTableSlot *new_slot, TupleTableSlot *new_slot,
TupleTableSlot **p_my_slot) TupleTableSlot **p_my_slot,
bool shouldFree)
{ {
if (!map) if (!map)
return tuple; return tuple;
...@@ -787,7 +788,7 @@ ConvertPartitionTupleSlot(TupleConversionMap *map, ...@@ -787,7 +788,7 @@ ConvertPartitionTupleSlot(TupleConversionMap *map,
*p_my_slot = new_slot; *p_my_slot = new_slot;
Assert(new_slot != NULL); Assert(new_slot != NULL);
ExecSetSlotDescriptor(new_slot, map->outdesc); ExecSetSlotDescriptor(new_slot, map->outdesc);
ExecStoreTuple(tuple, new_slot, InvalidBuffer, true); ExecStoreTuple(tuple, new_slot, InvalidBuffer, shouldFree);
return tuple; return tuple;
} }
......
...@@ -1164,7 +1164,8 @@ lreplace:; ...@@ -1164,7 +1164,8 @@ lreplace:;
tuple = ConvertPartitionTupleSlot(tupconv_map, tuple = ConvertPartitionTupleSlot(tupconv_map,
tuple, tuple,
proute->root_tuple_slot, proute->root_tuple_slot,
&slot); &slot,
true);
/* /*
* Prepare for tuple routing, making it look like we're inserting * Prepare for tuple routing, making it look like we're inserting
...@@ -1792,7 +1793,8 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate, ...@@ -1792,7 +1793,8 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
ConvertPartitionTupleSlot(proute->parent_child_tupconv_maps[partidx], ConvertPartitionTupleSlot(proute->parent_child_tupconv_maps[partidx],
tuple, tuple,
proute->partition_tuple_slot, proute->partition_tuple_slot,
&slot); &slot,
true);
/* Initialize information needed to handle ON CONFLICT DO UPDATE. */ /* Initialize information needed to handle ON CONFLICT DO UPDATE. */
Assert(mtstate != NULL); Assert(mtstate != NULL);
......
...@@ -205,7 +205,8 @@ extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute, ...@@ -205,7 +205,8 @@ extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map, extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
HeapTuple tuple, HeapTuple tuple,
TupleTableSlot *new_slot, TupleTableSlot *new_slot,
TupleTableSlot **p_my_slot); TupleTableSlot **p_my_slot,
bool shouldFree);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate, extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute); PartitionTupleRouting *proute);
extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate, extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
......
...@@ -133,3 +133,33 @@ this is just a line full of junk that would error out if parsed ...@@ -133,3 +133,33 @@ this is just a line full of junk that would error out if parsed
\. \.
copy copytest3 to stdout csv header; copy copytest3 to stdout csv header;
-- test copy from with a partitioned table
create table parted_copytest (
a int,
b int,
c text
) partition by list (b);
create table parted_copytest_a1 (c text, b int, a int);
create table parted_copytest_a2 (a int, c text, b int);
alter table parted_copytest attach partition parted_copytest_a1 for values in(1);
alter table parted_copytest attach partition parted_copytest_a2 for values in(2);
-- We must insert enough rows to trigger multi-inserts. These are only
-- enabled adaptively when there are few enough partition changes.
insert into parted_copytest select x,1,'One' from generate_series(1,1000) x;
insert into parted_copytest select x,2,'Two' from generate_series(1001,1010) x;
insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
select tableoid::regclass,count(*),sum(a) from parted_copytest
group by tableoid order by tableoid::regclass::name;
drop table parted_copytest;
...@@ -95,3 +95,30 @@ copy copytest3 to stdout csv header; ...@@ -95,3 +95,30 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote" c1,"col with , comma","col with "" quote"
1,a,1 1,a,1
2,b,2 2,b,2
-- test copy from with a partitioned table
create table parted_copytest (
a int,
b int,
c text
) partition by list (b);
create table parted_copytest_a1 (c text, b int, a int);
create table parted_copytest_a2 (a int, c text, b int);
alter table parted_copytest attach partition parted_copytest_a1 for values in(1);
alter table parted_copytest attach partition parted_copytest_a2 for values in(2);
-- We must insert enough rows to trigger multi-inserts. These are only
-- enabled adaptively when there are few enough partition changes.
insert into parted_copytest select x,1,'One' from generate_series(1,1000) x;
insert into parted_copytest select x,2,'Two' from generate_series(1001,1010) x;
insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
select tableoid::regclass,count(*),sum(a) from parted_copytest
group by tableoid order by tableoid::regclass::name;
tableoid | count | sum
--------------------+-------+--------
parted_copytest_a1 | 1010 | 510655
parted_copytest_a2 | 10 | 10055
(2 rows)
drop table parted_copytest;
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment