array.sgml 18.1 KB
Newer Older
1
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/array.sgml,v 1.32 2003/11/01 01:56:28 petere Exp $ -->
2

3
<sect1 id="arrays">
4 5
 <title>Arrays</title>

6
 <indexterm>
Peter Eisentraut's avatar
Peter Eisentraut committed
7
  <primary>array</primary>
8 9
 </indexterm>

10
 <para>
11
  <productname>PostgreSQL</productname> allows columns of a table to be
12
  defined as variable-length multidimensional arrays. Arrays of any
13 14 15 16 17 18 19 20
  built-in type or user-defined type can be created.
 </para>

 <sect2>
  <title>Declaration of Array Types</title>

 <para>
  To illustrate the use of array types, we create this table:
21
<programlisting>
22
CREATE TABLE sal_emp (
23
    name            text,
24
    pay_by_quarter  integer[],
25
    schedule        text[][]
26
);
27
</programlisting>
28
  As shown, an array data type is named by appending square brackets
29 30 31 32 33 34 35 36 37
  (<literal>[]</>) to the data type name of the array elements.  The
  above command will create a table named
  <structname>sal_emp</structname> with a column of type
  <type>text</type> (<structfield>name</structfield>), a
  one-dimensional array of type <type>integer</type>
  (<structfield>pay_by_quarter</structfield>), which represents the
  employee's salary by quarter, and a two-dimensional array of
  <type>text</type> (<structfield>schedule</structfield>), which
  represents the employee's weekly schedule.
38
 </para>
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73

 <para>
  The syntax for <command>CREATE TABLE</command> allows the exact size of
  arrays to be specified, for example:

<programlisting>
CREATE TABLE tictactoe (
    squares   integer[3][3]
);
</programlisting>

  However, the current implementation does not enforce the array size
  limits --- the behavior is the same as for arrays of unspecified
  length.
 </para>

 <para>
  Actually, the current implementation does not enforce the declared
  number of dimensions either.  Arrays of a particular element type are
  all considered to be of the same type, regardless of size or number
  of dimensions.  So, declaring number of dimensions or sizes in
  <command>CREATE TABLE</command> is simply documentation, it does not
  affect runtime behavior.
 </para>

 <para>
  An alternative, SQL99-standard syntax may be used for one-dimensional arrays.
  <structfield>pay_by_quarter</structfield> could have been defined as:
<programlisting>
    pay_by_quarter  integer ARRAY[4],
</programlisting>
  This syntax requires an integer constant to denote the array size.
  As before, however, <productname>PostgreSQL</> does not enforce the
  size restriction.
 </para>
74 75 76 77
 </sect2>

 <sect2>
  <title>Array Value Input</title>
78 79

 <para>
80
  Now we can show some <command>INSERT</command> statements.  To write an array
81 82 83 84 85
  value as a literal constant, we enclose the element values within curly
  braces and separate them by commas.  (If you know C, this is not unlike the
  C syntax for initializing structures.)  We may put double quotes around any
  element value, and must do so if it contains commas or curly braces.
  (More details appear below.)
86

87
<programlisting>
88
INSERT INTO sal_emp
89 90 91 92
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {}}');

93
INSERT INTO sal_emp
94 95 96
    VALUES ('Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"talk", "consult"}, {"meeting"}}');
97 98 99
</programlisting>
 </para>

100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
 <para>
  A limitation of the present array implementation is that individual
  elements of an array cannot be SQL null values.  The entire array can be set
  to null, but you can't have an array with some elements null and some
  not.
 </para>
 <para>
  This can lead to surprising results. For example, the result of the
  previous two inserts looks like this:
<programlisting>
SELECT * FROM sal_emp;
 name  |      pay_by_quarter       |      schedule
-------+---------------------------+--------------------
 Bill  | {10000,10000,10000,10000} | {{meeting},{""}}
 Carol | {20000,25000,25000,25000} | {{talk},{meeting}}
(2 rows)
</programlisting>
  Because the <literal>[2][2]</literal> element of
  <structfield>schedule</structfield> is missing in each of the
  <command>INSERT</command> statements, the <literal>[1][2]</literal>
  element is discarded.
 </para>

123 124
 <note>
  <para>
125
   Fixing this is on the to-do list.
126 127
  </para>
 </note>
128 129

 <para>
130
  The <literal>ARRAY</literal> expression syntax may also be used:
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
<programlisting>
INSERT INTO sal_emp
    VALUES ('Bill',
    ARRAY[10000, 10000, 10000, 10000],
    ARRAY[['meeting', 'lunch'], ['','']]);

INSERT INTO sal_emp
    VALUES ('Carol',
    ARRAY[20000, 25000, 25000, 25000],
    ARRAY[['talk', 'consult'], ['meeting', '']]);
SELECT * FROM sal_emp;
 name  |      pay_by_quarter       |           schedule
-------+---------------------------+-------------------------------
 Bill  | {10000,10000,10000,10000} | {{meeting,lunch},{"",""}}
 Carol | {20000,25000,25000,25000} | {{talk,consult},{meeting,""}}
(2 rows)
</programlisting>
  Note that with this syntax, multidimensional arrays must have matching
149 150 151
  extents for each dimension. A mismatch causes an error report, rather than
  silently discarding values as in the previous case.
  For example:
152 153 154 155 156
<programlisting>
INSERT INTO sal_emp
    VALUES ('Carol',
    ARRAY[20000, 25000, 25000, 25000],
    ARRAY[['talk', 'consult'], ['meeting']]);
157
ERROR:  multidimensional arrays must have array expressions with matching dimensions
158
</programlisting>
159 160 161 162 163
  Also notice that the array elements are ordinary SQL constants or
  expressions; for instance, string literals are single quoted, instead of
  double quoted as they would be in an array literal.  The <literal>ARRAY</>
  expression syntax is discussed in more detail in <xref
  linkend="sql-syntax-array-constructors">.
164
 </para>
165 166 167
 </sect2>

 <sect2>
168
  <title>Accessing Arrays</title>
169

170
 <para>
171
  Now, we can run some queries on the table.
172 173 174
  First, we show how to access a single element of an array at a time.
  This query retrieves the names of the employees whose pay changed in
  the second quarter:
175
     
176
<programlisting>
177
SELECT name FROM sal_emp WHERE pay_by_quarter[1] &lt;&gt; pay_by_quarter[2];
178

179 180 181 182
 name
-------
 Carol
(1 row)
183
</programlisting>
184

185
  The array subscript numbers are written within square brackets.
186
  By default <productname>PostgreSQL</productname> uses the
187
  one-based numbering convention for arrays, that is,
Peter Eisentraut's avatar
Peter Eisentraut committed
188 189
  an array of <replaceable>n</> elements starts with <literal>array[1]</literal> and
  ends with <literal>array[<replaceable>n</>]</literal>.
190
 </para>
191

192 193
 <para>
  This query retrieves the third quarter pay of all employees:
194
     
195
<programlisting>
196
SELECT pay_by_quarter[3] FROM sal_emp;
197

198 199 200 201 202
 pay_by_quarter
----------------
          10000
          25000
(2 rows)
203 204 205 206 207 208
</programlisting>
 </para>

 <para>
  We can also access arbitrary rectangular slices of an array, or
  subarrays.  An array slice is denoted by writing
209
  <literal><replaceable>lower-bound</replaceable>:<replaceable>upper-bound</replaceable></literal>
210
  for one or more array dimensions.  For example, this query retrieves the first
211
  item on Bill's schedule for the first two days of the week:
212
     
213
<programlisting>
214
SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
215

216 217
      schedule
--------------------
218
 {{meeting},{""}}
219
(1 row)
220
</programlisting>
221

222
  We could also have written
223

224
<programlisting>
225
SELECT schedule[1:2][1] FROM sal_emp WHERE name = 'Bill';
226
</programlisting>
227

228
  with the same result.  An array subscripting operation is always taken to
229
  represent an array slice if any of the subscripts are written in the form
230 231
  <literal><replaceable>lower</replaceable>:<replaceable>upper</replaceable></literal>.
  A lower bound of 1 is assumed for any subscript where only one value
232
  is specified, as in this example:
233 234 235 236 237 238 239 240 241 242
<programlisting>
SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill';
         schedule
---------------------------
 {{meeting,lunch},{"",""}}
(1 row)
</programlisting>
 </para>

 <para>
243 244 245
  The current dimensions of any array value can be retrieved with the
  <function>array_dims</function> function:

246
<programlisting>
247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267
SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';

 array_dims
------------
 [1:2][1:1]
(1 row)
</programlisting>

  <function>array_dims</function> produces a <type>text</type> result,
  which is convenient for people to read but perhaps not so convenient
  for programs.  Dimensions can also be retrieved with
  <function>array_upper</function> and <function>array_lower</function>,
  which return the upper and lower bound of a
  specified array dimension, respectively.

<programlisting>
SELECT array_upper(schedule, 1) FROM sal_emp WHERE name = 'Carol';

 array_upper
-------------
           2
268 269
(1 row)
</programlisting>
270
 </para>
271 272 273 274
 </sect2>

 <sect2>
  <title>Modifying Arrays</title>
275

276 277
 <para>
  An array value can be replaced completely:
278

279
<programlisting>
280 281
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
    WHERE name = 'Carol';
282
</programlisting>
283

284
  or using the <literal>ARRAY</literal> expression syntax:
285 286 287 288 289 290 291

<programlisting>
UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
    WHERE name = 'Carol';
</programlisting>

  An array may also be updated at a single element:
292

293
<programlisting>
294 295
UPDATE sal_emp SET pay_by_quarter[4] = 15000
    WHERE name = 'Bill';
296
</programListing>
297

298
  or updated in a slice:
299

300
<programlisting>
301 302
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
    WHERE name = 'Carol';
303
</programlisting>
304

305 306 307
 </para>

 <para>
308
  A stored array value can be enlarged by assigning to an element adjacent to
309
  those already present, or by assigning to a slice that is adjacent
310 311 312 313 314
  to or overlaps the data already present.  For example, if array
  <literal>myarray</> currently has 4 elements, it will have five
  elements after an update that assigns to <literal>myarray[5]</>.
  Currently, enlargement in this fashion is only allowed for one-dimensional
  arrays, not multidimensional arrays.
315 316
 </para>

317 318
 <para>
  Array slice assignment allows creation of arrays that do not use one-based
319
  subscripts.  For example one might assign to <literal>myarray[-2:7]</> to
320 321 322
  create an array with subscript values running from -2 to 7.
 </para>

323
 <para>
324 325
  New array values can also be constructed by using the concatenation operator,
  <literal>||</literal>.
326 327
<programlisting>
SELECT ARRAY[1,2] || ARRAY[3,4];
328 329 330
 ?column?
-----------
 {1,2,3,4}
331 332 333 334 335 336 337 338
(1 row)

SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];
      ?column?
---------------------
 {{5,6},{1,2},{3,4}}
(1 row)
</programlisting>
339
 </para>
340

341
 <para>
342
  The concatenation operator allows a single element to be pushed on to the
343
  beginning or end of a one-dimensional array. It also accepts two
344
  <replaceable>N</>-dimensional arrays, or an <replaceable>N</>-dimensional
345 346
  and an <replaceable>N+1</>-dimensional array.
 </para>
347

348 349 350
 <para>
  When a single element is pushed on to the beginning of a one-dimensional
  array, the result is an array with a lower bound subscript equal to
351
  the right-hand operand's lower bound subscript, minus one. When a single
352
  element is pushed on to the end of a one-dimensional array, the result is
353
  an array retaining the lower bound of the left-hand operand. For example:
354
<programlisting>
355
SELECT array_dims(1 || ARRAY[2,3]);
356 357 358 359
 array_dims
------------
 [0:2]
(1 row)
360 361 362 363 364 365 366 367 368 369 370

SELECT array_dims(ARRAY[1,2] || 3);
 array_dims
------------
 [1:3]
(1 row)
</programlisting>
 </para>

 <para>
  When two arrays with an equal number of dimensions are concatenated, the
371 372 373
  result retains the lower bound subscript of the left-hand operand's outer
  dimension. The result is an array comprising every element of the left-hand
  operand followed by every element of the right-hand operand. For example:
374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400
<programlisting>
SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);
 array_dims
------------
 [1:5]
(1 row)

SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);
 array_dims
------------
 [1:5][1:2]
(1 row)
</programlisting>
 </para>

 <para>
  When an <replaceable>N</>-dimensional array is pushed on to the beginning
  or end of an <replaceable>N+1</>-dimensional array, the result is
  analogous to the element-array case above. Each <replaceable>N</>-dimensional
  sub-array is essentially an element of the <replaceable>N+1</>-dimensional
  array's outer dimension. For example:
<programlisting>
SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);
 array_dims
------------
 [0:2][1:2]
(1 row)
401 402 403 404
</programlisting>
 </para>

 <para>
405
  An array can also be constructed by using the functions
406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428
  <function>array_prepend</function>, <function>array_append</function>,
  or <function>array_cat</function>. The first two only support one-dimensional
  arrays, but <function>array_cat</function> supports multidimensional arrays.

  Note that the concatenation operator discussed above is preferred over
  direct use of these functions. In fact, the functions are primarily for use
  in implementing the concatenation operator. However, they may be directly
  useful in the creation of user-defined aggregates. Some examples:

<programlisting>
SELECT array_prepend(1, ARRAY[2,3]);
 array_prepend
---------------
 {1,2,3}
(1 row)

SELECT array_append(ARRAY[1,2], 3);
 array_append
--------------
 {1,2,3}
(1 row)

SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
429 430 431
 array_cat
-----------
 {1,2,3,4}
432 433 434 435 436 437 438 439 440 441 442 443 444 445
(1 row)

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
      array_cat
---------------------
 {{1,2},{3,4},{5,6}}
(1 row)

SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
      array_cat
---------------------
 {{5,6},{1,2},{3,4}}
</programlisting>
 </para>
446 447 448 449
 </sect2>

 <sect2>
  <title>Searching in Arrays</title>
450 451 452

 <para>
  To search for a value in an array, you must check each value of the
453
  array. This can be done by hand, if you know the size of the array.
454
  For example:
455 456 457 458 459 460 461 462 463

<programlisting>
SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                            pay_by_quarter[2] = 10000 OR
                            pay_by_quarter[3] = 10000 OR
                            pay_by_quarter[4] = 10000;
</programlisting>

  However, this quickly becomes tedious for large arrays, and is not
464 465 466
  helpful if the size of the array is uncertain. An alternative method is
  described in <xref linkend="functions-comparisons">. The above
  query could be replaced by:
467 468

<programlisting>
469
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
470 471 472
</programlisting>

  In addition, you could find rows where the array had all values
473
  equal to 10000 with:
474 475

<programlisting>
476
SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
477 478 479 480 481 482
</programlisting>

 </para>

 <tip>
  <para>
483 484 485 486 487
   Arrays are not sets; searching for specific array elements
   may be a sign of database misdesign.  Consider
   using a separate table with a row for each item that would be an
   array element.  This will be easier to search, and is likely to
   scale up better to large numbers of elements.
488 489
  </para>
 </tip>
490
 </sect2>
491

492 493
 <sect2>
  <title>Array Input and Output Syntax</title>
494

495
  <para>
496
   The external text representation of an array value consists of items that
497 498 499 500 501 502
   are interpreted according to the I/O conversion rules for the array's
   element type, plus decoration that indicates the array structure.
   The decoration consists of curly braces (<literal>{</> and <literal>}</>)
   around the array value plus delimiter characters between adjacent items.
   The delimiter character is usually a comma (<literal>,</>) but can be
   something else: it is determined by the <literal>typdelim</> setting
503
   for the array's element type.  (Among the standard data types provided
504 505 506 507 508 509 510 511 512 513
   in the <productname>PostgreSQL</productname> distribution, type
   <literal>box</> uses a semicolon (<literal>;</>) but all the others
   use comma.)  In a multidimensional array, each dimension (row, plane,
   cube, etc.) gets its own level of curly braces, and delimiters
   must be written between adjacent curly-braced entities of the same level.
   You may write whitespace before a left brace, after a right
   brace, or before any individual item string.  Whitespace after an item
   is not ignored, however: after skipping leading whitespace, everything
   up to the next right brace or delimiter is taken as the item value.
  </para>
514 515

  <para>
516
   As shown previously, when writing an array value you may write double
517 518 519
   quotes around any individual array
   element.  You <emphasis>must</> do so if the element value would otherwise
   confuse the array-value parser.  For example, elements containing curly
520 521
   braces, commas (or whatever the delimiter character is), double quotes,
   backslashes, or leading white space must be double-quoted.  To put a double
522 523
   quote or backslash in a quoted array element value, precede it with a
   backslash.
524 525
   Alternatively, you can use backslash-escaping to protect all data characters
   that would otherwise be taken as array syntax or ignorable white space.
526 527
  </para>

528 529 530 531 532
  <para>
   The array output routine will put double quotes around element values
   if they are empty strings or contain curly braces, delimiter characters,
   double quotes, backslashes, or white space.  Double quotes and backslashes
   embedded in element values will be backslash-escaped.  For numeric
533 534
   data types it is safe to assume that double quotes will never appear, but
   for textual data types one should be prepared to cope with either presence
535 536 537 538
   or absence of quotes.  (This is a change in behavior from pre-7.2
   <productname>PostgreSQL</productname> releases.)
  </para>

539
 <note>
540
  <para>
541
   Remember that what you write in an SQL command will first be interpreted
542 543 544 545 546 547 548 549
   as a string literal, and then as an array.  This doubles the number of
   backslashes you need.  For example, to insert a <type>text</> array
   value containing a backslash and a double quote, you'd need to write
<programlisting>
INSERT ... VALUES ('{"\\\\","\\""}');
</programlisting>
   The string-literal processor removes one level of backslashes, so that
   what arrives at the array-value parser looks like <literal>{"\\","\""}</>.
550
   In turn, the strings fed to the <type>text</> data type's input routine
551
   become <literal>\</> and <literal>"</> respectively.  (If we were working
552
   with a data type whose input routine also treated backslashes specially,
553
   <type>bytea</> for example, we might need as many as eight backslashes
554
   in the command to get one backslash into the stored array element.)
555
  </para>
556
 </note>
557 558 559 560 561 562 563 564 565

 <tip>
  <para>
   The <literal>ARRAY</> constructor syntax is often easier to work with
   than the array-literal syntax when writing array values in SQL commands.
   In <literal>ARRAY</>, individual element values are written the same way
   they would be written when not members of an array.
  </para>
 </tip>
566
 </sect2>
567

568
</sect1>