README.intarray 5.43 KB
Newer Older
1
This is an implementation of RD-tree data structure using GiST interface
2 3 4 5 6 7
of PostgreSQL. It has built-in lossy compression.

Current implementation provides index support for one-dimensional array of
int4's - gist__int_ops, suitable for small and medium size of arrays (used on
default), and gist__intbig_ops for indexing large arrays (we use superimposed
signature with length of 4096 bits to represent sets).
8

9 10
All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist
11
for additional information. Andrey Oktyabrski did a great work on 
Bruce Momjian's avatar
Bruce Momjian committed
12 13 14 15 16 17
adding new functions and operations.


FUNCTIONS:

  int   icount(int[]) - the number of elements in intarray
18 19 20 21 22 23 24

test=# select icount('{1,2,3}'::int[]);   
 icount 
--------
      3
(1 row)

Bruce Momjian's avatar
Bruce Momjian committed
25
  int[] sort(int[], 'asc' | 'desc') - sort intarray
26 27 28 29 30 31 32

test=# select sort('{1,2,3}'::int[],'desc');
  sort   
---------
 {3,2,1}
(1 row)

Bruce Momjian's avatar
Bruce Momjian committed
33 34
  int[] sort(int[]) - sort in ascending order
  int[] sort_asc(int[]),sort_desc(int[]) - shortcuts for sort 
35

Bruce Momjian's avatar
Bruce Momjian committed
36
  int[] uniq(int[]) - returns unique elements
37 38 39 40 41 42 43

test=# select uniq(sort('{1,2,3,2,1}'::int[]));
  uniq   
---------
 {1,2,3}
(1 row)

Bruce Momjian's avatar
Bruce Momjian committed
44 45
  int   idx(int[], int item) - returns index of first intarray matching element to item, or
                              '0' if matching failed.
46 47 48 49 50 51 52 53

test=# select idx('{1,2,3,2,1}'::int[],2);
 idx 
-----
   2
(1 row)


Bruce Momjian's avatar
Bruce Momjian committed
54 55 56
  int[] subarray(int[],int START [, int LEN]) - returns part of intarray starting from
                                                element number START (from 1) and length LEN. 

57 58 59 60 61 62
test=# select subarray('{1,2,3,2,1}'::int[],2,3);
 subarray 
----------
 {2,3,2}
(1 row)

Teodor Sigaev's avatar
Teodor Sigaev committed
63 64 65 66 67 68 69
  int[] intset(int4) - casting int4 to int[]

test=# select intset(1);
 intset 
--------
 {1}
(1 row)
70

Bruce Momjian's avatar
Bruce Momjian committed
71 72 73 74 75 76 77 78 79
OPERATIONS:

  int[] && int[]  - overlap - returns TRUE if arrays has at least one common elements.
  int[] @  int[]  - contains - returns TRUE if left array contains right array
  int[] ~ int[]   - contained - returns TRUE if left array is contained in right array
  # int[]         - return the number of elements in array
  int[] + int     - push element to array ( add to end of array)
  int[] + int[]   - merge of arrays (right array added to the end of left one)
  int[] - int     - remove entries matched by right argument from array
80
  int[] - int[]   - remove right array from left
Bruce Momjian's avatar
Bruce Momjian committed
81 82 83
  int[] | int     - returns intarray - union of arguments
  int[] | int[]   - returns intarray as a union of two arrays
  int[] & int[]   - returns intersection of arrays
84 85
  int[] @@ query_int  - returns TRUE if array satisfies query (like '1&(2|3)') 
  query_int ~~ int[]  - -/-
86

87 88
CHANGES:

Bruce Momjian's avatar
Bruce Momjian committed
89 90 91 92
August 6, 2002
   1. Reworked patch from Andrey Oktyabrski (ano@spider.ru) with
      functions: icount, sort, sort_asc, uniq, idx, subarray
      operations: #, +, -, |, &
93 94
October 1, 2001
   1. Change search method in array to binary
Tom Lane's avatar
Tom Lane committed
95 96 97
September 28, 2001
   1. gist__int_ops now is without lossy
   2. add sort entry in picksplit
98 99 100 101
September 21, 2001
   1. Added support for boolean query (indexable operator @@, looks like
      a @@ '1|(2&3)', perfomance is better in any case )
   2. Done some small optimizations
102 103 104 105
March 19, 2001
   1. Added support for toastable keys
   2. Improved split algorithm for intbig (selection speedup is about 30%)

106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124
INSTALLATION:

  gmake
  gmake install
  -- load functions
  psql <database> < _int.sql 

REGRESSION TEST:

   gmake installcheck

EXAMPLE USAGE:

  create table message (mid int not null,sections int[]);
  create table message_section_map (mid int not null,sid int not null);

  -- create indices
CREATE unique index message_key on message ( mid );
CREATE unique index message_section_map_key2 on message_section_map (sid, mid );
125
CREATE INDEX message_rdtree_idx on message using gist ( sections gist__int_ops);
126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178

  -- select some messages with section in 1 OR 2 - OVERLAP operator
  select message.mid from message where message.sections && '{1,2}';  

  -- select messages contains in sections 1 AND 2 - CONTAINS operator
  select message.mid from message where message.sections @ '{1,2}';
  -- the same, CONTAINED operator
  select message.mid from message where '{1,2}' ~ message.sections;

BENCHMARK:

  subdirectory bench contains benchmark suite.
  cd ./bench
  1. createdb TEST
  2. psql TEST < ../_int.sql
  3. ./create_test.pl | psql TEST
  4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
                  with/without RD-Tree. Run script without arguments to 
                  see availbale options.

     a)test without RD-Tree (OR)
       ./bench.pl -d TEST -s 1,2 -v
     b)test with RD-Tree 
       ./bench.pl -d TEST -s 1,2 -v -r

BENCHMARKS:

Size of table <message>: 200000
Size of table <message_section_map>: 268538 

Distribution of messages by sections:

section 0: 73899 messages
section 1: 16298 messages
section 50: 1241 messages
section 99: 705 messages

old - without RD-Tree support,
new - with RD-Tree

+----------+---------------+----------------+
|Search set|OR, time in sec|AND, time in sec|
|          +-------+-------+--------+-------+
|          |  old  |  new  |   old  |  new  |
+----------+-------+-------+--------+-------+
|         1|  1.427|  0.215|       -|      -|
+----------+-------+-------+--------+-------+
|        99|  1.029|  0.018|       -|      -|
+----------+-------+-------+--------+-------+
|       1,2|  1.829|  0.334|   5.654|  0.042|
+----------+-------+-------+--------+-------+
| 1,2,50,60|  2.057|  0.359|   5.044|  0.007|
+----------+-------+-------+--------+-------+