Gendict - generate dictionary templates for contrib/tsearch2 module.

This utility helps people create dictionaries for the contrib/tsearch v2
module. In particular, it has built-in support for Snowball stemmers.

The programming API for tsearch2 dictionaries is described in the tsearch v2
documentation.


Prerequisites:

* PostgreSQL 7.3 or above.

* Sources of the tsearch2 module, already compiled.

* Rights to install contrib modules.


Usage:

    Run without parameters to see options and arguments.

./ -n DICTNAME ( [ -s [ -p PREFIX ] ] | [ -c CFILES ] [ -h HFILES ] [ -i ] ) [ -v ] [ -d DIR ] [ -C COMMENT ]
    -v - be verbose
    -d DIR - name of directory in PGSQL_SRC/contrib (default dict_DICTNAME)
    -C COMMENT - dictionary comment
Generate Snowball stemmer:
./ -n DICTNAME -s [ -p PREFIX ] [ -v ] [ -d DIR ] [ -C COMMENT ]
    -s - generate Snowball wrapper
    -p - prefix of Snowball's function, (default DICTNAME)
Generate template dictionary:
./ -n DICTNAME [ -c CFILES ] [ -h HFILES ] [ -i ] [ -v ] [ -d DIR ] [ -C COMMENT ]
    -c CFILES - source files, must be placed in contrib/tsearch2/gendict directory.
                These files will be used in Makefile.
    -h HFILES - header files, must be placed in contrib/tsearch2/gendict directory.
                These files will be used in Makefile and subinclude.h
    -i - dictionary has init method

Example 1:

   Create Portuguese stemmer
   0. cd PGSQL_SRC/contrib/tsearch2/gendict

   1. Obtain stem.{c,h} files for Portuguese

   2. Create template files for Portuguese

      ./ -n pt -s -p portuguese_ISO_8859_1 -v -C'Snowball stemmer for Portuguese'

      Note that the argument of the -p option must be *the same* as the name
      of the stemming function in stem.c (without the _stem suffix).

      The generated files are placed in PGSQL_SRC/contrib/dict_pt.
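
To find the right value for -p, the stemming function name can be read out
of stem.c. The following is a minimal sketch, not part of gendict: the helper
name stem_prefixes is hypothetical, and it assumes the stemming function is
declared as PREFIX_stem( ... ) in the file you pass in.

```shell
# Hypothetical helper: list candidate -p prefixes found in a C source file.
# It looks for identifiers that end in _stem and are followed by "(",
# then strips the _stem suffix.
stem_prefixes() {
    grep -oE '[A-Za-z_][A-Za-z0-9_]*_stem[[:space:]]*\(' "$1" |
        sed -E 's/_stem[[:space:]]*\($//' |
        sort -u
}

# Only run against stem.c if it is present in the current directory.
if [ -f stem.c ]; then
    stem_prefixes stem.c
fi
```

If more than one name is printed, pick the one that matches the stemmer you
obtained in step 1.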

   3. Compile and install dictionary

	cd PGSQL_SRC/contrib/dict_pt
	make install

   4. Test it 

	Sample Portuguese words with their stemmed forms are available from
	the Snowball project.

	createdb testdict
	psql testdict < /usr/local/pgsql/share/contrib/tsearch2.sql
	psql testdict < /usr/local/pgsql/share/contrib/dict_pt.sql
	psql -d testdict -c "select lexize('pt','bobagem');"
	(1 row)

	Here is the corresponding entry in the pg_ts_dict table:

	psql -d testdict -c "select * from pg_ts_dict where dict_name='pt';"
	 dict_name |     dict_init      | dict_initoption |              dict_lexize              |          dict_comment           
	-----------+--------------------+-----------------+---------------------------------------+---------------------------------
	 pt        | dinit_pt(internal) |                 | snb_lexize(internal,internal,integer) | Snowball stemmer for Portuguese
	(1 row)

	Note that the dictionary and its corresponding entry in the tsearch
	configuration are now installed. You may modify the entry using
	plain SQL commands, for example to specify stop words.
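
As an illustration of such a modification, the sketch below prints an UPDATE
statement for the pg_ts_dict row shown above. It rests on assumptions: that
dict_initoption is the value passed to the dictionary's init method and that
the generated init method reads it as the path of a stop-word file. The
helper name set_stopwords_sql and the file path are hypothetical.

```shell
# Hypothetical helper: print SQL that sets dict_initoption for a dictionary.
# $1 = dictionary name, $2 = path to a stop-word file (assumed meaning of
# dict_initoption). Values are not escaped, so avoid quotes in the arguments.
set_stopwords_sql() {
    printf "UPDATE pg_ts_dict SET dict_initoption = '%s' WHERE dict_name = '%s';\n" "$2" "$1"
}

# Example (path is a placeholder); pipe the output to psql to apply it:
#   set_stopwords_sql pt /path/to/portuguese.stop | psql testdict
```

Printing the statement first makes it easy to review before it is run
against the database.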

Example 2:

      a) Simple template dictionary with init method 

       ./ -n wow -v -i -C WOW

      b) Simple template dictionary (without init method):

       ./ -n wow -v -C WOW

        The same as above, but the dictionary will not have an init method.

       The dictionaries obtained in a) and b) are fully working and ready
       for use:
	  a) lowercases the input word and removes it if it is a stop word
	  b) recognizes any word
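
The behaviour described for dictionary a) can be pictured with a small shell
stand-in. This is only an illustration of the lowercase-then-filter idea, not
the generated C code; the function name lexize_a and the one-word-per-line
stop list format are assumptions made for the sketch.

```shell
# Illustrative stand-in for template a): lowercase the word, then drop it
# if it appears in the stop list.
# $1 = input word, $2 = file with one stop word per line (assumed format).
lexize_a() {
    word=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    if grep -qxF -- "$word" "$2"; then
        return 0            # stop word: print nothing
    fi
    printf '%s\n' "$word"   # any other word is returned in lowercase
}
```

For example, with a stop list containing "the", calling lexize_a with "The"
prints nothing, while "WOW" is printed as "wow".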

      c) Simple template dictionary with source files (with init method):

       ./ -n wow -v -i -c a.c -h a.h -C WOW

        Source files ( a.c ) must be placed in the contrib/tsearch2/gendict
        directory. These files will be used in the Makefile.

        Header files ( a.h ) must be placed in the contrib/tsearch2/gendict
        directory. These files will be used in the Makefile and subinclude.h.

      d) Simple template dictionary with source files (without init method):

	./ -n wow -v  -c a.c -h a.h -C WOW

	The same as above, but the dictionary will not have an init method.

       After that, the sources are in PGSQL_SRC/contrib/dict_wow and
       you may edit them to create the actual dictionary.

  Please check the Tsearch2 home page for additional information about
  the "Gendict tutorial" and dictionaries.