* 2009-05-26: More stems were added for the GoldenDict project. Those are useless for spellchecking, there is no point in adopting those changes for spellcheckers. espa~nol.dicc Release 1.10-- A Spanish (Español) dictionary for using with ispell 3.1.13 or later Copyright (c) 1994 1995 1996 1999 2001 2005 2008 Santiago Rodriguez and Jesus Carretero This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. This software can be obtained from http://www.datsi.fi.upm.es/~coes/ This set of data files implements a Spanish (castellano) dictionary to be used with the international ispell program, version 3.1.13 or further. The dictionary contains 53,000 roots (aprox.). If you want to run the spanish dictionary, you have to undefine the NO8BIT macro in the local.h configuration file. Contents: 1. Uncompressing the package. 2. Building the dictionary. 3. Installing the dictionary. 4. Supported Formatters. 5. MSDOS Dictionary. 1. The distribution is included in the espa~nol.tar.gz or espa~nol-1.10.tar.gz file. To extract sources that are in files ending in `.tar.gz' you can use the command gzip -d < espa~nol.tar.gz | tar xf - where `espa~nol.tar.gz' is the name of the file. This file is expanded to the following files: espa~nol.aff: Affixes file. espa~nol.words: Contains a list of words that appear in the official espa~nol dictionary (Diccionario de la Real Academia Espa'nola de la Lengua 22nd edition). espa~nol.comp: Contains a list of words not appearing in the official dictionary but being used in computer related texts. espa~nol.nofl: Contains a list of words not appearing in the official dictionary but being used normal spanish and they are "correct" words. antiguas.words: Contains a list of words that appear in the official espa~nol dictionary and they are old ones that are not currently in use. espa~nol.words+: Contains the expanded list of words generated from the espa~nol.words and espa~nol.comp word files. e~nes: Script for replacing the 'n and 'N by ~n and ~N in the espa~nol.aff, espa~nol.words and espa~nol.words+. If you use the second way to specify this letter you have to run this script. This script uses the sed utility. It has been checked by using the GNU sed version 2.05. If you want to run this script make sure that you have the GNU sed installed and type: make e~ne Makefile: Makefile for building the hash file (espa~nol.hash) from the affix file and the espa~nol.words file. This way of building the hash file needs about 50Mb of paging space and 100 Mb of temporary disk space. Please, ensure that you have enough disk space in the tmp partition (usually /usr/tmp). If you do not have it, you have to set the TMPDIR environment variable to a path where you can allocate 100 Mb of temporary disk storage. 2. If you want to create the espa~nol.hash just type: make Quick building: If you want to create the espa~nol.hash from the expanded word list (espa~nol.words+), just type: make build It does not need so much temporary space. The size of the spanish dictionary (espa~nol.hash) is 3.9Mbytes (Solaris 2.7 has this problem). If you get a size much bigger, probably it is due to the sort command of the operating system. In this case we recommend to install the textutils package of GNU and be sure that the sort command that you use is the textutils one. 3. To install the hash file become root and type make install 4. Spanish acute chars may be codes in different ways. Six different formatters are supported: Default formatter: The acute characters are coded as follows: 'a á 'e é 'i í 'o ó 'u ú 'n ñ "u ü 'A Á 'E É 'I Í 'O Ó 'U Ú 'N Ñ "U Ü TeX formatter: The acute characters are coded as follows: \'a á \'e é \'{\i} í \'o ó \'u ú \'n ñ \"u ü \'A Á \'E É \'{\I} Í \'O Ó \'U Ú \'N Ñ \"U Ü plainTeX formatter: The acute characters are coded as follows: \'{a} á \'{e} é \'{\i} í \'{o} ó \'{u} ú \'{n} ñ \"{u} ü \'{A} Á \'{E} É \'{\I} Í \'{O} Ó \'{U} Ú \'{N} Ñ \"{U} Ü latin1 formatter: The acute characters are coded as specified in the iso_8859_1 code. utf8 formatter: The acute characters are coded as specified in the utf8 code. msdos formatter: The acute characters are coded as specified in the extended ASCII MSDOS code. html formatter: The acute characters are coded as follows: á á é é í í ó ó ú ú ñ ñ ü ü Á Á É É Í Í Ó Ó Ú Ú Ñ Ñ Ü Ü If you want to run ispell by using one of the previous formats please type: ispell -T -d espa~nol 5. espa~nol.hash file is available for MSDOS users at: http://www.datsi.fi.upm.es/~coes/ Note that the affixes list and the word list are under development. We are currently working on them. If you find words that does not appear in the word list or words that must not appear in the word list, please send a message to espanol-bugs@datsi.fi.upm.es Santiago Rodriguez & Jesus Carretero Departamento de Arquitectura y Tecnologia de Sistemas Informaticos (DATSI) Universidad Politecnica de Madrid December 1994 January 1995 February 1995 November 1996 April 1999 June 2001 March 2005 November 2005 May 2008 Email: srodri@fi.upm.es, jesus.carretero@uc3m.es