Chapter 5. An Gramadóir FAQ

5.1. Can I develop a language pack on my Windows computer?
5.2. What character encoding should I use for the input files in the language pack?
5.3. Six XML tags are reserved for use by An Gramadóir while checking grammar: B, E, F, X, Y, and Z. What do these mean?

5.1. Can I develop a language pack on my Windows computer?

No, or at least not without simulating a Unix-like environment with Cygwin. Although the end-user grammar checker Lingua::XX::Gramadoir is generated as pure Perl and will run under ActiveState Perl, the gramadoir scripts that generate it rely on bash, iconv, sed and other standard Unix tools.
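If you do try Cygwin (or another Unix-like environment), a quick first check is whether the tools named above can be found on your PATH. The following Python sketch does only that. It looks for bash, iconv and sed, and the real gramadoir scripts may well need other utilities besides these.

```python
import shutil

# Tools named in the answer above; the gramadoir scripts may need others too.
REQUIRED_TOOLS = ("bash", "iconv", "sed")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("bash, iconv and sed were all found on PATH")
```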

5.2. What character encoding should I use for the input files in the language pack?

You can use whatever encoding you want. The end user will notice this choice in only one way: it becomes the default encoding for files passed to the front-end script gram-xx.pl, and even then they can override that default with the command-line option --incode. One other issue to be aware of is that Perl's regular expression engine is two to three times slower on Unicode text than on 8-bit text. So if you are deciding between UTF-8 and, say, one of the ISO-8859 encodings, it is probably worth sticking with ISO-8859, provided it covers every character your language needs.
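Before settling on an 8-bit encoding, it helps to confirm that it can represent every character in your word lists and rule files. The Python sketch below assumes your files are currently in UTF-8; the function names are illustrative and are not part of An Gramadóir.

```python
def unencodable_chars(text: str, encoding: str) -> set[str]:
    """Return the characters in `text` that `encoding` cannot represent."""
    bad = set()
    for ch in set(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return bad


def fits_in_encoding(text: str, encoding: str) -> bool:
    """True if every character of `text` is representable in `encoding`."""
    return not unencodable_chars(text, encoding)


def recode(data: bytes, from_enc: str = "utf-8", to_enc: str = "iso-8859-1") -> bytes:
    """Re-encode raw file contents; raises UnicodeEncodeError if a character does not fit."""
    return data.decode(from_enc).encode(to_enc)


if __name__ == "__main__":
    print(fits_in_encoding("Gramadóir", "iso-8859-1"))  # True
    print(unencodable_chars("ŵ", "iso-8859-1"))          # {'ŵ'}
    print(fits_in_encoding("ŵ", "iso8859_14"))           # True
```

For example, Irish text fits in ISO-8859-1, whereas Welsh ŵ does not, though it does fit in ISO-8859-14. To convert a whole file, read it in binary mode, pass the bytes to `recode`, and write the result back out in binary mode.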

5.3. Six XML tags are reserved for use by An Gramadóir while checking grammar: B, E, F, X, Y, and Z. What do these mean?

Some of these can be seen in action in the extended example presented in Section 1.2.

E is used to mark up errors, as in this example:

<E msg="PREFIXT"><T>an</T> <N pl="n" gnt="n" gnd="m">ainm</N></E>

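If you want to post-process checker output marked up this way, the E elements can be pulled out with any XML parser. Here is a minimal Python sketch using xml.etree.ElementTree. The `<s>` wrapper element is a hypothetical addition that gives the fragment a single root; it is not necessarily what An Gramadóir emits around a sentence.

```python
import xml.etree.ElementTree as ET

# The E example from above, wrapped in a hypothetical <s> root element.
SAMPLE = '<s><E msg="PREFIXT"><T>an</T> <N pl="n" gnt="n" gnd="m">ainm</N></E></s>'


def find_errors(marked_up: str) -> list[tuple[str, str]]:
    """Return (msg attribute, flagged text) for each E element, in document order."""
    root = ET.fromstring(marked_up)
    return [(err.get("msg"), "".join(err.itertext())) for err in root.iter("E")]


if __name__ == "__main__":
    print(find_errors(SAMPLE))  # [('PREFIXT', 'an ainm')]
```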
B and Z are used to mark up ambiguous words; see Section 3.2.3 for examples. F is used to mark up "rare" words, and should correspond to grammatical code 127 in your pos-xx.txt file. X is used to mark up words that do not appear in the lexicon. Y is used to mark up words that should be ignored, for instance if they appear in the user's ignore file.
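The six reserved tags can be summarised as a lookup table. The Python sketch below (all names hypothetical) also shows one practical consequence of the reservation: a language pack's own tag set should not reuse any of these names, so it is worth checking for clashes. The X and Y markup in the example is an illustrative assumption based on the descriptions above, not output copied from An Gramadóir.

```python
import xml.etree.ElementTree as ET

# Meanings paraphrased from the answer above.
RESERVED_TAGS = {
    "B": "ambiguous word (see Section 3.2.3)",
    "Z": "ambiguous word (see Section 3.2.3)",
    "E": "grammatical error",
    "F": "rare word (grammatical code 127 in pos-xx.txt)",
    "X": "word not found in the lexicon",
    "Y": "word to be ignored, e.g. one in the user's ignore file",
}


def clashes_with_reserved(pack_tags) -> list[str]:
    """Return, sorted, any tag names in a language pack's tag set that are reserved."""
    return sorted(set(pack_tags) & set(RESERVED_TAGS))


def reserved_markup(marked_up: str) -> list[tuple[str, str, str]]:
    """List (tag, meaning, text) for each reserved element, in document order."""
    root = ET.fromstring(marked_up)
    return [
        (el.tag, RESERVED_TAGS[el.tag], "".join(el.itertext()))
        for el in root.iter()
        if el.tag in RESERVED_TAGS
    ]


if __name__ == "__main__":
    # Hypothetical tag set for an imaginary language pack; "X" would clash.
    print(clashes_with_reserved({"N", "T", "X"}))  # ['X']
    print(reserved_markup("<s><X>blah</X> <Y>foo</Y></s>"))
```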