TRAMORPH
ITC-IRST - November 1998

transcription.txt

In addition to TRAMORPH, a couple of Perl scripts and another program
(ApplyRules) are included. They simplify the transcription of a wordlist
and provide a fallback transcription for words not recognized by TRAMORPH.
The wordlist has one word per row; unusual characters (see
test/test_transcribe.in for an example) are accepted and converted into a
format suitable for TRAMORPH.

The tools work well under Linux and Solaris. Under Windows there are
problems, mainly due to the Perl open2() routine, which is called to
start both TRAMORPH and ApplyRules.

transcribe.pl   # reads one word per row, passes it to TRAMORPH and
                # collects all valid transcriptions; unknown words are
                # transcribed by ApplyRules. All distinct
                # transcriptions are then output. Also converts a set
                # of characters into a format accepted by TRAMORPH (for
                # instance, it transforms "città" into "citta`" or "größe"
                # into "grosse", which is a valid Italian word).

big2sampa.pl    # reads a valid wordlist produced by transcribe.pl (one
                # word with its transcription on each row) in TRAMORPH
                # units and outputs the corresponding wordlist in SAMPA
                # units.

ApplyRules      # default transcription: executes the directives found in
                # the file etc/char2big to get a default transcription
                # for words not recognized by TRAMORPH. It is
                # started and used by transcribe.pl.
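The character conversion performed by transcribe.pl can be pictured with
the small Python sketch below. It is not the script's actual code: the
table CHAR_MAP and the function normalize_word are hypothetical names, and
the table holds only the three substitutions implied by the two examples
above. The real script presumably handles a larger set of characters.

```python
import unicodedata

# Hypothetical table: only the substitutions implied by the README examples
# ("città" -> "citta`", "größe" -> "grosse"). The real set is not documented here.
CHAR_MAP = {
    "\u00e0": "a`",  # a with grave accent
    "\u00f6": "o",   # o with diaeresis
    "\u00df": "ss",  # sharp s
}


def normalize_word(word):
    """Replace each mapped character, leaving all other characters unchanged."""
    # Compose the input first so that "a" + combining grave matches the table key.
    composed = unicodedata.normalize("NFC", word)
    return "".join(CHAR_MAP.get(ch, ch) for ch in composed)


if __name__ == "__main__":
    for w in ("citt\u00e0", "gr\u00f6\u00dfe"):
        print(normalize_word(w))
```

Characters that are not in the table pass through untouched, so plain
ASCII words are returned as they are.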
Suggested usages:

To transcribe a wordlist and check for possible transcription errors, try
the following command (have a look at the file
test/test_transcribe.warn.out, which contains some of the warnings and
information messages that transcribe.pl can produce):

    cat testlist | bin/transcribe.pl -nonormaldebug | grep WARNING

To get transcriptions without any check, in SAMPA units:

    cat testlist | bin/transcribe.pl -blind -nonormaldebug | bin/big2sampa.pl

The Perl script transcribe.pl has the following usage:

    Usage: bin/transcribe.pl [ -nonormaldebug ] [-optsil] [-blind] [-validY]
        -nonormaldebug : does not output info about open connections
        -optsil        : adds FSNbgopt between words in case of Multiple Morpho
        -blind         : duplicate rows for different transcriptions and
                         does not output warnings or comments
        -validY        : trasforms y and j into i, after Morpho

In the default case transcribe.pl outputs, for each input word, a row
containing:

    word1  t r s 1 [ t r s 2 ] [ t r s ... ] ( comments )
    word2  t r s 1 [ t r s 2 ] [ t r s ... ] ( comments )

In the "blind" case it outputs, for each word, one row for each different
transcription, without comments:

    word1  t r s 1
    word1  t r s 2
    word1  t r s ...
    word2  t r s 1
    word2  t r s 2
    word2  t r s ...

The Perl script big2sampa.pl reads from stdin a wordlist in the units used
by TRAMORPH (called big), transforms them into SAMPA units and writes them
to stdout. It can be used in a pipe with transcribe.pl; it exits with an
error message if it meets an unknown big unit.

    stdin
        il         i l
        mio        m ii o
        cagnolino  k a N o l ii n o

    stdout
        il         i l
        mio        m "i o
        cagnolino  k a JJ o l "i n o

ApplyRules applies directives to either an fsn or a wordlist, transforming
it into another fsn. It is used by transcribe.pl only to obtain a
transcription for words not recognized by TRAMORPH.
Its general usage is:

    Usage: ./bin/linux/ApplyRules RuleSet InFile OutFile [options]
    InFile and OutFile are either Automatas or WordLists (option -linear);
    if WordLists, InFile and OutFile can be stdin and stdout.
        [-verbose]        gives some information.
        [-linear]         works with WordLists as input.
        [-nolambda]       does not allow lambdas to be inside a path to modify.
        [-local_opt]      causes a local opt. to be performed after each rule.
        [-nonormaldebug]  inhibits normal stderr messages.
        [-name name]      name for the output automata.
        [-new_word]       reads a word from InFile (can be stdin), produces
                          an automata having a word label for each unit,
                          and returns a string on stdout (either ERR or
                          the automata filename).

but here it is used only in the following way:

    ApplyRules ./etc/char2big stdin stdout -linear -nonormaldebug

In this mode it reads, on each row, one word followed by the characters
composing it, separated by blanks, like:

    citta`   c i t t a `
    struzzo  s t r u z z o

and transforms all the items (the transcription) except the first (the
word) by applying the directives found in ./etc/char2big. This file
contains rules to transform the characters [a-zA-Z\'\`] into big units
(i.e. units compatible with TRAMORPH). Optionally, "<" and ">" indicate
word boundaries:

    citta`   < c i t t a ` >
    struzzo  < s t r u z z o >

The output is a wordlist in big units, ready to be processed by
big2sampa.pl; it looks like this:

    citta`   C i tt aa
    struzzo  X t @sch r u zz o
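The input rows expected by ApplyRules in -linear mode can be built as in
the following Python sketch. It only illustrates the row format described
above (the word, then its characters separated by blanks, optionally
enclosed in "<" and ">"); the function name applyrules_row is hypothetical,
and transcribe.pl may build these rows differently.

```python
def applyrules_row(word, boundaries=False):
    """Return one ApplyRules input row: the word, then its characters.

    When boundaries is True, the character sequence is wrapped in the
    optional word-boundary markers "<" and ">".
    """
    chars = list(word)
    if boundaries:
        chars = ["<"] + chars + [">"]
    return " ".join([word] + chars)


if __name__ == "__main__":
    for w in ("citta`", "struzzo"):
        print(applyrules_row(w))
        print(applyrules_row(w, boundaries=True))
```

Rows built this way could be written to the stdin of the ApplyRules
command line shown above.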