Over the past several years, IRST has carried out work on morphological analysis. A first technical report (1991) documented this work, which produced a tool that decomposes each Italian word into its morphemes and provides syntactic information for each valid decomposition. The morpho-lexicon was derived from the Zingarelli dictionary and contains about 90,000 morphemes. Particular care was devoted to irregular verbs.
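To make the idea of "each valid decomposition" concrete, here is a minimal sketch in Python. It is not the IRST tool. The tiny lexicon, the class labels (such as "verb-ere" and "noun-f") and the function name decompose are all assumptions made for illustration, and only simple stem + ending splits are handled.

```python
# Toy morpho-lexicon. Every entry and label below is hypothetical and is
# NOT taken from the Zingarelli-based lexicon described in the text.
STEMS = {
    "legg": ["verb-ere"],            # stem of "leggere"
    "pesc": ["noun-f", "verb-are"],  # stem shared by a noun and by "pescare"
}
ENDINGS = {
    "o": [("verb-ere", "1st person singular present"),
          ("verb-are", "1st person singular present")],
    "i": [("verb-ere", "2nd person singular present")],
    "a": [("noun-f", "feminine singular"),
          ("verb-are", "3rd person singular present")],
}


def decompose(word):
    """Return every (stem, ending, class, features) split the toy lexicon allows."""
    results = []
    for cut in range(1, len(word)):
        stem, ending = word[:cut], word[cut:]
        if stem in STEMS and ending in ENDINGS:
            for word_class, features in ENDINGS[ending]:
                # A split is valid only if the stem belongs to the class
                # that this ending attaches to.
                if word_class in STEMS[stem]:
                    results.append((stem, ending, word_class, features))
    return results


for analysis in decompose("pesca"):
    print(analysis)
```

In this sketch an ambiguous word such as "pesca" gets one analysis per reading (noun and verb), while a string with no valid split gets an empty list.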
More recently, the tool was extended with transcription capability. Each morpheme is now associated with its meta-transcription, an intermediate representation that can resolve in different ways depending on the adjacent morphemes. For example, the meta-transcription of the stem LEGG cannot by itself resolve the sound "GG", which is pronounced differently depending on whether the next morpheme is O or I (yielding the words leggo and leggi). This work, among other things, was carried out by Roberto Palmarin as part of his degree thesis (Tesi di Laurea) "Utilizzo di nozioni morfo-sintattiche in un sistema di riconoscimento del parlato", Padova 1996. As part of that thesis, the transcriptions of the 10,000 most frequent words of the newspaper "Il Sole 24 Ore" were also checked and inserted.
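The LEGG example can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual representation used by the tool. The placeholder notation "{GG}", the SAMPA-like output symbols and the function name transcribe are assumptions.

```python
# Hypothetical meta-transcriptions: "{GG}" marks a sound that stays
# unresolved until the following morpheme is known.
META = {"legg": "le{GG}", "o": "o", "i": "i"}

# Illustrative SAMPA-like symbols (an assumption, not the tool's alphabet):
# double g is hard before a, o, u and soft before e, i.
HARD, SOFT = "gg", "ddZ"
PLACEHOLDER = "{GG}"


def transcribe(morphemes):
    """Join meta-transcriptions, resolving each placeholder from its right context."""
    parts = [META[m] for m in morphemes]
    resolved = []
    for idx, part in enumerate(parts):
        if part.endswith(PLACEHOLDER):
            # Look at the first sound of the next morpheme, if there is one.
            next_sound = parts[idx + 1][:1] if idx + 1 < len(parts) else ""
            sound = SOFT if next_sound in ("e", "i") else HARD
            part = part[:-len(PLACEHOLDER)] + sound
        resolved.append(part)
    return "".join(resolved)


print(transcribe(["legg", "o"]))  # leggo
print(transcribe(["legg", "i"]))  # leddZi
```

Keeping the placeholder in the stem entry means one lexicon entry for LEGG serves both words, and the choice of sound is made only when the morphemes are combined.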
Then, in January 1998, Daniela Gretter extended the 10,000-word morpho-lexicon to the whole Zingarelli lexicon. Her careful work also brought to light some errors in the previous versions. In September 1998, the lexicon was further enriched with names and neologisms found among the 65,000 most frequent words of the newspaper "Il Sole 24 Ore". The most frequent Italian first names and surnames (from the telephone directory), geographical names, acronyms, company names, and commonly used foreign words were also added to the lexicon, which currently comprises more than 100,000 entries.
This tool was used to generate ILE, an Italian LExicon that is going to be distributed by ELRA.
It is also the base tool for TRAMORPH.
Some output samples for the words "capitale", "pesca", "ancora" follow.
Last update 31/3/1998 - Maintainer Roberto Gretter