
A morphological analyser for the Italian language

I first encountered this topic during my Tesi di Laurea (degree thesis), "Analisi linguistica per un sistema di sintesi del parlato", Padova 1987, under the guidance of Prof. Gian Antonio Mian.

Over the past years, work on morphological analysis was carried out at IRST. A first technical report (1991) documented this work, which produced a tool that decomposes each Italian word into its morphemes and gives syntactic information for each valid decomposition. The morpho-lexicon was derived from the Zingarelli dictionary and contains about 90,000 morphemes. Particular care was devoted to irregular verbs.
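As a rough illustration of what decomposing a word into morphemes can look like, here is a minimal Python sketch. It is not the IRST tool: the tiny lexicon, the `decompose` function and the tag strings are hypothetical, and only a single stem + ending split is tried, whereas the real analyser handles a 90,000-morpheme lexicon and irregular verbs.

```python
# Toy morpho-lexicon (hypothetical entries, not taken from the real tool).
STEMS = {"legg": "VERBO", "pesc": "VERBO", "capital": "SOSTANTIVO"}

# Endings allowed for each stem category, with an illustrative tag.
ENDINGS = {
    "VERBO": {
        "o": "INDICATIVO.PRESENTE.PRIMA-PERSONA-SINGOLARE",
        "i": "INDICATIVO.PRESENTE.SECONDA-PERSONA-SINGOLARE",
    },
    "SOSTANTIVO": {"e": "SINGOLARE", "i": "PLURALE"},
}


def decompose(word):
    """Return every (stem, ending, tags) split licensed by the toy lexicon."""
    word = word.lower()
    results = []
    # Try every cut point that leaves a non-empty stem and ending.
    for cut in range(1, len(word)):
        stem, ending = word[:cut], word[cut:]
        category = STEMS.get(stem)
        if category and ending in ENDINGS[category]:
            results.append((stem, ending, f"{category}.{ENDINGS[category][ending]}"))
    return results


print(decompose("leggo"))
print(decompose("capitali"))
```

A word with no valid split simply yields an empty list, which mirrors the idea that only valid decompositions are reported.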

More recently, the tool was extended with transcription capability. Each morpheme is now associated with its meta-transcription, an intermediate representation that can be realised in different ways depending on the adjacent morphemes. For example, the meta-transcription of the stem LEGG cannot by itself resolve the sound of "GG", which differs according to whether the next morpheme is O or I (giving the words leggo and leggi). This work, among other things, was done by Roberto Palmarin during his Tesi di Laurea "Utilizzo di nozioni morfo-sintattiche in un sistema di riconoscimento del parlato", Padova 1996. As part of it, the transcriptions of the 10,000 most frequent words of the newspaper "Il Sole 24 Ore" were also checked and inserted.
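The LEGG example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the placeholder symbol `GG?`, the phone symbols `g_g` and `dZ_dZ`, the `META` table and the `resolve` function are invented, and only loosely follow the underscore-separated style of the tool's output.

```python
# Morphemes whose first letter makes a preceding "GG" soft (as in "leggi").
FRONT_VOWELS = {"i", "e"}

# Hypothetical meta-transcriptions: "GG?" marks the still-unresolved sound.
META = {
    "LEGG": ["l", "EE", "GG?"],
    "O": ["o"],
    "I": ["i"],
}


def resolve(morphemes):
    """Join meta-transcriptions, resolving "GG?" from the next morpheme."""
    phones = []
    for idx, morpheme in enumerate(morphemes):
        for symbol in META[morpheme]:
            if symbol == "GG?":
                # Look at the first letter of the following morpheme, if any.
                nxt = morphemes[idx + 1] if idx + 1 < len(morphemes) else ""
                soft = nxt[:1].lower() in FRONT_VOWELS
                phones.append("dZ_dZ" if soft else "g_g")
            else:
                phones.append(symbol)
    return "_".join(phones) + "_"


print(resolve(["LEGG", "O"]))  # hard "gg", as in "leggo"
print(resolve(["LEGG", "I"]))  # soft "gg", as in "leggi"
```

The point of the design is that the stem stores one entry regardless of its ending, and the ambiguity is settled only when the neighbouring morpheme is known.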

Then, in January 1998, Daniela Gretter extended the 10,000-word morpho-lexicon to the whole Zingarelli lexicon. Her careful work also brought to light some errors in the previous versions. In September 1998, the lexicon was further enriched with names and neologisms found in the 65,000 most frequent words of the newspaper "Il Sole 24 Ore". The most frequent Italian first names and surnames (from the telephone directory), geographical names, acronyms, company names and commonly used foreign words were also added to the lexicon, which now comprises more than 100,000 entries.

This tool was used to generate ILE, an Italian LExicon that is going to be distributed by ELRA. It is also the base tool for TRAMORPH.
 

Some output samples for the words "capitale", "pesca", "ancora" follow.
 
da CAPITALE k_a_p_i_t_aa_l_e_ SOSTANTIVO.MASCHILE.SINGOLARE.
da CAPITALE k_a_p_i_t_aa_l_e_ SOSTANTIVO.FEMMINILE.SINGOLARE.
da CAPITALE k_a_p_i_t_aa_l_e_ AGGETTIVO.MASCHILE.SINGOLARE.
da CAPITALE k_a_p_i_t_aa_l_e_ AGGETTIVO.FEMMINILE.SINGOLARE.
da CAPITARE k_aa_p_i_t_a_l_e_ VERBO.GENERICO.IMPERATIVO.PRESENTE. 
 SECONDA-PERSONA-SINGOLARE.CLITICO.
da CAPIRE  k_a_p_ii_t_a_l_e_ VERBO.GENERICO.PARTICIPIO.PASSATO. 
 FEMMINILE.SINGOLARE.CLITICO.
 
 
da PESCARE p_ee_X_k_a_ VERBO.GENERICO.IMPERATIVO.PRESENTE. 
 SECONDA-PERSONA-SINGOLARE.
da PESCARE p_ee_X_k_a_ VERBO.GENERICO.INDICATIVO.PRESENTE. 
 TERZA-PERSONA-SINGOLARE.
da PESCA p_ee_X_k_a_ SOSTANTIVO.FEMMINILE.SINGOLARE.
da PESCA p_EE_X_k_a_ SOSTANTIVO.FEMMINILE.SINGOLARE.
 
 
da ANCORA  a_n_k_oo_r_a_ CONGIUNZIONE.
da ANCORA  a_n_k_oo_r_a_ AVVERBIO.
da ANCORARE aa_n_k_o_r_a_ VERBO.GENERICO.IMPERATIVO.PRESENTE. 
 SECONDA-PERSONA-SINGOLARE.
da ANCORARE aa_n_k_o_r_a_ VERBO.GENERICO.INDICATIVO.PRESENTE. 
 TERZA-PERSONA-SINGOLARE.
da ANCORA  aa_n_k_o_r_a_ SOSTANTIVO.FEMMINILE.SINGOLARE.
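For readers who want to process output like the samples above, the following Python sketch reads it into records. The layout is inferred from the samples only and is an assumption: each analysis starts with "da", followed by the lemma, the underscore-separated transcription and the dot-separated tags, and an indented line continues the tags of the previous analysis. The function name `parse_analyses` is hypothetical.

```python
def parse_analyses(text):
    """Parse analyser output lines into dicts (format inferred from samples)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if line.startswith("da "):
            # "da LEMMA transcription TAGS." -> four whitespace-separated fields
            _, lemma, phones, tags = line.split(None, 3)
            entries.append({
                "lemma": lemma,
                "phones": [p for p in phones.split("_") if p],
                "tags": [t for t in tags.strip().split(".") if t],
            })
        elif entries:
            # Indented line: more tags for the previous analysis.
            entries[-1]["tags"].extend(t for t in line.strip().split(".") if t)
    return entries


sample = (
    "da PESCARE p_ee_X_k_a_ VERBO.GENERICO.IMPERATIVO.PRESENTE. \n"
    " SECONDA-PERSONA-SINGOLARE.\n"
    "da PESCA p_EE_X_k_a_ SOSTANTIVO.FEMMINILE.SINGOLARE.\n"
)
for entry in parse_analyses(sample):
    print(entry["lemma"], entry["phones"], entry["tags"])
```

Grouping the records by transcription would then separate the two pronunciations of "pesca", which differ only in the stressed vowel symbol.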
 

 

  Last update 31/3/1998 - Maintainer Roberto Gretter