Anna Corazza, Renato De Mori, Roberto Gretter and Giorgio Satta
Vol. 16, No. 10, pp. 1018-1027, October 1994
Recently, the possibility of using Stochastic Context-Free Grammars (SCFG's) in Language Modeling (LM) has been considered. When these grammars are used, search can be directed by evaluation functions based on the probabilities that a SCFG generates a sentence, given only some words in it. Expressions for computing the evaluation function have been proposed by Jelinek and Lafferty for the recognition of word sequences in the case in which only the prefix of a sequence is known. Corazza et al. have proposed methods for probability computation in the more general case in which partial word sequences interleaved by gaps are known. This computation is too complex in practice unless the lengths of the gaps are known. This paper proposes a method for computing the probability of the best parse tree that can generate a sentence only part of which (consisting of prefix, islands and gaps) is known. This probability is the minimum possible, and thus the most informative, upper-bound that can be used in the evaluation function. The computation of the proposed upper-bound has cubic time complexity even if the lengths of the gaps are unknown. This makes possible the practical use of SCFG for driving interpretations of sentences in natural language processing.