MINOR 1

12th February 2002

45 minutes

* Solve the following four problems. Each problem is worth 4 marks.

  1. Devise a scheme to distinguish the language in which a given document is written.
  2. x                     0          0          1          1
     y                     0          1          0          1
     p(bank=x, credit=y)   0.9998977  0.00006    0.000042   0.0000003

  3. You are required to build a digit recognizer (0-9).
  4. Let c(cantankerous person) = 0 and c(cantankerous autodidact) = 0 be the counts of the two bigrams in our training corpus. Since the count is zero for both bigrams, Laplace's Law, Lidstone's Law, and Good-Turing will all assign them the same probability, i.e., p(cantankerous person) = p(cantankerous autodidact). However, intuitively we feel that p(cantankerous person) > p(cantankerous autodidact). Explain how simple linear interpolation takes care of this problem.
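
As an illustration of the setting in problem 4, the following is a minimal sketch of simple linear interpolation, not a model answer. It assumes the probabilities are read as conditionals p(w2 | w1) and mixes the unigram and bigram maximum-likelihood estimates as lam1 * p(w2) + lam2 * p(w2 | w1). The toy corpus, the weights lam1 = 0.3 and lam2 = 0.7, and the function names are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
# Neither "cantankerous person" nor "cantankerous autodidact" occurs in it.
tokens = ("the cantankerous man met a person and another person "
          "while the autodidact read").split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
# Number of times each word appears as the first word of a bigram.
contexts = Counter(w1 for w1, _ in zip(tokens, tokens[1:]))
total = len(tokens)


def p_unigram(w):
    """Maximum-likelihood unigram estimate p(w)."""
    return unigrams[w] / total


def p_bigram_mle(w1, w2):
    """Maximum-likelihood bigram estimate p(w2 | w1); 0 for unseen bigrams."""
    if contexts[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / contexts[w1]


def p_interp(w1, w2, lam1=0.3, lam2=0.7):
    """Simple linear interpolation; lam1 + lam2 = 1 (assumed weights)."""
    return lam1 * p_unigram(w2) + lam2 * p_bigram_mle(w1, w2)


p_person = p_interp("cantankerous", "person")
p_autodidact = p_interp("cantankerous", "autodidact")
print(p_person > p_autodidact)
```

Both bigram terms are zero here, so the interpolated estimates reduce to lam1 * p(person) and lam1 * p(autodidact). The ordering then follows the unigram frequencies, and "person" is the more frequent word in this toy corpus.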
