HOMEWORK ASSIGNMENT #4

24th January 2002

Due Tuesday, 12th February 2002

* Check the readings for this week.

* Mail your code to me.

* Your code must contain clear and comprehensive documentation.

* Written assignment to be submitted in class.

* Solve the following problems:

  1. Exercise 6.5 in the text.
  2. Use the 218 Sentences corpus given in the resources section. The sentences are based on a limited vocabulary. In this exercise you will build language models using some of the techniques covered in class.
    1. Build unigram, bigram, and trigram language models using maximum likelihood estimation. Remember to account for sentence beginnings with the start-of-sentence (s) tags.

      A. What is the probability of "Venkat likes bread."?

      B. What is the probability of "Raghav likes bread."?

      C. What is the probability of "Raghav thing bread."?

      D. What is the probability of "Venkat, some bread?"

      E. What is the most likely trigram given that "thing bread" has occurred?

      F. What is the most likely trigram given that "Venkat likes" has occurred?

    2. Build unigram, bigram and trigram models using expected likelihood estimation.

      Repeat A-F above using the new models.

    3. Perform simple linear interpolation using different weights for the mixtures.

      A. Repeat A-F above using the mixture models with the different weights you choose.

      B. Devise a scheme to correct the sentence: "Venkat likes that it bread."
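
For orientation, here is a minimal sketch of how the three estimators in problem 2 relate to each other. It is not a reference solution. Everything named in it is an assumption made for illustration: the three-sentence toy corpus (not the 218 Sentences corpus), the `<s>` and `</s>` padding tags, the helper names `pad`, `train`, `prob`, `sentence_prob` and `interpolated`, the reading of expected likelihood estimation as adding 0.5 to every count, and the mixture weights.

```python
from collections import Counter

# Hypothetical toy corpus for illustration; NOT the 218 Sentences corpus.
CORPUS = [
    "venkat likes bread",
    "raghav likes bread",
    "venkat likes some bread",
]


def pad(sentence, n):
    """Tokenize on whitespace, adding n-1 start tags and one end tag (assumed tags)."""
    return ["<s>"] * (n - 1) + sentence.split() + ["</s>"]


def ngrams(tokens, n):
    """Return every n-gram in a token list as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(corpus, n):
    """Count n-grams and their (n-1)-word histories."""
    grams, hists = Counter(), Counter()
    for sentence in corpus:
        for gram in ngrams(pad(sentence, n), n):
            grams[gram] += 1
            hists[gram[:-1]] += 1
    return grams, hists


def prob(gram, model, vocab_size, lam=0.0):
    """P(last word | history) with additive smoothing.

    lam=0.0 gives the maximum likelihood estimate; lam=0.5 is assumed here
    to be what the assignment means by expected likelihood estimation.
    Words outside the vocabulary still receive mass when lam > 0, so the
    estimates are only normalized over the known vocabulary.
    """
    grams, hists = model
    denom = hists[gram[:-1]] + lam * vocab_size
    return (grams[gram] + lam) / denom if denom else 0.0


def sentence_prob(sentence, n, model, vocab_size, lam=0.0):
    """Probability of a sentence as a product of its n-gram probabilities."""
    p = 1.0
    for gram in ngrams(pad(sentence, n), n):
        p *= prob(gram, model, vocab_size, lam)
    return p


def interpolated(word, history, models, vocab_size, weights):
    """Mix unigram, bigram and trigram MLE estimates.

    history is a tuple of the two preceding tokens; weights is a triple
    that should sum to 1.
    """
    total = 0.0
    for n, weight in zip((1, 2, 3), weights):
        hist = tuple(history[len(history) - (n - 1):]) if n > 1 else ()
        total += weight * prob(hist + (word,), models[n], vocab_size)
    return total


# Words that can be predicted: every corpus word plus the end tag.
VOCAB = {w for s in CORPUS for w in s.split()} | {"</s>"}
MODELS = {n: train(CORPUS, n) for n in (1, 2, 3)}

if __name__ == "__main__":
    v = len(VOCAB)
    print("bigram MLE:", sentence_prob("venkat likes bread", 2, MODELS[2], v))
    print("bigram ELE:", sentence_prob("raghav thing bread", 2, MODELS[2], v, 0.5))
    print("mixture:", interpolated("bread", ("venkat", "likes"), MODELS, v, (0.2, 0.3, 0.5)))
```

Under maximum likelihood, a sentence containing an unseen n-gram such as "raghav thing" gets probability zero. The smoothed and interpolated estimates avoid that zero, which is why parts 2 and 3 ask you to repeat A-F.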

