(Choose 1 answer)
Let w be a word and X_w be a binary random variable that indicates whether w appears in a text document in the corpus. Assume that the probability P(X_w = 1) is estimated by Count(w)/N, where Count(w) is the number of documents in which w appears and N is the total number of documents in the corpus. You are given that "the" is a very frequent word that appears in 99% of the documents and that "photon" is a very rare word that appears in 1% of the documents. For which word does X_w have the higher entropy?
A. "the"
B. "photon"
C. Both words have the same entropy
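
One way to check the options is to evaluate the entropy of a binary variable, H(p) = -p log2(p) - (1 - p) log2(1 - p), at p = 0.99 and p = 0.01. The sketch below assumes entropy is measured in bits; the function name binary_entropy is illustrative and not part of the question.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary variable with P(X = 1) = p."""
    # By convention 0 * log2(0) is treated as 0, so the endpoints give zero entropy.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Probabilities taken from the question: "the" in 99% of documents, "photon" in 1%.
h_the = binary_entropy(0.99)
h_photon = binary_entropy(0.01)

print(f'H("the")    = {h_the:.4f} bits')
print(f'H("photon") = {h_photon:.4f} bits')
```

Because H(p) = H(1 - p), swapping the two outcomes leaves the value unchanged, so both words come out at roughly 0.08 bits.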
(20