You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases slowly and is still decreasing after 15 iterations. Based on this, which of the following conclusions seems most plausible?
(I) Rather than use the current value of α, it'd be more promising to try a smaller value of α (say α = 0.1).
(II) Rather than use the current value of α, it'd be more promising to try a larger value of α (say α = 1.0).
(III) α = 0.3 is an effective choice of learning rate.
(Choose 1 answer)
A. (I)
B. (II)
C. (III)
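
One way to build intuition for the scenario is to run gradient descent on a toy problem with each of the three learning rates mentioned. The sketch below is only an illustration under stated assumptions: the cost J(θ) = 0.5 · c · θ², the curvature c = 0.1, the starting point θ = 10, and the function name run_gradient_descent are all hypothetical choices, not part of the question. On a different cost function a larger learning rate could overshoot or diverge, so the printed numbers should not be read as a general rule.

```python
def run_gradient_descent(alpha, num_iters=15, theta0=10.0, curvature=0.1):
    """Run gradient descent on the toy cost J(theta) = 0.5 * curvature * theta**2.

    All defaults are illustrative assumptions, not values from the question.
    Returns the list of cost values recorded after each iteration.
    """
    theta = theta0
    costs = []
    for _ in range(num_iters):
        grad = curvature * theta          # dJ/dtheta for the toy cost
        theta -= alpha * grad             # gradient descent update
        costs.append(0.5 * curvature * theta ** 2)
    return costs


# Compare the three learning rates named in the question.
costs_by_alpha = {alpha: run_gradient_descent(alpha) for alpha in (0.1, 0.3, 1.0)}

for alpha, costs in costs_by_alpha.items():
    print(f"alpha={alpha}: J after {len(costs)} iterations = {costs[-1]:.4f}")
```

Plotting or printing the full cost list for each learning rate, rather than only the final value, makes it easier to see whether J(θ) is still falling at iteration 15 and how quickly it falls in each case.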