(Choose 2 answers)
Which of the following statements about stochastic gradient descent are true? Check all that apply.
A. If you have a huge training set, then stochastic gradient descent may be much faster than batch gradient descent.
B. One of the advantages of stochastic gradient descent is that it uses parallelization and thus runs much faster than batch gradient descent.
C. One of the advantages of stochastic gradient descent is that it can start making progress in improving the parameters after looking at just a single training example; in contrast, batch gradient descent needs to take a pass over the entire training set before it starts to make progress in improving the parameters' values.
D. In order to make sure stochastic gradient descent is converging, we typically compute J_train(θ) after each iteration (and plot it) in order to make sure that the cost function is generally decreasing.
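The contrast described in option C can be sketched in a few lines of Python. This is a minimal illustration under assumed conditions: linear regression with a single feature, squared-error cost, a toy noise-free dataset y = 2 + 3x, and hand-picked learning rates. The helper names (`predict`, `batch_gd_step`, `sgd_step`, `sgd_epoch`) are hypothetical and not taken from the question.

```python
import random

def predict(theta, x):
    # Hypothesis h_theta(x) = theta0 + theta1 * x (one feature, for illustration).
    return theta[0] + theta[1] * x

def batch_gd_step(theta, data, alpha):
    # Batch gradient descent: a single update needs a full pass over all m examples.
    m = len(data)
    g0 = sum(predict(theta, x) - y for x, y in data) / m
    g1 = sum((predict(theta, x) - y) * x for x, y in data) / m
    return [theta[0] - alpha * g0, theta[1] - alpha * g1]

def sgd_step(theta, example, alpha):
    # Stochastic gradient descent: update theta from one example (x, y) only.
    x, y = example
    err = predict(theta, x) - y
    return [theta[0] - alpha * err, theta[1] - alpha * err * x]

def sgd_epoch(theta, data, alpha, rng):
    # One SGD pass: shuffle the examples, then make one update per example.
    order = list(range(len(data)))
    rng.shuffle(order)
    for i in order:
        theta = sgd_step(theta, data[i], alpha)
    return theta

# Toy data (assumed for illustration): y = 2 + 3x with no noise.
data = [(i / 100, 2 + 3 * (i / 100)) for i in range(100)]
rng = random.Random(0)

theta_sgd = [0.0, 0.0]
for _ in range(200):
    theta_sgd = sgd_epoch(theta_sgd, data, 0.1, rng)

theta_batch = [0.0, 0.0]
for _ in range(2000):
    theta_batch = batch_gd_step(theta_batch, data, 0.5)

print(theta_sgd, theta_batch)
```

Here `sgd_step` already moves θ after reading a single example, whereas `batch_gd_step` must read all m examples before producing its one update. Neither loop uses any parallelization; both run sequentially on one core.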