DPL302_-_FE_-_SU_2023_466.webp

(Choose 1 answer)
You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(.....)*1000. What will happen?
A. It doesn't matter. So long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small.
B. This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.
C. This will cause the inputs of the tanh to also be very large, causing the units to be "highly activated" and thus speed up learning compared to if the weights had to start from small values.
D. This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
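
To check the options empirically, here is a minimal NumPy sketch that compares the local tanh gradient, 1 - tanh(z)^2, under small and large weight initialization. The layer sizes, the input vector, the seed, and the 0.01 scale used for comparison are assumptions chosen for illustration; only the *1000 scale comes from the question.

```python
import numpy as np

def tanh_grad(z):
    # Derivative of tanh with respect to its input: 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

np.random.seed(0)                       # fixed seed so the run is repeatable

x = np.random.randn(4, 1)               # hypothetical input with 4 features

# 50 hidden units; 0.01 is an assumed "small" scale, 1000 is from the question
W_small = np.random.randn(50, 4) * 0.01
W_large = np.random.randn(50, 4) * 1000

z_small = W_small @ x                   # pre-activations stay near 0
z_large = W_large @ x                   # pre-activations have huge magnitude

grad_small = tanh_grad(z_small)
grad_large = tanh_grad(z_large)

print("mean |z|, small init:", np.abs(z_small).mean())
print("mean |z|, large init:", np.abs(z_large).mean())
print("mean tanh gradient, small init:", grad_small.mean())
print("mean tanh gradient, large init:", grad_large.mean())
```

With the large scale, almost every unit sits on a flat tail of tanh, so the gradient passed back through the activation is nearly zero and each gradient descent update barely changes the weights.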


Information

Category
DPL302m
Added by
Quang Thái
Date added
Views
815
Comments
3
Rating
0.00 star(s) 0 ratings

Image metadata

Filename
DPL302_-_FE_-_SU_2023_466.webp
File size
69.4 KB
Dimensions
1542px x 690px
