DPL302_-_FE_-_SU_2023_466.webp

(Choose 1 answer)
GRU
Here are the update equations for the GRU:

$$\tilde{c}^{<t>} = \tanh\left(W_c\left[\Gamma_r * c^{<t-1>}, x^{<t>}\right] + b_c\right)$$
$$\Gamma_u = \sigma\left(W_u\left[c^{<t-1>}, x^{<t>}\right] + b_u\right)$$
$$\Gamma_r = \sigma\left(W_r\left[c^{<t-1>}, x^{<t>}\right] + b_r\right)$$
$$c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}$$
$$a^{<t>} = c^{<t>}$$

Alice proposes to simplify the GRU by always removing Γu, i.e., setting Γu = 1. Betty proposes to simplify the GRU by removing Γr, i.e., setting Γr = 1 always. Which of these models is more likely to work without vanishing gradient problems, even when trained on very long input sequences?

A. Alice's model (removing Γu), because if Γu = 0 for a timestep, the gradient can propagate back through that timestep without much decay.
B. Betty's model (removing Γr), because if Γu = 1 for a timestep, the gradient can propagate back through that timestep without much decay.
C. Betty's model (removing
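A short derivation makes the gradient argument concrete. As a simplifying assumption, take a scalar hidden state and write w_u, w_c and w_r (hypothetical names, not from the question) for the weights that multiply c<t-1> inside W_u, W_c and W_r. In Betty's model (Γr = 1), differentiating the cell update with respect to the previous cell state gives:

$$\frac{\partial c^{<t>}}{\partial c^{<t-1>}} = \underbrace{(1-\Gamma_u)}_{\text{copy path}} + \Gamma_u\left(1 - (\tilde{c}^{<t>})^2\right) w_c + \left(\tilde{c}^{<t>} - c^{<t-1>}\right)\Gamma_u(1-\Gamma_u)\, w_u$$

Both of the last two terms carry a factor of Γu, so as Γu approaches 0 the derivative approaches 1: the cell state is copied forward and the gradient flows back through that timestep almost undiminished. In Alice's model (Γu = 1), c<t> = c̃<t> and the copy path disappears:

$$\frac{\partial c^{<t>}}{\partial c^{<t-1>}} = \left(1 - (\tilde{c}^{<t>})^2\right) w_c \left(\Gamma_r + c^{<t-1>}\,\Gamma_r(1-\Gamma_r)\, w_r\right)$$

Over a long sequence the gradient is a product of such factors, each a tanh derivative times a weight, much as in a basic RNN, so it can vanish. This is why Betty's model, with Γu close to 0, is the one that avoids vanishing gradients.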
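The gradient argument can also be checked numerically. The sketch below is an illustration under stated assumptions, not a reference GRU implementation: it uses a scalar hidden state, hypothetical parameter names and values in `PARAMS` (the bias `bu = -10` is chosen only to push Γu close to 0), and computes dc<T>/dc<0> by multiplying the exact per-step derivatives for Betty's model (Γr = 1) and Alice's model (Γu = 1).

```python
import math

# Hypothetical scalar parameters (hidden size 1), chosen only for illustration.
# bu = -10 pushes Gamma_u close to 0, the regime the question asks about.
PARAMS = {
    "wu_c": 0.5, "wu_x": 0.5, "bu": -10.0,  # update gate (Betty's model only)
    "wr_c": 0.5, "wr_x": 0.5, "br": 0.0,    # relevance gate (Alice's model only)
    "wc_c": 0.5, "wc_x": 1.0, "bc": 0.0,    # candidate c~<t>
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def betty_step(c_prev, x, p):
    """Betty's GRU (Gamma_r = 1). Returns (c<t>, dc<t>/dc<t-1>)."""
    gu = sigmoid(p["wu_c"] * c_prev + p["wu_x"] * x + p["bu"])
    c_tilde = math.tanh(p["wc_c"] * c_prev + p["wc_x"] * x + p["bc"])
    c_t = gu * c_tilde + (1.0 - gu) * c_prev
    dgu = gu * (1.0 - gu) * p["wu_c"]           # dGamma_u / dc<t-1>
    dc_tilde = (1.0 - c_tilde ** 2) * p["wc_c"]  # dc~<t> / dc<t-1>
    # Copy path (1 - Gamma_u) plus the two paths through c~<t> and Gamma_u.
    grad = (1.0 - gu) + gu * dc_tilde + (c_tilde - c_prev) * dgu
    return c_t, grad


def alice_step(c_prev, x, p):
    """Alice's GRU (Gamma_u = 1), so c<t> = c~<t>. Returns (c<t>, dc<t>/dc<t-1>)."""
    gr = sigmoid(p["wr_c"] * c_prev + p["wr_x"] * x + p["br"])
    c_tilde = math.tanh(p["wc_c"] * gr * c_prev + p["wc_x"] * x + p["bc"])
    dgr = gr * (1.0 - gr) * p["wr_c"]           # dGamma_r / dc<t-1>
    # No copy path: every step multiplies by tanh' times a weight.
    grad = (1.0 - c_tilde ** 2) * p["wc_c"] * (gr + c_prev * dgr)
    return c_tilde, grad


def grad_through_time(step, p, xs, c0=0.5):
    """dc<T>/dc<0> for a scalar cell: the product of the per-step derivatives."""
    c, total = c0, 1.0
    for x in xs:
        c, g = step(c, x, p)
        total *= g
    return total


if __name__ == "__main__":
    xs = [0.1] * 100  # a "long" input sequence of 100 identical inputs
    print("Betty (Gamma_r = 1):", grad_through_time(betty_step, PARAMS, xs))
    print("Alice (Gamma_u = 1):", grad_through_time(alice_step, PARAMS, xs))
```

With these particular values, Betty's product stays close to 1 because the cell is mostly copied forward, while Alice's product shrinks by many orders of magnitude over 100 steps. Other (for example, larger) weights could make Alice's product explode instead, but her model never has the additive (1 - Γu) copy path.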

Information

Category
DPL302m
Added by
Quang Thái
