... weight updates for final averaging (Collins, 2002) or for voting (Freund and Schapire, 1999).

Algorithm 1 SGD: int $I, T$, float $\eta$
  Initialize $w_{0,0,0} \leftarrow 0$.
  for epochs $t \leftarrow 0 \ldots T-1$: do
    for all $i \in$ ... $i$th input with $w_{t,i,0}$.
      for all pairs $x_j$, $j \in \{0 \ldots P-1\}$: do
        $w_{t,i,j+1} \leftarrow w_{t,i,j} - \eta \nabla l_j(w_{t,i,j})$
      end for
      $w_{t,i+1,0} \leftarrow w_{t,i,P}$
    end for
    $w_{t+1,0,0} \leftarrow w_{t,I,0}$
  end for
  return $\frac{1}{T} \sum_{t=1}^{T} w_{t,0,0}$

While ... machines.

  for all shards $z \in \{1 \ldots Z\}$: parallel do
    Initialize $w_{z,0,0,0} \leftarrow 0$.
    for epochs $t \leftarrow 0 \ldots T-1$: do
      for all $i \in \{0 \ldots S-1\}$: do
        Decode $i$th input with $w_{z,t,i,0}$.
        for all pairs...