Poster Title:  Investigation of Second-Order Optimization in Large Mini-batch Training
Poster Abstract: 

Classical learning theory states that when a model has too many parameters relative to the amount of training data, it will overfit and its generalization performance will deteriorate. However, it has been shown empirically that deep neural networks (DNNs) can achieve high generalization performance when trained with extremely large amounts of data and model parameters, contrary to the predictions of classical learning theory.

One drawback is that training such DNNs requires enormous computation time, so it is necessary to reduce the training time through large-scale parallelization.

However, straightforward data parallelism requires very large mini-batches, which degrades convergence and generalization. In the present work, we investigate whether second-order optimization methods can close this generalization gap in large-batch training.
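
For illustration, a generic second-order update preconditions the mini-batch gradient with a curvature matrix G_t (for example, the Hessian or the Fisher information matrix):

\theta_{t+1} = \theta_t - \eta \, G_t^{-1} \nabla_\theta L_{B_t}(\theta_t),

where \eta is the learning rate, B_t is the mini-batch at step t, and L_{B_t} is the corresponding mini-batch loss; plain SGD corresponds to G_t = I.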

This is motivated by our observation that, as the mini-batch grows, the statistics estimated from each mini-batch become more stable, and thus accounting for the curvature of the loss plays a more important role in large-batch training.
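
To make this intuition concrete, under the standard assumption that mini-batch samples are drawn i.i.d., the covariance of the mini-batch gradient shrinks linearly with the batch size |B|:

\mathrm{Cov}\big[\nabla_\theta L_B(\theta)\big] = \frac{1}{|B|} \, \mathrm{Cov}\big[\nabla_\theta \ell(\theta; x)\big],

so gradient and curvature estimates computed from a large mini-batch are less noisy, which is what makes curvature-aware updates more reliable in this regime.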

Poster ID:  C-16