BERxiT: Better-fine-tuned and Wider-applicable Early Exit for *BERT

Apr 1, 2021 · J. Xin, R. Tang, Y. Yu, J. Lin
Abstract
The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to make trade-offs between model quality and efficiency. This paper aims to address two weaknesses of previous work: (1) existing fine-tuning strategies for early exiting models fail to take full advantage of BERT; (2) methods to make exiting decisions are limited to classification tasks. We propose a more advanced fine-tuning strategy and a learning-to-exit module that extends early exiting to tasks other than classification. Experiments demonstrate improved early exiting for BERT, with better trade-offs obtained by the proposed fine-tuning strategy, successful application to regression tasks, and the possibility of combining it with other acceleration methods. Source code can be found at https://github.com/castorini/berxit.
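To make the early exiting idea concrete, below is a minimal sketch of inference-time early exiting with a learning-to-exit (LTE) module attached to each layer. It is an illustration under assumptions, not the paper's implementation: the names `TinyEncoder`, `exit_heads`, `lte_heads`, and `exit_threshold` are hypothetical. The key point it shows is that exiting is decided by a learned scalar confidence rather than by softmax entropy, which is why the approach also applies to regression tasks.

```python
# Hedged sketch: per-layer exit heads plus a learning-to-exit (LTE) module.
# All class and variable names are illustrative, not from the BERxiT codebase.
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Toy stand-in for a BERT-like encoder with one exit head per layer."""

    def __init__(self, hidden=64, num_layers=4, num_labels=1):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
             for _ in range(num_layers)]
        )
        # One task head per layer: works for classification or regression.
        self.exit_heads = nn.ModuleList(
            [nn.Linear(hidden, num_labels) for _ in range(num_layers)]
        )
        # Learning-to-exit heads: each predicts a scalar "confidence" that the
        # current layer's prediction is already good enough, so the exit
        # decision does not depend on class probabilities.
        self.lte_heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
             for _ in range(num_layers)]
        )

    @torch.no_grad()
    def forward(self, x, exit_threshold=0.9):
        for layer, head, lte in zip(self.layers, self.exit_heads, self.lte_heads):
            x = layer(x)
            pooled = x[:, 0]                  # [CLS]-style pooling
            confidence = lte(pooled).item()   # scalar in (0, 1), batch size 1
            if confidence >= exit_threshold:  # confident enough: exit early
                return head(pooled), confidence
        return head(pooled), confidence       # fell through: use the last layer


if __name__ == "__main__":
    model = TinyEncoder().eval()
    tokens = torch.randn(1, 16, 64)           # (batch, seq_len, hidden)
    prediction, conf = model(tokens, exit_threshold=0.5)
    print(prediction.shape, conf)
```

Lowering `exit_threshold` trades quality for speed: more inputs exit at shallow layers, fewer reach the full depth of the model.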
Publication
The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL)