Implicit authentication (IA) schemes use behavioural biometrics to continuously and transparently authenticate mobile device users. Several IA schemes have been proposed by researchers which employ different behavioural features and provide reasonable detection accuracy. While these schemes work in principle, it is difficult to comprehend from these individual efforts which schemes work best (in terms of detection accuracy, detection delay and processing complexity) under different operating conditions (in terms of attack scenarios and availability of training and classification data). Furthermore, it is critical to evaluate these schemes on unbiased, real-world datasets to determine their efficacy in realistic operating conditions. In this paper, we evaluate six diverse IA schemes on four independently collected datasets from over 300 participants. We first evaluate these schemes in terms of: accuracy; training time and delay on real-world datasets; detection delay; processing and memory complexity for feature extraction, training and classification operations; vulnerability to mimicry attacks; and deployment issues on mobile platforms. We also leverage our real-world device usage traces to determine the proportion of time these schemes are able to afford protection to device owners. Based on our evaluations, we identify: 1) promising IA schemes with high detection accuracy, low performance overhead, and near real-time detection delays, 2) common pitfalls in contemporary IA evaluation methodology, and 3) open challenges for IA research. Finally, we provide an open source implementation of the IA schemes evaluated in this work that can be used for performance benchmarking by future IA research.