CARMA Colloquium

2:00 pm

Tuesday, 21st May 2019

SR202, SR Building

Dr Fred Roosta

(The University of Queensland)

Newton-MR: Newton's Method Without Smoothness or Convexity with Applications to Machine Learning

Recently, second-order methods have shown great success in a variety of machine learning applications. However, establishing convergence of the canonical member of this class, i.e., the celebrated Newton's method, has long required restrictive (strong) convexity assumptions. Furthermore, smoothness assumptions, such as Lipschitz continuity of the gradient/Hessian, have always been an integral part of the analysis. In fact, it is widely believed that in the absence of a well-behaved and continuous Hessian, the application of curvature can hurt more than it can help. This has in turn limited the application range of the classical Newton's method in machine learning. To set the scene, we first briefly highlight some recent results, which shed light on the advantages of Newton-type methods for machine learning, as compared with first-order alternatives. We then turn our focus to a new member of this class, Newton-MR, which is derived using two seemingly simple modifications of the classical Newton's method. We show that, unlike the classical Newton's method, Newton-MR can be applied beyond the traditional convex settings, to invex problems. Newton-MR appears almost indistinguishable from its classical counterpart, yet it offers a diverse range of algorithmic and theoretical advantages. Furthermore, by introducing a weaker notion of joint regularity of the Hessian and gradient, we show that Newton-MR converges globally even in the absence of the traditional smoothness assumptions. Finally, we obtain local convergence results in terms of the distance to the set of optimal solutions. This greatly relaxes the notion of "isolated minimum", which is required for the local convergence analysis of the classical Newton's method. Numerical simulations using several machine learning problems demonstrate the great potential of Newton-MR as compared with several other second-order methods.
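To make the idea concrete, the following is a minimal illustrative sketch of a Newton-MR-style iteration, under two assumptions suggested by the abstract: the update direction is the minimum-norm least-squares solution of the Newton system (here via a dense pseudo-inverse, rather than the iterative solver a practical implementation would use), and the line search backtracks on the gradient-norm merit function ||∇f||², which remains meaningful for invex, non-convex problems. The function names, Armijo constants, and test problem are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def newton_mr_step(grad_f, hess_f, x, alpha=1.0, beta=0.5, c=1e-4):
    """One Newton-MR-style step (illustrative sketch, not the paper's algorithm).

    Direction: p = -pinv(H) @ g, the minimum-norm least-squares solution of
    H p = -g, so H need not be invertible or positive definite.
    Line search: Armijo-type backtracking on the merit function ||grad f||^2.
    """
    g = grad_f(x)
    H = hess_f(x)
    p = -np.linalg.pinv(H) @ g          # least-squares Newton direction
    phi0 = g @ g                        # merit value ||g||^2 at x
    slope = 2.0 * (g @ (H @ p))         # directional derivative of ||g||^2 (<= 0)
    while alpha > 1e-12:
        g_new = grad_f(x + alpha * p)
        if g_new @ g_new <= phi0 + c * alpha * slope:
            break                       # sufficient decrease in gradient norm
        alpha *= beta
    return x + alpha * p

# Usage on a convex-but-not-strongly-convex quartic, f(x) = sum(x**4) / 4,
# whose Hessian is singular at the minimizer x = 0.
grad = lambda x: x ** 3
hess = lambda x: np.diag(3.0 * x ** 2)
x = np.array([1.0, -2.0])
for _ in range(20):
    x = newton_mr_step(grad, hess, x)
```

In the full method the pseudo-inverse product is computed inexactly with an iterative minimum-residual solver (hence the "MR" in the name), which only needs Hessian-vector products; the dense `pinv` here is purely for clarity.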