In this paper we present an on-manifold sequence-to-sequence learning approach to motion estimation using visual and inertial sensors. It is, to the best of our knowledge, the first end-to-end trainable method for visual-inertial odometry which performs fusion of the data at an intermediate feature-representation level. Our method has numerous advantages over traditional approaches. Specifically, it eliminates the need for tedious manual synchronization of the camera and IMU, as well as the need for manual calibration between the IMU and camera. A further advantage is that our model naturally and elegantly incorporates domain-specific information, which significantly mitigates drift. We show that our approach is competitive with state-of-the-art traditional methods when accurate calibration data is available and can be trained to outperform them in the presence of calibration and synchronization errors.
|Title of host publication||Proceedings of the Thirty-First AAAI Conference On Artificial Intelligence|
|Number of pages||7|
|Publication status||Published - 13 Feb 2017|
|Name||Proceedings of the AAAI Conference On Artificial Intelligence|
Clark, R., Wang, S., Wen, H., Markham, A., & Trigoni, N. (2017). VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem. In Proceedings of the Thirty-First AAAI Conference On Artificial Intelligence (pp. 3995-4001). (Proceedings of the AAAI Conference On Artificial Intelligence). AAAI Press. https://arxiv.org/abs/1701.08376v1