Positioning with visual sensors in indoor environments has many advantages: it does not require infrastructure or accurate maps, and it is more robust and accurate than other modalities such as WiFi. However, one of the biggest hurdles preventing its practical application on mobile devices is the time-consuming visual processing pipeline. To overcome this problem, this paper proposes a novel lifelong learning approach that enables efficient, real-time visual positioning. We exploit the fact that when following a previous visual experience multiple times, one can gradually discover clues on how to traverse it with much less effort, e.g. which parts of the scene are more informative, and what kinds of visual elements to expect. Such second-order information is recorded as parameters, which provide key insights into the context and empower our system to dynamically optimise itself to stay localised at minimum cost. We implement the proposed approach on an array of mobile and wearable devices, and evaluate its performance in two indoor settings. Experimental results show that our approach can reduce the visual processing time by up to two orders of magnitude, while achieving sub-metre positioning accuracy.