Teaching machines to comprehend the nuances of emotion from photographs is a particularly challenging task. Emotion perception, an inherently subjective problem, is often simplified for computational purposes into either categorical states or a dimensional valence-arousal space, the latter being less explored in the literature. This paper proposes a multi-stream, context-aware neural network model for dimensional emotion prediction in images. Models were trained on object and scene data along with deep features to estimate valence, arousal, and dominance. Experimental evaluation on a large-scale image emotion dataset demonstrates the viability of the proposed approach. Our analysis suggests that understanding the depicted object in an image is vital for successful prediction, whilst relying on scene information can lead to somewhat confounding effects.