In this study, we established an advanced learning environment that aims to promote the learning of social communication skills in children, especially those with Autism Spectrum Disorder. The learning environment estimates the affective and cognitive state of a child in real time by recognizing multimodal social signals, and generates interactive narratives with embodied virtual characters. One of the key components of the environment is the Visual Inputs Processor, which is the first with the capability to detect a child's attention and expression simultaneously in a natural environment. Furthermore, these capabilities are achieved in a nonintrusive manner, which avoids the potential bias in the user's behaviours that intrusive counterparts introduce. The environment also employs multiple inexpensive cameras and a large multi-touch screen; these settings (1) maximize the space and angles in which observation may be performed and (2) provide users with a much more realistic experience than counterpart approaches do. © 2010 Springer-Verlag Berlin Heidelberg.