The model of an environment plays a crucial role in autonomous mobile robots, providing them with the task-relevant information they need. As robots become more intelligent, they require a richer and more expressive environment model. Such a model is a map containing a structured description of the environment, which serves as the robot’s knowledge for tasks such as planning and reasoning. In this work, we propose a framework for capturing important environment descriptors, such as the functionality and ownership of the objects surrounding the robot, through verbal interaction. Specifically, we propose a corpus of verbal descriptions annotated with frame-like structures. We use this dataset to train two multi-task neural architectures, and we compare them through an experimental evaluation, discussing the design choices. Finally, we describe a simple interactive interface to our system, built on the trained model. The novelties of this work are: (i) the definition of a new problem, i.e., addressing different object descriptors, which is crucial to the accomplishment of the robot’s tasks; (ii) a specialized corpus to support the creation of rich Semantic Maps; (iii) the design of different neural architectures and their experimental evaluation on the proposed dataset; (iv) a simple interface for the actual usage of the proposed resources.