When learning from actions, language can be a crucial source to specify the learning content. Understanding its interactions with action processing is therefore fundamental when attempting to model the development of human learning to replicate it in artificial agents. From early childhood, two different processes participate in shaping infants’ understanding of the events occurring around them: Infants’ motor system influences their action perception, driving their attention to the action goal; additionally, parental language influences the way children parse what they observe into relevant units. To date, however, it has barely been investigated whether these two cognitive processes – action understanding and language – are separate and independent or whether language might interfere with the former. To address this question, we evaluated whether a verbal narrative concurrent with action observation could avert 14-month-old infants’ attention from an agent’s action goal, which is otherwise naturally selected when the action is performed by an agent. The infants observed movies of an actor reaching and transporting balls into a box. In three between-subject conditions, the reaching movement was accompanied either with no audio (Base condition), a sine-wave sound (Sound condition), or a speech sample (Speech condition). The results show that the presence of a speech sample underlining the movement phase reduced significantly the number of predictive gaze shifts to the goal compared to the other conditions. Our findings thus indicate that any modeling of the interaction between language and action processing will have to consider a potential top-down effect of the former, as language can be a meddler in the predictive behavior typical of the observation of goal-oriented actions.