We address the problem that different users have different lexical knowledge of a problem domain, so automated dialogue systems need to adapt their generation choices online to the domain knowledge of each user they encounter. We approach this problem using Reinforcement Learning in Markov Decision Processes (MDPs). We present a reinforcement learning framework for learning referring expression generation (REG) policies that adapt dynamically to users with different levels of domain knowledge. In contrast to related work, we also propose a new statistical user model that incorporates the lexical knowledge of different users. We evaluate the framework by showing that it allows us to learn dialogue policies that automatically adapt their choice of referring expressions online to different users, and that these policies are significantly better than hand-coded adaptive policies for this problem. The learned policies consistently produce dialogues that are between 2 and 8 turns shorter than those produced by a range of hand-coded but adaptive baseline REG policies.