Virtual human motion driving focuses on generating and controlling realistic human motions, from facial expressions to body movements. These motions are driven by various types of input signals, such as visual and acoustic features, textual prompts, or a combination thereof. This survey delivers an in-depth examination of generative models for virtual human motion driving, with a particular emphasis on recent models. A taxonomy of virtual human motion driving networks designed for talking-face and human-pose generation is provided: the former mainly concentrates on lip synchronization, emotion differentiation, and personalized expressions, while the latter mainly covers co-speech gesture generation and text-to-motion prediction. Moreover, available datasets and evaluation metrics for virtual human motion driving tasks are discussed, and related applications and real-world products are explored, along with their challenges, limitations, and potential future developments. The objective of this survey is to provide a comprehensive understanding of current advancements in talking-face and human-pose generation models, with a focus on the future potential of virtual human motion driving, thereby laying the groundwork for the development of wide-ranging virtual human applications.