Recently, significant progress has been made in text-based motion generation, enabling the synthesis of diverse, high-quality human motions that conform to textual descriptions. However, generating motions beyond the distribution of the original training datasets, i.e., zero-shot generation, remains challenging. Adopting a divide-and-conquer strategy, we propose a new framework named Fine-Grained Human Motion Diffusion Model (FG-MDM) for zero-shot human motion generation. Specifically, we first leverage a large language model to parse the vague textual annotations of existing datasets into fine-grained descriptions of individual body parts. We then use these fine-grained descriptions to guide a transformer-based diffusion model that additionally adopts a part-token design. Because fine-grained descriptions capture the essence of a motion more closely, FG-MDM can generate human motions beyond the scope of the original datasets. Our experimental results demonstrate the superiority of FG-MDM over previous methods in zero-shot settings. We will release our fine-grained textual annotations for HumanML3D and KIT on the project page.
First, we use ChatGPT to paraphrase a given vague textual description into fine-grained descriptions of different body parts, expanding one concise caption into several part-level sentences. FG-MDM then uses these fine-grained descriptions to guide a diffusion model in generating human motion.
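Below is a minimal sketch of this paraphrasing step, assuming access to the OpenAI chat API. The prompt template, body-part list, and model name are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of fine-grained paraphrasing with an LLM (assumed prompt and setup).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical body-part decomposition; the actual partition may differ.
BODY_PARTS = ["head", "torso", "left arm", "right arm", "left leg", "right leg"]

PROMPT_TEMPLATE = (
    "Rewrite the following motion description as one short sentence per body part "
    "({parts}), describing what each part does:\n\n\"{caption}\""
)

def paraphrase_caption(caption: str) -> str:
    """Expand a coarse motion caption into per-body-part descriptions."""
    prompt = PROMPT_TEMPLATE.format(parts=", ".join(BODY_PARTS), caption=caption)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # e.g. "a person waves with their right hand" -> six part-level sentences
    print(paraphrase_caption("a person waves with their right hand"))
```

The resulting part-level sentences can then be encoded (e.g., with a text encoder such as CLIP's) and attached to the denoiser as separate part tokens, in line with the part-token design mentioned above.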
@article{shi2023generating,
title={Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions},
author={Shi, Xu and Luo, Chuanchen and Peng, Junran and Zhang, Hongwen and Sun, Yunlian},
journal={arXiv preprint arXiv:2312.02772},
year={2023}
}