Prior works showed that adding a residual force/torque at the root joint improves the model's robustness, so the action additionally predicts residual forces and a torque applied to the root.
Question: the paper does not say how the root torque is distributed over the 15 simulation steps. If meta-PD is not used for it, simply applying the same torque for all 15 steps during training could also work; this detail may be hidden in the prior work — see "Residual Force Control for Agile Human Behavior Imitation and Extended Motion Synthesis" (2020).
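A toy sketch of one plausible reading of this (applying the residual root force/torque unchanged at every physics substep alongside the PD joint torques); the simulator object `sim` and all of its methods are hypothetical placeholders, not a real API.

```python
def step_with_residual_root_control(sim, target_angles, residual_force,
                                    residual_torque, n_substeps=15):
    """Advance the simulator by one policy step (hypothetical sketch).

    Assumes the simplest interpretation of the open question above: the
    residual force/torque predicted for the root is re-applied, unchanged,
    at each of the n_substeps physics substeps, together with the PD
    torques for the actuated joints.
    """
    for _ in range(n_substeps):
        joint_torques = sim.pd_control(target_angles)            # PD torques toward target angles
        sim.apply_joint_torques(joint_torques)
        sim.apply_external_force(body="root", force=residual_force)
        sim.apply_external_torque(body="root", torque=residual_torque)
        sim.substep()
```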
2.2.4 Defining the rewards:
r_p is the pose reward, computed from the difference between the local joint orientations and the ground-truth orientations.
Its formula sums over the number of joints, measuring for each joint the relative rotation between the two orientations.
r_v is the velocity reward.
r_j is the 3D world joint position reward.
r_k is the 2D keypoint reward: the 2D projections of the joints should match the detected 2D joints.
The total reward is the multiplication of the sub-rewards, following the prior work "A Scalable Approach to Control Diverse Behaviors for Physically Simulated Characters"; the product form guarantees that no single sub-reward can be overlooked (the total is high only when every factor is high).
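A minimal numpy sketch of this multiplicative reward; the exp(-alpha * error) form of each sub-reward and the alpha weights are assumptions (common in this line of work), not taken from the notes.

```python
import numpy as np

def imitation_reward(pose_err, vel_err, joint3d_err, joint2d_err,
                     alphas=(2.0, 0.005, 5.0, 5.0)):
    """Multiplicative imitation reward (sketch with placeholder weights).

    Each sub-reward maps a non-negative error to (0, 1] via exp(-alpha * error),
    so the product is large only when *every* term is small; a single bad term
    drives the whole reward toward zero, which is why no sub-reward can be
    overlooked under the multiplicative formulation.
    """
    a_p, a_v, a_j, a_k = alphas
    r_pose  = np.exp(-a_p * pose_err)     # local joint orientation error vs. ground truth
    r_vel   = np.exp(-a_v * vel_err)      # joint velocity error
    r_joint = np.exp(-a_j * joint3d_err)  # 3D world joint position error
    r_kp    = np.exp(-a_k * joint2d_err)  # 2D keypoint / reprojection error
    return r_pose * r_vel * r_joint * r_kp
```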
2.3 kinematic aware policy
RL exploration here is defined by sampling the action from a normal distribution, parameterized by a mean and a variance; in the end the mean should be the optimal action.
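A PyTorch sketch of this Gaussian exploration policy: the network predicts the action mean, a learned state-independent log-std provides exploration noise, and at test time the mean is returned as the (near-)optimal action. Network sizes and the log-std initialization are placeholders.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy sketch: mean from an MLP, shared log-std for noise."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim) - 1.0)  # placeholder init

    def forward(self, state, deterministic=False):
        mean = self.mean_net(state)
        if deterministic:            # test time: use the mean as the action
            return mean
        std = self.log_std.exp()
        return torch.distributions.Normal(mean, std).rsample()  # training-time exploration
```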
One component is the kinematic refinement unit, whose output is the kinematic pose after n iterations of refinement.
The other component is the control generation unit: from the current pose, the current velocity, and the next frame's kinematic pose it produces the control values.
The action is not regressed directly here, because the authors say this makes learning easier.
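One plausible reading of the control generation unit, sketched in PyTorch under the assumption that the PD target angles are formed as the refined kinematic pose plus a predicted residual (the "easier to learn" parametrization), with the residual root force/torque coming from a second head; dimensions and layer sizes are made up.

```python
import torch
import torch.nn as nn

class ControlGenerationUnit(nn.Module):
    """Control generation sketch: inputs are the current simulated pose q_t,
    velocity qdot_t, and the next-frame refined kinematic pose q_kin.
    Instead of regressing the PD targets directly, a residual is predicted
    and added to the refined kinematic pose (assumed parametrization)."""

    def __init__(self, pose_dim, hidden=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3 * pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.residual_head = nn.Linear(hidden, pose_dim)  # added to refined kinematic pose
        self.root_force_head = nn.Linear(hidden, 6)       # residual force + torque on the root

    def forward(self, q_t, qdot_t, q_kin):
        h = self.backbone(torch.cat([q_t, qdot_t, q_kin], dim=-1))
        target_angles = q_kin + self.residual_head(h)      # PD targets = kinematic pose + residual
        root_wrench = self.root_force_head(h)
        return target_angles, root_wrench
```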
2.3.1 kinematic refinement unit
A refinement MLP is defined whose input is the gradient of a reprojection loss.
This is inspired by the prior work "Human Body Model Fitting by Learned Gradient Descent": the goal is not to explicitly minimize the loss, but to use its gradient as an informative kinematic feature for learning a pose update, so that the final pose is stable and accurate.
The reprojection loss is defined roughly as L = Σ_j c_j ‖Π(X_j) − x_j‖².
Capital X_j are the 3D joints and lowercase x_j are the 2D keypoints, so the loss projects the 3D joints and compares them with the 2D detections, weighting each term by the 2D joint uncertainty c_j to account for keypoint uncertainty.
z is converted to the character's root coordinate frame so that the feature is invariant to the character's global orientation.
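A PyTorch sketch of one refinement iteration as described above: build the confidence-weighted reprojection loss between the projected 3D joints (capital X) and the detected 2D keypoints (lowercase x), take its gradient via autograd, rotate the gradient into the character's root frame, and let an MLP map this feature to a pose update. The pinhole projection, the forward-kinematics callable `fk`, and the root-frame transform via `root_rot` are assumptions.

```python
import torch
import torch.nn as nn

def reprojection_loss(X_3d, x_2d, conf, cam_f, cam_c):
    """Confidence-weighted reprojection loss (sketch, simple pinhole camera).
    X_3d: (J, 3) 3D joints, x_2d: (J, 2) detected keypoints, conf: (J,) uncertainties."""
    proj = cam_f * X_3d[..., :2] / X_3d[..., 2:3] + cam_c    # perspective projection
    return (conf * ((proj - x_2d) ** 2).sum(dim=-1)).sum()

class KinematicRefinementUnit(nn.Module):
    """One refinement iteration: pose update predicted from the loss gradient.

    The gradient is used as an informative kinematic feature for the MLP,
    not for explicit loss minimization (the learned-gradient-descent idea).
    `fk` (forward kinematics) and `root_rot` (root-frame rotation) are
    placeholders supplied by the caller."""

    def __init__(self, num_joints, pose_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * num_joints, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, pose, fk, x_2d, conf, cam_f, cam_c, root_rot):
        # z: 3D joint positions obtained from the current kinematic pose
        z = fk(pose).detach().requires_grad_(True)
        loss = reprojection_loss(z, x_2d, conf, cam_f, cam_c)
        grad = torch.autograd.grad(loss, z)[0]               # dL/dz, shape (J, 3)
        # express the gradient in the character's root frame so the feature
        # is invariant to the character's global orientation
        grad_local = grad @ root_rot                          # (J, 3) x (3, 3)
        return pose + self.mlp(grad_local.reshape(-1))        # refined kinematic pose
```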