CNS: Correspondence Encoded
Neural Image Servo Policy

Anzhe Chen#, Hongxiang Yu#, Rong Xiong, Yue Wang*
Zhejiang University
#Equal Contribution
*Corresponding Author

Case 1: static scene, small initial pose error

Case 2: static scene, large initial pose error

Case 3: dynamic scene

Case 4: dynamic object

Abstract


Image servoing is an indispensable technique in robotic applications that helps achieve high-precision positioning. The intermediate representation of an image servo policy is important for abstracting sensor input and guiding policy output. Classical approaches achieve high precision but require clean keypoint correspondences, and suffer from a limited convergence basin or weak robustness to feature errors. Recent learning-based methods achieve moderate precision and a large convergence basin on specific scenes but struggle to generalize to novel environments. In this paper, we encode keypoints and their correspondences into a graph and use a graph neural network as the controller architecture. This design combines the advantages of both: a generalizable intermediate representation from keypoint correspondences and the strong modeling ability of neural networks. Additional techniques, including realistic data generation, feature clustering, and distance decoupling, further improve efficiency, precision, and generalization. Experiments in simulation and the real world verify the effectiveness of our method in speed (up to 40 fps including the observer), precision (<0.3° and sub-millimeter accuracy), and generalization (sim-to-real without fine-tuning).

Network Architecture

We encode matched keypoints into a graph and build a neural controller with graph neural networks. This allows us to model an arbitrary number of detected keypoints with time-varying graph structure.
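To make the idea concrete, the following is a minimal sketch of this pipeline in numpy: matched keypoints become graph nodes whose features combine position and correspondence error, a single message-passing layer aggregates over edges, and a pooled readout predicts a 6-DoF velocity. All names, feature choices, and layer sizes here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def build_graph(kp_cur, kp_des):
    """Turn N matched keypoint pairs into node features and a fully
    connected edge list. N may vary from frame to frame."""
    err = kp_des - kp_cur                            # correspondence error
    x = np.concatenate([kp_cur, err], axis=1)        # node features (N, 4)
    n = len(kp_cur)
    src, dst = np.meshgrid(np.arange(n), np.arange(n))
    mask = src != dst                                # drop self-loops
    return x, src[mask], dst[mask]

def gnn_layer(x, src, dst, W_msg, W_upd):
    """One message-passing step: mean-aggregate neighbor messages,
    then update each node from [own features, aggregated messages]."""
    msg = np.tanh(x[src] @ W_msg)                    # (E, H)
    agg = np.zeros((x.shape[0], W_msg.shape[1]))
    np.add.at(agg, dst, msg)                         # sum per target node
    deg = np.bincount(dst, minlength=x.shape[0]).reshape(-1, 1)
    agg = agg / np.maximum(deg, 1)                   # mean aggregation
    return np.tanh(np.concatenate([x, agg], axis=1) @ W_upd)

rng = np.random.default_rng(0)
kp_cur = rng.uniform(0, 1, (5, 2))                   # 5 detected keypoints
kp_des = kp_cur + 0.05                               # matched desired keypoints
x, src, dst = build_graph(kp_cur, kp_des)
W_msg = rng.normal(0, 0.1, (4, 8))
W_upd = rng.normal(0, 0.1, (12, 8))
h = gnn_layer(x, src, dst, W_msg, W_upd)             # (5, 8) node embeddings
W_out = rng.normal(0, 0.1, (8, 6))
vel = h.mean(axis=0) @ W_out                         # global pooling -> 6-DoF velocity
print(vel.shape)                                     # (6,)
```

Because the readout pools over nodes, the same weights handle any number of keypoints, which is the property the graph encoding is after.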

[Figure: network structure]

Environment Setup

We benchmark our method in two simulation environments and one real-world environment.

[Figure: experimental setup]

Sampled Trajectories

We visualize several success cases of IBVS and CNS in real-world experiments. In each case, we plot the overall trajectories and their zoomed-in views around the desired pose. Next to the 3D trajectories are the images captured at the initial and desired poses, along with gray-scale images showing the photometric error between the desired image and the images captured at the final poses reached under IBVS and CNS.
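The photometric-error images above can be computed as a per-pixel absolute difference between gray-scale versions of the desired and final images. A minimal sketch (the Rec. 601 luma weights are a common choice; the paper may use a different conversion):

```python
import numpy as np

def photometric_error_map(img_desired, img_final):
    """Per-pixel absolute gray-level difference between two RGB images
    of equal size; darker means better alignment at convergence."""
    luma = np.array([0.299, 0.587, 0.114])           # Rec. 601 weights
    gray = lambda im: im.astype(np.float32) @ luma
    return np.abs(gray(img_desired) - gray(img_final))

# Toy example: an all-black vs. an all-white 4x4 image.
a = np.zeros((4, 4, 3), dtype=np.uint8)
b = np.full((4, 4, 3), 255, dtype=np.uint8)
err = photometric_error_map(a, b)
print(err.max())                                     # 255.0
```

A near-zero error map at the final pose indicates the servo has converged to the desired view.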

[Figure: 3D trajectories]

BibTeX


        @misc{chen2023cns,
          title={CNS: Correspondence Encoded Neural Image Servo Policy}, 
          author={Anzhe Chen and Hongxiang Yu and Yue Wang and Rong Xiong},
          year={2023},
          eprint={2309.09047},
          archivePrefix={arXiv},
          primaryClass={cs.RO}
        }