SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing

Jesse Haworth1, Juo-Tung Chen1, Nigel Nelson2, Ji Woong Kim3, Masoud Moghani2,4, Chelsea Finn3, Axel Krieger1
1Johns Hopkins University, 2NVIDIA, 3Stanford University, 4University of Toronto
Conference on Neural Information Processing Systems (NeurIPS) 2025

Autonomous Suturing Demonstration

Complete end-to-end autonomous suturing demonstration on the dVRK platform, showcasing the full pipeline from needle pickup through tissue penetration to secure knot tying.

Abstract

Robotic suturing is a prototypical long-horizon dexterous manipulation task, requiring coordinated needle grasping, precise tissue penetration, and secure knot tying.

Despite numerous efforts toward end-to-end autonomy, a fully autonomous suturing pipeline has yet to be demonstrated on physical hardware. We introduce SutureBot: an autonomous suturing benchmark on the da Vinci Research Kit (dVRK), spanning needle pickup, tissue insertion, and knot tying. To ensure repeatability, we release a high-fidelity dataset comprising 1,890 suturing demonstrations.

Furthermore, we propose a goal-conditioned framework that explicitly optimizes insertion-point precision, improving targeting accuracy by 59%-74% over a task-only baseline. To establish this task as a benchmark for dexterous imitation learning, we evaluate state-of-the-art vision-language-action (VLA) models, including π0, GR00T N1, OpenVLA-OFT, and multitask ACT, each augmented with a high-level task-prediction policy. Autonomous suturing is a key milestone toward achieving robotic autonomy in surgery. These contributions support reproducible evaluation and development of precision-focused, long-horizon dexterous manipulation policies necessary for end-to-end suturing.

System Architecture

Overview of SutureBot's precision-conditioned control framework for long-horizon, dexterous surgical tasks. Image observations are processed by a high-level language policy, which selects the current task and generates the associated language condition. The user specifies target needle insertion and exit points via a graphical interface, and these points are used to generate the goal condition. The three inputs (language condition, goal condition, and real-time kinematic data) are then processed by the low-level policy to produce precise, continuous control commands for the robot.
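The data flow described above can be sketched as a single control step. This is a minimal illustration, not the released implementation: the names `Subtask`, `GoalCondition`, `control_step`, and the two stub policies are hypothetical, and the language strings are placeholders rather than the prompts used in the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List, Sequence, Tuple


class Subtask(Enum):
    """The three subtasks; the string values are placeholder language conditions."""
    NEEDLE_PICKUP = "pick up the needle"
    NEEDLE_THROW = "throw the needle"
    KNOT_TIE = "tie the knot"


@dataclass(frozen=True)
class GoalCondition:
    """User-selected targets in image pixel coordinates (assumed format)."""
    insertion_px: Tuple[int, int]
    exit_px: Tuple[int, int]


# Hypothetical policy signatures, chosen only for this sketch.
HighLevelPolicy = Callable[[Any], Subtask]
LowLevelPolicy = Callable[[str, GoalCondition, Sequence[float]], List[float]]


def control_step(
    image: Any,
    kinematics: Sequence[float],
    goal: GoalCondition,
    high_level: HighLevelPolicy,
    low_level: LowLevelPolicy,
) -> Tuple[Subtask, List[float]]:
    """Run one step: pick the subtask from the image, then compute an action."""
    subtask = high_level(image)
    language_condition = subtask.value
    action = low_level(language_condition, goal, kinematics)
    return subtask, action


def stub_high_level(image: Any) -> Subtask:
    """Stand-in for a learned task predictor; always reports the needle throw."""
    return Subtask.NEEDLE_THROW


def stub_low_level(
    language: str, goal: GoalCondition, kinematics: Sequence[float]
) -> List[float]:
    """Stand-in for a learned policy; returns a zero action per kinematic channel."""
    return [0.0] * len(kinematics)


if __name__ == "__main__":
    goal = GoalCondition(insertion_px=(120, 80), exit_px=(160, 80))
    print(control_step(None, [0.1, 0.2, 0.3], goal, stub_high_level, stub_low_level))
```

Keeping the goal condition as a separate argument, rather than folding it into the language string, mirrors the description above, where the low-level policy receives the language condition and the goal condition as distinct inputs.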

Methodology

Suturing Task Decomposition

We decompose the complete suturing procedure into three sequential subtasks: (1) needle pickup, (2) needle throw (tissue penetration), and (3) knot tying. This decomposition enables focused data collection, policy training, and systematic evaluation of each critical phase.

The suturing procedure broken into three tasks: needle pickup, needle throw, and knot tie.

Precision-Conditioned Control

Our goal-conditioned framework conditions the low-level policy on the user-specified insertion point, improving targeting accuracy by 59%-74% over task-only baselines. We use three goal-condition representations: point label, binary mask, and distance map.

Goal condition representations: Point Label, Binary Mask, Distance Map, and None.
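The page does not define how each representation is encoded, so the following is one plausible reading, offered only as a sketch: a point label as a single marked pixel, a binary mask as a filled disc around the target, and a distance map as the normalized Euclidean distance from every pixel to the target. The function names, the `radius` parameter, and the normalization are assumptions made for illustration.

```python
import numpy as np


def point_label(height, width, target_yx):
    """Single-pixel marker at the target (assumed encoding)."""
    label = np.zeros((height, width), dtype=np.float32)
    label[target_yx] = 1.0
    return label


def binary_mask(height, width, target_yx, radius=5):
    """Filled disc of the given pixel radius around the target (assumed encoding)."""
    ys, xs = np.mgrid[0:height, 0:width]
    dist_sq = (ys - target_yx[0]) ** 2 + (xs - target_yx[1]) ** 2
    return (dist_sq <= radius ** 2).astype(np.float32)


def distance_map(height, width, target_yx):
    """Euclidean distance to the target, scaled to [0, 1] (assumed encoding).

    Requires an image larger than one pixel so the maximum distance is nonzero.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    dist = np.hypot(ys - target_yx[0], xs - target_yx[1])
    return (dist / dist.max()).astype(np.float32)


if __name__ == "__main__":
    target = (2, 3)  # (row, column) of a hypothetical insertion point
    print(point_label(8, 8, target).sum())
    print(binary_mask(8, 8, target, radius=1).sum())
    print(distance_map(8, 8, target)[2, 3])
```

Under this reading, the three encodings differ mainly in how much spatial context they give the policy: the point label is sparse, the mask marks a tolerance region, and the distance map gives a dense signal across the whole image.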

Experimental Setup

Hardware Platform

Our experiments are conducted on the da Vinci Research Kit (dVRK) Si version, featuring dual-arm robotic manipulation capabilities. The setup includes a Soft Tissue Suture Pad as the task surface, wrist-mounted cameras for close-up manipulation, an endoscope for global scene observation, and specialized robot grippers. Data collection focuses on wound one, with wounds two through six reserved for generalization testing.

Experimental setup showing the da Vinci Research Kit (dVRK), RCM fixture, and suture pad with wound types.

Dataset & Precision Evaluation

We release a comprehensive dataset of 1,890 high-fidelity suturing demonstrations collected across multiple sessions. The dataset includes multi-modal observations from the wrist cameras and the endoscope, along with precise action sequences for each suturing subtask. Precision evaluation uses UV-marked insertion points to quantify targeting accuracy.

Precision measurement using invisible UV markers, from marking the wound to measuring the error post-execution.
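As a rough sketch of how such a measurement could be turned into a score, the snippet below computes the Euclidean distance between a marked target and the observed insertion point, then the relative reduction in mean error between two policies. The page does not state the units or how the reported percentages were computed, so millimetres and the relative-reduction formula are assumptions, and the function names are hypothetical.

```python
import math
from statistics import mean


def insertion_error_mm(target_xy, actual_xy):
    """Euclidean distance between the marked target and the actual insertion point."""
    return math.dist(target_xy, actual_xy)


def mean_error_mm(pairs):
    """Mean insertion error over (target, actual) coordinate pairs."""
    return mean(insertion_error_mm(target, actual) for target, actual in pairs)


def relative_reduction(baseline_error, conditioned_error):
    """Fraction by which the conditioned policy reduces the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 1.0 - conditioned_error / baseline_error


if __name__ == "__main__":
    # Made-up coordinates for illustration only; not measurements from the paper.
    trials = [((0.0, 0.0), (3.0, 4.0)), ((1.0, 1.0), (1.0, 2.0))]
    print(mean_error_mm(trials))
    print(relative_reduction(baseline_error=4.0, conditioned_error=1.0))
```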

Results

Precision Targeting Performance

Our precision-conditioned control framework improves insertion-point targeting accuracy by 59%-74% over task-only baselines. The evaluation shows that the point label goal condition achieves the highest precision on both ACT and π0.

Table 1: Success rates and precision results for different goal conditions on the suturing procedure.

Vision-Language-Action Model Benchmark

We establish SutureBot as a comprehensive benchmark for dexterous manipulation by evaluating state-of-the-art vision-language-action (VLA) models including π0, GR00T N1, OpenVLA-OFT, and multitask ACT. Each model is augmented with our high-level task-prediction policy, demonstrating the framework's versatility and establishing new benchmarks for autonomous surgical robotics.

Table 2: Success rates and precision results of the evaluated models on the suturing procedure.

Citation

@inproceedings{haworth2025suturebot,
  author    = {Haworth, Jesse and Chen, Juo-Tung and Nelson, Nigel and Kim, Ji Woong and Moghani, Masoud and Finn, Chelsea and Krieger, Axel},
  title     = {SutureBot: A Precision Framework \& Benchmark For Autonomous End-to-End Suturing},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2025},
}