Coming soon: Code & weights will be released upon SAM3's public release

EfficientSAM3: Progressive Hierarchical Knowledge Distillation from SAM1, SAM2 & SAM3

Chengxi Simon Zeng, Yuxuan Jiang, Aaron Zhang · Visual Information Lab, University of Bristol

Why EfficientSAM3?

SAM3 delivers Promptable Concept Segmentation (PCS) by combining semantic understanding with temporal tracking, yet its massive backbone and dense memory bank make on-device deployment impractical. EfficientSAM3 distills knowledge from SAM1, SAM2, and SAM3 into a family of lightweight student models tailored for edge hardware, without sacrificing PCS quality.

EfficientSAM3 architecture diagram

Updates

  • 2025-10-18: Project announced. Code and weights will be published once SAM3 code is publicly available.

Highlights

  • Promptable concept segmentation distilled into RepViT, TinyViT, and EfficientViT families.
  • Perceiver-based memory compression aligned with SAM2 temporal tracking.
  • ONNX/CoreML support for real-time mobile, embedded, and desktop deployment.

Abstract

SAM3 brought promptable concept segmentation to production scale, but its computational footprint blocks latency-sensitive applications. EfficientSAM3 progressively distills SAM3 into lightweight architectures that maintain PCS quality on edge devices.

We employ a three-stage curriculum: (1) encoder distillation on SA-1B with prompt-in-the-loop supervision, (2) temporal memory distillation on SA-V using a compact Perceiver module, and (3) end-to-end fine-tuning on official SAM3 concept segmentation data. The resulting students deliver real-time segmentation, tracking, and prompt handling on resource-constrained platforms.

Three-Stage Progressive Distillation

Stage 1 · Compact Encoder

Align nine student backbones (RepViT, TinyViT, EfficientViT) with the SAM3 encoder using SA-1B and prompt-in-the-loop supervision.
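The encoder-alignment objective can be illustrated with a minimal, framework-agnostic NumPy sketch. The released code is PyTorch-based and not yet public, so the plain MSE form, the learned projection matrix, and all names below are assumptions for illustration only.

```python
import numpy as np

def encoder_distill_loss(student_feats, teacher_feats, proj):
    """MSE between projected student features and frozen teacher features.

    student_feats: (N, d_s) patch embeddings from the compact student backbone
    teacher_feats: (N, d_t) patch embeddings from the frozen SAM3 encoder
    proj:          (d_s, d_t) learned linear projection aligning the two spaces
    """
    aligned = student_feats @ proj   # map student features into teacher space
    diff = aligned - teacher_feats
    return float(np.mean(diff ** 2))
```

In practice the prompt-in-the-loop supervision would add task terms on top of this feature-matching loss; only the alignment term is sketched here.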

Stage 2 · Temporal Memory

Compress SAM3's dense video memory into a Perceiver-based module distilled on SA-V, enabling efficient multi-frame reasoning.
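The core idea of a Perceiver-style read is that a small, fixed set of learned latent queries cross-attends to the dense memory tokens, so the compressed bank stays constant-size regardless of video length. The single-head, projection-free form below is a simplifying sketch in NumPy; the actual module, head counts, and names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compress_memory(memory, latents):
    """One cross-attention read: K latent queries summarize M memory tokens.

    memory:  (M, d) dense per-frame memory tokens (M grows with video length)
    latents: (K, d) learned queries with K << M, fixed for any video length
    returns: (K, d) compressed memory bank
    """
    d = memory.shape[1]
    attn = softmax(latents @ memory.T / np.sqrt(d))  # (K, M) attention weights
    return attn @ memory                             # (K, d) weighted summary
```

Because K is fixed, downstream multi-frame reasoning attends over K tokens instead of a memory bank that grows with every frame.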

Stage 3 · Promptable PCS

Jointly fine-tune encoder, memory, and decoder on SAM3 data to preserve promptable concept segmentation quality.
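Joint fine-tuning can be sketched as a weighted sum of the task loss on SAM3 concept data and the earlier distillation terms kept as regularizers. The specific weights and term names below are assumptions, not the released training configuration.

```python
def stage3_loss(mask_loss, enc_distill, mem_distill, w_enc=0.5, w_mem=0.5):
    """Stage 3 objective: PCS task loss plus Stage 1/2 distillation terms.

    mask_loss:   segmentation loss on SAM3 concept segmentation data
    enc_distill: Stage 1 encoder feature-matching term
    mem_distill: Stage 2 memory alignment term
    w_enc/w_mem: regularization weights (values here are assumptions)
    """
    return mask_loss + w_enc * enc_distill + w_mem * mem_distill
```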

tl;dr: Stage 1 distills the encoder on SAM1 data · Stage 2 aligns memory on SAM2 data · Stage 3 fine-tunes PCS on SAM3 data.

Get Started

Installation

Coming soon: setup instructions will be provided once SAM3 is publicly available.

Inference Example

Coming soon: examples will be provided once code and weights are released.

EfficientSAM3 Model Zoo & Weight Release

Code and weights are not yet released. They will be published once SAM3 code is publicly available.

Model     Backbone          Parameters   Stage 1   Stage 2   Stage 3
ES-RV-S   RepViT-M0.9       5.1M         Planned   Planned   Planned
ES-RV-M   RepViT-M1.1       6.8M         Planned   Planned   Planned
ES-RV-L   RepViT-M2.3       8.2M         Planned   Planned   Planned
ES-TV-S   TinyViT-5M        5.4M         Planned   Planned   Planned
ES-TV-M   TinyViT-11M       11M          Planned   Planned   Planned
ES-TV-L   TinyViT-21M       21M          Planned   Planned   Planned
ES-EV-S   EfficientViT-B0   0.7M         Planned   Planned   Planned
ES-EV-M   EfficientViT-B1   4.8M         Planned   Planned   Planned
ES-EV-L   EfficientViT-B2   15M          Planned   Planned   Planned

Datasets

Dataset preparation scripts for COCO, DAVIS, LVIS, SA-1B, SA-V, LVOS, MOSE, and YouTube-VOS are located under data/download_*.sh. Refer to README_dataset.md for detailed instructions.

Export & Deployment

ONNX and CoreML export pipelines are under development to unlock mobile and cross-platform deployment. Follow the repository issues for progress updates.

Roadmap

  • Planned: Stage 1 encoder weights (pending SAM3 public release)
  • Planned: Stage 2 memory-bank-aligned models
  • Planned: Stage 3 fine-tuned PCS models
  • Planned: ONNX/CoreML export
  • Planned: Interactive web demo

Call for Contributions

We welcome pull requests across the ecosystem:

  • Efficient MedSAM3 integration and medical datasets
  • Gradio demos, Vercel deployments, and Hugging Face Spaces
  • Annotation tool support (X-AnyLabeling, AnyLabeling)
  • iOS, Android, and NVCC-based desktop applications

Citation

@misc{efficientsam3,
  title={EfficientSAM3: Progressive Hierarchical Knowledge Distillation (PhD) from SAM1, 2 and 3},
  author={Zeng, Chengxi Simon and Jiang, Yuxuan and Zhang, Aaron},
  institution={University of Bristol},
  year={2025},
  howpublished={https://github.com/SimonZeng7108/efficientsam3}
}