✓ Stage 1 image & text encoder weights released!

EfficientSAM3: Progressive Hierarchical Knowledge Distillation from SAM1, SAM2 & SAM3

Chengxi Simon Zeng, Yuxuan Jiang, Aaron Zhang · Visual Information Lab, University of Bristol

Why EfficientSAM3?

SAM3 delivers Promptable Concept Segmentation (PCS) by combining semantic understanding with temporal tracking, yet its massive backbone and dense memory bank make on-device deployment impractical. EfficientSAM3 distills knowledge from SAM1, SAM2, and SAM3 into a family of lightweight student models tailored for edge hardware, without sacrificing PCS quality.

EfficientSAM3 architecture diagram

Updates

  • 2025-12-08: Stage 1 text encoder weights released for all 3 variants (MobileCLIP-S0, MobileCLIP-S1, and MobileCLIP2-L), distilled on 1% of the Recap-DataComp-1B dataset.
  • 2025-12-02: Stage 1 image encoder weights released for all 9 variants (RepViT, TinyViT, EfficientViT), distilled on 1% of the SA-1B dataset. Available via Google Drive and Hugging Face.
  • 2025-10-18: Project announced.

Highlights

  • Image encoders distilled into RepViT, TinyViT, and EfficientViT families.
  • Text encoders distilled into MobileCLIP variants (up to 87.96% smaller than SAM3's 354M-parameter text encoder).
  • Perceiver-based memory compression aligned with SAM2 temporal tracking.
  • ONNX/CoreML support for real-time mobile, embedded, and desktop deployment.

Abstract

SAM3 brought promptable concept segmentation to production scale, but its computational footprint blocks latency-sensitive applications. EfficientSAM3 progressively distills SAM3 into lightweight architectures that maintain PCS quality on edge devices.

We employ a three-stage curriculum: (1) encoder distillation on SA-1B with prompt-in-the-loop supervision, (2) temporal memory distillation on SA-V using a compact Perceiver module, and (3) end-to-end fine-tuning on official SAM3 concept segmentation data. The resulting students deliver real-time segmentation, tracking, and prompt handling on resource-constrained platforms.

Three-Stage Progressive Distillation

Stage 1 · Compact Encoder

Align nine student backbones (RepViT, TinyViT, EfficientViT) with the SAM3 encoder using SA-1B and prompt-in-the-loop supervision.
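In code, the Stage 1 objective reduces to aligning student and teacher feature maps. The sketch below is a minimal, hypothetical illustration of such a feature-alignment loss in PyTorch; the optional projection head, the bilinear resize, and the plain MSE objective are illustrative assumptions, not the repository's exact implementation.

import torch.nn.functional as F

def encoder_distill_loss(student_feats, teacher_feats, proj=None):
    # Optional projection to match the teacher's channel width (assumed).
    if proj is not None:
        student_feats = proj(student_feats)
    # Resize the student's maps if the two backbones differ in spatial stride.
    if student_feats.shape[-2:] != teacher_feats.shape[-2:]:
        student_feats = F.interpolate(
            student_feats, size=teacher_feats.shape[-2:],
            mode="bilinear", align_corners=False,
        )
    # Regress the frozen SAM3 teacher's embeddings with an MSE objective.
    return F.mse_loss(student_feats, teacher_feats)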

Stage 2 · Temporal Memory

Compress SAM3's dense video memory into a Perceiver-based module distilled on SA-V, enabling efficient multi-frame reasoning.
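A Perceiver-style compressor of this kind can be written as a small set of learned latent queries that cross-attend over the dense memory bank, yielding a fixed-size summary regardless of video length. The module below is a hypothetical sketch (class name, sizes, and layout are assumptions), not the released Stage 2 architecture.

import torch
import torch.nn as nn

class PerceiverMemory(nn.Module):
    # Compress a long stream of memory tokens into a fixed set of latents.
    def __init__(self, dim=256, num_latents=64, num_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, memory_tokens):
        # memory_tokens: (B, T*N, dim) features accumulated from past frames.
        B = memory_tokens.shape[0]
        q = self.latents.unsqueeze(0).expand(B, -1, -1)
        # Learned latent queries attend over the dense memory bank, so the
        # output size is constant no matter how many frames were stored.
        out, _ = self.cross_attn(q, memory_tokens, memory_tokens)
        return self.norm(out)  # (B, num_latents, dim)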

Stage 3 · Promptable PCS

Jointly fine-tune encoder, memory, and decoder on SAM3 data to preserve promptable concept segmentation quality.
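Conceptually, Stage 3 is a standard joint fine-tuning step in which the encoder, memory module, and decoder all receive gradients from a mask objective. The sketch below is purely illustrative; every name (stage3_finetune_step, mask_loss, the batch keys) is a placeholder, not the repository's API.

import torch

def stage3_finetune_step(encoder, memory, decoder, batch, optimizer, mask_loss):
    # One hypothetical Stage 3 step: all three modules are unfrozen and
    # trained jointly on SAM3 concept-segmentation data.
    feats = encoder(batch["frames"])               # compact image features
    mem = memory(feats)                            # Perceiver memory summary
    pred = decoder(feats, mem, batch["prompts"])   # promptable mask decoding
    loss = mask_loss(pred, batch["gt_masks"])      # e.g. focal + dice terms
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()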

tl;dr: Stage 1 distills encoder on SAM1 data · Stage 2 aligns memory on SAM2 data · Stage 3 fine-tunes PCS on SAM3 data.

Get Started

Installation

pip install -e ".[stage1]"

See the installation guide for full setup instructions.

Quick Start

import numpy as np
from PIL import Image

# Import path assumes the repository's package layout.
from efficient_sam3 import build_efficientsam3_image_model, Sam3Processor

image = Image.open("example.jpg")
points = np.array([[500, 375]])  # one foreground click, (x, y) pixels
labels = np.array([1])           # 1 = foreground, 0 = background

# Point prompt
model = build_efficientsam3_image_model(
    checkpoint_path="efficient_sam3_tinyvit_s.pt",
    backbone_type="tinyvit", model_name="5m",
)
processor = Sam3Processor(model)
inference_state = processor.set_image(image)
masks, scores, _ = model.predict_inst(
    inference_state, point_coords=points,
    point_labels=labels,
)

# Text prompt
model = build_efficientsam3_image_model(
    checkpoint_path="efficient_sam3_tinyvit_m_mobileclip_s1.pt",
    backbone_type="tinyvit", model_name="11m",
    text_encoder_type="MobileCLIP-S1",
)
processor = Sam3Processor(model)
inference_state = processor.set_image(image)
inference_state = processor.set_text_prompt(
    inference_state, prompt="shoe"
)
masks, scores, _ = model.predict_inst(inference_state)

See image example and text prompt example for details.

EfficientSAM3 Model Zoo & Weight Release

Stage 1 image encoder weights (distilled from SAM3 image encoder) and text encoder weights (distilled from SAM3 text encoder) are now available via Google Drive and Hugging Face. Stage 2 and 3 weights coming soon.

Image Encoder Models

| Model | Backbone | Parameters | Stage 1 | Stage 2 | Stage 3 |
|---|---|---|---|---|---|
| ES-RV-S | RepViT-M0.9 | 5.1M | GDrive / HF | Planned | Planned |
| ES-RV-M | RepViT-M1.1 | 6.8M | GDrive / HF | Planned | Planned |
| ES-RV-L | RepViT-M2.3 | 8.2M | GDrive / HF | Planned | Planned |
| ES-TV-S | TinyViT-5M | 5.4M | GDrive / HF | Planned | Planned |
| ES-TV-M | TinyViT-11M | 11M | GDrive / HF | Planned | Planned |
| ES-TV-L | TinyViT-21M | 21M | GDrive / HF | Planned | Planned |
| ES-EV-S | EfficientViT-B0 | 0.7M | GDrive / HF | Planned | Planned |
| ES-EV-M | EfficientViT-B1 | 4.8M | GDrive / HF | Planned | Planned |
| ES-EV-L | EfficientViT-B2 | 15M | GDrive / HF | Planned | Planned |

Note (2025/12/02): The current Stage 1 image encoder weights are distilled on 1% of the SA-1B dataset.

EfficientSAM3 Text Encoder + Image Encoder Models

| Model | Backbone | Parameters (image + text) | Stage 1 | Stage 2 | Stage 3 |
|---|---|---|---|---|---|
| ES-RV-S-MC-S1 | RepViT-M0.9 & MobileCLIP-S1 | 4.72M + 63.56M | GDrive / HF | Planned | Planned |
| ES-RV-M-MC-S1 | RepViT-M1.1 & MobileCLIP-S1 | 7.77M + 63.56M | GDrive / HF | Planned | Planned |
| ES-RV-L-MC-S1 | RepViT-M2.3 & MobileCLIP-S1 | 22.40M + 63.56M | GDrive / HF | Planned | Planned |
| ES-TV-S-MC-S1 | TinyViT-5M & MobileCLIP-S1 | 5.07M + 63.56M | GDrive / HF | Planned | Planned |
| ES-TV-M-MC-S1 | TinyViT-11M & MobileCLIP-S1 | 10.55M + 63.56M | GDrive / HF | Planned | Planned |
| ES-TV-L-MC-S1 | TinyViT-21M & MobileCLIP-S1 | 20.62M + 63.56M | GDrive / HF | Planned | Planned |
| ES-EV-S-MC-S1 | EfficientViT-B0 & MobileCLIP-S1 | 0.68M + 63.56M | GDrive / HF | Planned | Planned |
| ES-EV-M-MC-S1 | EfficientViT-B1 & MobileCLIP-S1 | 4.64M + 63.56M | GDrive / HF | Planned | Planned |
| ES-EV-L-MC-S1 | EfficientViT-B2 & MobileCLIP-S1 | 14.98M + 63.56M | GDrive / HF | Planned | Planned |

Note (2025/12/08): The current Stage 1 text encoder weights are distilled on 1% of the Recap-DataComp-1B dataset.

Datasets

Dataset preparation scripts for COCO, DAVIS, LVIS, SA-1B, SA-V, LVOS, MOSE, and YouTube-VOS are located under data/download_*.sh. Refer to README_dataset.md for detailed instructions.

Export & Deployment

ONNX and CoreML export pipelines are under development to unlock mobile and cross-platform deployment. Follow the repository issues for progress updates.
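Until the official pipelines land, a plain torch.onnx.export of a Stage 1 image encoder is a reasonable starting point. The sketch below assumes the built model exposes its backbone as an image_encoder attribute and accepts SAM-style 1024×1024 inputs; both are unverified assumptions, and the official exporter may differ.

import torch

# Hypothetical export of a Stage 1 image encoder (attribute name assumed).
encoder = build_efficientsam3_image_model(
    checkpoint_path="efficient_sam3_tinyvit_s.pt",
    backbone_type="tinyvit", model_name="5m",
).image_encoder
encoder.eval()

dummy = torch.randn(1, 3, 1024, 1024)  # SAM-style input resolution
torch.onnx.export(
    encoder, dummy, "es_tv_s_encoder.onnx",
    input_names=["image"], output_names=["embedding"],
    dynamic_axes={"image": {0: "batch"}},
    opset_version=17,
)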

Roadmap

  • ✓ Completed: Release Stage 1 image encoder weights (distilled from the SAM3 image encoder)
  • ✓ Completed: Release Stage 1 text encoder weights (distilled from the SAM3 text encoder into MobileCLIP-S1, paired with all 9 image encoder variants)
  • Planned: Release Stage 1+ fine-tuned encoder weights (prompt-in-the-loop supervised fine-tuning)
  • Planned: Release Stage 2 memory-bank-aligned models
  • Planned: Release Stage 3 fine-tuned PCS models
  • Planned: ONNX/CoreML export
  • Planned: Interactive web demo

Call for Contributions

We welcome pull requests across the ecosystem:

  • Efficient MedSAM3 integration and medical datasets
  • Gradio demos, Vercel deployments, and Hugging Face Spaces
  • Annotation tool support (X-AnyLabeling, AnyLabeling)
  • iOS, Android, and NVCC-based desktop applications

Users

Organizations and projects using EfficientSAM3:

Note: If you're using EfficientSAM3 in your work, please acknowledge us in your publications or projects. We're happy to promote your work here! Contact us to be featured in this section.

Citation

@misc{zeng2025efficientsam3progressivehierarchicaldistillation,
  title={EfficientSAM3: Progressive Hierarchical Distillation for Video Concept Segmentation from SAM1, 2, and 3}, 
  author={Chengxi Zeng and Yuxuan Jiang and Gao Ge and Shuai Wang and Fan Aaron Zhang},
  year={2025},
  eprint={2511.15833},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.15833}, 
}