Distill multilingual speech-to-speech translation models into lightweight single language-pair models for on-device real-time inference.
Take a large multilingual S2ST model (e.g., SeamlessM4T, which covers 100+ languages) and distill it into a compact single language-pair model (e.g., EN→ZH only) that can:
- Run on mobile devices (iOS/Android)
- Achieve < 2.5s end-to-end latency for real-time interpretation
- Maintain natural-sounding voice with preserved speaker timbre and prosody
- Fit in 20-50MB for on-demand download
- Language-pair pruning: Remove unnecessary language embeddings and parameters
- Knowledge distillation: Transfer knowledge from teacher to compact student model
- Layer pruning: Iteratively remove unimportant layers based on importance scores
- Voice preservation: Maintain speaker identity and prosody in translated speech
- Quantization: INT8/INT4 quantization for smaller model size
- Mobile deployment: Export to CoreML (iOS) and TFLite (Android)
| Metric | Target | Notes |
|---|---|---|
| Model Size | 20-50 MB | Single language-pair, quantized |
| Inference Latency | < 300 ms | Neural network computation only |
| End-to-End Latency | < 2.5 s | Including algorithmic lookahead |
| Translation Quality | BLEU 28+ | Compared to reference translations |
| Voice Naturalness | MOS 3.5+ | Mean Opinion Score (1-5 scale) |
| Voice Similarity | > 0.75 | Cosine similarity of speaker embeddings |
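The voice-similarity target in the table is plain cosine similarity between speaker embeddings of the source and translated speech. A minimal sketch, assuming the embeddings have already been extracted as NumPy vectors (the `speaker_similarity` helper is illustrative, not part of the package):

```python
import numpy as np

def speaker_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings (1.0 = identical direction)."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

# A distilled model meets the table's target when similarity > 0.75.
```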
Test the distilled models with our web interface:
```bash
# Clone and install
git clone https://github.com/Elarwei001/s2st-distill.git
cd s2st-distill
pip install -r demo/requirements.txt

# Run the demo
python demo/app.py

# Open http://localhost:7860 in your browser
```

Features:
- 🎙️ Record audio from microphone
- 📁 Upload audio files
- 🌐 Choose translation direction (EN↔ZH, ZH↔FR)
- 🔊 Instant playback of translated speech
See demo/README.md for more details.
```bash
# Clone the repository
git clone https://github.com/Elarwei001/s2st-distill.git
cd s2st-distill

# Create virtual environment
conda create -n s2st python=3.10
conda activate s2st

# Install dependencies
pip install -r requirements.txt
```

```python
from s2st_distill import S2STDistiller

# Initialize distiller with base model
distiller = S2STDistiller(
    base_model="facebook/seamless-m4t-unity-small",
    source_lang="eng",
    target_lang="cmn",
)

# Run distillation pipeline
student_model = distiller.distill(
    train_dataset="path/to/train.json",
    num_epochs=10,
    target_size_mb=30,
)

# Export for mobile
distiller.export_coreml("model.mlpackage")  # iOS
distiller.export_tflite("model.tflite")     # Android
```

```
s2st-distill/
├── demo/                    # 🎮 Web UI Demo
│   ├── app.py               # Gradio web interface
│   ├── requirements.txt     # Demo dependencies
│   └── README.md            # Demo documentation
├── docs/                    # Documentation
│   ├── TECHNICAL_SPEC.md    # Detailed technical specification
│   ├── ARCHITECTURE.md      # Model architecture overview
│   └── DEPLOYMENT.md        # Mobile deployment guide
├── s2st_distill/            # Main package
│   ├── __init__.py
│   ├── distiller.py         # Main distillation pipeline
│   ├── pruning.py           # Language and layer pruning
│   ├── voice_preserve.py    # Speaker/prosody preservation
│   ├── quantize.py          # Quantization utilities
│   └── export.py            # Mobile export (CoreML/TFLite)
├── scripts/                 # Utility scripts
│   ├── train.py             # Training script
│   ├── evaluate.py          # Evaluation script
│   └── benchmark.py         # Latency benchmark
├── models/                  # Trained models (after distillation)
│   ├── en_zh/               # English → Chinese
│   ├── zh_en/               # Chinese → English
│   ├── zh_fr/               # Chinese → French
│   └── fr_zh/               # French → Chinese
├── tests/                   # Unit tests
├── examples/                # Example notebooks
├── requirements.txt
├── setup.py
└── README.md
```
- Technical Specification - Detailed implementation guide
- Architecture Overview - Model architecture and design decisions
- Deployment Guide - iOS and Android deployment instructions
Remove embeddings and parameters for languages not in the target pair, reducing model size by ~60%.
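Concretely, most of this reduction comes from slicing the shared embedding table down to the token ids the kept pair actually uses. A NumPy sketch (the table size and id range are hypothetical; the real mapping comes from the tokenizer):

```python
import numpy as np

def prune_embeddings(emb: np.ndarray, keep_ids: np.ndarray) -> np.ndarray:
    """Keep only the embedding rows used by the retained language pair."""
    return emb[keep_ids]

full = np.zeros((256_000, 1_024), dtype=np.float32)  # hypothetical 100+-language table
en_zh_ids = np.arange(40_000)                        # hypothetical EN/ZH token ids
small = prune_embeddings(full, en_zh_ids)            # ~84% fewer embedding rows
```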
Use the original model as teacher to train a smaller student model, preserving translation quality.
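A common formulation of this objective is temperature-scaled KL divergence between the teacher's and student's output distributions. A NumPy sketch (the temperature `T = 2.0` and logit shapes are illustrative, not the project's actual settings):

```python
import numpy as np

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits: np.ndarray, teacher_logits: np.ndarray, T: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

In practice this term is mixed with the ordinary cross-entropy loss on ground-truth targets.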
Iteratively remove least important layers based on importance scores computed from validation loss.
- Speaker Encoder: Extract speaker embeddings to preserve voice identity
- Prosody Transfer: Transfer pitch, duration, and energy patterns from source to target
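For the pitch part of prosody transfer, a simple baseline is to renormalize the synthesized F0 contour to the source speaker's pitch statistics. A sketch under that assumption (real prosody transfer also handles duration and energy, and the helper name is hypothetical):

```python
import numpy as np

def transfer_pitch_stats(src_f0: np.ndarray, tgt_f0: np.ndarray) -> np.ndarray:
    """Rescale the target F0 contour to match the source mean and variance."""
    normalized = (tgt_f0 - tgt_f0.mean()) / (tgt_f0.std() + 1e-8)
    return normalized * src_f0.std() + src_f0.mean()
```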
Apply INT8/INT4 quantization to further reduce model size with minimal quality loss.
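For the INT8 case, symmetric per-tensor quantization stores each weight as an 8-bit integer plus one float scale, a 4x reduction over FP32. A NumPy sketch of the idea (production quantization is typically per-channel and calibration-based, and assumes a nonzero tensor):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric quantization: w ≈ scale * q with q in [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale
```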
- SeamlessM4T - Meta's multilingual S2ST model
- SimulTron - Google's on-device simultaneous S2ST
- CULL-MT - Language and layer pruning for MT
- SeamlessExpressive - Expressive speech translation
Contributions are welcome! Please read our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE file for details.
- Meta AI for SeamlessM4T
- Google Research for SimulTron and real-time S2ST research
- The open-source speech processing community