๐ฅ Dolphin: Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Hierarchical Top-Down Attention
Automatically detect and separate ALL speakers in your video
Simply upload a video and the system will:
- ๐ Automatically detect all speakers in the video
- ๐ญ Show you each detected speaker's face
- ๐ฌ Generate individual videos for each speaker with their isolated audio
๐ฌ Try with Example Video
Click to load example video
๐ Click on any face below to view that speaker's video
๐ How it works:
- Upload - Select any video file
- Process - Click the button to start automatic detection
- Review - See all detected speakers and their positions
- Select - Click on any face to watch that speaker's separated video
๐ก Tips for best results:
- โ Ensure all speakers' faces are visible in the first frame
- โ Use videos with good lighting and clear face views
- โ Works best with frontal or near-frontal face angles
- โฑ๏ธ Processing time depends on video length and number of speakers
๐ Powered by:
- RetinaFace for face detection
- Dolphin model for audio-visual separation
- GPU acceleration when available