๐ŸŽฅ Dolphin: Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Hierarchical Top-Down Attention

Automatically detect and separate ALL speakers in your video

Simply upload a video and the system will:

  1. ๐Ÿ” Automatically detect all speakers in the video
  2. ๐ŸŽญ Show you each detected speaker's face
  3. ๐ŸŽฌ Generate individual videos for each speaker with their isolated audio

๐ŸŽฌ Try with Example Video

Click to load example video

๐Ÿ‘‡ Click on any face below to view that speaker's video


๐Ÿ“– How it works:

  1. Upload - Select any video file
  2. Process - Click the button to start automatic detection
  3. Review - See all detected speakers and their positions
  4. Select - Click on any face to watch that speaker's separated video

๐Ÿ’ก Tips for best results:

  • โœ… Ensure all speakers' faces are visible in the first frame
  • โœ… Use videos with good lighting and clear face views
  • โœ… Works best with frontal or near-frontal face angles
  • โฑ๏ธ Processing time depends on video length and number of speakers

๐Ÿš€ Powered by:

  • RetinaFace for face detection
  • Dolphin model for audio-visual separation
  • GPU acceleration when available