Whisper UI

Theme by Bits by brandon
Updated: 21 Oct 2025
139 Stars

A GUI for OpenAI Whisper based on Tauri and SvelteKit

Overview:

Whiskey, listed here as Whisper UI, is a graphical user interface (GUI) for OpenAI’s Whisper speech recognition system. It is built with Tauri and SvelteKit and runs Whisper through C++ binaries. Whiskey transcribes audio or video files into text, highlights the transcript in real time during playback, and exports transcriptions as .txt or .vtt files. This article covers Whiskey’s key features, its planned features, and installation.

Features:

  • Transcribe audio or video files: Whiskey converts audio or video files into written text using the Whisper speech recognition system.
  • Real-time text highlighting during playback: The transcribed text is highlighted in step with the audio or video as it plays.
  • Export as .txt or .vtt: Transcriptions can be exported as plain text (.txt) or WebVTT (.vtt) files for further use or sharing.
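
The highlighting and export features above can be illustrated with a minimal TypeScript sketch. How Whiskey stores its transcript is not described here, so the `Segment` shape and the function names `vttTimestamp`, `toVtt`, `toTxt`, and `activeSegmentIndex` are hypothetical, chosen only for illustration. The sketch assumes each transcribed segment carries a start time, an end time (both in seconds), and its text.

```typescript
// Hypothetical segment shape (an assumption, not Whiskey's actual data model).
// Times are seconds from the start of the media file.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as a WebVTT timestamp: hh:mm:ss.ttt
function vttTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build a .vtt document: the "WEBVTT" header, then one cue per segment,
// with blank lines between blocks.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) =>
      `${vttTimestamp(seg.start)} --> ${vttTimestamp(seg.end)}\n${seg.text.trim()}`
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}

// Build a .txt export: one segment per line, timing dropped.
function toTxt(segments: Segment[]): string {
  return segments.map((seg) => seg.text.trim()).join("\n") + "\n";
}

// Index of the segment to highlight at playback time t, or -1 if no
// segment covers t. A linear scan is enough for a sketch.
function activeSegmentIndex(segments: Segment[], t: number): number {
  for (let i = 0; i < segments.length; i++) {
    if (t >= segments[i].start && t < segments[i].end) return i;
  }
  return -1;
}

// Usage with made-up data.
const demo: Segment[] = [
  { start: 0, end: 2.5, text: "Hello there." },
  { start: 2.5, end: 5, text: "This is a test." },
];
console.log(toVtt(demo));
console.log(activeSegmentIndex(demo, 3)); // 1
```

In a UI, `activeSegmentIndex` would typically be called whenever the player reports a new playback position, and the returned index used to mark one line of the transcript as active.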

Planned features:

  • Export files
  • Rename files
  • Save already opened files
  • Upload file formats other than WAV
  • Upload video
  • Drag and drop
  • Start audio playback from line
  • Record mic audio directly
  • Apple Silicon, Linux, and Windows binaries
  • Editable text
  • Events and errors shown in the UI
  • Prediction accuracy

Installation:

To install Whiskey, follow the steps below:

  1. Clone the Whiskey repository from GitHub:

    git clone https://github.com/whiskey/whiskey.git
    
  2. Install the required dependencies:

    cd whiskey
    npm install
    
  3. Build the application:

    npm run build
    
  4. Run the application:

    npm run start
    
  5. Access Whiskey by navigating to http://localhost:5000 in your web browser.

Summary:

Whiskey is a GUI for OpenAI’s Whisper speech recognition system, built with Tauri and SvelteKit. It transcribes audio or video files into text, highlights the transcript in real time during playback, and exports transcriptions as .txt or .vtt files. Planned features include file renaming, drag and drop, editable text, and binaries for Apple Silicon, Linux, and Windows.