
Gemini Speech-to-Text
Transcribe audio files using Google's Gemini API or Vertex AI with support for multiple formats (OGG, MP3, WAV, M4A). Features flexible authentication via ADC or API key, configurable model selection, and optimized performance with gemini-2.0-flash-lite as default.
🚀 Convert audio files to text instantly using Google's Gemini API. Supports MP3, WAV, M4A, OGG, and OPUS formats with zero external dependencies. Choose between direct API key authentication or Google Cloud's Vertex AI—the script automatically detects your setup and picks the fastest transcription model.
💡 Perfect for transcribing voice messages, podcast clips, meeting recordings, and Telegram voice notes. Ideal for chatbots, accessibility features, and automated content processing. Works seamlessly with Clawdbot media workflows.
✨ Lightning-fast gemini-2.0-flash-lite as default, with flexible model selection for quality vs. speed tradeoffs. Secure authentication options and simple one-command usage make it accessible for developers of all levels.