🚀 Convert audio files to text instantly using Google's Gemini API. Supports MP3, WAV, M4A, OGG, and OPUS formats with zero external dependencies. Authenticate with either a direct API key or Google Cloud's Vertex AI; the script automatically detects which one you have configured and picks the fastest transcription model.
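Here is a minimal Python sketch, using only the standard library, of how the supported formats and the two authentication paths might be handled. It is not the script's actual code. The environment variable names (`GEMINI_API_KEY`, `GOOGLE_CLOUD_PROJECT`), the API-key-first preference, and the MIME types chosen for M4A and OPUS are all assumptions.

```python
import os
from pathlib import Path

# Hypothetical extension -> MIME type table for the formats listed above.
# Assumptions: M4A is sent as audio/mp4, and OPUS (e.g. Telegram voice notes,
# which are Opus audio in an OGG container) is sent as audio/ogg.
AUDIO_MIME_TYPES = {
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".m4a": "audio/mp4",
    ".ogg": "audio/ogg",
    ".opus": "audio/ogg",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported audio file, or raise ValueError."""
    ext = Path(path).suffix.lower()
    try:
        return AUDIO_MIME_TYPES[ext]
    except KeyError:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}") from None


def detect_auth_mode(env=None) -> str:
    """Choose an auth mode from environment variables (names are assumptions).

    Prefers a direct API key; otherwise falls back to Vertex AI when a
    Google Cloud project is configured.
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return "api_key"
    if env.get("GOOGLE_CLOUD_PROJECT"):
        return "vertex_ai"
    raise RuntimeError("set GEMINI_API_KEY or GOOGLE_CLOUD_PROJECT")
```

Checking for an API key first is only one plausible ordering; a real script could just as reasonably prefer Vertex AI when both are configured.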

💡 Perfect for transcribing voice messages, podcast clips, meeting recordings, and Telegram voice notes. Ideal for chatbots, accessibility features, and automated content processing. Works seamlessly with Clawdbot media workflows.

✨ Uses the lightning-fast gemini-2.0-flash-lite model by default, with flexible model selection when you want to trade speed for quality. Secure authentication options and simple one-command usage make it accessible to developers of all levels.
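As a hypothetical illustration, this stdlib-only sketch assembles a single transcription request to the Gemini API's `generateContent` REST endpoint, with `gemini-2.0-flash-lite` as the default model and an overridable `model` argument. The audio travels base64-encoded as inline data. The prompt text and the helper names `build_transcription_request` and `extract_text` are illustrative assumptions, not the script's interface, and only the direct API key path is shown; Vertex AI uses a different endpoint and OAuth credentials.

```python
import base64
import json
import urllib.request

DEFAULT_MODEL = "gemini-2.0-flash-lite"  # default model named in this README
API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_transcription_request(audio: bytes, mime_type: str, api_key: str,
                                model: str = DEFAULT_MODEL) -> urllib.request.Request:
    """Build (but do not send) a generateContent request with inline audio."""
    body = {
        "contents": [{
            "parts": [
                # Hypothetical prompt; the real script's wording is unknown.
                {"text": "Transcribe this audio verbatim."},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(audio).decode("ascii"),
                }},
            ]
        }]
    }
    url = f"{API_BASE}/models/{model}:generateContent"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Concatenate the text parts of the first candidate in a response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)
```

To send the request, pass it to `urllib.request.urlopen(req)`, decode the body with `json.load`, and hand the result to `extract_text`. Inline audio counts toward the API's request size limit, so long recordings may need a different upload path.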

Gemini Speech-to-Text

Requirements