# 🎤 Urdu Speech Intent Recognition using Whisper

A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built on OpenAI's Whisper model for accurate speech recognition and translation.

## ✨ Features

- **🎙️ Urdu Speech Transcription**: Accurate transcription of Urdu audio using Whisper
- **🌐 Built-in Translation**: Direct Urdu-to-English translation using Whisper's translation capability
- **🎯 Intent Detection**: Identifies user intent from conversation (questions, requests, commands, etc.)
- **😊 Sentiment Analysis**: Basic sentiment detection (positive/negative/neutral)
- **📊 Confidence Scoring**: Provides confidence scores for both transcription and intent detection
- **🔧 Multiple Model Sizes**: Supports the base, medium, large-v1, large-v2, and large-v3 Whisper models
- **💾 JSON Export**: Option to save results in structured JSON format
- **🎵 Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other common audio formats

## 📋 Supported Intents

The system can detect the following intents:

| Intent | Description | Example Keywords |
|--------|-------------|------------------|
| **greeting** | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
| **question** | Asking questions | "کیا", "کب", "کیوں", "کسے" |
| **request** | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
| **command** | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
| **complaint** | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
| **information** | Seeking information | "بتائیں", "جانیں", "تفصیل" |
| **emergency** | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
| **appointment** | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
| **farewell** | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
| **thanks** | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |

## 🚀 Quick Start

### Installation
1. **Clone the repository:**

   ```bash
   git clone https://github.com/yourusername/urdu-intent-recognition.git
   cd urdu-intent-recognition
   ```

2. **Install the required packages:**

   ```bash
   pip install openai-whisper torch torchaudio
   ```

3. **Install FFmpeg (required for audio processing):**

   - **Ubuntu/Debian:** `sudo apt-get install ffmpeg`
   - **macOS:** `brew install ffmpeg`
   - **Windows:** Download from [ffmpeg.org](https://ffmpeg.org/download.html)

### Basic Usage

```bash
# Process an Urdu audio file
python urdu_intent_extractor.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python urdu_intent_extractor.py audio.mp3 --model medium

# Save results to a JSON file
python urdu_intent_extractor.py audio.mp3 --output results.json
```

## 📖 Detailed Usage

### Command Line Arguments

```bash
python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]
```

**Arguments:**

- `AUDIO_FILE`: Path to the audio file to process (required)

**Options:**

- `--model`: Whisper model size (default: `base`)
  - Choices: `tiny`, `base`, `small`, `medium`, `large`
  - Larger models are more accurate but slower
- `--output`: Save results to a JSON file
- `--quiet`: Minimal console output
- `--help`: Show help message

### Python API Usage

You can also use the tool programmatically:

```python
from urdu_intent_extractor import UrduIntentExtractor

# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
```

### Example Output

```
==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================

📁 File: conversation.mp3

🗣️ URDU TRANSCRIPTION:
السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں

🌐 ENGLISH TRANSLATION:
Hello, I want to ask you a question

🎯 DETECTED INTENT:
❓ Asking a question or seeking clarification
   Confidence: 85.0%
   Urdu keywords found: سوال
   English keywords found: question

😊 SENTIMENT: NEUTRAL
   Confidence: 50.0%

==============================================================
```

## 🏗️ Architecture

The system works in three main steps:

1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text
2. **Translation**: Whisper translates the Urdu speech directly to English (using `task="translate"`)
3. **Intent Analysis**: Analyzes both the Urdu and English text for intent keywords

### Intent Detection Algorithm

1. **Bilingual Keyword Matching**: Checks for intent keywords in both the Urdu and English text
2. **Scoring System**: Assigns scores based on keyword matches
3. **Confidence Calculation**: Calculates confidence from match frequency and text length
4. **Sentiment Analysis**: Basic sentiment detection using positive/negative keywords

## 📊 Model Performance

| Model Size | Speed | Accuracy | GPU Memory | Best Use Case |
|------------|-------|----------|------------|---------------|
| **base** | ⚡ Fast | 🟡 Moderate | ~1 GB | Quick prototyping |
| **medium** | 🚀 Good | 🟢 Good | ~5 GB | General purpose |
| **large-v1** | 🐢 Moderate | 🟢🟢 Better | ~10 GB | Better accuracy needed |
| **large-v2** | 🐌 Slow | 🟢🟢🟢 Very Good | ~10 GB | High accuracy required |
| **large-v3** | 🐌🐌 Very Slow | 🟢🟢🟢🟢 Excellent | ~10 GB | Research/production |

## 🔧 Advanced Configuration

### Custom Intent Keywords

You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in the code:

```python
self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
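    # A hypothetical example of a concrete custom entry — the "billing"
    # intent name and its keywords are illustrative assumptions, not part
    # of the shipped keyword set:
    "billing": {
        "urdu": ["بل", "ادائیگی"],  # "bill", "payment"
        "english": ["bill", "payment", "invoice"]
    },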
}
```

### GPU Acceleration

The tool automatically uses the GPU when one is available. To force CPU usage:

```python
# In the code, remove the fp16 parameter:
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate"
    # Removed: fp16=torch.cuda.is_available()
)
```

## 📝 Example Use Cases

1. **Customer Service**: Automatically categorize customer calls
2. **Voice Assistants**: Understand user commands in Urdu
3. **Healthcare**: Triage patient concerns based on urgency
4. **Education**: Analyze student questions in online learning
5. **Business Analytics**: Understand customer feedback from calls

## 🐛 Troubleshooting

### Common Issues

1. **"Audio file not found"**
   - Ensure the file path is correct
   - Check file permissions

2. **Poor transcription quality**
   - Try a larger Whisper model (`--model medium`)
   - Ensure clear audio quality
   - Check that the audio actually contains Urdu speech

3. **Slow processing**
   - Use a smaller model (`--model tiny` or `--model base`)
   - Ensure the GPU is available and properly configured
   - Reduce the audio file size or duration

4. **FFmpeg errors**
   - Reinstall FFmpeg
   - Ensure FFmpeg is on the system `PATH`

### Debug Mode

For debugging, you can enable more verbose output by modifying the code:

```python
# Set verbose=True in transcribe calls
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # Add this line
    fp16=torch.cuda.is_available()
)
```
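## 🧪 Intent Scoring Sketch

To make the intent detection algorithm concrete, here is a minimal, self-contained sketch of bilingual keyword scoring. It is an illustration under stated assumptions: the keyword lists, the `detect_intent` helper, and the confidence formula are hypothetical simplifications, not the actual `UrduIntentExtractor` internals (the real tool also factors text length into confidence).

```python
# Minimal sketch of bilingual keyword-based intent scoring.
# NOTE: keyword lists and the confidence formula are illustrative
# assumptions, NOT the actual UrduIntentExtractor implementation.

INTENT_KEYWORDS = {
    "question": {"urdu": ["کیا", "کب", "کیوں"], "english": ["question", "what", "when", "why"]},
    "request": {"urdu": ["براہ کرم", "مہربانی"], "english": ["please", "help"]},
    "thanks": {"urdu": ["شکریہ"], "english": ["thanks", "thank you"]},
}

def detect_intent(urdu_text: str, english_text: str) -> dict:
    """Count keyword hits in both texts and pick the best-scoring intent."""
    english_text = english_text.lower()
    scores = {}
    for intent, kws in INTENT_KEYWORDS.items():
        hits = sum(kw in urdu_text for kw in kws["urdu"])
        hits += sum(kw in english_text for kw in kws["english"])
        if hits:
            scores[intent] = hits
    if not scores:
        return {"type": "unknown", "confidence": 0.0}
    best = max(scores, key=scores.get)
    # Confidence = share of all keyword hits captured by the winning intent
    return {"type": best, "confidence": scores[best] / sum(scores.values())}

result = detect_intent(
    "میں آپ سے ایک سوال پوچھنا چاہتا ہوں",
    "I want to ask you a question",
)
print(result["type"])  # question
```

Because matching runs on both the transcription and the translation, an intent can still be caught when a keyword survives in only one of the two languages.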