# 🎤 Urdu Speech Intent Recognition using Whisper

A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.

## ✨ Features

- **🎙️ Urdu Speech Transcription**: Accurate transcription of Urdu audio using Whisper
- **🌐 Built-in Translation**: Direct Urdu-to-English translation using Whisper's translation capability
- **🎯 Intent Detection**: Identifies user intent from the conversation (questions, requests, commands, etc.)
- **😊 Sentiment Analysis**: Basic sentiment detection (positive/negative/neutral)
- **📊 Confidence Scoring**: Provides confidence scores for both transcription and intent detection
- **🔧 Multiple Model Sizes**: Support for tiny, base, small, medium, and large Whisper models
- **💾 JSON Export**: Option to save results in structured JSON format
- **🎵 Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other common audio formats

## 📋 Supported Intents

The system can detect the following intents:

| Intent | Description | Example Keywords |
|--------|-------------|------------------|
| **greeting** | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
| **question** | Asking questions | "کیا", "کب", "کیوں", "کسے" |
| **request** | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
| **command** | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
| **complaint** | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
| **information** | Seeking information | "بتائیں", "جانیں", "تفصیل" |
| **emergency** | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
| **appointment** | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
| **farewell** | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
| **thanks** | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |
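The keyword lists above drive a simple bilingual lookup. A minimal sketch of how such matching could work — the dictionary below is abbreviated to three intents, and the `match_intents` helper is illustrative, not the tool's actual API:

```python
# Hypothetical sketch of the bilingual keyword lookup behind the table
# above. Abbreviated keyword lists; names are illustrative only.
INTENT_KEYWORDS = {
    "greeting": {"urdu": ["سلام", "ہیلو", "السلام علیکم"], "english": ["hello", "hi"]},
    "question": {"urdu": ["کیا", "کب", "کیوں"], "english": ["what", "when", "why", "question"]},
    "thanks": {"urdu": ["شکریہ"], "english": ["thank you", "thanks"]},
}

def match_intents(urdu_text: str, english_text: str) -> dict:
    """Count keyword hits per intent across both languages."""
    scores = {}
    english = english_text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = sum(kw in urdu_text for kw in keywords["urdu"])
        hits += sum(kw in english for kw in keywords["english"])
        if hits:
            scores[intent] = hits
    return scores
```

Matching in both languages makes the detection robust to transcription quirks: an intent missed in the Urdu pass can still be caught in the English translation.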
## 🚀 Quick Start

### Installation

1. **Clone the repository:**

```bash
git clone https://github.com/yourusername/urdu-intent-recognition.git
cd urdu-intent-recognition
```

2. **Install the required packages:**

```bash
pip install openai-whisper torch torchaudio
```

3. **Install FFmpeg (required for audio processing):**
   - **Ubuntu/Debian:** `sudo apt-get install ffmpeg`
   - **macOS:** `brew install ffmpeg`
   - **Windows:** Download from [ffmpeg.org](https://ffmpeg.org/download.html)

### Basic Usage

```bash
# Process an Urdu audio file
python urdu_intent_extractor.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python urdu_intent_extractor.py audio.mp3 --model medium

# Save results to a JSON file
python urdu_intent_extractor.py audio.mp3 --output results.json
```

## 📖 Detailed Usage

### Command Line Arguments

```bash
python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]
```

**Arguments:**
- `AUDIO_FILE`: Path to the audio file to process (required)

**Options:**
- `--model`: Whisper model size (default: "base")
  - Choices: `tiny`, `base`, `small`, `medium`, `large`
  - Larger models are more accurate but slower
- `--output`: Save results to a JSON file
- `--quiet`: Minimal console output
- `--help`: Show help message

### Python API Usage

You can also use the tool programmatically:

```python
from urdu_intent_extractor import UrduIntentExtractor

# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
```

### Example Output

```
==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================

📁 File: conversation.mp3

🗣️ URDU TRANSCRIPTION:
   السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں

🌐 ENGLISH TRANSLATION:
   Hello, I want to ask you a question

🎯 DETECTED INTENT:
   ❓ Asking a question or seeking clarification
   Confidence: 85.0%
   Urdu keywords found: سوال
   English keywords found: question

😊 SENTIMENT:
   NEUTRAL
   Confidence: 50.0%

==============================================================
```

## 🏗️ Architecture

The system works in three main steps:

1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text
2. **Translation**: Whisper translates the Urdu speech to English in a second pass (using `task="translate"`)
3. **Intent Analysis**: Analyzes both the Urdu and English text for intent keywords

### Intent Detection Algorithm

1. **Bilingual Keyword Matching**: Checks for intent keywords in both the Urdu and English text
2. **Scoring System**: Assigns scores based on keyword matches
3. **Confidence Calculation**: Calculates confidence based on match frequency and text length
4. **Sentiment Analysis**: Basic sentiment detection using positive/negative keywords
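Step 4 can be sketched as a simple keyword tally. In this hypothetical version the keyword lists and the `detect_sentiment` helper are illustrative rather than the tool's actual code; the neutral fallback mirrors the 50% confidence shown in the example output:

```python
# Hypothetical sentiment heuristic for step 4 above. The keyword lists
# are illustrative; the tool's actual lists will differ.
POSITIVE_WORDS = ["شکریہ", "اچھا", "بہترین", "good", "great", "thanks"]
NEGATIVE_WORDS = ["مسئلہ", "شکایت", "پریشانی", "bad", "problem", "complaint"]

def detect_sentiment(text: str) -> dict:
    """Classify text as positive/negative/neutral by keyword counts."""
    lowered = text.lower()
    positive = sum(word in lowered for word in POSITIVE_WORDS)
    negative = sum(word in lowered for word in NEGATIVE_WORDS)
    if positive > negative:
        return {"type": "positive", "confidence": min(1.0, 0.5 + 0.1 * positive)}
    if negative > positive:
        return {"type": "negative", "confidence": min(1.0, 0.5 + 0.1 * negative)}
    # No keyword hits (or a tie): fall back to neutral at 50%
    return {"type": "neutral", "confidence": 0.5}
```

Confidence here grows with the number of matched keywords and is capped at 100%, which is one plausible reading of the "match frequency" rule in step 3.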
## 📊 Model Performance

| Model Size | Speed | Accuracy | GPU Memory | Best Use Case |
|------------|-------|----------|------------|---------------|
| **tiny** | ⚡⚡ Fastest | 🟡 Basic | ~1GB | Quick prototyping |
| **base** | ⚡ Fast | 🟡 Moderate | ~1GB | General purpose |
| **small** | 🚀 Good | 🟢 Good | ~2GB | Balanced speed and accuracy |
| **medium** | 🐢 Moderate | 🟢🟢 Better | ~5GB | Better accuracy needed |
| **large** | 🐌 Slow | 🟢🟢🟢 Very Good | ~10GB | High accuracy required |

## 🔧 Advanced Configuration

### Custom Intent Keywords

You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in the code:

```python
self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
}
```

### GPU Acceleration

The tool automatically uses a GPU if one is available. To force CPU usage, load the model on the CPU and disable half precision (removing the `fp16` argument alone is not enough, since it defaults to `True`):

```python
# Load the model on the CPU explicitly
self.model = whisper.load_model(model_size, device="cpu")

# Disable fp16, which is not supported on CPU
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    fp16=False
)
```

## 📝 Example Use Cases

1. **Customer Service**: Automatically categorize customer calls
2. **Voice Assistants**: Understand user commands in Urdu
3. **Healthcare**: Triage patient concerns based on urgency
4. **Education**: Analyze student questions in online learning
5. **Business Analytics**: Understand customer feedback from calls

## 🐛 Troubleshooting

### Common Issues

1. **"Audio file not found"**
   - Ensure the file path is correct
   - Check file permissions

2. **Poor transcription quality**
   - Try a larger Whisper model (`--model medium`)
   - Ensure clear audio quality
   - Check that the audio actually contains Urdu speech

3. **Slow processing**
   - Use a smaller model (`--model tiny` or `--model base`)
   - Ensure a GPU is available and properly configured
   - Reduce the audio file size or duration

4. **FFmpeg errors**
   - Reinstall FFmpeg
   - Ensure FFmpeg is on the system PATH

### Debug Mode

For debugging, you can enable more verbose output by modifying the code:

```python
# Set verbose=True in transcribe calls
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # Add this line
    fp16=torch.cuda.is_available()
)
```
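For the FFmpeg errors above, a quick preflight check can confirm that FFmpeg is actually visible from Python before blaming the audio file. A small sketch — the `ffmpeg_available` helper is illustrative, not part of the tool:

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    # Print the first line of `ffmpeg -version` (useful in bug reports)
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    print(out.stdout.splitlines()[0])
else:
    print("ffmpeg not found - install it and make sure it is on PATH")
```

If this prints the "not found" message, fix the PATH (or reinstall FFmpeg) before retrying the transcription.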