Compare commits
5 Commits: b4f224796d ... 8eee8af4f2

| SHA1 |
|---|
| 8eee8af4f2 |
| 6d8738b231 |
| 71ba6001f6 |
| ba876f93d4 |
| 329222934f |

232 README.md
@@ -1,2 +1,232 @@
# salam_bot

# 🎤 Urdu Speech Intent Recognition using Whisper

A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.
## ✨ Features

- **🎙️ Urdu Speech Transcription**: Accurate transcription of Urdu audio using Whisper
- **🌐 Built-in Translation**: Direct Urdu-to-English translation using Whisper's translation capability
- **🎯 Intent Detection**: Identifies user intent from conversation (questions, requests, commands, etc.)
- **😊 Sentiment Analysis**: Basic sentiment detection (positive/negative/neutral)
- **📊 Confidence Scoring**: Provides confidence scores for both intent and sentiment detection
- **🔧 Multiple Model Sizes**: Supports the base, medium, and large (v1/v2/v3) Whisper models
- **💾 JSON Export**: Option to save results in structured JSON format
- **🎵 Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other common audio formats
## 📋 Supported Intents

The system can detect the following intents:

| Intent | Description | Example Keywords |
|--------|-------------|------------------|
| **greeting** | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
| **question** | Asking questions | "کیا", "کب", "کیوں", "کسے" |
| **request** | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
| **command** | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
| **complaint** | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
| **information** | Seeking information | "بتائیں", "جانیں", "تفصیل" |
| **emergency** | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
| **appointment** | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
| **farewell** | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
| **thanks** | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |
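At runtime, each row of this table becomes a pair of keyword lists that are scanned against both transcripts. A self-contained sketch of that lookup (keyword lists trimmed to a subset; `match_intent` is a simplified stand-in for the repo's `extract_intent`):

```python
# Minimal sketch of the bilingual keyword matching behind the table above.
# Keyword lists here are a small subset of the repo's intent_keywords dict.
intent_keywords = {
    "greeting": {"urdu": ["سلام", "ہیلو"], "english": ["hello", "hi"]},
    "question": {"urdu": ["کیا", "کیوں"], "english": ["what", "why"]},
    "thanks": {"urdu": ["شکریہ"], "english": ["thank", "thanks"]},
}

def match_intent(urdu_text: str, english_text: str) -> str:
    """Return the intent whose keywords appear most often in either transcript."""
    scores = {}
    for intent, kws in intent_keywords.items():
        hits = sum(1 for k in kws["urdu"] if k in urdu_text)
        hits += sum(1 for k in kws["english"] if k.lower() in english_text.lower())
        if hits:
            scores[intent] = hits
    # Fall back to a generic label when nothing matches, as the repo does
    return max(scores, key=scores.get) if scores else "general_conversation"

print(match_intent("شکریہ", "thank you very much"))  # thanks
```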
## 🚀 Quick Start

### Installation

1. **Clone the repository:**
   ```bash
   git clone https://github.com/yourusername/urdu-intent-recognition.git
   cd urdu-intent-recognition
   ```

2. **Install required packages:**
   ```bash
   pip install openai-whisper torch torchaudio
   ```

3. **Install FFmpeg (required for audio processing):**
   - **Ubuntu/Debian:** `sudo apt-get install ffmpeg`
   - **macOS:** `brew install ffmpeg`
   - **Windows:** Download from [ffmpeg.org](https://ffmpeg.org/download.html)
### Basic Usage

```bash
# Process an Urdu audio file
python main.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python main.py audio.mp3 --model medium

# Save results to JSON file
python main.py audio.mp3 --output results.json
```
## 📖 Detailed Usage

### Command Line Arguments

```bash
python main.py AUDIO_FILE [OPTIONS]
```

**Arguments:**
- `AUDIO_FILE`: Path to the audio file to process (required)

**Options:**
- `--model`: Whisper model size (default: "base")
  - Choices: `base`, `medium`, `large-v1`, `large-v2`, `large-v3`
  - Larger models are more accurate but slower
- `--output`: Save results to JSON file
- `--quiet`: Minimal console output
- `--help`: Show help message
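When `--output` is given, the results dictionary is written as UTF-8 JSON with `ensure_ascii=False`, so the Urdu text stays human-readable instead of being escaped. A minimal sketch of the saved shape, with illustrative values (the real file also carries `description`, `details`, and `segments` fields):

```python
import json

# Illustrative result shape; the values here are made up for the example.
results = {
    "file": "conversation.mp3",
    "transcription": {"urdu": "شکریہ", "english": "Thank you"},
    "intent": {"type": "thanks", "confidence": 0.85},
    "sentiment": {"type": "positive", "confidence": 1.0},
}

# ensure_ascii=False keeps the Urdu script as-is instead of \uXXXX escapes
payload = json.dumps(results, ensure_ascii=False, indent=2)
print("شکریہ" in payload)  # True
```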
### Python API Usage

You can also use the tool programmatically:

```python
from helpers.audio_analysis import UrduIntentExtractor

# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
```
### Example Output

```
==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================

📁 File: conversation.mp3

🗣️ URDU TRANSCRIPTION:
   السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں

🌐 ENGLISH TRANSLATION:
   Hello, I want to ask you a question

🎯 DETECTED INTENT:
   ❓ Asking a question or seeking clarification
   Confidence: 85.0%
   Urdu keywords found: سوال
   English keywords found: question

😊 SENTIMENT:
   NEUTRAL
   Confidence: 50.0%

==============================================================
```
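The SENTIMENT figures in the output above come from simple keyword counting rather than a trained model. A trimmed, standalone sketch of the `analyze_sentiment` logic (word lists shortened from the repo's):

```python
# Shortened word lists; the repo's analyze_sentiment uses longer ones.
POSITIVE = ["good", "great", "thanks", "thank", "helpful", "love"]
NEGATIVE = ["bad", "wrong", "problem", "issue", "angry", "broken"]

def analyze_sentiment(text: str) -> tuple[str, float]:
    """Count keyword hits; confidence is the winning side's share of all hits."""
    t = text.lower()
    pos = sum(1 for w in POSITIVE if w in t)
    neg = sum(1 for w in NEGATIVE if w in t)
    if pos > neg:
        return "positive", pos / (pos + neg)
    if neg > pos:
        return "negative", neg / (pos + neg)
    return "neutral", 0.5  # no signal either way, as in the example output

print(analyze_sentiment("Hello, I want to ask you a question"))  # ('neutral', 0.5)
```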
## 🏗️ Architecture

The system works in three main steps:

1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text
2. **Translation**: Whisper translates the Urdu speech directly to English (a second pass over the audio with `task="translate"`)
3. **Intent Analysis**: Analyzes both Urdu and English text for intent keywords

### Intent Detection Algorithm

1. **Bilingual Keyword Matching**: Checks for intent keywords in both the Urdu and English text
2. **Scoring System**: Assigns scores based on keyword matches
3. **Confidence Calculation**: Calculates confidence from match frequency relative to text length
4. **Sentiment Analysis**: Basic sentiment detection using positive/negative keywords
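For the confidence step, `extract_intent` (further down in this diff) computes `score / max(1, total_words / 5)`, multiplies by 1.5 when keywords matched in both languages, and caps the result at 1.0. A standalone sketch of that rule:

```python
def confidence(score: int, total_words: int, both_languages: bool) -> float:
    """Confidence rule from extract_intent: hit density with a bilingual boost."""
    base = score / max(1, total_words / 5)  # hits per ~5-word chunk of transcript
    if both_languages:
        base *= 1.5  # boost when both Urdu and English keywords matched
    return min(base, 1.0)  # cap at 100%

# e.g. 2 keyword hits in a 5-word transcript, matched in both languages:
print(confidence(2, 5, True))  # 1.0 (capped)
```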
## 📊 Model Performance

| Model Size | Speed | Accuracy | GPU Memory | Best Use Case |
|------------|-------|----------|------------|---------------|
| **base** | ⚡ Fast | 🟡 Moderate | ~1GB | Quick prototyping |
| **medium** | 🚀 Good | 🟢 Good | ~5GB | General purpose |
| **large-v1** | 🐢 Slow | 🟢🟢 Better | ~10GB | Better accuracy needed |
| **large-v2** | 🐢 Slow | 🟢🟢🟢 Very Good | ~10GB | High accuracy required |
| **large-v3** | 🐢 Slow | 🟢🟢🟢🟢 Excellent | ~10GB | Research/production |
## 🔧 Advanced Configuration

### Custom Intent Keywords

You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in the code:

```python
self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
}
```
### GPU Acceleration

The tool automatically uses the GPU if available. To force CPU usage, load the model on the CPU and disable half-precision (removing the `fp16` argument alone does not switch devices):

```python
# Load the model on the CPU explicitly:
self.model = whisper.load_model(model_size, device="cpu")

# ...and disable half precision, which is unsupported on CPU:
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    fp16=False  # instead of fp16=torch.cuda.is_available()
)
```
## 📝 Example Use Cases

1. **Customer Service**: Automatically categorize customer calls
2. **Voice Assistants**: Understand user commands in Urdu
3. **Healthcare**: Triage patient concerns based on urgency
4. **Education**: Analyze student questions in online learning
5. **Business Analytics**: Understand customer feedback from calls
## 🐛 Troubleshooting

### Common Issues

1. **"Audio file not found"**
   - Ensure the file path is correct
   - Check file permissions

2. **Poor transcription quality**
   - Try a larger Whisper model (`--model medium`)
   - Ensure clear audio quality
   - Check that the audio actually contains Urdu speech

3. **Slow processing**
   - Use a smaller model (`--model base`)
   - Ensure the GPU is available and properly configured
   - Reduce the audio file size or duration

4. **FFmpeg errors**
   - Reinstall FFmpeg
   - Ensure FFmpeg is on the system PATH
### Debug Mode

For debugging, you can enable more verbose output by modifying the code:

```python
# Set verbose=True in transcribe calls
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # Add this line
    fp16=torch.cuda.is_available()
)
```
BIN Recording.mp3 (new file)
Binary file not shown.
318 helpers/audio_analysis.py (new file)
@@ -0,0 +1,318 @@
import whisper
import torch
import argparse
import os
from typing import Dict, Tuple, Optional
import warnings

warnings.filterwarnings('ignore')

class UrduIntentExtractor:
    def __init__(self, model_size: str = "large-v3"):
        """
        Initialize Urdu intent extractor using Whisper

        Args:
            model_size: Whisper model size (base, medium, large-v1, large-v2, large-v3)
        """
        print(f"Loading Whisper {model_size} model...")
        self.model = whisper.load_model(model_size)

        # Comprehensive intent mapping for Urdu and English
        self.intent_keywords = {
            "greeting": {
                "urdu": ["سلام", "السلام علیکم", "ہیلو", "آداب", "صبح بخیر", "شام بخیر"],
                "english": ["hello", "hi", "greetings", "good morning", "good evening", "assalam"]
            },
            "question": {
                "urdu": ["کیا", "کب", "کیوں", "کسے", "کہاں", "کس طرح", "کتنا", "کیسے"],
                "english": ["what", "when", "why", "who", "where", "how", "how much", "which"]
            },
            "request": {
                "urdu": ["براہ کرم", "مہربانی", "چاہتا ہوں", "چاہتی ہوں", "درکار ہے", "مدد چاہیے"],
                "english": ["please", "kindly", "want", "need", "require", "help", "could you", "would you"]
            },
            "command": {
                "urdu": ["کرو", "کریں", "لاؤ", "دیں", "بناؤ", "روکو", "جاؤ", "آؤ"],
                "english": ["do", "make", "bring", "give", "create", "stop", "go", "come"]
            },
            "complaint": {
                "urdu": ["شکایت", "مسئلہ", "پریشانی", "غلط", "خراب", "نقص", "برا"],
                "english": ["complaint", "problem", "issue", "wrong", "bad", "fault", "error"]
            },
            "information": {
                "urdu": ["بتائیں", "جانیں", "معلوم", "تفصیل", "رہنمائی", "بتاؤ"],
                "english": ["tell", "know", "information", "details", "guide", "explain"]
            },
            "emergency": {
                "urdu": ["حادثہ", "ایمرجنسی", "تباہی", "بچاؤ", "جلدی", "فوری", "خطرہ"],
                "english": ["accident", "emergency", "help", "urgent", "quick", "danger", "dangerous"]
            },
            "appointment": {
                "urdu": ["ملاقات", "اپائنٹمنٹ", "ٹائم", "تاریخ", "وقت", "دن"],
                "english": ["meeting", "appointment", "time", "date", "schedule", "day"]
            },
            "farewell": {
                "urdu": ["اللہ حافظ", "خدا حافظ", "بای", "اختتام", "ختم", "اگلی بار"],
                "english": ["goodbye", "bye", "farewell", "end", "see you", "next time"]
            },
            "thanks": {
                "urdu": ["شکریہ", "مہربانی", "آپ کا بہت شکریہ", "تھینکس"],
                "english": ["thank", "thanks", "grateful", "appreciate"]
            }
        }
    def transcribe_and_translate(self, audio_path: str) -> Dict[str, str]:
        """
        Transcribe Urdu audio and translate to English using Whisper

        Args:
            audio_path: Path to audio file

        Returns:
            Dictionary containing Urdu transcription and English translation
        """
        print(f"\nProcessing audio file: {os.path.basename(audio_path)}")

        # First, transcribe in Urdu
        print("Transcribing in Urdu...")
        urdu_result = self.model.transcribe(
            audio_path,
            language="ur",  # Force Urdu language
            task="transcribe",
            fp16=torch.cuda.is_available()
        )
        urdu_text = urdu_result["text"].strip()

        # Then, translate to English
        print("Translating to English...")
        english_result = self.model.transcribe(
            audio_path,
            language="ur",  # Source language is Urdu
            task="translate",  # This tells Whisper to translate
            fp16=torch.cuda.is_available()
        )
        english_text = english_result["text"].strip()

        return {
            "urdu": urdu_text,
            "english": english_text,
            "urdu_segments": urdu_result.get("segments", []),
            "english_segments": english_result.get("segments", [])
        }
    def extract_intent(self, urdu_text: str, english_text: str) -> Tuple[str, float, Dict]:
        """
        Extract main intent from both Urdu and English texts

        Args:
            urdu_text: Original Urdu transcription
            english_text: Translated English text

        Returns:
            Tuple of (intent, confidence, details)
        """
        print("\nAnalyzing intent...")

        # Prepare text for analysis
        urdu_lower = urdu_text.lower()
        english_lower = english_text.lower()

        # Calculate intent scores
        intent_scores = {}
        intent_details = {}

        for intent, keywords in self.intent_keywords.items():
            # Count Urdu keyword matches
            urdu_matches = []
            for keyword in keywords["urdu"]:
                if keyword in urdu_lower:
                    urdu_matches.append(keyword)

            # Count English keyword matches
            english_matches = []
            for keyword in keywords["english"]:
                if keyword.lower() in english_lower:
                    english_matches.append(keyword)

            # Calculate scores
            urdu_score = len(urdu_matches)
            english_score = len(english_matches)
            total_score = urdu_score + english_score

            if total_score > 0:
                intent_scores[intent] = total_score
                intent_details[intent] = {
                    "urdu_matches": urdu_matches,
                    "english_matches": english_matches,
                    "urdu_score": urdu_score,
                    "english_score": english_score,
                    "total_score": total_score
                }

        # Determine main intent
        if intent_scores:
            # Get intent with highest score
            main_intent = max(intent_scores, key=intent_scores.get)

            # Calculate confidence based on multiple factors
            total_words = len(english_lower.split()) + len(urdu_lower.split())
            base_confidence = intent_scores[main_intent] / max(1, total_words / 5)

            # Boost confidence if matches found in both languages
            if (intent_details[main_intent]["urdu_score"] > 0 and
                    intent_details[main_intent]["english_score"] > 0):
                base_confidence *= 1.5

            confidence = min(base_confidence, 1.0)
        else:
            main_intent = "general_conversation"
            confidence = 0.3
            intent_details[main_intent] = {
                "urdu_matches": [],
                "english_matches": [],
                "urdu_score": 0,
                "english_score": 0,
                "total_score": 0
            }

        return main_intent, confidence, intent_details[main_intent]
    def get_intent_description(self, intent: str) -> str:
        """
        Get human-readable description for intent

        Args:
            intent: Detected intent

        Returns:
            Description string
        """
        descriptions = {
            "greeting": "👋 Greeting or starting a conversation",
            "question": "❓ Asking a question or seeking clarification",
            "request": "🙏 Making a request or asking for something",
            "command": "⚡ Giving a command or instruction",
            "complaint": "😠 Expressing a complaint or dissatisfaction",
            "information": "ℹ️ Seeking or providing information",
            "emergency": "🚨 Emergency situation requiring immediate attention",
            "appointment": "📅 Scheduling or inquiring about a meeting/appointment",
            "farewell": "👋 Ending the conversation",
            "thanks": "🙏 Expressing gratitude or thanks",
            "general_conversation": "💬 General conversation without specific intent"
        }
        return descriptions.get(intent, "💭 Unknown or general conversation")
    def analyze_sentiment(self, english_text: str) -> Tuple[str, float]:
        """
        Basic sentiment analysis based on keywords

        Args:
            english_text: English translated text

        Returns:
            Tuple of (sentiment, confidence)
        """
        positive_words = ["good", "great", "excellent", "happy", "thanks", "thank", "please",
                          "wonderful", "nice", "helpful", "appreciate", "love", "like"]
        negative_words = ["bad", "wrong", "problem", "issue", "complaint", "angry", "upset",
                          "terrible", "horrible", "hate", "not working", "broken", "failed"]

        text_lower = english_text.lower()

        positive_count = sum(1 for word in positive_words if word in text_lower)
        negative_count = sum(1 for word in negative_words if word in text_lower)

        if positive_count > negative_count:
            return "positive", positive_count / max(1, (positive_count + negative_count))
        elif negative_count > positive_count:
            return "negative", negative_count / max(1, (positive_count + negative_count))
        else:
            return "neutral", 0.5
    def process_audio_file(self, audio_path: str, verbose: bool = True) -> Dict:
        """
        Main function to process audio file and extract intent

        Args:
            audio_path: Path to audio file
            verbose: Whether to print detailed output

        Returns:
            Dictionary with all analysis results
        """
        # Validate file
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"Audio file not found: {audio_path}")

        # Transcribe and translate
        results = self.transcribe_and_translate(audio_path)

        # Extract intent
        intent, confidence, intent_details = self.extract_intent(
            results["urdu"],
            results["english"]
        )

        # Analyze sentiment
        sentiment, sentiment_confidence = self.analyze_sentiment(results["english"])

        # Prepare final results
        final_results = {
            "file": os.path.basename(audio_path),
            "transcription": {
                "urdu": results["urdu"],
                "english": results["english"]
            },
            "intent": {
                "type": intent,
                "confidence": confidence,
                "description": self.get_intent_description(intent),
                "details": intent_details
            },
            "sentiment": {
                "type": sentiment,
                "confidence": sentiment_confidence
            },
            "segments": {
                "urdu": results.get("urdu_segments", []),
                "english": results.get("english_segments", [])
            }
        }

        # Print results if verbose
        if verbose:
            self.print_results(final_results)

        return final_results
    def print_results(self, results: Dict):
        """
        Print analysis results in a formatted way
        """
        print("\n" + "="*70)
        print("URDU SPEECH INTENT ANALYSIS RESULTS")
        print("="*70)

        print(f"\n📁 File: {results['file']}")

        print(f"\n🗣️ URDU TRANSCRIPTION:")
        print(f"   {results['transcription']['urdu']}")

        print(f"\n🌐 ENGLISH TRANSLATION:")
        print(f"   {results['transcription']['english']}")

        print(f"\n🎯 DETECTED INTENT:")
        print(f"   {results['intent']['description']}")
        print(f"   Confidence: {results['intent']['confidence']:.1%}")

        if results['intent']['details']['urdu_matches']:
            print(f"   Urdu keywords found: {', '.join(results['intent']['details']['urdu_matches'])}")
        if results['intent']['details']['english_matches']:
            print(f"   English keywords found: {', '.join(results['intent']['details']['english_matches'])}")

        print(f"\n😊 SENTIMENT:")
        print(f"   {results['sentiment']['type'].upper()}")
        print(f"   Confidence: {results['sentiment']['confidence']:.1%}")

        print("\n" + "="*70)
60 main.py (new file)
@@ -0,0 +1,60 @@
import argparse
from helpers.audio_analysis import UrduIntentExtractor

def main():
    """
    Command-line interface for Urdu Intent Extractor
    """
    parser = argparse.ArgumentParser(
        description="Extract intent from Urdu speech using Whisper translation"
    )

    parser.add_argument(
        "audio_file",
        help="Path to audio file (mp3, wav, m4a, flac, etc.)"
    )

    parser.add_argument(
        "--model",
        default="base",
        choices=["base", "medium", "large-v1", "large-v2", "large-v3"],
        help="Whisper model size (default: base)"
    )

    parser.add_argument(
        "--output",
        help="Save results to JSON file"
    )

    parser.add_argument(
        "--quiet",
        action="store_true",
        help="Minimal output"
    )

    args = parser.parse_args()

    try:
        # Initialize extractor
        extractor = UrduIntentExtractor(model_size=args.model)

        # Process audio file
        results = extractor.process_audio_file(
            args.audio_file,
            verbose=not args.quiet
        )

        # Save to JSON if requested
        if args.output:
            import json
            with open(args.output, 'w', encoding='utf-8') as f:
                json.dump(results, f, ensure_ascii=False, indent=2)
            print(f"\nResults saved to: {args.output}")

    except FileNotFoundError as e:
        print(f"❌ Error: {e}")
    except Exception as e:
        print(f"❌ An error occurred: {str(e)}")

if __name__ == "__main__":
    main()
8 requirements.txt (new file)
@@ -0,0 +1,8 @@
openai-whisper
torch
torchaudio
transformers
argparse
huggingface_hub[hf_xet]
sentencepiece
sacremoses