# 🎤 Urdu Speech Intent Recognition using Whisper

A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built on OpenAI's Whisper model for accurate speech recognition and translation.

## ✨ Features

- **🎙️ Urdu Speech Transcription**: Accurate transcription of Urdu audio using Whisper
- **🌐 Built-in Translation**: Direct Urdu-to-English translation using Whisper's translation capability
- **🎯 Intent Detection**: Identifies user intent from conversation (questions, requests, commands, etc.)
- **😊 Sentiment Analysis**: Basic sentiment detection (positive/negative/neutral)
- **📊 Confidence Scoring**: Provides confidence scores for both transcription and intent detection
- **🔧 Multiple Model Sizes**: Supports the base, medium, large-v1, large-v2, and large-v3 Whisper models
- **💾 JSON Export**: Option to save results in structured JSON format
- **🎵 Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other common audio formats

## 📋 Supported Intents

The system can detect the following intents:

| Intent | Description | Example Keywords |
|--------|-------------|------------------|
| **greeting** | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
| **question** | Asking questions | "کیا", "کب", "کیوں", "کسے" |
| **request** | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
| **command** | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
| **complaint** | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
| **information** | Seeking information | "بتائیں", "جانیں", "تفصیل" |
| **emergency** | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
| **appointment** | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
| **farewell** | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
| **thanks** | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |

## 🚀 Quick Start

### Installation
1. **Clone the repository:**

   ```bash
   git clone https://github.com/yourusername/urdu-intent-recognition.git
   cd urdu-intent-recognition
   ```

2. **Install the required packages:**

   ```bash
   pip install openai-whisper torch torchaudio
   ```

3. **Install FFmpeg (required for audio processing):**

   - **Ubuntu/Debian:** `sudo apt-get install ffmpeg`
   - **macOS:** `brew install ffmpeg`
   - **Windows:** Download from [ffmpeg.org](https://ffmpeg.org/download.html)

### Basic Usage

```bash
# Process an Urdu audio file
python urdu_intent_extractor.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python urdu_intent_extractor.py audio.mp3 --model medium

# Save results to a JSON file
python urdu_intent_extractor.py audio.mp3 --output results.json
```

## 📖 Detailed Usage

### Command Line Arguments

```bash
python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]
```

**Arguments:**

- `AUDIO_FILE`: Path to the audio file to process (required)

**Options:**

- `--model`: Whisper model size (default: `base`)
  - Choices: `tiny`, `base`, `small`, `medium`, `large`
  - Larger models are more accurate but slower
- `--output`: Save results to a JSON file
- `--quiet`: Minimal console output
- `--help`: Show help message

### Python API Usage

You can also use the tool programmatically:

```python
from urdu_intent_extractor import UrduIntentExtractor

# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
```

### Example Output

```
==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================

📁 File: conversation.mp3

🗣️ URDU TRANSCRIPTION:
السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں

🌐 ENGLISH TRANSLATION:
Hello, I want to ask you a question

🎯 DETECTED INTENT:
❓ Asking a question or seeking clarification
   Confidence: 85.0%
   Urdu keywords found: سوال
   English keywords found: question

😊 SENTIMENT: NEUTRAL
   Confidence: 50.0%

==============================================================
```

## 🏗️ Architecture

The system works in three main steps:

1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text
2. **Translation**: Whisper translates the Urdu speech directly to English (using `task="translate"`)
3. **Intent Analysis**: Analyzes both the Urdu and English text for intent keywords

### Intent Detection Algorithm

1. **Bilingual Keyword Matching**: Checks for intent keywords in both the Urdu and English text
2. **Scoring System**: Assigns scores based on keyword matches
3. **Confidence Calculation**: Calculates confidence from match frequency and text length
4. **Sentiment Analysis**: Basic sentiment detection using positive/negative keywords

## 📊 Model Performance

| Model Size | Speed | Accuracy | GPU Memory | Best Use Case |
|------------|-------|----------|------------|---------------|
| **base** | ⚡ Fast | 🟡 Moderate | ~1 GB | Quick prototyping |
| **medium** | 🚀 Good | 🟢 Good | ~5 GB | General purpose |
| **large-v1** | 🐢 Moderate | 🟢🟢 Better | ~10 GB | Better accuracy needed |
| **large-v2** | 🐌 Slow | 🟢🟢🟢 Very Good | ~10 GB | High accuracy required |
| **large-v3** | 🐌🐌 Very Slow | 🟢🟢🟢🟢 Excellent | ~10 GB | Research/production |

## 🔧 Advanced Configuration

### Custom Intent Keywords

You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in the code:

```python
self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
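    # A hypothetical example of a concrete custom entry — the "billing"
    # intent name and its keywords are illustrative assumptions, not part
    # of the shipped keyword set:
    "billing": {
        "urdu": ["بل", "ادائیگی"],  # "bill", "payment"
        "english": ["bill", "payment", "invoice"]
    },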
}
```

### GPU Acceleration

The tool automatically uses the GPU when one is available. To force CPU usage:

```python
# In the code, remove the fp16 parameter:
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate"
    # Removed: fp16=torch.cuda.is_available()
)
```

## 📝 Example Use Cases

1. **Customer Service**: Automatically categorize customer calls
2. **Voice Assistants**: Understand user commands in Urdu
3. **Healthcare**: Triage patient concerns based on urgency
4. **Education**: Analyze student questions in online learning
5. **Business Analytics**: Understand customer feedback from calls

## 🐛 Troubleshooting

### Common Issues

1. **"Audio file not found"**
   - Ensure the file path is correct
   - Check file permissions

2. **Poor transcription quality**
   - Try a larger Whisper model (`--model medium`)
   - Ensure clear audio quality
   - Check that the audio actually contains Urdu speech

3. **Slow processing**
   - Use a smaller model (`--model tiny` or `--model base`)
   - Ensure the GPU is available and properly configured
   - Reduce the audio file size or duration

4. **FFmpeg errors**
   - Reinstall FFmpeg
   - Ensure FFmpeg is on the system `PATH`

### Debug Mode

For debugging, you can enable more verbose output by modifying the code:

```python
# Set verbose=True in transcribe calls
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # Add this line
    fp16=torch.cuda.is_available()
)
```
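## 🧪 Intent Scoring Sketch

To make the intent detection algorithm concrete, here is a minimal, self-contained sketch of bilingual keyword scoring. It is an illustration under stated assumptions: the keyword lists, the `detect_intent` helper, and the confidence formula are hypothetical simplifications, not the actual `UrduIntentExtractor` internals (the real tool also factors text length into confidence).

```python
# Minimal sketch of bilingual keyword-based intent scoring.
# NOTE: keyword lists and the confidence formula are illustrative
# assumptions, NOT the actual UrduIntentExtractor implementation.

INTENT_KEYWORDS = {
    "question": {"urdu": ["کیا", "کب", "کیوں"], "english": ["question", "what", "when", "why"]},
    "request": {"urdu": ["براہ کرم", "مہربانی"], "english": ["please", "help"]},
    "thanks": {"urdu": ["شکریہ"], "english": ["thanks", "thank you"]},
}

def detect_intent(urdu_text: str, english_text: str) -> dict:
    """Count keyword hits in both texts and pick the best-scoring intent."""
    english_text = english_text.lower()
    scores = {}
    for intent, kws in INTENT_KEYWORDS.items():
        hits = sum(kw in urdu_text for kw in kws["urdu"])
        hits += sum(kw in english_text for kw in kws["english"])
        if hits:
            scores[intent] = hits
    if not scores:
        return {"type": "unknown", "confidence": 0.0}
    best = max(scores, key=scores.get)
    # Confidence = share of all keyword hits captured by the winning intent
    return {"type": best, "confidence": scores[best] / sum(scores.values())}

result = detect_intent(
    "میں آپ سے ایک سوال پوچھنا چاہتا ہوں",
    "I want to ask you a question",
)
print(result["type"])  # question
```

Because matching runs on both the transcription and the translation, an intent can still be caught when a keyword survives in only one of the two languages.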