# 🎤 Urdu Speech Intent Recognition using Whisper
A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.
## ✨ Features
- **🎙️ Urdu Speech Transcription**: Accurate transcription of Urdu audio using Whisper
- **🌐 Built-in Translation**: Direct Urdu-to-English translation using Whisper's translation capability
- **🎯 Intent Detection**: Identifies user intent from conversation (questions, requests, commands, etc.)
- **😊 Sentiment Analysis**: Basic sentiment detection (positive/negative/neutral)
- **📊 Confidence Scoring**: Provides confidence scores for both transcription and intent detection
- **🔧 Multiple Model Sizes**: Support for every Whisper model size, from `tiny` and `base` up to `large-v3`
- **💾 JSON Export**: Option to save results in structured JSON format
- **🎵 Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other common audio formats
## 📋 Supported Intents
The system can detect the following intents:
| Intent | Description | Example Keywords |
|--------|-------------|------------------|
| **greeting** | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
| **question** | Asking questions | "کیا", "کب", "کیوں", "کسے" |
| **request** | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
| **command** | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
| **complaint** | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
| **information** | Seeking information | "بتائیں", "جانیں", "تفصیل" |
| **emergency** | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
| **appointment** | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
| **farewell** | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
| **thanks** | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |
## 🚀 Quick Start
### Installation
1. **Clone the repository:**
```bash
git clone https://github.com/yourusername/urdu-intent-recognition.git
cd urdu-intent-recognition
```
2. **Install required packages:**
```bash
pip install openai-whisper torch torchaudio
```
3. **Install FFmpeg (required for audio processing):**
- **Ubuntu/Debian:** `sudo apt-get install ffmpeg`
- **macOS:** `brew install ffmpeg`
- **Windows:** Download from [ffmpeg.org](https://ffmpeg.org/download.html)
### Basic Usage
```bash
# Process an Urdu audio file
python urdu_intent_extractor.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python urdu_intent_extractor.py audio.mp3 --model medium

# Save results to JSON file
python urdu_intent_extractor.py audio.mp3 --output results.json
```
## 📖 Detailed Usage
### Command Line Arguments
```bash
python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]
```
**Arguments:**
- `AUDIO_FILE`: Path to the audio file to process (required)
**Options:**
- `--model`: Whisper model size (default: `base`)
  - Choices: `tiny`, `base`, `small`, `medium`, `large`, `large-v1`, `large-v2`, `large-v3`
  - Larger models are more accurate but slower
- `--output`: Save results to a JSON file (see the sample below)
- `--quiet`: Minimal console output
- `--help`: Show help message
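The file written by `--output` mirrors the result dictionary used in the Python API below. A representative sample, assuming the keys shown in that section (values taken from the example output further down):
```json
{
  "transcription": {
    "urdu": "السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں",
    "english": "Hello, I want to ask you a question"
  },
  "intent": {
    "type": "question",
    "confidence": 0.85,
    "keywords": {"urdu": ["سوال"], "english": ["question"]}
  },
  "sentiment": {
    "type": "neutral",
    "confidence": 0.5
  }
}
```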
### Python API Usage
You can also use the tool programmatically:
```python
from urdu_intent_extractor import UrduIntentExtractor

# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
```
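For batch processing, the same API can be looped over a directory. A minimal sketch, assuming `process_audio_file` returns the dictionary shown above (the folder and output paths are illustrative):
```python
import json
from pathlib import Path

from urdu_intent_extractor import UrduIntentExtractor

extractor = UrduIntentExtractor(model_size="base")
summaries = []

# Process every MP3 in a folder and collect the detected intents
for audio_file in sorted(Path("recordings").glob("*.mp3")):
    results = extractor.process_audio_file(str(audio_file))
    summaries.append({
        "file": audio_file.name,
        "intent": results["intent"]["type"],
        "confidence": results["intent"]["confidence"],
    })

# Write one combined report
Path("batch_results.json").write_text(
    json.dumps(summaries, ensure_ascii=False, indent=2), encoding="utf-8"
)
```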
### Example Output
```
==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================
📁 File: conversation.mp3
🗣️ URDU TRANSCRIPTION:
السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں
🌐 ENGLISH TRANSLATION:
Hello, I want to ask you a question
🎯 DETECTED INTENT:
❓ Asking a question or seeking clarification
Confidence: 85.0%
Urdu keywords found: سوال
English keywords found: question
😊 SENTIMENT:
NEUTRAL
Confidence: 50.0%
==============================================================
```
## 🏗️ Architecture
The system works in three main steps:
1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text
2. **Translation**: Whisper makes a second pass over the same audio with `task="translate"`, producing English text directly (the translation works from the audio, not from the Urdu transcript)
3. **Intent Analysis**: Analyzes both Urdu and English text for intent keywords
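In Whisper terms, steps 1 and 2 are two passes over the same audio. A minimal sketch of the two calls, assuming the model is already loaded (the file name is illustrative):
```python
import torch
import whisper

model = whisper.load_model("base")
use_fp16 = torch.cuda.is_available()  # half-precision only helps on GPU

# Pass 1: transcribe the Urdu audio to Urdu text
urdu_result = model.transcribe(
    "audio.mp3", language="ur", task="transcribe", fp16=use_fp16
)

# Pass 2: translate the same audio directly to English text
english_result = model.transcribe(
    "audio.mp3", language="ur", task="translate", fp16=use_fp16
)

print(urdu_result["text"])
print(english_result["text"])
```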
### Intent Detection Algorithm
1. **Bilingual Keyword Matching**: Checks for intent keywords in both Urdu and English text
2. **Scoring System**: Assigns scores based on keyword matches
3. **Confidence Calculation**: Calculates confidence based on match frequency and text length
4. **Sentiment Analysis**: Basic sentiment detection using positive/negative keywords
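A minimal sketch of the keyword-scoring idea (not the exact implementation; the confidence formula here is illustrative):
```python
def detect_intent(urdu_text: str, english_text: str, intent_keywords: dict) -> dict:
    """Score each intent by keyword hits in both the Urdu and English text."""
    matches = {}
    for intent, keywords in intent_keywords.items():
        hits = [kw for kw in keywords["urdu"] if kw in urdu_text]
        hits += [kw for kw in keywords["english"] if kw in english_text.lower()]
        if hits:
            matches[intent] = hits

    if not matches:
        return {"type": "unknown", "confidence": 0.0, "keywords": []}

    # Pick the intent with the most matches; confidence grows with match
    # frequency relative to text length, capped below certainty
    intent, hits = max(matches.items(), key=lambda item: len(item[1]))
    total_words = max(len(urdu_text.split()) + len(english_text.split()), 1)
    confidence = min(0.5 + len(hits) / total_words, 0.95)
    return {"type": intent, "confidence": confidence, "keywords": hits}
```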
## 📊 Model Performance
| Model Size | Speed | Accuracy | GPU Memory | Best Use Case |
|------------|-------|----------|------------|---------------|
| **base** | ⚡ Fast | 🟡 Moderate | ~1GB | Quick prototyping |
| **medium** | 🐢 Moderate | 🟢 Good | ~5GB | General purpose |
| **large-v1** | 🐌 Slow | 🟢🟢 Better | ~10GB | Better accuracy needed |
| **large-v2** | 🐌 Slow | 🟢🟢🟢 Very Good | ~10GB | High accuracy required |
| **large-v3** | 🐌🐌 Slowest | 🟢🟢🟢🟢 Excellent | ~10GB | Research/production |
## 🔧 Advanced Configuration
### Custom Intent Keywords
You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in the code:
```python
self.intent_keywords = {
"custom_intent": {
"urdu": ["کلیدی لفظ", "دوسرا لفظ"],
"english": ["keyword1", "keyword2"]
},
# ... existing intents
}
```
### GPU Acceleration
The tool automatically uses GPU if available. Note that removing the `fp16` argument does not change the device; to force CPU usage, load the model on the CPU and disable half-precision (fp16 is not supported on CPU):
```python
# In the code, load the model on the CPU and pass fp16=False:
self.model = whisper.load_model(model_size, device="cpu")

result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    fp16=False  # half-precision is GPU-only
)
```
## 📝 Example Use Cases
1. **Customer Service**: Automatically categorize customer calls
2. **Voice Assistants**: Understand user commands in Urdu
3. **Healthcare**: Triage patient concerns based on urgency
4. **Education**: Analyze student questions in online learning
5. **Business Analytics**: Understand customer feedback from calls
## 🐛 Troubleshooting
### Common Issues
1. **"Audio file not found"**
- Ensure the file path is correct
- Check file permissions
2. **Poor transcription quality**
- Try a larger Whisper model (`--model medium`)
- Ensure clear audio quality
- Check if audio contains Urdu speech
3. **Slow processing**
- Use smaller model (`--model tiny` or `--model base`)
- Ensure GPU is available and properly configured
- Reduce audio file size or duration
4. **FFmpeg errors**
- Reinstall FFmpeg
- Ensure FFmpeg is in system PATH
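A quick sanity check for issues 3 and 4, verifying that Python can see both the GPU and FFmpeg:
```python
import shutil

import torch

print("CUDA available:", torch.cuda.is_available())
print("FFmpeg on PATH:", shutil.which("ffmpeg") or "not found")
```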
### Debug Mode
For debugging, you can enable more verbose output by modifying the code:
```python
# Set verbose=True in transcribe calls
result = self.model.transcribe(
audio_path,
language="ur",
task="translate",
verbose=True, # Add this line
fp16=torch.cuda.is_available()
)
```