🎤 Urdu Speech Intent Recognition using Whisper
A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.
✨ Features
- 🎙️ Urdu Speech Transcription: Accurate transcription of Urdu audio using Whisper
- 🌐 Built-in Translation: Direct Urdu-to-English translation using Whisper's translation capability
- 🎯 Intent Detection: Identifies user intent from conversation (questions, requests, commands, etc.)
- 😊 Sentiment Analysis: Basic sentiment detection (positive/negative/neutral)
- 📊 Confidence Scoring: Provides confidence scores for both transcription and intent detection
- 🔧 Multiple Model Sizes: Support for tiny, base, small, medium, and large Whisper models
- 💾 JSON Export: Option to save results in structured JSON format
- 🎵 Multi-format Support: Works with MP3, WAV, M4A, FLAC, and other common audio formats
📋 Supported Intents
The system can detect the following intents:
| Intent | Description | Example Keywords |
|---|---|---|
| greeting | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
| question | Asking questions | "کیا", "کب", "کیوں", "کسے" |
| request | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
| command | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
| complaint | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
| information | Seeking information | "بتائیں", "جانیں", "تفصیل" |
| emergency | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
| appointment | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
| farewell | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
| thanks | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |
🚀 Quick Start
Installation
- Clone the repository:
git clone https://github.com/yourusername/urdu-intent-recognition.git
cd urdu-intent-recognition
- Install required packages:
pip install openai-whisper torch torchaudio
- Install FFmpeg (required for audio processing):
  - Ubuntu/Debian: sudo apt-get install ffmpeg
  - macOS: brew install ffmpeg
  - Windows: Download from ffmpeg.org
Basic Usage
# Process an Urdu audio file
python urdu_intent_extractor.py path/to/your/audio.mp3
# Use a larger model for better accuracy
python urdu_intent_extractor.py audio.mp3 --model medium
# Save results to JSON file
python urdu_intent_extractor.py audio.mp3 --output results.json
📖 Detailed Usage
Command Line Arguments
python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]
Arguments:
AUDIO_FILE: Path to the audio file to process (required)
Options:
--model: Whisper model size (default: "base")
- Choices: tiny, base, small, medium, large
- Larger models are more accurate but slower
--output: Save results to JSON file
--quiet: Minimal console output
--help: Show help message
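As a rough illustration, the command-line interface described above could be declared with argparse along these lines. This is a sketch based only on the options listed here, not the tool's actual source; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="urdu_intent_extractor.py",
        description="Transcribe Urdu speech and extract its intent.",
    )
    # Required positional argument: the audio file to process
    parser.add_argument(
        "audio_file", metavar="AUDIO_FILE",
        help="Path to the audio file to process",
    )
    # Model size is restricted to the documented choices
    parser.add_argument(
        "--model", default="base",
        choices=["tiny", "base", "small", "medium", "large"],
        help="Whisper model size (larger is more accurate but slower)",
    )
    parser.add_argument("--output", default=None, help="Save results to JSON file")
    parser.add_argument("--quiet", action="store_true", help="Minimal console output")
    return parser


args = build_parser().parse_args(["audio.mp3", "--model", "medium"])
print(args.audio_file, args.model, args.output, args.quiet)
# prints: audio.mp3 medium None False
```

argparse adds --help automatically, and the choices list makes it reject an unsupported model size before any audio is loaded.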
Python API Usage
You can also use the tool programmatically:
from urdu_intent_extractor import UrduIntentExtractor
# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")
# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")
# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
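If you want to persist the results from the API yourself, a minimal sketch follows. It assumes the results dictionary has the shape implied by the keys accessed above; the helper name save_results and the sample values are hypothetical. Passing ensure_ascii=False keeps the Urdu text readable in the file instead of escaping it to \uXXXX sequences.

```python
import json
from pathlib import Path


def save_results(results: dict, output_path: str) -> None:
    """Write results as UTF-8 JSON, keeping Urdu characters unescaped."""
    Path(output_path).write_text(
        json.dumps(results, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )


# Sample data shaped like the keys used above (values are made up)
example = {
    "transcription": {"urdu": "شکریہ", "english": "Thank you"},
    "intent": {"type": "thanks", "confidence": 0.85},
    "sentiment": {"type": "positive"},
}
```

Calling save_results(example, "results.json") would produce a file that can be read back with json.loads.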
Example Output
==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================
📁 File: conversation.mp3
🗣️ URDU TRANSCRIPTION:
السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں
🌐 ENGLISH TRANSLATION:
Hello, I want to ask you a question
🎯 DETECTED INTENT:
❓ Asking a question or seeking clarification
Confidence: 85.0%
Urdu keywords found: سوال
English keywords found: question
😊 SENTIMENT:
NEUTRAL
Confidence: 50.0%
==============================================================
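The neutral result at 50% confidence above is consistent with a keyword-based scorer that found no sentiment words. The sketch below shows one way such a scorer could work. The keyword sets and the confidence formula are assumptions for illustration, not the tool's actual values.

```python
import re

# Hypothetical keyword sets; the real tool's lists are not documented here
POSITIVE = {"شکریہ", "اچھا", "thanks", "thank", "good", "great"}
NEGATIVE = {"شکایت", "مسئلہ", "problem", "bad", "complaint"}


def detect_sentiment(urdu_text: str, english_text: str) -> dict:
    """Count positive and negative keyword hits across both texts."""
    tokens = re.findall(r"\w+", f"{urdu_text} {english_text}".lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos == neg:
        # No evidence either way (or balanced evidence): neutral at 0.5
        return {"type": "neutral", "confidence": 0.5}
    label = "positive" if pos > neg else "negative"
    # Assumed formula: 0.5 baseline plus the share of hits favouring the label
    confidence = 0.5 + 0.5 * abs(pos - neg) / (pos + neg)
    return {"type": label, "confidence": confidence}


result = detect_sentiment(
    "السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں",
    "Hello, I want to ask you a question",
)
print(result["type"], result["confidence"])
# prints: neutral 0.5
```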
🏗️ Architecture
The system works in three main steps:
- Transcription: Whisper transcribes the Urdu audio to Urdu text
- Translation: Whisper translates the Urdu speech directly to English (using task="translate")
- Intent Analysis: Analyzes both Urdu and English text for intent keywords
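The first two steps above can be sketched as two Whisper passes over the same audio, one per task. The function names below are hypothetical, and this is not necessarily how the tool structures its calls. whisper.load_model and model.transcribe with the language and task options are part of the openai-whisper package.

```python
def load_whisper(model_size: str = "base"):
    """Load a Whisper model (imported lazily so the sketch has no hard dependency)."""
    import whisper  # provided by the openai-whisper package
    return whisper.load_model(model_size)


def transcribe_and_translate(model, audio_path: str) -> dict:
    """Run Whisper twice on the same file: Urdu transcript, then English translation."""
    # Pass 1: speech -> Urdu text
    urdu = model.transcribe(audio_path, language="ur", task="transcribe")
    # Pass 2: speech -> English text (Whisper translates from the audio itself)
    english = model.transcribe(audio_path, language="ur", task="translate")
    return {"urdu": urdu["text"].strip(), "english": english["text"].strip()}
```

A call such as transcribe_and_translate(load_whisper("base"), "audio.mp3") returns both texts, which the intent analysis step can then scan for keywords. Running two passes roughly doubles processing time compared with a single call.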
Intent Detection Algorithm
- Bilingual Keyword Matching: Checks for intent keywords in both Urdu and English text
- Scoring System: Assigns scores based on keyword matches
- Confidence Calculation: Calculates confidence based on match frequency and text length
- Sentiment Analysis: Basic sentiment detection using positive/negative keywords
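A minimal sketch of the bilingual matching and scoring steps listed above is shown here. The keyword table is a small illustrative subset, and the confidence formula (a 0.5 baseline plus match density) is an assumption; the tool's real formula may differ. Only single-word keywords are handled, so phrases such as "براہ کرم" would need phrase matching.

```python
import re

# Illustrative subset of the keyword table; single-word keywords only
INTENT_KEYWORDS = {
    "greeting": {"urdu": ["سلام", "ہیلو"], "english": ["hello", "hi"]},
    "question": {
        "urdu": ["کیا", "کب", "کیوں", "سوال"],
        "english": ["what", "when", "why", "question"],
    },
    "thanks": {"urdu": ["شکریہ"], "english": ["thanks", "thank"]},
}


def tokenize(text: str) -> list:
    """Lowercase and split on non-word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def detect_intent(urdu_text: str, english_text: str, keywords=INTENT_KEYWORDS) -> dict:
    """Pick the intent with the most keyword matches across both texts."""
    urdu_tokens = tokenize(urdu_text)
    english_tokens = tokenize(english_text)
    total_tokens = len(urdu_tokens) + len(english_tokens)

    best = {"type": "unknown", "confidence": 0.0,
            "urdu_keywords": [], "english_keywords": []}
    best_score = 0
    for intent, words in keywords.items():
        urdu_hits = [w for w in words["urdu"] if w in urdu_tokens]
        english_hits = [w for w in words["english"] if w in english_tokens]
        score = len(urdu_hits) + len(english_hits)
        if score > best_score:  # ties keep the earlier intent
            best_score = score
            # Assumed formula: baseline 0.5 plus match density, capped at 1.0
            confidence = min(1.0, 0.5 + score / max(total_tokens, 1))
            best = {"type": intent, "confidence": confidence,
                    "urdu_keywords": urdu_hits, "english_keywords": english_hits}
    return best


result = detect_intent(
    "السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں",
    "Hello, I want to ask you a question",
)
print(result["type"], result["urdu_keywords"], result["english_keywords"])
```

In this example "question" wins with two matches (one per language) over "greeting" with one, which shows why scoring both texts can help when a sentence contains keywords from several intents.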
📊 Model Performance
| Model Size | Speed | Accuracy | GPU Memory | Best Use Case |
|---|---|---|---|---|
| tiny | ⚡ Fast | 🟡 Moderate | ~1GB | Quick prototyping |
| base | 🚀 Good | 🟢 Good | ~1GB | General purpose |
| small | 🐢 Moderate | 🟢🟢 Better | ~2GB | Better accuracy needed |
| medium | 🐌 Slow | 🟢🟢🟢 Very Good | ~5GB | High accuracy required |
| large | 🐌🐌 Very Slow | 🟢🟢🟢🟢 Excellent | ~10GB | Research/production |
🔧 Advanced Configuration
Custom Intent Keywords
You can extend or modify the intent keywords by editing the intent_keywords dictionary in the code:
self.intent_keywords = {
"custom_intent": {
"urdu": ["کلیدی لفظ", "دوسرا لفظ"],
"english": ["keyword1", "keyword2"]
},
# ... existing intents
}
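If you would rather not edit the dictionary literal by hand, a small helper can merge new keywords into it. This is a sketch that assumes the dictionary layout shown above; add_intent is a hypothetical name, not part of the tool.

```python
def add_intent(intent_keywords: dict, name: str, urdu: list, english: list) -> dict:
    """Add keywords to an intent, creating it if needed and skipping duplicates."""
    entry = intent_keywords.setdefault(name, {"urdu": [], "english": []})
    for language, words in (("urdu", urdu), ("english", english)):
        for word in words:
            if word not in entry[language]:
                entry[language].append(word)
    return intent_keywords


keywords = {"thanks": {"urdu": ["شکریہ"], "english": ["thanks"]}}
add_intent(keywords, "thanks", ["شکریہ"], ["thank you"])
add_intent(keywords, "custom_intent", ["کلیدی لفظ"], ["keyword1", "keyword2"])
print(sorted(keywords))
# prints: ['custom_intent', 'thanks']
```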
GPU Acceleration
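The tool automatically uses GPU if available. Whisper picks the device when the model is loaded, so removing the fp16 parameter alone does not force CPU usage. To force it, load the model with device="cpu" and disable half precision in the transcribe call:
# Where the model is loaded, pin it to the CPU:
self.model = whisper.load_model(model_size, device="cpu")
# FP16 is not supported on CPU, so request FP32 explicitly:
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    fp16=False
)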
📝 Example Use Cases
- Customer Service: Automatically categorize customer calls
- Voice Assistants: Understand user commands in Urdu
- Healthcare: Triage patient concerns based on urgency
- Education: Analyze student questions in online learning
- Business Analytics: Understand customer feedback from calls
🐛 Troubleshooting
Common Issues
- "Audio file not found"
  - Ensure the file path is correct
  - Check file permissions
- Poor transcription quality
  - Try a larger Whisper model (--model medium)
  - Ensure clear audio quality
  - Check if audio contains Urdu speech
- Slow processing
  - Use smaller model (--model tiny or --model base)
  - Ensure GPU is available and properly configured
  - Reduce audio file size or duration
- FFmpeg errors
  - Reinstall FFmpeg
  - Ensure FFmpeg is in system PATH
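Two of the issues above (a missing or unreadable audio file, and FFmpeg not being on the PATH) can be checked before any model is loaded. The sketch below uses only the standard library; preflight is a hypothetical helper, not part of the tool.

```python
import os
import shutil
from typing import List


def preflight(audio_path: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not os.path.isfile(audio_path):
        problems.append(f"Audio file not found: {audio_path}")
    elif not os.access(audio_path, os.R_OK):
        problems.append(f"Audio file is not readable: {audio_path}")
    # Whisper shells out to the ffmpeg executable to decode audio
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found in system PATH")
    return problems


for problem in preflight("path/to/your/audio.mp3"):
    print("Problem:", problem)
```

Running these checks first gives a clear message up front rather than a failure partway through transcription.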
Debug Mode
For debugging, you can enable more verbose output by modifying the code:
# Set verbose=True in transcribe calls
result = self.model.transcribe(
audio_path,
language="ur",
task="translate",
verbose=True, # Add this line
fp16=torch.cuda.is_available()
)