ammar ahmed ba876f93d4 updated readme

2025-12-08 01:34:01 +05:00

7.6 KiB


🎤 Urdu Speech Intent Recognition using Whisper

A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.

✨ Features

🎙️ Urdu Speech Transcription: Accurate transcription of Urdu audio using Whisper
🌐 Built-in Translation: Direct Urdu-to-English translation using Whisper's translation capability
🎯 Intent Detection: Identifies user intent from conversation (questions, requests, commands, etc.)
😊 Sentiment Analysis: Basic sentiment detection (positive/negative/neutral)
📊 Confidence Scoring: Provides confidence scores for both transcription and intent detection
🔧 Multiple Model Sizes: Support for tiny, base, small, medium, and large Whisper models
💾 JSON Export: Option to save results in structured JSON format
🎵 Multi-format Support: Works with MP3, WAV, M4A, FLAC, and other common audio formats

📋 Supported Intents

The system can detect the following intents:

Intent	Description	Example Keywords
greeting	Starting a conversation	"سلام", "ہیلو", "السلام علیکم"
question	Asking questions	"کیا", "کب", "کیوں", "کسے"
request	Making requests	"براہ کرم", "مہربانی", "مدد چاہیے"
command	Giving commands	"کرو", "لاؤ", "دیں", "بناؤ"
complaint	Expressing complaints	"شکایت", "مسئلہ", "پریشانی"
information	Seeking information	"بتائیں", "جانیں", "تفصیل"
emergency	Emergency situations	"حادثہ", "ایمرجنسی", "فوری"
appointment	Scheduling meetings	"ملاقات", "اپائنٹمنٹ", "تاریخ"
farewell	Ending conversations	"اللہ حافظ", "خدا حافظ", "اختتام"
thanks	Expressing gratitude	"شکریہ", "آپ کا بہت شکریہ"

🚀 Quick Start

Installation

Clone the repository:

git clone https://github.com/yourusername/urdu-intent-recognition.git
cd urdu-intent-recognition

Install required packages:

pip install openai-whisper torch torchaudio

Install FFmpeg (required for audio processing):
- Ubuntu/Debian: sudo apt-get install ffmpeg
- macOS: brew install ffmpeg
- Windows: Download from ffmpeg.org

Basic Usage

# Process an Urdu audio file
python urdu_intent_extractor.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python urdu_intent_extractor.py audio.mp3 --model medium

# Save results to JSON file
python urdu_intent_extractor.py audio.mp3 --output results.json

📖 Detailed Usage

Command Line Arguments

python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]

Arguments:

AUDIO_FILE: Path to the audio file to process (required)

Options:

--model: Whisper model size (default: "base")
- Choices: tiny, base, small, medium, large
- Larger models are more accurate but slower
--output: Save results to JSON file
--quiet: Minimal console output
--help: Show help message
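The options above map naturally onto Python's standard argparse module. The following is a minimal sketch of such a parser, not the tool's actual source; the function name build_parser and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="urdu_intent_extractor.py",
        description="Transcribe Urdu speech, translate it, and detect intent.",
    )
    # Required positional argument.
    parser.add_argument("audio_file", metavar="AUDIO_FILE",
                        help="Path to the audio file to process")
    # Restricting choices makes argparse reject unknown model names.
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size (default: base)")
    parser.add_argument("--output", metavar="PATH", default=None,
                        help="Save results to a JSON file")
    parser.add_argument("--quiet", action="store_true",
                        help="Minimal console output")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["audio.mp3", "--model", "medium"])
print(args.audio_file, args.model, args.output, args.quiet)
```

argparse adds --help automatically, so it does not need its own add_argument call.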

Python API Usage

You can also use the tool programmatically:

from urdu_intent_extractor import UrduIntentExtractor

# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
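If you use the Python API and want the same kind of JSON file that --output produces, a small helper along these lines may be enough. This is a sketch under assumptions: save_results is a hypothetical name, and example_results only imitates the keys shown above with made-up values.

```python
import json
from pathlib import Path


def save_results(results: dict, output_path: str) -> Path:
    """Write a results dictionary to a UTF-8 JSON file and return its path."""
    path = Path(output_path)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Urdu text readable instead of \uXXXX escapes.
        json.dump(results, f, ensure_ascii=False, indent=2)
    return path


# Hypothetical results shaped like the keys used in the API example.
example_results = {
    "transcription": {"urdu": "شکریہ", "english": "Thank you"},
    "intent": {"type": "thanks", "confidence": 0.85},
    "sentiment": {"type": "positive"},
}
```

Calling save_results(results, "results.json") after process_audio_file would then store the full dictionary on disk.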

Example Output

==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================

📁 File: conversation.mp3

🗣️ URDU TRANSCRIPTION:
   السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں

🌐 ENGLISH TRANSLATION:
   Hello, I want to ask you a question

🎯 DETECTED INTENT:
   ❓ Asking a question or seeking clarification
   Confidence: 85.0%
   Urdu keywords found: سوال
   English keywords found: question

😊 SENTIMENT:
   NEUTRAL
   Confidence: 50.0%

==============================================================

🏗️ Architecture

The system works in three main steps:

Transcription: Whisper transcribes the Urdu audio to Urdu text
Translation: Whisper translates the Urdu speech directly to English text (using task="translate"); this pass works from the audio, not from the Urdu transcript
Intent Analysis: Both the Urdu and the English text are scanned for intent keywords
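The first two steps can be sketched as two Whisper passes over the same audio file, one per task. This is an illustration rather than the tool's actual code: transcribe_and_translate is a hypothetical helper, and the model argument is assumed to be an object returned by whisper.load_model.

```python
from typing import Any, Dict


def transcribe_and_translate(model: Any, audio_path: str) -> Dict[str, str]:
    """Run the model twice on the same audio: Urdu text, then English text."""
    # Pass 1: speech to text in the source language.
    urdu = model.transcribe(audio_path, language="ur", task="transcribe")
    # Pass 2: speech to English text; Whisper translates from the audio itself.
    english = model.transcribe(audio_path, language="ur", task="translate")
    return {"urdu": urdu["text"].strip(), "english": english["text"].strip()}


# Typical use with the openai-whisper package installed:
#   import whisper
#   model = whisper.load_model("base")
#   texts = transcribe_and_translate(model, "path/to/audio.mp3")
```

Two passes roughly double the processing time, which is one reason model size matters for longer recordings.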

Intent Detection Algorithm

Bilingual Keyword Matching: Checks for intent keywords in both Urdu and English text
Scoring System: Assigns scores based on keyword matches
Confidence Calculation: Calculates confidence based on match frequency and text length
Sentiment Analysis: Basic sentiment detection using positive/negative keywords
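The matching and scoring steps can be illustrated with a short sketch. It is not the tool's actual implementation: the keyword table is a small subset, the English keywords are assumptions, and the confidence formula (a base of 0.5 plus match density, capped at 1.0) is a hypothetical stand-in for whatever the code really computes.

```python
import re
from typing import Dict, List, Optional

# Illustrative subset of the keyword table; English entries are assumptions.
INTENT_KEYWORDS: Dict[str, Dict[str, List[str]]] = {
    "greeting": {"urdu": ["سلام", "ہیلو"], "english": ["hello", "hi"]},
    "question": {"urdu": ["کیا", "کب", "کیوں"], "english": ["what", "when", "why"]},
    "thanks": {"urdu": ["شکریہ"], "english": ["thank", "thanks"]},
}


def detect_intent(urdu_text: str, english_text: str,
                  keywords: Optional[dict] = None) -> dict:
    """Pick the intent with the most keyword matches across both languages."""
    keywords = INTENT_KEYWORDS if keywords is None else keywords
    # Whole-word matching for English avoids hits such as "hi" inside "this".
    english_words = set(re.findall(r"[a-z']+", english_text.lower()))
    total_words = max(len(urdu_text.split()) + len(english_text.split()), 1)

    best = {"type": "unknown", "confidence": 0.0,
            "urdu_matches": [], "english_matches": []}
    best_score = 0
    for intent, words in keywords.items():
        # Substring matching for Urdu also catches multi-word phrases.
        urdu_hits = [w for w in words["urdu"] if w in urdu_text]
        english_hits = [w for w in words["english"] if w in english_words]
        score = len(urdu_hits) + len(english_hits)
        if score > best_score:
            best_score = score
            # Hypothetical formula: more matches in a shorter text score higher.
            confidence = min(1.0, 0.5 + score / total_words)
            best = {"type": intent, "confidence": confidence,
                    "urdu_matches": urdu_hits, "english_matches": english_hits}
    return best
```

Substring matching on the Urdu side is simple but can over-match short keywords that occur inside longer words, so a production version would likely need tokenization as well.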

📊 Model Performance

Model Size	Speed	Accuracy	GPU Memory	Best Use Case
tiny	⚡ Fast	🟡 Moderate	~1GB	Quick prototyping
base	🚀 Good	🟢 Good	~1GB	General purpose
small	🐢 Moderate	🟢🟢 Better	~2GB	Better accuracy needed
medium	🐌 Slow	🟢🟢🟢 Very Good	~5GB	High accuracy required
large	🐌🐌 Very Slow	🟢🟢🟢🟢 Excellent	~10GB	Research/production
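One way to use these figures is to pick the largest model that fits the GPU memory you have. The sketch below assumes the five sizes accepted by --model and the approximate memory needs listed in this section (~1, ~1, ~2, ~5 and ~10 GB from smallest to largest); pick_model and the headroom default are hypothetical.

```python
from typing import Optional

# Approximate GPU memory needs in GB, smallest model first (rough figures).
MODEL_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def pick_model(available_gb: float, headroom_gb: float = 0.5) -> Optional[str]:
    """Return the largest model that fits, or None if even the smallest does not."""
    best = None
    for name, need_gb in MODEL_VRAM_GB:
        # Keep some headroom for audio buffers and other processes on the GPU.
        if need_gb + headroom_gb <= available_gb:
            best = name
    return best


print(pick_model(6.0))  # a 6 GB card fits "medium" with the default headroom
```

The returned name can be passed straight to --model or to the model_size argument of the Python API.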

🔧 Advanced Configuration

Custom Intent Keywords

You can extend or modify the intent keywords by editing the intent_keywords dictionary in the code:

self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
}
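If you would rather not edit the source, the same dictionary shape can be extended at runtime. This is a sketch assuming intent_keywords is an ordinary attribute holding the structure shown above; with_custom_intent is a hypothetical helper, not part of the tool.

```python
import copy
from typing import Dict, List


def with_custom_intent(intent_keywords: Dict[str, Dict[str, List[str]]],
                       name: str, urdu: List[str], english: List[str]) -> dict:
    """Return a copy of the keyword table with one intent added or extended."""
    merged = copy.deepcopy(intent_keywords)  # leave the original table untouched
    entry = merged.setdefault(name, {"urdu": [], "english": []})
    for lang, words in (("urdu", urdu), ("english", english)):
        for word in words:
            if word not in entry[lang]:  # skip duplicates
                entry[lang].append(word)
    return merged


# Possible use on an existing extractor (attribute name taken from the text above):
#   extractor.intent_keywords = with_custom_intent(
#       extractor.intent_keywords, "billing", ["بل"], ["bill", "invoice"])
```

Working on a copy keeps the default keywords intact, which makes it easier to compare results with and without the custom intent.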

GPU Acceleration

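The tool automatically uses the GPU if one is available. To force CPU usage:

# In the code, load the model on the CPU and disable half precision.
# (model_size stands for whatever name the loading code already uses.)
self.model = whisper.load_model(model_size, device="cpu")
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    fp16=False  # FP16 is not supported on CPU
)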
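Device selection can also be wrapped in a small helper so that one flag controls both the device and the fp16 setting. This is a sketch, not the tool's code: pick_device and its force_cpu parameter are hypothetical, and PyTorch is imported lazily so the function still works where it is not installed.

```python
from typing import Tuple


def pick_device(force_cpu: bool = False) -> Tuple[str, bool]:
    """Return (device, fp16) suitable for whisper.load_model and transcribe."""
    if force_cpu:
        return "cpu", False
    try:
        import torch  # imported lazily; falls back to CPU if missing
    except ImportError:
        return "cpu", False
    use_gpu = torch.cuda.is_available()
    # Half precision only makes sense on a CUDA device.
    return ("cuda" if use_gpu else "cpu"), use_gpu


# Possible use with openai-whisper:
#   device, fp16 = pick_device(force_cpu=True)
#   model = whisper.load_model("base", device=device)
#   result = model.transcribe(audio_path, language="ur", task="translate", fp16=fp16)
```

Setting the device when the model is loaded is what actually decides where inference runs; the fp16 flag only selects the numeric precision.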

📝 Example Use Cases

Customer Service: Automatically categorize customer calls
Voice Assistants: Understand user commands in Urdu
Healthcare: Triage patient concerns based on urgency
Education: Analyze student questions in online learning
Business Analytics: Understand customer feedback from calls

🐛 Troubleshooting

Common Issues

"Audio file not found"
- Ensure the file path is correct
- Check file permissions
Poor transcription quality
- Try a larger Whisper model (--model medium)
- Ensure clear audio quality
- Check that the audio actually contains Urdu speech
Slow processing
- Use a smaller model (--model tiny or --model base)
- Ensure a GPU is available and properly configured
- Reduce audio file size or duration
FFmpeg errors
- Reinstall FFmpeg
- Ensure FFmpeg is in system PATH
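Several of these checks can be automated before any model is loaded. The following is a minimal sketch using only the standard library; preflight is a hypothetical helper and is not part of the tool.

```python
import os
import shutil
from typing import List


def preflight(audio_path: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    if not os.path.isfile(audio_path):
        problems.append(f"Audio file not found: {audio_path}")
    elif not os.access(audio_path, os.R_OK):
        problems.append(f"Audio file is not readable: {audio_path}")
    # shutil.which searches the system PATH, as a shell would.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg is not on the system PATH")
    return problems


for problem in preflight("path/to/your/audio.mp3"):
    print("WARNING:", problem)
```

Running this first turns a late FFmpeg or file error into an immediate, readable message.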

Debug Mode

For debugging, you can enable more verbose output by modifying the code:

# Set verbose=True in transcribe calls
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # Add this line
    fp16=torch.cuda.is_available()
)
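Besides verbose output, the dictionary returned by Whisper's transcribe call carries a "segments" list whose entries include "start", "end" and "text", which helps locate where a transcription goes wrong. The sketch below formats those segments; format_segments is a hypothetical helper.

```python
from typing import List


def format_segments(result: dict) -> List[str]:
    """Turn a Whisper result into one timestamped line per segment."""
    lines = []
    for seg in result.get("segments", []):
        # start and end are offsets in seconds from the beginning of the audio.
        lines.append(f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {seg['text'].strip()}")
    return lines


# Possible use after a transcribe call:
#   for line in format_segments(result):
#       print(line)
```

Comparing these timestamps with the audio makes it easier to tell a noisy passage from a problem with the model size.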
