
# 🎤 Urdu Speech Intent Recognition using Whisper

A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.

## Features

- 🎙️ **Urdu Speech Transcription**: Accurate transcription of Urdu audio using Whisper
- 🌐 **Built-in Translation**: Direct Urdu-to-English translation using Whisper's translation capability
- 🎯 **Intent Detection**: Identifies user intent from conversation (questions, requests, commands, etc.)
- 😊 **Sentiment Analysis**: Basic sentiment detection (positive/negative/neutral)
- 📊 **Confidence Scoring**: Provides confidence scores for both transcription and intent detection
- 🔧 **Multiple Model Sizes**: Supports several Whisper model sizes, from `tiny` through `large-v3`
- 💾 **JSON Export**: Option to save results in structured JSON format
- 🎵 **Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other common audio formats

## 📋 Supported Intents

The system can detect the following intents:

| Intent | Description | Example Keywords |
| --- | --- | --- |
| greeting | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
| question | Asking questions | "کیا", "کب", "کیوں", "کسے" |
| request | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
| command | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
| complaint | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
| information | Seeking information | "بتائیں", "جانیں", "تفصیل" |
| emergency | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
| appointment | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
| farewell | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
| thanks | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |
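
As a minimal illustration of how a keyword table like this drives detection (the names below are illustrative, not the tool's actual API; a small subset of the keywords stands in for the full table):

```python
# Minimal sketch: map a few Urdu keywords to intents (hypothetical subset
# of the full keyword table described above).
INTENT_KEYWORDS = {
    "greeting": ["سلام", "ہیلو"],
    "question": ["کیا", "کب", "کیوں"],
    "thanks": ["شکریہ"],
}

def match_intent(text: str) -> str:
    """Return the first intent whose keyword appears in the text, else 'unknown'."""
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return intent
    return "unknown"

print(match_intent("آپ کا بہت شکریہ"))  # thanks
```

The real extractor goes further (bilingual matching and scoring, described under "Intent Detection Algorithm" below), but the core lookup is this simple.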

## 🚀 Quick Start

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/urdu-intent-recognition.git
   cd urdu-intent-recognition
   ```

2. Install the required packages:

   ```bash
   pip install openai-whisper torch torchaudio
   ```

3. Install FFmpeg (required for audio processing):

   - Ubuntu/Debian: `sudo apt-get install ffmpeg`
   - macOS: `brew install ffmpeg`
   - Windows: download a build from ffmpeg.org

### Basic Usage

```bash
# Process an Urdu audio file
python urdu_intent_extractor.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python urdu_intent_extractor.py audio.mp3 --model medium

# Save results to a JSON file
python urdu_intent_extractor.py audio.mp3 --output results.json
```

## 📖 Detailed Usage

### Command Line Arguments

```bash
python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]
```

Arguments:

- `AUDIO_FILE`: Path to the audio file to process (required)

Options:

- `--model`: Whisper model size (default: `base`)
  - Choices: `tiny`, `base`, `small`, `medium`, `large`
  - Larger models are more accurate but slower
- `--output`: Save results to a JSON file
- `--quiet`: Minimal console output
- `--help`: Show the help message

### Python API Usage

You can also use the tool programmatically:

```python
from urdu_intent_extractor import UrduIntentExtractor

# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
```
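
If you want to persist results from the API the way the `--output` flag does, a minimal sketch (the dict shape mirrors the fields accessed above; the sample values are placeholders):

```python
import json

# Sketch: save a results dict to JSON. ensure_ascii=False keeps the
# Urdu text human-readable instead of escaping it to \uXXXX sequences.
results = {
    "transcription": {"urdu": "شکریہ", "english": "Thank you"},
    "intent": {"type": "thanks", "confidence": 0.85},
    "sentiment": {"type": "positive", "confidence": 0.6},
}

with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```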

### Example Output

```text
==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================

📁 File: conversation.mp3

🗣️ URDU TRANSCRIPTION:
   السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں

🌐 ENGLISH TRANSLATION:
   Hello, I want to ask you a question

🎯 DETECTED INTENT:
   ❓ Asking a question or seeking clarification
   Confidence: 85.0%
   Urdu keywords found: سوال
   English keywords found: question

😊 SENTIMENT:
   NEUTRAL
   Confidence: 50.0%

==============================================================
```

## 🏗️ Architecture

The system works in three main steps:

1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text
2. **Translation**: Whisper translates the same audio directly to English (using `task="translate"`)
3. **Intent Analysis**: Both the Urdu and English text are analyzed for intent keywords
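
The three steps can be sketched as a single function (names are illustrative, not the tool's actual API; the model object is passed in, so any object with a Whisper-style `transcribe()` method works):

```python
def analyze(model, audio_path):
    """Run the three-step pipeline: transcribe, translate, analyze.

    `model` is any object exposing a Whisper-style transcribe() method.
    """
    # Step 1: Urdu transcription
    urdu = model.transcribe(audio_path, language="ur", task="transcribe")["text"]
    # Step 2: Urdu-to-English translation (from the same audio)
    english = model.transcribe(audio_path, language="ur", task="translate")["text"]
    # Step 3: keyword-based intent analysis over both texts
    # (placeholder: one keyword pair stands in for the full bilingual table)
    intent = "question" if ("سوال" in urdu or "question" in english) else "unknown"
    return {"urdu": urdu, "english": english, "intent": intent}
```

Injecting the model this way also makes the pipeline easy to unit-test with a stub transcriber, without downloading a Whisper checkpoint.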

### Intent Detection Algorithm

1. **Bilingual Keyword Matching**: Checks for intent keywords in both the Urdu and English text
2. **Scoring System**: Assigns scores based on keyword matches
3. **Confidence Calculation**: Derives confidence from match frequency and text length
4. **Sentiment Analysis**: Basic sentiment detection using positive/negative keywords
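
A minimal sketch of steps 1–3, assuming a keyword table shaped like the `intent_keywords` dictionary shown under Advanced Configuration (the confidence formula here is illustrative, not the tool's exact one):

```python
def score_intents(urdu_text, english_text, intent_keywords):
    """Count bilingual keyword hits per intent and derive a confidence score."""
    scores = {}
    for intent, kws in intent_keywords.items():
        hits = [k for k in kws["urdu"] if k in urdu_text]
        hits += [k for k in kws["english"] if k in english_text.lower()]
        if hits:
            scores[intent] = hits

    if not scores:
        return "unknown", 0.0

    # Step 2: the intent with the most keyword hits wins
    best = max(scores, key=lambda i: len(scores[i]))
    # Step 3: confidence grows with hit count, damped by text length
    # (assumed formula, capped at 1.0)
    n_words = max(len(urdu_text.split()) + len(english_text.split()), 1)
    confidence = min(1.0, 0.5 + 2 * len(scores[best]) / n_words)
    return best, confidence
```

For example, with a one-entry table `{"question": {"urdu": ["سوال"], "english": ["question"]}}`, the pair ("ایک سوال ہے", "i have a question") matches both keywords and returns the `question` intent with high confidence.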

## 📊 Model Performance

| Model | Speed | Accuracy | GPU Memory | Best Use Case |
| --- | --- | --- | --- | --- |
| base | 🚀 Fast | 🟡 Moderate | ~1 GB | Quick prototyping |
| medium | 🐢 Moderate | 🟢 Good | ~5 GB | General purpose |
| large-v1 | 🐌 Slow | 🟢🟢 Better | ~10 GB | Better accuracy needed |
| large-v2 | 🐌 Slow | 🟢🟢🟢 Very good | ~10 GB | High accuracy required |
| large-v3 | 🐌🐌 Slowest | 🟢🟢🟢🟢 Excellent | ~10 GB | Research/production |

## 🔧 Advanced Configuration

### Custom Intent Keywords

You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in the code:

```python
self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
}
```

### GPU Acceleration

The tool automatically uses the GPU if one is available. To force CPU inference, load the model on the CPU and disable half precision (passing `fp16=False` also silences Whisper's FP16-on-CPU warning):

```python
# Load the model on the CPU and disable half precision
self.model = whisper.load_model(model_size, device="cpu")
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    fp16=False,  # half precision is not supported on CPU
)
```

## 📝 Example Use Cases

1. **Customer Service**: Automatically categorize customer calls
2. **Voice Assistants**: Understand user commands in Urdu
3. **Healthcare**: Triage patient concerns based on urgency
4. **Education**: Analyze student questions in online learning
5. **Business Analytics**: Understand customer feedback from calls

## 🐛 Troubleshooting

### Common Issues

1. **"Audio file not found"**
   - Ensure the file path is correct
   - Check file permissions
2. **Poor transcription quality**
   - Try a larger Whisper model (`--model medium`)
   - Ensure the recording is clear
   - Check that the audio actually contains Urdu speech
3. **Slow processing**
   - Use a smaller model (`--model tiny` or `--model base`)
   - Ensure a GPU is available and properly configured
   - Reduce the audio file's size or duration
4. **FFmpeg errors**
   - Reinstall FFmpeg
   - Ensure FFmpeg is on the system PATH
### Debug Mode

For debugging, you can enable more verbose output by modifying the code:

```python
# Set verbose=True in transcribe calls
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # add this line
    fp16=torch.cuda.is_available()
)
```