Compare commits

5 commits (b4f224796d...8eee8af4f2):

- 8eee8af4f2
- 6d8738b231
- 71ba6001f6
- ba876f93d4
- 329222934f

README.md (232 lines changed)

# salam_bot

# 🎤 Urdu Speech Intent Recognition using Whisper

A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.

## ✨ Features

- **🎙️ Urdu Speech Transcription**: Transcribes Urdu audio with Whisper
- **🌐 Built-in Translation**: Translates Urdu speech directly to English using Whisper's translation task
- **🎯 Intent Detection**: Identifies the main intent of the conversation (greeting, question, request, command, etc.) by bilingual keyword matching
- **😊 Sentiment Analysis**: Basic keyword-based sentiment detection (positive/negative/neutral) on the English translation
- **📊 Confidence Scoring**: Provides heuristic confidence scores for the detected intent and the sentiment
- **🔧 Multiple Model Sizes**: Supports the base, medium, large-v1, large-v2, and large-v3 Whisper models from the command line
- **💾 JSON Export**: Optionally saves results in structured JSON format
- **🎵 Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other formats that FFmpeg can decode

## 📋 Supported Intents

The system can detect the following intents:

| Intent | Description | Example Urdu keywords | Example English keywords |
|--------|-------------|-----------------------|--------------------------|
| **greeting** | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" | "hello", "hi", "good morning" |
| **question** | Asking questions | "کیا", "کب", "کیوں", "کسے" | "what", "when", "why", "who" |
| **request** | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" | "please", "kindly", "could you" |
| **command** | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" | "do", "bring", "give", "make" |
| **complaint** | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" | "complaint", "problem", "issue" |
| **information** | Seeking information | "بتائیں", "جانیں", "تفصیل" | "tell", "know", "details" |
| **emergency** | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" | "accident", "emergency", "urgent" |
| **appointment** | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" | "meeting", "appointment", "date" |
| **farewell** | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" | "goodbye", "bye", "end" |
| **thanks** | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" | "thank", "thanks", "grateful" |

If no keyword matches, the intent is reported as `general_conversation`.
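
In `helpers/audio_analysis.py` each keyword is matched with a plain substring check (`keyword in text`), so short English keywords can fire inside longer words: "hi" inside "this", "go" inside "good", "do" inside "document". The sketch below shows one possible alternative that is not part of the tool: a regular-expression check that only accepts whole words or phrases. The function name `find_keywords` is hypothetical.

```python
import re
from typing import List


def find_keywords(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords that occur in text as whole words or phrases."""
    found = []
    for keyword in keywords:
        # (?<!\w) and (?!\w) forbid word characters right before/after the keyword,
        # which also works for multi-word phrases such as "how much".
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found


english = "this is a good document"
print([kw for kw in ["hi", "go", "do"] if kw in english])  # ['hi', 'go', 'do']
print(find_keywords(english, ["hi", "go", "do"]))          # []
```

The trade-off: Python's `\w` also covers Urdu letters, so with word boundaries "سلام" no longer matches inside "السلام". The Urdu lists would then need every form you want to catch (the greeting list already contains "السلام علیکم").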

## 🚀 Quick Start

### Installation

1. **Clone the repository:**

   ```bash
   git clone https://github.com/yourusername/urdu-intent-recognition.git
   cd urdu-intent-recognition
   ```

2. **Install required packages:**

   ```bash
   pip install openai-whisper torch torchaudio
   # or install everything listed in requirements.txt:
   pip install -r requirements.txt
   ```

3. **Install FFmpeg (required for audio processing):**
   - **Ubuntu/Debian:** `sudo apt-get install ffmpeg`
   - **macOS:** `brew install ffmpeg`
   - **Windows:** Download from [ffmpeg.org](https://ffmpeg.org/download.html)

### Basic Usage

```bash
# Process an Urdu audio file
python main.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python main.py audio.mp3 --model medium

# Save results to a JSON file
python main.py audio.mp3 --output results.json
```

## 📖 Detailed Usage

### Command Line Arguments

```bash
python main.py AUDIO_FILE [OPTIONS]
```

**Arguments:**

- `AUDIO_FILE`: Path to the audio file to process (required)

**Options:**

- `--model`: Whisper model (default: `base`)
  - Choices: `base`, `medium`, `large-v1`, `large-v2`, `large-v3`
  - Larger models are more accurate but slower and need more memory
- `--output`: Save results to the given JSON file
- `--quiet`: Skip the formatted results report (model-loading and progress messages are still printed)
- `--help`: Show help message

### Python API Usage

You can also use the tool programmatically (run from the repository root so that the `helpers` package is importable):

```python
from helpers.audio_analysis import UrduIntentExtractor

# Initialize the extractor (the class default is "large-v3"; "base" is much lighter)
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file; verbose=False skips the formatted report
results = extractor.process_audio_file("path/to/audio.mp3", verbose=False)

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
```
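
The dictionary returned by `process_audio_file()` can be post-processed directly, for example to log one line per call. A minimal sketch, assuming the result layout built in `process_audio_file()` (`file`, `intent`, and `sentiment` keys); the `summarize` helper and its 0.5 threshold are hypothetical choices, not part of the tool:

```python
from typing import Any, Dict


def summarize(results: Dict[str, Any], min_confidence: float = 0.5) -> str:
    """One-line summary of a process_audio_file() result dictionary."""
    intent = results["intent"]
    label = intent["type"]
    # Hypothetical policy: flag weak keyword evidence instead of trusting it.
    if intent["confidence"] < min_confidence:
        label += " (uncertain)"
    return (f"{results['file']}: {label} [{intent['confidence']:.0%}], "
            f"sentiment={results['sentiment']['type']}")


# Hand-written sample with the same keys as a real result (not real output).
example = {
    "file": "Recording.mp3",
    "intent": {"type": "general_conversation", "confidence": 0.3},
    "sentiment": {"type": "neutral", "confidence": 0.5},
}
print(summarize(example))
```

The threshold matters because the fallback `general_conversation` intent always carries a confidence of 0.3, so it is always flagged here.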

### Example Output

For an audio file with the Urdu transcription and English translation shown below, the report printed by `print_results()` looks like this:

```
======================================================================
URDU SPEECH INTENT ANALYSIS RESULTS
======================================================================

📁 File: conversation.mp3

🗣️ URDU TRANSCRIPTION:
   السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں

🌐 ENGLISH TRANSLATION:
   Hello, I want to ask you a question

🎯 DETECTED INTENT:
   👋 Greeting or starting a conversation
   Confidence: 100.0%
   Urdu keywords found: سلام, السلام علیکم
   English keywords found: hello

😊 SENTIMENT:
   NEUTRAL
   Confidence: 50.0%

======================================================================
```

The greeting wins here (three keyword matches against two for `request`) because neither "سوال" nor "question" is in the `question` keyword lists; add them under Custom Intent Keywords if such sentences should count as questions.

## 🏗️ Architecture

The system works in three main steps:

1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text (`language="ur"`, `task="transcribe"`)
2. **Translation**: Whisper makes a second pass over the same audio with `task="translate"`, producing English text directly from the speech rather than from the Urdu transcript
3. **Intent and Sentiment Analysis**: Both texts are scanned for intent keywords, and the English text is scanned for sentiment keywords
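
The two Whisper passes from steps 1 and 2 can be sketched as follows. This is a minimal, hypothetical restatement of `transcribe_and_translate()` from `helpers/audio_analysis.py` with the model passed in as a parameter, so the flow can be exercised with a stand-in `FakeModel` instead of downloading Whisper weights. With a real model it performs two full decodes of the same file.

```python
from typing import Any, Dict, List, Protocol


class TranscribingModel(Protocol):
    """Anything with a Whisper-style transcribe(audio, **options) method."""

    def transcribe(self, audio: str, **options: Any) -> Dict[str, Any]: ...


def transcribe_and_translate(model: TranscribingModel, audio_path: str,
                             use_fp16: bool = False) -> Dict[str, str]:
    """Run the two passes used by the tool: Urdu transcript, then English."""
    options = {"language": "ur", "fp16": use_fp16}
    # Pass 1: decode the audio as Urdu text.
    urdu = model.transcribe(audio_path, task="transcribe", **options)
    # Pass 2: decode the same audio again, this time straight into English.
    english = model.transcribe(audio_path, task="translate", **options)
    return {"urdu": urdu["text"].strip(), "english": english["text"].strip()}


class FakeModel:
    """Stand-in for a Whisper model so the flow runs without any downloads."""

    def __init__(self) -> None:
        self.tasks: List[str] = []

    def transcribe(self, audio: str, **options: Any) -> Dict[str, Any]:
        self.tasks.append(options["task"])
        text = " السلام علیکم " if options["task"] == "transcribe" else " Hello "
        return {"text": text, "segments": []}


# With the real library this would be, e.g.:
#   model = whisper.load_model("base")
#   transcribe_and_translate(model, "Recording.mp3", use_fp16=torch.cuda.is_available())
fake = FakeModel()
print(transcribe_and_translate(fake, "Recording.mp3"))
print(fake.tasks)  # ['transcribe', 'translate']
```

Because both passes decode the audio independently, the English text can occasionally disagree with the Urdu transcript; the keyword matcher simply treats them as two separate pieces of evidence.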
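
### Intent Detection Algorithm

1. **Bilingual Keyword Matching**: For each intent, checks which Urdu keywords occur in the Urdu transcription and which English keywords occur in the lowercased English translation (plain substring checks)
2. **Scoring System**: An intent's score is the number of matched keywords in both languages; the highest-scoring intent wins, with ties going to the intent listed first in `intent_keywords`. If nothing matches, the result is `general_conversation` with a fixed confidence of 0.3
3. **Confidence Calculation**: The winning score is divided by `max(1, total_words / 5)`, where `total_words` counts the words of both texts; the result is multiplied by 1.5 if keywords matched in both languages and capped at 1.0
4. **Sentiment Analysis**: Counts positive and negative keywords in the English translation; the larger group wins, with confidence equal to its share of all sentiment matches. A tie, including no matches at all, gives `neutral` with confidence 0.5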
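
Steps 2 and 3 reduce to a few lines of arithmetic. A minimal sketch mirroring the confidence computation inside `extract_intent()`; the standalone `intent_confidence` function is hypothetical (the tool computes this inline):

```python
def intent_confidence(urdu_text: str, english_text: str,
                      urdu_score: int, english_score: int) -> float:
    """Confidence for the winning intent, mirroring extract_intent()."""
    total_score = urdu_score + english_score
    # Word counts from both languages are added together.
    total_words = len(english_text.split()) + len(urdu_text.split())
    # Roughly "matches per five words", never dividing by less than 1.
    confidence = total_score / max(1, total_words / 5)
    # Bonus when the same intent was found in both languages.
    if urdu_score > 0 and english_score > 0:
        confidence *= 1.5
    return min(confidence, 1.0)


urdu = " ".join(["لفظ"] * 10)       # 10 placeholder Urdu words
english = " ".join(["word"] * 10)   # 10 placeholder English words
print(intent_confidence(urdu, english, 0, 1))  # 1 / 4 = 0.25
print(intent_confidence(urdu, english, 1, 1))  # 2 / 4 * 1.5 = 0.75
```

Because the divisor grows with transcript length, a single keyword in a long call yields a low score, while a short utterance that matches in both languages reaches the cap quickly: three matches split across both languages in 18 words give 3 / 3.6 ≈ 0.83, which the 1.5 bonus pushes past 1.0.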

## 📊 Model Performance

| Model | Parameters | Approx. VRAM | Speed | Accuracy | Best Use Case |
|-------|------------|--------------|-------|----------|---------------|
| **base** | 74M | ~1 GB | ⚡ Fast | 🟡 Moderate | Quick prototyping |
| **medium** | 769M | ~5 GB | 🚀 Moderate | 🟢 Good | General purpose |
| **large-v1** | 1550M | ~10 GB | 🐌 Slow | 🟢🟢 Better | Superseded by large-v2 and large-v3 |
| **large-v2** | 1550M | ~10 GB | 🐌 Slow | 🟢🟢 Better | High accuracy required |
| **large-v3** | 1550M | ~10 GB | 🐌 Slow | 🟢🟢🟢 Best | Research/production |

Parameter counts and VRAM figures follow the openai-whisper README. The three large variants have the same size, so they need about the same memory and run at about the same speed. Treat the accuracy column as a rough guide: results on Urdu depend heavily on audio quality.
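
If the tool runs on machines with different GPUs, the model can be chosen from the free GPU memory. A rough sketch, assuming the approximate VRAM figures from the openai-whisper README (base ~1 GB, medium ~5 GB, large ~10 GB); `pick_model` is a hypothetical helper, and in practice you would leave some headroom above these numbers:

```python
# Approximate VRAM per model in GB, from the openai-whisper README.
APPROX_VRAM_GB = {"base": 1, "medium": 5, "large-v3": 10}


def pick_model(free_vram_gb: float) -> str:
    """Return the largest listed model whose approximate VRAM need fits."""
    for name in ("large-v3", "medium", "base"):
        if APPROX_VRAM_GB[name] <= free_vram_gb:
            return name
    # Nothing fits on the GPU: fall back to the smallest CLI choice.
    return "base"


# With PyTorch, free GPU memory can be queried, for example:
#   free_bytes, _total = torch.cuda.mem_get_info()
#   model_name = pick_model(free_bytes / 1024**3)
print(pick_model(12), pick_model(6), pick_model(0.5))  # large-v3 medium base
```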

## 🔧 Advanced Configuration

### Custom Intent Keywords

You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in `UrduIntentExtractor.__init__` (`helpers/audio_analysis.py`):

```python
self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
}
```

New intents have no entry in `get_intent_description()`, so the report shows them as "💭 Unknown or general conversation" unless you also add a description there.
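
Editing the source is not the only option: `extract_intent()` reads `self.intent_keywords` on every call, so an intent can also be registered on an existing extractor at runtime. A minimal sketch; `add_intent` and its validation rules are hypothetical, and it works on any dict shaped like `intent_keywords`:

```python
from typing import Dict, List

IntentKeywords = Dict[str, Dict[str, List[str]]]


def add_intent(intent_keywords: IntentKeywords, name: str,
               urdu: List[str], english: List[str]) -> None:
    """Register a new intent in a dict shaped like UrduIntentExtractor.intent_keywords."""
    if name in intent_keywords:
        raise ValueError(f"intent {name!r} already exists")
    if not urdu and not english:
        raise ValueError("give at least one Urdu or English keyword")
    # extract_intent() indexes both keys, so both must exist even if one list is empty.
    intent_keywords[name] = {"urdu": list(urdu), "english": list(english)}


keywords: IntentKeywords = {"greeting": {"urdu": ["سلام"], "english": ["hello"]}}
add_intent(keywords, "custom_intent", ["کلیدی لفظ"], ["keyword1", "keyword2"])
print(sorted(keywords))  # ['custom_intent', 'greeting']
```

Against a real extractor you would call `add_intent(extractor.intent_keywords, ...)` after constructing `UrduIntentExtractor`.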
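
### GPU Acceleration

Whisper loads the model onto a CUDA GPU automatically when PyTorch can see one, and the tool enables FP16 only in that case (`fp16=torch.cuda.is_available()`). Removing the `fp16` argument does not force CPU usage: it only falls back to Whisper's default of `fp16=True`, which Whisper downgrades to FP32 on the CPU with a warning (hidden here because the module calls `warnings.filterwarnings('ignore')`). To force CPU usage, load the model on the CPU and disable FP16:

```python
# In UrduIntentExtractor.__init__ (helpers/audio_analysis.py):
self.model = whisper.load_model(model_size, device="cpu")

# In both transcribe calls in transcribe_and_translate():
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",  # "transcribe" in the first (Urdu) call
    fp16=False,  # FP16 is only useful on a GPU
)
```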
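
To make CPU-only runs a switch rather than a code edit, the device and FP16 choices can be kept together. A sketch; `whisper_runtime_options` and its `force_cpu` flag are hypothetical (the CLI has no such option), while the `device` argument of `whisper.load_model()` and the `fp16` argument of `transcribe()` are real Whisper parameters:

```python
def whisper_runtime_options(force_cpu: bool, cuda_available: bool) -> dict:
    """Choose the device for whisper.load_model() and the fp16 flag for transcribe()."""
    device = "cuda" if cuda_available and not force_cpu else "cpu"
    # FP16 only pays off on a GPU; on the CPU Whisper would fall back to FP32 anyway.
    return {"device": device, "fp16": device == "cuda"}


# Hooked up to the real libraries, this could look like:
#   opts = whisper_runtime_options(force_cpu=True, cuda_available=torch.cuda.is_available())
#   model = whisper.load_model("base", device=opts["device"])
#   result = model.transcribe("Recording.mp3", language="ur", task="translate", fp16=opts["fp16"])
print(whisper_runtime_options(force_cpu=True, cuda_available=True))  # {'device': 'cpu', 'fp16': False}
```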

## 📝 Example Use Cases

1. **Customer Service**: Automatically categorize customer calls
2. **Voice Assistants**: Understand user commands in Urdu
3. **Healthcare**: Triage patient concerns based on urgency
4. **Education**: Analyze student questions in online learning
5. **Business Analytics**: Understand customer feedback from calls

## 🐛 Troubleshooting

### Common Issues

1. **"Audio file not found"**
   - Ensure the file path is correct
   - Check file permissions

2. **Poor transcription quality**
   - Try a larger Whisper model (`--model medium` or `--model large-v3`)
   - Ensure clear audio quality
   - Check that the audio actually contains Urdu speech

3. **Slow processing**
   - Use the smallest model the CLI offers (`--model base`)
   - Ensure a GPU is available and PyTorch was installed with CUDA support
   - Shorten the audio: every file is decoded twice, once for transcription and once for translation

4. **FFmpeg errors**
   - Reinstall FFmpeg
   - Ensure FFmpeg is in the system PATH
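
Both the FFmpeg and the file-path problems can be caught before the slow model load: `main.py` constructs `UrduIntentExtractor` (which loads Whisper) before `process_audio_file()` checks that the file exists. A minimal pre-flight sketch; `preflight` is a hypothetical helper, not part of the tool:

```python
import os
import shutil
from typing import List


def preflight(audio_path: str) -> List[str]:
    """Return a list of problems that would make processing fail early."""
    problems = []
    # Whisper runs the ffmpeg executable to decode audio files.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not os.path.isfile(audio_path):
        problems.append(f"audio file not found: {audio_path}")
    elif not os.access(audio_path, os.R_OK):
        problems.append(f"audio file not readable: {audio_path}")
    return problems


for problem in preflight("path/to/your/audio.mp3"):
    print("❌", problem)
```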

### Debug Mode

For debugging, you can enable more verbose output by modifying the code. With `verbose=True`, Whisper prints each decoded segment with its timestamps while it runs:

```python
# Set verbose=True in the transcribe calls
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # Add this line
    fp16=torch.cuda.is_available()
)
```
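
Whisper's verbose output is only shown during decoding. To inspect timing afterwards, the segments that `process_audio_file()` already returns under `results["segments"]` can be printed; each Whisper segment is a dict with `start` and `end` times in seconds and the segment `text`. The two helpers below are a hypothetical sketch, not part of the tool:

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm (minutes are not wrapped into hours)."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    return f"{minutes:02d}:{ms // 1000:02d}.{ms % 1000:03d}"


def print_segments(segments: List[Dict]) -> None:
    """Print Whisper segments (dicts with 'start', 'end', 'text') one per line."""
    for seg in segments:
        start, end = format_timestamp(seg["start"]), format_timestamp(seg["end"])
        print(f"[{start} --> {end}] {seg['text'].strip()}")


# In the tool, segments come from results["segments"]["urdu"] or ["english"].
print_segments([{"start": 0.0, "end": 3.2, "text": " Hello, I want to ask you a question"}])
```

Rounding to whole milliseconds first avoids printing values such as "00:60.00" for times just under a minute.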

Recording.mp3 (binary file, not shown)

helpers/audio_analysis.py (new file, 318 lines)

```python
import os
import warnings
from typing import Dict, Tuple

import torch
import whisper

warnings.filterwarnings('ignore')


class UrduIntentExtractor:
    def __init__(self, model_size: str = "large-v3"):
        """
        Initialize Urdu intent extractor using Whisper

        Args:
            model_size: Whisper model name (e.g. base, medium, large-v1, large-v2, large-v3)
        """
        print(f"Loading Whisper {model_size} model...")
        self.model = whisper.load_model(model_size)

        # Comprehensive intent mapping for Urdu and English
        self.intent_keywords = {
            "greeting": {
                "urdu": ["سلام", "السلام علیکم", "ہیلو", "آداب", "صبح بخیر", "شام بخیر"],
                "english": ["hello", "hi", "greetings", "good morning", "good evening", "assalam"]
            },
            "question": {
                "urdu": ["کیا", "کب", "کیوں", "کسے", "کہاں", "کس طرح", "کتنا", "کیسے"],
                "english": ["what", "when", "why", "who", "where", "how", "how much", "which"]
            },
            "request": {
                "urdu": ["براہ کرم", "مہربانی", "چاہتا ہوں", "چاہتی ہوں", "درکار ہے", "مدد چاہیے"],
                "english": ["please", "kindly", "want", "need", "require", "help", "could you", "would you"]
            },
            "command": {
                "urdu": ["کرو", "کریں", "لاؤ", "دیں", "بناؤ", "روکو", "جاؤ", "آؤ"],
                "english": ["do", "make", "bring", "give", "create", "stop", "go", "come"]
            },
            "complaint": {
                "urdu": ["شکایت", "مسئلہ", "پریشانی", "غلط", "خراب", "نقص", "برا"],
                "english": ["complaint", "problem", "issue", "wrong", "bad", "fault", "error"]
            },
            "information": {
                "urdu": ["بتائیں", "جانیں", "معلوم", "تفصیل", "رہنمائی", "بتاؤ"],
                "english": ["tell", "know", "information", "details", "guide", "explain"]
            },
            "emergency": {
                "urdu": ["حادثہ", "ایمرجنسی", "تباہی", "بچاؤ", "جلدی", "فوری", "خطرہ"],
                "english": ["accident", "emergency", "help", "urgent", "quick", "danger", "dangerous"]
            },
            "appointment": {
                "urdu": ["ملاقات", "اپائنٹمنٹ", "ٹائم", "تاریخ", "وقت", "دن"],
                "english": ["meeting", "appointment", "time", "date", "schedule", "day"]
            },
            "farewell": {
                "urdu": ["اللہ حافظ", "خدا حافظ", "بای", "اختتام", "ختم", "اگلی بار"],
                "english": ["goodbye", "bye", "farewell", "end", "see you", "next time"]
            },
            "thanks": {
                "urdu": ["شکریہ", "مہربانی", "آپ کا بہت شکریہ", "تھینکس"],
                "english": ["thank", "thanks", "grateful", "appreciate"]
            }
        }

    def transcribe_and_translate(self, audio_path: str) -> Dict:
        """
        Transcribe Urdu audio and translate to English using Whisper

        Args:
            audio_path: Path to audio file

        Returns:
            Dictionary containing Urdu transcription, English translation and segments
        """
        print(f"\nProcessing audio file: {os.path.basename(audio_path)}")

        # First, transcribe in Urdu
        print("Transcribing in Urdu...")
        urdu_result = self.model.transcribe(
            audio_path,
            language="ur",  # Force Urdu language
            task="transcribe",
            fp16=torch.cuda.is_available()
        )
        urdu_text = urdu_result["text"].strip()

        # Then, translate to English (a second decode of the same audio)
        print("Translating to English...")
        english_result = self.model.transcribe(
            audio_path,
            language="ur",  # Source language is Urdu
            task="translate",  # This tells Whisper to translate
            fp16=torch.cuda.is_available()
        )
        english_text = english_result["text"].strip()

        return {
            "urdu": urdu_text,
            "english": english_text,
            "urdu_segments": urdu_result.get("segments", []),
            "english_segments": english_result.get("segments", [])
        }

    def extract_intent(self, urdu_text: str, english_text: str) -> Tuple[str, float, Dict]:
        """
        Extract main intent from both Urdu and English texts

        Args:
            urdu_text: Original Urdu transcription
            english_text: Translated English text

        Returns:
            Tuple of (intent, confidence, details)
        """
        print("\nAnalyzing intent...")

        # Prepare text for analysis
        urdu_lower = urdu_text.lower()
        english_lower = english_text.lower()

        # Calculate intent scores
        intent_scores = {}
        intent_details = {}

        for intent, keywords in self.intent_keywords.items():
            # Count Urdu keyword matches (plain substring check)
            urdu_matches = []
            for keyword in keywords["urdu"]:
                if keyword in urdu_lower:
                    urdu_matches.append(keyword)

            # Count English keyword matches (case-insensitive substring check)
            english_matches = []
            for keyword in keywords["english"]:
                if keyword.lower() in english_lower:
                    english_matches.append(keyword)

            # Calculate scores
            urdu_score = len(urdu_matches)
            english_score = len(english_matches)
            total_score = urdu_score + english_score

            if total_score > 0:
                intent_scores[intent] = total_score
                intent_details[intent] = {
                    "urdu_matches": urdu_matches,
                    "english_matches": english_matches,
                    "urdu_score": urdu_score,
                    "english_score": english_score,
                    "total_score": total_score
                }

        # Determine main intent
        if intent_scores:
            # Get intent with highest score (ties go to the first intent listed)
            main_intent = max(intent_scores, key=intent_scores.get)

            # Calculate confidence based on matches per five words
            total_words = len(english_lower.split()) + len(urdu_lower.split())
            base_confidence = intent_scores[main_intent] / max(1, total_words / 5)

            # Boost confidence if matches found in both languages
            if (intent_details[main_intent]["urdu_score"] > 0 and
                    intent_details[main_intent]["english_score"] > 0):
                base_confidence *= 1.5

            confidence = min(base_confidence, 1.0)
        else:
            main_intent = "general_conversation"
            confidence = 0.3
            intent_details[main_intent] = {
                "urdu_matches": [],
                "english_matches": [],
                "urdu_score": 0,
                "english_score": 0,
                "total_score": 0
            }

        return main_intent, confidence, intent_details[main_intent]

    def get_intent_description(self, intent: str) -> str:
        """
        Get human-readable description for intent

        Args:
            intent: Detected intent

        Returns:
            Description string
        """
        descriptions = {
            "greeting": "👋 Greeting or starting a conversation",
            "question": "❓ Asking a question or seeking clarification",
            "request": "🙏 Making a request or asking for something",
            "command": "⚡ Giving a command or instruction",
            "complaint": "😠 Expressing a complaint or dissatisfaction",
            "information": "ℹ️ Seeking or providing information",
            "emergency": "🚨 Emergency situation requiring immediate attention",
            "appointment": "📅 Scheduling or inquiring about a meeting/appointment",
            "farewell": "👋 Ending the conversation",
            "thanks": "🙏 Expressing gratitude or thanks",
            "general_conversation": "💬 General conversation without specific intent"
        }
        return descriptions.get(intent, "💭 Unknown or general conversation")

    def analyze_sentiment(self, english_text: str) -> Tuple[str, float]:
        """
        Basic sentiment analysis based on keywords

        Args:
            english_text: English translated text

        Returns:
            Tuple of (sentiment, confidence)
        """
        positive_words = ["good", "great", "excellent", "happy", "thanks", "thank", "please",
                          "wonderful", "nice", "helpful", "appreciate", "love", "like"]
        negative_words = ["bad", "wrong", "problem", "issue", "complaint", "angry", "upset",
                          "terrible", "horrible", "hate", "not working", "broken", "failed"]

        text_lower = english_text.lower()

        positive_count = sum(1 for word in positive_words if word in text_lower)
        negative_count = sum(1 for word in negative_words if word in text_lower)

        if positive_count > negative_count:
            return "positive", positive_count / max(1, (positive_count + negative_count))
        elif negative_count > positive_count:
            return "negative", negative_count / max(1, (positive_count + negative_count))
        else:
            return "neutral", 0.5

    def process_audio_file(self, audio_path: str, verbose: bool = True) -> Dict:
        """
        Main function to process audio file and extract intent

        Args:
            audio_path: Path to audio file
            verbose: Whether to print detailed output

        Returns:
            Dictionary with all analysis results
        """
        # Validate file
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"Audio file not found: {audio_path}")

        # Transcribe and translate
        results = self.transcribe_and_translate(audio_path)

        # Extract intent
        intent, confidence, intent_details = self.extract_intent(
            results["urdu"],
            results["english"]
        )

        # Analyze sentiment
        sentiment, sentiment_confidence = self.analyze_sentiment(results["english"])

        # Prepare final results
        final_results = {
            "file": os.path.basename(audio_path),
            "transcription": {
                "urdu": results["urdu"],
                "english": results["english"]
            },
            "intent": {
                "type": intent,
                "confidence": confidence,
                "description": self.get_intent_description(intent),
                "details": intent_details
            },
            "sentiment": {
                "type": sentiment,
                "confidence": sentiment_confidence
            },
            "segments": {
                "urdu": results.get("urdu_segments", []),
                "english": results.get("english_segments", [])
            }
        }

        # Print results if verbose
        if verbose:
            self.print_results(final_results)

        return final_results

    def print_results(self, results: Dict):
        """
        Print analysis results in a formatted way
        """
        print("\n" + "=" * 70)
        print("URDU SPEECH INTENT ANALYSIS RESULTS")
        print("=" * 70)

        print(f"\n📁 File: {results['file']}")

        print("\n🗣️ URDU TRANSCRIPTION:")
        print(f"   {results['transcription']['urdu']}")

        print("\n🌐 ENGLISH TRANSLATION:")
        print(f"   {results['transcription']['english']}")

        print("\n🎯 DETECTED INTENT:")
        print(f"   {results['intent']['description']}")
        print(f"   Confidence: {results['intent']['confidence']:.1%}")

        if results['intent']['details']['urdu_matches']:
            print(f"   Urdu keywords found: {', '.join(results['intent']['details']['urdu_matches'])}")
        if results['intent']['details']['english_matches']:
            print(f"   English keywords found: {', '.join(results['intent']['details']['english_matches'])}")

        print("\n😊 SENTIMENT:")
        print(f"   {results['sentiment']['type'].upper()}")
        print(f"   Confidence: {results['sentiment']['confidence']:.1%}")

        print("\n" + "=" * 70)
```

main.py (new file, 60 lines)

```python
import argparse
import json
import sys

from helpers.audio_analysis import UrduIntentExtractor


def main():
    """
    Command-line interface for Urdu Intent Extractor
    """
    parser = argparse.ArgumentParser(
        description="Extract intent from Urdu speech using Whisper translation"
    )

    parser.add_argument(
        "audio_file",
        help="Path to audio file (mp3, wav, m4a, flac, etc.)"
    )

    parser.add_argument(
        "--model",
        default="base",
        choices=["base", "medium", "large-v1", "large-v2", "large-v3"],
        help="Whisper model size (default: base)"
    )

    parser.add_argument(
        "--output",
        help="Save results to JSON file"
    )

    parser.add_argument(
        "--quiet",
        action="store_true",
        help="Minimal output"
    )

    args = parser.parse_args()

    try:
        # Initialize extractor (loads the Whisper model)
        extractor = UrduIntentExtractor(model_size=args.model)

        # Process audio file
        results = extractor.process_audio_file(
            args.audio_file,
            verbose=not args.quiet
        )

        # Save to JSON if requested
        if args.output:
            with open(args.output, 'w', encoding='utf-8') as f:
                json.dump(results, f, ensure_ascii=False, indent=2)
            print(f"\nResults saved to: {args.output}")

    except FileNotFoundError as e:
        print(f"❌ Error: {e}")
        sys.exit(1)  # Signal failure to the calling shell
    except Exception as e:
        print(f"❌ An error occurred: {str(e)}")
        sys.exit(1)


if __name__ == "__main__":
    main()
```

requirements.txt (new file, 8 lines)

```text
openai-whisper
torch
torchaudio
transformers
argparse
huggingface_hub[hf_xet]
sentencepiece
sacremoses
```