updated readme
parent 329222934f
commit ba876f93d4
232 README.md
@@ -1,2 +1,232 @@
# salam_bot
# 🎤 Urdu Speech Intent Recognition using Whisper

A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.

## ✨ Features

- **🎙️ Urdu Speech Transcription**: Accurate transcription of Urdu audio using Whisper
- **🌐 Built-in Translation**: Direct Urdu-to-English translation using Whisper's translation capability
- **🎯 Intent Detection**: Identifies user intent from conversation (questions, requests, commands, etc.)
- **😊 Sentiment Analysis**: Basic sentiment detection (positive/negative/neutral)
- **📊 Confidence Scoring**: Provides confidence scores for both transcription and intent detection
- **🔧 Multiple Model Sizes**: Support for tiny, base, small, medium, and large Whisper models
- **💾 JSON Export**: Option to save results in structured JSON format
- **🎵 Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other common audio formats

## 📋 Supported Intents

The system can detect the following intents:

| Intent | Description | Example Keywords |
|--------|-------------|------------------|
| **greeting** | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
| **question** | Asking questions | "کیا", "کب", "کیوں", "کسے" |
| **request** | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
| **command** | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
| **complaint** | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
| **information** | Seeking information | "بتائیں", "جانیں", "تفصیل" |
| **emergency** | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
| **appointment** | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
| **farewell** | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
| **thanks** | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |

## 🚀 Quick Start

### Installation

1. **Clone the repository:**
   ```bash
   git clone https://github.com/yourusername/urdu-intent-recognition.git
   cd urdu-intent-recognition
   ```

2. **Install required packages:**
   ```bash
   pip install openai-whisper torch torchaudio
   ```

3. **Install FFmpeg (required for audio processing):**
   - **Ubuntu/Debian:** `sudo apt-get install ffmpeg`
   - **macOS:** `brew install ffmpeg`
   - **Windows:** Download from [ffmpeg.org](https://ffmpeg.org/download.html)

### Basic Usage

```bash
# Process an Urdu audio file
python urdu_intent_extractor.py path/to/your/audio.mp3

# Use a larger model for better accuracy
python urdu_intent_extractor.py audio.mp3 --model medium

# Save results to JSON file
python urdu_intent_extractor.py audio.mp3 --output results.json
```

## 📖 Detailed Usage

### Command Line Arguments

```bash
python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]
```

**Arguments:**
- `AUDIO_FILE`: Path to the audio file to process (required)

**Options:**
- `--model`: Whisper model size (default: "base")
  - Choices: `tiny`, `base`, `small`, `medium`, `large`
  - Larger models are more accurate but slower
- `--output`: Save results to JSON file
- `--quiet`: Minimal console output
- `--help`: Show help message

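
The arguments and options above map naturally onto `argparse`. The following is a minimal sketch of such a parser, not the tool's actual source: the function name `build_parser`, the attribute names, and the help strings are assumptions. Only the option names, the choices, and the `base` default come from the list above. `--help` is provided by `argparse` automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="urdu_intent_extractor.py",
        description="Transcribe Urdu speech and extract the main intent.",
    )
    # Required positional argument
    parser.add_argument(
        "audio_file", metavar="AUDIO_FILE", help="Path to the audio file to process"
    )
    # Optional flags, as documented
    parser.add_argument(
        "--model",
        default="base",
        choices=["tiny", "base", "small", "medium", "large"],
        help="Whisper model size (larger is more accurate but slower)",
    )
    parser.add_argument("--output", default=None, help="Save results to this JSON file")
    parser.add_argument("--quiet", action="store_true", help="Minimal console output")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["audio.mp3", "--model", "medium"])
print(args.audio_file, args.model, args.output, args.quiet)
```

Restricting `--model` with `choices` makes the parser reject unsupported sizes before any model is loaded.
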
### Python API Usage

You can also use the tool programmatically:

```python
from urdu_intent_extractor import UrduIntentExtractor

# Initialize the extractor
extractor = UrduIntentExtractor(model_size="base")

# Process an audio file
results = extractor.process_audio_file("path/to/audio.mp3")

# Access results
print(f"Urdu Transcription: {results['transcription']['urdu']}")
print(f"English Translation: {results['transcription']['english']}")
print(f"Detected Intent: {results['intent']['type']}")
print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
print(f"Sentiment: {results['sentiment']['type']}")
```

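
When using the API instead of `--output`, you can write the results dictionary to JSON yourself. This is a minimal sketch that assumes the dictionary has the shape used in the snippet above; the helper name `save_results`, the sample values, and the `confidence` key under `sentiment` are assumptions. Passing `ensure_ascii=False` keeps the Urdu text readable in the file instead of `\uXXXX` escapes.

```python
import json
import tempfile
from pathlib import Path


def save_results(results: dict, output_path) -> None:
    """Write the results dictionary as pretty-printed UTF-8 JSON."""
    text = json.dumps(results, ensure_ascii=False, indent=2)
    Path(output_path).write_text(text, encoding="utf-8")


# Stand-in for the value returned by extractor.process_audio_file(...)
sample = {
    "transcription": {"urdu": "السلام علیکم", "english": "Hello"},
    "intent": {"type": "question", "confidence": 0.85},
    "sentiment": {"type": "neutral", "confidence": 0.5},
}

# Demo: write to a temporary directory and read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = Path(tmp_dir) / "results.json"
    save_results(sample, out_file)
    raw_text = out_file.read_text(encoding="utf-8")
    reloaded = json.loads(raw_text)

print(reloaded["intent"])
```
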
### Example Output

```
==============================================================
URDU SPEECH INTENT ANALYSIS RESULTS
==============================================================

📁 File: conversation.mp3

🗣️ URDU TRANSCRIPTION:
السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں

🌐 ENGLISH TRANSLATION:
Hello, I want to ask you a question

🎯 DETECTED INTENT:
❓ Asking a question or seeking clarification
Confidence: 85.0%
Urdu keywords found: سوال
English keywords found: question

😊 SENTIMENT:
NEUTRAL
Confidence: 50.0%

==============================================================
```

## 🏗️ Architecture

The system works in three main steps:

1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text
2. **Translation**: Whisper translates the Urdu speech directly to English (using `task="translate"`); it works from the audio, not from the Urdu text produced in step 1
3. **Intent Analysis**: Analyzes both the Urdu and English text for intent keywords

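
The first two steps can be sketched as two passes over the same audio file. This is an illustration under assumptions, not the tool's actual code: the function name `transcribe_and_translate` and the returned keys are hypothetical, while `transcribe`, `language`, and `task` are part of the `openai-whisper` API. The function takes an already loaded model, so it does not import Whisper itself.

```python
def transcribe_and_translate(model, audio_path: str) -> dict:
    """Run Whisper twice on the same audio: transcribe, then translate."""
    # Pass 1: Urdu speech -> Urdu text
    urdu = model.transcribe(audio_path, language="ur", task="transcribe")
    # Pass 2: Urdu speech -> English text (decoded from the audio again)
    english = model.transcribe(audio_path, language="ur", task="translate")
    return {
        "urdu": urdu["text"].strip(),
        "english": english["text"].strip(),
    }


# Typical use (requires `pip install openai-whisper` and FFmpeg):
#   import whisper
#   model = whisper.load_model("base")
#   texts = transcribe_and_translate(model, "path/to/audio.mp3")
```

Because each pass decodes the whole file, processing time is roughly doubled compared with transcription alone.
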
### Intent Detection Algorithm

1. **Bilingual Keyword Matching**: Checks for intent keywords in both Urdu and English text
2. **Scoring System**: Assigns scores based on keyword matches
3. **Confidence Calculation**: Calculates confidence based on match frequency and text length
4. **Sentiment Analysis**: Basic sentiment detection using positive/negative keywords

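
Steps 1 to 3 can be sketched as follows. This is a hedged illustration only: the keyword table is a small made-up subset, the function name `detect_intent` is hypothetical, and the confidence formula (a base of 0.5 plus keyword density, capped at 1.0) is an assumption standing in for whatever the tool actually computes.

```python
import re

# Small illustrative subset; the real tool defines its own intent_keywords
INTENT_KEYWORDS = {
    "greeting": {"urdu": ["سلام", "ہیلو"], "english": ["hello", "hi"]},
    "question": {"urdu": ["کیوں", "سوال"], "english": ["why", "question"]},
    "thanks": {"urdu": ["شکریہ"], "english": ["thank", "thanks"]},
}


def detect_intent(urdu_text: str, english_text: str) -> dict:
    """Pick the intent with the most keyword hits across both languages."""
    # English is matched on whole words so that "hi" does not match "this"
    english_words = re.findall(r"[a-z']+", english_text.lower())
    best = {"type": "unknown", "confidence": 0.0,
            "urdu_keywords": [], "english_keywords": []}
    best_hits = 0
    for intent, keywords in INTENT_KEYWORDS.items():
        # Urdu is matched by substring, which tolerates attached prefixes
        urdu_hits = [k for k in keywords["urdu"] if k in urdu_text]
        english_hits = [k for k in keywords["english"] if k in english_words]
        hits = len(urdu_hits) + len(english_hits)
        if hits > best_hits:
            best_hits = hits
            # Assumed formula: more hits in a shorter text means more confidence
            density = hits / max(len(english_words), 1)
            best = {"type": intent, "confidence": min(1.0, 0.5 + density),
                    "urdu_keywords": urdu_hits, "english_keywords": english_hits}
    return best


urdu_sentence = "آپ کا بہت " + INTENT_KEYWORDS["thanks"]["urdu"][0]
result = detect_intent(urdu_sentence, "Thank you very much")
print(result["type"], result["confidence"])
```

Ties are resolved in favour of the intent listed first, so a real implementation may want explicit priorities for overlapping cases such as a greeting followed by a question.
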
## 📊 Model Performance

| Model Size | Speed | Accuracy | GPU Memory | Best Use Case |
|------------|-------|----------|------------|---------------|
| **tiny** | ⚡ Fast | 🟡 Moderate | ~1GB | Quick prototyping |
| **base** | 🚀 Good | 🟢 Good | ~1GB | General purpose |
| **small** | 🐢 Moderate | 🟢🟢 Better | ~2GB | Better accuracy needed |
| **medium** | 🐌 Slow | 🟢🟢🟢 Very Good | ~5GB | High accuracy required |
| **large** | 🐌🐌 Very Slow | 🟢🟢🟢🟢 Excellent | ~10GB | Research/production |

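
As a rough aid for choosing `--model`, the sketch below picks the largest model that fits a given amount of GPU memory. The thresholds are the approximate VRAM figures published in the Whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for large); the function name `pick_model_size` is hypothetical and the tool itself does not necessarily do this.

```python
# Approximate VRAM requirement in GB, largest model first
VRAM_REQUIRED_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
]


def pick_model_size(available_vram_gb: float) -> str:
    """Return the largest Whisper model expected to fit in the given VRAM."""
    for name, required in VRAM_REQUIRED_GB:
        if available_vram_gb >= required:
            return name
    # Below 1 GB, fall back to the smallest model
    return "tiny"


# With PyTorch installed, the available memory could be read with:
#   torch.cuda.get_device_properties(0).total_memory / 1024**3
print(pick_model_size(6))
```
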
## 🔧 Advanced Configuration

### Custom Intent Keywords

You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in the code:

```python
self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
}
```

### GPU Acceleration

The tool automatically uses the GPU if one is available. To force CPU usage, load the model on the CPU and disable FP16 (this assumes the model is loaded with `whisper.load_model`):

```python
# In the code, load the model on the CPU explicitly:
self.model = whisper.load_model(model_size, device="cpu")

# Then run inference in FP32, since FP16 is not supported on CPU:
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    fp16=False,  # instead of fp16=torch.cuda.is_available()
)
```

## 📝 Example Use Cases

1. **Customer Service**: Automatically categorize customer calls
2. **Voice Assistants**: Understand user commands in Urdu
3. **Healthcare**: Triage patient concerns based on urgency
4. **Education**: Analyze student questions in online learning
5. **Business Analytics**: Understand customer feedback from calls

## 🐛 Troubleshooting

### Common Issues

1. **"Audio file not found"**
   - Ensure the file path is correct
   - Check file permissions

2. **Poor transcription quality**
   - Try a larger Whisper model (`--model medium`)
   - Ensure the audio is clear
   - Check that the audio contains Urdu speech

3. **Slow processing**
   - Use a smaller model (`--model tiny` or `--model base`)
   - Ensure a GPU is available and properly configured
   - Reduce the audio file size or duration

4. **FFmpeg errors**
   - Reinstall FFmpeg
   - Ensure FFmpeg is in the system PATH

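
The first and last issues above can be caught before any model is loaded. The following is a minimal sketch of such a check, assuming only the standard library; the function name `preflight_check` and the message wording are hypothetical and not part of the tool.

```python
import os
import shutil
from pathlib import Path


def preflight_check(audio_path: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Whisper shells out to FFmpeg to decode audio, so it must be on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found in PATH")
    path = Path(audio_path)
    if not path.is_file():
        problems.append(f"Audio file not found: {audio_path}")
    elif not os.access(path, os.R_OK):
        problems.append(f"Audio file is not readable: {audio_path}")
    return problems


for problem in preflight_check("path/to/your/audio.mp3"):
    print("Problem:", problem)
```
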
### Debug Mode

For debugging, you can enable more verbose output by modifying the code:

```python
# Set verbose=True in transcribe calls
result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # Add this line
    fp16=torch.cuda.is_available()
)
```