updated readme

2025-12-08 01:34:01 +05:00
parent 329222934f
commit ba876f93d4
1 changed files with 231 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,232 @@
-# salam_bot
+# 🎤 Urdu Speech Intent Recognition using Whisper
 A Python tool that transcribes Urdu speech, translates it to English, and extracts the main intent from the conversation. Built with OpenAI's Whisper model for accurate speech recognition and translation.
 ## ✨ Features
 - **🎙️ Urdu Speech Transcription**: Accurate transcription of Urdu audio using Whisper
 - **🌐 Built-in Translation**: Direct Urdu-to-English translation using Whisper's translation capability
 - **🎯 Intent Detection**: Identifies user intent from conversation (questions, requests, commands, etc.)
 - **😊 Sentiment Analysis**: Basic sentiment detection (positive/negative/neutral)
 - **📊 Confidence Scoring**: Provides confidence scores for both transcription and intent detection
 - **🔧 Multiple Model Sizes**: Support for tiny, base, small, medium, and large Whisper models
 - **💾 JSON Export**: Option to save results in structured JSON format
 - **🎵 Multi-format Support**: Works with MP3, WAV, M4A, FLAC, and other common audio formats
 ## 📋 Supported Intents
 The system can detect the following intents:
 | Intent | Description | Example Keywords |
 |--------|-------------|------------------|
 | **greeting** | Starting a conversation | "سلام", "ہیلو", "السلام علیکم" |
 | **question** | Asking questions | "کیا", "کب", "کیوں", "کسے" |
 | **request** | Making requests | "براہ کرم", "مہربانی", "مدد چاہیے" |
 | **command** | Giving commands | "کرو", "لاؤ", "دیں", "بناؤ" |
 | **complaint** | Expressing complaints | "شکایت", "مسئلہ", "پریشانی" |
 | **information** | Seeking information | "بتائیں", "جانیں", "تفصیل" |
 | **emergency** | Emergency situations | "حادثہ", "ایمرجنسی", "فوری" |
 | **appointment** | Scheduling meetings | "ملاقات", "اپائنٹمنٹ", "تاریخ" |
 | **farewell** | Ending conversations | "اللہ حافظ", "خدا حافظ", "اختتام" |
 | **thanks** | Expressing gratitude | "شکریہ", "آپ کا بہت شکریہ" |
 ## 🚀 Quick Start
 ### Installation
 1. **Clone the repository:**
 ```bash
 git clone https://github.com/yourusername/urdu-intent-recognition.git
 cd urdu-intent-recognition
 ```
 2. **Install required packages:**
 ```bash
 pip install openai-whisper torch torchaudio
 ```
 3. **Install FFmpeg (required for audio processing):**
   - **Ubuntu/Debian:** `sudo apt-get install ffmpeg`
   - **macOS:** `brew install ffmpeg`
   - **Windows:** Download from [ffmpeg.org](https://ffmpeg.org/download.html)
 ### Basic Usage
 ```bash
 # Process an Urdu audio file
 python urdu_intent_extractor.py path/to/your/audio.mp3
 # Use a larger model for better accuracy
 python urdu_intent_extractor.py audio.mp3 --model medium
 # Save results to JSON file
 python urdu_intent_extractor.py audio.mp3 --output results.json
 ```
 ## 📖 Detailed Usage
 ### Command Line Arguments
 ```bash
 python urdu_intent_extractor.py AUDIO_FILE [OPTIONS]
 ```
 **Arguments:**
 - `AUDIO_FILE`: Path to the audio file to process (required)
 **Options:**
 - `--model`: Whisper model size (default: "base")
   - Choices: `tiny`, `base`, `small`, `medium`, `large`
   - Larger models are more accurate but slower
 - `--output`: Save results to JSON file
 - `--quiet`: Minimal console output
 - `--help`: Show help message
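The options above map naturally onto Python's standard `argparse` module. The following is a minimal sketch of how such a parser could be declared; the function name `build_parser` and the `metavar` labels are illustrative assumptions, and the actual script may define its arguments differently. `--help` needs no declaration because `argparse` adds it automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="urdu_intent_extractor.py",
        description="Transcribe Urdu speech and detect the speaker's intent.",
    )
    # Required positional argument: the audio file to process.
    parser.add_argument("audio_file", metavar="AUDIO_FILE",
                        help="Path to the audio file to process")
    # Restricting choices makes argparse reject unknown model names early.
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size (default: base)")
    parser.add_argument("--output", metavar="JSON_FILE", default=None,
                        help="Save results to this JSON file")
    parser.add_argument("--quiet", action="store_true",
                        help="Minimal console output")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["audio.mp3", "--model", "medium"])
print(args.audio_file, args.model, args.output, args.quiet)
```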
 ### Python API Usage
 You can also use the tool programmatically:
 ```python
 from urdu_intent_extractor import UrduIntentExtractor
 # Initialize the extractor
 extractor = UrduIntentExtractor(model_size="base")
 # Process an audio file
 results = extractor.process_audio_file("path/to/audio.mp3")
 # Access results
 print(f"Urdu Transcription: {results['transcription']['urdu']}")
 print(f"English Translation: {results['transcription']['english']}")
 print(f"Detected Intent: {results['intent']['type']}")
 print(f"Intent Confidence: {results['intent']['confidence']:.1%}")
 print(f"Sentiment: {results['sentiment']['type']}")
 ```
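If you want to persist the results yourself rather than rely on `--output`, a sketch along these lines may help. It assumes the results dictionary is JSON-serializable and shaped like the one accessed above; `save_results` and the sample values are hypothetical and not part of the tool.

```python
import json
from pathlib import Path


def save_results(results: dict, output_path: str) -> None:
    """Write a results dictionary to a UTF-8 encoded JSON file."""
    # ensure_ascii=False stores Urdu characters as-is instead of escaping them.
    text = json.dumps(results, ensure_ascii=False, indent=2)
    Path(output_path).write_text(text, encoding="utf-8")


# Hypothetical values, shaped like the dictionary accessed in the example above.
sample_results = {
    "transcription": {"urdu": "السلام علیکم", "english": "Hello"},
    "intent": {"type": "greeting", "confidence": 0.85},
    "sentiment": {"type": "neutral", "confidence": 0.5},
}

# Usage:
#   save_results(sample_results, "results.json")
```

Writing with `ensure_ascii=False` and an explicit UTF-8 encoding keeps the Urdu transcription human-readable in the saved file.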
 ### Example Output
 ```
 ==============================================================
 URDU SPEECH INTENT ANALYSIS RESULTS
 ==============================================================
 📁 File: conversation.mp3
 🗣️ URDU TRANSCRIPTION:
   السلام علیکم، میں آپ سے ایک سوال پوچھنا چاہتا ہوں
 🌐 ENGLISH TRANSLATION:
   Hello, I want to ask you a question
 🎯 DETECTED INTENT:
   ❓ Asking a question or seeking clarification
   Confidence: 85.0%
   Urdu keywords found: سوال
   English keywords found: question
 😊 SENTIMENT:
   NEUTRAL
   Confidence: 50.0%
 ==============================================================
 ```
 ## 🏗️ Architecture
 The system works in three main steps:
 1. **Transcription**: Whisper transcribes the Urdu audio to Urdu text
 2. **Translation**: Whisper translates the Urdu speech to English using `task="translate"`; this task runs on the audio itself, not on the transcribed Urdu text
 3. **Intent Analysis**: Analyzes both Urdu and English text for intent keywords
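The first two steps can be sketched as two Whisper passes over the same audio file. This is a minimal illustration, not the tool's actual code: the helper name `transcribe_and_translate` is hypothetical, and `model` is assumed to be an object returned by `whisper.load_model`.

```python
def transcribe_and_translate(model, audio_path: str) -> dict:
    """Run Whisper twice on the same audio, once per task."""
    # Pass 1: Urdu speech -> Urdu text.
    urdu = model.transcribe(audio_path, language="ur", task="transcribe")
    # Pass 2: Urdu speech -> English text. The translate task decodes the
    # audio again; it does not take the Urdu text as input.
    english = model.transcribe(audio_path, language="ur", task="translate")
    return {"urdu": urdu["text"].strip(), "english": english["text"].strip()}


# Usage with the real library (requires openai-whisper and FFmpeg):
#   import whisper
#   model = whisper.load_model("base")
#   print(transcribe_and_translate(model, "path/to/your/audio.mp3"))
```

Because both passes decode the audio, processing time is roughly double that of a single transcription.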
 ### Intent Detection Algorithm
 1. **Bilingual Keyword Matching**: Checks for intent keywords in both Urdu and English text
 2. **Scoring System**: Assigns scores based on keyword matches
 3. **Confidence Calculation**: Calculates confidence based on match frequency and text length
 4. **Sentiment Analysis**: Basic sentiment detection using positive/negative keywords
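To make the matching and scoring steps concrete, here is a small self-contained sketch. It is an illustration under stated assumptions rather than the tool's implementation: the keyword table is a reduced subset, the English keywords are guesses, and the confidence formula (the winning intent's share of all matches, scaled down for very short inputs) is invented for this example. Sentiment detection is omitted.

```python
import re
from typing import Dict, List

# Illustrative subset only; the English keywords here are assumptions.
INTENT_KEYWORDS: Dict[str, Dict[str, List[str]]] = {
    "greeting": {"urdu": ["سلام", "ہیلو", "السلام علیکم"], "english": ["hello", "hi"]},
    "question": {"urdu": ["کیا", "کب", "کیوں"], "english": ["what", "when", "why"]},
    "thanks": {"urdu": ["شکریہ"], "english": ["thank", "thanks"]},
}


def find_keywords(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords that appear in text as whole words or phrases."""
    return [
        kw for kw in keywords
        if re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", text, re.IGNORECASE)
    ]


def detect_intent(urdu_text: str, english_text: str) -> dict:
    """Pick the intent with the most keyword matches across both texts."""
    hits_by_intent = {}
    for intent, kws in INTENT_KEYWORDS.items():
        hits = (find_keywords(urdu_text, kws["urdu"])
                + find_keywords(english_text, kws["english"]))
        if hits:
            hits_by_intent[intent] = hits
    if not hits_by_intent:
        return {"type": "unknown", "confidence": 0.0, "keywords": []}

    best = max(hits_by_intent, key=lambda name: len(hits_by_intent[name]))
    total_hits = sum(len(h) for h in hits_by_intent.values())
    word_count = len(re.findall(r"\w+", f"{urdu_text} {english_text}"))
    # Assumed formula: the winner's share of all matches, scaled down for
    # inputs shorter than four words, where a single match is weak evidence.
    confidence = (len(hits_by_intent[best]) / total_hits) * min(1.0, word_count / 4)
    return {"type": best, "confidence": round(confidence, 2),
            "keywords": hits_by_intent[best]}


result = detect_intent("آپ کا بہت شکریہ", "Thank you very much")
print(result["type"], result["confidence"])
```

Matching on word boundaries avoids false positives such as the English keyword "hi" firing inside "this".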
 ## 📊 Model Performance
 | Model Size | Speed | Accuracy | GPU Memory | Best Use Case |
 |------------|-------|----------|------------|---------------|
 | **tiny**   | ⚡ Fast | 🟡 Moderate | ~1GB | Quick prototyping |
 | **base**   | 🚀 Good | 🟢 Good | ~1GB | General purpose |
 | **small**  | 🐢 Moderate | 🟢🟢 Better | ~2GB | Better accuracy needed |
 | **medium** | 🐌 Slow | 🟢🟢🟢 Very Good | ~5GB | High accuracy required |
 | **large**  | 🐌🐌 Very Slow | 🟢🟢🟢🟢 Excellent | ~10GB | Research/production |
 ## 🔧 Advanced Configuration
 ### Custom Intent Keywords
 You can extend or modify the intent keywords by editing the `intent_keywords` dictionary in the code:
 ```python
 self.intent_keywords = {
    "custom_intent": {
        "urdu": ["کلیدی لفظ", "دوسرا لفظ"],
        "english": ["keyword1", "keyword2"]
    },
    # ... existing intents
 }
 ```
 ### GPU Acceleration
 The tool automatically uses GPU if available. To force CPU usage:
 ```python
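 # Load the model on the CPU (assumes the extractor calls whisper.load_model):
 self.model = whisper.load_model(model_size, device="cpu")
 result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    fp16=False  # FP16 is not supported on CPU, so use FP32
 )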
 ```
 ## 📝 Example Use Cases
 1. **Customer Service**: Automatically categorize customer calls
 2. **Voice Assistants**: Understand user commands in Urdu
 3. **Healthcare**: Triage patient concerns based on urgency
 4. **Education**: Analyze student questions in online learning
 5. **Business Analytics**: Understand customer feedback from calls
 ## 🐛 Troubleshooting
 ### Common Issues
 1. **"Audio file not found"**
   - Ensure the file path is correct
   - Check file permissions
 2. **Poor transcription quality**
   - Try a larger Whisper model (`--model medium`)
   - Ensure clear audio quality
   - Check that the audio actually contains Urdu speech
 3. **Slow processing**
   - Use a smaller model (`--model tiny` or `--model base`)
   - Ensure a GPU is available and properly configured
   - Reduce audio file size or duration
 4. **FFmpeg errors**
   - Reinstall FFmpeg
   - Ensure FFmpeg is in system PATH
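Two of the issues above (a missing audio file and a missing FFmpeg binary) can be checked before any model is loaded. The sketch below uses only the standard library; `preflight_check` is a hypothetical helper and not part of the tool.

```python
import os
import shutil
from typing import List


def preflight_check(audio_path: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not os.path.isfile(audio_path):
        problems.append(f"Audio file not found: {audio_path}")
    elif not os.access(audio_path, os.R_OK):
        problems.append(f"Audio file is not readable: {audio_path}")
    # Whisper invokes the ffmpeg executable to decode audio files.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg was not found on the system PATH")
    return problems


for problem in preflight_check("path/to/your/audio.mp3"):
    print("WARNING:", problem)
```

Running a check like this first gives a clearer message than the error raised later during decoding.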
 ### Debug Mode
 For debugging, you can enable more verbose output by modifying the code:
 ```python
 # Set verbose=True in transcribe calls
 result = self.model.transcribe(
    audio_path,
    language="ur",
    task="translate",
    verbose=True,  # Add this line
    fp16=torch.cuda.is_available()
 )
 ```