Islamic Duas Semantic Search API
A FastAPI-based semantic search engine for Islamic duas (prayers) that uses vector embeddings to find relevant prayers from natural language queries. The system runs semantic similarity search over dua tags, using sentence transformers for embeddings and PostgreSQL with the pgvector extension for storage and retrieval.
Features
- Semantic Search: Find duas using natural language queries (e.g., "protection from evil", "morning prayers")
- Vector Embeddings: Uses sentence-transformers/all-MiniLM-L6-v2 for high-quality embeddings
- PostgreSQL + pgvector: Efficient vector similarity search in PostgreSQL
- RESTful API: FastAPI-powered endpoints with automatic OpenAPI documentation
- Multi-language Support: Returns duas in Arabic, transliteration, English translation, Urdu, and Roman Urdu
- Metadata Filtering: Access category, occasion, source, and tags information
- CORS Enabled: Ready for frontend integration
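To make the semantic search feature concrete, here is a minimal sketch of ranking by vector similarity. It assumes cosine similarity as the metric and uses tiny hand-written vectors in place of real all-MiniLM-L6-v2 embeddings. The names cosine_similarity and top_k are illustrative and are not taken from this project, where the comparison is delegated to pgvector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=5):
    """Return up to k (label, score) pairs, most similar first.

    items is a list of (label, vector) pairs.
    """
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
toy_items = [
    ("protection", [0.9, 0.1, 0.0]),
    ("morning", [0.1, 0.9, 0.0]),
    ("travel", [0.0, 0.2, 0.9]),
]
ranked = top_k([1.0, 0.0, 0.0], toy_items, k=2)
print(ranked)
```

Real embeddings from all-MiniLM-L6-v2 have 384 dimensions, but the ranking logic is the same.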
Technology Stack
- FastAPI: Modern, fast web framework for building APIs
- LangChain: Framework for working with embeddings and vector stores
- Sentence Transformers: State-of-the-art text embedding models
- PostgreSQL + pgvector: Vector database for similarity search
- Pydantic: Data validation and settings management
- python-dotenv: Environment variable management
Prerequisites
- Python 3.13 or higher
- PostgreSQL with pgvector extension installed
- UV package manager (recommended) or pip
Installation
1. Clone the repository
git clone <repository-url>
cd semantic_search
2. Install dependencies
Using UV (recommended):
uv sync
Using pip:
pip install -r requirements.txt
3. Activate virtual environment
source .venv/bin/activate
Configuration
Environment Variables
Create a .env file in the project root with the following variables:
CONNECTION_STRING=postgresql+psycopg2://username:password@localhost:5432/database_name
COLLECTION_NAME=duas_embeddings
Environment Variables Explained:
- CONNECTION_STRING: Connection string for a PostgreSQL database that has the pgvector extension enabled
- COLLECTION_NAME: Name of the collection/table to store embeddings
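A minimal sketch of reading and validating these two variables at startup. The helper name load_settings is hypothetical and not taken from main.py. In the project itself, python-dotenv's load_dotenv() would typically be called first so that the .env values are present in os.environ.

```python
import os


def load_settings(env=None):
    """Read the two documented settings; raise if either is missing or blank."""
    env = os.environ if env is None else env
    settings = {}
    for name in ("CONNECTION_STRING", "COLLECTION_NAME"):
        value = env.get(name, "").strip()
        if not value:
            raise RuntimeError(f"Missing required environment variable: {name}")
        settings[name] = value
    return settings


# Example with an explicit mapping instead of the real process environment.
example = load_settings({
    "CONNECTION_STRING": "postgresql+psycopg2://username:password@localhost:5432/database_name",
    "COLLECTION_NAME": "duas_embeddings",
})
print(example["COLLECTION_NAME"])
```

Failing fast here gives a clearer error than a connection failure on the first search request.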
Database Setup
- Enable the pgvector extension in your database (pgvector must already be installed on the PostgreSQL server):
CREATE EXTENSION vector;
- Ensure you have a PostgreSQL database created and accessible with the credentials in your .env file.
Data Preparation
Initial Setup: Generate Embeddings
Before running the API, you need to generate embeddings from your duas data:
python generate_dua_tags_embedding.py
This script:
- Reads duas from duas_directus_published.json
- Generates vector embeddings from dua tags
- Stores embeddings in PostgreSQL with pgvector
- Preserves all metadata (Arabic text, translation, category, etc.)
Note: Ensure duas_directus_published.json exists in the project root before running this script.
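The split between "text that gets embedded" and "metadata that gets stored" can be sketched as below. This is an illustration only and not the contents of generate_dua_tags_embedding.py. The function names are hypothetical, and the record fields (id, category, tags) are assumed to match the fields shown in the API response.

```python
import json


def to_document(dua):
    """Split one dua record into (text_to_embed, metadata).

    Only the tags are joined into the text; the full record is kept as metadata.
    """
    tags = dua.get("tags") or []
    text = ", ".join(str(tag).strip() for tag in tags if str(tag).strip())
    return text, dict(dua)


def prepare_documents(duas):
    """Skip records without tags, since they would have nothing to embed."""
    documents = []
    for dua in duas:
        text, metadata = to_document(dua)
        if text:
            documents.append((text, metadata))
    return documents


# Two hypothetical records; the second has no tags and is skipped.
sample = json.loads("""
[
  {"id": "123", "category": "Protection", "tags": ["protection", "evil", "refuge"]},
  {"id": "124", "category": "General", "tags": []}
]
""")
docs = prepare_documents(sample)
print(docs[0][0])
```

Each resulting pair would then be passed to the embedding model and written to the vector store.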
Running the Application
Start the FastAPI server
uvicorn main:app --reload --port=8899
Or simply:
python main.py
The API will be available at: http://localhost:8899
Access API Documentation
FastAPI provides automatic interactive API documentation:
- Swagger UI: http://localhost:8899/docs
- ReDoc: http://localhost:8899/redoc
API Endpoints
Health Check
- GET / - Root endpoint with API information
- GET /health - Health check with database connection status
Search
- GET /search?query={query}&k={number} - Search duas using GET request
  - Parameters:
    - query (required): Search query (e.g., "protection from evil")
    - k (optional, default=5): Number of results to return (1-50)
Metadata
- GET /categories - Get all unique categories from the duas collection
Example Request
curl "http://localhost:8899/search?query=protection%20from%20evil&k=5"
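The same request URL can be built from Python. The sketch below uses only the standard library and mirrors the documented parameter rules (query required, k between 1 and 50) on the client side. The helper name build_search_url is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8899"


def build_search_url(query, k=5, base_url=BASE_URL):
    """Build the /search URL, rejecting values the API documents as invalid."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= k <= 50:
        raise ValueError("k must be between 1 and 50")
    # quote_via=quote encodes spaces as %20, matching the curl example.
    params = urlencode({"query": query.strip(), "k": k}, quote_via=quote)
    return f"{base_url}/search?{params}"


url = build_search_url("protection from evil", k=5)
print(url)
```

With the server running, the URL can be fetched with urllib.request.urlopen(url) and the body decoded as JSON.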
Example Response
{
"query": "protection from evil",
"results_count": 5,
"results": [
{
"id": "123",
"arabic": "أَعُوذُ بِكَلِمَاتِ اللَّهِ التَّامَّاتِ",
"transliteration": "A'udhu bikalimatillahit-tammati",
"translation": "I seek refuge in the perfect words of Allah",
"urdu": "میں اللہ کے کامل کلمات کی پناہ چاہتا ہوں",
"romanUrdu": "Main Allah ke kamil kalimat ki panah chahta hoon",
"category": "Protection",
"occasion": "General",
"source": "Sahih Muslim",
"tags": ["protection", "evil", "refuge"],
"similarity_score": 0.8542
}
]
}
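A client will usually keep only a few fields from each result. The sketch below parses a trimmed copy of the response above and drops results under a similarity threshold. The function name summarize_results and the min_score parameter are hypothetical and are not part of the API.

```python
import json


def summarize_results(payload, min_score=0.0):
    """Return (translation, source, score) for results at or above min_score."""
    rows = []
    for item in payload.get("results", []):
        score = item.get("similarity_score", 0.0)
        if score >= min_score:
            rows.append((item.get("translation"), item.get("source"), score))
    return rows


# Trimmed copy of the example response, with one result.
response_text = """
{
  "query": "protection from evil",
  "results_count": 1,
  "results": [
    {"id": "123",
     "translation": "I seek refuge in the perfect words of Allah",
     "source": "Sahih Muslim",
     "similarity_score": 0.8542}
  ]
}
"""
payload = json.loads(response_text)
rows = summarize_results(payload, min_score=0.5)
print(rows)
```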
Project Structure
semantic_search/
├── main.py # FastAPI application with API endpoints
├── generate_dua_tags_embedding.py # Script to generate and store embeddings
├── duas_query.py # Helper script for testing queries
├── duas_directus_published.json # Source data file with duas
├── pyproject.toml # Project dependencies and metadata
├── requirements.txt # Python dependencies
├── .env # Environment variables (not committed)
├── README.md # This file
└── .venv/ # Virtual environment
Key Files Explained
main.py:14-40
Main FastAPI application with:
- API endpoints for semantic search
- Health check endpoints
- CORS middleware configuration
- Vector store initialization
generate_dua_tags_embedding.py:22-54
Embedding generation script that:
- Loads duas from JSON file
- Creates embeddings from tags only
- Stores full metadata in vector database
duas_query.py:21-41
Helper script for testing search functionality programmatically
Development
Testing Search Locally
Use duas_query.py for quick testing:
python duas_query.py
Modify the query and k parameters in the script to test different searches.
Adding New Duas
- Add new duas to duas_directus_published.json
- Run python generate_dua_tags_embedding.py to regenerate embeddings
- Restart the API server
Production Considerations
- Update CORS settings in main.py:21-27 to restrict allowed origins
- Use environment-specific connection strings
- Consider caching for the /categories endpoint
- Implement rate limiting for API endpoints
- Add authentication/authorization if needed
- Use a process manager like Gunicorn with Uvicorn workers
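As one possible shape for the /categories caching suggestion, here is a minimal in-process time-to-live cache. It is a sketch under assumptions: the TTLCache class and load_categories function are hypothetical, and load_categories stands in for whatever query main.py actually runs. With several worker processes, each worker would hold its own copy.

```python
import time


class TTLCache:
    """Cache a single computed value and recompute it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


calls = []


def load_categories():
    # Stand-in for the real database query (hypothetical).
    calls.append(1)
    return ["Protection", "General"]


categories_cache = TTLCache(load_categories, ttl_seconds=300)
categories_cache.get()
categories_cache.get()  # served from the cache; the loader is not called again
print(len(calls))
```

The endpoint handler would then return categories_cache.get() instead of querying the database on every request.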
Troubleshooting
Database Connection Issues
- Verify PostgreSQL is running
- Check CONNECTION_STRING in .env file
- Ensure pgvector extension is installed
Empty Results
- Verify embeddings were generated successfully
- Check if duas_directus_published.json has data
- Ensure COLLECTION_NAME matches in all files
Port Already in Use
Change the port in the uvicorn command:
uvicorn main:app --reload --port=8080
License
[Add your license here]
Contributing
[Add contribution guidelines here]
Contact
[Add contact information here]