Islamic Duas Semantic Search API

A FastAPI-based semantic search engine for Islamic duas (prayers) that uses vector embeddings to find relevant prayers from natural language queries. The system embeds each dua's tags with a sentence-transformers model and runs similarity search over those embeddings in PostgreSQL with the pgvector extension.

Features

  • Semantic Search: Find duas using natural language queries (e.g., "protection from evil", "morning prayers")
  • Vector Embeddings: Uses sentence-transformers/all-MiniLM-L6-v2, which produces 384-dimensional embeddings
  • PostgreSQL + pgvector: Efficient vector similarity search in PostgreSQL
  • RESTful API: FastAPI-powered endpoints with automatic OpenAPI documentation
  • Multi-language Support: Returns duas in Arabic, transliteration, English translation, Urdu, and Roman Urdu
  • Metadata Filtering: Access category, occasion, source, and tags information
  • CORS Enabled: Ready for frontend integration
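
To illustrate what "semantic similarity" means in the features above, the sketch below ranks a few toy tag vectors against a query vector. It is only an illustration under stated assumptions: the three-dimensional vectors are made up (real all-MiniLM-L6-v2 embeddings have 384 dimensions), the function names are hypothetical, and cosine similarity is assumed as the metric; the metric actually used depends on how the vector store is configured.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates, k=5):
    """Return the k (name, score) pairs most similar to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for tag embeddings (illustration only).
toy_vectors = {
    "protection, refuge": [0.9, 0.1, 0.0],
    "morning, gratitude": [0.1, 0.9, 0.2],
    "travel, safety": [0.6, 0.2, 0.7],
}
query = [1.0, 0.0, 0.1]
ranking = rank_by_similarity(query, toy_vectors, k=2)
```

In the real service this ranking happens inside PostgreSQL through pgvector rather than in Python.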

Technology Stack

  • FastAPI: Modern, fast web framework for building APIs
  • LangChain: Framework for working with embeddings and vector stores
  • Sentence Transformers: State-of-the-art text embedding models
  • PostgreSQL + pgvector: Vector database for similarity search
  • Pydantic: Data validation and settings management
  • python-dotenv: Environment variable management

Prerequisites

  • Python 3.13 or higher
  • PostgreSQL with pgvector extension installed
  • UV package manager (recommended) or pip

Installation

1. Clone the repository

git clone <repository-url>
cd semantic_search

2. Install dependencies

Using UV (recommended):

uv sync

Using pip:

pip install -r requirements.txt

3. Activate the virtual environment (uv sync creates it in .venv)

source .venv/bin/activate

Configuration

Environment Variables

Create a .env file in the project root with the following variables:

CONNECTION_STRING=postgresql+psycopg2://username:password@localhost:5432/database_name
COLLECTION_NAME=duas_embeddings

Environment Variables Explained:

  • CONNECTION_STRING: SQLAlchemy-style connection URL for a PostgreSQL database that has the pgvector extension enabled
  • COLLECTION_NAME: Name of the collection/table in which the embeddings are stored
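
A minimal sketch of how these two variables might be read and sanity-checked at startup. The helper name load_settings is hypothetical and not taken from the project; it accepts any mapping so it can be tried without a .env file. In the application itself the .env file would typically be loaded into os.environ by python-dotenv first.

```python
import os
from urllib.parse import urlsplit

REQUIRED_VARS = ("CONNECTION_STRING", "COLLECTION_NAME")


def load_settings(env=None):
    """Validate the two required variables and return them as a dict.

    `env` defaults to os.environ; pass a plain dict to test the function.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    parts = urlsplit(env["CONNECTION_STRING"])
    # Accepts "postgresql://" as well as driver-qualified forms
    # such as "postgresql+psycopg2://".
    if not parts.scheme.startswith("postgresql"):
        raise ValueError("CONNECTION_STRING must use a postgresql scheme")
    if not parts.path.strip("/"):
        raise ValueError("CONNECTION_STRING must name a database")

    return {
        "connection_string": env["CONNECTION_STRING"],
        "collection_name": env["COLLECTION_NAME"],
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

Failing early with a clear message is usually easier to debug than a connection error raised later by the database driver.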

Database Setup

  1. Ensure you have a PostgreSQL database created and accessible with the credentials in your .env file.
  2. Enable the pgvector extension in that database (extensions are enabled per database):
     CREATE EXTENSION vector;

Data Preparation

Initial Setup: Generate Embeddings

Before running the API, you need to generate embeddings from your duas data:

python generate_dua_tags_embedding.py

This script:

  • Reads duas from duas_directus_published.json
  • Generates vector embeddings from dua tags
  • Stores embeddings in PostgreSQL with pgvector
  • Preserves all metadata (Arabic text, translation, category, etc.)

Note: Ensure duas_directus_published.json exists in the project root before running this script.
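
The "embed the tags, keep everything else as metadata" step described above can be sketched as follows. This is an assumption-laden illustration, not the script's actual code: build_document and documents_from_records are hypothetical names, the field names are borrowed from the example API response, and the real script may join tags differently.

```python
def build_document(dua):
    """Split one dua record into (text_to_embed, metadata).

    Only the tags become the embedded text; the full record is kept
    as metadata so it can be returned with search results.
    """
    raw_tags = dua.get("tags") or []
    tags = [str(tag).strip() for tag in raw_tags if str(tag).strip()]
    text = ", ".join(tags)
    metadata = dict(dua)  # shallow copy so the input record is not mutated
    return text, metadata


def documents_from_records(records):
    """Build (text, metadata) pairs, skipping duas that have no tags."""
    documents = []
    for dua in records:
        text, metadata = build_document(dua)
        if text:
            documents.append((text, metadata))
    return documents


# Record shaped like the example response in this README.
sample_dua = {
    "id": "123",
    "translation": "I seek refuge in the perfect words of Allah",
    "category": "Protection",
    "occasion": "General",
    "source": "Sahih Muslim",
    "tags": ["protection", "evil", "refuge"],
}
```

Because only the tags are embedded, a dua without tags cannot be found by search, which is why the sketch skips such records instead of storing an empty text.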

Running the Application

Start the FastAPI server

uvicorn main:app --reload --port=8899

Or simply:

python main.py

The API will be available at: http://localhost:8899

Access API Documentation

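FastAPI provides automatic interactive API documentation. With FastAPI's default settings it is served at:

  • Swagger UI: http://localhost:8899/docs
  • ReDoc: http://localhost:8899/redoc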

API Endpoints

Health Check

  • GET / - Root endpoint with API information
  • GET /health - Health check with database connection status

Search

  • GET /search?query={query}&k={number} - Search duas using GET request
    • Parameters:
      • query (required): Search query (e.g., "protection from evil")
      • k (optional, default=5): Number of results to return (1-50)
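
A small client-side sketch for building a valid search URL from the parameters listed above. The helper name build_search_url is hypothetical; the 1-50 bound on k mirrors the range documented here.

```python
from urllib.parse import quote, urlencode


def build_search_url(base_url, query, k=5):
    """Return the GET /search URL for a query, validating k locally."""
    if not query or not query.strip():
        raise ValueError("query must not be empty")
    if not 1 <= k <= 50:
        raise ValueError("k must be between 1 and 50")
    # quote_via=quote encodes spaces as %20 instead of "+".
    params = urlencode({"query": query.strip(), "k": k}, quote_via=quote)
    return f"{base_url.rstrip('/')}/search?{params}"


url = build_search_url("http://localhost:8899", "protection from evil", k=5)
```

With the server running, the resulting URL can be passed to urllib.request.urlopen or any other HTTP client.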

Metadata

  • GET /categories - Get all unique categories from the duas collection

Example Request

curl "http://localhost:8899/search?query=protection%20from%20evil&k=5"

Example Response

{
  "query": "protection from evil",
  "results_count": 5,
  "results": [
    {
      "id": "123",
      "arabic": "أَعُوذُ بِكَلِمَاتِ اللَّهِ التَّامَّاتِ",
      "transliteration": "A'udhu bikalimatillahit-tammati",
      "translation": "I seek refuge in the perfect words of Allah",
      "urdu": "میں اللہ کے کامل کلمات کی پناہ چاہتا ہوں",
      "romanUrdu": "Main Allah ke kamil kalimat ki panah chahta hoon",
      "category": "Protection",
      "occasion": "General",
      "source": "Sahih Muslim",
      "tags": ["protection", "evil", "refuge"],
      "similarity_score": 0.8542
    }
  ]
}
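
A response shaped like the one above could be post-processed on the client as sketched here. The helper best_matches and its min_score threshold are hypothetical, a client-side choice rather than something the API provides; the sample payload is a trimmed copy of the example response.

```python
import json


def best_matches(payload, min_score=0.5):
    """Return (id, translation, score) tuples at or above min_score,
    ordered from most to least similar."""
    results = payload.get("results", [])
    kept = [r for r in results if r.get("similarity_score", 0.0) >= min_score]
    kept.sort(key=lambda r: r["similarity_score"], reverse=True)
    return [(r["id"], r["translation"], r["similarity_score"]) for r in kept]


# Trimmed copy of the example response from this README.
sample = json.loads("""
{
  "query": "protection from evil",
  "results_count": 1,
  "results": [
    {
      "id": "123",
      "translation": "I seek refuge in the perfect words of Allah",
      "similarity_score": 0.8542
    }
  ]
}
""")

matches = best_matches(sample, min_score=0.8)
```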

Project Structure

semantic_search/
├── main.py                           # FastAPI application with API endpoints
├── generate_dua_tags_embedding.py    # Script to generate and store embeddings
├── duas_query.py                     # Helper script for testing queries
├── duas_directus_published.json      # Source data file with duas
├── pyproject.toml                    # Project dependencies and metadata
├── requirements.txt                  # Python dependencies
├── .env                              # Environment variables (not committed)
├── README.md                         # This file
└── .venv/                            # Virtual environment

Key Files Explained

main.py:14-40

Main FastAPI application with:

  • API endpoints for semantic search
  • Health check endpoints
  • CORS middleware configuration
  • Vector store initialization

generate_dua_tags_embedding.py:22-54

Embedding generation script that:

  • Loads duas from JSON file
  • Creates embeddings from tags only
  • Stores full metadata in vector database

duas_query.py:21-41

Helper script for testing search functionality programmatically

Development

Testing Search Locally

Use duas_query.py for quick testing:

python duas_query.py

Modify the query and k parameters in the script to test different searches.

Adding New Duas

  1. Add new duas to duas_directus_published.json
  2. Run python generate_dua_tags_embedding.py to regenerate embeddings
  3. Restart the API server
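
Step 1 can be scripted. The sketch below assumes that duas_directus_published.json holds a top-level JSON array of dua objects, each with an "id" field; that layout is an assumption and should be checked against the actual file. The helper name append_dua is hypothetical.

```python
import json
from pathlib import Path


def append_dua(path, dua):
    """Append one dua to a JSON array file and return the new record count.

    Refuses to add a record whose id already exists in the file.
    """
    path = Path(path)
    records = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of duas")
    if any(str(r.get("id")) == str(dua.get("id")) for r in records):
        raise ValueError(f"a dua with id {dua.get('id')!r} already exists")
    records.append(dua)
    # ensure_ascii=False keeps Arabic and Urdu text readable in the file.
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return len(records)
```

After appending, steps 2 and 3 still apply: the embeddings must be regenerated and the server restarted before the new dua becomes searchable.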

Production Considerations

  • Update CORS settings in main.py:21-27 to restrict allowed origins
  • Use environment-specific connection strings
  • Consider caching for the /categories endpoint
  • Implement rate limiting for API endpoints
  • Add authentication/authorization if needed
  • Use a process manager like Gunicorn with Uvicorn workers
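
The caching suggestion for /categories could look roughly like the sketch below: a tiny time-based cache that reloads its value only after it expires. TTLCache and load_categories are hypothetical names, the 300-second lifetime is an arbitrary example, and the placeholder loader stands in for whatever database query the endpoint actually runs.

```python
import time


class TTLCache:
    """Cache one computed value and refresh it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # function that computes the fresh value
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._value = None
        self._expires_at = None    # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


def load_categories():
    # Placeholder: the real endpoint would query the duas collection.
    return ["Protection"]


categories_cache = TTLCache(load_categories, ttl_seconds=300)
categories = categories_cache.get()
```

Since categories only change when embeddings are regenerated, even a long lifetime is likely safe; restarting the server, as the Adding New Duas steps require, clears the cache anyway.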

Troubleshooting

Database Connection Issues

  • Verify PostgreSQL is running
  • Check CONNECTION_STRING in .env file
  • Ensure pgvector extension is installed

Empty Results

  • Verify embeddings were generated successfully
  • Check if duas_directus_published.json has data
  • Ensure the API runs with the same COLLECTION_NAME that was used when the embeddings were generated

Port Already in Use

Change the port in the uvicorn command:

uvicorn main:app --reload --port=8080

License

[Add your license here]

Contributing

[Add contribution guidelines here]

Contact

[Add contact information here]
