A FastAPI-based semantic search engine for Islamic duas (prayers) that uses vector embeddings to find relevant prayers from natural language queries. The system embeds each dua's tags with a sentence-transformers model and runs similarity search over those embeddings in PostgreSQL with the pgvector extension.

Features

Semantic Search: Find duas using natural language queries (e.g., "protection from evil", "morning prayers")
Vector Embeddings: Uses sentence-transformers/all-MiniLM-L6-v2, which maps text to 384-dimensional vectors
PostgreSQL + pgvector: Efficient vector similarity search in PostgreSQL
RESTful API: FastAPI-powered endpoints with automatic OpenAPI documentation
Multi-language Support: Returns duas in Arabic, transliteration, English translation, Urdu, and Roman Urdu
Metadata Access: Each result includes its category, occasion, source, and tags
CORS Enabled: Ready for frontend integration

Technology Stack

FastAPI: Modern, fast web framework for building APIs
LangChain: Framework for working with embeddings and vector stores
Sentence Transformers: State-of-the-art text embedding models
PostgreSQL + pgvector: Vector database for similarity search
Pydantic: Data validation and settings management
python-dotenv: Environment variable management

Prerequisites

Python 3.13 or higher
PostgreSQL with pgvector extension installed
UV package manager (recommended) or pip

Installation

1. Clone the repository

git clone <repository-url>
cd semantic_search

2. Install dependencies

Using UV (recommended):

uv sync

Using pip (create and activate a virtual environment first, for example with python -m venv .venv):

pip install -r requirements.txt

3. Activate virtual environment

source .venv/bin/activate

Configuration

Environment Variables

Create a .env file in the project root with the following variables:

CONNECTION_STRING=postgresql+psycopg2://username:password@localhost:5432/database_name
COLLECTION_NAME=duas_embeddings

Environment Variables Explained:

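CONNECTION_STRING: SQLAlchemy-style connection URL for a PostgreSQL database that has the pgvector extension enabled
COLLECTION_NAME: Name of the collection in which the embeddings are stored; the embedding script and the API must use the same value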
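
A malformed CONNECTION_STRING is an easy mistake to make. The following is a minimal sketch, using only the standard library, of how the URL could be sanity-checked before the application tries to connect. The helper name check_connection_string is hypothetical and is not part of this project.

```python
from urllib.parse import urlsplit


def check_connection_string(url: str) -> dict:
    """Split a SQLAlchemy-style PostgreSQL URL into its parts.

    Hypothetical helper for illustration; raises ValueError on obvious mistakes.
    """
    parts = urlsplit(url)
    # The scheme looks like "postgresql" or "postgresql+psycopg2".
    dialect = parts.scheme.split("+", 1)[0]
    if dialect != "postgresql":
        raise ValueError(f"expected a postgresql URL, got scheme {parts.scheme!r}")
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("connection string must include a host and a database name")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": database,
    }
```

Calling it with the value read from .env either returns the parsed parts or fails early with a readable error, rather than a driver-level exception at first query.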

Database Setup

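Enable the pgvector extension in the target database (the extension itself must already be installed on the PostgreSQL server):

CREATE EXTENSION IF NOT EXISTS vector;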

Ensure you have a PostgreSQL database created and accessible with the credentials in your .env file.

Data Preparation

Initial Setup: Generate Embeddings

Before running the API, you need to generate embeddings from your duas data:

python generate_dua_tags_embedding.py

This script:

Reads duas from duas_directus_published.json
Generates vector embeddings from dua tags
Stores embeddings in PostgreSQL with pgvector
Preserves all metadata (Arabic text, translation, category, etc.)

Note: Ensure duas_directus_published.json exists in the project root before running this script.
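
The split between "what gets embedded" and "what gets stored" described above can be sketched as follows. This is an illustration only, not the code in generate_dua_tags_embedding.py: the function name build_record is hypothetical, the field names are taken from the example API response, and joining tags with a comma is an assumption.

```python
from typing import Any


def build_record(dua: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (text_to_embed, metadata) for one dua.

    Only the tags are turned into the text that is embedded; the full
    record is kept as metadata so it can be returned with search results.
    """
    tags = dua.get("tags") or []
    text_to_embed = ", ".join(tags)  # assumption: tags are joined into one string
    metadata = dict(dua)  # shallow copy, so the original record is not mutated
    return text_to_embed, metadata


sample = {
    "id": "123",
    "translation": "I seek refuge in the perfect words of Allah",
    "category": "Protection",
    "tags": ["protection", "evil", "refuge"],
}
text, metadata = build_record(sample)
print(text)  # protection, evil, refuge
```

Because only the tags are embedded, search quality depends on how well the tags describe each dua; the Arabic text and translations do not influence ranking.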

Running the Application

Start the FastAPI server

uvicorn main:app --reload --port=8899

Or simply:

python main.py

The API will be available at: http://localhost:8899

Access API Documentation

FastAPI provides automatic interactive API documentation:

Swagger UI: http://localhost:8899/docs
ReDoc: http://localhost:8899/redoc

API Endpoints

Health Check

GET / - Root endpoint with API information
GET /health - Health check with database connection status

Search

GET /search?query={query}&k={number} - Search duas by semantic similarity to the query
- Parameters:
  - query (required): Search query (e.g., "protection from evil")
  - k (optional, default=5): Number of results to return (1-50)
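
As a sketch of how a client might build a valid request, the helper below URL-encodes the query and enforces the documented range for k before any network call is made. The name build_search_url is hypothetical; the base URL assumes the server is running locally on port 8899 as described in this README.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8899"  # assumption: local server on the README's port


def build_search_url(query: str, k: int = 5, base_url: str = BASE_URL) -> str:
    """Build the /search URL, validating parameters on the client side."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if not 1 <= k <= 50:
        raise ValueError("k must be between 1 and 50")
    # quote_via=quote encodes spaces as %20 instead of "+"
    params = urlencode({"query": query, "k": k}, quote_via=quote)
    return f"{base_url}/search?{params}"


print(build_search_url("protection from evil", k=5))
```

The resulting URL can be passed to urllib.request.urlopen or any HTTP client; the server still performs its own validation.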

Metadata

GET /categories - Get all unique categories from the duas collection

Example Request

curl "http://localhost:8899/search?query=protection%20from%20evil&k=5"

Example Response (results truncated to one entry)

{
  "query": "protection from evil",
  "results_count": 5,
  "results": [
    {
      "id": "123",
      "arabic": "أَعُوذُ بِكَلِمَاتِ اللَّهِ التَّامَّاتِ",
      "transliteration": "A'udhu bikalimatillahit-tammati",
      "translation": "I seek refuge in the perfect words of Allah",
      "urdu": "میں اللہ کے کامل کلمات کی پناہ چاہتا ہوں",
      "romanUrdu": "Main Allah ke kamil kalimat ki panah chahta hoon",
      "category": "Protection",
      "occasion": "General",
      "source": "Sahih Muslim",
      "tags": ["protection", "evil", "refuge"],
      "similarity_score": 0.8542
    }
  ]
}
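
A client will often want to discard weak matches. The sketch below parses a response of the shape shown above and keeps only results at or above a score threshold. The function name filter_results and the 0.5 threshold are illustrative choices, and it assumes that a higher similarity_score means a closer match.

```python
import json


def filter_results(response_text: str, min_score: float = 0.5) -> list[dict]:
    """Return results with similarity_score >= min_score, best match first."""
    payload = json.loads(response_text)
    results = payload.get("results", [])
    strong = [r for r in results if r.get("similarity_score", 0.0) >= min_score]
    return sorted(strong, key=lambda r: r.get("similarity_score", 0.0), reverse=True)
```

A suitable threshold depends on the data and should be tuned against real queries rather than taken from this example.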

Project Structure

semantic_search/
├── main.py                           # FastAPI application with API endpoints
├── generate_dua_tags_embedding.py    # Script to generate and store embeddings
├── duas_query.py                     # Helper script for testing queries
├── duas_directus_published.json      # Source data file with duas
├── pyproject.toml                    # Project dependencies and metadata
├── requirements.txt                  # Python dependencies
├── .env                              # Environment variables (not committed)
├── README.md                         # This file
└── .venv/                            # Virtual environment

Key Files Explained

main.py:14-40

Main FastAPI application with:

API endpoints for semantic search
Health check endpoints
CORS middleware configuration
Vector store initialization

generate_dua_tags_embedding.py:22-54

Embedding generation script that:

Loads duas from JSON file
Creates embeddings from tags only
Stores full metadata in vector database

duas_query.py:21-41

Helper script for testing search functionality programmatically

Development

Testing Search Locally

Use duas_query.py for quick testing:

python duas_query.py

Modify the query and k parameters in the script to test different searches.

Adding New Duas

1. Add new duas to duas_directus_published.json
2. Run python generate_dua_tags_embedding.py to regenerate embeddings
3. Restart the API server

Production Considerations

Update CORS settings in main.py:21-27 to restrict allowed origins
Use environment-specific connection strings
Consider caching for the /categories endpoint
Implement rate limiting for API endpoints
Add authentication/authorization if needed
Use a process manager like Gunicorn with Uvicorn workers
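
For the caching suggestion above, a small time-based cache is often enough, since the set of categories only changes when embeddings are regenerated. The following is a minimal sketch and not code from main.py: the class name TTLCache, the 300-second default, and the loader callback are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Cache one computed value and refresh it after ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[], list[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # function that runs the expensive query
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested
        self._value: Optional[list[str]] = None
        self._expires_at = 0.0

    def get(self) -> list[str]:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache miss or expired entry: reload and reset the expiry time.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

The /categories handler would call get() instead of querying the database on every request. This sketch has no locking, and each worker process keeps its own copy, so multi-worker deployments may prefer a shared cache.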

Troubleshooting

Database Connection Issues

Verify PostgreSQL is running
Check CONNECTION_STRING in .env file
Ensure pgvector extension is installed

Empty Results

Verify embeddings were generated successfully
Check if duas_directus_published.json has data
Ensure COLLECTION_NAME matches in all files
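
Because embeddings are built from tags, duas without tags are a likely cause of missing results. The sketch below counts how many entries carry tags. It assumes the source file is a top-level JSON array of objects with a "tags" field, which this README does not confirm; the function name summarize_duas is hypothetical.

```python
def summarize_duas(duas: list) -> dict:
    """Count entries with and without tags in already-loaded dua data."""
    if not isinstance(duas, list):
        raise ValueError("expected a top-level JSON array of duas")
    # An entry counts as tagged only if it is an object with a non-empty "tags" value.
    with_tags = sum(1 for d in duas if isinstance(d, dict) and d.get("tags"))
    return {
        "total": len(duas),
        "with_tags": with_tags,
        "without_tags": len(duas) - with_tags,
    }
```

To use it, load duas_directus_published.json with json.load and pass the result in. A total of 0, or a large without_tags count, points at the data rather than the database.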

Port Already in Use

Change the port in the uvicorn command:

uvicorn main:app --reload --port=8080

License

[Add your license here]

Contributing

[Add contribution guidelines here]

Contact

[Add contact information here]