MiruIQ Hub

A Quarkus-based REST API for intelligent document data extraction powered by LLMs. Upload documents, define extraction schemas, and get structured data back.

Features

  • Document Extraction - Upload PDFs/images and extract structured data using AI
  • Schema Management - Create, store, and reuse extraction schemas with $ref support
  • AI Schema Generation - Generate schemas from natural language descriptions or sample documents
  • Data Validation - Validate extracted data with text, numeric, and semantic validation strategies
  • Multi-tenant - JWT and API key authentication with per-user rate limiting

Tech Stack

  • Framework: Quarkus 3.19
  • Language: Java 17
  • Storage: Apache Paimon (data lake), MinIO/S3 (files), PostgreSQL (metadata)
  • AI: OpenAI-compatible API (GPT-4, models routed through OpenRouter, local models, etc.)
  • PDF Processing: Apache PDFBox

Quick Start

Prerequisites

  • Java 17+
  • Docker (for MinIO and PostgreSQL)
  • An OpenAI-compatible API key

Configuration

Set the following environment variables:

# Required - OpenAI Configuration
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4o-mini"

# Optional - Database (defaults to local Docker)
export JDBC_URL="localhost:5432"
export JDBC_DATABASE="document_store"
export JDBC_USERNAME="paimon"
export JDBC_PASSWORD="paimon"

# Optional - S3/MinIO (defaults to local MinIO)
export S3_ENDPOINT="http://localhost:9000"
export S3_BUCKET="media-store"
export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"
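
Because only the three OpenAI variables are marked as required, a small preflight check can catch a missing one before the service starts. The following is a minimal sketch, not part of MiruIQ Hub itself; the class name `EnvPreflight` and its method are hypothetical, and only the variable names come from the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which required variables are unset or blank.
public class EnvPreflight {
    // Variable names marked "Required" in the configuration list above.
    static final List<String> REQUIRED =
            List.of("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL");

    // Returns the required names that are absent or blank in the given environment,
    // in the same order as REQUIRED.
    public static List<String> missing(Map<String, String> env) {
        List<String> unset = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                unset.add(name);
            }
        }
        return unset;
    }

    public static void main(String[] args) {
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("All required variables are set.");
        } else {
            System.out.println("Missing required variables: " + String.join(", ", unset));
        }
    }
}
```

The optional database and S3 variables are left out of the check on purpose, since the service is described as falling back to local defaults for them.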

Run in Development Mode

./mvnw quarkus:dev

The API will be available at http://localhost:8082.

Run with Docker

docker build -t miruiq-hub .
docker run -p 8082:8082 \
  -e OPENAI_API_KEY="your-key" \
  -e OPENAI_BASE_URL="https://api.openai.com/v1" \
  -e OPENAI_MODEL="gpt-4o-mini" \
  miruiq-hub

API Reference

Authentication

All endpoints require authentication via one of:

  • X-API-Key header carrying an API key, or
  • Authorization header carrying a JWT (Bearer token)
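
As an illustration of the two options, the sketch below builds a request with each header using the JDK's `java.net.http` API. The class and method names are hypothetical, and the `/schemas` URL is only a convenient target taken from the endpoint list; nothing is sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper showing the two authentication headers side by side.
public class AuthHeaders {
    // Option 1: API key in the X-API-Key header.
    public static HttpRequest withApiKey(String url, String apiKey) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("X-API-Key", apiKey)
                .GET()
                .build();
    }

    // Option 2: token in the Authorization header, using the Bearer scheme.
    public static HttpRequest withBearer(String url, String token) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String url = "http://localhost:8082/schemas";
        // Print the header maps only; no request is sent.
        System.out.println(withApiKey(url, "your-api-key").headers().map());
        System.out.println(withBearer(url, "your-jwt").headers().map());
    }
}
```

Either request can then be passed to `HttpClient.newHttpClient().send(...)` with a body handler of your choice.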

Endpoints

Extractions

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /extractions | Create extraction job (multipart: schema + documents) |
| GET | /extractions/{id} | Get extraction status and results |
| GET | /extractions?label={label} | Find extraction by label |
| POST | /extractions/{id}/validate | Validate extraction results |
| GET | /extractions/{id}/validation | Get validation results |
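
Labels are passed as a query parameter, so they should be URL-encoded when they may contain spaces or other reserved characters. The following sketch builds the extraction URLs listed above; the class and method names are hypothetical, and it assumes extraction IDs are already URL-safe (the ID format is not documented here).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds URLs for the extraction endpoints.
public class ExtractionUrls {
    // Removes one trailing slash so paths can be appended uniformly.
    private static String trim(String baseUrl) {
        return baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    // GET /extractions/{id}; assumes the id needs no encoding.
    public static URI byId(String baseUrl, String id) {
        return URI.create(trim(baseUrl) + "/extractions/" + id);
    }

    // GET /extractions?label={label}; the label is form-encoded for the query string.
    public static URI byLabel(String baseUrl, String label) {
        String encoded = URLEncoder.encode(label, StandardCharsets.UTF_8);
        return URI.create(trim(baseUrl) + "/extractions?label=" + encoded);
    }

    // GET /extractions/{id}/validation; assumes the id needs no encoding.
    public static URI validation(String baseUrl, String id) {
        return URI.create(trim(baseUrl) + "/extractions/" + id + "/validation");
    }

    public static void main(String[] args) {
        String base = "http://localhost:8082";
        System.out.println(byId(base, "some-id"));
        System.out.println(byLabel(base, "invoice 001"));
        System.out.println(validation(base, "some-id"));
    }
}
```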

Schemas

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /schemas | Create schema template |
| GET | /schemas | List all schemas |
| GET | /schemas/{id} | Get schema by ID |
| GET | /schemas/{id}/resolved | Get schema with $ref expanded |
| GET | /schemas/by-name/{name} | Get schema by name |
| PUT | /schemas/{id} | Update schema |
| DELETE | /schemas/{id} | Delete schema |
| POST | /schemas/generate | Generate schema from description/images |

Validation

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /validation/validate/{requestId} | Validate by request ID |
| POST | /validation/validate/by-label/{label} | Validate by label |
| POST | /validation/validate/multi | Validate multiple extractions |
| GET | /validation/results/{requestId} | Get validation results |

Example: Create an Extraction

curl -X POST http://localhost:8082/extractions \
  -H "X-API-Key: your-api-key" \
  -F 'schema={
    "type": "object",
    "properties": {
      "invoice_number": { "type": "string" },
      "total_amount": { "type": "number" },
      "date": { "type": "string", "format": "date" }
    }
  }' \
  -F "documents=@invoice.pdf" \
  -F "label=invoice-001"
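
For callers on the JVM, the same multipart request can be assembled by hand, since `java.net.http` has no built-in multipart support. This is a minimal sketch under that assumption; `MultipartFormBody` is a hypothetical helper, and the part names (`schema`, `documents`, `label`) mirror the curl example above. It does not escape quotes in names or filenames.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that assembles a multipart/form-data body in memory.
public class MultipartFormBody {
    private final String boundary;
    private final ByteArrayOutputStream parts = new ByteArrayOutputStream();

    public MultipartFormBody(String boundary) {
        this.boundary = boundary;
    }

    // Adds a plain text field, like curl's -F "label=invoice-001".
    public MultipartFormBody field(String name, String value) {
        writeText("--" + boundary + "\r\n");
        writeText("Content-Disposition: form-data; name=\"" + name + "\"\r\n\r\n");
        writeText(value + "\r\n");
        return this;
    }

    // Adds a file part, like curl's -F "documents=@invoice.pdf".
    public MultipartFormBody file(String name, String filename, String contentType, byte[] content) {
        writeText("--" + boundary + "\r\n");
        writeText("Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + filename + "\"\r\n");
        writeText("Content-Type: " + contentType + "\r\n\r\n");
        parts.writeBytes(content);
        writeText("\r\n");
        return this;
    }

    // Value for the request's Content-Type header.
    public String contentType() {
        return "multipart/form-data; boundary=" + boundary;
    }

    // Returns all parts followed by the closing delimiter.
    public byte[] toBytes() {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.writeBytes(parts.toByteArray());
        body.writeBytes(("--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return body.toByteArray();
    }

    private void writeText(String text) {
        parts.writeBytes(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String schema = "{\"type\":\"object\",\"properties\":{\"invoice_number\":{\"type\":\"string\"}}}";
        MultipartFormBody body = new MultipartFormBody("miruiq-example-boundary")
                .field("schema", schema)
                .field("label", "invoice-001")
                // Placeholder bytes; a real caller would read the PDF from disk.
                .file("documents", "invoice.pdf", "application/pdf",
                        "%PDF".getBytes(StandardCharsets.US_ASCII));
        System.out.println(body.contentType());
        System.out.println(body.toBytes().length + " bytes");
    }
}
```

To send it, pass `HttpRequest.BodyPublishers.ofByteArray(body.toBytes())` to `POST(...)` on an `HttpRequest` builder and set the `Content-Type` header to `body.contentType()`, alongside the `X-API-Key` header.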

Example: Generate a Schema with AI

curl -X POST http://localhost:8082/schemas/generate \
  -H "X-API-Key: your-api-key" \
  -F "description=Extract customer name, order items with quantities and prices, and total amount" \
  -F "images=@sample-receipt.jpg"

Testing

# Run unit tests only (default)
./mvnw test

# Run unit + LLM tests (requires a running LLM server)
./mvnw test -Pwith-llm

# Run all tests (requires a running LLM server and the Flink pipeline)
./mvnw test -Pfull

Configuration Reference

| Property | Default | Description |
|----------|---------|-------------|
| HTTP_PORT | 8082 | Main API port |
| PDF_MAX_PAGES | 100 | Maximum pages per PDF |
| MAX_FILES_PER_REQUEST | 10 | Max documents per extraction |
| RATE_LIMIT_PER_USER_PER_MINUTE | 1000 | Rate limit per user |
| HTTP_MAX_BODY_SIZE | 100M | Maximum upload size |
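
A client can check a batch against these defaults before uploading, rather than waiting for the server to reject it. The sketch below is hypothetical client-side code: it hard-codes the default values from the table, assumes `100M` means 100 × 1024 × 1024 bytes, and ignores multipart overhead, so treat the size check as approximate.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical client-side check against the documented default limits.
public class UploadPreflight {
    static final int MAX_FILES_PER_REQUEST = 10;
    // Assumption: "100M" is interpreted as 100 MiB.
    static final long HTTP_MAX_BODY_SIZE = 100L * 1024 * 1024;

    // Returns a description of the first limit exceeded, or empty if the batch fits.
    public static Optional<String> problem(List<Long> fileSizesInBytes) {
        if (fileSizesInBytes.size() > MAX_FILES_PER_REQUEST) {
            return Optional.of("too many files: " + fileSizesInBytes.size()
                    + " > " + MAX_FILES_PER_REQUEST);
        }
        long total = 0;
        for (long size : fileSizesInBytes) {
            total += size;
        }
        if (total > HTTP_MAX_BODY_SIZE) {
            return Optional.of("upload too large: " + total
                    + " bytes > " + HTTP_MAX_BODY_SIZE + " bytes");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Two files of 5 MiB and 20 MiB fit within both limits.
        List<Long> sizes = List.of(5L * 1024 * 1024, 20L * 1024 * 1024);
        System.out.println(problem(sizes).orElse("batch is within the default limits"));
    }
}
```

If the server is started with different values for these properties, the constants need to be changed to match.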

License

Proprietary - MiruIQ
