Souko.ai API Documentation

v1.0

Welcome to Souko.ai's API documentation. Our platform provides clean, reliable web content extraction and structured data processing designed specifically for AI applications and autonomous agents.

This v1 documentation covers our core features: clean content extraction, custom structured data extraction, and intelligent web search with processing.

Authentication

All API requests require an API key passed in the Authorization header:

HTTP

Authorization: Bearer YOUR_API_KEY

Getting Your API Key

Sign up at dashboard.souko.ai to get your API key. New accounts receive 200 free credits to get started.

Direct HTTP Access

Our API uses standard HTTP methods and JSON. Use any HTTP client or library in your preferred language; no special SDK is required.
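
As a minimal sketch of direct HTTP access, the snippet below builds an authenticated JSON request using only Python's standard library. The base URL and the /v1/content path are taken from the curl examples in this document; the helper name build_request is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.souko.ai"  # base URL as used in the curl examples


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_request(
    "/v1/content",
    {"source": {"url": "https://example.com"}},
    "YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
```

Once a real key is in place, the request can be sent with urllib.request.urlopen(req) and the response body decoded with json.loads.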

Rate Limits

Rate limits vary by plan:

  • Free Trial: 1 request per second
  • Basic Plan: 2 requests per second
  • Pro Plan: 10 requests per second
  • Scale Plan: 50 requests per second

Rate limit headers are included in all responses:

HTTP

X-RateLimit-Limit: 10
X-RateLimit-Remaining: 9
X-RateLimit-Reset: 1640995200
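
A client can use these headers to decide whether to pause before the next call. The sketch below assumes that X-RateLimit-Reset is a Unix timestamp in seconds, which the sample value suggests but this document does not state; the function name seconds_until_retry is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, based on rate limit headers.

    Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota is left in the current window, no need to wait
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


sample = {
    "X-RateLimit-Limit": "10",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1640995200",
}
print(seconds_until_retry(sample, now=1640995190.0))  # 10.0
```

Header names are matched case-sensitively here because a plain mapping is used; most HTTP client libraries expose a case-insensitive header object that can be passed in instead.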

Credit System

Souko.ai uses a simple credit-based pricing system where 1 credit = 1 cent. Each operation has a base cost, and additional outputs may incur extra costs:

Operation/Output   | Cost           | Best For
Content Processing | 1 credit base  | Processing any web page
markdown           | +0 credits     | Clean content from articles, blogs, documentation
metadata           | +0 credits     | Title, description, word count, and page statistics
structured         | +2 credits     | Complex pages like e-commerce, dashboards, apps
extract            | +2 credits     | AI-powered extraction with custom queries and schemas
Web Search         | 2 credits base | Searching the web + optional result processing

Processing search results: Each processed result adds 1 base credit plus the cost of any outputs requested for it.

Pricing Examples

  • Basic content extraction (markdown + metadata): 1 credit
  • Product data extraction (markdown + extract): 3 credits (1 base + 2 for extract)
  • Simple web search (5 results, no processing): 2 credits
  • Search + process 3 results (with markdown): 5 credits (2 base + 3 × 1 processing)
  • Search + extract from 2 results: 8 credits (2 base + 2 × (1 + 2) processing)
  • Structured page analysis (structured output): 3 credits (1 base + 2 for structured)

Free Trial

New accounts receive 200 free credits. No credit card required to get started.
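
The credit rules in this section can be checked with a small calculator. This is a sketch that encodes only the costs listed above; the function names content_cost and search_cost are hypothetical and are not part of the API.

```python
# Costs copied from the credit table in this section.
OUTPUT_COSTS = {"markdown": 0, "metadata": 0, "structured": 2, "extract": 2}
CONTENT_BASE = 1  # base cost of processing one page
SEARCH_BASE = 2   # base cost of one web search


def content_cost(outputs):
    """Credits for processing one page with the given output types."""
    return CONTENT_BASE + sum(OUTPUT_COSTS[name] for name in outputs)


def search_cost(processed_results=0, outputs=()):
    """Credits for one search, plus per-result processing when requested."""
    return SEARCH_BASE + processed_results * content_cost(outputs)


print(content_cost(["markdown", "metadata"]))  # 1
print(content_cost(["markdown", "extract"]))   # 3
print(search_cost())                           # 2
print(search_cost(3, ["markdown"]))            # 5
print(search_cost(2, ["extract"]))             # 8
```

The printed values match the pricing examples above. At 1 credit = 1 cent, the 200 free credits correspond to 2.00 in currency units of 100 cents.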

Content Processing

The content processing endpoint extracts information from web pages in different formats depending on your needs. Choose the right output types for your use case.

POST /v1/content

Parameter          | Type   | Description
source (required)  | object | Source configuration. Currently supports { url: "..." }
outputs (optional) | array  | Array of desired outputs. Defaults to ["markdown", "metadata"]

Available Output Types

markdown (Free)

Best for: Articles, blog posts, documentation, news content

Returns clean, LLM-optimized Markdown suited to training data or content analysis. Navigation, ads, and other clutter are removed automatically so that only the main content remains.

metadata (Free)

Best for: Content analysis, SEO data, page statistics

Extracts title, description, author, word count, heading count, image count, and other structural information about the page.

structured (+2 credits)

Best for: E-commerce sites, web apps, dashboards, forms

Returns a structured representation of the page showing interactive elements, buttons, forms, and navigation. Perfect for understanding complex page layouts and user interfaces.

extract (+2 credits)

Best for: Custom data extraction with AI

Use natural language to describe what you want to extract. Optionally provide a JSON schema for structured output. Powered by advanced AI models.
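
The request parameters and output types described above can be assembled with a small helper. This is a sketch, not an official client: the names build_content_payload and output_name are hypothetical, and the single-key object form for an output entry is assumed from the extract example later in this document.

```python
DEFAULT_OUTPUTS = ["markdown", "metadata"]  # documented default
KNOWN_OUTPUTS = {"markdown", "metadata", "structured", "extract"}


def output_name(item):
    """Return the output type name for a string entry or a single-key object entry."""
    if isinstance(item, str):
        return item
    if isinstance(item, dict) and len(item) == 1:
        return next(iter(item))
    raise ValueError(f"unsupported output entry: {item!r}")


def build_content_payload(url, outputs=None):
    """Build the JSON body for the content endpoint, applying the default outputs."""
    chosen = list(DEFAULT_OUTPUTS) if outputs is None else list(outputs)
    for item in chosen:
        name = output_name(item)
        if name not in KNOWN_OUTPUTS:
            raise ValueError(f"unknown output type: {name}")
    return {"source": {"url": url}, "outputs": chosen}


print(build_content_payload("https://example.com/article"))
```

Validating output names on the client side is a design choice of this sketch; it catches typos before a request is sent and a credit is spent.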

Basic Content Extraction

By default, the API returns clean Markdown and metadata. This costs 1 credit per request and works great for content-focused pages.

bash

curl -X POST "https://api.souko.ai/v1/content" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "source": {
    "url": "https://en.wikipedia.org/wiki/Artificial_intelligence"
  },
  "outputs": [
    "markdown",
    "metadata"
  ]
}'

Response

JSON

{
  "requestId": "req_1a2b3c4d5e",
  "creditsUsed": 1,
  "content": {
    "source": {
      "url": "https://en.wikipedia.org/wiki/Artificial_intelligence"
    },
    "metadata": {
      "title": "Artificial intelligence - Wikipedia",
      "description": "Artificial intelligence (AI) is intelligence demonstrated by machines...",
      "author": null,
      "publishDate": null,
      "wordCount": 12547,
      "language": "en"
    },
    "markdown": "# Artificial intelligence\n\nArtificial intelligence (AI) is intelligence demonstrated by machines..."
  }
}
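
A response like the one above can be read into a small typed record. The sketch below relies only on the field names shown in the sample response; the ContentResult class and parse_content_response function are hypothetical, and fields the sample shows as nullable are treated as optional.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentResult:
    request_id: str
    credits_used: int
    url: str
    title: Optional[str]
    word_count: Optional[int]
    markdown: Optional[str]


def parse_content_response(body: str) -> ContentResult:
    """Pull the commonly used fields out of a content endpoint response body."""
    data = json.loads(body)
    content = data["content"]
    metadata = content.get("metadata") or {}  # metadata may be absent if not requested
    return ContentResult(
        request_id=data["requestId"],
        credits_used=data["creditsUsed"],
        url=content["source"]["url"],
        title=metadata.get("title"),
        word_count=metadata.get("wordCount"),
        markdown=content.get("markdown"),
    )


sample_body = json.dumps({
    "requestId": "req_1a2b3c4d5e",
    "creditsUsed": 1,
    "content": {
        "source": {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"},
        "metadata": {"title": "Artificial intelligence - Wikipedia", "wordCount": 12547},
        "markdown": "# Artificial intelligence\n\n...",
    },
})
result = parse_content_response(sample_body)
print(result.title, result.word_count, result.credits_used)
```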

AI-Powered Data Extraction

Use the extract output to get specific data from any page using natural language queries. Perfect for product details, contact information, pricing, or any structured data.

Extract Parameters

Parameter         | Type   | Description
query (required)  | string | Natural language description of what to extract
schema (optional) | object | JSON Schema defining the expected output structure
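
The two parameters above can be wrapped in a small helper that builds the extract entry for the outputs array and checks a result against the schema's required list. This is a sketch under assumptions: extract_output and missing_required are hypothetical names, and the check covers only top-level required keys, not full JSON Schema validation.

```python
def extract_output(query, schema=None):
    """Build an extract entry for the outputs array of a content request."""
    if not query or not query.strip():
        raise ValueError("query is required")
    spec = {"query": query}
    if schema is not None:
        spec["schema"] = schema  # schema is optional
    return {"extract": spec}


def missing_required(extracted, schema):
    """List top-level keys named in the schema's required list but absent from the result."""
    return [key for key in schema.get("required", []) if key not in extracted]


product_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}
entry = extract_output("Extract the product name and price", product_schema)
print(entry["extract"]["query"])
print(missing_required({"name": "Amazing Widget Pro"}, product_schema))  # ['price']
```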

bash

curl -X POST "https://api.souko.ai/v1/content" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "source": {
    "url": "https://store.example.com/product/123"
  },
  "outputs": [
    "markdown",
    {
      "extract": {
        "query": "Extract the product name, price, description, and key features",
        "schema": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "price": {
              "type": "number"
            },
            "description": {
              "type": "string"
            },
            "features": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          },
          "required": [
            "name",
            "price"
          ]
        }
      }
    }
  ]
}'

Response

JSON

{
  "requestId": "req_2f3g4h5i6j",
  "creditsUsed": 3,
  "content": {
    "source": {
      "url": "https://store.example.com/product/123"
    },
    "metadata": {
      "title": "Amazing Widget - Buy Now",
      "wordCount": 456
    },
    "markdown": "# Amazing Widget\n\nThe best widget you can buy...",
    "extract": {
      "name": "Amazing Widget Pro",
      "price": 99.99,
      "description": "The best widget you can buy with advanced features",
      "features": [
        "Advanced processing",
        "Durable construction", 
        "2-year warranty"
      ]
    }
  }
}

When to Use Each Output Type

Content sites (blogs, news, docs): Use markdown + metadata

E-commerce or complex UIs: Use structured to understand page layout

Specific data extraction: Use extract with natural language queries

Multiple needs: Combine output types in a single request

Use Cases

AI Training Data Collection

Extract clean content from documentation sites, blogs, and articles for training datasets:

bash

curl -X POST "https://api.souko.ai/v1/content" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "source": {
    "url": "https://docs.python.org/3/tutorial/"
  },
  "outputs": [
    "markdown",
    "metadata"
  ]
}'

Real-time Information for AI Agents

Keep your AI agents updated with current information through search:

bash

curl -X POST "https://api.souko.ai/v1/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "query": "latest news about renewable energy",
  "num_results": 3,
  "process": {
    "outputs": [
      "markdown"
    ]
  }
}'

E-commerce Data Extraction

Extract structured product data for price monitoring or catalog building:

bash

curl -X POST "https://api.souko.ai/v1/content" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "source": {
    "url": "PRODUCT_URL"
  },
  "outputs": [
    {
      "extract": {
        "query": "Extract product name, current price, and availability",
        "schema": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "price": {
              "type": "number"
            },
            "inStock": {
              "type": "boolean"
            }
          }
        }
      }
    }
  ]
}'
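
For price monitoring, successive extract results for the same product can be compared. The sketch below assumes each result is an object with the name, price, and inStock fields from the schema above; the price_change function is hypothetical and storing earlier results is left to the caller.

```python
from typing import Optional


def price_change(previous: Optional[dict], current: dict) -> Optional[str]:
    """Describe what changed between two extract results, or return None if nothing did."""
    if previous is None:
        return "new product"  # nothing to compare against yet
    changes = []
    if previous.get("price") != current.get("price"):
        changes.append(f"price {previous.get('price')} -> {current.get('price')}")
    if previous.get("inStock") != current.get("inStock"):
        changes.append(f"inStock {previous.get('inStock')} -> {current.get('inStock')}")
    return "; ".join(changes) or None


before = {"name": "Amazing Widget Pro", "price": 99.99, "inStock": True}
after = {"name": "Amazing Widget Pro", "price": 89.99, "inStock": True}
print(price_change(before, after))  # price 99.99 -> 89.99
```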