Souko.ai API Documentation

v1.0

Welcome to Souko.ai's API documentation. Our platform provides clean, reliable web content extraction and structured data processing designed specifically for AI applications and autonomous agents.

This v1 documentation covers our core features: clean content extraction, custom structured data extraction, and intelligent web search with processing.

Authentication

All API requests require an API key passed in the Authorization header:

HTTP

Authorization: Bearer YOUR_API_KEY

Getting Your API Key

Direct HTTP Access

Our API uses standard HTTP methods and JSON. Use any HTTP client or library in your preferred language - no special SDK required.
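As an illustration of the point above, here is a minimal Python sketch that builds an authenticated JSON request using only the standard library. The base URL and the three headers come from the curl examples in this document; the helper names `build_request` and `send` are hypothetical and not part of any official client.

```python
import json
import urllib.request

# Base URL as used in the curl examples in this document.
API_BASE = "https://api.souko.ai"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or log a request before it consumes any credits.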

Rate Limits

Rate limits vary by plan:

Free Trial: 1 request per second
Basic Plan: 2 requests per second
Pro Plan: 10 requests per second
Scale Plan: 50 requests per second

Rate limit headers are included in all responses:

HTTP

X-RateLimit-Limit: 10
X-RateLimit-Remaining: 9
X-RateLimit-Reset: 1640995200
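A client can use these headers to pace itself. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value is consistent with that, but the documentation does not state it) and that `X-RateLimit-Remaining` counts requests left in the current window. The function name `seconds_until_reset` is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, or 0.0 if quota remains.

    Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    # If the header is missing, assume quota remains rather than blocking.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - now)
```

A caller would typically pass the headers of the previous response and call `time.sleep()` with the returned value before issuing the next request.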

Credit System

Souko.ai uses a simple credit-based pricing system where 1 credit = 1 cent. Each operation has a base cost, and additional outputs may incur extra costs:

Operation/Output	Cost	Best For
Content Processing	1 credit base	Processing any web page
markdown	+0 credits	Clean content from articles, blogs, documentation
metadata	+0 credits	Title, description, word count, and page statistics
structured	+2 credits	Complex pages like e-commerce, dashboards, apps
extract	+2 credits	AI-powered extraction with custom queries and schemas
Web Search	2 credits base	Searching the web + optional result processing

Processing search results: Each processed result adds 1 base credit + any output costs.

Pricing Examples

Basic content extraction (markdown + metadata): 1 credit
Product data extraction (markdown + extract): 3 credits (1 base + 2 for extract)
Simple web search (5 results, no processing): 2 credits
Search + process 3 results (with markdown): 5 credits (2 base + 3 × 1 processing)
Search + extract from 2 results: 8 credits (2 base + 2 × (1 + 2) processing)
Structured page analysis (structured output): 3 credits (1 base + 2 for structured)
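The arithmetic behind the pricing examples above can be sketched as a small estimator. The costs are taken from the table in this section; the function names are hypothetical, and the sketch assumes that surcharges for several paid outputs in one request simply add up, which the examples suggest but do not state outright.

```python
# Per-output surcharges and base costs from the pricing table above.
OUTPUT_COSTS = {"markdown": 0, "metadata": 0, "structured": 2, "extract": 2}
CONTENT_BASE = 1
SEARCH_BASE = 2


def content_cost(outputs) -> int:
    """Credits for one content request: base cost plus each output's surcharge."""
    return CONTENT_BASE + sum(OUTPUT_COSTS[name] for name in outputs)


def search_cost(processed_results: int = 0, outputs=()) -> int:
    """Credits for one search; each processed result is billed like a content request."""
    return SEARCH_BASE + processed_results * content_cost(outputs)


print(content_cost(["markdown", "extract"]))  # 3
print(search_cost(3, ["markdown"]))           # 5
print(search_cost(2, ["extract"]))            # 8
```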

Free Trial

New accounts receive 200 free credits. No credit card required to get started.

Content Processing

The content processing endpoint extracts information from web pages in different formats depending on your needs. Choose the right output types for your use case.

POST /v1/content

Parameter	Type	Description
source (required)	object	Source configuration. Currently supports `{ url: "..." }`
outputs (optional)	array	Array of desired outputs. Defaults to `["markdown", "metadata"]`

Available Output Types

markdown (Free)

Best for: Articles, blog posts, documentation, news content

Returns clean, LLM-optimized Markdown suited to training data or content analysis. Navigation, ads, and other clutter are removed so that only the main content remains.

metadata (Free)

Best for: Content analysis, SEO data, page statistics

Extracts title, description, author, word count, heading count, image count, and other structural information about the page.

structured (+2 credits)

Best for: E-commerce sites, web apps, dashboards, forms

Returns a structured representation of the page showing interactive elements, buttons, forms, and navigation. Perfect for understanding complex page layouts and user interfaces.

extract (+2 credits)

Best for: Custom data extraction with AI

Use natural language to describe what you want to extract. Optionally provide a JSON schema for structured output. Powered by advanced AI models.

Basic Content Extraction

By default, the API returns clean Markdown and metadata. This costs 1 credit per request and works well for content-focused pages.

bash

curl -X POST "https://api.souko.ai/v1/content" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "source": {
    "url": "https://en.wikipedia.org/wiki/Artificial_intelligence"
  },
  "outputs": [
    "markdown",
    "metadata"
  ]
}'

Response

JSON

{
  "requestId": "req_1a2b3c4d5e",
  "creditsUsed": 1,
  "content": {
    "source": {
      "url": "https://en.wikipedia.org/wiki/Artificial_intelligence"
    },
    "metadata": {
      "title": "Artificial intelligence - Wikipedia",
      "description": "Artificial intelligence (AI) is intelligence demonstrated by machines...",
      "author": null,
      "publishDate": null,
      "wordCount": 12547,
      "language": "en"
    },
    "markdown": "# Artificial intelligence\n\nArtificial intelligence (AI) is intelligence demonstrated by machines..."
  }
}
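Once decoded, the response is an ordinary JSON object. The following sketch pulls out the fields a caller most often needs; the field names match the example response above, while `summarize_content` is a hypothetical helper. It tolerates missing keys because the example shows that metadata fields such as `author` can be null.

```python
def summarize_content(response: dict) -> dict:
    """Collect the commonly used fields of a /v1/content response."""
    content = response.get("content", {})
    metadata = content.get("metadata") or {}
    markdown = content.get("markdown") or ""
    return {
        "request_id": response.get("requestId"),
        "credits_used": response.get("creditsUsed"),
        "title": metadata.get("title"),
        "word_count": metadata.get("wordCount"),
        # First line of the Markdown, usually the top-level heading.
        "first_line": markdown.split("\n", 1)[0],
    }


# Abbreviated copy of the example response shown above.
sample = {
    "requestId": "req_1a2b3c4d5e",
    "creditsUsed": 1,
    "content": {
        "source": {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"},
        "metadata": {
            "title": "Artificial intelligence - Wikipedia",
            "author": None,
            "wordCount": 12547,
            "language": "en",
        },
        "markdown": "# Artificial intelligence\n\nArtificial intelligence (AI) is intelligence demonstrated by machines...",
    },
}

summary = summarize_content(sample)
print(summary["title"], summary["word_count"])
```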

AI-Powered Data Extraction

Use the extract output to get specific data from any page using natural language queries. It suits product details, contact information, pricing, and other structured data.

Extract Parameters

Parameter	Type	Description
query (required)	string	Natural language description of what to extract
schema (optional)	object	JSON Schema defining the expected output structure

bash

curl -X POST "https://api.souko.ai/v1/content" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "source": {
    "url": "https://store.example.com/product/123"
  },
  "outputs": [
    "markdown",
    {
      "extract": {
        "query": "Extract the product name, price, description, and key features",
        "schema": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "price": {
              "type": "number"
            },
            "description": {
              "type": "string"
            },
            "features": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          },
          "required": [
            "name",
            "price"
          ]
        }
      }
    }
  ]
}'

Response

JSON

{
  "requestId": "req_2f3g4h5i6j",
  "creditsUsed": 3,
  "content": {
    "source": {
      "url": "https://store.example.com/product/123"
    },
    "metadata": {
      "title": "Amazing Widget - Buy Now",
      "wordCount": 456
    },
    "markdown": "# Amazing Widget\n\nThe best widget you can buy...",
    "extract": {
      "name": "Amazing Widget Pro",
      "price": 99.99,
      "description": "The best widget you can buy with advanced features",
      "features": [
        "Advanced processing",
        "Durable construction",
        "2-year warranty"
      ]
    }
  }
}
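The request and response above can be handled with two small helpers, sketched here under stated assumptions. `extract_output` builds the `{"extract": {...}}` entry in the shape used by the curl example, and `missing_required` checks a returned `extract` object against the schema's `required` list. Both names are hypothetical, and the second is a plain key check, not a full JSON Schema validator.

```python
from typing import Optional


def extract_output(query: str, schema: Optional[dict] = None) -> dict:
    """Build one entry for the "outputs" array requesting AI extraction."""
    if not query:
        raise ValueError("query is required")
    spec = {"query": query}
    if schema is not None:
        spec["schema"] = schema
    return {"extract": spec}


def missing_required(result: dict, schema: dict) -> list:
    """List the schema's required keys that are absent from the result."""
    return [key for key in schema.get("required", []) if key not in result]


product_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}

payload = {
    "source": {"url": "https://store.example.com/product/123"},
    "outputs": ["markdown", extract_output("Extract the product name and price", product_schema)],
}
```

Checking required keys on the client side is a cheap safeguard before passing extracted data to downstream code.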

When to Use Each Output Type

Content sites (blogs, news, docs): Use markdown + metadata

E-commerce or complex UIs: Use structured to understand page layout

Specific data extraction: Use extract with natural language queries

Multiple needs: Combine output types in a single request

Web Search

Perform web searches and optionally process the results through our content pipeline in a single API call. Great for getting current information and feeding it directly to your AI systems.

POST /v1/search

Parameter	Type	Description
query (required)	string	Search query to execute
num_results (optional)	integer	Number of results to return. Default: 5, Max: 10
process (optional)	object	Optional processing configuration for search results

Simple Search

A basic web search returns titles, URLs, and snippets, and costs 2 credits per search.

bash

curl -X POST "https://api.souko.ai/v1/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "query": "latest developments in AI infrastructure",
  "num_results": 3
}'
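The request body can also be assembled programmatically. This sketch enforces the documented default (5) and maximum (10) for `num_results` on the client side; the lower bound of 1 is an assumption, and `search_payload` is a hypothetical helper name.

```python
from typing import Optional, Sequence

DEFAULT_RESULTS = 5  # documented default
MAX_RESULTS = 10     # documented maximum


def search_payload(
    query: str,
    num_results: int = DEFAULT_RESULTS,
    outputs: Optional[Sequence] = None,
) -> dict:
    """Build the JSON body for POST /v1/search."""
    if not query:
        raise ValueError("query is required")
    # Lower bound of 1 is an assumption; the upper bound is documented.
    if not 1 <= num_results <= MAX_RESULTS:
        raise ValueError(f"num_results must be between 1 and {MAX_RESULTS}")
    payload = {"query": query, "num_results": num_results}
    if outputs:
        # Optional processing of each result with content-endpoint outputs.
        payload["process"] = {"outputs": list(outputs)}
    return payload


body = search_payload("latest developments in AI infrastructure", num_results=3)
```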

Search + Process Results

Add a process object to automatically extract content from search results using any of the same output types as the content endpoint.

bash

curl -X POST "https://api.souko.ai/v1/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "query": "AI infrastructure trends 2025",
  "num_results": 2,
  "process": {
    "outputs": [
      "markdown"
    ]
  }
}'

Response

JSON

{
  "requestId": "req_3g4h5i6j7k",
  "creditsUsed": 4,
  "data": {
    "query": "AI infrastructure trends 2025",
    "results": [
      {
        "position": 1,
        "title": "The Future of AI Infrastructure in 2025",
        "link": "https://techblog.example.com/ai-infrastructure-2025",
        "snippet": "Key trends shaping AI infrastructure this year...",
        "content": {
          "markdown": "# The Future of AI Infrastructure in 2025\n\nAs we move into 2025, several key trends are shaping the AI infrastructure landscape..."
        }
      },
      {
        "position": 2,
        "title": "AI Infrastructure Investments in 2025",
        "link": "https://venture.example.com/ai-infrastructure-investments",
        "snippet": "Investment patterns show strong growth in AI infrastructure...",
        "content": {
          "markdown": "# AI Infrastructure Investments in 2025\n\nVenture capital and corporate investments in AI infrastructure continue to accelerate..."
        }
      }
    ]
  }
}
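To feed processed results to another system, a caller only needs to walk `data.results`. The sketch below returns `(link, markdown)` pairs using the field names from the example response. Skipping results that have no `content` is a defensive assumption, since the documentation does not say what a result looks like when processing fails; `processed_markdown` is a hypothetical name.

```python
def processed_markdown(response: dict) -> list:
    """Return (link, markdown) pairs for results that carry processed content."""
    pairs = []
    for result in response.get("data", {}).get("results", []):
        # Assumption: a result without processed content has no "content" key
        # or a null value there, so it is skipped.
        markdown = (result.get("content") or {}).get("markdown")
        if markdown:
            pairs.append((result["link"], markdown))
    return pairs
```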

Search + Process Pricing

A search costs 2 base credits plus the processing cost of each processed result (up to 5 results). Each processed result costs 1 base credit plus its output costs: +0 for basic content (markdown, metadata), +2 for structured, +2 for extract.

Example: Search + process 3 results with extract = 2 + 3 × (1 + 2) = 11 credits total

Use Cases

AI Training Data Collection

Extract clean content from documentation sites, blogs, and articles for training datasets:

bash

curl -X POST "https://api.souko.ai/v1/content" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "source": {
    "url": "https://docs.python.org/3/tutorial/"
  },
  "outputs": [
    "markdown",
    "metadata"
  ]
}'

Real-time Information for AI Agents

Keep your AI agents updated with current information through search:

bash

curl -X POST "https://api.souko.ai/v1/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "query": "latest news about renewable energy",
  "num_results": 3,
  "process": {
    "outputs": [
      "markdown"
    ]
  }
}'

E-commerce Data Extraction

Extract structured product data for price monitoring or catalog building:

bash

curl -X POST "https://api.souko.ai/v1/content" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
  "source": {
    "url": "PRODUCT_URL"
  },
  "outputs": [
    {
      "extract": {
        "query": "Extract product name, current price, and availability",
        "schema": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "price": {
              "type": "number"
            },
            "inStock": {
              "type": "boolean"
            }
          }
        }
      }
    }
  ]
}'