Welcome to Souko.ai's API documentation. Our platform provides clean, reliable web content extraction and structured data processing designed specifically for AI applications and autonomous agents.
This v1 documentation covers our core features: clean content extraction, custom structured data extraction, and intelligent web search with processing.
All API requests require an API key passed in the Authorization header:
HTTP
Authorization: Bearer YOUR_API_KEY
Sign up at dashboard.souko.ai to get your API key. New accounts receive 200 free credits to get started.
Our API uses standard HTTP methods and JSON. Use any HTTP client or library in your preferred language - no special SDK required.
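Because no SDK is required, a thin helper around a standard HTTP library is enough. The sketch below uses Python's `urllib` to build an authenticated JSON POST request. The helper names `build_request` and `send` are illustrative and not part of the Souko.ai API, and the base URL is taken from the curl examples in this documentation.

```python
import json
import urllib.request

API_BASE = "https://api.souko.ai/v1"  # base URL as used in the curl examples


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{API_BASE}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building a request needs no network access; call send(request) to execute it.
request = build_request("content", {"source": {"url": "https://example.com"}}, "YOUR_API_KEY")
print(request.full_url)                     # https://api.souko.ai/v1/content
print(request.get_header("Authorization"))  # Bearer YOUR_API_KEY
```

Keeping request construction separate from sending makes the payload and headers easy to inspect or test without spending credits.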
Rate limits vary by plan.
Rate limit headers are included in all responses:
HTTP
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 9
X-RateLimit-Reset: 1640995200
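One way to read these headers in Python is sketched below. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this documentation does not state. `RateLimit`, `parse_rate_limit`, and `seconds_until_reset` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Mapping, NamedTuple


class RateLimit(NamedTuple):
    limit: int
    remaining: int
    reset_at: datetime


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimit:
    """Read the three X-RateLimit-* headers from a response.

    Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimit(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=datetime.fromtimestamp(int(lowered["x-ratelimit-reset"]), tz=timezone.utc),
    )


def seconds_until_reset(rate_limit: RateLimit, now: datetime) -> float:
    """Return how long to wait until the window resets (never negative)."""
    return max(0.0, (rate_limit.reset_at - now).total_seconds())


sample = {
    "X-RateLimit-Limit": "10",
    "X-RateLimit-Remaining": "9",
    "X-RateLimit-Reset": "1640995200",
}
info = parse_rate_limit(sample)
print(info.remaining)             # 9
print(info.reset_at.isoformat())  # 2022-01-01T00:00:00+00:00
```

A client could pause for `seconds_until_reset(...)` whenever `remaining` reaches 0 instead of sending requests that would be rejected.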
Souko.ai uses a simple credit-based pricing system where 1 credit = 1 cent. Each operation has a base cost, and additional outputs may incur extra costs:
| Operation/Output | Cost | Best For |
|---|---|---|
| Content Processing | 1 credit base | Processing any web page |
| markdown | +0 credits | Clean content from articles, blogs, documentation |
| metadata | +0 credits | Title, description, word count, and page statistics |
| structured | +2 credits | Complex pages like e-commerce, dashboards, apps |
| extract | +2 credits | AI-powered extraction with custom queries and schemas |
| Web Search | 2 credits base | Searching the web + optional result processing |
Processing search results: Each processed result adds 1 base credit + any output costs.
New accounts receive 200 free credits. No credit card required to get started.
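The pricing rules above can be turned into a small estimator. This is a sketch only: the function names are hypothetical, and it assumes each output type is billed once per page even if it is listed twice, which the table implies but does not state.

```python
CONTENT_BASE = 1  # credits per processed page
SEARCH_BASE = 2   # credits per search
OUTPUT_COST = {"markdown": 0, "metadata": 0, "structured": 2, "extract": 2}


def output_name(output) -> str:
    """Outputs are a plain string or a single-key object such as {"extract": {...}}."""
    if isinstance(output, str):
        return output
    (name,) = output.keys()  # fails loudly if the object has more than one key
    return name


def content_cost(outputs=("markdown", "metadata")) -> int:
    """Credits for processing one page with the given outputs."""
    names = {output_name(output) for output in outputs}
    return CONTENT_BASE + sum(OUTPUT_COST[name] for name in names)


def search_cost(processed_results: int = 0, outputs=()) -> int:
    """Credits for one search plus optional processing of its results."""
    return SEARCH_BASE + processed_results * content_cost(outputs)


print(content_cost())                                       # 1
print(content_cost(["markdown", {"extract": {"query": "price"}}]))  # 3
print(search_cost(3, [{"extract": {"query": "price"}}]))    # 11
```

At 1 credit = 1 cent, the last call corresponds to 11 cents for a search that processes three results with `extract`.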
The content processing endpoint extracts information from web pages in different formats depending on your needs. Choose the right output types for your use case.
| Parameter | Type | Description |
|---|---|---|
| source (required) | object | Source configuration. Currently supports { url: "..." } |
| outputs (optional) | array | Array of desired outputs. Defaults to ["markdown", "metadata"] |
markdown. Best for: articles, blog posts, documentation, news content.
Returns clean, LLM-optimized Markdown suited to training data or content analysis. Automatically removes navigation, ads, and clutter to focus on the main content.
metadata. Best for: content analysis, SEO data, page statistics.
Extracts title, description, author, word count, heading count, image count, and other structural information about the page.
structured. Best for: e-commerce sites, web apps, dashboards, forms.
Returns a structured representation of the page showing interactive elements, buttons, forms, and navigation. Useful for understanding complex page layouts and user interfaces.
extract. Best for: custom data extraction with AI.
Use natural language to describe what you want to extract. Optionally provide a JSON schema for structured output. Powered by advanced AI models.
By default, the API returns clean Markdown and metadata. This costs 1 credit per request and works great for content-focused pages.
bash
curl -X POST "https://api.souko.ai/v1/content" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"source": {
"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"
},
"outputs": [
"markdown",
"metadata"
]
}'
JSON
{
"requestId": "req_1a2b3c4d5e",
"creditsUsed": 1,
"content": {
"source": {
"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"
},
"metadata": {
"title": "Artificial intelligence - Wikipedia",
"description": "Artificial intelligence (AI) is intelligence demonstrated by machines...",
"author": null,
"publishDate": null,
"wordCount": 12547,
"language": "en"
},
"markdown": "# Artificial intelligence\n\nArtificial intelligence (AI) is intelligence demonstrated by machines..."
}
}
Use the extract output to get specific data from any page using natural language queries. Perfect for product details, contact information, pricing, or any structured data.
| Parameter | Type | Description |
|---|---|---|
| query (required) | string | Natural language description of what to extract |
| schema (optional) | object | JSON Schema defining the expected output structure |
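As a rough illustration of how these two parameters fit into a request, the Python sketch below builds one `extract` entry for the `outputs` array. `extract_output` is a hypothetical helper, and rejecting an empty query on the client side is an assumption rather than documented API behavior.

```python
from typing import Optional


def extract_output(query: str, schema: Optional[dict] = None) -> dict:
    """Return one entry for the outputs array: {"extract": {"query": ..., "schema": ...}}."""
    if not query.strip():
        raise ValueError("query is required and must not be empty")
    spec = {"query": query}
    if schema is not None:
        spec["schema"] = schema  # optional JSON Schema describing the result
    return {"extract": spec}


product_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}

payload = {
    "source": {"url": "https://store.example.com/product/123"},
    # Plain string outputs and the extract object can be mixed in one array.
    "outputs": ["markdown", extract_output("Extract the product name and price", product_schema)],
}
```

Omitting `schema` leaves the shape of the extracted data up to the service, so supplying one is the safer choice when downstream code expects specific fields.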
bash
curl -X POST "https://api.souko.ai/v1/content" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"source": {
"url": "https://store.example.com/product/123"
},
"outputs": [
"markdown",
{
"extract": {
"query": "Extract the product name, price, description, and key features",
"schema": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"price": {
"type": "number"
},
"description": {
"type": "string"
},
"features": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"name",
"price"
]
}
}
}
]
}'
JSON
{
"requestId": "req_2f3g4h5i6j",
"creditsUsed": 3,
"content": {
"source": {
"url": "https://store.example.com/product/123"
},
"metadata": {
"title": "Amazing Widget - Buy Now",
"wordCount": 456
},
"markdown": "# Amazing Widget\n\nThe best widget you can buy...",
"extract": {
"name": "Amazing Widget Pro",
"price": 99.99,
"description": "The best widget you can buy with advanced features",
"features": [
"Advanced processing",
"Durable construction",
"2-year warranty"
]
}
}
}
Content sites (blogs, news, docs): Use markdown + metadata
E-commerce or complex UIs: Use structured to understand page layout
Specific data extraction: Use extract with natural language queries
Multiple needs: Combine output types in a single request
Perform web searches and optionally process the results through our content pipeline in a single API call. Great for getting current information and feeding it directly to your AI systems.
| Parameter | Type | Description |
|---|---|---|
| query (required) | string | Search query to execute |
| num_results (optional) | integer | Number of results to return. Default: 5, Max: 10 |
| process (optional) | object | Optional processing configuration for search results |
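A client can check these parameters before spending credits. The Python sketch below builds a search payload and enforces the documented default and maximum for `num_results`. `build_search_payload` is a hypothetical helper, and the lower bound of 1 is an assumption, since only the maximum is documented.

```python
from typing import Optional, Sequence


def build_search_payload(
    query: str,
    num_results: int = 5,                      # documented default
    process_outputs: Optional[Sequence] = None,
) -> dict:
    """Build the JSON body for a search request."""
    if not query.strip():
        raise ValueError("query is required and must not be empty")
    # Max of 10 is documented; the minimum of 1 is assumed.
    if not 1 <= num_results <= 10:
        raise ValueError("num_results must be between 1 and 10")
    payload = {"query": query, "num_results": num_results}
    if process_outputs:
        # Same output types as the content endpoint.
        payload["process"] = {"outputs": list(process_outputs)}
    return payload


print(build_search_payload("latest developments in AI infrastructure", 3))
# {'query': 'latest developments in AI infrastructure', 'num_results': 3}
```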
Basic web search returns titles, URLs, and snippets. Costs 2 credits per search.
bash
curl -X POST "https://api.souko.ai/v1/search" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"query": "latest developments in AI infrastructure",
"num_results": 3
}'
Add a process object to automatically extract content from search results using any of the same output types as the content endpoint.
bash
curl -X POST "https://api.souko.ai/v1/search" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"query": "AI infrastructure trends 2025",
"num_results": 2,
"process": {
"outputs": [
"markdown"
]
}
}'
JSON
{
"requestId": "req_3g4h5i6j7k",
"creditsUsed": 4,
"data": {
"query": "AI infrastructure trends 2025",
"results": [
{
"position": 1,
"title": "The Future of AI Infrastructure in 2025",
"link": "https://techblog.example.com/ai-infrastructure-2025",
"snippet": "Key trends shaping AI infrastructure this year...",
"content": {
"markdown": "# The Future of AI Infrastructure in 2025\n\nAs we move into 2025, several key trends are shaping the AI infrastructure landscape..."
}
},
{
"position": 2,
"title": "AI Infrastructure Investments in 2025",
"link": "https://venture.example.com/ai-infrastructure-investments",
"snippet": "Investment patterns show strong growth in AI infrastructure...",
"content": {
"markdown": "# AI Infrastructure Investments in 2025\n\nVenture capital and corporate investments in AI infrastructure continue to accelerate..."
}
}
]
}
}
Search costs 2 base credits + processing costs for each result (up to 5 results). Each processed result costs 1 base credit + output costs: +0 for basic content, +2 for structured data, +2 for AI extraction.
Example: Search + process 3 results with extract = 2 + (3 × 3) = 11 credits total
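Once a processed search response arrives, the Markdown for each result usually needs to be pulled out before it is handed to a model. The Python sketch below does that for a response shaped like the sample above (`data.results[].content.markdown`). `processed_pages` is a hypothetical helper, and it assumes results that were not processed simply have no `content` field.

```python
def processed_pages(response: dict) -> list:
    """Collect title, link and Markdown for every result that carries processed content."""
    pages = []
    for result in response.get("data", {}).get("results", []):
        content = result.get("content") or {}  # assumed absent when not processed
        markdown = content.get("markdown")
        if markdown:
            pages.append(
                {
                    "position": result.get("position"),
                    "title": result.get("title"),
                    "link": result.get("link"),
                    "markdown": markdown,
                }
            )
    return pages


# Abbreviated version of the sample response shown earlier.
sample_response = {
    "requestId": "req_3g4h5i6j7k",
    "creditsUsed": 4,
    "data": {
        "query": "AI infrastructure trends 2025",
        "results": [
            {
                "position": 1,
                "title": "The Future of AI Infrastructure in 2025",
                "link": "https://techblog.example.com/ai-infrastructure-2025",
                "content": {"markdown": "# The Future of AI Infrastructure in 2025"},
            },
            {"position": 2, "title": "Unprocessed result", "link": "https://example.com"},
        ],
    },
}

pages = processed_pages(sample_response)
print(len(pages))         # 1
print(pages[0]["link"])   # https://techblog.example.com/ai-infrastructure-2025
```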
Extract clean content from documentation sites, blogs, and articles for training datasets:
bash
curl -X POST "https://api.souko.ai/v1/content" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"source": {
"url": "https://docs.python.org/3/tutorial/"
},
"outputs": [
"markdown",
"metadata"
]
}'
Keep your AI agents updated with current information through search:
bash
curl -X POST "https://api.souko.ai/v1/search" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"query": "latest news about renewable energy",
"num_results": 3,
"process": {
"outputs": [
"markdown"
]
}
}'
Extract structured product data for price monitoring or catalog building:
bash
curl -X POST "https://api.souko.ai/v1/content" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"source": {
"url": "PRODUCT_URL"
},
"outputs": [
{
"extract": {
"query": "Extract product name, current price, and availability",
"schema": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"price": {
"type": "number"
},
"inStock": {
"type": "boolean"
}
}
}
}
}
]
}'
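For price monitoring it can help to sanity-check the extracted object before storing it. This documentation does not say whether the service guarantees schema-conforming output, so the Python sketch below performs a shallow client-side check of top-level field types. It is not a full JSON Schema validator, and `schema_problems` is a hypothetical helper.

```python
# Maps JSON Schema type names to the Python types produced by json.loads.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def schema_problems(value: dict, schema: dict) -> list:
    """Return human-readable problems; an empty list means the shallow check passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in value:
            problems.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name not in value or value[name] is None:
            continue  # absent or null fields are only reported when required
        type_name = rule.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or unspecified type: nothing to check
        actual = value[name]
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        is_bool_as_number = type_name == "number" and isinstance(actual, bool)
        if is_bool_as_number or not isinstance(actual, expected):
            problems.append(f"{name}: expected {type_name}, got {type(actual).__name__}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "inStock": {"type": "boolean"},
    },
}

print(schema_problems({"name": "Amazing Widget Pro", "price": 99.99, "inStock": True}, schema))  # []
print(schema_problems({"name": "Amazing Widget Pro", "price": "99.99"}, schema))
# ['price: expected number, got str']
```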