Overview

Scrapester Python SDK provides a simple wrapper around the Scrapester API to help you transform web content into structured data.

Installation

Install the Scrapester Python SDK using pip:
pip install scrapester

Basic Usage

  1. Get an API key from scrapester.com
  2. Set the API key as an environment variable SCRAPESTER_API_KEY or pass it directly to the ScrapesterApp class
Here’s a complete example:
from scrapester import ScrapesterApp

app = ScrapesterApp(api_key="sk-YOUR_API_KEY")

# Scrape a website
scrape_result = app.scrape_url(
    'https://example.com',
    params={'formats': ['markdown', 'html']}
)
print(scrape_result)

Scraping a URL

The scrape_url method is used to extract content from a single URL:
# Basic scraping
result = app.scrape_url(
    'example.com',
    params={'formats': ['markdown', 'html']}
)
print(result)

# With advanced options
advanced_result = app.scrape_url(
    'example.com',
    params={
        'formats': ['markdown', 'html', 'screenshot'],
        'screenshot': {
            'fullPage': True,
            'type': 'jpeg'
        },
        'headers': {
            'Cookie': 'session=abc123'
        }
    }
)

Structured Data Extraction

Extract specific data using Pydantic schemas:
from pydantic import BaseModel
from typing import List

class ProductSchema(BaseModel):
    title: str
    price: float
    features: List[str]

result = app.scrape_url(
    'example.com',
    params={
        'formats': ['extract'],
        'extract': {
            'schema': ProductSchema.model_json_schema()
        }
    }
)
print(result['extract'])

Page Interactions

Interact with dynamic content before scraping:
result = app.scrape_url(
    'example.com',
    params={
        'formats': ['markdown'],
        'actions': [
            {"type": "wait", "milliseconds": 1000},
            {"type": "click", "selector": "#login-button"},
            {"type": "input", "selector": "#email", "value": "test@example.com"},
            {"type": "wait", "milliseconds": 1000},
            {"type": "scrape"}
        ]
    }
)

Response Format

Successful Response

{
    "success": True,
    "data": {
        "markdown": "# Example Page\nContent here...",
        "html": "<!DOCTYPE html><html>...</html>",
        "screenshot": "base64_encoded_image_data",
        "metadata": {
            "title": "Example Page",
            "description": "Page description",
            "language": "en",
            "sourceURL": "https://example.com",
            "statusCode": 200
        }
    }
}

Error Response

{
    "success": False,
    "error": {
        "code": "INVALID_URL",
        "message": "The provided URL is not valid",
        "details": {
            "url": "not-a-valid-url"
        }
    }
}

Error Handling

The SDK provides comprehensive error handling. Here’s how to handle different types of errors:
from scrapester.exceptions import ScrapesterError, APIError

try:
    result = app.scrape_url(
        'example.com',
        params={'formats': ['markdown']}
    )
    
    if not result['success']:
        # Handle API-level errors
        print(f"API Error: {result['error']['message']}")
        print(f"Error Code: {result['error']['code']}")
    else:
        print(result['data'])
        
except APIError as e:
    # Handle API-related errors
    print(f"API Error: {e.message}")
    print(f"Status Code: {e.status_code}")
    
except ScrapesterError as e:
    # Handle SDK-level errors
    print(f"SDK Error: {str(e)}")
    
except Exception as e:
    # Handle unexpected errors
    print(f"Unexpected error: {str(e)}")
Common error codes:
  • INVALID_URL: The provided URL is not properly formatted
  • TIMEOUT: The request timed out
  • ACCESS_DENIED: Could not access the URL (403)
  • NOT_FOUND: URL not found (404)
  • RATE_LIMITED: Too many requests to the target website

Advanced Features

Custom Headers

Add authentication or custom headers to your requests:
result = app.scrape_url(
    'example.com',
    params={
        'formats': ['markdown'],
        'headers': {
            'Authorization': 'Bearer token123',
            'Cookie': 'session=abc123'
        }
    }
)

Proxy Configuration

Use custom proxies for your requests:
app = ScrapesterApp(
    api_key="sk-YOUR_API_KEY",
    proxy="http://proxy.example.com:8080"
)

Best Practices

  1. Error Handling: Always implement proper error handling as shown above
  2. Rate Limiting: Implement delays between requests to avoid rate limits
  3. Resource Management: Use context managers when appropriate
  4. Validation: Validate URLs and parameters before making requests
  5. Timeout Handling: Set appropriate timeouts for your use case
For more details about available parameters and options, refer to our API Reference.