Scrape - Scrapester

Overview

Scrapester converts web pages into clean, structured formats ideal for data processing and LLM applications.

Smart Handling: Manages proxies, rate limits, and JavaScript-rendered content
Multiple Formats: Outputs clean markdown, structured JSON, raw HTML, or screenshots
Intelligent Extraction: Extracts specific data using schemas or natural language prompts

Basic Scraping

Installation

npm install scrapester

Usage

import { ScrapesterApp } from 'scrapester';

// Create a client with your API key
const app = new ScrapesterApp('sk-YOUR_API_KEY');

// Scrape a page and request both markdown and HTML output
const result = await app.scrapeUrl('example.com', {
    formats: ['markdown', 'html']
});
console.log(result);

Response Format

{
  "success": true,
  "data": {
    "markdown": "# Welcome to Example\nThis is the main content...",
    "html": "<!DOCTYPE html><html><body>...</body></html>",
    "metadata": {
      "title": "Example Website",
      "description": "An example website description",
      "language": "en",
      "sourceURL": "https://example.com",
      "statusCode": 200
    }
  }
}
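
A minimal sketch of reading this response in Python, assuming it has already been parsed into a dictionary (for example with `json.loads`). The helper name `summarize_scrape` is hypothetical and not part of the Scrapester SDK; the field names are taken from the sample response above.

```python
# Sample response copied from the Response Format section.
SAMPLE_RESPONSE = {
    "success": True,
    "data": {
        "markdown": "# Welcome to Example\nThis is the main content...",
        "html": "<!DOCTYPE html><html><body>...</body></html>",
        "metadata": {
            "title": "Example Website",
            "description": "An example website description",
            "language": "en",
            "sourceURL": "https://example.com",
            "statusCode": 200,
        },
    },
}


def summarize_scrape(response: dict) -> dict:
    """Hypothetical helper: pull a few commonly used fields out of a response."""
    if not response.get("success"):
        raise ValueError("scrape failed: the success flag is not true")
    data = response.get("data", {})
    metadata = data.get("metadata", {})
    markdown = data.get("markdown", "")
    # The first markdown line is usually the page heading.
    first_line = markdown.splitlines()[0] if markdown else ""
    return {
        "title": metadata.get("title"),
        "source_url": metadata.get("sourceURL"),
        "status_code": metadata.get("statusCode"),
        "first_line": first_line,
    }


summary = summarize_scrape(SAMPLE_RESPONSE)
print(summary["title"], summary["status_code"])
```

Checking `success` before touching `data` keeps the failure path separate from the parsing logic.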

Extraction Without Schema

Extract data using natural language prompts:

curl -X POST https://api.scapester.com/v1/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer sk-YOUR_API_KEY' \
    -d '{
      "url": "https://example.com",
      "formats": ["extract"],
      "extract": {
        "prompt": "Extract the product pricing and features from the page."
      }
    }'
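
The same request can be sketched in Python with only the standard library. This is an illustration, not SDK code: `build_extract_request` is a hypothetical helper, and the endpoint, headers, and payload are copied from the curl example above. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

# Endpoint copied verbatim from the curl example above.
SCRAPE_ENDPOINT = "https://api.scapester.com/v1/scrape"


def build_extract_request(url: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a prompt-based extract request."""
    payload = {
        "url": url,
        "formats": ["extract"],
        "extract": {"prompt": prompt},
    }
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_extract_request(
    "https://example.com",
    "Extract the product pricing and features from the page.",
    "sk-YOUR_API_KEY",
)
print(request.get_method(), request.full_url)
```

To send it, pass `request` to `urllib.request.urlopen` and decode the JSON body of the reply.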

Advanced Features

Screenshots

Capture visual snapshots of web pages:

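# Python example; `app` is assumed to be an initialized Scrapester client
result = app.scrape_url('example.com',
    params={
        'formats': ['screenshot'],
        'screenshot': {
            'fullPage': True,   # capture the whole page, not just the viewport
            'type': 'jpeg',
            'quality': 80
        }
    }
)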

Custom Headers

Add authentication or custom headers to your requests:

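# Python example; `app` is assumed to be an initialized Scrapester client
result = app.scrape_url('example.com',
    params={
        'formats': ['markdown'],
        # Headers sent along with the request to the target page
        'headers': {
            'Authorization': 'Bearer token123',
            'Cookie': 'session=abc123'
        }
    }
)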

Error Handling

Scrapester provides detailed error information when something goes wrong:

{
  "success": false,
  "error": {
    "code": "INVALID_URL",
    "message": "The provided URL is not valid",
    "details": {
      "url": "not-a-valid-url"
    }
  }
}

Common error codes include:

INVALID_URL: The provided URL is not properly formatted
TIMEOUT: The request timed out
ACCESS_DENIED: Could not access the URL (403)
NOT_FOUND: URL not found (404)
RATE_LIMITED: Too many requests to the target website
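
One way to act on these codes is to turn an error response into an exception. The sketch below assumes the response has been parsed into a dictionary; `ScrapeError` and `raise_for_error` are hypothetical names, and treating `TIMEOUT` and `RATE_LIMITED` as retryable is an assumption for illustration, not something the documentation states.

```python
# Assumption: these two codes describe transient failures worth retrying.
RETRYABLE_CODES = {"TIMEOUT", "RATE_LIMITED"}


class ScrapeError(Exception):
    """Hypothetical exception wrapping the error object of a failed response."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details or {}

    @property
    def retryable(self):
        return self.code in RETRYABLE_CODES


def raise_for_error(response: dict) -> dict:
    """Return the data payload on success, raise ScrapeError otherwise."""
    if response.get("success"):
        return response.get("data", {})
    error = response.get("error", {})
    raise ScrapeError(
        error.get("code", "UNKNOWN"),
        error.get("message", ""),
        error.get("details"),
    )


# Error response copied from the example above.
failed = {
    "success": False,
    "error": {
        "code": "INVALID_URL",
        "message": "The provided URL is not valid",
        "details": {"url": "not-a-valid-url"},
    },
}

try:
    raise_for_error(failed)
except ScrapeError as exc:
    caught = exc
    print(exc, "| retryable:", exc.retryable)
```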

Best Practices

Rate Limiting: Implement appropriate delays between requests
Error Handling: Always handle potential errors in your code
Selective Extraction: Only extract the data you need
Cache Results: Store and reuse results when possible
Respect robots.txt: Check whether scraping is allowed for the target URL
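
Several of these practices can be combined in a thin wrapper around whatever function performs the scrape. The sketch below is illustrative only: `PoliteScraper` and `allowed_by_robots` are hypothetical names, the one-second default delay is an arbitrary assumption, and `fetch` stands in for a real call such as the SDK's `scrape_url`.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class PoliteScraper:
    """Hypothetical wrapper that caches results and spaces out requests."""

    def __init__(self, fetch, min_delay=1.0):
        self._fetch = fetch          # callable taking a URL, e.g. a scrape function
        self._min_delay = min_delay  # minimum seconds between uncached requests
        self._cache = {}
        self._last_call = None

    def scrape(self, url):
        # Cache Results: reuse a stored result instead of scraping again.
        if url in self._cache:
            return self._cache[url]
        # Rate Limiting: wait until min_delay has passed since the last request.
        if self._last_call is not None:
            wait = self._min_delay - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        result = self._fetch(url)
        self._last_call = time.monotonic()
        self._cache[url] = result
        return result


# Usage with a stand-in fetch function so the example runs offline.
calls = []


def fake_fetch(url):
    calls.append(url)
    return {"success": True, "data": {"markdown": f"# {url}"}}


robots = "User-agent: *\nDisallow: /private/\n"
scraper = PoliteScraper(fake_fetch, min_delay=0.01)
if allowed_by_robots(robots, "https://example.com/public"):
    scraper.scrape("https://example.com/public")
    scraper.scrape("https://example.com/public")  # second call is served from the cache
print("requests made:", len(calls))
```

Passing the scrape function in as `fetch` keeps the wrapper independent of any particular client and easy to test.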

For more details about available parameters and options, refer to our API Reference.
