Overview

Scrapester is an API service that converts web content into clean, structured data. Provide a URL, and Scrapester returns the content in the format you request: markdown, structured JSON, or raw HTML.

Installation

npm i scrapester

Basic Usage

import { ScrapesterApp } from 'scrapester';

// Create a client with your API key
const app = new ScrapesterApp('sk-YOUR_API_KEY');

// Scrape a page, requesting both markdown and HTML output
const result = await app.scrapeUrl(
    'https://example.com',
    { formats: ['markdown', 'html'] }
);
console.log(result);

Response Format

{
  "success": true,
  "data": {
    "markdown": "# Welcome to Example\nThis is the main content...",
    "html": "<!DOCTYPE html><html><body>...</body></html>",
    "metadata": {
      "title": "Example Website",
      "description": "An example website description",
      "language": "en",
      "sourceURL": "https://example.com",
      "statusCode": 200
    }
  }
}
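
A response of this shape can be unpacked defensively before use. The sketch below assumes the response has already been parsed into a Python dict with exactly the fields shown above. The helper name summarize_response is hypothetical and not part of Scrapester, and the idea that every key in data other than metadata is a returned format is an assumption drawn from the sample alone.

```python
from typing import Any, Dict


def summarize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the commonly used fields out of a parsed scrape response.

    Hypothetical helper: it assumes the response shape shown above and
    is not part of the Scrapester SDK.
    """
    if not response.get("success"):
        raise ValueError("scrape did not succeed")

    data = response.get("data", {})
    metadata = data.get("metadata", {})
    return {
        "title": metadata.get("title"),
        "source_url": metadata.get("sourceURL"),
        "status_code": metadata.get("statusCode"),
        # Assumption: every key in `data` other than `metadata` is a format.
        "formats": sorted(key for key in data if key != "metadata"),
    }


# Sample taken from the response shown above (content shortened).
sample = {
    "success": True,
    "data": {
        "markdown": "# Welcome to Example\nThis is the main content...",
        "html": "<!DOCTYPE html><html><body>...</body></html>",
        "metadata": {
            "title": "Example Website",
            "description": "An example website description",
            "language": "en",
            "sourceURL": "https://example.com",
            "statusCode": 200,
        },
    },
}

summary = summarize_response(sample)
print(summary)
```

Checking success first keeps a failed scrape from being mistaken for an empty page.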

Advanced Features

Page Interactions

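Scrapester can interact with a page before scraping it, which is useful for dynamic content or login flows. The example below is written in Python and assumes app is an already-initialized Scrapester client:

result = app.scrape_url(
    'https://example.com',
    params={
        'formats': ['markdown'],
        # Each action is a dict with a 'type' plus that action's parameters
        'actions': [
            {'type': 'wait', 'milliseconds': 1000},          # pause for 1 second
            {'type': 'click', 'selector': '#login-button'},  # click the element matching the selector
            {'type': 'wait', 'milliseconds': 1000},          # pause again after the click
            {'type': 'scrape'}                               # scrape the page
        ]
    }
)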
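
When the same interaction pattern recurs across pages, the actions list can be built with small helper functions rather than written out by hand. This is only a sketch: the helpers wait, click, scrape, and click_then_scrape are hypothetical names that produce the action dicts shown above, and they are not part of Scrapester.

```python
from typing import Any, Dict, List

Action = Dict[str, Any]


def wait(milliseconds: int) -> Action:
    """Build a 'wait' action (hypothetical helper)."""
    return {"type": "wait", "milliseconds": milliseconds}


def click(selector: str) -> Action:
    """Build a 'click' action for a CSS selector (hypothetical helper)."""
    return {"type": "click", "selector": selector}


def scrape() -> Action:
    """Build a 'scrape' action (hypothetical helper)."""
    return {"type": "scrape"}


def click_then_scrape(selector: str, settle_ms: int = 1000) -> List[Action]:
    """Wait, click an element, wait again, then scrape."""
    return [wait(settle_ms), click(selector), wait(settle_ms), scrape()]


# Produces the same list as the 'actions' value in the example above.
actions = click_then_scrape("#login-button")
print(actions)
```

The resulting list can be passed as the 'actions' value in the params dict.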

Structured Data Extraction

Extract specific fields by passing a custom schema. This Python example defines the schema with Pydantic and sends it as JSON Schema:

from pydantic import BaseModel

class ProductSchema(BaseModel):
    name: str
    price: float
    description: str

result = app.scrape_url('https://example.com/product', {
    'formats': ['extract'],
    'extract': {
        'schema': ProductSchema.model_json_schema()
    }
})
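
Once extracted data comes back, it can be checked against the same schema before use. The sketch below assumes Pydantic v2 (the version that provides model_json_schema) and that the extracted fields are already available as a plain dict. This document does not show where that dict sits in the response, so the payloads here are hypothetical, as is the parse_product helper.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, ValidationError


class ProductSchema(BaseModel):
    name: str
    price: float
    description: str


def parse_product(extracted: Dict[str, Any]) -> Optional[ProductSchema]:
    """Return a validated product, or None if the payload does not match.

    Hypothetical helper, not part of Scrapester.
    """
    try:
        return ProductSchema.model_validate(extracted)
    except ValidationError:
        return None


# Hypothetical payloads standing in for extracted data.
good = {"name": "Widget", "price": 19.99, "description": "A small widget"}
bad = {"name": "Widget", "price": "free"}  # non-numeric price, no description

product = parse_product(good)
rejected = parse_product(bad)
print(product, rejected)
```

Validating on receipt surfaces missing or mistyped fields at the boundary instead of deeper in application code.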