Overview

Scrapester Node.js SDK provides a simple wrapper around the Scrapester API to help you transform web content into structured data.

Installation

Install the Scrapester Node.js SDK using npm:
npm install scrapester

Basic Usage

  1. Get an API key from scrapester.lol
  2. Set the API key as an environment variable SCRAPESTER_API_KEY or pass it directly to the ScrapesterApp class
Here’s a complete example with error handling:
import { ScrapesterApp, ScrapeParams, ScrapeResponse } from 'scrapester';

const app = new ScrapesterApp({apiKey: "sk-YOUR_API_KEY"});

// Scrape a website
const scrapeResponse = await app.scrapeUrl('https://example.com', {
  formats: ['markdown', 'html'],
});

if (!scrapeResponse.success) {
  throw new Error(`Failed to scrape: ${scrapeResponse.error}`)
}

console.log(scrapeResponse);

Scraping a URL

The scrapeUrl method is used to extract content from a single URL:
// Basic scraping
const result = await app.scrapeUrl('example.com', { 
    formats: ['markdown', 'html'] 
});

if (!result.success) {
  throw new Error(`Failed to scrape: ${result.error}`)
}

console.log(result);

// With advanced options
const advancedResult = await app.scrapeUrl('example.com', {
    formats: ['markdown', 'html', 'screenshot'],
    screenshot: {
        fullPage: true,
        type: 'jpeg'
    },
    headers: {
        'Cookie': 'session=abc123'
    }
});

Structured Data Extraction

Extract specific data using schemas:
const schema = {
    type: 'object',
    properties: {
        title: { type: 'string' },
        price: { type: 'number' },
        features: { type: 'array', items: { type: 'string' } }
    }
};

const result = await app.scrapeUrl('example.com', {
    formats: ['extract'],
    extract: {
        schema: schema
    }
});

if (!result.success) {
    throw new Error(`Extraction failed: ${result.error}`);
}

console.log(result.data.extract);

Page Interactions

Interact with dynamic content before scraping:
const result = await app.scrapeUrl('example.com', {
    formats: ['markdown'],
    actions: [
        { type: "wait", milliseconds: 1000 },
        { type: "click", selector: "#login-button" },
        { type: "input", selector: "#email", value: "test@example.com" },
        { type: "wait", milliseconds: 1000 },
        { type: "scrape" }
    ]
});

Response Types

Successful Response

interface SuccessResponse {
    success: true;
    data: {
        markdown?: string;
        html?: string;
        screenshot?: string;
        extract?: any;
        metadata: {
            title: string;
            description: string;
            language: string;
            sourceURL: string;
            statusCode: number;
        }
    }
}

Error Response

interface ErrorResponse {
    success: false;
    error: {
        code: string;
        message: string;
        details?: any;
    }
}

Error Handling

The SDK provides detailed error information when something goes wrong. Here’s how to handle different types of errors:
try {
    const result = await app.scrapeUrl('example.com', {
        formats: ['markdown']
    });

    if (!result.success) {
        // Handle API-level errors
        console.error(`API Error: ${result.error.message}`);
        console.error(`Error Code: ${result.error.code}`);
    } else {
        console.log(result.data);
    }
} catch (error) {
    // Handle network or SDK-level errors
    console.error('SDK Error:', error.message);
}
Common error codes:
  • INVALID_URL: The provided URL is not properly formatted
  • TIMEOUT: The request timed out
  • ACCESS_DENIED: Could not access the URL (403)
  • NOT_FOUND: URL not found (404)
  • RATE_LIMITED: Too many requests to the target website

Best Practices

  1. Error Handling: Always implement proper error handling as shown above
  2. Rate Limiting: Implement delays between requests to avoid rate limits
  3. Resource Management: Close connections and clean up resources when done
  4. Validation: Validate URLs and parameters before making requests
  5. Timeout Handling: Set appropriate timeouts for your use case
For more details about available parameters and options, refer to our API Reference.