ScrapingBee Extract Rules: JSON, CSS, and XPath Examples

Large organizations around the world collect data from millions of users every day. They know what users do, what they need, and what they buy. These organizations have learned that information is the key to success: the more they learn about their customers and markets, the more accurately they can target audiences, customize products, tailor services, predict trends, reduce risks, and increase profits.

ScrapingBee Extract Rules let you turn a web page into clean JSON by choosing CSS or XPath selectors and output types such as text, HTML, attributes, or lists. This guide shows short, side-by-side examples in JSON, CSS, and XPath, along with clear Python, Node, and cURL code. It also covers pagination, dynamic pages, and quick debugging, so you can copy a pattern, add your own selectors, and get reliable data with less effort.

Table of Contents
What Extract Rules Mean
How To Send Extract Rules
A Small Quick Start
cURL Example
Python Example With Requests
Node Example With Axios
CSS Selector Patterns That Work Often
XPath When The DOM Is Complex
Nested Arrays For Real Pages
Attribute Extraction
Pagination With A Loop
Pages That Need JavaScript
Strong JSON Habits
Choosing CSS Or XPath
Full List Example In Python
Prices And Clean Formatting
Common Pitfalls With Simple Fixes
Debugging Tips That Help
Quick Extract Rules Cheat Sheet
Conclusion

What Extract Rules Mean

In plain words, an extract rule is a tiny map. Each map key names a field you want. Each field holds a selector and a type. ScrapingBee runs the selectors on the page. Then it returns a JSON object that matches your map.

  • A field usually has a selector, type, and sometimes a selector_type.
  • The type can be text, HTML, attr, or list.
  • When you choose attr, add extract_attr such as "href" or "src".
  • If you leave out selector_type, the service tries to detect CSS or XPath for you.

Because the response is JSON, your code can store, filter, and transform the data without extra parsing.

How To Send Extract Rules

You pass your JSON in a parameter named extract_rules. For GET requests, you URL-encode the JSON. For POST requests, you can send the JSON string in the body or as a parameter. The API returns a JSON payload that holds your fields. You can also send data through a ScrapingBee Post Request in Python, Node.js, and cURL when you need to handle larger or more complex extract rules.

A Small Quick Start

Here is a short rule set that reads a title and a price.

{
  "title": { "selector": "h1", "type": "text", "selector_type": "css" },
  "price": { "selector": ".price", "type": "text", "selector_type": "css" }
}

This map is easy to read. One key for the title, one key for the price.
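
If both selectors match, the API returns a flat JSON object with one key per field, as described above. The values below are illustrative:

{
  "title": "Example Product",
  "price": "$19.99"
}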

cURL Example

A GET request must URL-encode the JSON. The example below uses a compact, pre-encoded string.

curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR_API_KEY&url=https%3A%2F%2Fexample.com&extract_rules=%7B%22title%22%3A%7B%22selector%22%3A%22h1%22%2C%22type%22%3A%22text%22%2C%22selector_type%22%3A%22css%22%7D%2C%22price%22%3A%7B%22selector%22%3A%22.price%22%2C%22type%22%3A%22text%22%2C%22selector_type%22%3A%22css%22%7D%7D"

When your JSON grows, consider POST plus --data-urlencode to keep your command readable.
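
Another way to keep the command readable is curl's -G flag, which turns --data-urlencode pairs into an encoded query string for you. A minimal sketch of the same request as above:

curl -G "https://app.scrapingbee.com/api/v1/" \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "url=https://example.com" \
  --data-urlencode 'extract_rules={"title":{"selector":"h1","type":"text","selector_type":"css"},"price":{"selector":".price","type":"text","selector_type":"css"}}'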

Python Example With Requests

Readability matters in small scripts. The code below handles JSON for you and keeps your rules neat.

import json
import requests

api_key = "YOUR_API_KEY"
url = "https://example.com"

rules = {
    "title": {"selector": "h1", "type": "text", "selector_type": "css"},
    "price": {"selector": ".price", "type": "text", "selector_type": "css"}
}

params = {
    "api_key": api_key,
    "url": url,
    "extract_rules": json.dumps(rules)
}

r = requests.get("https://app.scrapingbee.com/api/v1/", params=params, timeout=60)
r.raise_for_status()
print(r.json())

Teams often move the rules into a JSON file. That keeps code short and lets many scripts reuse the same map.
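
A minimal sketch of that pattern, assuming the rules live in a file named rules.json (the file name is illustrative):

import json
import requests

# rules.json is a hypothetical file that holds the same map as above
with open("rules.json") as f:
    rules = json.load(f)

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "extract_rules": json.dumps(rules)
}

r = requests.get("https://app.scrapingbee.com/api/v1/", params=params, timeout=60)
r.raise_for_status()
print(r.json())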

Node Example With Axios

JavaScript code follows the same flow. You serialize the rules and send them as a parameter.

const axios = require("axios");

const apiKey = "YOUR_API_KEY";
const url = "https://example.com";

const rules = {
  title: { selector: "h1", type: "text", selector_type: "css" },
  price: { selector: ".price", type: "text", selector_type: "css" }
};

axios.get("https://app.scrapingbee.com/api/v1/", {
  params: { api_key: apiKey, url, extract_rules: JSON.stringify(rules) },
  timeout: 60000
}).then(res => {
  console.log(res.data);
}).catch(err => {
  console.error(err.response ? err.response.data : err.message);
});

Short code like this is simple to test and easy to keep in version control.

CSS Selector Patterns That Work Often

Across many sites, a few selector shapes appear again and again. Simple selectors are faster to run and easier to keep stable.

  • Headings: h1, h2, or .page-title.
  • Prices or badges: .price, .amount, .badge.
  • Links: a.cta, a.product-link.
  • Images: img.hero, img.logo.

Lists appear on product pages, blog indexes, and search screens. A list rule gives you a clean array.

{
  "products": {
    "selector": ".product-card",
    "type": "list",
    "output": {
      "name": { "selector": ".name", "type": "text" },
      "url":  { "selector": "a", "type": "attr", "extract_attr": "href" },
      "price":{ "selector": ".price", "type": "text" }
    }
  }
}
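
For a page with two product cards, the response would take a shape like this (values are illustrative):

{
  "products": [
    { "name": "Blue Mug", "url": "/products/blue-mug", "price": "$12.00" },
    { "name": "Red Mug", "url": "/products/red-mug", "price": "$14.00" }
  ]
}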


XPath When The DOM Is Complex

Some pages use nested divs or odd markup. XPath can give you a direct path in those cases. You set "selector_type": "xpath" for each field that needs it.

{
  "headline": { "selector": "//h1[1]", "type": "text", "selector_type": "xpath" },
  "first_image_src": {
    "selector": "(//img)[1]",
    "type": "attr",
    "extract_attr": "src",
    "selector_type": "xpath"
  }
}

When you mix CSS and XPath, keep each field explicit. That habit prevents confusion when you read the map months later.

Nested Arrays For Real Pages

Real pages rarely stay flat. A product may hold many variants. A post may list many tags. With type: "list", you can nest arrays and keep the structure.

{
  "products": {
    "selector": ".product",
    "type": "list",
    "output": {
      "name": { "selector": ".name", "type": "text" },
      "variants": {
        "selector": ".variant",
        "type": "list",
        "output": {
          "sku": { "selector": ".sku", "type": "text" },
          "color": { "selector": ".color", "type": "text" }
        }
      }
    }
  }
}


Attribute Extraction

Links, images, and IDs live inside attributes. You can pull them with type: "attr".

{
  "link":  { "selector": "a.read-more", "type": "attr", "extract_attr": "href" },
  "image": { "selector": "img.hero",     "type": "attr", "extract_attr": "src" }
}

The result is a direct URL that you can store or fetch next.
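
One caveat: an extracted href or src can be relative. A short Python sketch that resolves it against the page URL before you store or fetch it:

from urllib.parse import urljoin

page_url = "https://example.com/blog/post-1"
link = "/articles/next-post"  # a relative href, as it may come back from the API

absolute = urljoin(page_url, link)
print(absolute)  # https://example.com/articles/next-post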

Pagination With A Loop

Many sites split content across pages. Your rules can grab the items and the next link together. Your code can then loop until no link remains.

{
  "articles": {
    "selector": ".post-card",
    "type": "list",
    "output": {
      "title": { "selector": ".post-title", "type": "text" },
      "url":   { "selector": "a", "type": "attr", "extract_attr": "href" }
    }
  },
  "next_page": { "selector": "a.next", "type": "attr", "extract_attr": "href" }
}
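
A minimal Python sketch of that loop, using the rule set above. It assumes next_page comes back empty on the last page and resolves relative links with urljoin:

import json
from urllib.parse import urljoin

import requests

api_key = "YOUR_API_KEY"
page_url = "https://example.com/blog"

rules = {
    "articles": {
        "selector": ".post-card",
        "type": "list",
        "output": {
            "title": {"selector": ".post-title", "type": "text"},
            "url": {"selector": "a", "type": "attr", "extract_attr": "href"}
        }
    },
    "next_page": {"selector": "a.next", "type": "attr", "extract_attr": "href"}
}

all_articles = []
while page_url:
    r = requests.get(
        "https://app.scrapingbee.com/api/v1/",
        params={"api_key": api_key, "url": page_url, "extract_rules": json.dumps(rules)},
        timeout=60
    )
    r.raise_for_status()
    data = r.json()
    all_articles.extend(data.get("articles", []))
    next_link = data.get("next_page")
    # Stop when no next link remains; otherwise follow it
    page_url = urljoin(page_url, next_link) if next_link else None

print(f"Collected {len(all_articles)} articles")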


Pages That Need JavaScript

Some pages build the DOM in the browser. You can tell the API to render scripts by adding render_js=true. Then you apply the same extract rules. For advanced browser control, you can integrate ScrapingBee Playwright to manage pages that rely heavily on scripts, cookies, or user actions before applying extract rules.

import json, requests

api_key = "YOUR_API_KEY"
params = {
    "api_key": api_key,
    "url": "https://example.com/spa",
    "render_js": "true",
    "extract_rules": json.dumps({
        "title": {"selector": "h1", "type": "text"},
        "items": {"selector": ".row .item", "type": "list", "output": {
            "name": {"selector": ".name", "type": "text"}
        }}
    })
}
r = requests.get("https://app.scrapingbee.com/api/v1/", params=params, timeout=90)
print(r.json())


Strong JSON Habits

Good habits prevent bugs.

  • Serialize with json.dumps or JSON.stringify.
  • Keep selectors short and stable.
  • Choose classes that change less often.
  • Place nested fields under output to make the shape clear.
  • Test on sample pages before you scale.

When projects need higher reliability or access to restricted sites, using a ScrapingBee premium proxy can improve stability and reduce the chance of blocks while running extract rules.

Choosing CSS Or XPath

Most teams start with CSS. Some pages are easier with XPath. You can set "selector_type": "xpath" on one field and leave the rest on CSS. When you do not set it, the service may try to detect it.
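
A small illustration of that mix; the field names are made up for the example:

{
  "title": { "selector": "h1", "type": "text", "selector_type": "css" },
  "fine_print": {
    "selector": "//footer//small[1]",
    "type": "text",
    "selector_type": "xpath"
  }
}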

Full List Example In Python

import json, requests

api_key = "YOUR_API_KEY"
url = "https://example.com/catalog"

rules = {
  "items": {
    "selector": ".card",
    "type": "list",
    "output": {
      "name":   {"selector": ".title", "type": "text"},
      "price":  {"selector": ".price", "type": "text"},
      "detail": {"selector": "a", "type": "attr", "extract_attr": "href"}
    }
  }
}

resp = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={"api_key": api_key, "url": url, "extract_rules": json.dumps(rules)},
    timeout=60
)
data = resp.json()
for item in data.get("items", []):
    print(item)

A short loop like this prints each object in the array. You can replace the print with a database write or a CSV export.
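
For example, a CSV export takes only a few extra lines. This sketch continues the script above and assumes each item carries the name, price, and detail keys defined in the rules:

import csv

# data comes from the request above
with open("items.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price", "detail"])
    writer.writeheader()
    writer.writerows(data.get("items", []))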


Prices And Clean Formatting

Price text often holds currency signs and formatted numbers. You can extract the raw text, then clean it, as in the sketch after this list.

  • Remove currency symbols.
  • Replace commas with dots when needed.
  • Convert to a number type in your language of choice.
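
A minimal Python sketch of that cleanup. It assumes the two usual formats: a dot decimal with comma grouping, or a comma decimal with dot grouping:

import re

def clean_price(raw):
    # Keep digits and separators only: "$1,299.00" -> "1,299.00"
    stripped = re.sub(r"[^\d.,]", "", raw)
    # If a comma appears after the last dot, treat it as the decimal mark
    if "," in stripped and stripped.rfind(",") > stripped.rfind("."):
        stripped = stripped.replace(".", "").replace(",", ".")
    else:
        stripped = stripped.replace(",", "")
    return float(stripped)

print(clean_price("$1,299.00"))   # 1299.0
print(clean_price("1.299,00 €"))  # 1299.0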


Common Pitfalls With Simple Fixes

A short checklist can save time.

  • Empty output often means a wrong selector. Inspect the page and try again.
  • A missing list may mean the parent element does not exist on that page. Confirm the page URL.
  • A blank page can signal JavaScript rendering. Add render_js=true. You may also add a wait time with wait=3000.
  • A missing attribute means the name is wrong. Confirm href, src, or data-id.
  • Invalid JSON errors come from commas or quotes. Validate your JSON with a formatter before sending.

Debugging Tips That Help

Small steps make bugs clear.

  • Begin with one field. Expand once it works.
  • Log the final URL and the exact JSON you send (see the sketch after this list).
  • Save a few responses for later checks.
  • Add small tests for each rule group. You can catch breakage when a site changes.
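
One quick way to log the final URL with requests is to prepare the request without sending it; the PreparedRequest exposes the fully encoded URL:

import json
import requests

rules = {"title": {"selector": "h1", "type": "text"}}
params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "extract_rules": json.dumps(rules)
}

# Build but do not send, then log the exact URL and the JSON payload
req = requests.Request("GET", "https://app.scrapingbee.com/api/v1/", params=params).prepare()
print(req.url)
print(params["extract_rules"])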

For step-by-step guidance on handling scripts, you can follow a ScrapingBee JavaScript tutorial that shows how to combine rendering options with extract rules for smoother results.

Quick Extract Rules Cheat Sheet

Use this quick sheet to choose the right selector and type. Scan the rows, then copy the one you need into your extract rules.

| Goal | Selector | Selector Type | Type | Extra Fields | Result Shape |
|---|---|---|---|---|---|
| Read the page title | h1 | CSS | text | None | String |
| Get the product link | a.product-link | CSS | attr | extract_attr="href" | URL string |
| List the products | .product-card | CSS | list | output with inner fields | Array of objects |
| Get the first headline | //h1[1] | XPath | text | None | String |
| Get the first image source | (//img)[1] | XPath | attr | extract_attr="src" | URL string |
| Get the next page link | a.next | CSS | attr | extract_attr="href" | URL string or null |
| Extract from a dynamic page | Use relevant selectors | CSS or XPath | any | render_js="true"; optional wait=3000 | Same as the rules above |

Conclusion

With ScrapingBee Extract Rules, you can turn web pages into clean JSON. You learned how fields map to selectors and types. You saw CSS and XPath side by side. You tried lists, nested arrays, attributes, and pagination. You learned when to enable script rendering. With a few strong habits, you can keep rules stable and results tidy. If your needs differ, you may also explore a ScrapingBee alternative that offers similar web scraping features with different pricing or proxy options.
