Large organizations around the world collect data from millions of users every day. They know what users are doing now, what they need, what they buy, and how they spend their time. These organizations have learned that information is the key to success: the more they know about their customers, their competitors, and their markets, the more accurately they can target audiences, customize products, tailor services, predict trends, reduce risk, and grow their profits.
ScrapingBee Extract Rules let you turn a web page into clean JSON by pairing CSS or XPath selectors with output types such as text, HTML, attributes, or lists. This guide shows short, side-by-side examples in JSON, CSS, and XPath, along with clear Python, Node, and cURL code. It also covers pagination, dynamic pages, and quick debugging, so you can copy a pattern, add your own selectors, and get reliable data with less effort.
In plain words, an extract rule is a tiny map. Each map key names a field you want. Each field holds a selector and a type. ScrapingBee runs the selectors on the page. Then it returns a JSON object that matches your map.
Because the response is JSON, your code can store, filter, and transform the data without extra parsing.
You pass your JSON in a parameter named extract_rules. For GET requests, you URL-encode the JSON so it survives the query string, and the API returns a JSON payload that holds your fields. If the target page itself needs a POST, you can combine extract rules with a ScrapingBee Post Request in Python, Node.js, and cURL.
Here is a short rule set that reads a title and a price.
{
  "title": { "selector": "h1", "type": "text", "selector_type": "css" },
  "price": { "selector": ".price", "type": "text", "selector_type": "css" }
}
This map is easy to read. One key for the title, one key for the price.
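If the page holds a single h1 and one .price element, the response body would look something like this (the values here are placeholders, not real data):

{
  "title": "Acme Widget",
  "price": "$19.99"
}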
A GET request must encode the JSON. The example below uses a compact string for clarity.
curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR_API_KEY&url=https%3A%2F%2Fexample.com&extract_rules=%7B%22title%22%3A%7B%22selector%22%3A%22h1%22%2C%22type%22%3A%22text%22%2C%22selector_type%22%3A%22css%22%7D%2C%22price%22%3A%7B%22selector%22%3A%22.price%22%2C%22type%22%3A%22text%22%2C%22selector_type%22%3A%22css%22%7D%7D"
When your JSON grows, you can let curl handle the encoding with --data-urlencode to keep your command readable.
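Here is a sketch of that pattern. The -G flag tells curl to append each --data-urlencode value to the query string, and the name@file form reads and encodes the rules from a file; rules.json is an assumed file name holding the map above.

curl -G "https://app.scrapingbee.com/api/v1/" \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "url=https://example.com" \
  --data-urlencode "extract_rules@rules.json"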
Readability matters in small scripts. The code below handles JSON for you and keeps your rules neat.
import json
import requests

api_key = "YOUR_API_KEY"
url = "https://example.com"

rules = {
    "title": {"selector": "h1", "type": "text", "selector_type": "css"},
    "price": {"selector": ".price", "type": "text", "selector_type": "css"}
}

params = {
    "api_key": api_key,
    "url": url,
    "extract_rules": json.dumps(rules)
}

r = requests.get("https://app.scrapingbee.com/api/v1/", params=params, timeout=60)
r.raise_for_status()
print(r.json())
Teams often move the rules into a JSON file. That keeps code short and lets many scripts reuse the same map.
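A minimal sketch of that pattern, assuming the map is saved next to the script as rules.json:

import json
import requests

# Load the shared rule map from disk so several scripts can reuse it.
with open("rules.json", "r", encoding="utf-8") as f:
    rules = json.load(f)

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "extract_rules": json.dumps(rules)  # re-serialize for the query string
}

r = requests.get("https://app.scrapingbee.com/api/v1/", params=params, timeout=60)
r.raise_for_status()
print(r.json())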
JavaScript code follows the same flow. You serialize the rules and send them as a parameter.
const axios = require("axios");

const apiKey = "YOUR_API_KEY";
const url = "https://example.com";

const rules = {
  title: { selector: "h1", type: "text", selector_type: "css" },
  price: { selector: ".price", type: "text", selector_type: "css" }
};

axios.get("https://app.scrapingbee.com/api/v1/", {
  params: { api_key: apiKey, url, extract_rules: JSON.stringify(rules) },
  timeout: 60000
}).then(res => {
  console.log(res.data);
}).catch(err => {
  console.error(err.response ? err.response.data : err.message);
});
Short code like this is simple to test and easy to keep in version control.
Across many sites, a few selector shapes appear again and again. Simple selectors are faster to run and easier to keep stable.
Lists appear on product pages, blog indexes, and search screens. A list rule gives you a clean array.
{
  "products": {
    "selector": ".product-card",
    "type": "list",
    "output": {
      "name": { "selector": ".name", "type": "text" },
      "url": { "selector": "a", "type": "attr", "extract_attr": "href" },
      "price": { "selector": ".price", "type": "text" }
    }
  }
}
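For a page with two product cards, the response would have roughly this shape (names, links, and prices are placeholders):

{
  "products": [
    { "name": "Widget A", "url": "/products/widget-a", "price": "$10.00" },
    { "name": "Widget B", "url": "/products/widget-b", "price": "$12.50" }
  ]
}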
Some pages use nested divs or odd markup. XPath can give you a direct path in those cases. You set "selector_type": "xpath" for each field that needs it.
{
  "headline": { "selector": "//h1[1]", "type": "text", "selector_type": "xpath" },
  "first_image_src": {
    "selector": "(//img)[1]",
    "type": "attr",
    "extract_attr": "src",
    "selector_type": "xpath"
  }
}
When you mix CSS and XPath, keep each field explicit. That habit prevents confusion when you read the map months later.
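Here is a small mixed map as an illustration. The first field stays on CSS, the second declares XPath; the breadcrumb selector is a made-up example rather than one taken from a real page.

{
  "title": { "selector": "h1", "type": "text", "selector_type": "css" },
  "last_crumb": { "selector": "//nav//li[last()]", "type": "text", "selector_type": "xpath" }
}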
Real pages rarely stay flat. A product may hold many variants. A post may list many tags. With type: "list", you can nest arrays and keep the structure.
{
  "products": {
    "selector": ".product",
    "type": "list",
    "output": {
      "name": { "selector": ".name", "type": "text" },
      "variants": {
        "selector": ".variant",
        "type": "list",
        "output": {
          "sku": { "selector": ".sku", "type": "text" },
          "color": { "selector": ".color", "type": "text" }
        }
      }
    }
  }
}
Links, images, and IDs live inside attributes. You can pull them with type: "attr".
{
  "link": { "selector": "a.read-more", "type": "attr", "extract_attr": "href" },
  "image": { "selector": "img.hero", "type": "attr", "extract_attr": "src" }
}
The result is a direct URL that you can store or fetch next.
Many sites split content across pages. Your rules can grab the items and the next link together. Your code can then loop until no link remains.
{
  "articles": {
    "selector": ".post-card",
    "type": "list",
    "output": {
      "title": { "selector": ".post-title", "type": "text" },
      "url": { "selector": "a", "type": "attr", "extract_attr": "href" }
    }
  },
  "next_page": { "selector": "a.next", "type": "attr", "extract_attr": "href" }
}
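A minimal loop over those rules might look like the sketch below. It assumes next_page comes back empty on the last page and that relative links should be resolved against the current page URL.

import json
from urllib.parse import urljoin

import requests

API = "https://app.scrapingbee.com/api/v1/"
rules = {
    "articles": {
        "selector": ".post-card",
        "type": "list",
        "output": {
            "title": {"selector": ".post-title", "type": "text"},
            "url": {"selector": "a", "type": "attr", "extract_attr": "href"}
        }
    },
    "next_page": {"selector": "a.next", "type": "attr", "extract_attr": "href"}
}

page_url = "https://example.com/blog"
articles = []
while page_url:
    r = requests.get(API, params={
        "api_key": "YOUR_API_KEY",
        "url": page_url,
        "extract_rules": json.dumps(rules)
    }, timeout=60)
    r.raise_for_status()
    data = r.json()
    articles.extend(data.get("articles", []))
    next_link = data.get("next_page")
    # Stop when there is no next link; otherwise resolve relative URLs.
    page_url = urljoin(page_url, next_link) if next_link else None

print(len(articles), "articles collected")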
Some pages build the DOM in the browser. You can tell the API to render scripts by adding render_js=true. Then you apply the same extract rules. For advanced browser control, you can integrate ScrapingBee Playwright to manage pages that rely heavily on scripts, cookies, or user actions before applying extract rules.
import json, requests

api_key = "YOUR_API_KEY"
params = {
    "api_key": api_key,
    "url": "https://example.com/spa",
    "render_js": "true",
    "extract_rules": json.dumps({
        "title": {"selector": "h1", "type": "text"},
        "items": {"selector": ".row .item", "type": "list", "output": {
            "name": {"selector": ".name", "type": "text"}
        }}
    })
}

r = requests.get("https://app.scrapingbee.com/api/v1/", params=params, timeout=90)
print(r.json())
Good habits prevent bugs: keep selectors simple, keep each field explicit, and store shared rule maps in versioned JSON files.
When projects need higher reliability or access to restricted sites, using a ScrapingBee premium proxy can improve stability and reduce the chance of blocks while your extract rules run.
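As a sketch, turning that on is usually one extra request parameter. Whether premium_proxy is available depends on your plan, so treat the flag below as an assumption to verify against the current API docs.

import json
import requests

rules = {"title": {"selector": "h1", "type": "text"}}
params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/protected",
    "premium_proxy": "true",  # assumption: premium proxies are enabled on your plan
    "extract_rules": json.dumps(rules)
}

r = requests.get("https://app.scrapingbee.com/api/v1/", params=params, timeout=90)
print(r.json())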
Most teams start with CSS. Some pages are easier with XPath. You can set "selector_type": "xpath" on one field and leave the rest on CSS. When you do not set it, the service may try to detect the selector type for you.
import json, requests

api_key = "YOUR_API_KEY"
url = "https://example.com/catalog"

rules = {
    "items": {
        "selector": ".card",
        "type": "list",
        "output": {
            "name": {"selector": ".title", "type": "text"},
            "price": {"selector": ".price", "type": "text"},
            "detail": {"selector": "a", "type": "attr", "extract_attr": "href"}
        }
    }
}

resp = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={"api_key": api_key, "url": url, "extract_rules": json.dumps(rules)},
    timeout=60
)
data = resp.json()

for item in data.get("items", []):
    print(item)
A short loop like this prints each object in the array. You can replace the print with a database write or a CSV export.
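As a sketch of the CSV option, the snippet below continues from the script above and assumes an output file named items.csv:

import csv

# "data" is the parsed JSON response from the previous request.
with open("items.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price", "detail"], extrasaction="ignore")
    writer.writeheader()
    for item in data.get("items", []):
        writer.writerow(item)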
Price text often holds currency signs and formatted numbers. You can extract raw text, then clean it.
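A small cleanup sketch, assuming common formats such as "$1,299.99" or "Price: 25 USD"; locales that use commas as decimal separators would need extra handling.

import re

def parse_price(text):
    """Pull the first number out of a price string and return it as a float."""
    match = re.search(r"[\d.,]+", text or "")
    if not match:
        return None
    try:
        return float(match.group(0).replace(",", ""))
    except ValueError:
        return None

print(parse_price("$1,299.99"))      # 1299.99
print(parse_price("Price: 25 USD"))  # 25.0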
A short checklist can save time: test each selector in the browser dev tools, add fields to the rule map one at a time, and check whether the page needs render_js before you blame a selector. Small steps make bugs clear.
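One small habit, sketched below: scan the parsed response for empty fields before you dig into selectors. The helper name is made up for this example.

def report_empty_fields(data):
    """Print each top-level field that came back missing or empty."""
    for field, value in data.items():
        if value in (None, "", []):
            print(f"'{field}' is empty - re-check its selector or try render_js")

report_empty_fields({"title": "Hello", "items": []})  # flags 'items'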
For step-by-step guidance on handling scripts, you can follow a ScrapingBee JavaScript tutorial that shows how to combine rendering options with extract rules for smoother results.
Use this quick sheet to choose the right selector and type. Scan the rows, then copy the one you need into your extract rules.
Goal | Selector | Selector Type | Type | Extra Fields | Result Shape |
---|---|---|---|---|---|
Read the page title | h1 | CSS | text | None | String |
Get the product link | a.product-link | CSS | attr | extract_attr="href" | URL string |
List the products | .product-card | CSS | list | output with inner fields | Array of objects |
Get the first headline | //h1[1] | XPath | text | None | String |
Get the first image source | (//img)[1] | XPath | attr | extract_attr="src" | URL string |
Get the next page link | a.next | CSS | attr | extract_attr="href" | URL string or null |
Extract from a dynamic page | Use relevant selectors | CSS or XPath | any | render_js="true" ; optional wait=3000 | Same as the rules above |
With ScrapingBee Extract Rules, you can turn web pages into clean JSON. You learned how fields map to selectors and types. You saw CSS and XPath side by side. You tried lists, nested arrays, attributes, and pagination. You learned when to enable script rendering. With a few strong habits, you can keep rules stable and results tidy. If your needs differ, you may also explore a ScrapingBee alternative that offers similar web scraping features with different pricing or proxy options.