πΎ Data Storage β site.store() β
Validate and save scraped data against a predefined schema. Drops extra fields, fills missing ones with null (or defaults), and writes to JSON or SQLite β all with one line of code.
Overview β
site.store() is a schema-driven persistence system. Define your data shape once in piggy.store.json, then call store() anywhere in your scraper.
| Feature | Description |
|---|---|
| Schema validation | Extra fields dropped, missing fields become null |
| Type coercion | String, number, boolean, object, array types |
| Default values | Fallback values for missing fields |
| Dual output | JSON file or SQLite database |
| Batch saving | Store one record or thousands |
Quick Start β
Step 1: Create piggy.store.json in your project root β
{
"stores": [
{
"name": "products",
"destination": "./data/products.json",
"fields": {
"id": { "type": "string" },
"title": { "type": "string" },
"price": { "type": "number" },
"inStock": { "type": "boolean", "default": false },
"category": { "type": "string" }
}
}
]
}Step 2: Call site.store() in your code β
import piggy, { usePiggy } from "nothing-browser";
await piggy.launch();
await piggy.register("shop", "https://books.toscrape.com");
const { shop } = usePiggy<"shop">();
await shop.navigate();
// Scrape products
const products = await shop.evaluate(() =>
Array.from(document.querySelectorAll(".product_pod")).map(el => ({
id: el.querySelector("h3 a")?.getAttribute("href")?.match(/\d+/)?.[0],
title: el.querySelector("h3 a")?.getAttribute("title"),
price: parseFloat(el.querySelector(".price_color")?.textContent?.replace("Β£", "") || "0"),
inStock: (el.querySelector(".availability")?.textContent || "").includes("In stock"),
category: "books",
// Extra field: rating (will be dropped by schema!)
rating: el.querySelector(".star-rating")?.className
}))
);
// Store with validation
const result = await shop.store(products);
console.log(result); // { stored: 20, skipped: 0 }
// View saved data
// cat ./data/products.jsonSchema Definition β
Basic Schema Structure β
{
"stores": [
{
"name": "unique_store_name",
"destination": "./data/output.json",
"fields": {
"fieldName": { "type": "string" }
}
}
]
}| Field | Required | Description |
|---|---|---|
name | β Yes | Unique identifier (used in store(data, "name")) |
destination | β Yes | File path (.json or .db) |
fields | β Yes | Object defining the schema |
Field Types β
| Type | Description | Coercion | Invalid becomes |
|---|---|---|---|
string | Text value | String(value) | Always succeeds |
number | Numeric value | Number(value) | null if NaN |
boolean | True/false | Boolean(value) | Always succeeds |
object | Nested object | Must be plain object | null |
array | List of values | Must be array | null |
Default Values β
{
"fields": {
"name": { "type": "string", "default": "Unknown" },
"active": { "type": "boolean", "default": true },
"count": { "type": "number", "default": 0 }
}
}If a field is missing from incoming data, the default value is used instead of null.
Usage Examples β
Store Single Record β
const product = {
id: "B09X5Y8Z7W",
title: "Wireless Headphones",
price: 79.99,
inStock: true
};
await shop.store(product);Store Multiple Records β
const products = [
{ id: "001", title: "Product A", price: 10 },
{ id: "002", title: "Product B", price: 20 },
{ id: "003", title: "Product C", price: 30 }
];
await shop.store(products);
// All three saved, validated against schemaStore with Specific Schema Name β
// Use a different schema than the site name
await shop.store(products, "backup_products");// piggy.store.json
{
"stores": [
{ "name": "products", "destination": "./data/products.json", "fields": {...} },
{ "name": "backup_products", "destination": "./data/backup.json", "fields": {...} }
]
}Store Return Value β
const result = await shop.store(products);
console.log(`Saved: ${result.stored}`);
console.log(`Skipped (validation failed): ${result.skipped}`);
// { stored: 47, skipped: 3 }Real-World Examples β
1. Amazon Product Scraper β
// piggy.store.json
{
"stores": [
{
"name": "amazon_products",
"destination": "./data/products.json",
"fields": {
"asin": { "type": "string" },
"title": { "type": "string" },
"price": { "type": "number" },
"rating": { "type": "number", "default": 0 },
"reviewCount": { "type": "number", "default": 0 },
"availability": { "type": "string", "default": "Unknown" },
"scrapedAt": { "type": "number" }
}
}
]
}import piggy, { usePiggy } from "nothing-browser";
await piggy.launch();
await piggy.register("amazon", "https://www.amazon.com", { pool: 3 });
const { amazon } = usePiggy<"amazon">();
await amazon.api("/search", async (_params, query) => {
const term = query.q ?? "laptop";
await amazon.navigate(`https://www.amazon.com/s?k=${encodeURIComponent(term)}`);
await amazon.waitForSelector("[data-asin]");
const products = await amazon.evaluate(() =>
Array.from(document.querySelectorAll("[data-asin]"))
.filter(el => el.getAttribute("data-asin"))
.map(el => ({
asin: el.getAttribute("data-asin"),
title: el.querySelector("h2 span")?.textContent?.trim(),
price: parseFloat(el.querySelector(".a-price-whole")?.textContent?.replace(",", "") || "0"),
rating: parseFloat(el.querySelector(".a-icon-alt")?.textContent?.match(/(\d\.?\d?)/)?.[1] || "0"),
reviewCount: parseInt(el.querySelector(".a-size-base")?.textContent?.replace(/,/g, "") || "0"),
availability: el.querySelector(".a-size-small")?.textContent?.includes("In stock") ? "In Stock" : "Check",
scrapedAt: Date.now()
}))
);
// Store with validation
const result = await amazon.store(products);
console.log(`π¦ Stored ${result.stored} products, skipped ${result.skipped}`);
return { term, count: products.length, products, stored: result };
});
await piggy.serve(3000);2. SQLite Database Storage β
Same schema, just change destination to .db:
{
"stores": [
{
"name": "analytics",
"destination": "./data/analytics.db",
"fields": {
"pageUrl": { "type": "string" },
"visitorId": { "type": "string" },
"timestamp": { "type": "number" },
"duration": { "type": "number" },
"bounce": { "type": "boolean", "default": false }
}
}
]
}// Same code works - writes to SQLite instead of JSON
await site.store(pageView); // INSERT INTO analytics ...3. E-commerce Order Storage β
{
"stores": [
{
"name": "orders",
"destination": "./data/orders.json",
"fields": {
"orderId": { "type": "string" },
"customerEmail": { "type": "string" },
"total": { "type": "number" },
"items": { "type": "array" },
"shippingAddress": { "type": "object" },
"status": { "type": "string", "default": "pending" },
"placedAt": { "type": "number" }
}
}
]
}const order = {
orderId: "ORD-12345",
customerEmail: "customer@example.com",
total: 299.98,
items: [
{ sku: "PROD-001", name: "Product 1", quantity: 2, price: 99.99 },
{ sku: "PROD-002", name: "Product 2", quantity: 1, price: 99.99 }
],
shippingAddress: {
street: "123 Main St",
city: "Anytown",
zip: "12345"
},
placedAt: Date.now()
// status will default to "pending"
};
await shop.store(order);4. Batch Store with Error Handling β
async function storeWithRetry(site: any, data: any[], maxRetries = 3) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const result = await site.store(data);
console.log(`β
Stored ${result.stored} records on attempt ${attempt}`);
return result;
} catch (error) {
console.error(`β Attempt ${attempt} failed:`, error.message);
if (attempt === maxRetries) throw error;
await new Promise(r => setTimeout(r, 1000 * attempt));
}
}
}
const result = await storeWithRetry(shop, products);5. Incremental Storage with Timestamps β
{
"stores": [
{
"name": "price_history",
"destination": "./data/prices.json",
"fields": {
"productId": { "type": "string" },
"price": { "type": "number" },
"currency": { "type": "string", "default": "USD" },
"timestamp": { "type": "number" }
}
}
]
}// Store price at different times
const priceSnapshot = {
productId: "ABC123",
price: 49.99,
timestamp: Date.now()
};
await site.store(priceSnapshot); // Append to history
// Later...
priceSnapshot.price = 39.99;
priceSnapshot.timestamp = Date.now();
await site.store(priceSnapshot); // Another entrySchema Validation in Action β
Input Data (from scraper) β
const rawData = {
id: "123",
title: "Cool Product",
price: 29.99,
inStock: true,
rating: 4.5, // Extra field - will be DROPPED
metadata: { // Extra field - will be DROPPED
source: "scraper"
}
};Schema β
{
"fields": {
"id": { "type": "string" },
"title": { "type": "string" },
"price": { "type": "number" },
"inStock": { "type": "boolean" }
}
}Output (saved to file) β
{
"id": "123",
"title": "Cool Product",
"price": 29.99,
"inStock": true
}Extra fields (rating, metadata) are silently dropped.
Missing Fields with Defaults β
// Schema with defaults
{
"fields": {
"name": { "type": "string", "default": "Anonymous" },
"active": { "type": "boolean", "default": true }
}
}// Input missing both fields
const data = {};
// Output
{
"name": "Anonymous",
"active": true
}Multiple Stores Example β
// piggy.store.json
{
"stores": [
{
"name": "raw_scrapes",
"destination": "./data/raw.json",
"fields": {
"url": { "type": "string" },
"html": { "type": "string" },
"timestamp": { "type": "number" }
}
},
{
"name": "processed_products",
"destination": "./data/products.json",
"fields": {
"id": { "type": "string" },
"title": { "type": "string" },
"price": { "type": "number" }
}
},
{
"name": "errors",
"destination": "./data/errors.db",
"fields": {
"errorMessage": { "type": "string" },
"url": { "type": "string" },
"timestamp": { "type": "number" }
}
}
]
}// Save raw HTML
await site.store({ url: pageUrl, html: pageHtml, timestamp: Date.now() }, "raw_scrapes");
// Save processed products
await site.store(products, "processed_products");
// Save errors
await site.store({ errorMessage: error.message, url: pageUrl, timestamp: Date.now() }, "errors");File Format Examples β
JSON Output (pretty-printed) β
[
{
"id": "B09X5Y8Z7W",
"title": "Wireless Headphones",
"price": 79.99,
"inStock": true,
"scrapedAt": 1700000000000
},
{
"id": "B08W4T3R2Y",
"title": "Bluetooth Speaker",
"price": 49.99,
"inStock": true,
"scrapedAt": 1700000000001
}
]SQLite Schema β
CREATE TABLE IF NOT EXISTS products (
id TEXT,
title TEXT,
price REAL,
inStock INTEGER,
scrapedAt INTEGER
);Best Practices β
1. Define Schemas Before Writing Code β
// Plan your data shape first
{
"stores": [{
"name": "products",
"destination": "./data/products.json",
"fields": {
// List all fields you'll need
}
}]
}2. Use Defaults for Optional Fields β
{
"rating": { "type": "number", "default": 0 },
"inStock": { "type": "boolean", "default": false },
"category": { "type": "string", "default": "Uncategorized" }
}3. Include Timestamps β
{
"scrapedAt": { "type": "number" },
"updatedAt": { "type": "number" }
}const data = {
...scrapedData,
scrapedAt: Date.now()
};4. Separate Raw and Processed Data β
{
"stores": [
{ "name": "raw_html", "destination": "./data/raw.json", "fields": {...} },
{ "name": "parsed_data", "destination": "./data/parsed.json", "fields": {...} }
]
}5. Use SQLite for Large Datasets β
// JSON is fine for <10,000 records
"destination": "./data/products.json"
// SQLite for larger datasets
"destination": "./data/products.db"Troubleshooting β
"Store 'products' not found in piggy.store.json" β
Error:
Error: Store 'products' not defined in piggy.store.jsonSolution: Check the name field in your schema:
{
"stores": [
{ "name": "products", ... } // Must match the name you pass to store()
]
}"Destination directory does not exist" β
Error: Directory for ./data/products.json doesn't exist
Solution: Create the directory or let Piggy create it:
mkdir -p ./dataPiggy will create the directory automatically in the next version.
"Type coercion failed for field 'price'" β
Error: Could not convert value to number
Solution: Ensure your scraper extracts numbers correctly:
// Instead of
price: el.querySelector(".price")?.textContent // "$29.99"
// Do
price: parseFloat(el.querySelector(".price")?.textContent?.replace("$", "") || "0") // 29.99API Reference β
| Method | Description | Returns |
|---|---|---|
site.store(data) | Store data using site's default schema | { stored: number, skipped: number } |
site.store(data, "schemaName") | Store data using named schema | { stored: number, skipped: number } |
Return Object β
interface StoreResult {
stored: number; // Number of records successfully saved
skipped: number; // Number of records that failed validation
}Schema File Location β
Piggy looks for piggy.store.json in:
- Current working directory (
./piggy.store.json) - Parent directory (
../piggy.store.json) - User home directory (
~/piggy.store.json)
Next Steps β
- Built-in API Server β Combine storage with API endpoints
- Tab Pooling β Handle concurrent requests
- Session Persistence β Save browser state
Nothing Ecosystem Β· Ernest Tech House Β· Kenya Β· 2026