Interfacing with External Programs: A Lightpanda Browser Client
In this chapter, we build a complete Racket library that interfaces with an external program: the Lightpanda headless web browser. This example demonstrates practical techniques for subprocess management, string processing, and building reusable APIs in Racket. Directions for installing the Lightpanda command line tool can be found in the Lightpanda documentation.
The Problem: JavaScript-Rendered Web Content
Modern web pages often require JavaScript execution to display their content. Traditional HTTP clients like net/http-easy (used in the Web Scraping chapter) only fetch static HTML, missing the dynamic content rendered by JavaScript. Lightpanda is a headless browser that runs from the command line and outputs fully rendered pages.
Our goal is to create a Racket interface that:
1. Invokes Lightpanda as a subprocess
2. Captures its output (HTML, Markdown, or semantic tree)
3. Provides helper functions for common operations like link extraction
Project Structure
The source code for this chapter is in the directory Racket-AI-book/source-code/lightpanda. The project layout follows the standard Racket package convention:
lightpanda/
  lightpanda.rkt    Core implementation
  main.rkt          Package entry point (re-exports public API)
  README.md         Usage documentation
The main.rkt file re-exports the public API from lightpanda.rkt, making the library installable as a Racket package with raco pkg install --scope user:
#lang racket/base

(require "lightpanda.rkt")

(provide fetch-url
         fetch-and-extract-links
         demo-fetch
         lightpanda-binary)
Configuration
You must have the lightpanda tool installed; here I verify the installation on my laptop:
$ which lightpanda
/usr/local/bin/lightpanda
We use a Racket parameter for configurable settings:
(define lightpanda-binary (make-parameter "lightpanda"))
Racket parameters (created with make-parameter) are the idiomatic equivalent of Common Lisp’s special variables (defvar with earmuffs). They provide thread-safe dynamic binding:
;; Override the binary path for a specific call:
(parameterize ([lightpanda-binary "/usr/local/bin/lightpanda"])
  (fetch-url "https://example.com/"))

;; Or change it globally:
(lightpanda-binary "/usr/local/bin/lightpanda")
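To make the scoping behavior concrete, here is a minimal sketch that uses a stand-in parameter rather than the library itself. The names tool-binary and describe-command are hypothetical and exist only for this illustration; the guard procedure is an optional second argument to make-parameter that validates new values.

```racket
#lang racket/base

;; Hypothetical stand-in for the chapter's lightpanda-binary parameter.
;; The guard (second argument) rejects values that are not strings or paths.
(define tool-binary
  (make-parameter "lightpanda"
                  (lambda (v)
                    (unless (path-string? v)
                      (raise-argument-error 'tool-binary "path-string?" v))
                    v)))

;; Hypothetical helper that reads the parameter each time it is called.
(define (describe-command url)
  (format "~a fetch ~a" (tool-binary) url))

(define before (describe-command "https://example.com/"))

;; Inside parameterize, every call in the dynamic extent sees the override.
(define during
  (parameterize ([tool-binary "/usr/local/bin/lightpanda"])
    (describe-command "https://example.com/")))

;; Once the parameterize form exits, the previous value is visible again.
(define after (describe-command "https://example.com/"))

(printf "~a\n~a\n~a\n" before during after)
```

The first and third lines printed are identical, which shows that the override did not leak out of the parameterize form.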
Running External Programs with subprocess
Racket provides subprocess (part of racket/base) for launching external processes; racket/system offers higher-level wrappers such as system* on top of it. Our internal helper locates the executable on the system path and runs it directly with an argument list, so no shell is involved and arguments need no quoting. It uses dynamic-wind to close the ports when it is done:
(require racket/port) ; for port->string

;; Run a command directly; return stdout as a string, or #f on error.
(define (run-command exe-path args)
  (let ([resolved-path (find-executable-path exe-path)])
    (if (not resolved-path)
        (begin
          (eprintf "Executable not found: ~a\n" exe-path)
          #f)
        (with-handlers ([exn:fail?
                         (lambda (e)
                           (eprintf "Command execution error: ~a\n" (exn-message e))
                           #f)])
          (define-values (proc stdout stdin stderr)
            (apply subprocess #f #f #f resolved-path args))
          (dynamic-wind
            ;; The child receives no input, so close its stdin right away.
            (lambda () (close-output-port stdin))
            (lambda ()
              (let ([output (port->string stdout)])
                (subprocess-wait proc)
                (if (zero? (subprocess-status proc))
                    output
                    (begin
                      (eprintf "Command exited with status ~a\n"
                               (subprocess-status proc))
                      #f))))
            ;; Always release the remaining pipe ends.
            (lambda ()
              (close-input-port stdout)
              (close-input-port stderr)))))))
Key Racket features used:
- subprocess: Creates a new process and returns four values: the process handle plus ports for the child's stdout, stdin, and stderr. Passing #f for a port argument asks Racket to create a pipe.
- apply: Calls subprocess with an argument list built at run time.
- find-executable-path: Resolves a program name to a full path by searching the directories in the PATH environment variable; it returns #f if nothing is found.
- define-values: Destructures the multiple return values from subprocess in a single binding.
- port->string (from racket/port): Reads the entire stdout stream into a Racket string.
- with-handlers: Gracefully catches exceptions, reporting error details to stderr and returning #f.
- dynamic-wind: Guarantees resource cleanup: stdin is closed on entry, and stdout and stderr are closed on exit, regardless of how the evaluation finishes.
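One caveat with the helper above is that the stderr pipe is never read. If a child process wrote more to stderr than the operating system's pipe buffer holds, it could stall while the parent is still waiting on stdout. The following is a sketch of one possible variation, not the chapter's implementation: the hypothetical run-command/both drains stderr in a separate thread and returns both streams plus the exit status. The demonstration assumes a POSIX sh is available on the PATH.

```racket
#lang racket/base
(require racket/port)

;; Hypothetical variant of run-command (a sketch, not the library's code).
;; Returns three values: stdout text, stderr text, and the exit status.
(define (run-command/both exe-name args)
  (define exe (find-executable-path exe-name))
  (unless exe
    (error 'run-command/both "executable not found: ~a" exe-name))
  (define-values (proc out in err)
    (apply subprocess #f #f #f exe args))
  (close-output-port in)                 ; the child gets no input
  (define err-text #f)
  ;; Drain stderr concurrently so the child never blocks on a full pipe.
  (define err-thread
    (thread (lambda () (set! err-text (port->string err)))))
  (dynamic-wind
    void
    (lambda ()
      (define out-text (port->string out))
      (thread-wait err-thread)
      (subprocess-wait proc)
      (values out-text err-text (subprocess-status proc)))
    (lambda ()
      (close-input-port out)
      (close-input-port err))))

;; Demonstration with a shell one-liner (assumes a POSIX sh on the PATH).
(define-values (demo-out demo-err demo-status)
  (run-command/both "sh" '("-c" "echo out; echo err 1>&2; exit 3")))

(printf "stdout: ~s\nstderr: ~s\nstatus: ~a\n" demo-out demo-err demo-status)
```

Returning the status instead of collapsing failures to #f is a design choice for this sketch; it lets the caller decide whether a non-zero exit with usable output should count as an error.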
HTML Parsing: Extracting Links
For link extraction, instead of brittle regular expressions, we parse the HTML and query the resulting tree. The html->xexp function comes from the third-party html-parsing package, and se-path*/list comes from xml/path:
(require html-parsing xml/path)

;; Return a list of href attribute values found in an HTML string.
;; The path '(href) matches href on any element, so <link> targets are
;; returned along with <a> targets.
(define (extract-links html)
  (with-handlers ([exn:fail?
                   (lambda (e)
                     (eprintf "Error parsing HTML for links: ~a\n" (exn-message e))
                     '())])
    (define xexp (html->xexp html))
    (se-path*/list '(href) xexp)))
Walkthrough:
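1. html->xexp parses the HTML string into an SXML-style S-expression, in which an element's attributes appear as a nested (@ (name "value") ...) list.
2. se-path*/list walks that tree and collects the contents of every node tagged href. Because it matches any href, stylesheet and other <link> targets are returned alongside <a> targets.
3. with-handlers catches any exception raised while parsing or querying and returns an empty list.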
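To see what the '(href) query returns without installing html-parsing or running a browser, the sketch below applies se-path*/list to a hand-written tree. The tree is an assumption about the SXML-style shape a small page would parse to, and anchor-hrefs is a hypothetical alternative (not part of the chapter's library) for the case where only <a> targets are wanted.

```racket
#lang racket/base
(require xml/path)

;; Hand-written stand-in for the tree a small page might parse to.
;; (Assumption: attributes appear as (@ (name "value") ...) children.)
(define page
  '(*TOP*
    (html
     (head (link (@ (rel "stylesheet") (href "/index.css"))))
     (body (a (@ (href "https://example.com/")) "Example")
           (p "no link here")))))

;; The chapter's query: every href value, whatever element carries it.
(define all-hrefs (se-path*/list '(href) page))

;; Hypothetical alternative: walk the tree and keep href only on <a> elements.
(define (anchor-hrefs node)
  (cond
    [(not (pair? node)) '()]
    [else
     (define own
       (if (eq? (car node) 'a)
           (for*/list ([child (in-list (cdr node))]
                       #:when (and (pair? child) (eq? (car child) '@))
                       [attr (in-list (cdr child))]
                       #:when (and (list? attr)
                                   (= (length attr) 2)
                                   (eq? (car attr) 'href)
                                   (string? (cadr attr))))
             (cadr attr))
           '()))
     ;; Collect this node's own links, then those of all its children.
     (append own (apply append (map anchor-hrefs (cdr node))))]))

(printf "all hrefs:    ~s\n" all-hrefs)
(printf "anchor hrefs: ~s\n" (anchor-hrefs page))
```

The first list includes the stylesheet path while the second contains only the anchor target, which is the difference to keep in mind when treating the result as a list of navigable links.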
The following diagram shows the high-level architecture of the Lightpanda browser client developed in this chapter:
The Main API Function
The central function constructs the command line arguments list and invokes the helper:
;; Fetch URL using `lightpanda fetch`, returning the JS-rendered content
;; as a string (or #f on failure).
;; #:dump controls what is written to stdout; valid values are:
;;   "html"               - full rendered HTML (default)
;;   "markdown"           - page as Markdown
;;   "semantic_tree"      - semantic tree
;;   "semantic_tree_text" - semantic tree as plain text
;; No server process is required; lightpanda is invoked directly.
;;
;;   (fetch-url "https://markwatson.com/")
;;   (fetch-url "https://markwatson.com/" #:dump "markdown")
(define (fetch-url url
                   #:log-level [log-level "warn"]
                   #:obey-robots [obey-robots #f]
                   #:dump [dump "html"])
  (define args
    (append
     (list "fetch")
     (if obey-robots '("--obey_robots") '())
     (list "--dump" dump
           "--log_level" log-level
           "--log_format" "pretty"
           url)))
  (run-command (lightpanda-binary) args))
Key techniques:
- #:keyword [param default]: Racket's keyword arguments with default values.
- (if obey-robots '("--obey_robots") '()): Conditional list inclusion; the empty list contributes nothing when appended.
- append: Combines several lists into a single flat argument list.
- (lightpanda-binary): Calls the parameter with no arguments to read its current value.
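The argument-building step can be exercised on its own, without Lightpanda installed. The sketch below pulls it into a hypothetical build-fetch-args function (not part of the chapter's library) and adds an assumed validation step that checks #:dump against the four values listed in the fetch-url documentation.

```racket
#lang racket/base

;; Output modes listed in the chapter's fetch-url documentation.
(define valid-dump-modes
  '("html" "markdown" "semantic_tree" "semantic_tree_text"))

;; Hypothetical helper: build the argument list for `lightpanda fetch`
;; without running anything.
(define (build-fetch-args url
                          #:log-level [log-level "warn"]
                          #:obey-robots [obey-robots #f]
                          #:dump [dump "html"])
  ;; Assumed extra: fail early on an unknown dump mode.
  (unless (member dump valid-dump-modes)
    (raise-argument-error
     'build-fetch-args
     "one of \"html\", \"markdown\", \"semantic_tree\", \"semantic_tree_text\""
     dump))
  (append
   (list "fetch")
   (if obey-robots '("--obey_robots") '()) ; empty list vanishes under append
   (list "--dump" dump
         "--log_level" log-level
         "--log_format" "pretty"
         url)))

(define default-args (build-fetch-args "https://example.com/"))
(define markdown-args
  (build-fetch-args "https://example.com/" #:dump "markdown" #:obey-robots #t))

(printf "~s\n~s\n" default-args markdown-args)
```

Keeping argument construction separate from process execution makes the flag logic easy to check in isolation.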
Helper Functions
Higher-level helpers make common operations easy:
;; Fetch URL with lightpanda and return a list of href link strings.
;;
;;   (fetch-and-extract-links "https://markwatson.com/")
(define (fetch-and-extract-links url)
  (define html (fetch-url url))
  (if html
      (extract-links html)
      (begin
        (eprintf "Failed to fetch ~a\n" url)
        '())))

;; Fetch URL, print a snippet of HTML and the discovered links.
;;
;;   (demo-fetch "https://markwatson.com/")
(define (demo-fetch url)
  (printf "\n=== Fetch demo: ~a ===\n" url)
  (define html (fetch-url url))
  (if html
      (let ([links (extract-links html)])
        (printf "Received ~a bytes of HTML.\n" (string-length html))
        (printf "First 500 chars:\n~a\n\n"
                (substring html 0 (min 500 (string-length html))))
        (printf "Found ~a link(s):\n" (length links))
        (for ([l (in-list links)])
          (printf "  ~a\n" l)))
      (printf "No HTML returned.\n")))
Note that extract-links is not listed in provide — Racket’s module system makes it private by default. This is cleaner than the Common Lisp %-prefix convention: in Racket, privacy is enforced by the language rather than by naming convention.
Usage Examples
After installing the package with raco pkg install --scope user (run from the source directory), or after loading the file directly, the API can be used like this:
(require lightpanda)

;; Basic HTML fetch
(fetch-url "https://markwatson.com/")

;; Get Markdown output (good for LLM input)
(fetch-url "https://markwatson.com/" #:dump "markdown")

;; Respect robots.txt
(fetch-url "https://markwatson.com/" #:obey-robots #t)

;; Extract all links
(fetch-and-extract-links "https://markwatson.com/")

;; Interactive demo
(demo-fetch "https://markwatson.com/")
You can also run the demo directly from the command line:
$ racket lightpanda.rkt

=== Fetch demo: https://markwatson.com/ ===
Received 12847 bytes of HTML.
First 500 chars:
<!DOCTYPE html><html lang="en-us"> <head> ...

Found 14 link(s):
  /index.css
  https://mark-watson.blogspot.com
  ...
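The output above mixes a site-relative link (/index.css) with an absolute one. If the links are to be fetched in turn, they usually need to be resolved against the page URL first. Here is a minimal sketch using net/url from the Racket distribution; absolutize-links is a hypothetical helper and is not part of the chapter's library.

```racket
#lang racket/base
(require net/url racket/list)

;; Hypothetical helper: resolve each link against the page URL,
;; then drop duplicates while keeping first-seen order.
(define (absolutize-links base-url links)
  (define base (string->url base-url))
  (remove-duplicates
   (for/list ([link (in-list links)])
     (url->string (combine-url/relative base link)))))

;; Links taken from the sample output above; the repeated entry is
;; included only to exercise duplicate removal.
(define resolved
  (absolutize-links "https://markwatson.com/"
                    '("/index.css"
                      "https://mark-watson.blogspot.com"
                      "/index.css")))

(for ([l (in-list resolved)])
  (printf "~a\n" l))
```

With this in place, the result of fetch-and-extract-links could be passed through absolutize-links before any follow-up fetches.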
Key Racket Takeaways
- subprocess: Portable subprocess execution; returns four values (process, stdout, stdin, stderr)
- make-parameter: Thread-safe dynamic configuration with parameterize for scoped overrides
- provide / module system: Privacy is enforced by what you export, not by naming conventions
- Keyword arguments: #:key [param default] for flexible, self-documenting APIs
- Named let: Tail-recursive loops without mutation: (let loop ([acc '()] ...) ...)
- with-handlers: Structured exception handling with predicate-based dispatch
This pattern — shelling out to a specialized tool and processing its output — is a powerful technique. You can wrap any command-line tool this way: databases, image processors, compilers, or your own scripts. The result is a Racket API that hides the implementation details while providing access to the tool’s capabilities.