Web Clients in Prolog

SWI-Prolog includes comprehensive HTTP client libraries that make it straightforward to interact with REST APIs, parse JSON, and scrape web content, all from within Prolog.

Figure 19. Architecture diagram for the HTTP Client example

HTTP GET and POST Requests

SWI-Prolog ships with library(http/http_client), a full HTTP/1.1 client that handles GET, POST, PUT, and DELETE requests. The companion library library(http/http_json) adds automatic JSON serialisation and deserialisation, so a single http_get/3 call can fetch a URL and return its JSON body as a Prolog dict.

The key predicates are:

http_get(+URL, -Reply, +Options) - Sends a GET request. The json_object(dict) option tells the library to parse the response body as a SWI-Prolog dict rather than the older json/1 term format.
http_post(+URL, +Data, -Reply, +Options) - Sends a POST request. The Data argument can be atom(Body), json(Term), or other content types. Custom request headers (such as Content-Type or Authorization) are passed via the options list.

Error handling is straightforward: if the server returns a non-2xx status code, http_get and http_post throw an http_error exception, which you can catch with catch/3.

The http_client project wraps SWI-Prolog’s HTTP libraries. Here is the file http_client/prolog/rest_client.pl:

 1 %% rest_client.pl - HTTP REST client utilities
 2 :- module(rest_client, [
 3     http_get_json/2,
 4     http_post_json/3
 5 ]).
 6 
 7 :- use_module(library(http/http_client)).
 8 :- use_module(library(http/http_json)).
 9 :- use_module(library(json)).
10 
11 %% http_get_json(+URL, -JsonTerm)
12 http_get_json(URL, JsonTerm) :-
13     http_get(URL, JsonTerm, [json_object(dict)]).
14 
15 %% http_post_json(+URL, +JsonPayload, -Response)
16 http_post_json(URL, Payload, Response) :-
17     atom_json_dict(PayloadAtom, Payload, []),
18     http_post(URL, atom(PayloadAtom), Response,
19               [request_header('Content-Type'='application/json'),
20                json_object(dict)]).

The http_get_json/2 predicate is a thin wrapper that adds the json_object(dict) option. The http_post_json/3 predicate serialises a Prolog dict into a JSON string using atom_json_dict/3, then posts it with the appropriate Content-Type header. The response is automatically parsed back into a dict.

You can test this in the REPL:

1 ?- http_get_json('https://jsonplaceholder.typicode.com/todos/1', R).
2 R = _{ completed:false, id:1, title:"delectus aut autem", userId:1 }.

Working with JSON

Most modern web APIs return JSON. SWI-Prolog provides two representations for JSON data:

Dicts (the modern approach) - SWI-Prolog dicts are key-value stores accessed with dot notation (e.g., Dict.name). They map naturally to JSON objects and are the recommended format.
json/1 terms (the legacy approach) - The older json([key=value, ...]) compound term format. Still supported but less ergonomic.

The atom_json_dict/3 predicate is the core conversion tool. It converts between a JSON-formatted atom (or string) and a Prolog dict in both directions:

1 %% Parsing: JSON string -> Prolog dict
2 ?- atom_json_dict('{"name":"Alice","age":30}', Dict, []).
3 Dict = _{ age:30, name:"Alice" }.
4 
5 %% Generating: Prolog dict -> JSON string
6 ?- atom_json_dict(Json, _{name:"Bob", score:95}, []).
7 Json = '{"name":"Bob","score":95}'.

To traverse nested JSON structures, use chained dot notation: Dict.address.city accesses the city field inside a nested address object. For lists, use standard Prolog list operations: a JSON array becomes a Prolog list.

The http_client project also includes JSON utilities. Here is the file http_client/prolog/json_utils.pl:

 1 %% json_utils.pl - JSON parsing and generation utilities
 2 :- module(json_utils, [
 3     parse_json_string/2,
 4     json_to_prolog/2
 5 ]).
 6 
 7 :- use_module(library(json)).
 8 
 9 %% parse_json_string(+JsonString, -PrologTerm)
10 parse_json_string(JsonString, Term) :-
11     atom_json_dict(JsonString, Term, []).
12 
13 %% json_to_prolog(+JsonDict, -PrologFacts)
14 %% Convert a JSON dict to a list of key-value pairs
15 json_to_prolog(Dict, Pairs) :-
16     is_dict(Dict),
17     dict_pairs(Dict, _, Pairs).

The json_to_prolog/2 predicate uses dict_pairs/3 to decompose a dict into a list of Key-Value pairs. This is useful when you need to iterate over all fields in a JSON object without knowing the keys in advance.

Web Scraping

SWI-Prolog can also fetch and parse HTML pages directly. The workflow combines three libraries:

library(http/http_client) - Fetches the raw HTML content from a URL.
library(sgml) - Parses the HTML string into a DOM tree (a nested Prolog term representing the document structure).
library(xpath) - Queries the DOM tree using XPath expressions to extract specific elements.

The load_html/3 predicate from library(sgml) is tolerant of malformed HTML, making it suitable for scraping real-world web pages. Once you have a DOM tree, xpath/3 lets you select elements declaratively, for example, xpath(DOM, //a(@href), Href) extracts the href attribute from every <a> tag in the document.

The web_scraper project implements a simple HTML scraper. Here is the file web_scraper/prolog/scraper.pl:

 1 %% scraper.pl - Web scraping using HTTP client and SGML/HTML parser
 2 :- module(scraper, [
 3     fetch_page/2,
 4     extract_links/2,
 5     extract_text/2
 6 ]).
 7 
 8 :- use_module(library(http/http_client)).
 9 :- use_module(library(sgml)).
10 :- use_module(library(xpath)).
11 
12 %% fetch_page(+URL, -DOM) - Fetch and parse an HTML page
13 fetch_page(URL, DOM) :-
14     http_get(URL, Content, []),
15     setup_call_cleanup(
16         new_memory_file(MemFile),
17         (   setup_call_cleanup(
18                 open_memory_file(MemFile, write, Out),
19                 write(Out, Content),
20                 close(Out)
21             ),
22             setup_call_cleanup(
23                 open_memory_file(MemFile, read, In),
24                 load_html(In, DOM, []),
25                 close(In)
26             )
27         ),
28         free_memory_file(MemFile)
29     ).
30 
31 %% extract_links(+DOM, -Links) - Extract all href links from HTML
32 extract_links(DOM, Links) :-
33     findall(Href, xpath(DOM, //a(@href), Href), Links).
34 
35 %% extract_text(+DOM, -Text) - Extract all text content
36 extract_text(DOM, Text) :-
37     findall(T, xpath(DOM, //text, T), Texts),
38     atomic_list_concat(Texts, ' ', Text).

The fetch_page/2 predicate uses SWI-Prolog’s memory file API to bridge between the HTTP response (a Prolog atom) and the stream-based load_html/3 parser. The nested setup_call_cleanup/3 calls ensure that all streams and the memory file are properly cleaned up, even if an error occurs. This is idiomatic SWI-Prolog resource management.

The extract_links/2 and extract_text/2 predicates both use findall/3 with xpath/3 to collect results. XPath expressions like //a(@href) select all <a> elements and extract their href attribute, while //text selects all text nodes in the document.

Figure 20. Architecture diagram for the Web Scraper example

Practical Applications

These HTTP client and scraping building blocks are used throughout the book:

LLM API Integration - The rest_client module is the foundation for calling the Google Gemini and Ollama APIs (see the LLM Integration chapter). A single http_post_json/3 call sends a prompt and receives the model’s response.
SPARQL Queries - The Semantic Web chapter uses HTTP GET requests to query remote SPARQL endpoints like DBpedia and Wikidata, parsing the JSON results into Prolog terms for local reasoning.
Knowledge Graph Enrichment - Web scraping can extract structured data from HTML pages and assert it into a local knowledge graph. For example, scraping a product catalogue and converting the extracted attributes into entity/3 facts.
Data Pipeline Preprocessing - Fetching CSV or JSON datasets from public APIs (such as government open data portals) and transforming them into Prolog facts for analysis with the anomaly detection or probabilistic reasoning modules.

Optional Practice Problems

Image Link Scraper: In the web_scraper project, extend scraper.pl to parse and extract the src attribute of all <img> tags on a webpage, handling relative paths correctly.
HTTP Retry Decorator: In http_client, implement a wrapper predicate http_get_retry/3 that automatically retries an HTTP request up to three times with exponential backoff if the server returns a temporary network code.

Up next

Client-Side Prolog with WebAssembly