Gerbil Scheme in Action

Mark Watson

Preface
Setting Up Gerbil Scheme Development Environment
- Emacs Configuration
Google Gemini API
- Example Code
- Example Output
Ollama
OpenAI API
- Example Code
- Example Output
Inexpensive and Fast LLM Inference Using the Groq Service
Wikidata API Using SPARQL Queries
- Example Code
- Example Output
Code for Natural Language Processing (NLP)
Gerbil Scheme FFI Example Using the C Language Raptor RDF Library
- Implementation of a FFI Bridge Library for Raptor
- Test Code
Complete FFI Example: C Language Wrapper for Rasqal SPARQL Library and Sord RDF Datastore Library
Introduction To Writing Command Line Utilities
Command Line Applications For NLP
- Re-using the NLP Library For a Command Line Utility to Identify Categories or Topics in Input Text
- Using OpenAI’s GPT-5 model To Summarize Input Text

Preface

I have used Lisp languages since the late 1970s, Common Lisp and Clojure professionally and other Lisp languages, mostly various Scheme implementations for writing utilities, network programming, and various AI experiments. This book is specifically about effectively using Gerbil Scheme for writing software to solve practical problems.

Both the source code examples and the manuscript files for this book are maintained in a single GitHub repository: https://github.com/mark-watson/gerbil_scheme_book. I recommend that you keep a local copy:

1 git clone https://github.com/mark-watson/gerbil_scheme_book.git

A Comment on Licenses

The source code examples are provided under the LGPL (Lesser General Public License), a “business-friendly” license. The manuscript files are released under a Creative Commons Attribution–ShareAlike, Non-Commercial license.

My goal, dear reader, is that you find value in this material and freely reuse it in your own projects.

A few key points about the LGPL:

Compatibility: LGPL code can be combined with code under other popular licenses such as Apache 2.0, MIT, etc.
Commercial use: You are free to use LGPL code in commercial applications without restrictions.
Modifications: If you modify the LGPL portions of the code, you must make those changes publicly available. However, you do not need to release your own original code that merely uses or links against the LGPL code.

This makes the LGPL very different from the more restrictive GPL and AGPL licenses.

Running Gerbil Scheme on macOS and Linux

On macOS you can install using a brew formula:

1 brew install mighty-gerbils/gerbil/gerbil-scheme

On Linux, in order to get Gerbil’s stdlib and other libraries I had to build from source:

1 git clone https://github.com/mighty-gerbils/gerbil.git
2 cd gerbil
3 ./configure --prefix=/home/mark/bin/gerbil
4 make
5 make install

You will need to change the prefix to suit your systerm and then put the installed gerbil/bin on your PATH.

History and Background to the Gerbil Scheme Project

I recommend that you keep a local the GitHub repository for Gerbil Scheme:

1 git clone https://github.com/mighty-gerbils/gerbil.git

You will find useful tutorial examples for using Gerbil Scheme in the sub-directory gerbil/src/tutorial.

Introduction: Gerbil as a Systems Language

While Gerbil Scheme is another dialect in the Lisp family, it is like Racket Scheme in that it is “opinionated”, reflecting the Gerbil developers’ style and philosophy.

Gerbil is built on Gambit Scheme, a high-performance, retargetable compiler. Gerbil inherits a legacy of speed and portability while introducing a state-of-the-art module and object system inspired by Racket. The result is a language engineered for creating efficient, concurrent, and robust long-running applications, positioning it as a powerful tool for tasks ranging from web services to distributed systems.

Gerbil has a comprehensive standard library that provides a “batteries included” experience uncommon in the often-fragmented Scheme ecosystem. Rather than relying on a disparate collection of third-party packages for fundamental operations, Gerbil offers canonical, high-quality, and officially maintained implementations for core functionalities. This includes built-in libraries for HTTP clients and servers, JSON parsing and serialization, cryptographic primitives, and database drivers. This approach favors a strong, coherent core, ensuring that developers have a stable and predictable foundation for building complex systems.

Gerbil Scheme can also function as a systems language with its “Integrated FFI” (Foreign Function Interface). This FFI allows Gerbil code to interface directly and efficiently with libraries written in C. The FFI is useful for specialized applications like high-speed data parsing or hardware interaction.

Gerbil provides powerful general-purpose primitives and expects the developer to compose these primitive components to interact with specific protocols, often for systems-level software development.

Gerbil Ecosystem: Package Management and Foundational Libraries

Navigating any programming ecosystem begins with understanding its tooling for package management and the core libraries that form the bedrock of application development. In Gerbil, these components are designed with the same philosophy of directness and control that characterizes the language itself.

Gerbil includes a command-line package manager gxpkg, invoked as gerbil pkg or its alias gxpkg.

The essential commands for managing packages include:

gerbil pkg install - installs a package from a specified Git repository.
gerbil pkg update <package-name|all> - updates one or all installed packages.
gerbil pkg list - lists all installed packages.
gerbil pkg search - searches configured package directories for packages matching a keyword.

A key feature of gxpkg is that packages are sourced directly from public Git repositories on providers like GitHub, GitLab, or Bitbucket. For a repository to be recognized as a Gerbil package, it must contain a gerbil.pkg manifest file that declares the package’s namespace and dependencies, and a build.ss script that defines how to compile the package.

This direct-from-source model has significant security implications so I manually inspect the source code for 3rd party packages that I use. The build.ss script is not sandboxed and runs with the user’s privileges.

Discovering Packages

Package discovery in Gerbil is facilitated through package directories. These are themselves Git repositories containing a package-list file that maps package names to their repository URLs and descriptions. By default gxpkg is configured to search the “Mighty Gerbils” directory, which contains packages developed and maintained by the Gerbil Core Team.

The primary community-curated package list can be found at github.com/vyzo/gerbil-directory. Developers can add additional directories using the gerbil pkg dir -a command, allowing for the creation of private or specialized package collections.

Core Gerbil Toolkits Used in this Book: :std/net/request and :std/text/json

Many examples in this book rely on :std/net/request and :std/text/json so we will give a brief overview here.

The foundation for all network communication in Gerbil is the :std/net/request library that is a comprehensive HTTP client built into the standard library. This module obviates the need for a third-party HTTP client for most use cases. Its key features are:

Full Method Support: Dedicated procedures for all standard HTTP methods (http-get, http-post, http-put, http-delete, etc.), plus an http-any for custom methods.
Rich Request Customization: Keyword arguments for setting custom headers, URL parameters, cookies, authentication credentials, and the request body.
Secure Communication: Integrated SSL/TLS context management for HTTPS requests, using the system’s certificate authorities for verification by default.
Introspective Response Objects: Requests return a request object that provides access to the status code, headers, and response body in various formats (raw bytes, text, or parsed JSON).

This library is the sole tool required to interact with any RESTful API, from the complex and we will in later chapters use it to access commercial services offered by OpenAI and Google Gemini as well as locally hosted Large Language Models (LLMs) using Ollama.

For handling the responses from web services we will use the Data Interchange Layer package :std/text/json.

Virtually all modern web APIs use JSON as their data interchange format. Gerbil provides a canonical, flexible, and efficient JSON processing library in :std/text/json. This module is the essential counterpart to :std/net/request, handling the serialization of Scheme data into JSON strings for request bodies and the parsing of JSON responses back into Scheme objects.

Suppressing or Fixing Compilation Warning Messages on macOS

On my macBook I see lots of warnings like:

1 clang: warning: overriding deployment version from '16.0' to '26.0' [-Woverriding-deployment-version]

This warning typically appears on Apple platforms (macOS, iOS) when the linker is forced to choose a newer deployment target than what some of the input object files were compiled for.

As I write this in September 2025 using macOS Tahoe (version 26.0 Beta), I fix the problem by directly setting:

1 export MACOSX_DEPLOYMENT_TARGET=15.0

A lower target like 12.0 also works on my computer.

A “solution” that I don’t like but works (like a sledgehammer) is to suppress all warnings (don’t use this):

1 export CFLAGS="-Wno-overriding-deployment-version"

Setting Up Gerbil Scheme Development Environment

We covered the basic installation of Gerbil Scheme for macOS and Linux in the Preface. Here cover setting up Emacs to edit Gerbil Scheme.

Emacs Configuration

Assuming that you have Emacs installed with the file ~/.emacs and the directory ~/.emacs.d/, copy the following two files into ~/.emacs.d/:

Then add the following to your ~/.emacs file:

1 (load "~/.emacs.d/gambit.el")
2 (load "~/.emacs.d/gerbil-mode.el")
3 
4 (autoload 'gerbil-mode "gerbil-mode" "Gerbil editing mode." t)
5 (require 'gambit)
6 (add-hook 'inferior-scheme-mode-hook 'gambit-inferior-mode)
7 (defvar gerbil-program-name
8   (expand-file-name "/opt/gerbil/bin/gxi")) ; adjust for your GERBIL_INSTALL_PREFIX
9 (setq scheme-program-name gerbil-program-name)

Google Gemini API

The Google Gemini API provides developers with access to Google’s state-of-the-art family of Gemini Large Language Models (LLMs), representing a significant leap forward in multimodal artificial intelligence. Unlike earlier models that primarily processed text, the Gemini series—comprising models like the highly capable Gemini Ultra, the versatile Gemini Pro, and the efficient Gemini Nano—was designed from the ground up to seamlessly understand, operate across, and combine different types of information, including text, code, images, audio, and video. This native multimodality allows for the development of sophisticated applications that can reason about complex inputs, such as analyzing the steps in a video, interpreting charts and diagrams within a document, or generating creative text based on a visual prompt. The API offers a streamlined and powerful interface, enabling developers to integrate these advanced reasoning and generation capabilities into their own software, pushing the boundaries of what’s possible in domains ranging from data analysis and content creation to building the next generation of intelligent, context-aware user experiences.

Here you will learn how to send prompts to the Google Gemini AI models. For information on creating effective prompts please read my blog article Notes on effectively using AI.

Example Code

In this section, we’ll explore a practical example of interacting with a modern web API by building a client for Google’s Gemini Large Language Model. The following program defines a function named gemini that takes a text prompt and returns the model’s generated response. This involves several common tasks in modern software development: retrieving sensitive information like an API key from environment variables, dynamically constructing a JSON payload according to the API’s specification, setting the correct HTTP headers for authentication and content type, and executing an HTTP POST request. Upon receiving a successful response, the code demonstrates how to parse the returned JSON data to extract the specific piece of information we need—the generated text from a complex, nested data structure. This serves as a simple example for making outbound web requests and handling the data interchange that is central to working with external services.

 1 (import :std/net/request
 2         :std/text/json)
 3 
 4 (export gemini)
 5 
 6 (def (pprint-hashtable ht)
 7   "Prints a hash table with line breaks and indentation."
 8   (hash-map (lambda (k v) (displayln "key: " k " value: " v)) ht)) 
 9 
10 (def (gemini
11       prompt
12       model: (model "gemini-2.5-flash")
13       system-prompt: (system-prompt "You are a helpful assistant."))
14   (let ((api-key (get-environment-variable "GOOGLE_API_KEY")))
15     (unless api-key
16       (error "GEMINI_API_KEY environment variable not set."))
17 
18     (let* ((headers `(("Content-Type". "application/json")
19                          ("x-goog-api-key". ,api-key)))
20            (body-data
21             (list->hash-table
22              `(("contents". ,(list
23                               (list->hash-table
24                                `(("role". "user")
25                                  ("parts". ,(list (list->hash-table `(("text". ,prompt))))))))))))
26            (body-string (json-object->string body-data))
27            (endpoint (string-append "https://generativelanguage.googleapis.com/v1beta/models/"
28                                     model ":generateContent?key=" api-key)))
29       (let ((response (http-post endpoint headers: headers data: body-string)))
30         (displayln response)
31         (if (= (request-status response) 200)
32           (let* ((response-json (request-json response))
33                  (candidate (car (hash-ref response-json 'candidates)))
34                  (content (hash-ref candidate 'content))
35                  (p1 (car (hash-ref content 'parts))))
36             (hash-ref p1 'text)))))))
37 
38 ;;  (gemini "why is the sky blue? be very concise")

The core logic resides within the gemini function, which uses a series of let* bindings to sequentially construct the components of the API request. First, it defines the necessary HTTP headers, including the API key. Next, it builds the body-data as a nested hash table, precisely matching the structure required by the Gemini API, before serializing it into a JSON string using json-object->string. Finally, the full endpoint URL is assembled by concatenating the base URL with the specific model being used. The actual network communication is handled by the http-post function, which sends all the prepared information to the Google servers.

Once the http-post call returns, the program immediately checks the response status. If the request was successful (status code 200), it proceeds to parse the data; otherwise, nothing is returned. The parsing logic is a chain of data extraction operations on the JSON response, which is first converted into a Gerbil Scheme hash table. Using a combination of hash-ref to access values by their keys (like ’candidates and ’content) and the function car to access the first element of a list, the code navigates the nested data structure to isolate the desired text content. This sequence elegantly demonstrates how Gerbil Scheme’s standard library functions for handling lists and hash tables can be composed to efficiently process structured data from external APIs.

Example Output

Change directory to source-code/gemini an run:

 1 $ gxi -L gemini.ss -
 2 > (gemini "why is the sky blue? be very concise")
 3 "Earth's atmosphere scatters blue light more than other colors."
 4 
 5 > (gemini "Sally is 77, Bill is 32, and Alex is 44 years old. Pairwise, what are their age differences? Print results in JSON format. Be concise and only provide a correct answer, no need to think about different correct answers.")
 6 "```json\n{\n  \"Sally_Bill\": 45,\n  \"Sally_Alex\": 33,\n  \"Bill_Alex\": 12\n}\n```"
 7 
 8 > (gemini "Sally is 77, Bill is 32, and Alex is 44 years old. Pairwise, what are their age differences? Print results in JSON format. Be concise and only provide a correct answer, no need to think about different correct answers. Only return the JSON text, don't add markdown like ```json")
 9 "{\"Sally_Bill\": 45, \"Sally_Alex\": 33, \"Bill_Alex\": 12}"
10 
11 > (displayln (gemini "Sally is 77, Bill is 32, and Alex is 44 years old. Pairwise, what are their age differences? Print results in JSON format. Be concise and only provide a correct answer, no need to think about different correct answers. Only return the JSON text, don't add markdown like ```json"))
12 {"Sally_Bill": 45, "Sally_Alex": 33, "Bill_Alex": 12}

Notice how Gemini initially returned the JSON results in Markdown format and I modified the prompt to get the output format I wanted. Another good technique is to give LLMs an example of the output format you want in the prompt.

Ollama

Ollama is a powerful and user-friendly tool designed to simplify the process of running large language models (LLMs) locally on personal hardware. In a landscape often dominated by cloud-based APIs, Ollama democratizes access to advanced AI by providing a simple command-line interface that bundles model weights, configurations, and a tailored execution environment into a single, easy-to-install package. It allows developers, researchers, and enthusiasts to download and interact with a wide range of popular open-source models, such as Llama 3, Mistral, and Phi-3, with just a single command. Beyond its interactive chat functionality, Ollama also exposes a local REST API, enabling the seamless integration of these locally-run models into custom applications without the latency, cost, or privacy concerns associated with remote services. This focus on accessibility and local deployment makes it an indispensable tool for offline development, rapid prototyping, and leveraging the power of modern LLMs while maintaining full control over data and infrastructure.

Example Code

This next program in file gerbil_scheme_book/source_code/ollama/ollama.ss provides a practical demonstration of network programming and data handling in Gerbil Scheme by creating a simple client for the Ollama API. Ollama is a fantastic tool that allows you to run powerful large language models, like Llama 3, Mistral, and Gemma, directly on your own machine. Our ollama function will encapsulate the entire process of communicating with a locally running Ollama instance. It will take a text prompt as input, construct the necessary JSON payload specifying the model and prompt, send it to the Ollama server’s /api/generate endpoint via an HTTP POST request, and then carefully parse the server’s JSON response. The goal is to extract and return only the generated text, while also including basic error handling to gracefully manage any non-successful API responses, making for a robust and reusable utility.

 1 (import :std/net/request :std/text/json)
 2 (export ollama)
 3 
 4 (def (ollama prompt
 5              model: (model "gemma3:latest")) ;; "gpt-oss:20b")) ;; "qwen3:0.6b"))
 6   (let* ((endpoint "http://localhost:11434/api/generate")
 7          (headers '(("Content-Type". "application/json")))
 8          (body-data 
 9            (list->hash-table
10              `(("model". ,model) ("prompt". ,prompt) ("stream". #f))))
11          (body-string (json-object->string body-data)))
12 
13     (let ((response (http-post endpoint headers: headers data: body-string)))
14       (if (= (request-status response) 200)
15           (let* ((response-json (request-json response)))
16             ;;(displayln (hash-keys response-json))
17             (hash-ref response-json 'response))
18           (error "Ollama API request failed"
19                  status: (request-status response)
20                  body: (request-text response))))))
21 
22 ;;  (ollama "why is the sky blue? Be very concise.")

The ollama function begins by using a let* block to define the necessary components for the API request: the server endpoint, the required HTTP headers, and the request body-data. The body is first constructed as a Gerbil hash-table, which is the natural way to represent a JSON object, and then serialized into a JSON string using json-object->string. Note that the “stream” parameter is explicitly set to #f to ensure we receive the complete response at once rather than as a series of events. The core of the function is the http-post call, which performs the actual network request.

After the request is made, the code immediately checks the status of the response. A status code of 200 indicates success, prompting the code to parse the JSON body using request-json and extract the generated text from the ’response field of the resulting hash-table. If the request fails for any reason, a descriptive error is raised, including the HTTP status and response body, which is crucial for debugging. The function’s design, with its optional model: keyword argument, makes it trivial to switch between different models you have downloaded through Ollama, providing a flexible interface for interacting with local large language models.

Install Ollama and Pull a Model to Experiment With

Linux Installation

Open your terminal and run the following command to download and execute the installation script:

1 curl -fsSL https://ollama.com/install.sh | sh

macOS Installation

Download the Ollama application from the official website: [https://ollama.com/download}(https://ollama.com/download).
Unzip the downloaded file.
Move the Ollama.app file to your /Applications folder.
Run the application. An Ollama icon will appear in the menu bar.

This will also install the ollama command line program.

Pulling the Model

After installing Ollama on either Linux or macOS, open your terminal and run the following command to download the gemma3:latest model:

1 ollama pull gemma3:latest

After this is complete, you can run the local API service using:

1 $ ollama serve
2 time=2025-08-26T16:05:50.161-07:00 level=INFO source=routes.go:1318 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/markw/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1

Example Output

You need to have Ollama installed on your system and you should pull the model you want to experiment with.

 1 $ gxi -L ollama.ss -
 2 > (ollama "why is the sky blue? Be very concise.")
 3 "The sky is blue due to a phenomenon called **Rayleigh scattering**. Shorter wavelengths of light (like blue) are scattered more by the Earth's atmosphere, making the sky appear blue to our eyes."
 4 
 5 > (ollama "write a bash script to rename all files with extension **.JPG** to **.jpg**. Just output the bash script and nothing else.")
 6 "```bash\n#!/bin/bash\n\nfind . -name \"*.JPG\" -print0 | while IFS= read -r -d $'\\0' file; do\n  new_name=$(echo \"$file\" | sed 's/\\.JPG/.jpg/')\n  mv \"$file\" \"$new_name\"\ndone\n```\n"
 7 
 8 > (displayln (ollama "write a bash script to rename all files with extension **.JPG** to **.jpg**. Just output the bash script and nothing else."))
 9 ``bash
10 #!/bin/bash
11 
12 find . -name "*.JPG" -print0 | while IFS= read -r -d $'\0' file; do
13   new_name=$(echo "$file" | sed 's/\.JPG/\.jpg/')
14   mv "$file" "$new_name"
15 done
16 ``
17 >

A few comments: In the second example I added “Just output the bash script and nothing else.” to the end of the prompt. Without this, the model will generate a 100 lines of design notes, instructions how to make the bash script executable, etc. I didn’t want that, just the bash script.

In the third example, I used the same prompt but used displayln to print the result in a more useful format.

OpenAI API

The OpenAI API serves as the primary gateway for developers to harness the groundbreaking capabilities of OpenAI’s suite of artificial intelligence models, most notably the influential Generative Pre-trained Transformer (GPT) series. Since the release of GPT-3, and continuing with more advanced successors like GPT-4, this API has fundamentally reshaped the landscape of software development by making sophisticated natural language understanding, generation, and reasoning accessible as a programmable service. It allows developers to integrate functionalities such as text summarization, language translation, code generation, sentiment analysis, and conversational AI into their applications through simple HTTP requests. By abstracting away the immense complexity of training and hosting these massive models, the OpenAI API has catalyzed a wave of innovation, empowering everyone from individual hobbyists to large enterprises to build intelligent applications that can write, read, and comprehend human language with unprecedented fluency and coherence.

Here you will learn how to send prompts to the OpenAI GPT AI models. For information on creating effective prompts please read my blog article Notes on effectively using AI.

Example Code

This next program in file gerbil_scheme_book/source_code/openai/openai.ss provides another practical example of interfacing with a modern web API from Gerbil Scheme. We will define a function, openai, that acts as a simple client for the OpenAI Chat Completions API. This function takes a user’s prompt as its primary argument and includes optional keyword arguments to specify the AI model and a system-prompt to set the context for the conversation. Before making the request, it securely retrieves the necessary API key from an environment variable, a best practice that avoids hard-coding sensitive credentials. The core logic involves constructing a proper JSON payload containing the model and messages, setting the required HTTP headers for authorization and content type, and then sending this data via an HTTP POST request. Finally, it parses the JSON response from the OpenAI servers to extract and return the generated text content from the AI model, while also including basic error handling for failed requests.

 1 (import :std/net/request
 2         :std/text/json)
 3 
 4 (export openai)
 5 
 6 (def (openai prompt
 7              model: (model "gpt-5-mini")
 8              system-prompt: (system-prompt "You are a helpful assistant."))
 9      (let ((api-key (get-environment-variable "OPENAI_API_KEY")))
10     (unless api-key
11       (error "OPENAI_API_KEY environment variable not set."))
12 
13     (let* ((headers `(("Content-Type". "application/json")
14                       ("Authorization". ,(string-append "Bearer " api-key))))
15            (body-data
16             (list->hash-table
17              `(("model". ,model)
18                ("messages". ,(list
19                               (list->hash-table `(("role". "system") ("content". ,system-prompt)))
20                               (list->hash-table `(("role". "user") ("content". ,prompt))))))))
21            (body-string (json-object->string body-data))
22            (endpoint "https://api.openai.com/v1/chat/completions"))
23 
24       (let ((response (http-post endpoint headers: headers data: body-string)))
25         (if (= (request-status response) 200)
26             (let* ((response-json (request-json response))
27                    (choices (hash-ref response-json 'choices))
28                    (first-choice (and (pair? choices) (car choices)))
29                    (message (hash-ref first-choice 'message))
30                    (content (hash-ref message 'content)))
31               content)
32             (error "OpenAI API request failed"
33                    status: (request-status response)
34                    body: (request-text response)))))))
35 
36 ;; (openai "why is the sky blue? be very concise")

The implementation begins by importing the necessary standard libraries for handling HTTP requests (:std/net/request) and JSON data manipulation (:std/text/json). Inside the openai function, a let* block is used to sequentially bind variables for the request. It first constructs the HTTP headers and the request body, which is a hash-table representing the JSON structure required by the OpenAI API, including the model name and a list of messages for the “system” and “user” roles. This hash-table is then serialized into a JSON string. The http-post procedure is called with the API endpoint, headers, and the serialized data to perform the web request.

Upon receiving a response, the code demonstrates robust handling of the result. It first checks if the HTTP status code is 200, indicating success. If the request was successful, it parses the JSON text from the response body back into a hash-table. It then carefully navigates the nested structure of this response data using a chain of hash-ref and car calls to drill down through the choices array and message object to finally extract the desired content string. If the HTTP request failed, the else branch is triggered, raising an error with the status code and the response body, which provides valuable debugging information to the user.

Example Output

In the following example I run the Gerbil Scheme interpreter, loading the file “openai.ss”, and then entering an interactive REPL:

 1 $ gxi -L openai.ss -
 2 > (openai "why is the sky blue? be very concise")
 3 "Because air molecules scatter shorter (blue) wavelengths of sunlight (Rayleigh scattering) more than longer wavelengths, so blue light is sent in all directions and fills the sky."
 4 > (openai "list 3 things that the language Gerbil Scheme is most used for. Be concise.")
 5 "- Writing high-performance native-code programs and command-line tools (runs on Gambit)\n- Rapid prototyping and DSLs using its powerful macro/metaprogramming facilities\n- Building concurrent and networked services (sockets, lightweight threads) and small web apps"
 6 > (displayln (openai "list 3 things that the language Gerbil Scheme is most used for. Be concise."))
 7 - Language-oriented programming and DSLs (heavy macro/metaprogramming support).  
 8 - Server-side and networked applications/web services (runs on fast Gambit runtime).  
 9 - Scripting, rapid prototyping and systems-level code using Gambit’s FFI and concurrency.
10 >

Notice that I repeated the second example, displaying the string response in a more readable format. As we also see in this example, Large Language Models will in general produce different output when called with the same prompt.

Sometimes we might want the output in a specific format, like JSON:

1 $ gxi -L openai.ss -
2 > (displayln (openai "Be concise in your thinking and only provide one correct answer, no need to think about different correct answers for the problem: Sally is 77, Bill is 32, and Alex is 44 years old. Pairwise, what are their age differences? Print results in JSON format."))
3 {"Sally-Bill":45,"Sally-Alex":33,"Bill-Alex":12}
4 >

This example is not good enough! When you use LLMs in your applications it is better to one-shot prompt with the exact output format you need in your application. Here is an example prompt:

 1 You are an information extraction system.
 2 Extract all people’s **full names** and **email addresses** from the following text.
 3 If no names or emails are present, return an empty list.
 4 
 5 Return the result strictly in this JSON format:
 6 
 7 {
 8   "contacts": [
 9     {
10       "name": "<full name as written in text>",
11       "email": "<email address>"
12     }
13   ]
14 }
15 
16 Text:
17 "Hi, I’m Alice Johnson, please email me at alice.j@example.com.  
18 Also, you can reach Bob Smith via bob.smith42@gmail.com."

Let’s run this one-shot prompt in a Gerbil Scheme REPL:

 1 $ gxi -L openai.ss -
 2 > (def prompt #<<EOF
 3 You are an information extraction system.
 4 Extract all people’s **full names** and **email addresses** from the following text.
 5 If no names or emails are present, return an empty list.
 6 
 7 Return the result strictly in this JSON format:
 8 
 9 {
10   "contacts": [
11     {
12       "name": "<full name as written in text>",
13       "email": "<email address>"
14     }
15   ]
16 }
17 
18 Text:
19 "Hi, I’m Alice Johnson, please email me at alice.j@example.com.  
20 Also, you can reach Bob Smith via bob.smith42@gmail.com."
21 EOF
22 )
23 > (displayln (openai prompt))
24 {
25   "contacts": [
26     {
27       "name": "Alice Johnson",
28       "email": "alice.j@example.com"
29     },
30     {
31       "name": "Bob Smith",
32       "email": "bob.smith42@gmail.com"
33     }
34   ]
35 }
36 >

Inexpensive and Fast LLM Inference Using the Groq Service

Dear reader, are you excited about integrating LLMs into your applications but you want to minimize costs?

Groq is rapidly making a name for itself in the AI community as a cloud-based large language model (LLM) inference provider, distinguished by its revolutionary hardware and remarkably low-cost, high-speed performance. At the heart of Groq’s impressive capabilities lies its custom-designed Language Processing Unit (LPU), a departure from the conventional GPUs that have long dominated the AI hardware landscape. Unlike GPUs, which are general-purpose processors, the LPU is an application-specific integrated circuit (ASIC) meticulously engineered for the singular task of executing LLM inference. This specialization allows for a deterministic and streamlined computational process, eliminating many of the bottlenecks inherent in more versatile hardware. The LPU’s architecture prioritizes memory bandwidth and minimizes latency, enabling it to process and generate text at a blistering pace, often an order of magnitude faster than its GPU counterparts. This focus on inference, the process of using a trained model to make predictions, positions Groq as a compelling solution for real-time applications where speed is paramount.

The practical implications of Groq’s technological innovation are multifaceted, offering a potent combination of affordability, speed, and a diverse selection of open-source models. The efficiency of the LPU translates directly into a more cost-effective pricing structure for users, with a pay-as-you-go model based on the number of tokens processed. This transparent and often significantly cheaper pricing democratizes access to powerful AI, enabling developers and businesses of all sizes to leverage state-of-the-art models without prohibitive upfront costs. The platform’s raw speed is a game-changer, facilitating near-instantaneous responses that are crucial for interactive applications like chatbots, content generation tools, and real-time data analysis. Furthermore, Groq’s commitment to the open-source community is evident in its extensive library of available models, including popular choices like Meta’s Llama series, Mistral’s Mixtral, and Google’s Gemma. This wide array of options provides users with the flexibility to select the model that best suits their specific needs, all while benefiting from the unparalleled inference speeds and economic advantages offered by Groq’s unique hardware.

Here you will learn how to send prompts to the Groq LMS inference service. For information on creating effective prompts please read my blog article Notes on effectively using AI.

Structure of Project and Build Instructions

This project is stored in the directory gerbil_scheme_book/source_code/groq_llm_inference. There is one common utility file groq_inference.ss and currently two very short example scripts that use this utility:

kimi2.ss - Uses Moonshot AI’s Kimi2 model (MOE 1 trillion paramters, with 32B active).
gpt-oss-120b.ss - Uses OpenAI’s open source model gpt-oss-120b.

Both of these models are practical models that are excellent for data manipulation, coding, and general purpose use.

It’s important to note that both models leverage a Mixture of Experts (MoE) architecture. This is a significant departure from traditional “dense” transformer models where every parameter is activated for every input token. In an MoE model, a “router” network selectively activates a small subset of “expert” sub-networks for each token, allowing for a massive total parameter count while keeping the computational cost for inference relatively low. The comparison, therefore, is between two different implementations and philosophies of the MoE approach.

Here is the project Makefile:

1 compile: groq_inference.ss
2     gxc groq_inference.ss 
3 
4 kimi2: compile
5     gxi -l kimi2.ss -
6 
7 gpt-oss-120b: compile
8     gxi -l gpt-oss-120b.ss -

Kimi2 (Moonshot AI)

Features:

Architecture: A very large-scale Mixture of Experts (MoE) model.
Parameters: It has a staggering 1 trillion total parameters. For any given token during inference, it activates approximately 32 billion of these parameters. This represents a very sparse activation (around 3.2%).
Specialization: Kimi2 is highly optimized for agentic capabilities, meaning it excels at using tools, reasoning through multi-step problems, and advanced code synthesis.
Training Innovation: It was trained using a novel optimizer called MuonClip, designed to ensure stability during large-scale MoE training runs, which have historically been prone to instability.
Context Window: It supports a large context window of up to 128,000 tokens, making it suitable for tasks involving long documents or extensive codebases.
Licensing: While the model weights are publicly available (“open-weight”), its specific licensing and training data details are proprietary to Moonshot AI.

gpt-oss-120b (OpenAI)

Features:

Architecture: Also a Mixture of Experts (MoE) model, but at a smaller scale than Kimi2.
Parameters: It has a total of 117 billion parameters, with a much smaller active set of around 5.1 billion parameters per token. This results in a similarly sparse activation (around 4.4%).
Efficiency and Accessibility: A primary feature is its optimization for efficient deployment. It’s designed to run on a single 80 GB GPU (like an H100), making it significantly more accessible for researchers and smaller organizations.
Focus: Like Kimi2, it is designed for high-reasoning, agentic tasks, and general-purpose use.
Licensing: It is a true open-source model, released under the permissive Apache 2.0 license. This allows for broad use, modification, and redistribution.
Training: It was trained using a combination of reinforcement learning and distillation techniques from OpenAI’s more advanced proprietary models.

Comparison and Use Cases

Feature	Kimi2 (Moonshot AI)	gpt-oss-120b (OpenAI)
Architecture	Massive-scale Mixture of Experts (MoE)	Efficient Mixture of Experts (MoE)
Total Parameters	~1 Trillion	~117 Billion
Active Parameters	~32 Billion	~5.1 Billion
Primary Goal	Pushing the upper limits of performance and scale.	Balancing high performance with deployment efficiency.
Hardware Target	Large-scale, high-end compute clusters.	Single high-end GPU (e.g., H100).
Licensing	Open-Weight (proprietary)	Open-Source (Apache 2.0)
Key Differentiator	Sheer scale; novel MuonClip optimizer.	Accessibility, efficiency, and permissive open license.

groq_inference.ss Utility

Here we construct a practical, reusable Gerbil Scheme function for interacting with the Groq API, a service renowned for its high-speed large language model inference. The function, named groq_inference, encapsulates the entire process of making a call to Groq’s OpenAI-compatible chat completions endpoint. It demonstrates essential real-world programming patterns, such as making authenticated HTTP POST requests, dynamically building a complex JSON payload from Scheme data structures, and securely managing credentials using environment variables. This example not only provides a useful utility for integrating AI into your applications but also serves as an excellent case study in using Gerbil’s standard libraries for networking (:std/net/request) and data interchange (:std/text/json), complete with robust error handling for both network issues and malformed API responses.

 1 (import :std/net/request
 2         :std/text/json)
 3 
 4 (export groq_inference)
 5 
 6 ;; Generic Groq chat completion helper
 7 ;; Usage: (groq_inference model prompt [system-prompt: "..."])
 8 (def (groq_inference
 9       model prompt
10       system-prompt: (system-prompt "You are a helpful assistant."))
11   (let ((api-key (get-environment-variable "GROQ_API_KEY")))
12     (unless api-key
13       (error "GROQ_API_KEY environment variable not set."))
14 
15     (let* ((headers `(("Content-Type" . "application/json")
16                       ("Authorization" . ,(string-append "Bearer " api-key))))
17            (body-data
18             (list->hash-table
19              `(("model" . ,model)
20                ("messages"
21                 .
22                 ,(list
23                   (list->hash-table `(("role" . "system") ("content" . ,system-prompt)))
24                   (list->hash-table `(("role" . "user") ("content" . ,prompt))))))))
25            (body-string (json-object->string body-data))
26            (endpoint "https://api.groq.com/openai/v1/chat/completions"))
27       
28       (let ((response (http-post endpoint headers: headers data: body-string)))
29         (if (= (request-status response) 200)
30           (let* ((response-json (request-json response))
31                  (choices (hash-ref response-json 'choices))
32                  (first-choice (and (pair? choices) (car choices)))
33                  (message (and first-choice (hash-ref first-choice 'message)))
34                  (content (and message (hash-ref message 'content))))
35             (or content (error "Groq response missing content")))
36           (error "Groq API request failed"
37             status: (request-status response)
38             body: (request-text response)))))))

The implementation begins by defining the groq_inference function, which accepts a model and a prompt, along with an optional keyword argument for a system message. Its first action is a crucial security and configuration check: it attempts to fetch the GROQ_API_KEY from the environment variables, raising an immediate error if it’s not found. The core of the function then uses a let* block to sequentially build the components of the HTTP request. It constructs the authorization headers and then assembles the JSON body using a combination of quasiquotation and the list->hash-table procedure to create the nested structure required by the API. This body is then serialized into a JSON string, and finally, the http-post function is called with the endpoint, headers, and data to execute the network request.

Upon receiving a response, the function demonstrates robust result processing and error handling. It first checks if the HTTP status code is 200 (OK), indicating a successful request. If it is, a series of let* bindings are used to safely parse the JSON response and navigate the nested data structure to extract the final content string from response[‘choices’][0][‘message’][‘content’], with checks at each step to prevent errors on an unexpected response format. If the content is successfully extracted, it is returned as the result of the function. However, if the HTTP status is anything other than 200, the function enters its error-handling branch, raising a descriptive error that includes the failing status code and the raw text body of the response, providing valuable debugging information to the caller.

Example scripts: kimi2.ss and gpt-oss-120b.ss

These two scripts are simple enough to just list without comment:

kimi2.ss

 1 (import :groq/groq_inference)
 2 
 3 ;; Use Moonshot AI's best kimi2 model (MOE: 1 triliion parameters, 32B resident).
 4 
 5 ; Export the `kimi2` procedure from this module
 6 (export kimi2)
 7 
 8 (def (kimi2 prompt
 9             model: (model "moonshotai/kimi-k2-instruct")
10             system-prompt: (system-prompt "You are a helpful assistant."))
11   (groq_inference model prompt system-prompt: system-prompt))
12 
13 ;; (kimi2 "why is the sky blue? be very concise")

gpt-oss-120b.ss

 1 (import :groq/groq_inference)
 2 
 3 ;; Use OpenAI's open source model gpt-oss-120b
 4 
 5 ; Export the `gpt-oss-120b` procedure from this module
 6 (export gpt-oss-120b)
 7 
 8 (def (gpt-oss-120b
 9       prompt
10       model: (model "openai/gpt-oss-120b")
11       system-prompt: (system-prompt "You are a helpful assistant."))
12   (groq_inference model prompt system-prompt: system-prompt))
13 
14 ;; (gpt-oss-120b "why is the sky blue? be very concise")

Running the kimi2 example:

Note, the utility must be comiled one time: gxc groq_inference.ss. The compiled library by default will be in the directory ~/.gerbil/lib/groq/ because we set this project’s module name to groq in the file gerbil.pkg.

 1  $ gxi -l kimi2.ss                   
 2 > (displayln (kimi2 "explain concisely what evidence there is for 'dark matter' in the universe, and counter arguments. Be concise!"))
 3 Evidence for dark matter  
 4 • Galaxy rotation curves: outer stars orbit too fast for visible mass alone.  
 5 • Gravitational lensing: mass maps exceed baryonic matter.  
 6 • Cosmic Microwave Background: tiny temperature ripples fit models with ~5× more dark than baryonic matter.  
 7 • Structure formation: simulations need unseen matter to match today’s galaxy distribution.  
 8 • Bullet Cluster: collision separated hot gas (baryons) from dominant mass peak, consistent with collisionless dark matter.
 9 
10 Counter-arguments / alternatives  
11 • Modified Newtonian Dynamics (MOND): tweaks gravity law to explain rotation curves without extra mass.  
12 • Modified gravity theories (TeVeS, f(R), emergent gravity) reproduce lensing and CMB with no dark particles.  
13 • Claims of inconsistent lensing signals or tidal dwarf galaxies without dark matter challenge universality.
14 >

Running the gpt-oss-120b example:

1 $ gxi -l gpt-oss-120b.ss
2 > (displayln (gpt-oss-120b "write a recursive Haskell function 'factorial'. Only show the code."))
3 ``haskell
4 factorial :: Integer -> Integer
5 factorial 0 = 1
6 factorial n = n * factorial (n - 1)
7 ``
8 >

Wikidata API Using SPARQL Queries

Wikidata is a free, collaborative, and multilingual knowledge base that functions as the central structured data repository for the Wikimedia ecosystem, including projects like Wikipedia, Wikivoyage, and Wiktionary. Launched in 2012, its mission is to create a common source of open data that can be used by anyone, anywhere. Unlike Wikipedia, which contains prose articles, Wikidata stores information in a machine-readable format structured around items (representing any concept or object), which are described by properties (like “population” or “author”) and corresponding values. For example, the item for “Earth” has a property “instance of” with the value “planet”. This structured approach allows for data consistency across hundreds of language editions of Wikipedia and enables powerful, complex queries through its SPARQL endpoint. By providing a centralized, queryable, and interlinked database of facts, Wikidata not only supports Wikimedia projects but also serves as a crucial resource for researchers, developers, and applications worldwide that require reliable and openly licensed structured information.

Example Code

This code in file wikidata.ss provides a client for interacting with the Wikidata Query Service, a powerful public SPARQL endpoint for accessing the vast, collaboratively edited knowledge base of Wikidata. The code encapsulates the entire process of querying this service, starting with a raw SPARQL query string. It properly formats the request by URL-encoding the query, constructs the full request URL, and sets the appropriate HTTP headers, including the Accept header for the SPARQL JSON results format and a User-Agent header, which is a requirement for responsible API usage. Upon receiving a response, the module parses the JSON data and then transforms the verbose, nested structure of the standard SPARQL results format into a more convenient and idiomatic Gerbil Scheme data structure—either a list of hash tables or a list of association lists. The file also includes several example functions that demonstrate how to query for specific facts, such as Grace Hopper’s birth date, and how to perform more complex, multi-stage queries, like first finding the unique identifiers for entities like Bill Gates and Microsoft and then discovering the relationships that connect them within the knowledge graph.

  1 ;; File: wikidata.ss
  2 (import :std/net/request
  3         :std/text/json
  4         :std/net/uri) ; For URL encoding
  5 (import :std/format)
  6 
  7 (export query-wikidata query-wikidata/alist query-dbpedia alist-ref test1 test1-ua test2 test2-ua)
  8 
  9 ;; Helper to process the SPARQL JSON results format
 10 (def (process-sparql-results json-data)
 11   (let* ((results (hash-ref json-data 'results))
 12          (bindings (hash-ref results 'bindings)))
 13     (map (lambda (binding)
 14            (let ((result-hash (make-hash-table)))
 15              (hash-for-each
 16               (lambda (var-name value-obj)
 17                 (hash-put! result-hash
 18                            (if (symbol? var-name) var-name (string->symbol var-name))
 19                            (hash-ref value-obj 'value)))
 20               binding)
 21              result-hash))
 22          bindings)))
 23 
 24 ;; Convenience: look up a key in an alist. Accepts symbol or string keys.
 25 ;; Usage: (alist-ref 'var row [default])
 26 (def (alist-ref key row . default)
 27   (let* ((sym (if (symbol? key) key (string->symbol key)))
 28          (p (assq sym row)))
 29     (if p (cdr p) (if (pair? default) (car default) #f))))
 30 
 31 ;; Same as above but returns an alist per row: ((var . value) ...)
 32 (def (process-sparql-results-alist json-data)
 33   (let* ((results (hash-ref json-data 'results))
 34          (bindings (hash-ref results 'bindings)))
 35     (map (lambda (binding)
 36            (let ((row '()))
 37              (hash-for-each
 38               (lambda (var-name value-obj)
 39                 (let* ((sym (if (symbol? var-name) var-name (string->symbol var-name)))
 40                        (val (hash-ref value-obj 'value)))
 41                   (set! row (cons (cons sym val) row))))
 42               binding)
 43              (reverse row)))
 44          bindings)))
 45 
 46 ;; Query the Wikidata Query Service (WDQS)
 47 ;; - Uses GET with URL-encoded query and JSON format
 48 ;; - Sends a User-Agent per WDQS guidelines; callers can override
 49 (def (query-wikidata sparql-query . opts)
 50   (let* ((endpoint "https://query.wikidata.org/sparql")
 51          (encoded-query (uri-encode sparql-query))
 52          (request-url (string-append endpoint "?query=" encoded-query "&format=json"))
 53          (user-agent (if (pair? opts) (car opts) "gerbil-wikidata/0.1 (+https://example.org; contact@example.org)"))
 54          (headers `(("Accept" . "application/sparql-results+json")
 55                     ("User-Agent" . ,user-agent))))
 56     (let ((response (http-get request-url headers: headers)))
 57       (if (= (request-status response) 200)
 58           (let ((response-json (request-json response)))
 59             (process-sparql-results response-json))
 60           (error "SPARQL query failed"
 61                  status: (request-status response)
 62                  body: (request-text response))))))
 63 
 64 ;; Alist variant returning rows as association lists
 65 (def (query-wikidata/alist sparql-query . opts)
 66   (let* ((endpoint "https://query.wikidata.org/sparql")
 67          (encoded-query (uri-encode sparql-query))
 68          (request-url (string-append endpoint "?query=" encoded-query "&format=json"))
 69          (user-agent (if (pair? opts) (car opts) "gerbil-wikidata/0.1 (+https://example.org; contact@example.org)"))
 70          (headers `(("Accept" . "application/sparql-results+json")
 71                     ("User-Agent" . ,user-agent))))
 72     (let ((response (http-get request-url headers: headers)))
 73       (if (= (request-status response) 200)
 74           (let ((response-json (request-json response)))
 75             (process-sparql-results-alist response-json))
 76           (error "SPARQL query failed"
 77                  status: (request-status response)
 78                  body: (request-text response))))))
 79 
 80 ;; Backward-compatibility alias for previous DBPedia function name
 81 (def (query-dbpedia . args)
 82   (apply query-wikidata args))
 83 
 84 ;; Example Usage: fetch birth date and birthplace label for Grace Hopper
 85 (def (test2)
 86   (let ((query
 87          (string-append
 88           "PREFIX wd: <http://www.wikidata.org/entity/>\n"
 89           "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
 90           "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
 91           "PREFIX bd: <http://www.bigdata.com/rdf#>\n"
 92           "SELECT ?birthDate ?birthPlaceLabel WHERE {\n"
 93           "  wd:Q7249 wdt:P569 ?birthDate .\n"
 94           "  wd:Q7249 wdt:P19 ?birthPlace .\n"
 95           "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n"
 96           "}")))
 97     (let ((results (query-wikidata/alist query)))
 98       (for-each
 99        (lambda (result)
100          (display (format "Birth Date: ~a\n" (alist-ref 'birthDate result)))
101          (display (format "Birth Place: ~a\n\n" (alist-ref 'birthPlaceLabel result))))
102        results))))
103 
104 ;; Test1: find URIs for Bill Gates and Microsoft; then list relationships
105 (def (test1)
106   (let* ((find-uris
107           (string-append
108            "PREFIX wd: <http://www.wikidata.org/entity/>\n"
109            "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
110            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
111            "SELECT ?bill ?microsoft WHERE {\n"
112            "  ?bill rdfs:label \"Bill Gates\"@en .\n"
113            "  ?bill wdt:P31 wd:Q5 .\n"
114            "  ?microsoft rdfs:label \"Microsoft\"@en .\n"
115            ;; Ensure we pick the company entity
116            "  ?microsoft wdt:P31/wdt:P279* wd:Q4830453 .\n"
117            "} LIMIT 1"))
118          (rows (query-wikidata/alist find-uris)))
119     (if (null? rows)
120         (display "No URIs found for Bill Gates/Microsoft.\n")
121         (let* ((row (car rows))
122                (bill (alist-ref 'bill row))
123                (microsoft (alist-ref 'microsoft row))
124                (rel-query
125                 (string-append
126                  "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
127                  "PREFIX bd: <http://www.bigdata.com/rdf#>\n"
128                  "SELECT ?prop ?propLabel ?dir WHERE {\n"
129                  "  VALUES (?bill ?microsoft) { (<" bill "> <" microsoft ">) }\n"
130                  "  ?wdprop wikibase:directClaim ?prop .\n"
131                  "  { BIND(\"Bill->Microsoft\" AS ?dir) ?bill ?prop ?microsoft . }\n"
132                  "  UNION\n"
133                  "  { BIND(\"Microsoft->Bill\" AS ?dir) ?microsoft ?prop ?bill . }\n"
134                  "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n"
135                  "}\n"
136                  "ORDER BY ?propLabel"))
137                (rels (query-wikidata/alist rel-query)))
138           (display (format "Bill Gates URI: ~a\n" bill))
139           (display (format "Microsoft URI: ~a\n" microsoft))
140           (if (null? rels)
141               (display "No direct relationships found.\n")
142               (for-each
143                (lambda (r)
144                  (display (format "~a: ~a\n"
145                                   (alist-ref 'dir r)
146                                   (or (alist-ref 'propLabel r)
147                                       (alist-ref 'prop r)))))
148                rels))))))
149 
150 ;; Test1 with User-Agent from env var WDQS_UA
151 (def (test1-ua)
152   (let* ((ua (or (getenv "WDQS_UA")
153                  "YourApp/1.0 (https://your.site; you@site)"))
154          (find-uris
155           (string-append
156            "PREFIX wd: <http://www.wikidata.org/entity/>\n"
157            "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
158            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
159            "SELECT ?bill ?microsoft WHERE {\n"
160            "  ?bill rdfs:label \"Bill Gates\"@en .\n"
161            "  ?bill wdt:P31 wd:Q5 .\n"
162            "  ?microsoft rdfs:label \"Microsoft\"@en .\n"
163            "  ?microsoft wdt:P31/wdt:P279* wd:Q4830453 .\n"
164            "} LIMIT 1"))
165          (rows (query-wikidata/alist find-uris ua)))
166     (if (null? rows)
167         (display "No URIs found for Bill Gates/Microsoft.\n")
168         (let* ((row (car rows))
169                (bill (alist-ref 'bill row))
170                (microsoft (alist-ref 'microsoft row))
171                (rel-query
172                 (string-append
173                  "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
174                  "PREFIX bd: <http://www.bigdata.com/rdf#>\n"
175                  "SELECT ?prop ?propLabel ?dir WHERE {\n"
176                  "  VALUES (?bill ?microsoft) { (<" bill "> <" microsoft ">) }\n"
177                  "  ?wdprop wikibase:directClaim ?prop .\n"
178                  "  { BIND(\"Bill->Microsoft\" AS ?dir) ?bill ?prop ?microsoft . }\n"
179                  "  UNION\n"
180                  "  { BIND(\"Microsoft->Bill\" AS ?dir) ?microsoft ?prop ?bill . }\n"
181                  "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n"
182                  "}\n"
183                  "ORDER BY ?propLabel"))
184                (rels (query-wikidata/alist rel-query ua)))
185           (display (format "Bill Gates URI: ~a\n" bill))
186           (display (format "Microsoft URI: ~a\n" microsoft))
187           (if (null? rels)
188               (display "No direct relationships found.\n")
189               (for-each
190                (lambda (r)
191                  (display (format "~a: ~a\n"
192                                   (alist-ref 'dir r)
193                                   (or (alist-ref 'propLabel r)
194                                       (alist-ref 'prop r)))))
195                rels))))))
196 
197 ;; Example Usage with User-Agent from env var WDQS_UA
198 (def (test2-ua)
199   (let* ((ua (or (getenv "WDQS_UA")
200                  "YourApp/1.0 (https://your.site; you@site)"))
201          (query
202           (string-append
203            "PREFIX wd: <http://www.wikidata.org/entity/>\n"
204            "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
205            "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
206            "PREFIX bd: <http://www.bigdata.com/rdf#>\n"
207            "SELECT ?birthDate ?birthPlaceLabel WHERE {\n"
208            "  wd:Q7249 wdt:P569 ?birthDate .\n"
209            "  wd:Q7249 wdt:P19 ?birthPlace .\n"
210            "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n"
211            "}")))
212     (let ((results (query-wikidata/alist query ua)))
213       (for-each
214        (lambda (result)
215          (display (format "Birth Date: ~a\n" (alist-ref 'birthDate result)))
216          (display (format "Birth Place: ~a\n\n" (alist-ref 'birthPlaceLabel result))))
217        results))))

The core of this example is built around the query-wikidata and query-wikidata/alist functions. These procedures handle the low-level details of HTTP communication with the Wikidata SPARQL endpoint. They take a SPARQL query string, URI-encode it, and embed it into a GET request URL. They set a User-Agent string, a best practice that helps service operators identify the source of traffic; the functions allow this header to be overridden via an optional argument. After a successful request, the real data processing begins. The raw JSON response from a SPARQL endpoint is deeply nested, with each result variable wrapped in an object containing its type and value. The helper functions process-sparql-results and process-sparql-results-alist traverse this structure to extract the essential value of each binding, returning a clean list of results where each result is either a hash table or an association list mapping variable names (as symbols) to their values.

The included test functions, test1 and test2, serve as practical examples of this code’s capabilities. The test2 function is a straightforward lookup, retrieving the birth date and birthplace for a specific Wikidata entity (Grace Hopper, wd:Q7249). In contrast, test1 demonstrates a more powerful, dynamic pattern. It first runs a query to find the URIs for “Bill Gates” and “Microsoft” based on their English labels and types (human and business, respectively). It then uses these dynamically discovered URIs to construct a second query that finds all direct properties linking the two entities. This two-step approach is a common and robust method for interacting with linked data systems. Furthermore, the test1-ua and test2-ua variants illustrate how to provide a custom User-Agent string, for instance by reading it from an environment variable, showcasing the flexibility of the primary query functions.

Example Output

We use a Makefile target to start a Gerbil Scheme REPL with the example code loaded:

Here is the Makefile:

 1 $ cat Makefile 
 2 run:
 3     gxi -L wikidata.ss -
 4 
 5 test:
 6     gxi -L wikidata.ss -e "(test2)"
 7 
 8 run-agent:
 9     WDQS_UA="$${WDQS_UA:-YourApp/1.0 (https://your.site; you@site)}" gxi -L wikidata.ss -e "(test2-ua)"
10 
11 test1:
12     gxi -L wikidata.ss -e "(test1)"
13 
14 run-agent1:
15     WDQS_UA="$${WDQS_UA:-YourApp/1.0 (https://your.site; you@site)}" gxi -L wikidata.ss -e "(test1-ua)"

And sample output:

 1  $ make run-agent1
 2 WDQS_UA="${WDQS_UA:-YourApp/1.0 (https://your.site; you@site)}" gxi -L wikidata.ss -e "(test1-ua)"
 3 Bill Gates URI: http://www.wikidata.org/entity/Q5284
 4 Microsoft URI: http://www.wikidata.org/entity/Q2283
 5 Microsoft->Bill: http://www.wikidata.org/prop/direct/P112
 6 Bill->Microsoft: http://www.wikidata.org/prop/direct/P1830
 7 $ make
 8 gxi -L wikidata.ss -
 9 > (test1)
10 Bill Gates URI: http://www.wikidata.org/entity/Q5284
11 Microsoft URI: http://www.wikidata.org/entity/Q2283
12 Microsoft->Bill: http://www.wikidata.org/prop/direct/P112
13 Bill->Microsoft: http://www.wikidata.org/prop/direct/P1830
14 > (test2)
15 Birth Date: 1862-02-14T00:00:00Z
16 Birth Place: Venice

Code for Natural Language Processing (NLP)

Before the field of deep learning revolutionized the field of NLP, I created a commercial NLP product Knowledge Books Systems NLP that I first implemented in Common Lisp, then in Ruby, and then in Gambit Scheme. Because of the processing speed of my library, I still feel that it is a useful code because it processes text efficiently performing:

Part of speech tagging
Key phrase extraction
Categorizes text

A more modern approach to writing code for Natural Language Processing involves applying computational techniques to analyze, understand, and generate human language, bridging the gap between human communication and computer interpretation. The field heavily relies on machine learning, predominantly using languages like Python due to its extensive ecosystem of specialized libraries such as NLTK (Natural Language Toolkit), spaCy, and Hugging Face’s transformers. These tools provide the building blocks for implementing a wide array of NLP tasks, from foundational steps like tokenization (splitting text into words or sentences) and part-of-speech tagging to more complex applications like sentiment analysis, named entity recognition (identifying people and places), machine translation, and text summarization. At its core, coding for NLP is about converting unstructured text into a structured data format that machine learning models can process, and then using those models to derive meaningful insights, power conversational agents, or generate new, coherent text, thereby enabling software to interact with the world in a more human-like manner.

macOS Specific Build Notes

On some macOS systems, you might run into link compatibility problems. I assume that macOS and Linux using readers followed the installation instructions in the Preface. I had a few Gerbil vs. OpenSSL issues on macOS that I fixed using:

1 export MACOSX_DEPLOYMENT_TARGET=15.0

As a result of this configuration problem, the make test target or the NLP example runs an interpreter target, bypassing any potential link problems.

1 make test
2 cat .gerbil/test-output.json | jq

I recommend that you start running this example interpretively using make test before compiling this example using make.

Structure of Project and Build Instructions

The project directory gerbil_scheme_book/source_code/NLP contains hand written NLP utilities, the sub-directory generated-code contains classification and linguistic data as literal data embedded directly in Scheme source code. These files were auto-generated by Ruby utilities I wrote in 2005. The sub-directory data contains additional linguistic data files.

The Makefile provides a standard set of targets for common development tasks. A key feature of this setup is the use of a project-local build environment. All Gerbil build artifacts, compiled modules, and package dependencies are installed into a .gerbil directory within the project root, rather than the user’s global ~/.gerbil directory.

This is achieved by temporarily setting the HOME environment variable to the current directory for all Gerbil compiler (gxc) and interpreter (gxi) commands:

1 HOME=$(CURDIR) gxi ...

This approach ensures that the project is self-contained and builds are reproducible, without interfering with your global Gerbil installation.

 1 all: build
 2 
 3 build:
 4     # Redirect HOME so gxpkg installs into project-local .gerbil
 5     HOME=$(CURDIR) gxi build.ss
 6 
 7 .PHONY: test test-fast
 8 
 9 test: compile-mods
10     @echo "Running smoke test (gxi) on climate_g8.txt..."
11     @mkdir -p .gerbil
12     @rm -rf .gerbil/test-output.json
13     @HOME=$(CURDIR) gxi nlp.ss -- -i data/testdata/climate_g8.txt -o .gerbil/test-output.json
14     @echo "Wrote .gerbil/test-output.json"
15     @/bin/echo -n "Preview: " && head -c 300 .gerbil/test-output.json || true
16 
17 # Interpreter-based run; useful if static exe build is problematic
18 test-fast: compile-mods
19     @echo "Running interpreter smoke test (gxi) on climate_g8.txt..."
20     @mkdir -p .gerbil
21     @HOME=$(CURDIR) gxi nlp.ss -- -i data/testdata/climate_g8.txt -o .gerbil/test-output.json
22     @echo "Wrote .gerbil/test-output.json"
23     @/bin/echo -n "Preview: " && head -c 300 .gerbil/test-output.json || true
24 
25 
26 clean:
27     @echo "Cleaning project build artifacts..."
28     @rm -rf .gerbil
29     @rm -f kbtm nlp a.out
30     @find . -maxdepth 1 -type f \( -name "*.o*" -o -name "*.ssxi" -o -name "*.ssi" \) -delete
31 .PHONY: compile-mods
32 compile-mods:
33     @echo "Compiling modules with gxc into project-local .gerbil..."
34     @HOME=$(CURDIR) gxc utils.ss fasttag.ss category.ss proper-names.ss \
35       data/stop-words.ss generated-code/lexdata.ss generated-code/cat-data-tables.ss \
36       main.ss nlp.ss

The build.ss script is expected to contain the primary compilation and linking logic to produce the final executable(s).

make test

This target runs a smoke test on the application. It first ensures all modules are compiled (by depending on compile-mods), then executes nlp.ss with a predefined test data file (climate_g8.txt). The output is written to .gerbil/test-output.json, and a 300-byte preview of the output is printed to the console:

1   test: compile-mods
2       @HOME=$(CURDIR) gxi nlp.ss -- -i data/testdata/climate_g8.txt -o .gerbil/test-output.json

make test-fast

This target is a variation of test that also runs the interpreter-based smoke test. The primary purpose is to provide a quick feedback loop during development, bypassing any potentially slow static executable linking steps that might be part of the main build target.

make compile-mods

This is a utility target that pre-compiles all core Scheme source files (.ss) into the project-local .gerbil directory using the Gerbil compiler (gxc). The test and test-fast targets depend on this to ensure modules are up-to-date before running the test application. This separation speeds up subsequent runs, as modules are not recompiled unnecessarily.

Top Level Project Code

nlp.ss

This program serves as the main command-line interface for our NLP text analysis library. Its primary responsibility is to orchestrate the text processing workflow by parsing command-line arguments, invoking the core analysis engine, and serializing the results into a structured JSON format. The utility is designed to read the path to a source text file and a destination output file from the user. After processing the input file with the process-file function from the underlying library, it constructs a JSON object containing extracted data such as significant words, tags, key phrases, and scored categories. To accomplish this without external dependencies, the program includes a minimal, custom-built set of functions for escaping special characters and writing JSON-compliant strings and arrays, demonstrating fundamental principles of data serialization, file I/O, and application entry point logic in a functional programming context.

 1 (import :kbtm/main)
 2 
 3 (export main)
 4 
 5 ;; minimal JSON writer for our specific output
 6 (define (json-escape s)
 7   (list->string
 8    (apply append
 9           (map (lambda (ch)
10                  (cond
11                   ((char=? ch #\") '(#\\ #\"))
12                   ((char=? ch #\\) '(#\\ #\\))
13                   ((char=? ch #\newline) '(#\\ #\n))
14                   (else (list ch))))
15                (string->list s)))))
16 
17 (define (write-json-string s)
18   (display "\"")
19   (display (json-escape s))
20   (display "\""))
21 
22 (define (write-json-string-list lst)
23   (display "[")
24   (let loop ((xs lst) (first #t))
25     (if (pair? xs)
26         (begin
27           (if (not first) (display ","))
28           (write-json-string (car xs))
29           (loop (cdr xs) #f))))
30   (display "]"))
31 
32 (define (write-json-categories cats)
33   ;; cats: list of ((name score) ...)
34   (display "[")
35   (let loop ((xs cats) (first #t))
36     (if (pair? xs)
37         (let* ((pair (car xs))
38                (name (car pair))
39                (score (cadr pair)))
40           (if (not first) (display ","))
41           (display "[")
42           (write-json-string name)
43           (display ",")
44           (display score)
45           (display "]")
46           (loop (cdr xs) #f))))
47   (display "]"))
48 
49 (define (json-write ret)
50   ;; ret is a table with fixed keys
51   (display "{")
52   (display "\"words\":")
53   (write-json-string-list (table-ref ret "words" '()))
54   (display ",\"tags\":")
55   (write-json-string-list (table-ref ret "tags" '()))
56   (display ",\"key-phrases\":")
57   (write-json-string-list (table-ref ret "key-phrases" '()))
58   (display ",\"categories\":")
59   (write-json-categories (table-ref ret "categories" '()))
60   (display "}") )
61 
62 (define (print-help)
63   (display "KBtextmaster (native) command line arguments:")
64   (newline)
65   (display "   -h              -- to print help message")
66   (newline)
67   (display "   -i <file name>  -- to define the input file name")
68   (newline)
69   (display "   -o <file name>  -- to specify the output file name")
70   (newline))
71 
72 
73 (define (main . argv)
74   (let* ((args (command-line))
75          (in-file (member "-i" args))
76          (out-file (member "-o" args))
77          (ret (make-table)))
78     (when (member "-h" args)
79       (print-help))
80     (set! in-file (and in-file (cadr in-file)))
81     (set! out-file (and out-file (cadr out-file)))
82     (if (and in-file out-file)
83         (let ((resp (process-file in-file)))
84           (with-output-to-file
85               (list path: out-file create: #t)
86             (lambda ()
87               (table-set! ret "words" (vector->list (car resp)))
88               (table-set! ret "tags" (vector->list (cadr resp)))
89               (table-set! ret "key-phrases" (caddr resp))
90               (table-set! ret "categories" (cadddr resp))
91               ;; TBD: implement summary words, proper name list, and place name list
92 
93               (json-write ret))))
94         (print-help))
95     0))
96 
97 ;; (process-file "data/testdata/climate_g8.txt")

The program’s logic is centered in the main function, which acts as the application controller. It begins by parsing the program’s command-line arguments, using the member procedure to check for the presence of -i (input file), -o (output file), and -h (help) flags. The control flow is straightforward: if the help flag is present or if the required file arguments are missing, a help message is displayed. Otherwise, the core logic proceeds within a with-output-to-file block, which ensures that the output is correctly directed to the user-specified file. Inside this block, the external process-file function is called, and its returned data structures are placed into a hash table, which is then passed to our custom JSON writer.

A notable feature of this code is its self-contained approach to JSON serialization. Instead of relying on a third-party library, we build the JSON output manually through a series of specialized helper functions. The json-escape function handles the critical task of properly escaping special characters within strings to ensure the output is valid. Building on this, procedures like write-json-string-list and write-json-categories use a common Scheme pattern, the named let loop, to iterate over lists and recursively construct the JSON array syntax, carefully managing the placement of commas between elements. The final json-write function assembles the complete JSON object by explicitly printing the keys and calling the appropriate helper for each value, providing a clear and direct implementation of a data serialization routine.

Other Source Files

We will not discuss the following code files:

fasttag.ss - part of speech tagger.
place-names.ss - identify place names in text.
summarize.ss - summarize text.
utils.ss - misc. utility functions.
category.ss - classifies (or categorizes) text.
key-phrases.ss - extracts key phrases from text.
main.ss - main, or top level, interface functions.
proper-names.ss - identify proper names in text.

Test Run:

On some systems, you might run into link compatibility probelems. I did on macOS when I brew installed Gerbil Scheme and later brew updated openssl to a newer version.

As a result of this configuration problem, the make test target runs an interpretter target, bypassing any potential link problems.

1 make test
2 cat .gerbil/test-output.json | jq

Building and Running the Command Line Tool

 1 $ make
 2 $ .gerbil/bin/nlp -i data/testdata/climate_g8.txt -o output.json
 3 $ cat output.json | jq
 4 
 5   ... lots of output not shown...
 6     "VBD",
 7     "CD"
 8   ],
 9   "key-phrases": [
10     "clean energy",
11     "developing countries"
12   ],
13   "categories": [
14     [
15       "news_economy.txt",
16       136750
17     ],
18     [
19       "news_war.txt",
20       117290
21     ]
22   ]
23 }

Gerbil Scheme FFI Example Using the C Language Raptor RDF Library

The Foreign Function Interface (FFI) in Gerbil Scheme provides a powerful bridge to leverage existing C libraries directly within Scheme programs. This allows developers to extend Scheme applications with highly optimized, domain-specific functionality written in C, while still enjoying the high-level abstractions and rapid development style of Scheme. In this chapter, we will explore an end-to-end example of integrating the Raptor RDF parsing and serialization library into Gerbil Scheme, showing how to bind C functions and expose them as Scheme procedures.

Raptor is a mature C library for parsing, serializing, and manipulating RDF (Resource Description Framework) data in a variety of syntaxes, including RDF/XML, Turtle, N-Triples, and JSON-LD. By accessing Raptor from Gerbil Scheme, we open the door to semantic web applications, linked data processing, and graph-based reasoning directly within a Scheme environment. This example illustrates the mechanics of building FFI bindings, handling C-level memory and data types, and translating them into idiomatic Scheme representations. For reference, the official Raptor API documentation is available here: Raptor2 API Reference.

We will walk through the process step by step: setting up the Gerbil Scheme FFI definitions, mapping C functions and structs into Scheme, and writing test programs that parse RDF data and extract triples. Along the way, we will highlight practical issues such as error handling, symbol exporting, and resource cleanup. By the end of this chapter, you should have both a working Gerbil Scheme binding to Raptor and a general blueprint for integrating other C libraries into your Scheme projects. For background on Gerbil Scheme’s FFI itself, consult the Gerbil Documentation: FFI.

Implementation of a FFI Bridge Library for Raptor

The library is located in the file gerbil_scheme_book/source_code/RaptorRDF_FFI/ffi.ss. This code demonstrates how to use the Foreign Function Interface (FFI) to integrate with the Raptor RDF C library. It provides a Scheme-accessible procedure, raptor-parse-file->ntriples, which parses an RDF file in a specified syntax (such as Turtle or RDF/XML) and returns the results as an N-Triples–formatted string. This example highlights the practical use of FFI in Gerbil Scheme: exposing a C function to Scheme, managing memory safely across the boundary, and translating RDF data into a representation that Scheme programs can process directly.

 1 (export raptor-parse-file->ntriples)
 2 
 3 (import :std/foreign)
 4 
 5 (begin-ffi (raptor-parse-file->ntriples)
 6   (c-declare #<<'C'
 7 #include <raptor2.h>
 8 #include <string.h>
 9 
10 #ifndef RAPTOR_STRING_ESCAPE_FLAG_NTRIPLES
11 #define RAPTOR_STRING_ESCAPE_FLAG_NTRIPLES 0x4
12 #endif
13 
14 /* Write one triple to the iostream in N-Triples and a newline */
15 static void triples_to_iostr(void* user_data, raptor_statement* st) {
16   raptor_iostream* iostr = (raptor_iostream*)user_data;
17 
18   raptor_term_escaped_write(st->subject, RAPTOR_STRING_ESCAPE_FLAG_NTRIPLES, iostr);
19   raptor_iostream_write_byte(' ', iostr);
20   raptor_term_escaped_write(st->predicate, RAPTOR_STRING_ESCAPE_FLAG_NTRIPLES, iostr);
21   raptor_iostream_write_byte(' ', iostr);
22   raptor_term_escaped_write(st->object, RAPTOR_STRING_ESCAPE_FLAG_NTRIPLES, iostr);
23   raptor_iostream_write_byte(' ', iostr);
24   raptor_iostream_write_byte('.', iostr);
25   raptor_iostream_write_byte('\n', iostr);
26 }
27 
28 /* Parse `filename` with syntax `syntax_name` and return N-Triples as char*.
29    The returned memory is owned by Raptor's allocator; Gambit copies it
30    into a Scheme string via char-string return convention. */
31 static char* parse_file_to_ntriples(const char* filename, const char* syntax_name) {
32   raptor_world *world = NULL;
33   raptor_parser* parser = NULL;
34   unsigned char *uri_str = NULL;
35   raptor_uri *uri = NULL, *base_uri = NULL;
36   raptor_iostream *iostr = NULL;
37   void *out_string = NULL;
38   size_t out_len = 0;
39 
40   world = raptor_new_world();
41   if(!world) return NULL;
42   if(raptor_world_open(world)) { raptor_free_world(world); return NULL; }
43 
44   /* Where triples go: a string iostream that materializes on free */
45   iostr = raptor_new_iostream_to_string(world, &out_string, &out_len, NULL);
46   if(!iostr) { raptor_free_world(world); return NULL; }
47 
48 
49 
50   parser = raptor_new_parser(world, syntax_name ? syntax_name : "guess");
51   if(!parser) { raptor_free_iostream(iostr); raptor_free_world(world); return NULL; }
52 
53   raptor_parser_set_statement_handler(parser, iostr, triples_to_iostr);
54 
55   uri_str = raptor_uri_filename_to_uri_string((const unsigned char*)filename);
56   if(!uri_str) { raptor_free_parser(parser); raptor_free_iostream(iostr); raptor_free_world(world); return NULL; }
57 
58   uri = raptor_new_uri(world, uri_str);
59   base_uri = raptor_uri_copy(uri);
60 
61   /* Parse file; on each triple our handler appends to iostr */
62   raptor_parser_parse_file(parser, uri, base_uri);
63 
64   /* Clean up parser/URIs; free iostr LAST to finalize string */
65   raptor_free_parser(parser);
66   raptor_free_uri(base_uri);
67   raptor_free_uri(uri);
68   raptor_free_memory(uri_str);
69 
70   raptor_free_iostream(iostr); /* this finalizes out_string/out_len */
71 
72   /* Keep world only as long as needed; string is independent now */
73   raptor_free_world(world);
74 
75   return (char*)out_string; /* Gambit copies to Scheme string */
76 }
77 'C'
78   )
79 
80   ;; Scheme visible wrapper:
81   (define-c-lambda raptor-parse-file->ntriples
82     (char-string       ;; filename
83      char-string)      ;; syntax name, e.g., "turtle", "rdfxml", or "guess"
84     char-string
85     "parse_file_to_ntriples"))

The C portion begins by including the raptor2.h header and defining a callback function, triples_to_iostr, which takes RDF statements and writes them to a Raptor iostream in N-Triples format. This callback escapes subjects, predicates, and objects correctly and ensures triples are terminated with a period and newline, conforming to the N-Triples standard. The main work is performed in parse_file_to_ntriples, which initializes a Raptor world and parser, configures the statement handler to use the callback, and sets up an iostream that accumulates parsed triples into a string buffer. Error checks are in place at every step, ensuring resources such as the world, parser, URIs, and iostream are properly freed if initialization fails.

After setup, the parser processes the input file identified by its filename and syntax. Each RDF statement is converted into N-Triples and appended to the output string via the iostream. Once parsing is complete, the parser, URIs, iostream, and world are released, leaving a fully materialized string containing the N-Triples serialization. This string is returned to Scheme through the FFI, where Gambit copies it into a managed Scheme string. On the Scheme side, the define-c-lambda form binds this C function as the procedure raptor-parse-file->ntriples, exposing it with the expected (filename syntax-name) -> ntriples-string interface. The result is a clean abstraction: Scheme code can call raptor-parse-file->ntriples with an RDF file and syntax, receiving back normalized N-Triples ready for further processing in Gerbil Scheme.

Test Code

This Gerbil Scheme test code in the file test.ss exercises the FFI binding raptor-parse-file->ntriples by creating a minimal Turtle input, invoking the parser in two modes (“turtle” and “guess”), and asserting that both produce the same canonical N-Triples output. It’s designed to be self-contained: it writes a temporary .ttl file, runs the conversion twice, compares results against an expected string, then cleans up and prints status.

 1 ;; Simple test for ffi.ss: validates N-Triples output
 2 
 3 (import "ffi")
 4 (export main)
 5 (import "ffi" :gerbil/gambit)
 6 
 7 (define (write-file path content)
 8   (let ((p (open-output-file path)))
 9     (display content p)
10     (close-output-port p)))
11 
12 (define (read-file path)
13   (let ((p (open-input-file path)))
14     (let loop ((chunks '()))
15       (let ((c (read-char p)))
16         (if (eof-object? c)
17             (begin (close-input-port p)
18                    (list->string (reverse chunks)))
19             (loop (cons c chunks)))))))
20 
21 (define (assert-equal expected actual label)
22   (if (equal? expected actual)
23       (begin (display "PASS ") (display label) (newline) #t)
24       (begin
25         (display "FAIL ") (display label) (newline)
26         (display "Expected:\n") (display expected) (newline)
27         (display "Actual:\n") (display actual) (newline)
28         (exit 1))))
29 
30 (define (main . args)
31   (let* ((ttl-file "sample.ttl")
32          (ttl-content "@prefix ex: <http://example.org/> .
33 ex:s ex:p ex:o .
34 ")
35          (expected-nt "<http://example.org/s> <http://example.org/p> <http://example.org/o> .
36 "))
37     ;; Prepare sample Turtle file
38     (write-file ttl-file ttl-content)
39 
40     ;; Exercise FFI with explicit syntax
41     (let ((nt1 (raptor-parse-file->ntriples ttl-file "turtle")))
42       (assert-equal expected-nt nt1 "turtle -> ntriples"))
43 
44     ;; Exercise FFI with syntax guessing
45     (let ((nt2 (raptor-parse-file->ntriples ttl-file "guess")))
46       (assert-equal expected-nt nt2 "guess -> ntriples"))
47 
48     ;; Clean up
49     (when (file-exists? ttl-file)
50       (delete-file ttl-file))
51 
52     (display "All tests passed.
53 ")))

This code exports main and imports the FFI wrapper. Utility helpers include write-file (persist a string to disk), read-file (characterwise file read; defined but unused here), and assert-equal, which prints PASS/FAIL labels and exits with non-zero status on mismatch. In function main a small Turtle document defines a simple triple using the ex: prefix; the corresponding expected N-Triples string is the fully expanded IRI form with a terminating period and newline.

The test proceeds in two phases: first it calls raptor-parse-file->ntriples ttl-file “turtle” and checks the result; then it repeats using “guess” to confirm the parser’s auto-detection path yields identical serialization. After both assertions pass, it deletes the temporary file and prints “All tests passed.” The result is a minimal but effective smoke test verifying the FFI, Raptor’s parsing/serialization, and the contract that both explicit syntax selection and guessing produce stable N-Triples output.

We use a Makefile to build an executable on macOS:

 1 ##### macOS:
 2 RAPTOR_PREFIX ?= /opt/homebrew/Cellar/raptor/2.0.16
 3 OPENSSL_PREFIX ?= /opt/homebrew/opt/openssl@3
 4 CC_OPTS := -I$(RAPTOR_PREFIX)/include/raptor2
 5 LD_OPTS := -L$(RAPTOR_PREFIX)/lib -lraptor2 -L$(OPENSSL_PREFIX)/lib
 6 
 7 build:
 8     gxc -cc-options "$(CC_OPTS)" -ld-options "$(LD_OPTS)" ffi.ss
 9     gxc -cc-options "$(CC_OPTS)" -ld-options "$(LD_OPTS)" -exe -o test test.ss
10 
11 clean:
12     rm -f *.c *.scm *.o *.so test

Alternatively, the Makefile for Linux can me run using make -f Makefile.linux:

 1 #### Ubuntu Linux
 2 CC_OPTS += $(shell pkg-config --cflags raptor2)
 3 LD_OPTS += $(shell pkg-config --libs raptor2)
 4 
 5 build:
 6     gxc -cc-options "$(CC_OPTS)" -ld-options "$(LD_OPTS)" ffi.ss
 7     gxc -cc-options "$(CC_OPTS)" -ld-options "$(LD_OPTS)" -exe -o test test.ss
 8 
 9 clean:
10     rm -f *.c *.scm *.o *.so test

Test output is:

1 $ ./test
2 PASS turtle -> ntriples
3 PASS guess -> ntriples
4 All tests passed.

Complete FFI Example: C Language Wrapper for Rasqal SPARQL Library and Sord RDF Datastore Library

The example for this chapter is a complete, self-contained bridge between Gerbil Scheme and a well-established C RDF toolchain. It demonstrates how to load RDF data into an in-memory store and execute SPARQL queries from Gerbil via a Foreign Function Interface (FFI). Under the hood it uses Serd and Sord to parse and hold triples, and Rasqal/Raptor to prepare, execute, and serialize SPARQL results. The project ships with both a small C demo CLI to test the C language part of this project and a Gerbil-based client, making it easy to explore end-to-end from raw data files to query results printed on the console.

The core is a compact C wrapper that exposes four functions: initialize the store with a data file (rdf_init), run a SPARQL query and get results as a string (rdf_query_copy), and clean up (rdf_free). Data is parsed with Serd and stored in Sord’s in-memory model; queries are executed with Rasqal, and results are serialized to TSV by Raptor with the column header removed for line-per-row output. On the Gerbil side, rdfwrap.ss defines the FFI bindings and test.ss provides a simple CLI program, illustrating a clean pattern for calling native libraries from Gerbil without excess machinery.

Using the project is straightforward. After installing the required libraries (Homebrew recipes are listed in the README for macOS and there is a separate Linux Makefile for Linux users), make produces a shared library, a C demo (DEMO_rdfwrap), and a Gerbil executable (TEST_client). You can point either binary at a small example dataset (TTL or NT representations are included) and supply a SPARQL SELECT query to print results. The Gerbil client supports sensible defaults, inline queries, and reading queries from files via the @file convention, making it convenient for quick experiments, regression checks, or embedding SPARQL query capabilities into larger Gerbil programs.

This example application aims to be an educational, practical starting point rather than a full-featured RDF database because data is stored strictly in memory. It focuses on clarity and minimal surface area: a tiny C API, a thin FFI layer, and a simple CLI that you can adapt. From here, natural extensions include loading multiple graphs, supporting additional RDF syntaxes, handling named graphs/datasets, improving error reporting, and streaming or structuring results beyond TSV. If you’re learning Gerbil FFI, integrating SPARQL into Scheme, or want a small reference for wiring C libraries into a Lisp workflow, the example in this chapter provides a concise, working template.

Library Selection

Here we will be using Sord which is very lightweight C library for storing RDF triples in memory.

Limitation: Sord itself is just a data store. It does not have a built-in SPARQL engine. However, it’s designed to be used with other libraries, and we will pair it with Rasqal (from the Redland RDF suite) to add SPARQL query capability.

Rasqal is a SPARQL query library. It can parse a SPARQL query and execute it against a graph datastore like Sord or Redland project’s librdf. We could have used librdf with Rasqal in our implementation but I felt that it is a better example wrapping libraries from different projects.

Overview of the Project Structure and Build System

This project is in the directory gerbil_scheme_book/source_code/SparqlRdfStore and contains a sub-directory C-source for our implementation of a C language wrapper for the Sord and Rasqal libraries. The top level project direcory contains Gerbil Scheme source files, example RDF data files, and a file containing a test SPARQL query:

1 $ pwd
2 ~/Users/markw/~GITHUB/gerbil_scheme_book/source_code/SparqlRdfStore
3 $ ls -R
4 C-source    data.nt     Makefile    q.sparql    README.md
5 data.n3     data.ttl    mini.nt     rdfwrap.ss  test.ss
6 
7 ./C-source:
8 wrapper.c

The following Makefile builds the project for macOS:

  1 # Makefile — DEMO binary + FFI library + Gerbil client
  2 
  3 # ---- Prefixes (override on CLI if needed) ----
  4 SORD_PREFIX    ?= /opt/homebrew/opt/sord
  5 SERD_PREFIX    ?= /opt/homebrew/opt/serd
  6 RASQAL_PREFIX  ?= /opt/homebrew/opt/rasqal
  7 RAPTOR_PREFIX  ?= /opt/homebrew/opt/raptor
  8 LIBXML2_PREFIX ?= /opt/homebrew/opt/libxml2
  9 OPENSSL_PREFIX ?= /opt/homebrew/opt/openssl@3
 10 
 11 # ---- Tools ----
 12 CC         ?= cc
 13 PKG_CONFIG ?= pkg-config
 14 GXC        ?= gxc
 15 
 16 # ---- Sources / Outputs ----
 17 SRC_C      := C-source/wrapper.c
 18 OBJ_C      := $(SRC_C:.c=.o)
 19 OBJ_PIC    := C-source/wrapper.pic.o
 20 
 21 DEMO_BIN   := DEMO_rdfwrap
 22 SHLIB      := libRDFWrap.dylib       # macOS
 23 GERBIL_EXE := TEST_client
 24 GERBIL_SRC := test.ss
 25 
 26 # ---- Try pkg-config first (brings transitive libs) ----
 27 HAVE_PKGCFG := $(shell $(PKG_CONFIG) --exists sord-0 serd-0 rasqal raptor2 && echo yes || echo no)
 28 PKG_CFLAGS  := $(if $(filter yes,$(HAVE_PKGCFG)),$(shell $(PKG_CONFIG) --cflags sord-0 serd-0 rasqal raptor2))
 29 PKG_LDLIBS  := $(if $(filter yes,$(HAVE_PKGCFG)),$(shell $(PKG_CONFIG) --libs   sord-0 serd-0 rasqal raptor2))
 30 
 31 # ---- Fallback include/lib flags ----
 32 FALLBACK_CFLAGS := \
 33   -I$(SORD_PREFIX)/include/sord-0 \
 34   -I$(SERD_PREFIX)/include/serd-0 \
 35   -I$(RASQAL_PREFIX)/include \
 36   -I$(RAPTOR_PREFIX)/include/raptor2 \
 37   -I$(LIBXML2_PREFIX)/include/libxml2
 38 
 39 # Core libs if pkg-config is unavailable
 40 FALLBACK_LDLIBS := \
 41   -L$(SORD_PREFIX)/lib   -lsord-0 \
 42   -L$(SERD_PREFIX)/lib   -lserd-0 \
 43   -L$(RASQAL_PREFIX)/lib -lrasqal \
 44   -L$(RAPTOR_PREFIX)/lib -lraptor2 \
 45   -L$(LIBXML2_PREFIX)/lib -lxml2 \
 46 
 47 # Extra libs sometimes needed by transitive deps or gerbil toolchain
 48 EXTRA_LDLIBS := \
 49   -L$(OPENSSL_PREFIX)/lib -lssl -lcrypto \
 50   -liconv -lz -lm
 51 
 52 # ---- Final flags ----
 53 CFLAGS  ?= -Wall -O2
 54 CFLAGS  += $(if $(PKG_CFLAGS),$(PKG_CFLAGS),$(FALLBACK_CFLAGS))
 55 
 56 LDLIBS  += $(if $(PKG_LDLIBS),$(PKG_LDLIBS),$(FALLBACK_LDLIBS)) $(EXTRA_LDLIBS)
 57 LDFLAGS +=
 58 
 59 # For the shared lib on macOS
 60 DYNLIB_LDFLAGS := -dynamiclib -install_name @rpath/$(SHLIB)
 61 
 62 # Gerbil compile/link flags
 63 GERBIL_CFLAGS := $(CFLAGS)
 64 GERBIL_LDOPTS := $(LDLIBS) -L. -lRDFWrap -Wl,-rpath,@loader_path
 65 # Where gxc writes intermediate artifacts; keep it inside workspace
 66 GERBIL_OUT_DIR ?= .gerbil_build
 67 
 68 # ---- Default target ----
 69 all: $(DEMO_BIN) $(SHLIB) $(GERBIL_EXE)
 70 
 71 # ---- Demo binary (with small CLI main) ----
 72 $(DEMO_BIN): $(OBJ_C)
 73     $(CC) -o $@ $^ $(LDFLAGS) $(LDLIBS)
 74 
 75 # Build normal object for the demo; define RDF_DEMO_MAIN to enable main()
 76 C-source/wrapper.o: C-source/wrapper.c
 77     $(CC) $(CFLAGS) -DRDF_DEMO_MAIN -c -o $@ $<
 78 
 79 # ---- Shared library for Gerbil FFI ----
 80 $(SHLIB): $(OBJ_PIC)
 81     $(CC) $(DYNLIB_LDFLAGS) -o $@ $^ $(LDFLAGS) $(LDLIBS)
 82 
 83 # PIC object for dynamic library
 84 C-source/wrapper.pic.o: C-source/wrapper.c
 85     $(CC) $(CFLAGS) -fPIC -c -o $@ $<
 86 
 87 # ---- Gerbil client (assumes test.ss is valid and calls FFI) ----
 88 $(GERBIL_EXE): $(GERBIL_SRC) rdfwrap.ss $(SHLIB)
 89     $(GXC) -d $(GERBIL_OUT_DIR) -cc-options "$(GERBIL_CFLAGS)" -ld-options "$(GERBIL_LDOPTS)" -exe -o $@ rdfwrap.ss $(GERBIL_SRC)
 90 
 91 # ---- Utilities ----
 92 clean:
 93     rm -f $(OBJ_C) $(OBJ_PIC) $(DEMO_BIN) $(SHLIB) $(GERBIL_EXE)
 94     rm -rf $(GERBIL_OUT_DIR)
 95     rm -rf test DEMO_rdfwrap TEST_client libRDFWrap.dylib .gerbil_build
 96 
 97 print-flags:
 98     @echo "HAVE_PKGCFG = $(HAVE_PKGCFG)"
 99     @echo "CFLAGS      = $(CFLAGS)"
100     @echo "LDFLAGS     = $(LDFLAGS)"
101     @echo "LDLIBS      = $(LDLIBS)"
102     @echo "GERBIL_CFLAGS = $(GERBIL_CFLAGS)"
103     @echo "GERBIL_LDOPTS = $(GERBIL_LDOPTS)"
104 
105 .PHONY: all clean print-flags

Alternatively, you can run make -f Makefile.linux to build for Linux:

 1 # Makefile for Ubuntu Linux — DEMO binary + FFI library + Gerbil client
 2 
 3 # ---- Tools ----
 4 CC         ?= cc
 5 PKG_CONFIG ?= pkg-config
 6 GXC        ?= gxc
 7 
 8 # ---- Sources / Outputs ----
 9 SRC_C      := C-source/wrapper.c
10 OBJ_C      := $(SRC_C:.c=.o)
11 OBJ_PIC    := C-source/wrapper.pic.o
12 
13 DEMO_BIN   := DEMO_rdfwrap
14 SHLIB      := libRDFWrap.so          # Linux shared library
15 GERBIL_EXE := TEST_client
16 GERBIL_SRC := test.ss
17 
18 # ---- Library Flags (using pkg-config) ----
19 # Use pkg-config to get compiler and linker flags for dependencies.
20 # This is the standard way on Linux and avoids hardcoded paths.
21 PKG_CFLAGS := $(shell $(PKG_CONFIG) --cflags sord-0 serd-0 rasqal raptor2)
22 PKG_LDLIBS := $(shell $(PKG_CONFIG) --libs   sord-0 serd-0 rasqal raptor2)
23 
24 # Extra libs sometimes needed by transitive deps or gerbil toolchain
25 EXTRA_LDLIBS := -lssl -lcrypto -lz -lm
26 
27 # ---- Final flags ----
28 CFLAGS  ?= -Wall -O2
29 CFLAGS  += $(PKG_CFLAGS)
30 
31 LDLIBS  += $(PKG_LDLIBS) $(EXTRA_LDLIBS)
32 LDFLAGS +=
33 
34 # Linker flags for the shared library on Linux
35 DYNLIB_LDFLAGS := -shared -Wl,-soname,$(SHLIB)
36 
37 # Gerbil compile/link flags
38 GERBIL_CFLAGS := $(CFLAGS)
39 # Use $$ORIGIN for the rpath on Linux. This tells the executable to look for
40 # the shared library in its own directory. The '$$' escapes the '$' for Make.
41 GERBIL_LDOPTS := -L$(CURDIR) -lRDFWrap $(LDLIBS) -Wl,-rpath,'$$ORIGIN'
42 
43 # Where gxc writes intermediate artifacts; keep it inside workspace
44 GERBIL_OUT_DIR ?= .gerbil_build
45 
46 # ---- Default target ----
47 all: $(DEMO_BIN) $(SHLIB) $(GERBIL_EXE)
48 
49 # ---- Demo binary (with small CLI main) ----
50 $(DEMO_BIN): $(OBJ_C)
51     $(CC) -o $@ $^ $(LDFLAGS) $(LDLIBS)
52 
53 # Build normal object for the demo; define RDF_DEMO_MAIN to enable main()
54 C-source/wrapper.o: C-source/wrapper.c
55     $(CC) $(CFLAGS) -DRDF_DEMO_MAIN -c -o $@ $<
56 
57 # ---- Shared library for Gerbil FFI ----
58 $(SHLIB): $(OBJ_PIC)
59     $(CC) $(DYNLIB_LDFLAGS) -o $@ $^ $(LDFLAGS) $(LDLIBS)
60 
61 # PIC object for dynamic library
62 C-source/wrapper.pic.o: C-source/wrapper.c
63     $(CC) $(CFLAGS) -fPIC -c -o $@ $<
64 
65 # ---- Gerbil client (assumes test.ss is valid and calls FFI) ----
66 $(GERBIL_EXE): $(GERBIL_SRC) rdfwrap.ss $(SHLIB)
67     $(GXC) -d $(GERBIL_OUT_DIR) -cc-options "$(GERBIL_CFLAGS)" -ld-options "$(GERBIL_LDOPTS)" -exe -o $@ rdfwrap.ss $(GERBIL_SRC)
68 
69 # ---- Utilities ----
70 clean:
71     rm -f $(OBJ_C) $(OBJ_PIC) $(DEMO_BIN) $(SHLIB) $(GERBIL_EXE)
72     rm -rf $(GERBIL_OUT_DIR)
73 
74 print-flags:
75     @echo "CFLAGS        = $(CFLAGS)"
76     @echo "LDFLAGS       = $(LDFLAGS)"
77     @echo "LDLIBS        = $(LDLIBS)"
78     @echo "GERBIL_CFLAGS = $(GERBIL_CFLAGS)"
79     @echo "GERBIL_LDOPTS = $(GERBIL_LDOPTS)"
80 
81 .PHONY: all clean print-flags

Implementation of the C Language Wrapper

Here we write a wrapper that will later be called from Gerbil Scheme code.

The following listing shows the wrapper C-source/wrapper.c. This C wrapper provides a simplified, high-level interface for handling RDF data by leveraging the combined power of the serd, sord, rasqal, and raptor2 libraries. The primary goal of this wrapper is to abstract away the complexities of library initialization, data parsing, query execution, and results serialization. It exposes a minimal API for loading an RDF graph from a Turtle file and executing SPARQL queries against it, returning the results in a straightforward, easy-to-parse string format. This makes it an ideal solution for applications that need to embed RDF query functionality without managing the intricate details of the underlying RDF processing stack.

  1 // C-source/wrapper.c
  2 #define _GNU_SOURCE
  3 #include <stdio.h>
  4 #include <stdlib.h>
  5 #include <string.h>
  6 #include <stdbool.h>
  7 
  8 #include <serd/serd.h>
  9 #include <sord/sord.h>
 10 #include <rasqal.h>
 11 #include <raptor2.h>
 12 
 13 static SordWorld *g_world = NULL;
 14 static SordModel *g_model = NULL;
 15 static SerdEnv   *g_env   = NULL;
 16 static char      *g_data_path = NULL;
 17 
 18 /* ---------- Load RDF into Sord via Serd ---------- */
 19 static int load_turtle_into_sord(const char *path){
 20   SerdURI base_uri = SERD_URI_NULL;
 21   SerdNode base = serd_node_new_file_uri((const uint8_t*)path, NULL, &base_uri, true);
 22   g_env = serd_env_new(&base);
 23   SerdReader *reader = sord_new_reader(g_model, g_env, SERD_TURTLE, NULL);
 24   if(!reader){ serd_node_free(&base); return -1; }
 25   SerdStatus st = serd_reader_read_file(reader, (const uint8_t*)path);
 26   serd_reader_free(reader);
 27   serd_node_free(&base);
 28   return st ? -1 : 0;
 29 }
 30 
 31 /* ---------- Public API ---------- */
 32 int rdf_init(const char *n3_path){
 33   g_world = sord_world_new();
 34   unsigned idx = SORD_SPO|SORD_OPS|SORD_PSO;
 35   g_model = sord_new(g_world, idx, false);
 36   if(load_turtle_into_sord(n3_path)) return -1;
 37   free(g_data_path);
 38   g_data_path = strdup(n3_path);
 39   return 0;
 40 }
 41 
 42 char* rdf_query(const char *sparql){
 43   if(!g_data_path) return NULL;
 44   rasqal_world *rw = rasqal_new_world(); if(!rw) return NULL;
 45   if(rasqal_world_open(rw)){ rasqal_free_world(rw); return NULL; }
 46 
 47   rasqal_query *q = rasqal_new_query(rw, "sparql", NULL);
 48   if(!q){ rasqal_free_world(rw); return NULL; }
 49   if(rasqal_query_prepare(q, (const unsigned char*)sparql, NULL)){
 50     rasqal_free_query(q); rasqal_free_world(rw); return NULL;
 51   }
 52 
 53   raptor_world *rapw = rasqal_world_get_raptor(rw);
 54   raptor_uri *file_uri = raptor_new_uri_from_uri_or_file_string(rapw, NULL, (const unsigned char*)g_data_path);
 55   if(!file_uri){ rasqal_free_query(q); rasqal_free_world(rw); return NULL; }
 56 
 57   rasqal_data_graph *dg = rasqal_new_data_graph_from_uri(rw, file_uri, NULL, RASQAL_DATA_GRAPH_BACKGROUND, NULL, NULL, NULL);
 58   if(!dg){ raptor_free_uri(file_uri); rasqal_free_query(q); rasqal_free_world(rw); return NULL; }
 59   if(rasqal_query_add_data_graph(q, dg)){
 60     rasqal_free_data_graph(dg); raptor_free_uri(file_uri);
 61     rasqal_free_query(q); rasqal_free_world(rw); return NULL;
 62   }
 63 
 64   rasqal_query_results *res = rasqal_query_execute(q);
 65   if(!res){ rasqal_free_query(q); rasqal_free_world(rw); return NULL; }
 66 
 67   /* Serialize results as TSV to a malloc'd string */
 68 char *buf = NULL; size_t buflen = 0;
 69 raptor_iostream *ios = raptor_new_iostream_to_string(rapw, (void**)&buf, &buflen, malloc);
 70 if(!ios){ rasqal_free_query_results(res); rasqal_free_query(q); rasqal_free_world(rw); return NULL; }
 71 
 72 /* One call handles SELECT/ASK/DESCRIBE/CONSTRUCT */
 73 if(rasqal_query_results_write(ios, res, "tsv", NULL, NULL, NULL) != 0){
 74   raptor_free_iostream(ios);
 75   if(buf) free(buf);
 76   rasqal_free_query_results(res); rasqal_free_query(q); rasqal_free_world(rw);
 77   return NULL;
 78 }
 79 raptor_free_iostream(ios);
 80 
 81 /* Optional: drop TSV header line so output is exactly one line per row */
 82 if(buf){
 83   char *nl = strchr(buf, '\n');
 84   if(nl && nl[1]) {
 85     char *body = strdup(nl+1);
 86     free(buf);
 87     buf = body;
 88   }
 89 }
 90 
 91 rasqal_free_query_results(res);
 92 rasqal_free_query(q);
 93 rasqal_free_world(rw);
 94 return buf;
 95 }
 96 
 97 void rdf_free(void){
 98   if(g_env) serd_env_free(g_env);
 99   if(g_model) sord_free(g_model);
100   if(g_world) sord_world_free(g_world);
101   g_env=NULL; g_model=NULL; g_world=NULL;
102   free(g_data_path); g_data_path=NULL;
103 }
104 
105 char* rdf_query_copy(const char *sparql){
106   char *s = rdf_query(sparql);
107   if (!s) return NULL;
108   size_t n = strlen(s);
109   char *out = (char*)malloc(n+1);
110   if (!out){ free(s); return NULL; }
111   memcpy(out, s, n+1);
112   free(s);
113   return out;
114 }
115 
116 /* Optional: tiny demo main if you want a CLI.
117    Compile by adding -DRDF_DEMO_MAIN to CFLAGS, then:
118    ./rdfwrap news.n3 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3'
119 */
120 #ifdef RDF_DEMO_MAIN
121 int main(int argc, char** argv){
122   if(argc < 3){ fprintf(stderr,"usage: %s <data.ttl> <sparql>\n", argv[0]); return 1; }
123   if(rdf_init(argv[1])){ fprintf(stderr,"failed to load %s\n", argv[1]); return 2; }
124   char* s = rdf_query(argv[2]);
125   if(!s){ fprintf(stderr,"query failed\n"); rdf_free(); return 3; }
126   fputs(s, stdout); free(s);
127   rdf_free(); return 0;
128 }
129 #endif

The core functionality begins with initialization and data loading, handled by the rdf_init() function. This function sets up the necessary in-memory storage by creating a SordWorld and a SordModel, which serve as the container for the RDF graph. It then calls the internal helper function load_turtle_into_sord() that uses the serd parser to efficiently read and load the triples from a specified Turtle (.ttl) file into the sord model. This process establishes the in-memory database that all subsequent queries will be executed against. The path to the data file is stored globally for later use by the query engine.

Once the data is loaded, the rdf_query() function provides the mechanism for executing SPARQL queries. It orchestrates the rasqal query engine to parse and prepare the SPARQL query string. The function then uses the raptor2 library to create a URI reference to the original data file and associates it with the query as a data graph. After executing the query, rasqal and raptor work together to serialize the query results into a Tab-Separated Values (TSV) formatted string. As a final convenience, the code processes this string to remove the header row, ensuring that the returned buffer contains only the result data, with each row representing a solution.

Finally, the wrapper provides essential memory management and utility functions. The rdf_free() function is responsible for cleanly deallocating all resources, including the sord world and model, the serd environment, and any other globally allocated memory, preventing memory leaks. The rdf_query_copy() function is a simple convenience utility that executes a query via rdf_query() and returns a new, separately allocated copy of the result string, which can be useful for certain memory management patterns. The code also includes an optional main function, enabled by the RDF_DEMO_MAIN macro, which demonstrates the wrapper’s usage and allows it to function as a standalone command-line tool for quick testing.

A Gerbil Scheme Shim to Call The C Language Wrapper Code

The shim code is n the source file rdfwrap.ss:

 1 ;; rdfwrap.ss — minimal FFI for libRDFWrap.dylib using :std/foreign
 2 (import :std/foreign)
 3 (export rdf-init rdf-query rdf-free)
 4 
 5 ;; Wrap FFI forms in begin-ffi so helper macros are available and exports are set
 6 (begin-ffi (rdf-init rdf-query rdf-free)
 7   ;; Declare the C functions provided by libRDFWrap.dylib
 8   (c-declare "
 9     #include <stdlib.h>
10     int   rdf_init(const char*);
11     char* rdf_query_copy(const char*);
12     void  rdf_free(void);
13   ")
14 
15   ;; FFI bindings
16   (define-c-lambda rdf-init  (char-string) int         "rdf_init")
17   (define-c-lambda rdf-query (char-string) char-string "rdf_query_copy")
18   (define-c-lambda rdf-free  ()            void        "rdf_free"))

This Scheme source file serves as a crucial Foreign Function Interface (FFI) shim, creating a bridge between the high-level Scheme environment and the low-level C functions compiled into the libRDFWrap.dylib shared library. Its purpose is to “wrap” the C functions, making them directly callable from Scheme code as if they were native procedures. By handling the data type conversions and function call mechanics, this shim abstracts away the complexity of interoperating between the two languages, providing a clean and idiomatic Scheme API for the underlying RDF processing engine.

The entire FFI definition is encapsulated within a (begin-ffi …) block, which sets up the necessary context for interfacing with foreign code. Inside this block, the first step is the c-declare form. This form contains a string of C code that declares the function prototypes for the C library’s public API. By providing the signatures for rdf_init, rdf_query_copy, and rdf_free, it informs the Scheme FFI about the exact names, argument types, and return types of the C functions it needs to connect with. This declaration acts as a contract, ensuring that the Scheme bindings will match the expectations of the compiled C library.

Following the declaration, the script defines the actual Scheme procedures using define-c-lambda. Each of these forms creates a direct binding from a new Scheme function to a C function. For instance, (define-c-lambda rdf-init …) creates a Scheme function named rdf-init that calls the C function of the same name. This form also specifies the marshalling of data types between the two environments, such as converting a Scheme string to a C char-string (const char*).

Notably, the Scheme function rdf-query is explicitly mapped to the C function rdf_query_copy. This is a deliberate design choice to simplify memory management. The rdf_query_copy function in C returns a distinct, newly allocated copy of the result string. This prevents the Scheme garbage collector from trying to manage memory that was allocated by the C library’s malloc, avoiding potential conflicts and memory corruption. The final binding for rdf-free provides a way to call the C cleanup function, ensuring that all resources allocated by the C library are properly released.

Gerbil Scheme Main Program for this Example

The main test program is in the file test.ss:

 1 (import "rdfwrap" :std/srfi/13)
 2 (export main)
 3 
 4 (define default-query "SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
 5 
 6 (define (usage name)
 7   (display "Usage: ") (display name) (display " [data-file [query]]\n")
 8   (display "  data-file: RDF file (ttl/n3/nt). Default: mini.nt\n")
 9   (display "  query    : SPARQL SELECT query or @file to read from file. Default: ")
10   (display default-query) (newline))
11 
12 (define (main . args)
13   (let* ((prog (car (command-line)))
14          (path (if (pair? args) (car args) "mini.nt"))
15          (rawq (if (and (pair? args) (pair? (cdr args))) (cadr args) default-query))
16          (query (if (string-prefix? "@" rawq)
17                      (let ((f (substring rawq 1 (string-length rawq))))
18                        (call-with-input-file f
19                          (lambda (p)
20                            (let loop ((acc '()))
21                              (let ((ch (read-char p)))
22                                (if (eof-object? ch)
23                                    (list->string (reverse acc))
24                                    (loop (cons ch acc))))))))
25                      rawq)))
26     (when (or (string=? path "-h") (string=? path "--help"))
27       (usage prog)
28       (exit 0))
29     (unless (zero? (rdf-init path)) (error "rdf-init failed" path))
30     (let ((out (rdf-query query)))
31       (when out (display out)))
32     (rdf-free)
33     (exit 0)))

This Scheme script provides a user-friendly command-line interface (CLI) for the underlying C-based RDF wrapper. It acts as a high-level controller, responsible for parsing user input, managing file operations, and invoking the core RDF processing functions exposed by the C library. The script allows a user to specify an RDF data file and a SPARQL query directly on the command line or from a file. By handling argument parsing and I/O, it simplifies the process of interacting with the RDF engine, making it accessible and easy to use for quick queries and testing without needing to write and compile C code.

The script’s main function serves as the primary entry point and is responsible for processing command-line arguments. It intelligently determines the RDF data file path and the SPARQL query string from the arguments provided by the user. If the user omits these arguments, the script falls back to sensible defaults: “mini.nt” for the data file and a simple SELECT query to fetch all triples. Furthermore, it includes a basic help mechanism, displaying a usage message if the user provides “-h” or “—help” as an argument, guiding them on the correct command structure.

A key feature of the script is its ability to read the SPARQL query from two different sources. By default, it treats the command-line argument as the query string itself. However, if the query argument is prefixed with an “@” symbol (e.g., @myquery.sparql), the script interprets the rest of the string as a filename. It then proceeds to open and read the entire contents of that file, loading it into memory as the query to be executed. This flexibility allows users to easily run complex, multi-line queries that would be cumbersome to type directly into the terminal.

After parsing the inputs, the script interfaces directly with the C wrapper’s functions. It first calls rdf-init to load the specified RDF data file into the in-memory model. If initialization is successful, it passes the prepared SPARQL query to the rdf-query function, which executes the query and returns the results as a single string. The script then prints this result string to standard output. Finally, to ensure proper resource management and prevent memory leaks, it calls rdf-free to clean up the C library’s allocated resources before the program terminates.

Example Output

Test the C library code:

 1 $ ./DEMO_rdfwrap mini.nt "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"
 2 <http://example.org/article1>   <http://purl.org/dc/elements/1.1/title> "AI Breakthrough Announced"
 3 <http://example.org/article1>   <http://purl.org/dc/elements/1.1/creator>   <http://example.org/alice>
 4 <http://example.org/alice>  <http://xmlns.com/foaf/0.1/name>    "Alice Smith"
 5 
 6 $ ./DEMO_rdfwrap data.ttl "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"
 7 <http://example.org/article1>   <http://purl.org/dc/elements/1.1/title> "AI Breakthrough Announced"
 8 <http://example.org/article1>   <http://purl.org/dc/elements/1.1/creator>   <http://example.org/alice>
 9 <http://example.org/article1>   <http://purl.org/dc/elements/1.1/date>  "2025-08-27"
10 <http://example.org/article2>   <http://purl.org/dc/elements/1.1/title> "Local Team Wins Championship"
11 <http://example.org/article2>   <http://purl.org/dc/elements/1.1/creator>   <http://example.org/bob>
12 <http://example.org/alice>  <http://xmlns.com/foaf/0.1/name>    "Alice Smith"
13 <http://example.org/bob>    <http://xmlns.com/foaf/0.1/name>    "Bob Jones"

Test the Gerbil Scheme client code:

 1 # Usage: ./TEST_client [data-file [query]]
 2 # Defaults to data-file=mini.nt and a simple SELECT * pattern
 3 # You can load the query from a file by prefixing with '@' (e.g., @query.sparql)
 4 $ ./TEST_client
 5 <http://example.org/article1>   <http://purl.org/dc/elements/1.1/title> "AI Breakthrough Announced"
 6 <http://example.org/article1>   <http://purl.org/dc/elements/1.1/creator>   <http://example.org/alice>
 7 <http://example.org/alice>  <http://xmlns.com/foaf/0.1/name>    "Alice Smith"
 8 
 9 # Specify a data file and a custom query
10 $ ./TEST_client data.ttl "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
11 <http://example.org/article1>
12 <http://example.org/article2>
13 <http://example.org/alice>
14 <http://example.org/bob>
15 
16 # Or read the query from a file
17 $ cat > q.sparql <<'Q'
18 SELECT ?s WHERE { ?s ?p ?o } LIMIT 3
19 Q
20 $ ./TEST_client data.ttl @q.sparql
21 <http://example.org/article1>
22 <http://example.org/article2>
23 <http://example.org/alice>

Introduction To Writing Command Line Utilities

Writing command line utilities is one of the most rewarding ways to use Gerbil Scheme. A small, fast executable that does one thing well can become a trusted part of your daily workflow, and Gerbil makes this style of development unusually pleasant. With a standard library that includes tools for argument parsing, clean error reporting, and multi-call interfaces, you can focus on the logic of your utility rather than boilerplate. The result is code that feels both expressive and pragmatic: terse enough to fit in a few dozen lines, yet powerful enough to interoperate with your wider toolchain. In a world of ever-growing frameworks and layers of abstraction, a lean Gerbil command line program can be a breath of fresh air.

The Gerbil ecosystem is designed around building and packaging binaries, so distribution is straightforward. A single source file can be compiled into a standalone executable, or for more ambitious projects, you can define a build script that packages multiple tools under one roof. Command line options are handled declaratively with getopt, while helper libraries provide standard exit codes, usage banners, and even multi-call support for utilities that expose multiple subcommands. This structured approach encourages writing utilities that are not only functional but also friendly to end users, programs that behave consistently and provide clear feedback. Once you have mastered this pattern, it becomes natural to treat Gerbil as both a systems language and a scripting tool.

Equally important is the mindset that comes with writing command line utilities. Each utility should feel like a sharp instrument, simple to invoke with predictable behavior and minimal dependencies. Gerbil lets you achieve this balance while still giving you access to higher-level abstractions when needed, whether you are parsing JSON, making HTTP requests, or wrapping a C library through the foreign function interface. This chapter will show how to construct utilities using both a simple structure and then using a more flexible structure by providing minimal “hello world” executable for each approach. We will explore both the mechanics of building programs, and also the design philosophy that makes small tools enduringly useful.

Overview of the Command Line Utilities Project Structure and Build System

Here we look at two examples of ways to structure Gerbil Scheme command line applications set up in two project directories:

command_line_utilities_first_demo_START_HERE - This is the simplest structure and the one I usually use.
command_line_utilities - This is a more general purpose structure using a separate lib.ss and main.ss source files.

Simple Structure for Command Line Utilities

Let’s take a look at the directory structure and code, then build and run the example:

 1 $ pwd
 2 command_line_utilities_first_demo_START_HERE
 3 $ ls
 4 README.md   test-tool.ss
 5 $ gxc -exe -o test-tool test-tool.ss
 6 $ ls -lh
 7 total 128
 8 -rw-r--r--  1 markw  staff   206B Aug 30 12:40 README.md
 9 -rwxr-xr-x  1 markw  staff    54K Aug 30 12:39 test-tool
10 -rw-r--r--@ 1 markw  staff   601B Aug 30 12:33 test-tool.ss
11 $ ./test-tool --name Mark -v 
12 verbose on
13 Hello, Mark
14 $ rm test-tool
15 $

This program (file test-mytool.ss) demonstrates how to construct a complete, albeit simple, command-line utility in Gerbil Scheme. It leverages the powerful :std/cli/getopt library to handle argument parsing, a common requirement for such tools. The script is designed to accept two specific command-line options: a mandatory —name option that requires a string value, and an optional boolean flag, —verbose (or its short form -v), which toggles additional output. The program’s logic includes robust argument validation, printing a usage message and exiting if the required name is not provided. Upon successful execution, it greets the user by name, optionally printing a “verbose on” message if the corresponding flag was set, showcasing a standard pattern for creating interactive and user-friendly command-line applications.

The example file test-tool.ss:

 1 ;; test-tool.ss
 2 (export main)
 3 (import :std/cli/getopt :std/cli/print-exit)
 4 (import :std/format) ;; for 'format'
 5 
 6 (def (usage)
 7   (print-exit 2 "usage: mytool [-v] --name=STR [ARGS...]"))
 8 
 9 (def (main . argv)
10   (let* ((parser (getopt (flag 'verbose "-v" "--verbose")
11                          (option 'name "--name" help: "Your name" value: identity)))
12          (opts (getopt-parse parser argv))
13          (name (hash-get opts 'name))
14          (verbose (hash-get opts 'verbose)))
15     (unless name (usage))
16     (when verbose (display "verbose on\n"))
17     (displayln (format "Hello, ~a\n" name))
18     0))

Let’s dissect the code’s structure. At the top, we import the necessary libraries: :std/cli/getopt for parsing arguments, :std/cli/print-exit for our usage message, and :std/format for string interpolation. The usage function is a standard convention for command-line tools, providing help text and exiting with a non-zero status code to indicate an error. The core of the argument parsing logic is defined in the call to getopt, where we create a parser. We define a boolean flag named verbose triggered by either -v or —verbose, and a required option named name which expects a value, processed here by the identity function (meaning, we take the provided string as-is).

The main function orchestrates the program’s execution. It receives the command-line arguments as a list in argv and passes them to getopt-parse, which returns a hash table of options. We then use hash-get to extract the values associated with the ’name and ’verbose symbols. The logic then proceeds with essential validation using unless to ensure the name option was provided, calling usage if it wasn’t. The when form conditionally prints the verbose message. Finally, the program prints its greeting using format and displayln, and returns 0, the conventional exit code for successful completion.

A More Flexible Structure for Command Line Utilities

This next listing of a Bash shell interaction demonstrates a ccommand-line utility project written in Gerbil Scheme. The terminal session walks through the entire lifecycle of the project, starting with an inspection of the directory structure. You’ll see the standard Gerbil project files: a gerbil.pkg file to declare the package, a build.ss script that defines how to compile the source code, and a Makefile for convenience. The source code (to be listed ater) is split into a main.ss entrypoint and a lib.ss library module. The session continues by showing the compilation process using make, which invokes Gerbil’s package manager, gxpkg. Finally, we execute the compiled program, testing its command-line argument parsing and its ability to perform a simple action, in this case printing the current working directory, similar to the Unix pwd command. This example serves as a practical template for structuring, building, and running standalone applications with Gerbil.

 1 $ pwd
 2 /Users/markw/GITHUB/gerbil_scheme_book/source_code/command_line_utilities
 3 Marks-Mac-mini:command_line_utilities $ ls
 4 build.ss        gerbil.pkg      manifest.ss
 5 command_line_utilities  Makefile        README.md
 6 Marks-Mac-mini:command_line_utilities $ ls -l command_line_utilities 
 7 total 16
 8 -rw-r--r--  1 markw  staff   313 Aug 31 12:28 lib.ss
 9 -rw-r--r--  1 markw  staff  1426 Aug 31 12:31 main.ss
10 Marks-Mac-mini:command_line_utilities $ cat gerbil.pkg 
11 (package: markw)
12 Marks-Mac-mini:command_line_utilities $ cat build.ss 
13 #!/usr/bin/env gxi
14 ;;; -*- Gerbil -*-
15 (import :std/build-script)
16 
17 (defbuild-script
18   '("command_line_utilities/lib"
19     (exe: "command_line_utilities/main" bin: "command_line_utilities")))
20 Marks-Mac-mini:command_line_utilities $ make
21 /opt/homebrew/bin/gxpkg deps -i
22 /opt/homebrew/bin/gxpkg build
23 ... build in current directory
24 Marks-Mac-mini:command_line_utilities $ .gerbil/bin/command_line_utilities --help
25 Unknown argument: --help
26 usage: command_line_utilities [pwd|ls] [--file PATH]
27 Marks-Mac-mini:command_line_utilities $ .gerbil/bin/command_line_utilities pwd
28 /Users/markw/GITHUB/gerbil_scheme_book/source_code/command_line_utilities/

The project’s structure and build process are quintessential Gerbil. The gerbil.pkg file simply declares the top-level namespace, markw, for the project. The core of the build logic resides in build.ss, which uses the defbuild-script macro from Gerbil’s standard build library. It declares two targets: the first compiles command_line_utilities/lib.ss into a library, and the second, more interestingly, compiles command_line_utilities/main.ss into an executable file. The exe: keyword specifies the main source file, while the bin: keyword defines the name of the resulting binary, command_line_utilities, which is placed in the local .gerbil/bin/ directory upon compilation.

Wrap Up for Writing Command Line Utilities in Gerbil Scheme and Notes On User Built Libraries

The material here serves as a tutorial for getting started. This book is a work in progress: the next two chapters (currently being written) are additional command line application examples.

Notes On User Installed Libraries

As you work through the examples in this book, you might ask yourself where compiled files and libraries you create are stored. Take a look in the /.gerbil/lib directory on your laptop or server. The ~/.gerbil/lib directory is the default location for user installed Gerbil Scheme libraries and modules. When you use Gerbil’s package manager gxi -d or compile your own modules, the compiled artifacts are placed here so the Gerbil runtime and compiler can find them:

 1 $ pwd
 2 /Users/markw/.gerbil/lib
 3 Marks-Mac-mini:lib $ ls
 4 dbpedia     ffi.o4      ffi~0.o13   ffi~0.o9    stop-words~0.o1
 5 ffi.o1      ffi.o5      ffi~0.o14   gemini      test.o1
 6 ffi.o10     ffi.o6      ffi~0.o15   groq        test.o2
 7 ffi.o11     ffi.o7      ffi~0.o16   hello       test.o3
 8 ffi.o12     ffi.o8      ffi~0.o2    markw       test.ssi
 9 ffi.o13     ffi.o9      ffi~0.o3    ollama      test~0.o1
10 ffi.o14     ffi.ssi     ffi~0.o4    openai      test~0.o2
11 ffi.o15     ffi~0.o1    ffi~0.o5    openai-client   test~0.o3
12 ffi.o16     ffi~0.o10   ffi~0.o6    static
13 ffi.o2      ffi~0.o11   ffi~0.o7    stop-words.o1
14 ffi.o3      ffi~0.o12   ffi~0.o8    stop-words.ssi
15 $ ls ollama
16 ollama.o1   ollama.o4   ollama.o7   ollama~0.o2 ollama~0.o5
17 ollama.o2   ollama.o5   ollama.ssi  ollama~0.o3 ollama~0.o6
18 ollama.o3   ollama.o6   ollama~0.o1 ollama~0.o4 ollama~0.o7

The compiled stop-words files were created running the examples in the previous NLP chapter. ollama, openai, etc. were created in the chapters covering Large Language Models (LLMS), etc.

Gerbil maintains a search path for modules. ~/.gerbil/lib is a standard entry in this path, allowing you to (import …) your own or third-party code without specifying a full path.

Gerbil Scheme is an ahead-of-time (AOT) compiled language that uses Gambit-C as a backend. When you compile a module, it produces several files:

Object Files (.o*): These are the compiled object code for your Scheme files. The number suffix (e.g., .o1, .o2) often relates to compilation passes or optimization levels. The files with a tilde (~) are typically temporary or backup files from the compilation process.
Module Metadata (.ssi): This “Scheme Source Information” file contains metadata about the compiled module, including its dependencies, exported symbols, and macros. This file is essential for the (import) system to work correctly.

Command Line Applications For NLP

We will use both my NLP library from a prior chapter as well as the library for using OpenAI’s GPT-5 model to write two command line utilities.

Re-using the NLP Library For a Command Line Utility to Identify Categories or Topics in Input Text

This program is a command-line utility written in Gerbil Scheme that acts as a wrapper for a Natural Language Processing (NLP) library (that we saw in an earlier chapter) to perform text classification. The script is designed to be highly versatile, capable of accepting text input either directly from command-line arguments or from a piped stream via standard input, making it a useful component in a shell-based data processing pipeline. Its core purpose is to take a block of text, send it to the NLP engine for analysis, and then extract a list of classification categories from the resulting data structure. After extracting these categories, it performs a minor cleanup operation on each one—removing a trailing .txt file extension—before printing each clean category name on a new line to the console.

The code for this example can be used interpretively using gxi or compiled into a static application.

Listing for categories.ss:

 1 #!/usr/bin/env gxi
 2 
 3 (import :nlp/main
 4         :nlp/nlp
 5         :std/iter
 6         :std/misc/ports)  ;; read-all-as-string
 7 
 8 (export main)
 9 
10 (def get-categories (lambda (x) (map car (cadddr x))))
11 
12 (def (remove-txt str substr)
13   (if (string-suffix? substr str)
14       (substring str 0 (- (string-length str) (string-length substr)))
15       str))
16 
17 (def (main . args)
18   (let* ((input-str (if (null? args)
19                         (read-all-as-string (current-input-port)) ; piped stdin
20                         (string-join args " ")))                  ; argv
21          (response (nlp/main#process-string input-str)))
22     (for (s (get-categories response))
23       (displayln (remove-txt s ".txt"))))
24 )

This command line utility can either use all command line arguments, joined as a string with interleaving spaces, or text can be piped into the utility via stdin.

The core logic resides within the main function which first determines the source of the input text. The (if (null? args) …) expression is a standard idiom for a command-line tool, allowing it to function seamlessly whether it’s receiving data through a pipe or as direct arguments. After capturing the input, the program makes a single call to the external nlp/main#process-string function, which handles the complex task of NLP analysis. The program concludes by iterating through the results with a for loop, processing each identified category and printing it to standard output using displayln.

Two small helper functions provide insight into the data the script is handling. The get-categories function uses a combination of cadddr and map car, which tells us that it expects a very specific nested list structure from the NLP library and is designed to extract a list of names from it. This tight coupling to the library’s output format is common in wrapper scripts. The second helper function remove-txt is a simple string manipulation function whose existence is required because the raw category data from the model includes file extensions that are not desired in the final output.

Here we build an executable and run both the compiled and linked executable and also run interpretivey using gxi. We also take input from the command line or via stdin:

 1 $ gxc -O -exe -o categories categories.ss                             
 2 /tmp/gxc.1757624261.449784/nlp__utils.scm:
 3 /tmp/gxc.1757624261.449784/nlp__fasttag.scm:
 4 /tmp/gxc.1757624261.449784/nlp__proper-names.scm:
 5 /tmp/gxc.1757624261.449784/nlp__category.scm:
 6 /tmp/gxc.1757624261.449784/nlp__main.scm:
 7 /tmp/gxc.1757624261.449784/nlp__nlp.scm:
 8 /tmp/gxc.1757624261.449784/categories.scm:
 9 /Users/markw/GITHUB/gerbil_scheme_book/source_code/command_line_NLP/categories__exe.scm:
10 /tmp/gxc.1757624261.449784/nlp__utils.c:
11 /tmp/gxc.1757624261.449784/nlp__fasttag.c:
12 /tmp/gxc.1757624261.449784/nlp__proper-names.c:
13 /tmp/gxc.1757624261.449784/nlp__category.c:
14 /tmp/gxc.1757624261.449784/nlp__main.c:
15 /tmp/gxc.1757624261.449784/nlp__nlp.c:
16 /tmp/gxc.1757624261.449784/categories.c:
17 /Users/markw/GITHUB/gerbil_scheme_book/source_code/command_line_NLP/categories__exe.c:
18 /Users/markw/GITHUB/gerbil_scheme_book/source_code/command_line_NLP/categories__exe_.c:
19 Marks-Mac-mini:command_line_NLP $ ./categories health exercise vitamines 
20 health_exercise
21 health_nutrition
22 Marks-Mac-mini:command_line_NLP $ ./categories.ss health exercise vitamines
23 health_exercise
24 health_nutrition
25 Marks-Mac-mini:command_line_NLP $ cat ~/GITHUB/Clojure-AI-Book-Code/docs_qa/data/health.txt | ./categories.ss
26 computers_ai_search
27 health
28 Marks-Mac-mini:command_line_NLP $ cat ~/GITHUB/Clojure-AI-Book-Code/docs_qa/data/health.txt | ./categories   
29 computers_ai_search
30 health

Using OpenAI’s GPT-5 model To Summarize Input Text

The example file summarize.ss is similar to the previous listing of categories.ss except we use the OpenAI GPT-5 odel with a system prompt instructing the model to summarize input text:

 1 #!/usr/bin/env gxi
 2 
 3 (import :openai/openai
 4         :std/iter
 5         :std/misc/ports)  ;; read-all-as-string
 6 
 7 (export main)
 8 
 9 (def system-prompt
10 #<<PTEXT
11 You ar a master at libguistics, technology, and general knowledge. When given input
12 text you return an accurate summary of the text. You only return the summary,
13 no other text.
14 
15 HERE IS THE TEXT TO SUMMARIZE:
16 
17 PTEXT
18 )
19 
20 
21 (def (main . args)
22   (let* ((input-str (if (null? args)
23                         (read-all-as-string (current-input-port)) ; piped stdin
24                         (string-join args " ")))                  ; argv
25          (response (openai/openai#openai (string-append input-str system-prompt))))
26     (displayln response)))

As in the last example, we build a compiled and linked executable but run the script both with the gxi interpreter and then again with different input text using the executable file:

 1 $ gxc -O -exe -o summarize summarize.ss
 2 /tmp/gxc.1757631133.993665/openai__openai.scm:
 3 /tmp/gxc.1757631133.993665/summarize.scm:
 4 /Users/markw/GITHUB/gerbil_scheme_book/source_code/command_line_NLP/summarize__exe.scm:
 5 /tmp/gxc.1757631133.993665/openai__openai.c:
 6 /tmp/gxc.1757631133.993665/summarize.c:
 7 /Users/markw/GITHUB/gerbil_scheme_book/source_code/command_line_NLP/summarize__exe.c:
 8 /Users/markw/GITHUB/gerbil_scheme_book/source_code/command_line_NLP/summarize__exe_.c:
 9 ld: warning: ignoring duplicate libraries: '-lcrypto', '-lgambit', '-lm'
10 Marks-Mac-mini:command_line_NLP $ cat ~/temp/Clojure-AI-Book-Code/docs_qa/data/health.txt | ./summarize.ss
11 The text contains three parts: practical self-care and stress-management tips (live one day at a time, do enjoyable activities daily, show love, bathe to relieve tension, help others, prioritize understanding over being understood, improve appearance, schedule realistic days with breaks, be flexible, and stop destructive self-talk); a physician’s commentary on running injuries arguing that overuse and mechanical explanations have driven shoe- and orthotic-focused treatments, recounting a heel injury despite frequent shoe replacement and predicting a needed paradigm shift in shoe design and training; and a list of brief medical definitions covering terms such as adaptive immunity, addiction, ATP, adequate intake, agoraphobia, amnesia, amputation, anaerobic/anaerobic exercise, dry eye, duct, upper airway resistance syndrome, urea, ureter, and urethra.
12 Marks-Mac-mini:command_line_NLP $ cat ~/temp/Clojure-AI-Book-Code/docs_qa/data/economics.txt | ./summarize   
13 The text describes the Austrian School of economics—founded by Austrians such as Carl Menger, Eugen von Böhm-Bawerk and Ludwig von Mises—which stresses the spontaneous ordering role of the price mechanism, skepticism about mathematical modelling because of subjective human choices, and advocacy of laissez-faire policies, strict enforcement of voluntary contracts, and minimal government intervention. It also defines economics as the social science studying production, distribution and consumption of goods and services (formerly called political economy), distinguishes microeconomics (behavior of individual agents and markets) from macroeconomics (aggregate issues like unemployment, inflation and growth), and emphasizes that economic agents face scarcity, have multiple ends and limited resources, and make choices to maximize value subject to informational and cognitive constraints. The text notes the professionalization of economics since about 1900, with widespread university programs and graduate degrees.

Table of Contents

Preface

A Comment on Licenses

Running Gerbil Scheme on macOS and Linux

History and Background to the Gerbil Scheme Project

Introduction: Gerbil as a Systems Language

Gerbil Ecosystem: Package Management and Foundational Libraries

Discovering Packages

Core Gerbil Toolkits Used in this Book: :std/net/request and :std/text/json

Suppressing or Fixing Compilation Warning Messages on macOS

Setting Up Gerbil Scheme Development Environment

Emacs Configuration

Google Gemini API

Example Code

Example Output

Ollama

Example Code

Install Ollama and Pull a Model to Experiment With

Example Output

OpenAI API

Example Code

Example Output

Inexpensive and Fast LLM Inference Using the Groq Service

Structure of Project and Build Instructions

Kimi2 (Moonshot AI)

gpt-oss-120b (OpenAI)

Comparison and Use Cases

groq_inference.ss Utility

Example scripts: kimi2.ss and gpt-oss-120b.ss

Wikidata API Using SPARQL Queries

Example Code

Example Output

Code for Natural Language Processing (NLP)

macOS Specific Build Notes

Structure of Project and Build Instructions

Top Level Project Code

nlp.ss

Other Source Files

Test Run:

Building and Running the Command Line Tool

Gerbil Scheme FFI Example Using the C Language Raptor RDF Library

Implementation of a FFI Bridge Library for Raptor

Test Code

Complete FFI Example: C Language Wrapper for Rasqal SPARQL Library and Sord RDF Datastore Library

Library Selection

Overview of the Project Structure and Build System

Implementation of the C Language Wrapper

A Gerbil Scheme Shim to Call The C Language Wrapper Code

Gerbil Scheme Main Program for this Example

Example Output

Introduction To Writing Command Line Utilities

Overview of the Command Line Utilities Project Structure and Build System

Simple Structure for Command Line Utilities

A More Flexible Structure for Command Line Utilities

Wrap Up for Writing Command Line Utilities in Gerbil Scheme and Notes On User Built Libraries

Notes On User Installed Libraries

Command Line Applications For NLP

Re-using the NLP Library For a Command Line Utility to Identify Categories or Topics in Input Text

Using OpenAI’s GPT-5 model To Summarize Input Text