Using the OpenAI, Anthropic, Mistral, and Local Hugging Face Large Language Model APIs in Racket
Large Language Models (LLMs) have supercharged AI capabilities, affected the job market for many knowledge work careers, and placed huge demands on electrical power infrastructure.
In the development of practical AI systems, LLMs like those provided by OpenAI, Anthropic, and Hugging Face have emerged as pivotal tools for numerous applications including natural language processing, generation, and understanding. These models, powered by deep learning architectures, encapsulate a wealth of knowledge and computational capabilities. As a Racket Scheme enthusiast embarking on the journey of intertwining the elegance of Racket with the power of these modern language models, you are opening a gateway to a realm of possibilities that we begin to explore here after covering background material in the next section.
The Cambrian Explosion in Language Technology: A Historical Trajectory
The sudden and widespread emergence of LLMs in the early 2020s represents a watershed moment in the history of computing, a technological inflection point with profound implications for science, industry, and society.
Yet, this apparent revolution was not a singular event but the culmination of a multi-decade research trajectory.
Simpler neural networks were already in wide use in the 1980s (I personally used neural models in engineering projects such as a classifier for a bomb detector my company designed and built for the FAA). The next major evolution in language modeling was precipitated by the deep learning revolution that swept through computer vision around 2012. The success of deep neural networks in image classification inspired researchers to adapt these architectures for language tasks. A pivotal innovation from this period was the development of word embeddings, most famously Word2Vec by Tomas Mikolov and his colleagues at Google in 2013. Instead of treating words as discrete symbols, word embeddings represent them as dense vectors in a continuous, high-dimensional semantic space. In this space, geometric relationships between vectors correspond to semantic relationships between words, enabling algebraic operations like the canonical example:
vector('King') − vector('Man') + vector('Woman') ≈ vector('Queen')
This was a crucial step towards models that could capture the meaning and relationships of words, rather than just their statistical co-occurrence (i.e., which words often appear together in text). Much of my paid work in the 1980s involved applications of neural networks, but I mostly moved on to other technologies until the deep learning revolution of 2012. After personal experiments with word embeddings, and later with sentence and paragraph embeddings, I went all-in on deep learning, eventually managing a deep learning team at Capital One.
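To make the vector arithmetic concrete, here is a tiny Racket sketch. The three-dimensional vectors are made-up toy values (real Word2Vec embeddings have hundreds of dimensions), but they show how the offset king − man + woman lands nearest to queen:

#lang racket

;; Toy word vectors: made-up 3-dimensional values, not real embeddings.
(define word-vectors
  (hash 'king  '(0.8 0.3 0.10)
        'man   '(0.7 0.1 0.05)
        'woman '(0.6 0.1 0.90)
        'queen '(0.7 0.3 0.95)))

(define (vec- a b) (map - a b))
(define (vec+ a b) (map + a b))

;; Euclidean distance between two vectors represented as lists.
(define (distance a b)
  (sqrt (apply + (map (lambda (x y) (expt (- x y) 2)) a b))))

;; king - man + woman ...
(define target
  (vec+ (vec- (hash-ref word-vectors 'king) (hash-ref word-vectors 'man))
        (hash-ref word-vectors 'woman)))

;; ... is closest to queen among the words we know:
(argmin (lambda (w) (distance target (hash-ref word-vectors w)))
        '(king man woman queen)) ; => 'queen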
To process sequences of these word vectors, we turned to Recurrent Neural Networks (RNNs). An RNN processes a sequence one element at a time, maintaining an internal “hidden state” that acts as a memory, theoretically allowing information from earlier in the sequence to influence the processing of later elements. This architecture seemed a natural fit for language. However, in practice, standard RNNs were plagued by the vanishing gradient problem and an inability to handle long sequences of words or characters, effectively preventing the model from learning dependencies between words that were far apart. A key breakthrough that temporarily surmounted this challenge was the Long Short-Term Memory (LSTM) network.
The Transformer Inflection Point: “Attention Is All You Need”
Despite the success of LSTMs, a fundamental architectural bottleneck remained. Both RNNs and LSTMs are inherently sequential processors; they must compute the hidden state for token t before they can compute it for token t+1. This sequential dependency made it impossible to fully parallelize the computation across the tokens in a sequence, creating a significant performance barrier on modern hardware like GPUs. The solution arrived in 2017 with a landmark paper from researchers at Google titled “Attention Is All You Need”. The paper introduced the Transformer architecture, which dispensed with recurrence entirely and relied instead on a mechanism called “self-attention.” The attention mechanism, first developed by Bahdanau et al. in 2014 for machine translation, allows a model to dynamically weigh the importance of different parts of the input sequence when producing an output.
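At the heart of the Transformer is scaled dot-product self-attention which, in the notation of the original paper, is computed as:

Attention(Q, K, V) = softmax(Q Kᵀ / √dₖ) V

where Q, K, and V are the query, key, and value matrices derived from the input token embeddings and dₖ is the dimension of the key vectors. Because this is a single set of matrix operations over the whole sequence, every token can attend to every other token in parallel, which is exactly the property that RNNs and LSTMs lacked.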
Commercial and Open Weight LLMs
The OpenAI and Anthropic commercial APIs serve as gateways to some of the most advanced language models available today. By accessing these APIs, developers can harness the power of these models for a variety of applications. Here, we delve deeper into the distinctive features and capabilities that these APIs offer, which could be harnessed through a Racket interface.
OpenAI provides an API for developers to access models like GPT-5. The OpenAI API is designed with simplicity and ease of use in mind, making it a favorable choice for developers. It provides endpoints for different types of interactions, be it chat completion, translation, or semantic search among others. We will use the chat completion API in this chapter. The robustness and versatility of the OpenAI API make it a valuable asset for anyone looking to integrate advanced language understanding and generation capabilities into their applications.
On the other hand, Anthropic is a newer entrant in the field but with a strong emphasis on building models that are not only powerful but also understandable and steerable. The Anthropic API serves as a portal to access their language models. While the detailed offerings and capabilities might evolve, the core ethos of Anthropic is to provide models that developers can interact with in a more intuitive and controlled manner. This aligns with a growing desire within the AI community for models that are not black boxes, but instead, offer a level of interpretability and control that makes them safer and more reliable to use in different contexts. We will use the Anthropic completion API.
What if you want the total control of running open LLMs on your own computers? The company Hugging Face maintains a huge repository of pre-trained models. Some of these models are licensed for research only but many are licensed (e.g., using Apache 2) for any commercial use. Many of the Hugging Face models are derived from Meta and other companies. We will use the llama.cpp server at the end of this chapter to run our own LLM on a laptop and access it via Racket code.
Lastly, this chapter will delve into practical examples showing the synergy between systems developed in Racket and the LLMs. Whether it’s automating creative writing, conducting semantic analysis, or building intelligent chatbots, the fusion of Racket with OpenAI, Anthropic, and Hugging Face’s LLMs provides many opportunities for you, dear reader, to write innovative software that utilizes the power of LLMs.
Introduction to the Applications of LLMs
The utility of LLMs extends across a broad spectrum of applications including but not limited to text generation, translation, summarization, question answering, and sentiment analysis. Their ability to understand and process natural language makes them indispensable tools in modern AI-driven solutions. However, with great power comes great responsibility. The deployment of LLMs raises important considerations regarding ethics, bias, and the potential for misuse. Moreover, the black-box nature of these models presents challenges in interpretability and control, which are active areas of research in the quest to make LLMs more understandable and safe. The advent of LLMs has undeniably propelled the field of NLP to new heights, yet the journey towards fully responsible and transparent utilization of these powerful models is an ongoing endeavor. I recommend reading the material at the Center for Humane Technology for issues of the safe use of AI. You might also be interested in a book I wrote in April 2023, Safe For Humans AI: A “humans-first” approach to designing and building AI systems (link for reading my book free online).
Using the OpenAI APIs in Racket
We will now have some fun using Racket Scheme and OpenAI’s APIs. The combination of Racket’s language features and programming environment with OpenAI’s linguistic models opens up many possibilities for developing sophisticated AI-driven applications.
Our goal is straightforward interaction with OpenAI’s APIs. The communication between your Racket code and OpenAI’s models is orchestrated through well-defined API requests and responses, allowing for a seamless exchange of data. The following sections will show the technical aspects of interfacing Racket with OpenAI’s APIs, showcasing how requests are formulated, transmitted, and how the JSON responses are handled. Whether your goal is to automate content generation, perform semantic analysis on text data, or build intelligent systems capable of engaging in natural language interactions, the code snippets and explanations provided will serve as a valuable resource in understanding and leveraging the power of AI through Racket and OpenAI’s APIs.
The Racket code listed below defines two main functions, question-openai and completion-openai, that interact with the OpenAI API to leverage the GPT-5 model for text generation. Both are thin wrappers around a shared helper function, helper-openai, which takes a prefix and a prompt and constructs a JSON payload following OpenAI’s chat models schema. It builds a prompt-data string containing a user message made up of the prefix (for example “Answer the question”) followed by the provided prompt. The auth lambda function within helper-openai is used to set necessary headers for the HTTP request, including the authorization header populated with the OpenAI API key obtained from the environment variable OPENAI_API_KEY. The function post from the net/http-easy library is employed to issue a POST request to the OpenAI API endpoint “https://api.openai.com/v1/chat/completions” with the crafted JSON payload and authentication headers. The response from the API is then parsed as JSON, and the content of the message from the first choice is extracted and returned.
The function completion-openai serves the specific use case of continuing text from a given prompt. It prepends the phrase “Continue writing from the following text: ” to the provided text and then calls helper-openai with this modified prompt. This setup encapsulates the task of text continuation in a separate function, making it straightforward for developers to request text extensions from the OpenAI API by merely providing the initial text. The file also defines an embeddings-openai function that calls the OpenAI embeddings endpoint; we will use it at the end of this section. Through these functions, the code provides a structured mechanism to generate responses, text continuations, and embeddings.
This example was updated when OpenAI released the new GPT-5 model.
1 #lang racket
2
3 (require net/http-easy)
4 (require racket/set)
5 (require racket/pretty)
6
7 (provide question-openai completion-openai embeddings-openai)
8
9 (define (helper-openai prefix prompt)
10 (let* ((prompt-data
11 (string-join
12 (list
13 (string-append
14 "{\"messages\": [ {\"role\": \"user\","
15 " \"content\": \"" prefix ": "
16 prompt
17 "\"}], \"model\": \"gpt-5\"}"))))
18 (auth (lambda (uri headers params)
19 (values
20 (hash-set*
21 headers
22 'authorization
23 (string-join
24 (list
25 "Bearer "
26 (getenv "OPENAI_API_KEY")))
27 'content-type "application/json")
28 params)))
29 (p
30 (post
31 "https://api.openai.com/v1/chat/completions"
32 #:auth auth
33 #:data prompt-data))
34 (r (response-json p)))
35 ;;(pretty-print r)
36 (hash-ref
37 (hash-ref (first (hash-ref r 'choices)) 'message)
38 'content)))
39
40
41 (define (question-openai prompt)
42 (helper-openai "Answer the question: " prompt))
43
44 (define (completion-openai prompt)
45 (helper-openai "Continue writing from the following text: "
46 prompt))
47
48 (define (embeddings-openai text)
49 (let* ((prompt-data
50 (string-join
51 (list
52 (string-append
53 "{\"input\": \"" text "\","
54 " \"model\": \"text-embedding-ada-002\"}"))))
55 (auth (lambda (uri headers params)
56 (values
57 (hash-set*
58 headers
59 'authorization
60 (string-join
61 (list
62 "Bearer "
63 (getenv "OPENAI_API_KEY")))
64 'content-type "application/json")
65 params)))
66 (p
67 (post
68 "https://api.openai.com/v1/embeddings"
69 #:auth auth
70 #:data prompt-data))
71 (r (response-json p)))
72 (hash-ref
73 (first (hash-ref r 'data))
74 'embedding)))
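One caveat about this listing: the JSON payload is assembled by string concatenation, so a prompt containing a double quote or newline character would produce invalid JSON. A safer variation (just a sketch, not how helper-openai above is written) builds the payload with Racket's json library, which handles escaping automatically:

#lang racket
(require json)

;; Build the same chat completion payload as helper-openai, but let
;; jsexpr->string take care of quoting and escaping the prompt text.
(define (openai-payload prefix prompt)
  (jsexpr->string
   (hasheq 'model "gpt-5"
           'messages (list (hasheq 'role "user"
                                   'content (string-append prefix ": " prompt))))))

;; Example: the embedded double quotes are escaped correctly.
(openai-payload "Answer the question" "What is a \"closure\" in Racket?")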
The output looks like (output from the second example shortened for brevity):
1 > (question "Mary is 30 and Harry is 25. Who is older?")
2 "Mary is older than Harry."
3 > (completion "Frank bought a new sports car. Frank drove")
4 Frank bought a new sports car. Frank drove it out of the dealership with a wide grin\
5 on his face. The sleek, aerodynamic design of the car hugged the road as he acceler\
6 ated, feeling the power under his hands. The adrenaline surged through his veins, an\
7 d he couldn't help but let out a triumphant shout as he merged onto the highway.
8
9 As he cruised down the open road, the wind whipping through his hair, Frank couldn't\
10 help but reflect on how far he had come. It had been a lifelong dream of his to own\
11 a sports car, a symbol of success and freedom in his eyes. He had worked tirelessly\
12 , saving every penny, making sacrifices along the way to finally make this dream a r\
13 eality.
14 ...
15 >
Here is more sample output showing embeddings for a sentence (we will use embeddings in the next chapter). The OpenAI embedding model text-embedding-ada-002 produces a vector of 1536 floats; only the first few are shown here:
1 > (embeddings-openai "Frank bought a new sports car. Frank drove")
2 (-0.0067744367 0.020329757 0.021399744 ...
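Embedding vectors become useful when we compare them, and cosine similarity is the usual measure. Here is a minimal sketch; cosine-similarity is my own helper, and I assume the OpenAI listing above has been saved as openai.rkt (adjust the file name to whatever you use):

#lang racket
(require "openai.rkt") ; assumed file name for the OpenAI listing above

;; Cosine similarity of two equal-length vectors represented as lists.
(define (cosine-similarity a b)
  (/ (apply + (map * a b))
     (* (sqrt (apply + (map (lambda (x) (* x x)) a)))
        (sqrt (apply + (map (lambda (x) (* x x)) b))))))

(define e1 (embeddings-openai "Frank bought a new sports car."))
(define e2 (embeddings-openai "Frank purchased a fast automobile."))
(define e3 (embeddings-openai "Paris is the capital of France."))

(cosine-similarity e1 e2) ; similar sentences: relatively high score
(cosine-similarity e1 e3) ; unrelated sentences: lower score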
We can also use “one-shot prompting” to describe precisely how we want output formatted:
1 > (completion-openai "CONTEXT ONE SHOT EXAMPLE return function names and arguments a\
2 s a Lisp list no commas separating the arguments. for example: 'Please sum the numbe\
3 rs 4 1 2 7' should produce (sum 4 1 2 7). Identify tool names in the following text,\
4 returning only the tool names with arguments separated by commas. List of available\
5 tools is (ADD, SUM) PROMPT Please add the numbers 5, 8 and 12, and also sum the num\
6 bers 3, 44, and 88.")
7 (ADD 5 8 12)
8 (SUM 3 44 88)
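Because we asked for Lisp-style output, the model's reply can be turned directly into Racket data with read. Here is a small sketch; parse-tool-calls is a hypothetical helper, not part of the listing above:

#lang racket

;; Parse each line of LLM output such as "(ADD 5 8 12)" into a Racket list.
(define (parse-tool-calls llm-output)
  (for/list ([line (string-split llm-output "\n")]
             #:when (string-prefix? (string-trim line) "("))
    (read (open-input-string line))))

(parse-tool-calls "(ADD 5 8 12)\n(SUM 3 44 88)")
;; => '((ADD 5 8 12) (SUM 3 44 88))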
Using the Anthropic APIs in Racket
Note: I stopped using Anthropic models in 2024 so this example is out of date.
The Racket code listed below defines two functions, question and completion, which facilitate interaction with the Anthropic API to access a language model named claude-instant-1 for text generation purposes. The function question takes two arguments: a prompt and a max-tokens value, which are used to construct a JSON payload that will be sent to the Anthropic API. Inside the function, several Racket libraries are utilized for handling HTTP requests and processing data. A POST request is initiated to the Anthropic API endpoint “https://api.anthropic.com/v1/complete” with the crafted JSON payload. This payload includes the prompt text, maximum tokens to sample, and specifies the model to be used. The auth lambda function is used to inject necessary headers for authentication and specifying the API version. Upon receiving the response from the API, it extracts the completion field from the JSON response, trims any leading or trailing whitespace, and returns it.
The function completion is defined to provide a more specific use-case scenario, where it is intended to continue text from a given prompt. It also accepts a max-tokens argument to limit the length of the generated text. This function internally calls the function question with a modified prompt that instructs the model to continue writing from the provided text. By doing so, it encapsulates the common task of text continuation, making it easy to request text extensions by simply providing the initial text and desired maximum token count. Through these defined functions, the code offers a structured way to interact with the Anthropic API for generating text responses or completions in a Racket Scheme environment.
1 #lang racket
2
3 (require net/http-easy)
4 (require racket/set)
5 (require pprint)
6
7 (provide question completion)
8
9 (define (question prompt max-tokens)
10 (let* ((prompt-data
11 (string-join
12 (list
13 (string-append
14 "{\"prompt\": \"\\n\\nHuman: "
15 prompt
16 "\\n\\nAssistant: \", \"max_tokens_to_sample\": "
17 (number->string max-tokens)
18 ", \"model\": \"claude-instant-1\" }"))))
19 (auth (lambda (uri headers params)
20 (values
21 (hash-set*
22 headers
23 'x-api-key
24 (getenv "ANTHROPIC_API_KEY")
25 'anthropic-version "2023-06-01"
26 'content-type "application/json")
27 params)))
28 (p
29 (post
30 "https://api.anthropic.com/v1/complete"
31 #:auth auth
32 #:data prompt-data))
33 (r (response-json p)))
34 (string-trim (hash-ref r 'completion))))
35
36 (define (completion prompt max-tokens)
37 (question
38 (string-append
39 "Continue writing from the following text: "
40 prompt)
41 max-tokens))
We will try the same examples we used with OpenAI APIs in the previous section:
1 $ racket
2 > (require "anthropic.rkt")
3 > (question "Mary is 30 and Harry is 25. Who is older?" 20)
4 "Mary is older than Harry. Mary is 30 years old and Harry is 25 years old."
5 > (completion "Frank bought a new sports car. Frank drove" 200)
6 "Here is a possible continuation of the story:\n\nFrank bought a new sports car. Fra\
7 nk drove excitedly to show off his new purchase. The sleek red convertible turned he\
8 ads as he cruised down the street with the top down. While stopping at a red light, \
9 Frank saw his neighbor Jane walking down the sidewalk. He pulled over and called out\
10 to her, \"Hey Jane, check out my new ride! Want to go for a spin?\" Jane smiled and\
11 said \"Wow that is one nice car! I'd love to go for a spin.\" She hopped in and the\
12 y sped off down the road, the wind in their hair. Frank was thrilled to show off his\
13 new sports car and even more thrilled to share it with his beautiful neighbor Jane.\
14 Little did he know this joyride would be the beginning of something more between th\
15 em."
16 >
While I usually use the OpenAI APIs, I always like to have alternatives when I am using 3rd party infrastructure, even for personal research projects. The Anthropic LLMs definitely have a different “feel” than the OpenAI models, and I enjoy using both.
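As noted above, the claude-instant-1 model and the v1/complete endpoint have since been superseded by Anthropic's Messages API. The following is only a sketch of how the earlier example might be updated: the endpoint, headers, and response shape follow the Messages API as I understand it, and the model name is a placeholder that you should replace with a current model:

#lang racket
(require net/http-easy)
(require json)

;; Sketch of calling the newer Anthropic Messages API.
(define (question-messages prompt max-tokens)
  (let* ((prompt-data
          (jsexpr->string
           (hasheq 'model "claude-3-5-haiku-latest" ; placeholder model name
                   'max_tokens max-tokens
                   'messages (list (hasheq 'role "user" 'content prompt)))))
         (auth (lambda (uri headers params)
                 (values
                  (hash-set* headers
                             'x-api-key (getenv "ANTHROPIC_API_KEY")
                             'anthropic-version "2023-06-01"
                             'content-type "application/json")
                  params)))
         (p (post "https://api.anthropic.com/v1/messages"
                  #:auth auth
                  #:data prompt-data))
         (r (response-json p)))
    ;; The Messages API returns a list of content blocks; take the text of the first one.
    (hash-ref (first (hash-ref r 'content)) 'text)))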
Using a Local Hugging Face Llama2-13b-orca Model with Llama.cpp Server
Now we look at an approach to run LLMs locally on your own computers.
Diving into AI unveils many areas in which modern language models play a pivotal role in bridging the gap between machines and human language. Among the many open and public models, I chose the Llama2-13b-orca model hosted on Hugging Face because of its strong support for natural language processing tasks. To truly harness the potential of Llama2-13b-orca, an interface to Racket code is essential. This is where we use the llama.cpp server as a conduit between the local instance of the Hugging Face model and the applications that seek to utilize it. The combination of Llama2-13b-orca with the llama.cpp server code meets our requirements for local deployment and ease of installation and use.
Installing and Running Llama.cpp server with a Llama2-13b-orca Model
The llama.cpp server exposes a local REST API in front of a model running on your own hardware. By setting up and running the llama.cpp server, a channel of communication is established, allowing Racket code to interact with these language models in a seamless manner. There is also a Python library that encapsulates running models inside a Python program (a subject I leave to my Python AI books).
I run the llama.cpp service easily on an M2 Mac with 16G of memory. Start by cloning the llama.cpp project and building it:
1 git clone https://github.com/ggerganov/llama.cpp.git
2 cd llama.cpp
3 make
4 mkdir models
Then get a model file from https://huggingface.co/TheBloke/OpenAssistant-Llama2-13B-Orca-8K-3319-GGUF and copy it to the ./models directory:
1 $ ls -lh models
2 8.6G openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf
Note that there are many different variations of this model that trade off quality for memory use. I am using one of the larger models. If you only have 8G of memory try a smaller model.
Run the REST server:
1 ./server -m models/openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf -c 2048
We can test the REST server using the curl utility:
1 $ curl --request POST \
2 --url http://localhost:8080/completion \
3 --header "Content-Type: application/json" \
4 --data '{"prompt": "Answer the question: Mary is 30 years old and Sam is 25. Who\
5 is older and by how much?","n_predict": 128, "top_k": 1}'
6 {"content":"\nAnswer: Mary is older than Sam by 5 years.","generation_settings":{"fr\
7 equency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"mirostat":0,"m\
8 irostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"models/openassistant-ll\
9 ama2-13b-orca-8k-3319.Q5_K_M.gguf","n_ctx":2048,"n_keep":0,"n_predict":128,"n_probs"\
10 :0,"penalize_nl":true,"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.1\
11 00000023841858,"seed":4294967295,"stop":[],"stream":false,"temp":0.800000011920929,"\
12 tfs_z":1.0,"top_k":1,"top_p":0.949999988079071,"typical_p":1.0},"model":"models/open\
13 assistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf","prompt":"Answer the question: Mary i\
14 s 30 years old and Sam is 25. Who is older and by how much?","stop":true,"stopped_eo\
15 s":true,"stopped_limit":false,"stopped_word":false,"stopping_word":"","timings":{"pr\
16 edicted_ms":960.595,"predicted_n":13,"predicted_per_second":13.53327885321077,"predi\
17 cted_per_token_ms":73.89192307692308,"prompt_ms":539.3580000000001,"prompt_n":27,"pr\
18 ompt_per_second":50.05951520140611,"prompt_per_token_ms":19.976222222222223},"tokens\
19 _cached":40,"tokens_evaluated":27,"tokens_predicted":13,"truncated":false}
The important part of the output is:
1 "content":"Answer: Mary is older than Sam by 5 years."
In the next section we will write a simple library to extract data from Llama.cpp server responses.
A Racket Library for Using a Local Llama.cpp server with a Llama2-13b-orca Model
The following Racket code is designed to interface with a local instance of a Llama.cpp server to interact with a language model for generating text completions. This setup is particularly beneficial when there’s a requirement to have a local language model server, reducing latency and ensuring data privacy. We start by requiring libraries for handling HTTP requests and responses. The functionality of this code is encapsulated in three functions: helper, question, and completion, each serving a unique purpose in the interaction with the Llama.cpp server.
The helper function provides common functionality, handling the core logic of constructing the HTTP request, sending it to the Llama.cpp server, and processing the response. It accepts a prompt argument which forms the basis of the request payload. A JSON string is constructed with three key fields: prompt, n_predict, and top_k, which respectively contain the text prompt, the number of tokens to generate, and a parameter to control the diversity of the generated text. A debug line with displayln is used to output the constructed JSON payload to the console, aiding in troubleshooting. The function post is employed to send a POST request to the Llama.cpp server hosted locally on port 8080 at the /completion endpoint, with the constructed JSON payload as the request body. Upon receiving the response, it’s parsed into a Racket hash data structure, and the content field, which contains the generated text, is extracted and returned.
The question and completion functions serve as specialized interfaces to the helper function, crafting specific prompts aimed at answering a question and continuing a text, respectively. The question function prefixes the provided question text with “Answer: “ to guide the model’s response, while the completion function prefixes the provided text with a phrase instructing the model to continue from the given text. Both functions then pass these crafted prompts to the helper function, which in turn handles the interaction with the Llama.cpp server and extracts the generated text from the response.
The following code is in the file llama_local.rkt:
1 #lang racket
2
3 (require net/http-easy)
4 (require racket/set)
5 (require pprint)
6
7 (define (helper prompt)
8 (let* ((prompt-data
9 (string-join
10 (list
11 (string-append
12 "{\"prompt\": \""
13 prompt
14 "\", \"n_predict\": 256, \"top_k\": 1}"))))
15 (ignore (displayln prompt-data))
16 (p
17 (post
18 "http://localhost:8080/completion"
19 #:data prompt-data))
20 (r (response-json p)))
21 (hash-ref r 'content)))
22
23 (define (question question)
24 (helper (string-append "Answer: " question)))
25
26 (define (completion prompt)
27 (helper
28 (string-append
29 "Continue writing from the following text: "
30 prompt)))
We can try this in a Racket REPL (output of the second example is edited for brevity):
1 > (question "Mary is 30 and Harry is 25. Who is older?")
2 {"prompt": "Answer: Mary is 30 and Harry is 25. Who is older?", "n_predict": 256, "t\
3 op_k": 1}
4 "\nAnswer: Mary is older than Harry."
5 > (completion "Frank bought a new sports car. Frank drove")
6 {"prompt": "Continue writing from the following text: Frank bought a new sports car.\
7 Frank drove", "n_predict": 256, "top_k": 1}
8 " his new sports car to work every day. He was very happy with his new sports car. O\
9 ne day, while he was driving his new sports car, he saw a beautiful girl walking on \
10 the side of the road. He stopped his new sports car and asked her if she needed a ri\
11 de. The beautiful girl said yes, so Frank gave her a ride in his new sports car. The\
12 y talked about many things during the ride to work. When they arrived at work, Frank\
13 asked the beautiful girl for her phone number. She gave him her phone number, and h\
14 e promised to call her later that day...."
15 > (question "Mary is 30 and Harry is 25. Who is older and by how much?")
16 {"prompt": "Answer: Mary is 30 and Harry is 25. Who is older and by how much?", "n_p\
17 redict": 256, "top_k": 1}
18 "\nAnswer: Mary is older than Harry by 5 years."
19 >
Using a Local Mistral-7B Model with Ollama.ai
Now we look at another approach to run LLMs locally on your own computers. The Ollama.ai project supplies a simple-to-install application for macOS and Linux (Windows support expected soon). When you download and run the application, it will install a command line tool ollama that we use here.
Installing and Running Ollama.ai server with a Mistral-7B Model
The Mistral model is the best 7B LLM that I have used (as I write this chapter in October 2023). When you run the ollama command line tool, it will download the requested model and cache it for future use.
For example, the first time we run ollama requesting the mistral LLM, you see that it is downloading the model:
1 $ ollama run mistral
2 pulling manifest
3 pulling 6ae280299950... 100% |███████████████████████████████████████████████| (4.1/\
4 4.1 GB, 13 MB/s)
5 pulling fede2d8d6c1f... 100% |██████████████████████████████████████████████████████\
6 | (29/29 B, 20 B/s)
7 pulling b96850d2e482... 100% |███████████████████████████████████████████████████| (\
8 307/307 B, 170 B/s)
9 verifying sha256 digest
10 writing manifest
11 removing any unused layers
12 success
13 >>> Mary is 30 and Bill is 25. Who is older and by how much?
14 Mary is older than Bill by 5 years.
15
16 >>> /?
17 Available Commands:
18 /set Set session variables
19 /show Show model information
20 /bye Exit
21 /?, /help Help for a command
22
23 Use """ to begin a multi-line message.
24
25 >>>
When you run the ollama command line tool, it also runs a REST API server which we use later. The next time you run the mistral model, there is no download delay:
1 $ ollama run mistral
2 >>> ^D
3 $ ollama run mistral
4 >>> If I am driving between Sedona Arizona and San Diego, what sites should I visit \
5 as a tourist?
6
7 There are many great sites to visit when driving from Sedona, Arizona to San Diego. \
8 Here are some
9 suggestions:
10
11 * Grand Canyon National Park - A must-see attraction in the area, the Grand Canyon i\
12 s a massive and
13 awe-inspiring natural wonder that offers countless opportunities for outdoor activit\
14 ies such as hiking,
15 camping, and rafting.
16 * Yuma Territorial Prison State Historic Park - Located in Yuma, Arizona, this forme\
17 r prison was once the
18 largest and most secure facility of its kind in the world. Today, visitors can explo\
19 re the site and learn
20 about its history through exhibits and guided tours.
21 * Joshua Tree National Park - A unique and otherworldly landscape in southern Califo\
22 rnia, Joshua Tree
23 National Park is home to a variety of natural wonders, including towering trees, gia\
24 nt boulders, and
25 scenic trails for hiking and camping.
26 * La Jolla Cove - Located just north of San Diego, La Jolla Cove is a beautiful beac\
27 h and tidal pool area
28 that offers opportunities for snorkeling, kayaking, and exploring marine life.
29 * Balboa Park - A cultural and recreational hub in the heart of San Diego, Balboa Pa\
30 rk is home to numerous
31 museums, gardens, theaters, and other attractions that offer a glimpse into the city\
32 's history and culture.
33
34 >>>
While we use the mistral LLM here, there are many more available models listed in the GitHub repository for Ollama.ai: https://github.com/jmorganca/ollama.
A Racket Library for Using a Local Ollama.ai REST Server with a Mistral-7B Model
The example code in the file ollama_ai_local.rkt is very similar to the example code in the last section (the file also starts with #lang racket). The main changes are a different REST service URI and the format of the returned JSON response:
1 (require net/http-easy)
2 (require racket/set)
3 (require pprint)
4
5 (define (helper prompt)
6 (let* ((prompt-data
7 (string-join
8 (list
9 (string-append
10 "{\"prompt\": \""
11 prompt
12 "\", \"model\": \"mistral\", \"stream\": false}"))))
13 (ignore (displayln prompt-data))
14 (p
15 (post
16 "http://localhost:11434/api/generate"
17 #:data prompt-data))
18 (r (response-json p)))
19 (hash-ref r 'response)))
20
21 (define (question-ollama-ai-local question)
22 (helper (string-append "Answer: " question)))
23
24 (define (completion-ollama-ai-local prompt)
25 (helper
26 (string-append
27 "Continue writing from the following text: "
28 prompt)))
29
30 ;; EMBEDDINGS:
31
32 (define (embeddings-ollama text)
33 (let* ((prompt-data
34 (string-join
35 (list
36 (string-append
37 "{\"prompt\": \"" text "\","
38 " \"model\": \"mistral\"}"))))
39 (p
40 (post
41 "http://localhost:11434/api/embeddings"
42 #:data prompt-data))
43 (r (response-json p)))
44 (hash-ref r 'embedding)))
45
46
47 ;; (embeddings-ollama "Here is an article about llamas...")
The function embeddings-ollama can be used to create embedding vectors from text input. Embeddings are used for chat with local documents, web sites, etc. We will run the same examples we used in the last section for comparison:
1 > (question "Mary is 30 and Harry is 25. Who is older and by how much?")
2 {"prompt": "Answer: Mary is 30 and Harry is 25. Who is older and by how much?", "mod\
3 el": "mistral", "stream": false}
4 "Answer: Mary is older than Harry by 5 years."
5 > (completion "Frank bought a new sports car. Frank drove")
6 {"prompt": "Continue writing from the following text: Frank bought a new sports car.\
7 Frank drove", "model": "mistral", "stream": false}
8 "Frank drove his new sports car around town, enjoying the sleek design and powerful \
9 engine. The car was a bright red, which caught the attention of everyone on the road\
10 . Frank couldn't help but smile as he cruised down the highway, feeling the wind in \
11 his hair and the sun on his face.\n\nAs he drove, Frank couldn't resist the urge to \
12 test out the car's speed and agility. He weaved through traffic, expertly maneuverin\
13 g the car around curves and turns. The car handled perfectly, and Frank felt a rush \
14 of adrenaline as he pushed it to its limits.\n\nEventually, Frank found himself at a\
15 local track where he could put his new sports car to the test. He revved up the eng\
16 ine and took off down the straightaway, hitting top speeds in no time. The car handl\
17 ed like a dream, and Frank couldn't help but feel a sense of pride as he crossed the\
18 finish line.\n\nAfterwards, Frank parked his sports car and walked over to a nearby\
19 café to grab a cup of coffee. As he sat outside, sipping his drink and watching the\
20 other cars drive by, he couldn't help but think about how much he loved his new rid\
21 e. It was the perfect addition to his collection of cars, and he knew he would be dr\
22 iving it for years to come."
23 >
While I often use larger and more capable proprietary LLMs like Claude 2.1 and GPT-4, smaller open models from Mistral are very capable and sufficient for most of my experiments embedding LLMs in application code. As I write this, you can run Mistral models locally and through commercially hosted APIs.
Examples Using William J. Bowman’s Racket Language LLM
I implemented the code in this chapter using REST API interfaces for LLM providers like OpenAI and Anthropic and also for running local models using Ollama.
Since I wrote my LLM client libraries, William J. Bowman has written a very interesting new Racket language for LLMs. It can be used with DrRacket's language support for interactively experimenting with LLMs, or as an ordinary library in Racket programs written in the standard Racket language. I added three examples to the directory Racket-AI-book/source-code/racket_llm_language:
- test_lang_mode_llm_openai.rkt - uses #lang llm
- test_llm_openai.rkt - uses #lang racket
- test_llm_ollama.rkt - uses #lang racket
For the Ollama example, make sure you have Ollama installed and the phi3:latest model downloaded.
The documentation for the LLM language can be found here: https://docs.racket-lang.org/llm/index.html and the GitHub repository for the project can be found here: https://github.com/wilbowma/llm-lang.
LLM Language Example
In the listing of file test_lang_mode_llm_openai.rkt notice that Racket statements are escaped using @ and plain text is treated as a prompt to send to a LLM:
1 #lang llm
2
3 @(require llm/openai/gpt4o-mini)
4
5 What is 13 + 7?
Evaluating this in a DrRacket buffer produces output like this:
1 Welcome to DrRacket, version 8.12 [cs].
2 Language: llm, with debugging; memory limit: 128 MB.
3 13 + 7 equals 20.
4 > What is 66 + 2?
5 66 + 2 equals 68.
6 > What is the radius of the moon?
7 The average radius of the Moon is approximately 1,737.4 kilometers (about 1,079.6 mi\
8 les).
9 >
This makes a DrRacket edit buffer a convenient way to experiment with models. Also, once the example buffer is loaded, the DrRacket REPL can be used to enter LLM prompts since the REPL is also using #lang llm.
Using the LLM Language as a Library from the Standard Racket Language Mode
Here we look at examples for accessing the OpenAI gpt4o-mini model and the phi3 model running locally on your laptop using Ollama.
Install the llm package:
1 raco pkg install llm
Here is the example file test_llm_openai.rkt:
1 #lang racket
2
3 (require llm/openai/gpt4o-mini)
4
5 (gpt4o-mini-send-prompt! "What is 13 + 7?" '())
The output looks like this:
1 Welcome to DrRacket, version 8.12 [cs].
2 Language: racket, with debugging; memory limit: 128 MB.
3 "13 + 7 equals 20."
4 >
This is a simple way to use the OpenAI gpt4o-mini model in your Racket programs. A similar example supports local models running on Ollama; here is the example file test_llm_ollama.rkt:
1 #lang racket
2
3 (require llm/ollama/phi3)
4
5 (phi3-send-prompt! "What is 13 + 7? Be concise." '())
That generates the output text:
1 Welcome to DrRacket, version 8.12 [cs].
2 Language: racket, with debugging; memory limit: 128 MB.
3 "20."
4 > (phi3-send-prompt! "Mary is 37 years old, Bill is 28, and Sam is 52. List the pair\
5 wise age differences. Be concise." '())
6 "- Mary vs Bill: 9 years (37 - 28)\n\n- Mary vs Sam: 15 years (37 - 52)\n\n- Bill vs\
7 Sam: 24 years (52 - 28)"
8 > (display (phi3-send-prompt! "Mary is 37 years old, Bill is 28, and Sam is 52. Lis\
9 t the pairwise age differences. Be concise." '()))
10 - Mary vs Bill: 9 years (37 - 28)
11
12 - Mary vs Sam: 15 years (37 - 52)
13
14 - Bill vs Sam: 24 years (52 - 28)
15 >
Here I entered more examples in the DrRacket REPL.
For general work and experimentation with LLMs I like the flexibility of using my own Racket LLM client code, but the llm package makes it simple to experiment with prompts. If you only need to generate text from a prompt, the llm package lets you do so with just two lines of Racket code using the standard #lang racket language.