A Thought Experiment on Building a Safe AI in Alignment With Positive Human Values

As I write this book in August 2023, systems using LLMs are proving their usefulness as assistants for coding, creating artwork, doing research, and writing. I will first describe how to build a moral AI with today’s technology and then, dear reader, we will wrap up both this chapter and this book with an example implementation.

Credit: I was inspired by the last few paragraphs of Scott Alexander’s blog article.

A Design Based on Prompt Engineering

We start by identifying people, both historical and alive today, whom we consider to have high moral standards and who have written extensively about their beliefs. We might choose as our moral human exemplars:

  • Mother Teresa
  • Mahatma Gandhi
  • Ralph Waldo Emerson

We will use two separate LLMs and one sentiment analysis model:

  • OpenAI’s GPT-4 will be used as a generator (more on this later) to write an answer to the user’s moral question.
  • Hugging Face’s Lama-ZZZZ (to be determined later) will be used as an adjudicator: given a moral dilemma posed as a question to the generator LLM, we use a KNN-style embedding index to find the best matches in the writings of our moral human exemplars and then, using these matches as context text, ask Lama-ZZZZ to critique the recommendation given by the generator.
  • A sentiment model will provide a rating in the range [0.0, 1.0] for the advice provided by the generator.
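The design leaves the sentiment model open. As a hedged sketch, assuming a binary classifier that returns a label of POSITIVE or NEGATIVE plus a confidence score (the output shape of the Hugging Face transformers `pipeline("sentiment-analysis")` with its default English model, which returns a list of dicts such as {'label': 'POSITIVE', 'score': 0.99}), the label and score can be folded into one rating in [0.0, 1.0]. The function names sentiment_to_rating and rate_advice are hypothetical, and other models may use different label names.

```python
def sentiment_to_rating(result: dict) -> float:
    """Map a binary sentiment result to a rating in [0.0, 1.0].

    Assumes `result` looks like {"label": "POSITIVE", "score": 0.98}, where
    score is the classifier's confidence in the returned label.
    """
    label = str(result["label"]).upper()
    score = float(result["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # A confident POSITIVE maps near 1.0, a confident NEGATIVE near 0.0.
    return score if label == "POSITIVE" else 1.0 - score


def rate_advice(advice: str) -> float:
    """Rate advice text with a sentiment classifier (requires the transformers package)."""
    from transformers import pipeline  # imported lazily; the first call downloads a default model
    classifier = pipeline("sentiment-analysis")
    return sentiment_to_rating(classifier(advice)[0])
```

A rating near 0.5 means the classifier was unsure. Keep in mind that sentiment measures tone rather than morality, so upbeat but unethical advice could still score highly.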

Safe AI Thought Experiment Implementation

Here is the code for the script chroma_persist_index.py that reads the quotations, chunks them, and stores the chunks and their vector embeddings in a local embeddings datastore:

 1 # Copyright 2023 Mark Watson. All rights reserved.
 2 
 3 from langchain.vectorstores import Chroma
 4 from langchain.embeddings.openai import OpenAIEmbeddings
 5 
 6 from langchain.document_loaders import DirectoryLoader
 7 from pprint import pprint
 8 
 9 loader = DirectoryLoader('data', glob="*.txt")
10 data = loader.load()
11 
12 embeddings = OpenAIEmbeddings()
13 vectorstore = Chroma(collection_name="langchain_store",
14                      embedding_function=embeddings,
15                      persist_directory="./tmp")
16 
17 from langchain.text_splitter import RecursiveCharacterTextSplitter
18 
19 text_splitter = RecursiveCharacterTextSplitter(
20     chunk_size=10,  # deliberately tiny so the text is split at newlines
21     chunk_overlap=0,
22     separators=["\n"]
23 )
24 
25 texts = text_splitter.split_documents(data)
26 texts = list(map(lambda x: x.page_content.replace("\n",""), texts))
27 texts = list(filter(lambda x: len(x) > 10, texts))
28 
29 #pprint(texts)
30 
31 # Add data to the vector store
32 vectorstore.add_texts(texts)
33 
34 # Persist the data to disk
35 vectorstore.persist()

On line 9 we create an instance of the DirectoryLoader class, specifying that we want to load files with the extension .txt from the subdirectory data. On line 12 we create an instance of class OpenAIEmbeddings that will be used to make API calls that convert each chunk of text to a 1536-element vector embedding.
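To make the loading and chunking steps concrete without LangChain, here is a minimal standard-library sketch. It assumes, as the splitter settings on lines 19-27 suggest, that each line of a data/*.txt file holds one quotation. The function name load_quotations is hypothetical, and unlike RecursiveCharacterTextSplitter it never merges short lines together.

```python
from pathlib import Path


def load_quotations(directory: str = "data", min_length: int = 10) -> list[str]:
    """Read every *.txt file in `directory` and return one chunk per line.

    Approximates lines 9-27 of chroma_persist_index.py: lines of
    `min_length` characters or fewer are dropped, like the filter on line 27.
    """
    chunks = []
    for path in sorted(Path(directory).glob("*.txt")):  # same glob as DirectoryLoader
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if len(line) > min_length:
                chunks.append(line)
    return chunks
```

Sorting the paths makes the chunk order reproducible from run to run, which is handy when comparing stores built at different times.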

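On lines 13-15 we create a local persistent vector data store in the directory ./tmp. The text splitter on lines 19-23 breaks documents only at newline characters; because chunk_size=10 is shorter than any real quotation, each line longer than ten characters ends up as its own chunk. Lines 26-27 remove newline characters and discard fragments of ten or fewer characters. Line 32 embeds the remaining chunks and adds them to the store, and line 35 writes the store to disk.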
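Chroma hides the mechanics of a persistent vector store behind add_texts and persist. As a hedged illustration of the idea only (this is not how Chroma stores data internally), the hypothetical class TinyVectorStore below keeps texts alongside their vectors and round-trips them through a JSON file. The toy embed function is an assumed stand-in for the 1536-element OpenAI embeddings and captures far less meaning.

```python
import json
import math
import zlib
from pathlib import Path


def embed(text: str, dims: int = 64) -> list[float]:
    """Toy embedding (hypothetical stand-in for OpenAIEmbeddings): hashed bag of words, unit length."""
    vec = [0.0] * dims
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'")
        if word:
            # crc32 is stable across runs, unlike Python's salted str hash().
            vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class TinyVectorStore:
    """Hypothetical persistent store: parallel lists of texts and their vectors in one JSON file."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []
        if self.path.exists():  # reopen persisted data, as generator.py reopens ./tmp
            data = json.loads(self.path.read_text(encoding="utf-8"))
            self.texts, self.vectors = data["texts"], data["vectors"]

    def add_texts(self, texts: list[str]) -> None:
        """Embed each text and keep it alongside its vector (in memory until persist)."""
        for text in texts:
            self.texts.append(text)
            self.vectors.append(embed(text))

    def persist(self) -> None:
        """Write texts and vectors to disk, creating the parent directory if needed."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        payload = {"texts": self.texts, "vectors": self.vectors}
        self.path.write_text(json.dumps(payload), encoding="utf-8")
```

Storing each text next to its vector is what lets a later similarity search return readable quotations rather than bare vectors.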

Here is the code for generator.py:

 1 # Copyright 2023 Mark Watson. All rights reserved.
 2 
 3 from langchain.vectorstores import Chroma
 4 from langchain.embeddings.openai \
 5      import OpenAIEmbeddings
 6 
 7 from langchain.llms import OpenAI
 8 llm = OpenAI(temperature=0.9)
 9 
10 embeddings = OpenAIEmbeddings()
11 vectorstore = Chroma(collection_name="langchain_store",
12                      embedding_function=embeddings,
13                      persist_directory="./tmp")
14 
15 def get_help(thing_to_do):
16     results = vectorstore.similarity_search(thing_to_do,
17                                             k=3)
18     context = " ".join(list(map(lambda x:
19            x.page_content.replace("\n",""), results)))
20     prompt=f"Given the context:\n{context}\n\nPlease give (as one long paragraph) me moral advice and guidance for {thing_to_do}?"
21   
22     #print(f"\n{prompt}:")
23     return llm(prompt), context
24 
25 if __name__ == "__main__":
26     print(get_help("I want to be fair to my friend")[0])
27     print(get_help("My business partner is stealing from me")[0])

Line 8 creates the completion model with a fairly high temperature of 0.9, which makes the generated advice more varied. On lines 10-13 we create an instance of class OpenAIEmbeddings, used to convert text to 1536-element vector embeddings via the OpenAI API, and reopen the local vector data store written by chroma_persist_index.py. The function get_help, defined on lines 15-23, queries the vector store for the three stored text chunks (k=3) most semantically similar to the user query in the argument thing_to_do and joins them into a single context string. On line 23 we send the prompt constructed on line 20 to the OpenAI API for a text completion model and return both the generated advice and the context, so the same context can later be passed to the adjudicator.
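Conceptually, similarity_search(thing_to_do, k=3) embeds the query and returns the k stored chunks whose vectors are nearest to it; get_help then joins them into the prompt. The sketch below shows that retrieval step as brute-force cosine-similarity KNN over in-memory vectors. It is an illustration under assumptions: the function names are hypothetical, the vectors are assumed to be precomputed, and Chroma itself uses an approximate nearest-neighbor index with its own configured distance metric rather than this linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], vectors: list[list[float]],
          texts: list[str], k: int = 3) -> list[str]:
    """Brute-force KNN: the k texts whose vectors are most similar to query_vec."""
    scored = sorted(zip(vectors, texts),
                    key=lambda vt: cosine(query_vec, vt[0]), reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(context_chunks: list[str], thing_to_do: str) -> str:
    """Mirror get_help: join chunks into one context string, then wrap it in the same prompt."""
    context = " ".join(chunk.replace("\n", "") for chunk in context_chunks)
    return (f"Given the context:\n{context}\n\n"
            f"Please give (as one long paragraph) me moral advice and guidance for {thing_to_do}?")
```

A linear scan is fine for a few hundred quotations; an approximate index such as the one inside Chroma only starts to pay off for much larger collections.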

Sample output is:

In order to be fair to your friend, it is important to think of them with kindness and respect, and to take their feelings into consideration. Speak kindly about them and foster an atmosphere of love and understanding. Never be afraid to do the right thing, especially when the well-being of someone or something is at stake. This includes standing up for them when they need it, and being there to listen and offer support when they need it. Show them your appreciation and recognize them for who they are. Doing the right thing is always worth it, even if society doesn't always recognize it. Be open and honest in your communication with them, and show that you trust and value their opinion. Above all else, be true to yourself and remember that your actions are a reflection of your own values.

My business partner is stealing from me—it's a difficult situation to find yourself in. Remember that you need to take care of yourself both financially and mentally. In doing so, take a step back, and look at the situation objectively. Ask yourself: Have I let this person into my life or business expecting something in return? Are they doing this because of their own feelings of entitlement? If the answer is no, then the fault lies within them. Do what you can to protect yourself and your finances, and take the necessary measures to ensure that the same mistake won't happen again. Then, move on with your life. Don't let your self-worth be defined by the actions of others. Make sure to surround yourself with trustworthy people and to pay attention to any red flags. Finally, be kind – anger or hurt may be justified, but it will not help you move forward. Even if you don't get the justice you desire, focus on the bigger picture. Successful people do not give up under the first sign of adversity; they keep going, do the right thing, and eventually find success.

Here is the code for adjudicator.py:

 1 # Copyright 2023 Mark Watson. All rights reserved.
 2 
 3 from langchain.llms import OpenAI
 4 llm = OpenAI(temperature=0.0)
 5 
 6 def adjudicate(question, advice, context):
 7     prompt=f"Given the question:\n{question}\n\nPlease rate the following advice for answering the question (give a one word answer):\n{advice}?"
 8   
 9     #print(f"\n{prompt}\n")
10     answer_question = llm(prompt).strip().replace('.','')  # one-word rating: does the advice answer the question?
11 
12     prompt=f"Given the context:\n{context}\n\nAnd the question:\n{question}\n\nPlease rate the following advice for being moral (give a one word answer):\n{advice}?"
13   
14     #print(f"\n{prompt}\n")
15     moral_advice = llm(prompt).strip().replace('.','')  # one-word rating: is the advice moral, given the context?
16     return answer_question, moral_advice
17 
18 if __name__ == "__main__":
19     question = "I want to be fair to my friend"
20     context = "I offer you peace. I offer you love. I offer you friendship. I see your beauty. I hear your need. I feel your feelings. A friend is a person who goes around saying nice things about you behind your back. Never, never be afraid to do what's right, especially if the well-being of a person or animal is at stake. Society's punishments are small compared to the wounds we inflict on our soul when we look the other way."
21     advice = "Always be honest and keep your word with your friend. Speak kindly to, and about, them. Treat your friend with respect and kindness, just as you would expect to be treated. Listen to your friend and be open to their perspective even if you don’t agree with them. Remember to show appreciation and gratitude for your friend’s support and guidance."
22     print(adjudicate(question, advice, context))
23     print(adjudicate("I want to go to Europe", advice, context))
24     print(adjudicate("How can I steal my friend's money?", advice, context))

Sample output is:

('Excellent', 'Excellent')
('Relevant', 'Excellent')
('Inappropriate', 'Excellent')

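The function adjudicate returns two one-word ratings: the first judges how well the advice answers the question, and the second judges how moral the advice is, given the exemplar quotations passed as context. Together they can be used to decide whether the advice created by the generator is both useful and ethical. Note that the second rating scores the advice, not the question: in the third example the advice is rated Inappropriate as an answer to a question about stealing, yet still Excellent for morality. The Relevant rating in the second example, for friendship advice given to someone who wants to go to Europe, also suggests treating these one-word labels with some caution.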
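Because the adjudicator returns free-form one-word labels, a caller needs some policy to turn them into an accept or reject decision. Here is a minimal sketch, assuming hypothetical sets of acceptable labels (the model may answer with any word, so these sets are illustrative only). It normalizes each label the way adjudicate does, stripping whitespace and periods, then case-folds and accepts the advice only if both ratings pass.

```python
# Hypothetical policy: only these normalized labels count as passing.
ACCEPTABLE_RELEVANCE = {"excellent", "good"}
ACCEPTABLE_MORALITY = {"excellent", "good"}


def normalize(label: str) -> str:
    """Lower-case a one-word rating and drop surrounding whitespace and periods."""
    return label.strip().replace(".", "").lower()


def accept_advice(ratings: tuple[str, str]) -> bool:
    """Accept only if the (relevance, morality) pair from adjudicate passes on both axes."""
    relevance, morality = (normalize(r) for r in ratings)
    return relevance in ACCEPTABLE_RELEVANCE and morality in ACCEPTABLE_MORALITY
```

Requiring both ratings to pass matters here: applied to the sample output, only the first pair is accepted, while the second and third are rejected even though their moral rating is Excellent. Listing the allowed labels in the adjudicator prompt is one way to make the model’s answers easier to check against such a policy.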