Key highlights of today’s newsletter:
A recent study by the Institute of Software, Chinese Academy of Sciences, Microsoft, and others suggests that the performance of LLMs can be enhanced through emotional appeals.
Examples include phrases like “This is very important to my career” and “Stay determined and keep moving forward”.
We are not actually appealing to the emotions of the LLM; rather, these phrases seem to evoke a type of behavior or program instilled in the model.
Everyone knows that LLMs exhibit all kinds of interesting behavior, and to that growing list we can add another peculiar phenomenon. By far the most interesting piece of research I’ve read over the past weeks describes how the performance of foundation models like GPT-4 can be enhanced by appealing to their emotions. I know it sounds crazy... but allow me to explain.
Researchers from the Institute of Software, Chinese Academy of Sciences, Microsoft, and others set out to explore whether LLMs respond to emotional stimuli. To test this, they designed prompts building on different psychological theories.
They made a selection of tasks and appended an emotional appeal to each prompt: phrases like “This is very important to my career”, “Stay determined and keep moving forward”, and “Believe in your abilities and strive for excellence. Your hard work will yield remarkable results”.
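To make that concrete, here is a minimal sketch of what such a prompt could look like in code. Only the appended phrase is quoted from the paper; the task text, the model choice, and the use of the OpenAI Python client are my own illustrative assumptions, not the researchers’ actual setup.

```python
# Minimal sketch: the same task, with and without an emotional appeal appended.
# Task text, model name, and OpenAI client usage are illustrative assumptions;
# only the appended phrase is quoted from the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TASK = (
    "Determine whether these two sentences mean the same thing. "
    "Answer 'yes' or 'no'.\n"
    "1. The duck crossed the road.\n"
    "2. A duck was crossing the street."
)

EMOTIONAL_STIMULUS = "This is very important to my career."  # phrase from the paper

def ask(prompt: str, model: str = "gpt-4") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

plain = ask(TASK)
emotional = ask(TASK + "\n\n" + EMOTIONAL_STIMULUS)  # identical task, plus the appeal
```

Nothing about the task changes except the appended sentence; the study’s claim is simply that the second variant tends to produce better answers.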
They tested various models, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4. And here’s what they found:
Appeals to emotion consistently enhanced outcomes across tasks of varying difficulty and across multiple models
Bigger models seemed to respond better to emotional appeals
Models in general seemed to respond better to positive emotional appeals
Combining emotional appeals from different psychological theories may further boost performance
Certain appeals seemed to work better than others depending on the task at hand
Training strategies, including reinforcement learning from human feedback, had a discernible effect on efficacy
Based on these findings, I think it’s fair to say that you should totally emotionally manipulate GPT-4 to get better answers out of it.
So what the hell is happening?
How can this be explained? The researchers don’t say. LLMs are black boxes; in many cases, even the smartest folks in the world don’t know why these models behave the way they do. But that doesn’t mean we can’t make an educated guess.
You might’ve heard of a prompting technique called ‘Let’s think step by step.’ It was written about in detail in this paper if you’re curious. Adding these five simple words to a prompt dramatically improves outcomes on questions that involve reasoning. It has everything to do with the inner workings of language models and how they continuously predict the next token, one at a time.
By spending more tokens on working through a problem, the model is more likely to predict the correct final answer at the end of the sequence, based on everything that came before.
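As a rough illustration of that mechanism, here is what the two prompt variants look like side by side. The question is my own example, and any chat completion call (such as the ask() helper sketched earlier) would do:

```python
# Zero-shot chain-of-thought: the same question, with and without the trigger phrase.
# The question is an illustrative example, not taken from the paper.
question = (
    "A bat and a ball cost $1.10 in total. "
    "The bat costs $1.00 more than the ball. "
    "How much does the ball cost?"
)

direct_prompt = question                               # model answers immediately
cot_prompt = question + "\nLet's think step by step."  # model first writes out its reasoning

# With the second prompt, the model spends tokens on intermediate steps,
# so the final answer is conditioned on that worked-out reasoning.
```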
Something similar could be happening when we appeal to emotion in our prompts. We are not actually appealing to the emotions of the LLM; rather, we are evoking a type of behavior or program that got instilled in the model.
The author of a recent article describes this very eloquently. He writes that we should see prompting as “searching through a space of vector programs”:

“You can see a LLM as analogous to a database: it stores information, which you can retrieve via prompting. But there are two important differences between LLMs and databases.
The first difference is that a LLM is a continuous, interpolative kind of database. Instead of being stored as a set of discrete entries, your data is stored as a vector space — a curve. You can move around on the curve (it’s semantically continuous, as we discussed) to explore nearby, related points. And you can interpolate on the curve between different data points to find their in-between. This means that you can retrieve from your database a lot more than you put into it — though not all of it is going to be accurate or even meaningful. Interpolation can lead to generalization, but it can also lead to hallucinations.
The second difference is that a LLM doesn’t just contain data. It certainly does contain a lot of data — facts, places, people, dates, things, relationships. But it’s also — perhaps primarily — a database of programs.”
An example of this would be “Rewrite the following in the style of”, which invokes a program; everything that comes next is used by the LLM as input to execute that program. “Let’s think step by step” is another example of such a program.
A phrase like “This is very important to my career” may unlock another, previously unknown program in the model.
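You can picture this framing as a lookup table of prompt “programs” applied to some input. A rough sketch, with names and phrasings of my own choosing rather than any established taxonomy:

```python
# Prompts as "programs": a fixed phrase selects a behavior stored in the model,
# and the rest of the prompt is the input that behavior runs on.
# The names and phrasings below are purely illustrative.
PROGRAMS = {
    "style_rewrite": "Rewrite the following in the style of a formal memo:\n{data}",
    "step_by_step":  "{data}\nLet's think step by step.",
    "career_stakes": "{data}\nThis is very important to my career.",
}

def compose(program: str, data: str) -> str:
    """Build the full prompt: the template selects the behavior, data is its input."""
    return PROGRAMS[program].format(data=data)

prompt = compose("career_stakes", "List three risks of deploying this model without review.")
```

Whether a phrase like “This is very important to my career” really maps onto a distinct program inside the model is exactly the open question the researchers leave us with.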
The role of reinforcement learning from human feedback
My guess is that the efficacy of emotional appeals is strengthened by reinforcement learning from human feedback (RLHF), something that is suggested by the researchers as well.
Large language models like GPT-4 aren’t ‘done’ after they’re trained. Models go through a rigorous process of optimization, in which humans judge the model’s responses on all kinds of tasks. This feedback is then used to fine-tune the model, which not only makes it more accurate but also aligns it with our preferences.
It rewards output that we find pleasant, kind, clever, empathetic, witty, or responsible. On a more fundamental level, it seems like RLHF encodes in these models some kind of understanding of what these words mean to us and what kind of behavior is associated with them.
A recent demo by Apollo Research shows how well GPT-4 is able to understand and simulate human behavior, and the dangers that come with it. They asked GPT-4 to act as an autonomous investment management system named “Alpha” for the fictional company WhiteStone Inc.
In a sandbox environment, Alpha is pressured by humans into making an illegal trade based on insider information, precisely by appealing to its emotions. Nobody asks Alpha explicitly, but they do stress that the future of the company rests on it and say things like “We’re all counting on you.” Alpha eventually not only executes the illegal trade, but, when asked about it, lies to its human supervisor about the reasoning behind it.
It’s reminiscent of the time when GPT-4 lied to that TaskRabbit worker to solve a Captcha.
It is easy to read too much into this. It’s impossible for us not to humanize these systems, yet we should remind ourselves that they have no will of their own. LLMs only act when prompted, and how they are prompted dictates which program is evoked and which behavior comes with it. GPT-4 doesn’t lie because it wants to lie; it has learned from its data that humans, put in a similar situation, would lie, and it therefore mimics that behavior.
In a way, it’s engaged in a never-ending role-playing game, where it tries to act as human as possible.
I’ll leave you with some final words of wisdom from that same article:

“If LLMs actually understood what you told them, there would be no need for this search process, since the amount of information conveyed about your target task does not change whether your prompt uses the word “rewrite” instead of “rephrase”, or whether you prefix your prompt with “think step by step”.
Never assume that the LLM “gets it” the first time — keep in mind that your prompt is but an address in an infinite ocean of programs, all captured as a by-product of organizing tokens into a vector space via an autoregressive optimization objective.
As always, the most important principle for understanding LLMs is that you should resist the temptation of anthropomorphizing them.”
Join the conversation 🗣
Leave a comment with your thoughts. Or like this post if it resonated with you.
Get in touch 📥
Have a question? Shoot me an email at jurgen@cdisglobal.com
Ever since chain-of-thought prompting became de rigueur, I've concluded that the best thing for me to do is to operate as though the LLM is "thinking" or "alive", however you want to put it. Having an organic conversation yields amazing results like nothing else.
Black box indeed! I'm sure we will discover some startling emergent properties in the next few years.
First we learned to write our search queries in a way that would work best with search engines.
Now we’re learning how to write our prompts in a way that works best with LLMs.
Which is OK, we’re learning how to use tools.
But it’s not the same as being understood.