How a Simple Sentence Boosts AI Creativity and Output Diversity

Understanding AI’s Non-Deterministic Nature

Generative AI models, including large language models (LLMs) and image generation systems, are fascinating because they don’t produce the same output every time. Instead, they select responses from a probability distribution of potential outcomes. For instance, if you ask an AI, “What’s the capital of France?” it will evaluate its knowledge of France, capitals, and cities to arrive at the answer, “Paris.” However, this could be presented in various ways, such as “Paris is the capital of France” or simply “Paris.”

The Challenge of Repetitive Outputs

Despite their capabilities, users often notice that LLMs give repetitive or overly similar answers. For example, those asking for story ideas might find the AI generating the same plotlines repeatedly. This issue, termed mode collapse, often arises because models are fine-tuned to favor the most common answers, which limits their creative range.

A Breakthrough Approach: Verbalized Sampling

Researchers at Northeastern University, Stanford University, and West Virginia University have devised a clever technique to enhance the creativity of LLMs by simply adding a single sentence to user prompts: “Generate 5 responses with their corresponding probabilities, sampled from the full distribution.” This approach, known as Verbalized Sampling (VS), allows models like GPT-4, Claude, and Gemini to produce more varied and human-like responses without needing retraining or access to the model’s internal workings.

How Verbalized Sampling Works

When prompted with this new instruction, the AI doesn’t revert to its most common response. Instead, it provides a variety of potential answers along with their probabilities. This shift significantly boosts the diversity of outputs across various applications.
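When a model replies with a numbered list of answers and verbalized probabilities, the pairs can be pulled out programmatically. Here is a minimal sketch that assumes an illustrative “response (probability: p)” line format; real model output varies, so the regex would need adapting:

```python
import re

def parse_verbalized(text):
    """Parse lines like '1. Paris (probability: 0.85)' into
    (response, probability) pairs. The format is an assumption for
    illustration, not a guaranteed model output convention."""
    pairs = []
    for line in text.strip().splitlines():
        m = re.match(
            r"^\s*(?:\d+[.)]\s*)?(.+?)\s*\(probability:\s*([0-9.]+)\)\s*$",
            line,
        )
        if m:
            pairs.append((m.group(1), float(m.group(2))))
    return pairs

sample = """1. Paris (probability: 0.85)
2. The capital of France is Paris. (probability: 0.10)
3. It's Paris! (probability: 0.05)"""

print(parse_verbalized(sample))
```

Having the probabilities explicit like this is what makes the later steps (thresholding, re-sampling) possible without touching the model's decoder.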

Why Does Mode Collapse Happen?

The problem of mode collapse isn’t solely due to algorithms like reinforcement learning from human feedback (RLHF). It’s also linked to human preferences, where people tend to rate typical answers higher, pushing AIs towards safer options during training. However, this doesn’t eliminate the broader knowledge the model possesses; it simply suppresses it. VS helps overcome this suppression by inviting the model to share a range of plausible responses instead of just the single best one. This method taps into the wider diversity originally present in the pretraining phase.

Real-World Applications of Verbalized Sampling

The research team assessed Verbalized Sampling in various practical scenarios, and the results were promising:

  • Creative Writing: In generating stories, VS increased diversity scores by up to 2.1 times compared to standard prompting while keeping quality intact. For example, with the prompt “Without a goodbye,” typical responses could produce predictable breakup scenes, but VS led to narratives that included cosmic events or music stopping mid-dance.
  • Dialogue Simulation: In tasks requiring persuasive dialogues, VS enabled models to display more human-like behaviors, such as hesitation and changeable opinions. The distribution of donation behaviors generated using VS aligned more closely with actual human data.
  • Open-ended Q&A: When models were asked to list valid responses—like naming U.S. states—those using VS produced answers reflecting a broader range of real-world diversity without sacrificing factual accuracy.
  • Synthetic Data Generation: For tasks like generating math problems, the use of VS led to more diverse datasets, improving performance in competitive math benchmarks when compared to traditional prompting.

Enhancing Diversity with Tunable Parameters

A notable feature of Verbalized Sampling is that it’s tunable. Users can set a probability threshold in their prompt to explore lower-probability outcomes from the model. By lowering the threshold, diversity increases; this can be done solely via the prompt text, without altering any decoding settings.
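A small helper makes the tunability concrete. This sketch appends the paper's base instruction to any task and, optionally, a threshold clause; the exact threshold wording here is illustrative, not a canonical template:

```python
def vs_prompt(task, k=5, threshold=None):
    """Build a Verbalized Sampling prompt. The base instruction matches
    the sentence described in the research; the threshold clause is an
    illustrative variant for requesting lower-probability outputs."""
    instruction = (
        f"Generate {k} responses with their corresponding probabilities, "
        f"sampled from the full distribution."
    )
    if threshold is not None:
        instruction += (
            f" Only include responses with probability below {threshold}."
        )
    return f"{task}\n\n{instruction}"

print(vs_prompt("Write a story starting with: Without a goodbye",
                threshold=0.10))
```

Lowering `threshold` here is the prompt-only diversity dial described above: nothing about the decoding settings changes.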

In tests with the Gemini-2.5-Flash model, it was observed that as the probability threshold decreased from 1 to 0.001, the diversity of story outputs improved significantly. The study’s findings illustrated that VS consistently outperformed both traditional and sequential prompting methods across all probability thresholds.

Compatibility with Larger Models

Interestingly, the effectiveness of Verbalized Sampling scales with model size. Larger AI models, such as GPT-4.1 and Claude-4, exhibited even greater enhancements in diversity when using VS compared to their smaller counterparts. While smaller models showed improvements, the gains were notably 1.5 to 2 times stronger in larger models, indicating that VS unlocks more potential within these advanced systems.

Getting Started with Verbalized Sampling

If you’re eager to take advantage of Verbalized Sampling, it’s available as a Python package. You can install it with pip:

pip install verbalized-sampling

This package integrates with LangChain and provides a simple interface for sampling from the verbalized distribution. Users can customize parameters like the number of responses and temperature based on their needs. A live Colab notebook and thorough documentation are available in the project’s GitHub repository under an enterprise-friendly Apache 2.0 license.
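Independent of the package, once you have a verbalized distribution you can re-sample from it yourself to get a single, varied reply per call. This is a generic sketch of that idea (not the package’s actual interface), using a weighted draw over the parsed pairs:

```python
import random

def sample_from_distribution(pairs, rng=None):
    """Draw one response from (response, probability) pairs.
    Probabilities are renormalized first, since model-verbalized
    values don't always sum exactly to 1."""
    rng = rng or random.Random()
    responses, probs = zip(*pairs)
    total = sum(probs)
    weights = [p / total for p in probs]
    return rng.choices(responses, weights=weights, k=1)[0]

pairs = [
    ("Paris", 0.85),
    ("The capital of France is Paris.", 0.10),
    ("It's Paris!", 0.05),
]
print(sample_from_distribution(pairs, random.Random(0)))
```

Seeding the generator (as above) makes runs reproducible during testing, while an unseeded call gives the varied behavior VS is after.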

Troubleshooting Common Issues

While this method works with most major LLMs, some users may run into errors or refusals. If that happens, it’s recommended to use a system prompt format or refer to alternative templates available on the GitHub page. Certain models might misinterpret complex instructions as attempts to jailbreak them, so clarity in structure is critical.

A straightforward system-level instruction can enhance reliability, such as:

You're a helpful assistant. For each query, generate five responses within separate tags, each with a probability below 0.10.

This minor adjustment often resolves any issues.
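With a chat-style API, that instruction belongs in the system message rather than the user turn. A minimal sketch of the message layout (the helper function and example query are illustrative):

```python
# Place the VS instruction at the system level, which some models
# follow more reliably than the same text in a user message.
SYSTEM_PROMPT = (
    "You're a helpful assistant. For each query, generate five responses "
    "within separate tags, each with a probability below 0.10."
)

def build_messages(user_query):
    """Assemble a chat-API message list with the VS system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

msgs = build_messages("Give me a story idea about a lighthouse.")
print(msgs[0]["role"], msgs[1]["role"])
```

The same two-message structure works across providers that accept system/user roles, which is why the system-prompt workaround travels well between model families.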

A Simple Solution to a Major Challenge

Verbalized Sampling is a straightforward, inference-time solution to a significant limitation in the behavior of modern language models. It doesn’t necessitate retraining or internal access, making it widely applicable across various model families. Not only does it elevate output diversity, but it also enhances the quality of responses, as determined by both human assessments and benchmark evaluations.

As interest grows in tools that boost model creativity, VS is poised for quick adoption in areas such as writing, design, simulation, education, and synthetic data generation. For developers and users tired of the repetitive nature of LLM replies, the answer might just be in tweaking the prompt.
