Testing the extremes

Core feature adoption is the key to showcasing immediate value to users.

At Jenni, ~40% of new users send an AI Chat message. After making significant improvements to the feature, we wanted to drive more new users to try it.

Here is the sequence of experiments we ran that led to a counterintuitive insight.

Conventional wisdom

We set out with the following hypothesis:

More clickable empty state chat questions improve chat activation

We tend to believe that if users aren't engaging with a feature, making it more visible in the UI's visual hierarchy is a quick, easy win: add color, increase affordance, make it impossible to miss.

As a low-effort experiment, this made perfect sense for our chat activation problem.

For context, we defined chat activation as a new user sending their first chat message within 24 hours of signup.
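As a minimal sketch of that metric definition (the event names like `signup_ts` and `first_chat_ts` are hypothetical, not our actual event schema):

```python
from datetime import datetime, timedelta

# Hypothetical per-user event timestamps; not our actual schema.
users = {
    "u1": {"signup_ts": datetime(2024, 1, 1, 9, 0),
           "first_chat_ts": datetime(2024, 1, 1, 15, 0)},  # within 24h
    "u2": {"signup_ts": datetime(2024, 1, 1, 9, 0),
           "first_chat_ts": datetime(2024, 1, 3, 9, 0)},   # too late
    "u3": {"signup_ts": datetime(2024, 1, 2, 9, 0),
           "first_chat_ts": None},                         # never sent one
}

def is_activated(events, window=timedelta(hours=24)):
    """A user counts as activated if their first chat message
    lands within the window (24h) after signup."""
    first = events["first_chat_ts"]
    return first is not None and first - events["signup_ts"] <= window

activation_rate = sum(is_activated(e) for e in users.values()) / len(users)
```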

We believed more prominent UI would drive more clicks of empty state options, which in turn would drive activation.

The first experiment

We tested three new variants against our ghost button control. Each variant had progressively more visual affordance:

  • Control: Ghost buttons
  • Variant A: Outline buttons
  • Variant B: Outline buttons with colored icons
  • Variant C: Outline group with supporting text
[Image: empty states in AI Chat, variants shown left to right from control to variant C]

All of the test variants were objectively more visible than the control. All of them should have performed better.

Here’s what happened:

  Metric                      Variant    Conversion rate  Improvement
  Empty state button clicked  control    11.44%           (baseline)
                              variant-a  13.60%           +18.84%
                              variant-b  19.58%           +71.12%
                              variant-c  17.89%           +56.30%

We were correct in suggesting that more clickable buttons would indeed be clicked more.

However, going back to our hypothesis ("More clickable empty state chat questions improve chat activation"), we needed to look at the chat-messages-sent data:

  Metric             Variant    Value   Delta
  Chat message sent  control    40.59%  (baseline)
                     variant-a  38.09%  -6.16%
                     variant-b  35.49%  -12.57%
                     variant-c  37.14%  -8.50%

  (8,500 chat message sent events)
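For reference, each Delta above is just the relative change in the rate versus control. A quick sketch (tiny discrepancies versus the published figures come from the underlying rates being rounded to two decimals):

```python
def relative_delta(variant_rate, control_rate):
    """Relative change versus control, expressed as a percentage."""
    return (variant_rate - control_rate) / control_rate * 100

control = 40.59  # control's chat-message-sent rate, in percent
deltas = {name: relative_delta(rate, control)
          for name, rate in {"variant-a": 38.09,
                             "variant-b": 35.49,
                             "variant-c": 37.14}.items()}
# e.g. variant-a: (38.09 - 40.59) / 40.59 * 100 ≈ -6.16
```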

There was an inverse relationship between empty state button usage and feature activation.

The control UI, a ghost button that could easily be mistaken for plain text, was activating users better than everything we designed to replace it.

This result was confusing.

The design treatment caused more users to interact with the button but hurt our overall activation metric.

The second experiment

If more visible UI performed worse, what happens if we go in the opposite direction?

The assumption underpinning our entire experiment was: giving users a click entry point must be better than not giving them one.

The first result unraveled this assumption: we observed a negative relationship between empty state usage and activation.

Since more prominent prompts decreased activation, we hypothesized that the prompts themselves might be the problem, not their visibility. If users were being constrained or confused by generic suggestions, perhaps no prompts would allow them to engage more freely.

We decided to test the extreme: a completely empty state.

The team was skeptical. But the first experiment had already violated our expectations, so we had nothing to lose.

We ran a three-way test:

  • Control: Ghost buttons
  • Variant B: Outline buttons with colored icons
  • Variant Empty: Nothing. Completely empty state.

  Metric             Variant        Value   Delta
  Chat message sent  control        40.14%  (baseline)
                     variant-b      37.46%  -6.66%
                     variant-empty  42.40%  +5.64%

  (8,000 chat message sent events)
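A sample of this size is large enough to sanity-check whether a lift like +5.64% is real rather than noise. One common check is a two-proportion z-test; the per-arm counts below are back-of-envelope assumptions derived from the stated rates (the post only reports rates and a total event count), not exact figures from our experiment:

```python
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Normal-CDF tail via erf; p = 2 * P(Z > |z|)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical per-arm counts, back-solved from the stated rates:
n_per_arm = 6500
empty_successes = 2756    # ≈ 42.40% of 6,500
control_successes = 2609  # ≈ 40.14% of 6,500
z, p = two_proportion_z(empty_successes, n_per_arm,
                        control_successes, n_per_arm)
```

Under these assumed counts the difference clears conventional significance thresholds, but the real check depends on the actual per-arm sample sizes.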

The empty state won.

[Image: the completely empty state]

But, why?

Here’s my hypothesis: non-contextual prompts create constraint, not activation.

Generic prompts like "Ask about your document" or "Try asking a question" aren't helpful, because they're unaware of the current state of the user's work. This is especially true for new users, who may not yet have written or imported any content.

Everyone has tried ChatGPT; users already know they can ask questions in chat UIs.

Context changes everything

Before you conclude that empty states are always better, we had a more nuanced learning: in our PDF viewer, empty state prompts did lift activation.

When users uploaded a research paper, we generated questions derived directly from that specific document: “What are the limitations of the study according to [author]?” or “How does this paper define [key concept]?”

These worked because they demonstrated immediate, contextually relevant value. They proved the AI had already indexed the user’s document and could surface novel insights. Generic prompts like “Ask about your document…” provide no such proof, especially for new users who may not have any content.

A reminder to test all assumptions

When an experiment violates your expectations, don’t just iterate on the same axis. Test the extreme in the opposite direction.

Here’s the approach that worked for us:

  1. Identify the underlying assumption. What do you believe must be true? In our case it was "visible prompts must be better than no prompts at increasing the % of new users who send a chat message."

  2. Look for signals that violate it. Our “worst” UI (ghost buttons) outperformed our “better” UI (more prominent CTAs)

  3. Test the opposite extreme. If more visibility hurts, what happens with zero visibility?

  4. Analyze by context. When does the pattern hold? When does it break? (e.g. Generic vs. contextual prompts)

  5. Extract the principle. Where else could this learning apply?

Test the extreme. You might discover that the assumption itself was wrong.