Min-Kyu Jung

LLM user experiences

April 2024

Why do so many AI products with great demos fail to get meaningful long-term adoption?

One UX mistake is building a Magic Button product. Many AI demos follow the same pattern: the user takes a single action (clicking a button, typing a query) that triggers a series of events and ultimately generates some kind of output. These demos elicit a lot of “wows” and exploding-head emojis.

The main problem with this approach is that LLMs don't have enough nines of reliability to handle multiple chained queries: errors compound with every call, so a pipeline of several LLM steps fails far more often than any single step does. These products end up too fragile for most real-life use cases.
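To make the compounding concrete, here's a back-of-the-envelope sketch (the 95% per-step success rate and the step counts are illustrative assumptions, not benchmarks):

```python
# Why chained LLM calls are fragile: if each step succeeds independently
# with probability p, a pipeline of n steps succeeds with probability p**n.
def pipeline_reliability(p_per_step: float, n_steps: int) -> float:
    """End-to-end success rate of a chain of independent steps."""
    return p_per_step ** n_steps

for n in (1, 3, 5, 10):
    print(f"{n:>2} steps at 95% each -> {pipeline_reliability(0.95, n):.0%} end-to-end")
# prints roughly 95%, 86%, 77%, 60% respectively
```

Even a step that's right 95% of the time drags a ten-step chain down to about 60% end-to-end.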

Users can tolerate inaccuracy, but the user experience needs to either (1) be completely seamless and lightning-fast (the trade-off is usually even lower accuracy); or (2) provide a lot of legibility into the machinery that creates the outputs.

Users don't like black-box AI experiences. We often analogize LLM products to junior assistants, but you can at least ask a human junior assistant why they did something, which gives you more confidence in judging whether you agree with their work product.

OTOH, products in the Magic Button genre produce mysterious, inscrutable outputs. They also tend to handle error cases poorly: “INSUFFICIENT DATA FOR MEANINGFUL ANSWER” (to borrow from Asimov's “The Last Question”) is a better response than some AI slop that you need to decipher before you can dismiss it.
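A minimal sketch of that idea, explicit abstention instead of low-confidence slop; `retrieve_evidence`, `call_llm`, and the 0.5 threshold are all hypothetical stand-ins, not any particular library's API:

```python
# If the system can't ground an answer, it should say so plainly rather
# than emit low-confidence output the user has to debunk themselves.

def retrieve_evidence(query: str) -> tuple[str, float]:
    """Stub: returns (evidence, confidence). Replace with real retrieval."""
    return "", 0.0

def call_llm(prompt: str) -> str:
    """Stub for a model call; replace with a real client."""
    return f"[model output for: {prompt}]"

def answer(query: str) -> str:
    evidence, confidence = retrieve_evidence(query)
    if confidence < 0.5:  # illustrative threshold
        return "Insufficient data for a meaningful answer."
    return call_llm(f"Answer using only this evidence:\n{evidence}\n\nQuestion: {query}")
```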

One way to solve this problem is to offer a lower-level product experience. Users have to make additional clicks, but in exchange each manual action gives them fine-grained control over how responses are generated, which makes the relationship between their inputs and the LLM's outputs much clearer.
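Here's a hedged sketch of the two patterns side by side; `call_llm` and `confirm_with_user` are hypothetical stubs standing in for your model client and your UI layer, not a real API:

```python
def call_llm(prompt: str) -> str:
    """Stub for a model call; replace with a real client."""
    return f"[model output for: {prompt}]"

def confirm_with_user(label: str, draft: str) -> str:
    """Stub for a UI checkpoint; a real app would let the user edit `draft`."""
    edited = input(f"{label} (press Enter to accept):\n{draft}\n> ")
    return edited or draft

def magic_button(query: str) -> str:
    # One click, zero legibility: three chained calls, all hidden.
    plan = call_llm(f"Plan the steps to answer: {query}")
    draft = call_llm(f"Execute this plan: {plan}")
    return call_llm(f"Polish this draft: {draft}")

def legible_flow(query: str) -> str:
    # More clicks, more control: each intermediate output is surfaced,
    # so the user can correct a bad step before it compounds downstream.
    plan = confirm_with_user("Proposed plan", call_llm(f"Plan: {query}"))
    draft = confirm_with_user("Draft answer", call_llm(f"Execute: {plan}"))
    return call_llm(f"Polish this draft: {draft}")
```

The legible flow costs two extra interactions, but a bad plan gets caught at step one instead of surfacing as an inscrutable final answer.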

This won't be the case forever; GPT-5-level models may be enough to make Magic Button products viable. But for now, we work with what we have.