The graveyard of failed AI features is full of chatbots nobody asked for.
Every product team I talk to has the same story: leadership watched a GPT demo, got excited, and mandated “we need AI in the product.” Three months later, there’s a chatbot in the corner of the app that 3% of users have tried and 0.5% use regularly.
As CPO at Entropy Labs, I’ve been on both sides of this. I’ve built AI features that users loved and killed features that seemed brilliant in demos but died in production. Here’s what I’ve learned about the difference.
The Chatbot Trap
Why does everyone default to chatbots? Because they’re visible. They feel “AI-y.” Stakeholders can immediately understand what changed.
But chatbots are a terrible default for most products:
1. They require users to change behavior. Your users learned your product’s UI. Now you’re asking them to type natural language queries instead of clicking buttons. That’s friction, not improvement.
2. Conversational interfaces are slow. Typing “show me last month’s sales” is slower than clicking a date picker and a report button. You’ve made the product worse.
3. Error handling is awkward. When a form validation fails, you show a red border. When a chatbot misunderstands, you get an uncanny valley response that erodes trust.
4. The happy path is narrow. Chatbots demo well because demos follow scripts. Real users ask questions your training data never anticipated.
This isn’t to say chatbots are always wrong. Customer support, complex search interfaces, and genuinely conversational use cases benefit from them. But “we should add AI” shouldn’t automatically mean “we should add a chatbot.”
AI Features That Actually Work
The best AI features are invisible. They make the product smarter without making users work harder.
Smart Defaults and Autocomplete
This is the highest-ROI AI feature for most products. Example:
Without AI: User fills out a 12-field form to create a new report.
With AI: Based on their past behavior, 8 fields are pre-populated. They adjust 2-3 things and submit.
You’ve cut the work from twelve fields to two or three without teaching them a new interaction model. Gmail’s Smart Compose is the canonical example: it doesn’t ask users to chat, it just helps them type faster.
At Entropy Labs, we added predictive field completion to our data entry flows. Form completion time dropped 40%. Users didn’t need a tutorial; the feature just appeared and helped.
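To make the pattern concrete, here’s a minimal sketch. It uses a plain frequency heuristic rather than a model, because the interaction pattern matters more than the predictor; the function and field names are invented for illustration:

```python
from collections import Counter

def suggest_defaults(past_submissions: list[dict], min_confidence: float = 0.6) -> dict:
    """Suggest a default for each form field from the user's own history.

    A field only gets a default when one value clearly dominates;
    ambiguous fields stay blank so users aren't fighting bad guesses.
    """
    if not past_submissions:
        return {}
    defaults = {}
    fields = {key for submission in past_submissions for key in submission}
    for field in sorted(fields):
        values = [s[field] for s in past_submissions if field in s]
        top_value, count = Counter(values).most_common(1)[0]
        if count / len(past_submissions) >= min_confidence:
            defaults[field] = top_value
    return defaults

# A user who files roughly the same report every time:
history = [
    {"region": "EMEA", "format": "PDF", "period": "monthly"},
    {"region": "EMEA", "format": "PDF", "period": "monthly"},
    {"region": "EMEA", "format": "CSV", "period": "monthly"},
    {"region": "EMEA", "format": "XLSX", "period": "weekly"},
]
print(suggest_defaults(history))
# {'period': 'monthly', 'region': 'EMEA'} ('format' is too split to pre-fill)
```

The confidence threshold is the design choice that matters: a blank field is mildly annoying, but a wrong default the user has to notice and undo erodes trust faster than no default at all.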
Background Processing and Summarization
Instead of asking users to request summaries, generate them proactively:
- When a document is uploaded, automatically extract key entities and surface them
- After a meeting recording is processed, show highlights without being asked
- When data changes significantly, notify users with a human-readable summary
The pattern: do the AI work before the user asks. They open their dashboard and insights are already there.
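Structurally, that looks something like the sketch below; the two placeholder functions stand in for real model calls:

```python
PRECOMPUTED: dict[str, dict] = {}  # in practice, a database table or cache keyed by document

def summarize(text: str) -> str:
    return text[:120]  # placeholder: a real system would call a model here

def extract_entities(text: str) -> list[str]:
    # placeholder: naive "capitalized word" heuristic standing in for a model call
    return sorted({w.strip(".,") for w in text.split() if w[:1].isupper()})

def on_document_uploaded(doc_id: str, text: str) -> None:
    """Runs at ingest time, e.g. from a background job queue, not on page load."""
    PRECOMPUTED[doc_id] = {
        "summary": summarize(text),
        "entities": extract_entities(text),
    }

def render_dashboard(doc_id: str) -> dict:
    """By the time the user looks, the insight already exists. No spinner."""
    return PRECOMPUTED.get(doc_id, {"summary": None, "entities": []})
```

The point is where the model runs: at write time (upload, ingest), so reads are instant and the user never waits on inference.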
Anomaly Detection That Surfaces Action
Most AI anomaly detection is useless because it surfaces noise. “Revenue increased 12%!” Great, that’s normal seasonality.
Useful anomaly detection:
- Filters out known patterns (seasonality, promotions, etc.)
- Ranks by business impact, not statistical significance
- Suggests specific actions, not just observations
We rebuilt our alerting system around this. Instead of “metric X changed,” users see “Conversion rate dropped 18% in the mobile checkout flow—this is unusual and affecting ~$X in daily revenue. The drop started after deploy #1234.”
That’s actionable. That’s worth interrupting someone for.
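Putting the three criteria together, the ranking logic itself is simple. This sketch uses illustrative fields and thresholds, not our production values:

```python
from dataclasses import dataclass

@dataclass
class Anomaly:
    metric: str
    observed_pct: float           # observed change in the metric
    expected_pct: float           # change explained by seasonality, promotions, etc.
    daily_revenue_at_risk: float  # estimated business impact
    suggested_action: str

def rank_alerts(anomalies: list[Anomaly],
                min_unexplained_pct: float = 5.0,
                min_daily_impact: float = 500.0) -> list[Anomaly]:
    """Filter out expected movement, then rank by dollars rather than p-values."""
    actionable = [
        a for a in anomalies
        if abs(a.observed_pct - a.expected_pct) >= min_unexplained_pct
        and a.daily_revenue_at_risk >= min_daily_impact
    ]
    return sorted(actionable, key=lambda a: a.daily_revenue_at_risk, reverse=True)

alerts = rank_alerts([
    Anomaly("revenue", 12.0, 11.0, 9000.0, "none"),  # seasonal lift: filtered out
    Anomaly("mobile_checkout_conversion", -18.0, -1.0, 7500.0,
            "review the deploy that preceded the drop"),  # surfaced first
])
```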
Content Generation in Context
“Generate marketing copy” is a feature. “Pre-draft an email based on this customer’s history and your previous conversations” is a useful feature.
The difference:
- Feature: Blank slate + AI = generic output
- Useful feature: Context + AI = relevant output
When AI knows the context—what the user is working on, what they’ve done before, what their goals are—generation becomes useful rather than gimmicky.
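Mechanically, the difference is just what gets assembled into the prompt. A sketch, with hypothetical customer fields:

```python
def build_email_prompt(customer: dict, recent_messages: list[str], goal: str) -> str:
    """Context + AI: the prompt carries what the user is already working on."""
    thread = "\n".join(f"- {m}" for m in recent_messages[-5:])  # last few exchanges only
    return (
        f"Draft a short follow-up email to {customer['name']} "
        f"({customer['plan']} plan, customer since {customer['since']}).\n"
        f"Goal of this email: {goal}\n"
        f"Recent conversation:\n{thread}\n"
        "Match the tone of the recent conversation and reference it where natural."
    )
```

Same model, same API call; the relevance comes entirely from what the product already knows.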
The Build vs. Buy Calculation
Every AI feature has three implementation paths:
1. API calls (fastest, least differentiated)
Use OpenAI/Anthropic/etc. directly; a minimal sketch of this path appears at the end of this section. Good for:
- Quick prototypes
- Features where the AI isn’t the value prop
- When you need to ship fast
Cost: $0.01-0.10 per query, depending on model and token count.
2. Fine-tuning (medium effort, better quality)
Train a model on your data for specific tasks. Good for:
- Domain-specific language (legal, medical, technical)
- Consistent output formatting
- Cost reduction at scale
Cost: training up front, plus inference; per-query inference is often cheaper than calling a larger base model.
3. Build your own (almost never worth it)
Unless you’re at Google/Anthropic scale, don’t. Even if you have ML engineers, their time is better spent on fine-tuning and application logic than training foundation models.
At Entropy Labs, 90% of our AI features are API calls with good prompting. 10% are fine-tuned for specific classification tasks where we needed consistent output format. We haven’t built a model from scratch—and we probably never will.
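For a sense of what that first path looks like in practice, here’s a sketch using the Anthropic Python SDK; the ticket-classification task and prompt are invented for the example:

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def classify_ticket(ticket_text: str) -> str:
    """Path 1: one prompted API call. No training, and it ships in an afternoon."""
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",  # pin whichever model fits your cost/quality point
        max_tokens=5,
        system="Classify the support ticket. Reply with exactly one word: billing, bug, or question.",
        messages=[{"role": "user", "content": ticket_text}],
    )
    return response.content[0].text.strip().lower()
```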
User Research for AI Features
Standard user research methods break down for AI features because users don’t know what they want.
“Would you use an AI feature that [describes feature]?” yields meaningless answers. Users imagine the best-case scenario and say yes. In production, they encounter edge cases and stop using it.
What actually works:
1. Watch behavior, not stated preferences. Add the feature as a quiet experiment. Measure adoption over 4+ weeks (a sketch of this measurement follows the list), not just trial rate. Initial curiosity doesn’t equal sustained usage.
2. Measure task completion, not satisfaction. “Do you like the AI suggestions?” matters less than “Did you complete the task faster with AI suggestions enabled?”
3. A/B test carefully. AI features often have novelty effects. Test against holdout groups and measure over longer periods than you would for UI changes.
4. Listen for specific complaints, not general praise. “This is cool” is noise. “It always gets company names wrong” is signal.
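For the adoption measurement in point 1, a minimal sketch, assuming you log a (user_id, date) event each time the feature is used:

```python
from datetime import date

def weekly_retention(usage_events: list[tuple[str, date]],
                     launch: date, weeks: int = 4) -> list[float]:
    """Fraction of launch-week adopters still active in each following week.

    The trial spike shows up in week 0; stickiness shows up after.
    """
    users_by_week: dict[int, set[str]] = {}
    for user, day in usage_events:
        users_by_week.setdefault((day - launch).days // 7, set()).add(user)

    adopters = users_by_week.get(0, set())
    if not adopters:
        return [0.0] * weeks
    return [
        len(users_by_week.get(week, set()) & adopters) / len(adopters)
        for week in range(weeks)
    ]
```

A flat or rising curve after week 1 means the feature stuck; a curve that decays toward zero is the novelty effect wearing off.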
When to Kill an AI Feature
This is the hardest part. You built something cool. It took months of engineering. Leadership is invested. But nobody’s using it.
Kill signals:
- Adoption flatlines after launch. Initial spike, then nothing. Users tried it, it didn’t stick.
- Engagement is shallow. Users trigger the feature but don’t act on outputs.
- Support tickets mention it negatively. “How do I turn off the AI suggestions?”
- It’s slow and users work around it. They’ve learned to ignore it.
The sunk cost fallacy is real.
We killed an AI feature at Entropy Labs after 4 months of development because adoption was 2% and declining. It hurt. But maintaining unloved features is more expensive than admitting failure.
The reframe: you learned what doesn’t work. That’s valuable. Document it, share it, and don’t repeat it.
Avoiding the AI Winter Within Your Product
Here’s a pattern I’ve seen multiple times:
- Company ships AI features with fanfare
- Features underperform expectations
- Leadership loses enthusiasm
- AI work gets deprioritized
- Good AI opportunities get lumped in with the failed ones
This is the “AI winter” at company scale. How to avoid it:
Start small and prove value. One well-executed smart default beats ten chatbot experiments.
Set realistic expectations. AI features have lower accuracy than traditional software. If stakeholders expect 99.9% accuracy, they’ll be disappointed at 90% (which might be excellent for the use case).
Measure the right things. “Users tried the feature” is not success. “Users completed tasks faster” is success.
Build escape hatches. Every AI feature should have a way to fall back to manual. Users tolerate AI mistakes if they can easily correct them.
The Bottom Line
The best AI product thinking isn’t “where can we add AI?” It’s “where are users doing tedious cognitive work that we could automate?”
That reframe changes everything:
- Instead of chatbots, you build smart defaults
- Instead of generic generation, you build contextual assistance
- Instead of features that demo well, you build features that ship well
The AI hype cycle rewards announcements. Product thinking rewards sustained value. They’re often in tension.
Build for the latter.
