How to Keep Track of A/B Tests and Feature Launches (and Why It Matters)

Gabbie Hajduk

We’re all moving fast. Features are shipped, tests are run, outcomes are shared (briefly), and then we move on. But what happens when the same question comes up again next quarter? Or when a new team member asks why something was built that way? Or worse - when a brilliant idea is put into motion only to discover halfway through: “Wait, didn’t we already try this?”

This blog is about storing what matters - so your product brain doesn’t reset every time the roadmap does.

Why Store Historical A/B Tests and Features?

Think about how many times you’ve heard this in a meeting:

  • “Didn’t we already test that?”
  • “Why did we build this again?”
  • “What did we learn from that launch?”
  • “Wasn’t this already a feature at some point?”

But the more dangerous and less obvious situation is when you’ve run a test in the past, and the results are actually relevant to something you're currently doing - and no one realises.

For example: we once ran a test comparing a slider vs dropdown for an input field on mobile. The dropdown resulted in more submissions overall, but we later discovered that users who used the slider were more likely to book. Why? Because the slider was harder to use - only more committed users completed the form. That learning applied far beyond just that one form. Any future decisions about mobile inputs could've benefited from it - if we had remembered it existed.
In this case, I did remember the test. But I’m sure there are countless others that I haven’t remembered. And that’s exactly the point.

Storing test and feature history is like creating a second brain for your product team. And not just for product - designers, engineers, analysts, and marketers all benefit from it. Everyone involved in shipping features makes faster, better-informed decisions when historical context is available.

So why store it? Beyond avoiding mistakes, here’s what you gain:

For A/B tests:

  • You avoid wasting time revalidating what’s already known.
  • You build confidence in what actually moves the needle.
  • You find connections between behaviours across different surfaces.
  • You identify tests worth revisiting with a different lens or segment.

For features:

  • You know why something exists - and what success looked like.
  • You can trace back the original motivation behind a product decision.
  • You have data to help new team members onboard quickly.
  • You can surface what didn’t work and avoid repeating the mistake.

What Information Should You Capture?

You don’t need to store everything - but you do need to capture the right pieces of information in a structured way.

A/B Test Tracking

For every A/B test, include:

  • Hypothesis - What are you trying to prove or disprove?
    Example: “If we change the CTA from ‘Submit’ to ‘Get My Quote’, more users will convert.”
  • Variants - The exact setups tested.
    Example: A = Old CTA, B = New CTA
  • Primary Metrics - What defines success?
    Example: Click-through rate on the CTA
  • Test Setup - Segment, duration, sample size.
    Example: 30-day test on UK mobile users only.
  • Business Context - What led to this test? What triggered the need for it?
    Example: Conversion rates on mobile were lagging desktop by 15%; this test aimed to close the gap.
  • Outcome - What actually happened?
    Example: Variant B increased clicks by 12% but decreased qualified leads.
  • Decision - What did you do with the result?
    Example: We launched Variant B on desktop only and shelved the mobile change.
  • Follow-up Test Candidates - Are there future variants to explore?
    Example: Combine CTA + form change in future tests.
  • Links - Dashboards, spec documents, mocks.
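
If you keep these fields consistent from one test to the next, the log stays queryable instead of drifting into free-form notes. Purely as an illustration (not tied to any particular tool), here is what one entry could look like as a small Python data structure; the field names are suggestions mirroring the checklist above:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ABTestRecord:
    """One entry in the experiment log, mirroring the checklist above."""
    name: str                     # e.g. "CTA copy: 'Submit' vs 'Get My Quote'"
    hypothesis: str               # what you are trying to prove or disprove
    variants: List[str]           # e.g. ["A: old CTA", "B: new CTA"]
    primary_metric: str           # what defines success, e.g. "CTA click-through rate"
    segment: str                  # who was in the test, e.g. "UK mobile users"
    duration_days: int            # how long the test ran
    business_context: str         # what triggered the test in the first place
    outcome: str                  # what actually happened
    decision: str                 # what you did with the result
    follow_up_ideas: List[str] = field(default_factory=list)  # future variants to explore
    links: List[str] = field(default_factory=list)            # dashboards, specs, mocks
    tags: List[str] = field(default_factory=list)             # e.g. ["checkout", "mobile", "win"]
```

Even if nobody ever writes code around it, treating each entry as a fixed schema like this makes it obvious when a field has been skipped.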

Feature Launch Logs

When you ship a feature - even without a test - store:

  • What launched - Clear title and link to release or ticket.
  • When and by whom.
  • Goal or metric - What was it supposed to move?
  • Target audience or segment.
  • Business Context - Why was this prioritised?
    Example: This update reduces support load from customers asking to edit their bookings.
  • Implementation Notes - Any constraints, shortcuts, tech decisions.
  • Follow-up Plans - Are there known phases or upgrades?
  • Learnings - Did it work? What surprised you?
  • Links - Design docs, dashboards, related tickets.
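
A feature launch entry can follow the same pattern. Again just a sketch, with illustrative field names taken from the checklist above:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class FeatureLaunchRecord:
    """One entry in the feature launch log, mirroring the checklist above."""
    title: str                      # what launched, with a link to the release or ticket
    launched_on: date               # when it shipped
    owner: str                      # who shipped it
    goal: str                       # the metric it was supposed to move
    audience: str                   # target segment
    business_context: str           # why this was prioritised
    implementation_notes: str = ""  # constraints, shortcuts, tech decisions
    follow_up_plans: str = ""       # known phases or upgrades
    learnings: str = ""             # did it work? what surprised you?
    links: List[str] = field(default_factory=list)  # design docs, dashboards, tickets
    tags: List[str] = field(default_factory=list)   # e.g. ["bookings", "support-load"]
```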

How to Use That Information Later

Done right, your test and feature history becomes a treasure trove of insight - not a dusty archive.

It helps you:

  • Avoid running the same test twice.
  • Spot patterns in what tends to work (or flop).
  • Make roadmap planning more confident and context-aware.
  • Back up proposals to leadership with evidence.
  • Strengthen cross-functional collaboration with shared memory.

Real-world example:
A few years ago, we considered a design upgrade for a form that was historically underperforming. The plan required visual finesse, but we didn’t have that skillset in-house at the time - so the idea went dormant. A year later, after hiring a strong UI designer, we approached this area again to ideate on A/B tests.

Once more, the discussion led to the same A/B test hypothesis we had landed on a year prior. Only at the end of the discussion did we all take a step back and think, "Huh, didn't we already come up with this hypothesis a while ago?" We had indeed, but because we had no structured way of tracking our hypotheses and their outcomes, it had become buried in an archived backlog, out of sight and out of mind.

When we finally ran the test, the result was a 20% increase in lead submissions.

A semi-validated idea had been sitting there for a year, with no structured way to surface it when the missing capability finally arrived. We missed out on a full year of performance gains.

Where Should You Store It? Tools + Techniques

You can start small. The tool matters less than the structure, consistency, and ability to retrieve what you need later.

🧰 Low-Tech (Manual, Accessible)

Notion / Confluence / Google Docs
Great for startups or solo PMs. Set up templates for tests or launches and use tags, folders, or status fields to make them searchable.

Pros:

  • Flexible
  • Easy to start
  • No tool learning curve

Cons:

  • Easy to forget or leave half-done
  • Requires discipline and process
  • Searchability and structure depend on setup

💡 One huge pitfall to avoid: defaulting to date-based folder structures like /experiments/YYYY-MM/. This might feel intuitive, but when your archive grows, it becomes harder to surface relevant experiments later.
Better structure: Group by test area or test type.
Examples: “Input Changes”, “Checkout Funnel”, “Pricing Page”, “Homepage Hero”
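
The same logic applies if you ever script over your archive (or just want to sanity-check the structure): the index you actually need is by area, not by month. A minimal sketch, assuming entries shaped like the records above with a tags list:

```python
from collections import defaultdict

def group_by_area(records):
    """Index the archive by area tag rather than by date."""
    index = defaultdict(list)
    for record in records:
        for tag in record.tags:
            index[tag].append(record)
    return index

# index = group_by_area(all_records)
# index["checkout"] -> every checkout-related test or launch, in one place,
# which an /experiments/YYYY-MM/ folder will never give you.
```

The question teams actually ask is "what have we already tried on checkout?", never "what did we try in a given month?" - so structure for the first question.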

📈 Mid-Level Tools (More Structured)

Airtable / Coda / Asana (with custom fields)
Allow for better filtering, tagging, and status-tracking across experiments.

Pros:

  • Customisable schemas
  • Relational data between features, tests, and goals
  • Filterable views

Cons:

  • Harder to maintain over time without owners
  • Setup time investment

🔬 Dedicated Experimentation or Feature Tracking Tools

Statsig, Eppo, Optimizely, Amplitude, PostHog
Purpose-built to connect data pipelines to test setups and outcomes. Good for companies running frequent or complex experiments.

Effective Experiments, LaunchNotes, Productboard
Built for operationalising tests and feature updates across teams. Focus on visibility, sharing, and governance.

Pros:

  • Scalable, structured
  • Connect easily to product analytics
  • Often have collaboration or workflow baked in

Cons:

  • Costs can be high - often suited to enterprise or well-funded teams
  • Steep learning curves in many cases
  • Some tools require technical setup and a data-literate team to use effectively

Best Practices for Making It Useful

  • Use templates for every test and feature.
  • Have a consistent file structure or folder hierarchy.
  • Train your team on how to add entries and where to find past ones.
  • Summarise your learnings at the top of every entry - don’t bury them.
  • Tag everything by area (e.g. signup, checkout), goal (e.g. reduce churn), and outcome (win/loss/inconclusive).
  • Link dashboards, designs, and Jira tickets.
  • Archive older entries but never delete them.
  • Include “future test ideas” or “next steps” to keep momentum.
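
The tagging bullet is the one that pays off most, and also the one that drifts fastest. If your entries live somewhere scriptable, a small consistency check helps keep tags usable. This is a sketch only, assuming records like the ones above and an illustrative area vocabulary you would replace with your own:

```python
from enum import Enum

class Outcome(Enum):
    WIN = "win"
    LOSS = "loss"
    INCONCLUSIVE = "inconclusive"

# Illustrative area vocabulary - adapt to your own product surfaces.
AREAS = {"signup", "checkout", "pricing", "homepage"}

def validate_tags(record):
    """Fail fast when an entry is missing the tags that make it findable later:
    at least one known area tag and exactly one outcome tag."""
    label = getattr(record, "name", None) or getattr(record, "title", "unnamed entry")
    area_tags = [t for t in record.tags if t in AREAS]
    outcome_tags = [t for t in record.tags if t in {o.value for o in Outcome}]
    if not area_tags:
        raise ValueError(f"'{label}' has no recognised area tag")
    if len(outcome_tags) != 1:
        raise ValueError(f"'{label}' needs exactly one outcome tag (win/loss/inconclusive)")
```

Run something like this whenever a new entry lands; an untagged entry is an entry you will never find again.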

What Startups Get Wrong (And How to Start Small)

Startups often skip experiment and feature documentation entirely. Why?

Because they believe:
“We’ll remember.”
“We don’t have time to write this up.”
“We’ll come back to it later.”

But the reality is:

Teams grow quickly. People leave. Memory fades.
You’ll spend more time rediscovering old learnings than it would have taken to document them.
Without a system, good ideas disappear into Slack threads or someone’s laptop.

Start small:

  • Add a “What We’ve Tried” section to your planning docs.
  • Store one-page retros after each test or feature.
    • Include: feature title, short description of what you tested, the results, and the action taken.
  • Use a shared template for test write-ups in Notion or Confluence.

This isn’t bureaucracy - it’s what makes your learning compound instead of disposable.

Final Thought: Don’t Let Your Learning Go to Waste

Every experiment and feature launch is a decision made. A choice. A moment of insight.

But without a system to capture it, that knowledge disappears as soon as the sprint ends or the person who ran it leaves the company.

The good news? You don’t need fancy tools. Just a consistent habit.

Create a space where your team can learn from itself - and make smarter, faster decisions next time around.