Clinical trials and the fragility of “knowledge”

A few months ago, a controversial scientific paper came out. It happens every once in a while – a paper that questions or refutes something thought to have been relatively well established or that shines a light on something we’ve been doing wrong. I’ve written about this kind of stuff before (see here), but this one is a bit different. It calls into question something we’ve taken for granted for the past decade or so – information that forms the basis of treatment decisions affecting patients on a daily basis. More importantly, the story around the paper teaches a few important lessons about how we do research.

One of the treatments for an acute ischemic stroke is intravenous thrombolysis – a medication that helps break up the blood clot that’s blocking an artery and causing the stroke. For about a decade after the medication was approved, it was only used in patients who could be treated within three hours of their stroke symptoms starting. Early studies showed that, when patients are treated after three hours, the risks of the treatment (mostly bleeding, sometimes in the brain) outweigh the benefit.

A study conducted in 2008 changed all that. It showed that thrombolysis is effective and relatively safe up to 4.5 hours after stroke symptoms start. So the guidelines changed – at least in Europe. In the US, the FDA decided not to extend the treatment’s indication for reasons that are not entirely clear – still, even there, the study’s results led to more use of thrombolysis in this “extended” time window (as an “off-label” treatment).

The 2008 study drew some criticism early on, particularly because the two groups (those treated with thrombolysis and those given placebo) weren’t well matched. In this context, that meant that patients in the thrombolysis group had, on average, less severe strokes, and that more patients in the placebo group had already had a stroke before the one that got them into the study. So the argument went: it’s possible that the thrombolysis group did better not because they received the medication, but because they were already less affected by their current stroke and fewer of them had old strokes.

So the authors of this new paper (the one I mentioned at the start – let’s call it the “2020 study”) got ahold of the data from the 2008 study and decided to reanalyze it, taking into account these “baseline differences” that indicate the groups are not “well-matched”. This is a summary of what they found:

Many of the results of the 2008 study could only be reproduced under a set of conditions that were not pre-specified by the investigators of the 2008 study. These included excluding some patients and turning some variables from continuous to categorical (all potentially justifiable things to do). Note that this is separate from the issue of “baseline differences” that were not adjusted for in the 2008 study – this is an attempt to reproduce the exact results of the 2008 study using the data from the 2008 study and the way the authors of the 2008 study reported that the data were analyzed.
After adjusting for the effect of the “baseline differences” that were not adjusted for in the 2008 study, the groups were no longer statistically significantly different in terms of any of the outcomes that the 2020 study authors looked at. The one exception was that the thrombolysis group had more brain bleeds than the placebo group.
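
To make the second finding a bit more concrete, here is a minimal simulation sketch. It is not the 2020 authors' analysis and it uses none of their data. Every number in it is a made-up assumption: a hypothetical outcome score where lower is better, a severity score, and a two-point imbalance that I build into the groups on purpose. The treatment does nothing at all, yet the unadjusted comparison makes it look beneficial, and adjusting for severity makes that apparent benefit disappear.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def compare_estimates(n_per_arm=1000, imbalance=2.0, seed=2008):
    """Return (unadjusted, severity-adjusted) treatment effect estimates.

    All parameters are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    treated, severity, outcome = [], [], []
    for arm in (0.0, 1.0):
        for _ in range(n_per_arm):
            # Assumption: the treated arm starts out less severely affected.
            sev = rng.gauss(10.0 - imbalance * arm, 3.0)
            # The outcome depends on severity only; the true treatment effect is zero.
            out = 0.5 * sev + rng.gauss(0.0, 2.0)
            treated.append(arm)
            severity.append(sev)
            outcome.append(out)

    treated_outcomes = [o for t, o in zip(treated, outcome) if t == 1.0]
    control_outcomes = [o for t, o in zip(treated, outcome) if t == 0.0]
    unadjusted = mean(treated_outcomes) - mean(control_outcomes)

    # Adjusted estimate: remove the part of both treatment and outcome that
    # severity explains, then regress one set of residuals on the other.
    # This gives the same treatment coefficient as a linear regression of
    # outcome on treatment and severity together.
    adjusted = slope(residuals(severity, treated), residuals(severity, outcome))
    return unadjusted, adjusted


if __name__ == "__main__":
    unadjusted, adjusted = compare_estimates()
    print(f"unadjusted difference: {unadjusted:.2f}")
    print(f"adjusted difference:   {adjusted:.2f}")
```

In a real randomized trial the imbalance would arise by chance rather than by design, and the outcome would rarely be this tidy, so treat this as an illustration of the logic rather than a model of the 2008 study.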

This whole debacle brings several issues with the way we do science, particularly science that is used to influence how we treat patients, to the forefront:

First of all, relying on the results of a single study – no matter how large or seemingly robust – to change clinical practice is a bad idea (the authors of the 2020 study mention this as well). Every study has unique factors that threaten either its external or internal validity (sometimes both) and therefore limit the extent to which it can be relied upon to represent some kind of “truth”. This is a really hard pill for most clinicians to swallow. For some of them, it’s because they invest years in designing and conducting trials, many of them honestly doing their very best to come up with robust and reliable evidence. And I’m not saying those efforts are in vain – clearly studies exist on a spectrum of quality, and the decisions that investigators make can greatly influence that quality. But still, no matter how hard we try, there will never be such a thing as a perfect single study – with results that hold true under all circumstances (I don’t mean all conceivable circumstances – even under a particular limited set of circumstances). Even clinicians who aren’t involved in conducting trials find it hard to believe that there should be no such thing as a single “practice-changing” study – mostly because they are eager to help their patients (if you’ve ever been to a big clinical conference, note the standing ovations and crowd’s elation when “positive” clinical trial results are presented). Add to that the expectations from regulatory authorities (sometimes inadequate) as well as issues of equipoise and economics, and you start to understand why we, as a community, believe that as long as it’s a (relatively) well-designed randomized clinical trial, its results are good enough to change our practice.
Related to the first point: knowledge (defined in this case as the information we get from seemingly well-designed and robustly conducted studies) is fragile. Slightly changing the way a variable is defined (continuous vs categorical, for example) or removing a few subjects with missing data can swing your results one way or the other. This is a well-known issue and is related to not defining analysis strategies before data collection, the garden of forking paths, etc. But it seems that many clinicians and clinical researchers are either not aware of it or underestimate just how big a deal it is. I’ve had countless conversations with peers who believe that if you’ve got some “good data” (collected appropriately from a well-designed study, without any funny business of any kind), how the data is analyzed shouldn’t influence the results in a major way. The data are the data – they more or less should speak for themselves – as long as I didn’t tamper with the data and I used the “correct” tests, why would my analysis approach mislead me? It can, and very often does.
In the world of clinical trials, statistics are still commonly misunderstood and misused. The 2020 authors themselves make a very prevalent mistake – treating the lack of a statistically significant difference between two groups as evidence that the groups are equivalent (or in their words, “matched”). For more information on this, see here and here. This isn’t just a statistical technicality – in the 2020 study, the only variables that were “adjusted” for in the analysis were the ones that were statistically significantly different between the groups, so many others were potentially missed. In fact, testing for “baseline differences”, regardless of how, is very much a contested practice (see here, here, and here), but clinical trials are full of it. That’s surprising, because there are often biostatisticians on the investigator panels of such trials and biostatisticians presumably review at least some of the published trial protocols and reports.
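
The fragility described in the second point above is easy to demonstrate with a toy simulation. The sketch below is entirely hypothetical: it generates trials in which the two arms are drawn from the same distribution (so there is truly nothing to find), measures a seven-level ordinal outcome score (0 to 6), and then dichotomizes that score at every possible cutoff. It compares how often a single pre-specified cutoff gives p < 0.05 with how often at least one of the cutoffs does. The cutoffs, sample sizes and the pooled two-proportion z-test are my choices for illustration, not anything taken from the 2008 or 2020 studies.

```python
import math
import random


def two_proportion_p(k1, n1, k2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (k1 + k2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation at all, so nothing to test
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_trials=2000, n_per_arm=100,
                         prespecified_cutoff=2, seed=2020):
    """Return (rate with one pre-specified cutoff, rate with any cutoff)."""
    rng = random.Random(seed)
    cutoffs = range(6)  # "good outcome" means score <= cutoff, for cutoffs 0..5
    hits_prespecified = 0
    hits_any = 0
    for _ in range(n_trials):
        # Both arms come from the same distribution: the null is true.
        arm_a = [rng.randint(0, 6) for _ in range(n_per_arm)]
        arm_b = [rng.randint(0, 6) for _ in range(n_per_arm)]
        p_values = {}
        for cutoff in cutoffs:
            good_a = sum(score <= cutoff for score in arm_a)
            good_b = sum(score <= cutoff for score in arm_b)
            p_values[cutoff] = two_proportion_p(good_a, n_per_arm,
                                                good_b, n_per_arm)
        hits_prespecified += p_values[prespecified_cutoff] < 0.05
        hits_any += min(p_values.values()) < 0.05
    return hits_prespecified / n_trials, hits_any / n_trials


if __name__ == "__main__":
    prespecified, any_cutoff = false_positive_rates()
    print(f"pre-specified cutoff: {prespecified:.3f}")
    print(f"best of six cutoffs:  {any_cutoff:.3f}")
```

Nobody needs to tamper with the data for this to happen. Simply leaving the cutoff open until after the data are in is enough to push the false positive rate above the nominal 5%.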

I’m not sure if the 2020 study will directly change stroke management – the authors are careful with the interpretation of their findings (rightfully so, in my opinion), saying that their study “reduce[s] [the] certainty” in the conclusions of the 2008 study. But I do hope we learn some things from this – clinicians really need to rethink how they view single clinical trials, take matters like analytical flexibility more seriously, and avoid common statistical misconceptions.

Clinical trials and the RRRR cycle

I just got back from one of the world’s largest stroke meetings, the European Stroke Organisation Conference (ESOC), held this year in Gothenburg, Sweden. The overwhelming focus of the conference is on groundbreaking large clinical trials, reports of which dominate the plenary-like sessions of the schedule. One thing I’ve noticed about talks on clinical trials is how, every year, the speakers go to great lengths to emphasize some (positive) ethical, methodological, or statistical aspect of their study.

This is the result of something that I like to call the RRRR cycle (pronounced “ARRARRARRARR” or “rrrr” or “quadruple R” or whichever way won’t scare/excite those in your immediate vicinity at that moment in time). It usually starts with loud opposition (reprimand) to some aspect of how clinical trials are run or reported. This usually comes from statisticians, ethicists, or more methodologically inclined researchers. Eventually, a small scandal ensues, and clinical researchers yield (usually after some resistance). They change their ways (repentance) and, in doing so, become fairly vocal about what they’re now doing better (representation)*.

Examples that I’ve experienced in my career as a stroke researcher thus far are:

Treating polytomous variables as such instead of binarizing them (the term “shift analysis” – in the context of outcome variables – is now an indispensable part of the clinical stroke researcher’s lexicon).
Pre-specifying hypotheses, especially when it comes to analyzing subgroups.
Declaring potential conflicts of interest.

Most of these practices are quite fundamental and may have been standard in other fields before making their way to the clinical trial world (delays might be caused by a lack of communication across fields). Still, it’s undoubtedly a good thing that we learn from our mistakes, change, and give ourselves a subtle pat on the back every time we put what we’ve learned to use.

The reason I bring it up is, maybe soon someone** could start making some noise about one of the following issues that come up way too often in my opinion:

(Mis-)interpreting p-values that are close to 0.05, and how this is affected by confirmation bias.

In the SAME talk:

A stat non-significant result that goes AGAINST the study's expectations = we can't really interpret this, underpowered

A stat non-significant result that CONFORMS to the study's expectations = interesting, more n would have given p<0.05 #ESOC2018 pic.twitter.com/wgjFQjITUm

— Ahmed Khalil (@AhmedAAKhalil) May 18, 2018

Testing if groups are “balanced” in terms of baseline/demographic variables in trials using “traditional” statistical methods instead of equivalence testing.
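
For anyone wondering what the difference between the two approaches looks like in practice, here is a rough sketch using made-up summary statistics. It runs a “traditional” test for a difference and a two one-sided tests (TOST) procedure for equivalence on the same numbers. The group means, standard deviations, sample sizes and the equivalence margin are all hypothetical, and I use a normal approximation to keep the code short (a t-distribution would be the more usual choice for samples this small).

```python
import math
from statistics import NormalDist


def balance_tests(mean_a, sd_a, n_a, mean_b, sd_b, n_b, margin):
    """Return (p for a difference, p for equivalence within +/- margin).

    Normal approximation throughout; margin is in the units of the variable.
    """
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    std_normal = NormalDist()

    # Traditional two-sided test. Null hypothesis: the group means are equal.
    p_difference = 2 * (1 - std_normal.cdf(abs(diff / se)))

    # TOST. Null hypotheses: diff <= -margin, and diff >= +margin.
    # Equivalence is only claimed if BOTH one-sided tests reject.
    p_lower = 1 - std_normal.cdf((diff + margin) / se)
    p_upper = std_normal.cdf((diff - margin) / se)
    p_equivalence = max(p_lower, p_upper)
    return p_difference, p_equivalence


if __name__ == "__main__":
    # Hypothetical baseline severity scores in two arms of 40 patients each.
    p_diff, p_equiv = balance_tests(10.0, 6.0, 40, 11.5, 6.0, 40, margin=1.0)
    print(f"p (difference):  {p_diff:.3f}")
    print(f"p (equivalence): {p_equiv:.3f}")
```

With these numbers neither test comes out below 0.05: the groups are not “significantly different”, but they have not been shown to be equivalent either. That gap is exactly what gets glossed over when a non-significant difference is reported as “balanced”.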

As the ESOC meeting keeps reminding me, a lot can be done in a year. So I’m pretty optimistic we can get some of these changes implemented by ESOC 2019 in Milan!

* If you think this particular acronym is unnecessary or a bit of a stretch, I fully agree. I also urge you to take a look at this paper for a list of truly ridiculous acronyms (all from clinical trials of course).

** I would, but I’m not really the type – I’d be glad to loudly bang the drums after someone gets the party started, though.

Sense and simplicity in science

I recently finished Atul Gawande’s book The Checklist Manifesto, which I highly recommend. It’s all about how very simple measures can have profound outcomes in fields as diverse as aviation, construction, and surgery.

What struck me the most about it wasn’t the author’s endorsement of using basic checklists to ensure things are done right in complex scenarios. Instead, it’s Dr Gawande’s insistence on testing the influence of everything, including a piece of paper with 4 or 5 reminders stuck to his operating theatre wall, that I found inspiring.

Why bother collecting evidence for something so apparently simple, so clearly useful, at all?

Talk of the town

Ischemic stroke, caused by the blockage of an artery in the brain by a blood clot, is as complex as anything in medicine. In fact, for such a common and debilitating illness, we have surprisingly few treatments at hand. Until recently, only two had been proven to help patients who suffered a stroke: giving them a drug that dissolves the clot and keeping them in a “stroke unit” where they receive specialised care that goes beyond what is offered in a general neurology ward.

But that all changed last year. The lectures and posters at the 2015 European Stroke Organisation conference in Glasgow, which I attended, were dominated by one thing. A new treatment for acute ischemic stroke had emerged – mechanical thrombectomy.

In the four months leading up to the conference, a number of large clinical trials had proven that this intervention worked wonderfully. Literally everyone at the conference was talking about it.

Isn’t that obvious?

Mechanical thrombectomy involves guiding a tube through a blood vessel (usually an artery in the groin) all the way up through the neck and into the brain, finding the blocked artery, and pulling out the clot. Just let that sink in for a moment. In the midst of stupendous amounts of research since the mid-90s into convoluted pathways leading to brain damage after stroke, fancy molecules that supposedly protect tissue from dying, and stem cells that we’ve banked on repairing and replacing what’s been lost, the only thing that’s worked so far is going in there and fishing out the clot. That’s all it takes.

After returning to Berlin, I told a former student of mine about the news. “Well, duh?”, she responded, just a bit sheepishly. My first instinct was to roll my eyes or storm out yelling “You obviously know nothing about how science works!”. But is this kind of naïveté all that surprising? Not really. Somehow we’re wired to believe that if something makes sense it has to be true (here’s a wonderful article covering this). As a scientist, do I have any right to believe that I’m different?

Science is not intuitive.

To paraphrase part of a speech given recently by Dr Gawande, what separates scientists from everyone else is not the diplomas hanging on their walls. It’s the deeply ingrained knowledge that science is not intuitive. How do we learn this? Every single day common sense takes a beating when put to the test of the scientific method. After a while, you just kind of accept it.

The result is that we usually manage to shove aside the temptation to follow common sense instead of the evidence. That’s the scientific method, and scientists are trained to stick to it at all costs. But we don’t always – I mean if it makes such clear and neat sense, it just has to be true, doesn’t it?

Never gonna give you up

The first few clinical trials showed that thrombectomy had no benefit to patients, which just didn’t make sense. If something is blocking my kitchen pipes, I call a plumber, they reach for their drain auger and pull it out, and everything flows nicely again. Granted, I need to do so early enough that the stagnant water doesn’t permanently damage my sink and pipes, but if I do, I can be reasonably sure that everything will be fine. But in this case, the evidence said no, flat out.

Despite these initial setbacks, the researchers chased the evidence for the better part of a decade and spent millions of dollars on larger trials with newer more sophisticated equipment. I’m wondering if what kept them going after all those disappointing results was this same flawed faith in common sense. It works, I’ve seen it work and I don’t care what the numbers say – you hear such things from scientists pretty often.

Another important flaw in the way researchers sometimes think is that we tend to explain the outcomes of “negative” studies in retrospect, looking for mistakes far more scrupulously than we did before the studies started. I don’t mean imperfections in the technique itself (there’s nothing wrong with improving on how a drug or surgical tool works, then testing it again, of course). I’m talking about things that are less directly related to the outcome of an experiment, like the way a study is organised and designed. These factors can be tweaked and prodded in many ways, with consequences that most researchers rarely fully understand. And this habit tends to, in my opinion, propagate the unjustified faith in the authority of common sense.

There’s good evidence to suggest that the earlier mechanical thrombectomy trials were in some ways indeed flawed. But I still think this example highlights nicely that the way scientists think is far from ideal. Of course, in this case, the researchers turned out to be right – the treatment made sense and works marvellously. It’s hard to overemphasise what a big deal this is for the 15 million people who suffer a stroke each year.

Deafening silence

More than a year has passed since the Glasgow conference and this breakthrough has received little attention from the mainstream media. Keep in mind, this isn’t a small experiment of some obscure and outrageously complex intervention that showed a few hints here and there of being useful. It is an overwhelming amount of evidence proving that thrombectomy is by far the best thing to happen to the field of stroke for almost two decades. And not a peep. In fact, if you’re not a stroke researcher or clinician, you’ve probably never even heard of it.

Now, if you read this blog regularly, I know what you’re thinking. I rant a lot about how the media covers science, now I’m complaining that they’re silent? But doesn’t it make you wonder why the press stayed away from this one? I suppose it’s extremely difficult to sell a story about unclogging a drain.

Nature’s role in modern medicine

Whether as patients or healthcare workers, it’s easy to overlook the origins of the drugs used to treat common diseases. In the era of recombinant technology and generally complex ways to design, test and use medicines, it’s refreshing when a drug crosses our path which is derived from nature in a simple yet brilliant way. Now, there are countless examples of these stories described in scientific literature, as well as within the mainstream media. Below are a few examples which I found particularly memorable.
I distinctly remember being asked a question during my pharmacology final oral exam in medical school. It was the very last question the examiner asked me, and I was taken slightly aback by how simple it was yet how little I had thought about it before. He had grilled me for a good fifteen minutes about angiotensin converting enzyme inhibitors, a class of drugs used for treating hypertension among other conditions. Then, as I felt the exam drawing to a close, he asked ‘How were ACE inhibitors discovered?’. Now, for a pharmacy student this question would be no problem at all, as they spend a lot of their time studying how drugs are derived and synthesized. For me, however, being immensely preoccupied with a wide range of subjects, the more clinical aspects of medicinal use being just one of them, it was something which I had hardly ever thought of.
ACE inhibitors were derived in the 1960s from the venom of the Brazilian pit viper, Bothrops jararaca. The venom leads to a severe drop in blood pressure by blocking the renin-angiotensin-aldosterone system. It is noteworthy that this selective mechanism of action means that ACE inhibitors may not be effective for everyone in terms of lowering blood pressure. However, the drugs have several other benefits including protecting the kidneys in diabetes and improving heart function in patients with heart failure.
Another interesting drug discovery story is that of exenatide, an antidiabetic agent licensed for use in 2005. This drug was isolated from lizard (Gila monster) saliva and has been shown to stimulate insulin release from the pancreas. Unlike other drugs with the same action, exenatide only increases insulin secretion when glucose levels are high and therefore does not lead to hypoglycemia. It also has numerous other beneficial effects including promoting weight loss.
My favorite, however, is the story behind a new thrombolytic treatment for stroke. The drug, now called desmoteplase, is derived from the saliva of the vampire bat Desmodus rotundus. This new drug is still in the testing phases of development (phase III trials), but has already shown great promise. It stays in the body for a longer time than other thrombolytics, is more selective in its action and does not lead to neurotoxicity. It is possible that it may represent a breakthrough in the treatment of stroke, which is currently a highly debated and complicated issue.

Holiday penumbras

In anticipation of my upcoming lab rotation at the Centre for Stroke Research in Berlin, I have been reading up on the focus of my project.
The ischemic penumbra is an area of a stroke patient’s brain which is dying as a result of the blockage of one of the arteries supplying the tissue. The keyword here is dying: unlike the core of the affected tissue, it is not quite dead yet, which makes it a prime target for salvation in terms of stroke treatment. Left untreated, this penumbra transforms into dead brain tissue, and thus contributes to the patient’s permanent symptoms or neurological deficit. The region, which cannot be readily seen on more conventional imaging techniques like CT or standard MRI, was first noticed in PET scans, which basically measure the amount of energy the brain uses and map it onto an image. Now, with the availability of more sophisticated MRI techniques such as diffusion and perfusion weighted imaging, this area can be mapped and its natural history identified, but most important is the fact that its response to treatment can be established. Simply put, the question on our minds is whether this area can be saved, and to what degree.
The group which I am going to be working with has made much progress in this field, and what drew me to their projects most was their unique and innovative approach to the subject of stroke, which is a major killer and prominent cause of disability worldwide. For example, every doctor knows that stroke has a relatively short time window in which treatment can be given, and more importantly it is within this time window, typically around three hours from the onset of the stroke, that benefits of the treatment outweigh the risks. This time window was derived years ago from large studies which showed that only patients who received treatment within this time benefited from it. But now people are thinking of a new approach. Researchers are now trying to replace this seemingly arbitrary number of three hours with more objective and reliable criteria such as various signs on MRI, so that everyone who might benefit from treatment but falls outside of this window can have the opportunity to end up with less permanent disability than if doctors only relied on the time from stroke onset.
Needless to say I am very excited at the prospect of participating in something which has the potential to be so groundbreaking! Which is why I feel the need to be prepared, and that’s what I’m spending my holiday attempting to do. I will be posting more about this soon! 🙂

Ahmed A. Khalil

MD PhD

stroke