The Science Behind The Science

July 31, 2008 - When we become ill, we search for a treatment that works. But how do we know that a treatment really works? This is not a trivial question. First, it is important to recognize that even when a treatment is proven to work in most people, it may not work for a particular individual because each of us is different. Setting this aside, we generally take comfort in knowing that a treatment is proven to be effective in most people. So, how can we be reasonably confident that a treatment is effective in most people? The answer lies in what is called evidence-based medicine. That is, there should be solid evidence or data that a treatment works. Centuries ago, medicine was more of an art than a science. Consequently, there remains much "folklore" in the practice of medicine today. Many treatments that have been accepted or practiced over a long period of time but never scientifically studied are taken for gospel truth. For example, chicken soup has never been proven an effective treatment for the common cold. The same is true for causes of disease. For decades doctors were convinced that stomach ulcers were caused by stress. We now know that most are caused by the bacterium Helicobacter pylori. Closer to home, because of this folklore effect, many neurologists to this day still believe that Chiari malformations do not cause symptoms.

The best type of evidence is data produced by well-designed clinical studies, preferably several such studies that reach the same or similar conclusions, because reproducing results is very important.

Let's begin by examining the definition of data. Turning to a popular on-line dictionary, I found the following definitions of data.

1. factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation
2. information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful
3. information in numerical form that can be digitally transmitted or processed

These definitions are fine, but when it comes to proving something scientifically or medically, other very important characteristics of data must be considered: the source of the data, the robustness of the experimental design that produced the data, the integrity of the data, the quality of the analysis of the data, and the reproducibility of the data.

The first characteristic is "source". Do the data have a legitimate source or are they mythical? Does a valid reference exist for the data being used? Is that reference available for review? Is the reference source legitimate? The source of data is extremely important. In some instances, I have found that no data exist to support commonly heralded cures and treatments. Most often, however, data cited as evidence of effective treatments come from medical, scientific, or technical journals. It is critical to understand that not all journals are equal. Journals have standards for accepting publications, and some journals have very low standards. Quoting data from a journal can actually be a liability if the journal has low or no standards or is considered biased. A while back I wrote about Reiki as a treatment for Chiari. When I researched Reiki, I found that most references were from journals considered favorably biased towards alternative medical treatments. I might have been able to make a stronger case for Reiki had I found even a single reference in its favor in a mainstream peer-reviewed medical journal. It is also important to locate and review the original reference source. One should never depend on lay magazines, particularly if they do not reference journals. The same can be true of information found on-line.

The second characteristic has to do with the "robustness of the experimental design" that produced the data. The main question here is: was the design of the experiment that generated the data robust and valid? What must be considered in judging experimental design to be robust and valid? There are many elements to be considered here. First, before the experiment or study is conducted, a reasonable hypothesis must exist. In other words, the investigator must state up front what the objective of the study is and exactly what is being tested. This is called a prospective study. When one takes data that have already been collected, whether from an earlier study or from existing patient records, and analyzes them to answer a question that was not stated in advance, it is called a retrospective study. Conclusions drawn from prospective and retrospective studies do not carry the same weight. In general, one cannot draw firm conclusions from retrospective studies. The real value of retrospective studies is that they generate new hypotheses that can then be tested prospectively. Let me point out here that the vast majority of studies in the medical literature regarding the effectiveness of decompression surgery are retrospective.

Another element which must be considered in the experimental design of a study is control of variables. Basically, if you are trying to answer one question or determine one unknown, all of the other variables must be controlled or known. This is usually accomplished by using a control group. A control is a group of test subjects or patients who are equivalent to the subjects or patients receiving the treatment. Equivalent is the operative word here. The control group should be comparable in size and must possess the same characteristics. In other words, the control group and treatment group must be balanced for variables such as gender and age. There are different ways one can design a controlled study. Sometimes the treatment group can serve as its own control group by using a cross-over design. The same group, for example, might take a placebo for a week, go through a washout period, and then receive the treatment for a week. Variations on cross-over designs also exist.

Defining criteria up front for admitting subjects or patients to a study, and for excluding them from it, is another important way to control variables. For example, other or concomitant diseases may often affect the outcome of the treatment being studied, so patients with certain diseases should be excluded. Entry criteria are equally important to assure that patients entering a study actually have the disease being investigated at the proper level of severity. For example, suppose you want to conduct a study to determine the effectiveness of decompression surgery for Chiari Type I malformations. One thing you will need to determine in designing the study is whether you want to investigate Chiari patients who also have syringomyelia. This is a critical question, as patients with syringomyelia may have different outcomes from those who do not. A criterion to exclude patients with syringomyelia might therefore be established.

The selection of measures or end points is another critical element in designing studies. Not all measures or end points are valid. You must be able to show that whatever you are measuring can be measured accurately and reliably and that it is relevant. This sounds like common sense, but readers would be amazed at how often this is violated and how difficult it can be to establish valid end points. An excellent example here is the definition of a successful surgical outcome. I know countless post-surgery Chiari patients who, when told by their surgeon that their surgery was successful, were shoved into a state of disbelief because they felt so bad. The reason for this is that the surgeon's definition of success was very different from the patient's definition or expectation of success. A surgeon might define success as 1) the ability of the patient to walk into his/her office and 2) Cine MRI evidence of cerebrospinal fluid flow being re-established, whereas the patient might define success as simply feeling like they did before symptoms emerged. There are ways to define surgical success that take into account the level of patient-perceived wellness. Often considerable work must be done to establish these definitions as valid instruments by which outcome can be measured. Most of the retrospective decompression studies that I referred to earlier did not use validated instruments for determining patient outcome.

Eliminating bias is another critical element in conducting any study. If a patient knows that he or she is receiving treatment rather than placebo, they may respond favorably for psychological reasons. The same goes for those conducting and analyzing the study. To avoid this, a study must be blinded. In other words, the study must be conducted in a manner where all involved in it are "blinded" as to which subjects or patients are receiving treatment and which are receiving placebo. This can require considerable effort. Readers may be amazed to learn that entire companies exist to provide services and technologies to enable this.

Statistics must be carefully considered before conducting a study or experiment. There is always the possibility that a result will happen by sheer chance. Statistics can provide some confidence that this is not the case. Using the principles of statistics, control and test groups can be sized to minimize the likelihood of obtaining chance results. How large study groups must be depends on multiple factors. The FDA generally requires two pivotal studies to approve a drug. These are studies that are well designed and that meet rigorous statistical standards. Studies that contain a small number of subjects or patients are often termed pilot studies. The weight placed on the outcome of a pilot study versus a pivotal study is vastly different.
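To give a feel for how group sizes are worked out, here is a minimal sketch in Python using a common textbook approximation for comparing the average outcome of two equally sized groups. It is only an illustration, not the method any particular study or regulator uses. The function name and all of the numbers (the difference worth detecting, the spread of the outcome, the accepted false positive rate and the desired power) are assumptions made up for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate number of patients needed in EACH of two groups.

    delta: smallest difference in average outcome worth detecting (assumed)
    sigma: standard deviation of the outcome, taken as equal in both groups
    alpha: accepted chance of a false positive (two-sided test)
    power: desired chance of detecting a true difference of size delta
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Normal-approximation formula for two equal groups.
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical pain scale: detect a 1-point difference when scores
# typically vary by about 2 points from patient to patient.
print(sample_size_per_group(delta=1.0, sigma=2.0))
# Halving the difference worth detecting roughly quadruples the group size.
print(sample_size_per_group(delta=0.5, sigma=2.0))
```

Under these assumed numbers the first call asks for about 63 patients per group. The point of the sketch is the trade-off: the smaller the effect one hopes to detect, the larger the groups must be, which is one reason a small pilot study cannot carry the weight of a pivotal one.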

The third characteristic one must look for in determining if a treatment is proven is the "integrity" of the data themselves. Are the data clean and valid? This is critical and most often underappreciated or taken for granted. Errors can be introduced into data in many different ways, from malfunctioning or improperly calibrated equipment to inaccurate observations, inaccurate recordings, inaccurate transcriptions, and malfunctioning computer programs. A few errors in the data can result in a totally different conclusion. The amount of effort needed to assure that databases are clean and valid can be enormous. In a well-designed pivotal study, a large group of people working full time for months can be employed simply to inspect and validate the data. Sophisticated computer programs and tools are employed to assist in this effort. Sometimes just one data point being different can make or break the statistical analysis on which the conclusion is based. There is even something called meta-data involved. For example, let's say one is measuring a patient's response to a medication that treats pain. The patient's indication of the level of pain is the primary data, but the date, time and place of the measurement as well as the doctor or nurse who made the measurement are also recorded, and these are referred to as meta-data. Meta-data must also be reviewed. Sometimes meta-data provide insight into the validity of the primary data. If the meta-data don't line up, the primary data fall into question. For example, what do you do when the meta-data indicate that the patient's pain response was collected on a date and time before the patient was even given medication? Meta-data can be extremely useful in detecting many kinds of data errors, including fraud.
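The meta-data check described above, a pain response recorded before the medication was given, can be sketched in a few lines of Python. This is a simplified illustration only; the record layout, field names, and patient IDs are hypothetical, and real data validation tools check far more than this one rule.

```python
from datetime import datetime


def find_timing_errors(records):
    """Return the IDs of records whose pain score predates the dose."""
    suspect = []
    for rec in records:
        # A response to a medication cannot be measured before it is given.
        if rec["measured_at"] < rec["dosed_at"]:
            suspect.append(rec["patient_id"])
    return suspect


# Hypothetical records: pain score is the primary data,
# the two timestamps are meta-data.
records = [
    {"patient_id": "P001", "pain_score": 3,
     "dosed_at": datetime(2008, 7, 1, 9, 0),
     "measured_at": datetime(2008, 7, 1, 11, 0)},
    {"patient_id": "P002", "pain_score": 2,
     "dosed_at": datetime(2008, 7, 2, 9, 0),
     "measured_at": datetime(2008, 7, 1, 16, 30)},  # measured the day before
]

print(find_timing_errors(records))  # flags P002 for review
```

A flagged record is not automatically wrong or fraudulent. It is a prompt for someone to go back to the source documents and find out what actually happened.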

The fourth characteristic to consider in judging the effectiveness of a treatment is the extent and "quality of the analyses" of the data. Only after the data collected have been determined to be clean and valid can they be analyzed. (This is known in the community of science as locking or freezing the database.) So, the next question is: are the analyses complete and thorough? I talked a little about statistics above, but I did not talk about statistical methods of analysis. There are different ways to perform statistical analyses. It is important to understand that the method of statistical analysis used must be stated and documented before the analysis is actually performed. This prevents the introduction of bias. The a priori analysis plan must be thorough. It should include primary and secondary endpoints (measures supportive of the primary measures) on the total population of the study as well as important and relevant subgroups (groups based on characteristics like sex, age, disease severity, etc.). There is an important concept known as intent to treat. Often in a study, subjects or patients drop out for all sorts of reasons. Some move a great distance away. Some cannot tolerate the treatment. Some even have severe accidents or die. As a result, at the conclusion of a study, there are two groups, the drop outs and the completers. The analyses must look at both groups as well as the combined group. The analysis of the combined group, that is, everyone who entered the study whether or not they completed it, is called the intent to treat analysis. The results on the drop outs cannot be dismissed just because they didn't complete the study. Dismissing such results, albeit incomplete in nature, can bias the outcome and conclusion of the study. Also, keep in mind that a proper study design will estimate up front the expected number of patients that will drop out. There are many important questions to be addressed. Are unexpected results understood and do they have a plausible explanation? Are the weaknesses in the experimental design and their influence on the outcome understood and weighed appropriately in making conclusions? What is the meaning of a conclusion if a different statistical method of analysis is used and produces a different result? Performing analyses and drawing conclusions can be difficult, particularly on difficult problems or questions or where the problem or disease being investigated is new or poorly understood. In such cases, valid conclusions cannot be drawn from a single study, regardless of how well it was designed and conducted, and additional studies are needed.
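To see why dismissing drop outs can bias a conclusion, consider this small Python sketch with made-up numbers. It compares the response rate among completers only with the rate among everyone who entered the study. Counting every drop out as a non-responder is an assumption made here for simplicity (one conservative convention among several); the function name and patient records are hypothetical.

```python
def response_rate(patients, include_dropouts):
    """Fraction of analyzed patients who completed the study and improved."""
    if include_dropouts:
        analyzed = patients  # everyone who entered the study
    else:
        analyzed = [p for p in patients if p["completed"]]
    # Assumption: a drop out is counted as "did not improve".
    responders = sum(1 for p in analyzed if p["completed"] and p["improved"])
    return responders / len(analyzed)


# Hypothetical treatment group of 10 patients:
# 6 completed and improved, 1 completed without improving, 3 dropped out.
treated = (
    [{"completed": True, "improved": True}] * 6
    + [{"completed": True, "improved": False}] * 1
    + [{"completed": False, "improved": False}] * 3
)

print(f"Completers only:   {response_rate(treated, include_dropouts=False):.0%}")
print(f"Everyone who began: {response_rate(treated, include_dropouts=True):.0%}")
```

With these invented numbers the treatment appears to help 86% of patients if the drop outs are ignored, but only 60% when everyone who started is counted. If patients dropped out because they could not tolerate the treatment, the first figure paints far too rosy a picture.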

The fifth characteristic which must be considered is "reproducibility". Take note that when I discussed statistics I said there is always the possibility of obtaining a particular result by pure chance - always. Even when a study is well designed using good statistical principles and methods, the results obtained could be a reflection of lady luck. For this reason, it is important to show that the results obtained can be reproduced. Having different investigators reproduce the results adds a great deal of credibility to the conclusions. Some readers may be familiar with cold fusion. About 20 years ago, a couple of electrochemists claimed to have produced a nuclear reaction in a small tabletop vessel. It was hailed as the ultimate solution to our energy problems. One of the problems was that many other independent investigators could not reproduce their results. At first, there was a lot of controversy, with one camp claiming that it worked and another claiming it was impossible. Today most scientists agree that cold fusion cannot be achieved. The inability to reproduce the results at the beginning of the controversy was an early warning sign that turned out to be correct.
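A bit of arithmetic shows why reproduction is so persuasive. The Python sketch below assumes a treatment that truly does nothing, studies that are fully independent of one another, and a conventional 5% threshold for a false positive in each study. These are illustrative assumptions of mine, not figures from any actual trial, and the function name is made up.

```python
def chance_all_false_positive(alpha, n_studies):
    """Chance that every one of n independent studies of a useless
    treatment shows a positive result purely by luck."""
    return alpha ** n_studies


for n in (1, 2, 3):
    p = chance_all_false_positive(alpha=0.05, n_studies=n)
    print(f"{n} study(ies): about 1 chance in {round(1 / p)}")
```

Under these assumptions a single lucky result turns up about 1 time in 20, but two independent studies both being fooled happens only about 1 time in 400. Lady luck rarely strikes twice in the same place.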

Much has been discussed in this article and much more has not. Determining if a treatment really works is a very complex task that often takes considerable research and audit skills. I hope my readers have gotten some flavor of this. The next time you watch an infomercial for some supplement, device, or diet claiming that it is proven to work, keep the principles discussed above in mind, roll your eyes and change the channel. But in a serious vein, I hope this helps in sorting out what may or may not help when discussing treatment options with caretakers.

Ed. Note: The opinions expressed above are solely those of the author. They do not represent the opinions of the editor, publisher, or this publication. Mr. D'Alonzo is not a medical doctor and does not give medical advice. Anyone with a medical problem is strongly encouraged to seek professional medical care.