This is such an important corrective! In medicine we often talk as if evidence is a light switch (“proven / not proven”), but at the bedside it’s almost always a dimmer: how big is the effect, how certain are we, and how well does this population map to this person in front of me?
What I appreciate in your framing is that it naturally pushes readers toward the questions that actually change decisions: What’s the absolute risk reduction (not just relative)? What’s the NNT/NNH? How fragile are the results (bias, missingness, multiplicity, selective reporting)? And what’s the prior plausibility + mechanistic coherence that makes the findings more or less likely to replicate?
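To make the relative-vs-absolute arithmetic concrete, here is a minimal sketch in Python; the event rates are invented purely for illustration and not drawn from any real trial:

```python
# Minimal sketch: relative vs. absolute risk framing for a hypothetical trial.
# Event rates below are made up purely for illustration.

control_event_rate = 0.04   # 4% of the control group has the outcome
treated_event_rate = 0.03   # 3% of the treated group has the outcome

arr = control_event_rate - treated_event_rate   # absolute risk reduction
rrr = arr / control_event_rate                  # relative risk reduction
nnt = 1 / arr                                   # number needed to treat

print(f"Relative risk reduction: {rrr:.0%}")  # 25% -- the headline number
print(f"Absolute risk reduction: {arr:.1%}")  # 1.0% -- the bedside-relevant number
print(f"Number needed to treat:  {nnt:.0f}")  # ~100 patients treated per event avoided
```

The same trial can be honestly described as a 25% relative reduction or a 1 percentage-point absolute reduction; only the latter tells a patient how likely the treatment is to matter for them.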
Clinically, this is where shared decision-making becomes real: not “do we believe the study?” but “given imperfect evidence, what level of benefit would make this worth it for you, what harms are unacceptable, and what outcome do we actually care about (symptoms/function vs surrogate markers)?”
Medicine (and all fields of science) could use a better appreciation of levels of confidence. As a medical researcher, I've learned that the highest-confidence evidence meets six criteria: it is repeatable, obtained through prospective study, based on direct (not indirect) measurement, collected with minimal bias, dependent on minimal assumptions, and summarized by reasonable claims that don't extend beyond the study's parameters. Medicine generally has a strong appreciation of these criteria, demonstrated by the established levels of evidence in clinical medicine. Sadly, other fields of science have minimal appreciation of these concepts - a particular example is evolutionary biology.
This nuance is so important! Evidence goes far beyond whether p<0.05.
One thing I have been grappling with recently is when a paper's flaws become so severe that its results are inconclusive or even give the wrong impression.
On one hand, I think most scientists err on the side of "more information is better," and I was trained in grad school that during peer review, it's not a fair criticism to say "you should have done your study completely differently" - we often have to accept imperfect data, as long as the limitations are transparently explained.
But on the other hand, after seeing years of severely flawed papers, I have come to realize that sometimes those papers can hurt more than they help. In my field, a very common flaw is systemic confounding that gives the wrong impression. This research can have harmful ramifications if it misinforms people about which behaviors are risky or beneficial.
I don't have a clear answer, but I think a good solution has to do with "triangulation" across different data sources. E.g., most studies in my field are based on survey data, which all share very similar flaws, so there is value in expanding the research to other types of data. Those other data sources have their own strengths and weaknesses, but they provide a valuable consistency check: are hypotheses supported by multiple lines of research, or only by a systemic error in one dominant line of research?