


How to Validate and Assess a Diagnostic Test


The benefits of having good diagnostic tests

The aim of a diagnostic test is to predict a condition in an individual when symptoms or previous medical test results indicate it. In assisted reproduction, the goal is often to have tests for patients or biological samples that allow the user to predict the outcome of interest with high certainty in a cost-effective manner. Examples of outcomes that might be predicted include live birth from an embryo, high-quality embryos from a semen analysis result, optimal ovarian response based on serum tests, and an endometrium that supports implantation or predicts a miscarriage, among others. The information provided by these tests would improve decision-making and the likelihood of success.

Although a diagnostic test may be expected to forecast a certain condition, testing should ultimately lead to an improvement in the health outcomes related to its use. Thus, its use is only justified if its performance leads to improved medical care.

PGT-A is one example: it aims to improve reproductive outcomes by avoiding unsuccessful embryo transfers and miscarriages, and by reducing time to pregnancy.


How should a study be designed to evaluate how a diagnostic test performs?

Three concepts should be assessed: reproducibility (testing twice produces the same result), precision (the variability within results), and accuracy (the results represent the truth). Once these are known, three additional aspects must be assessed: scientific validity, analytical performance, and clinical performance, which together guarantee data robustness and reliability.


Scientific validation: ability to discriminate the condition of interest in an ideal situation

Potential diagnostic tools are selected from among candidates because preliminary results relate them to the studied condition, whether in the scientific literature, proof-of-concept work, or basic and clinical studies.

Significant p-values for differences between biomarker levels across the target conditions, with little overlap between the groups, point to a high discriminatory ability of the test. Being predictive is not necessarily the same as being "related to" the condition: a predictive test must have the capacity to classify a priori.

Further validation must account for all sources of error, both analytical and clinical. This concerns reproducibility, agreement between observations, and analytical sensitivity (minimum detectable levels), among others.

It is often more practical to investigate technical validity once there is some evidence of association between test result and disease status. At that point, clinical validity and accuracy in a clinically relevant situation can be determined.


Clinical validity and accuracy in a real and clinically relevant situation

In vivo, samples and conditions of use may not be as perfect as in vitro, either concerning the technical validity of a test or its accuracy.

Clinical performance should be demonstrated in suitable studies: prospective, blinded for participants and staff, and applied within standard routine care. All efforts should be made to avoid significant biases, and results should report sensitivity, specificity, positive and negative predictive values (PPV and NPV), and likelihood ratios. This often requires a non-selection study design, in which the test is performed but clinical decisions are not altered based on its results, allowing key parameters, including PPV and NPV, to be established. A non-selection approach is especially necessary when validating PGT-A platforms.
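As a minimal sketch, the performance parameters named above can all be derived from a 2x2 table of test results against true disease status. The counts below are illustrative, not data from any real study:

```python
# A hedged sketch: sensitivity, specificity, PPV, NPV and likelihood ratios
# computed from a hypothetical 2x2 table (illustrative counts, not real data).
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the standard diagnostic-accuracy parameters from raw counts."""
    sens = tp / (tp + fn)  # sensitivity: affected individuals correctly detected
    spec = tn / (tn + fp)  # specificity: unaffected individuals correctly cleared
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),            # P(disease | positive test)
        "npv": tn / (tn + fn),            # P(no disease | negative test)
        "lr_pos": sens / (1 - spec),      # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,      # negative likelihood ratio
    }

# Hypothetical counts, e.g. from a non-selection study:
# 90 true positives, 5 false positives, 10 false negatives, 95 true negatives
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
```

Note that sensitivity and specificity describe the test itself, whereas PPV and NPV describe what a result means for a given patient, which is why the next paragraph's point about prevalence matters.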

An ideal test would present a high sensitivity, identifying a high percentage of truly affected individuals, and a high specificity, so as not to give false positive results. For some tests, it may be preferable to emphasize either high sensitivity or high specificity.

Factors such as disease stage, clinical or demographic characteristics, measurement errors, and disease prevalence (which strongly affects predictive values) must be considered.
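The effect of prevalence on predictive values follows directly from Bayes' theorem and is easy to demonstrate. The sketch below, with illustrative sensitivity and specificity values, shows how the same test yields very different PPVs depending on how common the condition is:

```python
# A sketch of how prevalence drives predictive values for a fixed test,
# via Bayes' theorem. Sensitivity/specificity values are illustrative.
def predictive_values(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence               # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)   # false positive fraction
    fn = (1 - sensitivity) * prevalence         # false negative fraction
    tn = specificity * (1 - prevalence)         # true negative fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# The same 95%-sensitive, 95%-specific test at three prevalences
results = {p: predictive_values(0.95, 0.95, p) for p in (0.01, 0.10, 0.50)}
```

At 1% prevalence the PPV of this hypothetical test falls below 20% despite its 95% specificity, while at 50% prevalence it reaches 95%: the same assay carries very different meaning in different populations.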

How good is good enough? It depends on the consequences of delivering incorrect results.

As a rule, a very specific test is chosen when false negative results can, to some extent, be "accepted" in order to ensure that a positive patient really has the disease. For a test detecting a disease where a positive result triggers risky treatment, where the condition is serious and practically incurable, or where the diagnosis leads to economic and psychological trauma, a high positive predictive value is recommended.

On the other hand, a very sensitive test is chosen when some false positives can be accepted instead of false negatives; that is, when the number of undetected true disease states must be minimal. This is the case when the disease is serious but treatable. Unfortunately, we have a current example in the COVID-19 pandemic, where it is important to use a highly sensitive test: all the sick must be detected and isolated.


Evaluation of the effect of using the test on clinically important outcomes for the patients

Before incorporating a diagnostic test into routine practice, one must first confirm that its use provides a clinical improvement compared with routine practice.

The preferred design to quantify the impact of using a test on patients' health is the randomized controlled trial, in which one group undergoes the test of interest and is managed according to its results, while the other group is treated as per current protocols. Once it has been determined how much improvement over the standard of care is clinically relevant, the study population is chosen and patients are enrolled.
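Choosing the study population size follows from the clinically relevant improvement one wants to detect. A minimal sketch, using the standard normal-approximation formula for comparing two proportions (the success rates in the two arms are purely hypothetical):

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-arm sample size to detect a difference between two
    proportions with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)

# Hypothetical example: detect an improvement from a 30% success rate under
# standard care to 40% with test-guided management
n_per_arm = sample_size_two_proportions(p1=0.30, p2=0.40)
```

Smaller clinically relevant differences require sharply larger trials, which is one reason why poorly powered studies of diagnostic tests are so common.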

Intervention studies should not be carried out without first having defined predictive values.

Even when successful, performance evaluation of a diagnostic test is a continuous process by which data are re-assessed to prove scientific validity and analytical and clinical performance for the intended purpose, and to provide clues about differential performance in specific patient groups and laboratories.

In the end, anything we apply to patients and to the public should be beneficial for them and should do no harm nor lead to unnecessary economic burden.

Unfortunately, in our field it is common to find many studies with low-quality evidence and the commercialization of a myriad of tests (recently tagged as "add-ons"). Many of these tests did not go through the complete validation process applied to many other diagnostic tools routinely utilized in other biomedical specialties.

This surely explains the disappointing clinical application of many diagnostic tools in our specialty. In order to extend excellent care to our patients, excellence in research and development is mandatory.


Author: Nicolás Garrido, M.Sc., Ph.D.