The REACH requirements for QSAR
The REACH legislation foresees the use of alternative in silico methods such as QSAR and read-across. Regarding qualitative or quantitative structure-activity relationships ((Q)SARs), Annex XI states that:
Results obtained from valid qualitative or quantitative structure-activity relationship models ((Q)SARs) may indicate the presence or absence of a certain dangerous property. Results of (Q)SARs may be used instead of testing when the following conditions are met:
- results are derived from a (Q)SAR model whose scientific validity has been established,
- the substance falls within the applicability domain of the (Q)SAR model,
- results are adequate for the purpose of classification and labelling and/or risk assessment, and
- adequate and reliable documentation of the applied method is provided.
The Agency in collaboration with the Commission, Member States and interested parties shall develop and provide guidance in assessing which (Q)SARs will meet these conditions and provide examples.
These, then, are the requirements according to REACH. Three of them refer to the QSAR model itself, and one to its use for a specific compound.
These requirements are also expressed in four of the OECD principles for QSAR validation, whereas the fifth OECD principle is not explicitly mentioned in REACH.
THE MODEL VALIDITY
The first requirement is that the model should be scientifically valid. Note that REACH does not ask for the model to be validated. Validation is a formal process that takes many years, and the formal validation of a QSAR model would probably not be completed in time for REACH. Validity therefore has to be assessed through scientific criteria, considering how well the model performs in prediction (ECHA - Guidance for the implementation of REACH).
The OECD principles specify some features of the QSAR model that help to assess whether it works, including its statistical characteristics, both for the model itself and for its predictive ability. In the early years of QSAR development, interest focused on how well the model fitted the chemicals used to build it. For regulatory purposes the emphasis is instead on the ability of the model to predict the property of a new compound. In other words, it should be shown that the model works for its purpose (proof of principle).
How to evaluate the scientific validity of the QSAR model?
Annex XI of REACH requires a QSAR model to be "scientifically valid" (it does not say validated). A first indication of scientific validity is that the method has been published in a scientific journal through the peer review process. In that case the method has been evaluated by other scientists, who found it suitable for publication.
The preliminary document on (Q)SAR characterisation, compiled by the former European Chemicals Bureau (ECB), lists a series of statistical parameters for model evaluation. Different tools apply depending on whether the model is a classifier or a regression model. In the first case the output of the model is a class or category, such as toxic or mutagenic. In the second case the output is a numerical value.
Evaluation of a classifier
Classifiers are most typically evaluated using Cooper statistics. In the simple case of a binary classification there are two classes, such as toxic (positive) and non-toxic (negative). The results of a classifier can therefore be grouped into four cases: toxic compounds predicted as toxic (True Positive or TP) or as non-toxic (False Negative or FN), and non-toxic compounds predicted as non-toxic (True Negative or TN) or as toxic (False Positive or FP). Three main statistical parameters for model evaluation can be derived by combining these four cases:
- Accuracy (A), also referred to as concordance, measures the overall correctness of the predictions. It gives a general evaluation of the errors made and is defined as the ratio of the correctly predicted compounds to the total number of compounds. Good models have a high accuracy.
A = (TP + TN) / TOTAL
- Sensitivity (S) measures the fraction of positive compounds that are correctly predicted. Especially for regulatory purposes, it is important not to declare safe a chemical which is in fact toxic (FN). Sensitivity takes the number of FN into account and is defined as the ratio of the TP to the total number of positives (P = TP + FN). A good model has a high sensitivity.
S = TP / P
- Specificity (SP) measures the fraction of negative compounds that are correctly predicted. Specificity takes the number of false positives into account and is defined as the ratio of the TN to the total number of negative compounds (N = TN + FP). Sometimes the parameter 1 - SP is reported instead.
SP = TN / N
It is our opinion that for regulatory purposes it is important to verify that the classifier has a high sensitivity, in order to reduce the number of false negatives.
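The three Cooper statistics defined above can be computed directly from the four counts. The following is a minimal sketch; the function name `cooper_statistics` and the example counts are invented for illustration and do not come from any real model.

```python
def cooper_statistics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from the four counts."""
    positives = tp + fn  # P: compounds that are experimentally toxic
    negatives = tn + fp  # N: compounds that are experimentally non-toxic
    total = positives + negatives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative compound")
    return {
        "accuracy": (tp + tn) / total,    # A = (TP + TN) / TOTAL
        "sensitivity": tp / positives,    # S = TP / P
        "specificity": tn / negatives,    # SP = TN / N
    }


# Hypothetical counts for 100 compounds, for illustration only.
stats = cooper_statistics(tp=40, fn=10, tn=35, fp=15)
print(stats)
```

With these invented counts the accuracy is 0.75, while the sensitivity (0.8) and specificity (0.7) show how the errors are split between false negatives and false positives, which the accuracy alone does not reveal.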
REACH does not define binary classifications only. For instance, a chemical can be not bioaccumulative, bioaccumulative, or very bioaccumulative (three classes).
Evaluation of a regression model
Regression models are most typically evaluated using statistical parameters which take the errors of the model into account. These errors are measured on the training set, and this gives an idea of the robustness of the model. However, this is not sufficient, since the main interest of REACH is to understand whether a given model can be used for prediction. For regulatory purposes, additional statistical measurements of predictivity are therefore used. Some of these use internal validation, while others refer to an external test set.
The values predicted by the model (on the training, test and/or external validation set) are plotted against the experimental values in a graph (an example is shown below), and the coefficient of determination (R²) is calculated to give an estimate of the goodness of the model.
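As a numerical illustration of the coefficient of determination, here is a minimal sketch using the common definition R² = 1 - SS_res / SS_tot. This definition is an assumption, since the text does not give a formula, and other conventions exist, in particular for external test sets. The function name and the data are invented.

```python
def r_squared(experimental, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    if len(experimental) != len(predicted) or len(experimental) < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_exp = sum(experimental) / len(experimental)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((y - p) ** 2 for y, p in zip(experimental, predicted))
    # Total sum of squares: spread of the experimental values around their mean.
    ss_tot = sum((y - mean_exp) ** 2 for y in experimental)
    if ss_tot == 0:
        raise ValueError("experimental values are all identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical experimental and predicted values, for illustration only.
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(experimental, predicted)
print(round(r2, 3))
```

The same function can be applied separately to the training set and to an external set; a large drop in the value on the external set suggests that the model fits well but predicts poorly.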

THE METHOD TRANSPARENCY
The fourth requirement asks for transparency. This is reasonable, since all the documentation underlying the assessment of the properties of a chemical should be clearly available and checkable. One of the driving forces of REACH was to obtain correct knowledge of the properties of the chemical substances on the market. Hiding some of the information clearly goes against the spirit of REACH. A QSAR model has three major components:
- the property component, such as the toxicological effect;
- the chemical component, which involves the chemical format, descriptors and/or fragments used for the model;
- the mathematical or logical equation used to link the first two components.
The availability of the components of the model is also covered by the OECD principles, which explicitly ask for the definition of the endpoint (the property) in the first principle and of the equation in the second principle.
This requirement implies that models with confidential or restricted components may not be suitable for REACH. The restriction may concern the property values used to build the model, the information on the chemical part, or the mathematical equation.
THE APPLICABILITY DOMAIN
Requirements 1, 3 and 4 serve to verify that the model is valid, adequate for REACH and transparent. However, these factors, which all concern the evaluation of the model, are not sufficient. The second requirement in Annex XI asks to show that a model fulfilling the other requirements is also appropriate for the chemical it is applied to. This requirement refers to the applicability domain of the model. Conceptually, it concerns the application of the model to the chemical compound of interest, while the other requirements evaluate the model per se.
Thus, according to REACH, a QSAR model has to be evaluated not only on its quality but also on its correct use for the chemical in question.
This requirement is also present in the OECD principles.
How to evaluate the applicability domain?
Some chemometric tools (chemometrics is a discipline that combines statistics and chemistry) take the chemical descriptors and/or fragments of the chemicals used to build the model and check whether those of the target chemical are similar. An example of this approach is the freely available software AMBIT, developed within the Cefic Long-range Research Initiative (LRI). A major disadvantage of this approach is that it is based only on chemical information.
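One simple way to compare the fragments of a target chemical with those of the training chemicals is the Tanimoto similarity between fragment sets. The sketch below is only an illustration of that general idea. It is not the method implemented in AMBIT, and the fragment names and the threshold of 0.6 are hypothetical.

```python
def tanimoto(fragments_a, fragments_b):
    """Tanimoto similarity: shared fragments divided by all distinct fragments."""
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def nearest_training_similarity(target, training_set):
    """Highest similarity between the target and any training chemical."""
    return max(tanimoto(target, chemical) for chemical in training_set)


# Hypothetical fragment sets, for illustration only.
training = [
    {"benzene_ring", "nitro", "chloro"},
    {"benzene_ring", "amine"},
    {"carboxylic_acid", "alkyl_chain"},
]
target = {"benzene_ring", "nitro"}

THRESHOLD = 0.6  # hypothetical cut-off, not taken from any guidance
best = nearest_training_similarity(target, training)
in_domain = best >= THRESHOLD
print(round(best, 3), in_domain)
```

A check of this kind uses the chemical information only, which is exactly the limitation noted above: it says nothing about how well the model actually predicted the similar training chemicals.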
Another approach is to evaluate the metabolism or toxicity pathway of the chemical of interest. However, this can be applied only when these are known.
A more recent tool has been developed within the ORCHESTRA project. The tool, called TUTOR (Tool Unifying similarity, TOxicity prediction and Reasoning), takes into account both the chemometric information and the toxicity predictions made by the model, and in particular the kind of errors the model has made. Thus, this approach is based not only on the input space (the chemical descriptors and fragments), but also on the output space of the model, which is the predicted property. Furthermore, this tool is based not only on a priori data and information, as the other approaches are, but also on the a posteriori results of the model.
TUTOR uses seven perspectives for its reasoning and combines different parameters into a single value. This value ranges from 0 to 1 and is associated with an acceptability evaluation of the model results for a given chemical. The user is told whether the model can or cannot be used for a given compound. In some cases a warning is given, recommending expert opinion. In all cases the reasons for the reliability assessment are given, so that it can be evaluated in a transparent way.
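To make the idea of a single value between 0 and 1 concrete, the sketch below combines the similarity of neighbouring training compounds (input space) with whether the model predicted them correctly (output space). It is purely illustrative. It is not the TUTOR algorithm, whose seven perspectives are not described here, and the formula, the function names and the thresholds 0.7 and 0.4 are all hypothetical.

```python
def reliability_index(neighbours):
    """Combine similarity and prediction correctness into a value in [0, 1].

    neighbours: list of (similarity, model_was_correct) pairs for training
    compounds similar to the target, with similarity in [0, 1].
    """
    total_similarity = sum(sim for sim, _ in neighbours)
    if total_similarity == 0:
        return 0.0  # no similar compounds: nothing supports the prediction
    # Similarity-weighted share of neighbours the model predicted correctly.
    concordance = sum(sim for sim, ok in neighbours if ok) / total_similarity
    mean_similarity = total_similarity / len(neighbours)
    return concordance * mean_similarity


def verdict(index, accept=0.7, reject=0.4):
    """Map the index to one of three outcomes (hypothetical thresholds)."""
    if index >= accept:
        return "acceptable"
    if index < reject:
        return "not acceptable"
    return "warning: expert opinion recommended"


# Hypothetical neighbours: two predicted correctly, one wrongly.
neighbours = [(0.9, True), (0.8, True), (0.8, False)]
index = reliability_index(neighbours)
print(round(index, 3), verdict(index))
```

In this invented case the neighbours are quite similar to the target, but one of them was mispredicted, so the index falls in the intermediate range and the outcome is a warning rather than a clear acceptance.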
This tool is useful for exploring the results of the model, linking the prediction with the results obtained on similar compounds. This supports the reliability of the model in a transparent way and extends the use of QSAR as a tool for exploring data on similar compounds, which is also very useful as a basis for read-across. Indeed, a major issue in read-across is the evaluation of similarity.






