In Silico methods

In silico is an expression that means "performed using a computer simulation". It was coined in 1989 during the workshop "Cellular Automata: Theory and Applications" in Los Alamos, New Mexico, by a mathematician from the National Autonomous University of Mexico (UNAM).

In silico models for predicting toxicological, biological and physico-chemical properties are models able to find relations between particular characteristics of molecules and the property of interest. Examples of in silico methods are (Q)SAR, read-across and virtual screening.

Quantitative structure-activity relationships (QSARs) are computer-based models for the prediction of toxicological, biological and physico-chemical properties. QSAR models aim at establishing the relationship, if it exists, between structure-derived features of chemicals and their properties, such as toxicity. A mathematical function is used to express the relationship between the activity and the chemical information. The chemical information can be given by chemical descriptors or fragments. Many thousands of chemical descriptors have been proposed: constitutional, geometrical, physico-chemical, topological, etc.
In the specific case of a qualitative relationship, for instance between the presence of a certain chemical fragment and the occurrence of a certain toxic effect, the usual name is structure-activity relationship (SAR). (Q)SAR is sometimes used to refer to both QSAR and SAR. Here we will present QSAR and SAR together, using the acronym QSAR.
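
As a minimal sketch of the idea of a mathematical function linking chemical information to activity, the example below fits a straight line between a single descriptor and a measured activity by ordinary least squares, and then uses it for a new chemical. The descriptor values, activity values and function names are invented for illustration; real QSAR models typically use many descriptors and more elaborate algorithms.

```python
# Hypothetical training data: one descriptor value per chemical and the
# corresponding measured activity. All numbers are invented for illustration.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 2.9, 4.2, 4.8]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x_new):
    """Apply the fitted line to the descriptor value of a new chemical."""
    slope, intercept = model
    return slope * x_new + intercept


model = fit_line(descriptor, activity)
print("slope, intercept:", model)
print("predicted activity for descriptor 2.5:", predict(model, 2.5))
```

Fitting the data says nothing yet about how well the function predicts chemicals outside the training set; that question is addressed by validation and by the applicability domain.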

Read-across is a very simplified version of a QSAR model. Basically, the property of one or a few chemicals is predicted on the basis of one or more similar compounds, with or without the use of chemical descriptors.
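
A rough sketch of read-across could look as follows. It assumes that each chemical is described by a set of fragment names, that similarity is measured with the Tanimoto index, and that the prediction is the plain average over the most similar analogues. These are illustrative choices, not a prescribed procedure, and the fragments and property values are invented.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fragments."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def read_across(target, analogues, k=2):
    """Average the property of the k analogues most similar to the target.

    analogues is a list of (fragment_set, property_value) pairs.
    """
    if not analogues:
        raise ValueError("at least one analogue is needed")
    ranked = sorted(
        analogues,
        key=lambda item: tanimoto(target, item[0]),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical analogues with invented property values.
analogues = [
    ({"benzene", "OH"}, 3.0),
    ({"benzene", "OH", "Cl"}, 3.6),
    ({"alkyl", "COOH"}, 1.2),
]
target = {"benzene", "OH", "CH3"}
print(read_across(target, analogues, k=2))
```

The critical step is the similarity measure: a different definition of similarity can select different analogues and therefore give a different prediction.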

Virtual screening refers to methods based on docking simulations, used to evaluate the binding between a chemical compound and a biological macromolecule, such as a protein.



Areas of applicability

QSAR models have been developed for many applications:

Typically, the performance of QSAR models is better for physico-chemical properties, and decreases as the complexity of the studied system increases. For certain human endpoints, such as carcinogenicity and developmental toxicity, the general position is that in silico models should not be used as the sole tool, but as support for an evaluation based on several methods.

For aquatic toxicity, most of the models address acute toxicity, mainly in fish. Results are good for chemicals that do not carry residues that increase their toxicity, which applies to about 30% of the cases. Specific models for more toxic compounds should not be used in these cases.

Models for mutagenicity (mainly Ames test) generally give good results (accuracy about 80%, which is close to the test reproducibility).

Models for bioconcentration factors (BCF) give good results (R² about 0.8; error about 0.5 log units). Care should be taken if the predicted value is close to the BCF threshold; if the predicted value is well above or below the threshold, the prediction is much more reliable.
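
The caution about values close to the threshold can be sketched as a simple check. The example assumes a bioaccumulation threshold of BCF = 2000 (the "B" criterion of REACH Annex XIII, used here only as an illustrative default) and takes the 0.5 log unit error quoted above as the width of the uncertain band; the function name and labels are hypothetical.

```python
import math

# Assumed threshold: BCF = 2000, expressed in log units.
LOG_BCF_THRESHOLD = math.log10(2000)
# Typical model error in log units, as quoted in the text.
MODEL_ERROR = 0.5


def bcf_call(predicted_log_bcf, threshold=LOG_BCF_THRESHOLD, error=MODEL_ERROR):
    """Label a predicted log BCF relative to the threshold.

    Predictions within one model error of the threshold are flagged as
    uncertain instead of being assigned to either side.
    """
    if predicted_log_bcf >= threshold + error:
        return "above threshold"
    if predicted_log_bcf <= threshold - error:
        return "below threshold"
    return "uncertain: close to threshold"


for value in (1.0, 3.4, 4.5):
    print(value, "->", bcf_call(value))
```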

Models for carcinogenicity give quite a large error: about one out of three chemicals is wrongly predicted. Better results can be obtained if the applicability domain of the model is evaluated. At least three models for carcinogenicity should be used, because the results vary. If the models agree, the prediction is more reliable.
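
The recommendation to use at least three models and to look at their agreement can be illustrated with a simple majority vote. This is only a sketch: the class labels are hypothetical, and a real assessment would also consider the applicability domain of each model.

```python
from collections import Counter


def consensus(predictions):
    """Combine class predictions from several models.

    Returns (majority_label, agreement_fraction, unanimous).
    In case of a tie, the label encountered first is returned.
    """
    if len(predictions) < 3:
        raise ValueError("use at least three models")
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions), votes == len(predictions)


# Hypothetical outputs of three carcinogenicity models for one chemical.
print(consensus(["carcinogen", "carcinogen", "carcinogen"]))
print(consensus(["carcinogen", "non-carcinogen", "carcinogen"]))
```

A unanimous result supports a more reliable prediction, while a split vote is a signal that the chemical deserves further evaluation.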

Relation/Comparison with other methods

Any model, such as an animal model (also called in vivo) or a cellular model (in vitro), is a system which applies to a specific situation and simplifies a complex system that cannot itself be investigated experimentally. For instance, in toxicity studies the target is human health, and in environmental studies the target may be a certain ecosystem. In practice, all current models simplify the final target. The rabbit model will never replace the human being, and similarly the trout in a tank cannot replace the complex environmental system in which many different fish, other animals and environmental conditions exist. Methods like in vitro and in silico models are often called alternative methods, because they can be used as an alternative to animal models. In addition, these methods can be used to provide further information, to better address the final target: human beings and the environment. Thus, it would be reductive to see in silico methods simply as a surrogate for animal models.

The comparison and integration of in vivo, in vitro and in silico methods were discussed during the workshop organized by the EC-funded project RAINBOW.

The debate

There is a debate on the use of QSAR in particular, and in silico models in general. The acceptability of these tools depends on the user and the purpose. Within that debate, the major criticisms are that ‘in silico models are not reliable’, that ‘toxicity is too complex to be modelled’ and that ‘only the real experiment on the animal will provide the real result’.

All in silico models should give proof of their performance. In the past, classical models were built using all the available data, without any validation of the model. The points were fitted by a linear regression, and no demonstration was given that this regression was applicable to other chemicals (Kaiser, K. L. E. et al., 1999).

The interest in the use of in silico models for regulatory purposes contributed to the discussion on their validation (for more info refer to the regulatory context area of this website). A clear definition of the possible use of the model should be given, and pitfalls clearly indicated, for all in silico methods, such as QSAR, read-across and docking studies. Examples in this direction are the tools developed within CAESAR, which guide the user by showing results on similar compounds, and errors.

Even if toxicity is complex, some phenomena are simpler and rules have been identified. For instance, in the case of aquatic toxicity, the toxicity of most chemicals can be explained with quite simple relationships (for more info refer to the DEMETRA project). Genotoxicity has been modelled to a large extent on the basis of some toxic fragments or related QSAR models (more info at the CAESAR project). Improvement seems possible on the basis of more data (and there are huge initiatives in this direction, such as ToxCast), better models, and integration of models.

Even in vivo models cannot provide all the answers. A wise integration of models, including in silico models, can only improve our knowledge. In silico models take advantage of computers to better examine the data and information available. Disregarding this would mean not using all the pieces of information we have.

Watch the ORCHESTRA video documentary based on interviews with regulators, industry and QSAR developers.

Click here to view an interview with Professor Wim de Coen, Head of Evaluation 1 at the European Chemicals Agency (ECHA)


In silico tools can offer immediate, world-wide and (in many cases) free access to a formidable number of resources: databases, libraries of structures and properties, literature studies, models. Within a few years ToxCast will produce experimental data on 100,000 chemicals, omics will produce huge amounts of data, and HTPS will generate unprecedented amounts of data. No single human expert will have the time and the possibility to dig through these incredibly large collections of results. Even today, thousands of chemicals are present in libraries with their toxicity properties.

The sooner the correct use of in silico models becomes widespread, the better, in order to take advantage of these results.

Regulatory Context


In the USA, QSAR models have been used for decades to evaluate physico-chemical, environmental, ecotoxicological and toxicological properties. The US EPA makes available a series of QSAR models, such as EPISUITE and T.E.S.T. Interestingly, CAESAR QSAR is also available through the web site of the US EPA. Indeed, the CAESAR models, developed within an EC-funded project, have been implemented in collaboration with the US EPA.

In Europe, different regulations take different positions with respect to in silico models:

In other cases, such as plant protection products and pharmaceuticals, tests on animals have to be done, at least on the parent compound. However, some tools have been developed to study the ecotoxicity of metabolites and degradation products of pesticides (more info can be found on the website of the EC project DEMETRA).


The REACH legislation


In December 2006 the European Community adopted a new regulation that addresses the production and circulation of chemical substances in the European territory, and their potential impacts on both human health and the environment. This new regulation is called REACH (Registration, Evaluation, Authorisation and restriction of Chemicals) and states that for each chemical circulating in the European territory, a complete dossier on physico-chemical, biological and toxicological properties has to be compiled. In order to prevent excessive use of animal testing, the REACH regulation foresees and promotes the use of alternative methods (such as QSAR), stating that:

Before new tests are carried out to determine the properties listed in this Annex, all available in vitro data, in vivo data, historical human data, data from valid (Q)SARs and data from structurally related substances (read-across approach) shall be assessed first.

For more info refer to the REACH regulation ANNEX VII.
The full text of the REACH legislation is available through the European Chemicals Agency (ECHA) website.



The REACH requirements for QSAR

REACH legislation foresees the use of alternative in silico methods such as QSAR and Read-Across. Regarding Qualitative or Quantitative structure-activity relationship ((Q)SAR), Annex XI states that:

Results obtained from valid qualitative or quantitative structure-activity relationship models ((Q)SARs) may indicate the presence or absence of a certain dangerous property. Results of (Q)SARs may be used instead of testing when the following conditions are met:

  1. results are derived from a (Q)SAR model whose scientific validity has been established,
  2. the substance falls within the applicability domain of the (Q)SAR model,
  3. results are adequate for the purpose of classification and labelling and/or risk assessment, and
  4. adequate and reliable documentation of the applied method is provided.

The Agency in collaboration with the Commission, Member States and interested parties shall develop and provide guidance in assessing which (Q)SARs will meet these conditions and provide examples.

Thus, these are the requirements according to REACH. Three of them refer to the QSAR model, and one to its use for a specific compound.

These principles are also expressed in four of the OECD principles for the QSAR validation, whereas we notice that there is no explicit mention of the fifth OECD principle within REACH.

The Model Validity

The first requirement points out that the method should be scientifically valid. We notice that it is not requested that the model be validated. Validation is a formal process, which takes many years. The validation process of a QSAR model would probably end after REACH. The validity has to be assessed through scientific criteria, considering the performance of the model in prediction (ECHA - Guidance for the implementation of REACH).

Guidance on information requirements and chemical safety assessment
Chapter R.6: QSARs and grouping of chemicals

To access/download this and/or other documents from the ECHA's guidance for the implementation of REACH please refer to the guidance section of ECHA's website.

The OECD principles specify some features of the QSAR model, in order to assess whether it works or not, and these include the statistical characteristics of the model itself and of its predictive properties. We notice that in the early years of QSAR development the interest was in the results in fitting, i.e. based on the chemicals used to build the model, while for regulatory purposes the accent is on the ability of a model to predict the property of a new compound. In other words, it should be shown that it works for the purpose (proof-of-principle).

How to evaluate the scientific validity of the QSAR model?

Within Annex XI, REACH requires that a QSAR model be "scientifically valid" (it does not say validated). A first proof of scientific validity can be the fact that the method has been published in a scientific journal through the peer review process. In this case, the method has been evaluated by other scientists, who found it suitable for publication.

The preliminary document on (Q)SAR characterisation, compiled by the former European Chemicals Bureau (ECB), lists a series of statistical parameters to be used for model evaluation. Different tools apply to a model which is a classifier and to a model which is a regression method. In the first case the output of the model is a class or category, such as toxic, or mutagen; in the second case it is a continuous value.

The Characterisation of (Quantitative) Structure-Activity Relationships: Preliminary Guidance
Worth AP, Bassan A, Gallegos A, Netzeva TI, Patlewicz G, Pavan M, Tsakovska I & Vracko M
Document: EUR 21866 EN

To access/download this document please refer to the QSAR section of the documentation page of ECB's website.

Evaluation of a classifier

Classifiers are most typically evaluated using the Cooper statistics. In the simple case of a binary classification, there are two classes, such as toxic (positive) or not (negative). The results of a classifier can therefore be grouped into four cases: toxic compounds predicted as toxic (True Positive or TP) or as non toxic (False Negative or FN), and non toxic compounds predicted as non toxic (True Negative or TN) or as toxic (False Positive or FP). Three main statistical parameters for model evaluation can be derived by combining these four cases:

  • Accuracy (A), also referred to as concordance, is the measure of the correctness of prediction. This parameter gives a general evaluation of the errors made and is defined as the ratio of the compounds correctly predicted to the total number of compounds (TOTAL = TP + TN + FP + FN). Good models have a high accuracy value.

A = (TP + TN) / TOTAL

  • Sensitivity (S) is the measure of the positive compounds correctly predicted. Especially for regulatory purposes, it is important not to declare safe a chemical which is in fact toxic (FN). The sensitivity takes into account the number of FN and is defined as the ratio of the TP to the total number of positives (P = TP + FN). A good model has high sensitivity.

S = TP / P

  • Specificity (SP) is the measure of the negative compounds correctly predicted. Specificity takes into account the number of false positives and is defined as the ratio of the TN to the total number of negative compounds (N = TN + FP). Sometimes the 1 - SP parameter is reported.

SP = TN / N
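
The three parameters above can be computed directly from the four counts. The following sketch uses invented counts and a hypothetical function name; it simply applies the formulas as written.

```python
def cooper_statistics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from the four outcome counts."""
    positives = tp + fn  # P: all experimentally positive compounds
    negatives = tn + fp  # N: all experimentally negative compounds
    total = positives + negatives
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
    }


# Invented example: 50 toxic and 50 non toxic compounds.
stats = cooper_statistics(tp=40, fn=10, tn=35, fp=15)
for name, value in stats.items():
    print(name, round(value, 2))
```

In this invented example the accuracy is 0.75, but the sensitivity (0.8) and the specificity (0.7) show that the errors are not evenly distributed between false negatives and false positives.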

It is our opinion that for regulatory purposes it is important to verify that the classifier has a high sensitivity, in order to reduce the number of false negatives.

Not only binary classifications are defined within REACH. For instance, a chemical can be not bioaccumulative, or bioaccumulative, or very bioaccumulative (three classes).

Evaluation of a regression model

Regression models are most typically evaluated using statistical parameters which take into account the errors of the model. These errors are measured on the training set, which gives an idea of the model robustness. However, this is not sufficient, since the main interest of REACH is to understand whether a certain model can be used for prediction purposes. Thus, for regulatory purposes additional statistical measurements of predictivity are used. Some measurements use internal validation, others refer to an external test set.

The values predicted by the model (on the training, test and/or external validation set) are plotted against the experimental values in a graph (an example is shown below), and the coefficient of determination (R²) is calculated, giving an estimate of the goodness of the model.
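
As a numerical complement to such a graph, the sketch below computes R² and the root mean squared error from experimental and predicted values. It assumes the common definition R² = 1 - SS_res / SS_tot; other definitions exist, in particular for external test sets, and the values used are invented.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the property."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


# Invented experimental and predicted values for four chemicals.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print("R2:", r_squared(observed, predicted))
print("RMSE:", rmse(observed, predicted))
```

The same functions can be applied separately to the training set and to an external test set; a large drop from the first to the second suggests that the model fits its own data better than it predicts new compounds.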


The fourth requirement asks for transparency. This is reasonable, since all documentation underlying the assessment of the properties of a chemical should be clearly available and checkable. One of the driving forces of REACH was to have correct knowledge of the properties of the chemical substances on the market. If some of the information is hidden, this clearly goes against the spirit of REACH. There are three major components of a QSAR model:

  1. the property component, such as the toxicological effect;
  2. the chemical component, which involves the chemical format, descriptors and/or fragments used for the model;
  3. the mathematical or logical equation used to link the first two components.

The availability of the components of the model is also covered by the OECD principles, which explicitly ask for the definition of the endpoint (the property) in the first principle and of the equation in the second principle.

We notice that this requirement implies that models with confidential or restricted components may not be suitable for REACH. This may concern the property values used to build the model, the information on the chemical part, or the mathematical equation.


The third requirement states that the model should target an endpoint relevant for REACH. Only models which address the endpoints of interest for REACH are appropriate for this purpose. We notice that REACH mentions different purposes for QSAR models: classification and labelling is one possible target of the model, and risk assessment is another. In the first case models are classifiers; in the second case a regression model is more suitable. Indeed, in the first case the output of the model is a class, while in the second it is a continuous value. A continuous value is necessary to obtain the ratio between the effect dose and the exposure level. We notice that, depending on the tonnage, two models may be requested for the same endpoint: one to classify the chemical substance, and one to obtain the toxic dose. (REACH also identifies a further potential use of QSAR: prioritisation.)



Requirements 1, 3 and 4 serve to verify whether the model is valid, transparent and adequate for REACH. However, all these factors, which refer to the evaluation of the model, are not sufficient. The second requirement in Annex XI asks to show that the model, having fulfilled the requirements on the model itself, is appropriate for the chemical it is applied to. This requirement refers to the applicability domain of the model. Thus, conceptually, this requirement concerns the application of the model to the chemical compound of interest, while the other requirements evaluate the model per se.

Thus, according to REACH the evaluation of a QSAR model has to be done not only on the quality of the model, but in addition on the correct use of the model for the chemical.

This requirement is also present in the OECD principles.

How to evaluate the applicability domain?

There are some chemometric tools (chemometrics is an area which combines statistics and chemistry) which take the chemical descriptors and/or fragments of the chemicals used to build the model, and check whether the chemical descriptors and/or fragments of the target chemical are similar to them. An example of this approach is given by the freely available software AMBIT, developed within the Cefic Long-range Research Initiative (LRI). A major disadvantage of this approach is that it is based only on the chemical information.
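
One of the simplest descriptor-based checks of this kind is a range (bounding box) test: the target chemical is considered inside the domain if each of its descriptors falls within the minimum and maximum observed in the training set. The sketch below illustrates only this generic idea, with invented descriptor values and hypothetical function names; it is not a description of how AMBIT or any other specific tool works.

```python
def descriptor_ranges(training_set):
    """Minimum and maximum of each descriptor over the training chemicals."""
    columns = list(zip(*training_set))
    return [(min(col), max(col)) for col in columns]


def in_domain(chemical, ranges):
    """True if every descriptor of the chemical lies within the training range."""
    return all(low <= value <= high for value, (low, high) in zip(chemical, ranges))


# Invented training set: two descriptors per chemical.
training_set = [
    (1.2, 150.0),
    (2.8, 210.0),
    (3.5, 180.0),
]
ranges = descriptor_ranges(training_set)
print(in_domain((2.0, 200.0), ranges))
print(in_domain((5.1, 200.0), ranges))
```

Such a check uses only the input space of the model, so it shares the limitation mentioned above: it says nothing about how well the model actually predicted similar chemicals.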

Another approach is to evaluate the metabolism or toxicity pathway of the chemical of interest. However, this can be applied only in case they are known.

Another recent tool has been developed within the ORCHESTRA project. The tool, called TUTOR (Tool Unifying similarity, TOxicity prediction and Reasoning), takes into account both the chemometric information and the toxicity predictions made by the model, and in particular what kind of errors the model has made. Thus, this approach is based not only on the input space (the chemical descriptors and fragments), but also on the output space of the model, which is the predicted property. Furthermore, this tool is based not only on the a priori data and information, as the other approaches are, but also on the a posteriori result of the model.

TUTOR uses seven perspectives for its reasoning, and it combines different parameters into a single value. This value ranges from 0 to 1, and it is associated with an acceptability evaluation of the model results for a certain chemical. The user knows whether the model can or cannot be used for a certain compound. In some cases a warning is given, recommending expert opinion. In all cases the reasons for the reliability assessment are given, and they can be evaluated in a transparent way.

This tool is useful for exploring the results of the model, linking the prediction with results obtained on similar compounds. This supports the model reliability in a transparent way, and extends the use of QSAR as a tool for data exploration on similar compounds, which is also very useful as a basis for read-across. Indeed, a major issue in read-across is the evaluation of similarity.

The debate

There is debate in Europe on the use of QSAR for REACH. The novelty of the regulation requires certain clarifications and examples. Guidance documents have been published by ECHA, but in many cases industry is still puzzled about the suitable use of QSAR. ORCHESTRA wants to promote the debate and the correct use of in silico models, and actions are planned to improve the use of QSAR. The EC project ANTARES is evaluating existing in silico models for REACH.

