Computational chemistry at Janssen
Computer-aided drug discovery activities at Janssen are carried out by scientists in the Computational Chemistry group of the Discovery Sciences organization. This perspective gives an overview of the organizational and operational structure, the science, internal and external collaborations, and the impact of the group on Drug Discovery at Janssen.
Keywords: Computer aided drug design, Computational chemistry, Drug discovery
In this perspective we give an overview of the global Computational Chemistry group and its research activities at Janssen. Several viewpoints are presented in different sections:
Organizational and operational structure: how are computer-assisted drug discovery (CADD) activities organized within the broader Research and Development organization of Janssen and how do we operate within this organization?
Science: what scientific approaches are used by the CADD scientists, are there any specializations, and how are new technologies introduced?
Collaborations: what is the extent and nature of research carried out with external partners, academic and industrial?
Impact: what is the impact of CADD on drug discovery at Janssen and how do we measure it?
It is impossible to do justice to all the scientists, activities, and collaborative research that the global group is involved in, but we trust this article gives a good view on the role and impact of CADD at Janssen.
Organizational and operational structure
Johnson & Johnson (J&J) is one of the largest pharmaceutical companies in the world as assessed by various measures [1, 2]. Since 2011, all Johnson & Johnson pharmaceutical discovery activities have been combined under one name: Janssen, Pharmaceutical Companies of Johnson & Johnson (hereinafter Janssen). Discovery research activities at Janssen are carried out at three major sites: Spring House, PA and La Jolla, CA in the US, and Beerse in Belgium (including satellite chemistry-only sites in Toledo, Spain and Val de Reuil, France). CADD research is done at all major and satellite sites by scientists who are part of a global Computational Chemistry group (hereinafter CompChem). The CompChem group consists of 19 permanent employees and a varying number of contractors and postdocs (5 in total as of this writing). Janssen research sites in Leiden, The Netherlands, and Shanghai, China, each have a local CADD scientist who is not a member of the CompChem group but is scientifically involved in the global Janssen CADD community. The majority of the CompChem group conducts CADD work, with two permanent employees and one contractor providing chemical patent search services. The remainder of this article will deal solely with the CADD functions within the group.
The CompChem group is part of the Lead Discovery department in the Discovery Sciences (DS) organization. DS provides end-to-end small molecule discovery and development support to the five therapeutic areas at Janssen: oncology, neuroscience, immunology, infectious diseases, and cardiovascular disease and metabolism. Scientists in the CompChem group are therefore not part of the therapeutic areas but are key contributors to discovery project teams that are driven by the therapeutic areas. Within DS there is another group with a very strong computational basis: the Computational Sciences (CS) department. This CS group applies computational methods to target identification and validation, next-generation sequencing, omics, and image analysis, and also has an important role in CADD-related activities such as genome-wide compound activity predictions. This work will be discussed in more detail in the “Science” section.
Small molecule discovery project teams focus on a particular protein target or phenotypic assay, and include chemists, biologists, ADME scientists, and almost always a CompChem scientist who is an integral part of the team. CADD support is not provided as a service on a request basis but by CompChem scientists who are core members of the discovery teams and are often involved in key project decision making. In some cases, the CompChem scientists have served as project leaders. Even though our CompChem scientists have a wide variety of computational specializations, in almost all projects, all CADD activities are done by a single person, from the start of the project until new molecular entity (NME) declaration (i.e., clinical candidate declaration). These CADD activities may include target ligandability assessment, homology modeling, virtual screening, focused library selection, HTS screening analysis and hit triage, compound and library design, in silico ADME modeling, etc. This one person per project setup has the great benefit that the scientist is fully embedded in the project and has detailed knowledge of compound structures and activities, SAR hypotheses, previous team discussions, etc. It also benefits team dynamics and results in maximum impact of CADD on the project. Often, the CompChem scientist is one of the few team members who stay with the project for the entire discovery phase. Obviously, other CompChem group members are often consulted when calculations are needed for which the scientist is not an expert. On average every CompChem scientist is supporting 2–3 discovery projects, usually one or two major projects that require continuous attention and some that only need ad hoc computational support. Approximately 50% of the small molecule discovery projects at Janssen are structure-enabled with one or more target/ligand crystal structures.
The most important and impactful form of communication between CompChem scientists and the team is through working sessions with individual medicinal chemists or with subteams, in front of a 3D screen or in a meeting room with 3D display facilities. Depending on the scientist and the computational results that are shared, the software used in these sessions is Maestro, MOE, or PyMOL. It is often during these meetings that new molecular ideas are generated and decisions are made on what molecules to synthesize. We find that most chemists on the teams are highly involved in the use of 3D structures to guide idea generation. These sessions can be done remotely as well, for instance in projects where the CompChem scientist is not co-located with some of the medicinal chemists. Remote support is not ideal, but with the advent of easy screen sharing and teleconferencing tools, it has become an acceptable option. Remote 3D sessions are not yet possible with our infrastructure, and this would be a next step in further improving remote support. We find it most efficient if the CompChem scientist is co-located with the majority of the chemists on a given project, and we try to optimize interactive communication as much as possible.
Informal interactions between the different global CompChem groups are frequent, and are often initiated to exchange expertise or to have team discussions in projects where multiple CompChem scientists are involved. On a monthly basis we organize a global CADD forum, in which internal and external speakers are invited to present their work to the global community, which also includes scientists from our therapeutic areas, medicinal chemists, and Research IT. Every quarter we have a videoconference with CompChem scientists from all sites, and we aim to have a face to face meeting every year. We consider these meetings as vital for the effective sharing of expertise, and also to foster a group culture in which informal interactions are easy and frequent.
Besides the all-round project support that CompChem scientists provide to discovery projects, there are at any given time a number of initiatives in which group members have leading roles. These initiatives include the evaluation and development of computational technologies, and also multidisciplinary efforts as exemplified by our Library Enhancement Working Group (LEWG) and our Kinase Working Group (KWG). These initiatives were spearheaded by members of the CompChem group and have expanded to include members from therapeutic area biology, therapeutic area chemistry, and information technology. The LEWG team has been instrumental in coordinating the rollout of the new Janssen global screening deck and in systematically enriching our screening compound collection with novel chemical libraries designed collaboratively by therapeutic area medicinal chemists and CompChem scientists. To enable external early discovery projects, the LEWG team has assembled a drug-like screening deck, the “JumpStARter library”, and this collection has been used by several external collaborators who otherwise have limited access to a screening deck. The KWG team has developed kinase-focused assets such as compound libraries and data analysis tools, coordinated the evaluation of new assay technologies, and generated kinome-wide compound activity data. In addition, the KWG provides a forum to share target family knowledge and experience across therapeutic and functional areas.
Most of the computer hardware that is used is on site, including Linux and Windows workstations and small CPU clusters. Calculations involving molecular dynamics are usually run on internal GPU servers. In addition, both CPU and GPU resources are available externally in the cloud and at centers such as the San Diego Supercomputer Center.
Science
Even though CADD is a relatively specialized research area, there is a large variety of calculations and analyses that are carried out by the CompChem group. The spectrum includes bio- and cheminformatics-based data mining, target ligandability analysis, ligand-based predictive modeling, homology modeling, pharmacophore modeling, virtual screening, molecular dynamics, and quantum chemistry calculations.
Examples of bioinformatics searches that are done in the group are protein sequence comparisons and pathway analysis. These are often relevant in the early stages of a project, for instance when the target and its potential off-targets are still being evaluated. Cheminformatics data mining in all available proprietary, commercial, and public compound activity databases is also a common task in the early stages of a project, to get a full picture of what small molecule space is known for the target at hand, as well as to identify molecules similar to actives from literature or screening hits, whose testing will help develop structure-activity relationships. The recent availability of the SureChEMBL database enables us to rapidly and efficiently mine the chemical patent literature, which contains a lot of unique compound structure and activity data. Thanks to efforts of the Open PHACTS project, the SureChEMBL database has been enriched with target and disease annotation that is automatically extracted from the patents. In-depth patent searches and patent landscape analyses are done by our specialized patent search scientists, who make use of commercial patent databases such as STN from Chemical Abstracts Service.
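The search for molecules similar to known actives mentioned above is commonly based on a fingerprint similarity measure such as the Tanimoto coefficient. The following is a minimal sketch of that idea and not a description of the tools used at Janssen. The fingerprints are hypothetical sets of on-bit indices, the compound names are invented, and the 0.5 threshold is an arbitrary assumption; in practice the fingerprints would come from a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is represented here as the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either set."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    threshold: float = 0.5,  # arbitrary cutoff, an assumption of this sketch
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical data: a known active and three library compounds.
query = frozenset({1, 4, 7, 9})
library = {
    "cmpd_A": frozenset({1, 4, 7, 9, 12}),  # 4 shared bits, 5 bits in the union
    "cmpd_B": frozenset({1, 4, 20, 21}),    # 2 shared bits, 6 bits in the union
    "cmpd_C": frozenset({1, 4, 7}),         # 3 shared bits, 4 bits in the union
}

if __name__ == "__main__":
    for name, score in rank_by_similarity(query, library):
        print(f"{name}: {score:.2f}")
```

Compounds that pass the cutoff would then be candidates for testing alongside the original active to help build structure-activity relationships.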
Recently the CompChem group has taken the initiative, together with scientists in our Research IT department, to explore more complete and more complex data mining by using Linked Data. This effort was catalyzed by our involvement in the Open PHACTS project [8, 10], a groundbreaking public–private consortium project to semantically integrate a large number of public biomedical databases (see also the “Collaborations” section). We are integrating our internal databases with public and commercial chemistry and biology databases using linked data concepts. The benefits are at least threefold: (a) our scientists will access all available data simultaneously, for instance compound activity data in our internal ABCD database, the public ChEMBL and DrugBank databases, the commercial GOSTAR and Thomson Reuters databases, etc.; (b) certain queries become much simpler and faster to execute, especially those that need to access data from different databases in multiple domains; and (c) linked data offers new analysis opportunities such as inferencing of relationships between concepts (e.g. a compound and a disease) via indirect connections (see e.g. Euretos). It is clear that the benefits of this work go beyond CADD applications, and the in-house system is currently being evaluated by a diverse set of scientists at Janssen from many different disciplines.
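The idea of inferring a relationship between a compound and a disease via indirect connections can be illustrated with a toy example. The sketch below is not the in-house system: it stores a handful of invented subject-predicate-object triples in a plain Python list and uses a breadth-first search to find the shortest chain of statements linking two concepts. All identifiers and predicates are hypothetical, and a real linked data system would typically use RDF stores and SPARQL queries instead.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements, as if merged from several databases.
TRIPLES: List[Triple] = [
    ("compound:X", "inhibits", "target:KinaseA"),
    ("target:KinaseA", "participates_in", "pathway:P1"),
    ("pathway:P1", "implicated_in", "disease:D1"),
    ("compound:Y", "inhibits", "target:KinaseB"),
]


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by subject so outgoing links can be followed quickly."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
    return graph


def find_path(triples: List[Triple], start: str, goal: str) -> Optional[List[Triple]]:
    """Breadth-first search for the shortest chain of triples from start to goal."""
    graph = build_graph(triples)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, obj in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, predicate, obj)]))
    return None  # no indirect connection found


if __name__ == "__main__":
    for step in find_path(TRIPLES, "compound:X", "disease:D1") or []:
        print(" ".join(step))
```

In this toy data set the compound is linked to the disease through a target and a pathway, even though no single source states that relationship directly.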
Ligand-based predictive modeling can be done in many ways, depending on the descriptors and machine learning methods that are used. These models are routinely generated and used to prioritize molecules for synthesis during hit-to-lead and lead optimization. Input descriptors can be 1D (numeric or categorical properties), 2D (chemical structure), or 3D (spatial molecular conformation). Our colleagues in the Computational Sciences department have developed genome-wide compound activity prediction models, using all available compound activity data and advanced machine learning methods [17, 18]. These models can predict active vs. inactive compounds (at a given concentration) on a large number of targets. It is interesting to note that by using activity data on many targets, the performance on any given target is better than for models that are generated with single-target data only. Important applications of these models include prediction of the target(s) in the deconvolution of phenotypic hits, and computational assessment of the target selectivity of one or more compounds. These applications are especially useful in the early stages of a project, for instance during HTS analysis. We work very closely with this group and are ideally positioned to apply the predictions in discovery projects.
Virtual screening, for instance by docking, ligand-based pharmacophore searching, or by shape-based searching, has been one of the standard activities of industrial CADD groups. We use it often, but not in all projects. The objective is primarily to see if there are any interesting compounds that are available for screening or purchasing, but are not in Janssen’s standard screening deck. When a project is underway in hit-to-lead or lead optimization, virtual screening is sometimes applied to virtual libraries of compounds to prioritize for synthesis those most likely to be active.
Quantum chemistry (QM) calculations are used occasionally, for instance for the prediction of pKa values for compounds that are poorly predicted by standard chemical fragment-based methods. We also use quantum calculations in absolute stereochemistry assignment in cases where crystallography is not an option. By comparing the calculated vibrational circular dichroism (VCD) spectrum with the experimental spectrum, the correct stereoisomer can be assigned. QM is also often used to assess substituent conformational effects in ligand SAR analysis and to obtain accurate charge distributions of molecules.
Molecular dynamics (MD) and in particular free energy perturbation (FEP) calculations have been recognized as an area of high interest, and we think it is the right time to invest in building capabilities and expertise in this area. FEP and other free energy calculations have the potential to predict compound affinity for its target much better than non-MD methods, and if this promise is realized, it would have major value in hit-to-lead and lead optimization. At this moment, many molecular design decisions directed toward improved binding affinity are based on visual inspection of ligand–protein interactions, and a drive to form new hydrogen bonds, fill hydrophobic cavities, stabilize bound ligand conformations, etc. A huge amount of experience has been generated in the industry on this topic in the past decades. Still, it is unlikely that anyone can make an accurate quantitative judgment on several very important factors that determine ligand affinity, such as solvent rearrangements, protein flexibility, and entropy–enthalpy compensation. These effects are implicitly included in FEP and therefore could lead to systematically better predictions. Better predictions mean faster affinity optimization of compounds, and also an expected willingness to make synthetically more challenging compounds, which in turn could lead to better drug candidates. Building expertise is a high priority, as the results of these methods cannot be properly analyzed and explained if the tools are used as a black box. The support of management to develop our capabilities in this area has enabled us to apply FEP to most of our structure-enabled projects, and synthesis plans are incorporating the results. So far we have seen generally good performance of FEP in internal projects, with free energy prediction errors of less than 1 kcal/mol, an accuracy that is useful to guide molecular design.
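To put an error of 1 kcal/mol in context, a difference in binding free energy can be converted to a fold-change in the dissociation constant through the standard thermodynamic relation ΔΔG = RT ln(Kd,2/Kd,1). The short sketch below is illustrative only and is not part of the FEP workflow described above. The function names and the default temperature of 298.15 K are assumptions made for this example.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def fold_change_in_kd(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Ratio Kd,2/Kd,1 that corresponds to a free energy difference ddG."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


def ddg_from_fold_change(fold_change: float, temperature_k: float = 298.15) -> float:
    """Free energy difference in kcal/mol that corresponds to a Kd ratio."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


if __name__ == "__main__":
    # An error of 1 kcal/mol is roughly a 5-fold uncertainty in Kd near room temperature.
    print(f"1.0 kcal/mol -> {fold_change_in_kd(1.0):.1f}-fold in Kd")
    # A 10-fold affinity gain corresponds to roughly 1.4 kcal/mol.
    print(f"10-fold in Kd -> {ddg_from_fold_change(10.0):.2f} kcal/mol")
```

Under these assumptions, a prediction error below 1 kcal/mol is small enough to distinguish a compound with a tenfold affinity gain from one with no gain, which is the kind of resolution needed when ranking design ideas for synthesis.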
Several major CADD software packages are available to support all these calculations, including those from Schrödinger, Chemical Computing Group, and OpenEye. Pipeline Pilot is our preferred tool for developing and applying calculation pipelines that can directly access our ABCD compound activity database. A variety of smaller software tools are also available to address specific needs. These include MED-SuMo, LigandScout, AMBER, SYLVIA, Proasis, Eidogen-Sertanty, and StarDrop. The modeling software from the Cambridge Crystallographic Data Centre is also available, including for instance GOLD and ReliBase+. Finally, our powerful in-house analysis tool 3DX is used very often for compound property calculations, clustering, SAR table analysis, database queries, graphical analysis, etc., and accesses ABCD and other databases directly. Planning and decision making on our software licensing is done globally, with only a small minority of software available locally.
Collaborations
Collaborations are crucially important for our CompChem group for exploration of novel science, proof of concept application of new methods, and the continuous scientific development of our scientists and our collaborators. In addition to collaborating outside the company, joint research with other groups and departments within Janssen and other J&J sectors is highly rewarding, and the resulting partnerships capitalize on one of big pharma’s strong assets: the presence of a diverse group of expert scientists who are working on a diverse set of projects. Internal collaboration with the Computational Sciences group was mentioned in the previous section. Other internal collaborators include the Research IT department, the preclinical and clinical ADME groups (in silico ADME calculations), the Pharmaceutical Development group (synthesis optimization and crystal polymorph predictions), and J&J’s consumer products division (natural product compound selection). The cross-functional Kinase Working Group and Library Enhancement Working Group are examples of specific collaborative efforts involving our therapeutic area colleagues as well.
Collaborations with the external world come in many shapes and forms. They include:
Direct collaborations with an academic partner or company, often by sponsoring a PhD student or postdoc. We also work with academics as scientific consultants.
Direct collaborations with computational service providers like Schrödinger and out/in-sourcing with off/on-site scientific contractors.
Small public–private projects with several partners that can be partially funded by a governmental or non-governmental organization (for example the Flemish VLAIO agency, formerly called IWT).
Larger precompetitive consortia with 10+ public and private partners, often with significant contributions from funding agencies like NIH, Wellcome Trust, EU (e.g., IMI, Horizon 2020 programs).
The CompChem group has been involved in several IMI precompetitive consortia, such as Open PHACTS (linked biomedical data), European Lead Factory (HTS screening, new chemistry), and K4DD (binding kinetics). Other collaborative consortia of note are the Structural Genomics Consortium (SGC) for the discovery of tool compounds and crystal structures for predominantly epigenetic targets, the Phenomics Discovery Initiative, and a collaboration with the IGBMC institute in Strasbourg, France, to explore the use of electron microscopy in structural biology of large complexes.
Involvement in the European Bioinformatics Institute’s (EBI) Industry Partner Programme has played a significant role in bringing scientists from different companies together, a welcome side effect to the educational and strategic benefits of this program. A cross pharma precompetitive discussion group in the US has been organizing teleconference meetings and face to face meetings to exchange experiences on new and broadly relevant technologies and applications such as free energy calculations. These interactions undoubtedly benefit all participants, and have been a refreshing change from the more secretive interactions in the past. The initiatives will undoubtedly lead to collaborative efforts to improve computational methods, which will benefit the pharma research community as a whole.
Impact
The impact of CADD on drug discovery is not easy to quantify. One objective measure is the number of patent applications on which a CADD scientist is an inventor, and this has been steadily rising at Janssen and probably in the pharma industry in general. An analysis of Janssen discovery patents in SureChEMBL showed that a CADD scientist was an inventor on around 25% of them (114 out of 461 granted US patents during the period 2010–2016). The percentage is higher for projects in which structure-based design played a significant role. A perhaps more important demonstration of value is the presence of CADD scientists as core members of almost all discovery teams. Today, medicinal chemists want CADD scientists on their project, and this has not always been the case in the industry. As described before, the impact at Janssen is present from the early target identification and validation phase to improving our corporate compound collection, hit finding, hit-to-lead chemistry, and finally lead optimization. CADD scientists select the libraries to screen, influence the choice of molecules to be synthesized, and improve project decision-making through data integration and visualization. Even beyond NME declaration, computational chemistry can play a role, for instance in pharmaceutical development or in drug repurposing.
Our HTS hit triage process is evolving continuously, and includes a major role for our CompChem scientists, involving chemical analysis and clustering, identification of additional compounds of interest, and assembly of data to drive good decision making.
Biophysically measured binding affinity and X-ray or NMR structures of proteins are very important data for CADD scientists, and therefore the CADD scientists are always great proponents of having those experiments done in discovery projects. The same holds true for fragment-based drug design, an approach with a strong computational component that has been applied to many internal projects with internal and external fragment screening. The advocacy for these approaches has certainly stimulated their use, with a resulting beneficial effect on discovery projects.
The impact of the CompChem group goes beyond computational work. In the Kinase Working Group initiative described above, which is driven by CompChem scientists, many compounds have been found with interesting selectivity profiles. Several of these compounds have been very useful in target validation and assay development, and have also resulted in starting points for hit-to-lead exploration. By actively reaching out to academic groups, several early research collaborations have been set up in which Janssen proprietary compounds are used to identify and validate new kinase targets.
This perspective has given an overview of the central role of CADD in small molecule discovery at Janssen. The broad impact is reflected in the variety of topics in publications of our CompChem scientists (see previous references and [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]). In conclusion, we expect that the impact of CADD will continue to increase. Predictions will get better, for instance through systematic application of free energy methods and through improved machine learning methods that make use of experimental compound descriptors that go beyond chemical structure, such as transcriptomics and high content cellular imaging readouts. As more diverse and complex experimental data is generated in discovery projects, the CADD scientists will need to evolve with this, developing new computational approaches and continuing to maximize their impact.