RDKit Atropisomer Bug: N-S(=O)-C Bond Issue

by Admin
RDKit Atropisomer Bond is Found in N-S(=O)C System

Hey guys! Let's dive into an interesting issue I bumped into while using RDKit, a super handy cheminformatics software package. Specifically, it involves the detection of atropisomer bonds within an N-S(=O)C system. This is a bit of a technical topic, but I'll break it down so we can all understand it.

The Bug: Atropisomer Bond Warning

So, the main problem is that RDKit is throwing a warning that a molecule should have an atropisomer bond, even when it chemically doesn't. Atropisomers, for those who aren't familiar, are basically stereoisomers that arise due to restricted rotation around a single bond. This restriction is often caused by steric hindrance, where bulky groups attached to the bond physically bump into each other, preventing free rotation. RDKit is designed to identify these situations, which is super useful for understanding the 3D shapes of molecules.

The warning message I'm getting is: "The 2 defining bonds for an atropisomer are co-planar." It shows up for a specific type of molecule containing an N-S(=O)C system: a sulfur atom double-bonded to an oxygen (S=O), with a nitrogen (N) on one side and a carbon (C) on the other. RDKit is treating a bond within this system as an atropisomer bond. The atoms involved are not sp2 hybridized, so no atropisomer should be detected here. In other words, the software is warning about a situation that doesn't exist chemically.

Now, why is this a bug? Because it could lead to incorrect interpretations of the molecule's properties. If RDKit incorrectly flags a bond as an atropisomer, downstream analyses can be affected: for example, the software might treat the molecule as an atropisomer when in reality there's free rotation. It also contradicts the documentation, which states that the centers need to be sp2 hybridized. Let's delve deeper into this problem.
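
If you want to see which bonds RDKit has actually tagged in a molecule, here's a rough sketch. The helper name atrop_bond_report is made up for this post and is not part of RDKit. It assumes an RDKit release recent enough to define the STEREOATROPCW and STEREOATROPCCW bond stereo values; on older releases the getattr lookups find nothing and the report simply comes back empty. Whether the N-S bond in the molecules from this post ends up tagged is something to check with it, not something the sketch asserts.

```python
from rdkit import Chem


def atrop_bond_report(mol):
    """Return (bond idx, begin symbol, begin hybridization, end symbol,
    end hybridization) for every bond carrying an atropisomer stereo tag.

    Hypothetical helper for illustration only, not an RDKit API.
    """
    # Assumption: these enum members exist only in newer RDKit releases,
    # so look them up defensively instead of referencing them directly.
    atrop_tags = {
        tag
        for tag in (
            getattr(Chem.BondStereo, "STEREOATROPCW", None),
            getattr(Chem.BondStereo, "STEREOATROPCCW", None),
        )
        if tag is not None
    }
    report = []
    for bond in mol.GetBonds():
        if bond.GetStereo() not in atrop_tags:
            continue
        begin, end = bond.GetBeginAtom(), bond.GetEndAtom()
        report.append(
            (
                bond.GetIdx(),
                begin.GetSymbol(),
                str(begin.GetHybridization()),
                end.GetSymbol(),
                str(end.GetHybridization()),
            )
        )
    return report


# Ethane has a single plain bond, so nothing is reported.
print(atrop_bond_report(Chem.MolFromSmiles("CC")))
```

Any entry in the report whose end atoms are not SP2 would be worth a closer look, given what the documentation says about sp2 centers.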

Reproducing the Issue: The Code

To make this bug reproducible, I created a Python script that uses RDKit. I'll share the code here so you can try it out yourself. This will help you understand the problem better.

from contextlib import redirect_stderr
from io import BytesIO, StringIO
from operator import xor
from rdkit import Chem
from rdkit.rdBase import WrapLogs

sdf = """\
double-bond-O-R
  ChemDraw11262512372D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 8 7 0 0 1
M  V30 BEGIN ATOM
M  V30 1 C -1.071690 0.825057 0.000000 0
M  V30 2 C -0.357230 0.412538 0.000000 0
M  V30 3 N -0.357230 -0.412500 0.000000 0
M  V30 4 S 0.357229 -0.825019 0.000000 0
M  V30 5 O 0.357229 -1.650057 0.000000 0
M  V30 6 C 1.071690 -0.412500 0.000000 0
M  V30 7 C 0.357247 0.825029 0.000000 0
M  V30 8 C -1.071668 1.650057 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3 CFG=1
M  V30 3 1 3 4
M  V30 4 2 4 5
M  V30 5 1 4 6 CFG=1
M  V30 6 1 2 7
M  V30 7 1 1 8
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$
single-bond-O-R-doesntwarn
  ChemDraw11262512372D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 8 7 0 0 1
M  V30 BEGIN ATOM
M  V30 1 C -1.071690 0.825057 0.000000 0
M  V30 2 C -0.357230 0.412538 0.000000 0
M  V30 3 N -0.357230 -0.412500 0.000000 0
M  V30 4 S 0.357229 -0.825019 0.000000 0 CHG=1
M  V30 5 O 0.357229 -1.650057 0.000000 0 CHG=-1
M  V30 6 C 1.071690 -0.412500 0.000000 0
M  V30 7 C 0.357247 0.825029 0.000000 0
M  V30 8 C -1.071668 1.650057 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3 CFG=1
M  V30 3 1 3 4
M  V30 4 1 4 5 CFG=1
M  V30 5 1 4 6
M  V30 6 1 2 7
M  V30 7 1 1 8
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$
single-bond-O-S
  ChemDraw11262512382D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 8 7 0 0 1
M  V30 BEGIN ATOM
M  V30 1 C -1.071690 0.825057 0.000000 0
M  V30 2 C -0.357230 0.412538 0.000000 0
M  V30 3 N -0.357230 -0.412500 0.000000 0
M  V30 4 S 0.357229 -0.825019 0.000000 0 CHG=1
M  V30 5 O 0.357229 -1.650057 0.000000 0 CHG=-1
M  V30 6 C 1.071690 -0.412500 0.000000 0
M  V30 7 C 0.357247 0.825029 0.000000 0
M  V30 8 C -1.071668 1.650057 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3 CFG=1
M  V30 3 1 3 4
M  V30 4 1 4 5 CFG=3
M  V30 5 1 4 6
M  V30 6 1 2 7
M  V30 7 1 1 8
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$
"""
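WrapLogs()  # send RDKit's log output through Python's sys.stderr so it can be captured
err_buf = StringIO()
with redirect_stderr(err_buf):
    for mol_ix, mol in enumerate(Chem.ForwardSDMolSupplier(BytesIO(sdf.encode()))):
        # mols 0 and 2 warn, mol 1 does not
        assert xor(("co-planar" in err_buf.getvalue()), mol_ix == 1)
        mol.Debug(False)  # write the atom/bond debug dump to the error stream, not stdout
        assert "SP3" in err_buf.getvalue()
        for line in err_buf.getvalue().splitlines():
            if " N " not in line or " S " not in line:
                continue
            assert "SP2" not in line
        # reset the buffer for the next molecule; truncate() alone leaves the
        # write position where it was, so rewind first
        err_buf.seek(0)
        err_buf.truncate(0)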
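
The three records in the SDF look almost identical, so it helps to see exactly which bonds carry a wedge in each one. Here's a small standard-library sketch that pulls the wedged bonds out of a V3000 block. The function name wedged_bonds and the RECORD_1 constant are my own names for illustration; the CFG values (1 = up, 2 = either, 3 = down) come from the V3000 CTfile format. It's a minimal parser that only handles the simple one-line-per-entry layout used in this post.

```python
# V3000 bond CFG values that indicate a stereo bond
WEDGE_LABELS = {1: "wedge up", 2: "either", 3: "wedge down"}

# Atom and bond blocks of the first record ("double-bond-O-R") from the script
RECORD_1 = """\
M  V30 BEGIN CTAB
M  V30 COUNTS 8 7 0 0 1
M  V30 BEGIN ATOM
M  V30 1 C -1.071690 0.825057 0.000000 0
M  V30 2 C -0.357230 0.412538 0.000000 0
M  V30 3 N -0.357230 -0.412500 0.000000 0
M  V30 4 S 0.357229 -0.825019 0.000000 0
M  V30 5 O 0.357229 -1.650057 0.000000 0
M  V30 6 C 1.071690 -0.412500 0.000000 0
M  V30 7 C 0.357247 0.825029 0.000000 0
M  V30 8 C -1.071668 1.650057 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3 CFG=1
M  V30 3 1 3 4
M  V30 4 2 4 5
M  V30 5 1 4 6 CFG=1
M  V30 6 1 2 7
M  V30 7 1 1 8
M  V30 END BOND
M  V30 END CTAB
M  END
"""


def wedged_bonds(molblock):
    """List (bond index, order, begin atom, end atom, wedge) for flagged bonds."""
    symbols = {}    # atom index -> element symbol
    found = []
    section = None  # name of the V3000 block currently being read
    for line in molblock.splitlines():
        if not line.startswith("M  V30 "):
            continue
        fields = line[len("M  V30 "):].split()
        if not fields:
            continue
        if fields[0] == "BEGIN":
            section = fields[1]
        elif fields[0] == "END":
            section = None
        elif section == "ATOM":
            symbols[int(fields[0])] = fields[1]
        elif section == "BOND":
            index, order, begin, end = (int(f) for f in fields[:4])
            for prop in fields[4:]:
                if prop.startswith("CFG=") and int(prop[4:]) in WEDGE_LABELS:
                    found.append((
                        index,
                        order,
                        f"{symbols[begin]}{begin}",
                        f"{symbols[end]}{end}",
                        WEDGE_LABELS[int(prop[4:])],
                    ))
    return found


for entry in wedged_bonds(RECORD_1):
    print(entry)
# (2, 1, 'C2', 'N3', 'wedge up')
# (5, 1, 'S4', 'C6', 'wedge up')
```

In the first record the wedges sit on the C2-N3 and S4-C6 bonds. The other two records keep the C2-N3 wedge but move the second one to the S4-O5 single bond (CFG=1 in the second record, CFG=3 in the third).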

Code Explanation

Let's break down this Python code step by step. First, it imports a few standard-library helpers (redirect_stderr, BytesIO, StringIO, and xor) along with Chem and WrapLogs from RDKit, which does the actual chemical structure handling. Then it defines a variable sdf containing an SDF (Structure Data File) string with three example molecules. They share the same atoms and coordinates and differ only in how the S-O bond is drawn (a double bond, or a charge-separated single bond) and in which bonds carry wedges, so each one probes the atropisomer detection slightly differently. The critical part is the loop that processes each molecule. It uses Chem.ForwardSDMolSupplier to read the molecules from the SDF string, and for each molecule it checks for the "co-planar" warning in the error buffer. The assertion `xor((