Structure Solving Pipeline Does not work well with PTMs, heteromers

ming · 3 March 2025 11:30

Hi, I have successfully solved an apo structure (chain A) from NOE data with ccpnmr and nih-xplor, using the well-written structure-solving guide provided by:

I had much less success, however, with a protein-peptide complex containing phosphoserine (SEP).

When I finished assigning the NOEs for the holo structure, which has two chains, A (protein) and B (peptide with phosphoserine, SEP), the script.sh generated by ccpnmr ran to completion but gave unexpected results. It deletes the phosphate atoms on my SEP residue and renames the residue to SER. The protein size also quadrupled, from 44 residues to 188 residues, because the program expects a symmetric dimer whenever there is more than one chain, and so replicates and stitches together my peptide and protein repeatedly.

Problem 1: Phosphoserine (SEP) is recognised by NIH-xplor and its protein topology / potential files, but script.sh has many parts that forcibly convert SEP to SER instead. I have identified that the code behind

"from iupacNaming import toIUPAC, fromIUPAC" and
"toIUPAC()" does not include SEP, so I added the SEP atom naming to iupacNaming.py in NIH-xplor/python, but that does not fully fix things. The pdb generation script and talos also convert SEP to SER at several points in script.sh, which throws off the final simulated annealing step: the SEP is key for binding, whereas SER is non-binding. The problem is most obvious at the fold.py step and at the out.nef step, by which point the SEP has been completely changed to SER.

Problem 2: The script.sh handles symmetric dimers in the workflow, but cannot handle hetero, non-symmetric multimers. It reads through the protein chains in the .nef file and treats chain A and chain B as a single chain very early on, even though the sequence block in the .nef file lists them separately:

     35  A  39   GLY  middle  .  .
     36  A  40   ASN  middle  .  .
     37  A  41   SER  end     .  .
     38  B  501  GLY  start   .  .
     39  B  502  LEU  middle  .  .

The protein and peptide are distinct.
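A quick sanity check on the input is to group the sequence rows by chain code and confirm that two separate chains come out. The sketch below assumes the columns follow the usual NEF sequence order (index, chain_code, sequence_code, residue_name, linking, ...), since the loop header is not shown here; group_by_chain is a hypothetical helper name, not part of ccpnmr or Xplor-NIH.

```python
from collections import OrderedDict

def group_by_chain(rows):
    """Group sequence rows into {chain_code: [(sequence_code, residue_name, linking), ...]}."""
    chains = OrderedDict()
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        # Assumed column order: index, chain_code, sequence_code, residue_name, linking
        _index, chain, seq_code, res_name, linking = fields[:5]
        chains.setdefault(chain, []).append((seq_code, res_name, linking))
    return chains

# The five rows quoted in the post above
rows = """
     35  A  39   GLY  middle  .  .
     36  A  40   ASN  middle  .  .
     37  A  41   SER  end     .  .
     38  B  501  GLY  start   .  .
     39  B  502  LEU  middle  .  .
"""

chains = group_by_chain(rows)
for chain, residues in chains.items():
    # Print chain code, number of rows, first and last residue of each chain
    print(chain, len(residues), residues[0], residues[-1])
```

If this reports chains A and B separately, the input file itself is consistent and the merging happens later in the pipeline.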

I have tried changing the code line by line, but it has taken me a long time without much success. Is there any way for the code to become more generalisable to handle SEP (already encoded in NIH-xplor) and avoid funny multimerisation issues?

Thank you.

ElizaP · 4 March 2025 08:17

Hi,

Did you use genLingandCif.py (Xplor NIH) to generate the .top and .par files required for the calculations?

Individual python scripts that are called by script.sh would need to be modified with protocol.initTopology and protocol.initParams lines before readNEF is called. At the moment you have to do it manually in the Xplor NIH scripts.
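To see which of the scripts still need this manual edit, a small static check of the call order can help. This is only a sketch: it scans the text of a script for the first call to each of the three names mentioned above and does not import or run Xplor-NIH. The function names first_call_lines and loads_toppar_before_nef are hypothetical, and the two sample strings are placeholders with elided arguments, not real pipeline scripts.

```python
import re

CALLS = ("initTopology", "initParams", "readNEF")

def first_call_lines(source):
    """Return {call_name: line number of its first call, or None if never called}."""
    found = {name: None for name in CALLS}
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude comment stripping, good enough for a sketch
        for name in CALLS:
            if found[name] is None and re.search(r"\b%s\s*\(" % name, code):
                found[name] = lineno
    return found

def loads_toppar_before_nef(source):
    """True if both init calls come before the first readNEF call (or readNEF is absent)."""
    found = first_call_lines(source)
    nef_line = found["readNEF"]
    if nef_line is None:
        return True
    return all(
        found[name] is not None and found[name] < nef_line
        for name in ("initTopology", "initParams")
    )

# Placeholder script texts; the (...) arguments are deliberately left elided
good = """import protocol
protocol.initTopology(...)
protocol.initParams(...)
readNEF(...)
"""
bad = """import protocol
readNEF(...)
protocol.initTopology(...)
"""

print(loads_toppar_before_nef(good), loads_toppar_before_nef(bad))
```

In practice the same function could be applied to the text of each Python script that script.sh calls, for example by reading the files with pathlib and flagging those that return False.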

There must be a problem with psf generation from your input.nef. I am not sure if this is because of your non-standard residue. I saw you posted a message on the Xplor NIH mailing list, which is the best place for this kind of query, so let's see what they suggest.

In the meantime I will have a go myself with your input.nef

BW,
Eliza

ming · 4 March 2025 09:34

Dear Eliza,

Thank you for your prompt reply. I did not use genLingandCif.py, as there already seems to be a definition for SEP (phosphoserine) within the NIH topology files, and some of the Python code behind Xplor refers to it.

In the xplor-nih-3.9/toppar/protein-4.0.top file, there seems to be some reference to phosphoserine:

! patch to turn serine into phosphoserine
! This one includes the phosphate group
! CDS 2020/11/16
!
presidue SPA
group
modify atom 1cb charge=0.0 end
modify atom 1og type=OHP charge=0.0 end
delete atom 1hg charge=0.0 end
group
add atom 1P type=PP charge=0.0 end !charges are bogus
add atom 1O1P type=OPT charge=0.0 end
add atom 1O2P type=OPT charge=0.0 end
add atom 1O3P type=OPT charge=0.0 end
add bond 1og 1P
add bond 1O1P 1P
add bond 1O2P 1P
add bond 1O3P 1P
add angle 1cb 1og 1p
add angle 1og 1P 1O1P
add angle 1og 1P 1O2P
add angle 1og 1P 1O3P
add angle 1O1P 1P 1O2P
add angle 1O1P 1P 1O3P
add angle 1O2P 1P 1O3P
end

The SPA patch defined there is then referenced in code such as xplor-nih-3.9/python/psfGen.py:

     #Note: if the variant name is present in a RESIdue entry in the
     #topology file, the corresponding entry here is not used.
     variantResidues = { 'protein' : [VariantResidue('HSD','HIS',
                                                     deletedAtoms='HE2'),
                                      VariantResidue('HSE','HIS',
                                                     deletedAtoms='HD1'),
                                      VariantResidue('HID','HIS',
                                                     deletedAtoms='HE2'),
                                      VariantResidue('HIE','HIS',
                                                     deletedAtoms='HD1'),
                                      VariantResidue('CYSS','CYS',
                                                     deletedAtoms='HG'),
                                      VariantResidue('SEP','SER',
                                                     patch="SPA"),
                                      VariantResidue('TPO','THR',
                                                     patch="PTPO"),
                                      VariantResidue('PTR','TYR',
                                                     patch="PPTR"),
                                      VariantResidue('R1','CYSP'),
                                      ]

This encodes a modified serine, called SEP, which should trigger the SPA patch in the topology file. I wonder if this is sufficient for downstream runs. No program-ending errors popped up during the run, but the phosphoserine was lost once out.nef was generated. Changing out.nef by replacing SER with SEP did not fix things, and the phosphate group is still lost in the final refine and fold steps. In the pdb files for pass1, pass2 and pass3 the SEP is renamed to SER, but the phosphate group at least remains.
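One way to pin down the step where the phosphate disappears is to check each output pdb for the residue name and for the four atoms the SPA patch adds (P, O1P, O2P, O3P). A minimal sketch, assuming standard fixed-column ATOM/HETATM records with the chain ID in column 22; if the files carry the chain in the segid columns instead, the slice needs adjusting. The residue number 503 and the atom_line helper are hypothetical and only build test input.

```python
PHOSPHATE_ATOMS = {"P", "O1P", "O2P", "O3P"}  # atoms added by the SPA patch

def residue_report(pdb_text, chain_id, res_seq):
    """Report the residue name(s) and missing phosphate atoms at one position."""
    names, atoms = set(), set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        # Assumed fixed columns: chain ID in column 22, residue number in 23-26
        if line[21] != chain_id or int(line[22:26]) != res_seq:
            continue
        names.add(line[17:21].strip())   # residue name (up to 4 characters)
        atoms.add(line[12:16].strip())   # atom name
    return {
        "residue_names": sorted(names),
        "missing_phosphate_atoms": sorted(PHOSPHATE_ATOMS - atoms),
    }

def atom_line(serial, name, res_name, chain, res_seq):
    """Hypothetical helper: build a fixed-column ATOM record with dummy coordinates."""
    return "ATOM  %5d %-4s %-4s%s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, 0.0, 0.0, 0.0)

# Residue renamed to SER but phosphate atoms still present
renamed = "\n".join(
    atom_line(i, n, "SER", "B", 503)
    for i, n in enumerate(["CB", "OG", "P", "O1P", "O2P", "O3P"], start=1))
# Residue renamed to SER and phosphate atoms gone
stripped = "\n".join(
    atom_line(i, n, "SER", "B", 503)
    for i, n in enumerate(["CB", "OG"], start=1))

print(residue_report(renamed, "B", 503))
print(residue_report(stripped, "B", 503))
```

Running this over the pass1, pass2, pass3, fold and refine outputs in order would show the first file in which the phosphate atoms go missing.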

The full output of running script.sh on input.nef is quite large, so I have placed all the files on Google Drive:
https://drive.google.com/drive/folders/1nrxLlePYWTk3lEwWtgg8ttLkinUB4au8?usp=sharing

Thank you so much for giving this a look, I really appreciate your help. I will also ask more about the SEP issue on the xplor-NIH forum.

Thank you,

Best wishes,
Ming