Protein Assignment Statistics macro

Hi everyone,

I’ve recently used NMRtist to perform an automated backbone assignment. I used the tutorial to import the data to CcpNMR Analysis V3 and copy the peak lists to each of my 6 spectra (I did this manually, not by the quickFlyacopy macro) and visual inspection shows that peaks have copied to the correct dimensions.

I have subsequently run the ProteinAssignmentStatistics macro using NMRtist’s chemical shift list which reports on the assigned chemical shifts, assignment completeness and summary assignment just fine but the ‘Possible problems’ table is unpopulated. I suppose there is a chance there are no detected problems from the data but this seems unlikely as the estimated assignment accuracy was only 74%.

Is there a way to pinpoint why possible problems aren’t being reported?

Thanks,

Beth

I’m sure Eliza & Vicky will give you chapter and verse on this but the issue is that the data retrieved from NMRtist doesn’t contain any information about the algorithm’s uncertainty in the assignment and the macro in AnalysisAssign is checking for self-consistency of the data rather than any “truth”.

1 Like

Hi Beth,

the “possible problems” will only highlight things like inconsistent use of atom naming and things like that.

What you will need to do is check through your assignments and see whether they are correct or not. We are still working on doing a tutorial for how to do this (and indeed try out various different methods). I’m interested that your assignment accuracy was 74%. When trying to get a good dataset for a tutorial I have been looking for something with exactly that kind of accuracy. In the case of Ubiquitin, it is pretty much perfect, so there isn’t much to change. For another protein I got 34% accuracy and quite frankly it was easier to start afresh by hand, than start with the muddle that I got from NMRtist. But in your case there is a good chance that most of it is okay, but that there will be some errors that you need to correct.

The main thing you should do is open your various different spectra and then navigate using the Assignment Inspector. So as an example (using the perfect ubiquitin):

Displays showing the 13C Noesy, Backbone Assignment expts and 15N Noesy/Tocsy:

Display showing the HCCH Tocsy (or you may have yours with vertical strips - a matter of personal choice!):

The Assignment Inspector:

And the CH Strips tab in the Assignment Inspector Settings:

If you now double-click on a chemical Shift in the top table of the AssignmentInspector, then it will navigate to those strips in all the spectra (as I’ve selected all the Displays in the Settings). In my case, I have selected just to show the i and i+/-1 strips in the NH-based spectra. That will show you whether the sequential assignment is looking okay or not. The CH-based spectra will show you whether the side-chain assignments are looking okay or not. But if you want you can look at longer stretches in the NH-based spectra - just adjust the number of i+/- strips you want to see.

Note, incidentally, that (at the moment) I am doing all of this navigation from the nmrtist imported ChemShiftList, not the one associated with the spectra.

What I haven’t really managed to try out much, is how best to make corrections when you spot an error. I would try working with the copied peak lists and correct those. You’ll then always have your NMRtist imported ones as a reference for what you imported originally.

What I don’t really have an idea about, is what types of error NMRtist makes. If you see that a link between residues is incorrect, does that mean only one is incorrect (if so which one) or both? Sometimes the sidechain data shows that something is incompatible with the assignment it has been given. So this is where things are going to get a bit difficult, I fear.

If anyone is willing to share a 70-80% accurately assigned dataset from NMRtist with me, then do please get in touch!

Vicky

Hi Vicky,

Thank you so much for such a detailed response, I think I understand much better now how CcpNMR uses NMRtist data, and I wasn’t familiar with the assignment inspector so that was really helpful.

For this particular protein I had already been working on a manual assignment for a couple of weeks so I already had a project with peaks picked and chemical shifts assigned in every experiment, and even a few residues.

I decided to open the NMRtist assignments in a new project, so I now have separate projects for manual and automatic backbone assignment. As you mentioned that making changes to the NMRtist peak list might not be straight forward, I think I might continue with the manual assignment and use NMRtist’s peak list as a guide. I’m sure this will take much longer but at least I will have full control over the final assignments.

I would be more than happy to help in whatever way I can to provide data and/or feedback for a tutorial. I will email you shortly and we can discuss in more detail.

Thanks again,

Beth

It should be safe to import the NMRtist assignments into the same project but put them (the synthetic spectra with their assigned peak lists) in a separate ChemicalShiftList from your manual assignments. A good first pass check would then be to calculate shift differences between your manual assignments and the NMRtist ones - any significant discrepancies surely indicate where you and the algorithm disagree.

From experience things to watch out for include: assigned first residue NH - very unlikely to be visible unless working at low pH; assigned His imidazole NHs; residues whose NHs are not resolved in the HSQC.

Hi there,

Thanks so much - hadn’t encountered the chemical shift differences macro yet, this looks like exactly what I need.

Best wishes,

Beth

Hi Beth,

Here is the macro that should be able do that and creates a simple table (sorry quite a few lines).

Tip: use filter table (FT) option to get info that you need and you can also hide some columns.

Any questions let us know.
BW,
Eliza

"""
Compare two chemical shift lists by walking the NmrChain/NmrAtom hierarchy, flagging atoms
with shifts in both lists and reporting whether the ppm difference stays within a user-defined tolerance.
The results are exported to a project DataTable for further review.
"""
from __future__ import annotations

from collections import OrderedDict
from datetime import datetime
import re

import pandas as pd

from ccpn.ui.gui.popups.Dialog import CcpnDialogMainWidget
from ccpn.ui.gui.widgets.PulldownListsForObjects import ChemicalShiftListPulldown, NmrChainPulldown
from ccpn.ui.gui.widgets.Label import Label
from ccpn.ui.gui.widgets import Entry
from ccpn.ui.gui.widgets import CheckBox
from ccpn.ui.gui.widgets.MessageDialog import showWarning, showMessage


class CompareChemicalShiftLists(CcpnDialogMainWidget):
    title = 'Compare Chemical Shift Lists'

    def __init__(self, parent=None, mainWindow=None, title=title, **kwds):
        super().__init__(parent, setLayout=True, windowTitle=title, size=(520, 270), **kwds)

        if mainWindow:
            self.mainWindow = mainWindow
            self.application = mainWindow.application
            self.current = self.application.current
            self.project = mainWindow.project
        else:
            self.mainWindow = None
            self.application = None
            self.current = None
            self.project = None

        self._createWidgets()

        self.setOkButton(text='Compare', callback=self._compareLists,
                         tipText='Create a DataTable summarising the chemical shift differences')
        self.setCancelButton(callback=self.reject)
        self.setCloseButton(callback=self.reject, tipText='Close')
        self.setDefaultButton(CcpnDialogMainWidget.CLOSEBUTTON)

    def _createWidgets(self):
        row = 0

        Label(self.mainWidget, text='Reference list:', grid=(row, 0), hAlign='right')
        self.referenceList = ChemicalShiftListPulldown(self.mainWidget,
                                                        mainWindow=self.mainWindow,
                                                        grid=(row, 1),
                                                        showSelectName=True,
                                                        labelText='')
        row += 1

        Label(self.mainWidget, text='Comparison list:', grid=(row, 0), hAlign='right')
        self.comparisonList = ChemicalShiftListPulldown(self.mainWidget,
                                                         mainWindow=self.mainWindow,
                                                         grid=(row, 1),
                                                         showSelectName=True,
                                                         labelText='')
        row += 1

        Label(self.mainWidget, text='NmrChain (required):', grid=(row, 0), hAlign='right')
        self.nmrChainWidget = NmrChainPulldown(self.mainWidget,
                                               mainWindow=self.mainWindow,
                                               grid=(row, 1),
                                               showSelectName=True,
                                               labelText='')
        row += 1

        Label(self.mainWidget, text='Hydrogen tolerance (ppm):', grid=(row, 0), hAlign='right')
        self.hydrogenToleranceEntry = Entry.Entry(self.mainWidget, text='0.02', grid=(row, 1), editable=True)
        row += 1
        Label(self.mainWidget, text='Heteroatom tolerance (ppm):', grid=(row, 0), hAlign='right')
        self.heteroToleranceEntry = Entry.Entry(self.mainWidget, text='0.20', grid=(row, 1), editable=True)
        row += 1

        self.onlyOutsideCheck = CheckBox.CheckBox(self.mainWidget,
                                                  text='Only report atoms outside tolerance',
                                                  checked=False,
                                                  grid=(row, 1),
                                                  hAlign='left')

    def _compareLists(self):
        reference = self.referenceList.getSelectedObject()
        comparison = self.comparisonList.getSelectedObject()

        if not reference or not comparison:
            showWarning('Missing selection', 'Please select two chemical shift lists to compare.')
            return

        if reference.pid == comparison.pid:
            showWarning('Same list selected', 'Please choose two different chemical shift lists.')
            return

        tolerances = self._parseTolerances()
        if tolerances is None:
            return

        chain = self.nmrChainWidget.getSelectedObject()
        if not chain:
            showWarning('Missing NmrChain', 'Please select an NmrChain to centre the comparison.')
            return
        rows = self._build_comparison_rows(reference, comparison, tolerances, chain)

        if not rows:
            showWarning('No overlaps', 'No atoms with chemical shifts in both lists were found.')
            return

        if self.onlyOutsideCheck.get():
            rows = [row for row in rows if not row['WithinTolerance']]
            if not rows:
                showWarning('No differences', 'All atoms fall within the supplied tolerance.')
                return

        df = pd.DataFrame(rows)
        safe_name = re.sub(r'[^A-Za-z0-9_]+', '_', f'{reference.name}_{comparison.name}')
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        table_name = f'CSComparison_{safe_name[:40]}_{timestamp}'
        comment = (f'Comparison between {reference.name} and {comparison.name} with '
                   f'hydrogen {tolerances[0]:.4g} ppm and hetero {tolerances[1]:.4g} ppm tolerances.')

        try:
            self.project.newDataTable(name=table_name, data=df, comment=comment)
        except Exception as exc:
            showWarning('Creation failed', f'Unable to create DataTable: {exc}')
            return

        showMessage('Comparison ready', f'DataTable "{table_name}" now holds the comparison.')
        return self.accept()

    def _parseTolerances(self):
        try:
            hydrogen_tol = float(self.hydrogenToleranceEntry.get() or 0)
            hetero_tol = float(self.heteroToleranceEntry.get() or 0)
        except ValueError:
            showWarning('Invalid tolerance', 'Tolerance must be a valid number (ppm).')
            return None

        if hydrogen_tol < 0 or hetero_tol < 0:
            showWarning('Invalid tolerance', 'Tolerances must be zero or positive.')
            return None

        return hydrogen_tol, hetero_tol

    def _build_comparison_rows(self, reference, comparison, tolerances, chain_filter):
        """Collect rows for atoms that exist in either of the selected chemical shift lists."""
        hydrogen_tol, hetero_tol = tolerances
        entries: dict[object, dict] = {}

        def _consider(shift_list, label):
            for shift in shift_list.chemicalShifts:
                atom = shift.nmrAtom
                if not atom or not atom.nmrResidue:
                    continue
                nmr_chain = atom.nmrResidue.nmrChain
                if chain_filter and (not nmr_chain or nmr_chain.pid != chain_filter.pid):
                    continue
                entry = entries.setdefault(atom, {'atom': atom, 'residue': atom.nmrResidue, 'chain': nmr_chain, 'shifts': {}})
                entry['shifts'][label] = shift

        _consider(reference, 'reference')
        _consider(comparison, 'comparison')

        rows = []
        ref_label = reference.name
        cmp_label = comparison.name

        for entry in entries.values():
            atom = entry['atom']
            residue = entry['residue']
            shifts = entry['shifts']
            shift_ref = shifts.get('reference')
            shift_cmp = shifts.get('comparison')

            value_ref = shift_ref.value if shift_ref else None
            value_cmp = shift_cmp.value if shift_cmp else None

            diff = None
            within_tol = False
            tolerance = hydrogen_tol if self._is_hydrogen(atom) else hetero_tol
            if value_ref is not None and value_cmp is not None:
                diff = abs(value_ref - value_cmp)
                within_tol = diff <= tolerance

            lists = sorted({cs.chemicalShiftList.name for cs in atom.chemicalShifts if cs.chemicalShiftList})
            all_lists = ', '.join(lists)

            row = OrderedDict()
            row['ResiduePid'] = residue.pid if residue else ''
            row['SequenceCode'] = residue.sequenceCode if residue else ''
            row['ResidueType'] = residue.residueType or '' if residue else ''
            row['AtomPid'] = atom.pid
            row['AtomName'] = atom.name
            row['IsotopeCode'] = atom.isotopeCode or ''
            row[f'ChemicalShiftPid ({ref_label})'] = shift_ref.pid if shift_ref else ''
            row[f'ChemicalShiftPid ({cmp_label})'] = shift_cmp.pid if shift_cmp else ''
            row[f'Shift ({ref_label})'] = value_ref
            row[f'Shift ({cmp_label})'] = value_cmp
            row['Difference (ppm)'] = diff
            row['WithinTolerance'] = within_tol
            row['ToleranceUsed'] = tolerance
            row['InReference'] = bool(shift_ref)
            row['InComparison'] = bool(shift_cmp)
            row['ShiftLists'] = all_lists
            rows.append(row)

        return rows

    @staticmethod
    def _is_hydrogen(atom):
        isotope = (atom.isotopeCode or '').upper()
        if isotope:
            return 'H' in isotope
        return atom.name.upper().startswith('H')


if __name__ == "__main__":
    popup = CompareChemicalShiftLists(mainWindow=mainWindow)
    popup.show()
    popup.raise_()

And there is also this macro:

Hi Eliza,

This is brilliant thank you very much!

Best wishes

Beth

Just putting this here for anyone re-importing NMRtist peak lists for comparison with their manual assignments, etc.

We have found that NMRtist returns the assignments with no chain identifying letter. To make the chemical shift/peak position comparison macros work as written, you’ll need the NMRatom to be the same (and therefore have the same chainID). We have found this easiest to fix before import to Analysis using @varioustoxins nef pipelines e.g.

nef stream <NMRtist_assigned_peaklist.nef> |\
nef chains rename . A |\
nef save <chainFixed_peaklist.nef>

Well I had never tested that, so its nice to see the logic works!

regards
Gary

Dr Gary S Thompson NMR Facility Manager
CCPN CoI & Working Group Member
Wellcome Trust Biomolecular NMR Facility
School of Natural Sciences
University of Kent, Canterbury, Kent, England, CT2 7NZ

:telephone::01227 82 7117
:envelope:: g.s.thompson@kent.ac.uk
orchid: ORCID

Hi All,

The simpler way to update the chain code is through the Ccpn Analysis NEF import popup as part of your normal import procedure. This does not require any additional external tools.

More guidance can be found in our dedicated tutorials:

Updating chain code - page 8.

BW
Eliza