Creating Noesy peak list from PDB

Mattcliff · 9 March 2023 12:18

Hi,
I’ve managed to import a structure ensemble from a pdb file into Analysis Structure.

I’ve now managed to code this, but think it is probably a stupid way to do it:

noelist1=[]

for amide in model1[model1.atomName==“H”].values:
…: xyz1=amide[9:12]
…: atname1=amide[[3,5,6]]
…: atname1[0]+=18
…: for proton in model1[model1.element==“H”].values:
…: xyz2=proton[9:12]
…: dist=(sum(((xyz1-xyz2)**2)))**0.5
…: if dist<6:
…: atname_full=proton[[3,5,6]]
…: atname_full[0]+=18
…: noelist1.append(atname1,atname_full,dist)

Ideally I would now put marks and navigate to each crosspeak position, to inspect whether proposed peak is there or not. Currently I am navigating “by hand”, which is fine, but can’t work out how to make marks from within a macro/python console
An extension would be to pick the NOE peaks automatically.

VickyH · 9 March 2023 14:41

Hi Matt,

you should be able to create marks with something like

mainWindow.newMark(colour='#FF8C00', positions=(8.0, 114.35), axisCodes=['H', 'N'], style='simple', units=(), labels=())

i.e. the marks work on ChemicalShifts, not on (Nmr)AtomNames.

But I am assuming at the moment all you have is a list of atom names

noelist1.append(atname1,atname_full,dist)

and that you still need to find a way to get from your atom names to your chemical shift positions?

I think what I would try in your position is to create the NmrAtom pids from the information in your pdb file and then look for them in the ChemicalShiftList by comparing with something like

csl.chemicalShifts[0].nmrAtom.pid

Then you can get the ppm position and create peaks (at this point it would be fairly straight forward to assign them, too). The navigation and marking then become easy by just going through the peak list table in the GUI.
The main issue with this is liable to be that going from your pdb atom names to the NmrAtom names will be easy for H and N, but rather less easy for HB2/HB3 which have probably been assigned as HBx/HBy. But that is always going to be an issue if you want to find a way to get from your structure atoms to your nmr assigned atoms.

But someone else might have more sophisticated suggestions.

Vicky

Mattcliff · 15 March 2023 15:03

pdb information has been lost from the project after closing and opening
gah…

Mattcliff · 15 March 2023 15:37

However, I’m doing this:
at2=get(“NA:%s.%i.%s.%s”%(“A”,noelist1[i][1][0],noelist1[i][1][1],noelist1[i][1][2]))
mainWindow.newMark(colour=’#FF00FF’, positions=[at2.chemicalShifts[0].value], axisCodes=[‘H_3’], style=‘simple’, units=(), labels=[at2.id])

Mattcliff · 15 March 2023 15:39

I don’t really understand how to use the pids…

VickyH · 15 March 2023 15:45

It looks like it may only be saving the average (if it generated one) and not the actual structure(s) entered. We’ve not done very much with structures in the project so far, so we clearly need to take a look at this!
Vicky

Mattcliff · 15 March 2023 16:01

That appears to be true.
Is there some sort of object I can save the “noelist1” in so that it is saved too?

ElizaP · 17 March 2023 08:33

Hi Matt,
If you have chemical shift value for each dimension you can just create a new peak in new peak list, I think that would be the easiest:
peakListPDB = get(‘SP:noe’).newPeakList()
peakListPDB.newPeak(ppmPositions=[pos1,pos2,pos3])

or you can make a dataTable:

import pandas as pd
Convert your data to a pandas dataframe:
myDataFrame = pd.DataFrame(data = [1,2,3], columns=[‘newColumn’])

Create DataTable from the pandas dataframe:
project.newDataTable(name=“myDataFrame”, data=myDataFrame)

Is that any good?
Eliza

Mattcliff · 21 March 2023 16:37

Not sure how I should share the code, but I have the below now working.
Now I need to merge overlapping peaks into ambiguous ones:
I am doing something like this:
pk1.ppmPositions=tuple([(pk1.ppmPositions[i]*pk1.height+pk2.ppmPositions[i]*pk2.height)/(pk1.height+pk2.height) for i in range(3)])
pk1.assignments=pk1.assignments+pk2.assignments
pk2.delete()

However, sometimes when I try to do this:
print(pk1.assignments)
it gives me an error message, but not always:
AttributeError Traceback (most recent call last)
Input In [80], in <cell line: 1>()
6 pk1.assignments=pk1.assignments+pk2.assignments
7 pk2.delete()
----> 8 print(pk1.assignments)
9 n-=1
10 j=0
File ~/programs/ccpnmr3.1.1/src/python/ccpn/core/Peak.py:617, in Peak.assignedNmrAtoms(self)
615 data2Obj = self._project._data2Obj
616 apiPeak = self._wrappedData
→ 617 peakDims = apiPeak.sortedPeakDims()
618 mainPeakDimContribs = [sorted(x.mainPeakDimContribs, key=operator.attrgetter(‘serial’))
619 for x in peakDims]
620 result = []

AttributeError: ‘NoneType’ object has no attribute ‘sortedPeakDims’

Code below:

import pandas as pd

def get_possible_HN_noes(sePid,model_number):
try:
se = get(sePid)
except:
print(“cannot open model %s”%(sePid))
chain = “A”
model1=se.data[se.data.modelNumber==model_number]
noelist1=[]
for amide in model1[model1.atomName==“H”].values:
xyz1=amide[9:12]
atname1=amide[[3,5,6]]
atname1[0]+=18
pidstringH=“NA:%s.%i.%s.H”%(chain,atname1[0],atname1[1])
pidstringN=“NA:%s.%i.%s.N”%(chain,atname1[0],atname1[1])
for proton in model1[model1.element==“H”].values:
xyz2=proton[9:12]
dist=(sum(((xyz1-xyz2)**2)))**0.5
if dist<6:
atname_full=proton[[3,5,6]]
atname_full[0]+=18
pidstring=“NA:%s.%i.%s.%s”%(chain,atname_full[0],atname_full[1],atname_full[2])
noelist1.append([pidstringH,pidstringN,pidstring,dist])
df=pd.DataFrame(data=noelist1, index = [i for i in range(len(noelist1))], columns=[“pidstringHn”,“pidstringN”,“pidstringH”,“distance”])
project.newDataTable(name=“noeList”, data=df)
return
def get_possible_HN_noes_pdb(pdbstring,model_number):
try:
se=application.loadData(pdbstring)

#try: se=get(sePid)
except:print("cannot open model %s"%(pdbstring))
chain="A"
model1=se.data[se.data.modelNumber==model_number]
noelist1=[]
for amide in model1[model1.atomName=="H"].values:
    xyz1=amide[9:12]
    atname1=amide[[3,5,6]]
    atname1[0]+=18
    pidstringH="NA:%s.%i.%s.H"%(chain,atname1[0],atname1[1])
    pidstringN="NA:%s.%i.%s.N"%(chain,atname1[0],atname1[1])
    for proton in model1[model1.element=="H"].values:
        xyz2=proton[9:12] 
        dist=(sum(((xyz1-xyz2)**2)))**0.5 
        if dist<6:
            atname_full=proton[[3,5,6]] 
            atname_full[0]+=18
            pidstring="NA:%s.%i.%s.%s"%(chain,atname_full[0],atname_full[1],atname_full[2])
            noelist1.append([pidstringH,pidstringN,pidstring,dist])
df=pd.DataFrame(data=noelist1,index=[i for i in range(len(noelist1))],columns=["pidstringHn","pidstringN","pidstringH","distance"])
project.newDataTable(name="noeList", data=df)
return

def get_noe_chemical_shift_positions(pidstringH,pidstringN,pidstring,csl):
#print(pidstringH)
HNshift=csl.getChemicalShift(get(pidstringH)).value
#except:print(pidstringH);return
Nshift=csl.getChemicalShift(get(pidstringN)).value
#except:print(pidstringN);return
#print(Nshift)
teststrings=["%",“x”,“y”]
Hshifts=[];Hnames=[]
#print(pidstring)
Hsh=csl.getChemicalShift(get(pidstring))
#print(Hsh)
if Hsh!=None:
Hshifts.append(Hsh.value)
Hnames.append(pidstring)
headstring="fdlihg "
else:
#print(pidstring)
headstring=pidstring[0:-1]
for test in teststrings:
teststring=headstring+test
try:
Hshifts.append(csl.getChemicalShift(get(teststring)).value)
Hnames.append(teststring)
except:
continue#print(teststring)
if len(Hshifts)==0:
for test2 in [“x”,“y”]:
headstring=pidstring[0:-2]
for test in teststrings:
teststring=headstring+test2+test
try:
Hshifts.append(csl.getChemicalShift(get(teststring)).value)
Hnames.append(teststring)
except:
continue#print(teststring)
return HNshift,Nshift,Hshifts,Hnames,headstring

def make_putative_noe_peaks(noelistPid,spPid,csPid):
i=0
try: sp=get(spPid)
except: print (“can’t open spectrum”)
try: csl1=get(csPid)
except: print(“can’t open chemical shift list”)
try: noeTable=get(noelistPid)
except: print(“can’t open Noe list”)
pl=sp.newPeakList()
noelist1=noeTable.data.values
oldhead=“iluaewrhtdv lbaekdsbviweauf”
while i<len(noelist1):
[pidStringH,pidStringN,pidString]=noelist1[i][0:3]
if oldhead in pidString:i+=1;continue
#HN_shift,N_shift,H_shift,H_names,headString=get_noe_chemical_shift_positions(pidStringH,pidStringN,pidString,csl1)
try: HN_shift,N_shift,H_shifts,H_names,headString=get_noe_chemical_shift_positions(pidStringH,pidStringN,pidString,csl1)
except: print(“NOE entry %i has some chemical shifts missing”%(i));i+=1;continue
j=0;oldhead=headString
for Hname in H_names:
tmp_pk=pl.newPeak(ppmPositions=(HN_shift,N_shift,H_shifts[j]))
tmp_pk.assignedNmrAtoms=[(get(pidStringH),get(pidStringN),get(Hname))]
tmp_pk.height=sp.getHeight(ppmPositions=(HN_shift, N_shift,H_shifts[j]))
j+=1
i+=1

VickyH · 28 March 2023 11:19

Hi Matt,

sorry it’s taken a while to get back to you. We had a bit of to-ing and fro-ing about what the issue might be. But it seems my initial hunch was probably right that changing your assignments with

pk1.assignments=pk1.assignments+pk2.assignments

is the root of the problem. Looks nice and elegant and sort of works, but somehow leaves some bits of the model unhappy.

What you should probably be doing instead is using one of the explicit methods available for doing assignments e.g. peak.assignDimension(). The following code will hopefully help you create something to slot into your macro:

Or you could try using peak.assignByDimensions() (note the extra s!) which will probably allow you to assign both dimension in one go. All much less elegant than what you were doing, but it should hopefully prevent those errors popping up.

Vicky

Mattcliff · 29 March 2023 11:05

Thanks,
I don’t know what the issue was, but my method worked in the end, I think something upstream in the macro fixed it. But doing it properly is probably better.
Matt