Relaxation analysis - interface to NMRpipe\s routine

paulschanda · 30 January 2021 06:35

Hi,

I try to extract peak intensities from a series of spectra (relaxation). I am not totally satisfied with the current procedure in CCPNv3 (via the chemical shift perturbation tool). Furthermore, as I have some overlapping peaks, I would like to use the good, old NMRpipe/NMRdraw routines. But: NMRpipe is not overly user-friendly, and I would like to get my assignments from CCPN to NMRpipe.

What I need is to export from CCPN a “master file” that contains the peak positions, widths AND assignments, which is the further used by NMRpipes fit.com/model.com routines. A snippet of this file is shown below.<br /> <br /> My question: how can I create such a file in CCPN? Where is the information about the peaks stored? All the information must be in the peak lists somehow, I just need to know where to find it. Once I know where to find this information, it shall be straightforward to write a python script which goes through all the peaks and writes the information to a file. I dont want a GUI routine, but I want to write a script.

Thanks

Paul

VARS   INDEX X_AXIS Y_AXIS DX DY X_PPM Y_PPM X_HZ Y_HZ XW YW XW_HZ YW_HZ X1 X3 Y1 Y3 HEIGHT DHEIGHT VOL PCHI2 TYPE ASS CLUSTID MEMCNT
FORMAT %5d %9.3f %9.3f %6.3f %6.3f %8.3f %8.3f %9.3f %9.3f %7.3f %7.3f %8.3f %8.3f %4d %4d %4d %4d %+e %+e %+e %.5f %d %s %4d %4d

NULLVALUE -666
NULLSTRING *

    1   249.407   219.332 3.306 0.108    0.680   26.325   407.922 3972.261 19.513   4.618   85.889   13.609 235 263 216 222 +1.557209e+07 +8.625674e+04 +1.595578e+09 0.00000 1 None    1    1
    2   341.909   253.937 5.854 0.169    0.001   25.649     0.767 3870.288 21.759   5.334   95.775   15.718 327 357 250 257 +8.929819e+06 +8.528948e+04 +1.194781e+09 0.00000 1 None    2    1
    3   223.768   264.205 16.220 4.445    0.868   25.449   520.776 3840.032 23.403 14.973 103.011   44.122 221 226 263 266 +3.204666e+06 +8.526191e+04 +7.509557e+07 0.00000 1 None    3    1

LucaM · 30 January 2021 14:18

Hi Paul,

Thank you for your feedback. The relaxation module and related tools are in our top priority list for near future releases...
We don`t support exports to other programs like FormatConverter does, our efforts are on NEF (https://www.ccpn.ac.uk/manual/v3/NEF.html).
So this file cannot be generated natively. But You can access all the peak information from its Class from the IPython console. See the documentation online https://www.ccpn.ac.uk/docv3/build/html/....core.html or from the Help Menu > Show API documentation, on the ccpn.core package, scroll down to Peak Module for more info.

I guess you create this file for each of your spectra and feed them to NmrPipe...
Can you define the full name for the VARS, (some are obvious), or the Vars are needed for your analysis so we can draft a script for you?

Thanks

paulschanda · 30 January 2021 17:49

Hi Luca,

thanks for the rapid feedback.
I am aware that CCPN does not plan to interface with other programs.
It would nonetheless be great to be able to export this file, at least until the relaxation module is ready.

I tried to find out what the columns in the relax.master.tab file (the input for NMRpipe`s fit.com module) means. And more importantly, I figured out what are the required/not required fields.

Below I describe what I found about this file. Basically, we need to know peak positions and peak widths, in points, ppm and Hz. And the assignment (this is the whole point).

Cheers,

Paul

The file has two header lines, as follows:

VARS INDEX X_AXIS Y_AXIS DX DY X_PPM Y_PPM X_HZ Y_HZ XW YW XW_HZ YW_HZ X1 X3 Y1 Y3 HEIGHT DHEIGHT VOL PCHI2 TYPE ASS CLUSTID MEMCNT
FORMAT %5d %9.3f %9.3f %6.3f %6.3f %8.3f %8.3f %9.3f %9.3f %7.3f %7.3f %8.3f %8.3f %4d %4d %4d %4d %+e %+e %+e %.5f %d %s %4d %4d

Then it has one line for each peak, as follows:
37 271.416 786.407 5.313 0.151 0.518 15.251 311.047 2301.216 22.300 4.651 98.154 13.704 255 287 783 789 +1.287407e+07 +8.665002e+04 +1.493064e+09 0.00000 1 None 21 1

THE FOLLOWING SUFFICES:
37 249.407 219.332 0.000 0.000 0.000 0.000 0.000 0.000 19.513 4.618 85.889 13.609 0 0 0 0 +1.557209e+07 +8.625674e+04 +1.595578e+09 0.00000 1 None 13 1

INDEX: peak number (arbitrary)
X_AXIS: peak position: in points in 1st dimension, from left of spectrum limit REQUIRED
Y_AXIS: peak position: in points in 2nd dimension, from bottom of spectrum limit REQUIRED
DX: line width in 1st dimension, in points NOT REQUIRED
DY: line width in 2nd dimension, in points NOT REQUIRED
X_PPM: peak position: in ppm in 1st dimension NOT REQUIRED
Y_PPM: peak position: in ppm in 2nd dimension NOT REQUIRED
X_HZ: peak position in Hz in 1st dimension: NOT REQUIRED
Y_HZ: peak position in Hz in 2nd dimension: NOT REQUIRED
XW: peak width in 1st dimension in points: REQUIRED
YW: peak width in 2nd dimension in points: REQUIRED
XW_HZ: peak width in Hz in 1st dimension. REQUIRED
YW_HZ: peak width in Hz in 2nd dimension. REQUIRED

X1: left border of peak in 1st dim, in points NOT REQUIRED
X3: right border of peak in 1st dim, in points NOT REQUIRED
Y1: left border of peak in 2nd dim, in points NOT REQUIRED
Y3: right border of peak in 2nd, in points NOT REQUIRED
HEIGHT: peak height NOT REQUIRED
DHEIGHT: I dont’ know what the difference between HEIGHT and DHEIGHT is, NOT REQUIRED
VOL: peak volume. NOT REQUIRED
PCHI2: probably some goodness-of-fit. NOT REQUIRED
TYPE: seems to be 1 by default.
ASS: a string containing the assignment. The following example works: Val23_HG1CG1 — Can ben “None” for unassigned peaks, but that’s the whole point to get this right
CLUSTID: NMRpipe identifies overlapping peaks and assigns them to the same cluster; subsequently, it fits the peaks of each cluster jointly, to avoid mis-fitting problems related to peak overlap. This may be difficult to do in CCPN, so it can be just a different number for each peak (starting from 1)
MEMCNT: seems to be 1 by default.

LucaM · 2 February 2021 17:33

Thanks Paul.

I should be able to write the macro shortly. I will post it on the macro section as soon as is done. I might be able to estimate the clusterID based on a connected-component analysis, obviously you will need to see if works for you, and if not, just disable it to have only an ascending serial!

DrVenture · 2 February 2021 23:35

Dear Paul,

To answer your original question, the majority of the information that you require can be accessed as attributes of the peak objects in a given peaklist as described here: `www.ccpn.ac.uk/docv3/build/html/ccpn/ccpn.core.html#ccpn-core-peak-module`

Peak positions are n-dimensional tuples, in axis code order, which can be accessed using peak.position, linewidths are accessible via peak.lineWidths and the same go for height and volume and the assignments are an n dimensional lists of tuples accessible as peak.dimensionNmrAtoms.

Therefore, taking a peaklist whose PID is PL:hsqc.1, for example, getting the required information would be something along the lines of:

Code:

peaks = project.getByPid(`PL:hsqc.1`).peaks:
 
 for ii, pk in enumerate(peaks):
  
      x_axis, y_axis = pk.pointPosition
      x_ppm, y_ppm = pk.position
      index = ii
  
       xw_hz, yw_hz = pk.lineWidths
      ass = `,`.join([x[0] for x in pk.dimensionNmrAtoms])  # assuming one assignment per dimension
      height = pk.height
      dheight = pk.heightError
      vol = pk.volume

I hope this is sufficient information for you to write a python script that exports the information to a file.

LucaM · 3 February 2021 20:58

Just in case some one is interested, I posted a macro called `Create an NmrPipe-like file for relaxation analyses` which will create a ready-ish file.
It includes also a conversion of lineWidths from Hz to points, options for estimating cluster ids, and the correct formatting as required for NmrPipe.

paulschanda · 5 February 2021 20:04

Thanks a lot, Luca!

paulschanda · 6 February 2021 09:34

Dear Luca,

I have tested your script, and it is not quite there yet. Maybe it is my fault and I did not tell you correctly which column goes where. When I load the file in NMRdraw, the peaks are not recognized. I suspect that the order of the columns is not right and that the -666 entries might make problems. I have not systematically tried (now) how to modify the output file from your script such that it works with NMRpipe/NMRdraw.

I paste below the output from the script, as well as the original file generated by NMRdraw. The first to lines are from CCPN, lines 3 and 4 from NMRdraw, for the same peaks and the same spectra.

VARS INDEX X_AXIS Y_AXIS DX DY X_PPM Y_PPM X_HZ Y_HZ XW YW XW_HZ YW_HZ X1 X3 Y1 Y3 HEIGHT DHEIGHT VOL PCHI2 TYPE ASS CLUSTID MEMCNT
FORMAT %5d %9.3f %9.3f %6.3f %6.3f %8.3f %8.3f %9.3f %9.3f %7.3f %7.3f %8.3f %8.3f %4d %4d %4d %4d %+e %+e %+e %.5f %d %s %4d %4d

NULLVALUE 666
NULLSTRING *
1 -666 -666 0.669 26.310 -666 -666 250.904 220.084 18.396 3.607 80.971 10.628 -666 -666 -666 -666 +1.162319e+07 -666 -666 -666 1 A.340.LEU.HD1%, CD1 -666 1
2 -666 -666 -0.019 25.646 -666 -666 344.624 254.118 20.645 4.572 90.873 13.471 -666 -666 -666 -666 +6.616024e+06 -666 -666 -666 1 A.12.LEU.HD1%, CD1 1 1
3 249.407 219.332 3.306 0.108 0.680 26.325 407.922 3972.261 19.513 4.618 85.889 13.609 235 263 216 222 +1.557209e+07 +8.625674e+04 +1.595578e+09 0.00000 1 None 1 1
4 341.909 253.937 5.854 0.169 0.001 25.649 0.767 3870.288 21.759 5.334 95.775 15.718 327 357 250 257 +8.929819e+06 +8.528948e+04 +1.194781e+09 0.00000 1 None 2 1

I understand that it may not be easy to debug this, without having the project. I can send the project by e-mail, or I can get the information of the peaks from the python console -- if you tell me how. I don`t know what the peak object looks like, and how I can get all the information pertaining to a given peak.

As an aside, I have modified the script, because the creation of the output directory did not work, and it complained about the spectrum group.
I solved this with the following addition:

```
import argparse
import os

print (`Script only works from command line, and uses command line arguments. Use the argument -h to get help.`)

parser = argparse.ArgumentParser(description=`Create a file/peak list for use with NMRpipe, typically for spectrum fitting.`)
parser.add_argument(`-o`,`--outputPath`,help=`Output directory path. Default: ~/Desktop/nmrpipefiles`, required=False,default=`./nmrpipetabfiles`)
parser.add_argument(`-g`,`--spectrumGroupName`,help=`Spectrum group name; see side bar for spectrum groups. Default: group`, required=True,default=`group`)
args = parser.parse_args()

outputPath=args.outputPath
spectrumGroupName=args.spectrumGroupName

# Create output directory
#if not os.path.exists(outputPath):
try:
os.mkdir(outputPath)
print (`Created output directory %s.` % outputPath)
except:
print (`Using existing directory %s.` % outputPath)
```

thanks!
Paul

As an aside, I actually do not need to generate these files for a whole group of spectra. It would suffice to generate it only for one spectrum, which would be the `master spectrum`, from which NMRpipe then fits all spectra. So right now I would just not use any of the additional files that your script generates.

LucaM · 6 February 2021 13:46

Hi Paul,
thanks for trying out. It is a good idea to add the parser in case you want run it without opening the actual code to modify the required elements (name/output path).
If only a spectrum is necessary, then we can remove the spectrumGroup loop and add only the spectrum of interest.

I shall try on the actual NMRdraw and I will post the latest version.
Perhaps is the peak assignment format the issue.

DrVenture · 7 February 2021 11:25

Hi Paul,

Having looked at and tested Luca`s macro, it appears that the ordering and formatting of some of the fields has gone a bit awry and, therefore, the Y_AXIS field, for example, is coming out as a null value.

I`ve written an alternative (but simpler) macro and tested it in NmrDraw, shown here:

Code:

def hz2pt(lw, sw, npoints):
     return lw/(sw/npoints)
  
 def getLineWidths(peak):
     sp = peak.peakList.spectrum
     return tuple(hz2pt(lw, sp.spectralWidthsHz[ii], sp.totalPointCounts[ii]) for ii, lw in enumerate(peak.lineWidths))
 
 def getPeakInfo(peak):
       i = peak.serial
       dx = -666
       dy = -666
       x_ppm, y_ppm = peak.position
       x_axis, y_axis = peak.pointPosition
       x_hz, y_hz = (-666, -666)
       x_w, y_w = getLineWidths(peak)
       xw_hz, yw_hz = peak.lineWidths
       x1, x3, y1, y3 = ([-666 for x in range(4)])
       height = peak.height
       dheight = peak.heightError if peak.heightError != None else -666
       vol = peak.volume
       pchi = -666
       t = 1
       ass = `,`.join([``.join(x.id.split(`.`)[1:]) for x in peak.dimensionNmrAtoms for x in x])
       clustid = peak.serial
       memcnt = 1
       peakInfo = `%5d %9.3f %9.3f %6.3f %6.3f %8.3f %8.3f %9.3f %9.3f %7.3f %7.3f %8.3f %8.3f %4d %4d %4d %4d %+e %+e %+e %.5f %d %s %4d %4d\n` % (i, x_axis, y_axis, dx, dy, x_ppm, y_ppm, x_hz, y_hz, x_w, y_w, xw_hz, yw_hz, x1, x3, y1, y3, height, dheight, vol, pchi, t, ass, clustid, memcnt)
       return peakInfo
       
 def writeNmrPipeTable(spectrum):
 
     with open(`%s.tab` % spectrum.name, `w`) as f:
         f.write(`VARS   INDEX X_AXIS Y_AXIS DX DY X_PPM Y_PPM X_HZ Y_HZ XW YW XW_HZ YW_HZ X1 X3 Y1 Y3 HEIGHT DHEIGHT VOL PCHI2 TYPE ASS CLUSTID MEMCNT\n`)
         f.write(`FORMAT %5d %9.3f %9.3f %6.3f %6.3f %8.3f %8.3f %9.3f %9.3f %7.3f %7.3f %8.3f %8.3f %4d %4d %4d %4d %+e %+e %+e %.5f %d %s %4d %4d`)
         f.write(`\n\n`)
         f.write(`NULLVALUE -666\n`)
         f.write(`NULLSTRING *\n\n`)
         for peak in spectrum.peaks:
             l = getPeakInfo(peak)
             f.write(`%s` % l)

For simplicity, this does not include any form of clustering, it simply sets the clusterID to the peak serial, it works on one spectrum object at a time and uses all peaks in all peaklists of that spectrum.

As I`m sure you`re aware, if you have a spectrumGroup you can use a for loop to iterate through spectrumGroup.spectra.

I hope this is useful

LucaM · 8 February 2021 19:43

Thanks for testing it!
I have now amended and tested on #nmrbox (https://nmrbox.org). it loads on nmrdraw with no problems (2D spectra only).

Thanks for an alternative way, surely many will also use as building block for importing/exporting to other formats.

Please keep posting!