Create an NmrPipe tab file for relaxation analyses

Python
#=========================================================================================
# Licence, Reference and Credits
#=========================================================================================
__copyright__ = "Copyright (C) CCPN project (http://www.ccpn.ac.uk) 2014 - 2021"
__credits__ = ("Ed Brooksbank, Luca Mureddu, Timothy J Ragan & Geerten W Vuister")
__licence__ = ("CCPN licence. See http://www.ccpn.ac.uk/v3-software/downloads/license")
__reference__ = ("Skinner, S.P., Fogh, R.H., Boucher, W., Ragan, T.J., Mureddu, L.G., & Vuister, G.W.",
                 "CcpNmr AnalysisAssign: a flexible platform for integrated NMR analysis",
                 "J.Biomol.Nmr (2016), 66, 111-124, http://doi.org/10.1007/s10858-016-0060-y")
#=========================================================================================
# Last code modification
#=========================================================================================
__modifiedBy__ = "$modifiedBy: Luca Mureddu $"
__dateModified__ = "$dateModified: 2021-02-08 19:23:47 +0000 (Mon, February 08, 2021) $"
__version__ = "$Revision: 3.0.3 $"
#=========================================================================================
# Created
#=========================================================================================
__author__ = "$Author: Luca Mureddu $"
__date__ = "$Date: 2021-02-08 10:28:42 +0000 (Monday, February 08, 2021) $"
#=========================================================================================
# Start of code
#=========================================================================================

"""
This macro writes an NmrPipe-like file using some of the CcpNmr peak properties.

RUN:
- From Macro Editor:
    * Main Menu -> Macro -> New
    * Copy this macro inside the Macro Editor module
    * Define the spectrumName and outputPath variables (and any of the others available, as required)
    * Click run (green `play` button)

- From Python Console:
    * Copy this macro in a new text file and add the extension `.py`
    * Main Menu -> View -> Python Console (or space-space shortcut) and type:
    * %run -i thisMacro.py -o outputPath -s spectrumName
      outputPath: the full directory path where you want to save the new file
      spectrumName: the name of the spectrum of interest


Post your changes/improvements on the Ccpn Forum!

If guessClusterId is set to True, the macro will add a cluster id for each peak.
The algorithm used is based on a "connected component analysis" as seen in https://stackoverflow.com/questions/14607317/
and might not produce the expected results! There are many other (more efficient) ways of doing this operation,
but that is not the purpose of this macro. Set guessClusterId to False if you are not happy with the result.
Peak has no clusterId property (although Multiplets exist), so for this macro the id is stored,
somewhat hack-ishly, as peak.annotation.

Gui Tips:
- To cycle the peakLabels on display, use the shortcut "p+l" until you find the one you need.
- To cycle the peakSymbols on display, use the shortcut "p+s" until you find the one you need.
- To change a peak annotation from a peakTable, double-click on its cell value.

Warnings:
- Always backup the project before running macros!
- The macro will set peak annotations; this helps visualise the cluster ids. If it overwrites your annotations,
  undo to revert to the previous state.
- Before running the macro, close all GuiTables to avoid waiting for Gui updates!
- Tested only for 2D spectra and hard-coded on the first two dimensions. Will not work on 1D as it is now.

"""


############################## User's settings #################################

spectrumName = 'test'        # Name of the spectrum of interest. (Required)
spectrumGroupName = 'group'  # Name of the spectrumGroup containing the spectra of interest. (Optional) If set, spectrumName is skipped.
peakListIndex = -1           # PeakList index to use. Default: last added, index -1. (Optional)

export = True                # Set to False if you don't want to export. Default True. (Optional)
outputPath = '~/Desktop'     # Directory path where to export the files. The fileName is taken from the spectrum name. (Required)


# clustering options
guessClusterId = True        # Guess a clusterId for all peaks. False to use just a serial. Default True. (Optional)
tolDim1 = 8                  # tolerance for the first dimension (in points). Used to find adjacent peaks. (Optional)
tolDim2 = 8                  # tolerance for the second dimension (in points). (Optional)
clusterByPositions = True    # cluster peaks if their positions are within tolDim1 and tolDim2. Default True. (Optional)
clusterByLWs = False         # cluster peaks with overlapping lineWidths. If True, clusterByPositions is ignored. (Optional)
increaseLWByNpercent = 50    # increase (on-the-fly) the lineWidths by n% when finding overlaps. No data is modified. (Optional)


############################## Start of the code #################################

import numpy as np
import itertools
import pandas as pd
import argparse
from collections import Counter, defaultdict
from ccpn.util.Path import aPath, checkFilePath
from ccpn.core.lib.ContextManagers import undoBlockWithoutSideBar
from ccpn.util.Common import percentage
from ccpn.util.Common import makeIterableList as mi


def getArgs():
    parser = argparse.ArgumentParser(description='Create an NmrPipe Tab file')
    parser.add_argument('-o', '--outputPath', help='Output directory path', default=outputPath)
    parser.add_argument('-s', '--spectrumName', help='Spectrum name', default=spectrumName)
    parser.add_argument('-g', '--spectrumGroupName', help='Spectrum group name', default=spectrumGroupName)
    parser.add_argument('-c', '--guessClusterId', help='Guess cluster Ids', default=guessClusterId)
    parser.add_argument('-t1', '--tolDim1', help='Tolerance 1st dimension in points', default=tolDim1)
    parser.add_argument('-t2', '--tolDim2', help='Tolerance 2nd dimension in points', default=tolDim2)
    return parser

############################## Units conversion #################################

def _hzLW2pnt(lwHz, sw, npoints):
    """
    :param lwHz: float. A peak lineWidth (in Hz)
    :param sw: float. Spectral width (in Hz)
    :param npoints: int. Total number of points
    :return: float. A peak lineWidth in points (per dimension)
    """
    if not all([lwHz, sw, npoints]): return
    hzXpnt = sw / npoints
    lwPnt = lwHz / hzXpnt
    return lwPnt
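
# Worked example of the conversion above (hypothetical values): with a spectral
# width of 8000 Hz digitised over 2048 points, one point spans
# 8000 / 2048 = 3.90625 Hz, so a 20 Hz linewidth corresponds to 5.12 points,
# which is what _hzLW2pnt(20.0, 8000.0, 2048) returns.
_exampleLwPnt = 20.0 / (8000.0 / 2048)  # -> 5.12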

def getLineWidthsPnt(peak):
    sp = peak.peakList.spectrum
    return tuple(_hzLW2pnt(lwHz, sp.spectralWidthsHz[i], sp.totalPointCounts[i]) for i, lwHz in enumerate(peak.lineWidths))

def getPeakPositionHz(peak):
    return tuple(peak.position[i] * peak.peakList.spectrum.spectrometerFrequencies[i] for i, v in enumerate(peak.position))
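
# Example of the ppm -> Hz conversion above (hypothetical values): a peak at
# 8.2 ppm in a dimension whose spectrometer frequency is 600.13 MHz sits at
# 8.2 * 600.13 = 4921.066 Hz, since ppm x MHz = Hz.
_examplePosHz = 8.2 * 600.13  # -> ~4921.066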

############################## Guess peak clusters #################################

def clusterOverlaps(nodes, adjacents):
    "Ref: https://stackoverflow.com/questions/14607317/"
    clusters = []
    nodes = list(nodes)
    while len(nodes):
        node = nodes[0]
        path = dfs(node, adjacents, nodes)
        clusters.append(path)
        for pt in path:
            nodes.remove(pt)
    return clusters

def dfs(start, adjacents, nodes):
    "Ref: https://stackoverflow.com/questions/14607317/"
    path = []
    q = [start]
    while q:
        node = q.pop(0)
        if path.count(node) >= nodes.count(node):
            continue
        path = path + [node]
        nextNodes = [p2 for p1, p2 in adjacents if p1 == node]
        q = nextNodes + q
    return path
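
# A self-contained alternative to the two functions above, shown only as an
# illustrative cross-check on toy data (union-find with path halving; the
# macro itself keeps using clusterOverlaps/dfs, and this helper is not called
# anywhere):
def _unionFindClusters(nodes, pairs):
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n
    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

# e.g. _unionFindClusters([1, 2, 3, 4], [(1, 2), (3, 4)]) -> [[1, 2], [3, 4]]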

def _getOverlapsByLineWidths(peaks, tolDim1=0.01, tolDim2=0.01, increaseByNpercent=0):
    """
    Consider two peaks adjacent if their lineWidths overlap in both dimensions.
    :param peaks: list of ccpn peak objs
    :param increaseByNpercent: increase (on-the-fly) the lineWidths by n% when finding overlaps. No data is modified.
    :return: A list of adjacent pairs of peaks
    """
    overlaps = []
    if len(peaks) == 0: return []
    sp = peaks[0].peakList.spectrum
    for pair in itertools.combinations(peaks, 2):
        peakA, peakB = pair
        if not all(list(peakA.lineWidths) + list(peakB.lineWidths)):
            warning('LineWidths is set to None for some peaks. Peaks skipped: %s' % ' '.join(map(str, pair)))
            continue
        pos1A, pos2A = [peakA.pointPosition[i] for i in range(sp.dimensionCount)]
        lw1A, lw2A = [lw + percentage(increaseByNpercent, lw) for lw in getLineWidthsPnt(peakA)]
        pos1B, pos2B = [peakB.pointPosition[i] for i in range(sp.dimensionCount)]
        lw1B, lw2B = [lw + percentage(increaseByNpercent, lw) for lw in getLineWidthsPnt(peakB)]
        dim1a, dim2a = np.linspace(pos1A-(lw1A/2), pos1A+(lw1A/2), 500), np.linspace(pos2A-(lw2A/2), pos2A+(lw2A/2), 500)
        dim1b, dim2b = np.linspace(pos1B-(lw1B/2), pos1B+(lw1B/2), 500), np.linspace(pos2B-(lw2B/2), pos2B+(lw2B/2), 500)
        inters1 = dim1b[(np.abs(dim1a[:, None] - dim1b) < tolDim1).any(0)]
        inters2 = dim2b[(np.abs(dim2a[:, None] - dim2b) < tolDim2).any(0)]
        if all([len(inters1) > 0, len(inters2) > 0]):
            overlaps.append(pair)
    return overlaps

def _getOverlapPairsByPositions(peaks, tolDim1=10., tolDim2=10.):
    """
    Consider two peaks adjacent if their pointPositions are within the tolerances.
    :param peaks: list of ccpn peak objs
    :param tolDim1: tolerance for the first dimension (in points)
    :param tolDim2: tolerance for the second dimension (in points)
    :return: A list of adjacent pairs of peaks
    """
    overlaps = []
    for pair in itertools.combinations(peaks, 2):
        dim1a, dim2a = pair[0].pointPosition[0], pair[0].pointPosition[1]
        dim1b, dim2b = pair[1].pointPosition[0], pair[1].pointPosition[1]
        if (np.abs(dim1a - dim1b) < tolDim1) and (np.abs(dim2a - dim2b) < tolDim2):
            overlaps.append(pair)
    return overlaps
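
# The same adjacency test on plain (x, y) tuples, as a toy sketch with assumed
# coordinates (the macro version above reads peak.pointPosition instead; this
# helper is illustrative only and not called anywhere):
def _overlapPairsXY(points, tolDim1=10., tolDim2=10.):
    return [(a, b) for a, b in itertools.combinations(points, 2)
            if abs(a[0] - b[0]) < tolDim1 and abs(a[1] - b[1]) < tolDim2]

# e.g. _overlapPairsXY([(10, 10), (12, 14), (40, 40)], 8, 8) -> [((10, 10), (12, 14))]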

def _getClustID(peak):
    v = None
    try:
        v = int(peak.annotation)
    except Exception as e:
        warning('Error converting peak annotation to int. %s' % e)
    return v

def getMEMCNT(ids):
    return Counter(ids)

def setClusterIDs(peaks, guessClustID=True, tolDim1=8., tolDim2=8.):
    if guessClustID:
        if clusterByPositions and not clusterByLWs:
            overlappedPeaks = _getOverlapPairsByPositions(peaks, tolDim1=tolDim1, tolDim2=tolDim2)
        else:
            overlappedPeaks = _getOverlapsByLineWidths(peaks, increaseByNpercent=increaseLWByNpercent)
        positions = [pk.pointPosition for pk in peaks]
        overlappedPositions = [(pair[0].pointPosition, pair[1].pointPosition) for pair in overlappedPeaks]
        result = clusterOverlaps(positions, overlappedPositions)
        with undoBlockWithoutSideBar():
            allClusters = []
            for i, group in enumerate(result):
                peakCluster = []
                for peak in peaks:
                    for j in group:
                        if j == peak.pointPosition:
                            peakCluster.append(peak)
                            peak.annotation = str(i + 1)
                allClusters.append(peakCluster)
    else:
        with undoBlockWithoutSideBar():
            for i, peak in enumerate(peaks):
                peak.annotation = str(i)

############################## Build DataFrame from peaks #################################

def _getAssignmentLabel(peak):
    """
    :param peak:
    :return: str. The assignment in the format TypeCode_AtomNames, e.g. 'LYS1_HN'.
    If there are multiple assignments per peak: 'LYS1_HN-ALA2_H1N1'
    """
    dd = defaultdict(list)
    labels = []
    for na in mi(peak.assignments):
        dd[na.nmrResidue].append(na.name)
    for nr, nas in dd.items():
        l1 = ''.join([nr.residueType, nr.sequenceCode, '_', *nas])
        labels.append(l1)
    if len(labels) > 0:
        return '-'.join(labels)

VARS = ['INDEX', 'X_AXIS', 'Y_AXIS', 'DX', 'DY', 'X_PPM', 'Y_PPM', 'X_HZ', 'Y_HZ', 'XW', 'YW', 'XW_HZ', 'YW_HZ', 'X1', 'X3', 'Y1', 'Y3', 'HEIGHT', 'DHEIGHT', 'VOL', 'PCHI2', 'TYPE', 'ASS', 'CLUSTID', 'MEMCNT']
FORMAT = ['%5d', '%9.3f', '%9.3f', '%6.3f', '%6.3f', '%8.3f', '%8.3f', '%9.3f', '%9.3f', '%7.3f', '%7.3f', '%8.3f', '%8.3f', '%4d', '%4d', '%4d', '%4d', '%+e', '%+e', '%+e', '%.5f', '%d', '%s', '%4d', '%4d']
VFdict = {k: v for k, v in zip(VARS, FORMAT)}
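
# Each column value is rendered through its %-format code from VFdict, e.g.
# (hypothetical values): '%5d' pads an index to width 5, '%8.3f' a ppm value
# to width 8 with 3 decimals:
_exampleIndexStr = '%5d' % 3       # -> '    3'
_examplePpmStr = '%8.3f' % 8.1234  # -> '   8.123'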

# NMRPIPE VARS https://spin.niddk.nih.gov/NMRPipe/doc2new/

INDEX = 'INDEX'      # REQUIRED - The unique peak ID number.
X_AXIS = 'X_AXIS'    # REQUIRED - Peak position: in points in the 1st dimension, from the left limit of the spectrum
Y_AXIS = 'Y_AXIS'    # REQUIRED - Peak position: in points in the 2nd dimension, from the bottom limit of the spectrum
DX = 'DX'            # NOT REQUIRED - Estimate of the error in peak position due to random noise, in points.
DY = 'DY'            # NOT REQUIRED - Estimate of the error in peak position due to random noise, in points.
X_PPM = 'X_PPM'      # NOT REQUIRED - Peak position: in ppm in the 1st dimension
Y_PPM = 'Y_PPM'      # NOT REQUIRED - Peak position: in ppm in the 2nd dimension
X_HZ = 'X_HZ'        # NOT REQUIRED - Peak position: in Hz in the 1st dimension
Y_HZ = 'Y_HZ'        # NOT REQUIRED - Peak position: in Hz in the 2nd dimension
XW = 'XW'            # REQUIRED - Peak width: in points in the 1st dimension
YW = 'YW'            # REQUIRED - Peak width: in points in the 2nd dimension
XW_HZ = 'XW_HZ'      # REQUIRED - Peak width: in Hz in the 1st dimension
YW_HZ = 'YW_HZ'      # REQUIRED - Peak width: in Hz in the 2nd dimension
X1 = 'X1'            # NOT REQUIRED - Left border of the peak in the 1st dim, in points
X3 = 'X3'            # NOT REQUIRED - Right border of the peak in the 1st dim, in points
Y1 = 'Y1'            # NOT REQUIRED - Left border of the peak in the 2nd dim, in points
Y3 = 'Y3'            # NOT REQUIRED - Right border of the peak in the 2nd dim, in points
HEIGHT = 'HEIGHT'    # NOT REQUIRED - Peak height
DHEIGHT = 'DHEIGHT'  # NOT REQUIRED - Peak height error
VOL = 'VOL'          # NOT REQUIRED - Peak volume
PCHI2 = 'PCHI2'      # NOT REQUIRED - the Chi-square probability for the peak (i.e. probability due to the noise)
TYPE = 'TYPE'        # NOT REQUIRED - the peak classification; 1=Peak, 2=Random Noise, 3=Truncation artifact.
ASS = 'ASS'          # REQUIRED - Peak assignment
CLUSTID = 'CLUSTID'  # REQUIRED - Peak cluster id. Peaks with the same CLUSTID value are overlapped.
MEMCNT = 'MEMCNT'    # REQUIRED - the total number of peaks which are in a given peak's cluster
                     # (i.e. peaks which have the same CLUSTID value)

NONE = None
NULLVALUE = -666
NULLSTRING = '*'
UNKNOWN = 1
PCHI2Default = 0.00000
TYPEDefault = 1

VarsDict = {
    INDEX:   lambda x: VFdict.get(INDEX) % x.serial,
    X_AXIS:  lambda x: VFdict.get(X_AXIS) % (x.pointPosition[0] if x.pointPosition[0] else NULLVALUE),
    Y_AXIS:  lambda x: VFdict.get(Y_AXIS) % (x.pointPosition[1] if x.pointPosition[1] else NULLVALUE),
    DX:      lambda x: VFdict.get(DX) % (x.positionError[0] if x.positionError[0] else NULLVALUE),
    DY:      lambda x: VFdict.get(DY) % (x.positionError[1] if x.positionError[1] else NULLVALUE),
    X_PPM:   lambda x: VFdict.get(X_PPM) % (x.position[0] if x.position[0] else NULLVALUE),
    Y_PPM:   lambda x: VFdict.get(Y_PPM) % (x.position[1] if x.position[1] else NULLVALUE),
    X_HZ:    lambda x: VFdict.get(X_HZ) % (getPeakPositionHz(x)[0] if getPeakPositionHz(x)[0] else NULLVALUE),
    Y_HZ:    lambda x: VFdict.get(Y_HZ) % (getPeakPositionHz(x)[1] if getPeakPositionHz(x)[1] else NULLVALUE),
    XW:      lambda x: VFdict.get(XW) % (getLineWidthsPnt(x)[0] if getLineWidthsPnt(x)[0] else NULLVALUE),
    YW:      lambda x: VFdict.get(YW) % (getLineWidthsPnt(x)[1] if getLineWidthsPnt(x)[1] else NULLVALUE),
    XW_HZ:   lambda x: VFdict.get(XW_HZ) % (x.lineWidths[0] if x.lineWidths[0] else NULLVALUE),
    YW_HZ:   lambda x: VFdict.get(YW_HZ) % (x.lineWidths[1] if x.lineWidths[1] else NULLVALUE),
    X1:      lambda x: VFdict.get(X1) % (x.pointPosition[0] - 10 if x.pointPosition[0] else NULLVALUE),
    X3:      lambda x: VFdict.get(X3) % (x.pointPosition[0] + 10 if x.pointPosition[0] else NULLVALUE),
    Y1:      lambda x: VFdict.get(Y1) % (x.pointPosition[1] - 10 if x.pointPosition[1] else NULLVALUE),
    Y3:      lambda x: VFdict.get(Y3) % (x.pointPosition[1] + 10 if x.pointPosition[1] else NULLVALUE),
    HEIGHT:  lambda x: VFdict.get(HEIGHT) % (x.height if x.height else NULLVALUE),
    DHEIGHT: lambda x: VFdict.get(DHEIGHT) % (x.heightError if x.heightError else NULLVALUE),
    VOL:     lambda x: VFdict.get(VOL) % (x.volume if x.volume else NULLVALUE),
    PCHI2:   lambda x: VFdict.get(PCHI2) % (PCHI2Default),
    TYPE:    lambda x: VFdict.get(TYPE) % (TYPEDefault),
    ASS:     lambda x: VFdict.get(ASS) % (_getAssignmentLabel(x) if _getAssignmentLabel(x) else NULLSTRING),
    CLUSTID: lambda x: VFdict.get(CLUSTID) % (_getClustID(x) if _getClustID(x) else NULLVALUE),
    MEMCNT:  lambda x: VFdict.get(MEMCNT) % (UNKNOWN),  # filled afterwards from the CLUSTID counts, if there are clusters
    }

def buildDataFrame(peaks):
    df = pd.DataFrame()
    for k, v in VarsDict.items():
        if v: df[k] = list(map(lambda x: v(x), peaks))
    _memcntDict = getMEMCNT(df[CLUSTID])  # set the MEMCNT
    df[MEMCNT] = [_memcntDict.get(i) for i in df[CLUSTID].values]
    df.set_index(df[INDEX], inplace=True)
    df.sort_index(inplace=True)
    return df

def dfToTab(df):
    "Return an NmrPipe (NIH) tab-file formatted string"
    string = ''
    string += 'VARS %s\n' % ' '.join(VARS)
    string += 'FORMAT %s\n\n' % ' '.join(FORMAT)
    string += 'NULLVALUE -666'
    string += '\nNULLSTRING *\n\n'
    string += df.to_string(header=False, index=False)  # formats are already applied when building the dataframe.
    return string
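
# Minimal sketch of the fixed header that dfToTab produces, using an assumed
# three-column subset of VARS/FORMAT for brevity (illustration only):
_vars3, _fmt3 = ['INDEX', 'X_PPM', 'ASS'], ['%5d', '%8.3f', '%s']
_exampleHeader = 'VARS %s\nFORMAT %s\n' % (' '.join(_vars3), ' '.join(_fmt3))
# -> 'VARS INDEX X_PPM ASS\nFORMAT %5d %8.3f %s\n'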

def runMacro():

    args = getArgs().parse_args()
    globals().update(args.__dict__)
    info('Running macro with %s' % args)

    spectrum = get('SP:' + spectrumName)
    spectrumGroup = get('SG:' + spectrumGroupName)
    spectra = []

    if not any([spectrum, spectrumGroup]):
        msg = 'You need at least a spectrum! Set the name at the beginning of this macro'
        print(msg)
        raise RuntimeError(msg)

    if spectrumGroup:
        spectra = spectrumGroup.spectra
        info('Running macro using spectrumGroup')
    else:
        if spectrum:
            info('Running macro using a single spectrum')
            spectra = [spectrum]

    if len(spectra) == 0:
        msg = 'You need at least a spectrum! Set the name at the beginning of this macro'
        print(msg)
        raise RuntimeError(msg)

    for spectrum in spectra:
        if len(spectrum.peakLists) == 0:
            warning('You need at least a peakList for %s' % spectrum)
            continue
        if len(spectrum.peakLists[peakListIndex].peaks) == 0:
            warning('You need at least a peak. %s skipped' % spectrum)
            continue

        peaks = spectrum.peakLists[peakListIndex].peaks
        peaks.sort(key=lambda x: x.pointPosition[0], reverse=False)
        # str() handles both the boolean default and string values parsed from the command line
        _guessClusterId = str(guessClusterId) in ['True', 'y', 'Y', 'yes', 'true']
        setClusterIDs(peaks, _guessClusterId, float(tolDim1), float(tolDim2))
        df = buildDataFrame(peaks)
        if export:
            isPathOk, msgErr = checkFilePath(aPath(outputPath))
            if not isPathOk:
                print('File not saved. Error: %s.' % msgErr)
            else:
                fileName = aPath(outputPath).joinpath(spectrum.name + '.tab')
                string = dfToTab(df)
                with open(fileName, 'w') as f:
                    f.write(string)
                print('File saved in: %s.' % fileName)

if __name__ == '__main__':
    runMacro()