In general the article is well written, clear, and to the point. However, this statement needs clarification: “The TruSeq protocol had a noticeably higher mean correlation coefficient than any of the other protocols”. Some additional description of how the correlation coefficient was computed would be helpful here. It was not immediately clear to me whether this correlation coefficient was computed between technical replicates of the same experiment (i.e. the correlation of the linear regression over all spike-in concentrations) or between different experiments (i.e. the correlation of gene expression over all genes between experiments using different spike-in concentrations). In addition, how was the mean correlation coefficient obtained? Was it by computing the coefficient over all pairwise combinations and then taking the average? Or was it the mean over some randomly chosen pairs?
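To make the ambiguity concrete, the two readings could be sketched as follows. This is only an illustration of what I am asking about, not a description of what the authors did: the function names and the toy numbers are hypothetical and are not taken from the manuscript.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_per_gene_r(concentrations, expression_by_gene):
    """Reading 1: one r per gene (expression vs. spike-in amount), averaged over genes."""
    return mean(pearson(concentrations, expr) for expr in expression_by_gene.values())


def mean_pairwise_sample_r(expression_by_gene):
    """Reading 2: one r per pair of samples (over all genes), averaged over all pairs."""
    # Transpose so that each tuple holds the values of all genes in one sample.
    samples = list(zip(*expression_by_gene.values()))
    return mean(pearson(a, b) for a, b in combinations(samples, 2))


# Hypothetical toy data: spike-in amounts (ng) and normalized expression per gene.
concentrations = [0.0, 5.0, 10.0, 20.0]
expression_by_gene = {
    "geneA": [0.1, 5.2, 9.8, 20.5],
    "geneB": [0.0, 4.1, 11.0, 19.0],
    "geneC": [0.3, 6.0, 9.0, 21.0],
}

r_gene = mean_per_gene_r(concentrations, expression_by_gene)
r_pair = mean_pairwise_sample_r(expression_by_gene)
print(f"mean per-gene r: {r_gene:.3f}; mean pairwise sample r: {r_pair:.3f}")
```

The two quantities answer different questions and can differ substantially on the same data, which is why the text should state which one is being reported.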
The graphics and figures, however, need substantial work. First, the labels on the graphs in Figure 1 are not very descriptive or clear. The Figure 1A axis label “normalized expression” needs to be more specific, either by giving units or by adding a description to the figure caption. Although this is explained in the text, stating it in the figure would make it easier for readers taking a quick look at the article to understand what is being plotted. Second, for the same figure, the axis label “ng D. virilis” is too casual; something like “Concentration of D. virilis RNA (ng)” would be more appropriate. It would also be visually much more informative if the graph proportions were adjusted: since the objective of the normalization was to make the slope 1, a square graph would illustrate this point much more easily. Figures 1B-E have no y-axis labels, and the x-axes and the axis numerals are not readable at all. Overall the resolution of the figures is also somewhat low, but I think this will probably be adjusted before publication. Similar complaints about text size and missing axis labels apply to Figure 2. Figure 3 also needs descriptive axis labels, not just “Slope 1” and “Slope 2”. In this figure, the plot area should be expanded so that the legend does not obscure some of the data points at the top right. Finally, the figure referencing in the text is incorrect (what is Figure 2.2?). All in all, the figures and their captions should be modified so that they are largely understandable on their own, without having to read extensively into the text.
The experimental design is appropriate. However, it would be helpful for the authors to provide more primary data, such as that shown in Figure 1A, for the other methods they tested as well, perhaps as supplemental data.
Validity of the findings
In Figure 1, the authors comment: “The mean correlation coefficient was statistically and practically indistinguishable between the Clontech samples and the SMART-seq2 samples (t-test p = .11, Figure 2.2).” and in Figure 1 show the “Distributions of slopes, intercepts, and correlation coefficient for linear regressions of the abundance of each gene”. I noticed that in these figures, although the means are very similar, as the authors pointed out, the distributions for each method do display differences. For example, the Clontech and TotalScript methods show a skewed tail in the slope and intercept distributions, whereas the distributions of these same metrics for TruSeq are symmetric. Can the authors comment on this difference, or provide additional data and/or speculation as to its origin?
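One simple way to quantify this asymmetry would be to report the sample skewness of each distribution alongside its mean. A minimal sketch, assuming the per-gene slopes are available as a plain list; the function name and the values below are invented for illustration and are not data from the manuscript.

```python
from statistics import mean


def sample_skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical per-gene slopes with nearly identical centers but different shapes.
symmetric_slopes = [0.8, 0.9, 1.0, 1.1, 1.2]
right_tailed_slopes = [0.9, 0.95, 1.0, 1.05, 2.0]

print(sample_skewness(symmetric_slopes))
print(sample_skewness(right_tailed_slopes))
```

A value near zero indicates a symmetric distribution, while a clearly positive value indicates a right-hand tail of the kind visible for some of the protocols.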
The authors should provide additional detail in the bioinformatics methods: were the raw reads quality filtered or otherwise quality controlled in any way prior to mapping?
Comments for the author
No additional comments.