Review for "A standalone software platform for the interactive management and pre-processing of ATAC-seq samples"

Completed on 28 May 2017 by Lachlan Coin.

Comments to author

Ahmed and Ucar present a platform for processing ATAC-seq samples. The platform starts with a fastq file and generates peak calls. The rationale for developing this program is to provide a user-friendly solution for processing ATAC-seq samples; however, it is a general pipeline for finding read-depth peaks and could also be applied to ChIP-seq data, for example.

However, this tool does not really make the analysis 'easy' for the end-user, as quite a bit of effort must be invested in installing all the software dependencies. Also, it seems that command line tools are required to inspect the results, and there are no tools for assisting in interpretation or visualisation of the results of the peak calling.

Major revisions:

1. The authors should do a more thorough comparison with features of other programs for processing ATAC-seq data. The alternatives are only mentioned in passing, and no real comparison of features is made.

2. Consider making a Galaxy pipeline incorporating this pipeline. In particular, I am not convinced that a standalone tool is a more straightforward solution than using Galaxy, particularly if the user only has a modest number of samples to process. This is because the user still has to install all the software dependencies. One advantage of Galaxy is that the user does not need to install any software, as it will already be installed on a Galaxy installation.

3. One of the major points of this preprint is that it makes the analysis 'interactive'. However, there is little evidence of interactivity, and I am not sure what interactive means in the context of a pipeline for processing data. The interactivity seems to mostly consist of being able to modify various parameters in a GUI and pressing the 'run' button. To be truly interactive, the tool must present some kind of visualization of the data, along with the means to choose specific analysis pipelines on the basis of this visual feedback. At the moment it seems to me that the user must still inspect the results in the file-system, so interactivity consists only of being able to change the run parameters in a GUI.

4. The authors have run the pipeline on GM12878 (and presumably other samples) but provide no results from these runs. The utility of the pipeline would be much easier to judge if some indication of the results obtained were presented. This returns to the point about visualisation: it seems the tool provides no data visualization?

5. Downstream processing of peak calls. Does the tool offer any downstream processing of peak calls? I don't think the authors can claim this is an ATAC-seq pipeline if it doesn't offer some downstream tools, e.g. for identifying changes in histone positioning, maybe via some kind of Fourier transform of peaks? Something which would assist in interpreting the output as the result of an ATAC-seq experiment.

Minor revisions:

1. There are too many flow diagrams, and many of them essentially represent the same information in different ways. I understand that they each have a subtly different point, e.g. one is to illustrate the file structure created by the program. However, it would be better to just have a single flowchart illustrating everything, perhaps with extra figure panels to illustrate other relevant points of interest, like the file structure generated. Also, in Figure 2, it's impossible to get any useful information out of the screenshots included, so I don't see the point. Moreover, the point of the tool is to help biologists not familiar with the command line, so why does Figure 2 show commands and outputs on the command line?