Madison I Dunitz, Jenna M Lang, Guillaume Jospin, Aaron E Darling, Jonathan A Eisen, David A Coil
This manuscript from Dunitz et al. covers a lot of ground, as it takes novice researchers from sample isolation to genome sequencing and phylogenetic classification.
In fifth grade I had to write directions on how to make a PB&J sandwich. My directions were 10 pages long and still weren't detailed enough. What I learned from this seemingly simple exercise was that, no matter how easy a task seems, it is incredibly difficult to write step-by-step instructions that everyone can follow. In writing workflow papers, especially ones that cover so much ground, the authors inevitably have to sacrifice nuance for clarity and descriptiveness for brevity. Having read this manuscript through a couple of times, I think that, while there are certainly places where more description could be added, the authors do a good overall job of capturing the spirit of the analyses and providing a workflow that advanced high school classes could realistically use. There are a couple of places where more depth is warranted (see below), but overall the authors strike a good balance between thoroughness and readability.
All that being said, I think it would benefit the manuscript greatly to set up a virtual machine on iPlant (www.iplantcollaborative.org) that contains sample data sets and is configured to run most of the programs in this workflow. This virtual machine would be freely accessible to all (so long as iPlant remains accessible to all) and would provide a means to run the workflow without having to install software, obtain permissions, etc. Moreover, software versions would be frozen on this virtual machine, so anyone looking to repeat these analyses would not have to worry about version changes. It's a bit of work to set up one of these virtual machines, but once created they persist and can be accessed with the click of a link. It seems to me that it would be valuable to set up a one-stop interface where anyone interested could have lasting, preconfigured access to all the programs and analyses described in this manuscript. It's a great resource and seems like a perfect fit for this kind of workflow.
Experimental design
This section isn't quite applicable to this manuscript, but all of the described analyses and programs make logical sense. Following these directions would certainly give a user who is new to bioinformatics a fairly straightforward path to genome preparation and assembly.
Validity of the findings
Again, this section isn't quite applicable to this manuscript, but all of the described analyses and programs seem like they will work. I will admit to not checking every single link that the authors reference, but from what I've seen this is as good an introductory description as you can get to bacterial sampling and genome analysis.
Comments for the author
The authors have a readable style of writing, but throughout I thought there were some phrases that would be better reworded or left out:
Abstract: "has become almost trivial." I would change this wording simply because "almost trivial" reads a bit off to me in this context (especially because, having done these analyses myself, I know they are never trivial).
Line 5: "and difficulty." I don't think you need these two words. In my opinion, the drop in cost is the main driver; the level of difficulty hasn't changed, it has just been redistributed to bioinformatics.
Line 27: "relatively cheap sequencing" would be better as "cost-efficient sequencing" or something similar. The words "relatively cheap" read as too colloquial to me here.
Line 30: "create a large activation energy." Again, this reads as too colloquial to me.
Line 106: "It is customary to offer a small favor or gift." Please leave this line out. I understand the sentiment, but it reads very oddly in a manuscript, and hopefully people have enough humility to be thankful for the help.
Line 128: "Will often result in the isolation of pathogens" would be better as "can preferentially isolate human pathogens."
Line 142: Please give a specific temperature for "room temperature" (given how detailed other parts of the manuscript are).
Line 152: Which online tutorial?
Line 152: "or this paper by Baldouf" would be better as a standard in-text citation, e.g., "or Baldouf" followed by the reference.
Line 178: If you are going to mention monophyletic clades, it strikes me that a definition of polyphyletic groups, for comparison's sake, is warranted.
Line 180: "going back in time" is a bit of an unclear statement for the intended audience.
Line 183: "measure how much a particular part of a phylogenetic tree" better as "measure how well a node is supported"?
Line 191: "sterile swab." How can readers obtain a sterile swab, or ensure that the swab is sterile?
Line 193: "for 1-3 days" better as "until colonies of interest appear"
Line 204: "can be easily found online" better as "can be found online"
Line 224: delete "originally developed by Fred Sanger and now."
Line 226: "needs DNA" would be better as "requires DNA."
Section 6.3: You should describe the entire PCR program (annealing and extension times, number of cycles, etc.).
Line 266: You should elaborate on "all controls behaved as expected"
Line 281: What about mentioning Science Exchange (www.scienceexchange.com) as a way to shop around for sequencing centers and compare prices?
Line 322: It seems like you need quotation marks around the "upload the data without well mapping" button name.
Line 360: please reword "ready to go"
Line 463: "fancy" better as "complex"
Lines 502-514: What about the possibility of human contamination introduced from outside the sample? What would that look like? This seems important to mention given who would be using this workflow.
Line 564: What about mentioning the recent preprint from Baym et al. describing an $8 library prep? http://biorxiv.org/content/early/2015/01/16/013771
Line 708: Make sure to mention not to copy/paste the carriage return either.