“What are the SNPs or variants that are shared in common between two VCF files I created (e.g. from different SNP discovery pipelines, or two treatments of an experiment)?”, you might ask.

Below, I provide a post, based on my recent answer to this ResearchGate question, that offers some solutions to this problem.

First, the vcftools --diff --diff-site option would work for this specific case. This option for the --diff flag is listed in the documentation as having the following function:

“Outputs the sites that are common / unique to each file. The output file has the suffix “.diff.sites_in_files”.”

To mention other options, bcftools is supposedly faster at this, and if you use bcftools, what you want is the intersection function, isec. On Mac or Linux with bcftools installed, you could use something like the following (where $ is the command line prompt) to get the list of SNPs at the intersection of two or more VCF files:

$ bcftools isec -n +2 | bgzip -c > isec_file1-v-2_

Alternatively, if you wanted just statistics on the numbers of SNPs/variants or genotypes in common between files, you could use the vcf-compare tool that comes with vcftools.

Once you have the output of these programs in hand, you can do a number of things with it, such as report common/different SNPs between runs or treatments, conduct statistical analysis, or create a Venn diagram of common/different SNPs between multiple VCF files to visualize the differences.

Personally, I can validate this approach: I recently ran vcf-compare and bcftools isec (as per above) and used the results to generate a Venn diagram with jvenn. The diagram compares the VCF file from the original SNP discovery analysis for one dataset with a second run of the SNP discovery pipeline on the same dataset, but with technical replicates removed (blue). The results are 99.4% similar, and the classical Venn diagram helps us visualize the very minute difference between them.
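For readers who want to see what a site-level comparison boils down to, here is a minimal Python sketch of the idea. It is not how vcftools or bcftools work internally, and it is not the method used for the figure discussed in this post. It assumes that two records count as the same site when their CHROM, POS, REF and ALT fields match exactly, and the function names read_sites and venn_counts are hypothetical helpers made up for this illustration.

```python
import gzip
from pathlib import Path


def read_sites(vcf_path):
    """Return the set of (CHROM, POS, REF, ALT) keys found in a VCF file.

    Assumption for this sketch: a "site" is identified by these four
    fields; real tools offer other matching rules.
    """
    path = Path(vcf_path)
    # Accept plain-text files as well as gzip/bgzip-compressed ones.
    opener = gzip.open if path.suffix == ".gz" else open
    sites = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            sites.add((chrom, int(pos), ref, alt))
    return sites


def venn_counts(vcf_a, vcf_b):
    """Count sites unique to each file and sites shared by both."""
    a, b = read_sites(vcf_a), read_sites(vcf_b)
    return {
        "only_a": len(a - b),   # sites found only in the first file
        "shared": len(a & b),   # sites found in both files
        "only_b": len(b - a),   # sites found only in the second file
    }
```

Calling venn_counts on two of your own VCF files returns the three numbers a two-set Venn diagram needs. Because each file's sites are held in memory as a set, this is only suitable for modestly sized files; for large call sets, the dedicated tools mentioned above are the better choice.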