Ten genomes are currently supported: hg19, hg38, mm10, bosTau, canFam, danRer, galGal, ratNor, sacCer and susScr.
Processed files for all ten genomes are provided at ftp://222.200.187.83/genomes

Each genome has 7 kinds of files:
  1). genome sequence (.genome)
  2). gene position (.pos)
  3). exon region for on-target (.on.exon)
  4). fasta for Off-Spotter (.fa)
  5). exon region for Off-Spotter (.off.exon)
  6). phast conserved element (.phast)
  7). gene structure (.structure)

This README file covers the following topics:
  1. Download source
  2. Brief introduction about several kinds of files

##########################################################################################
1. Download source

1). Download source of fasta and gtf for 10 genomes:

①hg19, hg38 and mm10 were downloaded from the GENCODE database.
  Human: http://www.gencodegenes.org/releases/
  Mouse: http://www.gencodegenes.org/mouse_releases/
  hg19----Homo sapiens(human)--GENCODE release19(hg19)
  hg38----Homo sapiens(human)--GENCODE release22(hg38)
  mm10----Mus musculus(mouse)--GENCODE releaseM4(mm10)

②bosTau, canFam, danRer, galGal, ratNor, sacCer and susScr were downloaded from the Ensembl database:
  http://asia.ensembl.org/info/data/ftp/index.html
  bosTau--Bos taurus(Cow)--Ensembl release84(bosTau6)
  canFam--Canis lupus familiaris(Dog)--Ensembl release84(canFam3)
  danRer--Danio rerio(Zebrafish)--Ensembl release84(danRer10)
  galGal--Gallus gallus(Chicken)--Ensembl release84(galGal4)
  ratNor--Rattus norvegicus(Rat)--Ensembl release84(ratNor6)
  sacCer--Saccharomyces cerevisiae(Yeast)--Ensembl release84(sacCer3)
  susScr--Sus scrofa(Pig)--Ensembl release84(susScr10)

2). Download source of phastConsElements:
  hg19----http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/
  hg38----http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/
  mm10----http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/
  bosTau--http://hgdownload.cse.ucsc.edu/goldenPath/bosTau4/database/
  canFam--http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/database/
  danRer--http://hgdownload.cse.ucsc.edu/goldenPath/danRer7/database/
  galGal--http://hgdownload.cse.ucsc.edu/goldenPath/galGal3/database/
  ratNor--http://hgdownload.cse.ucsc.edu/goldenPath/rn5/database/
  susScr--absent in UCSC
  sacCer--http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/database/

****three steps of processing phastConsElements files:
  ①. convert chrM into chrMT for bosTau, canFam, danRer, galGal and ratNor with awk;
     eg. awk '{gsub(/chrM/,"chrMT");print}' phastConsElements5way.txt > bosTau4.awk
  ②. convert into bed format with a python script
  ③. convert the coordinates to the genome assembly used by Ensembl release84 with UCSC liftOver
     eg. ./liftOver bosTau4.phast bosTau4ToBosTau6.over.chain bosTau.phast bosTau.unmap

##########################################################################################
2. Brief introduction about several kinds of files

1). genome sequence for fastaFromBed
  (chr1...chrM for hg19, hg38 and mm10; chr1...chrMT for bosTau, canFam, danRer, galGal, ratNor and susScr; chr1...chr17 for sacCer)
  example:
    >chr1
    NNNNNNNNNNNNNGGGG....GGGGNNNNNNNN
    >chrM
    NNNNNNNNNNNNNGGGG....GGGGNNNNNNNN

2). gene position (Chromosome names are the same as in the genome sequence.)
  format: gene_name \t genomic_region
  example:
    TERT    chr5:1253262-1295184
    MYC     chr8:128747680-128753674

3). exon region for on-target (Chromosome names are the same as in the genome sequence.)
  For speed, overlapping exon regions have been merged and stored in bed format.
  example:
    chr5    1253262    1253946    ENSG00000164362.14_TERT_exon    1    -
    chr5    1254483    1254620    ENSG00000164362.14_TERT_exon    1    -
    chr5    1255402    1255526    ENSG00000164362.14_TERT_exon    1    -

4). fasta for Off-Spotter (Chromosome number: 1...n)
  example:
    >1
    NNNNNNNNNNNNNGGGG....GGGGNNNNNNNN
    >25
    NNNNNNNNNNNNNGGGG....GGGGNNNNNNNN

5). exon region for Off-Spotter (Chromosome number: 1...n)
  For speed, overlapping exon regions have been merged.
  The annotation file must be sorted by chromosome number.
  example:
    5    -    1253262    1253946    ENSG00000164362.14_TERT_exon
    5    -    1254483    1254620    ENSG00000164362.14_TERT_exon

6). phast conserved element (Chromosome names are the same as in the genome sequence.)
  example:
    chr1    11991    11995    score=240    240    .
    chr1    12006    12020    score=354    354    .

7). gene structure (Chromosome names are the same as in the genome sequence.)
  example:
    DDX11L1    upstream_1k      chr1:10869-11869
    DDX11L1    downstream_1k    chr1:14412-15412
    DDX11L1    exon             chr1:11869-12227

##########################################################################################
****chromosome conversions for Off-Spotter in 4). and 5).
  hg19,hg38,mm10----1...22,X,Y,M to 1...22, X=23,Y=24,M=25
  bosTau----1...29,X,MT to 1...29, X=30,MT=31
  canFam----1...38,X,MT to 1...38, X=39,MT=40
  danRer----1...25,MT to 1...25, MT=26
  galGal----1...28,W,Z,MT to 1...28, W=29,Z=30,MT=31
  ratNor----1...20,X,Y,MT to 1...20, X=21,Y=22,MT=23
  susScr----1...18,X,Y,MT to 1...18, X=19,Y=20,MT=21
  sacCer----I,II...Mito to 1,2...17
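
The chromosome numbering for Off-Spotter can be sketched as a small Python helper. This is a minimal sketch, not the script used to build the released files. The names GENOME_LAYOUT, ROMAN and offspotter_chrom are hypothetical. It assumes that non-numeric chromosomes are numbered consecutively after the last autosome, in the order listed in the conversion table, and that sacCer chromosomes I...XVI map to 1...16 with Mito as 17.

```python
# Hypothetical helper: illustrative only, not part of the released pipeline.
# Assumption: non-numeric chromosomes are numbered consecutively after the
# last numbered chromosome, in the order given in the conversion table.

# (last numbered chromosome, non-numeric chromosomes in table order)
GENOME_LAYOUT = {
    "hg19": (22, ["X", "Y", "M"]),
    "hg38": (22, ["X", "Y", "M"]),
    "mm10": (22, ["X", "Y", "M"]),  # listed together with hg19/hg38 in the table
    "bosTau": (29, ["X", "MT"]),
    "canFam": (38, ["X", "MT"]),
    "danRer": (25, ["MT"]),
    "galGal": (28, ["W", "Z", "MT"]),
    "ratNor": (20, ["X", "Y", "MT"]),
    "susScr": (18, ["X", "Y", "MT"]),
}

# sacCer uses Roman numerals I...XVI followed by Mito.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]


def offspotter_chrom(genome, chrom):
    """Return the 1...n chromosome number for a chromosome name."""
    # Accept names with or without the "chr" prefix.
    name = chrom[3:] if chrom.startswith("chr") else chrom

    if genome == "sacCer":
        if name == "Mito":
            return len(ROMAN) + 1
        return ROMAN.index(name) + 1  # raises ValueError for unknown names

    last_numbered, extras = GENOME_LAYOUT[genome]
    if name.isdigit():
        number = int(name)
        if not 1 <= number <= last_numbered:
            raise ValueError(f"unexpected chromosome {chrom!r} for {genome}")
        return number
    # Non-numeric chromosomes follow the numbered ones in table order.
    return last_numbered + 1 + extras.index(name)


if __name__ == "__main__":
    print(offspotter_chrom("hg19", "chrX"))     # → 23
    print(offspotter_chrom("canFam", "chrMT"))  # → 40
    print(offspotter_chrom("sacCer", "Mito"))   # → 17
```

Unknown chromosome names (for example unplaced scaffolds) raise ValueError here, so a caller would need to decide whether to skip such records.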