public class UCSCParser extends Object implements TranscriptParser
The knownGene.txt file is tab-separated.

The KnownToLocusLink.txt file contains cross-references from the UCSC IDs to the corresponding Entrez Gene IDs (earlier known as Locus Link):

uc010eve.3 3805
uc002qug.4 3805
uc010evf.3 3805
...

The class additionally parses the files knownGeneMrna.txt and kgXref.txt.
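The cross-reference format shown for KnownToLocusLink.txt can be read into a lookup table along the following lines. This is only a minimal sketch, not Jannovar's own code: the class name KnownToLocusLinkSketch and its parse method are hypothetical, and it assumes each line holds exactly two whitespace-separated fields (UCSC ID, then Entrez Gene ID).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Jannovar API.
class KnownToLocusLinkSketch {

    // Parses lines of the form "<ucscId><whitespace><entrezGeneId>" into a map
    // from UCSC ID to Entrez Gene ID, preserving the input order.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> ucscToEntrez = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Assumption: exactly two whitespace-separated fields per line.
            String[] fields = trimmed.split("\\s+");
            if (fields.length != 2) {
                throw new IllegalArgumentException(
                        "Expected 2 fields but found " + fields.length + ": " + line);
            }
            ucscToEntrez.put(fields[0], fields[1]);
        }
        return ucscToEntrez;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse(List.of(
                "uc010eve.3\t3805",
                "uc002qug.4\t3805",
                "uc010evf.3\t3805"));
        System.out.println(map.get("uc002qug.4")); // prints 3805
    }
}
```

Several UCSC transcript IDs can map to the same Entrez Gene ID, as in the sample above, so the UCSC ID is the natural map key.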
The result of parsing is a list of TranscriptModel objects.

It is possible to parse directly from the gzip files without decompressing them, or to start from the decompressed files. The class checks whether the files exist and whether they have the suffix "gz".

Constructor and Description |
---|
UCSCParser(ReferenceDictionary refDict, String basePath, org.ini4j.Profile.Section iniSection) |
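The class description states that input files may be read either gzip-compressed or decompressed, based on an existence check and the "gz" suffix. A minimal sketch of that idea using only the JDK follows. It is not the actual UCSCParser implementation: the class GzipAwareOpenSketch, its methods, the preference for the plain file over the ".gz" variant, and the UTF-8 charset are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only; not part of the Jannovar API.
class GzipAwareOpenSketch {

    // Returns the plain file if it exists, otherwise the ".gz" variant.
    // Assumption: the plain file is preferred when both are present.
    static Path resolve(Path dir, String fileName) throws IOException {
        Path plain = dir.resolve(fileName);
        if (Files.exists(plain)) {
            return plain;
        }
        Path gz = dir.resolve(fileName + ".gz");
        if (Files.exists(gz)) {
            return gz;
        }
        throw new FileNotFoundException("Neither " + plain + " nor " + gz + " exists");
    }

    // Opens the file for line-wise reading, decompressing on the fly
    // when the file name ends in ".gz".
    static BufferedReader open(Path path) throws IOException {
        InputStream in = Files.newInputStream(path);
        if (path.getFileName().toString().endsWith(".gz")) {
            in = new GZIPInputStream(in);
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the directory holding knownGene.txt or knownGene.txt.gz.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        try (BufferedReader reader = open(resolve(dir, "knownGene.txt"))) {
            System.out.println(reader.readLine());
        }
    }
}
```

Wrapping the stream only when the suffix matches keeps the line-reading code identical for both cases, which is the main appeal of this approach.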
Modifier and Type | Method and Description |
---|---|
TranscriptModelBuilder | parseTranscriptModelFromLine(String line): Parses a single line of the knownGene.txt file. |
com.google.common.collect.ImmutableList<TranscriptModel> | run() |
public UCSCParser(ReferenceDictionary refDict, String basePath, org.ini4j.Profile.Section iniSection)

Parameters:
refDict - ReferenceDictionary to use for name/id and id/length mapping
basePath - path to where the to-be-parsed files live
iniSection - Profile.Section with configuration from INI file

public com.google.common.collect.ImmutableList<TranscriptModel> run() throws TranscriptParseException

Specified by:
run in interface TranscriptParser
Returns:
TranscriptModel objects as parsed from the input
Throws:
TranscriptParseException - on problems with parsing the transcript files

public TranscriptModelBuilder parseTranscriptModelFromLine(String line) throws TranscriptParseException

Parameters:
line - a single line of the UCSC knownGene.txt file
Returns:
TranscriptModelBuilder representing the line
Throws:
TranscriptParseException - on problems parsing the data

Copyright © 2016. All rights reserved.