pipe.Rd documenting
%>% with a \usage section.Rf_findVar with
R_getVar for R 4.6.0 compatibility.gintervals.load() failing with “Intervals set
does not exist” for intervals from databases loaded via
gdataset.load().Fixed gdataset.load() performance: loading a large
database as a dataset is now fast regardless of call order. Previously,
gsetroot(small_db); gdataset.load(large_db) was extremely
slow because the dataset scan used R’s
list.files(recursive=TRUE), which lists every file in the
tree. Replaced with a fast path that reads .db.cache or
falls back to C++ fts-based scanning. Also changed
gdb.reload() calls in
gdataset.load()/gdataset.unload() to use
rescan=FALSE, so existing databases use their cache instead
of being rescanned.
Fixed collision detection for hierarchical tracks in
gdataset.load(): the R-level scan returned
/-separated names while the dataset map uses
.-separated names, causing collision detection to silently
fail for tracks in subdirectories.
Added motif format import functions:
gseq.read_meme(), gseq.read_jaspar(), and
gseq.read_homer() for reading MEME, JASPAR PFM, and HOMER
motif formats. Returns named lists of position probability matrices
directly usable with gseq.pwm(). All parsers are native
with no new dependencies.
Added track export functions:
gtrack.export_bedgraph() and
gtrack.export_bigwig() for exporting tracks and track
expressions to standard bedGraph and BigWig formats. Supports gzip
compression, virtual tracks, track expressions, and custom
iterators.
gsynth.save() and gsynth.load() now use
the cross-platform .gsm format (YAML metadata + binary
arrays) instead of R-specific RDS. Models saved with pymisha can now be
loaded in R and vice versa. Legacy RDS files are still supported for
backward compatibility.
Added compress parameter to
gsynth.save() to optionally save as a ZIP archive.
Added gsynth.convert() to convert legacy RDS model
files to the new .gsm format.
Added gintervals.attr.get(),
gintervals.attr.set(),
gintervals.attr.export(), and
gintervals.attr.import() for managing interval set
attributes. Attributes are stored as .iattr binary files
(null-separated key/value pairs) next to .interv files for
small interval sets, or inside the directory for big interval
sets.
gintervals.rm() now cleans up companion
.iattr attribute files when deleting interval
sets.
Replaced the naive variance formula (E[X²]-E[X]²) with Welford’s numerically stable online algorithm for standard deviation computation in all track types (GenomeTrackFixedBin, GenomeTrackSparse, GenomeTrackInMemory, GenomeTrackArrays). The naive formula is prone to catastrophic cancellation when values are large or have small variance relative to their mean.
Improved sum accumulation precision by using double-precision
intermediate accumulators in all non-sliding sum paths. The
m_last_sum member remains float for API compatibility, but
per-interval accumulation now happens in double, eliminating float→float
rounding errors for large sums.
Fixed potential out-of-bounds access in SAMPLE/SAMPLE_POS functions when the random number generator returns exactly 1.0. Added bounds clamping (ported from pymisha).
Fixed umask leak in GenomeTrack::write_type() and
GenomeTrackFixedBin::init_write() — the old umask is now
saved and restored immediately after open(), preventing
permanent process-wide umask changes.
Improved header validation in
GenomeTrackFixedBin::init_read() to use integer modulo
instead of floating-point division, with an additional underflow
guard.
Added reusable scratch buffers
(m_scratch_all_values,
m_scratch_all_positions) to GenomeTrackFixedBin, avoiding
per-call heap allocation for SAMPLE/SAMPLE_POS operations (ported from
pymisha).
Added indexed format support for 2D tracks (rectangles and
points). Per-chromosome-pair files can now be consolidated into a single
track.dat + track.idx, matching the indexed
format already available for 1D tracks.
New function gtrack.2d.convert_to_indexed() converts
existing 2D tracks to indexed format with optional removal of old
per-pair files.
gtrack.convert_to_indexed() now dispatches to 2D
conversion automatically when given a rectangles or points
track.
gdb.convert_to_indexed() with
convert_tracks = TRUE now includes 2D tracks in batch
conversion.
2D tracks created via gtrack.2d.create(),
gtrack.2d.import(), and
gtrack.2d.import_contacts() are automatically converted to
indexed format when the database is in indexed mode.
Fixed NaN comparison bug in distance.center virtual
track with iterator modifiers — used != instead of
std::isnan(), causing the guard condition to always be true
under IEEE 754 semantics.
Fixed distance virtual track returning NaN instead
of a valid distance when one side of the sorted interval array has no
intervals on the current chromosome (NaN-unsafe min in
sequential path).
Fixed gtrack.smooth MEAN algorithm producing
spurious NaN values during periodic recalculation when NaN values are
present in the smoothing window. The recalculation checked
isnan on the accumulator instead of the incoming value,
permanently poisoning the sum.
Fixed incorrect standard deviation/variance computation in virtual tracks with filter/mask — the parallel variance formula used the already-incremented weight instead of the pre-increment value.
Fixed min.pos/max.pos virtual track
aggregation comparing genomic positions instead of signal values when
combining sub-intervals, returning the leftmost/rightmost sub-interval
position instead of the position of the actual min/max value.
Fixed missing out_of_range check in
distance.center virtual track with iterator modifiers,
causing undefined behavior when all source intervals lie upstream of the
query coordinate.
Fixed copy-paste error in quantile track computation where
highest_vals buffer was resized using
kid_lowest_vals_buf_size instead of
kid_highest_vals_buf_size.
Fixed int32 truncation of int64_t
genomic coordinates in gintervals.normalize — could
silently produce wrong results for chromosomes longer than ~2.1
Gbp.
Fixed p-value transformation (pv function) being
skipped when virtual track has both a filter and a p-value function
active.
Fixed lse_accumulate double overload producing NaN
when both inputs are -infinity (missing isinf
guards that the float overload had).
Fixed GInterval::dist2coord treating
coord == end as inside the interval, inconsistent with the
half-open [start, end) convention used throughout the
codebase.
Fixed distance.center virtual track function
returning incorrect values when gextract is called with
overlapping input intervals. The sequential scanner could miss
containing source intervals when processing bins that go backward in
position between overlapping regions.
Fixed neighbor.count virtual track function
undercounting neighbors when gextract is called with
overlapping input intervals. Same root cause as the
distance/distance.edge fix in 5.4.6.
distance and distance.edge virtual
track functions returning incorrect values when gextract is
called with overlapping input intervals. The sequential scanner could
miss closer source intervals when processing bins that go backward in
position between overlapping regions.GenomeTrackBinnedTransform.cpp by avoiding arithmetic
between distinct anonymous enum types.gdb.export_fasta() to export all contigs from a
database to a multi-FASTA file, defaulting to the current gdb with
optional groot override.gscreen and
gextract with dense iterators.lse virtual track function that computes the
log-sum-exp of the values in the iterator interval.gsetroot() for your primary writable database, then
gdataset.load() to add read-only datasetsforce=TRUE
to override (working db always wins)gdataset.load(): Load a dataset into the namespace
(tracks and intervals become available)gdataset.unload(): Remove a dataset from the
namespacegdataset.save(): Create a new dataset from selected
tracks/intervalsgdataset.ls(): List working database and all loaded
datasetsgdataset.info(): Show metadata and contents of a
datasetgtrack.dataset(): Get the source path for a track
(working db or dataset)gtrack.dbs(): Get all paths where a track exists (for
debugging shadowed tracks)gintervals.dataset(): Get the source path for an
interval setgintervals.dbs(): Get all paths where an interval set
existsdb parameter for filtering by source:
gtrack.ls(db = "/path/to/dataset"): List tracks from a
specific sourcegintervals.ls(db = "/path/to/dataset"): List intervals
from a specific sourcechrom_sizes.txt files
(same genome assembly)gsetroot() works unchangedgtrack.mv(): Rename or move a track within the same
databasegtrack.copy(): Copy a track (can copy between databases
when multiple are loaded)gcor function that computes correlation between
two tracks, or between multiple pairs of tracks.gextract and gscreen almost always
did not enable multitasking mode due to incorrect gating.dataframe and names parameters to
gdist function that return a data frame instead of an
N-dimensional vector.gsynth.train, gsynth.sample and
gsynth.save functions that train a Markov model from a
genome sequence and sample a synthetic genome from the modelgseq.kmer.dist function that counts the number of
occurrences of k-mers in genomic intervals.gtrack.liftover that created overlapping
intervals when lifting sparse tracks.gintervals.normalize.gintervals.normalize now
returns +1bp for intervals with odd sizes:
interval_relative parameter to
giterator.intervals() for interval-aligned bin
iteration.genome.seq +
genome.idx files instead of per-chromosome filesgdb.info(),
gdb.convert_to_indexed(),
gtrack.convert_to_indexed(),
gintervals.convert_to_indexed(),
gintervals.2d.convert_to_indexed()options(gmulticontig.indexed_format = FALSE) to
create databases in legacy format for compatibility with older misha
versionsvignette("Database-Formats") for more details.gmax.processes automatically set to 70% of available
CPU coresgmax.data.size coordinated with process limits to
ensure total memory usage <= 70% of RAM (capped at 10GB per
process)gmax.data.size = min((RAM * 0.7) / gmax.processes, 10GB)
ensures safe memory usage across all parallel processesgmax.processes * 1000 records (e.g., 2K on laptops, 89K on
128-core servers)options()vignette("Manual")
for detailsgvtrack.create with src parameter). These
tracks behave exactly like regular sparse tracks, but are stored in
memory and can be used in track expressions.sshift, eshift and
filter parameters to gvtrack.create.gintervals.path() and gtrack.path()
functions that return the actual file system paths for interval sets and
tracks.masked.count and masked.frac virtual
track functions that count and fraction masked base pairs (lowercase
letters) in the current iterator interval.distance.edge virtual track function that
computes edge-to-edge distance from the iterator interval to the closest
source interval, using the same calculation as
gintervals.neighbors.gtrack.liftover did not fill chromosomes missing
the chain with NA values. This caused errors when trying to access the
tracks afterwards.gintervals.as_chain function that converts a data
frame to a chain object.gintervals.liftover via value_col and
multi_target_agg parameters.src_overlap_policy and
tgt_overlap_policy parameters to
gintervals.liftover, gintervals.load_chain,
and gtrack.liftover functions.gtrack.liftover via
multi_target_agg parameter.gintervals.load_chain now returns valid misha intervals
instead of a chain object.gintervals.load_chain now includes score
and chain_id columns for all loaded chainsmin_score parameter in
gintervals.load_chain, gintervals.liftover,
and gtrack.liftover filters out low-quality chainstgt_overlap_policy = "auto_score" (or
"auto") selects the best chain mapping based on alignment
score (highest score → longest span → lowest chain_id)include_metadata parameter in
gintervals.liftover optionally returns score and chain_id
for each mapping BREAKING: “auto” is now an alias for
“auto_score”. For the old behavior, use
tgt_overlap_policy = "auto_first".canonic parameter to
gintervals.liftover (default FALSE) to merge
adjacent target intervals resulting from the same source interval and
chain.tgt_overlap_policy = "best_cluster_union" (default,
aliased as "best_source_cluster"): Uses source union
coveragetgt_overlap_policy = "best_cluster_sum": Uses sum of
target lengthstgt_overlap_policy = "best_cluster_max": Uses longest
single membermax.pos.abs, max.pos.relative,
min.pos.abs, min.pos.relative: Returns the
position of the maximum/minimum value in the iterator intervalexists: Returns 1 if any value exists (or specific vals
if provided), 0 otherwisesize: Returns the number of non-NaN values in the
iterator intervalsample: Returns a uniformly sampled source value from
the iterator intervalsample.pos.abs and sample.pos.relative:
Returns the position of a uniformly sampled valuefirst and last: Returns the first/last
value in the iterator intervalfirst.pos.abs, first.pos.relative,
last.pos.abs, last.pos.relative: Returns the
position of the first/last valuegintervals.neighbors when using
mindist=0, maxdist=0: the function would miss zero-distance
(touching) intervals when using mindist=0, maxdist=0.pwm.count with spatial sliding windows
double-counting bidirectional hits (forward + reverse) at the same
genomic position; the sliding path now matches the baseline per-position
union semantics.gintervals.load_chain now
returns a data frame with 8 columns instead of 7. Columns are:
chrom, start, end,
strand, chromsrc, startsrc,
endsrc, strandsrc.src_overlap_policy and
tgt_overlap_policy parameters to
gintervals.load_chain, gtrack.liftover and
gintervals.liftover functions.neighbor.count virtual track.gintervals.mark_overlaps function that marks
overlapping intervals with a group ID.pssm parameter of
gvtrack.create and gseq.pwm functions.gseq.pwm and added
neutral_chars_policy parameter.pwm, pwm.max and pwm.count) for
dense iterators when spatial weighting is disabled, providing
significant performance improvements for consecutive genomic
intervals.pwm.count(bidirect=TRUE) now
counts per-position union of strands (via log-sum-exp),
aligning with pwm/pwm.max. Each position
contributes at most 1 to the count. To reproduce the old per-strand-sum
behavior, add the two strand-specific counts:
pwm.count(bidirect=FALSE, strand=1) + pwm.count(bidirect=FALSE, strand=-1).gseq.pwm and gseq.kmer functions
that compute pwm and kmer scores on sequences without the need for a
genome database.gseq.rev and gseq.comp functions
that reverse and complement DNA sequences without the need for a genome
database.gseq.revcomp alias for grevcomp
function.gintervals.random function that generates random
genome intervals.gintervals.covered_bp and
gintervals.coverage_fraction functions that calculate the
number of base pairs and the fraction of base pairs covered by a set of
intervals.gvtrack.filters: per vtrack genomic masks.pwm.count virtual track function that counts the
number of occurrences of a PWM in the current iterator interval.gintervals.neighbors.upstream() - Find upstream
neighbors relative to query strandgintervals.neighbors.downstream() - Find downstream
neighbors relative to query strandgintervals.neighbors.directional() - Find both upstream
and downstream neighborsuse_intervals1_strand parameter to
gintervals.neighbors() to use query intervals’ strand for
distance directionality.warn.ignored.strand parameter to
gintervals.neighbors() to control warnings when query
strand is ignored.gintervals.neighbors: a stack imbalance
in the C++ code in very rare cases of 2D intervals.gintervals.neighbors due to unbalanced
rprotect calls.gintervals.normalize and
gintervals.annotate functions.m1-asan build.pwm and kmer virtual track
functions: iterator shifts were not applied.colnames parameter to
gintervals.mapply function.attrs parameter to gtrack.import
function.created.user default attribute in track creation
functions.gtrack.import function.gtrack.create_dense function - creates a dense
track from an intervals and values.clock_gettime is missing).gtrack.import_bigwig:
intern argument was not passed to system
calls.grevcomp function (reverse complement of a DNA
sequence).gvtrack.create.gdb.create_genome function.R_curErrorBuf,
SET_TYPEOFRf_
prefix in the c++ code.gtrack.import from bigwig.ALLGENOME is now only soft deprecated in order to
support old misha scripts.gtrack.create_dirs function.gcluster.run.gintervals.neighbors..misha. Variables such as
ALLGENOME can now be accessed as
.misha$ALLGENOME. This change is not backwards compatible,
please update your code accordingly.gintervals.neighbors (same as
gintervals.neighbors1 from misha.ext). This
means that instead of having two columns of ‘chrom’, ‘start’ and ‘end’,
the resulting data frame would have ‘chrom1’, ‘start1’ and ‘end1’.gwget now uses curl in order to work on
systems that do not have ftp installed.markdown format.Genomes vignette that demonstrates how to
create a new genome database.