Repository: deeptools/pyBigWig
Branch: master
Commit: 9c7d9d90331d
Files: 28
Total size: 278.9 KB
Directory structure:
gitextract_xavh5mxs/
├── .environmentLinux.yaml
├── .github/
│ └── workflows/
│ ├── build.yml
│ └── pypi.yml
├── .gitignore
├── .gitmodules
├── LICENSE.txt
├── MANIFEST.in
├── README.md
├── libBigWig/
│ ├── LICENSE
│ ├── README.md
│ ├── bigWig.h
│ ├── bigWigIO.h
│ ├── bwCommon.h
│ ├── bwRead.c
│ ├── bwStats.c
│ ├── bwValues.c
│ ├── bwValues.h
│ ├── bwWrite.c
│ └── io.c
├── pyBigWig.c
├── pyBigWig.h
├── pyBigWigTest/
│ ├── __init__.py
│ ├── test.bigBed
│ ├── test.bw
│ └── test.py
├── pyproject.toml
├── setup.cfg
└── setup.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .environmentLinux.yaml
================================================
name: foo
channels:
- conda-forge
- bioconda
- default
dependencies:
- gcc_linux-64
- curl
- zlib
- python = 3.9
- pip
- numpy
- pytest
================================================
FILE: .github/workflows/build.yml
================================================
name: Test
on:
  pull_request:
  push:
jobs:
  testLinux:
    name: Test Conda Linux
    runs-on: "ubuntu-latest"
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - uses: actions/checkout@v2
      - uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: foo
          environment-file: .environmentLinux.yaml
          python-version: 3.9
          auto-activate-base: false
      - run: |
          pip install .
          pytest pyBigWigTest/test.py
  test-builds:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install build prerequisites
        run: |
          python -m pip install --upgrade build numpy
      - name: Install cibuildwheel
        run: |
          python -m pip install --upgrade cibuildwheel
      - name: Build wheel(s)
        run: |
          python -m cibuildwheel --output-dir wheelhouse
      - name: Build sdist
        run: |
          python -m build --sdist
      - uses: actions/upload-artifact@v6
        with:
          name: pyBigWig-build
          path: |
            wheelhouse/*
            dist/pyBigWig*.tar.gz
================================================
FILE: .github/workflows/pypi.yml
================================================
name: pypi
on: [push]
jobs:
  pypi:
    name: upload to pypi
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install build prerequisites
        if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags')
        run: |
          python -m pip install --upgrade twine build cibuildwheel numpy
      - name: sdist
        if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags')
        run: |
          python -m build --sdist
      - name: wheel
        if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags')
        run: |
          python -m cibuildwheel --output-dir wheelhouse
      - name: upload
        if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags')
        env:
          TWINE_USERNAME: "__token__"
          TWINE_PASSWORD: ${{ secrets.pypi_password }}
        run: |
          twine upload dist/*
          twine upload wheelhouse/*
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
# Translations
*.mo
*.pot
# Django stuff:
*.log
# Sphinx documentation
docs/_build/
# PyBuilder
target/
*.o
#./setup.py sdist creates this
MANIFEST
*.swp
================================================
FILE: .gitmodules
================================================
================================================
FILE: LICENSE.txt
================================================
The MIT License (MIT)
Copyright (c) 2015 Devon Ryan
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: MANIFEST.in
================================================
include *.h
include **/*.h
================================================
FILE: README.md
================================================
[PyPI](https://badge.fury.io/py/pyBigWig) [Travis CI](https://travis-ci.org/dpryan79/pyBigWig.svg?branch=master) [Bioconda](http://bioconda.github.io) [DOI](http://dx.doi.org/10.5281/zenodo.45238)
# pyBigWig
A python extension, written in C, for quick access to bigBed files and access to and creation of bigWig files. This extension uses [libBigWig](https://github.com/dpryan79/libBigWig) for local and remote file access.
Table of Contents
=================
* [Installation](#installation)
* [Requirements](#requirements)
* [Usage](#usage)
* [Load the extension](#load-the-extension)
* [Open a bigWig or bigBed file](#open-a-bigwig-or-bigbed-file)
* [Determining the file type](#determining-the-file-type)
* [Access the list of chromosomes and their lengths](#access-the-list-of-chromosomes-and-their-lengths)
* [Print the header](#print-the-header)
* [Compute summary information on a range](#compute-summary-information-on-a-range)
* [A note on statistics and zoom levels](#a-note-on-statistics-and-zoom-levels)
* [Retrieve values for individual bases in a range](#retrieve-values-for-individual-bases-in-a-range)
* [Retrieve all intervals in a range](#retrieve-all-intervals-in-a-range)
* [Retrieving bigBed entries](#retrieving-bigbed-entries)
* [Add a header to a bigWig file](#add-a-header-to-a-bigwig-file)
* [Adding entries to a bigWig file](#adding-entries-to-a-bigwig-file)
* [Close a bigWig or bigBed file](#close-a-bigwig-or-bigbed-file)
* [Numpy](#numpy)
* [Remote file access](#remote-file-access)
* [Empty files](#empty-files)
* [A note on coordinates](#a-note-on-coordinates)
* [Galaxy](#galaxy)
# Installation
You can install this extension from PyPI with:
pip install pyBigWig
or with conda
conda install pybigwig -c conda-forge -c bioconda
## Requirements
The following non-python requirements must be installed:
- libcurl (and the `curl-config` config)
- zlib
The headers and libraries for these are required.
# Usage
Basic usage is as follows:
## Load the extension
>>> import pyBigWig
## Open a bigWig or bigBed file
This will work if your working directory is the pyBigWig source code directory.
>>> bw = pyBigWig.open("pyBigWigTest/test.bw")
Note that if the file doesn't exist you'll see an error message and `None` will be returned. By default, all files are opened for reading and not writing. You can alter this by passing a mode containing `w`:
>>> bw = pyBigWig.open("test/output.bw", "w")
Note that a file opened for writing can't be queried for its intervals or statistics, it can *only* be written to. If you open a file for writing then you will next need to add a header (see the section on this below).
Local and remote bigBed read access is also supported:
>>> bb = pyBigWig.open("https://www.encodeproject.org/files/ENCFF001JBR/@@download/ENCFF001JBR.bigBed")
While you can specify a mode for bigBed files, it is ignored. The object returned by `pyBigWig.open()` is the same regardless of whether you're opening a bigWig or bigBed file.
## Determining the file type
Since bigWig and bigBed files can both be opened, it may be necessary to determine whether a given `bigWigFile` object points to a bigWig or bigBed file. To that end, one can use the `isBigWig()` and `isBigBed()` functions:
>>> bw = pyBigWig.open("pyBigWigTest/test.bw")
>>> bw.isBigWig()
True
>>> bw.isBigBed()
False
## Access the list of chromosomes and their lengths
`bigWigFile` objects contain a dictionary holding the chromosome lengths, which can be accessed with the `chroms()` accessor.
>>> bw.chroms()
dict_proxy({'1': 195471971L, '10': 130694993L})
You can also directly query a particular chromosome.
>>> bw.chroms("1")
195471971L
The lengths are stored as the "long" integer type, which is why there's an `L` suffix (shown under Python 2). If you specify a non-existent chromosome then nothing is output.
>>> bw.chroms("c")
>>>
## Print the header
It's sometimes useful to print a bigWig's header. This is presented here as a python dictionary containing: the version (typically `4`), the number of zoom levels (`nLevels`), the number of bases described (`nBasesCovered`), the minimum value (`minVal`), the maximum value (`maxVal`), the sum of all values (`sumData`), and the sum of all squared values (`sumSquared`). The last two of these are needed for determining the mean and standard deviation.
>>> bw.header()
{'maxVal': 2L, 'sumData': 272L, 'minVal': 0L, 'version': 4L, 'sumSquared': 500L, 'nLevels': 1L, 'nBasesCovered': 154L}
Note that this is also possible for bigBed files and the same dictionary keys will be present. Entries such as `maxVal`, `sumData`, `minVal`, and `sumSquared` are then largely not meaningful.
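As a rough illustration of how those two sums yield the mean and standard deviation, here is a minimal sketch in plain Python using the header values shown above. The helper name `mean_and_std` is hypothetical, and the use of the sample (n - 1) variance is an assumption, not a statement of what pyBigWig does internally.

```python
import math


def mean_and_std(header):
    """Derive the mean and standard deviation from the header sums."""
    n = header["nBasesCovered"]
    if n == 0:
        return None, None  # nothing is covered, so there is nothing to summarize
    mean = header["sumData"] / n
    if n < 2:
        return mean, None
    # Assumption: sample variance with an (n - 1) denominator.
    variance = (header["sumSquared"] - header["sumData"] ** 2 / n) / (n - 1)
    return mean, math.sqrt(max(variance, 0.0))


# Header dictionary copied from the example output above.
header = {"maxVal": 2, "sumData": 272, "minVal": 0, "version": 4,
          "sumSquared": 500, "nLevels": 1, "nBasesCovered": 154}
mean, std = mean_and_std(header)
print(mean, std)
```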
## Compute summary information on a range
bigWig files are used to store values associated with positions and ranges of them. Typically we want to quickly access the average value over a range, which is very simple:
>>> bw.stats("1", 0, 3)
[0.2000000054637591]
Suppose that, instead of the mean value, we wanted the maximum value:
>>> bw.stats("1", 0, 3, type="max")
[0.30000001192092896]
Other options are "min" (the minimum value), "coverage" (the fraction of bases covered), and "std" (the standard deviation of the values).
It's often the case that we would instead like to compute values of some number of evenly spaced bins in a given interval, which is also simple:
>>> bw.stats("1",99, 200, type="max", nBins=2)
[1.399999976158142, 1.5]
`nBins` defaults to 1, just as `type` defaults to `mean`.
If the start and end positions are omitted then the entire chromosome is used:
>>> bw.stats("1")
[1.3351851569281683]
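To make the numbers above easier to follow, the sketch below recomputes a mean, maximum and coverage in plain Python from the intervals that the example file holds on chromosome `1` (they are listed in the section on retrieving intervals). The helper `summarize` is hypothetical and only approximates what `stats()` reports; in particular, averaging over covered bases only is an assumption that happens to be consistent with the outputs shown here.

```python
def summarize(intervals, start, end):
    """Coverage-weighted summary of (start, end, value) intervals in [start, end)."""
    covered = 0
    total = 0.0
    largest = None
    for i_start, i_end, value in intervals:
        # Clip each interval to the queried range.
        lo, hi = max(i_start, start), min(i_end, end)
        if lo >= hi:
            continue
        covered += hi - lo
        total += value * (hi - lo)
        largest = value if largest is None else max(largest, value)
    if covered == 0:
        return {"mean": None, "max": None, "coverage": 0.0}
    return {"mean": total / covered, "max": largest,
            "coverage": covered / (end - start)}


# Intervals and chromosome length as shown elsewhere in this README.
intervals = [(0, 1, 0.10000000149011612), (1, 2, 0.20000000298023224),
             (2, 3, 0.30000001192092896), (100, 150, 1.399999976158142),
             (150, 151, 1.5)]
whole = summarize(intervals, 0, 195471971)
first3 = summarize(intervals, 0, 3)
print(whole["mean"], first3["mean"], first3["max"])
```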
### A note on statistics and zoom levels
> A note to the lay reader: This section is rather technical and included only for the sake of completeness. In short, if you need exact mean/max/etc. summary values for an interval or intervals and a small trade-off in speed is acceptable, use the `exact=True` option in the `stats()` function.
By default, there are some unintuitive aspects to computing statistics on ranges in a bigWig file. The bigWig format was originally created in the context of genome browsers. There, computing exact summary statistics for a given interval is less important than quickly being able to compute an approximate statistic (after all, browsers need to be able to quickly display a number of contiguous intervals and support scrolling/zooming). Because of this, bigWig files contain not only interval-value associations, but also `sum of values`/`sum of squared values`/`minimum value`/`maximum value`/`number of bases covered` for equally sized bins of various sizes. These different sizes are referred to as "zoom levels". The smallest zoom level has bins that are 16 times the mean interval size in the file and each subsequent zoom level has bins 4 times larger than the previous. This methodology is used in Kent's tools and, therefore, likely used in almost every currently existing bigWig file.
When a bigWig file is queried for a summary statistic, the size of the interval is used to determine whether to use a zoom level and, if so, which one. The optimal zoom level is that which has the largest bins no more than half the width of the desired interval. If no such zoom level exists, the original intervals are instead used for the calculation.
For the sake of consistency with other tools, pyBigWig adopts this same methodology. However, since this is (A) unintuitive and (B) undesirable in some applications, pyBigWig enables computation of exact summary statistics regardless of the interval size (i.e., it allows ignoring the zoom levels). This was originally proposed [here](https://github.com/dpryan79/pyBigWig/issues/12) and an example is below:
>>> import pyBigWig
>>> from numpy import mean
>>> bw = pyBigWig.open("http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign75mer.bigWig")
>>> bw.stats('chr1', 89294, 91629)
[0.20120902053804418]
>>> mean(bw.values('chr1', 89294, 91629))
0.22213841940688142
>>> bw.stats('chr1', 89294, 91629, exact=True)
[0.22213841940688142]
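The zoom-level rule described in this section can be sketched in a few lines of plain Python. Both helper names and the mean interval size of 10 are hypothetical, chosen only for illustration; this is not the code that pyBigWig or libBigWig runs.

```python
def zoom_bin_sizes(mean_interval_size, n_levels):
    """Bin size per zoom level: 16x the mean interval size, then 4x per level."""
    sizes = []
    size = 16 * mean_interval_size
    for _ in range(n_levels):
        sizes.append(size)
        size *= 4
    return sizes


def pick_zoom_level(bin_sizes, interval_width):
    """Index of the largest bin size that is at most half the interval width.

    Returns None when no level qualifies, meaning the original intervals
    would be used instead.
    """
    best = None
    for level, size in enumerate(bin_sizes):
        if size <= interval_width / 2 and (best is None or size > bin_sizes[best]):
            best = level
    return best


sizes = zoom_bin_sizes(mean_interval_size=10, n_levels=4)
print(sizes)
print(pick_zoom_level(sizes, 91629 - 89294))  # width of the query used above
print(pick_zoom_level(sizes, 100))
```

With these made-up bin sizes, the 2335-base query is wide enough for a zoom level to be chosen, which is the situation in which `stats()` and `exact=True` can disagree.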
## Retrieve values for individual bases in a range
While the `stats()` method **can** be used to retrieve the original values for each base (e.g., by setting `nBins` to the number of bases), it's preferable to instead use the `values()` accessor.
>>> bw.values("1", 0, 3)
[0.10000000149011612, 0.20000000298023224, 0.30000001192092896]
The list produced will always contain one value for every base in the range specified. If a particular base has no associated value in the bigWig file then the returned value will be `nan`.
>>> bw.values("1", 0, 4)
[0.10000000149011612, 0.20000000298023224, 0.30000001192092896, nan]
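The relationship between stored intervals and per-base output can be sketched as follows. The function `per_base_values` is a hypothetical stand-in written in plain Python; it only mimics the behaviour described above, filling uncovered bases with `nan`.

```python
import math


def per_base_values(intervals, start, end):
    """Expand (start, end, value) intervals into one value per base in [start, end)."""
    values = [math.nan] * (end - start)
    for i_start, i_end, value in intervals:
        # Only the part of each interval inside the query contributes.
        for pos in range(max(i_start, start), min(i_end, end)):
            values[pos - start] = value
    return values


intervals = [(0, 1, 0.10000000149011612), (1, 2, 0.20000000298023224),
             (2, 3, 0.30000001192092896)]
vals = per_base_values(intervals, 0, 4)
print(vals)
```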
## Retrieve all intervals in a range
Sometimes it's convenient to retrieve all entries overlapping some range. This can be done with the `intervals()` function:
>>> bw.intervals("1", 0, 3)
((0, 1, 0.10000000149011612), (1, 2, 0.20000000298023224), (2, 3, 0.30000001192092896))
What's returned is a tuple of tuples, each containing the start position, end position, and value. Thus, the example above has values of `0.1`, `0.2`, and `0.3` at positions `0`, `1`, and `2`, respectively.
If the start and end position are omitted then all intervals on the chromosome specified are returned:
>>> bw.intervals("1")
((0, 1, 0.10000000149011612), (1, 2, 0.20000000298023224), (2, 3, 0.30000001192092896), (100, 150, 1.399999976158142), (150, 151, 1.5))
## Retrieving bigBed entries
As opposed to bigWig files, bigBed files hold entries, which are intervals with an associated string. You can access these entries using the `entries()` function:
>>> bb = pyBigWig.open("https://www.encodeproject.org/files/ENCFF001JBR/@@download/ENCFF001JBR.bigBed")
>>> bb.entries('chr1', 10000000, 10020000)
[(10009333, 10009640, '61035\t130\t-\t0.026\t0.42\t404'), (10014007, 10014289, '61047\t136\t-\t0.029\t0.42\t404'), (10014373, 10024307, '61048\t630\t-\t5.420\t0.00\t2672399')]
The output is a list of entry tuples. The tuple elements are the `start` and `end` position of each entry, followed by its associated `string`. The string is returned exactly as it's held in the bigBed file, so parsing it is left to you. To determine what the various fields are in this string, consult the SQL string:
>>> bb.SQL()
table RnaElements
"BED6 + 3 scores for RNA Elements data"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
string name; "Name of item"
uint score; "Normalized score from 0-1000"
char[1] strand; "+ or - or . for unknown"
float level; "Expression level such as RPKM or FPKM. Set to -1 for no data."
float signif; "Statistical significance such as IDR. Set to -1 for no data."
uint score2; "Additional measurement/count e.g. number of reads. Set to 0 for no data."
)
Note that the first three fields in the SQL string (`chrom`, `chromStart`, and `chromEnd`) are not part of the returned string.
If you only need to know where entries are and not their associated values, you can save memory by additionally specifying `withString=False` in `entries()`:
>>> bb.entries('chr1', 10000000, 10020000, withString=False)
[(10009333, 10009640), (10014007, 10014289), (10014373, 10024307)]
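Since parsing the string is left to the caller, one possible approach is sketched below in plain Python: pull the field names out of the SQL string and pair them with the tab-separated values. The helpers `field_names` and `parse_entry` are hypothetical, the regular expression assumes one field per line as in the example above, and all values are left as strings.

```python
import re

# SQL string as printed above.
SQL = '''table RnaElements
"BED6 + 3 scores for RNA Elements data"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
string name; "Name of item"
uint score; "Normalized score from 0-1000"
char[1] strand; "+ or - or . for unknown"
float level; "Expression level such as RPKM or FPKM. Set to -1 for no data."
float signif; "Statistical significance such as IDR. Set to -1 for no data."
uint score2; "Additional measurement/count e.g. number of reads. Set to 0 for no data."
)'''


def field_names(sql):
    """Return the field names, assuming lines of the form: <type> <name>; "<text>"."""
    return re.findall(r"^[ \t]*\S+[ \t]+(\w+);", sql, flags=re.MULTILINE)


def parse_entry(entry, names):
    """Turn a (start, end, string) tuple into a dictionary keyed by field name."""
    start, end, rest = entry
    # The first three SQL fields are returned separately, so skip them here.
    extra = dict(zip(names[3:], rest.split("\t")))
    return {"start": start, "end": end, **extra}


names = field_names(SQL)
parsed = parse_entry((10009333, 10009640, "61035\t130\t-\t0.026\t0.42\t404"), names)
print(parsed)
```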
## Add a header to a bigWig file
If you've opened a file for writing then you'll need to give it a header before you can add any entries. The header contains all of the chromosomes, **in order**, and their sizes. If your genome has two chromosomes, chr1 and chr2, of lengths 1 and 1.5 million bases, then the following would add an appropriate header:
>>> bw.addHeader([("chr1", 1000000), ("chr2", 1500000)])
bigWig headers are case-sensitive, so `chr1` and `Chr1` are different. Likewise, `1` and `chr1` are not the same, so you can't mix Ensembl and UCSC chromosome names. After adding a header, you can then add entries.
By default, up to 10 "zoom levels" are constructed for bigWig files. You can change this default number with the `maxZooms` optional argument. A common use of this is to create a bigWig file that simply holds intervals and no zoom levels:
>>> bw.addHeader([("chr1", 1000000), ("chr2", 1500000)], maxZooms=0)
If you set `maxZooms=0`, please note that IGV and many other tools WILL NOT WORK as they assume that at least one zoom level will be present. You are advised to use the default unless you do not expect the bigWig files to be used by other packages.
## Adding entries to a bigWig file
Assuming you've opened a file for writing and added a header, you can then add entries. Note that the entries **must** be added in order, as bigWig files always contain ordered intervals. There are three formats that bigWig files can use internally to store entries. The most commonly observed format is identical to a [bedGraph](https://genome.ucsc.edu/goldenpath/help/bedgraph.html) file:
chr1 0 5 0.0
chr1 100 120 1.0
chr1 125 126 200.0
These entries would be added as follows:
>>> bw.addEntries(["chr1", "chr1", "chr1"], [0, 100, 125], ends=[5, 120, 126], values=[0.0, 1.0, 200.0])
Each entry occupies 12 bytes before compression.
The second format uses a fixed span, but a variable step size between entries. These can be represented in a [wiggle](http://genome.ucsc.edu/goldenpath/help/wiggle.html) file as:
variableStep chrom=chr1 span=20
500 -2.0
600 150.0
635 25.0
The above entries describe (1-based) positions 501-520, 601-620 and 636-655. These would be added as follows:
>>> bw.addEntries("chr1", [500, 600, 635], values=[-2.0, 150.0, 25.0], span=20)
Each entry of this type occupies 8 bytes before compression.
The final format uses a fixed step and span for each entry, corresponding to the fixedStep [wiggle format](http://genome.ucsc.edu/goldenpath/help/wiggle.html):
fixedStep chrom=chr1 step=30 span=20
-5.0
-20.0
25.0
The above entries describe (1-based) bases 901-920, 931-950 and 961-980 and would be added as follows:
>>> bw.addEntries("chr1", 900, values=[-5.0, -20.0, 25.0], span=20, step=30)
Each entry of this type occupies 4 bytes before compression.
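To see how the two wiggle-style formats map onto explicit intervals, here is a small plain-Python sketch that expands them into 0-based half-open `(start, end, value)` tuples. Both function names are hypothetical and nothing here calls pyBigWig; the inputs are the example entries from above.

```python
def variable_step_intervals(starts, values, span):
    """Fixed span, explicit start for every entry (variableStep)."""
    return [(s, s + span, v) for s, v in zip(starts, values)]


def fixed_step_intervals(start, values, span, step):
    """Fixed span and fixed step, only the first start is given (fixedStep)."""
    return [(start + i * step, start + i * step + span, v)
            for i, v in enumerate(values)]


variable = variable_step_intervals([500, 600, 635], [-2.0, 150.0, 25.0], span=20)
fixed = fixed_step_intervals(900, [-5.0, -20.0, 25.0], span=20, step=30)
print(variable)
print(fixed)
```

Adding 1 to each start gives the 1-based positions quoted in the text, for example `(900, 920)` corresponds to bases 901-920.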
Note that pyBigWig will try to prevent you from adding entries in an incorrect order. This, however, requires additional overhead. Should that not be acceptable, you can simply specify `validate=False` when adding entries:
>>> bw.addEntries(["chr1", "chr1", "chr1"], [100, 0, 125], ends=[120, 5, 126], values=[0.0, 1.0, 200.0], validate=False)
You're obviously then responsible for ensuring that you **do not** add entries out of order. The resulting files would otherwise largely not be usable.
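If you disable validation, a cheap check of your own before calling `addEntries()` may be worthwhile. The sketch below is one hypothetical way to do that in plain Python; `entries_in_order` is not part of pyBigWig, and treating overlapping entries as out of order is an assumption.

```python
def entries_in_order(chrom_order, chroms, starts, ends):
    """Check that entries follow the header's chromosome order and ascending position."""
    rank = {name: i for i, name in enumerate(chrom_order)}
    prev = None  # (chromosome rank, end position) of the previous entry
    for chrom, start, end in zip(chroms, starts, ends):
        if chrom not in rank or start >= end:
            return False
        if prev is not None:
            prev_rank, prev_end = prev
            if rank[chrom] < prev_rank:
                return False  # went back to an earlier chromosome
            if rank[chrom] == prev_rank and start < prev_end:
                return False  # starts before the previous entry ended
        prev = (rank[chrom], end)
    return True


order = ["chr1", "chr2"]
good = entries_in_order(order, ["chr1"] * 3, [0, 100, 125], [5, 120, 126])
bad = entries_in_order(order, ["chr1"] * 3, [100, 0, 125], [120, 5, 126])
print(good, bad)
```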
## Close a bigWig or bigBed file
A file can be closed with a simple `bw.close()`, as is commonly done with other file types. For files opened for writing, closing a file writes any buffered entries to disk, constructs and writes the file index, and constructs zoom levels. Consequently, this can take a bit of time.
# Numpy
As of version 0.3.0, pyBigWig supports input of coordinates using numpy integers and vectors in some functions **if numpy was installed prior to installing pyBigWig**. To determine whether pyBigWig was installed with numpy support, check the `numpy` accessor:
>>> import pyBigWig
>>> pyBigWig.numpy
1
If `pyBigWig.numpy` is `1`, then pyBigWig was compiled with numpy support. This means that `addEntries()` can accept numpy coordinates:
>>> import pyBigWig
>>> import numpy as np
>>> bw = pyBigWig.open("/tmp/delete.bw", "w")
>>> bw.addHeader([("1", 1000)], maxZooms=0)
>>> chroms = np.array(["1"] * 10)
>>> starts = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=np.int64)
>>> ends = np.array([5, 15, 25, 35, 45, 55, 65, 75, 85, 95], dtype=np.int64)
>>> values0 = np.array(np.random.random_sample(10), dtype=np.float64)
>>> bw.addEntries(chroms, starts, ends=ends, values=values0)
>>> bw.close()
Additionally, `values()` can directly output a numpy vector:
>>> bw = pyBigWig.open("/tmp/delete.bw")
>>> bw.values('1', 0, 10, numpy=True)
[ 0.74336642 0.74336642 0.74336642 0.74336642 0.74336642 nan
nan nan nan nan]
>>> type(bw.values('1', 0, 10, numpy=True))
<type 'numpy.ndarray'>
# Remote file access
If you do not have curl installed, pyBigWig will be installed without the ability to access remote files. You can determine if you will be able to access remote files with `pyBigWig.remote`. If that returns 1, then you can access remote files. If it returns 0 then you can't.
# Empty files
As of version 0.3.5, pyBigWig is able to read and write bigWig files lacking entries. Please note that such files are generally not compatible with other programs, since there's no definition of how a bigWig file with no entries should look. For such a file, the `intervals()` accessor will return `None`, the `stats()` function will return a list of `None` of the desired length, and `values()` will return `[]` (an empty list). This should generally allow programs utilizing pyBigWig to continue without issue.
For those wishing to mimic the functionality of pyBigWig/libBigWig in this regard, please note that it looks at the number of bases covered (as reported in the file header) to check for "empty" files.
# A note on coordinates
Wiggle, bigWig, and bigBed files use 0-based half-open coordinates, which are also used by this extension. So to access the value for the first base on `chr1`, one would specify the starting position as `0` and the end position as `1`. Similarly, bases 100 to 115 would have a start of `99` and an end of `115`. This is simply for the sake of consistency with the underlying bigWig file and may change in the future.
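For callers who think in 1-based inclusive positions, the conversion described above is a one-liner. The helper below is a hypothetical convenience, not something pyBigWig provides.

```python
def to_zero_based_half_open(first_base, last_base):
    """Convert a 1-based inclusive range to 0-based half-open (start, end)."""
    if first_base < 1 or last_base < first_base:
        raise ValueError("expected 1 <= first_base <= last_base")
    return first_base - 1, last_base


print(to_zero_based_half_open(1, 1))
print(to_zero_based_half_open(100, 115))
```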
# Galaxy
pyBigWig is also available as a package in [Galaxy](http://www.usegalaxy.org). You can find it in the toolshed and the [IUC](https://wiki.galaxyproject.org/IUC) is currently hosting the XML definition of this on [github](https://github.com/galaxyproject/tools-iuc/tree/master/packages/package_python_2_7_10_pybigwig_0_2_8).
================================================
FILE: libBigWig/LICENSE
================================================
The MIT License (MIT)
Copyright (c) 2015 Devon Ryan
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: libBigWig/README.md
================================================
[DOI](http://dx.doi.org/10.5281/zenodo.45278)
A C library for reading/parsing local and remote bigWig and bigBed files. While Kent's source code is free to use for these purposes, it's really inappropriate as library code since it has the unfortunate habit of calling `exit()` whenever there's an error. If that's then used inside of something like python then the python interpreter gets killed. This library is aimed at resolving these sorts of issues and should also use more standard things like curl and has a friendlier license to boot.
Documentation is automatically generated by doxygen and can be found under `docs/html` or online [here](https://cdn.rawgit.com/dpryan79/libBigWig/master/docs/html/index.html).
# Example
The only functions and structures that end users need to care about are in "bigWig.h". Below is a commented example. You can see the files under `test/` for further examples.
    #include <stdio.h>
    #include <stdlib.h>
    #include "bigWig.h"
    int main(int argc, char *argv[]) {
        bigWigFile_t *fp = NULL;
        bwOverlappingIntervals_t *intervals = NULL;
        double *stats = NULL;
        if(argc != 2) {
            fprintf(stderr, "Usage: %s {file.bw|URL://path/file.bw}\n", argv[0]);
            return 1;
        }
        //Initialize enough space to hold 128KiB (1<<17) of data at a time
        if(bwInit(1<<17) != 0) {
            fprintf(stderr, "Received an error in bwInit\n");
            return 1;
        }
        //Open the local/remote file
        fp = bwOpen(argv[1], NULL, "r");
        if(!fp) {
            fprintf(stderr, "An error occurred while opening %s\n", argv[1]);
            return 1;
        }
        //Get values in a range (0-based, half open) without NAs
        intervals = bwGetValues(fp, "chr1", 10000000, 10000100, 0);
        bwDestroyOverlappingIntervals(intervals); //Free allocated memory
        //Get values in a range (0-based, half open) with NAs
        intervals = bwGetValues(fp, "chr1", 10000000, 10000100, 1);
        bwDestroyOverlappingIntervals(intervals); //Free allocated memory
        //Get the full intervals that overlap
        intervals = bwGetOverlappingIntervals(fp, "chr1", 10000000, 10000100);
        bwDestroyOverlappingIntervals(intervals);
        //Get an example statistic - standard deviation
        //We want ~4 bins in the range
        stats = bwStats(fp, "chr1", 10000000, 10000100, 4, dev);
        if(stats) {
            printf("chr1:10000000-10000100 std. dev.: %f %f %f %f\n", stats[0], stats[1], stats[2], stats[3]);
            free(stats);
        }
        bwClose(fp);
        bwCleanup();
        return 0;
    }
## Writing example
N.B., creation of bigBed files is not supported (there are no plans to change this).
Below is an example of how to write bigWig files. You can also find this file under `test/exampleWrite.c`. Unlike with Kent's tools, you can create bigWig files entry by entry without needing an intermediate wiggle or bedGraph file. Entries in bigWig files are stored in blocks with each entry in a block referring to the same chromosome and having the same type, of which there are three (see the [wiggle specification](http://genome.ucsc.edu/goldenpath/help/wiggle.html) for more information on this).
    #include <stdio.h>
    #include "bigWig.h"
    int main(int argc, char *argv[]) {
        bigWigFile_t *fp = NULL;
        char *chroms[] = {"1", "2"};
        char *chromsUse[] = {"1", "1", "1"};
        uint32_t chrLens[] = {1000000, 1500000};
        uint32_t starts[] = {0, 100, 125,
                             200, 220, 230,
                             500, 600, 625,
                             700, 800, 850};
        uint32_t ends[] = {5, 120, 126,
                           205, 226, 231};
        float values[] = {0.0f, 1.0f, 200.0f,
                          -2.0f, 150.0f, 25.0f,
                          0.0f, 1.0f, 200.0f,
                          -2.0f, 150.0f, 25.0f,
                          -5.0f, -20.0f, 25.0f,
                          -5.0f, -20.0f, 25.0f};
        if(bwInit(1<<17) != 0) {
            fprintf(stderr, "Received an error in bwInit\n");
            return 1;
        }
        fp = bwOpen("example_output.bw", NULL, "w");
        if(!fp) {
            fprintf(stderr, "An error occurred while opening example_output.bw for writing\n");
            return 1;
        }
        //Allow up to 10 zoom levels, though fewer will be used in practice
        if(bwCreateHdr(fp, 10)) goto error;
        //Create the chromosome lists
        fp->cl = bwCreateChromList(chroms, chrLens, 2);
        if(!fp->cl) goto error;
        //Write the header
        if(bwWriteHdr(fp)) goto error;
        //Some example bedGraph-like entries
        if(bwAddIntervals(fp, chromsUse, starts, ends, values, 3)) goto error;
        //We can continue appending similarly formatted entries
        //N.B. you can't append a different chromosome (those always go into different blocks)
        if(bwAppendIntervals(fp, starts+3, ends+3, values+3, 3)) goto error;
        //Add a new block of entries with a span. Since bwAdd/AppendIntervals was just used we MUST create a new block
        if(bwAddIntervalSpans(fp, "1", starts+6, 20, values+6, 3)) goto error;
        //We can continue appending similarly formatted entries
        if(bwAppendIntervalSpans(fp, starts+9, values+9, 3)) goto error;
        //Add a new block of fixed-step entries
        if(bwAddIntervalSpanSteps(fp, "1", 900, 20, 30, values+12, 3)) goto error;
        //The start is then 990, since that's where the previous step ended
        if(bwAppendIntervalSpanSteps(fp, values+15, 3)) goto error;
        //Add a new chromosome
        chromsUse[0] = "2";
        chromsUse[1] = "2";
        chromsUse[2] = "2";
        if(bwAddIntervals(fp, chromsUse, starts, ends, values, 3)) goto error;
        //Closing the file causes the zoom levels to be created
        bwClose(fp);
        bwCleanup();
        return 0;
    error:
        fprintf(stderr, "Received an error somewhere!\n");
        bwClose(fp);
        bwCleanup();
        return 1;
    }
# Testing file types
As of version 0.3.0, this library supports accessing bigBed files, which are related to bigWig files. Applications that need to support both bigWig and bigBed input can use the `bwIsBigWig` and `bbIsBigBed` functions to determine if their inputs are bigWig/bigBed files:
    ...code...
    if(bwIsBigWig(input_file_name, NULL)) {
        //do something
    } else if(bbIsBigBed(input_file_name, NULL)) {
        //do something else
    } else {
        //handle unknown input
    }
Note that these two functions rely on the "magic number" at the beginning of each file, which differs between bigWig and bigBed files.
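For readers curious what a magic-number check looks like, below is a minimal Python sketch of the idea for local files. It is not libBigWig's implementation: the helper names are hypothetical, the two constants are taken from the UCSC format definitions instead of this README, and trying both byte orders is an assumption made as a precaution.

```python
import struct

# Magic numbers as defined by the UCSC bigWig/bigBed formats (not from this README).
BIGWIG_MAGIC = 0x888FFC26
BIGBED_MAGIC = 0x8789F2EB


def classify_magic(raw):
    """Classify the first four bytes of a file as 'bigWig', 'bigBed' or None."""
    if len(raw) < 4:
        return None
    for fmt in ("<I", ">I"):  # little-endian first, then big-endian
        (magic,) = struct.unpack(fmt, raw[:4])
        if magic == BIGWIG_MAGIC:
            return "bigWig"
        if magic == BIGBED_MAGIC:
            return "bigBed"
    return None


def sniff_file_type(path):
    """Read the first four bytes of a local file and classify them."""
    with open(path, "rb") as handle:
        return classify_magic(handle.read(4))


print(classify_magic(struct.pack("<I", BIGWIG_MAGIC)))
print(classify_magic(struct.pack("<I", BIGBED_MAGIC)))
print(classify_magic(b"\x00\x00\x00\x00"))
```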
# bigBed support
Support for accessing bigBed files was added in version 0.3.0. The function names used for accessing bigBed files are similar to those used for bigWig files.
Function | Use
--- | ---
bbOpen | Opens a bigBed file
bbGetSQL | Returns the SQL string (if it exists) in a bigBed file
bbGetOverlappingEntries | Returns all entries overlapping an interval (either with or without their associated strings)
bbDestroyOverlappingEntries | Free memory allocated by the above command
Other functions, such as `bwClose` and `bwInit`, are shared between bigWig and bigBed files. See `test/testBigBed.c` for a full example.
# A note on bigBed entries
Inside bigBed files, entries are stored as chromosome, start, and end coordinates with an (optional) associated string. For example, a "bedRNAElements" file from Encode has name, score, strand, "level", "significance", and "score2" values associated with each entry. These are stored inside the bigBed files as a single tab-separated character vector (char \*), which makes parsing difficult. The names of the various fields inside of bigBed files are stored as an SQL string, for example:
table RnaElements
"BED6 + 3 scores for RNA Elements data "
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
string name; "Name of item"
uint score; "Normalized score from 0-1000"
char[1] strand; "+ or - or . for unknown"
float level; "Expression level such as RPKM or FPKM. Set to -1 for no data."
float signif; "Statistical significance such as IDR. Set to -1 for no data."
uint score2; "Additional measurement/count e.g. number of reads. Set to 0 for no data."
)
Entries will then be of the form (one per line):
59426 115 - 0.021 0.48 218
51 209 + 0.071 0.74 130
52 170 + 0.045 0.61 171
59433 178 - 0.049 0.34 296
53 156 + 0.038 0.19 593
59436 186 - 0.054 0.15 1010
59437 506 - 1.560 0.00 430611
Note that chromosome and start/end intervals are stored separately, so there's no need to parse them out of the string. libBigWig can return these entries, either with or without the above associated strings. Parsing these strings is left to the application requiring them and is currently outside the scope of this library.
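Since parsing is left to the application, here is a minimal sketch of one way to split an entry's tab-separated string into its fields. The helper `splitFields` is hypothetical and not part of libBigWig. It modifies the string in place, so it is probably safest to run it on a copy of the string rather than on memory owned by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libBigWig).
 * Splits a tab-separated string in place, storing up to maxFields
 * pointers in fields[]. Returns the number of fields stored.
 * Empty fields are preserved, unlike with strtok(). */
size_t splitFields(char *str, char **fields, size_t maxFields) {
    size_t n = 0;
    assert(str != NULL && fields != NULL);
    while(n < maxFields) {
        fields[n++] = str;
        str = strchr(str, '\t');
        if(!str) break;  /* last field reached */
        *str++ = '\0';   /* terminate this field, advance to the next */
    }
    return n;
}
```

With the example entries above, field 0 would be `name`, field 1 `score`, and so on, following the order of the fields after `chromEnd` in the SQL string. Numeric fields can then be converted with `strtoul()` or `strtod()` as appropriate.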
# Interval/Entry iterators
Sometimes it is desirable to request a large number of intervals from a bigWig file or entries from a bigBed file, but not hold them all in memory at once (e.g., to save memory). To support this, libBigWig (since version 0.3.0) supports two kinds of iterators. The general process of using iterators is: (1) iterator creation, (2) traversal, and finally (3) iterator destruction. Only iterator creation differs between bigWig and bigBed files.
Importantly, iterators return results by one or more blocks. This is for convenience, since bigWig intervals and bigBed entries are stored together in fixed-size groups, called blocks. The number of blocks of entries returned, therefore, is an option that can be specified to balance performance and memory usage.
## Iterator creation
For bigWig files, iterators are created with the `bwOverlappingIntervalsIterator()` function. This function takes chromosomal bounds (chromosome name, start, and end position) as well as a number of blocks. The equivalent function for bigBed files is `bbOverlappingEntriesIterator()`, which additionally takes a `withString` argument, which dictates whether the returned entries include the associated string values or not.
Each of the aforementioned functions returns a pointer to a `bwOverlapIterator_t` object. The only important parts of this structure for end users are the following members: `entries`, `intervals`, and `data`. `entries` is a pointer to a `bbOverlappingEntries_t` object, or `NULL` if a bigWig file is being used. Likewise, `intervals` is a pointer to a `bwOverlappingIntervals_t` object, or `NULL` if a bigBed file is being used. `data` is a special pointer, used to signify the end of iteration. Thus, when `data` is a `NULL` pointer, iteration has ended.
## Iterator traversal
Regardless of whether a bigWig or bigBed file is being used, the `bwIteratorNext()` function will free currently used memory and load the appropriate intervals or entries for the next block(s). On error, this will return a NULL pointer (memory is already internally freed in this case).
## Iterator destruction
`bwOverlapIterator_t` objects MUST be destroyed after use. This can be done with the `bwIteratorDestroy()` function.
## Example
A full example is provided in `test/testIterator.c`, but a small example of iterating over all bigWig intervals in `chr1:0-10000000` in chunks of 5 blocks follows:
iter = bwOverlappingIntervalsIterator(fp, "chr1", 0, 10000000, 5);
while(iter && iter->data) { //iter is NULL on error
//Do stuff with iter->intervals
iter = bwIteratorNext(iter);
}
if(iter) bwIteratorDestroy(iter);
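To make the create/traverse/destroy protocol concrete without needing a bigWig file, the following self-contained toy mimics it on a plain array of integers. Everything here (`toyIter_t`, `toyIterCreate`, `toyIterNext`, `toyIterDestroy`, `toySum`, and the block size of 4) is hypothetical and only loosely modelled on the description above; it is not how libBigWig is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK_SIZE 4 /* items per block (hypothetical) */

/* Toy stand-in for bwOverlapIterator_t; not a libBigWig type. */
typedef struct {
    const int *items;          /* all items, conceptually stored in blocks */
    size_t nItems;             /* total number of items */
    size_t offset;             /* index of the next unread item */
    size_t blocksPerIteration; /* blocks returned per iteration */
    const int *chunk;          /* items of the current chunk */
    size_t chunkLen;           /* number of items in the current chunk */
    const void *data;          /* NULL once iteration has ended */
} toyIter_t;

/* Load the next chunk; returns iter, with data == NULL at the end. */
toyIter_t *toyIterNext(toyIter_t *iter) {
    size_t want, left;
    assert(iter != NULL);
    want = iter->blocksPerIteration * TOY_BLOCK_SIZE;
    left = iter->nItems - iter->offset;
    iter->chunkLen = (left < want) ? left : want;
    iter->chunk = iter->chunkLen ? iter->items + iter->offset : NULL;
    iter->data = iter->chunk;
    iter->offset += iter->chunkLen;
    return iter;
}

/* Create an iterator and load the first chunk; NULL on error. */
toyIter_t *toyIterCreate(const int *items, size_t nItems, size_t blocksPerIteration) {
    toyIter_t *iter;
    if(!items || !blocksPerIteration) return NULL;
    iter = calloc(1, sizeof(toyIter_t));
    if(!iter) return NULL;
    iter->items = items;
    iter->nItems = nItems;
    iter->blocksPerIteration = blocksPerIteration;
    return toyIterNext(iter);
}

void toyIterDestroy(toyIter_t *iter) {
    free(iter);
}

/* Sum all items one chunk at a time, counting the chunks visited. */
long toySum(const int *items, size_t nItems, size_t blocksPerIteration, size_t *nChunks) {
    long total = 0;
    toyIter_t *iter = toyIterCreate(items, nItems, blocksPerIteration);
    *nChunks = 0;
    while(iter && iter->data) {
        for(size_t i = 0; i < iter->chunkLen; i++) total += iter->chunk[i];
        (*nChunks)++;
        iter = toyIterNext(iter);
    }
    toyIterDestroy(iter);
    return total;
}
```

In this toy, a larger `blocksPerIteration` means fewer, larger chunks, which is the same performance versus memory trade-off described for the real iterators.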
# A note on bigWig statistics
The results of `min`, `max`, and `mean` should be the same as those from `BigWigSummary`. `stdev` and `coverage`, however, may differ due to Kent's tools producing incorrect results (at least for `coverage`, though the same appears to be the case for `stdev`).
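When comparing results between tools, it can help to have a plain reference for what these statistics mean on a per-base level. The sketch below computes the sum, mean, minimum, maximum, and number of covered bases directly from a list of non-overlapping intervals clipped to a query range. The type `toyStats_t` and function `toyStats` are hypothetical; this is not necessarily how libBigWig or Kent's tools compute these values (for instance, zoom levels are ignored here).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type (not part of libBigWig). */
typedef struct {
    uint32_t covered; /* number of bases with a value */
    double sum;       /* sum of per-base values */
    double mean;      /* sum / covered, or 0 if nothing is covered */
    double min, max;  /* extremes over covered bases */
} toyStats_t;

/* Per-base statistics over [qStart, qEnd) from n non-overlapping
 * 0-based half-open intervals [start[i], end[i]) with values value[i]. */
toyStats_t toyStats(const uint32_t *start, const uint32_t *end, const float *value,
                    uint32_t n, uint32_t qStart, uint32_t qEnd) {
    toyStats_t s = {0, 0.0, 0.0, 0.0, 0.0};
    assert(qStart <= qEnd);
    for(uint32_t i = 0; i < n; i++) {
        /* Clip the interval to the query range */
        uint32_t a = start[i] > qStart ? start[i] : qStart;
        uint32_t b = end[i] < qEnd ? end[i] : qEnd;
        if(a >= b) continue; /* no overlap with the query */
        if(s.covered == 0 || value[i] < s.min) s.min = value[i];
        if(s.covered == 0 || value[i] > s.max) s.max = value[i];
        s.covered += b - a;
        s.sum += (double) value[i] * (b - a);
    }
    if(s.covered) s.mean = s.sum / s.covered;
    return s;
}
```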
# Python interface
There are currently two Python interfaces that make use of libBigWig: [pyBigWig](https://github.com/dpryan79/pyBigWig) by me and [bw-python](https://github.com/brentp/bw-python) by Brent Pedersen. Those interested are encouraged to give both a try!
================================================
FILE: libBigWig/bigWig.h
================================================
#ifndef LIBBIGWIG_H
#define LIBBIGWIG_H
#include "bigWigIO.h"
#include "bwValues.h"
#include <inttypes.h>
#include <zlib.h>
#ifdef __cplusplus
extern "C" {
#endif
/*! \mainpage libBigWig
*
* \section Introduction
*
* libBigWig is a C library for parsing local/remote bigWig and bigBed files. This is similar to Kent's library from UCSC, except
* * The license is much more liberal
* * This code doesn't call `exit()` on error, which would kill the calling application.
*
* External files are accessed using [curl](http://curl.haxx.se/).
*
* Please submit issues and pull requests [here](https://github.com/dpryan79/libBigWig).
*
* \section Compilation
*
* Assuming you already have the curl libraries installed (not just the curl binary!):
*
* make install prefix=/some/path
*
* \section Writing bigWig files
*
* There are three methods for storing values in a bigWig file, further described in the [wiggle format](http://genome.ucsc.edu/goldenpath/help/wiggle.html). The entries within the file are grouped into "blocks" and each such block is limited to storing entries of a single type. So, it is unwise to use a single bedGraph-like entry followed by a single fixed-step entry followed by a variable-step entry, as that would require three separate blocks, with additional space required for each.
*
* \section Testing file types
*
* As of version 0.3.0, libBigWig supports reading bigBed files. If an application needs to support both bigBed and bigWig input, then the `bwIsBigWig` and `bbIsBigBed` functions can be used to determine the file type. These both use the "magic" number at the beginning of the file to determine the file type.
*
* \section Interval and entry iterators
*
* As of version 0.3.0, libBigWig supports iterating over intervals in bigWig files and entries in bigBed files. The number of intervals/entries returned with each iteration can be controlled by setting the number of blocks processed in each iteration (intervals and entries are grouped inside of bigWig and bigBed files into blocks of entries). See `test/testIterator.c` for an example.
*
* \section Examples
*
* Please see [README.md](README.md) and the files under `test/` for examples.
*/
/*! \file bigWig.h
*
* These are the functions and structures that should be used by external users. While I don't particularly recommend dealing with some of the structures (e.g., a bigWigHdr_t), they're described here in case you need them.
*
* BTW, this library doesn't switch endianness as appropriate, since I kind of assume that there's only one type produced these days.
*/
/*!
* The library version number
*/
#define LIBBIGWIG_VERSION 0.4.8
/*!
* If 1, then this library was compiled with remote file support.
*/
#ifdef NOCURL
#define LIBBIGWIG_CURL 0
#ifndef CURLTYPE_DEFINED
#define CURLTYPE_DEFINED
typedef int CURLcode;
typedef void CURL;
#endif
#else
#define LIBBIGWIG_CURL 1
#endif
/*!
* The magic number of a bigWig file.
*/
#define BIGWIG_MAGIC 0x888FFC26
/*!
* The magic number of a bigBed file.
*/
#define BIGBED_MAGIC 0x8789F2EB
/*!
* The magic number of a "cirTree" block in a file.
*/
#define CIRTREE_MAGIC 0x78ca8c91
/*!
* The magic number of an index block in a file.
*/
#define IDX_MAGIC 0x2468ace0
/*!
* The default number of children per block.
*/
#define DEFAULT_nCHILDREN 64
/*!
* The default decompression buffer size in bytes. This is used to determin
*/
#define DEFAULT_BLOCKSIZE 32768
/*!
* An enum that dictates the type of statistic to fetch for a given interval
*/
enum bwStatsType {
doesNotExist = -1, /*!< This does nothing */
mean = 0, /*!< The mean value */
average = 0, /*!< The mean value */
stdev = 1, /*!< The standard deviation of the values */
dev = 1, /*!< The standard deviation of the values */
max = 2, /*!< The maximum value */
min = 3, /*!< The minimum value */
cov = 4, /*!< The number of bases covered */
coverage = 4, /*!<The number of bases covered */
sum = 5 /*!< The sum of per-base values */
};
//Should hide this from end users
/*!
* @brief BigWig files have multiple "zoom" levels, each of which has its own header. This holds those headers
*
* N.B., there's 4 bytes of padding in the on disk representation of level and dataOffset.
*/
typedef struct {
uint32_t *level; /**<The zoom level, which is an integer starting with 0.*/
//There's 4 bytes of padding between these
uint64_t *dataOffset; /**<The offset to the on-disk start of the data. This isn't used currently.*/
uint64_t *indexOffset; /**<The offset to the on-disk start of the index. This *is* used.*/
bwRTree_t **idx; /**<Index for each zoom level. Represented as a tree*/
} bwZoomHdr_t;
/*!
* @brief The header section of a bigWig file.
*
* Some of the values aren't currently used for anything. Others may optionally not exist.
*/
typedef struct {
uint16_t version; /**<The version information of the file.*/
uint16_t nLevels; /**<The number of "zoom" levels.*/
uint64_t ctOffset; /**<The offset to the on-disk chromosome tree list.*/
uint64_t dataOffset; /**<The on-disk offset to the first block of data.*/
uint64_t indexOffset; /**<The on-disk offset to the data index.*/
uint16_t fieldCount; /**<Total number of fields.*/
uint16_t definedFieldCount; /**<Number of fixed-format BED fields.*/
uint64_t sqlOffset; /**<The on-disk offset to an SQL string. This is unused.*/
uint64_t summaryOffset; /**<If there's a summary, this is the offset to it on the disk.*/
uint32_t bufSize; /**<The compression buffer size (if the data is compressed).*/
uint64_t extensionOffset; /**<Unused*/
bwZoomHdr_t *zoomHdrs; /**<Pointers to the header for each zoom level.*/
//total Summary
uint64_t nBasesCovered; /**<The total bases covered in the file.*/
double minVal; /**<The minimum value in the file.*/
double maxVal; /**<The maximum value in the file.*/
double sumData; /**<The sum of all values in the file.*/
double sumSquared; /**<The sum of the squared values in the file.*/
} bigWigHdr_t;
//Should probably replace this with a hash
/*!
* @brief Holds the chromosomes and their lengths
*/
typedef struct {
int64_t nKeys; /**<The number of chromosomes */
char **chrom; /**<A list of null terminated chromosomes */
uint32_t *len; /**<The lengths of each chromosome */
} chromList_t;
//TODO remove from bigWig.h
/// @cond SKIP
typedef struct bwLL bwLL;
struct bwLL {
bwRTreeNode_t *node;
struct bwLL *next;
};
typedef struct bwZoomBuffer_t bwZoomBuffer_t;
struct bwZoomBuffer_t { //each individual entry takes 32 bytes
void *p;
uint32_t l, m;
struct bwZoomBuffer_t *next;
};
/// @endcond
/*!
* @brief This is only needed for writing bigWig files (and won't be created otherwise)
* This should be removed from bigWig.h
*/
typedef struct {
uint64_t nBlocks; /**<The number of blocks written*/
uint32_t blockSize; /**<The maximum number of children*/
uint64_t nEntries; /**<The number of entries processed. This is used for the first contig and determining how the zoom levels are computed*/
uint64_t runningWidthSum; /**<The running sum of the entry widths for the first contig (again, used for the first contig and computing zoom levels)*/
uint32_t tid; /**<The current TID that's being processed*/
uint32_t start; /**<The start position of the block*/
uint32_t end; /**<The end position of the block*/
uint32_t span; /**<The span of each entry, if applicable*/
uint32_t step; /**<The step size, if applicable*/
uint8_t ltype; /**<The type of the last entry added*/
uint32_t l; /**<The current size of p. This and the type determine the number of items held*/
void *p; /**<A buffer of size hdr->bufSize*/
bwLL *firstIndexNode; /**<The first index node in the linked list*/
bwLL *currentIndexNode; /**<The last index node in a linked list*/
bwZoomBuffer_t **firstZoomBuffer; /**<The first node in a linked list of leaf nodes*/
bwZoomBuffer_t **lastZoomBuffer; /**<The last node in a linked list of leaf nodes*/
uint64_t *nNodes; /**<The number of leaf nodes per zoom level, useful for determining duplicate levels*/
uLongf compressPsz; /**<The size of the compression buffer*/
void *compressP; /**<A compressed buffer of size compressPsz*/
} bwWriteBuffer_t;
/*!
* @brief A structure that holds everything needed to access a bigWig file.
*/
typedef struct {
URL_t *URL; /**<A pointer that can handle both local and remote files (including a buffer if needed).*/
bigWigHdr_t *hdr; /**<The file header.*/
chromList_t *cl; /**<A list of chromosome names (the order is the ID).*/
bwRTree_t *idx; /**<The index for the full dataset.*/
bwWriteBuffer_t *writeBuffer; /**<The buffer used for writing.*/
int isWrite; /**<0: Opened for reading, 1: Opened for writing.*/
int type; /**<0: bigWig, 1: bigBed.*/
} bigWigFile_t;
/*!
* @brief Holds interval:value associations
*/
typedef struct {
uint32_t l; /**<Number of intervals held*/
uint32_t m; /**<Maximum number of values/intervals the struct can hold*/
uint32_t *start; /**<The start positions (0-based half open)*/
uint32_t *end; /**<The end positions (0-based half open)*/
float *value; /**<The value associated with each position*/
} bwOverlappingIntervals_t;
/*!
* @brief Holds interval:str associations
*/
typedef struct {
uint32_t l; /**<Number of intervals held*/
uint32_t m; /**<Maximum number of values/intervals the struct can hold*/
uint32_t *start; /**<The start positions (0-based half open)*/
uint32_t *end; /**<The end positions (0-based half open)*/
char **str; /**<The strings associated with a given entry.*/
} bbOverlappingEntries_t;
/*!
* @brief A structure to hold iterations
* One of intervals and entries should be used to access records from bigWig or bigBed files, respectively.
*/
typedef struct {
bigWigFile_t *bw; /**<Pointer to the bigWig/bigBed file.*/
uint32_t tid; /**<The contig/chromosome ID.*/
uint32_t start; /**<Start position of the query interval.*/
uint32_t end; /**<End position of the query interval.*/
uint64_t offset; /**<Offset into the blocks.*/
uint32_t blocksPerIteration; /**<Number of blocks to use per iteration.*/
int withString; /**<For bigBed entries, whether to return the string with the entries.*/
void *blocks; /**<Overlapping blocks.*/
bwOverlappingIntervals_t *intervals; /**<Overlapping intervals (or NULL).*/
bbOverlappingEntries_t *entries; /**<Overlapping entries (or NULL).*/
void *data; /**<Points to either intervals or entries. If there are no further intervals/entries, then this is NULL. Use this to test for whether to continue iterating.*/
} bwOverlapIterator_t;
/*!
* @brief Initializes curl and global variables. This *MUST* be called before other functions (at least if you want to connect to remote files).
* For remote files, curl must be initialized and regions of a file read into an internal buffer. If the buffer is too small, then an excessive number of connections will be made. If the buffer is too large, then more data than required is fetched. 128KiB is likely sufficient for most needs.
* @param bufSize The internal buffer size used for remote connection.
* @see bwCleanup
* @return 0 on success and 1 on error.
*/
int bwInit(size_t bufSize);
/*!
* @brief The counterpart to bwInit, this cleans up curl.
* @see bwInit
*/
void bwCleanup(void);
/*!
* @brief Determine if a file is a bigWig file.
* This function will quickly check either local or remote files to determine if they appear to be valid bigWig files. This can be determined by reading the first 4 bytes of the file.
* @param fname The file name or URL (http, https, and ftp are supported)
* @param callBack An optional user-supplied function. This is applied to remote connections so users can specify things like proxy and password information. See `test/testRemote` for an example.
* @return 1 if the file appears to be bigWig, otherwise 0.
*/
int bwIsBigWig(const char *fname, CURLcode (*callBack)(CURL*));
/*!
* @brief Determine if a file is a bigBed file.
* This function will quickly check either local or remote files to determine if they appear to be valid bigBed files. This can be determined by reading the first 4 bytes of the file.
* @param fname The file name or URL (http, https, and ftp are supported)
* @param callBack An optional user-supplied function. This is applied to remote connections so users can specify things like proxy and password information. See `test/testRemote` for an example.
* @return 1 if the file appears to be bigBed, otherwise 0.
*/
int bbIsBigBed(const char *fname, CURLcode (*callBack)(CURL*));
/*!
* @brief Opens a local or remote bigWig file.
* This will open a local or remote bigWig file. Writing of local bigWig files is also supported.
* @param fname The file name or URL (http, https, and ftp are supported)
* @param callBack An optional user-supplied function. This is applied to remote connections so users can specify things like proxy and password information. See `test/testRemote` for an example.
* @param mode The mode, by default "r". Both local and remote files can be read, but only local files can be written. For files being written the callback function is ignored. If and only if the mode contains "w" will the file be opened for writing (in all other cases the file will be opened for reading).
* @return A bigWigFile_t * on success and NULL on error.
*/
bigWigFile_t *bwOpen(const char *fname, CURLcode (*callBack)(CURL*), const char* mode);
/*!
* @brief Opens a local or remote bigBed file.
* This will open a local or remote bigBed file. Note that this file format can only be read and NOT written!
* @param fname The file name or URL (http, https, and ftp are supported)
* @param callBack An optional user-supplied function. This is applied to remote connections so users can specify things like proxy and password information. See `test/testRemote` for an example.
* @return A bigWigFile_t * on success and NULL on error.
*/
bigWigFile_t *bbOpen(const char *fname, CURLcode (*callBack)(CURL*));
/*!
* @brief Returns a string containing the SQL entry (or NULL).
* The "auto SQL" field contains the names and value types of the entries in
* each bigBed entry. If you need to parse a particular value out of each entry,
* then you'll need to first parse this.
* @param fp The file pointer to a valid bigWigFile_t
* @return A char *, which you MUST free!
*/
char *bbGetSQL(bigWigFile_t *fp);
/*!
* @brief Closes a bigWigFile_t and frees up allocated memory
* This closes both bigWig and bigBed files.
* @param fp The file pointer.
*/
void bwClose(bigWigFile_t *fp);
/*******************************************************************************
*
* The following are in bwStats.c
*
*******************************************************************************/
/*!
* @brief Converts between chromosome name and ID
*
* @param fp A valid bigWigFile_t pointer
* @param chrom A chromosome name
* @return An ID, or -1 on error. Note that the return type is unsigned, so -1 is actually ~4 billion; bigWig/bigBed files can't store that many chromosomes anyway.
*/
uint32_t bwGetTid(const bigWigFile_t *fp, const char *chrom);
/*!
* @brief Frees space allocated by `bwGetOverlappingIntervals`
* @param o A valid `bwOverlappingIntervals_t` pointer.
* @see bwGetOverlappingIntervals
*/
void bwDestroyOverlappingIntervals(bwOverlappingIntervals_t *o);
/*!
* @brief Frees space allocated by `bbGetOverlappingEntries`
* @param o A valid `bbOverlappingEntries_t` pointer.
* @see bbGetOverlappingEntries
*/
void bbDestroyOverlappingEntries(bbOverlappingEntries_t *o);
/*!
* @brief Return bigWig entries overlapping an interval.
* Find all bigWig entries overlapping a range and returns them, including their associated values.
* @param fp A valid bigWigFile_t pointer. This MUST be for a bigWig file!
* @param chrom A valid chromosome name.
* @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
* @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
* @return NULL on error or no overlapping values, otherwise a `bwOverlappingIntervals_t *` holding the values and intervals.
* @see bwOverlappingIntervals_t
* @see bwDestroyOverlappingIntervals
* @see bwGetValues
*/
bwOverlappingIntervals_t *bwGetOverlappingIntervals(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end);
/*!
* @brief Return bigBed entries overlapping an interval.
* Find all bigBed entries overlapping a range and returns them.
* @param fp A valid bigWigFile_t pointer. This MUST be for a bigBed file!
* @param chrom A valid chromosome name.
* @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
* @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
* @param withString If not 0, return the string associated with each entry in the output. If 0, there are no associated strings returned. This is useful if the only information needed are the locations of the entries, which require significantly less memory.
* @return NULL on error or no overlapping values, otherwise a `bbOverlappingEntries_t *` holding the intervals and (optionally) the associated string.
* @see bbOverlappingEntries_t
* @see bbDestroyOverlappingEntries
*/
bbOverlappingEntries_t *bbGetOverlappingEntries(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, int withString);
/*!
* @brief Creates an iterator over intervals in a bigWig file
* Iterators can be traversed with `bwIteratorNext()` and destroyed with `bwIteratorDestroy()`.
* Intervals are in the `intervals` member and `data` can be used to determine when to end iteration.
* @param fp A valid bigWigFile_t pointer. This MUST be for a bigWig file!
* @param chrom A valid chromosome name.
* @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
* @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
* @param blocksPerIteration The number of blocks (internal groupings of intervals in bigWig files) to return per iteration.
* @return NULL on error, otherwise a bwOverlapIterator_t pointer
* @see bwOverlapIterator_t
* @see bwIteratorNext
* @see bwIteratorDestroy
*/
bwOverlapIterator_t *bwOverlappingIntervalsIterator(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, uint32_t blocksPerIteration);
/*!
* @brief Creates an iterator over entries in a bigBed file
* Iterators can be traversed with `bwIteratorNext()` and destroyed with `bwIteratorDestroy()`.
* Entries are in the `entries` member and `data` can be used to determine when to end iteration.
* @param fp A valid bigWigFile_t pointer. This MUST be for a bigBed file!
* @param chrom A valid chromosome name.
* @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
* @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
* @param withString Whether the returned entries should include their associated strings.
* @param blocksPerIteration The number of blocks (internal groupings of entries in bigBed files) to return per iteration.
* @return NULL on error, otherwise a bwOverlapIterator_t pointer
* @see bbGetOverlappingEntries
* @see bwOverlapIterator_t
* @see bwIteratorNext
* @see bwIteratorDestroy
*/
bwOverlapIterator_t *bbOverlappingEntriesIterator(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, int withString, uint32_t blocksPerIteration);
/*!
* @brief Traverses to the entries/intervals in the next group of blocks.
* @param iter A bwOverlapIterator_t pointer that is updated (or destroyed on error)
* @return NULL on error, otherwise a bwOverlapIterator_t pointer with the intervals or entries from the next set of blocks.
* @see bwOverlapIterator_t
* @see bwIteratorDestroy
*/
bwOverlapIterator_t *bwIteratorNext(bwOverlapIterator_t *iter);
/*!
* @brief Destroys a bwOverlapIterator_t
* @param iter The bwOverlapIterator_t that should be destroyed
*/
void bwIteratorDestroy(bwOverlapIterator_t *iter);
/*!
* @brief Return all per-base bigWig values in a given interval.
* Given an interval (e.g., chr1:0-100), return the value at each position in a bigWig file. Positions without associated values are suppressed by default, but may be returned if `includeNA` is not 0.
* @param fp A valid bigWigFile_t pointer.
* @param chrom A valid chromosome name.
* @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
* @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
* @param includeNA If not 0, report NA values as well (as NA).
* @return NULL on error or no overlapping values, otherwise a `bwOverlappingIntervals_t *` holding the values and positions.
* @see bwOverlappingIntervals_t
* @see bwDestroyOverlappingIntervals
* @see bwGetOverlappingIntervals
*/
bwOverlappingIntervals_t *bwGetValues(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, int includeNA);
/*!
* @brief Determines per-interval bigWig statistics
* Can determine mean/min/max/coverage/standard deviation of values in one or more intervals in a bigWig file. You can optionally give it an interval and ask for values from X number of sub-intervals.
* @param fp The file from which to extract statistics.
* @param chrom A valid chromosome name.
* @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
* @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
* @param nBins The number of bins within the interval to calculate statistics for.
* @param type The type of statistic.
* @see bwStatsType
* @return A pointer to an array of double precision floating point values. Note that bigWig files only hold 32-bit values, so this is done to help prevent overflows.
*/
double *bwStats(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, uint32_t nBins, enum bwStatsType type);
/*!
* @brief Determines per-interval bigWig statistics
* Can determine mean/min/max/coverage/standard deviation of values in one or more intervals in a bigWig file. You can optionally give it an interval and ask for values from X number of sub-intervals. The difference with bwStats is that zoom levels are never used.
* @param fp The file from which to extract statistics.
* @param chrom A valid chromosome name.
* @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
* @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
* @param nBins The number of bins within the interval to calculate statistics for.
* @param type The type of statistic.
* @see bwStatsType
* @return A pointer to an array of double precision floating point values. Note that bigWig files only hold 32-bit values, so this is done to help prevent overflows.
*/
double *bwStatsFromFull(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, uint32_t nBins, enum bwStatsType type);
//Writer functions
/*!
* @brief Create a largely empty bigWig header
* Every bigWig file has a header, this creates the template for one. It also takes care of space allocation in the output write buffer.
* @param fp The bigWigFile_t* that you want to write to.
* @param maxZooms The maximum number of zoom levels. If you specify 0 then there will be no zoom levels. A value <0 or > 65535 will result in a maximum of 10.
* @return 0 on success.
*/
int bwCreateHdr(bigWigFile_t *fp, int32_t maxZooms);
/*!
* @brief Take a list of chromosome names and lengths and return a pointer to a chromList_t
* This MUST be run before `bwWriteHdr()`. Note that the input is NOT free()d!
* @param chroms A list of chromosomes.
* @param lengths The length of each chromosome.
* @param n The number of chromosomes (thus, the length of `chroms` and `lengths`)
* @return A pointer to a chromList_t or NULL on error.
*/
chromList_t *bwCreateChromList(const char* const* chroms, const uint32_t *lengths, int64_t n);
/*!
* @brief Write the header to a bigWig file.
* You must have already opened the output file, created a header and a chromosome list.
* @param bw The output bigWigFile_t pointer.
* @see bwCreateHdr
* @see bwCreateChromList
*/
int bwWriteHdr(bigWigFile_t *bw);
/*!
* @brief Write a new block of bedGraph-like intervals to a bigWig file
* Adds entries of the form:
* chromosome start end value
* to the file. These will always be added in a new block, so you may have previously used a different storage type.
*
* In general it's more efficient to use the bwAppend* functions, but then you MUST know that the previously written block is of the same type. In other words, you can only use bwAppendIntervals() after bwAddIntervals() or a previous bwAppendIntervals().
* @param fp The output file pointer.
* @param chrom A list of chromosomes, of length `n`.
* @param start A list of start positions of length `n`.
* @param end A list of end positions of length `n`.
* @param values A list of values of length `n`.
* @param n The length of the aforementioned lists.
* @return 0 on success and another value on error.
* @see bwAppendIntervals
*/
int bwAddIntervals(bigWigFile_t *fp, const char* const* chrom, const uint32_t *start, const uint32_t *end, const float *values, uint32_t n);
/*!
* @brief Append bedGraph-like intervals to a previous block of bedGraph-like intervals in a bigWig file.
* If you have previously used bwAddIntervals() then this will append additional entries into the previous block (or start a new one if needed).
* @param fp The output file pointer.
* @param start A list of start positions of length `n`.
* @param end A list of end positions of length `n`.
* @param values A list of values of length `n`.
* @param n The length of the aforementioned lists.
* @return 0 on success and another value on error.
* @warning Do NOT use this after `bwAddIntervalSpans()`, `bwAppendIntervalSpans()`, `bwAddIntervalSpanSteps()`, or `bwAppendIntervalSpanSteps()`.
* @see bwAddIntervals
*/
int bwAppendIntervals(bigWigFile_t *fp, const uint32_t *start, const uint32_t *end, const float *values, uint32_t n);
/*!
* @brief Add a new block of variable-step entries to a bigWig file
* Adds entries of the form
* chromosome start value
* to the file. Each block of such entries has an associated "span", so each value describes the region chromosome:start-(start+span)
*
* This will always start a new block of values.
* @param fp The output file pointer.
* @param chrom The chromosome that the entries describe.
* @param start A list of start positions of length `n`.
* @param span The span of each entry (they must all be the same).
* @param values A list of values of length `n`.
* @param n The length of the aforementioned lists.
* @return 0 on success and another value on error.
* @see bwAppendIntervalSpans
*/
int bwAddIntervalSpans(bigWigFile_t *fp, const char *chrom, const uint32_t *start, uint32_t span, const float *values, uint32_t n);
/*!
* @brief Append to a previous block of variable-step entries.
* If you previously used `bwAddIntervalSpans()`, this will continue appending more values to the block(s) it created.
* @param fp The output file pointer.
* @param start A list of start positions of length `n`.
* @param values A list of values of length `n`.
* @param n The length of the aforementioned lists.
* @return 0 on success and another value on error.
* @warning Do NOT use this after `bwAddIntervals()`, `bwAppendIntervals()`, `bwAddIntervalSpanSteps()` or `bwAppendIntervalSpanSteps()`
* @see bwAddIntervalSpans
*/
int bwAppendIntervalSpans(bigWigFile_t *fp, const uint32_t *start, const float *values, uint32_t n);
/*!
* @brief Add a new block of fixed-step entries to a bigWig file
* Adds entries of the form
* value
* to the file. Each block of such entries has an associated "span", "step", chromosome and start position. See the wiggle format for more details.
*
* This will always start a new block of values.
* @param fp The output file pointer.
* @param chrom The chromosome that the entries describe.
* @param start The starting position of the block of entries.
* @param span The span of each entry (i.e., the number of bases it describes).
* @param step The step between entry start positions.
* @param values A list of values of length `n`.
* @param n The length of the aforementioned list.
* @return 0 on success and another value on error.
* @see bwAppendIntervalSpanSteps
*/
int bwAddIntervalSpanSteps(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t span, uint32_t step, const float *values, uint32_t n);
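/*
* Example, not part of the libBigWig API: a minimal sketch of how fixed-step
* entries map to ranges, assuming the usual wiggle convention that entry i
* starts at start + i*step and covers span bases. The function name
* example_expand_fixed_step() is hypothetical.
*
```c
#include <stdint.h>

// Computes the start and end of each of n fixed-step entries.
// Returns 0 on success and 1 if a position would not fit in uint32_t.
static int example_expand_fixed_step(uint32_t start, uint32_t span, uint32_t step, uint32_t n,
                                     uint32_t *starts, uint32_t *ends) {
    uint32_t i;
    for(i = 0; i < n; i++) {
        // Use 64-bit arithmetic so that overflow can be detected
        uint64_t s = (uint64_t) start + (uint64_t) i * step;
        uint64_t e = s + span;
        if(e > UINT32_MAX) return 1;
        starts[i] = (uint32_t) s;
        ends[i] = (uint32_t) e;
    }
    return 0;
}
```
*
* With start=10, span=5 and step=20, three entries cover 10-15, 30-35 and 50-55.
*/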
/*!
* @brief Append to a previous block of fixed-step entries.
* If you previously used `bwAddIntervalSpanSteps()`, this will continue appending more values to the block(s) it created.
* @param fp The output file pointer.
* @param values A list of values of length `n`.
* @param n The length of the aforementioned list.
* @return 0 on success and another value on error.
* @warning Do NOT use this after `bwAddIntervals()`, `bwAppendIntervals()`, `bwAddIntervalSpans()` or `bwAppendIntervalSpans()`
* @see bwAddIntervalSpanSteps
*/
int bwAppendIntervalSpanSteps(bigWigFile_t *fp, const float *values, uint32_t n);
#ifdef __cplusplus
}
#endif
#endif // LIBBIGWIG_H
================================================
FILE: libBigWig/bigWigIO.h
================================================
#ifndef LIBBIGWIG_IO_H
#define LIBBIGWIG_IO_H
#ifndef NOCURL
#include <curl/curl.h>
#else
#include <stdio.h>
#ifndef CURLTYPE_DEFINED
#define CURLTYPE_DEFINED
typedef int CURLcode;
typedef void CURL;
#endif
#define CURLE_OK 0
#define CURLE_FAILED_INIT 1
#endif
/*! \file bigWigIO.h
* These are (typically internal) IO functions, so there's generally no need for you to directly use them!
*/
/*!
* The size of the buffer used for remote files.
*/
extern size_t GLOBAL_DEFAULTBUFFERSIZE;
/*!
* The enumerated values that indicate the connection type used to access a file.
*/
enum bigWigFile_type_enum {
BWG_FILE = 0,
BWG_HTTP = 1,
BWG_HTTPS = 2,
BWG_FTP = 3
};
/*!
* @brief This structure holds the file pointers and buffers needed for raw access to local and remote files.
*/
typedef struct {
union {
#ifndef NOCURL
CURL *curl; /**<The CURL * file pointer for remote files.*/
#endif
FILE *fp; /**<The FILE * file pointer for local files.**/
} x; /**<A union holding curl and fp.*/
void *memBuf; /**<A void * pointing to memory of size bufSize.*/
size_t filePos; /**<Current position inside the file.*/
size_t bufPos; /**<Current position inside the buffer.*/
size_t bufSize; /**<The size of the buffer.*/
size_t bufLen; /**<The actual size of the buffer used.*/
enum bigWigFile_type_enum type; /**<The connection type*/
int isCompressed; /**<1 if the file is compressed, otherwise 0*/
const char *fname; /**<Only needed for remote connections. The original URL/filename requested, since we need to make multiple connections.*/
} URL_t;
/*!
* @brief Reads data into the given buffer.
*
* This function will store bufSize bytes of data into buf for both local and remote files. For remote files an internal buffer is used to store a (typically larger) segment of the remote file.
*
* @param URL A URL_t * pointing to a valid opened file or remote URL.
* @param buf The buffer in memory that you would like filled. It must be able to hold bufSize bytes!
* @param bufSize The number of bytes to transfer to buf.
*
* @return Returns the number of bytes stored in buf, which should be bufSize on success and something else on error.
*
* @warning Note that on error, URL for remote files is left in an unusable state. You can get around this by running urlSeek() to a position outside of the range held by the internal buffer.
*/
size_t urlRead(URL_t *URL, void *buf, size_t bufSize);
/*!
* @brief Seeks to a given position in a local or remote file.
*
* For local files, this will set the file position indicator for the file pointer to the desired position. For remote files, it sets the position to start downloading data for the next urlRead(). Note that for remote files, running urlSeek() with a pos within the current buffer will simply modify the internal offset.
*
* @param URL A URL_t * pointing to a valid opened file or remote URL.
* @param pos The position to seek to.
*
* @return CURLE_OK on success and a different CURLE_XXX on error. For local files, the error return value is always CURLE_FAILED_INIT
*/
CURLcode urlSeek(URL_t *URL, size_t pos);
/*!
* @brief Open a local or remote file
*
* Opens a local or remote file. Currently, http, https, and ftp are the only supported protocols and the URL must then begin with "http://", "https://", or "ftp://" as appropriate.
*
* For remote files, an internal buffer is used to hold file contents, to avoid downloading entire files before starting. The size of this buffer and various variables related to connection timeout are set with bwInit().
*
* Note that you **must** run urlClose() on this when finished. However, you would typically just use bwOpen() rather than directly calling this function.
*
* @param fname The file name or URL to open.
* @param callBack An optional user-supplied function. This is applied to remote connections so users can specify things like proxy and password information.
* @param mode "r", "w" or NULL. If and only if the mode contains the character "w" will the file be opened for writing.
*
* @return A URL_t * or NULL on error.
*/
URL_t *urlOpen(const char *fname, CURLcode (*callBack)(CURL*), const char* mode);
/*!
* @brief Close a local/remote file
*
* This will perform the cleanup required on a URL_t*, releasing memory as needed.
*
* @param URL A URL_t * pointing to a valid opened file or remote URL.
*
* @warning URL will no longer point to a valid location in memory!
*/
void urlClose(URL_t *URL);
#endif // LIBBIGWIG_IO_H
================================================
FILE: libBigWig/bwCommon.h
================================================
/*! \file bwCommon.h
*
* You have no reason to use these functions. They may change without warning because there's no reason for them to be used outside of libBigWig's internals.
*
* These are structures and functions from a variety of files that are used across files internally but don't need to be seen by libBigWig users.
*/
/*!
* @brief Like fsetpos, but for local or remote bigWig files.
* This will set the file position indicator to the specified point. For local files this literally is `fsetpos`, while for remote files it fills a memory buffer with data starting at the desired position.
* @param fp A valid opened bigWigFile_t.
* @param pos The position within the file to seek to.
* @return 0 on success and -1 on error.
*/
int bwSetPos(bigWigFile_t *fp, size_t pos);
/*!
* @brief A local/remote version of `fread`.
* Reads data from either local or remote bigWig files.
* @param data An allocated memory block big enough to hold the data.
* @param sz The size of each member that should be copied.
* @param nmemb The number of members to copy.
* @param fp The bigWigFile_t * from which to copy the data.
* @see bwSetPos
* @return The number of members fully copied, as with `fread` (so 1 on success when nmemb==1, and something less than nmemb on error).
*/
size_t bwRead(void *data, size_t sz, size_t nmemb, bigWigFile_t *fp);
/*!
* @brief Determine what the file position indicator says.
* This is equivalent to `ftell` for local or remote files.
* @param fp The file.
* @return The position in the file.
*/
long bwTell(bigWigFile_t *fp);
/*!
* @brief Reads a data index (either full data or a zoom level) from a bigWig file.
* There is little reason for end users to use this function. This must be freed with `bwDestroyIndex`
* @param fp A valid bigWigFile_t pointer
* @param offset The file offset where the index begins
* @return A bwRTree_t pointer or NULL on error.
*/
bwRTree_t *bwReadIndex(bigWigFile_t *fp, uint64_t offset);
/*!
* @brief Destroy a bwRTreeNode_t and all of its children.
* @param node The node to destroy.
*/
void bwDestroyIndexNode(bwRTreeNode_t *node);
/*!
* @brief Frees space allocated by `bwReadIndex`
* There is generally little reason to use this, since end users should typically not need to run `bwReadIndex` themselves.
* @param idx A bwRTree_t pointer allocated by `bwReadIndex`.
*/
void bwDestroyIndex(bwRTree_t *idx);
/// @cond SKIP
bwOverlapBlock_t *walkRTreeNodes(bigWigFile_t *bw, bwRTreeNode_t *root, uint32_t tid, uint32_t start, uint32_t end);
void destroyBWOverlapBlock(bwOverlapBlock_t *b);
/// @endcond
/*!
* @brief Finishes what's needed to write a bigWigFile
* Flushes the buffer, converts the index linked list to a tree, writes that to disk, handles zoom level stuff, writes magic at the end
* @param fp A valid bigWigFile_t pointer
* @return 0 on success
*/
int bwFinalize(bigWigFile_t *fp);
/// @cond SKIP
char *bwStrdup(const char *s);
/// @endcond
================================================
FILE: libBigWig/bwRead.c
================================================
#include "bigWig.h"
#include "bwCommon.h"
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <stdio.h>
static uint64_t readChromBlock(bigWigFile_t *bw, chromList_t *cl, uint32_t keySize);
//Return the position in the file
long bwTell(bigWigFile_t *fp) {
if(fp->URL->type == BWG_FILE) return ftell(fp->URL->x.fp);
return (long) (fp->URL->filePos + fp->URL->bufPos);
}
//Seek to a given position, always from the beginning of the file
//Return 0 on success and -1 on error
//To do, use the return code of urlSeek() in a more useful way.
int bwSetPos(bigWigFile_t *fp, size_t pos) {
CURLcode rv = urlSeek(fp->URL, pos);
if(rv == CURLE_OK) return 0;
return -1;
}
//returns the number of full members read (nmemb on success, something less on error)
size_t bwRead(void *data, size_t sz, size_t nmemb, bigWigFile_t *fp) {
size_t i, rv;
for(i=0; i<nmemb; i++) {
rv = urlRead(fp->URL, (char*) data + i*sz, sz); //Cast to char*, since arithmetic on void* is a compiler extension
if(rv != sz) return i;
}
return nmemb;
}
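/*
* Example, not part of libBigWig: a minimal sketch of the fread-like contract of
* bwRead() above, using an in-memory source instead of a URL_t. The names
* example_mem_t, example_mem_read() and example_read_members() are hypothetical.
*
```c
#include <stddef.h>
#include <string.h>

// Hypothetical in-memory stand-in for a file
typedef struct {
    const unsigned char *data;
    size_t len; // total number of bytes available
    size_t pos; // current read position
} example_mem_t;

// Copies up to sz bytes into buf and returns the number actually copied
static size_t example_mem_read(example_mem_t *m, void *buf, size_t sz) {
    size_t avail = m->len - m->pos;
    if(sz > avail) sz = avail;
    memcpy(buf, m->data + m->pos, sz);
    m->pos += sz;
    return sz;
}

// Reads nmemb members of sz bytes each; returns the number of complete members read
static size_t example_read_members(void *data, size_t sz, size_t nmemb, example_mem_t *m) {
    size_t i;
    for(i = 0; i < nmemb; i++) {
        if(example_mem_read(m, (char*) data + i*sz, sz) != sz) return i;
    }
    return nmemb;
}
```
*
* Asking for three 4-byte members from a 10-byte source yields 2, since the
* third member can only be partially filled.
*/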
//Initializes curl and sets global variables
//Returns 0 on success and 1 on error
//This should be called only once and bwCleanup() must be called when finished.
int bwInit(size_t defaultBufSize) {
//set the buffer size, number of iterations, sleep time between iterations, etc.
GLOBAL_DEFAULTBUFFERSIZE = defaultBufSize;
//call curl_global_init()
#ifndef NOCURL
CURLcode rv;
rv = curl_global_init(CURL_GLOBAL_ALL);
if(rv != CURLE_OK) return 1;
#endif
return 0;
}
//This should be called before quitting, to release memory acquired by curl
void bwCleanup() {
#ifndef NOCURL
curl_global_cleanup();
#endif
}
static bwZoomHdr_t *bwReadZoomHdrs(bigWigFile_t *bw) {
if(bw->isWrite) return NULL;
uint16_t i;
bwZoomHdr_t *zhdr = malloc(sizeof(bwZoomHdr_t));
if(!zhdr) return NULL;
uint32_t *level = malloc(bw->hdr->nLevels * sizeof(uint32_t));
if(!level) {
free(zhdr);
return NULL;
}
uint32_t padding = 0;
uint64_t *dataOffset = malloc(sizeof(uint64_t) * bw->hdr->nLevels);
if(!dataOffset) {
free(zhdr);
free(level);
return NULL;
}
uint64_t *indexOffset = malloc(sizeof(uint64_t) * bw->hdr->nLevels);
if(!indexOffset) {
free(zhdr);
free(level);
free(dataOffset);
return NULL;
}
for(i=0; i<bw->hdr->nLevels; i++) {
if(bwRead((void*) &(level[i]), sizeof(uint32_t), 1, bw) != 1) goto error;
if(bwRead((void*) &padding, sizeof(uint32_t), 1, bw) != 1) goto error;
if(bwRead((void*) &(dataOffset[i]), sizeof(uint64_t), 1, bw) != 1) goto error;
if(bwRead((void*) &(indexOffset[i]), sizeof(uint64_t), 1, bw) != 1) goto error;
}
zhdr->level = level;
zhdr->dataOffset = dataOffset;
zhdr->indexOffset = indexOffset;
zhdr->idx = calloc(bw->hdr->nLevels, sizeof(bwRTree_t*));
if(!zhdr->idx) goto error;
return zhdr;
error:
//zhdr->idx is either not yet set or NULL at this point, so it must not be dereferenced here
free(zhdr);
free(level);
free(dataOffset);
free(indexOffset);
return NULL;
}
static void bwHdrDestroy(bigWigHdr_t *hdr) {
int i;
if(hdr->zoomHdrs) {
free(hdr->zoomHdrs->level);
free(hdr->zoomHdrs->dataOffset);
free(hdr->zoomHdrs->indexOffset);
for(i=0; i<hdr->nLevels; i++) {
if(hdr->zoomHdrs->idx[i]) bwDestroyIndex(hdr->zoomHdrs->idx[i]);
}
free(hdr->zoomHdrs->idx);
free(hdr->zoomHdrs);
}
free(hdr);
}
static void bwHdrRead(bigWigFile_t *bw) {
uint32_t magic;
if(bw->isWrite) return;
bw->hdr = calloc(1, sizeof(bigWigHdr_t));
if(!bw->hdr) return;
if(bwRead((void*) &magic, sizeof(uint32_t), 1, bw) != 1) goto error; //0x0
if(magic != BIGWIG_MAGIC && magic != BIGBED_MAGIC) goto error;
if(bwRead((void*) &(bw->hdr->version), sizeof(uint16_t), 1, bw) != 1) goto error; //0x4
if(bwRead((void*) &(bw->hdr->nLevels), sizeof(uint16_t), 1, bw) != 1) goto error; //0x6
if(bwRead((void*) &(bw->hdr->ctOffset), sizeof(uint64_t), 1, bw) != 1) goto error; //0x8
if(bwRead((void*) &(bw->hdr->dataOffset), sizeof(uint64_t), 1, bw) != 1) goto error; //0x10
if(bwRead((void*) &(bw->hdr->indexOffset), sizeof(uint64_t), 1, bw) != 1) goto error; //0x18
if(bwRead((void*) &(bw->hdr->fieldCount), sizeof(uint16_t), 1, bw) != 1) goto error; //0x20
if(bwRead((void*) &(bw->hdr->definedFieldCount), sizeof(uint16_t), 1, bw) != 1) goto error; //0x22
if(bwRead((void*) &(bw->hdr->sqlOffset), sizeof(uint64_t), 1, bw) != 1) goto error; //0x24
if(bwRead((void*) &(bw->hdr->summaryOffset), sizeof(uint64_t), 1, bw) != 1) goto error; //0x2c
if(bwRead((void*) &(bw->hdr->bufSize), sizeof(uint32_t), 1, bw) != 1) goto error; //0x34
if(bwRead((void*) &(bw->hdr->extensionOffset), sizeof(uint64_t), 1, bw) != 1) goto error; //0x38
//zoom headers
if(bw->hdr->nLevels) {
if(!(bw->hdr->zoomHdrs = bwReadZoomHdrs(bw))) goto error;
}
//File summary information
if(bw->hdr->summaryOffset) {
if(urlSeek(bw->URL, bw->hdr->summaryOffset) != CURLE_OK) goto error;
if(bwRead((void*) &(bw->hdr->nBasesCovered), sizeof(uint64_t), 1, bw) != 1) goto error;
if(bwRead((void*) &(bw->hdr->minVal), sizeof(uint64_t), 1, bw) != 1) goto error;
if(bwRead((void*) &(bw->hdr->maxVal), sizeof(uint64_t), 1, bw) != 1) goto error;
if(bwRead((void*) &(bw->hdr->sumData), sizeof(uint64_t), 1, bw) != 1) goto error;
if(bwRead((void*) &(bw->hdr->sumSquared), sizeof(uint64_t), 1, bw) != 1) goto error;
}
//In case of uncompressed remote files, let the IO functions know to request larger chunks
bw->URL->isCompressed = (bw->hdr->bufSize > 0)?1:0;
return;
error:
bwHdrDestroy(bw->hdr);
fprintf(stderr, "[bwHdrRead] There was an error while reading in the header!\n");
bw->hdr = NULL;
}
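/*
* Example, not part of libBigWig: a minimal sketch of the fixed header layout
* that bwHdrRead() above walks through, using the byte offsets noted in its
* comments (0x0 through 0x38, 64 bytes in total). Like the code above, it copies
* fields in host byte order. example_hdr_t and example_parse_hdr() are
* hypothetical names.
*
```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_HDR_SIZE 64

typedef struct {
    uint32_t magic;
    uint16_t version, nLevels;
    uint64_t ctOffset, dataOffset, indexOffset;
    uint16_t fieldCount, definedFieldCount;
    uint64_t sqlOffset, summaryOffset;
    uint32_t bufSize;
    uint64_t extensionOffset;
} example_hdr_t;

// Returns 0 on success and 1 if buf is too short to hold the fixed header
static int example_parse_hdr(const unsigned char *buf, size_t len, example_hdr_t *h) {
    if(len < EXAMPLE_HDR_SIZE) return 1;
    memcpy(&h->magic, buf + 0x00, 4);
    memcpy(&h->version, buf + 0x04, 2);
    memcpy(&h->nLevels, buf + 0x06, 2);
    memcpy(&h->ctOffset, buf + 0x08, 8);
    memcpy(&h->dataOffset, buf + 0x10, 8);
    memcpy(&h->indexOffset, buf + 0x18, 8);
    memcpy(&h->fieldCount, buf + 0x20, 2);
    memcpy(&h->definedFieldCount, buf + 0x22, 2);
    memcpy(&h->sqlOffset, buf + 0x24, 8);
    memcpy(&h->summaryOffset, buf + 0x2c, 8);
    memcpy(&h->bufSize, buf + 0x34, 4);
    memcpy(&h->extensionOffset, buf + 0x38, 8);
    return 0;
}
```
*
* Using memcpy() rather than pointer casts avoids unaligned access, since several
* 64-bit fields (e.g., sqlOffset at 0x24) are not 8-byte aligned.
*/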
static void destroyChromList(chromList_t *cl) {
uint32_t i;
if(!cl) return;
if(cl->nKeys && cl->chrom) {
for(i=0; i<cl->nKeys; i++) {
if(cl->chrom[i]) free(cl->chrom[i]);
}
}
if(cl->chrom) free(cl->chrom);
if(cl->len) free(cl->len);
free(cl);
}
static uint64_t readChromLeaf(bigWigFile_t *bw, chromList_t *cl, uint32_t valueSize) {
uint16_t nVals, i;
uint32_t idx;
char *chrom = NULL;
if(bwRead((void*) &nVals, sizeof(uint16_t), 1, bw) != 1) return -1;
chrom = calloc(valueSize+1, sizeof(char));
if(!chrom) return -1;
for(i=0; i<nVals; i++) {
if(bwRead((void*) chrom, sizeof(char), valueSize, bw) != valueSize) goto error;
if(bwRead((void*) &idx, sizeof(uint32_t), 1, bw) != 1) goto error;
if(idx >= cl->nKeys) goto error; //Guard against out-of-range chromosome IDs in malformed files
if(bwRead((void*) &(cl->len[idx]), sizeof(uint32_t), 1, bw) != 1) goto error;
cl->chrom[idx] = bwStrdup(chrom);
if(!(cl->chrom[idx])) goto error;
}
free(chrom);
return nVals;
error:
free(chrom);
return -1;
}
static uint64_t readChromNonLeaf(bigWigFile_t *bw, chromList_t *cl, uint32_t keySize) {
uint64_t offset , rv = 0, previous, n;
uint16_t nVals, i;
if(bwRead((void*) &nVals, sizeof(uint16_t), 1, bw) != 1) return -1;
previous = bwTell(bw) + keySize;
for(i=0; i<nVals; i++) {
if(bwSetPos(bw, previous)) return -1;
if(bwRead((void*) &offset, sizeof(uint64_t), 1, bw) != 1) return -1;
if(bwSetPos(bw, offset)) return -1;
n = readChromBlock(bw, cl, keySize);
if(n == (uint64_t) -1) return -1; //Propagate errors from child blocks rather than adding them to the count
rv += n;
previous += 8 + keySize;
}
return rv;
}
static uint64_t readChromBlock(bigWigFile_t *bw, chromList_t *cl, uint32_t keySize) {
uint8_t isLeaf, padding;
if(bwRead((void*) &isLeaf, sizeof(uint8_t), 1, bw) != 1) return -1;
if(bwRead((void*) &padding, sizeof(uint8_t), 1, bw) != 1) return -1;
if(isLeaf) {
return readChromLeaf(bw, cl, keySize);
} else { //I've never actually observed one of these, which is good since they're pointless
return readChromNonLeaf(bw, cl, keySize);
}
}
static chromList_t *bwReadChromList(bigWigFile_t *bw) {
chromList_t *cl = NULL;
uint32_t magic, keySize, valueSize, itemsPerBlock;
uint64_t rv, itemCount;
if(bw->isWrite) return NULL;
if(bwSetPos(bw, bw->hdr->ctOffset)) return NULL;
cl = calloc(1, sizeof(chromList_t));
if(!cl) return NULL;
if(bwRead((void*) &magic, sizeof(uint32_t), 1, bw) != 1) goto error;
if(magic != CIRTREE_MAGIC) goto error;
if(bwRead((void*) &itemsPerBlock, sizeof(uint32_t), 1, bw) != 1) goto error;
if(bwRead((void*) &keySize, sizeof(uint32_t), 1, bw) != 1) goto error;
if(bwRead((void*) &valueSize, sizeof(uint32_t), 1, bw) != 1) goto error;
if(bwRead((void*) &itemCount, sizeof(uint64_t), 1, bw) != 1) goto error;
cl->nKeys = itemCount;
cl->chrom = calloc(itemCount, sizeof(char*));
cl->len = calloc(itemCount, sizeof(uint32_t));
if(!cl->chrom) goto error;
if(!cl->len) goto error;
if(bwRead((void*) &magic, sizeof(uint32_t), 1, bw) != 1) goto error;
if(bwRead((void*) &magic, sizeof(uint32_t), 1, bw) != 1) goto error;
//Read in the blocks
rv = readChromBlock(bw, cl, keySize);
if(rv == (uint64_t) -1) goto error;
if(rv != itemCount) goto error;
return cl;
error:
destroyChromList(cl);
return NULL;
}
//This is here mostly for convenience
static void bwDestroyWriteBuffer(bwWriteBuffer_t *wb) {
if(wb->p) free(wb->p);
if(wb->compressP) free(wb->compressP);
if(wb->firstZoomBuffer) free(wb->firstZoomBuffer);
if(wb->lastZoomBuffer) free(wb->lastZoomBuffer);
if(wb->nNodes) free(wb->nNodes);
free(wb);
}
void bwClose(bigWigFile_t *fp) {
if(!fp) return;
if(bwFinalize(fp)) {
fprintf(stderr, "[bwClose] There was an error while finishing writing a bigWig file! The output is likely truncated.\n");
}
if(fp->URL) urlClose(fp->URL);
if(fp->hdr) bwHdrDestroy(fp->hdr);
if(fp->cl) destroyChromList(fp->cl);
if(fp->idx) bwDestroyIndex(fp->idx);
if(fp->writeBuffer) bwDestroyWriteBuffer(fp->writeBuffer);
free(fp);
}
int bwIsBigWig(const char *fname, CURLcode (*callBack) (CURL*)) {
uint32_t magic = 0;
URL_t *URL = NULL;
URL = urlOpen(fname, *callBack, NULL);
if(!URL) return 0;
if(urlRead(URL, (void*) &magic, sizeof(uint32_t)) != sizeof(uint32_t)) magic = 0;
urlClose(URL);
if(magic == BIGWIG_MAGIC) return 1;
return 0;
}
char *bbGetSQL(bigWigFile_t *fp) {
char *o = NULL;
uint64_t len;
if(!fp->hdr->sqlOffset) return NULL;
len = fp->hdr->summaryOffset - fp->hdr->sqlOffset; //This includes the NUL terminator
o = malloc(sizeof(char) * len);
if(!o) goto error;
if(bwSetPos(fp, fp->hdr->sqlOffset)) goto error;
if(bwRead((void*) o, len, 1, fp) != 1) goto error;
return o;
error:
if(o) free(o);
fprintf(stderr, "[bbGetSQL] Got an error in bbGetSQL!\n");
return NULL;
}
int bbIsBigBed(const char *fname, CURLcode (*callBack) (CURL*)) {
uint32_t magic = 0;
URL_t *URL = NULL;
URL = urlOpen(fname, *callBack, NULL);
if(!URL) return 0;
if(urlRead(URL, (void*) &magic, sizeof(uint32_t)) != sizeof(uint32_t)) magic = 0;
urlClose(URL);
if(magic == BIGBED_MAGIC) return 1;
return 0;
}
bigWigFile_t *bwOpen(const char *fname, CURLcode (*callBack) (CURL*), const char *mode) {
bigWigFile_t *bwg = calloc(1, sizeof(bigWigFile_t));
if(!bwg) {
fprintf(stderr, "[bwOpen] Couldn't allocate space to create the output object!\n");
return NULL;
}
if((!mode) || (strchr(mode, 'w') == NULL)) {
bwg->isWrite = 0;
bwg->URL = urlOpen(fname, *callBack, NULL);
if(!bwg->URL) {
fprintf(stderr, "[bwOpen] urlOpen is NULL!\n");
goto error;
}
//Attempt to read in the fixed header
bwHdrRead(bwg);
if(!bwg->hdr) {
fprintf(stderr, "[bwOpen] bwg->hdr is NULL!\n");
goto error;
}
//Read in the chromosome list
bwg->cl = bwReadChromList(bwg);
if(!bwg->cl) {
fprintf(stderr, "[bwOpen] bwg->cl is NULL (%s)!\n", fname);
goto error;
}
//Read in the index
if(bwg->hdr->indexOffset) {
bwg->idx = bwReadIndex(bwg, 0);
if(!bwg->idx) {
fprintf(stderr, "[bwOpen] bwg->idx is NULL bwg->hdr->dataOffset 0x%"PRIx64"!\n", bwg->hdr->dataOffset);
goto error;
}
}
} else {
bwg->isWrite = 1;
bwg->URL = urlOpen(fname, NULL, "w+");
if(!bwg->URL) goto error;
bwg->writeBuffer = calloc(1,sizeof(bwWriteBuffer_t));
if(!bwg->writeBuffer) goto error;
bwg->writeBuffer->l = 24;
}
return bwg;
error:
bwClose(bwg);
return NULL;
}
bigWigFile_t *bbOpen(const char *fname, CURLcode (*callBack) (CURL*)) {
bigWigFile_t *bb = calloc(1, sizeof(bigWigFile_t));
if(!bb) {
fprintf(stderr, "[bbOpen] Couldn't allocate space to create the output object!\n");
return NULL;
}
//Set the type to 1 for bigBed
bb->type = 1;
bb->URL = urlOpen(fname, *callBack, NULL);
if(!bb->URL) goto error;
//Attempt to read in the fixed header
bwHdrRead(bb);
if(!bb->hdr) goto error;
//Read in the chromosome list
bb->cl = bwReadChromList(bb);
if(!bb->cl) goto error;
//Read in the index
bb->idx = bwReadIndex(bb, 0);
if(!bb->idx) goto error;
return bb;
error:
bwClose(bb);
return NULL;
}
//Implementation taken from musl:
//https://git.musl-libc.org/cgit/musl/tree/src/string/strdup.c
//License: https://git.musl-libc.org/cgit/musl/tree/COPYRIGHT
char* bwStrdup(const char *s) {
size_t l = strlen(s);
char *d = malloc(l+1);
if (!d) return NULL;
return memcpy(d, s, l+1);
}
================================================
FILE: libBigWig/bwStats.c
================================================
#include "bigWig.h"
#include "bwCommon.h"
#include <errno.h>
#include <stdlib.h>
#include <zlib.h>
#include <math.h>
#include <string.h>
//Returns -1 if there are no applicable levels, otherwise an integer indicating the most appropriate level.
//Like Kent's library, this divides the desired bin size by 2 to minimize the effect of blocks overlapping multiple bins
static int32_t determineZoomLevel(const bigWigFile_t *fp, int basesPerBin) {
int32_t out = -1;
int64_t diff;
uint32_t bestDiff = -1;
uint16_t i;
basesPerBin/=2;
for(i=0; i<fp->hdr->nLevels; i++) {
diff = basesPerBin - (int64_t) fp->hdr->zoomHdrs->level[i];
if(diff >= 0 && diff < bestDiff) {
bestDiff = diff;
out = i;
}
}
return out;
}
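/*
* Example, not part of libBigWig: a self-contained sketch of the selection rule
* used by determineZoomLevel() above, operating on a plain array of zoom
* reductions instead of a bigWigFile_t. It picks the level whose reduction is
* closest to, without exceeding, half the requested bin size. The function name
* example_pick_zoom() is hypothetical.
*
```c
#include <stdint.h>

// Returns the index of the most appropriate level, or -1 if none applies
static int32_t example_pick_zoom(const uint32_t *levels, uint16_t nLevels, int basesPerBin) {
    int32_t out = -1;
    int64_t diff, bestDiff = INT64_MAX;
    uint16_t i;
    basesPerBin /= 2; // halve the bin size, as in the function above
    for(i = 0; i < nLevels; i++) {
        diff = (int64_t) basesPerBin - (int64_t) levels[i];
        if(diff >= 0 && diff < bestDiff) {
            bestDiff = diff;
            out = i;
        }
    }
    return out;
}
```
*
* For reductions of 10, 40, 160 and 640 bases, a request of 100 bases per bin
* selects index 1 (40), while a request of 10 bases per bin selects no level.
*/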
/// @cond SKIP
struct val_t {
uint32_t nBases;
float min, max, sum, sumsq;
double scalar;
};
struct vals_t {
uint32_t n;
struct val_t **vals;
};
/// @endcond
void destroyVals_t(struct vals_t *v) {
uint32_t i;
if(!v) return;
for(i=0; i<v->n; i++) free(v->vals[i]);
if(v->vals) free(v->vals);
free(v);
}
//Determine the fraction of a block (b_start, b_end) that overlaps an interval (i_start, i_end)
double getScalar(uint32_t i_start, uint32_t i_end, uint32_t b_start, uint32_t b_end) {
double rv = 0.0;
if(b_start <= i_start) {
if(b_end > i_start) rv = ((double)(b_end - i_start))/(b_end-b_start);
} else if(b_start < i_end) {
if(b_end < i_end) rv = ((double)(b_end - b_start))/(b_end-b_start);
else rv = ((double)(i_end - b_start))/(b_end-b_start);
}
return rv;
}
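/*
* Example, not part of libBigWig: a clamp-based sketch of an overlap fraction
* between a block and an interval. Note that this is a swapped-in formulation,
* not the function above: it also caps the overlap at i_end when a block starts
* at or before i_start and extends past i_end, which the first branch of
* getScalar() does not do. The function name example_overlap_fraction() is
* hypothetical.
*
```c
#include <stdint.h>

// Fraction of the block [b_start, b_end) that lies inside [i_start, i_end)
static double example_overlap_fraction(uint32_t i_start, uint32_t i_end, uint32_t b_start, uint32_t b_end) {
    uint32_t lo = (b_start > i_start) ? b_start : i_start; // later of the two starts
    uint32_t hi = (b_end < i_end) ? b_end : i_end;         // earlier of the two ends
    if(b_end <= b_start || hi <= lo) return 0.0;            // empty block or no overlap
    return ((double) (hi - lo)) / (b_end - b_start);
}
```
*
* A block spanning 150-250 overlaps the interval 100-200 by 50 of its 100 bases,
* giving 0.5.
*/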
//Returns NULL on error
static struct vals_t *getVals(bigWigFile_t *fp, bwOverlapBlock_t *o, int i, uint32_t tid, uint32_t start, uint32_t end) {
void *buf = NULL, *compBuf = NULL;
uLongf sz = fp->hdr->bufSize;
int compressed = 0, rv;
uint32_t *p, vtid, vstart, vend;
struct vals_t *vals = NULL;
struct val_t *v = NULL;
if(sz) {
compressed = 1;
buf = malloc(sz);
if(!buf) goto error;
}
sz = 0; //This is now the size of the compressed buffer
if(bwSetPos(fp, o->offset[i])) goto error;
vals = calloc(1,sizeof(struct vals_t));
if(!vals) goto error;
v = malloc(sizeof(struct val_t));
if(!v) goto error;
if(sz < o->size[i]) compBuf = malloc(o->size[i]);
if(!compBuf) goto error;
if(bwRead(compBuf, o->size[i], 1, fp) != 1) goto error;
if(compressed) {
sz = fp->hdr->bufSize;
rv = uncompress(buf, &sz, compBuf, o->size[i]);
if(rv != Z_OK) goto error;
} else {
buf = compBuf;
sz = o->size[i];
}
p = buf;
while(((uLongf) ((char*)p - (char*)buf)) < sz) {
vtid = p[0];
vstart = p[1];
vend = p[2];
v->nBases = p[3];
v->min = ((float*) p)[4];
v->max = ((float*) p)[5];
v->sum = ((float*) p)[6];
v->sumsq = ((float*) p)[7];
v->scalar = getScalar(start, end, vstart, vend);
if(tid == vtid) {
if((start <= vstart && end > vstart) || (start < vend && start >= vstart)) {
struct val_t **tmp = realloc(vals->vals, sizeof(struct val_t*)*(vals->n+1));
if(!tmp) goto error; //Keep the old pointer valid so that destroyVals_t() can free it
vals->vals = tmp;
vals->vals[vals->n++] = v;
v = malloc(sizeof(struct val_t));
if(!v) goto error;
}
if(vstart > end) break;
} else if(vtid > tid) {
break;
}
p+=8;
}
free(v);
free(buf);
if(compressed) free(compBuf);
return vals;
error:
if(buf) free(buf);
if(compBuf && compBuf != buf) free(compBuf); //For uncompressed data buf may alias compBuf
if(v) free(v);
destroyVals_t(vals);
return NULL;
}
//On error, errno is set to ENOMEM and NaN is returned (though NaN can be returned normally)
static double blockMean(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint32_t tid, uint32_t start, uint32_t end) {
uint32_t i, j;
double output = 0.0, coverage = 0.0;
struct vals_t *v = NULL;
if(!blocks->n) return strtod("NaN", NULL);
//Iterate over the blocks
for(i=0; i<blocks->n; i++) {
v = getVals(fp, blocks, i, tid, start, end);
if(!v) goto error;
for(j=0; j<v->n; j++) {
output += v->vals[j]->sum * v->vals[j]->scalar;
coverage += v->vals[j]->nBases * v->vals[j]->scalar;
}
destroyVals_t(v);
}
if(!coverage) return strtod("NaN", NULL);
return output/coverage;
error:
destroyVals_t(v);
errno = ENOMEM;
return strtod("NaN", NULL);
}
static double intMean(bwOverlappingIntervals_t* ints, uint32_t start, uint32_t end) {
double sum = 0.0;
uint32_t nBases = 0, i, start_use, end_use;
if(!ints->l) return strtod("NaN", NULL);
for(i=0; i<ints->l; i++) {
start_use = ints->start[i];
end_use = ints->end[i];
if(ints->start[i] < start) start_use = start;
if(ints->end[i] > end) end_use = end;
nBases += end_use-start_use;
sum += (end_use-start_use)*((double) ints->value[i]);
}
return sum/nBases;
}
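/*
* Example, not part of libBigWig: a self-contained sketch of the length-weighted
* mean computed by intMean() above, using plain arrays instead of a
* bwOverlappingIntervals_t. Unlike the function above, it reports "no covered
* bases" through its return value instead of NaN and skips intervals that do
* not overlap the query. The function name example_weighted_mean() is
* hypothetical.
*
```c
#include <stdint.h>

// Returns 0 and stores the mean on success, or 1 if no base of the query is covered
static int example_weighted_mean(const uint32_t *start, const uint32_t *end, const float *value,
                                 uint32_t n, uint32_t qStart, uint32_t qEnd, double *mean) {
    double sum = 0.0;
    uint64_t nBases = 0;
    uint32_t i, s, e;
    for(i = 0; i < n; i++) {
        s = (start[i] < qStart) ? qStart : start[i]; // clamp to the query start
        e = (end[i] > qEnd) ? qEnd : end[i];         // clamp to the query end
        if(e <= s) continue;                          // no overlap with the query
        nBases += e - s;
        sum += (double) (e - s) * value[i];
    }
    if(!nBases) return 1;
    *mean = sum / (double) nBases;
    return 0;
}
```
*
* For intervals 0-10 with value 1 and 10-30 with value 4, the query 5-20 covers
* 5 bases of the first and 10 bases of the second, so the mean is 45/15 = 3.
*/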
//Does UCSC compensate for partial block/range overlap?
static double blockDev(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint32_t tid, uint32_t start, uint32_t end) {
uint32_t i, j;
double mean = 0.0, ssq = 0.0, coverage = 0.0, diff;
struct vals_t *v = NULL;
if(!blocks->n) return strtod("NaN", NULL);
//Iterate over the blocks
for(i=0; i<blocks->n; i++) {
v = getVals(fp, blocks, i, tid, start, end);
if(!v) goto error;
for(j=0; j<v->n; j++) {
coverage += v->vals[j]->nBases * v->vals[j]->scalar;
mean += v->vals[j]->sum * v->vals[j]->scalar;
ssq += v->vals[j]->sumsq * v->vals[j]->scalar;
}
destroyVals_t(v);
v = NULL;
}
if(coverage<=1.0) return strtod("NaN", NULL);
diff = ssq-mean*mean/coverage;
if(coverage > 1.0) diff /= coverage-1;
if(fabs(diff) > 1e-8) { //Ignore floating point differences
return sqrt(diff);
} else {
return 0.0;
}
error:
if(v) destroyVals_t(v);
errno = ENOMEM;
return strtod("NaN", NULL);
}
//This uses a two-pass approach (the mean first, then the sum of squared deviations) to limit the effect of finite precision math
static double intDev(bwOverlappingIntervals_t* ints, uint32_t start, uint32_t end) {
double v1 = 0.0, mean, rv;
uint32_t nBases = 0, i, start_use, end_use;
if(!ints->l) return strtod("NaN", NULL);
mean = intMean(ints, start, end);
for(i=0; i<ints->l; i++) {
start_use = ints->start[i];
end_use = ints->end[i];
if(ints->start[i] < start) start_use = start;
if(ints->end[i] > end) end_use = end;
nBases += end_use-start_use;
v1 += (end_use-start_use) * pow(ints->value[i]-mean, 2.0); //running sum of squared difference
}
if(nBases>=2) rv = sqrt(v1/(nBases-1));
else if(nBases==1) rv = sqrt(v1);
else rv = strtod("NaN", NULL);
return rv;
}
static double blockMax(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint32_t tid, uint32_t start, uint32_t end) {
uint32_t i, j, isNA = 1;
double o = strtod("NaN", NULL);
struct vals_t *v = NULL;
if(!blocks->n) return o;
//Iterate the blocks
for(i=0; i<blocks->n; i++) {
v = getVals(fp, blocks, i, tid, start, end);
if(!v) goto error;
for(j=0; j<v->n; j++) {
if(isNA) {
o = v->vals[j]->max;
isNA = 0;
} else if(v->vals[j]->max > o) {
o = v->vals[j]->max;
}
}
destroyVals_t(v);
}
return o;
error:
destroyVals_t(v);
errno = ENOMEM;
return strtod("NaN", NULL);
}
static double intMax(bwOverlappingIntervals_t* ints) {
uint32_t i;
double o;
if(ints->l < 1) return strtod("NaN", NULL);
o = ints->value[0];
for(i=1; i<ints->l; i++) {
if(ints->value[i] > o) o = ints->value[i];
}
return o;
}
static double blockMin(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint32_t tid, uint32_t start, uint32_t end) {
uint32_t i, j, isNA = 1;
double o = strtod("NaN", NULL);
struct vals_t *v = NULL;
if(!blocks->n) return o;
//Iterate the blocks
for(i=0; i<blocks->n; i++) {
v = getVals(fp, blocks, i, tid, start, end);
if(!v) goto error;
for(j=0; j<v->n; j++) {
if(isNA) {
o = v->vals[j]->min;
isNA = 0;
} else if(v->vals[j]->min < o) o = v->vals[j]->min;
}
destroyVals_t(v);
}
return o;
error:
destroyVals_t(v);
errno = ENOMEM;
return strtod("NaN", NULL);
}
static double intMin(bwOverlappingIntervals_t* ints) {
uint32_t i;
double o;
if(ints->l < 1) return strtod("NaN", NULL);
o = ints->value[0];
for(i=1; i<ints->l; i++) {
if(ints->value[i] < o) o = ints->value[i];
}
return o;
}
//Does UCSC compensate for only partial block/interval overlap?
static double blockCoverage(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint32_t tid, uint32_t start, uint32_t end) {
uint32_t i, j;
double o = 0.0;
struct vals_t *v = NULL;
if(!blocks->n) return strtod("NaN", NULL);
//Iterate over the blocks
for(i=0; i<blocks->n; i++) {
v = getVals(fp, blocks, i, tid, start, end);
if(!v) goto error;
for(j=0; j<v->n; j++) {
o+= v->vals[j]->nBases * v->vals[j]->scalar;
}
destroyVals_t(v);
}
if(o == 0.0) return strtod("NaN", NULL);
return o;
error:
destroyVals_t(v);
errno = ENOMEM;
return strtod("NaN", NULL);
}
static double intCoverage(bwOverlappingIntervals_t* ints, uint32_t start, uint32_t end) {
uint32_t i, start_use, end_use;
double o = 0.0;
if(!ints->l) return strtod("NaN", NULL);
for(i=0; i<ints->l; i++) {
start_use = ints->start[i];
end_use = ints->end[i];
if(start_use < start) start_use = start;
if(end_use > end) end_use = end;
o += end_use - start_use;
}
return o/(end-start);
}
static double blockSum(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint32_t tid, uint32_t start, uint32_t end) {
uint32_t i, j;
double o = 0.0;
struct vals_t *v = NULL;
if(!blocks->n) return strtod("NaN", NULL);
//Iterate over the blocks
for(i=0; i<blocks->n; i++) {
v = getVals(fp, blocks, i, tid, start, end);
if(!v) goto error;
for(j=0; j<v->n; j++) {
//Scale each block's sum by the fraction of the block that overlaps the interval.
//scalar is a fraction in [0,1], so it must not be truncated to an integer.
o+= v->vals[j]->sum * v->vals[j]->scalar;
}
destroyVals_t(v);
}
if(o == 0.0) return strtod("NaN", NULL);
return o;
error:
destroyVals_t(v);
errno = ENOMEM;
return strtod("NaN", NULL);
}
static double intSum(bwOverlappingIntervals_t* ints, uint32_t start, uint32_t end) {
uint32_t i, start_use, end_use;
double o = 0.0;
if(!ints->l) return strtod("NaN", NULL);
for(i=0; i<ints->l; i++) {
start_use = ints->start[i];
end_use = ints->end[i];
if(start_use < start) start_use = start;
if(end_use > end) end_use = end;
o += (end_use - start_use) * ints->value[i];
}
return o;
}
//Returns NULL on error, otherwise a double* that needs to be free()d
static double *bwStatsFromZoom(bigWigFile_t *fp, int32_t level, uint32_t tid, uint32_t start, uint32_t end, uint32_t nBins, enum bwStatsType type) {
bwOverlapBlock_t *blocks = NULL;
double *output = NULL;
uint32_t pos = start, i, end2;
if(!fp->hdr->zoomHdrs->idx[level]) {
fp->hdr->zoomHdrs->idx[level] = bwReadIndex(fp, fp->hdr->zoomHdrs->indexOffset[level]);
if(!fp->hdr->zoomHdrs->idx[level]) return NULL;
}
errno = 0; //Sometimes libcurl sets and then doesn't unset errno on errors
output = malloc(sizeof(double)*nBins);
if(!output) return NULL;
for(i=0, pos=start; i<nBins; i++) {
end2 = start + ((double)(end-start)*(i+1))/((int) nBins);
blocks = walkRTreeNodes(fp, fp->hdr->zoomHdrs->idx[level]->root, tid, pos, end2);
if(!blocks) goto error;
switch(type) {
case 0:
//mean
output[i] = blockMean(fp, blocks, tid, pos, end2);
break;
case 1:
//stdev
output[i] = blockDev(fp, blocks, tid, pos, end2);
break;
case 2:
//max
output[i] = blockMax(fp, blocks, tid, pos, end2);
break;
case 3:
//min
output[i] = blockMin(fp, blocks, tid, pos, end2);
break;
case 4:
//cov
output[i] = blockCoverage(fp, blocks, tid, pos, end2)/(end2-pos);
break;
case 5:
//sum
output[i] = blockSum(fp, blocks, tid, pos, end2);
break;
default:
goto error;
break;
}
if(errno) goto error;
destroyBWOverlapBlock(blocks);
pos = end2;
}
return output;
error:
fprintf(stderr, "got an error in bwStatsFromZoom in the range %"PRIu32"-%"PRIu32": %s\n", pos, end2, strerror(errno));
if(blocks) destroyBWOverlapBlock(blocks);
if(output) free(output);
return NULL;
}
double *bwStatsFromFull(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, uint32_t nBins, enum bwStatsType type) {
bwOverlappingIntervals_t *ints = NULL;
double *output = malloc(sizeof(double)*nBins);
uint32_t i, pos = start, end2;
if(!output) return NULL;
for(i=0; i<nBins; i++) {
end2 = start + ((double)(end-start)*(i+1))/((int) nBins);
ints = bwGetOverlappingIntervals(fp, chrom, pos, end2);
if(!ints) {
output[i] = strtod("NaN", NULL);
pos = end2; //Still advance, otherwise the next bin would begin at this bin's start
continue;
}
switch(type) {
default :
case 0:
output[i] = intMean(ints, pos, end2);
break;
case 1:
output[i] = intDev(ints, pos, end2);
break;
case 2:
output[i] = intMax(ints);
break;
case 3:
output[i] = intMin(ints);
break;
case 4:
output[i] = intCoverage(ints, pos, end2);
break;
case 5:
output[i] = intSum(ints, pos, end2);
break;
}
bwDestroyOverlappingIntervals(ints);
pos = end2;
}
return output;
}
//Returns an array of doubles of length nBins that must be free()d
//On error, NULL is returned
double *bwStats(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, uint32_t nBins, enum bwStatsType type) {
int32_t level = determineZoomLevel(fp, ((double)(end-start))/((int) nBins));
uint32_t tid = bwGetTid(fp, chrom);
if(tid == (uint32_t) -1) return NULL;
if(level == -1) return bwStatsFromFull(fp, chrom, start, end, nBins, type);
return bwStatsFromZoom(fp, level, tid, start, end, nBins, type);
}
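//A minimal usage sketch for bwStats(), assuming fp is an already opened bigWig
//file and "chr1" is a contig in it (both are placeholders for illustration).
//A type of 0 selects the mean, matching the switch statements above:
//
//    double *means = bwStats(fp, "chr1", 0, 1000, 10, (enum bwStatsType) 0);
//    if(means) {
//        //10 bins of 100 bases each; bins without data are expected to hold NaN
//        for(uint32_t i=0; i<10; i++) printf("bin %"PRIu32": %f\n", i, means[i]);
//        free(means);
//    }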
================================================
FILE: libBigWig/bwValues.c
================================================
#include "bigWig.h"
#include "bwCommon.h"
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <zlib.h>
#include <errno.h>
static uint32_t roundup(uint32_t v) {
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;
return v;
}
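//Worked example for roundup(), which rounds v up to the next power of 2:
//    v = 5 -> v-- gives 4 (binary 100)
//          -> the shift/OR steps smear the top bit downwards, giving 7 (binary 111)
//          -> v++ gives 8
//Values that are already a power of 2 are returned unchanged (e.g., 4 -> 4).
//0 maps to 0, since the decrement wraps around to 0xFFFFFFFF and the increment wraps back.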
//Returns the root node on success and NULL on error
static bwRTree_t *readRTreeIdx(bigWigFile_t *fp, uint64_t offset) {
uint32_t magic;
bwRTree_t *node;
if(!offset) {
if(bwSetPos(fp, fp->hdr->indexOffset)) return NULL;
} else {
if(bwSetPos(fp, offset)) return NULL;
}
if(bwRead(&magic, sizeof(uint32_t), 1, fp) != 1) return NULL;
if(magic != IDX_MAGIC) {
fprintf(stderr, "[readRTreeIdx] Mismatch in the magic number!\n");
return NULL;
}
node = calloc(1, sizeof(bwRTree_t));
if(!node) return NULL;
if(bwRead(&(node->blockSize), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->nItems), sizeof(uint64_t), 1, fp) != 1) goto error;
if(bwRead(&(node->chrIdxStart), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->baseStart), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->chrIdxEnd), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->baseEnd), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->idxSize), sizeof(uint64_t), 1, fp) != 1) goto error;
if(bwRead(&(node->nItemsPerSlot), sizeof(uint32_t), 1, fp) != 1) goto error;
//Padding. Read it into magic (no longer needed) rather than clobbering blockSize
if(bwRead(&magic, sizeof(uint32_t), 1, fp) != 1) goto error;
node->rootOffset = bwTell(fp);
//For remote files, libCurl sometimes sets errno to 115 and doesn't clear it
errno = 0;
return node;
error:
free(node);
return NULL;
}
//Returns a bwRTreeNode_t on success and NULL on an error
//For the root node, set offset to 0
static bwRTreeNode_t *bwGetRTreeNode(bigWigFile_t *fp, uint64_t offset) {
bwRTreeNode_t *node = NULL;
uint8_t padding;
uint16_t i;
if(offset) {
if(bwSetPos(fp, offset)) return NULL;
} else {
//seek
if(bwSetPos(fp, fp->idx->rootOffset)) return NULL;
}
node = calloc(1, sizeof(bwRTreeNode_t));
if(!node) return NULL;
if(bwRead(&(node->isLeaf), sizeof(uint8_t), 1, fp) != 1) goto error;
if(bwRead(&padding, sizeof(uint8_t), 1, fp) != 1) goto error;
if(bwRead(&(node->nChildren), sizeof(uint16_t), 1, fp) != 1) goto error;
node->chrIdxStart = malloc(sizeof(uint32_t)*(node->nChildren));
if(!node->chrIdxStart) goto error;
node->baseStart = malloc(sizeof(uint32_t)*(node->nChildren));
if(!node->baseStart) goto error;
node->chrIdxEnd = malloc(sizeof(uint32_t)*(node->nChildren));
if(!node->chrIdxEnd) goto error;
node->baseEnd = malloc(sizeof(uint32_t)*(node->nChildren));
if(!node->baseEnd) goto error;
node->dataOffset = malloc(sizeof(uint64_t)*(node->nChildren));
if(!node->dataOffset) goto error;
if(node->isLeaf) {
node->x.size = malloc(node->nChildren * sizeof(uint64_t));
if(!node->x.size) goto error;
} else {
node->x.child = calloc(node->nChildren, sizeof(struct bwRTreeNode_t *));
if(!node->x.child) goto error;
}
for(i=0; i<node->nChildren; i++) {
if(bwRead(&(node->chrIdxStart[i]), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->baseStart[i]), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->chrIdxEnd[i]), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->baseEnd[i]), sizeof(uint32_t), 1, fp) != 1) goto error;
if(bwRead(&(node->dataOffset[i]), sizeof(uint64_t), 1, fp) != 1) goto error;
if(node->isLeaf) {
if(bwRead(&(node->x.size[i]), sizeof(uint64_t), 1, fp) != 1) goto error;
}
}
return node;
error:
if(node->chrIdxStart) free(node->chrIdxStart);
if(node->baseStart) free(node->baseStart);
if(node->chrIdxEnd) free(node->chrIdxEnd);
if(node->baseEnd) free(node->baseEnd);
if(node->dataOffset) free(node->dataOffset);
if(node->isLeaf && node->x.size) free(node->x.size);
else if((!node->isLeaf) && node->x.child) free(node->x.child);
free(node);
return NULL;
}
void destroyBWOverlapBlock(bwOverlapBlock_t *b) {
if(!b) return;
if(b->size) free(b->size);
if(b->offset) free(b->offset);
free(b);
}
//Returns a bwOverlapBlock_t * object or NULL on error.
static bwOverlapBlock_t *overlapsLeaf(bwRTreeNode_t *node, uint32_t tid, uint32_t start, uint32_t end) {
uint16_t i, idx = 0;
bwOverlapBlock_t *o = calloc(1, sizeof(bwOverlapBlock_t));
if(!o) return NULL;
for(i=0; i<node->nChildren; i++) {
if(tid < node->chrIdxStart[i] || tid > node->chrIdxEnd[i]) continue;
/*
A block can theoretically span multiple contigs. Only the first and
last contig in its range need their base positions checked; any
contig strictly between them is a guaranteed match.
*/
if(node->chrIdxStart[i] != node->chrIdxEnd[i]) {
if(tid == node->chrIdxStart[i]) {
if(node->baseStart[i] >= end) break;
} else if(tid == node->chrIdxEnd[i]) {
if(node->baseEnd[i] <= start) continue;
}
} else {
if(node->baseStart[i] >= end || node->baseEnd[i] <= start) continue;
}
o->n++;
}
if(o->n) {
o->offset = malloc(sizeof(uint64_t) * (o->n));
if(!o->offset) goto error;
o->size = malloc(sizeof(uint64_t) * (o->n));
if(!o->size) goto error;
for(i=0; i<node->nChildren; i++) {
if(tid < node->chrIdxStart[i] || tid > node->chrIdxEnd[i]) continue;
if(node->chrIdxStart[i] != node->chrIdxEnd[i]) {
if(tid == node->chrIdxStart[i]) {
if(node->baseStart[i] >= end) continue;
} else if(tid == node->chrIdxEnd[i]) {
if(node->baseEnd[i] <= start) continue;
}
} else {
if(node->baseStart[i] >= end || node->baseEnd[i] <= start) continue;
}
o->offset[idx] = node->dataOffset[i];
o->size[idx++] = node->x.size[i];
if(idx >= o->n) break;
}
}
if(idx != o->n) { //This should never happen
fprintf(stderr, "[overlapsLeaf] Mismatch between number of overlaps calculated and found!\n");
goto error;
}
return o;
error:
if(o) destroyBWOverlapBlock(o);
return NULL;
}
//This will free b2 unless there's an error!
//Returns NULL on error, otherwise the merged lists
static bwOverlapBlock_t *mergeOverlapBlocks(bwOverlapBlock_t *b1, bwOverlapBlock_t *b2) {
uint64_t i,j;
if(!b2) return b1;
if(!b2->n) {
destroyBWOverlapBlock(b2);
return b1;
}
if(!b1->n) {
destroyBWOverlapBlock(b1);
return b2;
}
j = b1->n;
b1->n += b2->n;
//b1->n already includes b2->n here, so don't add it a second time
b1->offset = realloc(b1->offset, sizeof(uint64_t) * b1->n);
if(!b1->offset) goto error;
b1->size = realloc(b1->size, sizeof(uint64_t) * b1->n);
if(!b1->size) goto error;
for(i=0; i<b2->n; i++) {
b1->offset[j+i] = b2->offset[i];
b1->size[j+i] = b2->size[i];
}
destroyBWOverlapBlock(b2);
return b1;
error:
destroyBWOverlapBlock(b1);
return NULL;
}
//Returns NULL on error, otherwise the blocks overlapping the region (n may be 0)
//The output must be destroyed with destroyBWOverlapBlock()
static bwOverlapBlock_t *overlapsNonLeaf(bigWigFile_t *fp, bwRTreeNode_t *node, uint32_t tid, uint32_t start, uint32_t end) {
uint16_t i;
bwOverlapBlock_t *nodeBlocks, *output = calloc(1, sizeof(bwOverlapBlock_t));
if(!output) return NULL;
for(i=0; i<node->nChildren; i++) {
if(tid < node->chrIdxStart[i]) break; //This relies on the children being ordered by contig
if(tid > node->chrIdxEnd[i]) continue;
if(node->chrIdxStart[i] != node->chrIdxEnd[i]) { //child spans contigs
if(tid == node->chrIdxStart[i]) {
if(node->baseStart[i] >= end) continue;
} else if(tid == node->chrIdxEnd[i]) {
if(node->baseEnd[i] <= start) continue;
}
} else {
if(end <= node->baseStart[i] || start >= node->baseEnd[i]) continue;
}
//We have an overlap!
if(!node->x.child[i])
node->x.child[i] = bwGetRTreeNode(fp, node->dataOffset[i]);
if(!node->x.child[i]) goto error;
if(node->x.child[i]->isLeaf) { //leaf
nodeBlocks = overlapsLeaf(node->x.child[i], tid, start, end);
} else { //non-leaf
nodeBlocks = overlapsNonLeaf(fp, node->x.child[i], tid, start, end);
}
//The output is processed the same regardless of leaf/non-leaf
if(!nodeBlocks) goto error;
else {
output = mergeOverlapBlocks(output, nodeBlocks);
if(!output) {
destroyBWOverlapBlock(nodeBlocks);
goto error;
}
}
}
return output;
error:
destroyBWOverlapBlock(output);
return NULL;
}
//Returns NULL on error, otherwise the blocks overlapping the region (n may be 0)
//The output must be destroyed with destroyBWOverlapBlock()
bwOverlapBlock_t *walkRTreeNodes(bigWigFile_t *bw, bwRTreeNode_t *root, uint32_t tid, uint32_t start, uint32_t end) {
if(root->isLeaf) return overlapsLeaf(root, tid, start, end);
return overlapsNonLeaf(bw, root, tid, start, end);
}
//In reality, a hash or some sort of tree structure is probably faster...
//Return -1 (AKA 0xFFFFFFFF...) on "not there", so we can hold (2^32)-1 items.
uint32_t bwGetTid(const bigWigFile_t *fp, const char *chrom) {
uint32_t i;
if(!chrom) return -1;
for(i=0; i<fp->cl->nKeys; i++) {
if(strcmp(chrom, fp->cl->chrom[i]) == 0) return i;
}
return -1;
}
static bwOverlapBlock_t *bwGetOverlappingBlocks(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end) {
uint32_t tid = bwGetTid(fp, chrom);
if(tid == (uint32_t) -1) {
fprintf(stderr, "[bwGetOverlappingBlocks] Non-existent contig: %s\n", chrom);
return NULL;
}
//Get the info if needed
if(!fp->idx) {
fp->idx = readRTreeIdx(fp, fp->hdr->indexOffset);
if(!fp->idx) {
return NULL;
}
}
if(!fp->idx->root) fp->idx->root = bwGetRTreeNode(fp, 0);
if(!fp->idx->root) return NULL;
return walkRTreeNodes(fp, fp->idx->root, tid, start, end);
}
void bwFillDataHdr(bwDataHeader_t *hdr, void *b) {
hdr->tid = ((uint32_t*)b)[0];
hdr->start = ((uint32_t*)b)[1];
hdr->end = ((uint32_t*)b)[2];
hdr->step = ((uint32_t*)b)[3];
hdr->span = ((uint32_t*)b)[4];
hdr->type = ((uint8_t*)b)[20];
hdr->nItems = ((uint16_t*)b)[11];
}
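//For reference, the indexing in bwFillDataHdr() implies the following layout
//for the 24-byte header at the start of each data block (byte offsets):
//     0-3   tid     uint32_t   ((uint32_t*)b)[0]
//     4-7   start   uint32_t   ((uint32_t*)b)[1]
//     8-11  end     uint32_t   ((uint32_t*)b)[2]
//    12-15  step    uint32_t   ((uint32_t*)b)[3]
//    16-19  span    uint32_t   ((uint32_t*)b)[4]
//    20     type    uint8_t    ((uint8_t*)b)[20]
//    21     (not read here)
//    22-23  nItems  uint16_t   ((uint16_t*)b)[11]
//The values begin at byte 24, which is why bwGetOverlappingIntervalsCore()
//advances its uint32_t pointer by 6 before reading them.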
void bwDestroyOverlappingIntervals(bwOverlappingIntervals_t *o) {
if(!o) return;
if(o->start) free(o->start);
if(o->end) free(o->end);
if(o->value) free(o->value);
free(o);
}
void bbDestroyOverlappingEntries(bbOverlappingEntries_t *o) {
uint32_t i;
if(!o) return;
if(o->start) free(o->start);
if(o->end) free(o->end);
if(o->str) {
for(i=0; i<o->l; i++) {
if(o->str[i]) free(o->str[i]);
}
free(o->str);
}
free(o);
}
//Returns NULL on error, in which case o has been free()d
static bwOverlappingIntervals_t *pushIntervals(bwOverlappingIntervals_t *o, uint32_t start, uint32_t end, float value) {
if(o->l+1 >= o->m) {
o->m = roundup(o->l+1);
o->start = realloc(o->start, o->m * sizeof(uint32_t));
if(!o->start) goto error;
o->end = realloc(o->end, o->m * sizeof(uint32_t));
if(!o->end) goto error;
o->value = realloc(o->value, o->m * sizeof(float));
if(!o->value) goto error;
}
o->start[o->l] = start;
o->end[o->l] = end;
o->value[o->l++] = value;
return o;
error:
bwDestroyOverlappingIntervals(o);
return NULL;
}
static bbOverlappingEntries_t *pushBBIntervals(bbOverlappingEntries_t *o, uint32_t start, uint32_t end, char *str, int withString) {
if(o->l+1 >= o->m) {
o->m = roundup(o->l+1);
o->start = realloc(o->start, o->m * sizeof(uint32_t));
if(!o->start) goto error;
o->end = realloc(o->end, o->m * sizeof(uint32_t));
if(!o->end) goto error;
if(withString) {
o->str = realloc(o->str, o->m * sizeof(char*));
if(!o->str) goto error;
}
}
o->start[o->l] = start;
o->end[o->l] = end;
if(withString) {
o->str[o->l] = bwStrdup(str);
if(!o->str[o->l]) goto error;
}
o->l++;
return o;
error:
bbDestroyOverlappingEntries(o);
return NULL;
}
//Returns NULL on error
bwOverlappingIntervals_t *bwGetOverlappingIntervalsCore(bigWigFile_t *fp, bwOverlapBlock_t *o, uint32_t tid, uint32_t ostart, uint32_t oend) {
uint64_t i;
uint16_t j;
int compressed = 0, rv;
uLongf sz = fp->hdr->bufSize, tmp;
void *buf = NULL, *compBuf = NULL;
uint32_t start = 0, end , *p;
float value;
bwDataHeader_t hdr;
bwOverlappingIntervals_t *output = calloc(1, sizeof(bwOverlappingIntervals_t));
if(!output) goto error;
if(!o) return output;
if(!o->n) return output;
if(sz) {
compressed = 1;
buf = malloc(sz);
if(!buf) goto error;
}
sz = 0; //This is now the size of the compressed buffer
for(i=0; i<o->n; i++) {
if(bwSetPos(fp, o->offset[i])) goto error;
if(sz < o->size[i]) {
compBuf = realloc(compBuf, o->size[i]);
sz = o->size[i];
}
if(!compBuf) goto error;
if(bwRead(compBuf, o->size[i], 1, fp) != 1) goto error;
if(compressed) {
tmp = fp->hdr->bufSize; //This gets over-written by uncompress
rv = uncompress(buf, (uLongf *) &tmp, compBuf, o->size[i]);
if(rv != Z_OK) goto error;
} else {
buf = compBuf;
}
//TODO: ensure that tmp is large enough!
bwFillDataHdr(&hdr, buf);
p = ((uint32_t*) buf);
p += 6;
if(hdr.tid != tid) continue;
if(hdr.type == 3) start = hdr.start - hdr.step;
//FIXME: We should ensure that sz is large enough to hold nItems of the given type
for(j=0; j<hdr.nItems; j++) {
switch(hdr.type) {
case 1:
start = *p;
p++;
end = *p;
p++;
value = *((float *)p);
p++;
break;
case 2:
start = *p;
p++;
end = start + hdr.span;
value = *((float *)p);
p++;
break;
case 3:
start += hdr.step;
end = start+hdr.span;
value = *((float *)p);
p++;
break;
default :
goto error;
break;
}
if(end <= ostart || start >= oend) continue;
//Push the overlap
if(!pushIntervals(output, start, end, value)) {
output = NULL; //pushIntervals() has already destroyed it
goto error;
}
}
}
if(compressed && buf) free(buf);
if(compBuf) free(compBuf);
return output;
error:
fprintf(stderr, "[bwGetOverlappingIntervalsCore] Got an error\n");
if(output) bwDestroyOverlappingIntervals(output);
if(compressed && buf) free(buf);
if(compBuf) free(compBuf);
return NULL;
}
bbOverlappingEntries_t *bbGetOverlappingEntriesCore(bigWigFile_t *fp, bwOverlapBlock_t *o, uint32_t tid, uint32_t ostart, uint32_t oend, int withString) {
uint64_t i;
int compressed = 0, rv, slen;
uLongf sz = fp->hdr->bufSize, tmp = 0;
void *buf = NULL, *compBuf = NULL;
char *cur = NULL, *bufEnd = NULL; //cur walks the entries, so buf always points to the start of its allocation
uint32_t entryTid = 0, start = 0, end;
char *str;
bbOverlappingEntries_t *output = calloc(1, sizeof(bbOverlappingEntries_t));
if(!output) goto error;
if(!o) return output;
if(!o->n) return output;
if(sz) {
compressed = 1;
buf = malloc(sz);
if(!buf) goto error;
}
sz = 0; //This is now the size of the compressed buffer
for(i=0; i<o->n; i++) {
if(bwSetPos(fp, o->offset[i])) goto error;
if(sz < o->size[i]) {
compBuf = realloc(compBuf, o->size[i]);
sz = o->size[i];
}
if(!compBuf) goto error;
if(bwRead(compBuf, o->size[i], 1, fp) != 1) goto error;
if(compressed) {
tmp = fp->hdr->bufSize; //This gets over-written by uncompress
rv = uncompress(buf, (uLongf *) &tmp, compBuf, o->size[i]);
if(rv != Z_OK) goto error;
} else {
buf = compBuf;
tmp = o->size[i]; //TODO: Is this correct? Do non-gzipped bigBeds exist?
}
cur = (char*)buf;
bufEnd = cur + tmp;
while(cur < bufEnd) {
//Entries follow variable-length strings, so copy rather than cast to avoid unaligned reads
memcpy(&entryTid, cur, sizeof(uint32_t));
memcpy(&start, cur + 4, sizeof(uint32_t));
memcpy(&end, cur + 8, sizeof(uint32_t));
cur += 12;
str = cur;
slen = strlen(str) + 1;
cur += slen;
if(entryTid < tid) continue;
if(entryTid > tid) break;
if(end <= ostart) continue;
if(start >= oend) break;
//Push the overlap
if(!pushBBIntervals(output, start, end, str, withString)) {
output = NULL; //pushBBIntervals() has already destroyed it
goto error;
}
}
}
if(compressed && buf) free(buf);
if(compBuf) free(compBuf);
return output;
error:
fprintf(stderr, "[bbGetOverlappingEntriesCore] Got an error\n");
if(output) bbDestroyOverlappingEntries(output);
if(compressed && buf) free(buf);
if(compBuf) free(compBuf);
return NULL;
}
//Returns NULL on error OR no intervals, which is a bad design...
bwOverlappingIntervals_t *bwGetOverlappingIntervals(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end) {
bwOverlappingIntervals_t *output;
uint32_t tid = bwGetTid(fp, chrom);
if(tid == (uint32_t) -1) return NULL;
bwOverlapBlock_t *blocks = bwGetOverlappingBlocks(fp, chrom, start, end);
if(!blocks) return NULL;
output = bwGetOverlappingIntervalsCore(fp, blocks, tid, start, end);
destroyBWOverlapBlock(blocks);
return output;
}
//Like above, but for bigBed files
bbOverlappingEntries_t *bbGetOverlappingEntries(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, int withString) {
bbOverlappingEntries_t *output;
uint32_t tid = bwGetTid(fp, chrom);
if(tid == (uint32_t) -1) return NULL;
bwOverlapBlock_t *blocks = bwGetOverlappingBlocks(fp, chrom, start, end);
if(!blocks) return NULL;
output = bbGetOverlappingEntriesCore(fp, blocks, tid, start, end, withString);
destroyBWOverlapBlock(blocks);
return output;
}
//Returns NULL on error
bwOverlapIterator_t *bwOverlappingIntervalsIterator(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, uint32_t blocksPerIteration) {
bwOverlapIterator_t *output = NULL;
uint64_t n;
uint32_t tid = bwGetTid(fp, chrom);
if(tid == (uint32_t) -1) return output;
output = calloc(1, sizeof(bwOverlapIterator_t));
if(!output) return output;
bwOverlapBlock_t *blocks = bwGetOverlappingBlocks(fp, chrom, start, end);
output->bw = fp;
output->tid = tid;
output->start = start;
output->end = end;
output->blocks = blocks;
output->blocksPerIteration = blocksPerIteration;
if(blocks) {
n = blocks->n;
if(n>blocksPerIteration) blocks->n = blocksPerIteration;
output->intervals = bwGetOverlappingIntervalsCore(fp, blocks,tid, start, end);
blocks->n = n;
output->offset = blocksPerIteration;
}
output->data = output->intervals;
return output;
}
//Returns NULL on error
bwOverlapIterator_t *bbOverlappingEntriesIterator(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, int withString, uint32_t blocksPerIteration) {
bwOverlapIterator_t *output = NULL;
uint64_t n;
uint32_t tid = bwGetTid(fp, chrom);
if(tid == (uint32_t) -1) return output;
output = calloc(1, sizeof(bwOverlapIterator_t));
if(!output) return output;
bwOverlapBlock_t *blocks = bwGetOverlappingBlocks(fp, chrom, start, end);
output->bw = fp;
output->tid = tid;
output->start = start;
output->end = end;
output->blocks = blocks;
output->blocksPerIteration = blocksPerIteration;
output->withString = withString;
if(blocks) {
n = blocks->n;
if(n>blocksPerIteration) blocks->n = blocksPerIteration;
output->entries = bbGetOverlappingEntriesCore(fp, blocks,tid, start, end, withString);
blocks->n = n;
output->offset = blocksPerIteration;
}
output->data = output->entries;
return output;
}
void bwIteratorDestroy(bwOverlapIterator_t *iter) {
if(!iter) return;
if(iter->blocks) destroyBWOverlapBlock((bwOverlapBlock_t*) iter->blocks);
if(iter->intervals) bwDestroyOverlappingIntervals(iter->intervals);
if(iter->entries) bbDestroyOverlappingEntries(iter->entries);
free(iter);
}
//On error, returns NULL and destroys the input
bwOverlapIterator_t *bwIteratorNext(bwOverlapIterator_t *iter) {
uint64_t n, *offset, *size;
bwOverlapBlock_t *blocks = iter->blocks;
if(iter->intervals) {
bwDestroyOverlappingIntervals(iter->intervals);
iter->intervals = NULL;
}
if(iter->entries) {
bbDestroyOverlappingEntries(iter->entries);
iter->entries = NULL;
}
iter->data = NULL;
if(blocks && iter->offset < blocks->n) {
//store the previous values
n = blocks->n;
offset = blocks->offset;
size = blocks->size;
//Move the start of the blocks
blocks->offset += iter->offset;
blocks->size += iter->offset;
if(iter->offset + iter->blocksPerIteration > n) {
blocks->n = blocks->n - iter->offset;
} else {
blocks->n = iter->blocksPerIteration;
}
//Get the intervals or entries, as appropriate
if(iter->bw->type == 0) {
//bigWig
iter->intervals = bwGetOverlappingIntervalsCore(iter->bw, blocks, iter->tid, iter->start, iter->end);
iter->data = iter->intervals;
} else {
//bigBed
iter->entries = bbGetOverlappingEntriesCore(iter->bw, blocks, iter->tid, iter->start, iter->end, iter->withString);
iter->data = iter->entries;
}
iter->offset += iter->blocksPerIteration;
//reset the values in iter->blocks
blocks->n = n;
blocks->offset = offset;
blocks->size = size;
//Check for error
if(!iter->intervals && !iter->entries) goto error;
}
return iter;
error:
bwIteratorDestroy(iter);
return NULL;
}
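//A minimal sketch of how the iterator functions above fit together, assuming
//fp is an already opened bigWig file and "chr1" is a contig in it (both are
//placeholders for illustration):
//
//    bwOverlapIterator_t *iter = bwOverlappingIntervalsIterator(fp, "chr1", 0, 1000000, 10);
//    while(iter && iter->data) {
//        for(uint32_t i=0; i<iter->intervals->l; i++) {
//            //use iter->intervals->start[i], ->end[i] and ->value[i]
//        }
//        iter = bwIteratorNext(iter); //NULL on error, in which case iter has already been destroyed
//    }
//    bwIteratorDestroy(iter); //does nothing if iter is NULL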
//This is like bwGetOverlappingIntervals, except it returns 1 base windows. If includeNA is not 0, then a value will be returned for every position in the range (defaulting to NAN).
//The ->end member is NULL
//If includeNA is not 0 then ->start is also NULL, since it's implied
//Note that bwDestroyOverlappingIntervals() will work in either case
bwOverlappingIntervals_t *bwGetValues(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t end, int includeNA) {
uint32_t i, j, n;
bwOverlappingIntervals_t *output = NULL;
bwOverlappingIntervals_t *intermediate = bwGetOverlappingIntervals(fp, chrom, start, end);
if(!intermediate) return NULL;
output = calloc(1, sizeof(bwOverlappingIntervals_t));
if(!output) goto error;
if(includeNA) {
output->l = end-start;
output->value = malloc(output->l*sizeof(float));
if(!output->value) goto error;
for(i=0; i<output->l; i++) output->value[i] = NAN;
for(i=0; i<intermediate->l; i++) {
for(j=intermediate->start[i]; j<intermediate->end[i]; j++) {
if(j < start || j >= end) continue;
output->value[j-start] = intermediate->value[i];
}
}
} else {
n = 0;
for(i=0; i<intermediate->l; i++) {
if(intermediate->start[i] < start) intermediate->start[i] = start;
if(intermediate->end[i] > end) intermediate->end[i] = end;
n += intermediate->end[i]-intermediate->start[i];
}
output->l = n;
output->start = malloc(sizeof(uint32_t)*n);
if(!output->start) goto error;
output->value = malloc(sizeof(float)*n);
if(!output->value) goto error;
n = 0; //this is now the index
for(i=0; i<intermediate->l; i++) {
for(j=intermediate->start[i]; j<intermediate->end[i]; j++) {
if(j < start || j >= end) continue;
output->start[n] = j;
output->value[n++] = intermediate->value[i];
}
}
}
bwDestroyOverlappingIntervals(intermediate);
return output;
error:
if(intermediate) bwDestroyOverlappingIntervals(intermediate);
if(output) bwDestroyOverlappingIntervals(output);
return NULL;
}
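//A short usage sketch for bwGetValues(), assuming fp is an already opened
//bigWig file and "chr1" is one of its contigs (both are placeholders):
//
//    bwOverlappingIntervals_t *vals = bwGetValues(fp, "chr1", 100, 110, 1);
//    if(vals) {
//        //With includeNA != 0 there are end-start entries and vals->value[i]
//        //belongs to position 100+i (NAN where the file has no value)
//        for(uint32_t i=0; i<vals->l; i++) printf("%"PRIu32"\t%f\n", 100+i, vals->value[i]);
//        bwDestroyOverlappingIntervals(vals);
//    }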
void bwDestroyIndexNode(bwRTreeNode_t *node) {
uint16_t i;
if(!node) return;
free(node->chrIdxStart);
free(node->baseStart);
free(node->chrIdxEnd);
free(node->baseEnd);
free(node->dataOffset);
if(!node->isLeaf) {
for(i=0; i<node->nChildren; i++) {
bwDestroyIndexNode(node->x.child[i]);
}
free(node->x.child);
} else {
free(node->x.size);
}
free(node);
}
void bwDestroyIndex(bwRTree_t *idx) {
if(!idx) return;
bwDestroyIndexNode(idx->root);
free(idx);
}
//Returns a pointer to the requested index (at offset, unless it's 0, in which case the index for the values is returned)
//Returns NULL on error
bwRTree_t *bwReadIndex(bigWigFile_t *fp, uint64_t offset) {
bwRTree_t *idx = readRTreeIdx(fp, offset);
if(!idx) return NULL;
//Read in the root node
idx->root = bwGetRTreeNode(fp, idx->rootOffset);
if(!idx->root) {
bwDestroyIndex(idx);
return NULL;
}
return idx;
}
================================================
FILE: libBigWig/bwValues.h
================================================
#ifndef LIBBIGWIG_VALUES_H
#define LIBBIGWIG_VALUES_H
#include <inttypes.h>
/*! \file bwValues.h
*
* You should not directly use functions and structures defined here. They're really meant for internal use only.
*
* All of the structures here need to be destroyed or you'll leak memory! There are methods available to destroy anything that you need to take care of yourself.
*/
//N.B., coordinates are still 0-based half open!
/*!
* @brief A node within an R-tree holding the index for data.
*
* Note that there are two types of nodes: leaf and twig. Leaf nodes point to where data actually is. Twig nodes point to additional index nodes, which may or may not be leaves. Each of these nodes has additional children, which may span multiple chromosomes/contigs.
*
* With the start/end position, these positions refer specifically to the chromosomes specified in chrIdxStart/chrIdxEnd. Any chromosomes between these are completely spanned by a given child.
*/
typedef struct bwRTreeNode_t {
uint8_t isLeaf; /**<Is this node a leaf?*/
//1 byte of padding
uint16_t nChildren; /**<The number of children of this node, all lists have this length.*/
uint32_t *chrIdxStart; /**<A list of the starting chromosome indices of each child.*/
uint32_t *baseStart; /**<A list of the start position of each child.*/
uint32_t *chrIdxEnd; /**<A list of the end chromosome indices of each child.*/
uint32_t *baseEnd; /**<A list of the end position of each child.*/
uint64_t *dataOffset; /**<For leaves, the offset to the on-disk data. For twigs, the offset to the child node.*/
union {
uint64_t *size; /**<Leaves only: The size of the data block.*/
struct bwRTreeNode_t **child; /**<Twigs only: The child node(s).*/
} x; /**<A union holding either size or child*/
} bwRTreeNode_t;
/*!
* A header and index that points to an R-tree that in turn points to data blocks.
*/
//TODO rootOffset is pointless, it's 48bytes after the indexOffset
typedef struct {
uint32_t blockSize; /**<The maximum number of children a node can have*/
uint64_t nItems; /**<The total number of data blocks pointed to by the tree. This is completely redundant.*/
uint32_t chrIdxStart; /**<The index to the first chromosome described.*/
uint32_t baseStart; /**<The first position on chrIdxStart with a value.*/
uint32_t chrIdxEnd; /**<The index of the last chromosome with an entry.*/
uint32_t baseEnd; /**<The last position on chrIdxEnd with an entry.*/
uint64_t idxSize; /**<This is actually the offset of the index rather than the size?!? Yes, it's completely redundant.*/
uint32_t nItemsPerSlot; /**<This is always 1!*/
//There's 4 bytes of padding in the file here
uint64_t rootOffset; /**<The offset to the root node of the R-Tree (on disk). Yes, this is redundant.*/
bwRTreeNode_t *root; /**<A pointer to the root node.*/
} bwRTree_t;
/*!
* @brief This structure holds the data blocks that overlap a given interval.
*/
typedef struct {
uint64_t n; /**<The number of blocks that overlap. This *MAY* be 0!.*/
uint64_t *offset; /**<The offset to the on-disk position of the block.*/
uint64_t *size; /**<The size of each block on disk (in bytes).*/
} bwOverlapBlock_t;
/*!
* @brief The header section of a given data block.
*
* There are 3 types of data blocks in bigWig files, each with slightly different needs. This is all taken care of internally.
*/
typedef struct {
uint32_t tid; /**<The chromosome ID.*/
uint32_t start; /**<The start position of a block*/
uint32_t end; /**<The end position of a block*/
uint32_t step; /**<The step size of the values*/
uint32_t span; /**<The span of each data value*/
uint8_t type; /**<The block type: 1, bedGraph; 2, variable step; 3, fixed step.*/
uint16_t nItems; /**<The number of values in a given block.*/
} bwDataHeader_t;
#endif // LIBBIGWIG_VALUES_H
================================================
FILE: libBigWig/bwWrite.c
================================================
#include <limits.h>
#include <float.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "bigWig.h"
#include "bwCommon.h"
/// @cond SKIP
struct val_t {
uint32_t tid;
uint32_t start;
uint32_t nBases;
float min, max, sum, sumsq;
double scalar;
struct val_t *next;
};
/// @endcond
//Create a chromList_t and attach it to a bigWigFile_t *. Returns NULL on error
//Note that chroms and lengths are duplicated, so you MUST free the input
chromList_t *bwCreateChromList(const char* const* chroms, const uint32_t *lengths, int64_t n) {
int64_t i = 0;
chromList_t *cl = calloc(1, sizeof(chromList_t));
if(!cl) return NULL;
cl->nKeys = n;
cl->chrom = malloc(sizeof(char*)*n);
cl->len = malloc(sizeof(uint32_t)*n);
if(!cl->chrom) goto error;
if(!cl->len) goto error;
for(i=0; i<n; i++) {
cl->len[i] = lengths[i];
cl->chrom[i] = bwStrdup(chroms[i]);
if(!cl->chrom[i]) goto error;
}
return cl;
error:
if(i) {
int64_t j;
for(j=0; j<i; j++) free(cl->chrom[j]);
}
if(cl) {
if(cl->chrom) free(cl->chrom);
if(cl->len) free(cl->len);
free(cl);
}
return NULL;
}
//If maxZooms == 0, then 0 is used (i.e., there are no zoom levels). If maxZooms < 0 or > 65535 then 10 is used.
//TODO allow changing bufSize and blockSize
int bwCreateHdr(bigWigFile_t *fp, int32_t maxZooms) {
if(!fp->isWrite) return 1;
bigWigHdr_t *hdr = calloc(1, sizeof(bigWigHdr_t));
if(!hdr) return 2;
hdr->version = 4;
if(maxZooms < 0 || maxZooms > 65535) {
hdr->nLevels = 10;
} else {
hdr->nLevels = maxZooms;
}
hdr->bufSize = 32768; //When the file is finalized this is reset if fp->writeBuffer->compressPsz is 0!
hdr->minVal = DBL_MAX;
hdr->maxVal = -DBL_MAX; //DBL_MIN is the smallest positive double, not the most negative one
fp->hdr = hdr;
fp->writeBuffer->blockSize = 64;
//Allocate the writeBuffer buffers
fp->writeBuffer->compressPsz = compressBound(hdr->bufSize);
fp->writeBuffer->compressP = malloc(fp->writeBuffer->compressPsz);
if(!fp->writeBuffer->compressP) return 3;
fp->writeBuffer->p = calloc(1,hdr->bufSize);
if(!fp->writeBuffer->p) return 4;
return 0;
}
//return 0 on success
static int writeAtPos(void *ptr, size_t sz, size_t nmemb, size_t pos, FILE *fp) {
size_t curpos = ftell(fp);
if(fseek(fp, pos, SEEK_SET)) return 1;
if(fwrite(ptr, sz, nmemb, fp) != nmemb) return 2;
if(fseek(fp, curpos, SEEK_SET)) return 3;
return 0;
}
//We lose keySize bytes on error
static int writeChromList(FILE *fp, chromList_t *cl) {
uint16_t k;
uint32_t j, magic = CIRTREE_MAGIC;
uint32_t nperblock = (cl->nKeys > 0x7FFF) ? 0x7FFF : cl->nKeys; //Items per leaf/non-leaf, there are no unsigned ints in java :(
uint32_t nblocks, keySize = 0, valSize = 8; //In theory valSize could be optimized, in practice that'd be annoying
uint64_t i, nonLeafEnd, leafSize, nextLeaf;
uint8_t eight;
int64_t i64;
char *chrom;
size_t l;
if(cl->nKeys > 1073676289) {
fprintf(stderr, "[writeChromList] Error: Currently only 1,073,676,289 contigs are supported. If you really need more then please post a request on github.\n");
return 1;
}
nblocks = cl->nKeys/nperblock;
nblocks += ((cl->nKeys % nperblock) > 0)?1:0;
for(i64=0; i64<cl->nKeys; i64++) {
l = strlen(cl->chrom[i64]);
if(l>keySize) keySize = l;
}
//Keys are written padded to keySize bytes, without a NUL terminator
chrom = calloc(keySize, sizeof(char));
if(!chrom) return 1;
//Write the root node of a largely pointless tree
if(fwrite(&magic, sizeof(uint32_t), 1, fp) != 1) return 1;
if(fwrite(&nperblock, sizeof(uint32_t), 1, fp) != 1) return 2;
if(fwrite(&keySize, sizeof(uint32_t), 1, fp) != 1) return 3;
if(fwrite(&valSize, sizeof(uint32_t), 1, fp) != 1) return 4;
if(fwrite(&(cl->nKeys), sizeof(uint64_t), 1, fp) != 1) return 5;
//Padding?
i=0;
if(fwrite(&i, sizeof(uint64_t), 1, fp) != 1) return 6;
//Do we need a non-leaf node?
if(nblocks > 1) {
eight = 0;
if(fwrite(&eight, sizeof(uint8_t), 1, fp) != 1) return 7;
if(fwrite(&eight, sizeof(uint8_t), 1, fp) != 1) return 8; //padding
if(fwrite(&nblocks, sizeof(uint16_t), 1, fp) != 1) return 8;
nonLeafEnd = ftell(fp) + nperblock * (keySize + 8);
leafSize = nperblock * (keySize + 8) + 4;
for(i=0; i<nblocks; i++) { //Why yes, this is pointless
chrom = strncpy(chrom, cl->chrom[i * nperblock], keySize);
nextLeaf = nonLeafEnd + i * leafSize;
if(fwrite(chrom, keySize, 1, fp) != 1) return 9;
if(fwrite(&nextLeaf, sizeof(uint64_t), 1, fp) != 1) return 10;
}
for(i=0; i<keySize; i++) chrom[i] = '\0';
nextLeaf = 0;
for(i=nblocks; i<nperblock; i++) {
if(fwrite(chrom, keySize, 1, fp) != 1) return 9;
if(fwrite(&nextLeaf, sizeof(uint64_t), 1, fp) != 1) return 10;
}
}
//Write the leaves
nextLeaf = 0;
for(i=0, j=0; i<nblocks; i++) {
eight = 1;
if(fwrite(&eight, sizeof(uint8_t), 1, fp) != 1) return 11;
eight = 0;
if(fwrite(&eight, sizeof(uint8_t), 1, fp) != 1) return 12;
if(cl->nKeys - j < nperblock) {
k = cl->nKeys - j;
if(fwrite(&k, sizeof(uint16_t), 1, fp) != 1) return 13;
} else {
if(fwrite(&nperblock, sizeof(uint16_t), 1, fp) != 1) return 13;
}
for(k=0; k<nperblock; k++) {
if(j>=cl->nKeys) {
if(chrom[0]) {
for(l=0; l<keySize; l++) chrom[l] = '\0';
}
if(fwrite(chrom, keySize, 1, fp) != 1) return 15;
if(fwrite(&nextLeaf, sizeof(uint64_t), 1, fp) != 1) return 16;
} else {
chrom = strncpy(chrom, cl->chrom[j], keySize);
if(fwrite(chrom, keySize, 1, fp) != 1) return 15;
if(fwrite(&j, sizeof(uint32_t), 1, fp) != 1) return 16;
if(fwrite(&(cl->len[j++]), sizeof(uint32_t), 1, fp) != 1) return 17;
}
}
}
free(chrom);
return 0;
}
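/*
 * A worked example of the block arithmetic in writeChromList(), using a
 * hypothetical contig count (an assumption for illustration, not taken from
 * any real file):
 *   nKeys     = 70000
 *   nperblock = 0x7FFF = 32767   (since nKeys > 0x7FFF)
 *   nblocks   = 70000/32767 = 2, plus 1 because 70000 % 32767 = 4466 > 0
 *             = 3
 * Since nblocks > 1, a non-leaf node is written first, followed by 3 leaves.
 * Every leaf is padded out to nperblock entries, so the last leaf holds 4466
 * real entries followed by zero-filled padding.
 */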
//returns 0 on success
//Still need to fill in indexOffset
int bwWriteHdr(bigWigFile_t *bw) {
uint32_t magic = BIGWIG_MAGIC;
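uint16_t two = 4; //The bigWig version (4); the name is historical and the variable is reused as a loop counter below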
FILE *fp;
const uint8_t pbuff[58] = {0}; // 58 bytes of nothing
const void *p = (const void *)&pbuff;
if(!bw->isWrite) return 1;
//The header itself, largely just reserving space...
fp = bw->URL->x.fp;
if(!fp) return 2;
if(fseek(fp, 0, SEEK_SET)) return 3;
if(fwrite(&magic, sizeof(uint32_t), 1, fp) != 1) return 4;
if(fwrite(&two, sizeof(uint16_t), 1, fp) != 1) return 5;
if(fwrite(p, sizeof(uint8_t), 58, fp) != 58) return 6;
//Empty zoom headers
if(bw->hdr->nLevels) {
for(two=0; two<bw->hdr->nLevels; two++) {
if(fwrite(p, sizeof(uint8_t), 24, fp) != 24) return 9;
}
}
//Update summaryOffset and write an empty summary block
bw->hdr->summaryOffset = ftell(fp);
if(fwrite(p, sizeof(uint8_t), 40, fp) != 40) return 10;
if(writeAtPos(&(bw->hdr->summaryOffset), sizeof(uint64_t), 1, 0x2c, fp)) return 11;
//Write the chromosome list as a stupid freaking tree (because let's TREE ALL THE THINGS!!!)
bw->hdr->ctOffset = ftell(fp);
if(writeChromList(fp, bw->cl)) return 7;
if(writeAtPos(&(bw->hdr->ctOffset), sizeof(uint64_t), 1, 0x8, fp)) return 8;
//Update the dataOffset
bw->hdr->dataOffset = ftell(fp);
if(writeAtPos(&bw->hdr->dataOffset, sizeof(uint64_t), 1, 0x10, fp)) return 12;
//Save space for the number of blocks
if(fwrite(p, sizeof(uint8_t), 8, fp) != 8) return 13;
return 0;
}
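/*
 * For orientation, a sketch of the fixed header fields that this file seeks
 * to. It is inferred only from the offsets passed to writeAtPos()/bwSetPos()
 * in this file, so fields that are never touched here are omitted and this
 * should not be read as a full format specification:
 *
 *   0x00  uint32_t  magic          (BIGWIG_MAGIC, bwWriteHdr)
 *   0x04  uint16_t  version        (4, bwWriteHdr)
 *   0x06  uint16_t  nLevels        (writeZoomLevels)
 *   0x08  uint64_t  ctOffset       (bwWriteHdr)
 *   0x10  uint64_t  dataOffset     (bwWriteHdr)
 *   0x18  uint64_t  indexOffset    (writeIndex)
 *   0x2c  uint64_t  summaryOffset  (bwWriteHdr)
 *   0x40  24 bytes per zoom header (writeZoomLevels)
 *
 * The 4+2+58 bytes written at the start of bwWriteHdr() account for the
 * 64 (0x40) bytes that precede the zoom headers.
 */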
static int insertIndexNode(bigWigFile_t *fp, bwRTreeNode_t *leaf) {
bwLL *l = malloc(sizeof(bwLL));
if(!l) return 1;
l->node = leaf;
l->next = NULL;
if(!fp->writeBuffer->firstIndexNode) {
fp->writeBuffer->firstIndexNode = l;
} else {
fp->writeBuffer->currentIndexNode->next = l;
}
fp->writeBuffer->currentIndexNode = l;
return 0;
}
//0 on success
static int appendIndexNodeEntry(bigWigFile_t *fp, uint32_t tid0, uint32_t tid1, uint32_t start, uint32_t end, uint64_t offset, uint64_t size) {
bwLL *n = fp->writeBuffer->currentIndexNode;
if(!n) return 1;
if(n->node->nChildren >= fp->writeBuffer->blockSize) return 2;
n->node->chrIdxStart[n->node->nChildren] = tid0;
n->node->baseStart[n->node->nChildren] = start;
n->node->chrIdxEnd[n->node->nChildren] = tid1;
n->node->baseEnd[n->node->nChildren] = end;
n->node->dataOffset[n->node->nChildren] = offset;
n->node->x.size[n->node->nChildren] = size;
n->node->nChildren++;
return 0;
}
//Returns 0 on success
static int addIndexEntry(bigWigFile_t *fp, uint32_t tid0, uint32_t tid1, uint32_t start, uint32_t end, uint64_t offset, uint64_t size) {
bwRTreeNode_t *node;
if(appendIndexNodeEntry(fp, tid0, tid1, start, end, offset, size)) {
//The last index node is full, we need to add a new one
node = calloc(1, sizeof(bwRTreeNode_t));
if(!node) return 1;
//Allocate and set the fields
node->isLeaf = 1;
node->nChildren = 1;
node->chrIdxStart = malloc(sizeof(uint32_t)*fp->writeBuffer->blockSize);
if(!node->chrIdxStart) goto error;
node->baseStart = malloc(sizeof(uint32_t)*fp->writeBuffer->blockSize);
if(!node->baseStart) goto error;
node->chrIdxEnd = malloc(sizeof(uint32_t)*fp->writeBuffer->blockSize);
if(!node->chrIdxEnd) goto error;
node->baseEnd = malloc(sizeof(uint32_t)*fp->writeBuffer->blockSize);
if(!node->baseEnd) goto error;
node->dataOffset = malloc(sizeof(uint64_t)*fp->writeBuffer->blockSize);
if(!node->dataOffset) goto error;
node->x.size = malloc(sizeof(uint64_t)*fp->writeBuffer->blockSize);
if(!node->x.size) goto error;
node->chrIdxStart[0] = tid0;
node->baseStart[0] = start;
node->chrIdxEnd[0] = tid1;
node->baseEnd[0] = end;
node->dataOffset[0] = offset;
node->x.size[0] = size;
if(insertIndexNode(fp, node)) goto error;
}
return 0;
error:
if(node->chrIdxStart) free(node->chrIdxStart);
if(node->baseStart) free(node->baseStart);
if(node->chrIdxEnd) free(node->chrIdxEnd);
if(node->baseEnd) free(node->baseEnd);
if(node->dataOffset) free(node->dataOffset);
if(node->x.size) free(node->x.size);
free(node); //The node itself was calloc()ed above and is not yet owned by the list
return 2;
}
/*
* TODO:
* The buffer size and compression sz need to be determined elsewhere (and p and compressP filled in!)
*/
static int flushBuffer(bigWigFile_t *fp) {
bwWriteBuffer_t *wb = fp->writeBuffer;
uLongf sz = wb->compressPsz;
uint16_t nItems;
if(!fp->writeBuffer->l) return 0;
if(!wb->ltype) return 0;
//Fill in the header
if(!memcpy((char*)wb->p, &(wb->tid), sizeof(uint32_t))) return 1;
if(!memcpy((char*)wb->p+4, &(wb->start), sizeof(uint32_t))) return 2;
if(!memcpy((char*)wb->p+8, &(wb->end), sizeof(uint32_t))) return 3;
if(!memcpy((char*)wb->p+12, &(wb->step), sizeof(uint32_t))) return 4;
if(!memcpy((char*)wb->p+16, &(wb->span), sizeof(uint32_t))) return 5;
if(!memcpy((char*)wb->p+20, &(wb->ltype), sizeof(uint8_t))) return 6;
//1 byte padding
//Determine the number of items
switch(wb->ltype) {
case 1:
nItems = (wb->l-24)/12;
break;
case 2:
nItems = (wb->l-24)/8;
break;
case 3:
nItems = (wb->l-24)/4;
break;
default:
return 7;
}
if(!memcpy((char*)wb->p+22, &nItems, sizeof(uint16_t))) return 8;
if(sz) {
//compress
if(compress(wb->compressP, &sz, wb->p, wb->l) != Z_OK) return 9;
//write the data to disk
if(fwrite(wb->compressP, sizeof(uint8_t), sz, fp->URL->x.fp) != sz) return 10;
} else {
sz = wb->l;
if(fwrite(wb->p, sizeof(uint8_t), wb->l, fp->URL->x.fp) != wb->l) return 10;
}
//Add an entry into the index
if(addIndexEntry(fp, wb->tid, wb->tid, wb->start, wb->end, bwTell(fp)-sz, sz)) return 11;
wb->nBlocks++;
wb->l = 24;
return 0;
}
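/*
 * A sketch of the data block that flushBuffer() assembles in wb->p, read off
 * the memcpy() offsets above (the example numbers are hypothetical):
 *
 *   +0   uint32_t tid
 *   +4   uint32_t start
 *   +8   uint32_t end
 *   +12  uint32_t step
 *   +16  uint32_t span
 *   +20  uint8_t  ltype   (1: 12 bytes/entry, 2: 8 bytes/entry, 3: 4 bytes/entry)
 *   +21  1 byte of padding
 *   +22  uint16_t nItems
 *   +24  the entries themselves
 *
 * For example, after buffering 3 intervals with bwAddIntervals() (ltype 1),
 * wb->l = 24 + 3*12 = 60 and nItems = (60-24)/12 = 3. After a flush wb->l is
 * reset to 24, i.e., an empty block that still reserves room for the header.
 */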
static void updateStats(bigWigFile_t *fp, uint32_t span, float val) {
if(val < fp->hdr->minVal) fp->hdr->minVal = val;
if(val > fp->hdr->maxVal) fp->hdr->maxVal = val; //Not "else if": the first value may need to update both bounds
fp->hdr->nBasesCovered += span;
fp->hdr->sumData += span*val;
fp->hdr->sumSquared += span*pow(val,2);
fp->writeBuffer->nEntries++;
fp->writeBuffer->runningWidthSum += span;
}
//12 bytes per entry
int bwAddIntervals(bigWigFile_t *fp, const char* const* chrom, const uint32_t *start, const uint32_t *end, const float *values, uint32_t n) {
uint32_t tid = 0, i;
const char *lastChrom = NULL;
bwWriteBuffer_t *wb = fp->writeBuffer;
if(!n) return 0; //Not an error per se
if(!fp->isWrite) return 1;
if(!wb) return 2;
//Flush if needed
if(wb->ltype != 1) if(flushBuffer(fp)) return 3;
if(wb->l+36 > fp->hdr->bufSize) if(flushBuffer(fp)) return 4;
lastChrom = chrom[0];
tid = bwGetTid(fp, chrom[0]);
if(tid == (uint32_t) -1) return 5;
if(tid != wb->tid) {
if(flushBuffer(fp)) return 6;
wb->tid = tid;
wb->start = start[0];
wb->end = end[0];
}
//Ensure that everything is set correctly
wb->ltype = 1;
if(wb->l <= 24) {
wb->start = start[0];
wb->span = 0;
wb->step = 0;
}
if(!memcpy((char*)wb->p+wb->l, start, sizeof(uint32_t))) return 7;
if(!memcpy((char*)wb->p+wb->l+4, end, sizeof(uint32_t))) return 8;
if(!memcpy((char*)wb->p+wb->l+8, values, sizeof(float))) return 9;
updateStats(fp, end[0]-start[0], values[0]);
wb->l += 12;
for(i=1; i<n; i++) {
if(strcmp(chrom[i],lastChrom) != 0) {
wb->end = end[i-1];
flushBuffer(fp);
lastChrom = chrom[i];
tid = bwGetTid(fp, chrom[i]);
if(tid == (uint32_t) -1) return 10;
wb->tid = tid;
wb->start = start[i];
}
if(wb->l+12 > fp->hdr->bufSize) { //12 bytes/entry
wb->end = end[i-1];
flushBuffer(fp);
wb->start = start[i];
}
if(!memcpy((char*)wb->p+wb->l, &(start[i]), sizeof(uint32_t))) return 11;
if(!memcpy((char*)wb->p+wb->l+4, &(end[i]), sizeof(uint32_t))) return 12;
if(!memcpy((char*)wb->p+wb->l+8, &(values[i]), sizeof(float))) return 13;
updateStats(fp, end[i]-start[i], values[i]);
wb->l += 12;
}
wb->end = end[i-1];
return 0;
}
int bwAppendIntervals(bigWigFile_t *fp, const uint32_t *start, const uint32_t *end, const float *values, uint32_t n) {
uint32_t i;
bwWriteBuffer_t *wb = fp->writeBuffer;
if(!n) return 0;
if(!fp->isWrite) return 1;
if(!wb) return 2;
if(wb->ltype != 1) return 3;
for(i=0; i<n; i++) {
if(wb->l+12 > fp->hdr->bufSize) {
if(i>0) { //otherwise it's already set
wb->end = end[i-1];
}
flushBuffer(fp);
wb->start = start[i];
}
if(!memcpy((char*)wb->p+wb->l, &(start[i]), sizeof(uint32_t))) return 4;
if(!memcpy((char*)wb->p+wb->l+4, &(end[i]), sizeof(uint32_t))) return 5;
if(!memcpy((char*)wb->p+wb->l+8, &(values[i]), sizeof(float))) return 6;
updateStats(fp, end[i]-start[i], values[i]);
wb->l += 12;
}
wb->end = end[i-1];
return 0;
}
//8 bytes per entry
int bwAddIntervalSpans(bigWigFile_t *fp, const char *chrom, const uint32_t *start, uint32_t span, const float *values, uint32_t n) {
uint32_t i, tid;
bwWriteBuffer_t *wb = fp->writeBuffer;
if(!n) return 0;
if(!fp->isWrite) return 1;
if(!wb) return 2;
if(wb->ltype != 2) if(flushBuffer(fp)) return 3;
if(flushBuffer(fp)) return 4;
tid = bwGetTid(fp, chrom);
if(tid == (uint32_t) -1) return 5;
wb->tid = tid;
wb->start = start[0];
wb->step = 0;
wb->span = span;
wb->ltype = 2;
for(i=0; i<n; i++) {
if(wb->l + 8 >= fp->hdr->bufSize) { //8 bytes/entry
if(i) wb->end = start[i-1]+span;
flushBuffer(fp);
wb->start = start[i];
}
if(!memcpy((char*)wb->p+wb->l, &(start[i]), sizeof(uint32_t))) return 5;
if(!memcpy((char*)wb->p+wb->l+4, &(values[i]), sizeof(float))) return 6;
updateStats(fp, span, values[i]);
wb->l += 8;
}
wb->end = start[n-1] + span;
return 0;
}
int bwAppendIntervalSpans(bigWigFile_t *fp, const uint32_t *start, const float *values, uint32_t n) {
uint32_t i;
bwWriteBuffer_t *wb = fp->writeBuffer;
if(!n) return 0;
if(!fp->isWrite) return 1;
if(!wb) return 2;
if(wb->ltype != 2) return 3;
for(i=0; i<n; i++) {
if(wb->l + 8 >= fp->hdr->bufSize) {
if(i) wb->end = start[i-1]+wb->span;
flushBuffer(fp);
wb->start = start[i];
}
if(!memcpy((char*)wb->p+wb->l, &(start[i]), sizeof(uint32_t))) return 4;
if(!memcpy((char*)wb->p+wb->l+4, &(values[i]), sizeof(float))) return 5;
updateStats(fp, wb->span, values[i]);
wb->l += 8;
}
wb->end = start[n-1] + wb->span;
return 0;
}
//4 bytes per entry
int bwAddIntervalSpanSteps(bigWigFile_t *fp, const char *chrom, uint32_t start, uint32_t span, uint32_t step, const float *values, uint32_t n) {
uint32_t i, tid;
bwWriteBuffer_t *wb = fp->writeBuffer;
if(!n) return 0;
if(!fp->isWrite) return 1;
if(!wb) return 2;
if(wb->ltype != 3) flushBuffer(fp);
if(flushBuffer(fp)) return 3;
tid = bwGetTid(fp, chrom);
if(tid == (uint32_t) -1) return 4;
wb->tid = tid;
wb->start = start;
wb->step = step;
wb->span = span;
wb->ltype = 3;
for(i=0; i<n; i++) {
if(wb->l + 4 >= fp->hdr->bufSize) {
wb->end = wb->start + ((wb->l-24)>>2) * step;
flushBuffer(fp);
wb->start = wb->end;
}
if(!memcpy((char*)wb->p+wb->l, &(values[i]), sizeof(float))) return 5;
updateStats(fp, wb->span, values[i]);
wb->l += 4;
}
wb->end = wb->start + ((wb->l-24)>>2) * step; //Exclude the 24 byte block header, as in the loop above
return 0;
}
int bwAppendIntervalSpanSteps(bigWigFile_t *fp, const float *values, uint32_t n) {
uint32_t i;
bwWriteBuffer_t *wb = fp->writeBuffer;
if(!n) return 0;
if(!fp->isWrite) return 1;
if(!wb) return 2;
if(wb->ltype != 3) return 3;
for(i=0; i<n; i++) {
if(wb->l + 4 >= fp->hdr->bufSize) {
wb->end = wb->start + ((wb->l-24)>>2) * wb->step;
flushBuffer(fp);
wb->start = wb->end;
}
if(!memcpy((char*)wb->p+wb->l, &(values[i]), sizeof(float))) return 4;
updateStats(fp, wb->span, values[i]);
wb->l += 4;
}
wb->end = wb->start + ((wb->l-24)>>2) * wb->step; //Exclude the 24 byte block header, as in the loop above
return 0;
}
//0 on success
int writeSummary(bigWigFile_t *fp) {
if(writeAtPos(&(fp->hdr->nBasesCovered), sizeof(uint64_t), 1, fp->hdr->summaryOffset, fp->URL->x.fp)) return 1;
if(writeAtPos(&(fp->hdr->minVal), sizeof(double), 1, fp->hdr->summaryOffset+8, fp->URL->x.fp)) return 2;
if(writeAtPos(&(fp->hdr->maxVal), sizeof(double), 1, fp->hdr->summaryOffset+16, fp->URL->x.fp)) return 3;
if(writeAtPos(&(fp->hdr->sumData), sizeof(double), 1, fp->hdr->summaryOffset+24, fp->URL->x.fp)) return 4;
if(writeAtPos(&(fp->hdr->sumSquared), sizeof(double), 1, fp->hdr->summaryOffset+32, fp->URL->x.fp)) return 5;
return 0;
}
static bwRTreeNode_t *makeEmptyNode(uint32_t blockSize) {
bwRTreeNode_t *n = calloc(1, sizeof(bwRTreeNode_t));
if(!n) return NULL;
n->chrIdxStart = malloc(blockSize*sizeof(uint32_t));
if(!n->chrIdxStart) goto error;
n->baseStart = malloc(blockSize*sizeof(uint32_t));
if(!n->baseStart) goto error;
n->chrIdxEnd = malloc(blockSize*sizeof(uint32_t));
if(!n->chrIdxEnd) goto error;
n->baseEnd = malloc(blockSize*sizeof(uint32_t));
if(!n->baseEnd) goto error;
n->dataOffset = calloc(blockSize,sizeof(uint64_t)); //This MUST be 0 for node writing!
if(!n->dataOffset) goto error;
n->x.child = malloc(blockSize*sizeof(uint64_t));
if(!n->x.child) goto error;
return n;
error:
if(n->chrIdxStart) free(n->chrIdxStart);
if(n->baseStart) free(n->baseStart);
if(n->chrIdxEnd) free(n->chrIdxEnd);
if(n->baseEnd) free(n->baseEnd);
if(n->dataOffset) free(n->dataOffset);
if(n->x.child) free(n->x.child);
free(n);
return NULL;
}
//Returns the new node on success and NULL on error. This doesn't attempt to clean up the linked list!
static bwRTreeNode_t *addLeaves(bwLL **ll, uint64_t *sz, uint64_t toProcess, uint32_t blockSize) {
uint32_t i;
uint64_t foo;
bwRTreeNode_t *n = makeEmptyNode(blockSize);
if(!n) return NULL;
if(toProcess <= blockSize) {
for(i=0; i<toProcess; i++) {
n->chrIdxStart[i] = (*ll)->node->chrIdxStart[0];
n->baseStart[i] = (*ll)->node->baseStart[0];
n->chrIdxEnd[i] = (*ll)->node->chrIdxEnd[(*ll)->node->nChildren-1];
n->baseEnd[i] = (*ll)->node->baseEnd[(*ll)->node->nChildren-1];
n->x.child[i] = (*ll)->node;
*sz += 4 + 32*(*ll)->node->nChildren;
*ll = (*ll)->next;
n->nChildren++;
}
} else {
for(i=0; i<blockSize; i++) {
foo = ceil(((double) toProcess)/((double) blockSize-i));
if(!*ll) break; //The linked list is exhausted (ll itself is never NULL)
n->x.child[i] = addLeaves(ll, sz, foo, blockSize);
if(!n->x.child[i]) goto error;
n->chrIdxStart[i] = n->x.child[i]->chrIdxStart[0];
n->baseStart[i] = n->x.child[i]->baseStart[0];
n->chrIdxEnd[i] = n->x.child[i]->chrIdxEnd[n->x.child[i]->nChildren-1];
n->baseEnd[i] = n->x.child[i]->baseEnd[n->x.child[i]->nChildren-1];
n->nChildren++;
toProcess -= foo;
}
}
*sz += 4 + 24*n->nChildren;
return n;
error:
bwDestroyIndexNode(n);
return NULL;
}
//Returns 1 on error
int writeIndexTreeNode(FILE *fp, bwRTreeNode_t *n, uint8_t *wrote, int level) {
uint8_t one = 0;
uint32_t i, j, vector[6] = {0, 0, 0, 0, 0, 0}; //The last 8 bytes are left as 0
if(n->isLeaf) return 0;
for(i=0; i<n->nChildren; i++) {
if(n->dataOffset[i]) { //traverse into child
if(n->isLeaf) return 0; //Only write leaves once!
if(writeIndexTreeNode(fp, n->x.child[i], wrote, level+1)) return 1;
} else {
n->dataOffset[i] = ftell(fp);
if(fwrite(&(n->x.child[i]->isLeaf), sizeof(uint8_t), 1, fp) != 1) return 1;
if(fwrite(&one, sizeof(uint8_t), 1, fp) != 1) return 1; //one byte of padding
if(fwrite(&(n->x.child[i]->nChildren), sizeof(uint16_t), 1, fp) != 1) return 1;
for(j=0; j<n->x.child[i]->nChildren; j++) {
vector[0] = n->x.child[i]->chrIdxStart[j];
vector[1] = n->x.child[i]->baseStart[j];
vector[2] = n->x.child[i]->chrIdxEnd[j];
vector[3] = n->x.child[i]->baseEnd[j];
if(n->x.child[i]->isLeaf) {
//Include the offset and size
if(fwrite(vector, sizeof(uint32_t), 4, fp) != 4) return 1;
if(fwrite(&(n->x.child[i]->dataOffset[j]), sizeof(uint64_t), 1, fp) != 1) return 1;
if(fwrite(&(n->x.child[i]->x.size[j]), sizeof(uint64_t), 1, fp) != 1) return 1;
} else {
if(fwrite(vector, sizeof(uint32_t), 6, fp) != 6) return 1;
}
}
*wrote = 1;
}
}
return 0;
}
//Returns 0 on success, non-zero on error
int writeIndexOffsets(FILE *fp, bwRTreeNode_t *n, uint64_t offset) {
uint32_t i;
if(n->isLeaf) return 0;
for(i=0; i<n->nChildren; i++) {
if(writeIndexOffsets(fp, n->x.child[i], n->dataOffset[i])) return 1;
if(writeAtPos(&(n->dataOffset[i]), sizeof(uint64_t), 1, offset+20+24*i, fp)) return 2;
}
return 0;
}
//Returns 0 on success
int writeIndexTree(bigWigFile_t *fp) {
uint64_t offset;
uint8_t wrote = 0;
int rv;
while((rv = writeIndexTreeNode(fp->URL->x.fp, fp->idx->root, &wrote, 0)) == 0) {
if(!wrote) break;
wrote = 0;
}
if(rv || wrote) return 1;
//Save the file position
offset = bwTell(fp);
//Write the offsets
if(writeIndexOffsets(fp->URL->x.fp, fp->idx->root, fp->idx->rootOffset)) return 2;
//Move the file pointer back to the end
bwSetPos(fp, offset);
return 0;
}
//Returns 0 on success. The original state SHOULD be preserved on error
int writeIndex(bigWigFile_t *fp) {
uint32_t four = IDX_MAGIC;
uint64_t idxSize = 0, foo;
uint8_t one = 0;
uint32_t i, vector[6] = {0, 0, 0, 0, 0, 0}; //The last 8 bytes are left as 0
bwLL *ll = fp->writeBuffer->firstIndexNode, *p;
bwRTreeNode_t *root = NULL;
if(!fp->writeBuffer->nBlocks) return 0;
fp->idx = malloc(sizeof(bwRTree_t));
if(!fp->idx) return 2;
fp->idx->root = root;
//Update the file header to indicate the proper index position
foo = bwTell(fp);
if(writeAtPos(&foo, sizeof(uint64_t), 1, 0x18, fp->URL->x.fp)) return 3;
//Make the tree
if(ll == fp->writeBuffer->currentIndexNode) {
root = ll->node;
idxSize = 4 + 24*root->nChildren;
} else {
root = addLeaves(&ll, &idxSize, ceil(((double)fp->writeBuffer->nBlocks)/fp->writeBuffer->blockSize), fp->writeBuffer->blockSize);
}
if(!root) return 4;
fp->idx->root = root;
ll = fp->writeBuffer->firstIndexNode;
while(ll) {
p = ll->next;
free(ll);
ll=p;
}
//write the header
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 5;
if(fwrite(&(fp->writeBuffer->blockSize), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 6;
if(fwrite(&(fp->writeBuffer->nBlocks), sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 7;
if(fwrite(&(root->chrIdxStart[0]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 8;
if(fwrite(&(root->baseStart[0]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 9;
if(fwrite(&(root->chrIdxEnd[root->nChildren-1]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 10;
if(fwrite(&(root->baseEnd[root->nChildren-1]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 11;
if(fwrite(&idxSize, sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 12;
four = 1;
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 13;
four = 0;
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 14; //padding
fp->idx->rootOffset = bwTell(fp);
//Write the root node, since writeIndexTree writes the children and fills in the offset
if(fwrite(&(root->isLeaf), sizeof(uint8_t), 1, fp->URL->x.fp) != 1) return 16;
if(fwrite(&one, sizeof(uint8_t), 1, fp->URL->x.fp) != 1) return 17; //one byte of padding
if(fwrite(&(root->nChildren), sizeof(uint16_t), 1, fp->URL->x.fp) != 1) return 18;
for(i=0; i<root->nChildren; i++) {
vector[0] = root->chrIdxStart[i];
vector[1] = root->baseStart[i];
vector[2] = root->chrIdxEnd[i];
vector[3] = root->baseEnd[i];
if(root->isLeaf) {
//Include the offset and size
if(fwrite(vector, sizeof(uint32_t), 4, fp->URL->x.fp) != 4) return 19;
if(fwrite(&(root->dataOffset[i]), sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 20;
if(fwrite(&(root->x.size[i]), sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 21;
} else {
root->dataOffset[i] = 0; //FIXME: Something upstream is setting this to impossible values (e.g., 0x21?!?!?)
if(fwrite(vector, sizeof(uint32_t), 6, fp->URL->x.fp) != 6) return 22;
}
}
//Write each level
if(writeIndexTree(fp)) return 23;
return 0;
}
//The first zoom level has a resolution of 4x mean entry size
//This may or may not produce the requested number of zoom levels
int makeZoomLevels(bigWigFile_t *fp) {
uint32_t meanBinSize, i;
uint32_t multiplier = 4, zoom = 10, maxZoom = 0;
uint16_t nLevels = 0;
meanBinSize = ((double) fp->writeBuffer->runningWidthSum)/(fp->writeBuffer->nEntries);
//In reality, one level is skipped
meanBinSize *= 4;
//N.B., we must ALWAYS check that the zoom doesn't overflow a uint32_t!
if(((uint32_t)-1)>>2 < meanBinSize) return 0; //No zoom levels!
if(meanBinSize*4 > zoom) zoom = multiplier*meanBinSize;
fp->hdr->zoomHdrs = calloc(1, sizeof(bwZoomHdr_t));
if(!fp->hdr->zoomHdrs) return 1;
fp->hdr->zoomHdrs->level = malloc(fp->hdr->nLevels * sizeof(uint32_t));
fp->hdr->zoomHdrs->dataOffset = calloc(fp->hdr->nLevels, sizeof(uint64_t));
fp->hdr->zoomHdrs->indexOffset = calloc(fp->hdr->nLevels, sizeof(uint64_t));
fp->hdr->zoomHdrs->idx = calloc(fp->hdr->nLevels, sizeof(bwRTree_t*));
if(!fp->hdr->zoomHdrs->level) return 2;
if(!fp->hdr->zoomHdrs->dataOffset) return 3;
if(!fp->hdr->zoomHdrs->indexOffset) return 4;
if(!fp->hdr->zoomHdrs->idx) return 5;
//There's no point in having a zoom level larger than the largest chromosome
//This will none the less allow at least one zoom level, which is generally needed for IGV et al.
for(i=0; i<fp->cl->nKeys; i++) {
if(fp->cl->len[i] > maxZoom) maxZoom = fp->cl->len[i];
}
if(zoom > maxZoom) zoom = maxZoom;
for(i=0; i<fp->hdr->nLevels; i++) {
if(zoom > maxZoom) break; //prevent absurdly large zoom levels
fp->hdr->zoomHdrs->level[i] = zoom;
nLevels++;
if(((uint32_t)-1)/multiplier < zoom) break;
zoom *= multiplier;
}
fp->hdr->nLevels = nLevels;
fp->writeBuffer->firstZoomBuffer = calloc(nLevels,sizeof(bwZoomBuffer_t*));
if(!fp->writeBuffer->firstZoomBuffer) goto error;
fp->writeBuffer->lastZoomBuffer = calloc(nLevels,sizeof(bwZoomBuffer_t*));
if(!fp->writeBuffer->lastZoomBuffer) goto error;
fp->writeBuffer->nNodes = calloc(nLevels, sizeof(uint64_t));
if(!fp->writeBuffer->nNodes) goto error;
for(i=0; i<fp->hdr->nLevels; i++) {
fp->writeBuffer->firstZoomBuffer[i] = calloc(1, sizeof(bwZoomBuffer_t));
if(!fp->writeBuffer->firstZoomBuffer[i]) goto error;
fp->writeBuffer->firstZoomBuffer[i]->p = calloc(fp->hdr->bufSize/32, 32);
if(!fp->writeBuffer->firstZoomBuffer[i]->p) goto error;
fp->writeBuffer->firstZoomBuffer[i]->m = fp->hdr->bufSize;
((uint32_t*)fp->writeBuffer->firstZoomBuffer[i]->p)[0] = 0;
((uint32_t*)fp->writeBuffer->firstZoomBuffer[i]->p)[1] = 0;
((uint32_t*)fp->writeBuffer->firstZoomBuffer[i]->p)[2] = fp->hdr->zoomHdrs->level[i];
if(fp->hdr->zoomHdrs->level[i] > fp->cl->len[0]) ((uint32_t*)fp->writeBuffer->firstZoomBuffer[i]->p)[2] = fp->cl->len[0];
fp->writeBuffer->lastZoomBuffer[i] = fp->writeBuffer->firstZoomBuffer[i];
}
return 0;
error:
if(fp->writeBuffer->firstZoomBuffer) {
for(i=0; i<fp->hdr->nLevels; i++) {
if(fp->writeBuffer->firstZoomBuffer[i]) {
if(fp->writeBuffer->firstZoomBuffer[i]->p) free(fp->writeBuffer->firstZoomBuffer[i]->p);
free(fp->writeBuffer->firstZoomBuffer[i]);
}
}
free(fp->writeBuffer->firstZoomBuffer);
}
if(fp->writeBuffer->lastZoomBuffer) free(fp->writeBuffer->lastZoomBuffer);
if(fp->writeBuffer->nNodes) free(fp->writeBuffer->nNodes);
return 6;
}
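/*
 * A worked example of the level selection in makeZoomLevels(), with
 * hypothetical inputs (assumptions for illustration only):
 * runningWidthSum/nEntries = 25, a longest chromosome of 10000 bases and at
 * least 3 requested levels.
 *   meanBinSize = 25*4  = 100
 *   zoom        = 4*100 = 400      (since 100*4 > 10)
 *   levels      = 400, 1600, 6400  (the next value, 25600, exceeds maxZoom)
 * hdr->nLevels is then lowered to 3 if more levels had been requested.
 */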
//Given an interval start, calculate the next one at a zoom level
void nextPos(bigWigFile_t *fp, uint32_t size, uint32_t *pos, uint32_t desiredTid) {
uint32_t *tid = pos;
uint32_t *start = pos+1;
uint32_t *end = pos+2;
*start += size;
if(*start >= fp->cl->len[*tid]) {
(*start) = 0;
(*tid)++;
}
//prevent needless iteration when changing chromosomes
if(*tid < desiredTid) {
*tid = desiredTid;
*start = 0;
}
(*end) = *start+size;
if(*end > fp->cl->len[*tid]) (*end) = fp->cl->len[*tid];
}
//Return the number of bases two intervals overlap
uint32_t overlapsInterval(uint32_t tid0, uint32_t start0, uint32_t end0, uint32_t tid1, uint32_t start1, uint32_t end1) {
if(tid0 != tid1) return 0;
if(end0 <= start1) return 0;
if(end1 <= start0) return 0;
if(end0 <= end1) {
if(start1 > start0) return end0-start1;
return end0-start0;
} else {
if(start1 > start0) return end1-start1;
return end1-start0;
}
}
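/*
 * A few worked calls to overlapsInterval(), traced by hand through the
 * branches above. Intervals are treated as half-open [start, end) and the
 * coordinates are made up for illustration:
 *
 *   overlapsInterval(0, 10, 20,  0, 15, 30) returns 5   (bases 15..19)
 *   overlapsInterval(0, 15, 30,  0, 10, 20) returns 5   (same overlap, swapped)
 *   overlapsInterval(0, 10, 30,  0, 15, 20) returns 5   (second inside the first)
 *   overlapsInterval(0, 10, 20,  0, 20, 30) returns 0   (they only touch)
 *   overlapsInterval(0, 10, 20,  1, 10, 20) returns 0   (different chromosomes)
 */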
//Returns the number of bases of the interval written
uint32_t updateInterval(bigWigFile_t *fp, bwZoomBuffer_t *buffer, double *sum, double *sumsq, uint32_t size, uint32_t tid, uint32_t start, uint32_t end, float value) {
uint32_t *p2;
float *fp2;
uint32_t rv = 0, offset = 0;
if(!buffer) return 0; //Check this before dereferencing buffer
p2 = (uint32_t*) buffer->p;
fp2 = (float*) p2;
if(buffer->l+32 >= buffer->m) return 0;
//Make sure that we don't overflow a uint32_t by adding some huge value to start
if(start + size < start) size = ((uint32_t) -1) - start;
if(buffer->l) {
offset = buffer->l/32;
} else {
p2[0] = tid;
p2[1] = start;
if(start+size < end) p2[2] = start+size;
else p2[2] = end;
}
//Do we have any overlap with the previously added interval?
if(offset) {
rv = overlapsInterval(p2[8*(offset-1)], p2[8*(offset-1)+1], p2[8*(offset-1)+1] + size, tid, start, end);
if(rv) {
p2[8*(offset-1)+2] = start + rv;
p2[8*(offset-1)+3] += rv;
if(fp2[8*(offset-1)+4] > value) fp2[8*(offset-1)+4] = value;
if(fp2[8*(offset-1)+5] < value) fp2[8*(offset-1)+5] = value;
*sum += rv*value;
*sumsq += rv*pow(value, 2.0);
return rv;
} else {
fp2[8*(offset-1)+6] = *sum;
fp2[8*(offset-1)+7] = *sumsq;
*sum = 0.0;
*sumsq = 0.0;
}
}
//If we move to a new interval then skip iterating over a bunch of obviously non-overlapping intervals
if(offset && p2[8*offset+2] == 0) {
p2[8*offset] = tid;
p2[8*offset+1] = start;
if(start+size < end) p2[8*offset+2] = start+size;
else p2[8*offset+2] = end;
//nextPos(fp, size, p2+8*offset, tid); //We can actually skip uncovered intervals
}
//Add a new entry
while(!(rv = overlapsInterval(p2[8*offset], p2[8*offset+1], p2[8*offset+1] + size, tid, start, end))) {
p2[8*offset] = tid;
p2[8*offset+1] = start;
if(start+size < end) p2[8*offset+2] = start+size;
else p2[8*offset+2] = end;
//nextPos(fp, size, p2+8*offset, tid);
}
p2[8*offset+3] = rv;
fp2[8*offset+4] = value; //min
fp2[8*offset+5] = value; //max
*sum += rv * value;
*sumsq += rv * pow(value,2.0);
buffer->l += 32;
return rv;
}
//Returns 0 on success
int addIntervalValue(bigWigFile_t *fp, uint64_t *nEntries, double *sum, double *sumsq, bwZoomBuffer_t *buffer, uint32_t itemsPerSlot, uint32_t zoom, uint32_t tid, uint32_t start, uint32_t end, float value) {
bwZoomBuffer_t *newBuffer = NULL;
uint32_t rv;
while(start < end) {
rv = updateInterval(fp, buffer, sum, sumsq, zoom, tid, start, end, value);
if(!rv) {
//Allocate a new buffer
newBuffer = calloc(1, sizeof(bwZoomBuffer_t));
if(!newBuffer) return 1;
newBuffer->p = calloc(itemsPerSlot, 32);
if(!newBuffer->p) goto error;
newBuffer->m = itemsPerSlot*32;
memcpy(newBuffer->p, (unsigned char*)buffer->p+buffer->l-32, 4);
memcpy((unsigned char*)newBuffer->p+4, (unsigned char*)buffer->p + buffer->l-28, 4);
((uint32_t*) newBuffer->p)[2] = ((uint32_t*) newBuffer->p)[1] + zoom;
*sum = *sumsq = 0.0;
rv = updateInterval(fp, newBuffer, sum, sumsq, zoom, tid, start, end, value);
if(!rv) goto error;
buffer->next = newBuffer;
buffer = buffer->next;
*nEntries += 1;
}
start += rv;
}
return 0;
error:
if(newBuffer) {
if(newBuffer->m) free(newBuffer->p);
free(newBuffer);
}
return 2;
}
//Get all of the intervals and add them to the appropriate zoomBuffer
int constructZoomLevels(bigWigFile_t *fp) {
bwOverlapIterator_t *it = NULL;
double *sum = NULL, *sumsq = NULL;
uint32_t i, j, k;
sum = calloc(fp->hdr->nLevels, sizeof(double));
sumsq = calloc(fp->hdr->nLevels, sizeof(double));
if(!sum || !sumsq) goto error;
for(i=0; i<fp->cl->nKeys; i++) {
it = bwOverlappingIntervalsIterator(fp, fp->cl->chrom[i], 0, fp->cl->len[i], 100000);
if(!it) goto error;
while(it->data != NULL){
for(j=0;j<it->intervals->l;j++){
for(k=0;k<fp->hdr->nLevels;k++){
if(addIntervalValue(fp, &(fp->writeBuffer->nNodes[k]), sum+k, sumsq+k, fp->writeBuffer->lastZoomBuffer[k], fp->hdr->bufSize/32, fp->hdr->zoomHdrs->level[k], i, it->intervals->start[j], it->intervals->end[j], it->intervals->value[j])) goto error;
while(fp->writeBuffer->lastZoomBuffer[k]->next) fp->writeBuffer->lastZoomBuffer[k] = fp->writeBuffer->lastZoomBuffer[k]->next;
}
}
it = bwIteratorNext(it);
}
bwIteratorDestroy(it);
}
//Make an index for each zoom level
for(i=0; i<fp->hdr->nLevels; i++) {
fp->hdr->zoomHdrs->idx[i] = calloc(1, sizeof(bwRTree_t));
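if(!fp->hdr->zoomHdrs->idx[i]) {
//Don't leak the per-level accumulators ("it" has already been destroyed)
free(sum);
free(sumsq);
return 1;
}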
fp->hdr->zoomHdrs->idx[i]->blockSize = fp->writeBuffer->blockSize;
}
free(sum);
free(sumsq);
return 0;
error:
if(it) bwIteratorDestroy(it);
if(sum) free(sum);
if(sumsq) free(sumsq);
return 1;
}
int writeZoomLevels(bigWigFile_t *fp) {
uint64_t offset1, offset2, idxSize = 0;
uint32_t i, j, four = 0, last, vector[6] = {0, 0, 0, 0, 0, 0}; //The last 8 bytes are left as 0;
uint8_t wrote, one = 0;
uint16_t actualNLevels = 0;
int rv;
bwLL *ll, *p;
bwRTreeNode_t *root;
bwZoomBuffer_t *zb, *zb2;
bwWriteBuffer_t *wb = fp->writeBuffer;
uLongf sz;
for(i=0; i<fp->hdr->nLevels; i++) {
if(i) {
//Is this a duplicate level?
if(fp->writeBuffer->nNodes[i] == fp->writeBuffer->nNodes[i-1]) break;
}
actualNLevels++;
//reserve a uint32_t for the number of blocks
fp->hdr->zoomHdrs->dataOffset[i] = bwTell(fp);
fp->writeBuffer->nBlocks = 0;
fp->writeBuffer->l = 24;
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 1;
zb = fp->writeBuffer->firstZoomBuffer[i];
fp->writeBuffer->firstIndexNode = NULL;
fp->writeBuffer->currentIndexNode = NULL;
while(zb) {
sz = fp->hdr->bufSize;
if(compress(wb->compressP, &sz, zb->p, zb->l) != Z_OK) return 2;
//write the data to disk
if(fwrite(wb->compressP, sizeof(uint8_t), sz, fp->URL->x.fp) != sz) return 3;
//Add an entry into the index
last = (zb->l - 32)>>2;
if(addIndexEntry(fp, ((uint32_t*)zb->p)[0], ((uint32_t*)zb->p)[last], ((uint32_t*)zb->p)[1], ((uint32_t*)zb->p)[last+2], bwTell(fp)-sz, sz)) return 4;
wb->nBlocks++;
wb->l = 24;
zb = zb->next;
}
if(writeAtPos(&(wb->nBlocks), sizeof(uint32_t), 1, fp->hdr->zoomHdrs->dataOffset[i], fp->URL->x.fp)) return 5;
//Make the tree
ll = fp->writeBuffer->firstIndexNode;
if(ll == fp->writeBuffer->currentIndexNode) {
root = ll->node;
idxSize = 4 + 24*root->nChildren;
} else {
root = addLeaves(&ll, &idxSize, ceil(((double)fp->writeBuffer->nBlocks)/fp->writeBuffer->blockSize), fp->writeBuffer->blockSize);
}
if(!root) return 4;
fp->hdr->zoomHdrs->idx[i]->root = root;
ll = fp->writeBuffer->firstIndexNode;
while(ll) {
p = ll->next;
free(ll);
ll=p;
}
//write the index
wrote = 0;
fp->hdr->zoomHdrs->indexOffset[i] = bwTell(fp);
four = IDX_MAGIC;
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 1;
root = fp->hdr->zoomHdrs->idx[i]->root;
if(fwrite(&(fp->writeBuffer->blockSize), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 6;
if(fwrite(&(fp->writeBuffer->nBlocks), sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 7;
if(fwrite(&(root->chrIdxStart[0]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 8;
if(fwrite(&(root->baseStart[0]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 9;
if(fwrite(&(root->chrIdxEnd[root->nChildren-1]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 10;
if(fwrite(&(root->baseEnd[root->nChildren-1]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 11;
if(fwrite(&idxSize, sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 12;
four = fp->hdr->bufSize/32;
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 13;
four = 0;
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 14; //padding
fp->hdr->zoomHdrs->idx[i]->rootOffset = bwTell(fp);
//Write the root node, since writeIndexTree writes the children and fills in the offset
offset1 = bwTell(fp);
if(fwrite(&(root->isLeaf), sizeof(uint8_t), 1, fp->URL->x.fp) != 1) return 16;
if(fwrite(&one, sizeof(uint8_t), 1, fp->URL->x.fp) != 1) return 17; //one byte of padding
if(fwrite(&(root->nChildren), sizeof(uint16_t), 1, fp->URL->x.fp) != 1) return 18;
for(j=0; j<root->nChildren; j++) {
vector[0] = root->chrIdxStart[j];
vector[1] = root->baseStart[j];
vector[2] = root->chrIdxEnd[j];
vector[3] = root->baseEnd[j];
if(root->isLeaf) {
//Include the offset and size
if(fwrite(vector, sizeof(uint32_t), 4, fp->URL->x.fp) != 4) return 19;
if(fwrite(&(root->dataOffset[j]), sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 20;
if(fwrite(&(root->x.size[j]), sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 21;
} else {
if(fwrite(vector, sizeof(uint32_t), 6, fp->URL->x.fp) != 6) return 22;
}
}
while((rv = writeIndexTreeNode(fp->URL->x.fp, fp->hdr->zoomHdrs->idx[i]->root, &wrote, 0)) == 0) {
if(!wrote) break;
wrote = 0;
}
if(rv || wrote) return 6;
//Save the file position
offset2 = bwTell(fp);
//Write the offsets
if(writeIndexOffsets(fp->URL->x.fp, root, offset1)) return 2;
//Move the file pointer back to the end
bwSetPos(fp, offset2);
//Free the linked list
zb = fp->writeBuffer->firstZoomBuffer[i];
while(zb) {
if(zb->p) free(zb->p);
zb2 = zb->next;
free(zb);
zb = zb2;
}
fp->writeBuffer->firstZoomBuffer[i] = NULL;
}
//Free unused zoom levels
for(i=actualNLevels; i<fp->hdr->nLevels; i++) {
zb = fp->writeBuffer->firstZoomBuffer[i];
while(zb) {
if(zb->p) free(zb->p);
zb2 = zb->next;
free(zb);
zb = zb2;
}
fp->writeBuffer->firstZoomBuffer[i] = NULL;
}
//Write the zoom headers to disk
offset1 = bwTell(fp);
if(bwSetPos(fp, 0x40)) return 7;
four = 0;
for(i=0; i<actualNLevels; i++) {
if(fwrite(&(fp->hdr->zoomHdrs->level[i]), sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 8;
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 9;
if(fwrite(&(fp->hdr->zoomHdrs->dataOffset[i]), sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 10;
if(fwrite(&(fp->hdr->zoomHdrs->indexOffset[i]), sizeof(uint64_t), 1, fp->URL->x.fp) != 1) return 11;
}
//Write the number of levels if needed
if(bwSetPos(fp, 0x6)) return 12;
if(fwrite(&actualNLevels, sizeof(uint16_t), 1, fp->URL->x.fp) != 1) return 13;
if(bwSetPos(fp, offset1)) return 14;
return 0;
}
//0 on success
int bwFinalize(bigWigFile_t *fp) {
uint32_t four;
uint64_t offset;
if(!fp->isWrite) return 0;
//Flush the buffer
if(flushBuffer(fp)) return 1; //Valgrind reports a problem here!
//Update the data section with the number of blocks written
if(fp->hdr) {
if(writeAtPos(&(fp->writeBuffer->nBlocks), sizeof(uint64_t), 1, fp->hdr->dataOffset, fp->URL->x.fp)) return 2;
} else {
//The header wasn't written!
return 1;
}
//write the bufferSize
if(fp->hdr->bufSize) {
if(writeAtPos(&(fp->hdr->bufSize), sizeof(uint32_t), 1, 0x34, fp->URL->x.fp)) return 2;
}
//write the summary information
if(writeSummary(fp)) return 3;
//Convert the linked-list to a tree and write to disk
if(writeIndex(fp)) return 4;
//Zoom level stuff here?
if(fp->hdr->nLevels && fp->writeBuffer->nBlocks) {
offset = bwTell(fp);
if(makeZoomLevels(fp)) return 5;
if(constructZoomLevels(fp)) return 6;
if(bwSetPos(fp, offset)) return 8;
if(writeZoomLevels(fp)) return 7; //This writes nLevels as well
}
//write magic at the end of the file
four = BIGWIG_MAGIC;
if(fwrite(&four, sizeof(uint32_t), 1, fp->URL->x.fp) != 1) return 9;
return 0;
}
/*
data chunk:
uint64_t number of blocks (2 / 110851)
some blocks
an uncompressed data block (24 byte header)
uint32_t Tid 0-4
uint32_t start 4-8
uint32_t end 8-12
uint32_t step 12-16
uint32_t span 16-20
uint8_t type 20
uint8_t padding
uint16_t nItems 22
nItems of:
type 1: //12 bytes
uint32_t start
uint32_t end
float value
type 2: //8 bytes
uint32_t start
float value
type 3: //4 bytes
float value
data block index header
uint32_t magic
uint32_t blockSize (256 in the example) maximum number of children
uint64_t number of blocks (2 / 110851)
uint32_t startTid
uint32_t startPos
uint32_t endTid
uint32_t endPos
uint64_t index size? (0x1E7 / 0x1AF0401F) index address?
uint32_t itemsPerBlock (1 / 1) 1024 for zoom headers
uint32_t padding
data block index node non-leaf (4 bytes + 24*nChildren)
uint8_t isLeaf
uint8_t padding
uint16_t nChildren (2, 256)
uint32_t startTid
uint32_t startPos
uint32_t endTid
uint32_t endPos
uint64_t dataOffset (0x1AF05853, 0x1AF07057)
data block index node leaf (4 bytes + 32*nChildren)
uint8_t isLeaf
uint8_t padding
uint16_t nChildren (2)
uint32_t startTid
uint32_t startPos
uint32_t endTid
uint32_t endPos
uint64_t dataOffset (0x198, 0x1CF)
uint64_t dataSize (55, 24)
zoom data block
uint32_t number of blocks (10519766)
some data blocks
*/
================================================
FILE: libBigWig/io.c
================================================
#ifndef NOCURL
#include <curl/curl.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "bigWigIO.h"
#include <inttypes.h>
#include <errno.h>
size_t GLOBAL_DEFAULTBUFFERSIZE;
#ifndef NOCURL
uint64_t getContentLength(const URL_t *URL) {
double size;
if(curl_easy_getinfo(URL->x.curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &size) != CURLE_OK) {
return 0;
}
if(size== -1.0) return 0;
return (uint64_t) size;
}
//Fill the buffer, note that URL may be left in an unusable state on error!
CURLcode urlFetchData(URL_t *URL, unsigned long bufSize) {
CURLcode rv;
char range[1024];
if(URL->filePos != (size_t) -1) URL->filePos += URL->bufLen;
else URL->filePos = 0;
URL->bufPos = URL->bufLen = 0; //Otherwise, we can't copy anything into the buffer!
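//size_t is not always unsigned long, so cast explicitly and bound the write
snprintf(range, sizeof(range), "%" PRIu64 "-%" PRIu64, (uint64_t) URL->filePos, (uint64_t) (URL->filePos+bufSize-1));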
rv = curl_easy_setopt(URL->x.curl, CURLOPT_RANGE, range);
if(rv != CURLE_OK) {
fprintf(stderr, "[urlFetchData] Couldn't set the range (%s)\n", range);
return rv;
}
rv = curl_easy_perform(URL->x.curl);
errno = 0; //Sometimes curl_easy_perform leaves a random errno remnant
return rv;
}
//Read data into a buffer, ideally from a buffer already in memory
//The loop is likely no longer needed.
size_t url_fread(void *obuf, size_t obufSize, URL_t *URL) {
size_t remaining = obufSize, fetchSize;
void *p = obuf;
CURLcode rv;
while(remaining) {
if(!URL->bufLen) {
rv = urlFetchData(URL, URL->bufSize);
if(rv != CURLE_OK) {
fprintf(stderr, "[url_fread] urlFetchData (A) returned %s\n", curl_easy_strerror(rv));
return 0;
}
} else if(URL->bufLen < URL->bufPos + remaining) { //Copy the remaining buffer and reload the buffer as needed
p = memcpy(p, URL->memBuf+URL->bufPos, URL->bufLen - URL->bufPos);
if(!p) return 0;
p += URL->bufLen - URL->bufPos;
remaining -= URL->bufLen - URL->bufPos;
if(remaining) {
if(!URL->isCompressed) {
fetchSize = URL->bufSize;
} else {
fetchSize = (remaining<URL->bufSize)?remaining:URL->bufSize;
}
rv = urlFetchData(URL, fetchSize);
if(rv != CURLE_OK) {
fprintf(stderr, "[url_fread] urlFetchData (B) returned %s\n", curl_easy_strerror(rv));
return 0;
}
}
} else {
p = memcpy(p, URL->memBuf+URL->bufPos, remaining);
if(!p) return 0;
URL->bufPos += remaining;
remaining = 0;
}
}
return obufSize;
}
#endif
//Returns the number of bytes requested or a smaller number on error
//Note that in the case of remote files, the actual amount read may be less than the return value!
size_t urlRead(URL_t *URL, void *buf, size_t bufSize) {
#ifndef NOCURL
if(URL->type==0) {
return fread(buf, bufSize, 1, URL->x.fp)*bufSize;
} else {
return url_fread(buf, bufSize, URL);
}
#else
return fread(buf, bufSize, 1, URL->x.fp)*bufSize;
#endif
}
size_t bwFillBuffer(const void *inBuf, size_t l, size_t nmemb, void *pURL) {
URL_t *URL = (URL_t*) pURL;
void *p = URL->memBuf;
size_t copied = l*nmemb;
if(!p) return 0;
p += URL->bufLen;
if(copied > URL->bufSize - URL->bufLen) { //We received more than we can store! Data is appended at bufLen, so that is the remaining space
copied = URL->bufSize - URL->bufLen;
}
memcpy(p, inBuf, copied);
URL->bufLen += copied;
if(!URL->memBuf) return 0; //signal error
return copied;
}
//Seek to an arbitrary location, returning a CURLcode
//Note that a local file returns CURLE_OK on success or CURLE_FAILED_INIT on any error;
CURLcode urlSeek(URL_t *URL, size_t pos) {
#ifndef NOCURL
char range[1024];
CURLcode rv;
if(URL->type == BWG_FILE) {
#endif
if(fseek(URL->x.fp, pos, SEEK_SET) == 0) {
errno = 0;
return CURLE_OK;
} else {
return CURLE_FAILED_INIT; //This is arbitrary
}
#ifndef NOCURL
} else {
//If the location is covered by the buffer then don't seek!
if(pos < URL->filePos || pos >= URL->filePos+URL->bufLen) {
URL->filePos = pos;
URL->bufLen = 0; //Otherwise, filePos will get incremented on the next read!
URL->bufPos = 0;
//Maybe this works for FTP?
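//size_t is not always unsigned long, so cast explicitly and bound the write
snprintf(range, sizeof(range), "%" PRIu64 "-%" PRIu64, (uint64_t) pos, (uint64_t) (pos+URL->bufSize-1));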
rv = curl_easy_setopt(URL->x.curl, CURLOPT_RANGE, range);
if(rv != CURLE_OK) {
fprintf(stderr, "[urlSeek] Couldn't set the range (%s)\n", range);
return rv;
}
rv = curl_easy_perform(URL->x.curl);
if(rv != CURLE_OK) {
fprintf(stderr, "[urlSeek] curl_easy_perform received an error!\n");
}
errno = 0; //Don't propagate remnant resolved libCurl errors
return rv;
} else {
URL->bufPos = pos-URL->filePos;
return CURLE_OK;
}
}
#endif
}
URL_t *urlOpen(const char *fname, CURLcode (*callBack)(CURL*), const char *mode) {
URL_t *URL = calloc(1, sizeof(URL_t));
if(!URL) return NULL;
char *url = NULL, *req = NULL;
#ifndef NOCURL
CURLcode code;
char range[1024];
#endif
URL->fname = fname;
if((!mode) || (strchr(mode, 'w') == 0)) {
//Set the protocol
#ifndef NOCURL
if(strncmp(fname, "http://", 7) == 0) URL->type = BWG_HTTP;
else if(strncmp(fname, "https://", 8) == 0) URL->type = BWG_HTTPS;
else if(strncmp(fname, "ftp://", 6) == 0) URL->type = BWG_FTP;
else URL->type = BWG_FILE;
#else
URL->type = BWG_FILE;
#endif
//local file?
if(URL->type == BWG_FILE) {
URL->filePos = -1; //This signals that nothing has been read
URL->x.fp = fopen(fname, "rb");
if(!(URL->x.fp)) {
free(URL);
fprintf(stderr, "[urlOpen] Couldn't open %s for reading\n", fname);
return NULL;
}
#ifndef NOCURL
} else {
//Remote file, set up the memory buffer and get CURL ready
URL->memBuf = malloc(GLOBAL_DEFAULTBUFFERSIZE);
if(!(URL->memBuf)) {
free(URL);
fprintf(stderr, "[urlOpen] Couldn't allocate enough space for the file buffer!\n");
return NULL;
}
URL->bufSize = GLOBAL_DEFAULTBUFFERSIZE;
URL->x.curl = curl_easy_init();
if(!(URL->x.curl)) {
fprintf(stderr, "[urlOpen] curl_easy_init() failed!\n");
goto error;
}
//Negotiate a reasonable HTTP authentication method
if(curl_easy_setopt(URL->x.curl, CURLOPT_HTTPAUTH, CURLAUTH_ANY) != CURLE_OK) {
fprintf(stderr, "[urlOpen] Failed instructing curl to use any HTTP authentication it finds to be suitable!\n");
goto error;
}
//Follow redirects
if(curl_easy_setopt(URL->x.curl, CURLOPT_FOLLOWLOCATION, 1L) != CURLE_OK) {
fprintf(stderr, "[urlOpen] Failed instructing curl to follow redirects!\n");
goto error;
}
//Set the URL
if(curl_easy_setopt(URL->x.curl, CURLOPT_URL, fname) != CURLE_OK) {
fprintf(stderr, "[urlOpen] Couldn't set CURLOPT_URL!\n");
goto error;
}
//Set the range, which doesn't do anything for HTTP
snprintf(range, sizeof(range), "0-%" PRIu64, (uint64_t) (URL->bufSize-1));
if(curl_easy_setopt(URL->x.curl, CURLOPT_RANGE, range) != CURLE_OK) {
fprintf(stderr, "[urlOpen] Couldn't set CURLOPT_RANGE (%s)!\n", range);
goto error;
}
//Set the callback info, which means we no longer need to directly deal with sockets and header!
if(curl_easy_setopt(URL->x.curl, CURLOPT_WRITEFUNCTION, bwFillBuffer) != CURLE_OK) {
fprintf(stderr, "[urlOpen] Couldn't set CURLOPT_WRITEFUNCTION!\n");
goto error;
}
if(curl_easy_setopt(URL->x.curl, CURLOPT_WRITEDATA, (void*)URL) != CURLE_OK) {
fprintf(stderr, "[urlOpen] Couldn't set CURLOPT_WRITEDATA!\n");
goto error;
}
//Ignore certificate errors with https, libcurl just isn't reliable enough with conda
if(curl_easy_setopt(URL->x.curl, CURLOPT_SSL_VERIFYPEER, 0L) != CURLE_OK) { //libcurl expects a long here
fprintf(stderr, "[urlOpen] Couldn't set CURLOPT_SSL_VERIFYPEER to 0!\n");
goto error;
}
if(curl_easy_setopt(URL->x.curl, CURLOPT_SSL_VERIFYHOST, 0L) != CURLE_OK) {
fprintf(stderr, "[urlOpen] Couldn't set CURLOPT_SSL_VERIFYHOST to 0!\n");
goto error;
}
if(callBack) {
code = callBack(URL->x.curl);
if(code != CURLE_OK) {
fprintf(stderr, "[urlOpen] The user-supplied call back function returned an error: %s\n", curl_easy_strerror(code));
goto error;
}
}
code = curl_easy_perform(URL->x.curl);
errno = 0; //Sometimes curl_easy_perform leaves a random errno remnant
if(code != CURLE_OK) {
fprintf(stderr, "[urlOpen] curl_easy_perform received an error: %s\n", curl_easy_strerror(code));
goto error;
}
#endif
}
} else {
URL->type = BWG_FILE;
URL->x.fp = fopen(fname, mode);
if(!(URL->x.fp)) {
free(URL);
fprintf(stderr, "[urlOpen] Couldn't open %s for writing\n", fname);
return NULL;
}
}
if(url) free(url);
if(req) free(req);
return URL;
#ifndef NOCURL
error:
if(url) free(url);
if(req) free(req);
free(URL->memBuf);
curl_easy_cleanup(URL->x.curl);
free(URL);
return NULL;
#endif
}
//Performs the necessary free() operations and handles cleaning up curl
void urlClose(URL_t *URL) {
if(URL->type == BWG_FILE) {
fclose(URL->x.fp);
#ifndef NOCURL
} else {
free(URL->memBuf);
curl_easy_cleanup(URL->x.curl);
#endif
}
free(URL);
}
================================================
FILE: pyBigWig.c
================================================
#include <Python.h>
#include <inttypes.h>
#include "pyBigWig.h"
#ifdef WITHNUMPY
#include <float.h>
#include "numpy/npy_common.h"
#include "numpy/halffloat.h"
#include "numpy/ndarrayobject.h"
#include "numpy/arrayscalars.h"
int lsize = NPY_SIZEOF_LONG;
//Raises an exception on error, which should be checked
uint32_t getNumpyU32(PyArrayObject *obj, Py_ssize_t i) {
int dtype;
char *p;
uint32_t o = 0;
npy_intp stride;
//Get the dtype
dtype = PyArray_TYPE(obj);
//Get the stride
stride = PyArray_STRIDE(obj, 0);
p = PyArray_BYTES(obj) + i*stride;
switch(dtype) {
case NPY_INT8:
if(((int8_t *) p)[0] < 0) {
PyErr_SetString(PyExc_RuntimeError, "Received an integer < 0!\n");
goto error;
}
o += ((int8_t *) p)[0];
break;
case NPY_INT16:
if(((int16_t *) p)[0] < 0) {
PyErr_SetString(PyExc_RuntimeError, "Received an integer < 0!\n");
goto error;
}
o += ((int16_t *) p)[0];
break;
case NPY_INT32:
if(((int32_t *) p)[0] < 0) {
PyErr_SetString(PyExc_RuntimeError, "Received an integer < 0!\n");
goto error;
}
o += ((int32_t *) p)[0];
break;
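case NPY_INT64:
if(((int64_t *) p)[0] < 0) {
PyErr_SetString(PyExc_RuntimeError, "Received an integer < 0!\n");
goto error;
}
//Values above UINT32_MAX would otherwise be silently truncated
if(((int64_t *) p)[0] > (int64_t) UINT32_MAX) {
PyErr_SetString(PyExc_RuntimeError, "Received an integer larger than possible for a 32bit unsigned integer!\n");
goto error;
}
o += ((int64_t *) p)[0];
break;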
case NPY_UINT8:
o += ((uint8_t *) p)[0];
break;
case NPY_UINT16:
o += ((uint16_t *) p)[0];
break;
case NPY_UINT32:
o += ((uint32_t *) p)[0];
break;
case NPY_UINT64:
if(((uint64_t *) p)[0] > (uint32_t) -1) {
PyErr_SetString(PyExc_RuntimeError, "Received an integer larger than possible for a 32bit unsigned integer!\n");
goto error;
}
o += ((uint64_t *) p)[0];
break;
default:
PyErr_SetString(PyExc_RuntimeError, "Received unknown data type for conversion to uint32_t!\n");
goto error;
break;
}
return o;
error:
return 0;
}
long getNumpyL(PyObject *obj) {
short s;
int i;
long l;
long long ll;
unsigned short us;
unsigned int ui;
unsigned long ul;
unsigned long long ull;
if(!PyArray_IsIntegerScalar(obj)) {
PyErr_SetString(PyExc_RuntimeError, "Received non-Integer scalar type for conversion to long!\n");
return 0;
}
if(PyArray_IsScalar(obj, Short)) {
s = ((PyShortScalarObject *)obj)->obval;
l = s;
} else if(PyArray_IsScalar(obj, Int)) {
//Each scalar type has its own struct, so read obval through the matching one
i = ((PyIntScalarObject *)obj)->obval;
l = i;
} else if(PyArray_IsScalar(obj, Long)) {
l = ((PyLongScalarObject *)obj)->obval;
} else if(PyArray_IsScalar(obj, LongLong)) {
ll = ((PyLongLongScalarObject *)obj)->obval;
l = ll;
} else if(PyArray_IsScalar(obj, UShort)) {
us = ((PyUShortScalarObject *)obj)->obval;
l = us;
} else if(PyArray_IsScalar(obj, UInt)) {
ui = ((PyUIntScalarObject *)obj)->obval;
l = ui;
} else if(PyArray_IsScalar(obj, ULong)) {
ul = ((PyULongScalarObject *)obj)->obval;
l = ul;
} else if(PyArray_IsScalar(obj, ULongLong)) {
ull = ((PyULongLongScalarObject *)obj)->obval;
l = ull;
} else {
PyErr_SetString(PyExc_RuntimeError, "Received unknown scalar type for conversion to long!\n");
return 0;
}
return l;
}
//Raises an exception on error, which should be checked
float getNumpyF(PyArrayObject *obj, Py_ssize_t i) {
int dtype;
char *p;
float o = 0.0;
npy_intp stride;
//Get the dtype
dtype = PyArray_TYPE(obj);
//Get the stride
stride = PyArray_STRIDE(obj, 0);
p = PyArray_BYTES(obj) + i*stride;
switch(dtype) {
case NPY_FLOAT16:
return npy_half_to_float(((npy_half*)p)[0]);
case NPY_FLOAT32:
return ((float*)p)[0];
case NPY_FLOAT64:
if(((double*)p)[0] > FLT_MAX) {
PyErr_SetString(PyExc_RuntimeError, "Received a floating point value greater than possible for a 32-bit float!\n");
goto error;
}
if(((double*)p)[0] < -FLT_MAX) {
PyErr_SetString(PyExc_RuntimeError, "Received a floating point value less than possible for a 32-bit float!\n");
goto error;
}
o += ((double*)p)[0];
return o;
default:
PyErr_SetString(PyExc_RuntimeError, "Received unknown data type for conversion to float!\n");
goto error;
break;
}
return o;
error:
return 0;
}
//The calling function needs to free the result
char *getNumpyStr(PyArrayObject *obj, Py_ssize_t i) {
char *p , *o = NULL;
npy_intp stride, j;
int dtype;
//Get the dtype
dtype = PyArray_TYPE(obj);
//Get the stride
stride = PyArray_STRIDE(obj, 0);
p = PyArray_BYTES(obj) + i*stride;
switch(dtype) {
case NPY_STRING:
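o = calloc(1, stride + 1);
if(!o) { PyErr_NoMemory(); return NULL; }
strncpy(o, p, stride);
return o;
case NPY_UNICODE:
o = calloc(1, stride/4 + 1);
if(!o) { PyErr_NoMemory(); return NULL; }
for(j=0; j<stride/4; j++) o[j] = (char) ((uint32_t*)p)[j]; //UCS4 code points are narrowed to single bytes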
return o;
default:
PyErr_SetString(PyExc_RuntimeError, "Received unknown data type!\n");
break;
}
return NULL;
}
#endif
//Return 1 if there are any entries at all
int hasEntries(bigWigFile_t *bw) {
if(bw->hdr->indexOffset != 0) return 1; // No index, no entries pyBigWig issue #111
//if(bw->hdr->nBasesCovered > 0) return 1; // Sometimes headers are broken
return 0;
}
PyObject* pyBwEnter(pyBigWigFile_t*self, PyObject *args) {
bigWigFile_t *bw = self->bw;
if(!bw) {
PyErr_SetString(PyExc_RuntimeError, "The bigWig file handle is not opened!");
return NULL;
}
Py_INCREF(self);
return (PyObject*) self;
}
PyObject* pyBwOpen(PyObject *self, PyObject *pyFname) {
char *fname = NULL;
char *mode = "r";
pyBigWigFile_t *pybw;
bigWigFile_t *bw = NULL;
if(!PyArg_ParseTuple(pyFname, "s|s", &fname, &mode)) goto error;
//Open the local/remote file
if(strchr(mode, 'w') != NULL || bwIsBigWig(fname, NULL)) {
bw = bwOpen(fname, NULL, mode);
} else {
SYMBOL INDEX (197 symbols across 11 files)
FILE: libBigWig/bigWig.h
type CURLcode (line 68) | typedef int CURLcode;
type CURL (line 69) | typedef void CURL;
type bwStatsType (line 103) | enum bwStatsType {
type bwZoomHdr_t (line 122) | typedef struct {
type bigWigHdr_t (line 135) | typedef struct {
type chromList_t (line 160) | typedef struct {
type bwLL (line 168) | typedef struct bwLL bwLL;
type bwLL (line 169) | struct bwLL {
type bwZoomBuffer_t (line 173) | typedef struct bwZoomBuffer_t bwZoomBuffer_t;
type bwZoomBuffer_t (line 174) | struct bwZoomBuffer_t { //each individual entry takes 32 bytes
type bwWriteBuffer_t (line 185) | typedef struct {
type bigWigFile_t (line 210) | typedef struct {
type bwOverlappingIntervals_t (line 223) | typedef struct {
type bbOverlappingEntries_t (line 234) | typedef struct {
type bwOverlapIterator_t (line 246) | typedef struct {
type bwStatsType (line 462) | enum bwStatsType
type bwStatsType (line 476) | enum bwStatsType
FILE: libBigWig/bigWigIO.h
type CURLcode (line 10) | typedef int CURLcode;
type CURL (line 11) | typedef void CURL;
type bigWigFile_type_enum (line 28) | enum bigWigFile_type_enum {
type URL_t (line 38) | typedef struct {
FILE: libBigWig/bwRead.c
function bwTell (line 11) | long bwTell(bigWigFile_t *fp) {
function bwSetPos (line 19) | int bwSetPos(bigWigFile_t *fp, size_t pos) {
function bwRead (line 26) | size_t bwRead(void *data, size_t sz, size_t nmemb, bigWigFile_t *fp) {
function bwInit (line 38) | int bwInit(size_t defaultBufSize) {
function bwCleanup (line 52) | void bwCleanup() {
function bwZoomHdr_t (line 58) | static bwZoomHdr_t *bwReadZoomHdrs(bigWigFile_t *bw) {
function bwHdrDestroy (line 109) | static void bwHdrDestroy(bigWigHdr_t *hdr) {
function bwHdrRead (line 124) | static void bwHdrRead(bigWigFile_t *bw) {
function destroyChromList (line 171) | static void destroyChromList(chromList_t *cl) {
function readChromLeaf (line 184) | static uint64_t readChromLeaf(bigWigFile_t *bw, chromList_t *cl, uint32_...
function readChromNonLeaf (line 209) | static uint64_t readChromNonLeaf(bigWigFile_t *bw, chromList_t *cl, uint...
function readChromBlock (line 227) | static uint64_t readChromBlock(bigWigFile_t *bw, chromList_t *cl, uint32...
function chromList_t (line 240) | static chromList_t *bwReadChromList(bigWigFile_t *bw) {
function bwDestroyWriteBuffer (line 280) | static void bwDestroyWriteBuffer(bwWriteBuffer_t *wb) {
function bwClose (line 289) | void bwClose(bigWigFile_t *fp) {
function bwIsBigWig (line 302) | int bwIsBigWig(const char *fname, CURLcode (*callBack) (CURL*)) {
function bbIsBigBed (line 332) | int bbIsBigBed(const char *fname, CURLcode (*callBack) (CURL*)) {
function bigWigFile_t (line 345) | bigWigFile_t *bwOpen(const char *fname, CURLcode (*callBack) (CURL*), co...
function bigWigFile_t (line 397) | bigWigFile_t *bbOpen(const char *fname, CURLcode (*callBack) (CURL*)) {
FILE: libBigWig/bwStats.c
function determineZoomLevel (line 11) | static int32_t determineZoomLevel(const bigWigFile_t *fp, int basesPerBi...
type val_t (line 29) | struct val_t {
type vals_t (line 35) | struct vals_t {
function destroyVals_t (line 41) | void destroyVals_t(struct vals_t *v) {
function getScalar (line 50) | double getScalar(uint32_t i_start, uint32_t i_end, uint32_t b_start, uin...
type vals_t (line 63) | struct vals_t
type vals_t (line 68) | struct vals_t
type val_t (line 69) | struct val_t
type vals_t (line 79) | struct vals_t
type val_t (line 82) | struct val_t
type val_t (line 112) | struct val_t
type val_t (line 115) | struct val_t
function blockMean (line 139) | static double blockMean(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint...
function intMean (line 168) | static double intMean(bwOverlappingIntervals_t* ints, uint32_t start, ui...
function blockDev (line 187) | static double blockDev(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint3...
function intDev (line 223) | static double intDev(bwOverlappingIntervals_t* ints, uint32_t start, uin...
function blockMax (line 246) | static double blockMax(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint3...
function intMax (line 276) | static double intMax(bwOverlappingIntervals_t* ints) {
function blockMin (line 290) | static double blockMin(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint3...
function intMin (line 318) | static double intMin(bwOverlappingIntervals_t* ints) {
function blockCoverage (line 333) | static double blockCoverage(bigWigFile_t *fp, bwOverlapBlock_t *blocks, ...
function intCoverage (line 359) | static double intCoverage(bwOverlappingIntervals_t* ints, uint32_t start...
function blockSum (line 376) | static double blockSum(bigWigFile_t *fp, bwOverlapBlock_t *blocks, uint3...
function intSum (line 405) | static double intSum(bwOverlappingIntervals_t* ints, uint32_t start, uin...
type bwStatsType (line 423) | enum bwStatsType
type bwStatsType (line 485) | enum bwStatsType
type bwStatsType (line 530) | enum bwStatsType
FILE: libBigWig/bwValues.c
function roundup (line 9) | static uint32_t roundup(uint32_t v) {
function bwRTree_t (line 21) | static bwRTree_t *readRTreeIdx(bigWigFile_t *fp, uint64_t offset) {
function bwRTreeNode_t (line 64) | static bwRTreeNode_t *bwGetRTreeNode(bigWigFile_t *fp, uint64_t offset) {
function destroyBWOverlapBlock (line 124) | void destroyBWOverlapBlock(bwOverlapBlock_t *b) {
function bwOverlapBlock_t (line 132) | static bwOverlapBlock_t *overlapsLeaf(bwRTreeNode_t *node, uint32_t tid,...
function bwOverlapBlock_t (line 194) | static bwOverlapBlock_t *mergeOverlapBlocks(bwOverlapBlock_t *b1, bwOver...
function bwOverlapBlock_t (line 226) | static bwOverlapBlock_t *overlapsNonLeaf(bigWigFile_t *fp, bwRTreeNode_t...
function bwOverlapBlock_t (line 275) | bwOverlapBlock_t *walkRTreeNodes(bigWigFile_t *bw, bwRTreeNode_t *root, ...
function bwGetTid (line 282) | uint32_t bwGetTid(const bigWigFile_t *fp, const char *chrom) {
function bwOverlapBlock_t (line 291) | static bwOverlapBlock_t *bwGetOverlappingBlocks(bigWigFile_t *fp, const ...
function bwFillDataHdr (line 313) | void bwFillDataHdr(bwDataHeader_t *hdr, void *b) {
function bwDestroyOverlappingIntervals (line 323) | void bwDestroyOverlappingIntervals(bwOverlappingIntervals_t *o) {
function bbDestroyOverlappingEntries (line 331) | void bbDestroyOverlappingEntries(bbOverlappingEntries_t *o) {
function bwOverlappingIntervals_t (line 346) | static bwOverlappingIntervals_t *pushIntervals(bwOverlappingIntervals_t ...
function bbOverlappingEntries_t (line 366) | static bbOverlappingEntries_t *pushBBIntervals(bbOverlappingEntries_t *o...
function bwOverlappingIntervals_t (line 390) | bwOverlappingIntervals_t *bwGetOverlappingIntervalsCore(bigWigFile_t *fp...
function bbOverlappingEntries_t (line 486) | bbOverlappingEntries_t *bbGetOverlappingEntriesCore(bigWigFile_t *fp, bw...
function bwOverlappingIntervals_t (line 561) | bwOverlappingIntervals_t *bwGetOverlappingIntervals(bigWigFile_t *fp, co...
function bbOverlappingEntries_t (line 573) | bbOverlappingEntries_t *bbGetOverlappingEntries(bigWigFile_t *fp, const ...
function bwOverlapIterator_t (line 585) | bwOverlapIterator_t *bwOverlappingIntervalsIterator(bigWigFile_t *fp, co...
function bwOverlapIterator_t (line 613) | bwOverlapIterator_t *bbOverlappingEntriesIterator(bigWigFile_t *fp, cons...
function bwIteratorDestroy (line 641) | void bwIteratorDestroy(bwOverlapIterator_t *iter) {
function bwOverlapIterator_t (line 650) | bwOverlapIterator_t *bwIteratorNext(bwOverlapIterator_t *iter) {
function bwOverlappingIntervals_t (line 711) | bwOverlappingIntervals_t *bwGetValues(bigWigFile_t *fp, const char *chro...
function bwDestroyIndexNode (line 761) | void bwDestroyIndexNode(bwRTreeNode_t *node) {
function bwDestroyIndex (line 782) | void bwDestroyIndex(bwRTree_t *idx) {
function bwRTree_t (line 789) | bwRTree_t *bwReadIndex(bigWigFile_t *fp, uint64_t offset) {
FILE: libBigWig/bwValues.h
type bwRTreeNode_t (line 20) | typedef struct bwRTreeNode_t {
type bwRTree_t (line 39) | typedef struct {
type bwOverlapBlock_t (line 56) | typedef struct {
type bwDataHeader_t (line 67) | typedef struct {
FILE: libBigWig/bwWrite.c
type val_t (line 10) | struct val_t {
function chromList_t (line 22) | chromList_t *bwCreateChromList(const char* const* chroms, const uint32_t...
function bwCreateHdr (line 56) | int bwCreateHdr(bigWigFile_t *fp, int32_t maxZooms) {
function writeAtPos (line 85) | static int writeAtPos(void *ptr, size_t sz, size_t nmemb, size_t pos, FI...
function writeChromList (line 94) | static int writeChromList(FILE *fp, chromList_t *cl) {
function bwWriteHdr (line 187) | int bwWriteHdr(bigWigFile_t *bw) {
function insertIndexNode (line 230) | static int insertIndexNode(bigWigFile_t *fp, bwRTreeNode_t *leaf) {
function appendIndexNodeEntry (line 246) | static int appendIndexNodeEntry(bigWigFile_t *fp, uint32_t tid0, uint32_...
function addIndexEntry (line 262) | static int addIndexEntry(bigWigFile_t *fp, uint32_t tid0, uint32_t tid1,...
function flushBuffer (line 312) | static int flushBuffer(bigWigFile_t *fp) {
function updateStats (line 362) | static void updateStats(bigWigFile_t *fp, uint32_t span, float val) {
function bwAddIntervals (line 374) | int bwAddIntervals(bigWigFile_t *fp, const char* const* chrom, const uin...
function bwAppendIntervals (line 434) | int bwAppendIntervals(bigWigFile_t *fp, const uint32_t *start, const uin...
function bwAddIntervalSpans (line 462) | int bwAddIntervalSpans(bigWigFile_t *fp, const char *chrom, const uint32...
function bwAppendIntervalSpans (line 495) | int bwAppendIntervalSpans(bigWigFile_t *fp, const uint32_t *start, const...
function bwAddIntervalSpanSteps (line 520) | int bwAddIntervalSpanSteps(bigWigFile_t *fp, const char *chrom, uint32_t...
function bwAppendIntervalSpanSteps (line 552) | int bwAppendIntervalSpanSteps(bigWigFile_t *fp, const float *values, uin...
function writeSummary (line 576) | int writeSummary(bigWigFile_t *fp) {
function bwRTreeNode_t (line 585) | static bwRTreeNode_t *makeEmptyNode(uint32_t blockSize) {
function bwRTreeNode_t (line 616) | static bwRTreeNode_t *addLeaves(bwLL **ll, uint64_t *sz, uint64_t toProc...
function writeIndexTreeNode (line 657) | int writeIndexTreeNode(FILE *fp, bwRTreeNode_t *n, uint8_t *wrote, int l...
function writeIndexOffsets (line 694) | int writeIndexOffsets(FILE *fp, bwRTreeNode_t *n, uint64_t offset) {
function writeIndexTree (line 706) | int writeIndexTree(bigWigFile_t *fp) {
function writeIndex (line 731) | int writeIndex(bigWigFile_t *fp) {
function makeZoomLevels (line 808) | int makeZoomLevels(bigWigFile_t *fp) {
function nextPos (line 884) | void nextPos(bigWigFile_t *fp, uint32_t size, uint32_t *pos, uint32_t de...
function overlapsInterval (line 905) | uint32_t overlapsInterval(uint32_t tid0, uint32_t start0, uint32_t end0,...
function updateInterval (line 919) | uint32_t updateInterval(bigWigFile_t *fp, bwZoomBuffer_t *buffer, double...
function addIntervalValue (line 984) | int addIntervalValue(bigWigFile_t *fp, uint64_t *nEntries, double *sum, ...
function constructZoomLevels (line 1021) | int constructZoomLevels(bigWigFile_t *fp) {
function writeZoomLevels (line 1066) | int writeZoomLevels(bigWigFile_t *fp) {
function bwFinalize (line 1229) | int bwFinalize(bigWigFile_t *fp) {
FILE: libBigWig/io.c
function getContentLength (line 15) | uint64_t getContentLength(const URL_t *URL) {
function CURLcode (line 25) | CURLcode urlFetchData(URL_t *URL, unsigned long bufSize) {
function url_fread (line 47) | size_t url_fread(void *obuf, size_t obufSize, URL_t *URL) {
function urlRead (line 89) | size_t urlRead(URL_t *URL, void *buf, size_t bufSize) {
function bwFillBuffer (line 101) | size_t bwFillBuffer(const void *inBuf, size_t l, size_t nmemb, void *pUR...
function CURLcode (line 120) | CURLcode urlSeek(URL_t *URL, size_t pos) {
function URL_t (line 161) | URL_t *urlOpen(const char *fname, CURLcode (*callBack)(CURL*), const cha...
function urlClose (line 286) | void urlClose(URL_t *URL) {
FILE: pyBigWig.c
function getNumpyU32 (line 15) | uint32_t getNumpyU32(PyArrayObject *obj, Py_ssize_t i) {
function getNumpyL (line 83) | long getNumpyL(PyObject *obj) {
function getNumpyF (line 130) | float getNumpyF(PyArrayObject *obj, Py_ssize_t i) {
function hasEntries (line 199) | int hasEntries(bigWigFile_t *bw) {
function PyObject (line 205) | PyObject* pyBwEnter(pyBigWigFile_t*self, PyObject *args) {
function PyObject (line 218) | PyObject* pyBwOpen(PyObject *self, PyObject *pyFname) {
function pyBwDealloc (line 259) | static void pyBwDealloc(pyBigWigFile_t *self) {
function PyObject (line 264) | static PyObject *pyBwClose(pyBigWigFile_t *self, PyObject *args) {
function PyObject (line 272) | static PyObject *pyBwGetHeader(pyBigWigFile_t *self, PyObject *args) {
function PyObject (line 318) | static PyObject *pyBwGetChroms(pyBigWigFile_t *self, PyObject *args) {
function char2enum (line 364) | enum bwStatsType char2enum(char *s) {
function PyObject (line 652) | static PyObject *pyBwGetIntervals(pyBigWigFile_t *self, PyObject *args, ...
function PyString_Check (line 765) | int PyString_Check(PyObject *obj) {
function isNumeric (line 779) | int isNumeric(PyObject *obj) {
function Numeric2Uint (line 790) | uint32_t Numeric2Uint(PyObject *obj) {
function PyObject (line 807) | PyObject *pyBwAddHeader(pyBigWigFile_t *self, PyObject *args, PyObject *...
function isType0 (line 923) | int isType0(PyObject *chroms, PyObject *starts, PyObject *ends, PyObject...
function isType1 (line 1020) | int isType1(PyObject *chroms, PyObject *starts, PyObject *values, PyObje...
function isType2 (line 1072) | int isType2(PyObject *chroms, PyObject *starts, PyObject *values, PyObje...
function getType (line 1097) | int getType(PyObject *chroms, PyObject *starts, PyObject *ends, PyObject...
function PyAddIntervals (line 1197) | int PyAddIntervals(pyBigWigFile_t *self, PyObject *chroms, PyObject *sta...
function PyAppendIntervals (line 1275) | int PyAppendIntervals(pyBigWigFile_t *self, PyObject *starts, PyObject *...
function PyAddIntervalSpans (line 1335) | int PyAddIntervalSpans(pyBigWigFile_t *self, PyObject *chroms, PyObject ...
function PyAppendIntervalSpans (line 1400) | int PyAppendIntervalSpans(pyBigWigFile_t *self, PyObject *starts, PyObje...
function PyAddIntervalSpanSteps (line 1458) | int PyAddIntervalSpanSteps(pyBigWigFile_t *self, PyObject *chroms, PyObj...
function PyAppendIntervalSpanSteps (line 1507) | int PyAppendIntervalSpanSteps(pyBigWigFile_t *self, PyObject *values) {
function pyBwAddEntries (line 1675) | PyObject *pyBwAddEntries(pyBigWigFile_t *self, PyObject *args, PyObject ...
function pyBBGetEntries (line 1746) | static PyObject *pyBBGetEntries(pyBigWigFile_t *self, PyObject *args, Py...
function pyBBGetSQL (line 1854) | static PyObject *pyBBGetSQL(pyBigWigFile_t *self, PyObject *args) {
function pyIsBigWig (line 1881) | static PyObject *pyIsBigWig(pyBigWigFile_t *self, PyObject *args) {
function pyIsBigBed (line 1892) | static PyObject *pyIsBigBed(pyBigWigFile_t *self, PyObject *args) {
function initpyBigWig (line 1918) | PyMODINIT_FUNC initpyBigWig(void) {
FILE: pyBigWig.h
type pyBigWigFile_t (line 7) | typedef struct {
type pyBigWigmodule_state (line 391) | struct pyBigWigmodule_state {
FILE: pyBigWigTest/test.py
class TestRemote (line 8) | class TestRemote():
method doOpen (line 11) | def doOpen(self):
method doOpenWith (line 16) | def doOpenWith(self):
method doChroms (line 20) | def doChroms(self, bw):
method doHeader (line 25) | def doHeader(self, bw):
method doStats (line 28) | def doStats(self, bw):
method doValues (line 35) | def doValues(self, bw):
method doIntervals (line 40) | def doIntervals(self, bw):
method doSum (line 45) | def doSum(self, bw):
method doWrite (line 48) | def doWrite(self, bw):
method doWrite2 (line 85) | def doWrite2(self):
method doWriteEmpty (line 144) | def doWriteEmpty(self):
method doWriteNumpy (line 167) | def doWriteNumpy(self):
method testAll (line 193) | def testAll(self):
class TestLocal (line 209) | class TestLocal():
method testFoo (line 210) | def testFoo(self):
class TestBigBed (line 215) | class TestBigBed():
method testBigBed (line 216) | def testBigBed(self):
class TestNumpy (line 247) | class TestNumpy():
method testNumpy (line 248) | def testNumpy(self):
method testNumpyValues (line 312) | def testNumpyValues(self):
Condensed preview: 28 files, each showing path, character count, and a content snippet.
[
{
"path": ".environmentLinux.yaml",
"chars": 155,
"preview": "name: foo\nchannels:\n - conda-forge\n - bioconda\n - default\ndependencies:\n - gcc_linux-64\n - curl\n - zlib\n - python"
},
{
"path": ".github/workflows/build.yml",
"chars": 1260,
"preview": "name: Test\non: \n pull_request:\n push:\n\njobs:\n testLinux:\n name: Test Conda Linux\n runs-on: \"ubuntu-latest\"\n "
},
{
"path": ".github/workflows/pypi.yml",
"chars": 1070,
"preview": "name: pypi\non: [push]\njobs:\n pypi:\n name: upload to pypi\n runs-on: ubuntu-latest\n steps:\n - uses: actions/c"
},
{
"path": ".gitignore",
"chars": 749,
"preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\n"
},
{
"path": ".gitmodules",
"chars": 0,
"preview": ""
},
{
"path": "LICENSE.txt",
"chars": 1078,
"preview": "The MIT License (MIT)\n\nCopyright (c) 2015 Devon Ryan\n\nPermission is hereby granted, free of charge, to any person obtain"
},
{
"path": "MANIFEST.in",
"chars": 27,
"preview": "include *.h\ninclude **/*.h\n"
},
{
"path": "README.md",
"chars": 18746,
"preview": "[](https://badge.fury.io/py/pyBigWig) [\n\nCopyright (c) 2015 Devon Ryan\n\nPermission is hereby granted, free of charge, to any person obtain"
},
{
"path": "libBigWig/README.md",
"chars": 12559,
"preview": " [:\n fname = "
},
{
"path": "pyproject.toml",
"chars": 1545,
"preview": "[build-system]\nbuild-backend = \"setuptools.build_meta\"\nrequires = [\"numpy >= 2.0.0\", \"setuptools\", \"setuptools-scm\"]\n\n[p"
},
{
"path": "setup.cfg",
"chars": 224,
"preview": "# This is required for setuptools to name the wheel with the correct\n# minimum python abi version\n# Commenting this out,"
},
{
"path": "setup.py",
"chars": 2430,
"preview": "#!/usr/bin/env python\nfrom setuptools import setup, Extension\nfrom distutils import sysconfig\nfrom pathlib import Path\ni"
}
]
// ... and 2 more files
About this extraction
This page contains the full source code of the deeptools/pyBigWig GitHub repository, extracted and formatted as plain text. The extraction includes 28 files (278.9 KB), approximately 83.0k tokens, and a symbol index with 197 extracted functions, classes, methods, constants, and types.