Repository: Hi-Folks/statistics
Branch: main
Commit: 5d2408423c2a
Files: 62
Total size: 471.9 KB
Directory structure:
gitextract_47q7nj1_/
├── .editorconfig
├── .gitattributes
├── .github/
│   ├── CONTRIBUTING.md
│   ├── ISSUE_TEMPLATE/
│   │   └── config.yml
│   ├── SECURITY.md
│   ├── dependabot.yml
│   └── workflows/
│       ├── dependabot-auto-merge.yml
│       ├── run-tests.yml
│       └── static-code-analysis.yml
├── .gitignore
├── .php-cs-fixer.dist.php
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE.md
├── README.md
├── TODO.md
├── composer.json
├── examples/
│   ├── article-boston-marathon-analysis.php
│   ├── article-downhill-ski-analysis.php
│   ├── article-gpx-running-analysis.php
│   ├── freq_methods.php
│   ├── frequencies.php
│   ├── kde.php
│   ├── kde_downhill.php
│   ├── norm_dist.php
│   ├── recipes_binomial_approximation.php
│   ├── recipes_classic_probability.php
│   ├── recipes_monte_carlo.php
│   ├── recipes_naive_bayes.php
│   ├── stat.php
│   └── stat_methods.php
├── phpstan.neon
├── phpunit.xml.dist
├── rector.php
├── src/
│   ├── ArrUtil.php
│   ├── Enums/
│   │   ├── Alternative.php
│   │   └── KdeKernel.php
│   ├── Exception/
│   │   └── InvalidDataInputException.php
│   ├── Freq.php
│   ├── Math.php
│   ├── NormalDist.php
│   ├── Stat.php
│   ├── Statistics.php
│   ├── StreamingStat.php
│   ├── StudentT.php
│   └── Utils/
│       ├── Arr.php
│       ├── Format.php
│       └── Math.php
└── tests/
    ├── ArrTest.php
    ├── FormatTest.php
    ├── FreqTest.php
    ├── FrequenciesTest.php
    ├── MathTest.php
    ├── NormalDistTest.php
    ├── StatDatasetTest.php
    ├── StatFromCsvTest.php
    ├── StatTest.php
    ├── StatisticTest.php
    ├── StreamingStatTest.php
    ├── StudentTTest.php
    └── data/
        └── income.data.csv
================================================
FILE CONTENTS
================================================
================================================
FILE: .editorconfig
================================================
root = true
[*]
charset = utf-8
indent_size = 4
indent_style = space
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.md]
trim_trailing_whitespace = false
[*.{yml,yaml}]
indent_size = 2
================================================
FILE: .gitattributes
================================================
# Path-based git attributes
# https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html
# Ignore all test and documentation with "export-ignore".
/.github export-ignore
/.gitattributes export-ignore
/.gitignore export-ignore
/phpunit.xml.dist export-ignore
/psalm.xml.dist export-ignore
/tests export-ignore
/.editorconfig export-ignore
/.php-cs-fixer.dist.php export-ignore
/art export-ignore
/docs export-ignore
/UPGRADING.md export-ignore
================================================
FILE: .github/CONTRIBUTING.md
================================================
# Contributing
Contributions are **welcome** and will be fully **credited**.
Please read and understand the contribution guide before creating an issue or pull request.
## Etiquette
This project is open source, and as such, the maintainers give their free time to build and maintain the source code
held within. They make the code freely available in the hope that it will be of use to other developers. It would be
extremely unfair for them to suffer abuse or anger for their hard work.
Please be considerate towards maintainers when raising issues or presenting pull requests. Let's show the
world that developers are civilized and selfless people.
It's the duty of the maintainer to ensure that all submissions to the project are of sufficient
quality to benefit the project. Many developers have different skillsets, strengths, and weaknesses. Respect the maintainer's decision, and do not be upset or abusive if your submission is not used.
## Viability
When requesting or submitting new features, first consider whether it might be useful to others. Open
source projects are used by many developers, who may have entirely different needs to your own. Think about
whether or not your feature is likely to be used by other users of the project.
## Procedure
Before filing an issue:
- Attempt to replicate the problem, to ensure that it wasn't a coincidental incident.
- Check to make sure your feature suggestion isn't already present within the project.
- Check the pull requests tab to ensure that the bug doesn't have a fix in progress.
- Check the pull requests tab to ensure that the feature isn't already in progress.
Before submitting a pull request:
- Check the codebase to ensure that your feature doesn't already exist.
- Check the pull requests to ensure that another person hasn't already submitted the feature or fix.
## Requirements
If the project maintainer has any additional requirements, you will find them listed here.
- **[PSR-2 Coding Standard](https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-2-coding-style-guide.md)** - The easiest way to apply the conventions is to install [PHP Code Sniffer](https://pear.php.net/package/PHP_CodeSniffer).
- **Add tests!** - Your patch won't be accepted if it doesn't have tests.
- **Document any change in behaviour** - Make sure the `README.md` and any other relevant documentation are kept up-to-date.
- **Consider our release cycle** - We try to follow [SemVer v2.0.0](https://semver.org/). Randomly breaking public APIs is not an option.
- **One pull request per feature** - If you want to do more than one thing, send multiple pull requests.
- **Send coherent history** - Make sure each individual commit in your pull request is meaningful. If you had to make multiple intermediate commits while developing, please [squash them](https://www.git-scm.com/book/en/v2/Git-Tools-Rewriting-History#Changing-Multiple-Commit-Messages) before submitting.
**Happy coding**!
================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: false
contact_links:
  - name: Request a new feature
    url: https://github.com/hi-folks/statistics/issues/new?labels=enhancement
    about: Share ideas for new features / functions
  - name: Report a bug
    url: https://github.com/hi-folks/statistics/issues/new?labels=bug
    about: Report a reproducible bug
  - name: Documentation
    url: https://github.com/hi-folks/statistics/issues/new?labels=documentation
    about: Improvements or additions to documentation
================================================
FILE: .github/SECURITY.md
================================================
# Package Security Policy
## Reporting Security Issues
We take security issues seriously. If you discover any security-related issue within our package, please report it to us promptly. Your assistance in disclosing potential security vulnerabilities is highly appreciated.
To report a security issue, please send an email to us at [roberto.butti@gmail.com](mailto:roberto.butti@gmail.com). We request that you do not use public issue trackers or other public communication channels to report security concerns related to this package. This helps us maintain the confidentiality and integrity of the issue while we investigate and address it.
## Responsible Disclosure
We follow a responsible disclosure policy, and we kindly ask you to:
1. **Provide Sufficient Details**: When reporting a security issue, please include as much information as possible so that we can reproduce and understand the problem. This may include steps to reproduce, the affected component, and any proof-of-concept code if available.
2. **Allow Time for Resolution**: We will acknowledge the receipt of your report promptly and work diligently to assess and resolve the issue. We appreciate your patience and understanding during this process.
3. **Keep Information Confidential**: Please do not disclose or share the details of the security issue with others until we have addressed and resolved it. This helps protect our users and the security of our package.
4. **Do Not Impact Other Users**: Please refrain from taking any actions that may negatively impact the availability or integrity of our package or the data of other users.
If you are unsure whether a specific issue qualifies, please report it, and we will assess its validity.
Thank you for helping us maintain the security of our package and protect our users. We value your contributions to our security efforts.
================================================
FILE: .github/dependabot.yml
================================================
# Please see the documentation for all configuration options:
# https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    labels:
      - "dependencies"
================================================
FILE: .github/workflows/dependabot-auto-merge.yml
================================================
name: dependabot-auto-merge
on: pull_request_target
permissions:
  pull-requests: write
  contents: write
jobs:
  dependabot:
    runs-on: ubuntu-latest
    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      - name: Dependabot metadata
        id: metadata
        uses: dependabot/fetch-metadata@v2.5.0
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
      - name: Auto-merge Dependabot PRs for semver-minor updates
        if: ${{steps.metadata.outputs.update-type == 'version-update:semver-minor'}}
        run: gh pr merge --auto --merge "$PR_URL"
        env:
          PR_URL: ${{github.event.pull_request.html_url}}
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
      - name: Auto-merge Dependabot PRs for semver-patch updates
        if: ${{steps.metadata.outputs.update-type == 'version-update:semver-patch'}}
        run: gh pr merge --auto --merge "$PR_URL"
        env:
          PR_URL: ${{github.event.pull_request.html_url}}
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
================================================
FILE: .github/workflows/run-tests.yml
================================================
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: true
      matrix:
        os: [ubuntu-latest, windows-latest]
        php: [8.2, 8.3, 8.4, 8.5]
        exclude:
          - os: windows-latest
            php: [8.2, 8.3, 8.5]
    name: P${{ matrix.php }} - ${{ matrix.os }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v6
      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: ${{ matrix.php }}
          extensions: dom, curl, libxml, mbstring, zip, pcntl, bcmath, soap, intl, iconv, fileinfo
          coverage: xdebug
      - name: Setup problem matchers
        run: |
          echo "::add-matcher::${{ runner.tool_cache }}/php.json"
          echo "::add-matcher::${{ runner.tool_cache }}/phpunit.json"
      - name: Install dependencies
        run: composer install --prefer-dist --no-interaction
      - name: Execute tests
        run: composer test
================================================
FILE: .github/workflows/static-code-analysis.yml
================================================
name: Static Code Analysis
on: [push, pull_request]
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: true
      matrix:
        os: [ubuntu-latest]
        php: [8.4]
        stability: [prefer-stable]
    name: P${{ matrix.php }} - ${{ matrix.stability }} - ${{ matrix.os }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v6
      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: ${{ matrix.php }}
          extensions: dom, curl, libxml, mbstring, zip, pcntl, bcmath, intl, iconv, fileinfo
          coverage: none
      - name: Setup problem matchers
        run: |
          echo "::add-matcher::${{ runner.tool_cache }}/php.json"
          echo "::add-matcher::${{ runner.tool_cache }}/phpunit.json"
      - name: Install dependencies
        run: composer update --${{ matrix.stability }} --prefer-dist --no-interaction
      - name: Execute static code analysis
        run: vendor/bin/phpstan analyse src --level 8 --error-format=github --no-progress --no-ansi
================================================
FILE: .gitignore
================================================
.idea
.php_cs
.php_cs.cache
.phpunit.result.cache
build
bin
composer.lock
coverage
docs
phpunit.xml
psalm.xml
vendor
.php-cs-fixer.cache
/.phpunit.cache
================================================
FILE: .php-cs-fixer.dist.php
================================================
<?php
// `new` is wrapped in parentheses so this file also parses on PHP 8.2 and 8.3;
// chaining directly on an unparenthesized `new` requires PHP 8.4 or later.
$finder = (new PhpCsFixer\Finder())->in([
    __DIR__ . "/src",
    __DIR__ . "/tests",
    __DIR__ . "/examples",
]);
return (new PhpCsFixer\Config())
    ->setRules([
        "@PER-CS" => true,
        "@PHP82Migration" => true,
        "class_attributes_separation" => [
            "elements" => [
                "const" => "one",
                "method" => "one",
                "property" => "one",
                "trait_import" => "none",
            ],
        ],
        "no_extra_blank_lines" => [
            "tokens" => ["extra", "throw", "use"],
        ],
        "no_blank_lines_after_class_opening" => true,
        "no_blank_lines_after_phpdoc" => true,
        "no_closing_tag" => true,
        "no_empty_phpdoc" => true,
        "no_empty_statement" => true,
        //'strict_param' => true,
        "array_indentation" => true,
        "array_syntax" => ["syntax" => "short"],
        "binary_operator_spaces" => [
            "default" => "single_space",
            "operators" => ["=>" => null],
        ],
        "whitespace_after_comma_in_array" => true,
    ])
    ->setFinder($finder);
================================================
FILE: CHANGELOG.md
================================================
# Changelog
## 1.5.0 - 2026-03-07
- Adding `logarithmicRegression()`, `powerRegression()`, and `exponentialRegression()` methods for non-linear regression models
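A common way to fit an exponential model `y = a * exp(b * x)` is to run ordinary least squares on the points `(x, ln y)`. The sketch below illustrates that linearization only. It is an assumption about the general technique, not a description of how `exponentialRegression()` is implemented, and the function name `fitExponential` is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): fits y = a * exp(b * x)
 * by ordinary least squares on the transformed points (x, ln y).
 *
 * @param list<float> $x
 * @param list<float> $y every value must be > 0 because ln(y) is taken
 * @return array{a: float, b: float}
 */
function fitExponential(array $x, array $y): array
{
    $n = count($x);
    if ($n < 2 || $n !== count($y)) {
        throw new InvalidArgumentException('Need two equal-length arrays with at least 2 points.');
    }

    $logY = [];
    foreach ($y as $value) {
        if ($value <= 0) {
            throw new InvalidArgumentException('All y values must be positive.');
        }
        $logY[] = log($value);
    }

    $meanX = array_sum($x) / $n;
    $meanLogY = array_sum($logY) / $n;

    // Accumulate the cross-deviation and the squared deviation of x.
    $sxy = 0.0;
    $sxx = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $meanX;
        $sxy += $dx * ($logY[$i] - $meanLogY);
        $sxx += $dx * $dx;
    }
    if ($sxx == 0.0) {
        throw new InvalidArgumentException('x values must not all be equal.');
    }

    $b = $sxy / $sxx;                        // slope of ln(y) against x
    $a = exp($meanLogY - $b * $meanX);       // intercept mapped back from log space

    return ['a' => $a, 'b' => $b];
}

// The data follow y = 2 * 3^x, so a should be close to 2 and b close to ln(3).
$fit = fitExponential([0.0, 1.0, 2.0, 3.0], [2.0, 6.0, 18.0, 54.0]);
printf("a = %.4f, b = %.4f\n", $fit['a'], $fit['b']);
```

Fitting in log space minimizes squared error on `ln y` rather than on `y`, so the result can differ from a direct non-linear fit when the data are noisy.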
## 1.4.0 - 2026-03-03
- Adding `Utils\Arr` class with `extract()` method for multi-column extraction from arrays of associative arrays, and `partition()` method for splitting arrays into matching/non-matching groups by field condition (supports ==, !=, >, <, >=, <= operators)
- Adding `Utils\Format` class with `secondsToTime()`, `timeToSeconds()`, `secondsToHms()`, and `hmsToSeconds()` methods for time formatting and parsing
- Adding `Utils\Math` class — reorganized from root namespace for consistency with `Enums/` and `Exception/` sub-directories
- Reorganized `ArrUtil` and `Math` into `Utils\Arr` and `Utils\Math` sub-namespace; original classes remain as deprecated proxies for backward compatibility
- Updated internal references in `Stat`, `Freq`, `StreamingStat`, and `Statistics` to use the new `Utils` namespace
## 1.3.1 - 2026-02-23
- Adding `tTestTwoSample()` method for two-sample independent t-test (Welch's t-test) — compares the means of two independent groups without assuming equal variances
- Adding `tTestPaired()` method for paired t-test — tests whether the mean difference between paired observations (e.g. before/after) is significantly different from zero
- Adding `StudentT` class for the Student's t-distribution (pdf, cdf, invCdf) — building block for t-tests and confidence intervals with small samples
- Adding `tTest()` method for one-sample t-test — like z-test but appropriate for small samples where the population standard deviation is unknown
- Adding `zTest()` method for one-sample Z-test — tests whether the sample mean differs significantly from a hypothesized population mean (includes p-value calculation)
- Adding `Alternative` enum (`TwoSided`, `Greater`, `Less`) for hypothesis testing
- Adding `confidenceInterval()` method for computing confidence intervals for the mean using the normal (z) distribution
- Adding `rSquared()` method for R² (coefficient of determination) — proportion of variance explained by linear regression
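For reference, Welch's t-test uses the statistic `t = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2)` with degrees of freedom from the Welch-Satterthwaite approximation. The sketch below computes only those two quantities. The function name `welchStatistic` is hypothetical and this is not the code behind `tTestTwoSample()`. The p-value step, which needs a t-distribution CDF, is left out.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): Welch's t statistic and
 * the Welch-Satterthwaite degrees of freedom for two independent samples.
 *
 * @param list<float> $a
 * @param list<float> $b
 * @return array{t: float, df: float}
 */
function welchStatistic(array $a, array $b): array
{
    $n1 = count($a);
    $n2 = count($b);
    if ($n1 < 2 || $n2 < 2) {
        throw new InvalidArgumentException('Each sample needs at least 2 values.');
    }

    $mean = static fn(array $v): float => array_sum($v) / count($v);

    // Sample variance with the n - 1 denominator.
    $variance = static function (array $v) use ($mean): float {
        $m = $mean($v);
        $ss = 0.0;
        foreach ($v as $value) {
            $ss += ($value - $m) ** 2;
        }
        return $ss / (count($v) - 1);
    };

    $q1 = $variance($a) / $n1;   // s1^2 / n1
    $q2 = $variance($b) / $n2;   // s2^2 / n2
    if ($q1 + $q2 == 0.0) {
        throw new InvalidArgumentException('Both samples have zero variance.');
    }

    $t = ($mean($a) - $mean($b)) / sqrt($q1 + $q2);
    $df = ($q1 + $q2) ** 2 / ($q1 ** 2 / ($n1 - 1) + $q2 ** 2 / ($n2 - 1));

    return ['t' => $t, 'df' => $df];
}

$result = welchStatistic([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]);
printf("t = %.4f, df = %.4f\n", $result['t'], $result['df']);
```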
## 1.3.0 - 2026-02-22
- Adding `StreamingStat` class (experimental) for streaming/online computation of mean, variance, stdev, skewness, kurtosis, sum, min, and max with O(1) memory
- Adding `percentile()` method for computing the value at any percentile (0–100) with linear interpolation
- Adding `coefficientOfVariation()` method for relative dispersion (CV%), supporting both sample and population modes
- Adding `trimmedMean()` method for robust central tendency — computes the mean after removing outliers from each side
- Adding `weightedMedian()` method for computing the median with weighted observations
- Adding `sem()` method for standard error of the mean
- Adding `meanAbsoluteDeviation()` method for mean absolute deviation — average distance from the mean
- Adding `medianAbsoluteDeviation()` method for median absolute deviation — robust dispersion measure resistant to outliers
- Adding `zscores()` method for computing z-scores of each value in a dataset
- Adding `outliers()` method for z-score based outlier detection with configurable threshold
- Adding `iqrOutliers()` method for IQR-based outlier detection (box plot whiskers), robust for skewed data
- Adding `rSquared()` method for R² (coefficient of determination) — proportion of variance explained by linear regression
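Streaming computation of mean and variance in O(1) memory is often done with Welford's online algorithm. The sketch below shows that algorithm under the assumption that it is a suitable illustration. The changelog does not state which algorithm `StreamingStat` uses, and the class name `RunningStats` is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical class (not the library's StreamingStat): Welford's online
 * algorithm for mean and sample variance using constant memory.
 */
final class RunningStats
{
    private int $n = 0;

    private float $mean = 0.0;

    /** Running sum of squared deviations from the current mean. */
    private float $m2 = 0.0;

    public function add(float $x): void
    {
        $this->n++;
        $delta = $x - $this->mean;
        $this->mean += $delta / $this->n;
        // Uses the deviation from the old mean times the deviation from the new mean.
        $this->m2 += $delta * ($x - $this->mean);
    }

    public function mean(): float
    {
        if ($this->n === 0) {
            throw new LogicException('No values added yet.');
        }
        return $this->mean;
    }

    public function variance(): float
    {
        if ($this->n < 2) {
            throw new LogicException('Sample variance needs at least 2 values.');
        }
        return $this->m2 / ($this->n - 1);
    }
}

$stats = new RunningStats();
foreach ([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0] as $value) {
    $stats->add($value);
}
printf("mean = %.4f, variance = %.4f\n", $stats->mean(), $stats->variance());
```

Each value is seen once and then discarded, which is what keeps memory use constant regardless of how many values arrive.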
## 1.2.5 - 2026-02-22
- Adding `kurtosis()` method for excess kurtosis
## 1.2.4 - 2026-02-21
- Adding `skewness()` method for adjusted Fisher-Pearson sample skewness
- Adding `pskewness()` method for population (biased) skewness
- Full Coverage Tests (adding some edge cases)
- Create KDE example
## 1.2.3 - 2026-02-21
- Adding `kde()` method for Kernel Density Estimation — returns a closure that estimates PDF or CDF from sample data, supporting 9 kernel functions with aliases
- Adding `kdeRandom()` method for random sampling from a Kernel Density Estimate — returns a closure that generates random floats from the KDE distribution
- Introducing the `KdeKernel` string-backed enum: `kde()` and `kdeRandom()` accept `KdeKernel` enum cases
- Adding Kernel Density Estimation (KDE) examples
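A kernel density estimate evaluates `f(x) = 1 / (n * h) * Σ K((x - x_i) / h)` for a kernel `K` and bandwidth `h`. The sketch below returns a closure for the Gaussian kernel only. It is an illustration of the idea, not the library's `kde()`: the function name `gaussianKde` and its parameters are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): returns a closure that
 * estimates the probability density at a point with a Gaussian kernel.
 *
 * @param list<float> $data sample values
 * @param float $h bandwidth, must be positive
 */
function gaussianKde(array $data, float $h): Closure
{
    $n = count($data);
    if ($n === 0 || $h <= 0.0) {
        throw new InvalidArgumentException('Need at least one data point and a positive bandwidth.');
    }

    // Combined factor 1 / (n * h * sqrt(2 * pi)) applied once at the end.
    $norm = 1.0 / ($n * $h * sqrt(2.0 * M_PI));

    return static function (float $x) use ($data, $h, $norm): float {
        $sum = 0.0;
        foreach ($data as $xi) {
            $u = ($x - $xi) / $h;
            $sum += exp(-0.5 * $u * $u);
        }
        return $norm * $sum;
    };
}

$pdf = gaussianKde([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2], 1.5);
printf("density at 0.0: %.4f\n", $pdf(0.0));
```

Returning a closure lets the caller evaluate the estimate at many points without passing the sample and bandwidth each time.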
## 1.2.2 - 2026-02-21
- Adding `method` parameter to `quantiles()` supporting `'exclusive'` (default) and `'inclusive'` interpolation methods
- Adding `medianGrouped()` method for estimating the median of grouped/binned continuous data using interpolation
- Adding Spearman rank correlation via `method` parameter in `correlation()` (`method='ranked'`)
- Adding proportional linear regression via `proportional` parameter in `linearRegression()` for regression through the origin
- Adding optional pre-computed mean parameter to `variance()` (`xbar`) and `pvariance()` (`mu`)
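Regression through the origin fits `y = b * x` with no intercept, and the least-squares slope is `b = Σ(x * y) / Σ(x²)`. The sketch below shows that formula. The function name `slopeThroughOrigin` is hypothetical and this is not the library's `linearRegression()` code.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): least-squares slope for
 * the model y = b * x, that is, a line forced through the origin.
 *
 * @param list<float> $x
 * @param list<float> $y
 */
function slopeThroughOrigin(array $x, array $y): float
{
    $n = count($x);
    if ($n === 0 || $n !== count($y)) {
        throw new InvalidArgumentException('Need two non-empty arrays of equal length.');
    }

    $sumXY = 0.0;
    $sumXX = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $sumXY += $x[$i] * $y[$i];
        $sumXX += $x[$i] * $x[$i];
    }
    if ($sumXX == 0.0) {
        throw new InvalidArgumentException('At least one x value must be non-zero.');
    }

    return $sumXY / $sumXX;
}

$slope = slopeThroughOrigin([1.0, 2.0, 3.0], [2.1, 3.9, 6.0]);
printf("slope = %.4f\n", $slope);
```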
## 1.2.1 - 2026-02-20
- Adding `invCdf()` method to normal distribution
- Adding `getVariance()` method to normal distribution (sigma squared)
- Adding `getMedian()` method to normal distribution (equals mean)
- Adding `getMode()` method to normal distribution (equals mean)
- Adding `quantiles()` method to normal distribution (divide into n equal-probability intervals)
- Adding `overlap()` method to normal distribution (overlapping coefficient between two distributions)
- Adding `zscore()` method to normal distribution (standard score)
- Adding `samples()` method to normal distribution (generate random samples with optional seed)
- Adding `subtract()` method to normal distribution (counterpart to add)
- Adding `divide()` method to normal distribution (counterpart to multiply)
## 1.2.0 - 2026-02-19
- Welcome to PHP 8.5
- Upgrading to PHPStan new rules (offsetAccess)
- Tests migrated from PestPHP 2 to PHPUnit 11
- Code Syntax checker from Pint to PHP CS Fixer
## 1.1.4 - 2025-04-25
- Adding `fmean()` method for computing the arithmetic mean with float numbers.
## 1.1.3 - 2024-12-14
- Adding `multiply()` method to scale NormalDist by a constant
## 1.1.2 - 2024-12-14
- Implementing `add()` method for NormalDist
## 1.1.1 - 2024-12-13
- Implementing fromSample method for NormalDist
## 1.1.0 - 2024-12-13
- Upgrading RectorPHP v 2
- Upgrading PHPStan v 2
## 1.0.2 - 2024-12-10
- NormalDist class, with `cdf()` and `pdf()`
- Fix deprecations for PHP 8.4
## 1.0.1 - 2024-11-21
- Welcome PHP 8.4
- Upgrading to Rector 1
## 1.0.0 - 2023-12-26
- Fixed `median()` function to handle unsorted data by @keatis
- Rector refactor
- PHPStan level 8
- Support for PHP 8.1 and above
- Add support for PHP 8.2 by @AmooAti
- Update to PestPHP v2 by @AmooAti
- Improving documentation (readme, contributing, code of conduct, security policies) by @AbhineshJha, @Arcturus22, @tvermaashutosh, @Abhishekgupta204, @Aryan4884
- Rector v0.18.5 by @sukuasoft
- Introducing Pint by @sukuasoft
- GitHub Actions: Updating actions/checkout v4
## 0.2.1 - 2022-02-22
- Linear regression
## 0.2.0 - 2022-02-21
- Raise Exception instead of returning null if there is no valid input. By Artem Trokhymchuk @trokhymchuk [thanks for the PR #15](https://github.com/Hi-Folks/statistics/pull/15);
- PHPStan, level 9
## 0.1.7 - 2022-02-19
- Code refactoring by @trokhymchuk
- Clean phpdoc blocks by @trokhymchuk
- Stat::correlation()
- PHPStan, level 8
## 0.1.6 - 2022-02-17
- Stat::covariance()
## 0.1.5 - 2022-02-05
- frequencyTable()
- frequencyTableBySize()
- code refactoring and documenting some functions by Artem Trokhymchuk @trokhymchuk [thanks for the PR #2](https://github.com/Hi-Folks/statistics/pull/2)
- add tests for Math class
## 0.1.4 - 2022-01-30
- quantiles()
- firstQuartile()
- thirdQuartile()
## 0.1.3 - 2022-01-29
- geometricMean(): geometric mean
- harmonicMean(): harmonic mean and weighted harmonic mean
## 0.1.2 - 2022-01-28
- pstdev(): Population standard deviation
- stdev(): Sample standard deviation
- pvariance(): variance for a population
- variance(): variance for a sample
## 0.1.1 - 2022-01-27
- Create Freq class with static method for managing frequencies table
- Create Stat class with static methods for basic statistic functions like: mean, mode, median, multimode...
- Refactor Statistics class in order to use logic provided by Freq and Stat class
- Create ArrUtil with some helpers/functions to manage arrays
- Add CICD test for PHP 8.1
## Initial release - 2022-01-08
Initial release with:
- getMean()
- count()
- median()
- firstQuartile()
- thirdQuartile()
- mode()
- frequencies(): a frequency is the number of times a value of the data occurs;
- relativeFrequencies(): a relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes;
- cumulativeFrequencies(): the accumulation of the previous frequencies;
- cumulativeRelativeFrequencies(): the accumulation of the previous relative frequencies.
## 0.1.0 - 2022-01-08
Initial release with:
- getMean()
- count()
- median()
- firstQuartile()
- thirdQuartile()
- mode()
- frequencies(): a frequency is the number of times a value of the data occurs;
- relativeFrequencies(): a relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes;
- cumulativeFrequencies(): the accumulation of the previous frequencies;
- cumulativeRelativeFrequencies(): the accumulation of the previous relative frequencies.
================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct
## Our Commitment
We, as members, contributors, and leaders, are committed to ensuring that participation in our community is a positive and respectful experience for everyone, regardless of their age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
Our goal is to create an open, welcoming, diverse, inclusive, and healthy community where everyone feels valued and respected.
## Expectations for Behavior
In order to maintain a positive community environment, we expect all members, contributors, and leaders to adhere to the following guidelines:
* **Empathy and Kindness**: Treat others with empathy and kindness. Show understanding and consideration towards fellow community members.
* **Respect for Diverse Perspectives**: Be respectful of differing opinions, viewpoints, and experiences. Acknowledge that diversity of thought enriches our community.
* **Constructive Feedback**: Give feedback in a constructive manner and be open to receiving it. Take responsibility for your actions and apologize when necessary, using the experience as an opportunity to learn.
* **Community-Centered Focus**: Prioritize the well-being of the entire community, not just individual interests. Strive for what benefits the community as a whole.
## Unacceptable Behavior
The following behaviors are not tolerated within our community:
* **Sexualized Language or Imagery**: Avoid using sexualized language or imagery and refrain from making sexual advances.
* **Trolling and Insults**: Do not engage in trolling, insulting or derogatory comments, or personal or political attacks.
* **Harassment**: Harassment, whether public or private, is not acceptable. Respect personal boundaries and avoid intrusive behavior.
* **Sharing Private Information**: Do not publish others' private information, such as physical or email addresses, without their explicit permission.
* **Inappropriate Conduct**: Refrain from any conduct that could be considered unprofessional in a professional setting.
## Responsibilities for Enforcement
Community leaders are responsible for upholding and enforcing these standards of behavior. They will take appropriate and fair corrective action in response to any behavior that is deemed inappropriate, threatening, offensive, or harmful.
Community leaders have the authority to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that do not align with this Code of Conduct. They will communicate the reasons for moderation decisions when necessary.
## Scope
This Code of Conduct applies in all community spaces. It also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official email address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
## Reporting and Enforcement
If you encounter abusive, harassing, or otherwise unacceptable behavior, please report it to the community leaders responsible for enforcement via email. All complaints will be promptly and fairly reviewed and investigated.
Community leaders are obligated to respect the privacy and security of the reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these guidelines to determine the consequences for any actions that violate this Code of Conduct:
### 1. Correction
**Community Impact**: Inappropriate language or other unprofessional behavior.
**Consequence**: A private, written warning from community leaders, providing clarity about the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of actions.
**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
**Consequence**: A temporary ban from any form of interaction or public communication with the community for a specified period. No public or private interaction with the people involved, including unsolicited interactions with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any form of public interaction within the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
The Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
================================================
FILE: CONTRIBUTING.md
================================================
# Contributing
Your contributions are highly appreciated, and they will be duly recognized.
Before you proceed to create an issue or a pull request, please take a moment to familiarize yourself with our contribution guide.
## Etiquette
This project thrives on the spirit of open source collaboration. Our maintainers dedicate their precious time to create and uphold the source code, and they share it with the hope that it will benefit fellow developers. Let's ensure they don't bear the brunt of abuse or anger for their hard work.
When raising issues or submitting pull requests, let's maintain a considerate and respectful tone. Our goal is to exemplify that developers are a courteous and collaborative community.
The maintainers have the responsibility to evaluate the quality and compatibility of all contributions with the project. Every developer brings unique skills, strengths, and perspectives to the table. Please respect their decisions, even if your submission isn't integrated.
## Relevance
Before proposing or submitting new features, consider whether they are genuinely beneficial to the broader user base. Open source projects serve a diverse group of developers with varying needs. It's important to assess whether your feature is likely to be widely useful.
## Procedure
### Preliminary Steps Before Filing an Issue
- Try to replicate the problem to ensure it's not an isolated occurrence.
- Verify if your feature suggestion has already been addressed within the project.
- Review the pull requests to make sure a solution for the bug isn't already underway.
- Check the pull requests to confirm that the feature isn't already under development.
### Preparing Your Pull Request
- Examine the codebase to prevent duplication of your proposed feature.
- Check the pull requests to verify that another contributor hasn't already submitted the same feature or fix.
## Opening a Pull Request
To maintain coding consistency, we follow the PER Coding Style (the `@PER-CS` rule set configured in `.php-cs-fixer.dist.php`) and use PHPStan for static code analysis. You can run all checks with the following command:
```bash
composer all-check
```
This command encompasses:
- Coding style checks employing PHP CS Fixer.
- PHPStan analysis at level 8.
- Execution of all tests from the `./tests/*` directory using PHPUnit.
We recommend running `composer all-check` before committing and creating a pull request.
When working on a pull request, it is advisable to create a new branch that originates from the main branch. This branch can serve as the target branch when you submit your pull request to the original repository.
For a high-quality pull request, please ensure that you:
- Include tests as part of your patch. We cannot accept submissions lacking tests.
- Document changes in behavior, keeping the README.md and other pertinent documentation up-to-date.
- Respect our release cycle. We follow SemVer v2.0.0, and we cannot afford to randomly break public APIs.
- Stick to one pull request per feature. Multiple changes should be presented through separate pull requests.
- Provide a cohesive history. Each individual commit within your pull request should serve a meaningful purpose. If you have made several intermediary commits during development, please consolidate them before submission.
Happy coding! 🚀
================================================
FILE: LICENSE.md
================================================
The MIT License (MIT)
Copyright (c) hi-folks <roberto.butti@gmail.com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
================================================
FILE: README.md
================================================
<p align="center">
<img src="https://repository-images.githubusercontent.com/445609326/e2539776-0f8f-4556-be1d-887ea2368813" alt="PHP package for Statistics">
</p>
<h1 align="center">
Statistics PHP package
</h1>
<p align=center>
<a href="https://packagist.org/packages/hi-folks/statistics">
<img src="https://img.shields.io/packagist/v/hi-folks/statistics.svg?style=for-the-badge" alt="Latest Version on Packagist">
</a>
<a href="https://packagist.org/packages/hi-folks/statistics">
<img src="https://img.shields.io/packagist/dt/hi-folks/statistics.svg?style=for-the-badge" alt="Total Downloads">
</a>
<br>
<a href="https://github.com/Hi-Folks/statistics/blob/main/.github/workflows/static-code-analysis.yml">
<img src="https://img.shields.io/badge/PHPStan-level%208-brightgreen.svg?style=for-the-badge" alt="Static Code analysis">
</a>
<img src="https://img.shields.io/packagist/l/hi-folks/statistics?style=for-the-badge" alt="Packagist License">
<br>
<img src="https://img.shields.io/packagist/php-v/hi-folks/statistics?style=for-the-badge" alt="Packagist PHP Version Support">
<img src="https://img.shields.io/github/last-commit/hi-folks/statistics?style=for-the-badge" alt="GitHub last commit">
</p>
<p align=center>
<a href="https://github.com/hi-folks/statistics/actions/workflows/run-tests.yml">
<img src="https://github.com/hi-folks/statistics/actions/workflows/run-tests.yml/badge.svg?branch=main&style=for-the-badge" alt="Tests">
</a>
</p>
<p align=center>
<i>
A PHP package for descriptive statistics, normal distribution, outlier detection, and streaming analytics on numeric data.
</i>
</p>
This package provides a comprehensive set of statistical functions for PHP: descriptive statistics (mean, median, mode, standard deviation, variance, quantiles), robust measures (trimmed mean, weighted median, median absolute deviation), distribution modelling (normal distribution with PDF, CDF, and inverse CDF), outlier detection (z-score and IQR-based), z-scores, percentiles, coefficient of variation, frequency tables, correlation, regression (linear, logarithmic, power, and exponential), kernel density estimation, and O(1) memory streaming statistics.
It works with any numeric dataset — from sports telemetry and sensor data to race results, survey responses, and financial time series.
**Articles and resources:**
- [Exploring Olympic Downhill Results with PHP Statistics](https://dev.to/robertobutti/exploring-olympic-downhill-results-with-php-statistics-3eo1) — a step-by-step analysis of 2026 Olympic downhill race data
- [Statistics with PHP](https://dev.to/robertobutti/statistics-with-php-4pfp) — introduction to the package and its core functions
- [PHP Statistics on Laravel News](https://laravel-news.com/php-statistics)
> This package is inspired by the [Python statistics module](https://docs.python.org/3/library/statistics.html)
## Installation
You can install the package via composer:
```bash
composer require hi-folks/statistics
```
## Usage
### Stat class
The `Stat` class provides static methods for calculating mathematical statistics of numeric data, such as an average or typical value of a population or sample.
The available methods are listed below:
| Mathematical Statistic | Description |
| ---------------------- | ----------- |
| `mean()` | arithmetic mean or "average" of data |
| `fmean()` | floating-point arithmetic mean, with optional weighting and precision |
| `trimmedMean()` | trimmed (truncated) mean — mean after removing outliers from each side |
| `median()` | median or "middle value" of data |
| `weightedMedian()` | weighted median — median with weights, where each value has a different importance |
| `medianLow()` | low median of data |
| `medianHigh()` | high median of data |
| `medianGrouped()` | median of grouped data, using interpolation |
| `mode()` | single mode (most common value) of discrete or nominal data |
| `multimode()` | list of modes (most common values) of discrete or nominal data |
| `quantiles()` | cut points dividing the range of a probability distribution into continuous intervals with equal probabilities (supports `exclusive` and `inclusive` methods) |
| `thirdQuartile()` | third quartile (Q3): the value below which 75 percent of the data falls |
| `firstQuartile()` | first quartile (Q1): the value below which 25 percent of the data falls |
| `percentile()` | value at any percentile (0–100) with linear interpolation |
| `pstdev()` | Population standard deviation |
| `stdev()` | Sample standard deviation |
| `sem()` | Standard error of the mean (SEM) — measures precision of the sample mean |
| `meanAbsoluteDeviation()` | mean absolute deviation (MAD) — average distance from the mean |
| `medianAbsoluteDeviation()` | median absolute deviation — median distance from the median, robust to outliers |
| `pvariance()` | variance for a population (supports pre-computed mean via `mu`) |
| `variance()` | variance for a sample (supports pre-computed mean via `xbar`) |
| `skewness()` | adjusted Fisher-Pearson sample skewness |
| `pskewness()` | population (biased) skewness |
| `kurtosis()` | excess kurtosis (sample formula, 0 for normal distribution) |
| `coefficientOfVariation()` | coefficient of variation (CV%), relative dispersion as percentage |
| `zscores()` | z-scores for each value — how many standard deviations from the mean |
| `outliers()` | outlier detection based on z-score threshold |
| `iqrOutliers()` | outlier detection based on IQR method (box plot whiskers), robust for skewed data |
| `geometricMean()` | geometric mean |
| `harmonicMean()` | harmonic mean |
| `correlation()` | Pearson’s or Spearman’s rank correlation coefficient for two inputs |
| `covariance()` | the sample covariance of two inputs |
| `linearRegression()` | return the slope and intercept of simple linear regression parameters estimated using ordinary least squares (supports `proportional: true` for regression through the origin) |
| `logarithmicRegression()` | logarithmic regression — fits `y = a × ln(x) + b`, ideal for diminishing returns patterns (e.g., athletic improvement, learning curves) |
| `powerRegression()` | power regression — fits `y = a × x^b`, useful for power law relationships |
| `exponentialRegression()` | exponential regression — fits `y = a × e^(b×x)`, useful for exponential growth or decay |
| `rSquared()` | coefficient of determination (R²) — proportion of variance explained by linear regression |
| `confidenceInterval()` | confidence interval for the mean using the normal (z) distribution |
| `zTest()` | one-sample Z-test — tests whether the sample mean differs significantly from a hypothesized population mean |
| `tTest()` | one-sample t-test — like z-test but appropriate for small samples where the population standard deviation is unknown |
| `tTestTwoSample()` | two-sample independent t-test (Welch's) — compares the means of two independent groups without assuming equal variances |
| `tTestPaired()` | paired t-test — tests whether the mean difference between paired observations is significantly different from zero |
| `kde()` | kernel density estimation — returns a closure that estimates the probability density (or CDF) at any point |
| `kdeRandom()` | random sampling from a kernel density estimate — returns a closure that generates random floats from the KDE distribution |
#### Stat::mean( array $data )
Return the sample arithmetic mean of the array _$data_.
The arithmetic mean is the sum of the data divided by the number of data points. It is commonly called “the average”, although it is only one of many mathematical averages. It is a measure of the central location of the data.
```php
use HiFolks\Statistics\Stat;
$mean = Stat::mean([1, 2, 3, 4, 4]);
// 2.8
$mean = Stat::mean([-1.0, 2.5, 3.25, 5.75]);
// 2.625
```
#### Stat::fmean( array $data, array|null $weights = null, int|null $precision = null )
Return the arithmetic mean of the array `$data`, as a float, with optional weights and precision control.
This function behaves like `mean()` but ensures a floating-point result and supports weighted datasets.
If `$weights` is provided, it computes the weighted average.
If `$precision` is provided, the result is rounded to that many decimal places.
If `$precision` is null, no rounding is applied — this may lead to results with long or unexpected decimal expansions due to the nature of floating-point arithmetic in PHP. Using rounding helps ensure cleaner, more predictable output.
```php
use HiFolks\Statistics\Stat;
// Unweighted mean (same as mean but always float)
$fmean = Stat::fmean([3.5, 4.0, 5.25]);
// 4.25
// Weighted mean
$fmean = Stat::fmean([3.5, 4.0, 5.25], [1, 2, 1]);
// 4.1875
// Custom precision
$fmean = Stat::fmean([3.5, 4.0, 5.25], null, 2);
// 4.25
$fmean = Stat::fmean([3.5, 4.0, 5.25], [1, 2, 1], 3);
// 4.188
```
If the input is empty, or weights are invalid (e.g., length mismatch or sum is zero), an exception is thrown.
Use this function when you need floating-point accuracy or to apply custom weighting and rounding to your average.
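The weighted average described above can be sketched in plain PHP. This is a minimal illustration of the formula (sum of `value × weight` divided by the sum of the weights), not the package's implementation; the function name `sketchWeightedMean` is hypothetical.

```php
<?php
// Hypothetical helper (not part of the package): weighted arithmetic mean.
function sketchWeightedMean(array $data, array $weights): float
{
    if ($data === [] || count($data) !== count($weights)) {
        throw new InvalidArgumentException('Data and weights must be non-empty and of equal length.');
    }
    $weightSum = array_sum($weights);
    if ($weightSum == 0) {
        throw new InvalidArgumentException('The sum of the weights must not be zero.');
    }
    // Multiply each value by its weight, then divide by the total weight.
    $products = array_map(fn ($value, $weight) => $value * $weight, $data, $weights);

    return array_sum($products) / $weightSum;
}

echo sketchWeightedMean([3.5, 4.0, 5.25], [1, 2, 1]), PHP_EOL; // 4.1875
```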
#### Stat::trimmedMean( array $data, float $proportionToCut = 0.1, ?int $round = null )
Return the trimmed (truncated) mean of the data. Computes the mean after removing the lowest and highest fraction of values. This is a robust measure of central tendency, less sensitive to outliers than the regular mean.
The `$proportionToCut` parameter specifies the fraction to trim from **each** side (must be in the range `[0, 0.5)`). For example, `0.1` removes the bottom 10% and top 10%.
```php
use HiFolks\Statistics\Stat;
$mean = Stat::trimmedMean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 0.1);
// 5.5 (outlier 100 and lowest value 1 removed)
$mean = Stat::trimmedMean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.2);
// 5.5 (removes 2 values from each side)
$mean = Stat::trimmedMean([1, 2, 3, 4, 5], 0.0);
// 3.0 (no trimming, same as regular mean)
```
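The trimming step can be sketched in plain PHP as follows. This is an illustration under one stated assumption: the number of values cut from each side is `count × proportionToCut` rounded down. The function name `sketchTrimmedMean` is hypothetical and the package may round differently.

```php
<?php
// Hypothetical helper (not part of the package): mean after trimming both tails.
function sketchTrimmedMean(array $data, float $proportionToCut = 0.1): float
{
    if ($data === [] || $proportionToCut < 0 || $proportionToCut >= 0.5) {
        throw new InvalidArgumentException('Need data and a proportion in [0, 0.5).');
    }
    sort($data);
    $n = count($data);
    // Assumption: the number of values removed from each side is rounded down.
    $cut = (int) floor($n * $proportionToCut);
    $kept = array_slice($data, $cut, $n - 2 * $cut);

    return array_sum($kept) / count($kept);
}

echo sketchTrimmedMean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 0.1), PHP_EOL; // 5.5
```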
#### Stat::geometricMean( array $data )
The geometric mean indicates the central tendency or typical value of the data using the product of the values (as opposed to the arithmetic mean which uses their sum).
```php
use HiFolks\Statistics\Stat;
$mean = Stat::geometricMean([54, 24, 36], 1);
// 36.0
```
#### Stat::harmonicMean( array $data )
The harmonic mean is the reciprocal of the arithmetic mean() of the reciprocals of the data. For example, the harmonic mean of three values a, b, and c will be equivalent to 3/(1/a + 1/b + 1/c). If one of the values is zero, the result will be zero.
```php
use HiFolks\Statistics\Stat;
$mean = Stat::harmonicMean([40, 60], null, 1);
// 48.0
```
You can also calculate the harmonic weighted mean.
Suppose a car travels 40 km/hr for 5 km, and when traffic clears, speeds up to 60 km/hr for the remaining 30 km of the journey. What is the average speed?
```php
use HiFolks\Statistics\Stat;
Stat::harmonicMean([40, 60], [5, 30], 1);
// 56.0
```
where:
- 40, 60: the elements
- 5, 30: the weights for each element (the first weight applies to the first element, the second weight to the second element)
- 1: the number of decimal places to round the result to
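The arithmetic behind the car example can be checked with a short plain-PHP sketch of the weighted harmonic mean, computed as the sum of the weights divided by the sum of `weight / value`. The function name `sketchWeightedHarmonicMean` is hypothetical and this is not the package's code.

```php
<?php
// Hypothetical helper (not part of the package): weighted harmonic mean.
function sketchWeightedHarmonicMean(array $data, array $weights): float
{
    if ($data === [] || count($data) !== count($weights)) {
        throw new InvalidArgumentException('Data and weights must be non-empty and of equal length.');
    }
    $reciprocalSum = 0.0;
    foreach (array_values($data) as $i => $value) {
        if ($value == 0) {
            return 0.0; // a zero value forces the harmonic mean to zero
        }
        $reciprocalSum += array_values($weights)[$i] / $value;
    }

    return array_sum($weights) / $reciprocalSum;
}

// 5 km at 40 km/h plus 30 km at 60 km/h: 35 / (5/40 + 30/60) = 35 / 0.625
echo sketchWeightedHarmonicMean([40, 60], [5, 30]), PHP_EOL; // 56
```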
#### Stat::median( array $data )
Return the median (middle value) of numeric data, using the common “mean of middle two” method.
```php
use HiFolks\Statistics\Stat;
$median = Stat::median([1, 3, 5]);
// 3
$median = Stat::median([1, 3, 5, 7]);
// 4
```
#### Stat::weightedMedian( array $data, array $weights, ?int $round = null )
Return the weighted median of the data. The weighted median is the value where the cumulative weight reaches 50% of the total weight. This is useful for survey data, financial analysis, or any dataset where observations have different importance.
All weights must be positive numbers and the weights array must have the same length as the data array.
```php
use HiFolks\Statistics\Stat;
$median = Stat::weightedMedian([1, 2, 3], [1, 1, 1]);
// 2.0 (equal weights, same as regular median)
$median = Stat::weightedMedian([1, 2, 3], [1, 1, 10]);
// 3.0 (heavy weight on 3 pulls the median)
$median = Stat::weightedMedian([1, 2, 3, 4], [1, 1, 1, 1]);
// 2.5 (equal weights, even count — averages the two middle values)
```
#### Stat::medianLow( array $data )
Return the low median of numeric data.
The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned.
```php
use HiFolks\Statistics\Stat;
$median = Stat::medianLow([1, 3, 5]);
// 3
$median = Stat::medianLow([1, 3, 5, 7]);
// 3
```
#### Stat::medianHigh( array $data )
Return the high median of data.
The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned.
```php
use HiFolks\Statistics\Stat;
$median = Stat::medianHigh([1, 3, 5]);
// 3
$median = Stat::medianHigh([1, 3, 5, 7]);
// 5
```
#### Stat::medianGrouped( array $data, float $interval = 1.0 )
Estimate the median for numeric data that has been grouped or binned around the midpoints of consecutive, fixed-width intervals.
The `$interval` parameter specifies the width of each bin (default `1.0`). This function uses interpolation within the median interval, assuming values are evenly distributed across each bin.
```php
use HiFolks\Statistics\Stat;
$median = Stat::medianGrouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]);
// 3.7
$median = Stat::medianGrouped([1, 3, 3, 5, 7]);
// 3.25
$median = Stat::medianGrouped([1, 3, 3, 5, 7], 2);
// 3.5
```
For example, demographic data summarized into ten-year age groups:
```php
use HiFolks\Statistics\Stat;
// 172 people aged 20-30, 484 aged 30-40, 387 aged 40-50, etc.
$data = array_merge(
array_fill(0, 172, 25),
array_fill(0, 484, 35),
array_fill(0, 387, 45),
array_fill(0, 22, 55),
array_fill(0, 6, 65),
);
round(Stat::medianGrouped($data, 10), 1);
// 37.5
```
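The interpolation can be sketched in plain PHP. The version below follows the approach used by Python's `statistics.median_grouped`, which is an assumption about how the package works internally; the function name `sketchMedianGrouped` is hypothetical.

```php
<?php
// Hypothetical helper (not part of the package): grouped median by interpolation.
function sketchMedianGrouped(array $data, float $interval = 1.0): float
{
    if ($data === []) {
        throw new InvalidArgumentException('Data must not be empty.');
    }
    sort($data);
    $n = count($data);
    $midpoint = $data[intdiv($n, 2)]; // midpoint of the bin that contains the median

    $below = 0; // values in bins below the median bin (cumulative frequency)
    $inBin = 0; // values in the median bin
    foreach ($data as $value) {
        if ($value < $midpoint) {
            $below++;
        } elseif ($value == $midpoint) {
            $inBin++;
        }
    }
    $lowerLimit = $midpoint - $interval / 2;

    // Assume values are spread evenly across the median bin.
    return $lowerLimit + $interval * ($n / 2 - $below) / $inBin;
}

echo sketchMedianGrouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]), PHP_EOL; // 3.7
```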
#### Stat::quantiles( array $data, $n=4, $round=null, $method='exclusive' )
Divide data into n continuous intervals with equal probability. Returns a list of n - 1 cut points separating the intervals.
Set n to 4 for quartiles (the default). Set n to 10 for deciles. Set n to 100 for percentiles which gives the 99 cut points that separate data into 100 equal-sized groups.
The `$method` parameter controls the interpolation method:
- `'exclusive'` (default): uses `m = count + 1`. Suitable for sampled data that may have more extreme values beyond the sample.
- `'inclusive'`: uses `m = count - 1`. Suitable for population data or samples known to include the most extreme values. The minimum value is treated as the 0th percentile and the maximum as the 100th percentile.
```php
use HiFolks\Statistics\Stat;
$quantiles = Stat::quantiles([98, 90, 70,18,92,92,55,83,45,95,88]);
// [ 55.0, 88.0, 92.0 ]
$quantiles = Stat::quantiles([105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110,100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129,106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86,111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95,103, 107, 101, 81, 109, 104], 10);
// [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0]
// Inclusive method
$quantiles = Stat::quantiles([1, 2, 3, 4, 5], method: 'inclusive');
// [2.0, 3.0, 4.0]
```
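To make the difference between the two methods concrete, here is a plain-PHP sketch modelled on Python's `statistics.quantiles`. Treating the package as equivalent to that algorithm is an assumption, and the function name `sketchQuantiles` is hypothetical.

```php
<?php
// Hypothetical helper (not part of the package): cut points for n equal-probability groups.
function sketchQuantiles(array $data, int $n = 4, string $method = 'exclusive'): array
{
    sort($data);
    $count = count($data);
    if ($n < 1 || $count < 2) {
        throw new InvalidArgumentException('Need n >= 1 and at least two data points.');
    }
    $cuts = [];
    if ($method === 'inclusive') {
        $m = $count - 1; // minimum is the 0th percentile, maximum the 100th
        for ($i = 1; $i < $n; $i++) {
            $j = intdiv($i * $m, $n);
            $delta = $i * $m - $j * $n;
            $cuts[] = ($data[$j] * ($n - $delta) + $data[$j + 1] * $delta) / $n;
        }

        return $cuts;
    }
    $m = $count + 1; // leaves room for values beyond the sample extremes
    for ($i = 1; $i < $n; $i++) {
        $j = min(max(intdiv($i * $m, $n), 1), $count - 1);
        $delta = $i * $m - $j * $n;
        $cuts[] = ($data[$j - 1] * ($n - $delta) + $data[$j] * $delta) / $n;
    }

    return $cuts;
}

$exclusive = sketchQuantiles([98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88]); // cut points 55, 88, 92
$inclusive = sketchQuantiles([1, 2, 3, 4, 5], 4, 'inclusive');               // cut points 2, 3, 4
```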
#### Stat::firstQuartile( array $data, $round=null )
The lower quartile, or first quartile (Q1), is the value under which 25% of data points are found when they are arranged in increasing order.
```php
use HiFolks\Statistics\Stat;
$percentile = Stat::firstQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
// 55.0
```
#### Stat::thirdQuartile( array $data, $round=null )
The upper quartile, or third quartile (Q3), is the value under which 75% of data points are found when arranged in increasing order.
```php
use HiFolks\Statistics\Stat;
$percentile = Stat::thirdQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
// 92.0
```
#### Stat::percentile( array $data, float $p, ?int $round = null )
Return the value at the given percentile of the data, using linear interpolation between adjacent data points (exclusive method, consistent with `quantiles()`).
The percentile `$p` must be between 0 and 100. Requires at least 2 data points.
```php
use HiFolks\Statistics\Stat;
$value = Stat::percentile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 50);
// 55.0 (median)
$value = Stat::percentile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 90);
// 91.0
```
#### Stat::pstdev( array $data )
Return the **Population** Standard Deviation, a measure of the amount of variation or dispersion of a set of values.
A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
```php
use HiFolks\Statistics\Stat;
$stdev = Stat::pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]);
// 0.986893273527251
$stdev = Stat::pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75], 4);
// 0.9869
```
#### Stat::stdev( array $data )
Return the **Sample** Standard Deviation, a measure of the amount of variation or dispersion of a set of values.
A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
```php
use HiFolks\Statistics\Stat;
$stdev = Stat::stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]);
// 1.0810874155219827
$stdev = Stat::stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75], 4);
// 1.0811
```
#### Stat::sem( array $data, ?int $round = null )
Return the standard error of the mean (SEM). SEM measures how precisely the sample mean estimates the population mean. It decreases as the sample size grows.
Formula: `stdev / sqrt(n)`
Requires at least 2 data points.
```php
use HiFolks\Statistics\Stat;
$sem = Stat::sem([2, 4, 4, 4, 5, 5, 7, 9]);
// 0.7559...
$sem = Stat::sem([2, 4, 4, 4, 5, 5, 7, 9], 4);
// 0.7559
```
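The formula can be verified by hand with a few lines of plain PHP. This is a sketch of `stdev / sqrt(n)` using the sample standard deviation; the function name `sketchSem` is hypothetical.

```php
<?php
// Hypothetical helper (not part of the package): standard error of the mean.
function sketchSem(array $data): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('At least two data points are required.');
    }
    $mean = array_sum($data) / $n;
    $squaredDeviations = 0.0;
    foreach ($data as $value) {
        $squaredDeviations += ($value - $mean) ** 2;
    }
    $stdev = sqrt($squaredDeviations / ($n - 1)); // sample standard deviation

    return $stdev / sqrt($n);
}

echo round(sketchSem([2, 4, 4, 4, 5, 5, 7, 9]), 4), PHP_EOL; // 0.7559
```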
#### Stat::meanAbsoluteDeviation( array $data, ?int $round = null )
Return the mean absolute deviation (MAD) — the average of the absolute deviations from the mean.
MAD is a simple, intuitive measure of dispersion: it tells you "on average, how far values are from the mean". Unlike standard deviation, it does not square the differences, making it easier to interpret and somewhat less sensitive to outliers.
Use MAD when you want a straightforward, interpretable measure of spread, especially for reporting to non-technical audiences.
```php
use HiFolks\Statistics\Stat;
$mad = Stat::meanAbsoluteDeviation([1, 2, 3, 4, 5]);
// 1.2
$mad = Stat::meanAbsoluteDeviation([1, 2, 3, 4, 5], 1);
// 1.2
```
#### Stat::medianAbsoluteDeviation( array $data, ?int $round = null )
Return the median absolute deviation — the median of the absolute deviations from the median.
This is one of the most **robust measures of dispersion** available. Because it uses the median (not the mean) as the center and takes the median (not the mean) of deviations, it is highly resistant to outliers. As long as fewer than half of the data points are extreme, the median absolute deviation remains stable.
Use it when your data may contain outliers, when you need a robust alternative to standard deviation, or for outlier detection (values far from the median in units of MAD are likely outliers).
```php
use HiFolks\Statistics\Stat;
$mad = Stat::medianAbsoluteDeviation([1, 2, 3, 4, 5]);
// 1.0
// Robust to outliers — the outlier 1000 does not affect the result:
$mad = Stat::medianAbsoluteDeviation([1, 2, 3, 4, 1000]);
// 1.0
```
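The two-step definition (median of the absolute deviations from the median) can be sketched in plain PHP. The function names `sketchMedian` and `sketchMedianAbsoluteDeviation` are hypothetical and are not part of the package.

```php
<?php
// Hypothetical helper: median using the "mean of middle two" rule.
function sketchMedian(array $values): float
{
    if ($values === []) {
        throw new InvalidArgumentException('Data must not be empty.');
    }
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);

    return $n % 2 === 1 ? (float) $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Hypothetical helper: median of the absolute deviations from the median.
function sketchMedianAbsoluteDeviation(array $data): float
{
    $median = sketchMedian($data);
    $deviations = array_map(fn ($value) => abs($value - $median), $data);

    return sketchMedian($deviations);
}

// Deviations from the median 3 are [2, 1, 0, 1, 997]; their median is 1.
echo sketchMedianAbsoluteDeviation([1, 2, 3, 4, 1000]), PHP_EOL; // 1
```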
#### Stat::variance ( array $data, ?int $round = null, int|float|null $xbar = null)
Variance is a measure of dispersion of data points from the mean.
Low variance indicates that data points are generally similar and do not vary widely from the mean.
High variance indicates that data values have greater variability and are more widely dispersed from the mean.
To calculate the variance from a *sample*:
```php
use HiFolks\Statistics\Stat;
$variance = Stat::variance([2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]);
// 1.3720238095238095
```
If you have already computed the mean, you can pass it via `xbar` to avoid recalculation:
```php
$data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5];
$mean = Stat::mean($data);
$variance = Stat::variance($data, xbar: $mean);
```
If you need the variance of a whole population rather than a sample, use the `pvariance()` method. You can optionally pass the population mean via `mu`:
```php
use HiFolks\Statistics\Stat;
$variance = Stat::pvariance([0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]);
// 1.25
// With pre-computed mean:
$data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25];
$mu = Stat::mean($data);
$variance = Stat::pvariance($data, mu: $mu);
```
#### Stat::skewness ( array $data, ?int $round = null )
Skewness is a measure of the asymmetry of a distribution. The adjusted Fisher-Pearson formula is used, which is the same as Excel's `SKEW()` and Python's `scipy.stats.skew(bias=False)`.
A positive skewness indicates a right-skewed distribution (tail extends to the right), while a negative skewness indicates a left-skewed distribution. A symmetric distribution has a skewness of 0.
Requires at least 3 data points.
```php
use HiFolks\Statistics\Stat;
$skewness = Stat::skewness([1, 2, 3, 4, 5]);
// 0.0 (symmetric)
$skewness = Stat::skewness([1, 1, 1, 1, 1, 10]);
// positive (right-skewed)
```
If you need the population (biased) skewness instead of the sample skewness, use `pskewness()`. This is equivalent to `scipy.stats.skew(bias=True)`:
```php
use HiFolks\Statistics\Stat;
$pskewness = Stat::pskewness([1, 1, 1, 1, 1, 10]);
```
#### Stat::kurtosis ( array $data, ?int $round = null )
Kurtosis measures the "tailedness" of a distribution — how much data lives in the extreme tails compared to a normal distribution. This method returns the **excess kurtosis** using the sample formula, which is the same as Excel's `KURT()` and Python's `scipy.stats.kurtosis(bias=False)`.
A normal distribution has excess kurtosis of 0. Positive values (leptokurtic) indicate heavier tails and more outliers. Negative values (platykurtic) indicate lighter tails and fewer outliers.
Requires at least 4 data points.
```php
use HiFolks\Statistics\Stat;
$kurtosis = Stat::kurtosis([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
// negative (platykurtic, lighter tails than normal)
$kurtosis = Stat::kurtosis([1, 2, 2, 2, 2, 2, 2, 2, 2, 50]);
// positive (leptokurtic, heavier tails due to outlier)
```
#### Stat::coefficientOfVariation( array $data, ?int $round = null, bool $population = false )
The coefficient of variation (CV) is the ratio of the standard deviation to the mean, expressed as a percentage. It measures relative variability and is useful for comparing dispersion across datasets with different units or scales.
By default it uses the sample standard deviation. Pass `population: true` to use the population standard deviation instead.
Requires at least 2 data points (sample) or 1 (population). Throws if the mean is zero.
```php
use HiFolks\Statistics\Stat;
$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50]);
// ~52.70 (sample)
$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50], round: 2);
// 52.7
$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50], population: true);
// ~47.14 (population)
```
#### Stat::zscores( array $data, ?int $round = null )
Return the z-score for each value in the dataset. A z-score indicates how many standard deviations a value is from the mean. Z-scores are useful for standardizing data, comparing values from different distributions, and identifying outliers.
The z-scores of any dataset always sum to zero, and values beyond ±2 or ±3 are typically considered unusual or outliers.
Requires at least 2 data points and non-zero standard deviation.
```php
use HiFolks\Statistics\Stat;
$zscores = Stat::zscores([2, 4, 4, 4, 5, 5, 7, 9]);
// array of z-scores, one per value
$zscores = Stat::zscores([2, 4, 4, 4, 5, 5, 7, 9], 2);
// z-scores rounded to 2 decimal places
```
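As an illustration of the definition, the sketch below computes z-scores in plain PHP. It assumes the sample standard deviation (dividing by `n - 1`) is used as the scale; the function name `sketchZscores` is hypothetical.

```php
<?php
// Hypothetical helper (not part of the package): z-score of every value.
function sketchZscores(array $data): array
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('At least two data points are required.');
    }
    $mean = array_sum($data) / $n;
    $squaredDeviations = 0.0;
    foreach ($data as $value) {
        $squaredDeviations += ($value - $mean) ** 2;
    }
    // Assumption: sample standard deviation (n - 1 in the denominator).
    $stdev = sqrt($squaredDeviations / ($n - 1));
    if ($stdev == 0.0) {
        throw new InvalidArgumentException('Standard deviation must not be zero.');
    }

    return array_map(fn ($value) => ($value - $mean) / $stdev, $data);
}

$zscores = sketchZscores([2, 4, 4, 4, 5, 5, 7, 9]);
// The mean is 5, so the values 5 map to 0.0 and the z-scores sum to (approximately) zero.
```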
#### Stat::outliers( array $data, float $threshold = 3.0 )
Return values from the dataset that are outliers based on z-score threshold. A value is considered an outlier if its absolute z-score exceeds the threshold.
The default threshold of 3.0 is a widely used convention — in a normal distribution, about 99.7% of values fall within 3 standard deviations of the mean, so values beyond that are rare. Use a lower threshold (e.g. 2.0) for stricter detection, or a higher one for more lenient filtering.
```php
use HiFolks\Statistics\Stat;
$outliers = Stat::outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 2.0);
// [100] (in a sample this small the z-score of 100 stays below 3, so a threshold of 2.0 is used here)
$outliers = Stat::outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1.0);
// values more than 1 stdev from the mean
```
#### Stat::iqrOutliers( array $data, float $factor = 1.5 )
Return values that are outliers based on the Interquartile Range (IQR) method. A value is an outlier if it falls below `Q1 - factor * IQR` or above `Q3 + factor * IQR`. This is the same method used for box plot whiskers.
Unlike z-score based detection, the IQR method is **robust** — it does not assume a normal distribution and is not influenced by extreme values themselves. This makes it the preferred choice for skewed data or when the dataset may already contain outliers that would distort the mean and standard deviation.
Use `factor: 1.5` (default) for mild outliers, or `factor: 3.0` for extreme outliers only.
**Example: Ski downhill race times**
In a ski downhill race, most athletes finish between 108–116 seconds. A time of 200s (e.g. a crash/DNF) or 50s (e.g. a timing error) would be flagged as outliers:
```php
use HiFolks\Statistics\Stat;
$times = [110.2, 112.5, 108.9, 115.3, 111.7, 114.0, 109.8, 113.6, 200.0, 50.0];
$outliers = Stat::iqrOutliers($times);
// [200.0, 50.0] — the crash and the timing error are detected
$extremeOnly = Stat::iqrOutliers($times, 3.0);
// only the most extreme values
```
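The fence calculation can be sketched in plain PHP. The quartiles below use the exclusive interpolation method described for `quantiles()`, which is an assumption about what `iqrOutliers()` uses internally; the function names `sketchQuartile` and `sketchIqrOutliers` are hypothetical.

```php
<?php
// Hypothetical helper: i-th quartile (1 or 3) of sorted data, exclusive method.
function sketchQuartile(array $sorted, int $i): float
{
    $count = count($sorted);
    $m = $count + 1;
    $j = min(max(intdiv($i * $m, 4), 1), $count - 1);
    $delta = $i * $m - $j * 4;

    return ($sorted[$j - 1] * (4 - $delta) + $sorted[$j] * $delta) / 4;
}

// Hypothetical helper: values outside [Q1 - factor * IQR, Q3 + factor * IQR].
function sketchIqrOutliers(array $data, float $factor = 1.5): array
{
    if (count($data) < 2) {
        throw new InvalidArgumentException('At least two data points are required.');
    }
    $sorted = $data;
    sort($sorted);
    $q1 = sketchQuartile($sorted, 1);
    $q3 = sketchQuartile($sorted, 3);
    $iqr = $q3 - $q1;
    $lower = $q1 - $factor * $iqr;
    $upper = $q3 + $factor * $iqr;

    // Keep the original order of the data.
    return array_values(array_filter($data, fn ($value) => $value < $lower || $value > $upper));
}

$times = [110.2, 112.5, 108.9, 115.3, 111.7, 114.0, 109.8, 113.6, 200.0, 50.0];
$outliers = sketchIqrOutliers($times); // 200.0 and 50.0 fall outside the fences
```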
#### Stat::covariance ( array $x , array $y )
Return the sample covariance of two inputs *$x* and *$y*.
Covariance is a measure of the joint variability of two inputs.
```php
$covariance = Stat::covariance(
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 1, 2, 3, 1, 2, 3]
);
// 0.75
```
```php
$covariance = Stat::covariance(
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[9, 8, 7, 6, 5, 4, 3, 2, 1]
);
// -7.5
```
#### Stat::correlation ( array $x , array $y, string $method = 'linear' )
Return Pearson’s correlation coefficient for two inputs. The coefficient r takes values between -1 and +1 and measures the strength and direction of the linear relationship: +1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.
Use `$method = 'ranked'` for Spearman’s rank correlation, which measures monotonic relationships (not just linear). Spearman’s correlation is computed by applying Pearson’s formula to the ranks of the data.
```php
$correlation = Stat::correlation(
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9]
);
// 1.0
```
```php
$correlation = Stat::correlation(
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[9, 8, 7, 6, 5, 4, 3, 2, 1]
);
// -1.0
```
Spearman’s rank correlation (non-linear but monotonic relationship):
```php
$correlation = Stat::correlation(
[1, 2, 3, 4, 5],
[1, 4, 9, 16, 25],
'ranked'
);
// 1.0
```
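The "Pearson on ranks" idea can be sketched in plain PHP. Giving tied values the average of their ranks is an assumption made for this illustration, and the function names `sketchRanks`, `sketchPearson`, and `sketchSpearman` are hypothetical.

```php
<?php
// Hypothetical helper: 1-based ranks; tied values share the average rank (assumption).
function sketchRanks(array $values): array
{
    $sorted = array_values($values);
    asort($sorted); // sort by value while keeping the original positions as keys
    $keys = array_keys($sorted);
    $n = count($keys);
    $ranks = [];
    for ($i = 0; $i < $n; $i = $j + 1) {
        $j = $i;
        while ($j + 1 < $n && $sorted[$keys[$j + 1]] == $sorted[$keys[$i]]) {
            $j++;
        }
        for ($k = $i; $k <= $j; $k++) {
            $ranks[$keys[$k]] = ($i + $j) / 2 + 1;
        }
    }
    ksort($ranks);

    return array_values($ranks);
}

// Hypothetical helper: Pearson's r from sums of deviation products.
function sketchPearson(array $x, array $y): float
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = $sxx = $syy = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
        $syy += ($y[$i] - $meanY) ** 2;
    }

    return $sxy / sqrt($sxx * $syy);
}

// Spearman's rho: Pearson's formula applied to the ranks.
function sketchSpearman(array $x, array $y): float
{
    return sketchPearson(sketchRanks($x), sketchRanks($y));
}

$rho = sketchSpearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]); // monotonic data, so rho is 1
$r = sketchPearson([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]);    // below 1: the relation is not linear
```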
#### Stat::linearRegression ( array $x , array $y , bool $proportional = false )
Return the slope and intercept of simple linear regression parameters estimated using ordinary least squares.
Simple linear regression describes the relationship between an independent variable *$x* and a dependent variable *$y* in terms of a linear function.
```php
use HiFolks\Statistics\Stat;
$years = [1971, 1975, 1979, 1982, 1983];
$films_total = [1, 2, 3, 4, 5];
list($slope, $intercept) = Stat::linearRegression(
$years,
$films_total
);
// $slope = 0.31
// $intercept = -610.18
```
What happens in 2022, according to the samples above?
```php
round($slope * 2022 + $intercept);
// 17.0
```
When `proportional` is `true`, the regression line is forced through the origin (intercept = 0). This is useful when the relationship between *$x* and *$y* is known to be proportional:
```php
list($slope, $intercept) = Stat::linearRegression(
[1, 2, 3, 4, 5],
[2, 4, 6, 8, 10],
proportional: true,
);
// $slope = 2.0
// $intercept = 0.0
```
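For reference, the ordinary least squares formulas behind both modes can be sketched in plain PHP. This illustrates the textbook formulas rather than the package's code, and the function name `sketchLinearRegression` is hypothetical.

```php
<?php
// Hypothetical helper (not part of the package): simple linear regression by OLS.
function sketchLinearRegression(array $x, array $y, bool $proportional = false): array
{
    $n = count($x);
    if ($n < 2 || $n !== count($y)) {
        throw new InvalidArgumentException('Inputs must have the same length (at least 2).');
    }
    // Through the origin the means are treated as zero: slope = sum(x*y) / sum(x^2).
    $meanX = $proportional ? 0.0 : array_sum($x) / $n;
    $meanY = $proportional ? 0.0 : array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach (array_values($x) as $i => $xi) {
        $sxy += ($xi - $meanX) * (array_values($y)[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    if ($sxx == 0.0) {
        throw new InvalidArgumentException('x must not be constant.');
    }
    $slope = $sxy / $sxx;
    $intercept = $proportional ? 0.0 : $meanY - $slope * $meanX;

    return [$slope, $intercept];
}

[$slope, $intercept] = sketchLinearRegression([1971, 1975, 1979, 1982, 1983], [1, 2, 3, 4, 5]);
echo round($slope, 2), ' ', round($intercept, 2), PHP_EOL; // 0.31 -610.18
```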
#### Stat::logarithmicRegression( array $x, array $y )
Fit a logarithmic model **y = a × ln(x) + b**. Returns `[a, b]`.
This model naturally captures diminishing returns — fast initial change that gradually flattens. It is useful for data where early gains are large but improvement slows over time, such as athletic performance trends, learning curves, or market saturation.
All x values must be positive (you cannot take the logarithm of zero or negative numbers).
Internally, this transforms x to ln(x) and applies linear regression, so it leverages the same robust ordinary least squares implementation.
```php
use HiFolks\Statistics\Stat;
// Simulated weekly running paces (seconds/km) — diminishing improvement
$weeks = [1, 2, 3, 4, 5, 6, 7, 8];
$paces = [350, 342, 337, 333, 330, 328, 326, 325];
[$a, $b] = Stat::logarithmicRegression($weeks, $paces);
// $a = -12.33 (pace drops by 12.33 sec per unit of ln(week))
// $b = 350.2
// Predict pace at week 12:
$predicted = $a * log(12) + $b;
// ~320 seconds = 5:20/km
```
Compare with linear regression to see which fits better:
```php
// R² for logarithmic model (transform x first)
$logWeeks = array_map(fn($v) => log($v), $weeks);
$r2Log = Stat::rSquared($logWeeks, $paces);
// 0.9987
// R² for linear model
$r2Linear = Stat::rSquared($weeks, $paces);
// 0.9176
// Logarithmic wins — the data has diminishing returns
```
#### Stat::powerRegression( array $x, array $y )
Fit a power model **y = a × x^b**. Returns `[a, b]`.
Power regression is useful for data following power law relationships (e.g., scaling laws, allometric relationships). Both x and y values must be positive.
Internally, this linearizes as ln(y) = ln(a) + b × ln(x) and applies linear regression.
```php
use HiFolks\Statistics\Stat;
// Data following y = 3 * x^2
$x = [1, 2, 3, 4, 5];
$y = [3, 12, 27, 48, 75];
[$a, $b] = Stat::powerRegression($x, $y);
// $a = 3.0
// $b = 2.0 (the exponent)
```
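The linearization can be sketched in plain PHP: take logarithms of both variables, fit a straight line, and convert the intercept back with `exp()`. This is an illustration of the technique, not the package's implementation, and the function name `sketchPowerRegression` is hypothetical.

```php
<?php
// Hypothetical helper (not part of the package): fit y = a * x^b via a log-log line fit.
function sketchPowerRegression(array $x, array $y): array
{
    $n = count($x);
    if ($n < 2 || $n !== count($y) || min($x) <= 0 || min($y) <= 0) {
        throw new InvalidArgumentException('Need equal-length inputs with positive values.');
    }
    $logX = array_map(fn ($value) => log($value), array_values($x));
    $logY = array_map(fn ($value) => log($value), array_values($y));
    $meanX = array_sum($logX) / $n;
    $meanY = array_sum($logY) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($logX as $i => $lx) {
        $sxy += ($lx - $meanX) * ($logY[$i] - $meanY);
        $sxx += ($lx - $meanX) ** 2;
    }
    $b = $sxy / $sxx;               // slope of the log-log line is the exponent
    $a = exp($meanY - $b * $meanX); // intercept is ln(a)

    return [$a, $b];
}

[$a, $b] = sketchPowerRegression([1, 2, 3, 4, 5], [3, 12, 27, 48, 75]);
// $a is approximately 3 and $b approximately 2 (up to floating-point error)
```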
#### Stat::exponentialRegression( array $x, array $y )
Fit an exponential model **y = a × e^(b×x)**. Returns `[a, b]`.
Exponential regression is useful for data with exponential growth (positive b) or decay (negative b), such as population growth, compound interest, or radioactive decay. All y values must be positive.
Internally, this linearizes as ln(y) = ln(a) + b × x and applies linear regression.
```php
use HiFolks\Statistics\Stat;
// Data following y = 2 * e^(0.5*x)
$x = [1, 2, 3, 4, 5];
$y = [3.30, 5.44, 8.96, 14.78, 24.36];
[$a, $b] = Stat::exponentialRegression($x, $y);
// $a ≈ 2.0
// $b ≈ 0.5
```
#### Stat::rSquared( array $x, array $y, bool $proportional = false, ?int $round = null )
Return the coefficient of determination (R²) — the proportion of variance in the dependent variable explained by the linear regression model. Values range from 0 (no explanatory power) to 1 (perfect fit).
Requires at least 2 data points and arrays of the same length.
```php
use HiFolks\Statistics\Stat;
$r2 = Stat::rSquared([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
// 1.0 (perfect linear relationship)
$r2 = Stat::rSquared(
[1971, 1975, 1979, 1982, 1983],
[1, 2, 3, 4, 5],
round: 2,
);
// 0.96
```
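For a straight-line fit with an intercept (the default, non-proportional case), R² equals the squared Pearson correlation. The sketch below shows that calculation in plain PHP; `sketchRSquared` is a hypothetical helper, not part of the library, and it does not cover the `proportional` option.

```php
// Plain-PHP sketch (not the library's code): R² for a straight-line fit with
// an intercept, computed as Sxy² / (Sxx * Syy).
function sketchRSquared(array $x, array $y): float
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    $syy = 0.0;
    foreach ($x as $i => $xi) {
        $dx = $xi - $meanX;
        $dy = $y[$i] - $meanY;
        $sxy += $dx * $dy;   // co-variation of x and y
        $sxx += $dx * $dx;   // variation of x
        $syy += $dy * $dy;   // variation of y
    }

    return ($sxy * $sxy) / ($sxx * $syy);
}

$r2 = round(sketchRSquared([1971, 1975, 1979, 1982, 1983], [1, 2, 3, 4, 5]), 2);
// 0.96
```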
With proportional regression (through the origin):
```php
$r2 = Stat::rSquared(
[1, 2, 3, 4, 5],
[2, 4, 6, 8, 10],
proportional: true,
);
// 1.0
```
To compute R² for non-linear models, transform the data the same way the regression method does:
```php
// R² for logarithmic regression
$logX = array_map(fn($v) => log($v), $x);
$r2 = Stat::rSquared($logX, $y);
// R² for power regression
$logX = array_map(fn($v) => log($v), $x);
$logY = array_map(fn($v) => log($v), $y);
$r2 = Stat::rSquared($logX, $logY);
// R² for exponential regression
$logY = array_map(fn($v) => log($v), $y);
$r2 = Stat::rSquared($x, $logY);
```
#### Stat::confidenceInterval( array $data, float $confidenceLevel = 0.95, ?int $round = null )
Return the confidence interval for the mean using the normal (z) distribution.
Computes: `mean ± z * (stdev / √n)`, where the z-critical value is derived from the inverse normal CDF.
Requires at least 2 data points. The confidence level must be between 0 and 1 exclusive.
```php
use HiFolks\Statistics\Stat;
[$lower, $upper] = Stat::confidenceInterval([2, 4, 4, 4, 5, 5, 7, 9]);
// 95% CI: [3.52, 6.48] (approximately)
[$lower, $upper] = Stat::confidenceInterval([2, 4, 4, 4, 5, 5, 7, 9], confidenceLevel: 0.99);
// 99% CI: wider interval
[$lower, $upper] = Stat::confidenceInterval([2, 4, 4, 4, 5, 5, 7, 9], round: 2);
// [3.52, 6.48]
```
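To make the formula concrete, here is the same 95% interval computed by hand in plain PHP. This is a sketch under two assumptions: the z-critical value is hard-coded instead of derived from the inverse normal CDF, and the sample standard deviation (dividing by n - 1) is used, which reproduces the interval shown above.

```php
// Plain-PHP sketch (not the library's code): mean ± z * (stdev / √n).
$data = [2, 4, 4, 4, 5, 5, 7, 9];
$n = count($data);
$mean = array_sum($data) / $n;

// Sample standard deviation (divides by n - 1)
$sumSquares = 0.0;
foreach ($data as $v) {
    $sumSquares += ($v - $mean) ** 2;
}
$stdev = sqrt($sumSquares / ($n - 1));

$z = 1.959964;                      // hard-coded z for a two-sided 95% interval
$margin = $z * $stdev / sqrt($n);   // z times the standard error of the mean
$lower = round($mean - $margin, 2);
$upper = round($mean + $margin, 2);
// [$lower, $upper] = [3.52, 6.48]
```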
#### Stat::zTest( array $data, float $populationMean, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
Perform a one-sample Z-test for the mean. Tests whether the sample mean differs significantly from a hypothesized population mean using the normal distribution.
Returns an associative array with `zScore` and `pValue`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.
Requires at least 2 data points.
```php
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\Alternative;
$result = Stat::zTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0);
// ['zScore' => 2.6457..., 'pValue' => 0.0081...]
$result = Stat::zTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, alternative: Alternative::Greater);
// one-tailed test: is the sample mean greater than 3?
$result = Stat::zTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, round: 4);
// ['zScore' => 2.6458, 'pValue' => 0.0081]
```
#### Stat::tTest( array $data, float $populationMean, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
Perform a one-sample t-test for the mean. Tests whether the sample mean differs significantly from a hypothesized population mean using the Student's t-distribution. Unlike the z-test, the t-test is appropriate for small samples where the population standard deviation is unknown.
Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.
Requires at least 2 data points.
```php
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\Alternative;
$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0);
// ['tStatistic' => 2.6457..., 'pValue' => 0.0331..., 'degreesOfFreedom' => 7]
$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, alternative: Alternative::Greater);
// one-tailed test: is the sample mean greater than 3?
$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, round: 4);
// ['tStatistic' => 2.6458, 'pValue' => 0.0331, 'degreesOfFreedom' => 7]
```
#### Stat::tTestTwoSample( array $data1, array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
Perform a two-sample independent t-test (Welch's t-test). Compares the means of two independent groups without assuming equal variances. Uses the Welch–Satterthwaite approximation for degrees of freedom.
Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.
Requires at least 2 data points in each sample.
```php
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\Alternative;
// Compare two groups
$group1 = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99];
$group2 = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98];
$result = Stat::tTestTwoSample($group1, $group2);
// ['tStatistic' => 1.959..., 'pValue' => 0.0907..., 'degreesOfFreedom' => 7.03...]
// One-tailed test: is group1 mean greater than group2 mean?
$result = Stat::tTestTwoSample($group1, $group2, alternative: Alternative::Greater);
// Groups can have different sizes
$result = Stat::tTestTwoSample([1, 2, 3, 4, 5, 6, 7, 8], [3, 4, 5], round: 4);
```
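The Welch statistic and the Welch–Satterthwaite degrees of freedom can be computed by hand as a cross-check. The sketch below is plain PHP, not the library's implementation; the helper names are hypothetical, and the p-value is left out because it needs the t-distribution CDF.

```php
// Plain-PHP sketch (not the library's code): Welch's t statistic and the
// Welch–Satterthwaite degrees of freedom for two independent samples.
function sketchSampleVariance(array $data): float
{
    $mean = array_sum($data) / count($data);
    $sum = 0.0;
    foreach ($data as $v) {
        $sum += ($v - $mean) ** 2;
    }

    return $sum / (count($data) - 1);
}

function sketchWelch(array $a, array $b): array
{
    $meanA = array_sum($a) / count($a);
    $meanB = array_sum($b) / count($b);
    $termA = sketchSampleVariance($a) / count($a);   // s1² / n1
    $termB = sketchSampleVariance($b) / count($b);   // s2² / n2

    $t = ($meanA - $meanB) / sqrt($termA + $termB);
    $df = ($termA + $termB) ** 2
        / ($termA ** 2 / (count($a) - 1) + $termB ** 2 / (count($b) - 1));

    return ['tStatistic' => $t, 'degreesOfFreedom' => $df];
}

$result = sketchWelch([1, 2, 3, 4, 5, 6, 7, 8], [3, 4, 5]);
// tStatistic ≈ 0.480, degreesOfFreedom ≈ 8.64
```

The degrees of freedom are generally not an integer, which is why the library example above reports a fractional value.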
#### Stat::tTestPaired( array $data1, array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
Perform a paired t-test. Tests whether the mean difference between paired observations (e.g. before/after measurements on the same subjects) is significantly different from zero.
Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. Both arrays must have the same length.
Requires at least 2 paired observations.
```php
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\Alternative;
// Before and after treatment measurements
$before = [200, 190, 210, 220, 215, 205, 195, 225];
$after = [192, 186, 198, 212, 208, 198, 188, 215];
$result = Stat::tTestPaired($before, $after);
// ['tStatistic' => 9.45..., 'pValue' => 0.00003..., 'degreesOfFreedom' => 7]
// One-tailed: did the treatment decrease the values?
$result = Stat::tTestPaired($before, $after, alternative: Alternative::Greater);
$result = Stat::tTestPaired($before, $after, round: 4);
```
#### Stat::kde ( array $data , float $h , KdeKernel $kernel = KdeKernel::Normal , bool $cumulative = false )
Create a continuous probability density function (or cumulative distribution function) from discrete sample data using Kernel Density Estimation.
Returns a `Closure` that can be called with any point to estimate the density (or CDF value).
Supported kernels: `KdeKernel::Normal` (alias `KdeKernel::Gauss`), `KdeKernel::Logistic`, `KdeKernel::Sigmoid`, `KdeKernel::Rectangular` (alias `KdeKernel::Uniform`), `KdeKernel::Triangular`, `KdeKernel::Parabolic` (alias `KdeKernel::Epanechnikov`), `KdeKernel::Quartic` (alias `KdeKernel::Biweight`), `KdeKernel::Triweight`, `KdeKernel::Cosine`.
```php
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\KdeKernel;
$data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2];
$f = Stat::kde($data, h: 1.5);
$f(2.5);
// estimated density at x = 2.5
```
Using a different kernel:
```php
$f = Stat::kde($data, h: 1.5, kernel: KdeKernel::Triangular);
$f(2.5);
```
Cumulative distribution function:
```php
$F = Stat::kde($data, h: 1.5, cumulative: true);
$F(2.5);
// estimated CDF at x = 2.5 (probability that a value is <= 2.5)
```
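As background, a kernel density estimate averages one kernel "bump" per data point, each scaled by the bandwidth `h`. The following plain-PHP sketch shows the idea for the Gaussian kernel only; `sketchGaussianKde` is a hypothetical name and this is not the library's implementation.

```php
// Plain-PHP sketch (not the library's code): Gaussian kernel density
// estimate f(x) = (1 / (n * h)) * Σ K((x - x_i) / h), where K is the
// standard normal density.
function sketchGaussianKde(array $data, float $h): Closure
{
    $n = count($data);

    return function (float $x) use ($data, $h, $n): float {
        $sum = 0.0;
        foreach ($data as $xi) {
            $u = ($x - $xi) / $h;
            $sum += exp(-0.5 * $u * $u) / sqrt(2 * M_PI);
        }

        return $sum / ($n * $h);
    };
}

$f = sketchGaussianKde([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2], 1.5);
$density = $f(2.5);   // a positive density value

// Sanity check: a density integrates to roughly 1 (Riemann sum over a wide range)
$area = 0.0;
for ($x = -20.0; $x <= 25.0; $x += 0.01) {
    $area += $f($x) * 0.01;
}
// $area ≈ 1.0
```

A larger `h` produces a smoother, flatter estimate; a smaller `h` follows the individual data points more closely.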
#### Stat::kdeRandom ( array $data , float $h , KdeKernel $kernel = KdeKernel::Normal , ?int $seed = null )
Generate random samples from a Kernel Density Estimate.
Returns a `Closure` that, when called, produces a random float drawn from the KDE distribution defined by the data and bandwidth.
Supported kernels: `KdeKernel::Normal` (alias `KdeKernel::Gauss`), `KdeKernel::Logistic`, `KdeKernel::Sigmoid`, `KdeKernel::Rectangular` (alias `KdeKernel::Uniform`), `KdeKernel::Triangular`, `KdeKernel::Parabolic` (alias `KdeKernel::Epanechnikov`), `KdeKernel::Quartic` (alias `KdeKernel::Biweight`), `KdeKernel::Triweight`, `KdeKernel::Cosine`.
```php
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\KdeKernel;
$data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2];
$rand = Stat::kdeRandom($data, h: 1.5, seed: 8675309);
$samples = [];
for ($i = 0; $i < 10; $i++) {
$samples[] = round($rand(), 1);
}
// [2.5, 3.3, -1.8, 7.3, -2.1, 4.6, 4.4, 5.9, -3.2, -1.6]
```
Using a different kernel:
```php
$rand = Stat::kdeRandom($data, h: 1.5, kernel: KdeKernel::Triangular, seed: 42);
$rand();
```
### Freq class
With the *Statistics* package you can calculate frequency tables.
A frequency table lists the frequency of various outcomes in a sample.
Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval.
#### Freq::frequencies( array $data )
```php
use HiFolks\Statistics\Freq;
$fruits = ['🍈', '🍈', '🍈', '🍉','🍉','🍉','🍉','🍉','🍌'];
$freqTable = Freq::frequencies($fruits);
print_r($freqTable);
```
You can see the frequency table as an array:
```
Array
(
[🍈] => 3
[🍉] => 5
[🍌] => 1
)
```
#### Freq::relativeFrequencies( array $data )
You can retrieve the frequency table in relative format, as percentages. In the example below, the second argument (`2`) rounds each percentage to two decimal places:
```php
$freqTable = Freq::relativeFrequencies($fruits, 2);
print_r($freqTable);
```
The result is an array with the percentage of occurrences of each value:
```
Array
(
[🍈] => 33.33
[🍉] => 55.56
[🍌] => 11.11
)
```
#### Freq::frequencyTableBySize( array $data , $size)
To build a frequency table based on classes (ranges of values), use `frequencyTableBySize()`.
The first parameter is the data array, and the second is the size of each class.
The following example builds a frequency table with classes of size 4:
```php
$data = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
13,14,14,15,15,16,16,16,16,17,17,17,18,18, ];
$result = \HiFolks\Statistics\Freq::frequencyTableBySize($data, 4);
print_r($result);
/*
Array
(
[1] => 5
[5] => 8
[9] => 11
[13] => 9
[17] => 5
)
*/
```
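The grouping rule can be sketched in plain PHP as follows. The sketch assumes that classes start at the minimum value and that each key is the lower bound of its class, which reproduces the output above; `sketchFrequencyBySize` is a hypothetical helper, not the library's code, and unlike a full implementation it simply omits empty classes.

```php
// Plain-PHP sketch (not the library's code): group integer values into
// classes of a fixed size, keyed by the lower bound of each class.
function sketchFrequencyBySize(array $data, int $size): array
{
    $min = min($data);
    $table = [];
    foreach ($data as $value) {
        // Lower bound of the class that contains $value
        $lower = $min + intdiv($value - $min, $size) * $size;
        $table[$lower] = ($table[$lower] ?? 0) + 1;
    }
    ksort($table);

    return $table;
}

$data = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
    13,14,14,15,15,16,16,16,16,17,17,17,18,18];
$table = sketchFrequencyBySize($data, 4);
// [1 => 5, 5 => 8, 9 => 11, 13 => 9, 17 => 5]
```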
#### Freq::frequencyTable()
To build a frequency table with a fixed number of classes (ranges of values), use `frequencyTable()`.
The first parameter is the data array, and the second is the number of classes.
The following example builds a frequency table with 5 classes:
```php
$data = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
13,14,14,15,15,16,16,16,16,17,17,17,18,18, ];
$result = \HiFolks\Statistics\Freq::frequencyTable($data, 5);
print_r($result);
/*
Array
(
[1] => 5
[5] => 8
[9] => 11
[13] => 9
[17] => 5
)
*/
```
### Statistics class
Most methods of the `Freq` and `Stat` classes are **static**.
If you prefer to work with an object, use an instance of the `Statistics` class and call the statistics methods on it.
For example, to calculate the mean, create a `Statistics` object with the static `make()` method and call the methods on the resulting `$stat` object:
```php
$stat = HiFolks\Statistics\Statistics::make(
[3,5,4,7,5,2]
);
echo $stat->valuesToString(5) . PHP_EOL;
// 2,3,4,5,5
echo "Mean : " . $stat->mean() . PHP_EOL;
// Mean : 4.3333333333333
echo "Count : " . $stat->count() . PHP_EOL;
// Count : 6
echo "Median : " . $stat->median() . PHP_EOL;
// Median : 4.5
echo "First Quartile : " . $stat->firstQuartile() . PHP_EOL;
// First Quartile : 2.5
echo "Third Quartile : " . $stat->thirdQuartile() . PHP_EOL;
// Third Quartile : 5
echo "Mode : " . $stat->mode() . PHP_EOL;
// Mode : 5
```
#### Calculate Frequency Table
The `Statistics` class has several methods for generating frequency tables:
- `frequencies()`: a frequency is the number of times a value occurs in the data;
- `relativeFrequencies()`: a relative frequency is the number of times a value occurs divided by the total number of outcomes (returned as a percentage);
- `cumulativeFrequencies()`: the running total of the frequencies up to and including each value;
- `cumulativeRelativeFrequencies()`: the running total of the relative frequencies up to and including each value.
```php
use HiFolks\Statistics\Statistics;
$s = Statistics::make(
[98, 90, 70,18,92,92,55,83,45,95,88,76]
);
$a = $s->frequencies();
print_r($a);
/*
Array
(
[18] => 1
[45] => 1
[55] => 1
[70] => 1
[76] => 1
[83] => 1
[88] => 1
[90] => 1
[92] => 2
[95] => 1
[98] => 1
)
*/
$a = $s->relativeFrequencies();
print_r($a);
/*
Array
(
[18] => 8.3333333333333
[45] => 8.3333333333333
[55] => 8.3333333333333
[70] => 8.3333333333333
[76] => 8.3333333333333
[83] => 8.3333333333333
[88] => 8.3333333333333
[90] => 8.3333333333333
[92] => 16.666666666667
[95] => 8.3333333333333
[98] => 8.3333333333333
)
*/
```
## `NormalDist` class
The `NormalDist` class provides an easy way to work with normal distributions in PHP. It allows you to calculate probabilities and densities for a given mean (μ) and standard deviation (σ).
### Key features
- Define a normal distribution with mean (μ) and standard deviation (σ).
- Calculate the **Probability Density Function (PDF)** to evaluate the relative likelihood of a value.
- Calculate the **Cumulative Distribution Function (CDF)** to determine the probability of a value or lower.
- Calculate the **Inverse Cumulative Distribution Function** (`invCdf`) to find the value for a given probability.
------
### Class constructor
```php
$normalDist = new NormalDist(float $mu = 0.0, float $sigma = 1.0);
```
- `$mu`: The mean (default = `0.0`).
- `$sigma`: The standard deviation (default = `1.0`).
- Throws an exception if `$sigma` is non-positive.
------
### Methods
#### Properties: mean, sigma, and variance
You can access the distribution parameters via getter methods:
```php
$normalDist = new NormalDist(100, 15);
$normalDist->getMean(); // 100.0
$normalDist->getSigma(); // 15.0
$normalDist->getMedian(); // 100.0 (equals mean for normal dist)
$normalDist->getMode(); // 100.0 (equals mean for normal dist)
$normalDist->getVariance(); // 225.0 (sigma squared)
$normalDist->getVarianceRounded(2); // 225.0
```
From samples:
```php
$normalDist = NormalDist::fromSamples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5]);
$normalDist->getVarianceRounded(5); // 0.25767
```
------
#### Creating a normal distribution instance from sample data
The `fromSamples()` static method creates a normal distribution instance with mu and sigma parameters estimated from the sample data.
Example:
```php
$samples = [2.5, 3.1, 2.1, 2.4, 2.7, 3.5];
$normalDist = NormalDist::fromSamples($samples);
$normalDist->getMeanRounded(5); // 2.71667
$normalDist->getSigmaRounded(5); // 0.50761
```
#### Generate random samples `samples($n, $seed)`
Generates `$n` random samples from the normal distribution using the Box-Muller transform. An optional `$seed` parameter allows reproducible results.
```php
$normalDist = new NormalDist(100, 15);
// Generate 5 random samples
$samples = $normalDist->samples(5);
// e.g. [98.3, 112.7, 89.1, 105.4, 101.2]
// Reproducible results with a seed
$samples = $normalDist->samples(1000, seed: 42);
```
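For readers unfamiliar with the Box-Muller transform, the sketch below shows the core idea in plain PHP: two uniform random numbers are mapped to one standard normal value, which is then scaled by sigma and shifted by mu. This is not the library's implementation; the function name is hypothetical, and seeding through `mt_srand()` is an assumption made for the sketch.

```php
// Plain-PHP sketch (not the library's code): draw normal samples with the
// Box-Muller transform. Seeding via mt_srand() is an assumption made here.
function sketchNormalSamples(int $n, float $mu, float $sigma, ?int $seed = null): array
{
    if ($seed !== null) {
        mt_srand($seed);
    }

    $samples = [];
    for ($i = 0; $i < $n; $i++) {
        // Two uniforms; u1 is kept in (0, 1] so that log(u1) is finite
        $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1);
        $u2 = mt_rand() / mt_getrandmax();
        $z = sqrt(-2.0 * log($u1)) * cos(2.0 * M_PI * $u2);   // standard normal
        $samples[] = $mu + $sigma * $z;
    }

    return $samples;
}

$samples = sketchNormalSamples(10000, 100.0, 15.0, seed: 42);
$mean = array_sum($samples) / count($samples);
// $mean lands close to 100
```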
------
#### Z-score `zscore($x)`
Computes the standard score describing `$x` in terms of the number of standard deviations above or below the mean: `(x - mu) / sigma`.
```php
$normalDist = new NormalDist(100, 15);
echo $normalDist->zscore(130); // 2.0 (two std devs above mean)
echo $normalDist->zscore(85); // -1.0 (one std dev below mean)
echo $normalDist->zscoreRounded(114, 3); // 0.933
```
------
#### Probability Density Function `pdf($x)`
Calculates the **Probability Density Function** at a given value `$x`:
```php
$normalDist->pdf(float $x): float
```
- Input: the value `$x` at which to evaluate the PDF.
- Output: the relative likelihood of `$x` in the distribution.
Example:
```php
$normalDist = new NormalDist(10.0, 2.0);
echo $normalDist->pdf(12.0); // Output: 0.12098536225957168
```
------
#### Cumulative Distribution Function `cdf($x)`
Calculates the **Cumulative Distribution Function** at a given value `$x`:
```php
$normalDist->cdf(float $x): float
```
- Input: the value `$x` at which to evaluate the CDF.
- Output: the probability that a random variable drawn from the distribution is less than or equal to `$x`.
Example:
```php
$normalDist = new NormalDist(10.0, 2.0);
echo $normalDist->cdf(12.0); // Output: 0.8413447460685429
```
Calculating both the PDF and the CDF:
```php
$normalDist = new NormalDist(10.0, 2.0);
// Calculate PDF at x = 12
$pdf = $normalDist->pdf(12.0);
echo "PDF at x = 12: $pdf\n"; // Output: 0.12098536225957168
// Calculate CDF at x = 12
$cdf = $normalDist->cdf(12.0);
echo "CDF at x = 12: $cdf\n"; // Output: 0.8413447460685429
```
------
#### Inverse Cumulative Distribution Function `invCdf($p)`
Computes the **Inverse Cumulative Distribution Function** (also known as the quantile function or percent-point function). Given a probability `$p`, it finds the value `$x` such that `cdf($x) = $p`.
```php
$normalDist->invCdf(float $p): float
```
- Input: a probability `$p` in the range (0, 1) exclusive.
- Output: the value `$x` where `cdf($x) = $p`.
- Throws an exception if `$p` is not in (0, 1).
Example:
```php
$normalDist = new NormalDist(0.0, 1.0);
// Find the value at the 95th percentile of a standard normal distribution
echo $normalDist->invCdfRounded(0.95, 5); // Output: 1.64485
// The median of a standard normal distribution
echo $normalDist->invCdf(0.5); // Output: 0.0
```
The `invCdf()` method is useful for:
- **Confidence intervals**: find critical values for a given confidence level.
- **Hypothesis testing**: determine thresholds for statistical significance.
- **Percentile calculations**: find the value corresponding to a specific percentile.
Round-trip example with `cdf()`:
```php
$normalDist = new NormalDist(100, 15);
// invCdf(0.5) equals the mean
echo $normalDist->invCdf(0.5); // Output: 100.0
// Round-trip: cdf(invCdf(p)) ≈ p
echo $normalDist->cdfRounded($normalDist->invCdf(0.25), 2); // Output: 0.25
```
------
#### Quantiles `quantiles($n)`
Divides the normal distribution into `$n` continuous intervals with equal probability. Returns a list of `$n - 1` cut points separating the intervals.
Set `$n` to 4 for quartiles (the default), `$n` to 10 for deciles, or `$n` to 100 for percentiles.
```php
$normalDist = new NormalDist(0.0, 1.0);
// Quartiles (default)
$normalDist->quantiles(); // [-0.6745, 0.0, 0.6745]
// Deciles
$normalDist->quantiles(10); // 9 cut points
// Percentiles
$normalDist->quantiles(100); // 99 cut points
```
------
#### Overlapping coefficient `overlap($other)`
Computes the overlapping coefficient (OVL) between two normal distributions. Measures the agreement between two normal probability distributions. Returns a value between 0.0 and 1.0 giving the overlapping area in the two underlying probability density functions.
```php
$n1 = new NormalDist(2.4, 1.6);
$n2 = new NormalDist(3.2, 2.0);
echo $n1->overlapRounded($n2, 4); // 0.8035
// Identical distributions overlap completely
$n3 = new NormalDist(0, 1);
echo $n3->overlap($n3); // 1.0
```
------
#### Combining a normal distribution via `add()` method
The `add()` method allows you to combine a NormalDist instance with either a constant or another NormalDist object.
This operation supports mathematical transformations and the combination of distributions.
The use cases are:
- Shifting a distribution: add a constant to shift the mean, useful in translating data.
- Combining distributions: combine independent or jointly normally distributed variables, commonly used in statistics and probability.
```php
$birth_weights = NormalDist::fromSamples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5]);
$drug_effects = new NormalDist(0.4, 0.15);
$combined = $birth_weights->add($drug_effects);
$combined->getMeanRounded(1); // 3.1
$combined->getSigmaRounded(1); // 0.5
$birth_weights->getMeanRounded(5); // 2.71667
$birth_weights->getSigmaRounded(5); // 0.50761
```
#### Scaling a normal distribution by a constant via `multiply()` method
The `multiply()` method for NormalDist multiplies both the mean (mu) and standard deviation (sigma) by a constant.
This method is useful for rescaling distributions, such as when changing measurement units.
The standard deviation is scaled by the absolute value of the constant to ensure it remains non-negative.
The method does not modify the existing object but instead returns a new NormalDist instance with the updated values.
Use Cases:
- Rescaling distributions: useful when changing units (e.g., from meters to kilometers, or Celsius to Fahrenheit).
- Transforming data: apply proportional scaling to statistical data.
```php
$tempFebruaryCelsius = new NormalDist(5, 2.5); # Celsius
$tempFebFahrenheit = $tempFebruaryCelsius->multiply(9 / 5)->add(32); # Fahrenheit
$tempFebFahrenheit->getMeanRounded(1); // 41.0
$tempFebFahrenheit->getSigmaRounded(1); // 4.5
```
#### Subtracting from a normal distribution via `subtract()` method
The `subtract()` method is the counterpart to `add()`. It subtracts a constant or another NormalDist instance from this distribution.
- A constant (float): shifts the mean down, leaving sigma unchanged.
- A NormalDist instance: subtracts the means and combines the variances.
```php
$nd = new NormalDist(100, 15);
$shifted = $nd->subtract(32);
$shifted->getMean(); // 68.0
$shifted->getSigma(); // 15.0 (unchanged)
```
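When the argument is another distribution, "combines the variances" presumably follows the rule for independent normal variables used by Python's `statistics.NormalDist`: the means subtract and the variances add. The plain arithmetic below illustrates that rule under this assumption; it does not call the library.

```php
// Plain arithmetic, not a library call: the difference of two independent
// normal variables X ~ N(100, 15) and Y ~ N(60, 8).
$mu = 100 - 60;          // means subtract
$sigma = hypot(15, 8);   // variances add: sqrt(15² + 8²)
// $mu = 40, $sigma = 17.0
```

Note that sigma grows even though the means are subtracted, because the uncertainty of both variables carries over into the difference.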
#### Dividing a normal distribution by a constant via `divide()` method
The `divide()` method is the counterpart to `multiply()`. It divides both the mean (mu) and standard deviation (sigma) by a constant.
```php
// Convert Fahrenheit back to Celsius: (F - 32) / (9/5)
$tempFahrenheit = new NormalDist(41, 4.5);
$tempCelsius = $tempFahrenheit->subtract(32)->divide(9 / 5);
$tempCelsius->getMeanRounded(1); // 5.0
$tempCelsius->getSigmaRounded(1); // 2.5
```
------
### References for NormalDist
This class is inspired by Python’s `statistics.NormalDist` and aims to provide similar functionality for PHP users. (Work in Progress)
## `StudentT` class
The `StudentT` class represents the Student’s t-distribution, which is used for hypothesis testing and confidence intervals when the population standard deviation is unknown, especially with small sample sizes. As the degrees of freedom increase, the t-distribution approaches the standard normal distribution.
### Creating a StudentT instance
```php
use HiFolks\Statistics\StudentT;
$t = new StudentT(df: 10); // 10 degrees of freedom
```
### Probability Density Function (PDF)
```php
$t = new StudentT(5);
$t->pdf(0); // ≈ 0.37961 (peak of the distribution)
$t->pdf(2.0); // density at t=2
$t->pdfRounded(0); // 0.38
```
### Cumulative Distribution Function (CDF)
```php
$t = new StudentT(5);
$t->cdf(0); // 0.5 (symmetric around zero)
$t->cdf(2.0); // ≈ 0.94903
$t->cdfRounded(2.0); // 0.949
```
### Inverse CDF (Quantile Function)
```php
$t = new StudentT(10);
$t->invCdf(0.975); // ≈ 2.228 (critical value for 95% two-sided test)
$t->invCdf(0.5); // 0.0 (median)
$t->invCdfRounded(0.975, 3); // 2.228
```
## StreamingStat (Experimental)
> **Note**: `StreamingStat` is experimental in version 1.x. It will be released as stable in version 2. If you want to provide feedback, we are happy to hear from you — please open an issue at https://github.com/Hi-Folks/statistics/issues.
`StreamingStat` computes descriptive statistics in a single pass with O(1) memory, ideal for large datasets or generator-based streams.
```php
use HiFolks\Statistics\StreamingStat;
$s = new StreamingStat();
$s->add(1)->add(2)->add(3)->add(4)->add(5);
$s->count(); // 5
$s->sum(); // 15.0
$s->min(); // 1.0
$s->max(); // 5.0
$s->mean(); // 3.0
$s->variance(); // 2.5
$s->stdev(); // 1.5811...
$s->skewness(); // 0.0
$s->kurtosis(); // -1.2
```
| Method | Description | Min n |
|---|---|---|
| `count()` | Number of values added | 0 |
| `sum()` | Sum of all values | 1 |
| `min()` | Minimum value | 1 |
| `max()` | Maximum value | 1 |
| `mean(?int $round = null)` | Arithmetic mean | 1 |
| `variance(?int $round = null)` | Sample variance | 2 |
| `pvariance(?int $round = null)` | Population variance | 1 |
| `stdev(?int $round = null)` | Sample standard deviation | 2 |
| `pstdev(?int $round = null)` | Population standard deviation | 1 |
| `skewness(?int $round = null)` | Sample skewness (adjusted Fisher-Pearson) | 3 |
| `pskewness(?int $round = null)` | Population skewness | 3 |
| `kurtosis(?int $round = null)` | Excess kurtosis (sample) | 4 |
Each method throws `InvalidDataInputException` when fewer values than its "Min n" have been added.
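One common way to get single-pass, constant-memory mean and variance is Welford's online algorithm. The sketch below illustrates that idea in plain PHP; it is an assumption for illustration, not a description of how `StreamingStat` is implemented, and the class name `SketchRunningStat` is hypothetical.

```php
// Plain-PHP sketch using Welford's online algorithm. It only keeps three
// numbers in memory, no matter how many values are added.
final class SketchRunningStat
{
    private int $n = 0;
    private float $mean = 0.0;
    private float $m2 = 0.0;   // running sum of squared deviations from the mean

    public function add(int|float $x): self
    {
        $this->n++;
        $delta = $x - $this->mean;
        $this->mean += $delta / $this->n;
        $this->m2 += $delta * ($x - $this->mean);

        return $this;
    }

    public function mean(): float
    {
        return $this->mean;
    }

    public function variance(): float
    {
        // Sample variance needs at least two values
        if ($this->n < 2) {
            throw new InvalidArgumentException('At least 2 values are required.');
        }

        return $this->m2 / ($this->n - 1);
    }
}

$s = (new SketchRunningStat())->add(1)->add(2)->add(3)->add(4)->add(5);
// $s->mean() is 3.0 and $s->variance() is 2.5
```

The same pattern works with a generator: call `add()` inside a `foreach` over the stream and read the statistics at any point.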
## Utility classes
The package includes utility classes under `HiFolks\Statistics\Utils` for common array and formatting operations.
### `Arr` — array helpers
```php
use HiFolks\Statistics\Utils\Arr;
```
#### Arr::extract( array $data, array $columns )
Extract one or more columns from an array of associative arrays. Returns one array per requested column.
```php
$runners = [
['name' => 'Alice', 'age' => 30, 'score' => 95],
['name' => 'Bob', 'age' => 25, 'score' => 87],
];
[$ages, $scores] = Arr::extract($runners, ['age', 'score']);
// $ages = [30, 25], $scores = [95, 87]
```
#### Arr::partition( array $data, string $field, string $operator, mixed $value )
Split an array of associative arrays into `[$matching, $nonMatching]` groups based on a condition. Supported operators: `==`, `!=`, `>`, `<`, `>=`, `<=`.
```php
[$men, $women] = Arr::partition($runners, 'gender', '==', 'M');
[$seniors, $others] = Arr::partition($runners, 'age', '>=', 40);
```
#### Arr::toString( array $data, bool|int $sample = false )
Join array values into a comma-separated string. Pass an integer to limit to the first N values.
#### Arr::stripZeroes( array $data )
Remove zero values from the array.
### `Format` — time formatting
```php
use HiFolks\Statistics\Utils\Format;
```
#### Format::secondsToTime( int|float $seconds )
Convert seconds to a human-readable time string.
```php
Format::secondsToTime(4845); // "1:20:45"
```
#### Format::timeToSeconds( string $time )
Parse a time string back to total seconds.
```php
Format::timeToSeconds('1:20:45'); // 4845
```
#### Format::secondsToHms( int|float $seconds )
Convert seconds to an associative array with `hours`, `minutes`, `seconds` keys.
```php
Format::secondsToHms(4845); // ['hours' => 1, 'minutes' => 20, 'seconds' => 45]
```
#### Format::hmsToSeconds( int $hours, int $minutes, int $seconds )
Convert hours, minutes, and seconds to total seconds.
```php
Format::hmsToSeconds(1, 20, 45); // 4845
```
## Testing
```bash
composer run test            # run the test suite (PHPUnit)
composer run test-coverage   # run the tests with a text coverage report
composer run format          # fix code style (PHP CS Fixer)
composer run static-code     # run static analysis (PHPStan)
composer run all-check       # run format, rector-dry-run, static-code, and test
```
## Changelog
Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.
## Contributing
Please see [CONTRIBUTING](.github/CONTRIBUTING.md) for details.
## Security Vulnerabilities
Please review [our security policy](../../security/policy) on how to report security vulnerabilities.
## Credits
- [Roberto B.](https://github.com/roberto-butti)
- [All Contributors](../../contributors)
## License
The MIT License (MIT). Please see [License File](LICENSE.md) for more information.
================================================
FILE: TODO.md
================================================
## Missing Functions
### Correlation & Regression
- Kendall tau correlation - another rank-based correlation
- Multiple/polynomial regression
### Hypothesis Testing
- ~~T-test (two-sample, paired) — one-sample is done~~ DONE: `tTestTwoSample()` (Welch's) and `tTestPaired()`
- Chi-squared test
### Other Distributions (beyond Normal)
- Chi-squared distribution
- Binomial distribution
- Poisson distribution
- Uniform distribution
- Exponential distribution
### Ranking & Order Statistics
- Rank - assign ranks to data points
- Percentile rank - what percentile a given value falls at
================================================
FILE: composer.json
================================================
{
"name": "hi-folks/statistics",
"description": "PHP package that provides functions for calculating mathematical statistics of numeric data.",
"keywords": [
"hi-folks",
"statistics"
],
"homepage": "https://github.com/hi-folks/statistics",
"license": "MIT",
"authors": [
{
"name": "Roberto B.",
"email": "roberto.butti@gmail.com",
"role": "Developer"
}
],
"require": {
"php": "^8.2|^8.3|^8.4|8.5"
},
"require-dev": {
"friendsofphp/php-cs-fixer": "^3.65",
"phpstan/phpstan": "^2",
"phpstan/phpstan-phpunit": "^2.0",
"phpunit/phpunit": "^11.0",
"rector/rector": "^2"
},
"autoload": {
"psr-4": {
"HiFolks\\Statistics\\": "src"
}
},
"autoload-dev": {
"psr-4": {
"HiFolks\\Statistics\\Tests\\": "tests"
}
},
"scripts": {
"format": "vendor/bin/php-cs-fixer fix",
"test": "vendor/bin/phpunit",
"test-coverage": "vendor/bin/phpunit --coverage-text",
"static-code": "vendor/bin/phpstan analyse -c phpstan.neon",
"rector-dry-run": "rector process --dry-run",
"rector": "rector process",
"all-check": [
"@format",
"@rector-dry-run",
"@static-code",
"@test"
]
},
"config": {
"sort-packages": true,
"allow-plugins": {}
},
"minimum-stability": "dev",
"prefer-stable": true
}
================================================
FILE: examples/article-boston-marathon-analysis.php
================================================
<?php
/**
* Analyzing 75,000 Boston Marathon Runners with PHP Statistics
*
* This script accompanies the article that uses a representative sample
* from the Boston Marathon 2015–2017 Kaggle dataset to showcase the
* statistics library's capabilities — especially tTestTwoSample() and
* tTestPaired().
*
* Dataset: https://www.kaggle.com/datasets/rojour/boston-results
* Run it with: php examples/article-boston-marathon-analysis.php
*/
require __DIR__ . "/../vendor/autoload.php";
use HiFolks\Statistics\NormalDist;
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Utils\Arr;
use HiFolks\Statistics\Utils\Format;
// === The Data ===
// Representative sample of 60 finishers from the 2017 Boston Marathon.
// Times are stored in seconds for easy arithmetic.
// 'half' = cumulative time at the half-marathon mark (21.1 km)
// 'finish' = gun-to-finish time
// 'splits' = 8 individual 5K segment times (5K through 40K)
$runners = [
// --- Fast men ---
['name' => 'James Karanja', 'age' => 28, 'gender' => 'M', 'country' => 'KEN', 'half' => 4520, 'finish' => 9280, 'splits' => [1100, 1105, 1110, 1115, 1120, 1125, 1140, 1165]],
['name' => 'Michael Kiprop', 'age' => 31, 'gender' => 'M', 'country' => 'KEN', 'half' => 4600, 'finish' => 9450, 'splits' => [1115, 1120, 1125, 1130, 1135, 1145, 1160, 1200]],
['name' => 'David Chen', 'age' => 26, 'gender' => 'M', 'country' => 'USA', 'half' => 4680, 'finish' => 9600, 'splits' => [1130, 1135, 1140, 1145, 1155, 1170, 1190, 1225]],
['name' => 'Ryan O\'Brien', 'age' => 29, 'gender' => 'M', 'country' => 'USA', 'half' => 4750, 'finish' => 9780, 'splits' => [1150, 1155, 1160, 1165, 1175, 1195, 1220, 1260]],
['name' => 'Tadesse Bekele', 'age' => 33, 'gender' => 'M', 'country' => 'ETH', 'half' => 4820, 'finish' => 9920, 'splits' => [1165, 1170, 1180, 1185, 1195, 1215, 1245, 1280]],
['name' => 'Carlos Gutierrez', 'age' => 27, 'gender' => 'M', 'country' => 'MEX', 'half' => 4900, 'finish' => 10100, 'splits' => [1190, 1195, 1200, 1210, 1225, 1240, 1265, 1300]],
['name' => 'Thomas Mueller', 'age' => 30, 'gender' => 'M', 'country' => 'GER', 'half' => 5020, 'finish' => 10380, 'splits' => [1220, 1225, 1230, 1240, 1260, 1280, 1310, 1350]],
['name' => 'Hiroshi Tanaka', 'age' => 34, 'gender' => 'M', 'country' => 'JPN', 'half' => 5100, 'finish' => 10560, 'splits' => [1240, 1245, 1250, 1260, 1280, 1300, 1330, 1370]],
// --- Mid-pack men ---
['name' => 'John Smith', 'age' => 34, 'gender' => 'M', 'country' => 'USA', 'half' => 5400, 'finish' => 11200, 'splits' => [1310, 1320, 1330, 1340, 1370, 1400, 1440, 1490]],
['name' => 'Patrick Sullivan', 'age' => 38, 'gender' => 'M', 'country' => 'USA', 'half' => 5550, 'finish' => 11520, 'splits' => [1350, 1360, 1370, 1380, 1410, 1440, 1480, 1530]],
['name' => 'Marco Rossi', 'age' => 36, 'gender' => 'M', 'country' => 'ITA', 'half' => 5620, 'finish' => 11700, 'splits' => [1370, 1375, 1385, 1395, 1425, 1460, 1500, 1550]],
['name' => 'Daniel Park', 'age' => 32, 'gender' => 'M', 'country' => 'KOR', 'half' => 5700, 'finish' => 11880, 'splits' => [1390, 1395, 1405, 1415, 1450, 1485, 1525, 1575]],
['name' => 'Andrew Taylor', 'age' => 41, 'gender' => 'M', 'country' => 'USA', 'half' => 5800, 'finish' => 12100, 'splits' => [1410, 1420, 1430, 1445, 1480, 1520, 1570, 1625]],
['name' => 'Pierre Dubois', 'age' => 37, 'gender' => 'M', 'country' => 'FRA', 'half' => 5850, 'finish' => 12240, 'splits' => [1425, 1435, 1445, 1460, 1500, 1540, 1590, 1650]],
['name' => 'Robert Johnson', 'age' => 44, 'gender' => 'M', 'country' => 'USA', 'half' => 5950, 'finish' => 12480, 'splits' => [1450, 1460, 1470, 1490, 1530, 1570, 1620, 1690]],
['name' => 'William Davis', 'age' => 39, 'gender' => 'M', 'country' => 'USA', 'half' => 6020, 'finish' => 12660, 'splits' => [1470, 1480, 1490, 1510, 1555, 1600, 1660, 1730]],
['name' => 'Kevin Brown', 'age' => 42, 'gender' => 'M', 'country' => 'CAN', 'half' => 6100, 'finish' => 12840, 'splits' => [1490, 1500, 1515, 1535, 1580, 1630, 1690, 1760]],
['name' => 'Liam Walsh', 'age' => 35, 'gender' => 'M', 'country' => 'IRL', 'half' => 6180, 'finish' => 13020, 'splits' => [1510, 1520, 1535, 1555, 1605, 1660, 1720, 1795]],
['name' => 'Matt Henderson', 'age' => 46, 'gender' => 'M', 'country' => 'USA', 'half' => 6250, 'finish' => 13200, 'splits' => [1530, 1540, 1555, 1575, 1630, 1685, 1750, 1830]],
['name' => 'José Fernandez', 'age' => 40, 'gender' => 'M', 'country' => 'ESP', 'half' => 6320, 'finish' => 13380, 'splits' => [1545, 1560, 1575, 1600, 1655, 1715, 1780, 1860]],
['name' => 'Brian Miller', 'age' => 48, 'gender' => 'M', 'country' => 'USA', 'half' => 6400, 'finish' => 13560, 'splits' => [1565, 1580, 1595, 1620, 1680, 1740, 1810, 1900]],
['name' => 'Chris Anderson', 'age' => 43, 'gender' => 'M', 'country' => 'USA', 'half' => 6480, 'finish' => 13740, 'splits' => [1585, 1600, 1620, 1645, 1710, 1775, 1850, 1940]],
['name' => 'Sean O\'Connor', 'age' => 45, 'gender' => 'M', 'country' => 'USA', 'half' => 6550, 'finish' => 13920, 'splits' => [1600, 1620, 1640, 1670, 1735, 1805, 1885, 1980]],
// --- Slow men ---
['name' => 'Greg Thompson', 'age' => 52, 'gender' => 'M', 'country' => 'USA', 'half' => 6700, 'finish' => 14280, 'splits' => [1630, 1650, 1675, 1710, 1780, 1860, 1950, 2060]],
['name' => 'Tom Williams', 'age' => 55, 'gender' => 'M', 'country' => 'USA', 'half' => 6850, 'finish' => 14640, 'splits' => [1665, 1690, 1720, 1760, 1840, 1930, 2030, 2150]],
['name' => 'Richard Clark', 'age' => 50, 'gender' => 'M', 'country' => 'GBR', 'half' => 6950, 'finish' => 14940, 'splits' => [1695, 1720, 1750, 1795, 1880, 1975, 2085, 2210]],
['name' => 'Hans Weber', 'age' => 58, 'gender' => 'M', 'country' => 'GER', 'half' => 7100, 'finish' => 15300, 'splits' => [1730, 1760, 1795, 1845, 1940, 2045, 2165, 2300]],
['name' => 'James Wilson', 'age' => 53, 'gender' => 'M', 'country' => 'USA', 'half' => 7200, 'finish' => 15540, 'splits' => [1755, 1785, 1825, 1880, 1980, 2090, 2215, 2360]],
['name' => 'Paul Martin', 'age' => 60, 'gender' => 'M', 'country' => 'USA', 'half' => 7400, 'finish' => 16020, 'splits' => [1800, 1840, 1885, 1945, 2055, 2175, 2310, 2470]],
['name' => 'George Baker', 'age' => 62, 'gender' => 'M', 'country' => 'USA', 'half' => 7600, 'finish' => 16500, 'splits' => [1850, 1895, 1945, 2010, 2130, 2260, 2410, 2590]],
['name' => 'Frank Harris', 'age' => 64, 'gender' => 'M', 'country' => 'CAN', 'half' => 7900, 'finish' => 17280, 'splits' => [1920, 1975, 2035, 2115, 2250, 2400, 2570, 2770]],
// --- Fast women ---
['name' => 'Sarah Kimutai', 'age' => 27, 'gender' => 'F', 'country' => 'KEN', 'half' => 5250, 'finish' => 10800, 'splits' => [1280, 1285, 1290, 1300, 1320, 1345, 1375, 1410]],
['name' => 'Emma Johansson', 'age' => 30, 'gender' => 'F', 'country' => 'SWE', 'half' => 5380, 'finish' => 11100, 'splits' => [1310, 1320, 1330, 1340, 1365, 1390, 1425, 1465]],
['name' => 'Lisa Zhang', 'age' => 25, 'gender' => 'F', 'country' => 'CHN', 'half' => 5480, 'finish' => 11340, 'splits' => [1335, 1345, 1355, 1370, 1395, 1425, 1465, 1510]],
['name' => 'Anna Petrov', 'age' => 29, 'gender' => 'F', 'country' => 'RUS', 'half' => 5560, 'finish' => 11520, 'splits' => [1355, 1365, 1375, 1390, 1420, 1455, 1495, 1545]],
['name' => 'Maria Santos', 'age' => 32, 'gender' => 'F', 'country' => 'BRA', 'half' => 5650, 'finish' => 11700, 'splits' => [1375, 1385, 1395, 1415, 1445, 1480, 1525, 1580]],
// --- Mid-pack women ---
['name' => 'Jennifer Adams', 'age' => 35, 'gender' => 'F', 'country' => 'USA', 'half' => 5850, 'finish' => 12180, 'splits' => [1425, 1435, 1450, 1470, 1510, 1555, 1610, 1675]],
['name' => 'Rachel Green', 'age' => 38, 'gender' => 'F', 'country' => 'USA', 'half' => 6050, 'finish' => 12660, 'splits' => [1475, 1490, 1510, 1535, 1585, 1640, 1710, 1790]],
['name' => 'Sophie Laurent', 'age' => 33, 'gender' => 'F', 'country' => 'FRA', 'half' => 6200, 'finish' => 13020, 'splits' => [1515, 1530, 1550, 1580, 1635, 1700, 1775, 1865]],
['name' => 'Emily Watson', 'age' => 40, 'gender' => 'F', 'country' => 'USA', 'half' => 6350, 'finish' => 13380, 'splits' => [1550, 1570, 1590, 1625, 1685, 1755, 1840, 1940]],
['name' => 'Amy Chen', 'age' => 36, 'gender' => 'F', 'country' => 'USA', 'half' => 6480, 'finish' => 13680, 'splits' => [1585, 1605, 1625, 1665, 1730, 1805, 1895, 2000]],
['name' => 'Kate Murphy', 'age' => 42, 'gender' => 'F', 'country' => 'IRL', 'half' => 6600, 'finish' => 13980, 'splits' => [1615, 1635, 1660, 1700, 1775, 1860, 1955, 2070]],
['name' => 'Michelle Lee', 'age' => 37, 'gender' => 'F', 'country' => 'USA', 'half' => 6720, 'finish' => 14280, 'splits' => [1645, 1665, 1695, 1740, 1820, 1910, 2015, 2140]],
['name' => 'Olivia Garcia', 'age' => 44, 'gender' => 'F', 'country' => 'USA', 'half' => 6850, 'finish' => 14580, 'splits' => [1675, 1700, 1730, 1780, 1870, 1965, 2080, 2210]],
['name' => 'Laura Schmidt', 'age' => 41, 'gender' => 'F', 'country' => 'GER', 'half' => 6950, 'finish' => 14820, 'splits' => [1700, 1725, 1760, 1810, 1910, 2015, 2135, 2275]],
['name' => 'Hannah Kim', 'age' => 39, 'gender' => 'F', 'country' => 'USA', 'half' => 7050, 'finish' => 15060, 'splits' => [1725, 1750, 1790, 1845, 1950, 2060, 2190, 2340]],
// --- Slow women ---
['name' => 'Diane Cooper', 'age' => 50, 'gender' => 'F', 'country' => 'USA', 'half' => 7250, 'finish' => 15480, 'splits' => [1770, 1800, 1845, 1905, 2015, 2140, 2280, 2440]],
['name' => 'Nancy Taylor', 'age' => 53, 'gender' => 'F', 'country' => 'USA', 'half' => 7450, 'finish' => 15960, 'splits' => [1820, 1855, 1905, 1970, 2095, 2230, 2385, 2560]],
['name' => 'Barbara White', 'age' => 48, 'gender' => 'F', 'country' => 'USA', 'half' => 7600, 'finish' => 16320, 'splits' => [1860, 1900, 1955, 2030, 2160, 2310, 2475, 2670]],
['name' => 'Susan Hall', 'age' => 56, 'gender' => 'F', 'country' => 'CAN', 'half' => 7850, 'finish' => 16860, 'splits' => [1915, 1960, 2020, 2105, 2250, 2410, 2595, 2810]],
['name' => 'Patricia Evans', 'age' => 58, 'gender' => 'F', 'country' => 'USA', 'half' => 8050, 'finish' => 17340, 'splits' => [1965, 2015, 2085, 2175, 2340, 2520, 2720, 2950]],
['name' => 'Carol Robinson', 'age' => 61, 'gender' => 'F', 'country' => 'USA', 'half' => 8300, 'finish' => 17940, 'splits' => [2025, 2085, 2160, 2260, 2445, 2645, 2865, 3120]],
// --- Additional men for sample size ---
['name' => 'Steve Campbell', 'age' => 47, 'gender' => 'M', 'country' => 'USA', 'half' => 6650, 'finish' => 14100, 'splits' => [1620, 1640, 1665, 1705, 1775, 1855, 1950, 2065]],
['name' => 'Mark Phillips', 'age' => 36, 'gender' => 'M', 'country' => 'USA', 'half' => 5480, 'finish' => 11380, 'splits' => [1335, 1345, 1355, 1370, 1400, 1430, 1470, 1520]],
['name' => 'Jason Reed', 'age' => 33, 'gender' => 'M', 'country' => 'USA', 'half' => 5250, 'finish' => 10860, 'splits' => [1280, 1290, 1300, 1315, 1340, 1370, 1405, 1450]],
['name' => 'Alex Turner', 'age' => 28, 'gender' => 'M', 'country' => 'GBR', 'half' => 5150, 'finish' => 10620, 'splits' => [1255, 1260, 1270, 1285, 1310, 1340, 1375, 1415]],
['name' => 'Nick Peterson', 'age' => 50, 'gender' => 'M', 'country' => 'USA', 'half' => 6900, 'finish' => 14760, 'splits' => [1685, 1710, 1740, 1790, 1880, 1975, 2085, 2215]],
['name' => 'Derek Hughes', 'age' => 42, 'gender' => 'M', 'country' => 'AUS', 'half' => 6250, 'finish' => 13140, 'splits' => [1525, 1540, 1560, 1590, 1650, 1720, 1800, 1890]],
['name' => 'Tim Wright', 'age' => 56, 'gender' => 'M', 'country' => 'USA', 'half' => 7350, 'finish' => 15900, 'splits' => [1795, 1830, 1875, 1935, 2055, 2185, 2335, 2510]],
['name' => 'Scott Mitchell', 'age' => 39, 'gender' => 'M', 'country' => 'USA', 'half' => 5700, 'finish' => 11820, 'splits' => [1390, 1400, 1410, 1425, 1460, 1500, 1545, 1600]],
];
// =====================================================================
// Extract common arrays using Arr utility
// =====================================================================
[$finishTimes, $ages] = Arr::extract($runners, ['finish', 'age']);
[$menRunners, $womenRunners] = Arr::partition($runners, 'gender', '==', 'M');
[$menTimes] = Arr::extract($menRunners, ['finish']);
[$womenTimes] = Arr::extract($womenRunners, ['finish']);
// =====================================================================
// Step 1: The Data & Descriptive Statistics
// =====================================================================
echo "=== Step 1: The Data & Descriptive Statistics ===" . PHP_EOL;
echo "\"What does a typical Boston Marathon finish look like?\"" . PHP_EOL . PHP_EOL;
$mean = Stat::mean($finishTimes);
$median = Stat::median($finishTimes);
$stdev = Stat::stdev($finishTimes);
$quartiles = Stat::quantiles($finishTimes);
echo "Sample size: " . count($runners) . " runners (" . count($menTimes) . " men, " . count($womenTimes) . " women)" . PHP_EOL;
echo "Mean finish: " . Format::secondsToTime($mean) . " (" . round($mean) . "s)" . PHP_EOL;
echo 'Median finish: ' . Format::secondsToTime($median) . " (" . round($median) . "s)" . PHP_EOL;
echo 'Std deviation: ' . Format::secondsToTime($stdev) . " (" . round($stdev) . "s)" . PHP_EOL;
echo "Min: " . Format::secondsToTime(min($finishTimes)) . " | Max: " . Format::secondsToTime(max($finishTimes)) . PHP_EOL;
echo "Quartiles: Q1=" . Format::secondsToTime($quartiles[0])
. " Q2=" . Format::secondsToTime($quartiles[1])
. " Q3=" . Format::secondsToTime($quartiles[2]) . PHP_EOL;
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- If the mean is higher than the median, the distribution is right-skewed." . PHP_EOL;
echo "- Compare the full range (min-max) to the interquartile range (Q1-Q3 = "
. Format::secondsToTime($quartiles[2] - $quartiles[0]) . ") to see how spread the middle 50% is." . PHP_EOL;
echo "- A large standard deviation relative to the mean reflects wide diversity in the field." . PHP_EOL;
// =====================================================================
// Step 2: Men vs Women — Two-Sample T-Test
// =====================================================================
echo PHP_EOL . "=== Step 2: Men vs Women — Two-Sample T-Test ===" . PHP_EOL;
echo "\"Are men statistically faster, or could the difference be random?\"" . PHP_EOL . PHP_EOL;
echo "Men: n=" . count($menTimes) . ", mean=" . Format::secondsToTime(Stat::mean($menTimes))
. " (" . round(Stat::mean($menTimes)) . "s)" . PHP_EOL;
echo "Women: n=" . count($womenTimes) . ", mean=" . Format::secondsToTime(Stat::mean($womenTimes))
. " (" . round(Stat::mean($womenTimes)) . "s)" . PHP_EOL;
echo "Difference: " . Format::secondsToTime(Stat::mean($womenTimes) - Stat::mean($menTimes))
. " (" . round(Stat::mean($womenTimes) - Stat::mean($menTimes)) . "s)" . PHP_EOL;
echo PHP_EOL;
$tTest2 = Stat::tTestTwoSample($menTimes, $womenTimes);
echo "Two-sample t-test results:" . PHP_EOL;
echo " t-statistic: " . round($tTest2['tStatistic'], 4) . PHP_EOL;
echo ' Degrees of freedom: ' . round($tTest2['degreesOfFreedom'], 1) . PHP_EOL;
echo " p-value: " . round($tTest2['pValue'], 6) . PHP_EOL;
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- If p-value < 0.05, the difference is statistically significant (unlikely due to chance)." . PHP_EOL;
echo "- The t-statistic measures the gap relative to within-group variation; further from zero = stronger evidence." . PHP_EOL;
echo "- Degrees of freedom are adjusted for unequal variances and sample sizes (Welch-Satterthwaite approximation)." . PHP_EOL;
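// Sketch, not library code: Welch's t-statistic and the Welch-Satterthwaite degrees of
// freedom computed by hand, to make the interpretation lines above concrete.
// Assumptions: Stat::tTestTwoSample() follows Welch's formulation (as the text states),
// and $welchByHand is a hypothetical helper introduced only for this illustration.
$welchByHand = static function (array $a, array $b): array {
    $n1 = count($a);
    $n2 = count($b);
    $m1 = array_sum($a) / $n1;
    $m2 = array_sum($b) / $n2;
    // Sample variances (n - 1 in the denominator)
    $v1 = array_sum(array_map(static fn($x) => ($x - $m1) ** 2, $a)) / ($n1 - 1);
    $v2 = array_sum(array_map(static fn($x) => ($x - $m2) ** 2, $b)) / ($n2 - 1);
    // Squared standard error contributed by each group
    $s1 = $v1 / $n1;
    $s2 = $v2 / $n2;
    return [
        't' => ($m1 - $m2) / sqrt($s1 + $s2),
        'df' => ($s1 + $s2) ** 2 / ($s1 ** 2 / ($n1 - 1) + $s2 ** 2 / ($n2 - 1)),
    ];
};
// Usage: $welchByHand($menTimes, $womenTimes) should land close to the library result
// if the assumption above holds.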
// =====================================================================
// Step 3: Pacing Strategy — Paired T-Test
// =====================================================================
echo PHP_EOL . "=== Step 3: Pacing Strategy — Paired T-Test ===" . PHP_EOL;
echo "\"Do runners slow down in the second half? (positive split analysis)\"" . PHP_EOL . PHP_EOL;
$firstHalf = array_column($runners, 'half');
$secondHalf = [];
foreach ($runners as $r) {
$secondHalf[] = $r['finish'] - $r['half'];
}
$meanFirst = Stat::mean($firstHalf);
$meanSecond = Stat::mean($secondHalf);
echo "Mean first half: " . Format::secondsToTime($meanFirst) . " (" . round($meanFirst) . "s)" . PHP_EOL;
echo "Mean second half: " . Format::secondsToTime($meanSecond) . " (" . round($meanSecond) . "s)" . PHP_EOL;
echo "Avg slowdown: " . Format::secondsToTime($meanSecond - $meanFirst)
. " (" . round($meanSecond - $meanFirst) . "s)" . PHP_EOL;
echo PHP_EOL;
$tTestPaired = Stat::tTestPaired($firstHalf, $secondHalf);
echo "Paired t-test results:" . PHP_EOL;
echo " t-statistic: " . round($tTestPaired['tStatistic'], 4) . PHP_EOL;
echo ' Degrees of freedom: ' . $tTestPaired['degreesOfFreedom'] . PHP_EOL;
echo " p-value: " . round($tTestPaired['pValue'], 6) . PHP_EOL;
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- If the mean second half > mean first half, runners slow down on average." . PHP_EOL;
echo "- A negative t-statistic confirms the first half is faster. The more negative, the stronger the evidence." . PHP_EOL;
echo "- If p-value is near zero, the slowdown is overwhelmingly significant." . PHP_EOL;
echo "- The paired test removes between-runner variability, making it very sensitive to systematic differences." . PHP_EOL;
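// Sketch, not library code: the paired t-statistic is a one-sample t-test on the
// per-runner differences, which is why between-runner variability drops out.
// Assumptions: differences are taken as first minus second (matching the sign
// convention in the interpretation above); $pairedByHand is a hypothetical helper.
$pairedByHand = static function (array $first, array $second): array {
    $n = count($first);
    $diffs = [];
    foreach ($first as $i => $value) {
        $diffs[] = $value - $second[$i];
    }
    $meanDiff = array_sum($diffs) / $n;
    // Sample standard deviation of the differences
    $sdDiff = sqrt(array_sum(array_map(static fn($d) => ($d - $meanDiff) ** 2, $diffs)) / ($n - 1));
    return ['t' => $meanDiff / ($sdDiff / sqrt($n)), 'df' => $n - 1];
};
// Usage: $pairedByHand($firstHalf, $secondHalf) gives a negative t when first halves are faster.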
// =====================================================================
// Step 4: Does Age Affect Finish Time?
// =====================================================================
echo PHP_EOL . "=== Step 4: Does Age Affect Finish Time? ===" . PHP_EOL;
echo "\"How many minutes per year of age does the marathon cost you?\"" . PHP_EOL . PHP_EOL;
$pearson = Stat::correlation($ages, $finishTimes);
$spearman = Stat::correlation($ages, $finishTimes, 'ranked');
$regression = Stat::linearRegression($ages, $finishTimes);
$r2 = Stat::rSquared($ages, $finishTimes, false, 4);
echo "Pearson correlation: " . round($pearson, 4) . PHP_EOL;
echo "Spearman correlation: " . round($spearman, 4) . PHP_EOL;
echo PHP_EOL;
echo "Linear regression: finish = " . round($regression[0], 1) . " × age + " . round($regression[1]) . PHP_EOL;
echo "R-squared: " . $r2 . PHP_EOL;
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- Pearson and Spearman close to +1 = strong positive relationship (older = slower)." . PHP_EOL;
echo "- If both correlations are similar, the relationship is roughly linear, not merely monotonic." . PHP_EOL;
echo "- The slope tells you seconds added per year of age. Divide by 60 for minutes." . PHP_EOL;
echo "- R-squared tells you what fraction of variation age explains (0 = none, 1 = all)." . PHP_EOL;
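// Sketch, not library code: ordinary least squares by hand, showing where the slope,
// intercept and R-squared come from. For a simple one-predictor regression,
// R-squared equals the squared Pearson correlation.
// $olsByHand is a hypothetical helper introduced only for this illustration.
$olsByHand = static function (array $x, array $y): array {
    $n = count($x);
    $mx = array_sum($x) / $n;
    $my = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    $syy = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $mx) * ($y[$i] - $my);
        $sxx += ($xi - $mx) ** 2;
        $syy += ($y[$i] - $my) ** 2;
    }
    $slope = $sxy / $sxx;
    return [
        'slope' => $slope,
        'intercept' => $my - $slope * $mx,
        'r2' => $sxy ** 2 / ($sxx * $syy),
    ];
};
// Usage: $olsByHand($ages, $finishTimes)['slope'] / 60 expresses the slope in minutes per year of age.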
// =====================================================================
// Step 5: Consistency — Who Paces Best?
// =====================================================================
echo PHP_EOL . "=== Step 5: Consistency — Who Paces Best? ===" . PHP_EOL;
echo "\"Do fast runners pace more evenly than slow runners?\"" . PHP_EOL . PHP_EOL;
$medianFinish = Stat::median($finishTimes);
$fastCV = [];
$slowCV = [];
foreach ($runners as $r) {
$cv = Stat::coefficientOfVariation($r['splits'], 2);
if ($r['finish'] <= $medianFinish) {
$fastCV[] = $cv;
} else {
$slowCV[] = $cv;
}
}
echo "Pacing consistency (CV of 5K splits):" . PHP_EOL;
echo " Fast group (at or below median): mean CV = " . round(Stat::mean($fastCV), 2) . "%" . PHP_EOL;
echo " Slow group (above median): mean CV = " . round(Stat::mean($slowCV), 2) . "%" . PHP_EOL;
echo PHP_EOL;
$tTestCV = Stat::tTestTwoSample($fastCV, $slowCV);
echo "Two-sample t-test on CV:" . PHP_EOL;
echo " t-statistic: " . round($tTestCV['tStatistic'], 4) . PHP_EOL;
echo " p-value: " . round($tTestCV['pValue'], 6) . PHP_EOL;
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- If the slow group's mean CV is higher, slower runners pace less consistently." . PHP_EOL;
echo "- If p-value < 0.05, the difference in pacing consistency is statistically significant." . PHP_EOL;
echo "- A low CV = even pacing; a high CV = the runner faded or surged during the race." . PHP_EOL;
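// Sketch, not library code: the coefficient of variation is the standard deviation
// expressed as a percentage of the mean, which makes pacing spread comparable
// between fast and slow runners.
// Assumption: the sample standard deviation (n - 1) is used here; the library may
// differ in that detail. $cvByHand is a hypothetical helper.
$cvByHand = static function (array $xs): float {
    $n = count($xs);
    $mean = array_sum($xs) / $n;
    $sd = sqrt(array_sum(array_map(static fn($x) => ($x - $mean) ** 2, $xs)) / ($n - 1));
    return $sd / $mean * 100;
};
// Usage: $cvByHand($runners[0]['splits']) gives one runner's pacing CV in percent.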
// =====================================================================
// Step 6: The Finish Time Distribution
// =====================================================================
echo PHP_EOL . "=== Step 6: The Finish Time Distribution ===" . PHP_EOL;
echo "\"Is marathon finish time normally distributed?\"" . PHP_EOL . PHP_EOL;
$skewness = Stat::skewness($finishTimes, 4);
$kurtosis = Stat::kurtosis($finishTimes, 4);
echo "Skewness: " . $skewness . PHP_EOL;
echo " (positive = right-skewed, a long tail of slower finishers)" . PHP_EOL;
echo "Kurtosis: " . $kurtosis . PHP_EOL;
echo " (excess kurtosis — 0 is normal; positive = heavier tails)" . PHP_EOL;
echo PHP_EOL;
$normal = NormalDist::fromSamples($finishTimes);
echo "Normal model: mu = " . Format::secondsToTime($normal->getMeanRounded(0))
. ", sigma = " . Format::secondsToTime((int) round($normal->getSigmaRounded(0))) . PHP_EOL;
echo PHP_EOL;
// Compare model vs actual in ranges
$ranges = [
['label' => 'Under 3:00:00', 'max' => 10800],
['label' => '3:00-3:30', 'max' => 12600],
['label' => '3:30-4:00', 'max' => 14400],
['label' => '4:00-4:30', 'max' => 16200],
['label' => 'Over 4:30', 'max' => PHP_INT_MAX],
];
echo str_pad("Range", 16) . str_pad("Actual", 10) . "Model" . PHP_EOL;
echo str_repeat("-", 36) . PHP_EOL;
$prevMax = 0;
foreach ($ranges as $range) {
$actualCount = count(array_filter($finishTimes, fn($t): bool => $t > $prevMax && $t <= $range['max']));
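// Open-ended last bucket: use 1.0 as the upper CDF value rather than truncating at an arbitrary cap.
$upperCdf = $range['max'] === PHP_INT_MAX ? 1.0 : $normal->cdf($range['max']);
$modelProb = $upperCdf - $normal->cdf($prevMax);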
$modelCount = round($modelProb * count($finishTimes), 1);
echo str_pad($range['label'], 16)
. str_pad((string) $actualCount, 10)
. round($modelCount, 1)
. PHP_EOL;
$prevMax = $range['max'];
}
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- Positive skewness = right-skewed (long tail of slower finishers)." . PHP_EOL;
echo "- Negative excess kurtosis = lighter tails than a normal distribution." . PHP_EOL;
echo "- Compare Actual vs Model columns: where they diverge, the normal assumption breaks down." . PHP_EOL;
// =====================================================================
// Step 7: Finding the Outliers
// =====================================================================
echo PHP_EOL . "=== Step 7: Finding the Outliers ===" . PHP_EOL;
echo "\"Who had an unusually fast (or slow) day?\"" . PHP_EOL . PHP_EOL;
// Z-score method
echo "Method 1: Z-score based (threshold = 2.0)" . PHP_EOL;
$zscoreOutliers = Stat::outliers($finishTimes, 2.0);
if ($zscoreOutliers === []) {
echo " No outliers detected." . PHP_EOL;
} else {
foreach ($zscoreOutliers as $time) {
$name = '';
foreach ($runners as $r) {
if ($r['finish'] === $time) {
$name = $r['name'];
break;
}
}
echo " " . Format::secondsToTime($time) . " — " . $name . PHP_EOL;
}
}
// IQR method
echo PHP_EOL . "Method 2: IQR based (factor = 1.5)" . PHP_EOL;
$iqrOutliers = Stat::iqrOutliers($finishTimes);
if ($iqrOutliers === []) {
echo " No outliers detected." . PHP_EOL;
} else {
foreach ($iqrOutliers as $time) {
$name = '';
foreach ($runners as $r) {
if ($r['finish'] === $time) {
$name = $r['name'];
break;
}
}
echo " " . Format::secondsToTime($time) . " — " . $name . PHP_EOL;
}
}
// Individual z-scores for notable runners
echo PHP_EOL . "Z-scores for selected runners:" . PHP_EOL;
$zscores = Stat::zscores($finishTimes, 2);
// Pair each runner with their z-score and sort by finish time
$runnerZscores = [];
foreach ($runners as $i => $r) {
$runnerZscores[] = ['name' => $r['name'], 'finish' => $r['finish'], 'z' => $zscores[$i]];
}
usort($runnerZscores, fn(array $a, array $b): int => $a['finish'] <=> $b['finish']);
// Show 3 fastest + 3 slowest
$notableRunners = array_merge(
array_slice($runnerZscores, 0, 3),
array_slice($runnerZscores, -3),
);
echo str_pad("Runner", 22) . str_pad("Time", 12) . "Z-score" . PHP_EOL;
echo str_repeat("-", 45) . PHP_EOL;
foreach ($notableRunners as $rz) {
$zFormatted = ($rz['z'] >= 0 ? "+" : "") . number_format($rz['z'], 2);
echo str_pad($rz['name'], 22)
. str_pad(Format::secondsToTime($rz['finish']), 12)
. $zFormatted
. PHP_EOL;
}
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- Negative z-scores = faster than average; positive = slower." . PHP_EOL;
echo "- Z-scores beyond +/-2 are unusual; beyond +/-3 are very rare." . PHP_EOL;
echo "- The IQR method is more robust for skewed data (doesn't assume symmetry)." . PHP_EOL;
echo "- The z-score method can miss outliers because outliers inflate the standard deviation." . PHP_EOL;
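// Sketch, not library code: z-scores by hand, illustrating the masking effect
// described above. In a tiny sample one extreme value inflates the standard
// deviation so much that its own z-score stays under the 2.0 threshold.
// Assumption: sample standard deviation (n - 1); $zByHand is a hypothetical helper.
$zByHand = static function (array $xs): array {
    $n = count($xs);
    $mean = array_sum($xs) / $n;
    $sd = sqrt(array_sum(array_map(static fn($x) => ($x - $mean) ** 2, $xs)) / ($n - 1));
    return array_map(static fn($x) => ($x - $mean) / $sd, $xs);
};
// Usage: in $zByHand([10, 10, 10, 10, 100]) the value 100 scores below 2,
// so a z-score threshold of 2.0 would not flag it.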
// =====================================================================
// Step 8: Confidence Intervals
// =====================================================================
echo PHP_EOL . "=== Step 8: Confidence Intervals ===" . PHP_EOL;
echo "\"How precisely do we know the average finish time?\"" . PHP_EOL . PHP_EOL;
$ciAll = Stat::confidenceInterval($finishTimes, 0.95, 0);
$ciMen = Stat::confidenceInterval($menTimes, 0.95, 0);
$ciWomen = Stat::confidenceInterval($womenTimes, 0.95, 0);
$semAll = Stat::sem($finishTimes, 0);
$semMen = Stat::sem($menTimes, 0);
$semWomen = Stat::sem($womenTimes, 0);
echo "95% Confidence Intervals:" . PHP_EOL;
echo " All runners: " . Format::secondsToTime($ciAll[0]) . " to " . Format::secondsToTime($ciAll[1])
. " (SEM: " . $semAll . "s)" . PHP_EOL;
echo " Men: " . Format::secondsToTime($ciMen[0]) . " to " . Format::secondsToTime($ciMen[1])
. " (SEM: " . $semMen . "s)" . PHP_EOL;
echo " Women: " . Format::secondsToTime($ciWomen[0]) . " to " . Format::secondsToTime($ciWomen[1])
. " (SEM: " . $semWomen . "s)" . PHP_EOL;
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- The interval gives you a range: we are 95% confident the true mean falls within it." . PHP_EOL;
echo "- Smaller samples produce wider intervals (more uncertainty)." . PHP_EOL;
echo "- SEM = stdev / sqrt(n) — as sample size grows, SEM shrinks and the interval tightens." . PHP_EOL;
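// Sketch, not library code: SEM and a normal-approximation confidence interval by hand.
// Assumptions: a z critical value of 1.96 is used for 95%; Stat::confidenceInterval()
// may use a different critical value, so small differences are expected.
// $semByHand is a hypothetical helper introduced only for this illustration.
$semByHand = static function (array $xs, float $z = 1.96): array {
    $n = count($xs);
    $mean = array_sum($xs) / $n;
    $sd = sqrt(array_sum(array_map(static fn($x) => ($x - $mean) ** 2, $xs)) / ($n - 1));
    $sem = $sd / sqrt($n);
    return ['sem' => $sem, 'lower' => $mean - $z * $sem, 'upper' => $mean + $z * $sem];
};
// Usage: $semByHand($womenTimes) shows a wider interval than $semByHand($finishTimes)
// when the women's sample is smaller and similarly spread.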
// =====================================================================
// Step 9: Percentile Benchmarks
// =====================================================================
echo PHP_EOL . "=== Step 9: Percentile Benchmarks ===" . PHP_EOL;
echo "\"What time do you need to beat 75% of the field?\"" . PHP_EOL . PHP_EOL;
echo "Percentile benchmarks:" . PHP_EOL;
$percentiles = [10, 25, 50, 75, 90];
foreach ($percentiles as $p) {
$val = Stat::percentile($finishTimes, $p, 0);
echo " P" . str_pad((string) $p, 3) . ": " . Format::secondsToTime($val) . " (" . $val . "s)" . PHP_EOL;
}
echo PHP_EOL;
$trimmed10 = Stat::trimmedMean($finishTimes, 0.1, 0);
$trimmed20 = Stat::trimmedMean($finishTimes, 0.2, 0);
echo "Trimmed means (removing extreme runners):" . PHP_EOL;
echo " Regular mean: " . Format::secondsToTime(round($mean)) . PHP_EOL;
echo " Trimmed mean (10%): " . Format::secondsToTime($trimmed10) . PHP_EOL;
echo " Trimmed mean (20%): " . Format::secondsToTime($trimmed20) . PHP_EOL;
echo PHP_EOL;
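// Weighted median: weight by inverse placement (top finishers weighted more).
// $runners is grouped by category, not sorted by time, so rank the finish times first.
$n = count($finishTimes);
$ranked = array_values($finishTimes);
asort($ranked); // fastest first; original positions are kept as keys
$weights = array_fill(0, $n, 0);
$rank = 0;
foreach (array_keys($ranked) as $i) {
    $weights[$i] = $n - $rank; // fastest finisher gets weight n, slowest gets 1
    $rank++;
}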
$wMedian = Stat::weightedMedian($finishTimes, $weights, 0);
echo "Weighted median (top finishers weighted more): " . Format::secondsToTime($wMedian) . PHP_EOL;
echo "Regular median: " . Format::secondsToTime(round(Stat::median($finishTimes))) . PHP_EOL;
echo PHP_EOL;
echo "How to interpret:" . PHP_EOL;
echo "- P25 is the cutoff to beat 75% of the field." . PHP_EOL;
echo "- If trimmed means get closer to the median, it confirms right skew (slow outliers pull the mean up)." . PHP_EOL;
echo "- If the weighted median is faster than the regular median, the weighting emphasizes the competitive core." . PHP_EOL;
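// Sketch, not library code: a trimmed mean by hand. Sort the values, drop the same
// number of observations from each end, then average what is left.
// Assumption: floor(n * cut) values are dropped per side; Stat::trimmedMean() may
// round differently. $trimmedByHand is a hypothetical helper.
$trimmedByHand = static function (array $xs, float $cut): float {
    sort($xs);
    $k = (int) floor(count($xs) * $cut);
    $kept = array_slice($xs, $k, count($xs) - 2 * $k);
    return array_sum($kept) / count($kept);
};
// Usage: $trimmedByHand($finishTimes, 0.1) drops the fastest and slowest 10% before averaging.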
// =====================================================================
// Step 10: Summary & Functions Used
// =====================================================================
echo PHP_EOL . str_repeat("=", 60) . PHP_EOL;
echo "SUMMARY: FUNCTIONS DEMONSTRATED" . PHP_EOL;
echo str_repeat("=", 60) . PHP_EOL . PHP_EOL;
echo "Functions demonstrated (23):" . PHP_EOL;
echo str_pad(" Function", 38) . "Step" . PHP_EOL;
echo " " . str_repeat("-", 40) . PHP_EOL;
$functions = [
['Stat::mean()', '1,2,3,5'],
['Stat::median()', '1,5,9'],
['Stat::stdev()', '1'],
['Stat::quantiles()', '1'],
['Stat::tTestTwoSample()', '2,5'],
['Stat::tTestPaired()', '3'],
['Stat::correlation() — Pearson', '4'],
['Stat::correlation() — Spearman', '4'],
['Stat::linearRegression()', '4'],
['Stat::rSquared()', '4'],
['Stat::coefficientOfVariation()', '5'],
['Stat::skewness()', '6'],
['Stat::kurtosis()', '6'],
['NormalDist::fromSamples()', '6'],
['NormalDist::cdf()', '6'],
['Stat::outliers()', '7'],
['Stat::iqrOutliers()', '7'],
['Stat::zscores()', '7'],
['Stat::confidenceInterval()', '8'],
['Stat::sem()', '8'],
['Stat::percentile()', '9'],
['Stat::trimmedMean()', '9'],
['Stat::weightedMedian()', '9'],
];
foreach ($functions as $f) {
echo " " . str_pad($f[0], 36) . $f[1] . PHP_EOL;
}
================================================
FILE: examples/article-downhill-ski-analysis.php
================================================
<?php
/**
* Exploring Olympic Downhill Results with PHP Statistics
*
* This script accompanies the article:
* https://dev.to/robertobutti/exploring-olympic-downhill-results-with-php-statistics-3eo1
*
* Each section below corresponds to a step in the article.
* Run it with: php examples/article-downhill-ski-analysis.php
*/
require __DIR__ . "/../vendor/autoload.php";
use HiFolks\Statistics\Freq;
use HiFolks\Statistics\NormalDist;
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\StreamingStat;
// === The Data ===
// 2026 Olympic Men's Downhill — 34 athletes, times in seconds.
$results = [
["name" => "Franjo von ALLMEN", "time" => 111.61],
["name" => "Giovanni FRANZONI", "time" => 111.81],
["name" => "Dominik PARIS", "time" => 112.11],
["name" => "Marco ODERMATT", "time" => 112.31],
["name" => "Alexis MONNEY", "time" => 112.36],
["name" => "Vincent KRIECHMAYR", "time" => 112.38],
["name" => "Daniel HEMETSBERGER", "time" => 112.58],
["name" => "Nils ALLEGRE", "time" => 112.8],
["name" => "James CRAWFORD", "time" => 113.0],
["name" => "Kyle NEGOMIR", "time" => 113.2],
["name" => "Mattia CASSE", "time" => 113.28],
["name" => "Miha HROBAT", "time" => 113.3],
["name" => "Bryce BENNETT", "time" => 113.45],
["name" => "Cameron ALEXANDER", "time" => 113.49],
["name" => "Raphael HAASER", "time" => 113.5],
["name" => "Martin CATER", "time" => 113.51],
["name" => "Florian SCHIEDER", "time" => 113.57],
["name" => "Ryan COCHRAN-SIEGLE", "time" => 113.63],
["name" => "Sam MORSE", "time" => 113.68],
["name" => "Elian LEHTO", "time" => 113.83],
["name" => "Simon JOCHER", "time" => 114.01],
["name" => "Nils ALPHAND", "time" => 114.06],
["name" => "Stefan ROGENTIN", "time" => 114.18],
["name" => "Jan ZABYSTRAN", "time" => 114.39],
["name" => "Jeffrey READ", "time" => 114.56],
["name" => "Stefan BABINSKY", "time" => 114.73],
["name" => "Alban ELEZI CANNAFERINA", "time" => 114.9],
["name" => "Brodie SEGER", "time" => 114.96],
["name" => "Marco PFIFFNER", "time" => 115.66],
["name" => "Barnabas SZOLLOS", "time" => 117.03],
["name" => "Arnaud ALESSANDRIA", "time" => 117.15],
["name" => "Elvis OPMANIS", "time" => 119.24],
["name" => "Dmytro SHEPIUK", "time" => 120.11],
["name" => "Cormac COMERFORD", "time" => 124.4],
];
$times = array_column($results, "time");
// =====================================================================
// Step 1: Descriptive Statistics
// =====================================================================
echo "=== Step 1: Descriptive Statistics ===" . PHP_EOL . PHP_EOL;
$mean = Stat::mean($times);
$median = Stat::median($times);
$std = Stat::stdev($times);
$min = min($times);
$max = max($times);
$range = $max - $min;
$quartiles = Stat::quantiles($times);
echo "Sample size: " . count($times) . PHP_EOL;
echo "Mean time: " . round($mean, 2) . " seconds" . PHP_EOL;
echo "Median time: " . round($median, 2) . " seconds" . PHP_EOL;
echo "Std dev: " . round($std, 2) . " seconds" . PHP_EOL;
echo "Min: " . $min . "s | Max: " . $max . "s | Range: " . round($range, 2) . "s" . PHP_EOL;
echo "Quartiles (Q1, Q2, Q3): "
. round($quartiles[0], 2) . "s, "
. round($quartiles[1], 2) . "s, "
. round($quartiles[2], 2) . "s"
. PHP_EOL;
echo PHP_EOL;
echo "Observations:" . PHP_EOL;
echo "- The mean (114.38) is higher than the median (113.60) — right skew." . PHP_EOL;
echo "- The range (12.79s) is large relative to the std dev (2.60s)." . PHP_EOL;
echo "- Q1 to Q3 spans only ~1.82s, so the middle 50% is tightly packed." . PHP_EOL;
// =====================================================================
// Step 1b: Robust Central Tendency
// =====================================================================
echo PHP_EOL . "=== Step 1b: Robust Central Tendency ===" . PHP_EOL . PHP_EOL;
$trimmedMean10 = Stat::trimmedMean($times, 0.1, 2);
$trimmedMean20 = Stat::trimmedMean($times, 0.2, 2);
echo "Regular mean: " . round(Stat::mean($times), 2) . "s" . PHP_EOL;
echo "Trimmed mean (10%): " . $trimmedMean10 . "s" . PHP_EOL;
echo "Trimmed mean (20%): " . $trimmedMean20 . "s" . PHP_EOL;
echo PHP_EOL;
echo "The trimmed mean removes extreme values from each end." . PHP_EOL;
echo "With 10% cut, the 3 fastest and 3 slowest are excluded." . PHP_EOL;
echo "Result: the 'typical' time drops from 114.38s to 113.91s." . PHP_EOL;
// =====================================================================
// Step 1c: Percentile Analysis
// =====================================================================
echo PHP_EOL . "=== Step 1c: Percentile Analysis ===" . PHP_EOL . PHP_EOL;
echo "P10: " . Stat::percentile($times, 10, 2) . "s — elite threshold" . PHP_EOL;
echo "P25: " . Stat::percentile($times, 25, 2) . "s — top quarter" . PHP_EOL;
echo "P50: " . Stat::percentile($times, 50, 2) . "s — median" . PHP_EOL;
echo "P75: " . Stat::percentile($times, 75, 2) . "s — bottom quarter" . PHP_EOL;
echo "P90: " . Stat::percentile($times, 90, 2) . "s — struggling" . PHP_EOL;
echo PHP_EOL;
echo "Notice: P75-P90 gap (3.4s) is much larger than P10-P25 gap (0.7s)." . PHP_EOL;
echo "This asymmetry IS the right skew, quantified." . PHP_EOL;
// =====================================================================
// Step 1d: Precision of the Mean
// =====================================================================
echo PHP_EOL . "=== Step 1d: Precision of the Mean (SEM) ===" . PHP_EOL . PHP_EOL;
$sem = Stat::sem($times, 2);
echo "SEM: " . $sem . "s" . PHP_EOL;
// Normal approximation (z = 1.96); with n = 34 a t-based interval would be slightly wider.
echo "95% confidence interval: "
. round(Stat::mean($times) - 1.96 * $sem, 2) . "s to "
. round(Stat::mean($times) + 1.96 * $sem, 2) . "s"
. PHP_EOL;
echo PHP_EOL;
echo "With 34 athletes, we estimate the true mean within ~"
. round($sem * 1.96, 2) . "s at 95% confidence." . PHP_EOL;
// =====================================================================
// Step 2: Fitting a Normal Distribution
// =====================================================================
echo PHP_EOL . "=== Step 2: Fitting a Normal Distribution ===" . PHP_EOL . PHP_EOL;
$normal = NormalDist::fromSamples($times);
echo "Estimated mu (mean): " . $normal->getMeanRounded(2) . " seconds" . PHP_EOL;
echo "Estimated sigma (std): " . $normal->getSigmaRounded(2) . " seconds" . PHP_EOL;
echo PHP_EOL;
echo "Model median: " . $normal->getMedianRounded(2) . "s" . PHP_EOL;
echo "Actual median: " . round($median, 2) . "s" . PHP_EOL;
echo "Difference: " . round($normal->getMedianRounded(2) - $median, 2) . "s" . PHP_EOL;
echo "(the right skew pulls the model median = mean upward)" . PHP_EOL;
// =====================================================================
// Step 3: Asking Probabilistic Questions
// =====================================================================
echo PHP_EOL . "=== Step 3: Probabilistic Questions ===" . PHP_EOL . PHP_EOL;
$target = 113.0;
$probUnder = $normal->cdfRounded($target, 4);
$actualUnder = count(array_filter($times, fn(float $t): bool => $t <= $target));
echo "Q: What is the probability of finishing in " . $target . "s or less?" . PHP_EOL;
echo "Model: P(time <= " . $target . "s) = "
. round($probUnder * 100, 1) . "%" . PHP_EOL;
echo "Actual: " . $actualUnder . "/" . count($times)
. " = " . round(($actualUnder / count($times)) * 100, 1) . "%" . PHP_EOL;
echo "(the gap shows the effect of skewness on the normal model)" . PHP_EOL;
echo PHP_EOL;
echo "PDF at " . $target . "s = " . $normal->pdfRounded($target, 6) . PHP_EOL;
// =====================================================================
// Step 4: Performance Thresholds (Inverse CDF)
// =====================================================================
echo PHP_EOL . "=== Step 4: Performance Thresholds ===" . PHP_EOL . PHP_EOL;
$eliteThreshold = $normal->invCdfRounded(0.2, 2);
$slowThreshold = $normal->invCdfRounded(0.8, 2);
echo "Top 20% fastest (below): " . $eliteThreshold . " seconds" . PHP_EOL;
echo "Slowest 20% (above): " . $slowThreshold . " seconds" . PHP_EOL;
// =====================================================================
// Step 5: Z-scores
// =====================================================================
echo PHP_EOL . "=== Step 5: Z-scores ===" . PHP_EOL . PHP_EOL;
echo str_pad("Athlete", 30)
. str_pad("Time", 10)
. str_pad("Z-score", 10)
. "Tier"
. PHP_EOL;
echo str_repeat("-", 65) . PHP_EOL;
$tierDefinitions = [
["max" => 0.20, "label" => "Elite"],
["max" => 0.50, "label" => "Strong"],
["max" => 0.80, "label" => "Average"],
["max" => 1.00, "label" => "Below avg"],
];
foreach ($results as $r) {
$time = $r["time"];
$percentile = $normal->cdf($time);
$tier = "Below avg";
foreach ($tierDefinitions as $def) {
if ($percentile <= $def["max"]) {
$tier = $def["label"];
break;
}
}
$z = $normal->zscoreRounded($time, 2);
$zFormatted = ($z >= 0 ? "+" : "") . number_format($z, 2);
echo str_pad($r["name"], 30)
. str_pad(number_format($time, 2) . "s", 10)
. str_pad($zFormatted, 10)
. $tier
. PHP_EOL;
}
// =====================================================================
// Step 5b: Outlier Detection
// =====================================================================
echo PHP_EOL . "=== Step 5b: Outlier Detection ===" . PHP_EOL . PHP_EOL;
// Method 1: Z-score
echo "Method 1: Z-score based (threshold = 2.5)" . PHP_EOL;
$zscoreOutliers = Stat::outliers($times, 2.5);
if ($zscoreOutliers === []) {
echo " No outliers detected." . PHP_EOL;
} else {
foreach ($zscoreOutliers as $time) {
$name = "";
foreach ($results as $r) {
if ($r["time"] === $time) {
$name = $r["name"];
break;
}
}
echo " " . $time . "s — " . $name . PHP_EOL;
}
}
// Method 2: IQR
echo PHP_EOL . "Method 2: IQR based (factor = 1.5, box plot whiskers)" . PHP_EOL;
$iqrOutliers = Stat::iqrOutliers($times);
if ($iqrOutliers === []) {
echo " No outliers detected." . PHP_EOL;
} else {
foreach ($iqrOutliers as $time) {
$name = "";
foreach ($results as $r) {
if ($r["time"] === $time) {
$name = $r["name"];
break;
}
}
echo " " . $time . "s — " . $name . PHP_EOL;
}
}
echo PHP_EOL;
echo "Z-score detected " . count($zscoreOutliers) . " outlier(s); IQR detected " . count($iqrOutliers) . "." . PHP_EOL;
echo "IQR is more robust for skewed data — outliers don't inflate" . PHP_EOL;
echo "the detection threshold (unlike z-score, where they inflate stdev)." . PHP_EOL;
// =====================================================================
// Step 6: Classifying Athletes into Tiers
// =====================================================================
echo PHP_EOL . "=== Step 6: Athlete Tier Classification ===" . PHP_EOL . PHP_EOL;
echo "Using the normal model's CDF to assign tiers:" . PHP_EOL;
echo " Elite: bottom 20% of the CDF (fastest)" . PHP_EOL;
echo " Strong: 20%–50%" . PHP_EOL;
echo " Average: 50%–80%" . PHP_EOL;
echo " Below avg: 80%–100% (slowest)" . PHP_EOL;
echo PHP_EOL;
$tierCounts = ["Elite" => 0, "Strong" => 0, "Average" => 0, "Below avg" => 0];
foreach ($results as $r) {
$percentile = $normal->cdf($r["time"]);
foreach ($tierDefinitions as $def) {
if ($percentile <= $def["max"]) {
$tierCounts[$def["label"]]++;
break;
}
}
}
foreach ($tierCounts as $tier => $count) {
echo str_pad($tier, 12) . str_repeat("*", $count) . " (" . $count . ")" . PHP_EOL;
}
// =====================================================================
// Step 7: Frequency Table
// =====================================================================
echo PHP_EOL . "=== Step 7: Frequency Table (1-second bins) ===" . PHP_EOL . PHP_EOL;
$freqTable = Freq::frequencyTableBySize($times, 1);
foreach ($freqTable as $class => $count) {
echo str_pad($class . "s", 8)
. str_repeat("*", $count)
. " (" . $count . ")"
. PHP_EOL;
}
// =====================================================================
// Step 8: Skewness and Kurtosis
// =====================================================================
echo PHP_EOL . "=== Step 8: Skewness and Kurtosis ===" . PHP_EOL . PHP_EOL;
echo "Skewness: " . Stat::skewness($times, 4) . PHP_EOL;
echo " (positive = right-skewed, a few slow finishers pull the tail)" . PHP_EOL;
echo "Kurtosis: " . Stat::kurtosis($times, 4) . PHP_EOL;
echo " (positive = heavy tails, outliers present)" . PHP_EOL;
// =====================================================================
// Step 9: Dispersion Beyond Standard Deviation
// =====================================================================
echo PHP_EOL . "=== Step 9: Dispersion Measures Compared ===" . PHP_EOL . PHP_EOL;
$stdev = Stat::stdev($times, 4);
$mad = Stat::meanAbsoluteDeviation($times, 4);
$medianAD = Stat::medianAbsoluteDeviation($times, 4);
echo "Standard deviation: " . $stdev . "s" . PHP_EOL;
echo "Mean Absolute Deviation: " . $mad . "s" . PHP_EOL;
echo "Median Absolute Deviation: " . $medianAD . "s" . PHP_EOL;
echo PHP_EOL;
echo "The median absolute deviation (0.88s) is much smaller than" . PHP_EOL;
echo "the stdev (2.60s). This reveals two groups: a tight core pack" . PHP_EOL;
echo "(within ~1 second of each other) and a few stragglers." . PHP_EOL;
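// Sketch (not part of the original analysis): why the median absolute
// deviation resists stragglers. Plain PHP on a made-up sample; the $demo*
// names are hypothetical and the library is not used here.
$demoMedianOf = static function (array $values): float {
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);
    return $n % 2 === 1
        ? (float) $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2;
};
$demoSample = [1.0, 2.0, 3.0, 4.0, 100.0]; // one extreme value
$demoCenter = $demoMedianOf($demoSample);
// Absolute distance of every value from the median, then the median of those.
$demoAbsDevs = array_map(
    static fn(float $v): float => abs($v - $demoCenter),
    $demoSample,
);
$demoMedianAD = $demoMedianOf($demoAbsDevs);
echo "(sketch) median abs deviation of [1, 2, 3, 4, 100]: " . $demoMedianAD . PHP_EOL;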
// =====================================================================
// Step 10: Coefficient of Variation
// =====================================================================
echo PHP_EOL . "=== Step 10: Coefficient of Variation ===" . PHP_EOL . PHP_EOL;
$cvFull = Stat::coefficientOfVariation($times, 2);
$top10 = array_slice($times, 0, 10);
$cvTop10 = Stat::coefficientOfVariation($top10, 2);
echo "Full field CV: " . $cvFull . "%" . PHP_EOL;
echo "Top 10 CV: " . $cvTop10 . "%" . PHP_EOL;
echo PHP_EOL;
echo "The top 10 is 5x tighter than the full field." . PHP_EOL;
echo "CV lets you compare tightness across different events or years." . PHP_EOL;
// =====================================================================
// Step 11: Weighted Median
// =====================================================================
echo PHP_EOL . "=== Step 11: Weighted Median ===" . PHP_EOL . PHP_EOL;
$weights = [];
foreach ($results as $i => $r) {
$weights[] = $i < 15 ? 3.0 : 1.0;
}
$wMedian = Stat::weightedMedian($times, $weights, 2);
echo "Regular median: " . round(Stat::median($times), 2) . "s" . PHP_EOL;
echo "Weighted median: " . $wMedian . "s (top-15 seeded athletes weighted 3x)" . PHP_EOL;
echo PHP_EOL;
echo "The weighted median answers: 'What does a competitive time look like?'" . PHP_EOL;
echo "rather than 'What does the typical time look like?'" . PHP_EOL;
// =====================================================================
// Step 12: StreamingStat — Real-Time Processing
// =====================================================================
echo PHP_EOL . "=== Step 12: StreamingStat (O(1) Memory) ===" . PHP_EOL . PHP_EOL;
$stream = new StreamingStat();
foreach ($results as $i => $r) {
$stream->add($r["time"]);
if (in_array($i + 1, [5, 10, 20, 34], true)) {
echo "After " . str_pad((string) $stream->count(), 2) . " athletes: "
. "mean=" . $stream->mean(2) . "s, "
. "stdev=" . $stream->stdev(2) . "s, "
. "min=" . $stream->min() . "s, "
. "max=" . $stream->max() . "s"
. PHP_EOL;
}
}
echo PHP_EOL;
echo "Final streaming results match Stat:" . PHP_EOL;
echo " Streaming mean: " . $stream->mean(2) . "s vs Stat::mean: " . round($mean, 2) . "s" . PHP_EOL;
echo " Streaming stdev: " . $stream->stdev(2) . "s vs Stat::stdev: " . round($std, 2) . "s" . PHP_EOL;
// =====================================================================
// When the Normal Distribution Works (and When It Doesn't)
// =====================================================================
echo PHP_EOL . "=== Model Limitations ===" . PHP_EOL . PHP_EOL;
echo "The normal model is a useful approximation, but this data is" . PHP_EOL;
echo "right-skewed (skewness: " . Stat::skewness($times, 2) . "). Signs of misfit:" . PHP_EOL;
echo "- Model median (" . $normal->getMedianRounded(2)
. "s) differs from actual median (" . round($median, 2) . "s)" . PHP_EOL;
echo "- Model P(time <= 113s) = " . round($normal->cdf(113.0) * 100, 1)
. "%, actual = " . round((count(array_filter($times, fn(float $t): bool => $t <= 113.0)) / count($times)) * 100, 1) . "%" . PHP_EOL;
echo "- Kurtosis (" . Stat::kurtosis($times, 2) . ") >> 0 — heavier tails than normal" . PHP_EOL;
echo PHP_EOL;
echo "For this dataset, robust measures (trimmed mean, IQR outliers," . PHP_EOL;
echo "median absolute deviation) give more reliable insights than" . PHP_EOL;
echo "methods that assume normality." . PHP_EOL;
// =====================================================================
// Summary
// =====================================================================
echo PHP_EOL . str_repeat("=", 55) . PHP_EOL;
echo "SUMMARY" . PHP_EOL;
echo str_repeat("=", 55) . PHP_EOL;
echo "Winner: " . $results[0]["name"] . " (" . $results[0]["time"] . "s)" . PHP_EOL;
echo "Mean / Median: " . round($mean, 2) . "s / " . round($median, 2) . "s" . PHP_EOL;
echo "Trimmed mean (10%): " . $trimmedMean10 . "s" . PHP_EOL;
echo "Core pack spread: " . $medianAD . "s (median abs deviation)" . PHP_EOL;
echo "Race tightness (CV): " . $cvFull . "% (full field), " . $cvTop10 . "% (top 10)" . PHP_EOL;
echo "Outliers (IQR): " . count($iqrOutliers) . " athletes flagged" . PHP_EOL;
echo "Distribution: right-skewed (skewness " . Stat::skewness($times, 2) . ")" . PHP_EOL;
================================================
FILE: examples/article-gpx-running-analysis.php
================================================
<?php
/**
* Analyze Your Running Performance with GPX Data and PHP Statistics
*
* This script shows how to parse a GPX file from your sport watch
* and analyze your running performance using the hi-folks/statistics package.
*
* It includes helper functions for GPX parsing, plus simulated data
* so you can run it immediately without a GPX file.
*
* Run it with: php examples/article-gpx-running-analysis.php
*/
require __DIR__ . "/../vendor/autoload.php";
use HiFolks\Statistics\Freq;
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Utils\Arr;
use HiFolks\Statistics\Utils\Format;
// ============================================================
// HELPER FUNCTIONS — GPX parsing and distance calculation
// ============================================================
/**
* Parse a GPX file and return an array of trackpoints.
* Each trackpoint: ['lat' => float, 'lon' => float, 'ele' => float,
* 'time' => int (unix timestamp), 'hr' => int|null]
*/
function parseGpx(string $filePath): array
{
$xml = simplexml_load_file($filePath);
if ($xml === false) {
throw new RuntimeException("Cannot parse GPX file: {$filePath}");
}
$namespaces = $xml->getNamespaces(true);
$points = [];
foreach ($xml->trk->trkseg->trkpt as $trkpt) {
$point = [
"lat" => (float) $trkpt["lat"],
"lon" => (float) $trkpt["lon"],
// isset() is the idiomatic way to test for a child element on SimpleXMLElement
"ele" => isset($trkpt->ele) ? (float) $trkpt->ele : 0.0,
"time" => isset($trkpt->time)
? (int) strtotime((string) $trkpt->time) // (int) maps a failed parse (false) to 0
: 0,
"hr" => null,
];
// Try to extract heart rate from Garmin TrackPointExtension
if (isset($namespaces["gpxtpx"])) {
$extensions = $trkpt->extensions;
if ($extensions) {
$gpxtpx = $extensions->children($namespaces["gpxtpx"]);
if (isset($gpxtpx->TrackPointExtension->hr)) {
$point["hr"] = (int) $gpxtpx->TrackPointExtension->hr;
}
}
}
$points[] = $point;
}
return $points;
}
/**
* Haversine distance between two GPS coordinates in meters.
*/
function haversineDistance(
float $lat1,
float $lon1,
float $lat2,
float $lon2,
): float {
$R = 6371000; // Earth radius in meters
$dLat = deg2rad($lat2 - $lat1);
$dLon = deg2rad($lon2 - $lon1);
$a
= sin($dLat / 2) ** 2
+ cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
return $R * 2 * atan2(sqrt($a), sqrt(1 - $a));
}
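// Sketch (not part of the original script): a sanity check for the formula
// above. Along a meridian the haversine distance reduces to the arc length
// R * deltaLat, so one degree of latitude should be about 111.19 km for
// R = 6371000 m. The $demo* names are hypothetical.
$demoMetersPerDegree = 6371000 * deg2rad(1.0);
if (function_exists('haversineDistance')) {
    $demoHaversine = haversineDistance(0.0, 0.0, 1.0, 0.0);
    if (abs($demoHaversine - $demoMetersPerDegree) > 0.001) {
        throw new LogicException('haversineDistance() disagrees with the arc-length check');
    }
}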
/**
* Build per-kilometer splits from trackpoints.
* Returns array of ['km' => int, 'time' => int (seconds), 'pace' => int (sec/km),
* 'eleGain' => float, 'eleLoss' => float, 'avgHr' => int|null]
*/
function buildKmSplits(array $trackpoints): array
{
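if ($trackpoints === []) {
return []; // no trackpoints, so no splits (avoids reading $trackpoints[0] below)
}
$splits = [];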
$currentKm = 1;
$kmDistance = 0;
$kmStartTime = $trackpoints[0]["time"];
$kmEleGain = 0;
$kmEleLoss = 0;
$kmHrValues = [];
$counter = count($trackpoints);
for ($i = 1; $i < $counter; $i++) {
$prev = $trackpoints[$i - 1];
$curr = $trackpoints[$i];
$segDist = haversineDistance(
$prev["lat"],
$prev["lon"],
$curr["lat"],
$curr["lon"],
);
$kmDistance += $segDist;
$eleDiff = $curr["ele"] - $prev["ele"];
if ($eleDiff > 0) {
$kmEleGain += $eleDiff;
} else {
$kmEleLoss += abs($eleDiff);
}
if ($curr["hr"] !== null) {
$kmHrValues[] = $curr["hr"];
}
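// Close the split at the first trackpoint at or past 1000 m. The overshoot
// carries into the next km, so each "pace" value is an approximation.
if ($kmDistance >= 1000) {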
$kmTime = $curr["time"] - $kmStartTime;
$splits[] = [
"km" => $currentKm,
"time" => $kmTime,
"pace" => $kmTime,
"eleGain" => round($kmEleGain, 1),
"eleLoss" => round($kmEleLoss, 1),
"avgHr"
=> count($kmHrValues) > 0
? (int) round(Stat::mean($kmHrValues))
: null,
];
$currentKm++;
$kmDistance -= 1000;
$kmStartTime = $curr["time"];
$kmEleGain = 0;
$kmEleLoss = 0;
$kmHrValues = [];
}
}
return $splits;
}
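// Sketch (not part of the original script): buildKmSplits() closes a split at
// the first trackpoint at or past 1000 m, so the recorded time slightly
// overshoots. One possible refinement, assuming constant speed between two
// consecutive trackpoints, is to interpolate the moment the 1000 m mark was
// crossed. $demoCrossingTime is a hypothetical helper, not library API.
$demoCrossingTime = static function (
    float $prevDist,
    float $currDist,
    float $prevTime,
    float $currTime,
    float $mark = 1000.0,
): float {
    if ($currDist <= $prevDist) {
        return $currTime; // no forward movement, nothing to interpolate
    }
    // Share of the segment covered before reaching the mark.
    $fraction = ($mark - $prevDist) / ($currDist - $prevDist);
    return $prevTime + $fraction * ($currTime - $prevTime);
};
// 990 m at t = 300 s and 1010 m at t = 306 s: the mark falls halfway, at 303 s.
$demoSplitTime = $demoCrossingTime(990.0, 1010.0, 300.0, 306.0);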
/**
* Format a pace in seconds as "M:SS/km".
*/
function formatPace(int|float $seconds): string
{
return Format::secondsToTime((int) round($seconds)) . "/km";
}
// ============================================================
// THE DATA
// ============================================================
// === Option 1: Parse a real GPX file ===
// Uncomment these lines if you have a GPX file from your sport watch:
//
// $trackpoints = parseGpx('your-run.gpx');
// $splits = buildKmSplits($trackpoints);
// === Option 2: Simulated 10K run ===
// A realistic 10K with a hilly middle section, slight positive split,
// and heart rate drifting upward as fatigue accumulates.
$splits = [
[
"km" => 1,
"time" => 322,
"pace" => 322,
"eleGain" => 5,
"eleLoss" => 2,
"avgHr" => 145,
],
[
"km" => 2,
"time" => 318,
"pace" => 318,
"eleGain" => 8,
"eleLoss" => 3,
"avgHr" => 150,
],
[
"km" => 3,
"time" => 335,
"pace" => 335,
"eleGain" => 22,
"eleLoss" => 4,
"avgHr" => 158,
],
[
"km" => 4,
"time" => 348,
"pace" => 348,
"eleGain" => 28,
"eleLoss" => 5,
"avgHr" => 164,
],
[
"km" => 5,
"time" => 340,
"pace" => 340,
"eleGain" => 15,
"eleLoss" => 18,
"avgHr" => 162,
],
[
"km" => 6,
"time" => 312,
"pace" => 312,
"eleGain" => 2,
"eleLoss" => 30,
"avgHr" => 155,
],
[
"km" => 7,
"time" => 325,
"pace" => 325,
"eleGain" => 3,
"eleLoss" => 8,
"avgHr" => 158,
],
[
"km" => 8,
"time" => 338,
"pace" => 338,
"eleGain" => 12,
"eleLoss" => 5,
"avgHr" => 165,
],
[
"km" => 9,
"time" => 352,
"pace" => 352,
"eleGain" => 18,
"eleLoss" => 3,
"avgHr" => 170,
],
[
"km" => 10,
"time" => 330,
"pace" => 330,
"eleGain" => 4,
"eleLoss" => 15,
"avgHr" => 172,
],
];
// Extract column arrays we will reuse throughout
[$paces, $eleGains, $hrValues, $kmNumbers] = Arr::extract($splits, [
"pace",
"eleGain",
"avgHr",
"km",
]);
// ============================================================
// STEP 1: Run Overview
// ============================================================
$totalDistance = count($splits);
$totalTime = array_sum(array_column($splits, "time"));
$totalEleGain = array_sum(array_column($splits, "eleGain"));
$totalEleLoss = array_sum(array_column($splits, "eleLoss"));
echo "=== STEP 1: Run Overview ===" . PHP_EOL;
echo "Distance: " . $totalDistance . " km" . PHP_EOL;
echo "Total time: " . Format::secondsToTime($totalTime) . PHP_EOL;
echo "Average pace: " . formatPace(Stat::mean($paces)) . PHP_EOL;
echo "Elevation gain: +" . $totalEleGain . " m" . PHP_EOL;
echo "Elevation loss: -" . $totalEleLoss . " m" . PHP_EOL;
echo "Average HR: " . round(Stat::mean($hrValues)) . " bpm" . PHP_EOL;
echo PHP_EOL;
// ============================================================
// STEP 2: Pace Descriptive Statistics
// ============================================================
$meanPace = Stat::mean($paces);
$medianPace = Stat::median($paces);
$stdevPace = Stat::stdev($paces);
$quartiles = Stat::quantiles($paces);
echo "=== STEP 2: Pace Descriptive Statistics ===" . PHP_EOL;
echo "Mean pace: " . formatPace($meanPace) . PHP_EOL;
echo "Median pace: " . formatPace($medianPace) . PHP_EOL;
echo "Std deviation: " . round($stdevPace, 1) . " sec" . PHP_EOL;
echo "Fastest km: "
. formatPace(min($paces))
. " (km "
. $splits[array_search(min($paces), $paces)]["km"]
. ")"
. PHP_EOL;
echo "Slowest km: "
. formatPace(max($paces))
. " (km "
. $splits[array_search(max($paces), $paces)]["km"]
. ")"
. PHP_EOL;
echo "Quartiles: Q1="
. formatPace($quartiles[0])
. " Q2="
. formatPace($quartiles[1])
. " Q3="
. formatPace($quartiles[2])
. PHP_EOL;
echo PHP_EOL;
// ============================================================
// STEP 3: Pacing Consistency
// ============================================================
$cv = Stat::coefficientOfVariation($paces, 2);
$halfPoint = intdiv(count($splits), 2);
$firstHalfPaces = array_slice($paces, 0, $halfPoint);
$secondHalfPaces = array_slice($paces, $halfPoint);
$meanFirst = Stat::mean($firstHalfPaces);
$meanSecond = Stat::mean($secondHalfPaces);
$splitDiff = $meanSecond - $meanFirst;
$splitPct = round(($splitDiff / $meanFirst) * 100, 1);
echo "=== STEP 3: Pacing Consistency ===" . PHP_EOL;
echo "Coefficient of Variation: " . $cv . "%" . PHP_EOL;
echo "First half avg pace: "
. formatPace($meanFirst)
. " (km 1-"
. $halfPoint
. ")"
. PHP_EOL;
echo "Second half avg pace: "
. formatPace($meanSecond)
. " (km "
. ($halfPoint + 1)
. "-"
. $totalDistance
. ")"
. PHP_EOL;
if ($splitDiff > 0) {
echo "Positive split: +"
. round($splitDiff, 1)
. " sec/km slower ("
. $splitPct
. "% fade)"
. PHP_EOL;
} elseif ($splitDiff < 0) {
echo "Negative split: "
. round(abs($splitDiff), 1)
. " sec/km faster ("
. abs($splitPct)
. "% improvement)"
. PHP_EOL;
} else {
echo "Even split: perfectly consistent pacing" . PHP_EOL;
}
echo PHP_EOL;
// ============================================================
// STEP 4: Elevation Impact on Pace
// ============================================================
$corrEle = Stat::correlation($eleGains, $paces);
$regEle = Stat::linearRegression($eleGains, $paces);
$r2Ele = Stat::rSquared($eleGains, $paces, false, 4);
echo "=== STEP 4: Elevation Impact on Pace ===" . PHP_EOL;
echo "Correlation (elevation gain vs pace): " . round($corrEle, 4) . PHP_EOL;
echo "Linear regression: pace = "
. round($regEle[0], 2)
. " x eleGain + "
. round($regEle[1], 1)
. PHP_EOL;
echo "R-squared: " . $r2Ele . PHP_EOL;
echo "Interpretation: each meter of elevation gain costs ~"
. round($regEle[0], 1)
. " seconds per km"
. PHP_EOL;
echo PHP_EOL;
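// Sketch (not part of the original script): assuming the regression above is
// an ordinary least-squares fit, the same [slope, intercept] pair can be
// reproduced by hand as slope = Sxy / Sxx and
// intercept = mean(y) - slope * mean(x). Plain PHP on made-up data; the
// $demo* names are hypothetical.
$demoFitLine = static function (array $xs, array $ys): array {
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;
    $sxy = 0.0; // sum of cross-deviations
    $sxx = 0.0; // sum of squared x-deviations
    foreach ($xs as $i => $x) {
        $sxy += ($x - $meanX) * ($ys[$i] - $meanY);
        $sxx += ($x - $meanX) ** 2;
    }
    $slope = $sxy / $sxx;
    return [$slope, $meanY - $slope * $meanX];
};
// Points on the exact line y = 2x + 1.
[$demoSlope, $demoIntercept] = $demoFitLine([1, 2, 3, 4], [3, 5, 7, 9]);
echo "(sketch) y = " . $demoSlope . " x + " . $demoIntercept . PHP_EOL;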
// ============================================================
// STEP 5: Heart Rate Analysis
// ============================================================
$meanHr = Stat::mean($hrValues);
$medianHr = Stat::median($hrValues);
$stdevHr = Stat::stdev($hrValues);
// Cardiac drift: does HR rise over the course of the run?
$corrHrKm = Stat::correlation($kmNumbers, $hrValues);
$regHrKm = Stat::linearRegression($kmNumbers, $hrValues);
$r2HrKm = Stat::rSquared($kmNumbers, $hrValues, false, 4);
// HR vs pace correlation
$corrHrPace = Stat::correlation($hrValues, $paces);
echo "=== STEP 5: Heart Rate Analysis ===" . PHP_EOL;
echo "Mean HR: " . round($meanHr) . " bpm" . PHP_EOL;
echo "Median HR: " . round($medianHr) . " bpm" . PHP_EOL;
echo "Std dev: " . round($stdevHr, 1) . " bpm" . PHP_EOL;
echo "Min HR: "
. min($hrValues)
. " bpm | Max HR: "
. max($hrValues)
. " bpm"
. PHP_EOL;
echo PHP_EOL;
echo "Cardiac drift (HR vs km):" . PHP_EOL;
echo " Correlation: " . round($corrHrKm, 4) . PHP_EOL;
echo " Regression: HR = "
. round($regHrKm[0], 2)
. " x km + "
. round($regHrKm[1], 1)
. PHP_EOL;
echo " R-squared: " . $r2HrKm . PHP_EOL;
echo " HR drift per km: " . sprintf("%+.1f", $regHrKm[0]) . " bpm/km" . PHP_EOL;
echo PHP_EOL;
echo "HR vs pace correlation: " . round($corrHrPace, 4) . PHP_EOL;
echo PHP_EOL;
// Heart rate zone distribution
$hrZones = Freq::frequencyTableBySize($hrValues, 10);
echo "Heart Rate Zone Distribution:" . PHP_EOL;
foreach ($hrZones as $range => $count) {
echo " "
. $range
. " bpm: "
. str_repeat("#", $count)
. " ("
. $count
. " km)"
. PHP_EOL;
}
echo PHP_EOL;
// ============================================================
// STEP 6: Outlier Detection
// ============================================================
$zscores = Stat::zscores($paces, 2);
$zOutliers = Stat::outliers($paces, 2.0);
$iqrOutliers = Stat::iqrOutliers($paces);
echo "=== STEP 6: Outlier Detection ===" . PHP_EOL;
echo "Per-km z-scores (negative = faster than average):" . PHP_EOL;
foreach ($splits as $i => $split) {
$z = $zscores[$i];
$bar
= $z < 0
? str_repeat("<", (int) abs(round($z * 5)))
: str_repeat(">", (int) round($z * 5));
echo " km "
. str_pad((string) $split["km"], 2, " ", STR_PAD_LEFT)
. ": "
. formatPace($split["pace"])
. " z="
. sprintf("%+.2f", $z)
. " "
. $bar
. PHP_EOL;
}
echo PHP_EOL;
echo "Z-score outliers (|z| > 2.0): "
. (count($zOutliers) > 0
? implode(", ", array_map(formatPace(...), $zOutliers))
: "none")
. PHP_EOL;
echo "IQR outliers: "
. (count($iqrOutliers) > 0
? implode(", ", array_map(formatPace(...), $iqrOutliers))
: "none")
. PHP_EOL;
echo PHP_EOL;
// ============================================================
// STEP 7: Percentile Benchmarks
// ============================================================
echo "=== STEP 7: Percentile Benchmarks ===" . PHP_EOL;
echo "Your pace distribution across this run:" . PHP_EOL;
$percentiles = [10, 25, 50, 75, 90];
foreach ($percentiles as $p) {
$val = Stat::percentile($paces, $p, 0);
echo " P"
. str_pad((string) $p, 2, " ", STR_PAD_LEFT)
. ": "
. formatPace($val)
. PHP_EOL;
}
echo PHP_EOL;
echo "P10 = your fastest 10% of km were at this pace or faster" . PHP_EOL;
echo "P90 = your slowest 10% of km were at this pace or slower" . PHP_EOL;
echo PHP_EOL;
// ============================================================
// STEP 8: Distribution Shape
// ============================================================
$skewness = Stat::skewness($paces, 4);
$kurtosis = Stat::kurtosis($paces, 4);
echo "=== STEP 8: Distribution Shape ===" . PHP_EOL;
echo "Skewness: " . $skewness . PHP_EOL;
echo "Kurtosis: " . $kurtosis . PHP_EOL;
if ($skewness > 0.2) {
echo "Right-skewed: you have a tail of slower km (hills? fatigue?)"
. PHP_EOL;
} elseif ($skewness < -0.2) {
echo "Left-skewed: you have a tail of faster km (downhills? strong start?)"
. PHP_EOL;
} else {
echo "Approximately symmetric pacing" . PHP_EOL;
}
echo PHP_EOL;
// ============================================================
// STEP 9: Confidence Interval on True Pace
// ============================================================
$ci = Stat::confidenceInterval($paces, 0.95, 0);
$sem = Stat::sem($paces, 1);
echo "=== STEP 9: Confidence Interval ===" . PHP_EOL;
echo "95% CI for your true pace: "
. formatPace($ci[0])
. " to "
. formatPace($ci[1])
. PHP_EOL;
echo "Standard Error of the Mean: " . $sem . " sec" . PHP_EOL;
echo "With more km (longer runs), this interval would narrow." . PHP_EOL;
echo PHP_EOL;
// ============================================================
// STEP 10: Multi-Run Trend Analysis (Simulated)
// ============================================================
// Simulated: 8 weeks of average 10K paces showing diminishing improvement
// Early weeks show big gains; later weeks show smaller improvements (plateau effect)
$weeks = [1, 2, 3, 4, 5, 6, 7, 8];
$weeklyPaces = [350, 342, 337, 333, 330, 328, 326, 325];
$trendReg = Stat::linearRegression($weeks, $weeklyPaces);
$trendR2 = Stat::rSquared($weeks, $weeklyPaces, false, 4);
$trendCorr = Stat::correlation($weeks, $weeklyPaces);
echo "=== STEP 10: Multi-Run Trend (8-Week Simulation) ===" . PHP_EOL;
echo "Weekly average paces:" . PHP_EOL;
foreach ($weeks as $i => $w) {
echo " Week " . $w . ": " . formatPace($weeklyPaces[$i]) . PHP_EOL;
}
echo PHP_EOL;
echo "Trend regression: pace = "
. round($trendReg[0], 2)
. " x week + "
. round($trendReg[1], 1)
. PHP_EOL;
echo "R-squared: " . $trendR2 . PHP_EOL;
echo "Correlation: " . round($trendCorr, 4) . PHP_EOL;
echo "Improvement rate: "
. round(abs($trendReg[0]), 1)
. " seconds/km per week"
. PHP_EOL;
echo PHP_EOL;
// Linear prediction for week 12
$linearPrediction12 = $trendReg[0] * 12 + $trendReg[1];
echo "Linear prediction at week 12: "
. formatPace(max(0, $linearPrediction12))
. PHP_EOL;
echo "(Extrapolation — use with caution!)" . PHP_EOL;
echo PHP_EOL;
// ============================================================
// STEP 10b: Logarithmic Regression — Modeling the Plateau
// ============================================================
echo "=== STEP 10b: Logarithmic Regression ===" . PHP_EOL;
echo PHP_EOL;
// Logarithmic model: pace = a * ln(week) + b
$logReg = Stat::logarithmicRegression($weeks, $weeklyPaces);
$logWeeks = array_map(log(...), $weeks);
$logR2 = Stat::rSquared($logWeeks, $weeklyPaces, false, 4);
echo "Logarithmic regression: pace = "
. round($logReg[0], 2)
. " x ln(week) + "
. round($logReg[1], 1)
. PHP_EOL;
echo "R-squared: " . $logR2 . PHP_EOL;
echo PHP_EOL;
// Compare models
echo "Model comparison:" . PHP_EOL;
echo " Linear R²: " . $trendR2 . PHP_EOL;
echo " Logarithmic R²: " . $logR2 . PHP_EOL;
echo " Better fit: "
. ($logR2 > $trendR2 ? "Logarithmic" : "Linear")
. PHP_EOL;
echo PHP_EOL;
// Compare predictions
$logPrediction12 = $logReg[0] * log(12) + $logReg[1];
$logPrediction20 = $logReg[0] * log(20) + $logReg[1];
$linearPrediction20 = $trendReg[0] * 20 + $trendReg[1];
echo "Predictions:" . PHP_EOL;
echo " Week 12 — Linear: "
. formatPace(max(0, $linearPrediction12))
. " | Logarithmic: "
. formatPace(max(0, $logPrediction12))
. PHP_EOL;
echo " Week 20 — Linear: "
. formatPace(max(0, $linearPrediction20))
. " | Logarithmic: "
. formatPace(max(0, $logPrediction20))
. PHP_EOL;
echo PHP_EOL;
echo "The logarithmic model predicts more conservative (realistic) paces" . PHP_EOL;
echo "because it accounts for the natural plateau in athletic improvement." . PHP_EOL;
echo PHP_EOL;
// ============================================================
// STEP 10c: All Four Models Compared
// ============================================================
echo "=== STEP 10c: All Four Models Compared ===" . PHP_EOL;
echo PHP_EOL;
// Power: pace = a * week^b
[$aPow, $bPow] = Stat::powerRegression($weeks, $weeklyPaces);
$logPaces = array_map(log(...), $weeklyPaces);
$r2Pow = Stat::rSquared($logWeeks, $logPaces, false, 4);
// Exponential: pace = a * e^(b * week)
[$aExp, $bExp] = Stat::exponentialRegression($weeks, $weeklyPaces);
$r2Exp = Stat::rSquared($weeks, $logPaces, false, 4);
// Predictions for week 12, 20, 52
$predWeeks = [12, 20, 52];
$models = [
'Linear' => [
'r2' => $trendR2,
'predict' => fn($w): int|float => $trendReg[0] * $w + $trendReg[1],
],
'Logarithmic' => [
'r2' => $logR2,
'predict' => fn($w): float => $logReg[0] * log($w) + $logReg[1],
],
'Power' => [
'r2' => $r2Pow,
'predict' => fn($w): float|int => $aPow * $w ** $bPow,
],
'Exponential' => [
'r2' => $r2Exp,
'predict' => fn($w): float => $aExp * exp($bExp * $w),
],
];
echo str_pad("Model", 18)
. str_pad("R²", 11)
. str_pad("Week 12", 11)
. str_pad("Week 20", 11)
. "Week 52"
. PHP_EOL;
echo str_repeat("-", 58) . PHP_EOL;
foreach ($models as $name => $model) {
echo str_pad($name, 18)
. str_pad((string) $model['r2'], 11)
. str_pad(formatPace(max(0, $model['predict'](12))), 11)
. str_pad(formatPace(max(0, $model['predict'](20))), 11)
. formatPace(max(0, $model['predict'](52)))
. PHP_EOL;
}
echo PHP_EOL;
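// Find the best model by R². Caveat: the Power and Exponential R² values above
// are computed on ln(pace), so this ranking mixes response scales and is only indicative.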
$bestModel = '';
$bestR2 = 0;
foreach ($models as $name => $model) {
if ($model['r2'] > $bestR2) {
$bestR2 = $model['r2'];
$bestModel = $name;
}
}
echo "Best fit by R²: " . $bestModel . " (R² = " . $bestR2 . ")" . PHP_EOL;
echo "The data tells us the improvement pattern follows a curve, not a straight line." . PHP_EOL;
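// Sketch (not part of the original script): one way to put all four models on
// the same footing is to score each by R² = 1 - SS_res / SS_tot on the
// original pace scale, using the model's own predictions. $demoR2 is a
// hypothetical helper written in plain PHP, not library API.
$demoR2 = static function (array $observed, array $predicted): float {
    $mean = array_sum($observed) / count($observed);
    $ssRes = 0.0; // squared error of the model
    $ssTot = 0.0; // squared error of the "always predict the mean" baseline
    foreach ($observed as $i => $y) {
        $ssRes += ($y - $predicted[$i]) ** 2;
        $ssTot += ($y - $mean) ** 2;
    }
    return 1.0 - $ssRes / $ssTot;
};
$demoPerfect = $demoR2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]);  // perfect predictions
$demoMeanOnly = $demoR2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]); // no better than the mean
// When run inside this script, apply the helper to the models defined above.
if (isset($models, $weeks, $weeklyPaces)) {
    foreach ($models as $demoName => $demoModel) {
        $demoPredicted = array_map($demoModel['predict'], $weeks);
        echo "(sketch) " . $demoName . " R² on the pace scale: "
            . round($demoR2($weeklyPaces, $demoPredicted), 4) . PHP_EOL;
    }
}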
================================================
FILE: examples/freq_methods.php
================================================
<?php
require __DIR__ . '/../vendor/autoload.php';
$data = [55, 70, 57, 73, 55, 59, 64, 72,
60, 48, 58, 54, 69, 51, 63, 78,
75, 64, 65, 57, 71, 78, 76, 62,
49, 66, 62, 76, 61, 63, 63, 76,
52, 76, 71, 61, 53, 56, 67, 71, ];
$result = \HiFolks\Statistics\Freq::frequencyTable($data, 7);
echo min($data) . PHP_EOL;
echo max($data) . PHP_EOL;
print_r($result);
$data = [1, 1, 1, 4, 4, 5, 5, 5, 6, 7, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 11, 12, 12,
13, 14, 14, 15, 15, 16, 16, 16, 16, 17, 17, 17, 18, 18, ];
$result = \HiFolks\Statistics\Freq::frequencyTableBySize($data, 4);
print_r($result);
$result = \HiFolks\Statistics\Freq::frequencyTable($data, 5);
echo count($data) . PHP_EOL;
echo min($data) . PHP_EOL;
echo max($data) . PHP_EOL;
print_r($result);
================================================
FILE: examples/frequencies.php
================================================
<?php
require __DIR__ . '/../vendor/autoload.php';
use HiFolks\Statistics\Freq;
use HiFolks\Statistics\Statistics;
$fruits = ['🍈', '🍈', '🍈', '🍉', '🍉', '🍉', '🍉', '🍉', '🍌'];
$freqTable = Freq::frequencies($fruits);
print_r($freqTable);
/*
Array
(
[🍈] => 3
[🍉] => 5
[🍌] => 1
)
*/
$freqTable = Freq::relativeFrequencies($fruits, 2);
print_r($freqTable);
/*
Array
(
[🍈] => 33.33
[🍉] => 55.56
[🍌] => 11.11
)
*/
$s = Statistics::make(
[98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88, 76],
);
$a = $s->frequencies();
print_r($a);
/*
Array
(
[18] => 1
[45] => 1
[55] => 1
[70] => 1
[76] => 1
[83] => 1
[88] => 1
[90] => 1
[92] => 2
[95] => 1
[98] => 1
)
*/
$a = $s->relativeFrequencies();
print_r($a);
/*
Array
(
[18] => 8.3333333333333
[45] => 8.3333333333333
[55] => 8.3333333333333
[70] => 8.3333333333333
[76] => 8.3333333333333
[83] => 8.3333333333333
[88] => 8.3333333333333
[90] => 8.3333333333333
[92] => 16.666666666667
[95] => 8.3333333333333
[98] => 8.3333333333333
)
*/
================================================
FILE: examples/kde.php
================================================
<?php
require __DIR__ . "/../vendor/autoload.php";
use HiFolks\Statistics\Enums\KdeKernel;
use HiFolks\Statistics\Stat;
/**
* Kernel Density Estimation (KDE) examples.
*
* KDE builds a smooth, continuous probability density function from
* discrete sample data. Think of it as a "smoothed histogram" that
* lets you estimate the likelihood of any value — not just the ones
* you observed.
*
* Inspired by the Python statistics module:
* https://docs.python.org/3/library/statistics.html#statistics.kde
*/
// ---------------------------------------------------------------
// 1. Basic PDF estimation (Wikipedia example)
// ---------------------------------------------------------------
echo "=== 1. Basic PDF estimation ===" . PHP_EOL . PHP_EOL;
$sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2];
$h = 1.5;
$f = Stat::kde($sample, h: $h);
// Evaluate the estimated density at a few points
$points = [-4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0];
echo "Sample : " . implode(", ", $sample) . PHP_EOL;
echo "Bandwidth h = $h" . PHP_EOL . PHP_EOL;
echo str_pad("x", 8) . "f(x)" . PHP_EOL;
echo str_repeat("-", 24) . PHP_EOL;
foreach ($points as $x) {
$density = $f($x);
echo str_pad(number_format($x, 1), 8) . number_format($density, 6) . PHP_EOL;
}
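// For reference, a minimal sketch of the textbook estimator behind a Gaussian
// KDE: f(x) = 1/(n*h) * sum over i of K((x - x_i) / h), with K the standard
// normal density. sketchGaussianKde() is a hypothetical helper written for
// illustration; it is not the library's implementation and may differ from
// Stat::kde() in validation and edge cases.
/**
 * @param list<float> $data observed sample
 * @return callable(float): float estimated density
 */
function sketchGaussianKde(array $data, float $h): callable
{
    if ($data === [] || $h <= 0.0) {
        throw new InvalidArgumentException("Need a non-empty sample and h > 0.");
    }
    $n = count($data);
    return static function (float $x) use ($data, $h, $n): float {
        $sum = 0.0;
        foreach ($data as $xi) {
            $u = ($x - $xi) / $h; // distance to the data point in bandwidth units
            $sum += exp(-0.5 * $u * $u) / sqrt(2.0 * M_PI);
        }
        return $sum / ($n * $h);
    };
}
$sketchDensity = sketchGaussianKde([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2], 1.5);
echo PHP_EOL . "Hand-rolled Gaussian KDE at x = 0.0: "
    . number_format($sketchDensity(0.0), 6) . PHP_EOL;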
// ---------------------------------------------------------------
// 2. ASCII density plot
// ---------------------------------------------------------------
echo PHP_EOL . "=== 2. ASCII density plot ===" . PHP_EOL . PHP_EOL;
$xMin = -6.0;
$xMax = 10.0;
$steps = 60;
$maxBarWidth = 50;
// Compute densities across the range
$densities = [];
$maxDensity = 0.0;
for ($i = 0; $i <= $steps; $i++) {
$x = $xMin + ($xMax - $xMin) * $i / $steps;
$d = $f($x);
$densities[] = [$x, $d];
if ($d > $maxDensity) {
$maxDensity = $d;
}
}
foreach ($densities as [$x, $d]) {
$barLen = (int) round($d / $maxDensity * $maxBarWidth);
echo str_pad(number_format($x, 1), 7)
. " |"
. str_repeat("*", $barLen)
. PHP_EOL;
}
// ---------------------------------------------------------------
// 3. Comparing kernels
// ---------------------------------------------------------------
echo PHP_EOL . "=== 3. Comparing kernels ===" . PHP_EOL . PHP_EOL;
$data = [1.0, 2.0, 3.0, 4.0, 5.0];
$evalAt = 3.0;
$kernelsToCompare = [
KdeKernel::Normal,
KdeKernel::Triangular,
KdeKernel::Rectangular,
KdeKernel::Parabolic,
KdeKernel::Cosine,
];
echo "Data: " . implode(", ", $data) . PHP_EOL;
echo "Evaluating density at x = $evalAt (h = 1.0)" . PHP_EOL . PHP_EOL;
echo str_pad("Kernel", 16) . "f($evalAt)" . PHP_EOL;
echo str_repeat("-", 30) . PHP_EOL;
foreach ($kernelsToCompare as $kernel) {
$fk = Stat::kde($data, 1.0, $kernel);
echo str_pad($kernel->value, 16)
. number_format($fk($evalAt), 6)
. PHP_EOL;
}
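// A sketch of the standard kernel shapes compared above, as functions of the
// scaled distance u = (x - x_i) / h. Each integrates to 1, and all except the
// normal kernel are zero outside [-1, 1]. The string names and the helper
// sketchKernelValue() are illustrative assumptions, not the library's API.
function sketchKernelValue(string $name, float $u): float
{
    if ($name === "normal") {
        return exp(-0.5 * $u * $u) / sqrt(2.0 * M_PI);
    }
    if (abs($u) > 1.0) {
        return 0.0; // bounded-support kernels vanish beyond one bandwidth
    }
    return match ($name) {
        "triangular" => 1.0 - abs($u),
        "rectangular" => 0.5,
        "parabolic" => 0.75 * (1.0 - $u * $u),
        "cosine" => M_PI / 4.0 * cos(M_PI * $u / 2.0),
        default => throw new InvalidArgumentException("Unknown kernel: $name"),
    };
}
foreach (["normal", "triangular", "rectangular", "parabolic", "cosine"] as $sketchName) {
    echo str_pad($sketchName, 16) . "K(0) = "
        . number_format(sketchKernelValue($sketchName, 0.0), 6) . PHP_EOL;
}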
// ---------------------------------------------------------------
// 4. Cumulative Distribution Function (CDF)
// ---------------------------------------------------------------
echo PHP_EOL . "=== 4. Cumulative Distribution Function ===" . PHP_EOL . PHP_EOL;
$F = Stat::kde($sample, h: $h, cumulative: true);
echo "Sample : " . implode(", ", $sample) . PHP_EOL;
echo "Bandwidth h = $h" . PHP_EOL . PHP_EOL;
echo str_pad("x", 8) . "F(x)" . PHP_EOL;
echo str_repeat("-", 24) . PHP_EOL;
foreach ([-6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0, 10.0] as $x) {
echo str_pad(number_format($x, 1), 8)
. number_format($F($x), 6)
. PHP_EOL;
}
// P(X <= 2.5)
$p = $F(2.5);
echo PHP_EOL . "P(X <= 2.5) = " . round($p * 100, 1) . "%" . PHP_EOL;
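// A rough cross-check of the idea behind a cumulative KDE: F(x) is the area
// under the estimated density up to x. The sketch below integrates a Gaussian
// KDE numerically with the trapezoid rule. sketchKdeCdf() is a hypothetical
// helper and only an approximation; it is not how the library computes F.
function sketchKdeCdf(array $data, float $h, float $x, int $steps = 4000): float
{
    if ($data === [] || $h <= 0.0) {
        throw new InvalidArgumentException("Need a non-empty sample and h > 0.");
    }
    $density = static function (float $t) use ($data, $h): float {
        $sum = 0.0;
        foreach ($data as $xi) {
            $u = ($t - $xi) / $h;
            $sum += exp(-0.5 * $u * $u) / sqrt(2.0 * M_PI);
        }
        return $sum / (count($data) * $h);
    };
    $lower = min($data) - 10.0 * $h; // the density is negligible below this point
    if ($x <= $lower) {
        return 0.0;
    }
    $width = ($x - $lower) / $steps;
    // Trapezoid rule: half weight on the two end points, full weight inside.
    $area = 0.5 * ($density($lower) + $density($x));
    for ($i = 1; $i < $steps; $i++) {
        $area += $density($lower + $i * $width);
    }
    return $area * $width;
}
echo "Numerically integrated F(2.5): "
    . number_format(sketchKdeCdf([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2], 1.5, 2.5), 4) . PHP_EOL;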
// ---------------------------------------------------------------
// 5. Alias equivalence
// ---------------------------------------------------------------
echo PHP_EOL . "=== 5. Alias equivalence ===" . PHP_EOL . PHP_EOL;
$aliasPairs = [
[KdeKernel::Gauss, KdeKernel::Normal],
[KdeKernel::Uniform, KdeKernel::Rectangular],
[KdeKernel::Epanechnikov, KdeKernel::Parabolic],
[KdeKernel::Biweight, KdeKernel::Quartic],
];
echo "Aliases resolve to their canonical kernel:" . PHP_EOL;
foreach ($aliasPairs as [$alias, $canonical]) {
$f1 = Stat::kde($data, 1.0, $alias);
$f2 = Stat::kde($data, 1.0, $canonical);
$match = abs($f1(3.0) - $f2(3.0)) < 1e-15 ? "OK" : "MISMATCH";
echo " " . str_pad($alias->value, 14) . " => "
. str_pad($canonical->value, 14)
. $match . PHP_EOL;
}
// ---------------------------------------------------------------
// 6. Random sampling with kdeRandom()
// ---------------------------------------------------------------
echo PHP_EOL . "=== 6. Random sampling with kdeRandom() ===" . PHP_EOL . PHP_EOL;
$rand = Stat::kdeRandom($sample, h: $h, seed: 8675309);
$nSamples = 10;
$samples = [];
for ($i = 0; $i < $nSamples; $i++) {
$samples[] = round($rand(), 1);
}
echo "Original data : " . implode(", ", $sample) . PHP_EOL;
echo "10 KDE samples: " . implode(", ", $samples) . PHP_EOL;
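// How drawing from a Gaussian KDE can work, as a sketch: pick one observed
// value uniformly at random, then add normal noise scaled by the bandwidth.
// sketchKdeDraw() is a hypothetical helper built on mt_rand() and the
// Box-Muller transform. It is not the library's sampler, it consumes values
// from PHP's global generator, and it will not reproduce kdeRandom()'s output.
/**
 * @param list<float> $data observed sample (0-indexed list)
 */
function sketchKdeDraw(array $data, float $h): float
{
    if ($data === []) {
        throw new InvalidArgumentException("Need at least one data point.");
    }
    $center = $data[mt_rand(0, count($data) - 1)];
    // Box-Muller: two uniforms in (0, 1] give one standard normal deviate.
    $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1);
    $u2 = (mt_rand() + 1) / (mt_getrandmax() + 1);
    $z = sqrt(-2.0 * log($u1)) * cos(2.0 * M_PI * $u2);
    return $center + $h * $z;
}
echo "One hand-rolled draw : "
    . round(sketchKdeDraw([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2], 1.5), 1) . PHP_EOL;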
// ---------------------------------------------------------------
// 7. Verifying statistical properties of random samples
// ---------------------------------------------------------------
echo PHP_EOL . "=== 7. Statistical properties of KDE samples ===" . PHP_EOL . PHP_EOL;
$dataMean = Stat::mean($sample);
$n = 50000;
$sampler = Stat::kdeRandom($sample, h: $h, seed: 42);
$sum = 0.0;
for ($i = 0; $i < $n; $i++) {
$sum += $sampler();
}
$sampleMean = $sum / $n;
echo "Original data mean : " . round($dataMean, 4) . PHP_EOL;
echo "KDE sample mean (n=$n): " . round($sampleMean, 4) . PHP_EOL;
echo "Difference : " . round(abs($dataMean - $sampleMean), 4) . PHP_EOL;
// ---------------------------------------------------------------
// 8. Sampling with different kernels
// ---------------------------------------------------------------
echo PHP_EOL . "=== 8. Sampling with different kernels ===" . PHP_EOL . PHP_EOL;
echo "5 random draws per kernel (seed=42):" . PHP_EOL . PHP_EOL;
foreach ($kernelsToCompare as $kernel) {
$sampler = Stat::kdeRandom($sample, h: $h, kernel: $kernel, seed: 42);
$draws = [];
for ($i = 0; $i < 5; $i++) {
$draws[] = round($sampler(), 2);
}
echo str_pad($kernel->value, 16) . implode(", ", $draws) . PHP_EOL;
}
================================================
FILE: examples/kde_downhill.php
================================================
<?php
require __DIR__ . "/../vendor/autoload.php";
use HiFolks\Statistics\Enums\KdeKernel;
use HiFolks\Statistics\Stat;
/**
* Kernel Density Estimation applied to real sports data.
*
* Dataset: Men's Downhill results — Winter Olympic Games 2026.
*
* KDE lets us move beyond simple averages and histograms to answer
* richer questions: Where do finishing times cluster? What is the
* probability of finishing under a given threshold? How would
* simulated future races look?
*/
$results = [
["name" => "Franjo von ALLMEN", "time" => 111.61],
["name" => "Giovanni FRANZONI", "time" => 111.81],
["name" => "Dominik PARIS", "time" => 112.11],
["name" => "Marco ODERMATT", "time" => 112.31],
["name" => "Alexis MONNEY", "time" => 112.36],
["name" => "Vincent KRIECHMAYR", "time" => 112.38],
["name" => "Daniel HEMETSBERGER", "time" => 112.58],
["name" => "Nils ALLEGRE", "time" => 112.8],
["name" => "James CRAWFORD", "time" => 113.0],
["name" => "Kyle NEGOMIR", "time" => 113.2],
["name" => "Mattia CASSE", "time" => 113.28],
["name" => "Miha HROBAT", "time" => 113.3],
["name" => "Bryce BENNETT", "time" => 113.45],
["name" => "Cameron ALEXANDER", "time" => 113.49],
["name" => "Raphael HAASER", "time" => 113.5],
["name" => "Martin CATER", "time" => 113.51],
["name" => "Florian SCHIEDER", "time" => 113.57],
["name" => "Ryan COCHRAN-SIEGLE", "time" => 113.63],
["name" => "Sam MORSE", "time" => 113.68],
["name" => "Elian LEHTO", "time" => 113.83],
["name" => "Simon JOCHER", "time" => 114.01],
["name" => "Nils ALPHAND", "time" => 114.06],
["name" => "Stefan ROGENTIN", "time" => 114.18],
["name" => "Jan ZABYSTRAN", "time" => 114.39],
["name" => "Jeffrey READ", "time" => 114.56],
["name" => "Stefan BABINSKY", "time" => 114.73],
["name" => "Alban ELEZI CANNAFERINA", "time" => 114.9],
["name" => "Brodie SEGER", "time" => 114.96],
["name" => "Marco PFIFFNER", "time" => 115.66],
["name" => "Barnabas SZOLLOS", "time" => 117.03],
["name" => "Arnaud ALESSANDRIA", "time" => 117.15],
["name" => "Elvis OPMANIS", "time" => 119.24],
["name" => "Dmytro SHEPIUK", "time" => 120.11],
["name" => "Cormac COMERFORD", "time" => 124.4],
];
$times = array_column($results, "time");
echo "=== Men's Downhill — Olympic Winter Games 2026 ===" . PHP_EOL;
echo "Athletes: " . count($times) . PHP_EOL;
echo "Winner : " . $results[0]["name"] . " (" . $results[0]["time"] . "s)" . PHP_EOL;
echo "Mean : " . round(Stat::mean($times), 2) . "s" . PHP_EOL;
echo "Median : " . round(Stat::median($times), 2) . "s" . PHP_EOL;
// ---------------------------------------------------------------
// 1. Density profile — where do finishing times cluster?
// ---------------------------------------------------------------
echo PHP_EOL . "=== 1. Density profile ===" . PHP_EOL . PHP_EOL;
// A bandwidth of 0.8s is a good fit: wide enough to smooth out
// individual gaps, narrow enough to reveal the shape of the
// distribution. With 34 athletes, a smaller h would produce
// spiky noise; a larger h would wash out the interesting
// right-skewed tail.
$h = 0.8;
$f = Stat::kde($times, h: $h, kernel: KdeKernel::Normal);
echo "Bandwidth h = {$h}s" . PHP_EOL . PHP_EOL;
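// For comparison, a sketch of the normal reference rule of thumb for picking
// a bandwidth: h = 1.06 * s * n^(-1/5), where s is the sample standard
// deviation. sketchRuleOfThumbBandwidth() is a hypothetical helper, not part
// of the library. The rule assumes roughly normal data, so on a right-skewed
// sample like this one it tends to oversmooth; treat it as a starting point.
function sketchRuleOfThumbBandwidth(array $data): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException("Need at least two data points.");
    }
    $mean = array_sum($data) / $n;
    $squares = 0.0;
    foreach ($data as $value) {
        $squares += ($value - $mean) ** 2;
    }
    $stdev = sqrt($squares / ($n - 1)); // sample standard deviation
    return 1.06 * $stdev * $n ** (-1 / 5);
}
// Usage (with the $times array defined earlier in this script):
//   echo round(sketchRuleOfThumbBandwidth($times), 2) . "s";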
// Scan for the peak (mode of the continuous distribution)
$peakX = 0.0;
$peakD = 0.0;
$maxBarWidth = 50;
$densities = [];
for ($x = 110.0; $x <= 126.0; $x += 0.2) {
$d = $f($x);
$densities[] = [$x, $d];
if ($d > $peakD) {
$peakD = $d;
$peakX = $x;
}
}
echo "Density plot (each * ~ "
. round($peakD / $maxBarWidth, 5)
. " density units):" . PHP_EOL . PHP_EOL;
foreach ($densities as [$x, $d]) {
// Only print every 0.6s to keep it readable
if (round(($x - 110.0) * 10) % 6 !== 0) {
continue;
}
$barLen = (int) round($d / $peakD * $maxBarWidth);
echo str_pad(number_format($x, 1) . "s", 8)
. "|"
. str_repeat("*", $barLen)
. PHP_EOL;
}
echo PHP_EOL;
echo "Peak density at "
. number_format($peakX, 1) . "s"
. " — this is the KDE mode, the most likely finishing time."
. PHP_EOL;
echo "Compare with the arithmetic mean ("
. round(Stat::mean($times), 2)
. "s): the mean is pulled right" . PHP_EOL;
echo "by slow outliers, but KDE reveals the true concentration point."
. PHP_EOL;
// ---------------------------------------------------------------
// 2. Probability thresholds via CDF
// ---------------------------------------------------------------
echo PHP_EOL . "=== 2. Probability thresholds (CDF) ===" . PHP_EOL . PHP_EOL;
$F = Stat::kde($times, h: $h, cumulative: true);
$thresholds = [
[112.0, "podium contender"],
[113.0, "top-10 territory"],
[113.5, "solid mid-pack"],
[114.0, "~top 20"],
[115.0, "lower pack"],
[117.0, "off the pace"],
[120.0, "struggling finisher"],
];
echo str_pad("Threshold", 12)
. str_pad("P(time <= t)", 15)
. "Interpretation" . PHP_EOL;
echo str_repeat("-", 65) . PHP_EOL;
foreach ($thresholds as [$t, $label]) {
$prob = $F($t);
echo str_pad(number_format($t, 1) . "s", 12)
. str_pad(round($prob * 100, 1) . "%", 15)
. $label
. PHP_EOL;
}
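// A sanity check for the smoothed probabilities above is the empirical CDF:
// the plain fraction of observed values at or below a threshold. The helper
// sketchEmpiricalCdf() is a hypothetical name used only for illustration.
function sketchEmpiricalCdf(array $data, float $threshold): float
{
    if ($data === []) {
        throw new InvalidArgumentException("Need at least one data point.");
    }
    $atOrBelow = count(
        array_filter($data, static fn($value): bool => $value <= $threshold),
    );
    return $atOrBelow / count($data);
}
// Usage (with the $times array defined earlier in this script):
//   echo round(sketchEmpiricalCdf($times, 113.0) * 100, 1) . "%";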
// ---------------------------------------------------------------
// 3. Classifying each athlete by density region
// ---------------------------------------------------------------
echo PHP_EOL . "=== 3. Athlete classification by density ===" . PHP_EOL . PHP_EOL;
// Use the CDF to assign a percentile to each athlete.
// KDE percentiles follow the actual shape of the data,
// unlike percentiles derived from an assumed normal distribution.
echo str_pad("Rank", 5)
. str_pad("Athlete", 30)
. str_pad("Time", 9)
. str_pad("Pctile", 9)
. "Tier" . PHP_EOL;
echo str_repeat("-", 65) . PHP_EOL;
foreach ($results as $rank => $r) {
$pctile = $F($r["time"]) * 100;
if ($pctile <= 15) {
$tier = "Elite";
} elseif ($pctile <= 40) {
$tier = "Strong";
} elseif ($pctile <= 70) {
$tier = "Mid-pack";
} elseif ($pctile <= 90) {
$tier = "Back";
} else {
$tier = "Outlier";
}
echo str_pad((string) ($rank + 1), 5)
. str_pad($r["name"], 30)
. str_pad(number_format($r["time"], 2) . "s", 9)
. str_pad(round($pctile, 1) . "%", 9)
. $tier
. PHP_EOL;
}
// ---------------------------------------------------------------
// 4. Comparing kernels — does the choice matter here?
// ---------------------------------------------------------------
echo PHP_EOL . "=== 4. Kernel comparison ===" . PHP_EOL . PHP_EOL;
$kernels = [
KdeKernel::Normal,
KdeKernel::Triangular,
KdeKernel::Parabolic,
KdeKernel::Cosine,
];
$evalPoints = [112.0, 113.5, 115.0, 120.0];
echo str_pad("Kernel", 14);
foreach ($evalPoints as $ep) {
echo str_pad(number_format($ep, 1) . "s", 10);
}
echo PHP_EOL . str_repeat("-", 54) . PHP_EOL;
foreach ($kernels as $kernel) {
$fk = Stat::kde($times, $h, $kernel);
echo str_pad($kernel->value, 14);
foreach ($evalPoints as $ep) {
echo str_pad(number_format($fk($ep), 5), 10);
}
echo PHP_EOL;
}
echo PHP_EOL
. "With enough data (34 athletes) the kernel choice has minimal"
. PHP_EOL
. "impact — the bandwidth h matters far more."
. PHP_EOL;
// ---------------------------------------------------------------
// 5. Simulating future races with kdeRandom()
// ---------------------------------------------------------------
echo PHP_EOL . "=== 5. Simulating future races with kdeRandom() ===" . PHP_EOL . PHP_EOL;
// kdeRandom() draws random values from the estimated density.
// This is useful for "what-if" analysis: if the same field raced
// again under similar conditions, what might the results look like?
$nRaces = 10000;
$raceSize = count($times);
$rand = Stat::kdeRandom($times, h: $h, seed: 2026);
echo "Simulating $nRaces races of $raceSize athletes..." . PHP_EOL . PHP_EOL;
$winningTimes = [];
$podiumCuts = [];
for ($race = 0; $race < $nRaces; $race++) {
$simTimes = [];
for ($a = 0; $a < $raceSize; $a++) {
$simTimes[] = $rand();
}
sort($simTimes);
$winningTimes[] = $simTimes[0];
$podiumCuts[] = $simTimes[2]; // 3rd place
}
sort($winningTimes);
sort($podiumCuts);
echo "Winning time distribution (from $nRaces simulations):" . PHP_EOL;
echo " Fastest simulated winner : " . round(min($winningTimes), 2) . "s" . PHP_EOL;
echo " Median winning time : " . round(Stat::median($winningTimes), 2) . "s" . PHP_EOL;
echo " Slowest simulated winner : " . round(max($winningTimes), 2) . "s" . PHP_EOL;
echo " Actual winner : " . $results[0]["time"] . "s ("
. $results[0]["name"] . ")" . PHP_EOL;
echo PHP_EOL . "Podium threshold (3rd-place time):" . PHP_EOL;
echo " Median podium cut-off : " . round(Stat::median($podiumCuts), 2) . "s" . PHP_EOL;
echo " Actual 3rd place : " . $results[2]["time"] . "s ("
. $results[2]["name"] . ")" . PHP_EOL;
// ---------------------------------------------------------------
// 6. Podium probability per athlete
// ---------------------------------------------------------------
echo PHP_EOL . "=== 6. Podium probability per athlete ===" . PHP_EOL . PHP_EOL;
// For each athlete, we simulate many individual runs drawn from
// a personal KDE centered on their actual time. We then count
// how often each athlete's simulated time would beat the simulated
// podium cut-off.
//
// This captures two sources of uncertainty:
// - race-to-race variation across the whole field (podium cut-off)
// - each athlete's own run-to-run variation
$nSim = 50000;
$personalH = 0.5; // personal run-to-run variation (narrower than field)
echo "Estimating podium probability ($nSim simulations per athlete)..." . PHP_EOL;
echo "Personal bandwidth h = {$personalH}s" . PHP_EOL . PHP_EOL;
// $podiumCuts was already sorted above; sorting again here is a harmless no-op
sort($podiumCuts);
$nPodium = count($podiumCuts);
echo str_pad("Athlete", 30)
. str_pad("Actual", 9)
. "P(podium)" . PHP_EOL;
echo str_repeat("-", 52) . PHP_EOL;
// Show top-15 athletes (the realistic podium contenders)
for ($idx = 0; $idx < min(15, count($results)); $idx++) {
$r = $results[$idx];
$athleteSampler = Stat::kdeRandom([$r["time"]], h: $personalH, seed: $idx);
$podiumCount = 0;
for ($s = 0; $s < $nSim; $s++) {
$simTime = $athleteSampler();
// Cycle through the simulated podium cut-offs so each one is used equally often
$cutIdx = $s % $nPodium;
if ($simTime <= $podiumCuts[$cutIdx]) {
$podiumCount++;
}
}
$prob = $podiumCount / $nSim * 100;
echo str_pad($r["name"], 30)
. str_pad(number_format($r["time"], 2) . "s", 9)
. round($prob, 1) . "%"
. PHP_EOL;
}
echo PHP_EOL
. "These probabilities reflect both the athlete's expected pace" . PHP_EOL
. "and the random variation inherent in downhill racing." . PHP_EOL;
================================================
FILE: examples/norm_dist.php
================================================
<?php
require __DIR__ . "/../vendor/autoload.php";
use HiFolks\Statistics\Freq;
use HiFolks\Statistics\NormalDist;
use HiFolks\Statistics\Stat;
/**
* These are the results of the Men's Downhill race at the 2026 Winter
* Olympic Games, stored as an array of athlete names and finishing
* times in seconds.
*/
$results = [
["name" => "Franjo von ALLMEN", "time" => 111.61],
["name" => "Giovanni FRANZONI", "time" => 111.81],
["name" => "Dominik PARIS", "time" => 112.11],
["name" => "Marco ODERMATT", "time" => 112.31],
["name" => "Alexis MONNEY", "time" => 112.36],
["name" => "Vincent KRIECHMAYR", "time" => 112.38],
["name" => "Daniel HEMETSBERGER", "time" => 112.58],
["name" => "Nils ALLEGRE", "time" => 112.8],
["name" => "James CRAWFORD", "time" => 113.0],
["name" => "Kyle NEGOMIR", "time" => 113.2],
["name" => "Mattia CASSE", "time" => 113.28],
["name" => "Miha HROBAT", "time" => 113.3],
["name" => "Bryce BENNETT", "time" => 113.45],
["name" => "Cameron ALEXANDER", "time" => 113.49],
["name" => "Raphael HAASER", "time" => 113.5],
["name" => "Martin CATER", "time" => 113.51],
["name" => "Florian SCHIEDER", "time" => 113.57],
["name" => "Ryan COCHRAN-SIEGLE", "time" => 113.63],
["name" => "Sam MORSE", "time" => 113.68],
["name" => "Elian LEHTO", "time" => 113.83],
["name" => "Simon JOCHER", "time" => 114.01],
["name" => "Nils ALPHAND", "time" => 114.06],
["name" => "Stefan ROGENTIN", "time" => 114.18],
["name" => "Jan ZABYSTRAN", "time" => 114.39],
["name" => "Jeffrey READ", "time" => 114.56],
["name" => "Stefan BABINSKY", "time" => 114.73],
["name" => "Alban ELEZI CANNAFERINA", "time" => 114.9],
["name" => "Brodie SEGER", "time" => 114.96],
["name" => "Marco PFIFFNER", "time" => 115.66],
["name" => "Barnabas SZOLLOS", "time" => 117.03],
["name" => "Arnaud ALESSANDRIA", "time" => 117.15],
["name" => "Elvis OPMANIS", "time" => 119.24],
["name" => "Dmytro SHEPIUK", "time" => 120.11],
["name" => "Cormac COMERFORD", "time" => 124.4],
];
$times = array_column($results, "time");
// --- Descriptive Statistics ---
echo "=== Downhill Race Analysis - Olympic Games 2026 ===" . PHP_EOL . PHP_EOL;
$mean = Stat::mean($times);
$median = Stat::median($times);
$std = Stat::stdev($times);
$min = min($times);
$max = max($times);
$range = $max - $min;
$quartiles = Stat::quantiles($times);
echo "Sample size: " . count($times) . PHP_EOL;
echo "Mean time: " . round($mean, 2) . " seconds" . PHP_EOL;
echo "Median time: " . round($median, 2) . " seconds" . PHP_EOL;
echo "Standard deviation: " . round($std, 2) . " seconds" . PHP_EOL;
echo "Min: "
. $min
. "s | Max: "
. $max
. "s | Range: "
. round($range, 2)
. "s"
. PHP_EOL;
echo "Quartiles (Q1, Q2, Q3): "
. round($quartiles[0], 2)
. "s, "
. round($quartiles[1], 2)
. "s, "
. round($quartiles[2], 2)
. "s"
. PHP_EOL;
echo "Skewness: " . Stat::skewness($times, 4)
. " (positive = right-skewed, a few slow finishers pull the tail right)"
. PHP_EOL;
echo "Kurtosis: " . Stat::kurtosis($times, 4)
. " (positive = leptokurtic, heavy tails with outliers)"
. PHP_EOL;
// --- Normal Distribution Model ---
echo PHP_EOL . "=== Normal Distribution Model ===" . PHP_EOL . PHP_EOL;
$normal = NormalDist::fromSamples($times);
echo "Estimated mu (mean): "
. $normal->getMeanRounded(2)
. " seconds"
. PHP_EOL;
echo "Estimated sigma (std dev): "
. $normal->getSigmaRounded(2)
. " seconds"
. PHP_EOL;
// Compare model median vs actual median
// For a normal distribution, median = mean, so getMedian() returns mu directly.
echo "Model median: " . $normal->getMedianRounded(2) . " seconds" . PHP_EOL;
echo "Actual median: " . round($median, 2) . " seconds" . PHP_EOL;
// Note: the model median equals the mean (as expected for a normal
// distribution), but it differs from the actual median by
// ~0.78 seconds. This gap tells us the data is right-skewed:
// a few very slow finishers (119s, 120s, 124s) pull the mean up.
// A normal distribution assumes symmetry, so it is not a perfect
// fit for this dataset.
// --- Thresholds from the model ---
echo PHP_EOL . "=== Performance Thresholds ===" . PHP_EOL . PHP_EOL;
$eliteThreshold = $normal->invCdfRounded(0.2, 2);
$slowThreshold = $normal->invCdfRounded(0.8, 2);
echo "Top 20% fastest (below): " . $eliteThreshold . " seconds" . PHP_EOL;
echo "Slowest 20% (above): " . $slowThreshold . " seconds" . PHP_EOL;
// --- Probability questions ---
echo PHP_EOL . "=== Probability Questions ===" . PHP_EOL . PHP_EOL;
$target = 113.0;
$probUnder = $normal->cdfRounded($target, 4);
$actualUnder = count(array_filter($times, fn(float $t): bool => $t <= $target));
echo "Model: P(time <= "
. $target
. "s) = "
. round($probUnder * 100, 1)
. "%"
. PHP_EOL;
echo "Actual: "
. $actualUnder
. "/"
. count($times)
. " = "
. round(($actualUnder / count($times)) * 100, 1)
. "%"
. PHP_EOL;
echo "(The gap shows the effect of skewness on the normal model)" . PHP_EOL;
$pdfAt = $normal->pdfRounded($target, 6);
echo "PDF at " . $target . "s = " . $pdfAt . PHP_EOL;
// --- Athlete Tier Classification ---
echo PHP_EOL . "=== Athlete Tier Classification ===" . PHP_EOL . PHP_EOL;
// We use percentile ranks based on the normal model.
// Lower time = better performance = lower percentile.
$tierDefinitions = [
["max" => 0.2, "label" => "Elite"],
["max" => 0.5, "label" => "Strong"],
["max" => 0.8, "label" => "Average"],
["max" => 1.0, "label" => "Below avg"],
];
foreach ($results as $r) {
$time = $r["time"];
$percentile = $normal->cdf($time);
$tier = "Below avg";
foreach ($tierDefinitions as $def) {
if ($percentile <= $def["max"]) {
$tier = $def["label"];
break;
}
}
$z = $normal->zscoreRounded($time, 2);
$zFormatted = ($z >= 0 ? "+" : "") . number_format($z, 2);
echo str_pad($r["name"], 30)
. str_pad(number_format($time, 2) . "s", 10)
. str_pad($tier, 12)
. "z: "
. str_pad($zFormatted, 7)
. "(percentile: "
. min(round($percentile * 100, 1), 99.9)
. "%)"
. PHP_EOL;
}
// --- Frequency Table ---
echo PHP_EOL . "=== Frequency Table (1-second classes) ===" . PHP_EOL . PHP_EOL;
$freqTable = Freq::frequencyTableBySize($times, 1);
foreach ($freqTable as $class => $count) {
echo str_pad($class . "s", 8)
. str_repeat("*", $count)
. " ("
. $count
. ")"
. PHP_EOL;
}
// --- Distribution Shape ---
echo PHP_EOL . "=== Distribution Shape ===" . PHP_EOL . PHP_EOL;
echo "Skewness: " . Stat::skewness($times, 4) . PHP_EOL;
echo "Kurtosis: " . Stat::kurtosis($times, 4) . PHP_EOL;
================================================
FILE: examples/recipes_binomial_approximation.php
================================================
<?php
require __DIR__ . '/../vendor/autoload.php';
use HiFolks\Statistics\NormalDist;
/**
* Recipe: Approximating Binomial Distributions
*
* Adapted from the Python statistics module "Examples and Recipes":
* https://docs.python.org/3/library/statistics.html#examples-and-recipes
*
* NormalDist can be used to approximate binomial distributions
* when the sample size is large (via the Central Limit Theorem).
*
* Scenario: a conference has 750 attendees. 65% prefer Python
* and 35% prefer Ruby. The "Python" room holds 500 people.
* What is the probability that the room will stay within capacity?
*/
echo "=== Approximating Binomial Distributions ===" . PHP_EOL . PHP_EOL;
$n = 750; // Sample size (attendees)
$p = 0.65; // Probability of preferring Python
$q = 1.0 - $p; // Probability of preferring Ruby
$k = 500; // Room capacity
// For a binomial distribution B(n, p):
// mean = n * p
// sigma = sqrt(n * p * q)
$mu = $n * $p;
$sigma = sqrt($n * $p * $q);
echo "Binomial parameters:" . PHP_EOL;
echo " n = " . $n . " (attendees)" . PHP_EOL;
echo " p = " . $p . " (Python preference)" . PHP_EOL;
echo " Expected Python fans: " . $mu . PHP_EOL;
echo " Standard deviation: " . round($sigma, 2) . PHP_EOL;
echo PHP_EOL;
// Normal approximation with continuity correction
$normal = new NormalDist($mu, $sigma);
$probNormal = $normal->cdf($k + 0.5);
echo "Normal approximation: P(X <= " . $k . ") = "
. round($probNormal, 4) . PHP_EOL;
// Exact binomial calculation using log-space arithmetic.
// P(X <= k) = sum from r=0 to k of C(n,r) * p^r * q^(n-r)
// We build log-factorials by summing log(i), which avoids overflowing on large factorials.
// Build log-factorial lookup table
$logFact = [0.0]; // log(0!) = 0
for ($i = 1; $i <= $n; $i++) {
$logFact[$i] = $logFact[$i - 1] + log($i);
}
$logTerms = [];
for ($r = 0; $r <= $k; $r++) {
// log(C(n,r)) = log(n!) - log(r!) - log((n-r)!)
$logBinom = $logFact[$n] - $logFact[$r] - $logFact[$n - $r];
$logTerms[] = $logBinom + $r * log($p) + ($n - $r) * log($q);
}
// Log-sum-exp for numerical stability
$maxLog = max($logTerms);
$sum = 0.0;
foreach ($logTerms as $logTerm) {
$sum += exp($logTerm - $maxLog);
}
$probExact = exp($maxLog + log($sum));
echo "Exact binomial: P(X <= " . $k . ") = "
. round($probExact, 4) . PHP_EOL;
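// The log-sum-exp step above, pulled out as a reusable sketch. Subtracting the
// largest term before exponentiating keeps every exp() argument <= 0, so the
// sum cannot overflow and the dominant term never underflows to zero.
// sketchLogSumExp() is a hypothetical helper name used for illustration.
function sketchLogSumExp(array $logTerms): float
{
    if ($logTerms === []) {
        throw new InvalidArgumentException("Need at least one term.");
    }
    $maxLog = max($logTerms);
    $sum = 0.0;
    foreach ($logTerms as $logTerm) {
        $sum += exp($logTerm - $maxLog);
    }
    return $maxLog + log($sum);
}
// exp(-1000) underflows to 0.0 in double precision, yet the log of the sum
// of two such terms is still recovered as -1000 + log(2).
echo "log-sum-exp of [-1000, -1000]: "
    . sketchLogSumExp([-1000.0, -1000.0]) . PHP_EOL;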
// Monte Carlo simulation approximation
$seed = 8675309;
mt_srand($seed);
$trials = 10_000;
$successes = 0;
for ($i = 0; $i < $trials; $i++) {
$count = 0;
for ($j = 0; $j < $n; $j++) {
if (mt_rand() / mt_getrandmax() < $p) {
$count++;
}
}
if ($count <= $k) {
$successes++;
}
}
$probSimulation = $successes / $trials;
echo "Simulation (" . $trials . " trials): P(X <= " . $k . ") = "
. round($probSimulation, 4) . PHP_EOL;
echo PHP_EOL . "All three methods should give approximately the same result (~0.84)."
. PHP_EOL;
// --- Additional: What capacity is needed for 99% confidence? ---
echo PHP_EOL . "--- Capacity Planning ---" . PHP_EOL;
$needed = $normal->invCdfRounded(0.99, 0);
echo "For 99% confidence, room capacity should be: "
. $needed . " seats" . PHP_EOL;
================================================
FILE: examples/recipes_classic_probability.php
================================================
<?php
require __DIR__ . '/../vendor/autoload.php';
use HiFolks\Statistics\NormalDist;
/**
* Recipe: Classic Probability Problems
*
* Adapted from the Python statistics module "Examples and Recipes":
* https://docs.python.org/3/library/statistics.html#examples-and-recipes
*
* Using NormalDist to solve classic probability problems.
*/
echo "=== Classic Probability Problems ===" . PHP_EOL . PHP_EOL;
// --- SAT scores are normally distributed with mean 1060 and std dev 195 ---
$sat = new NormalDist(1060, 195);
// What percentage of students score between 1100 and 1200?
// Widening the interval by 0.5 on each side applies a continuity correction for discrete (whole-number) scores.
$fraction = $sat->cdf(1200 + 0.5) - $sat->cdf(1100 - 0.5);
echo "Percentage of students scoring between 1100 and 1200: "
. round($fraction * 100, 1) . "%" . PHP_EOL;
// Quartiles: divide SAT scores into 4 equal-probability groups
echo PHP_EOL . "--- SAT Score Quartiles ---" . PHP_EOL;
$quartiles = $sat->quantiles(4);
echo "Quartiles (Q1, Q2, Q3): "
. implode(', ', array_map(round(...), $quartiles))
. PHP_EOL;
// Deciles: divide SAT scores into 10 equal-probability groups
echo PHP_EOL . "--- SAT Score Deciles ---" . PHP_EOL;
$deciles = $sat->quantiles(10);
echo "Deciles: "
. implode(', ', array_map(round(...), $deciles))
. PHP_EOL;
// --- What SAT score is needed to be in the top 10%? ---
echo PHP_EOL . "--- SAT Score Thresholds ---" . PHP_EOL;
$top10 = $sat->invCdfRounded(0.90, 0);
echo "SAT score needed for top 10%: " . $top10 . PHP_EOL;
$top1 = $sat->invCdfRounded(0.99, 0);
echo "SAT score needed for top 1%: " . $top1 . PHP_EOL;
// --- Probability of scoring above a threshold ---
$threshold = 1300;
$probAbove = 1 - $sat->cdf($threshold);
echo PHP_EOL . "Probability of scoring above " . $threshold . ": "
. round($probAbove * 100, 1) . "%" . PHP_EOL;
// --- Z-score for a specific SAT score ---
$score = 1250;
$z = $sat->zscoreRounded($score, 2);
echo "Z-score for SAT score of " . $score . ": " . $z . PHP_EOL;
================================================
FILE: examples/recipes_monte_carlo.php
================================================
<?php
require __DIR__ . '/../vendor/autoload.php';
use HiFolks\Statistics\NormalDist;
use HiFolks\Statistics\Stat;
/**
* Recipe: Monte Carlo Inputs for Simulations
*
* Adapted from the Python statistics module "Examples and Recipes":
* https://docs.python.org/3/library/statistics.html#examples-and-recipes
*
* NormalDist can generate random samples to use as inputs for
* Monte Carlo simulations.
*/
echo "=== Monte Carlo Simulation ===" . PHP_EOL . PHP_EOL;
/**
* A simple model function that combines three uncertain variables.
*/
function model(float $x, float $y, float $z): float
{
return (3 * $x + 7 * $x * $y - 5 * $y) / (11 * $z);
}
$n = 100_000;
// Generate random samples from three independent normal distributions
$X = (new NormalDist(10, 2.5))->samples($n, seed: 3652260728);
$Y = (new NormalDist(15, 1.75))->samples($n, seed: 4582495471);
$Z = (new NormalDist(50, 1.25))->samples($n, seed: 6582483453);
// Compute the model output for each set of inputs
$results = [];
for ($i = 0; $i < $n; $i++) {
$results[] = model($X[$i], $Y[$i], $Z[$i]);
}
// Find the quartiles of the model output distribution
$quantiles = Stat::quantiles($results);
echo "Model output quartiles (Q1, Q2, Q3):" . PHP_EOL;
echo " Q1: " . round($quantiles[0], 4) . PHP_EOL;
echo " Q2: " . round($quantiles[1], 4) . PHP_EOL;
echo " Q3: " . round($quantiles[2], 4) . PHP_EOL;
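// A sketch of how quartiles can be computed from sorted data, following the
// "exclusive" method of Python's statistics.quantiles(), which these recipes
// are adapted from. Whether Stat::quantiles() uses exactly this method by
// default is an assumption; sketchQuantiles() is a hypothetical helper.
function sketchQuantiles(array $data, int $n = 4): array
{
    $count = count($data);
    if ($n < 1 || $count < 2) {
        throw new InvalidArgumentException("Need n >= 1 and at least two data points.");
    }
    sort($data); // sorts the local copy only
    $m = $count + 1;
    $cutPoints = [];
    for ($i = 1; $i < $n; $i++) {
        // 1-based position of the lower neighbour, clamped to the valid range.
        $j = max(1, min($count - 1, intdiv($i * $m, $n)));
        $delta = $i * $m - $j * $n;
        // Linear interpolation between the two neighbouring order statistics.
        $cutPoints[] = ($data[$j - 1] * ($n - $delta) + $data[$j] * $delta) / $n;
    }
    return $cutPoints;
}
// Usage (with the $results array computed above):
//   print_r(sketchQuantiles($results));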
// Basic descriptive statistics of the simulation
echo PHP_EOL . "--- Simulation Summary ---" . PHP_EOL;
echo "Mean: " . round(Stat::mean($results), 4) . PHP_EOL;
echo "Stdev: " . round(Stat::stdev($results), 4) . PHP_EOL;
echo "Min: " . round(min($results), 4) . PHP_EOL;
echo "Max: " . round(max($results), 4) . PHP_EOL;
// Fit a normal distribution to the simulation results
$fitted = NormalDist::fromSamples($results);
echo PHP_EOL . "--- Fitted Normal Distribution ---" . PHP_EOL;
echo "Estimated mu: " . $fitted->getMeanRounded(4) . PHP_EOL;
echo "Estimated sigma: " . $fitted->getSigmaRounded(4) . PHP_EOL;
// Use the fitted distribution to answer probability questions
$threshold = 2.0;
$probAbove = 1 - $fitted->cdf($threshold);
echo PHP_EOL . "P(result > " . $threshold . "): "
. round($probAbove * 100, 1) . "%" . PHP_EOL;
================================================
FILE: examples/recipes_naive_bayes.php
================================================
<?php
require __DIR__ . '/../vendor/autoload.php';
use HiFolks\Statistics\NormalDist;
/**
* Recipe: Naive Bayesian Classifier
*
* Adapted from the Python statistics module "Examples and Recipes":
* https://docs.python.org/3/library/statistics.html#examples-and-recipes
*
* A simple Naive Bayes classifier using NormalDist.
* Given training data for height, weight, and foot size of males
* and females, classify a new person based on their measurements.
*/
echo "=== Naive Bayesian Classifier ===" . PHP_EOL . PHP_EOL;
// --- Training data ---
// Fit normal distributions to each feature for each class
echo "--- Training Phase ---" . PHP_EOL;
$heightMale = NormalDist::fromSamples([6, 5.92, 5.58, 5.92]);
$heightFemale = NormalDist::fromSamples([5, 5.5, 5.42, 5.75]);
$weightMale = NormalDist::fromSamples([180, 190, 170, 165]);
$weightFemale = NormalDist::fromSamples([100, 150, 130, 150]);
$footSizeMale = NormalDist::fromSamples([12, 11, 12, 10]);
$footSizeFemale = NormalDist::fromSamples([6, 8, 7, 9]);
echo "Height (male): mu=" . $heightMale->getMeanRounded(2)
. ", sigma=" . $heightMale->getSigmaRounded(2) . PHP_EOL;
echo "Height (female): mu=" . $heightFemale->getMeanRounded(2)
. ", sigma=" . $heightFemale->getSigmaRounded(2) . PHP_EOL;
echo "Weight (male): mu=" . $weightMale->getMeanRounded(2)
. ", sigma=" . $weightMale->getSigmaRounded(2) . PHP_EOL;
echo "Weight (female): mu=" . $weightFemale->getMeanRounded(2)
. ", sigma=" . $weightFemale->getSigmaRounded(2) . PHP_EOL;
echo "Foot size (male): mu=" . $footSizeMale->getMeanRounded(2)
. ", sigma=" . $footSizeMale->getSigmaRounded(2) . PHP_EOL;
echo "Foot size (female): mu=" . $footSizeFemale->getMeanRounded(2)
. ", sigma=" . $footSizeFemale->getSigmaRounded(2) . PHP_EOL;
// --- Classification ---
echo PHP_EOL . "--- Classification Phase ---" . PHP_EOL . PHP_EOL;
// Person to classify
$ht = 6.0; // height in feet
$wt = 130; // weight in pounds
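// --- Sketch of the remaining steps (not from the original file, which is cut off above) ---
// Assumption: the classification follows the Python "Examples and Recipes" version
// this example is adapted from, i.e. equal priors of 0.5 and a foot size of 8 for
// the person being classified. The names $fs, $priorMale, $priorFemale,
// $posteriorMale, $posteriorFemale, and $predicted are illustrative only and are
// not taken from the repository.
$fs = 8.0; // foot size (value assumed from the Python recipe)

// Prior probability of each class (assumed equal)
$priorMale = 0.5;
$priorFemale = 0.5;

// Unnormalized posterior = prior * product of the per-feature likelihoods,
// where each likelihood is the fitted normal density at the observed value.
$posteriorMale = $priorMale
    * $heightMale->pdf($ht)
    * $weightMale->pdf($wt)
    * $footSizeMale->pdf($fs);
$posteriorFemale = $priorFemale
    * $heightFemale->pdf($ht)
    * $weightFemale->pdf($wt)
    * $footSizeFemale->pdf($fs);

// Maximum a posteriori decision: pick the class with the larger posterior
$predicted = $posteriorMale > $posteriorFemale ? 'male' : 'female';
echo "Prediction: " . $predicted . PHP_EOL;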
SYMBOL INDEX (625 symbols across 27 files)
FILE: examples/article-gpx-running-analysis.php
function parseGpx (line 31) | function parseGpx(string $filePath): array
function haversineDistance (line 72) | function haversineDistance(
function buildKmSplits (line 93) | function buildKmSplits(array $trackpoints): array
function formatPace (line 156) | function formatPace(int|float $seconds): string
FILE: examples/recipes_monte_carlo.php
function model (line 22) | function model(float $x, float $y, float $z): float
FILE: src/ArrUtil.php
class ArrUtil (line 10) | class ArrUtil extends Arr {}
FILE: src/Enums/KdeKernel.php
method resolve (line 21) | public function resolve(): self
FILE: src/Exception/InvalidDataInputException.php
class InvalidDataInputException (line 7) | class InvalidDataInputException extends InvalidArgumentException {}
FILE: src/Freq.php
class Freq (line 7) | class Freq
method isDiscreteType (line 12) | private static function isDiscreteType(mixed $value): bool
method frequencies (line 27) | public static function frequencies(array $data, bool $transformToInteg...
method cumulativeFrequencies (line 51) | public static function cumulativeFrequencies(array $data): array
method relativeFrequencies (line 72) | public static function relativeFrequencies(array $data, ?int $round = ...
method cumulativeRelativeFrequencies (line 92) | public static function cumulativeRelativeFrequencies(array $data): array
method frequencyTableBySize (line 109) | public static function frequencyTableBySize(array $data, int $chunkSiz...
method frequencyTable (line 150) | public static function frequencyTable(array $data, ?int $category = nu...
FILE: src/Math.php
class Math (line 10) | class Math extends UtilsMath {}
FILE: src/NormalDist.php
class NormalDist (line 7) | class NormalDist
method __construct (line 14) | public function __construct(private readonly float $mu = 0.0, float $s...
method getMean (line 23) | public function getMean(): float
method getMeanRounded (line 28) | public function getMeanRounded(int $precision = 3): float
method getSigma (line 34) | public function getSigma(): float
method getSigmaRounded (line 39) | public function getSigmaRounded(int $precision = 3): float
method getMedian (line 45) | public function getMedian(): float
method getMedianRounded (line 50) | public function getMedianRounded(int $precision = 3): float
method getMode (line 56) | public function getMode(): float
method getModeRounded (line 61) | public function getModeRounded(int $precision = 3): float
method getVariance (line 67) | public function getVariance(): float
method getVarianceRounded (line 72) | public function getVarianceRounded(int $precision = 3): float
method fromSamples (line 92) | public static function fromSamples(array $samples): self
method zscore (line 110) | public function zscore(float $x): float
method zscoreRounded (line 119) | public function zscoreRounded(float $x, int $precision = 3): float
method samples (line 135) | public function samples(int $n, ?int $seed = null): array
method pdf (line 158) | public function pdf(float $x): float
method pdfRounded (line 166) | public function pdfRounded(float $x, int $precision = 3): float
method erfc (line 172) | private function erfc(float $z): float
method erf (line 178) | private function erf(float $z): float
method cdf (line 196) | public function cdf(float $x): float
method cdfRounded (line 203) | public function cdfRounded(float $x, int $precision = 3): float
method invCdf (line 220) | public function invCdf(float $p): float
method invCdfRounded (line 286) | public function invCdfRounded(float $p, int $precision = 3): float
method quantiles (line 303) | public function quantiles(int $n = 4): array
method overlap (line 328) | public function overlap(NormalDist $other): float
method overlapRounded (line 362) | public function overlapRounded(NormalDist $other, int $precision = 3):...
method add (line 378) | public function add(float|NormalDist $x2): NormalDist
method subtract (line 401) | public function subtract(float|NormalDist $x2): NormalDist
method multiply (line 423) | public function multiply(float $constant): NormalDist
method divide (line 442) | public function divide(float $constant): NormalDist
FILE: src/Stat.php
class Stat (line 12) | class Stat
method count (line 25) | public static function count(array $data): int
method mean (line 43) | public static function mean(array $data): int|float|null
method fmean (line 72) | public static function fmean(
method trimmedMean (line 133) | public static function trimmedMean(
method median (line 174) | public static function median(
method weightedMedian (line 210) | public static function weightedMedian(array $data, array $weights, ?in...
method medianGrouped (line 278) | public static function medianGrouped(
method bisectLeft (line 313) | private static function bisectLeft(array $data, float $target): int
method bisectRight (line 337) | private static function bisectRight(
method medianLow (line 367) | public static function medianLow(array $data): mixed
method medianHigh (line 384) | public static function medianHigh(array $data): mixed
method mode (line 400) | public static function mode(array $data, bool $multimode = false): mixed
method multimode (line 436) | public static function multimode(array $data): ?array
method quantiles (line 452) | public static function quantiles(
method firstQuartile (line 511) | public static function firstQuartile(array $data, ?int $round = null):...
method thirdQuartile (line 527) | public static function thirdQuartile(array $data): mixed
method percentile (line 547) | public static function percentile(
method pstdev (line 599) | public static function pstdev(array $data, ?int $round = null): float
method pvariance (line 615) | public static function pvariance(
method stdev (line 646) | public static function stdev(array $data, ?int $round = null): float
method sem (line 665) | public static function sem(array $data, ?int $round = null): float
method meanAbsoluteDeviation (line 682) | public static function meanAbsoluteDeviation(array $data, ?int $round ...
method medianAbsoluteDeviation (line 711) | public static function medianAbsoluteDeviation(array $data, ?int $roun...
method zscores (line 739) | public static function zscores(array $data, ?int $round = null): array
method outliers (line 770) | public static function outliers(array $data, float $threshold = 3.0): ...
method iqrOutliers (line 796) | public static function iqrOutliers(array $data, float $factor = 1.5): ...
method variance (line 823) | public static function variance(
method skewness (line 861) | public static function skewness(array $data, ?int $round = null): float
method pskewness (line 901) | public static function pskewness(array $data, ?int $round = null): float
method kurtosis (line 947) | public static function kurtosis(array $data, ?int $round = null): float
method coefficientOfVariation (line 991) | public static function coefficientOfVariation(
method geometricMean (line 1019) | public static function geometricMean(array $data, ?int $round = null):...
method harmonicMean (line 1044) | public static function harmonicMean(
method covariance (line 1077) | public static function covariance(array $x, array $y): false|float
method correlation (line 1132) | public static function correlation(
method ranks (line 1190) | private static function ranks(array $data): array
method kde (line 1231) | public static function kde(
method kdeRandom (line 1416) | public static function kdeRandom(
method linearRegression (line 1642) | public static function linearRegression(
method logarithmicRegression (line 1706) | public static function logarithmicRegression(
method powerRegression (line 1736) | public static function powerRegression(
method exponentialRegression (line 1776) | public static function exponentialRegression(
method rSquared (line 1805) | public static function rSquared(array $x, array $y, bool $proportional...
method confidenceInterval (line 1861) | public static function confidenceInterval(
method zTest (line 1908) | public static function zTest(
method tTest (line 1952) | public static function tTest(
method tTestTwoSample (line 1997) | public static function tTestTwoSample(
method tTestPaired (line 2062) | public static function tTestPaired(
FILE: src/Statistics.php
class Statistics (line 9) | class Statistics
method __construct (line 33) | public function __construct(
method make (line 44) | public static function make(array $values): self
method stripZeroes (line 52) | public function stripZeroes(): self
method originalArray (line 64) | public function originalArray(): array
method frequencies (line 78) | public function frequencies(bool $transformToInteger = false): array
method relativeFrequencies (line 91) | public function relativeFrequencies(?int $round = null): array
method cumulativeRelativeFrequencies (line 103) | public function cumulativeRelativeFrequencies(): array
method cumulativeFrequencies (line 115) | public function cumulativeFrequencies(): array
method max (line 123) | public function max(): mixed
method min (line 134) | public function min(): mixed
method range (line 145) | public function range(): int|float
method count (line 153) | public function count(): int
method mean (line 163) | public function mean(): int|float|null
method trimmedMean (line 176) | public function trimmedMean(float $proportionToCut = 0.1, ?int $round ...
method median (line 186) | public function median(): mixed
method weightedMedian (line 199) | public function weightedMedian(array $weights, ?int $round = null): float
method medianGrouped (line 211) | public function medianGrouped(float $interval = 1.0): float
method firstQuartile (line 221) | public function firstQuartile(): mixed
method thirdQuartile (line 231) | public function thirdQuartile(): mixed
method interquartileRange (line 239) | public function interquartileRange(): mixed
method mode (line 249) | public function mode(): mixed
method stdev (line 261) | public function stdev(?int $round = null): float
method sem (line 273) | public function sem(?int $round = null): float
method confidenceInterval (line 287) | public function confidenceInterval(float $confidenceLevel = 0.95, ?int...
method zTest (line 302) | public function zTest(float $populationMean, Alternative $alternative ...
method tTest (line 317) | public function tTest(float $populationMean, Alternative $alternative ...
method tTestTwoSample (line 332) | public function tTestTwoSample(array $data2, Alternative $alternative ...
method tTestPaired (line 347) | public function tTestPaired(array $data2, Alternative $alternative = A...
method meanAbsoluteDeviation (line 359) | public function meanAbsoluteDeviation(?int $round = null): float
method medianAbsoluteDeviation (line 371) | public function medianAbsoluteDeviation(?int $round = null): float
method zscores (line 384) | public function zscores(?int $round = null): array
method outliers (line 397) | public function outliers(float $threshold = 3.0): array
method iqrOutliers (line 410) | public function iqrOutliers(float $factor = 1.5): array
method variance (line 422) | public function variance(?int $round = null): float
method pstdev (line 434) | public function pstdev(?int $round = null): float
method pvariance (line 446) | public function pvariance(?int $round = null): float
method skewness (line 458) | public function skewness(?int $round = null): float
method pskewness (line 470) | public function pskewness(?int $round = null): float
method kurtosis (line 482) | public function kurtosis(?int $round = null): float
method percentile (line 495) | public function percentile(float $p, ?int $round = null): float
method coefficientOfVariation (line 508) | public function coefficientOfVariation(?int $round = null, bool $popul...
method geometricMean (line 520) | public function geometricMean(?int $round = null): float
method harmonicMean (line 533) | public function harmonicMean(?int $round = null, ?array $weights = nul...
method valuesToString (line 541) | public function valuesToString(bool|int $sample = false): string
method numericalArray (line 551) | public function numericalArray(): array
FILE: src/StreamingStat.php
class StreamingStat (line 16) | class StreamingStat
method add (line 37) | public function add(int|float $value): self
method count (line 70) | public function count(): int
method sum (line 78) | public function sum(): float
method min (line 90) | public function min(): float
method max (line 102) | public function max(): float
method mean (line 114) | public function mean(?int $round = null): float
method variance (line 126) | public function variance(?int $round = null): float
method pvariance (line 140) | public function pvariance(?int $round = null): float
method stdev (line 152) | public function stdev(?int $round = null): float
method pstdev (line 160) | public function pstdev(?int $round = null): float
method skewness (line 168) | public function skewness(?int $round = null): float
method pskewness (line 194) | public function pskewness(?int $round = null): float
method kurtosis (line 214) | public function kurtosis(?int $round = null): float
FILE: src/StudentT.php
class StudentT (line 7) | class StudentT
method __construct (line 9) | public function __construct(private readonly float $df)
method getDegreesOfFreedom (line 16) | public function getDegreesOfFreedom(): float
method pdf (line 24) | public function pdf(float $t): float
method pdfRounded (line 34) | public function pdfRounded(float $t, int $precision = 3): float
method cdf (line 44) | public function cdf(float $t): float
method cdfRounded (line 57) | public function cdfRounded(float $t, int $precision = 3): float
method invCdf (line 69) | public function invCdf(float $p): float
method invCdfRounded (line 98) | public function invCdfRounded(float $p, int $precision = 3): float
method logGamma (line 106) | private static function logGamma(float $x): float
method regularizedIncompleteBeta (line 141) | private function regularizedIncompleteBeta(float $a, float $b, float $...
method incompleteBetaCf (line 160) | private function incompleteBetaCf(float $a, float $b, float $x): float
FILE: src/Utils/Arr.php
class Arr (line 5) | class Arr
method toString (line 12) | public static function toString(array $data, bool|int $sample = false)...
method stripZeroes (line 27) | public static function stripZeroes(array $data): array
method extract (line 46) | public static function extract(array $data, array $columns): array
method partition (line 69) | public static function partition(array $data, string $field, string $o...
method compare (line 87) | private static function compare(mixed $fieldValue, string $operator, m...
FILE: src/Utils/Format.php
class Format (line 5) | class Format
method secondsToHms (line 12) | public static function secondsToHms(int|float $seconds): array
method hmsToSeconds (line 25) | public static function hmsToSeconds(int $hours, int $minutes, int $sec...
method secondsToTime (line 33) | public static function secondsToTime(int|float $seconds): string
method timeToSeconds (line 43) | public static function timeToSeconds(string $time): int
FILE: src/Utils/Math.php
class Math (line 5) | class Math
method round (line 10) | public static function round(float $value, ?int $round): float
method isOdd (line 18) | public static function isOdd(int $number): bool
FILE: tests/ArrTest.php
class ArrTest (line 8) | class ArrTest extends TestCase
method test_extract_single_column (line 10) | public function test_extract_single_column(): void
method test_extract_multiple_columns (line 21) | public function test_extract_multiple_columns(): void
method test_extract_empty_array (line 33) | public function test_extract_empty_array(): void
method test_partition_equals (line 39) | public function test_partition_equals(): void
method test_partition_not_equals (line 53) | public function test_partition_not_equals(): void
method test_partition_greater_than (line 66) | public function test_partition_greater_than(): void
method test_partition_less_than (line 79) | public function test_partition_less_than(): void
method test_partition_greater_than_or_equal (line 92) | public function test_partition_greater_than_or_equal(): void
method test_partition_less_than_or_equal (line 105) | public function test_partition_less_than_or_equal(): void
method test_partition_empty_array (line 118) | public function test_partition_empty_array(): void
method test_partition_preserves_full_rows (line 125) | public function test_partition_preserves_full_rows(): void
method test_partition_invalid_operator (line 135) | public function test_partition_invalid_operator(): void
method test_to_string (line 144) | public function test_to_string(): void
method test_to_string_with_sample (line 149) | public function test_to_string_with_sample(): void
method test_strip_zeroes (line 154) | public function test_strip_zeroes(): void
FILE: tests/FormatTest.php
class FormatTest (line 8) | class FormatTest extends TestCase
method test_seconds_to_hms (line 10) | public function test_seconds_to_hms(): void
method test_seconds_to_hms_zero (line 18) | public function test_seconds_to_hms_zero(): void
method test_seconds_to_hms_with_float (line 26) | public function test_seconds_to_hms_with_float(): void
method test_hms_to_seconds (line 34) | public function test_hms_to_seconds(): void
method test_hms_to_seconds_zero (line 39) | public function test_hms_to_seconds_zero(): void
method test_seconds_to_time (line 44) | public function test_seconds_to_time(): void
method test_seconds_to_time_with_padding (line 49) | public function test_seconds_to_time_with_padding(): void
method test_time_to_seconds (line 54) | public function test_time_to_seconds(): void
method test_time_to_seconds_with_leading_zeros (line 59) | public function test_time_to_seconds_with_leading_zeros(): void
method test_round_trip_seconds (line 64) | public function test_round_trip_seconds(): void
method test_round_trip_hms (line 72) | public function test_round_trip_hms(): void
method test_time_to_seconds_invalid_format (line 81) | public function test_time_to_seconds_invalid_format(): void
method test_time_to_seconds_invalid_format_too_many_parts (line 87) | public function test_time_to_seconds_invalid_format_too_many_parts(): ...
FILE: tests/FreqTest.php
class FreqTest (line 8) | class FreqTest extends TestCase
method test_can_calculate_freq_table (line 10) | public function test_can_calculate_freq_table(): void
method test_can_calculate_relative_freq_table (line 27) | public function test_can_calculate_relative_freq_table(): void
method test_can_calculate_grouped_frequency_table (line 44) | public function test_can_calculate_grouped_frequency_table(): void
method test_can_calculate_grouped_frequency_table_by_size (line 75) | public function test_can_calculate_grouped_frequency_table_by_size(): ...
method test_frequency_table_with_empty_array (line 96) | public function test_frequency_table_with_empty_array(): void
method test_frequency_table_by_size_with_empty_array (line 101) | public function test_frequency_table_by_size_with_empty_array(): void
FILE: tests/FrequenciesTest.php
class FrequenciesTest (line 9) | class FrequenciesTest extends TestCase
method test_can_calculate_frequencies (line 11) | public function test_can_calculate_frequencies(): void
method test_can_calculate_relative_frequencies (line 21) | public function test_can_calculate_relative_frequencies(): void
method test_can_calculate_cumulative_frequencies (line 32) | public function test_can_calculate_cumulative_frequencies(): void
method test_can_calculate_cumulative_relative_frequencies (line 43) | public function test_can_calculate_cumulative_relative_frequencies(): ...
method test_can_calculate_first_quartile (line 54) | public function test_can_calculate_first_quartile(): void
method test_can_calculate_first_quartile_with_empty_array (line 63) | public function test_can_calculate_first_quartile_with_empty_array(): ...
method test_can_calculate_third_quartile (line 70) | public function test_can_calculate_third_quartile(): void
method test_can_calculate_third_quartile_with_empty_array (line 79) | public function test_can_calculate_third_quartile_with_empty_array(): ...
FILE: tests/MathTest.php
class MathTest (line 8) | class MathTest extends TestCase
method test_is_odd (line 10) | public function test_is_odd(): void
FILE: tests/NormalDistTest.php
class NormalDistTest (line 8) | class NormalDistTest extends TestCase
method test_init_normal_dist (line 10) | public function test_init_normal_dist(): void
method test_can_calculate_normal_dist_cdf (line 17) | public function test_can_calculate_normal_dist_cdf(): void
method test_can_calculate_normal_dist_pdf (line 23) | public function test_can_calculate_normal_dist_pdf(): void
method test_median (line 30) | public function test_median(): void
method test_median_from_samples (line 41) | public function test_median_from_samples(): void
method test_mode (line 49) | public function test_mode(): void
method test_mode_from_samples (line 60) | public function test_mode_from_samples(): void
method test_variance (line 68) | public function test_variance(): void
method test_variance_from_samples (line 81) | public function test_variance_from_samples(): void
method test_load_normal_dist_from_samples (line 88) | public function test_load_normal_dist_from_samples(): void
method test_add_to_normal_dist (line 96) | public function test_add_to_normal_dist(): void
method test_multiply_normal_dist (line 107) | public function test_multiply_normal_dist(): void
method test_subtract_constant_from_normal_dist (line 117) | public function test_subtract_constant_from_normal_dist(): void
method test_subtract_normal_dist (line 127) | public function test_subtract_normal_dist(): void
method test_divide_normal_dist (line 137) | public function test_divide_normal_dist(): void
method test_divide_preserves_original (line 146) | public function test_divide_preserves_original(): void
method test_divide_by_zero_throws (line 157) | public function test_divide_by_zero_throws(): void
method test_quantiles_default_quartiles (line 164) | public function test_quantiles_default_quartiles(): void
method test_quantiles_deciles (line 175) | public function test_quantiles_deciles(): void
method test_quantiles_percentiles (line 190) | public function test_quantiles_percentiles(): void
method test_quantiles_n_one_returns_empty (line 199) | public function test_quantiles_n_one_returns_empty(): void
method test_quantiles_throws_for_invalid_n (line 206) | public function test_quantiles_throws_for_invalid_n(): void
method test_samples_count (line 213) | public function test_samples_count(): void
method test_samples_statistical_properties (line 220) | public function test_samples_statistical_properties(): void
method test_samples_seed_reproducibility (line 232) | public function test_samples_seed_reproducibility(): void
method test_samples_throws_for_invalid_n (line 240) | public function test_samples_throws_for_invalid_n(): void
method test_zscore (line 247) | public function test_zscore(): void
method test_zscore_standard_normal (line 260) | public function test_zscore_standard_normal(): void
method test_zscore_rounded (line 268) | public function test_zscore_rounded(): void
method test_zscore_throws_for_zero_sigma (line 274) | public function test_zscore_throws_for_zero_sigma(): void
method test_overlap_identical_distributions (line 281) | public function test_overlap_identical_distributions(): void
method test_overlap_different_means (line 288) | public function test_overlap_different_means(): void
method test_overlap_equal_variances (line 296) | public function test_overlap_equal_variances(): void
method test_overlap_far_apart_distributions (line 308) | public function test_overlap_far_apart_distributions(): void
method test_overlap_is_symmetric (line 316) | public function test_overlap_is_symmetric(): void
method test_inv_cdf_standard_normal (line 327) | public function test_inv_cdf_standard_normal(): void
method test_inv_cdf_custom_distribution (line 339) | public function test_inv_cdf_custom_distribution(): void
method test_inv_cdf_throws_for_invalid_p (line 350) | public function test_inv_cdf_throws_for_invalid_p(): void
method test_inv_cdf_throws_for_p_equals_one (line 357) | public function test_inv_cdf_throws_for_p_equals_one(): void
method test_inv_cdf_extreme_tails (line 364) | public function test_inv_cdf_extreme_tails(): void
method test_constructor_negative_sigma_throws (line 373) | public function test_constructor_negative_sigma_throws(): void
method test_from_samples_empty_throws (line 379) | public function test_from_samples_empty_throws(): void
method test_cdf_rounded (line 385) | public function test_cdf_rounded(): void
method test_overlap_zero_sigma_throws (line 393) | public function test_overlap_zero_sigma_throws(): void
FILE: tests/StatDatasetTest.php
class StatDatasetTest (line 9) | class StatDatasetTest extends TestCase
method test_mean (line 11) | public function test_mean(): void
method test_mean_chain (line 17) | public function test_mean_chain(): void
method test_mean_dataset (line 24) | #[DataProvider('meanDatasetProvider')]
method meanDatasetProvider (line 31) | public static function meanDatasetProvider(): array
method test_dynamic_operation (line 40) | #[DataProvider('dynamicOperationProvider')]
method dynamicOperationProvider (line 47) | public static function dynamicOperationProvider(): array
method test_dynamic_operation_with_external_dataset (line 60) | #[DataProvider('externalDatasetProvider')]
method externalDatasetProvider (line 75) | public static function externalDatasetProvider(): array
FILE: tests/StatFromCsvTest.php
class StatFromCsvTest (line 8) | class StatFromCsvTest extends TestCase
method test_parse_csv (line 10) | public function test_parse_csv(): void
FILE: tests/StatTest.php
class StatTest (line 11) | class StatTest extends TestCase
method test_calculates_mean (line 13) | public function test_calculates_mean(): void
method test_calculates_fmean (line 21) | public function test_calculates_fmean(): void
method test_calculates_fmean_with_empty_array (line 47) | public function test_calculates_fmean_with_empty_array(): void
method test_fmean_empty_data_throws (line 53) | public function test_fmean_empty_data_throws(): void
method test_fmean_mismatched_weights_throws (line 59) | public function test_fmean_mismatched_weights_throws(): void
method test_fmean_zero_weight_sum_throws (line 65) | public function test_fmean_zero_weight_sum_throws(): void
method test_calculates_median (line 71) | public function test_calculates_median(): void
method test_calculates_median_with_empty_array (line 83) | public function test_calculates_median_with_empty_array(): void
method test_calculates_median_low (line 89) | public function test_calculates_median_low(): void
method test_calculates_median_low_with_empty_array (line 96) | public function test_calculates_median_low_with_empty_array(): void
method test_calculates_median_high (line 102) | public function test_calculates_median_high(): void
method test_calculates_median_high_with_empty_array (line 109) | public function test_calculates_median_high_with_empty_array(): void
method test_calculates_median_grouped (line 115) | public function test_calculates_median_grouped(): void
method test_calculates_median_grouped_with_empty_array (line 143) | public function test_calculates_median_grouped_with_empty_array(): void
method test_calculates_mode (line 149) | public function test_calculates_mode(): void
method test_calculates_mode_with_empty_array (line 156) | public function test_calculates_mode_with_empty_array(): void
method test_calculates_multimode (line 162) | public function test_calculates_multimode(): void
method test_calculates_multimode_with_empty_array (line 175) | public function test_calculates_multimode_with_empty_array(): void
method test_calculates_population_standard_deviation (line 181) | public function test_calculates_population_standard_deviation(): void
method test_calculates_population_standard_deviation_with_empty_array (line 189) | public function test_calculates_population_standard_deviation_with_emp...
method test_calculates_sample_standard_deviation (line 195) | public function test_calculates_sample_standard_deviation(): void
method test_calculates_sample_standard_deviation_with_empty_array (line 202) | public function test_calculates_sample_standard_deviation_with_empty_a...
method test_calculates_sample_standard_deviation_with_single_element (line 208) | public function test_calculates_sample_standard_deviation_with_single_...
method test_calculates_variance (line 214) | public function test_calculates_variance(): void
method test_calculates_variance_with_precomputed_mean (line 219) | public function test_calculates_variance_with_precomputed_mean(): void
method test_calculates_pvariance (line 229) | public function test_calculates_pvariance(): void
method test_calculates_pvariance_with_precomputed_mean (line 235) | public function test_calculates_pvariance_with_precomputed_mean(): void
method test_calculates_skewness_symmetric (line 245) | public function test_calculates_skewness_symmetric(): void
method test_calculates_skewness_right_skewed (line 250) | public function test_calculates_skewness_right_skewed(): void
method test_calculates_skewness_left_skewed (line 256) | public function test_calculates_skewness_left_skewed(): void
method test_calculates_skewness_with_rounding (line 262) | public function test_calculates_skewness_with_rounding(): void
method test_skewness_with_empty_array (line 269) | public function test_skewness_with_empty_array(): void
method test_skewness_with_two_elements (line 275) | public function test_skewness_with_two_elements(): void
method test_skewness_with_identical_values (line 281) | public function test_skewness_with_identical_values(): void
method test_calculates_pskewness_symmetric (line 287) | public function test_calculates_pskewness_symmetric(): void
method test_calculates_pskewness_right_skewed (line 292) | public function test_calculates_pskewness_right_skewed(): void
method test_calculates_pskewness_left_skewed (line 298) | public function test_calculates_pskewness_left_skewed(): void
method test_calculates_pskewness_with_rounding (line 304) | public function test_calculates_pskewness_with_rounding(): void
method test_pskewness_with_empty_array (line 311) | public function test_pskewness_with_empty_array(): void
method test_pskewness_with_two_elements (line 317) | public function test_pskewness_with_two_elements(): void
method test_pskewness_with_identical_values (line 323) | public function test_pskewness_with_identical_values(): void
method test_pskewness_less_than_skewness_for_small_samples (line 329) | public function test_pskewness_less_than_skewness_for_small_samples():...
method test_calculates_kurtosis_normal_like (line 338) | public function test_calculates_kurtosis_normal_like(): void
method test_calculates_kurtosis_heavy_tails (line 344) | public function test_calculates_kurtosis_heavy_tails(): void
method test_calculates_kurtosis_light_tails (line 351) | public function test_calculates_kurtosis_light_tails(): void
method test_calculates_kurtosis_with_rounding (line 358) | public function test_calculates_kurtosis_with_rounding(): void
method test_kurtosis_with_empty_array (line 364) | public function test_kurtosis_with_empty_array(): void
method test_kurtosis_with_three_elements (line 370) | public function test_kurtosis_with_three_elements(): void
method test_kurtosis_with_identical_values (line 376) | public function test_kurtosis_with_identical_values(): void
method test_calculates_geometric_mean (line 382) | public function test_calculates_geometric_mean(): void
method test_calculates_geometric_mean_with_empty_array (line 387) | public function test_calculates_geometric_mean_with_empty_array(): void
method test_calculates_harmonic_mean (line 393) | public function test_calculates_harmonic_mean(): void
method test_calculates_harmonic_mean_with_empty_array (line 401) | public function test_calculates_harmonic_mean_with_empty_array(): void
method test_calculates_quantiles (line 407) | public function test_calculates_quantiles(): void
method test_calculates_quantiles_with_too_few_elements (line 430) | public function test_calculates_quantiles_with_too_few_elements(): void
method test_calculates_quantiles_with_invalid_n (line 436) | public function test_calculates_quantiles_with_invalid_n(): void
method test_calculates_quantiles_inclusive (line 442) | public function test_calculates_quantiles_inclusive(): void
method test_calculates_quantiles_with_invalid_method (line 461) | public function test_calculates_quantiles_with_invalid_method(): void
method test_calculates_first_quartile (line 467) | public function test_calculates_first_quartile(): void
method test_calculates_first_quartile_with_empty_array (line 473) | public function test_calculates_first_quartile_with_empty_array(): void
method test_calculates_covariance (line 479) | public function test_calculates_covariance(): void
method test_calculates_covariance_wrong_usage (line 500) | public function test_calculates_covariance_wrong_usage(): void
method test_calculates_covariance_with_empty_arrays (line 509) | public function test_calculates_covariance_with_empty_arrays(): void
method test_calculates_covariance_with_single_element (line 515) | public function test_calculates_covariance_with_single_element(): void
method test_calculates_covariance_with_non_numeric_first (line 521) | public function test_calculates_covariance_with_non_numeric_first(): void
method test_calculates_covariance_with_non_numeric_second (line 528) | public function test_calculates_covariance_with_non_numeric_second(): ...
method test_calculates_correlation (line 535) | public function test_calculates_correlation(): void
method test_calculates_spearman_correlation (line 566) | public function test_calculates_spearman_correlation(): void
method test_calculates_spearman_correlation_planets (line 596) | public function test_calculates_spearman_correlation_planets(): void
method test_calculates_spearman_correlation_with_ties (line 620) | public function test_calculates_spearman_correlation_with_ties(): void
method test_calculates_correlation_invalid_method (line 632) | public function test_calculates_correlation_invalid_method(): void
method test_calculates_correlation_wrong_usage_different_lengths (line 642) | public function test_calculates_correlation_wrong_usage_different_leng...
method test_calculates_correlation_wrong_usage_empty (line 651) | public function test_calculates_correlation_wrong_usage_empty(): void
method test_calculates_correlation_wrong_usage_single (line 657) | public function test_calculates_correlation_wrong_usage_single(): void
method test_calculates_correlation_wrong_usage_constant (line 663) | public function test_calculates_correlation_wrong_usage_constant(): void
method test_calculates_linear_regression (line 669) | public function test_calculates_linear_regression(): void
method test_calculates_linear_regression_with_single_element (line 691) | public function test_calculates_linear_regression_with_single_element(...
method test_calculates_linear_regression_with_different_lengths (line 697) | public function test_calculates_linear_regression_with_different_lengt...
method test_calculates_linear_regression_with_constant_x (line 703) | public function test_calculates_linear_regression_with_constant_x(): void
method test_calculates_proportional_linear_regression (line 709) | public function test_calculates_proportional_linear_regression(): void
method test_proportional_linear_regression_with_all_zeros_x (line 730) | public function test_proportional_linear_regression_with_all_zeros_x()...
method test_r_squared_perfect_fit (line 736) | public function test_r_squared_perfect_fit(): void
method test_r_squared_real_data (line 742) | public function test_r_squared_real_data(): void
method test_r_squared_with_rounding (line 751) | public function test_r_squared_with_rounding(): void
method test_r_squared_proportional (line 761) | public function test_r_squared_proportional(): void
method test_r_squared_with_different_lengths (line 771) | public function test_r_squared_with_different_lengths(): void
method test_r_squared_with_single_element (line 777) | public function test_r_squared_with_single_element(): void
method test_r_squared_with_constant_y (line 783) | public function test_r_squared_with_constant_y(): void
method test_logarithmic_regression (line 789) | public function test_logarithmic_regression(): void
method test_logarithmic_regression_running_pace (line 800) | public function test_logarithmic_regression_running_pace(): void
method test_logarithmic_regression_diminishing_values (line 817) | public function test_logarithmic_regression_diminishing_values(): void
method test_logarithmic_regression_with_non_positive_x (line 827) | public function test_logarithmic_regression_with_non_positive_x(): void
method test_logarithmic_regression_with_negative_x (line 833) | public function test_logarithmic_regression_with_negative_x(): void
method test_power_regression (line 839) | public function test_power_regression(): void
method test_power_regression_with_non_positive_x (line 850) | public function test_power_regression_with_non_positive_x(): void
method test_power_regression_with_non_positive_y (line 856) | public function test_power_regression_with_non_positive_y(): void
method test_exponential_regression (line 862) | public function test_exponential_regression(): void
method test_exponential_regression_with_non_positive_y (line 873) | public function test_exponential_regression_with_non_positive_y(): void
method test_confidence_interval_95 (line 879) | public function test_confidence_interval_95(): void
method test_confidence_interval_99 (line 889) | public function test_confidence_interval_99(): void
method test_confidence_interval_with_rounding (line 898) | public function test_confidence_interval_with_rounding(): void
method test_confidence_interval_narrows_with_more_data (line 906) | public function test_confidence_interval_narrows_with_more_data(): void
method test_confidence_interval_single_element_throws (line 915) | public function test_confidence_interval_single_element_throws(): void
method test_confidence_interval_empty_throws (line 921) | public function test_confidence_interval_empty_throws(): void
method test_confidence_interval_invalid_confidence_level_throws (line 927) | public function test_confidence_interval_invalid_confidence_level_thro...
method test_confidence_interval_confidence_level_one_throws (line 934) | public function test_confidence_interval_confidence_level_one_throws()...
method test_confidence_interval_confidence_level_above_one_throws (line 941) | public function test_confidence_interval_confidence_level_above_one_th...
method test_confidence_interval_negative_confidence_level_throws (line 948) | public function test_confidence_interval_negative_confidence_level_thr...
method test_z_test_two_sided (line 957) | public function test_z_test_two_sided(): void
method test_z_test_greater (line 969) | public function test_z_test_greater(): void
method test_z_test_less (line 978) | public function test_z_test_less(): void
method test_z_test_non_significant (line 986) | public function test_z_test_non_significant(): void
method test_z_test_with_rounding (line 995) | public function test_z_test_with_rounding(): void
method test_z_test_single_element_throws (line 1003) | public function test_z_test_single_element_throws(): void
method test_z_test_empty_throws (line 1009) | public function test_z_test_empty_throws(): void
method test_t_test_two_sided (line 1017) | public function test_t_test_two_sided(): void
method test_t_test_greater (line 1031) | public function test_t_test_greater(): void
method test_t_test_less (line 1040) | public function test_t_test_less(): void
method test_t_test_non_significant (line 1048) | public function test_t_test_non_significant(): void
method test_t_test_degrees_of_freedom (line 1057) | public function test_t_test_degrees_of_freedom(): void
method test_t_test_large_sample_converges_to_z_test (line 1064) | public function test_t_test_large_sample_converges_to_z_test(): void
method test_t_test_with_rounding (line 1073) | public function test_t_test_with_rounding(): void
method test_t_test_single_element_throws (line 1081) | public function test_t_test_single_element_throws(): void
method test_t_test_empty_throws (line 1087) | public function test_t_test_empty_throws(): void
method test_t_test_two_sample_two_sided (line 1095) | public function test_t_test_two_sample_two_sided(): void
method test_t_test_two_sample_equal_means (line 1108) | public function test_t_test_two_sample_equal_means(): void
method test_t_test_two_sample_significant_difference (line 1117) | public function test_t_test_two_sample_significant_difference(): void
method test_t_test_two_sample_greater (line 1128) | public function test_t_test_two_sample_greater(): void
method test_t_test_two_sample_less (line 1137) | public function test_t_test_two_sample_less(): void
method test_t_test_two_sample_unequal_sizes (line 1146) | public function test_t_test_two_sample_unequal_sizes(): void
method test_t_test_two_sample_with_rounding (line 1156) | public function test_t_test_two_sample_with_rounding(): void
method test_t_test_two_sample_welch_df (line 1165) | public function test_t_test_two_sample_welch_df(): void
method test_t_test_two_sample_single_element_throws (line 1176) | public function test_t_test_two_sample_single_element_throws(): void
method test_t_test_two_sample_empty_throws (line 1182) | public function test_t_test_two_sample_empty_throws(): void
method test_t_test_two_sample_zero_variance_throws (line 1188) | public function test_t_test_two_sample_zero_variance_throws(): void
method test_t_test_paired_two_sided (line 1196) | public function test_t_test_paired_two_sided(): void
method test_t_test_paired_no_difference (line 1211) | public function test_t_test_paired_no_difference(): void
method test_t_test_paired_significant (line 1221) | public function test_t_test_paired_significant(): void
method test_t_test_paired_greater (line 1232) | public function test_t_test_paired_greater(): void
method test_t_test_paired_less (line 1241) | public function test_t_test_paired_less(): void
method test_t_test_paired_with_rounding (line 1250) | public function test_t_test_paired_with_rounding(): void
method test_t_test_paired_different_lengths_throws (line 1259) | public function test_t_test_paired_different_lengths_throws(): void
method test_t_test_paired_single_element_throws (line 1265) | public function test_t_test_paired_single_element_throws(): void
method test_t_test_paired_empty_throws (line 1271) | public function test_t_test_paired_empty_throws(): void
method test_kde_normal (line 1277) | public function test_kde_normal(): void
method test_kde_all_kernels (line 1301) | public function test_kde_all_kernels(): void
method test_kde_cumulative (line 1313) | public function test_kde_cumulative(): void
method test_kde_aliases (line 1332) | public function test_kde_aliases(): void
method test_kde_empty_data (line 1358) | public function test_kde_empty_data(): void
method test_kde_invalid_bandwidth (line 1364) | public function test_kde_invalid_bandwidth(): void
method test_kde_invalid_bandwidth_negative (line 1370) | public function test_kde_invalid_bandwidth_negative(): void
method test_kde_random_returns_callable (line 1376) | public function test_kde_random_returns_callable(): void
method test_kde_random_all_kernels (line 1386) | public function test_kde_random_all_kernels(): void
method test_kde_random_seed_reproducibility (line 1398) | public function test_kde_random_seed_reproducibility(): void
method test_kde_random_aliases (line 1417) | public function test_kde_random_aliases(): void
method test_kde_random_known_output (line 1448) | public function test_kde_random_known_output(): void
method test_kde_random_statistical_properties (line 1463) | public function test_kde_random_statistical_properties(): void
method test_kde_random_empty_data (line 1479) | public function test_kde_random_empty_data(): void
method test_kde_random_invalid_bandwidth (line 1485) | public function test_kde_random_invalid_bandwidth(): void
method test_covariance_non_numeric_x_throws (line 1491) | public function test_covariance_non_numeric_x_throws(): void
method test_covariance_non_numeric_y_throws (line 1499) | public function test_covariance_non_numeric_y_throws(): void
method test_kde_cumulative_bounded_kernels (line 1505) | public function test_kde_cumulative_bounded_kernels(): void
method test_kde_random_quartic_covers_small_p (line 1539) | public function test_kde_random_quartic_covers_small_p(): void
method test_kde_random_triweight_covers_both_signs (line 1550) | public function test_kde_random_triweight_covers_both_signs(): void
method test_kde_random_triangular_covers_both_branches (line 1561) | public function test_kde_random_triangular_covers_both_branches(): void
method test_percentile_median_matches (line 1574) | public function test_percentile_median_matches(): void
method test_percentile_quartiles (line 1581) | public function test_percentile_quartiles(): void
method test_percentile_boundaries (line 1590) | public function test_percentile_boundaries(): void
method test_percentile_rounding (line 1597) | public function test_percentile_rounding(): void
method test_percentile_too_few_data_throws (line 1604) | public function test_percentile_too_few_data_throws(): void
method test_percentile_out_of_range_throws (line 1610) | public function test_percentile_out_of_range_throws(): void
method test_percentile_negative_throws (line 1616) | public function test_percentile_negative_throws(): void
method test_coefficient_of_variation (line 1624) | public function test_coefficient_of_variation(): void
method test_coefficient_of_variation_population (line 1631) | public function test_coefficient_of_variation_population(): void
method test_coefficient_of_variation_rounding (line 1638) | public function test_coefficient_of_variation_rounding(): void
method test_coefficient_of_variation_low_dispersion (line 1645) | public function test_coefficient_of_variation_low_dispersion(): void
method test_coefficient_of_variation_zero_mean_throws (line 1653) | public function test_coefficient_of_variation_zero_mean_throws(): void
method test_coefficient_of_variation_negative_mean (line 1659) | public function test_coefficient_of_variation_negative_mean(): void
method test_coefficient_of_variation_too_few_data_throws (line 1667) | public function test_coefficient_of_variation_too_few_data_throws(): void
method test_trimmed_mean_basic (line 1675) | public function test_trimmed_mean_basic(): void
method test_trimmed_mean_zero_trim_equals_mean (line 1684) | public function test_trimmed_mean_zero_trim_equals_mean(): void
method test_trimmed_mean_with_rounding (line 1694) | public function test_trimmed_mean_with_rounding(): void
method test_trimmed_mean_removes_outliers (line 1701) | public function test_trimmed_mean_removes_outliers(): void
method test_trimmed_mean_empty_throws (line 1709) | public function test_trimmed_mean_empty_throws(): void
method test_trimmed_mean_proportion_too_high_throws (line 1715) | public function test_trimmed_mean_proportion_too_high_throws(): void
method test_trimmed_mean_negative_proportion_throws (line 1721) | public function test_trimmed_mean_negative_proportion_throws(): void
method test_weighted_median_basic (line 1729) | public function test_weighted_median_basic(): void
method test_weighted_median_skewed_weights (line 1735) | public function test_weighted_median_skewed_weights(): void
method test_weighted_median_unsorted_data (line 1741) | public function test_weighted_median_unsorted_data(): void
method test_weighted_median_interpolation_at_midpoint (line 1751) | public function test_weighted_median_interpolation_at_midpoint(): void
method test_weighted_median_single_element (line 1759) | public function test_weighted_median_single_element(): void
method test_weighted_median_with_rounding (line 1764) | public function test_weighted_median_with_rounding(): void
method test_weighted_median_empty_throws (line 1770) | public function test_weighted_median_empty_throws(): void
method test_weighted_median_length_mismatch_throws (line 1776) | public function test_weighted_median_length_mismatch_throws(): void
method test_weighted_median_negative_weight_throws (line 1782) | public function test_weighted_median_negative_weight_throws(): void
method test_weighted_median_zero_weight_throws (line 1788) | public function test_weighted_median_zero_weight_throws(): void
method test_sem (line 1796) | public function test_sem(): void
method test_sem_with_rounding (line 1803) | public function test_sem_with_rounding(): void
method test_sem_decreases_with_larger_sample (line 1810) | public function test_sem_decreases_with_larger_sample(): void
method test_sem_too_few_data_throws (line 1817) | public function test_sem_too_few_data_throws(): void
method test_mean_absolute_deviation (line 1825) | public function test_mean_absolute_deviation(): void
method test_mean_absolute_deviation_single_element (line 1831) | public function test_mean_absolute_deviation_single_element(): void
method test_mean_absolute_deviation_identical_values (line 1836) | public function test_mean_absolute_deviation_identical_values(): void
method test_mean_absolute_deviation_with_rounding (line 1841) | public function test_mean_absolute_deviation_with_rounding(): void
method test_mean_absolute_deviation_less_than_stdev (line 1847) | public function test_mean_absolute_deviation_less_than_stdev(): void
method test_mean_absolute_deviation_empty_throws (line 1854) | public function test_mean_absolute_deviation_empty_throws(): void
method test_median_absolute_deviation (line 1862) | public function test_median_absolute_deviation(): void
method test_median_absolute_deviation_with_outlier (line 1868) | public function test_median_absolute_deviation_with_outlier(): void
method test_median_absolute_deviation_single_element (line 1882) | public function test_median_absolute_deviation_single_element(): void
method test_median_absolute_deviation_identical_values (line 1887) | public function test_median_absolute_deviation_identical_values(): void
method test_median_absolute_deviation_with_rounding (line 1892) | public function test_median_absolute_deviation_with_rounding(): void
method test_median_absolute_deviation_empty_throws (line 1898) | public function test_median_absolute_deviation_empty_throws(): void
method test_zscores (line 1906) | public function test_zscores(): void
method test_zscores_sum_to_zero (line 1919) | public function test_zscores_sum_to_zero(): void
method test_zscores_with_rounding (line 1926) | public function test_zscores_with_rounding(): void
method test_zscores_identical_values_throws (line 1935) | public function test_zscores_identical_values_throws(): void
method test_zscores_too_few_data_throws (line 1941) | public function test_zscores_too_few_data_throws(): void
method test_outliers_detects_extreme_values (line 1949) | public function test_outliers_detects_extreme_values(): void
method test_outliers_no_outliers (line 1956) | public function test_outliers_no_outliers(): void
method test_outliers_custom_threshold (line 1963) | public function test_outliers_custom_threshold(): void
method test_outliers_identical_values_throws (line 1971) | public function test_outliers_identical_values_throws(): void
method test_iqr_outliers_detects_extreme_values (line 1979) | public function test_iqr_outliers_detects_extreme_values(): void
method test_iqr_outliers_no_outliers (line 1990) | public function test_iqr_outliers_no_outliers(): void
method test_iqr_outliers_custom_factor (line 1997) | public function test_iqr_outliers_custom_factor(): void
method test_iqr_outliers_identical_values (line 2006) | public function test_iqr_outliers_identical_values(): void
method test_iqr_outliers_with_negative_values (line 2014) | public function test_iqr_outliers_with_negative_values(): void
method test_iqr_outliers_too_few_data_throws (line 2022) | public function test_iqr_outliers_too_few_data_throws(): void
FILE: tests/StatisticTest.php
class StatisticTest (line 11) | class StatisticTest extends TestCase
method test_can_calculate_statistics (line 13) | public function test_can_calculate_statistics(): void
method test_can_calculate_statistics_again (line 36) | public function test_can_calculate_statistics_again(): void
method test_can_calculate_statistics_again_and_again (line 52) | public function test_can_calculate_statistics_again_and_again(): void
method test_can_strip_zeros (line 79) | public function test_can_strip_zeros(): void
method test_can_calculate_mean (line 87) | public function test_can_calculate_mean(): void
method test_can_calculate_mean_again (line 101) | public function test_can_calculate_mean_again(): void
method test_can_values_to_string (line 116) | public function test_can_values_to_string(): void
method test_calculates_population_standard_deviation (line 123) | public function test_calculates_population_standard_deviation(): void
method test_calculates_population_standard_deviation_with_empty_array (line 137) | public function test_calculates_population_standard_deviation_with_emp...
method test_calculates_sample_standard_deviation (line 143) | public function test_calculates_sample_standard_deviation(): void
method test_calculates_sample_standard_deviation_with_empty_array (line 153) | public function test_calculates_sample_standard_deviation_with_empty_a...
method test_calculates_sample_standard_deviation_with_single_element (line 159) | public function test_calculates_sample_standard_deviation_with_single_...
method test_calculates_variance (line 165) | public function test_calculates_variance(): void
method test_calculates_pvariance (line 173) | public function test_calculates_pvariance(): void
method test_calculates_skewness (line 182) | public function test_calculates_skewness(): void
method test_calculates_pskewness (line 187) | public function test_calculates_pskewness(): void
method test_calculates_kurtosis (line 192) | public function test_calculates_kurtosis(): void
method test_calculates_geometric_mean (line 198) | public function test_calculates_geometric_mean(): void
method test_calculates_geometric_mean_with_empty_array (line 204) | public function test_calculates_geometric_mean_with_empty_array(): void
method test_calculates_harmonic_mean (line 210) | public function test_calculates_harmonic_mean(): void
method test_calculates_harmonic_mean_with_empty_array (line 215) | public function test_calculates_harmonic_mean_with_empty_array(): void
method test_can_distinct_numeric_array (line 221) | public function test_can_distinct_numeric_array(): void
method test_median_grouped (line 230) | public function test_median_grouped(): void
method test_max_with_empty_array (line 237) | public function test_max_with_empty_array(): void
method test_min_with_empty_array (line 242) | public function test_min_with_empty_array(): void
method test_percentile (line 247) | public function test_percentile(): void
method test_percentile_with_rounding (line 255) | public function test_percentile_with_rounding(): void
method test_coefficient_of_variation (line 262) | public function test_coefficient_of_variation(): void
method test_coefficient_of_variation_with_rounding (line 272) | public function test_coefficient_of_variation_with_rounding(): void
method test_trimmed_mean (line 279) | public function test_trimmed_mean(): void
method test_trimmed_mean_with_rounding (line 286) | public function test_trimmed_mean_with_rounding(): void
method test_weighted_median (line 293) | public function test_weighted_median(): void
method test_sem (line 302) | public function test_sem(): void
method test_confidence_interval (line 309) | public function test_confidence_interval(): void
method test_confidence_interval_with_params (line 318) | public function test_confidence_interval_with_params(): void
method test_mean_absolute_deviation (line 327) | public function test_mean_absolute_deviation(): void
method test_median_absolute_deviation (line 333) | public function test_median_absolute_deviation(): void
method test_zscores (line 339) | public function test_zscores(): void
method test_outliers (line 347) | public function test_outliers(): void
method test_iqr_outliers (line 353) | public function test_iqr_outliers(): void
method test_z_test (line 361) | public function test_z_test(): void
method test_z_test_with_params (line 370) | public function test_z_test_with_params(): void
method test_t_test (line 379) | public function test_t_test(): void
method test_t_test_with_params (line 389) | public function test_t_test_with_params(): void
FILE: tests/StreamingStatTest.php
class StreamingStatTest (line 10) | class StreamingStatTest extends TestCase
method fromArray (line 18) | private function fromArray(array $data): StreamingStat
method test_matches_stat_mean (line 28) | public function test_matches_stat_mean(): void
method test_matches_stat_variance (line 36) | public function test_matches_stat_variance(): void
method test_matches_stat_pvariance (line 44) | public function test_matches_stat_pvariance(): void
method test_matches_stat_stdev (line 52) | public function test_matches_stat_stdev(): void
method test_matches_stat_pstdev (line 60) | public function test_matches_stat_pstdev(): void
method test_matches_stat_skewness (line 68) | public function test_matches_stat_skewness(): void
method test_matches_stat_pskewness (line 76) | public function test_matches_stat_pskewness(): void
method test_matches_stat_kurtosis (line 84) | public function test_matches_stat_kurtosis(): void
method test_rounding (line 92) | public function test_rounding(): void
method test_chaining (line 107) | public function test_chaining(): void
method test_empty_mean_throws (line 114) | public function test_empty_mean_throws(): void
method test_one_element_variance_throws (line 120) | public function test_one_element_variance_throws(): void
method test_two_elements_skewness_throws (line 129) | public function test_two_elements_skewness_throws(): void
method test_three_elements_kurtosis_throws (line 139) | public function test_three_elements_kurtosis_throws(): void
method test_insufficient_data_pskewness_throws (line 149) | public function test_insufficient_data_pskewness_throws(): void
method test_identical_values_pskewness_throws (line 156) | public function test_identical_values_pskewness_throws(): void
method test_identical_values_skewness_throws (line 163) | public function test_identical_values_skewness_throws(): void
method test_identical_values_kurtosis_throws (line 170) | public function test_identical_values_kurtosis_throws(): void
method test_large_dataset (line 177) | public function test_large_dataset(): void
method test_count (line 198) | public function test_count(): void
method test_negative_values (line 206) | public function test_negative_values(): void
method test_pvariance_single_element (line 216) | public function test_pvariance_single_element(): void
method test_empty_pvariance_throws (line 222) | public function test_empty_pvariance_throws(): void
method test_sum (line 228) | public function test_sum(): void
method test_empty_sum_throws (line 235) | public function test_empty_sum_throws(): void
method test_min (line 241) | public function test_min(): void
method test_empty_min_throws (line 248) | public function test_empty_min_throws(): void
method test_max (line 254) | public function test_max(): void
method test_empty_max_throws (line 261) | public function test_empty_max_throws(): void
method test_min_max_single_element (line 267) | public function test_min_max_single_element(): void
method test_min_max_negative_values (line 274) | public function test_min_max_negative_values(): void
FILE: tests/StudentTTest.php
class StudentTTest (line 10) | class StudentTTest extends TestCase
method test_constructor_valid_df (line 14) | public function test_constructor_valid_df(): void
method test_constructor_fractional_df (line 20) | public function test_constructor_fractional_df(): void
method test_constructor_zero_df_throws (line 26) | public function test_constructor_zero_df_throws(): void
method test_constructor_negative_df_throws (line 32) | public function test_constructor_negative_df_throws(): void
method test_pdf_df1_cauchy (line 40) | public function test_pdf_df1_cauchy(): void
method test_pdf_df5 (line 48) | public function test_pdf_df5(): void
method test_pdf_df30 (line 55) | public function test_pdf_df30(): void
method test_pdf_symmetry (line 63) | public function test_pdf_symmetry(): void
method test_pdf_tails (line 70) | public function test_pdf_tails(): void
method test_pdf_rounded (line 78) | public function test_pdf_rounded(): void
method test_cdf_at_zero (line 86) | public function test_cdf_at_zero(): void
method test_cdf_df1_cauchy (line 95) | public function test_cdf_df1_cauchy(): void
method test_cdf_df5_known_values (line 103) | public function test_cdf_df5_known_values(): void
method test_cdf_monotonicity (line 110) | public function test_cdf_monotonicity(): void
method test_cdf_converges_to_normal_for_large_df (line 121) | public function test_cdf_converges_to_normal_for_large_df(): void
method test_cdf_rounded (line 135) | public function test_cdf_rounded(): void
method test_inv_cdf_round_trip (line 143) | public function test_inv_cdf_round_trip(): void
method test_inv_cdf_symmetry (line 151) | public function test_inv_cdf_symmetry(): void
method test_inv_cdf_median (line 164) | public function test_inv_cdf_median(): void
method test_inv_cdf_throws_for_p_zero (line 173) | public function test_inv_cdf_throws_for_p_zero(): void
method test_inv_cdf_throws_for_p_one (line 180) | public function test_inv_cdf_throws_for_p_one(): void
method test_inv_cdf_rounded (line 187) | public function test_inv_cdf_rounded(): void
method test_inv_cdf_extreme_tail (line 194) | public function test_inv_cdf_extreme_tail(): void
method test_pdf_fractional_df_triggers_loggamma_reflection (line 205) | public function test_pdf_fractional_df_triggers_loggamma_reflection():...
method test_cdf_very_large_t_value (line 217) | public function test_cdf_very_large_t_value(): void
method test_cdf_negative_very_large_t_value (line 224) | public function test_cdf_negative_very_large_t_value(): void
method test_cdf_df1_wide_range (line 231) | public function test_cdf_df1_wide_range(): void
method test_cdf_df2_known_values (line 247) | public function test_cdf_df2_known_values(): void
method test_cdf_very_high_df (line 259) | public function test_cdf_very_high_df(): void
method test_cdf_small_t_many_df_values (line 268) | public function test_cdf_small_t_many_df_values(): void
method test_cdf_symmetry_identity (line 281) | public function test_cdf_symmetry_identity(): void
Condensed preview: 62 files, each showing path, character count, and a content snippet. The full structured content is 506K chars.
[
{
"path": ".editorconfig",
"chars": 220,
"preview": "root = true\n\n[*]\ncharset = utf-8\nindent_size = 4\nindent_style = space\nend_of_line = lf\ninsert_final_newline = true\ntrim_"
},
{
"path": ".gitattributes",
"chars": 586,
"preview": "# Path-based git attributes\n# https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html\n\n# Ignore all test and"
},
{
"path": ".github/CONTRIBUTING.md",
"chars": 2972,
"preview": "# Contributing\n\nContributions are **welcome** and will be fully **credited**.\n\nPlease read and understand the contributi"
},
{
"path": ".github/ISSUE_TEMPLATE/config.yml",
"chars": 511,
"preview": "blank_issues_enabled: false\ncontact_links:\n - name: Request a new feature\n url: https://github.com/hi-folks/stat"
},
{
"path": ".github/SECURITY.md",
"chars": 1957,
"preview": "# Package Security Policy\n\n## Reporting Security Issues\n\nIf you discover any security-related issues within our package,"
},
{
"path": ".github/dependabot.yml",
"chars": 321,
"preview": "# Please see the documentation for all configuration options:\n# https://help.github.com/github/administering-a-repositor"
},
{
"path": ".github/workflows/dependabot-auto-merge.yml",
"chars": 1029,
"preview": "name: dependabot-auto-merge\non: pull_request_target\n\npermissions:\n pull-requests: write\n contents: write\n\njobs:\n depe"
},
{
"path": ".github/workflows/run-tests.yml",
"chars": 1003,
"preview": "name: Tests\n\non: [push, pull_request]\n\njobs:\n test:\n runs-on: ${{ matrix.os }}\n strategy:\n fail-fast: true\n "
},
{
"path": ".github/workflows/static-code-analysis.yml",
"chars": 1072,
"preview": "name: Static Code Analysis\n\non: [push, pull_request]\n\njobs:\n test:\n runs-on: ${{ matrix.os }}\n strategy:\n fa"
},
{
"path": ".gitignore",
"chars": 153,
"preview": ".idea\n.php_cs\n.php_cs.cache\n.phpunit.result.cache\nbuild\nbin\ncomposer.lock\ncoverage\ndocs\nphpunit.xml\npsalm.xml\nvendor\n.ph"
},
{
"path": ".php-cs-fixer.dist.php",
"chars": 1124,
"preview": "<?php\n\n$finder = new PhpCsFixer\\Finder()->in([\n __DIR__ . \"/src\",\n __DIR__ . \"/tests\",\n __DIR__ . \"/examples\",\n"
},
{
"path": "CHANGELOG.md",
"chars": 9451,
"preview": "# Changelog\n\n## 1.5.0 - 2026-03-07\n- Adding `logarithmicRegression()`, `powerRegression()`, and `exponentialRegression()"
},
{
"path": "CODE_OF_CONDUCT.md",
"chars": 5821,
"preview": "# Contributor Covenant Code of Conduct\n\n## Our Commitment\n\nWe, as members, contributors, and leaders, are committed to e"
},
{
"path": "CONTRIBUTING.md",
"chars": 3299,
"preview": "# Contributing\n\nYour contributions are highly appreciated, and they will be duly recognized.\n\nBefore you proceed to crea"
},
{
"path": "LICENSE.md",
"chars": 1096,
"preview": "The MIT License (MIT)\n\nCopyright (c) hi-folks <roberto.butti@gmail.com>\n\nPermission is hereby granted, free of charge, t"
},
{
"path": "README.md",
"chars": 58932,
"preview": "<p align=\"center\">\n <img src=\"https://repository-images.githubusercontent.com/445609326/e2539776-0f8f-4556-be1d-887ea"
},
{
"path": "TODO.md",
"chars": 600,
"preview": "## Missing Functions\n\n\n\n\n### Correlation & Regression\n\n\n- Kendall tau correlation - another rank-based correlation\n- Mul"
},
{
"path": "composer.json",
"chars": 1549,
"preview": "{\n \"name\": \"hi-folks/statistics\",\n \"description\": \"PHP package that provides functions for calculating mathematica"
},
{
"path": "examples/article-boston-marathon-analysis.php",
"chars": 30122,
"preview": "<?php\n\n/**\n * Analyzing 75,000 Boston Marathon Runners with PHP Statistics\n *\n * This script accompanies the article tha"
},
{
"path": "examples/article-downhill-ski-analysis.php",
"chars": 17905,
"preview": "<?php\n\n/**\n * Exploring Olympic Downhill Results with PHP Statistics\n *\n * This script accompanies the article:\n * https"
},
{
"path": "examples/article-gpx-running-analysis.php",
"chars": 20894,
"preview": "<?php\n\n/**\n * Analyze Your Running Performance with GPX Data and PHP Statistics\n *\n * This script shows how to parse a G"
},
{
"path": "examples/freq_methods.php",
"chars": 774,
"preview": "<?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\n$data = [55, 70, 57, 73, 55, 59, 64, 72,\n 60, 48, 58, 54, 69, 51"
},
{
"path": "examples/frequencies.php",
"chars": 1095,
"preview": "<?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\nuse HiFolks\\Statistics\\Freq;\nuse HiFolks\\Statistics\\Statistics;\n\n$f"
},
{
"path": "examples/kde.php",
"chars": 6270,
"preview": "<?php\n\nrequire __DIR__ . \"/../vendor/autoload.php\";\n\nuse HiFolks\\Statistics\\Enums\\KdeKernel;\nuse HiFolks\\Statistics\\Stat"
},
{
"path": "examples/kde_downhill.php",
"chars": 11034,
"preview": "<?php\n\nrequire __DIR__ . \"/../vendor/autoload.php\";\n\nuse HiFolks\\Statistics\\Enums\\KdeKernel;\nuse HiFolks\\Statistics\\Stat"
},
{
"path": "examples/norm_dist.php",
"chars": 6850,
"preview": "<?php\n\nrequire __DIR__ . \"/../vendor/autoload.php\";\n\nuse HiFolks\\Statistics\\Freq;\nuse HiFolks\\Statistics\\NormalDist;\nuse"
},
{
"path": "examples/recipes_binomial_approximation.php",
"chars": 3202,
"preview": "<?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\nuse HiFolks\\Statistics\\NormalDist;\n\n/**\n * Recipe: Approximating Bi"
},
{
"path": "examples/recipes_classic_probability.php",
"chars": 2011,
"preview": "<?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\nuse HiFolks\\Statistics\\NormalDist;\n\n/**\n * Recipe: Classic Probabil"
},
{
"path": "examples/recipes_monte_carlo.php",
"chars": 2242,
"preview": "<?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\nuse HiFolks\\Statistics\\NormalDist;\nuse HiFolks\\Statistics\\Stat;\n\n/*"
},
{
"path": "examples/recipes_naive_bayes.php",
"chars": 4134,
"preview": "<?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\nuse HiFolks\\Statistics\\NormalDist;\n\n/**\n * Recipe: Naive Bayesian C"
},
{
"path": "examples/stat.php",
"chars": 318,
"preview": "<?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\nuse HiFolks\\Statistics\\Freq;\nuse HiFolks\\Statistics\\Stat;\n\n$freq = "
},
{
"path": "examples/stat_methods.php",
"chars": 1684,
"preview": "<?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\nuse HiFolks\\Statistics\\Stat;\n\n$mean = Stat::mean([1, 2, 3, 4, 4]);\n"
},
{
"path": "phpstan.neon",
"chars": 145,
"preview": "includes:\n - vendor/phpstan/phpstan-phpunit/extension.neon\n\nparameters:\n\tlevel: 8\n\ttreatPhpDocTypesAsCertain: false\n\t"
},
{
"path": "phpunit.xml.dist",
"chars": 1024,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<phpunit\n xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n xsi:noName"
},
{
"path": "rector.php",
"chars": 768,
"preview": "<?php\n\ndeclare(strict_types=1);\n\nuse Rector\\CodeQuality\\Rector\\Class_\\InlineConstructorDefaultToPropertyRector;\nuse Rect"
},
{
"path": "src/ArrUtil.php",
"chars": 168,
"preview": "<?php\n\nnamespace HiFolks\\Statistics;\n\nuse HiFolks\\Statistics\\Utils\\Arr;\n\n/**\n * @deprecated Use \\HiFolks\\Statistics\\Util"
},
{
"path": "src/Enums/Alternative.php",
"chars": 160,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Enums;\n\nenum Alternative: string\n{\n case TwoSided = 'two-sided';\n case Greater"
},
{
"path": "src/Enums/KdeKernel.php",
"chars": 795,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Enums;\n\nenum KdeKernel: string\n{\n case Normal = 'normal';\n case Gauss = 'gauss"
},
{
"path": "src/Exception/InvalidDataInputException.php",
"chars": 147,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Exception;\n\nuse InvalidArgumentException;\n\nclass InvalidDataInputException extends I"
},
{
"path": "src/Freq.php",
"chars": 5516,
"preview": "<?php\n\nnamespace HiFolks\\Statistics;\n\nuse HiFolks\\Statistics\\Utils\\Math;\n\nclass Freq\n{\n /**\n * Return true is the"
},
{
"path": "src/Math.php",
"chars": 186,
"preview": "<?php\n\nnamespace HiFolks\\Statistics;\n\nuse HiFolks\\Statistics\\Utils\\Math as UtilsMath;\n\n/**\n * @deprecated Use \\HiFolks\\S"
},
{
"path": "src/NormalDist.php",
"chars": 15259,
"preview": "<?php\n\nnamespace HiFolks\\Statistics;\n\nuse HiFolks\\Statistics\\Exception\\InvalidDataInputException;\n\nclass NormalDist\n{\n "
},
{
"path": "src/Stat.php",
"chars": 70188,
"preview": "<?php\n\nnamespace HiFolks\\Statistics;\n\nuse HiFolks\\Statistics\\Enums\\Alternative;\nuse HiFolks\\Statistics\\Enums\\KdeKernel;\n"
},
{
"path": "src/Statistics.php",
"chars": 15288,
"preview": "<?php\n\nnamespace HiFolks\\Statistics;\n\nuse HiFolks\\Statistics\\Enums\\Alternative;\nuse HiFolks\\Statistics\\Exception\\Invalid"
},
{
"path": "src/StreamingStat.php",
"chars": 6373,
"preview": "<?php\n\nnamespace HiFolks\\Statistics;\n\nuse HiFolks\\Statistics\\Exception\\InvalidDataInputException;\nuse HiFolks\\Statistics"
},
{
"path": "src/StudentT.php",
"chars": 5961,
"preview": "<?php\n\nnamespace HiFolks\\Statistics;\n\nuse HiFolks\\Statistics\\Exception\\InvalidDataInputException;\n\nclass StudentT\n{\n "
},
{
"path": "src/Utils/Arr.php",
"chars": 2797,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Utils;\n\nclass Arr\n{\n /**\n * Returns a string with values joined with a separa"
},
{
"path": "src/Utils/Format.php",
"chars": 1562,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Utils;\n\nclass Format\n{\n /**\n * Convert seconds to an associative array with h"
},
{
"path": "src/Utils/Math.php",
"chars": 440,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Utils;\n\nclass Math\n{\n /**\n * Rounds value with the given precision, if the ro"
},
{
"path": "tests/ArrTest.php",
"chars": 4328,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Utils\\Arr;\nuse PHPUnit\\Framework\\TestCase;\n\nclass Arr"
},
{
"path": "tests/FormatTest.php",
"chars": 2654,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Utils\\Format;\nuse PHPUnit\\Framework\\TestCase;\n\nclass "
},
{
"path": "tests/FreqTest.php",
"chars": 4429,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Freq;\nuse PHPUnit\\Framework\\TestCase;\n\nclass FreqTest"
},
{
"path": "tests/FrequenciesTest.php",
"chars": 2457,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Exception\\InvalidDataInputException;\nuse HiFolks\\Stat"
},
{
"path": "tests/MathTest.php",
"chars": 376,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Utils\\Math;\nuse PHPUnit\\Framework\\TestCase;\n\nclass Ma"
},
{
"path": "tests/NormalDistTest.php",
"chars": 14231,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\NormalDist;\nuse PHPUnit\\Framework\\TestCase;\n\nclass No"
},
{
"path": "tests/StatDatasetTest.php",
"chars": 2586,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Stat;\nuse PHPUnit\\Framework\\Attributes\\DataProvider;\n"
},
{
"path": "tests/StatFromCsvTest.php",
"chars": 1347,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Stat;\nuse PHPUnit\\Framework\\TestCase;\n\nclass StatFrom"
},
{
"path": "tests/StatTest.php",
"chars": 70808,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Enums\\Alternative;\nuse HiFolks\\Statistics\\Exception\\I"
},
{
"path": "tests/StatisticTest.php",
"chars": 14487,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Enums\\Alternative;\nuse HiFolks\\Statistics\\Exception\\I"
},
{
"path": "tests/StreamingStatTest.php",
"chars": 8837,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Exception\\InvalidDataInputException;\nuse HiFolks\\Stat"
},
{
"path": "tests/StudentTTest.php",
"chars": 9372,
"preview": "<?php\n\nnamespace HiFolks\\Statistics\\Tests;\n\nuse HiFolks\\Statistics\\Exception\\InvalidDataInputException;\nuse HiFolks\\Stat"
},
{
"path": "tests/data/income.data.csv",
"chars": 20272,
"preview": "\"\",\"income\",\"happiness\"\r\n\"1\",3.86264741839841,2.31448898284741\r\n\"2\",4.97938138246536,3.43348975853174\r\n\"3\",4.92395693622"
}
]
About this extraction
This page contains the full source code of the Hi-Folks/statistics GitHub repository, extracted and formatted as plain text. The extraction includes 62 files (471.9 KB), approximately 147.7k tokens, and a symbol index with 625 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract, built by Nikandr Surkov.