PitchIQ — MiLB Pitch Intelligence

Zone Classification Methodology & Accuracy

PitchIQ Filth+ — Pitch Bat-Missing Model

What it is. Filth+ grades a pitch's ability to miss bats from its physical characteristics alone — velocity, induced vertical break (IVB), horizontal break (HB), vertical approach angle (VAA), release height and side, extension, and spin rate — plus its velocity and movement differential off the pitcher's own fastball. It is scaled so 100 = average and every 10 points = one standard deviation, against the full pitch population, so grades are directly comparable across pitch types. Filth+ is the first bat-missing model built natively for minor league arms, including the Complex League.
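As a minimal sketch of the scale described above (100 = average, 10 points per standard deviation), the conversion from a raw model output to a plus-style grade might look like the following. The function name `to_plus_scale` and the choice of predicted whiff probability as the raw score are illustrative assumptions, not PitchIQ's production code.

```python
from statistics import mean, pstdev


def to_plus_scale(raw_scores):
    """Map raw model scores to a 100-centered scale, 10 points per SD.

    Assumption: the mean and SD are taken over the whole population passed
    in, mirroring the "against the full pitch population" wording.
    """
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    if sigma == 0:
        raise ValueError("raw scores need some variation to be scaled")
    return [100 + 10 * (x - mu) / sigma for x in raw_scores]


# Hypothetical predicted whiff probabilities for five pitches.
grades = to_plus_scale([0.20, 0.25, 0.30, 0.35, 0.40])
```

Because every pitch is scaled against one shared mean and SD in this sketch, a 110 means the same distance from average regardless of pitch type.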

How it is built. The model is trained on 2.24 million MLB pitches with full Hawk-Eye Statcast tracking, conditioned on swings only — because location, not stuff, drives whether a hitter offers at a pitch in the first place. Following the approach Eno Sarris originated with Stuff+, we train a separate model per pitch family (four-seam, sinker, breaking, curve, changeup, cutter) to predict the probability a swing comes up empty. The single most important feature — consistent with the public literature — is each offspeed and breaking pitch's velocity and movement differential off the pitcher's own fastball. VAA is computed from full trajectory vectors identically to the rest of PitchIQ, so MLB training and MiLB scoring measure the same physical quantity.

Train on MLB, apply to MiLB. The model learns what makes a pitch nasty from the cleanest data in existence (MLB Hawk-Eye), then applies that physical standard to every trackable MiLB pitch. This is legitimate because bat-missing is physics — a 95 mph fastball with elite ride and a flat approach angle misses bats at a predictable rate regardless of which level it is thrown at. We are not claiming to out-predict Eno Sarris's Stuff+ on MLB arms. The claim is narrower and defensible: PitchIQ Filth+ is the only public bat-missing model calibrated for, and validated against, minor league outcomes.

Validation — and why this number matters. A correlation means nothing without a baseline, so we benchmarked Filth+ against the naive predictors a scout already has. Tested within pitch type (the fair comparison) against actual MiLB swinging-strike rate, on identical pitches:

• Filth+   r = 0.55
• Raw velocity   r = 0.13
• IVB only   r = 0.00
• Velocity + IVB combined   r = 0.14

Filth+ correlates with swing-and-miss roughly four times as strongly as raw velocity does. In variance terms the gap is wider: r² ≈ 0.30 for Filth+ against ≈ 0.02 for velocity, about 18 times as much variance explained. IVB on its own carries almost no signal. At the individual-pitch level the relationship is cleanly monotonic: pitches graded ~90 Filth+ miss bats on roughly 17% of swings, ~110 on ~48%, and 120+ on ~78%. These are out-of-sample results: the model never saw a single MiLB pitch in training. The validation loop is the genuine edge: every night of new MiLB data tests the grades against fresh outcomes, and the model recalibrates as the sample grows.

Filth+ and TiltStuff+ — one question each. Filth+ answers exactly one question: can this pitch miss bats? Our first attempt at a broader contact-and-run-prevention model failed validation and was scrapped, because predicting run value directly from shape is swamped by contact noise. We later solved that problem a different way — by modeling the individual components a pitch shape can predict (whiffs, called strikes, chases, weak contact) and combining them — which became TiltStuff+, our run-prevention grade, documented in its own section below. The two metrics coexist by design and answer different questions: Filth+ is bat-missing, TiltStuff+ is total run prevention. They overlap but each carries signal the other does not, so we show both rather than collapsing them into one number.

Availability and limits. Filth+ requires pitch tracking, so it exists only at the Hawk-Eye levels — Triple-A, Low-A FSL, and Complex (ACL/FCL). High-A and Double-A lack the tracking infrastructure and show no grade. Grades on small samples are noisier; minimum-pitch thresholds apply for display, though because Filth+ grades pitch shape rather than outcomes, it stabilizes faster than results-based stats. As with all models, the grade describes the pitch's physical quality, not a guarantee of outcomes — it is a tool, not a verdict. A Complex League teenager carrying a 120 Filth+ slider is showing a physical trait the industry cannot see at scale. PitchIQ can.

Predictive Validation — 2026 Season Backtest

The test. Concurrent correlation is table stakes; the real question for any stuff metric is whether it carries forward-looking signal. We backtested Filth+ on the 2026 MiLB season to date (March 27 – June 9): for every Statcast-tracked pitcher, we computed Filth+ from his first 30 days only, then measured how well it predicted his swinging-strike rate over the final 30 days. The two windows do not overlap, so the test is fully out-of-sample. The benchmark to beat: a pitcher's own early-season swinging-strike rate predicting his late-season rate, which is what results-based projection amounts to.

• Early Filth+ → future SwStr%   r = 0.31 (n = 533)
• Early SwStr% → future SwStr%   r = 0.29 (n = 1,535)
• Filth+ → concurrent SwStr%   r = 0.45 (n = 966)
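The forward test above reduces to a Pearson correlation between an early-window value and a late-window value for the same pitchers. The sketch below shows that calculation under stated assumptions: the dictionaries keyed by pitcher id and the function names are hypothetical, and only pitchers present in both windows are compared.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def forward_correlation(early, late):
    """Correlate early-window values with late-window values.

    early, late: dicts mapping pitcher id -> value. Pitchers missing from
    either window are dropped, so the returned n can be smaller than either
    input.
    """
    ids = sorted(set(early) & set(late))
    r = pearson_r([early[i] for i in ids], [late[i] for i in ids])
    return r, len(ids)
```

The same function serves both rows of the comparison: pass early Filth+ as `early` for the stuff-based test, or early SwStr% for the results-based benchmark.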

What it means. Filth+ computed from a pitcher's first month matches or beats his actual whiff results at projecting his rest-of-season whiff rate. This is the property that justifies a stuff model's existence: shape-based grades stabilize faster than outcomes, so the metric sees the skill before the stat line confirms it. The edge is modest — we report 0.31 vs 0.29 and let the reader judge — but it is real, out-of-sample, and tested on every tracked arm with sufficient pitches, not a curated subset.

Organizational validation. At the system level, Development Above Expectation (DAE — each org's average Filth+ versus the age-at-level expectation, see below) correlates with promotion volume at r = 0.51 across all 30 organizations. Organizations whose arms outperform their age-and-level peers promote more of them. The development metrics connect to the development outcome they are supposed to predict.

What Filth+ does not predict — published anyway. Early-season Filth+ shows no correlation with late-season K-BB% (r = 0.03). This is not a surprise and we will not bury it: K-BB% bundles command, sequencing, and competition changes from mid-season promotions — none of which are pitch shape. Even K-BB% itself barely predicts its own future over 30-day windows (r = 0.21), a reminder of how noisy short-window outcome stats are. Filth+ measures stuff. Stuff is not command. We publish the failed test alongside the passed ones for the same reason we killed our own Stuff+ contact model: a metric you cannot test honestly is a metric you cannot trust.

Age-adjusted expectations. All DAE calculations are conditioned on age at level, not level alone. Expected Filth+ is computed within age buckets (≤20, 21–22, 23–24, 25+) at each tracked level, so a 20-year-old is measured against 20-year-olds, the same standard a front office model applies. The empirical curve is consistent with a familiar scouting pattern: at Triple-A, younger arms carry slightly higher expected Filth+ than older ones (102.4 for ages 21–22 vs 102.2 for 25+), because young arms at the top level are prospects while older arms are organizational depth. The gap is small, and we report it as such. These backtests rerun as the season grows; correlations will be updated, in either direction, as the sample expands.
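A minimal sketch of the age-at-level adjustment follows. The age buckets come from the text; everything else is an assumption made for illustration: integer ages, the record layout, and DAE computed as the unweighted mean of each pitcher's Filth+ minus the mean for his (level, age bucket) cell.

```python
from collections import defaultdict


def age_bucket(age):
    """Age buckets as listed in the text: <=20, 21-22, 23-24, 25+."""
    if age <= 20:
        return "<=20"
    if age <= 22:
        return "21-22"
    if age <= 24:
        return "23-24"
    return "25+"


def expected_by_cell(pitchers):
    """Mean Filth+ for each (level, age bucket) cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for p in pitchers:
        cell = (p["level"], age_bucket(p["age"]))
        sums[cell][0] += p["filth_plus"]
        sums[cell][1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


def dae_by_org(pitchers):
    """Average residual (actual minus expected Filth+) per organization.

    Assumption: an unweighted mean over pitchers; the production weighting
    is not specified in the text.
    """
    expected = expected_by_cell(pitchers)
    residuals = defaultdict(list)
    for p in pitchers:
        cell = (p["level"], age_bucket(p["age"]))
        residuals[p["org"]].append(p["filth_plus"] - expected[cell])
    return {org: sum(v) / len(v) for org, v in residuals.items()}
```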

Filth+ Aging Projection — Methodology & Accuracy

What it is. A predictive layer on top of Filth+ that forecasts how a pitcher's bat-missing stuff ages, reported as a confidence cone rather than a single line. The principle is to age the stuff, not the results: a pitch's shape is a physical property independent of the level it is thrown at, so aging the grade sidesteps the competition confound that contaminates outcome-based minor-league aging.

How it is built and validated. The aging path is learned from three full seasons of MLB Statcast (2021–2023, 1,123 pitchers, 7,099 pitcher-season-pitch-type rows), every pitch scored with the identical production Filth+ model. We use the standard delta method: each pitcher is compared to himself across consecutive seasons, within pitch type, with each pair weighted by the smaller of its two samples. This suppresses survivorship bias and yields 3,224 year-over-year change pairs. Before trusting the curve, we validated the same pipeline against fastball velocity, the one aging signal the public literature has measured: it reproduced the known shape out of sample, including a −0.21 mph/year decline at ages 30–36, inside the published consensus. Component checks confirmed release geometry and approach angle barely move while velocity and IVB decline, exactly as physics demands.
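The delta method described above can be sketched as follows. The row layout and function name are hypothetical; the pairing of consecutive seasons within pitcher and pitch type, and the weighting by the smaller sample, follow the text.

```python
from collections import defaultdict


def delta_method(rows):
    """Weighted mean year-over-year change in grade, keyed by starting age.

    rows: iterable of (pitcher, pitch_type, season, age, grade, n_pitches).
    Each pitcher is only ever compared with himself, one season later,
    on the same pitch type.
    """
    by_key = {
        (pitcher, pitch_type, season): (age, grade, n)
        for pitcher, pitch_type, season, age, grade, n in rows
    }
    num = defaultdict(float)
    den = defaultdict(float)
    for (pitcher, pitch_type, season), (age, grade, n) in by_key.items():
        following = by_key.get((pitcher, pitch_type, season + 1))
        if following is None:
            continue  # no consecutive season, so no change pair
        _, next_grade, next_n = following
        weight = min(n, next_n)  # weight by the smaller of the two samples
        num[age] += weight * (next_grade - grade)
        den[age] += weight
    return {age: num[age] / den[age] for age in num if den[age] > 0}
```

Chaining the per-age deltas gives the aging curve; this sketch stops at the deltas themselves.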

The central finding, and the honest limits. Stuff ages gently: the Filth+ curve spans only ~2–3 points across a career (under ¼ SD), peaking near age 28. Aging is a refinement to a projection, not the main driver — a pitcher's current grade dominates. Individual pitchers scatter around the path at 2.04 Filth+ points per year, which sets the cone width. A multi-tick stuff breakout (e.g. a deGrom-style jump) sits ~2.8 cone-SD out, roughly a 0.3% event — the model places such outliers in the upper tail and quantifies their rarity rather than pretending to predict them, because their cause lives outside shape and age. Ages under 22 use a literature developmental prior (shown dashed); the MLB panel cannot directly observe pre-MLB development, and the aging shape is transferred to minor-league grades on the basis that bat-missing shape is physical and level-independent.
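The rarity figure quoted above can be checked with a one-line tail calculation. This sketch assumes the cone is normally distributed and that the quoted figure is a one-sided (upper-tail) probability; both are assumptions about how the number was derived.

```python
from math import erfc, sqrt


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))


# A breakout sitting 2.8 cone-SD above the expected path.
p_breakout = upper_tail_probability(2.8)
```

Under those assumptions `p_breakout` comes out near 0.0026, which matches the "roughly a 0.3% event" framing.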

PitchIQ TiltStuff+ — Run-Prevention Pitch Model

What it is. TiltStuff+ grades a pitch by how much it helps a pitcher prevent runs, from the pitch's physical shape alone. Where Filth+ answers one question (can this pitch miss bats?), TiltStuff+ answers a broader one: across everything a pitch can do to limit damage (miss bats, steal called strikes, draw chases, suppress hard contact), how good is its shape? It uses the same 100-centered scale as our other grades, here normalized within pitch type: 100 = average for that pitch type and every 10 points = one standard deviation, so a 110 four-seam and a 110 slider are each one standard deviation better than their peers. Filth+ and TiltStuff+ are different questions, not competing answers. One is bat-missing, the other is total run prevention, and a pitcher can rank highly on one and ordinarily on the other.

The first attempt — and why it failed. Our first instinct was the standard one: train a single model to predict each pitch's run value directly from its shape, the way most public stuff models do. We built it. It did not work. The per-pitch run-value signal was swamped by contact noise — whether a given pitch gets crushed or rolled over is mostly the hitter's doing, not predictable from shape — and the model collapsed into simply guessing pitch-type base rates, with shape features barely registering. A direct run-value regression on a single season of one level is too noisy to learn from. Rather than dress that up, we threw it out and changed the approach.

What actually worked — modeling the pieces, not the whole. Instead of predicting run value in one noisy step, we predict the components that pitch shape can actually drive, each as its own model, then combine them. Four shape-driven probability models, trained on Triple-A tracked pitches:

• Whiff on a swing: how often a swing comes up empty (AUC 0.73)
• Called strike on a take: how often a taken pitch is called a strike (AUC 0.83, the strongest of the four)
• Chase out of the zone: how often an out-of-zone pitch draws a swing (AUC 0.72)
• Weak contact in play: how often contact comes off slow, under 90 mph (AUC 0.61, the hardest and noisiest axis)

Each probability is converted to a run-value contribution and summed into an expected run value per pitch, which is then normalized to the TiltStuff+ scale. This component approach is far more data-efficient than the direct regression: it lets each model learn the thing shape genuinely predicts, instead of asking one model to see through the noise of final outcomes. The build is shape-only by design (no location, no count, no sequencing), so it grades the pitch itself, independent of where it happened to be thrown. Whiff (0.73) and chase (0.72) are the two primary engines of strikeouts; the called-strike model (0.83), the strongest of the four, captures something Filth+ does not measure at all.
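The combination step can be sketched as a weighted sum followed by a within-pitch-type rescale. The text does not publish the run-value weights, so the weights are a parameter here and any numbers passed in are placeholders. The sign flip in `tilt_scale` assumes that a lower (more negative) expected run value is better for the pitcher, consistent with how run prevention is described above.

```python
from statistics import mean, pstdev


def expected_run_value(probs, run_values):
    """Sum each component probability times its run-value weight.

    probs: component name -> model probability for one pitch.
    run_values: component name -> run value per event (hypothetical inputs;
    the production weights are not given in the text).
    """
    missing = set(probs) - set(run_values)
    if missing:
        raise KeyError(f"no run value for components: {sorted(missing)}")
    return sum(probs[name] * run_values[name] for name in probs)


def tilt_scale(xrv, peer_xrvs):
    """Place one expected run value on a 100/10 scale against same-type peers."""
    mu, sigma = mean(peer_xrvs), pstdev(peer_xrvs)
    if sigma == 0:
        raise ValueError("peer values need some variation to be scaled")
    # Lower expected run value is better, so it maps to a higher grade.
    return 100 - 10 * (xrv - mu) / sigma
```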

Validation — the honest numbers. We gated TiltStuff+ behind a single forward test before it was allowed to exist: grade every pitcher on the first half of the season, then see whether that grade predicts the same pitcher's run prevention in the second half — data the model never touched. Among Triple-A pitchers with 200+ tracked pitches in each half:

• First-half TiltStuff+ predicts second-half run prevention at r = −0.18 (negative is correct: a higher grade means fewer runs allowed).
• On that same test, Filth+, built to predict whiffs rather than runs, comes in at r = −0.11. TiltStuff+ is the better predictor of future run prevention, which is exactly what it was designed for.
• Cross-validated, TiltStuff+ holds a small but real out-of-sample R² of about 0.02 against future run value, and it beats both Filth+ alone and any blend of the two.

We want to be precise about magnitude: these are modest correlations, not a blowout. Forward-predicting run value across a half-season at a single level is hard, and any honest model lands in this range — the public MLB models show similar magnitudes on comparable tests. The signal is real (it survives cross-validation and it beats the alternatives), and we describe it as exactly what it is: a modestly predictive, forward-looking run-prevention grade where none existed before.

What we tested and chose not to ship. We checked whether combining TiltStuff+ and Filth+ into a single composite would predict better than either alone. Cross-validated, it did not — the two metrics overlap enough (they share roughly 72% of their variance) that once you have TiltStuff+, adding Filth+ buys essentially nothing for run prevention. So we did not build a blended number. A composite that adds no predictive power is just complexity wearing a lab coat. We show the two metrics side by side instead, because they answer two genuinely different questions, and we let you read both.

The component stats, exposed. Because each component model is individually validated, we surface the raw expected rates on every pitcher: xWhiff%, xChase%, xCSW% (expected called-strike-plus-whiff), and xWeakContact%. These are useful on their own — xWhiff% and xChase% together describe a pitcher's strikeout engine — and they make the TiltStuff+ grade transparent rather than a black box: you can see exactly which axes a pitcher's shape is winning.

Availability and limits. Like all our shape grades, TiltStuff+ requires pitch tracking, so it exists only where Hawk-Eye records — Triple-A, Low-A, and Complex. High-A and Double-A have no tracking and show no grade. The model is trained and normalized on Triple-A, the cleanest and most MLB-adjacent tracked level, so grades there carry full confidence; Low-A and Complex are scored from the same model but flagged at lower confidence, since the model has seen less of those populations. The weak-contact component is the noisiest of the four and the clearest target for future improvement. As always, the grade describes the pitch's physical quality — it is a tool for finding signal early, not a guarantee of outcomes.

PitchIQ Repeatability+ — Mechanical Consistency Model

What it is. Repeatability+ measures how consistently a pitcher reproduces his delivery, pitch to pitch, from the tracking data alone. Where Filth+ asks "how nasty is this pitch?", Repeatability+ asks a different and complementary question: "how reliably can he repeat it?" It is built on the premise — long held inside pitching development but rarely quantified publicly — that mechanical repeatability is the foundation of command. A delivery a pitcher can reproduce is a delivery he can locate. It is scaled so 100 = the average arm at that level, with higher being more repeatable, and it is among the first public attempts to infer mechanical consistency for minor league pitchers from ball-tracking data without motion capture.

What it is built from. Repeatability+ is the equally-weighted combination of the within-pitcher variability of four physical release characteristics, each measured by Hawk-Eye on every tracked pitch:

• Release side (horizontal release point, ft)
• Release height (vertical release point, ft)
• Extension (how far toward the plate the ball is released, ft)
• Velocity (release speed, mph)

For each component we compute the standard deviation of that quantity within a single pitch type — a low standard deviation means the pitcher releases that pitch from nearly the same point, at nearly the same speed, every time. The four standard deviations are converted to percentile ranks against same-level peers, inverted so that low variability scores high, averaged, and rescaled. A pitcher who ranks in the top percentiles for consistency across all four components grades elite; one who is erratic across them grades low.
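The pipeline in the paragraph above (standard deviation per component, percentile rank against peers, inversion, equal-weight average, rescale) can be sketched as below. Several details are assumptions because the text does not specify them: the field names, the percentile definition (share of peers with a larger SD), and a final rescale of 10 points per standard deviation around 100.

```python
from statistics import mean, pstdev, stdev

COMPONENTS = ("release_side", "release_height", "extension", "velocity")


def component_sds(pitches):
    """SD of each component for one pitcher and one pitch type."""
    return {c: stdev([p[c] for p in pitches]) for c in COMPONENTS}


def inverted_percentile(value, peer_values):
    """Share of peers with a larger SD (0 to 100), so low variability scores high."""
    return 100.0 * sum(v > value for v in peer_values) / len(peer_values)


def repeatability_plus(sds_by_pitcher):
    """sds_by_pitcher: pitcher -> {component: sd}, all from the same level."""
    raw = {}
    for pitcher, sds in sds_by_pitcher.items():
        ranks = [
            inverted_percentile(sds[c], [s[c] for s in sds_by_pitcher.values()])
            for c in COMPONENTS
        ]
        raw[pitcher] = mean(ranks)  # equal weight across the four components
    values = list(raw.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("peers need some variation to be rescaled")
    # Assumed rescale: 100 = average arm at the level, 10 points per SD.
    return {pitcher: 100 + 10 * (v - mu) / sigma for pitcher, v in raw.items()}
```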

Why these four, and why equal weight. Two decisions here were made empirically rather than by assertion, and both could have gone the other way. First, on composition: we deliberately excluded pitch movement and shape (IVB, HB) from the inputs, even though shape consistency is intuitively appealing. The reason is physical: movement is a downstream consequence of release point, grip, and velocity, so its variance largely re-expresses the release and velocity variance we already measure. Including it would count the same underlying inconsistency twice and inflate the metric's apparent robustness. We confirmed the four chosen components are only weakly correlated: their pairwise correlations range from just +0.08 to +0.24, meaning each contributes distinct information rather than echoing the others. Velocity consistency in particular correlates only +0.10 to +0.22 with the release components. It is a genuinely separate signal of timing and effort repeatability, which is why it earns a place. Second, on weighting: we tested whether tuning the four weights could improve the metric, using a 70/30 train/test split to guard against overfitting. The tuned weights improved the in-sample fit but performed worse on the held-out data than simple equal weighting (out-of-sample r of −0.290 tuned vs −0.299 equal; the more negative value is the stronger relationship). We kept equal weighting. A metric that cannot beat its own simplest form on honest held-out data should not pretend to.

The measurement question we had to rule out first. Hawk-Eye is a different physical installation at every park, and release points are measured relative to that installation. Before computing anything, we had to know whether a pitcher who threw at multiple parks would show inflated release variance for a purely instrumental reason — calibration differences masquerading as inconsistency. We tested it directly: pitchers who threw at a single level/park showed a mean release-side standard deviation of 0.202 ft; pitchers who threw across multiple parks showed 0.193 ft — essentially identical, and if anything slightly tighter (likely because arms that earn promotions are more polished). The conclusion is that MLB's affiliate Hawk-Eye network is standardized closely enough that cross-park calibration noise does not meaningfully contaminate the signal, so no park normalization is applied. Had that test failed, the metric would have required per-park normalization or been confined to single-park samples.

Validation — and what it actually predicts. Repeatability is only meaningful if it connects to something real on the field. The mechanistic hypothesis is specific: a repeatable release should produce command — the ability to throw strikes and avoid walks — rather than swing-and-miss, which is the domain of stuff and therefore of Filth+. We tested Repeatability+ against three command outcomes, out of sample, and every relationship carried the predicted sign:

• vs. walk rate (BB/PA): r = −0.30 (more repeatable → fewer walks)
• vs. Zone% : r = +0.17 to +0.23 (more repeatable → more strikes in the zone)
• vs. CSW% : positive, strongest at Low-A

The walk-rate relationship is the headline, and it replicates independently at both levels: r = −0.27 at Triple-A and −0.34 at Low-A. Replication across two separate populations is what distinguishes a real effect from a sample artifact. The magnitude deserves honest framing: walks are influenced by sequencing, approach, pitch selection, catcher framing, and count leverage, not mechanics alone, so a correlation near −0.30 from release consistency in isolation is a substantial result for a single-factor descriptive metric. It is not the stronger relationship a trained outcome model like Filth+ achieves against its own target (r = 0.55), and we do not claim it to be.

What it does and does not measure. The validation pattern is itself informative. Repeatability+ correlates with getting the ball in the zone and avoiding walks, but its relationship to CSW% is strong at Low-A and near zero at Triple-A — because CSW includes swinging strikes, which are a function of stuff, not consistency. This is the correct and desired result: Repeatability+ measures zone command and walk avoidance; it does not measure bat-missing. That is Filth+'s job. The two metrics are deliberately non-redundant — a pitcher can have an elite, erratic slider (high Filth+, low Repeatability+) or a modest fastball he paints at will (modest Filth+, high Repeatability+). Read together they describe stuff and the ability to command it.

PitchIQ PVAA — Pitch Value Above Average

What it is. PVAA measures count management — how effectively a pitcher acquires favorable count leverage. Every pitch changes the count, and every count state carries a measurable run expectancy. PVAA sums the run-expectancy impact of a pitcher's count transitions across more than 1.26 million linked pitches and 340,000 plate appearances at all four full-season levels. Negative PVAA = run prevention. The rate stat PVAA/100 normalizes per 100 pitches. PVAA is best understood as a run-value framework for how pitchers acquire count leverage — not a replacement for K-BB%, but a lens that shows the count-by-count path a pitcher takes to his results.

How we built it, broke it, and rebuilt it, in public. The first version of PVAA scored every terminal pitch by its actual outcome: a home run on 1-1 and a groundout on 1-1 were valued very differently. That version correlated with ERA at r = 0.842 out-of-sample, which was impressive, but we did not trust it until we tested its reliability. We split each pitcher's pitches into random halves and correlated the two PVAA values. The result was weak: Spearman-Brown reliability of just 0.31–0.35, and (the tell) it did not improve with more pitches. A real skill stabilizes as the sample grows; noise stays flat. That flatness told us the original PVAA was contaminated by contact luck: whether a ball in play found a glove or a gap was injecting variance that does not repeat.
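The split-half procedure described above can be sketched as follows. The Spearman-Brown step-up, 2r / (1 + r), is the standard formula for two halves. The data layout, the function names, and the use of each half's mean per-pitch value are illustrative assumptions.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-sample correlation up to full-sample reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(values_by_pitcher, seed=0):
    """values_by_pitcher: pitcher -> list of per-pitch values (2 or more each).

    Each pitcher's pitches are shuffled and cut in two; the two half means
    are then correlated across pitchers and stepped up with Spearman-Brown.
    """
    rng = random.Random(seed)
    first, second = [], []
    for values in values_by_pitcher.values():
        if len(values) < 2:
            raise ValueError("each pitcher needs at least two pitches")
        shuffled = list(values)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        first.append(sum(shuffled[:mid]) / mid)
        second.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    return spearman_brown(pearson_r(first, second))
```

Rerunning the function at rising minimum-pitch thresholds is one way to reproduce the "does reliability change with sample size" check the paragraph describes.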

The fix. We rebuilt PVAA to score terminal balls in play at a single neutral run value, stripping contact luck, while keeping the real run values for walks and strikeouts (those are command outcomes, not luck). Reliability jumped to a Spearman-Brown of 0.57–0.62 and, critically, held flat across every pitch threshold — the signature of a stable, repeatable skill. This contact-stripped “process” computation is the version PitchIQ now displays. It correlates with K-BB% at r = −0.91 (convergent validation: a completely different computational path independently reconstructs the strikeout-minus-walk command skill) and retains a meaningful r = +0.50 relationship with ERA — the portion of run prevention that flows from repeatable process rather than batted-ball luck.

Is it projectable? We tested whether first-half PVAA predicts second-half performance, and benchmarked it against the obvious incumbent. Over a six-week MiLB split, first-half K-BB% predicts second-half K-BB% at only r = 0.27: minor-league samples are short and noisy, and almost nothing stabilizes strongly over that window. Process-PVAA persists at r = 0.28, essentially identical to how well K-BB% predicts itself. PVAA is as projectable as the best outcome stat available, because they measure the same underlying command skill. We make no claim that it projects better than K-BB%; we claim it projects as well, while revealing structure K-BB% cannot.

The count-state run expectancy matrix. 0-2 counts produce −0.123 expected runs (pitcher-friendly). 3-0 counts produce +0.228 (hitter-friendly). The spread of 0.35 runs from most pitcher-friendly to most hitter-friendly means a pitcher who lives in 0-2 versus one who lives in 3-0 creates roughly a third of a run of value difference per PA. PVAA captures command, first-pitch strikes, count leverage, and walk avoidance — all through count management.
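A minimal sketch of scoring count transitions against a run-expectancy table follows. Only the 0-2 and 3-0 values are taken from the text; the 0-0 and 0-1 entries are placeholders added so the example runs, and the handling of terminal pitches (walks, strikeouts, balls in play) is deliberately left out.

```python
# Run expectancy by count. Only "0-2" and "3-0" come from the text;
# "0-0" and "0-1" are hypothetical placeholders for illustration.
RE_BY_COUNT = {
    "0-0": 0.0,     # placeholder
    "0-1": -0.05,   # placeholder
    "0-2": -0.123,  # from the text
    "3-0": 0.228,   # from the text
}


def count_transition_value(counts_seen, re_by_count):
    """Sum the run-expectancy change over consecutive count states.

    counts_seen: the counts a plate appearance passed through, in order,
    for example ["0-0", "0-1", "0-2"]. Negative totals favor the pitcher.
    """
    total = 0.0
    for before, after in zip(counts_seen, counts_seen[1:]):
        total += re_by_count[after] - re_by_count[before]
    return total


def per_100_pitches(total_value, n_pitches):
    """Rate form of the stat, normalized per 100 pitches."""
    return 100.0 * total_value / n_pitches


# Spread between the most hitter-friendly and most pitcher-friendly counts.
spread = RE_BY_COUNT["3-0"] - RE_BY_COUNT["0-2"]
```

Because the changes telescope, a plate appearance that moves from 0-0 to 0-2 is credited with the difference between those two states regardless of the placeholder value in between.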

What PVAA does NOT capture. Pitch quality, deception, or contact suppression. A 99 mph painted corner and an 88 mph stolen strike that both produce an 0-1 count receive identical PVAA. That is by design — pitch quality is Filth+'s domain. PVAA and Filth+ are deliberately orthogonal: r = −0.075.

The Count Management Profile — where the real value lives. The single PVAA number is a run-value restatement of count-management skill, and at the season level it tracks K-BB% closely. The genuinely new information is the decomposition. PVAA breaks into four count buckets: FIRST (0-0 — count acquisition), AHEAD (0-1, 0-2, 1-2 — finishing), BEHIND (1-0, 2-0, 2-1, 3-0, 3-1 — recovery), and EVEN (1-1, 2-2, 3-2 — battle counts). Consider two pitchers who both finish at 27% strikeouts and 8% walks — identical K-BB%. One is a first-pitch-strike machine who gets ahead constantly and finishes efficiently. The other falls behind repeatedly and survives on chase stuff. Same outcome, completely different development path, risk profile, and coaching plan. That distinction is invisible to K-BB% and visible in the CMP. Each bucket is ranked as a percentile against same-level peers, and pitchers earn command archetypes — Leverage Creator, Finisher, Escape Artist, Balanced Commander, Elite Commander — built from the repeatable count skills (see the reliability finding below), plus weakness flags where a stable bucket lags. The profile computes for any window: full season or a single start.
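The four-bucket mapping above translates directly into a lookup. The bucket membership is taken from the text, with counts read as balls-strikes; the constant and function names are illustrative.

```python
# Counts are (balls, strikes), grouped exactly as listed in the text.
COUNT_BUCKETS = {
    "FIRST": {(0, 0)},
    "AHEAD": {(0, 1), (0, 2), (1, 2)},
    "BEHIND": {(1, 0), (2, 0), (2, 1), (3, 0), (3, 1)},
    "EVEN": {(1, 1), (2, 2), (3, 2)},
}


def count_bucket(balls, strikes):
    """Return the Count Management Profile bucket for a count."""
    for name, counts in COUNT_BUCKETS.items():
        if (balls, strikes) in counts:
            return name
    raise ValueError(f"not a valid count: {balls}-{strikes}")
```

Together the four buckets cover all twelve possible counts, so every pitch thrown lands in exactly one of them.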

Command is not one skill — it decomposes. Here is the finding we did not expect. We ran split-half reliability separately for each count bucket — does a pitcher's first-pitch skill repeat? His finishing? His recovery? His battle-count performance? The buckets came apart cleanly. FIRST (Spearman-Brown 0.61), AHEAD (0.66), and BEHIND (0.70) are all strongly repeatable — they are stable pitcher traits. EVEN (0.23) is not; battle-count performance is low-persistence, more situational than trait-driven. The implication is bigger than PVAA itself: command, which everyone treats as a single attribute, appears to be at least four distinct subskills with different reliability profiles. A pitcher's ability to acquire, hold, and recover count leverage repeats; his ability to win the maximum-tension battle counts largely does not — the same way BABIP carries information but does not stabilize quickly.

The most striking piece: recovery is the most repeatable trait of all. BEHIND counts stabilize at 0.70 — higher than first-pitch or finishing skill. That is evidence for something coaches have long believed but could not measure: some pitchers possess a genuine, repeatable ability to limit damage after losing leverage. It is a real skill, not luck, and now it is quantifiable.

How this shapes the display. We show all four buckets, because each is descriptively true for a given window. But we are honest about what each number means. The three stable buckets (FIRST, AHEAD, BEHIND) drive the command archetypes — Leverage Creator, Finisher, Escape Artist, Balanced Commander, Elite Commander — because archetypes imply identity, and identity should rest only on traits that repeat. The EVEN bucket is shown but flagged variable, and it earns no archetype: labeling a pitcher for a skill that stabilizes at 0.23 would oversell certainty. These reliability figures come from the 2026 partial season and will be re-tested as the sample grows; we publish them now because methodological transparency about what does and does not repeat is exactly what separates research from analytics marketing.

Why PVAA is not shown per pitch type. We built and tested context-adjusted per-pitch-type PVAA (PVAA+ — residual from league-average PVAA for that pitch type in that count). The result: r = 0.988 with raw per-pitch-type PVAA. The context adjustment adds no meaningful signal. League-average PVAA by count is nearly identical across pitch types (0-2 fastball: +0.014, 0-2 slider: +0.013, 0-2 changeup: +0.012). Count management is a pitcher-level process skill driven by sequencing, command, and pitch mix — not by individual pitch types. We do not display per-pitch-type PVAA because it would imply differentiation that does not exist.

Validation — the process version (2,252 pitchers, 200+ pitches). These are the figures for the contact-luck-stripped metric the platform displays:

• Split-half reliability 0.57–0.62 — stable across every pitch threshold; a repeatable skill, not noise
• K-BB%   r = −0.91 — an independent computational path reconstructs the strikeout-minus-walk command skill (convergent validation)
• ERA   r = +0.50 — the portion of run prevention that flows from repeatable process, not batted-ball luck
• Filth+   r = −0.075 — deliberately orthogonal to stuff

For context: the original outcome version (which scored balls in play by their actual result) correlated with ERA at r = 0.842 but carried only 0.31–0.35 split-half reliability, the contact-luck contamination we identified and removed. The drop in ERA correlation from 0.84 to 0.50 is the expected, correct signature of stripping that luck out.

Validation — out-of-sample. RE matrix trained on the first half of the season (March 27 – May 8), applied to score pitchers in the second half (May 9 – June 14). This is the leakage test, and the projectability benchmark.

• Half-to-half persistence: first-half process-PVAA predicts second-half process-PVAA as well as K-BB% predicts itself (r ≈ 0.28 vs the K-BB% self-prediction bar of 0.27 over this six-week window)
• Promotion prediction: using pre-promotion data only, PVAA/100 separates promoted from non-promoted pitchers at AUC 0.75 — statistically tied with K-BB% (0.75), the best traditional benchmark
• Per-bucket reliability: FIRST 0.61, AHEAD 0.66, BEHIND 0.70 (stable traits); EVEN 0.23 (low-persistence) — command decomposes into distinct subskills

Minor-league samples are short and noisy, so nothing stabilizes strongly over six weeks — but process-PVAA is as projectable as the best outcome stat available, and identifies promotion-ready arms as well as any traditional metric. We make no claim it beats K-BB%; we claim it matches K-BB% while revealing structure K-BB% cannot.

Grading scale. PVAA/100 is the rate stat, normalized per 100 pitches, contact luck removed. Negative = run prevention; positive = giving up count leverage. Tiers are calibrated to the process version's distribution.

PVAA/100	GRADE	INTERPRETATION
≤ −2.5	Elite	Dominant count manager. Wins counts in every situation. Front-of-rotation process.
−2.5 to −1.5	Plus	Consistently ahead in counts. Commands the zone and finishes at-bats efficiently.
−1.5 to −0.5	Above Avg	Gets ahead more often than not. Solid count management without consistent dominance.
−0.5 to +0.5	Average	Neutral count impact. Neither winning nor losing count leverage consistently.
+0.5 to +1.5	Below Avg	Falls behind in counts regularly. Hitters see favorable counts more often than they should.
≥ +1.5	Poor	Consistently losing count leverage. Frequent hitter-friendly counts, deep PAs, elevated walk rates.
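
As a rough illustration of how the tiers above could be applied in code, the sketch below maps a PVAA/100 value to its grade label. The function name is hypothetical, and the handling of values that fall exactly on a shared boundary (inclusive upper bounds) is an assumption, since the table lists each interior boundary in two adjacent rows.

```python
def pvaa_grade(pvaa_per_100: float) -> str:
    """Return the tier label for a PVAA/100 value (negative = run prevention)."""
    # (inclusive upper bound, label). Inclusive edges are an assumption:
    # the table shows each interior boundary in two adjacent rows.
    tiers = [
        (-2.5, "Elite"),
        (-1.5, "Plus"),
        (-0.5, "Above Avg"),
        (0.5, "Average"),
    ]
    for upper, label in tiers:
        if pvaa_per_100 <= upper:
            return label
    # The table states ">= +1.5" explicitly for the Poor tier.
    return "Poor" if pvaa_per_100 >= 1.5 else "Below Avg"
```

For example, a pitcher at −2.0 PVAA/100 would grade as Plus, and one at +1.0 as Below Avg.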

Coverage. All levels. All parks. PVAA requires only count-state data (reconstructed from pitch sequences) and plate-appearance outcomes — both available at every MiLB venue, including non-Hawk-Eye levels where pitch shape is unavailable. This is why a pitcher at Low-A or Double-A who has no velocity or movement data still gets a full count-management profile.

Limitations, stated plainly. Three honest caveats. First, at the season-aggregate level, PVAA and K-BB% measure the same underlying command skill (r = −0.91) — the unique value is the bucket decomposition, not the headline number. Second, like every metric over a six-week minor-league window, PVAA's projectability is modest (r ≈ 0.28), because MiLB samples are short and players change levels; it is as projectable as K-BB%, no more. Third, we deliberately kept PVAA simple: no park factors, no leverage index, no game-state or score adjustments. Each such adjustment would make the metric noisier and harder to explain, and the simplicity is part of why it survived reliability testing where more complex constructions (Stuff+, pitcher xRCP, per-pitch-type PVAA) did not. We would rather have a clean, honest, repeatable measure than an over-engineered one with a flattering correlation.

What it really means, and why it matters. PVAA is not a magic number that beats K-BB%. It is a run-value framework that makes count management legible — you can see, count state by count state, where a pitcher creates and loses leverage, and you can do it at every level of the minors including the ones without tracking technology. For a development staff, “94th-percentile recovery, 8th-percentile finishing” is an actionable sentence that a raw rate stat can never produce. That is the point. We tested this metric harder than most public metrics are ever tested, found a real flaw, fixed it, and confirmed what remained is a genuine, repeatable signal. The receipts are above.

PitchIQ xRCP — Expected Run Creation from Process (Hitter)

What it is. xRCP measures contact quality relative to pitch difficulty — how much damage a hitter produces on balls in play compared to what the pitch characteristics predict he should produce. A hitter with positive xRCP consistently barrels pitches that, based on their velocity, movement, location, and shape, should produce weaker contact. That is bat speed, pitch recognition, and barrel precision — skills that raw exit velocity alone does not capture.

How it works — two stages.

Stage 1: Contact Damage Model. We built an expected run value model from 65,765 MiLB balls in play with exit velocity and launch angle data. EV + LA → expected contact damage. CV R² = 0.386. A 108 mph lineout and a 108 mph single get nearly identical damage scores — this removes BABIP noise.

Stage 2: Pitch Shape → Expected Contact Damage. For ~65,000 BIP at Hawk-Eye venues (AAA + Low-A FSL), we predict the Stage 1 damage score from pitch characteristics: velocity, IVB, HB, VAA, extension, spin, location, pitch type, handedness. No count-state features — decoupled from PVAA by design. The residual (actual damage − expected damage) is the hitter's contact quality skill.
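
The residual step can be sketched as follows. This assumes the Stage 1 damage score and the Stage 2 expected damage have already been computed for each ball in play, and that a hitter's score is the simple mean of his per-BIP residuals; the aggregation rule and the function name are illustrative assumptions, not the documented production logic.

```python
from collections import defaultdict

def hitter_xrcp(bip_rows):
    """Average (actual - expected) contact damage per hitter.

    bip_rows: iterable of (hitter_id, actual_damage, expected_damage),
    one row per ball in play. Mean aggregation is an assumption.
    """
    totals = defaultdict(lambda: [0.0, 0])  # hitter_id -> [residual sum, BIP count]
    for hitter_id, actual, expected in bip_rows:
        totals[hitter_id][0] += actual - expected
        totals[hitter_id][1] += 1
    return {h: s / n for h, (s, n) in totals.items()}
```

A positive value means the hitter did more damage than the pitches he saw would predict; a negative value means less.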

Validation — split-half reliability:

• min 25 BIP per half: r = +0.514 (n=499)
• min 40 BIP per half: r = +0.541 (n=359)
• min 75 BIP per half: r = +0.659 (n=69)

A hitter's xRCP in months 1-2 persists into months 3-4. This is a real, repeatable skill.
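
A split-half check of this kind can be sketched in a few lines. The example assumes each hitter already has one score and one BIP count per half, and uses a plain Pearson correlation; the data layout and function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def split_half_r(players, min_bip=25):
    """players: {id: (half1_score, half1_bip, half2_score, half2_bip)}.

    Keeps only hitters who meet the BIP minimum in BOTH halves,
    then correlates first-half scores with second-half scores.
    Returns (r, number of hitters kept).
    """
    kept = [p for p in players.values() if p[1] >= min_bip and p[3] >= min_bip]
    r = pearson_r([p[0] for p in kept], [p[2] for p in kept])
    return r, len(kept)
```

Raising min_bip shrinks the qualifying pool but removes the noisiest hitters, which is the trade-off visible in the three thresholds listed above.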

Validation — predictive power (fully out-of-sample):

• 1st-half xRCP → 2nd-half Avg EV: r = +0.492
• 1st-half xRCP → 2nd-half HardHit%: r = +0.490
• 1st-half xRCP → 2nd-half ISO: r = +0.337
• xRCP vs wRC+ controlling for Avg EV: partial r = +0.368 — xRCP predicts offensive value beyond raw exit velocity
• Fully out-of-sample persistence: r = +0.508 — first-half model applied to second-half data

Coverage. Hawk-Eye venues only: AAA and select Low-A (Florida State League) parks — approximately 32% of MiLB pitches. Double-A and High-A hitters do not receive xRCP scores.

Pitcher xRCP — Killed

We tested whether pitchers could consistently suppress contact quality below what their pitch characteristics predict. The answer is no.

Split-half reliability: r = 0.070. Threshold stability: 5% overlap from 50 to 100 BIP. Incremental prediction: R² = 0.0000 beyond Filth+ and PVAA.

Contact suppression is not a repeatable pitcher skill at MiLB sample sizes. This aligns with decades of MLB research showing pitcher BABIP does not stabilize. We killed it. The same methodology that validated Filth+ (r = 0.55), PVAA (split-half reliability 0.60 after stripping contact luck), and hitter xRCP (split-half r = 0.51–0.66) says pitcher xRCP is noise. We publish what passes and kill what doesn't. That is how the receipts work.

The Framework

PitchIQ evaluates pitchers and hitters across independent skill dimensions:

METRIC	SKILL	KEY VALIDATION	GRADE
Filth+	Bat-missing ability (stuff)	r = 0.55 vs SwStr%	A
PVAA	Count management (process)	Split-half 0.60 · K-BB% r=−0.91	A−
Rep+	Mechanical consistency	BB% r = −0.27/−0.34	B+
xRCP (hitter)	Contact quality vs pitch difficulty	Split-half 0.51–0.66, predictive	A−
Stuff+	Contact suppression (pitcher)	Failed validation	Killed
Pitcher xRCP	Contact suppression (pitcher)	Split-half r = 0.07	Killed

A pitcher with Filth+ 120 and PVAA +5 has elite stuff but cannot manage counts. A pitcher with Filth+ 95 and PVAA −20 has average stuff but elite process. Both are real profiles. Both are now quantifiable. Neither was visible from a single metric. A hitter with xRCP +0.15 and 88 mph Avg EV is producing damage that raw power cannot explain — that is barrel precision and pitch recognition. These are the dimensions. The receipts are above. The methodology is public. That is how PitchIQ works.

Sample size and regression. Standard deviation is noisy in small samples, and naively ranked, thin samples colonize the extremes of any variability metric — a pitcher with 90 tracked pitches can look elite or erratic by chance. To prevent that, every pitcher's raw score is regressed toward the population mean in proportion to his sample size (regression constant K = 150 pitches): a pitcher with 600 tracked pitches keeps essentially his full raw grade, while one with 90 is pulled substantially toward 100 to reflect the genuine uncertainty. We re-ran the full validation after regression to confirm the relationship survived — it did, holding at r = −0.27 pooled and strengthening to −0.34 at Low-A. On the player card, the grade additionally carries a visible confidence treatment: the displayed sample size, with thin samples shown faded, so a reader can always see how much data stands behind the number. The per-pitch-type breakdown shown on the card is expressed on the same scale (100 = average for that pitch type at that level), with the underlying percentile available on hover; because those per-pitch values are computed within a single pitch type and are not sample-size regressed, they will sit near — but not exactly average to — the overall grade, which is usage-weighted across the arsenal and regressed.
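
The exact regression formula is not given above. A minimal sketch, assuming the common shrinkage weight n / (n + K) applied to the distance from the population mean of 100, looks like this; the function name and the weight formula are assumptions.

```python
def regress_toward_mean(raw_grade, tracked_pitches, k=150, population_mean=100.0):
    """Shrink a raw grade toward the population mean by sample size.

    Assumes weight = n / (n + K), a standard shrinkage form; the
    production formula may differ.
    """
    weight = tracked_pitches / (tracked_pitches + k)
    return population_mean + weight * (raw_grade - population_mean)
```

Under this assumed form, a raw 120 on 600 tracked pitches regresses to 116 (weight 0.80), while the same raw 120 on 90 pitches regresses to 107.5 (weight 0.375).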

Why it is strongest at the lower levels. Repeatability+ discriminates more sharply at Low-A than Triple-A, and this is a feature of the metric being real rather than a flaw. At Low-A there is wide spread in mechanical polish, so consistency separates pitchers meaningfully. By Triple-A, survivorship has already filtered out the wildest deliveries — the arms that reach that level are uniformly more repeatable — which compresses the signal. The metric is therefore most informative precisely where projection matters most: identifying which lower-level arms already own the mechanical foundation for command.

Availability and limits. Repeatability+ requires pitch tracking and a meaningful sample, so it is computed only at the Hawk-Eye levels with sufficient qualified populations — currently Triple-A and Low-A. The Complex League is suppressed entirely: too few arms there clear the minimum tracked-pitch threshold to form a stable peer distribution, and rather than publish a grade computed against a handful of players, we show none. A pitcher must have at least 200 tracked pitches overall, with each contributing pitch type carrying at least 75, to qualify. The metric is descriptive, not diagnostic: it characterizes mechanical consistency and its established link to command, and it deliberately makes no claim about injury risk or durability — release-point data cannot support such a claim, and we will not imply one. What Repeatability+ offers is a quantified, validated view of a trait scouts have always valued by eye: whether a pitcher can do the same thing twice.

Research Foundation

PitchIQ's zone classification is grounded in Ben-Porat (2018, The Hardball Times), which established that MiLB Gameday coordinate data contains meaningful zone signal: Triple-A vertical location correlates with MLB at R²=0.48, Double-A at R²=0.34, and horizontal location carries signal (R²=0.25) across levels. PitchIQ uses a tiered methodology that matches data quality to the tracking infrastructure available at each level.

Triple-A

Full Statcast tracking via Hawk-Eye cameras provides exact pitch coordinates in feet (pX/pZ). Zone classification uses standard MLB strike zone boundaries (±0.83 ft horizontal, personalized sz_top/sz_bot vertical per batter fetched from the MLB Stats API). Accuracy is equivalent to MLB Statcast.

Double-A & High-A

Games are tracked by MiLB stringers using the Gameday pixel coordinate system. PitchIQ derives park-specific empirical zone boundaries for each of 80 parks: the horizontal boundary is set at the 8th–92nd percentile of called strike x-coordinates at that park; the vertical boundary from the 8th–92nd percentile of called strike y-coordinates. Called strikes and balls are classified directly from umpire decisions (100% accurate). Swing pitches are classified using the park-specific pixel boundaries.

Accuracy. Validated at 89.79% accuracy across 124,603 umpire-confirmed pitches. The remaining ~10% error reflects genuine umpire missed calls (~4–5%) and inherent Gameday coordinate noise — the practical ceiling for this methodology given the available data.

Why park-specific. Gameday pixel coordinates are not consistent across parks. Each stadium has a different camera position and zoom level. By deriving boundaries park-by-park from actual umpire calls, PitchIQ eliminates the camera calibration problem entirely. The umpire is the calibration instrument.
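
The park-by-park boundary derivation can be sketched as below. It assumes called strikes at one park are available as (x, y) pixel pairs and uses linear-interpolated percentiles; the interpolation convention, data layout, and function names are assumptions made for illustration.

```python
def percentile(values, pct):
    """Linear-interpolated percentile of a list of numbers (0 <= pct <= 100)."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def park_zone(called_strikes, lo_pct=8, hi_pct=92):
    """Empirical zone box for one park from called-strike pixel coordinates."""
    xs = [x for x, _ in called_strikes]
    ys = [y for _, y in called_strikes]
    return {
        "x_min": percentile(xs, lo_pct), "x_max": percentile(xs, hi_pct),
        "y_min": percentile(ys, lo_pct), "y_max": percentile(ys, hi_pct),
    }

def in_zone(x, y, zone):
    """Classify a swing pitch against that park's own boundaries."""
    return zone["x_min"] <= x <= zone["x_max"] and zone["y_min"] <= y <= zone["y_max"]
```

Because every park gets its own box, the same pixel coordinate can be in the zone at one venue and out of it at another, which is the intended behavior when camera position and zoom differ.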

Low-A — FSL Parks

Jupiter, Clearwater, Fort Myers, Lakeland, Bradenton, St. Lucie, Tampa, Daytona, Charlotte, and Palm Beach have full Statcast/Hawk-Eye coverage. Zone classification at these parks is equivalent to Triple-A accuracy.

Low-A — Non-FSL Parks

The same park-specific empirical boundary approach is applied to 20 non-FSL Low-A parks. For parks where only horizontal Gameday coordinates are available, zone classification uses horizontal position only — capturing chase tendencies on pitches off the plate horizontally. Ben-Porat (2018) confirmed R²=0.25 horizontal signal at the MiLB level. Non-FSL Low-A metrics should be interpreted as directional estimates with an expected margin of ±3–5 percentage points.

Zone Rate Validation

PitchIQ's park-specific system produces zone rates of 40.6% at Double-A, 41.0% at High-A, 42.5% at Low-A, and 43.6% at Triple-A (Statcast ground truth) — a spread of just 3 percentage points across all four levels. This cross-level alignment is strong evidence the methodology is correctly calibrated.

Impact on Metrics

O-Swing% (Chase Rate). League averages: Triple-A 28.8%, Double-A 31.5%, High-A 31.0% — the slight elevation at lower levels is consistent with less experienced hitters expanding the zone against developing pitchers with less precise command.

Z-Contact%. League averages: Triple-A 85.2%, Double-A 81.3%, High-A 80.9% — directionally correct, with lower levels showing slightly less contact on strikes as expected given developmental stage.

PIIS. The Prospect Impact Intelligence Score uses both Z-Contact% and O-Swing% as primary inputs. With corrected zone classifications at all four levels, PIIS scores better reflect true plate discipline process rather than coordinate system artifacts.

Limitations and Honest Caveats

Umpire error. Called strikes and called balls are not perfect — umpires miss calls at approximately 4–5% on boundary pitches. Park-specific boundaries therefore encode both the true zone and umpire tendencies at each park. For prospect evaluation this is appropriate — what matters is how umpires actually call games at that level.

Sample size stability. Early-season boundaries derived from 500–600 called strikes are reliable but tighten considerably as the corpus grows to 2,000+ by season's end. Zone metrics for players with fewer than 150 pitches should be treated as directional indicators.

Ongoing development. Park boundaries are recalibrated nightly as new games are collected. Boundary estimates keep tightening as the calibration corpus grows through the 2026 season and beyond.

Dynasty Value Score (DVS) — Methodology

DVS is a nightly-updated composite metric for minor league hitters only. It combines five independent signals into a single 0–1000 score. Pitchers and MLB players are not currently included but are planned for future versions.

The five components: (1) Live performance — MiLB wRC+ (park-adjusted, league-relative) acts as a performance anchor multiplier. (2) MLB arrival probability — our MLB Probability Index (0–100) acts as a confidence modifier on the base score. (3) Age/level surplus — players younger than expected for their level receive an upside amplifier. A 19-year-old in Double-A is worth significantly more in dynasty than a 24-year-old at the same level. (4) Industry consensus — rankings across five publications (The Athletic, Baseball America, FanGraphs, MLB Pipeline, RotoWire) add a consensus bonus. Players ranked top-5 in 3+ publications are eligible for the Generational tier (950–1000). (5) Dynasty ownership — Fantrax roster percentage across 6,500+ players adds an ownership bonus. In a pool this large, 40%+ ownership for a MiLB prospect represents strong conviction from experienced dynasty managers.

Important caveat: A player ranked highly by multiple publications but underperforming statistically may carry significant upside not yet reflected in current numbers. DVS weights pedigree and consensus heavily for young prospects — this is intentional. The formula self-corrects as the season progresses and rankings update.

Tiers: 950–1000 = Generational · 850–949 = Elite · 700–849 = Top Prospect · 500–699 = Strong Prospect · 300–499 = Solid · Below 300 = Developmental

Update frequency: Nightly at 5am ET, after the comp engine (4am) and enrichment pipeline (4:30am).

Pitcher Metrics — How To Read The Leaderboard

ERA (Earned Run Average): Earned runs allowed per 9 innings. Sourced directly from the MLB Stats API official boxscore, so it is the same number you see on Baseball Reference and reflects the official scorer's determination. Tiers: Elite ≤2.50 · Plus ≤3.50 · Average ≤4.25 · Below Average ≤5.00 · Poor above 5.00

WHIP (Walks + Hits per Inning Pitched): The most stable early-season pitching metric. Measures baserunner rate independent of run scoring. Calculated exactly from our PA table and game log. Tiers: Elite ≤0.90 · Plus ≤1.10 · Average ≤1.30 · Below Average ≤1.50 · Poor above 1.50

xFIP (Expected Fielding Independent Pitching): Removes defense and luck from ERA by using strikeouts, walks, HBP, and fly balls with a normalized home run rate (lgHR/FB = 9.7%). Among the most predictive single-number ERA estimators available. A pitcher with a 2.50 ERA and a 4.50 xFIP is likely due for regression. A pitcher with a 5.00 ERA and a 2.75 xFIP is a buy-low target. Formula: xFIP = ((13 × (FB × lgHR/FB)) + (3 × (BB + HBP)) − (2 × K)) / IP + lgFIP_constant, where FB is fly balls allowed and lgFIP_constant is the league constant that puts xFIP on the ERA scale.
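
The formula translates directly into code. In this sketch the league constant is passed in by the caller because its value is not stated here, and innings must be true decimal innings (100.1 in box-score notation is 100 and one third). The function name is hypothetical.

```python
def xfip(fb, bb, hbp, k, ip, lg_fip_constant, lg_hr_fb=0.097):
    """xFIP from counting stats.

    fb: fly balls allowed; ip: innings as a true decimal.
    lg_fip_constant has no default because the source does not give one.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    expected_hr = fb * lg_hr_fb  # fly balls times the normalized HR/FB rate
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + lg_fip_constant
```

With 100 fly balls, 30 walks, 5 HBP, 110 strikeouts, 100 innings, and an illustrative constant of 3.10, this returns about 3.21.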

K-BB%: Strikeout rate minus walk rate. The single cleanest indicator of pitcher dominance. Removes the noise of defense, park, and luck. Above 20% is elite at any level. Below 10% is a significant concern regardless of ERA.

SwStr% (Swinging Strike Rate): The upstream cause of strikeouts — the moment of failure before it shows in the box score. More stable than K% in small samples. 15%+ is elite. 10-12% is functional. Below 8% means the pitcher is living on defense and weak contact.

O-Swing% Induced (Chase Rate): How often batters chase pitches outside the strike zone against this pitcher. Elite pitch movers who tunnel and locate generate 35%+ chase rates. Average is around 28-30%.

Z-Contact% Allowed: How often batters make contact when they swing at strikes. Lower is better for pitchers. Elite: below 78%. Average: 82-85%. Above 88% means hitters are squaring up pitches in the zone regularly.

GB% (Ground Ball Rate): Ground balls as a percentage of balls in play. Ground ball pitchers suppress home runs, induce double plays, and project well as starters. Elite: 55%+. Average: 42-46%. Below 35% is a fly ball pitcher — needs elite stuff to survive at higher levels.

Barrel% Allowed (AAA and Low-A FSL only): Percentage of balls in play that meet the Statcast barrel definition (EV ≥98 mph, optimal launch angle). The ultimate contact quality metric. Elite: under 4%. MLB average: ~8%. Above 12% means hitters are consistently squaring this pitcher up.

Velocity — What The Numbers Mean

Velocity is displayed as average (Velo) and maximum (MaxVelo) across all pitch types combined. At AAA and Low-A FSL parks where Hawk-Eye is deployed, these are exact Statcast measurements. At Double-A and High-A, velocity displays as a dash ("—") because Statcast hardware is not installed at those parks.

Velo tiers (fastball average): 98+ mph = Elite (top 1%) · 96-97.9 = Plus · 94-95.9 = Above Average · 92-93.9 = Average · 90-91.9 = Below Average · Under 90 = Fringe. These benchmarks apply to fastball velocity specifically. Off-speed pitches will always register lower — a pitcher showing 85 mph average velo likely mixes significant breaking ball usage.

Why MaxVelo matters: Maximum velocity in a start or outing represents ceiling, not typical output. A starter who sits 92 and touches 96 has a different profile than one who sits 92 and maxes at 93. MaxVelo tells you what the arm is capable of — particularly important for young pitchers still developing.

IVB — Induced Vertical Break

Induced Vertical Break measures how much a pitch rises or falls relative to a theoretical spinless pitch subject only to gravity. Positive IVB means the pitch is fighting gravity — riding up. Negative IVB means the pitch is dropping faster than gravity alone would produce.

Why IVB matters: A four-seam fastball with 18 inches of IVB is generating significant backspin-driven lift. Batters expect the ball to drop at a certain rate based on physics. When it drops less than expected, the hitter swings under it. That is the rising fastball illusion — the ball never actually rises, but it arrives higher than the hitter's brain predicted. This is measurable, repeatable, and one of the most important traits in modern pitcher evaluation.

IVB tiers (four-seam fastball): 20"+ = Elite (generational ride, top 2%) · 18-19.9" = Plus · 14-17.9" = Above Average · 10-13.9" = Average · Under 10" = Below Average. Sinkers and two-seamers intentionally have low or negative IVB — they are designed to dive, not ride.

IVB by pitch type context: Curveballs have negative IVB (typically -8" to -14") because they dive hard with topspin. Sliders and sweepers have near-zero IVB because their movement is primarily horizontal. Changeups typically have 4-8" IVB, slightly less than the fastball they mirror, which creates the perception of the ball diving away from the bat at the last moment.

VAA — Vertical Approach Angle

What it is: The angle at which a pitch is descending as it crosses the front of home plate. Measured in degrees. Always negative — the ball is always dropping at plate crossing. Flatter pitches are closer to 0°. Steeper pitches are more negative.

The physics formula (Chamberlain, FanGraphs 2022 — the industry standard):

vy_f = −√(vy0² − 2 × ay × (y0 − yf))
t = (vy_f − vy0) / ay
vz_f = vz0 + az × t
VAA = −arctan(vz_f / vy_f) × (180/π)

Here yf = 17/12 feet (front of home plate) and y0 = 50 feet, the distance from the plate at which Statcast reports the velocity and acceleration components (vy0, vz0, ay, az). This is the same methodology used by Baseball Prospectus and FanGraphs (Alex Chamberlain).
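
The four steps of the formula can be written as a short function. This is a sketch that follows the equations as stated; the function name is hypothetical, and the sample inputs in the note below are made-up values in the usual Statcast sign convention (vy0 negative toward the plate, ay positive from drag), not measurements of a real pitch.

```python
import math

def vaa_degrees(vy0, vz0, ay, az, y0=50.0, yf=17.0 / 12.0):
    """Vertical approach angle in degrees at the front of home plate.

    vy0, vz0: velocity components (ft/s) at y0 feet from the plate.
    ay, az: acceleration components (ft/s^2). ay must be non-zero.
    """
    # Velocity toward the plate when the ball reaches y = yf.
    vy_f = -math.sqrt(vy0 ** 2 - 2 * ay * (y0 - yf))
    # Flight time from y0 to yf.
    t = (vy_f - vy0) / ay
    # Vertical velocity at the plate.
    vz_f = vz0 + az * t
    return -math.degrees(math.atan(vz_f / vy_f))
```

With vy0 = −135, vz0 = −5, ay = 28, and az = −15, the result lands a little flatter than −5°, inside the range the tiers below describe for a four-seam fastball.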

Why VAA matters: Hitters swing on an upward plane of roughly 10-12 degrees. A pitch with a flat VAA creates maximum mismatch — the ball arrives at an angle the bat is least prepared to meet. Every degree closer to 0° is a degree of deception the pitcher gets for free, independent of velocity. A 93 mph fastball with -4.0° VAA can be harder to square than a 97 mph fastball at -5.8°. This is the Bryan Woo effect — elite deception from geometry, not just raw stuff.

VAA is available at Triple-A and Low-A FSL parks only. At Double-A and High-A, VAA displays as a dash ("—") because those parks lack Hawk-Eye infrastructure. A Low-A arm showing -4.1° VAA at 19 years old is demonstrating a physical trait the industry cannot see at scale. PitchIQ can.

Four-Seam Fastball VAA Tiers: Works best elevated where flat angle clashes maximally with upward swing plane.

-3.5° or flatter = Elite — generational. Tyler Rogers, Josh Hader tier. Top 2% of pitchers.

-3.6° to -4.5° = Plus — genuine deception, plays up significantly above velo grade.

-4.6° to -5.2° = Average — functional, needs plus velo or movement to miss bats consistently.

-5.3° to -5.9° = Below Average — gets elevated and barreled at upper levels.

-6.0° or steeper = Poor for a fastball — do not throw up in the zone.

Sinker / Two-Seam Fastball VAA Tiers: Opposite philosophy — steeper is better. Works low in the zone to generate ground balls.

-7.0° or steeper = Elite — true bowling ball effect, extreme GB%.

-6.0° to -6.9° = Plus — strong ground ball tendency.

-5.0° to -5.9° = Average — functional sinker.

-4.9° or flatter = Poor — sinker playing like a flat fastball, gets lifted.

Slider / Sweeper VAA: VAA is secondary to horizontal break. But steep VAA combined with elite sweep creates two-plane movement that is nearly unhittable. A slider with -5.5° or steeper VAA AND 15"+ horizontal break is elite. A slider with flat VAA is tunneling poorly with the fastball.

Curveball VAA Tiers: Steep VAA is the entire point. The pitch should dive through the zone.

-10.0° or steeper = Elite — true 12-to-6, disappears under barrels.

-8.0° to -9.9° = Plus — strong downward action.

-6.5° to -7.9° = Average — functional breaker.

-6.4° or flatter = Poor — curveball without enough depth to generate swings and misses.

Changeup VAA: Should mirror the fastball as closely as possible. The deception comes from velocity separation, not angle. A changeup within 0.5° of the fastball VAA is tunneling correctly. A changeup 2.5°+ steeper than the fastball is tipping — advanced hitters will identify it early.

Key factors shaping VAA:

• Release height: a lower release point flattens VAA naturally. A tall pitcher who still releases low gets flatness for free.
• Extension: every extra inch of extension flattens VAA. Logan Gilbert and Tyler Glasnow, at 7.6 feet of extension, generate perceived velocity 2+ mph above actual.
• IVB: higher IVB fights gravity and flattens the approach angle.
• Velocity: higher velocity reduces time in flight, reducing gravity's effect.
• Location: higher in the zone means a flatter VAA; lower in the zone means a steeper one.

The IVB + VAA Combination — The Most Deceptive Fastball Profile In Baseball

High IVB fights gravity so the ball drops less than expected. Flat VAA means it arrives at the plate on a shallower angle than the hitter's swing plane. Together, the combination creates a fastball that appears to rise even though nothing rises. The hitter's brain predicts a trajectory based on what it sees out of the hand. The ball arrives significantly higher and at a flatter angle than predicted. The result is a swing under the pitch — the classic elevated fastball whiff.

This is not velocity. This is geometry. A pitcher sitting 93 mph with 19" IVB and -4.2° VAA is generating a more deceptive fastball than most pitchers sitting 97 with average shape. This combination is what Driveline, Tread Athletics, and every elite pitching lab in the country obsess over. At the MiLB level almost nobody surfaces it systematically. PitchIQ does, every morning, at every level with Hawk-Eye tracking (Triple-A and Low-A FSL parks).

The elite threshold: IVB ≥ 16" AND VAA ≥ -4.5° on the four-seam fastball. This combination at any MiLB level on a pitcher under 23 years old is a serious MLB projection signal. Use the metric filter to find them: set IVB ≥ 16 and VAA ≥ -4.5 on the pitcher leaderboard. The list will be short. Every name on it deserves your attention.

The tunneling principle: The most effective pitch combinations share a trajectory for as long as possible before separating. A fastball at -4.2° and a curveball at -9.5° look completely different at the plate but can be made to look identical out of the hand through 40 feet of flight. The hitter commits to one trajectory and gets the other. VAA is the foundation of pitch tunneling, and tunneling is why elite pitchers dominate regardless of velocity.

Extension and the perceived velocity bonus: Every foot of extension beyond the rubber shortens the distance the ball travels before reaching the plate. A pitcher with 7.0 feet of extension is effectively releasing the ball 7 feet in front of the rubber, so the ball travels only 53.5 feet instead of the full 60.5. Because every pitcher gets some extension, the perceived velocity bonus is measured against a typical release rather than against the rubber: at 95 mph, 7.0 feet of extension plays roughly 1 mph faster than the same pitch thrown from an average release of about 6.3 to 6.5 feet. Extension is a free velocity upgrade. In PitchIQ, extension is displayed for AAA and Low-A FSL pitchers. A MiLB pitcher combining elite extension (6.8+ feet), high IVB (16"+), and flat VAA (-4.5° or better) has a fastball that plays at the highest level regardless of what the radar gun says.
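
The size of the perceived velocity bonus depends on the baseline extension it is measured against. The sketch below uses a simple distance-ratio model with a baseline of 6.3 feet; both the model and the baseline are illustrative assumptions, not PitchIQ's stated calculation, and the function name is hypothetical.

```python
RUBBER_TO_PLATE_FT = 60.5

def perceived_velocity(velo_mph, extension_ft, baseline_extension_ft=6.3):
    """Perceived velocity under a distance-ratio model (an assumption).

    Scales actual velocity by the ratio of the baseline flight distance
    to this pitcher's flight distance, so more extension reads as faster.
    """
    baseline_distance = RUBBER_TO_PLATE_FT - baseline_extension_ft
    actual_distance = RUBBER_TO_PLATE_FT - extension_ft
    return velo_mph * baseline_distance / actual_distance
```

Changing baseline_extension_ft moves the result noticeably, so any quoted bonus is only meaningful alongside the baseline it was computed from.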

TiltValue — ProspectTilt Proprietary Pitcher Rating

TiltValue is ProspectTilt's composite pitcher rating. It rewards elite strikeout rates, low walk rates, high swinging strike rates, low ERA, strong xFIP, ground ball tendency, and — at Statcast-equipped parks — velocity and induced vertical break. It penalizes high walk rates and poor ERA. Available in the dedicated TiltValue tab on the pitcher leaderboard. Minimum 15 IP required.

Formula components: ERA qualifier (up to +13 points) · xFIP qualifier (up to +10 points) · K% qualifier (up to +13 points) · BB% qualifier (up to +10 points, penalty for high walk rates) · SwStr% qualifier (up to +10 points) · K-BB% (×0.44 multiplier) · IFFB% (×0.25 multiplier) · Velo qualifier — AAA/Low-A only (up to +11 points) · IVB qualifier — AAA/Low-A only (up to +7 points).

Important note: Pitchers at Double-A and High-A are structurally disadvantaged in TiltValue because velo and IVB components (up to 18 combined points) are unavailable without Statcast. A High-A pitcher with elite K%, BB%, and xFIP will score lower than an equivalent AAA pitcher with the same rates simply because the Statcast bonus is inaccessible. This is a data limitation, not a reflection of pitcher quality. Always compare TiltValue within level, not across levels.

PIIS-T: Prospect Impact Intelligence Score — ProspectTilt Proprietary

The Problem With Existing Metrics: Every advanced metric available to dynasty and prospect analysts was built for Major League Baseball. xwOBA requires Hawk-Eye Statcast unavailable at Double-A and High-A. wRC+ measures what happened, not why or whether it will continue. EV90 captures raw power but ignores contact decisions. K% and BB% capture discipline but say nothing about damage. No single metric has ever simultaneously measured contact execution, plate discipline, power sustainability, and swing efficiency across all four full-season MiLB levels — until PIIS-T.

The Formula: Raw PIIS = (Z-Contact% × BB% × dampened_ISO^1.5) / SwStr%. All components expressed as decimals. ISO is dampened toward league average based on sample size using a stabilization weight of k=150 PA — this prevents a player with 3 home runs in 30 at-bats from dominating the leaderboard in April. As plate appearances accumulate, the observed ISO carries progressively more weight. By 300 PA the dampener has minimal effect and observed ISO is trusted fully. The ^1.5 exponent on ISO creates meaningful separation between true impact bats and slap hitters without the extreme volatility of squaring. O-Swing% floor of 15% applied at Double-A and High-A corrects for stringer data coordinate limitations at those levels. Minimum 40 PA required. PIIS-T+ adds EV90: Raw PIIS+ = (Z-Contact% × BB% × dampened_ISO^1.5 × EV90) / SwStr%, available at Triple-A and Low-A Statcast parks only.
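
A minimal sketch of the raw formula follows. The exact dampening curve is not given above, so the example assumes the common PA / (PA + k) shrinkage toward league ISO; the production curve may weight observed ISO differently. Function names are hypothetical.

```python
def dampened_iso(observed_iso, pa, league_iso, k=150):
    """Blend observed ISO with league ISO by sample size.

    Assumes weight = PA / (PA + k); this form is an assumption.
    """
    weight = pa / (pa + k)
    return weight * observed_iso + (1 - weight) * league_iso

def raw_piis(z_contact, bb_rate, iso_dampened, swstr):
    """Raw PIIS = (Z-Contact% x BB% x dampened_ISO^1.5) / SwStr%, all as decimals."""
    if swstr <= 0:
        raise ValueError("SwStr% must be positive")
    return (z_contact * bb_rate * iso_dampened ** 1.5) / swstr
```

For a hitter with a .300 observed ISO in 150 PA and a .150 league ISO, the assumed dampener returns .225, and with Z-Contact% 0.85, BB% 0.10, and SwStr% 0.10 the raw score is about 0.091.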

Component Breakdown: Z-Contact% — the foundation. Measures whether the hitter executes when pitchers attack the strike zone. The most stable early-season metric in PitchIQ, reliable after just 100 pitches. A hitter who cannot make contact on strikes is being beaten at the most fundamental level of hitting. BB% — the complement. Walk rate proves pitch recognition on the other side of the equation. Together with Z-Contact%, it describes a hitter who attacks strikes and lays off balls. High Z-Contact% without BB% is a hacker being exploited. High BB% without Z-Contact% is a passive hitter who is hittable. The combination is the signal. Dampened ISO^1.5 — the damage component. ISO measures extra-base ability independent of batting average. Dampening and the ^1.5 exponent together ensure power is real and sustainable before it amplifies the score. SwStr% — the penalty. The upstream cause of strikeouts, not the downstream outcome. Captures the moment of failure — the swing that misses entirely — before it shows in the box score.

The Normalization — Why T-Score: Most normalized metrics use min-max scaling — take the best player, call them 100, take the worst, call them 0. This has a critical flaw: the meaning of every score shifts as the player pool changes. One freakish outlier at the top compresses every other player toward the bottom. PIIS-T uses T-Score normalization — the same statistical framework used in sports science, psychometrics, and academic standardized testing. Formula: z = (Raw PIIS − μ_level) / σ_level, then PIIS-T = (z × 10) + 50, clamped 0–100. μ_level and σ_level are the mean and standard deviation of all qualifying players at that specific level. Result: 50 is always exactly league average — permanently anchored, not relative to the current filter or leaderboard view. Every 10 points equals exactly one standard deviation. Approximately 68% of qualifying players will score between 40 and 60. Fewer than 2.5% will exceed 70. A 75+ PIIS-T is not a hot streak. It is a demonstrated elite process. This is the first application of T-Score normalization to a composite MiLB hitting process metric across all four full-season levels.
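
The T-Score step can be sketched as below for the players at one level. Using the population standard deviation (rather than the sample version) is an assumption, as is the function name.

```python
from statistics import mean, pstdev

def t_scores(raw_by_player):
    """Convert raw PIIS values at one level to T-Scores clamped to 0-100.

    raw_by_player: {player_id: raw PIIS}. Uses the population standard
    deviation, which is an assumption.
    """
    values = list(raw_by_player.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all raw scores are identical; T-Score is undefined")
    return {
        player: min(100.0, max(0.0, (raw - mu) / sigma * 10 + 50))
        for player, raw in raw_by_player.items()
    }
```

Because the mean and standard deviation come from the level's qualifying pool rather than from the best and worst players, one extreme outlier shifts every other score only slightly.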

Analytical Foundation: PIIS-T combines four independently validated process metrics using a statistically rigorous normalization framework standard in sports science. Each component has documented predictive validity in published sabermetric research — Z-Contact% and SwStr% stability work documented by Ben-Porat (The Hardball Times, 2018) and Tango, Lichtman, and Dolphin (The Book, 2006); BB% and discipline metric stability established across multiple peer-reviewed studies. PIIS-T is a novel application of these components to MiLB data at scale. It is the first metric to combine contact execution, plate discipline, power sustainability, and swing efficiency into a single T-Score normalized number available at all four full-season MiLB levels. Formal validation against promotion timelines and future wOBA outcomes is ongoing as our dataset grows through the 2026 season and beyond.

The Four Scenarios — How to Use PIIS-T:

PIIS-T 65+ and wRC+ 130+ — The Real Deal. Process confirms outcomes. Elite fundamentals backing elite results. Trust the performance — it is backed by a demonstrated process. These are players you build around in dynasty. They are not regression candidates. They are promotion candidates.

PIIS-T 65+ and wRC+ under 115 — Diamond in the Rough. The most valuable signal in PitchIQ. Elite process, suppressed results. This divergence has identifiable causes: BABIP bad luck, pitcher-friendly park suppressing counting stats, small result sample, or a hitter making all the right decisions without receiving pitches to drive yet. Buy low before the industry catches on. Positive regression is a mathematical expectation, not a hope. Specifically look for: Z-Contact% 82%+, BB% 12%+, dampened ISO .150+, SwStr% under 9% combined with wRC+ under 115. When those align you have found a future breakout.

wRC+ 130+ and PIIS-T under 35 — Regression Candidate. The most dangerous profile in dynasty. Production not supported by process. Likely driven by BABIP luck, a hot streak on a small sample, favorable park, or weak opponent quality. Sell high before the results collapse. SwStr% is too high, BB% too low, or ISO propped by a few hard-hit balls in a compressed window. When pitchers adjust and attack the weaknesses PIIS-T has already flagged, the box score will crater. Red flags: wRC+ 130+ with PIIS-T under 35, SwStr% above 14%, BB% under 6%, Z-Contact% under 70%.

PIIS-T under 35 and wRC+ under 100 — Context Required. Not all low-PIIS-T players are sells. An 18-year-old IFA at Low-A with PIIS-T 30 in 50 PA is still adjusting to professional pitching — expected and unremarkable. A 24-year-old at Double-A with PIIS-T 30 in 350 PA is a significant red flag. Age relative to level changes everything. Cross-reference with MLB Probability, DVS, and the age context banner. No single metric tells the whole story. PIIS-T tells the process story. The other metrics tell the ceiling story. Together they give you the truth.
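
The four scenarios can be expressed as a simple lookup. The sketch below uses only the headline thresholds from each scenario; the function name classify_scenario is hypothetical, and treating "65+" and "130+" as inclusive bounds is an assumption. Players who fall between the thresholds get no label.

```python
def classify_scenario(piis_t, wrc_plus):
    """Map a (PIIS-T, wRC+) pair to one of the four scenarios, or None if none applies."""
    if piis_t >= 65 and wrc_plus >= 130:
        return "The Real Deal"
    if piis_t >= 65 and wrc_plus < 115:
        return "Diamond in the Rough"
    if piis_t < 35 and wrc_plus >= 130:
        return "Regression Candidate"
    if piis_t < 35 and wrc_plus < 100:
        return "Context Required"
    # Everything else (for example PIIS-T 50 with wRC+ 110) is left unlabeled.
    return None

# Hypothetical players, for illustration only.
for score, wrc in [(70, 140), (70, 100), (30, 135), (30, 90), (50, 110)]:
    print(score, wrc, classify_scenario(score, wrc))
```

The label is a starting point only. The component checks listed under each scenario (Z-Contact%, BB%, ISO, SwStr%) and the age-relative-to-level context still need to be applied by hand.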

Tiers: 75–100 = Elite (2.5+ SD above average, under 1% of players if scores are normally distributed) · 65–74 = Plus (top 7%, 1.5+ SD above) · 55–64 = Above Average (top 30%) · 45–54 = Average (within 0.5 SD of mean) · 35–44 = Below Average · 0–34 = Poor (more than 1.5 SD below average)
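
A tier lookup following the cutoffs above might look like the sketch below. The function name piis_t_tier is hypothetical, and assigning a fractional score such as 74.6 to the lower tier (rather than rounding first) is an assumption.

```python
# Lower bound of each tier, highest first, taken from the tier list above.
TIER_FLOORS = [
    (75, "Elite"),
    (65, "Plus"),
    (55, "Above Average"),
    (45, "Average"),
    (35, "Below Average"),
    (0, "Poor"),
]

def piis_t_tier(score):
    """Return the tier name for a PIIS-T score on the clamped 0-100 scale."""
    if not 0 <= score <= 100:
        raise ValueError("PIIS-T is clamped to the 0-100 range")
    for floor, name in TIER_FLOORS:
        if score >= floor:
            return name

print(piis_t_tier(82), piis_t_tier(65), piis_t_tier(50), piis_t_tier(12))
```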

Transparency: Every PIIS-T component is visible in PitchIQ — Z-Contact%, BB%, ISO, and SwStr% are all displayed in the leaderboard and player cards. The formula is published. The normalization method is fully explained. The guardrails — 40 PA minimum, ISO dampener, O-Swing% floor at AA/High-A — are documented here. PIIS-T is a tool, not a verdict. Use it alongside wRC+, DVS, MLB Probability, and age context for the complete picture. ProspectTilt believes that analytical transparency builds better analysts.

wRC+ (MiLB) — How It Is Calculated & Why It Differs From FanGraphs

wRC+ (MiLB) is a park-adjusted, league-adjusted weighted runs created metric calculated directly from PitchIQ plate appearance data. A score of 100 means exactly league average for that level. Above 100 is above average, below 100 is below average.

Formula: Player wOBA is calculated from actual outcomes (BB, HBP, 1B, 2B, 3B, HR) using standard 2026 wOBA weights. Park adjustment uses the half-park-factor method: park_adj_wOBA = raw_wOBA / ((PF + 100) / 200), where PF is the park factor on a scale where 100 is neutral. League wOBA (lgwOBA) is derived from all plate appearances in PitchIQ at that level. wRC+ = (((park_adj_wOBA − lgwOBA) / wOBA_scale) + lgR/PA) / (lgR/PA) × 100, where lgR/PA is league runs per plate appearance.
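
As a worked sketch of the formula, the Python below takes a hitter's raw wOBA as given and applies the park and league adjustments. The function names are hypothetical, and the wOBA scale (1.2), league R/PA (0.12), and league wOBA (0.335) are illustrative placeholders, not PitchIQ's actual constants.

```python
def park_adjusted_woba(raw_woba, park_factor):
    """Half-park-factor adjustment; park_factor uses a 100 = neutral scale."""
    return raw_woba / ((park_factor + 100) / 200)

def wrc_plus(raw_woba, park_factor, lg_woba, woba_scale, lg_r_per_pa):
    """wRC+ relative to the level's own run environment (100 = level average)."""
    adj_woba = park_adjusted_woba(raw_woba, park_factor)
    runs_above_avg_per_pa = (adj_woba - lg_woba) / woba_scale
    return (runs_above_avg_per_pa + lg_r_per_pa) / lg_r_per_pa * 100

# Illustrative placeholder constants (assumptions, not published values).
LG_WOBA, WOBA_SCALE, LG_R_PER_PA = 0.335, 1.2, 0.12

# A league-average hitter in a neutral park lands at 100.
neutral = wrc_plus(0.335, 100, LG_WOBA, WOBA_SCALE, LG_R_PER_PA)
# The same raw wOBA in a hitter-friendly park (PF 110) is discounted below 100.
hitter_park = wrc_plus(0.335, 110, LG_WOBA, WOBA_SCALE, LG_R_PER_PA)
# In a pitcher-friendly park (PF 90) it is credited above 100.
pitcher_park = wrc_plus(0.335, 90, LG_WOBA, WOBA_SCALE, LG_R_PER_PA)

print(round(neutral, 1), round(hitter_park, 1), round(pitcher_park, 1))
```

Dividing by (PF + 100) / 200 rather than PF / 100 applies only half of the park effect, which matches the idea that a hitter plays roughly half of his games at home.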

Why it differs from FanGraphs: FanGraphs wRC+ anchors lgwOBA to the MLB run environment (~0.310–0.320), then scales down. PitchIQ wRC+ uses the actual MiLB run environment derived from our database (~0.330–0.340 depending on level). This makes PitchIQ wRC+ a true within-level comparison: a 120 at Double-A means 20% above the Double-A average. FanGraphs wRC+ is MLB-scaled, so the same player will show a lower number there. Neither is wrong; they measure different things. PitchIQ wRC+ is better suited to comparing prospects with their peers at the same level.

Park factors: Source is Baseball America 2024 MiLB Park Factors (Matt Eddy, January 2026). wOBA park factor used as primary metric. Half-adjustment applied since players split games between home and away.

Update frequency: Recalculated nightly as part of the enrichment pipeline. Reflects all plate appearances in the current season to date.