Reliable trends start with reliable data. Our research on repeating return-window trends (20–180 trading days) is backed by a layered validation program that runs from ingestion through model outputs.
1) Ingestion & Schema
- Strict datatypes, keys, and trading-calendar alignment.
- Duplicate prevention; negative or impossible values rejected.
- Corporate actions normalized (splits/dividends) with sanity checks.
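As an illustration of the ingestion checks above, the following minimal Python sketch rejects duplicate keys, off-calendar dates, and impossible values. The row layout (`symbol`, `date`, `close`, `volume`), the function name `validate_rows`, and the rejection reasons are assumptions made for this example and do not describe the production pipeline.

```python
def validate_rows(rows, trading_days):
    """Split rows into (accepted, rejected), where rejected holds (row, reason).

    rows: iterable of dicts with hypothetical keys symbol, date, close, volume.
    trading_days: set of valid trading dates in the same type as row["date"].
    """
    seen = set()
    accepted, rejected = [], []
    for row in rows:
        key = (row["symbol"], row["date"])
        if key in seen:
            # Duplicate prevention: one record per (symbol, date).
            rejected.append((row, "duplicate key"))
        elif row["date"] not in trading_days:
            # Trading-calendar alignment: no weekend or holiday rows.
            rejected.append((row, "not a trading day"))
        elif row["close"] <= 0 or row["volume"] < 0:
            # Negative or impossible values are rejected.
            rejected.append((row, "impossible value"))
        else:
            seen.add(key)
            accepted.append(row)
    return accepted, rejected
```

Returning the rejected rows with a reason, rather than dropping them silently, keeps each rejection available for review.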
2) Content Quality
- Gap detection (e.g., >3 missing trading days) and staleness alerts.
- Outlier screening via z-scores/IQR, reconciled to events (splits, halts, news).
- Cross-vendor parity checks on prices and corporate actions with defined tolerances.
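The gap and outlier checks above could be sketched roughly as follows. This is a simplified example using only the Python standard library. The function names, the 1.5 × IQR fence, and the treatment of a trailing gap as staleness are assumptions for illustration. The trading calendar passed in is assumed to cover only the period in which the symbol was listed.

```python
import statistics

def find_gaps(observed_days, trading_days, max_missing=3):
    """Return (first_missing, last_missing, count) for each run of more
    than max_missing consecutive trading days with no observation."""
    observed = set(observed_days)
    gaps, run = [], []
    for day in sorted(trading_days):
        if day in observed:
            if len(run) > max_missing:
                gaps.append((run[0], run[-1], len(run)))
            run = []
        else:
            run.append(day)
    if len(run) > max_missing:
        # A trailing run means the series has gone stale.
        gaps.append((run[0], run[-1], len(run)))
    return gaps

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]
```

Flagged indices are candidates only. Per the checks above, each would still be reconciled against splits, halts, or news before being treated as a data error.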
3) Calculation Integrity
- Recompute rolling 20–180 day returns and annualization independently (SQL vs. Python).
- Edge-window tests ensure correct first/last eligible dates.
- Idempotence: same inputs yield identical outputs.
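To make the calculation checks above concrete, here is a minimal sketch of the Python side of such a recomputation. It assumes simple (not log) returns on adjusted closes, a 252-trading-day year for annualization, and a tolerance-based comparison against a second, independently produced series. All names and these conventions are illustrative assumptions.

```python
def rolling_returns(prices, window):
    """Simple return over `window` trading days: P[t] / P[t - window] - 1.

    The first `window` positions are None because no full window exists yet,
    so the first eligible index is exactly `window`.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = [None] * len(prices)
    for i in range(window, len(prices)):
        out[i] = prices[i] / prices[i - window] - 1.0
    return out

def annualize(ret, window, days_per_year=252):
    """Compound a window return to an annual rate (252 days is an assumption)."""
    return (1.0 + ret) ** (days_per_year / window) - 1.0

def mismatches(a, b, tol=1e-9):
    """Indices where two independently computed series disagree."""
    if len(a) != len(b):
        raise ValueError("series lengths differ")
    return [
        i for i, (x, y) in enumerate(zip(a, b))
        if (x is None) != (y is None) or (x is not None and abs(x - y) > tol)
    ]
```

In this sketch, the output of `rolling_returns` would be compared with the SQL-derived series through `mismatches`. Running the same function twice on the same input and comparing the results is a simple idempotence test.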
4) Bias & Leakage Controls
- No look-ahead: features limited to information available at the decision date.
- No survivorship bias: delisted symbols retained; index membership time-stamped.
- Corporate events mapped to preserve continuity (mergers, ticker changes).
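One way to picture the look-ahead and survivorship controls above is a pair of point-in-time filters. The sketch below is an assumption-laden illustration: the `available_at` field, the `(symbol, start, end)` membership tuples, and both function names are hypothetical. Dates can be any chronologically comparable type, such as `datetime.date`.

```python
def features_as_of(records, decision_date):
    """Keep only records whose information was available on or before
    the decision date (no look-ahead)."""
    return [r for r in records if r["available_at"] <= decision_date]

def members_as_of(memberships, as_of):
    """Return the symbols that were index members on `as_of`.

    memberships: iterable of (symbol, start, end), where end is None for a
    current member. Delisted or removed symbols keep their historical rows,
    so past universes include them (no survivorship bias).
    """
    return sorted(
        symbol
        for symbol, start, end in memberships
        if start <= as_of and (end is None or as_of < end)
    )
```

Treating `end` as exclusive is a convention chosen for this example. What matters is that the same convention is applied everywhere a historical universe is rebuilt.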
5) Monitoring & Governance
- Freshness, completeness, and quality KPIs tracked continually.
- Versioned releases with manifests, immutable raw zone, and lineage to code commits.
- Peer review, canary runs, and incident playbooks; defects quarantined and disclosed.
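As a rough illustration of a release manifest with lineage, the sketch below records a SHA-256 digest per file alongside a version label and code commit, then reports any file whose content no longer matches. The manifest layout and the function names are assumptions for this example. Files are passed as in-memory bytes to keep it self-contained.

```python
import hashlib

def build_manifest(files, code_commit, version):
    """files: mapping of file name -> bytes. Returns a plain dict manifest."""
    return {
        "version": version,
        "code_commit": code_commit,  # ties the release to the producing code
        "files": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(files.items())
        },
    }

def changed_files(manifest, files):
    """Names that are missing, unexpected, or whose digest differs."""
    recorded = manifest["files"]
    names = set(recorded) | set(files)
    return sorted(
        name for name in names
        if name not in files
        or recorded.get(name) != hashlib.sha256(files[name]).hexdigest()
    )
```

An empty result from `changed_files` indicates the data still matches what was released, which is the property an immutable raw zone is meant to guarantee.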
6) Trend Repeatability
- Year-by-year returns and threshold flags (e.g., ≥30%, ≥40%) recomputed from adjusted data.
- “% of years meeting threshold” validated across variable analysis ranges.
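The threshold statistic above can be sketched in a few lines. The example assumes yearly returns are supplied as decimals keyed by calendar year and that an analysis range is an inclusive span of years. The function name and signature are hypothetical.

```python
def pct_years_meeting(yearly_returns, threshold, first_year=None, last_year=None):
    """Return (flags, pct) for the years inside the analysis range.

    yearly_returns: dict of year -> return as a decimal (0.30 means 30%).
    flags: dict of year -> True if that year's return is >= threshold.
    pct: percentage of in-range years that meet the threshold.
    """
    in_range = {
        year: ret
        for year, ret in yearly_returns.items()
        if (first_year is None or year >= first_year)
        and (last_year is None or year <= last_year)
    }
    if not in_range:
        raise ValueError("no years fall inside the analysis range")
    flags = {year: ret >= threshold for year, ret in in_range.items()}
    pct = 100.0 * sum(flags.values()) / len(flags)
    return flags, pct
```

Calling the function with several different `first_year` and `last_year` values, and comparing each result with an independent calculation, is one way to validate the percentage across analysis ranges.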
Outcome
Transparent lineage, reproducible results, and rapid anomaly containment—so client decisions rest on defensible, auditable data.