Model registries are useful. They keep track of versions, artifacts, metadata, and promotion state. They are often one of the first visible signs that a machine learning organization is becoming more mature.
But a registry is not an MLOps platform.
A platform has to support the full path from data to decision to monitoring. If the registry is the only mature component, the team can still struggle to reproduce training data, explain event definitions, compare model versions, monitor production behavior, or evaluate whether the model moved the business or clinical outcome.
The platform starts before training
Most production ML risk starts upstream of the model artifact.
Feature definitions need ownership. Event definitions need versioning. Cohorts need reproducibility. Label logic needs review. Real-time systems need latency and completeness checks. Batch systems need freshness and lineage.
If these parts are informal, the model registry becomes a neat catalog of artifacts built on unstable ground.
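One way to make those definitions less informal is to treat each one as a small, versioned record with a named owner and a content fingerprint. The sketch below is illustrative only: `EventDefinition`, `training_manifest`, and every field name are hypothetical, not part of any particular registry or feature store.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EventDefinition:
    """Hypothetical record for an owned, versioned event definition."""
    name: str
    version: str
    owner: str
    logic: str  # the rule or query text that defines the event

    def fingerprint(self) -> str:
        # Hash a canonical JSON form so that any change to the definition
        # (including its logic text) yields a different identifier.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def training_manifest(model_name: str, definitions: List[EventDefinition]) -> Dict:
    # Record which definition versions a training run depended on, so the
    # registered artifact can be traced back to its upstream inputs.
    return {
        "model": model_name,
        "definitions": {
            d.name: {"version": d.version, "owner": d.owner, "fingerprint": d.fingerprint()}
            for d in definitions
        },
    }
```

Storing a manifest like this next to the registered artifact gives the registry entry something stable to point at when someone later asks which label logic a model was trained on.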
Evaluation needs a home
Evaluation should be a first-class product surface, not a folder of one-off notebooks.
A useful evaluation layer makes it easy to answer:
- Which cohort was evaluated?
- Which model version was scored?
- Which event definition was used?
- What was the alert volume at candidate thresholds?
- How did performance differ across hospitals, units, or patient subgroups?
- What changed compared with the prior version?
In clinical AI, retrospective evaluation is not just a model science task. It is part of risk management and product governance.
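As a rough illustration, the questions above can be answered from a single structured record per evaluation run instead of from notebook output. The names here (`EvaluationRecord`, `alert_volume`, `compare_alert_volume`) are assumptions made for this sketch, and a real evaluation layer would also carry subgroup and site breakdowns.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


def alert_volume(scores: Sequence[float], thresholds: Sequence[float]) -> Dict[float, int]:
    # Count how many scored cases would alert at each candidate threshold.
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


@dataclass(frozen=True)
class EvaluationRecord:
    """Hypothetical record tying one evaluation run to its inputs."""
    cohort_id: str
    model_version: str
    event_definition_version: str
    alerts_by_threshold: Dict[float, int]


def compare_alert_volume(current: EvaluationRecord, prior: EvaluationRecord) -> Dict[float, int]:
    # Change in alert count at each shared threshold, current minus prior.
    return {
        t: current.alerts_by_threshold[t] - prior.alerts_by_threshold[t]
        for t in current.alerts_by_threshold
        if t in prior.alerts_by_threshold
    }
```

A comparison is only meaningful when both records share the same `cohort_id` and `event_definition_version`; a fuller version would check that before computing the difference.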
Serving is part of the contract
Production inference has a contract:
- input schema
- feature freshness
- missingness behavior
- latency expectation
- model version
- output interpretation
- logging requirements
- fallback behavior
When that contract is explicit, teams can reason about reliability. When it is implicit, production issues become archaeology.
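A minimal sketch of what "explicit" could look like follows, assuming the contract is expressed as a plain data structure that is checked before scoring. `ServingContract`, `check_request`, and the field names are hypothetical and cover only part of the list above; latency, logging, and output interpretation would need their own enforcement points.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ServingContract:
    """Hypothetical, partial serving contract for one model version."""
    model_version: str
    required_features: Dict[str, type]  # input schema: feature name -> expected type
    max_feature_age_seconds: float      # feature freshness
    max_latency_ms: float               # latency expectation (not checked here)
    fallback_score: Optional[float]     # None means "do not score" on violation


def check_request(
    contract: ServingContract,
    features: Dict[str, object],
    feature_age_seconds: float,
) -> List[str]:
    # Return a list of human-readable contract violations (empty if none).
    violations = []
    for name, expected_type in contract.required_features.items():
        value = features.get(name)
        if value is None:
            violations.append(f"missing feature: {name}")
        elif not isinstance(value, expected_type):
            violations.append(f"wrong type for {name}: expected {expected_type.__name__}")
    if feature_age_seconds > contract.max_feature_age_seconds:
        violations.append("stale features")
    return violations
```

Logging the returned violations alongside the model version turns a vague production incident into a specific, searchable contract breach.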
Monitoring has to connect to outcomes
Monitoring score distributions and technical health is necessary. It is not sufficient.
A mature platform connects model behavior to the downstream product and operating metrics that justified the model in the first place. That can include utilization, action rates, alert burden, cost avoidance, care variation, or movement in clinical outcomes.
The platform should make it easier to ask whether the system is still worth running.
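As one hedged example of that connection, the sketch below joins a prediction log to a record of which alerts were acted on, then reports alert burden and action rate per period. The function name, the tuple layout, and the notion of an "acted on" set are all assumptions for illustration; real outcome attribution is considerably harder than a set lookup.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def alert_action_summary(
    predictions: Iterable[Tuple[str, str, float]],  # (prediction_id, period, score)
    acted_on: Set[str],                             # prediction_ids with a downstream action
    threshold: float,
) -> Dict[str, Dict[str, float]]:
    alerts = defaultdict(int)
    actions = defaultdict(int)
    for prediction_id, period, score in predictions:
        if score >= threshold:
            alerts[period] += 1
            if prediction_id in acted_on:
                actions[period] += 1
    # Only periods with at least one alert appear, so the division is safe.
    return {
        period: {"alerts": alerts[period], "action_rate": actions[period] / alerts[period]}
        for period in alerts
    }
```

A falling action rate at steady alert volume is the kind of signal that score-distribution monitoring alone would not surface.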
The practical takeaway
A model registry is a component. A platform is the operating system around the model.
If the goal is production AI, invest in the boring connective tissue: data definitions, evaluation, serving contracts, observability, governance, and outcome measurement.