When Data Handling Alters Physical Interpretation: HVAC’s Missing Evidence Layer

By Greggory Butler

1. The Discovery That Should Stop the Industry

A recent 2026 study published in Energy and AI reveals something the HVAC industry has not formally acknowledged:

The conclusions about thermal comfort are not stable.

Using the same dataset and the same machine learning model, the researchers demonstrated that simply changing how missing data is handled produces materially different interpretations of physical reality.

  • Measured-only data: Ta:MRT importance ratio ≈ 1.9:1
  • Model learning from missing-data patterns: Ta:MRT ≈ 4.46:1

This is roughly a 135% increase in the importance of air temperature (Ta) relative to mean radiant temperature (MRT), without any change in the underlying environment.

Source: Guo et al., Energy and AI (2026),
https://www.sciencedirect.com/science/article/pii/S2666546826000352
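The 135% figure follows from the two reported ratios alone. The short calculation below uses only those published numbers. It does not reproduce the study's analysis.

```python
# Ta:MRT importance ratios as reported above (Guo et al., Energy and AI, 2026)
measured_only_ratio = 1.9       # measured-only data
with_missingness_ratio = 4.46   # model learning from missing-data patterns

# How many times larger the second ratio is, and the equivalent percentage shift
fold_change = with_missingness_ratio / measured_only_ratio
percent_shift = (fold_change - 1.0) * 100.0

print(f"fold change: {fold_change:.2f}x")      # fold change: 2.35x
print(f"percent shift: {percent_shift:.0f}%")  # percent shift: 135%
```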

Let’s be precise about what this means:

The system did not discover different physics.
It produced different conclusions based on data handling.


2. The Illusion of Stability

HVAC has long operated under a foundational assumption:

Better models → better predictions → better decisions

This study shows that the assumption is incomplete.

Because:

The conclusions themselves are conditional, not on environmental reality but on how missing data is treated.

When mean radiant temperature (MRT) is absent, it is typically either:

  • Substituted with air temperature (MRT ≈ Ta), or
  • Left for the algorithm to interpret

This introduces:

  • Artificial correlation
  • Suppressed radiant influence
  • Distorted system behavior

The result is a model that appears consistent but is not physically anchored.


3. Accuracy Without Truth

One of the most important findings in the study is this:

The most accurate model (lowest error) produced the most misleading interpretation of reality.

This occurs because imputation:

  • Reduces variability
  • Simplifies prediction
  • Collapses distinct physical variables

Which creates:

A system that improves predictive performance
while degrading physical meaning.
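The first mechanism, reduced variability, is easiest to see with mean imputation. This is one common fill strategy; the study's exact imputation settings are not assumed here. Filling gaps with the observed mean adds points at the exact center of the distribution. The variance of the completed series therefore drops even though nothing about the room changed. The readings are hypothetical.

```python
from statistics import fmean, pvariance

# Hypothetical MRT readings in °C; None marks a missing value
mrt = [24.5, None, 26.0, None, 27.5, None]

observed = [m for m in mrt if m is not None]
fill_value = fmean(observed)  # mean of the measured values (26.0 here)

# Mean imputation: every gap receives the same central value
mrt_imputed = [fill_value if m is None else m for m in mrt]

print(pvariance(observed))     # 1.5  (spread of what was actually measured)
print(pvariance(mrt_imputed))  # 0.75 (spread after filling, half the original)
```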

The distinction between accuracy and truth is not enforced in current HVAC workflows.


4. When AI Learns the Wrong System

The study used LightGBM, a widely adopted gradient-boosted decision tree framework.

Like many tree-based learners, it:

  • Treats missing data as signal
  • Routes missing values down a learned branch at each tree split
  • Learns patterns from absence

This creates a fundamental misalignment:

The model is learning data collection behavior, not environmental physics.
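A single split is enough to show the mechanism. As we understand LightGBM's missing-value handling (enabled by default through its `use_missing` parameter), each split learns which branch missing values should follow. The code below is not LightGBM's implementation, which works on histogram bins and gradients. It is a hypothetical toy stump: the function names `sum_squared_error` and `choose_missing_direction` are invented for illustration. The toy data is constructed so that the rows lacking an MRT reading happen to share an outcome, and the stump captures that association through the absence itself.

```python
import math

def sum_squared_error(values):
    """Squared error of a leaf that predicts the mean of its values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def choose_missing_direction(x, y, threshold):
    """Toy stump: try sending NaN rows left, then right; keep the lower-loss option."""
    left = [yi for xi, yi in zip(x, y) if not math.isnan(xi) and xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if not math.isnan(xi) and xi > threshold]
    missing = [yi for xi, yi in zip(x, y) if math.isnan(xi)]

    loss_if_left = sum_squared_error(left + missing) + sum_squared_error(right)
    loss_if_right = sum_squared_error(left) + sum_squared_error(right + missing)
    if loss_if_left <= loss_if_right:
        return "left", loss_if_left
    return "right", loss_if_right

# Hypothetical data: MRT in °C (NaN = never measured) and a comfort vote (-1 cool, +1 warm)
nan = float("nan")
mrt = [22.0, 23.0, nan, 27.0, 28.0, nan]
vote = [-1.0, -1.0, 1.0, 1.0, 1.0, 1.0]

direction, loss = choose_missing_direction(mrt, vote, threshold=25.0)
print(direction, loss)  # right 0.0: NaN rows are grouped with warm votes because they are missing
```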

Because machine learning optimizes aggressively:

  • It reinforces those patterns
  • It amplifies bias
  • It presents artifacts as insight

This is not a failure of AI.

It is a failure of input integrity.


5. The Root Cause: Missing Reality at Origin

The study correctly identifies:

  • Missing data
  • Imputation bias
  • Policy-driven variability

But these are downstream effects.

The upstream failure is this:

The environmental state is not fully captured and preserved at the moment it occurs.

Everything that follows—

  • Imputation
  • Reconstruction
  • Modeling
  • Interpretation

—is an attempt to compensate for that absence.


6. Reconstruction Is Not Observation

Once the environmental state is incomplete, two options remain:

  • Fill the gap (impute)
  • Remove the gap (exclude)

Neither restores reality.

Because:

Reconstruction is not measurement.

Yet reconstructed data is routinely treated as equivalent to observed conditions.

This creates a structural condition where:

Reality becomes dependent on data policy.
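A policy sensitivity check can expose this dependence. It computes the same summary under several gap-handling policies and tests whether the result moves. The sketch below uses hypothetical readings and a hypothetical tolerance. The summary is the average |MRT - Ta| gap, which stands in for whatever interpretation a real workflow draws.

```python
from statistics import fmean

# Hypothetical paired readings in °C; None = MRT not captured
ta = [21.0, 22.0, 23.0, 24.0, 25.0, 26.0]
mrt = [24.5, None, 26.0, None, 27.5, None]

def exclude(ta_values, mrt_values):
    """Remove the gap: keep only rows where MRT was measured."""
    return [(t, m) for t, m in zip(ta_values, mrt_values) if m is not None]

def substitute_air_temp(ta_values, mrt_values):
    """Fill the gap with air temperature (MRT ≈ Ta)."""
    return [(t, t if m is None else m) for t, m in zip(ta_values, mrt_values)]

def mean_impute(ta_values, mrt_values):
    """Fill the gap with the mean of the measured MRT values."""
    fill = fmean([m for m in mrt_values if m is not None])
    return [(t, fill if m is None else m) for t, m in zip(ta_values, mrt_values)]

def radiant_gap(pairs):
    """The 'interpretation' under test: average |MRT - Ta| in °C."""
    return fmean([abs(m - t) for t, m in pairs])

policies = {"exclude": exclude, "substitute": substitute_air_temp, "mean_impute": mean_impute}
results = {name: radiant_gap(policy(ta, mrt)) for name, policy in policies.items()}

TOLERANCE = 0.25  # hypothetical: largest cross-policy spread (°C) treated as stable
spread = max(results.values()) - min(results.values())

print(results)  # {'exclude': 3.0, 'substitute': 1.5, 'mean_impute': 2.5}
print("policy-dependent" if spread > TOLERANCE else "stable")  # policy-dependent
```

The same measurements yield three different radiant gaps, which is the condition this section describes: the result depends on the policy, not on the room.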


7. System-Level Consequences

This instability propagates across the built environment:

Radiant Systems

  • Undervalued due to suppressed MRT influence
  • Removed from design consideration

Energy Modeling

  • Overemphasis on air temperature
  • Underestimation of radiant efficiency

Control Systems

  • Built on incomplete environmental state
  • Reinforce distorted assumptions

AI Systems

  • Learn from missingness
  • Optimize around artifacts
  • Scale flawed interpretations

8. The Wrong Question

The industry continues to ask:

“Which model is better?”

This assumes the inputs are valid.

This study shows they are not.

If:

  • Environmental state is incomplete
  • Data is reconstructed
  • Context is not preserved

Then:

No model—regardless of sophistication—can produce stable truth.


9. The Right Question

The correct question is:

“How is the environmental state being proven at the moment of capture?”

Because without that:

  • Outputs are conditional
  • Interpretations are unstable
  • Decisions are not defensible

They are simply:

A function of data handling policy.


10. The Missing Layer: Evidence Governance

What this study reveals, without explicitly naming it, is the absence of a governing layer.

Not:

  • Better models
  • More data
  • Increased transparency

But:

A system that enforces admissible environmental evidence before interpretation

Such a system requires:

  • Capture at origin (no reconstruction)
  • Continuous environmental recording
  • Time-sequenced, append-only records
  • Separation of observation from interpretation

And most critically:

No decision without a complete, admissible state.
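The record-keeping requirements listed above (time-sequenced, append-only records that keep observation separate from reconstruction) can be sketched as a small data structure. This is a hypothetical illustration, not an existing product or standard. The names `Observation`, `EvidenceLog`, and the "measured"/"reconstructed" source labels are invented for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: an observation cannot be edited once recorded
class Observation:
    timestamp: float  # seconds since epoch, assigned at capture time
    variable: str     # e.g. "ta" (air temperature) or "mrt" (mean radiant temperature)
    value: float      # reading in °C
    source: str       # "measured" or "reconstructed" (hypothetical labels)

class EvidenceLog:
    """Hypothetical append-only, time-sequenced observation record."""

    def __init__(self) -> None:
        self._records: list[Observation] = []

    def append(self, obs: Observation) -> None:
        # Time sequence is enforced: back-dated or out-of-order entries are refused
        if self._records and obs.timestamp < self._records[-1].timestamp:
            raise ValueError("out-of-order observation rejected")
        self._records.append(obs)

    def records(self) -> tuple[Observation, ...]:
        # Return a tuple copy so callers never get a handle to the internal list
        return tuple(self._records)

    def measured(self) -> tuple[Observation, ...]:
        # Observation kept apart from reconstruction: only captured readings
        return tuple(o for o in self._records if o.source == "measured")

log = EvidenceLog()
log.append(Observation(0.0, "ta", 22.0, "measured"))
log.append(Observation(1.0, "mrt", 22.0, "reconstructed"))  # e.g. an MRT ≈ Ta fill
log.append(Observation(2.0, "mrt", 25.5, "measured"))
print(len(log.records()), len(log.measured()))  # 3 2

try:
    log.append(Observation(1.5, "ta", 22.4, "measured"))
except ValueError as err:
    print(err)  # out-of-order observation rejected
```

Reconstructed values are still recorded, but they are labeled as such and never mixed into the measured view.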

If interpretation changes when data handling changes, then the system is operating on non-admissible state.

In such conditions, any output—no matter how accurate—cannot be considered physically valid, because the underlying environmental state was never fully observed and preserved.

This establishes a hard boundary:

No system can claim to represent reality if that reality was not captured in full at the moment of occurrence.


11. From Data Systems to Truth Systems

The industry currently operates in data systems:

Collect → Clean → Fill → Model → Interpret

What is required is a truth system:

Observe → Record → Preserve → Verify → Then Interpret
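The two sequences differ in where gaps are handled. In the sketch below, both pipelines interpret a snapshot as an operative temperature. They use the common still-air approximation (Ta + MRT) / 2 as a stand-in interpretation step. The function and exception names are hypothetical. The data system fills a missing MRT with Ta and always returns an answer. The truth system verifies the snapshot first and refuses to interpret an incomplete one. The Record and Preserve stages are omitted here for brevity.

```python
from typing import Optional

class InadmissibleState(Exception):
    """Raised when a required variable was never observed (hypothetical)."""

REQUIRED = ("ta", "mrt")

def interpret(ta: float, mrt: float) -> float:
    # Simplified operative temperature for still air: mean of Ta and MRT
    return (ta + mrt) / 2.0

def data_system(snapshot: dict[str, Optional[float]]) -> float:
    """Fill, then interpret: a gap never stops the pipeline."""
    filled = dict(snapshot)
    if filled.get("mrt") is None:
        filled["mrt"] = filled["ta"]  # MRT ≈ Ta substitution
    return interpret(filled["ta"], filled["mrt"])

def truth_system(snapshot: dict[str, Optional[float]]) -> float:
    """Verify, then interpret: a gap stops the pipeline before any output."""
    missing = [name for name in REQUIRED if snapshot.get(name) is None]
    if missing:
        raise InadmissibleState(f"not observed: {', '.join(missing)}")
    return interpret(snapshot["ta"], snapshot["mrt"])

incomplete = {"ta": 22.0, "mrt": None}
complete = {"ta": 22.0, "mrt": 26.0}

print(data_system(incomplete))  # 22.0: an answer, though radiant conditions were never seen
print(truth_system(complete))   # 24.0
try:
    truth_system(incomplete)
except InadmissibleState as err:
    print(err)                  # not observed: mrt
```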

This is not optimization.

It is a structural shift.


12. Environmental Truth Cannot Be Reconstructed

The core principle is simple:

Environmental truth is not inferred or reconstructed.
It must be observed and preserved at origin.

Anything else is approximation.

And when approximation is scaled through automation and AI, it becomes systemic error.


13. A Boundary Moment

This study marks a boundary:

  • Model sophistication cannot resolve missing reality
  • Transparency cannot restore lost state
  • Policy cannot substitute for evidence

It reveals:

The current paradigm cannot produce stable environmental truth.


14. The Path Forward

The path forward requires:

  • Continuous environmental records
  • Non-invasive baseline capture
  • Sequence-enforced measurement
  • Admissibility before interpretation

These are not enhancements.

They are infrastructure.


15. Final Reality

The most important conclusion is not about HVAC.

It is about truth:

If conclusions change when data handling changes,
then truth was never captured in the first place.

And if truth is not captured at origin:

Every system built on it inherits instability.


16. Conclusion

The HVAC industry does not have a modeling problem.

It has a missing evidence infrastructure problem.

Until that is addressed:

  • AI will scale distortion
  • Models will reinforce artifacts
  • Decisions will remain conditional

Because the foundation—environmental reality itself—is incomplete.

The next evolution is not smarter systems.

It is:

Systems that cannot lose truth.
