A rigorous mathematical and algorithmic paradigm designed for downstream clinical prognostic modeling and operational clinical asset management under monthly and quarterly snapshot data ingestion constraints.
Healthcare business intelligence and enterprise hospital performance platforms frequently operate under severe data ingestion constraints. Unlike intensive care units with streaming telemetry, enterprise clinical data engines typically receive Electronic Health Record (EHR) drops as discrete monthly or quarterly batch snapshots. This introduces systematic temporal misalignment, irregular sampling, and severe multi-scale sparsity across long-range patient records.
To extract deep prognostic and economic utility from these asynchronous records, this manuscript introduces an offline, non-causal machine learning framework powered by Selective State Space Models (SSMs). By structuring an irregular, continuous-time formulation mapped into a bidirectional, block-parallel architecture, our framework achieves dual-system optimization: it tracks and imputes continuous latent physiological vectors across sparse temporal windows while compressing arbitrary-length multi-year patient histories into a static, low-dimensional **Patient State Portrait Vector** $\mathbf{z}_p \in \mathbb{R}^{2d}$.
Traditional clinical deep learning pipelines implicitly assume aligned, synchronous, or high-frequency telemetry matrices. In enterprise clinical performance platforms, this assumption falls apart. Clinical operations run on transactional batch intervals. When an analytical pipeline receives data snapshots every 30 or 90 days, each patient's timeline arrives heavily fragmented. A typical cohort patient profile contains a series of sparse clusters: an outpatient encounter on Day 4, a metabolic panel on Day 12, a prescription adjustment on Day 18, followed by weeks with no recorded data at all.
Standard approaches rely on naive engineering heuristics, including **Last Observation Carried Forward (LOCF)**, mean imputation, or regularized spline fitting. These methods treat time as a uniform sequence of slots, introducing severe synthetic noise. For example, carrying forward a blood creatinine or urea nitrogen value from 45 days prior treats a dynamic biological state as a frozen constant. This destroys the temporal signal required to evaluate clinical degradation or sudden disease trajectory changes.
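To make the staleness problem concrete, the following minimal sketch carries the last observation forward onto a set of query days and reports how old each carried value is. The function name `locf_with_age` and the sample creatinine readings are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def locf_with_age(obs_days, obs_values, grid_days):
    """Carry the last observation forward onto grid_days.

    Returns (value, age_in_days) pairs; both are None before the first observation.
    obs_days must be sorted in ascending order.
    """
    out = []
    for day in grid_days:
        idx = bisect_right(obs_days, day) - 1  # latest observation at or before `day`
        if idx < 0:
            out.append((None, None))
        else:
            out.append((obs_values[idx], day - obs_days[idx]))
    return out


# Hypothetical creatinine readings (mg/dL) on days 12 and 57 of a quarter.
obs_days = [12, 57]
obs_values = [1.1, 2.4]
grid = [10, 12, 40, 56, 57, 90]
for day, (value, age) in zip(grid, locf_with_age(obs_days, obs_values, grid)):
    print(day, value, age)
```

On day 56 the carried value is still the day-12 reading, 44 days stale, even though the next measurement one day later is more than twice as high. LOCF exposes none of that uncertainty to the downstream model unless the age is passed along explicitly.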
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are prone to vanishing gradients when modeling dependencies across multi-year records. Conversely, Transformers with multi-head self-attention scale quadratically, $O(L^2)$, in sequence length $L$. This makes them expensive to apply to long multi-year histories across tens of thousands of patients.
By pivoting to **Selective State Space Models**, we obtain linear-time sequence processing that retains long-range historical memory. This provides a practical framework for reconstructing historical clinical pathways each time a new batch drop arrives.
We define a patient's long-range historical timeline as a collection of asynchronous observations. Let the input sequence be defined as $S = \{(t_k, \mathbf{x}_k)\}_{k=1}^L$, where $t_k \in \mathbb{R}^+$ represents an absolute monotonically increasing timestamp marking a record edit, and $\mathbf{x}_k \in \mathbb{R}^M$ represents a mixed feature vector containing current labs, vitals, and diagnostic embeddings.
To handle irregular intervals natively, we ground the underlying sequence model within a continuous-time linear system. The hidden state $\mathbf{h}(t) \in \mathbb{R}^d$ evolves based on a continuous vector input $\mathbf{x}(t) \in \mathbb{R}^M$ governed by the following core differential system:
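In the conventional linear state space form, and using the symbols defined above, the system would read as follows. The input matrix $\mathbf{B} \in \mathbb{R}^{d \times M}$, the output matrix $\mathbf{C}$, and the readout $\mathbf{y}(t)$ are assumed here from the standard SSM formulation rather than taken from the text.

$$
\frac{d\mathbf{h}(t)}{dt} = \mathbf{A}\,\mathbf{h}(t) + \mathbf{B}\,\mathbf{x}(t), \qquad \mathbf{y}(t) = \mathbf{C}\,\mathbf{h}(t)
$$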
Here, $\mathbf{A} \in \mathbb{R}^{d \times d}$ is initialized with a HiPPO matrix to enable stable long-range history tracking. To apply this system to discrete, irregularly spaced records, we use a zero-order hold (ZOH) discretization. This step incorporates the data-driven step size $\Delta_k = t_k - t_{k-1}$:
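Under the standard ZOH rule, and assuming $\mathbf{A}$ is invertible, the discretized parameters and the resulting recurrence would take the following form. The symbol $\mathbf{\overline{B}}_k$ and the recurrence itself are written out here as the usual textbook result, not quoted from the text.

$$
\mathbf{\overline{A}}_k = \exp(\Delta_k \mathbf{A}), \qquad
\mathbf{\overline{B}}_k = (\Delta_k \mathbf{A})^{-1}\left(\exp(\Delta_k \mathbf{A}) - \mathbf{I}\right)\Delta_k \mathbf{B}, \qquad
\mathbf{h}_k = \mathbf{\overline{A}}_k\,\mathbf{h}_{k-1} + \mathbf{\overline{B}}_k\,\mathbf{x}_k
$$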
Unlike classical Linear Time-Invariant (LTI) state space models, our architecture uses data-dependent selection, which lets the model adjust its parameters based on incoming information. Let the parameters $\mathbf{B}_k$, $\mathbf{C}_k$, and the step size $\Delta_k$ be projections of the current input vector $\mathbf{x}_k$:
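One common parameterization, following the selective SSM (Mamba) convention, is sketched below. The weight names $\mathbf{W}_B$, $\mathbf{W}_C$, $\mathbf{w}_\Delta$, and $b_\Delta$ are assumptions introduced for illustration, and the softplus keeps the step size positive.

$$
\mathbf{B}_k = \mathbf{W}_B\,\mathbf{x}_k, \qquad
\mathbf{C}_k = \mathbf{W}_C\,\mathbf{x}_k, \qquad
\Delta_k = \mathrm{softplus}\!\left(\mathbf{w}_\Delta^{\top}\mathbf{x}_k + b_\Delta\right)
$$

Under this reading, the observed gap $t_k - t_{k-1}$ would enter through the input itself. The PyTorch implementation later in this section concatenates the gap onto the feature vector before projection, so the learned step size can depend on elapsed time.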
This design lets the model filter incoming data. If the entry at timestamp $t_k$ is an irrelevant administrative update or a redundant billing code, the network can drive $\Delta_k \to 0$. This forces $\mathbf{\overline{A}}_k \to \mathbf{I}$ while the discretized input term vanishes, so the patient state passes through that timeline node without distorting the underlying physiological history.
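The limiting behavior can be checked numerically in the scalar case. The sketch below assumes a one-dimensional system $dh/dt = a\,h + b\,x$ with $a \neq 0$; the helper names `zoh_scalar` and `step` and the numeric values are hypothetical.

```python
import math


def zoh_scalar(a, b, delta):
    """Zero-order-hold discretization of dh/dt = a*h + b*x for scalar a != 0."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def step(h_prev, x, a, b, delta):
    """One discrete update h_k = a_bar * h_{k-1} + b_bar * x_k."""
    a_bar, b_bar = zoh_scalar(a, b, delta)
    return a_bar * h_prev + b_bar * x


a, b = -0.5, 1.0
h_prev = 2.0
# As delta shrinks, the update leaves h_prev essentially untouched,
# no matter how large the incoming input x is.
for delta in (1.0, 0.1, 1e-6):
    print(delta, step(h_prev, x=10.0, a=a, b=b, delta=delta))
```

With a large step the state moves substantially toward the input, while with a vanishing step the previous state is returned almost unchanged, which is the pass-through behavior described above.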
Because our data processing runs entirely offline on stable batch updates, the network can process information in both temporal directions simultaneously. For each sequence, we evaluate a forward pass to capture historical context, and a backward pass to integrate future timeline trends:
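Written with the discretized recurrence, the two passes could be expressed as follows. The primed backward parameters and the zero boundary states are assumptions, chosen to reflect the separate forward and backward operators used in the implementation.

$$
\vec{\mathbf{h}}_k = \mathbf{\overline{A}}_k\,\vec{\mathbf{h}}_{k-1} + \mathbf{\overline{B}}_k\,\mathbf{x}_k, \qquad
\overleftarrow{\mathbf{h}}_k = \mathbf{\overline{A}}'_k\,\overleftarrow{\mathbf{h}}_{k+1} + \mathbf{\overline{B}}'_k\,\mathbf{x}_k, \qquad
\vec{\mathbf{h}}_0 = \overleftarrow{\mathbf{h}}_{L+1} = \mathbf{0}
$$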
At each timestamp $k$, these two vectors are concatenated to form a bidirectional hidden representation: $\mathbf{M}_k = [\vec{\mathbf{h}}_k \parallel \overleftarrow{\mathbf{h}}_k] \in \mathbb{R}^{2d}$. This fused vector passes through a linear decoder head that produces continuous feature reconstructions, filling in unobserved gaps across the timeline.
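The flip-scan-flip pattern behind the bidirectional representation can be illustrated with a toy scalar recurrence. This is a minimal sketch only: the fixed `decay` stands in for the learned, input-dependent dynamics, and the function names are hypothetical.

```python
def scan(xs, decay=0.5):
    """Causal linear recurrence h_k = decay * h_{k-1} + x_k; returns every h_k."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_states(xs, decay=0.5):
    """Pair each position's forward state with its backward state."""
    fwd = scan(xs, decay)
    # Run the same recurrence on the reversed sequence, then flip the result
    # back so index k again refers to the k-th original timestamp.
    bwd = scan(xs[::-1], decay)[::-1]
    return list(zip(fwd, bwd))


# Observations at the first and last positions, two empty slots in between.
states = bidirectional_states([1.0, 0.0, 0.0, 4.0])
for k, (f, b) in enumerate(states):
    print(k, f, b)
```

The two empty positions receive information from both neighbors, which is what makes gap reconstruction possible. The backward state at the last index has seen only the final input, whereas the backward state at index 0 has seen the whole sequence.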
To test this architecture under realistic operational constraints, we use the publicly available, credentialed-access **MIMIC-IV (v2.2)** database. This allows us to reconstruct historical timelines across long-term intervals, mimicking the behavior of quarterly batch data drops.
The extraction pipeline focuses on identifying cohorts with highly variable, non-uniform clinical touches. We target three core tables:
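```sql
-- Build a per-patient event timeline with the gap (in minutes) since the previous event.
WITH RawEvents AS (
  -- Laboratory results: creatinine, blood urea nitrogen, bicarbonate
  SELECT
    le.subject_id,
    le.charttime AS event_time,
    CASE
      WHEN le.itemid = 50912 THEN 'CREATININE'
      WHEN le.itemid = 51006 THEN 'BUN'
      WHEN le.itemid = 50882 THEN 'BICARBONATE'
    END AS feature_name,
    le.valuenum AS feature_value
  FROM `physionet-data.mimiciv_hosp.labevents` le
  WHERE le.itemid IN (50912, 51006, 50882)
    AND le.valuenum IS NOT NULL

  UNION ALL

  -- Prescriptions: one binary indicator feature per normalized drug name
  SELECT
    pr.subject_id,
    pr.starttime AS event_time,
    'MED_' || REGEXP_REPLACE(UPPER(pr.drug), r'[^A-Z0-9]', '_') AS feature_name,
    1.0 AS feature_value
  FROM `physionet-data.mimiciv_hosp.prescriptions` pr
  WHERE pr.drug IS NOT NULL
    AND pr.starttime IS NOT NULL  -- rows without a timestamp cannot be placed on the timeline
),
OrderedTimeline AS (
  SELECT
    subject_id,
    event_time,
    feature_name,
    feature_value,
    -- feature_name breaks ties so that simultaneous events get a deterministic order
    ROW_NUMBER() OVER (
      PARTITION BY subject_id
      ORDER BY event_time, feature_name
    ) AS seq_idx
  FROM RawEvents
),
DeltaCalculation AS (
  SELECT
    curr.subject_id,
    curr.event_time,
    curr.feature_name,
    curr.feature_value,
    curr.seq_idx,
    -- Assumes charttime and starttime are DATETIME columns in the BigQuery release,
    -- which requires DATETIME_DIFF rather than TIMESTAMP_DIFF.
    -- The first event of each patient has no predecessor and gets a gap of 0.
    COALESCE(
      DATETIME_DIFF(curr.event_time, prev.event_time, MINUTE),
      0
    ) AS delta_minutes
  FROM OrderedTimeline curr
  LEFT JOIN OrderedTimeline prev
    ON curr.subject_id = prev.subject_id
    AND curr.seq_idx = prev.seq_idx + 1
)
SELECT
  subject_id,
  event_time,
  delta_minutes,
  feature_name,
  feature_value
FROM DeltaCalculation
ORDER BY subject_id, seq_idx;
```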
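The query returns one row per event in long format, whereas a sequence model expects one feature vector per timestep. The sketch below shows one possible way to bridge the two for a single patient: events sharing a timestamp are merged into one step, an observation mask is kept, and gaps are recomputed per step. The function name `rows_to_sequence`, the zero fill for unobserved features, and the sample rows are assumptions made for illustration.

```python
from datetime import datetime


def rows_to_sequence(rows, feature_names):
    """Group (event_time, feature_name, value) rows into one step per timestamp.

    Returns (features, mask, deltas): features[k][j] is 0.0 wherever mask[k][j]
    is 0, and deltas[k] is the gap in minutes since the previous step
    (0.0 for the first step).
    """
    col = {name: j for j, name in enumerate(feature_names)}
    by_time = {}
    for event_time, name, value in rows:
        if name in col:  # ignore features outside the chosen vocabulary
            by_time.setdefault(event_time, {})[name] = value

    features, mask, deltas = [], [], []
    prev_time = None
    for event_time in sorted(by_time):
        vec = [0.0] * len(feature_names)
        obs = [0] * len(feature_names)
        for name, value in by_time[event_time].items():
            vec[col[name]] = value
            obs[col[name]] = 1
        gap = 0.0 if prev_time is None else (event_time - prev_time).total_seconds() / 60.0
        features.append(vec)
        mask.append(obs)
        deltas.append(gap)
        prev_time = event_time
    return features, mask, deltas


# Hypothetical rows for one patient.
rows = [
    (datetime(2020, 1, 4, 8, 0), "CREATININE", 1.1),
    (datetime(2020, 1, 4, 8, 0), "BUN", 18.0),
    (datetime(2020, 1, 12, 9, 30), "CREATININE", 1.4),
]
features, mask, deltas = rows_to_sequence(rows, ["CREATININE", "BUN", "BICARBONATE"])
print(features, mask, deltas)
```

Keeping the mask separate from the values lets a reconstruction loss be restricted to entries that were actually observed, so zero-filled slots are not mistaken for measurements.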
This model implements our non-causal state space compressor in PyTorch. It uses bidirectional processing to ingest patient timelines and generate missing data projections alongside downstream risk assessments.
```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class ClinicalStateCompressor(nn.Module):
    """
    Bidirectional Selective SSM for non-causal longitudinal trajectory
    compression and value imputation on asynchronous EHR batch data drops.
    """

    def __init__(self, num_features, d_model=256, d_state=32):
        super().__init__()
        # Linear layer combining raw sparse metrics with empirical delta trackers
        self.embedding_layer = nn.Linear(num_features + 1, d_model)
        # Dual-directional Mamba operators for non-causal sequence analysis
        self.forward_mamba = Mamba(
            d_model=d_model,
            d_state=d_state,
            d_conv=4,
            expand=2,
        )
        self.backward_mamba = Mamba(
            d_model=d_model,
            d_state=d_state,
            d_conv=4,
            expand=2,
        )
        # Imputation decoder: maps bidirectional hidden states to input feature dimensions
        self.imputation_decoder = nn.Linear(d_model * 2, num_features)
        # Risk classifier: estimates 15 out-of-sample patient risks for the upcoming quarter
        self.risk_classifier = nn.Sequential(
            nn.Linear(d_model * 2, 128),
            nn.LayerNorm(128),
            nn.GELU(),
            nn.Dropout(0.3),
            nn.Linear(128, 15),
            nn.Sigmoid(),
        )

    def forward(self, features, deltas):
        """
        Args:
            features (Tensor): [Batch Size, Sequence Length, Number of Clinical Features]
            deltas (Tensor): [Batch Size, Sequence Length, 1] (Time differences)

        Returns:
            imputed_trajectory: [Batch Size, Sequence Length, Number of Clinical Features]
            predicted_risk_profile: [Batch Size, 15]
            z_p: [Batch Size, 2 * d_model]
        """
        # Combine clinical measurements and time step differences into a single input
        x = torch.cat([features, deltas], dim=-1)
        x_prime = self.embedding_layer(x)

        # Forward pass over historical context
        h_fwd = self.forward_mamba(x_prime)

        # Reverse the timeline for the backward pass, then flip the output back
        # so that index k refers to the same timestamp in both streams
        x_prime_reversed = torch.flip(x_prime, dims=[1])
        h_bwd_raw = self.backward_mamba(x_prime_reversed)
        h_bwd = torch.flip(h_bwd_raw, dims=[1])

        # Bidirectional hidden representation M_k at every timestep
        m_k = torch.cat([h_fwd, h_bwd], dim=-1)

        # Reconstruct missing clinical values across the sequence timeline
        imputed_trajectory = self.imputation_decoder(m_k)

        # Patient portrait: the forward state at the last step and the backward
        # state at the first step, since each has seen the entire sequence.
        # Taking m_k[:, -1, :] would give a backward half that has seen one event only.
        # Assumes unpadded sequences; padded batches need per-sample valid indices.
        z_p = torch.cat([h_fwd[:, -1, :], h_bwd[:, 0, :]], dim=-1)

        # Generate downstream operational and clinical risk classifications
        predicted_risk_profile = self.risk_classifier(z_p)
        return imputed_trajectory, predicted_risk_profile, z_p
```
To evaluate how effectively the **Patient State Portrait Vector ($\mathbf{z}_p$)** captures long-term historical records, we benchmark our architecture against traditional clinical sequence models.
The framework is tested on its ability to forecast critical clinical events during a data-blind 90-day window following each batch drop. Performance metrics look specifically at predicting shifts in Acute Kidney Injury (AKI) severity levels using the KDIGO clinical criteria, alongside forecasting unplanned 30-day readmissions.
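For reference, the creatinine-based part of KDIGO staging can be sketched as below. This is a deliberate simplification and not the labeling procedure used in the evaluation: it ignores urine output, renal replacement therapy, and the 48-hour and 7-day timing windows that the full criteria require. The function name `kdigo_stage_from_creatinine` is hypothetical.

```python
def kdigo_stage_from_creatinine(baseline, current):
    """Simplified KDIGO AKI stage (0 to 3) from serum creatinine in mg/dL.

    Assumption: only the creatinine thresholds are applied; urine output,
    renal replacement therapy, and timing windows are not considered.
    """
    if baseline <= 0:
        raise ValueError("baseline creatinine must be positive")
    ratio = current / baseline
    # AKI definition: rise of at least 0.3 mg/dL or at least 1.5x baseline
    meets_aki = ratio >= 1.5 or (current - baseline) >= 0.3
    if not meets_aki:
        return 0
    if ratio >= 3.0 or current >= 4.0:
        return 3
    if ratio >= 2.0:
        return 2
    return 1


# A severity shift between two batch drops is a change in stage.
stage_before = kdigo_stage_from_creatinine(baseline=1.0, current=1.4)
stage_after = kdigo_stage_from_creatinine(baseline=1.0, current=2.5)
print(stage_before, stage_after)
```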
| Model | Sparsity Handling Mechanism | Computational Scaling | Batch Drop Suitability |
|---|---|---|---|
| LSTM + LOCF Heuristics | Constant zero-order forward propagation. No intrinsic interval understanding. | $O(L)$ | Poor. Struggles with vanishing gradients across multi-year historical patient records. |
| Multi-Time Attention Network (mTAN) | Learned continuous time-embedding functions mapped to query vectors. | $O(L^2)$ | Moderate. Delivers high accuracy but scales poorly due to extreme quadratic memory bottlenecks. |
| Bidirectional Selective SSM (Proposed) | Continuous-time linear differential parameterization via data-dependent delta adjustment. | $O(L)$ | Optimal. Compresses multi-year, irregularly sampled histories efficiently whenever batch databases are updated. |