Chapter 5: EHR and Temporal Models

Large language models for electronic health records, clinical time series, and longitudinal patient data analysis.



Executive Summary

Electronic Health Records (EHRs) contain rich, longitudinal patient data combining structured fields, unstructured clinical notes, and temporal measurements. This chapter explores how LLMs are being adapted to process this complex, multimodal healthcare data for clinical decision support, risk prediction, and automated documentation.

5.1 The EHR Challenge

5.1.1 Data Complexity

EHRs present unique challenges:

| Data Type | Volume | Challenges |
| --- | --- | --- |
| Clinical Notes | ≈80% of EHR data | Unstructured, medical jargon |
| Lab Results | Thousands per patient | Temporal, missing values |
| Medications | Complex histories | Interactions, discontinuations |
| Vital Signs | Continuous streams | Irregular sampling |
| Diagnoses | ICD codes | Hierarchical, evolving |

5.1.2 Why Standard LLMs Fail

  • Medical Terminology: 100,000+ specialized terms that general vocabularies fragment (see the tokenizer sketch after this list)
  • Temporal Reasoning: Disease progression over time
  • Multi-scale Time: Minutes (ICU) to years (chronic)
  • Sparse Data: Irregular observations
  • Privacy Requirements: Can’t use cloud APIs
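
The vocabulary problem is easy to demonstrate. The sketch below is an illustration added here (not from any of the systems in this chapter), assuming the Hugging Face transformers library is installed:

from transformers import AutoTokenizer

# A general-domain vocabulary splits clinical terms into many subword
# pieces, scattering their meaning across fragments.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
for term in ["hydrochlorothiazide", "cholecystectomy", "tachycardia"]:
    print(term, "->", tokenizer.tokenize(term))
# Exact pieces depend on the vocabulary, but each term fragments into
# several subwords; a clinically trained vocabulary represents them in
# far fewer pieces.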

5.2 GatorTron: Large Clinical Language Model

5.2.1 Architecture and Scale

GatorTron scaled clinical language models from hundreds of millions to nearly nine billion parameters:

import torch.nn as nn

class GatorTron(nn.Module):
    def __init__(self, size='8.9B'):
        super().__init__()
        # Published GatorTron sizes: base (345M), medium (3.9B), large (8.9B)
        configs = {
            '345M': {'layers': 24, 'hidden': 1024, 'heads': 16},
            '3.9B': {'layers': 48, 'hidden': 2560, 'heads': 40},
            '8.9B': {'layers': 48, 'hidden': 4096, 'heads': 64}
        }

        config = configs[size]
        # MegatronBERT stands in for the Megatron-LM BERT implementation
        # GatorTron was trained with
        self.transformer = MegatronBERT(
            num_layers=config['layers'],
            hidden_size=config['hidden'],
            num_attention_heads=config['heads'],
            max_position_embeddings=512,
            vocab_size=50265  # extended clinical vocabulary
        )
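
As a sanity check on these configurations, the usual transformer parameter estimate (roughly 12·L·H² per model body plus embeddings) can be computed directly. This is an added illustration, not part of GatorTron's code:

def approx_params(layers, hidden, vocab=50265, max_pos=512):
    # Per layer: ~4*H^2 for attention projections plus ~8*H^2 for a
    # feed-forward block with 4*H intermediate width, ignoring biases.
    per_layer = 12 * hidden ** 2
    embeddings = (vocab + max_pos) * hidden
    return layers * per_layer + embeddings

for name, (layers, hidden) in {'345M': (24, 1024),
                               '3.9B': (48, 2560),
                               '8.9B': (48, 4096)}.items():
    print(name, f"~{approx_params(layers, hidden) / 1e9:.2f}B")
# The estimate matches the 345M and 3.9B labels closely; the 8.9B
# configuration as listed comes out nearer 9.9B, so treat these
# hyperparameters as approximate.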

5.2.2 Training Data

Unprecedented clinical corpus:

| Source | Size | Content |
| --- | --- | --- |
| UF Health | 50B words | 126M notes, 2M patients |
| PubMed | 16B words | Biomedical literature |
| Wikipedia | 3B words | Medical articles |
| MIMIC-III | 500M words | Public ICU data |
| Clinical Trials | 1B words | Protocol descriptions |

Total: >90B words of text (>82B clinical)

5.2.3 Clinical NLP Tasks

Performance improvements:

| Task | Metric | GatorTron-3.9B | BioBERT | Relative Improvement |
| --- | --- | --- | --- | --- |
| Clinical NER | F1 | 92.4% | 87.8% | +5.2% |
| Relation Extraction | F1 | 89.1% | 83.2% | +7.1% |
| Text Similarity | Pearson | 0.84 | 0.76 | +10.5% |
| Natural Language Inference | Accuracy | 88.5% | 82.3% | +7.5% |
| Medical QA | Exact Match | 78.9% | 71.4% | +10.5% |

5.2.4 Clinical Applications

# Example: automated ICD coding with a frozen encoder plus a multi-label
# classification head (gatortron_tokenizer, gatortron_model, icd_classifier,
# and decode_icd_codes are assumed to be loaded elsewhere)
import torch

def extract_diagnoses(clinical_note):
    # Tokenize and encode the note
    inputs = gatortron_tokenizer(clinical_note, return_tensors='pt')

    # Extract contextual representations without tracking gradients
    with torch.no_grad():
        outputs = gatortron_model(**inputs)
        hidden_states = outputs.last_hidden_state

    # Mean-pool over tokens, then score every ICD code independently
    icd_logits = icd_classifier(hidden_states.mean(dim=1))

    # Keep codes whose sigmoid probability exceeds 0.5
    predicted_codes = (torch.sigmoid(icd_logits) > 0.5)
    return decode_icd_codes(predicted_codes)
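
The icd_classifier head above is assumed to be already trained. A minimal sketch of how such a multi-label head could be fit with binary cross-entropy (the 1024 hidden width and 1,500-code label space are illustrative assumptions):

import torch
import torch.nn as nn

icd_classifier = nn.Linear(1024, 1500)   # one output per ICD code
criterion = nn.BCEWithLogitsLoss()       # independent sigmoid per code
optimizer = torch.optim.AdamW(icd_classifier.parameters(), lr=1e-4)

def train_step(pooled_embedding, target_codes):
    # pooled_embedding: (batch, 1024) mean-pooled note representation
    # target_codes: (batch, 1500) multi-hot vector of billed ICD codes
    logits = icd_classifier(pooled_embedding)
    loss = criterion(logits, target_codes.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()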

5.3 Few-Shot Health Learners: Time Series with LLMs

5.3.1 Grounding Numerical Data

Adapting text-based LLMs for physiological signals:

class TimeSeriesLLM:
    def __init__(self, base_model='PaLM-24B'):
        # load_model and NumericTokenizer are placeholders for the base
        # LLM loader and a tokenizer tuned for numeric strings
        self.llm = load_model(base_model)
        self.tokenizer = NumericTokenizer()

    def encode_timeseries(self, values, timestamps):
        # Serialize the signal as text, e.g. "[09:00] HR: 72 bpm",
        # so the LLM can condition on it like any other prompt
        text_repr = []
        for t, v in zip(timestamps, values):
            text_repr.append(f"[{t}] HR: {v} bpm")

        return " ".join(text_repr)

5.3.2 Applications Demonstrated

Diverse health monitoring tasks:

| Task | Input | Output | Performance |
| --- | --- | --- | --- |
| Cardiac Analysis | ECG waveform | Arrhythmia detection | 91% accuracy |
| Activity Recognition | Accelerometer | Activity type | 87% F1 |
| Metabolic Calculation | Motion + demographics | Calories burned | ±50 kcal/day |
| Stress Detection | HRV + EDA | Stress level | 0.78 correlation |
| Mental Health | Multi-sensor | PHQ-9 score | ±2 points |

5.3.3 Few-Shot Adaptation

# Example: 5-shot learning for new vital sign
few_shot_prompt = """
Given these examples of respiratory patterns:
1. [09:00] RR: 12, SpO2: 98% → Status: Normal
2. [09:15] RR: 24, SpO2: 94% → Status: Mild distress
3. [09:30] RR: 8, SpO2: 99% → Status: Bradypnea
4. [09:45] RR: 30, SpO2: 91% → Status: Severe distress
5. [10:00] RR: 16, SpO2: 96% → Status: Normal
 
Now classify: [10:15] RR: 28, SpO2: 92% → Status: ?
"""
 
response = model.generate(few_shot_prompt)
# Output: "Moderate to severe distress"

5.4 Temporal Modeling Strategies

5.4.1 Handling Irregular Sampling

Clinical data arrives irregularly:

import torch
import torch.nn as nn

class IrregularTimeSeriesEncoder(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # TimePositionalEncoding and ValueEmbedding are assumed modules
        # mapping timestamps and raw values into the model dimension
        self.time_encoder = TimePositionalEncoding()
        self.value_encoder = ValueEmbedding()
        # Learned embedding substituted wherever a value is missing
        self.missing_token = nn.Parameter(torch.randn(dim))

    def encode(self, times, values, mask):
        # mask is True where a value was actually observed; missing
        # slots receive the learned token instead
        encoded_values = torch.where(
            mask.unsqueeze(-1),
            self.value_encoder(values),
            self.missing_token
        )

        # Inject elapsed-time information
        time_features = self.time_encoder(times)

        # Combine value and time channels additively
        return encoded_values + time_features
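
A toy usage of the encoder, assuming the sub-encoders above accept plain float tensors (the values and mask are illustrative):

import torch

times = torch.tensor([0.0, 1.5, 4.0, 9.0])       # hours since admission
values = torch.tensor([7.10, 7.32, 0.0, 7.41])   # e.g., blood pH; 0.0 is filler
observed = torch.tensor([True, True, False, True])

encoder = IrregularTimeSeriesEncoder()
embeddings = encoder.encode(times, values, observed)
# embeddings: (4, 768); the third row is the learned missing-value token
# plus its time encoding, so "no measurement at hour 4" is itself a signal.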

5.4.2 Multi-Scale Temporal Attention

Different timescales for different signals:

class MultiScaleTemporalAttention(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Attention windows in seconds: minute-, hour-, and day-scale
        self.minute_attention = TemporalAttention(window=60)
        self.hour_attention = TemporalAttention(window=3600)
        self.day_attention = TemporalAttention(window=86400)
        # Project the concatenated scales back to the model dimension
        self.fusion = nn.Linear(3 * dim, dim)

    def forward(self, x, timestamps):
        # Attend at each granularity separately
        minute_features = self.minute_attention(x, timestamps)
        hour_features = self.hour_attention(x, timestamps)
        day_features = self.day_attention(x, timestamps)

        # Fuse the three timescales
        return self.fusion(torch.cat(
            [minute_features, hour_features, day_features], dim=-1))
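
The TemporalAttention blocks are left undefined above. One plausible construction, sketched here rather than taken from any published implementation, restricts standard multi-head attention to observation pairs whose timestamps fall within the window:

import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention over observation pairs whose timestamps
    (in seconds) differ by less than `window`."""
    def __init__(self, window, dim=768, heads=8):
        super().__init__()
        self.window = window
        self.heads = heads
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, timestamps):
        # x: (B, T, dim); timestamps: (B, T) in seconds
        gaps = (timestamps.unsqueeze(-1) - timestamps.unsqueeze(-2)).abs()
        blocked = gaps >= self.window            # True = outside the window
        # nn.MultiheadAttention expects one mask per head
        blocked = blocked.repeat_interleave(self.heads, dim=0)
        out, _ = self.attn(x, x, x, attn_mask=blocked)
        return out

Because the self-pair always has a gap of zero, every position can attend at least to itself, which avoids fully masked rows.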

5.5 Integration with Clinical Workflows

5.5.1 Real-Time Processing Pipeline

import time

THRESHOLD = 0.8  # alert cut-off on the fused risk score (illustrative)

class ClinicalStreamProcessor:
    def __init__(self):
        self.ehr_model = GatorTron()
        self.timeseries_model = TimeSeriesLLM()
        self.alert_system = ClinicalAlertSystem()  # assumed alerting service

    def process_patient_stream(self, patient_id):
        while True:
            # Pull the newest data for each modality (the get_latest_*
            # helpers are assumed wrappers around the hospital's EHR APIs)
            notes = get_latest_notes(patient_id)
            vitals = get_latest_vitals(patient_id)
            labs = get_latest_labs(patient_id)

            # Analyze each modality independently
            note_insights = self.ehr_model.analyze(notes)
            vital_trends = self.timeseries_model.analyze(vitals)
            lab_abnormalities = detect_lab_abnormalities(labs)

            # Fuse the findings into a single patient risk score
            risk_score = self.integrate_insights(
                note_insights, vital_trends, lab_abnormalities
            )

            if risk_score > THRESHOLD:
                self.alert_system.notify(patient_id, risk_score)

            time.sleep(300)  # poll every 5 minutes
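
The integrate_insights method is left undefined above. A minimal possibility, assuming each analyzer returns a dict with a 'risk' score in [0, 1], is a fixed weighted average (the keys and weights here are placeholder assumptions; in practice the fusion would be learned or calibrated):

    def integrate_insights(self, note_insights, vital_trends,
                           lab_abnormalities, weights=(0.4, 0.35, 0.25)):
        # Weighted average of per-modality risk estimates
        scores = (note_insights['risk'], vital_trends['risk'],
                  lab_abnormalities['risk'])
        return sum(w * s for w, s in zip(weights, scores))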

5.5.2 Documentation Assistance

Automated note generation:

def generate_progress_note(patient_data):
    template = """
    SUBJECTIVE: {chief_complaint}
    
    OBJECTIVE:
    Vitals: {vital_signs}
    Labs: {lab_results}
    Physical Exam: {exam_findings}
    
    ASSESSMENT: {assessment}
    
    PLAN: {treatment_plan}
    """
    
    # LLM fills in template
    completed_note = model.complete(
        template,
        context=patient_data,
        style='professional_medical'
    )
    
    return completed_note

5.6 Privacy-Preserving Techniques

5.6.1 Federated Learning

Training without centralizing data:

from copy import deepcopy

class FederatedEHRModel:
    def __init__(self):
        self.global_model = GatorTron('345M')
        self.hospitals = []  # participating sites, each holding local data

    def federated_round(self):
        # Each hospital fine-tunes a copy of the global model on site;
        # raw patient records never leave the institution
        local_updates = []
        for hospital in self.hospitals:
            local_model = deepcopy(self.global_model)
            local_model.train_on_local_data(hospital.data)
            local_updates.append(local_model.state_dict())

        # Aggregate the local weights into the new global model
        self.global_model.load_state_dict(federated_averaging(local_updates))
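
The federated_averaging helper used above is a plain FedAvg step. A minimal sketch (unweighted averaging; real deployments typically weight each site by its local dataset size):

import torch

def federated_averaging(local_updates):
    # Element-wise mean over the hospitals' state_dicts
    averaged = {}
    for key in local_updates[0]:
        averaged[key] = torch.stack(
            [update[key].float() for update in local_updates]
        ).mean(dim=0)
    return averaged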

5.6.2 Differential Privacy

def dp_training_step(model, batch, optimizer, epsilon=1.0):
    # Simplified sketch of a DP-SGD step. Proper DP-SGD clips each
    # example's gradient separately before averaging; clipping the
    # aggregated gradient, as here, only illustrates the idea.
    loss = model.loss(batch)  # assumed to return the training loss
    loss.backward()

    # Bound the gradient's sensitivity
    max_grad_norm = 1.0
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

    # Add Gaussian noise; mapping epsilon to a noise scale really
    # requires a privacy accountant, so this scale is illustrative
    noise_scale = max_grad_norm / epsilon
    for param in model.parameters():
        if param.grad is not None:
            param.grad += torch.randn_like(param.grad) * noise_scale

    # Update and reset
    optimizer.step()
    optimizer.zero_grad()
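
Getting per-example clipping and the privacy accounting right by hand is error-prone; in practice a library handles both. A sketch assuming the Opacus 1.x API:

from opacus import PrivacyEngine

# Wraps an existing model/optimizer/data loader so that each step clips
# per-sample gradients and adds calibrated Gaussian noise.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,  # noise std as a multiple of the clipping norm
    max_grad_norm=1.0,     # per-sample clipping bound
)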

5.7 Evaluation on Clinical Outcomes

5.7.1 Beyond NLP Metrics

Evaluation should target real clinical impact, not just NLP benchmarks:

| Metric | Description | Target |
| --- | --- | --- |
| 30-day Readmission | Prediction accuracy | AUC > 0.85 |
| Length of Stay | MAE in days | < 1.5 days |
| Mortality Risk | Calibration error | < 0.05 |
| Alert Fatigue | False positive rate | < 10% |
| Documentation Time | Reduction | > 30% |
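
The readmission and calibration targets can be checked with standard tooling. A minimal sketch using scikit-learn, where y_true holds binary readmission labels and y_prob the model's predicted risks (both assumed NumPy arrays):

import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(y_true, y_prob, n_bins=10):
    # Bin predictions by risk; average |event rate - mean risk| per bin,
    # weighted by bin occupancy.
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            ece += in_bin.mean() * abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
    return ece

auc = roc_auc_score(y_true, y_prob)               # target: > 0.85
ece = expected_calibration_error(y_true, y_prob)  # target: < 0.05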

5.7.2 Clinical Trial Design

class ClinicalTrialEvaluator:
    def __init__(self, model, control_group, intervention_group):
        self.model = model
        self.control = control_group
        self.intervention = intervention_group

    def run_trial(self, duration_days=180):
        # physician_decisions, physician_review, and measure_outcomes are
        # placeholders for the trial's clinical workflow instrumentation
        outcomes = {'control': [], 'intervention': []}

        for day in range(duration_days):
            # Control arm: standard care
            control_decisions = physician_decisions(self.control)

            # Intervention arm: model suggestions reviewed by physicians
            model_suggestions = self.model.predict(self.intervention)
            intervention_decisions = physician_review(model_suggestions)

            # Record daily outcomes for both arms
            outcomes['control'].append(measure_outcomes(control_decisions))
            outcomes['intervention'].append(measure_outcomes(intervention_decisions))

        return statistical_analysis(outcomes)
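
The statistical_analysis step is unspecified above. One simple sketch, assuming measure_outcomes returns a scalar per day and comparing arms with Welch's t-test via SciPy (a real trial would pre-specify a far more careful analysis plan):

import numpy as np
from scipy import stats

def statistical_analysis(outcomes):
    control = np.asarray(outcomes['control'])
    intervention = np.asarray(outcomes['intervention'])
    # Welch's t-test: does not assume equal variance between arms
    t_stat, p_value = stats.ttest_ind(intervention, control, equal_var=False)
    return {
        'effect': intervention.mean() - control.mean(),
        't': t_stat,
        'p': p_value,
    }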

5.8 Challenges and Limitations

5.8.1 Current Limitations

  1. Hallucination: Generating plausible but false clinical information
  2. Temporal Reasoning: Long-term dependencies still challenging
  3. Causal Inference: Correlation vs causation in treatment effects
  4. Rare Events: Limited data for rare diseases
  5. Interpretability: Black-box decisions in critical care

5.8.2 Ongoing Research

  • Multimodal Integration: Combining notes + images + signals
  • Continual Learning: Adapting to new diseases (e.g., COVID-19)
  • Explainable Predictions: Attention-based explanations
  • Uncertainty Quantification: Knowing when not to predict

5.9 Future Directions

Near-term (2024-2025)

  • Real-time ICU monitoring systems
  • Automated coding and billing
  • Clinical trial matching

Long-term (2025+)

  • Personalized medicine recommendations
  • Drug discovery from EHR patterns
  • Digital twins for treatment simulation

5.10 Key Takeaways

  1. Scale Matters: Billions of clinical words improve performance
  2. Temporal Complexity: Healthcare data requires special handling
  3. Privacy First: Federated learning and differential privacy essential
  4. Clinical Validation: Beyond NLP metrics to patient outcomes
  5. Integration Challenge: Fitting into existing workflows crucial

5.11 Resources

Models and Code

Datasets

