
ECG Synthesis: From GPU Training to Edge Deployment on Raspberry Pi

  • Writer: Kasturi Murthy
  • Aug 31
  • 5 min read

Introduction

In my previous blog post, I discussed the input data for the LSTM-VAE model, which comes from the publicly available PTB-XL dataset, a comprehensive clinical ECG corpus; specifically, the single-lead Normal PTB-XL ECG records. First, the raw ECG signals are processed with the NeuroKit2 library to detect R-peaks, which serve as temporal anchors for constructing analytic waveforms. Based on these R-peak positions, structured ECG sequences are generated by overlaying physiologically inspired wave components—P, Q, R, S, T, and U—using Dirac delta–type Gaussian functions positioned at appropriate offsets relative to each R-peak. The R-peak data consisted of a list of lists, where each inner list contained the indices of all the R-peaks for a single ECG waveform. The idea was to inject controlled, idealized R-peaks into the input signal and observe how well the model preserves or distorts them during reconstruction.
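To make that preprocessing step concrete, here is a minimal sketch of the idea: detect R-peaks with NeuroKit2, then overlay Gaussian bumps at fixed offsets around each detected peak. The offsets, amplitudes, and widths below are illustrative placeholders for a 100 Hz signal, not the tuned values used in the actual pipeline.

```python
import numpy as np
import neurokit2 as nk

def build_analytic_ecg(ecg, sampling_rate=100):
    """Overlay Gaussian wave components (P, Q, R, S, T, U) anchored to
    detected R-peaks. Offsets/amplitudes/widths are illustrative only."""
    # Detect R-peaks; info["ECG_R_Peaks"] holds their sample indices.
    _, info = nk.ecg_peaks(ecg, sampling_rate=sampling_rate)
    r_peaks = info["ECG_R_Peaks"]

    # (offset in samples relative to the R-peak, amplitude, width in samples)
    # -- hypothetical values chosen for a 100 Hz signal.
    components = {
        "P": (-18, 0.15, 3.0),
        "Q": (-4, -0.10, 1.5),
        "R": (0, 1.00, 2.0),
        "S": (4, -0.20, 1.5),
        "T": (30, 0.30, 5.0),
        "U": (45, 0.05, 3.0),
    }

    t = np.arange(len(ecg))
    synthetic = np.zeros(len(ecg), dtype=float)
    for r in r_peaks:
        for offset, amp, width in components.values():
            center = r + offset
            synthetic += amp * np.exp(-0.5 * ((t - center) / width) ** 2)
    return synthetic, r_peaks
```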

After training, the synthetic data generated by the model was highlighted in the visualization video of the previous blog post. That visualization reveals that the LSTM-VAE model distorted the amplitude of the synthetic ECG it produced. However, the NeuroKit2 library was still able to identify the R-peaks in the synthetic data. Methods to minimize amplitude distortion in synthesized ECG waveforms will be discussed in a future blog post.

For now, this blog post details how the trained model is employed to generate synthetic ECG waveforms on a resource-limited device such as a Raspberry Pi.

1. The Challenge: Building a Smart Edge Device

Running complex machine learning models on low-power devices like a Raspberry Pi is a significant challenge. Training a sophisticated model like a Variational Autoencoder (VAE) requires massive computational power, but the final application—generating synthetic ECGs for a specific purpose—needs to run efficiently and affordably on a small device. Our solution is a two-step pipeline: build the brain on a powerful machine, then transfer the knowledge to the low-power device.

2. The Training Phase: Forging the Brains of the Operation

This phase focuses on creating a high-fidelity model to understand intricate ECG patterns. Training is performed on a machine with an NVIDIA GPU, utilizing Docker pass-through for direct GPU access.

  • Model Architecture (LSTM-VAE): We use a specialized architecture combining convolutional and recurrent networks. The Encoder features a multi-scale Conv1D layer to extract key morphological features (PQRSTU complexes) from the ECG signal, processed by an LSTM layer to capture long-range dependencies. The Decoder employs a similar LSTM to reconstruct the latent representation into a complete waveform.

  • Loss Function: The process incorporates a morphology-aware weighted MSE loss that prioritizes accurate reconstruction of critical PQRSTU peaks over flatline segments, ensuring the model learns a high-fidelity representation of the ECG. This is complemented by a cyclical beta annealing schedule to prevent latent collapse and promote diversity in synthetic ECG generation. A minimal PyTorch sketch of the architecture, loss, and schedule follows this list.

  • Outcome: Following training, the model’s parameters are fine-tuned to accurately encode and decode ECGs, resulting in a smooth, continuous latent space ready for generation.
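For readers who want to see the shape of this in code, below is a minimal, self-contained PyTorch sketch of an LSTM-VAE of this kind, together with a weighted reconstruction loss and a cyclical beta schedule. All layer sizes, kernel widths, the peak-weighting input, and the schedule constants are illustrative assumptions, not the exact configuration used in training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMVAE(nn.Module):
    """Illustrative LSTM-VAE: a multi-scale Conv1D front-end feeding an
    LSTM encoder, a Gaussian latent space, and an LSTM decoder.
    Sizes here are assumptions, not the trained configuration."""
    def __init__(self, seq_len=1000, hidden=64, latent=16):
        super().__init__()
        self.seq_len = seq_len
        # Multi-scale Conv1D: a narrow kernel for sharp QRS features and a
        # wide kernel for broad P/T/U morphology, concatenated channel-wise.
        self.conv_narrow = nn.Conv1d(1, 8, kernel_size=5, padding=2)
        self.conv_wide = nn.Conv1d(1, 8, kernel_size=25, padding=12)
        self.enc_lstm = nn.LSTM(16, hidden, batch_first=True)
        self.fc_mu = nn.Linear(hidden, latent)
        self.fc_logvar = nn.Linear(hidden, latent)
        self.dec_lstm = nn.LSTM(latent, hidden, batch_first=True)
        self.dec_out = nn.Linear(hidden, 1)

    def encode(self, x):                       # x: (batch, seq_len)
        x = x.unsqueeze(1)                     # -> (batch, 1, seq_len)
        feats = torch.cat([F.relu(self.conv_narrow(x)),
                           F.relu(self.conv_wide(x))], dim=1)
        feats = feats.permute(0, 2, 1)         # -> (batch, seq_len, 16)
        _, (h, _) = self.enc_lstm(feats)       # final hidden state
        h = h.squeeze(0)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def decode(self, z):                       # z: (batch, latent)
        z_seq = z.unsqueeze(1).repeat(1, self.seq_len, 1)
        out, _ = self.dec_lstm(z_seq)
        return self.dec_out(out).squeeze(-1)   # -> (batch, seq_len)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon, x, mu, logvar, peak_weights, beta):
    """Morphology-aware weighted MSE plus a beta-weighted KL term.
    `peak_weights` (same shape as x) up-weights samples near PQRSTU
    peaks so that flatline segments matter less."""
    mse = (peak_weights * (recon - x) ** 2).mean()
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kl

def cyclical_beta(step, cycle_len=1000, max_beta=1.0):
    """Cyclical annealing: beta ramps from 0 to max_beta over the first
    half of each cycle, then holds, to help prevent latent collapse."""
    phase = (step % cycle_len) / cycle_len
    return max_beta * min(1.0, 2.0 * phase)
```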

For more in-depth information on this topic, please see the previous posts.

3. The Hand-off: Serializing the Model for Portability

Upon completing the training, it is essential to save the model in a manner that facilitates easy transfer to another device. The recommended practice in PyTorch is to save only the model's state dictionary, which encompasses all the learned weights and biases.

  • Saving the State Dictionary: The torch.save() function is employed to serialize the model's parameters into a file (e.g., trained_lstm_vae_weights.pth). This file is lightweight and can be easily transferred to the Raspberry Pi (see the example after this list).

  • Advantages of a State Dictionary: This method is both lightweight and robust. It decouples the model's architecture from its parameters, thereby simplifying the process of loading the model into different environments, even if there are minor changes in the surrounding code.
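In code, the hand-off is a single call on the training machine; here `model` is assumed to be the trained LSTMVAE instance from the training loop:

```python
import torch

# On the GPU machine, once training has finished:
torch.save(model.state_dict(), "trained_lstm_vae_weights.pth")
```

Because the file contains only tensors, it loads on any device; the LSTMVAE class definition itself travels separately, as ordinary Python code.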

4. The Deployment Phase: Putting It to Work on the Edge

On the Raspberry Pi, although we lack a powerful GPU, we can still perform model inference. The primary task of the Pi is to utilize the trained model to generate synthetic ECGs on demand.

  • The Pi's Environment: The Raspberry Pi's Python environment must have a CPU-only version of PyTorch installed, along with the exact class definition of the LSTMVAE model.

  • Loading the Model: We instantiate a new LSTMVAE object on the Pi using the same hyperparameters from the training phase. Subsequently, we load the saved state dictionary into this new model using model.load_state_dict().

  • Inference on Pi: A final call to model.eval() prepares the model for inference. We can then use its decode() method to generate new synthetic ECGs from random latent vectors. While this process is slower on the Pi's CPU compared to a GPU, it effectively demonstrates how a complex model can be deployed and run on an edge device, making it a cost-effective and portable solution.
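Putting those three bullets together, the Pi-side script reduces to a few lines. The import path, hyperparameters, and sample counts below are illustrative; `map_location="cpu"` remaps tensors that were saved from GPU memory onto the Pi's CPU.

```python
import torch
from model import LSTMVAE  # hypothetical module holding the class definition

# Must match the hyperparameters used during training.
model = LSTMVAE(seq_len=1000, hidden=64, latent=16)

state = torch.load("trained_lstm_vae_weights.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()  # switch off training-only behavior before inference

# Generate synthetic ECGs by decoding random latent vectors.
with torch.no_grad():
    z = torch.randn(5, 16)            # 5 waveforms, latent dimension 16
    synthetic_ecgs = model.decode(z)  # -> tensor of shape (5, 1000)
```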

5. Conclusion: A Hybrid AI Pipeline

This end-to-end process successfully demonstrates a powerful hybrid AI pipeline. By separating the computationally intensive training from the final inference, we can leverage the power of cloud or local GPUs for development and still deploy a functional, intelligent application on an affordable and low-power edge device like a Raspberry Pi.

In Figure 1, which features a video of synthetic data generated on the Pi, the R-peaks were intentionally amplified. This was a deliberate approach to stress-test the amplitude fidelity of the LSTM-VAE decoder, assessing how the model preserves or distorts exaggerated R-peaks in the input signal during reconstruction. This also helps uncover latent bottlenecks or biases in the loss function that prioritize smooth reconstruction over amplitude fidelity—a topic to be discussed in a future blog post.

Figure 1: A video demonstration of synthetic ECG signals generated on the Raspberry Pi from five randomly selected samples. R-peak positions, identified using the NeuroKit2 library, are highlighted throughout. The visualization reveals that the LSTM-VAE decoder tends to compress or distort sharp R-peak transients. To probe this behavior, controlled amplitude enhancements were introduced at the detected R-peak locations using Gaussian-shaped bumps. Additionally, the median R-peak positions within defined temporal bins across the 1000-sample ECG sequences are displayed. These median values are computed from the entire input training dataset and serve as a reference for comparing synthesized outputs against the original data distribution.

6. Side Note: Histograms and Boxplots of the Data Ingested into the LSTM-VAE Model

The median values in the figure above were introduced in the context of analyzing the distribution of R-peak locations in the input ECG waveforms. The goal was to understand where the R-peaks tend to occur in the input dataset.

  • Raw Data: The R-peak data was a list of lists (e.g., x2_read), where each inner list contained the indices of all the R-peaks for a single ECG waveform.

  • Objective: The objective was to get a single statistical value (the median) that represents the central tendency of the R-peak locations within specific temporal ranges, or "bins," of the 1000-sample ECG signals. This gives a more nuanced picture of the data's distribution than the simple frequency count from a histogram, as the sketch below shows.
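A minimal version of that computation might look like the following; the bin width and function name are illustrative, and `x2_read` is the list-of-lists mentioned above.

```python
import numpy as np

def median_r_peak_per_bin(x2_read, signal_len=1000, bin_width=100):
    """Pool R-peak indices from all waveforms, bucket them into fixed
    temporal bins, and return the median index within each bin."""
    all_peaks = np.concatenate([np.asarray(p) for p in x2_read])
    edges = np.arange(0, signal_len + bin_width, bin_width)
    medians = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = all_peaks[(all_peaks >= lo) & (all_peaks < hi)]
        if in_bin.size:
            medians[(lo, hi)] = float(np.median(in_bin))
    return medians
```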


Figure 2: Histogram of counts at each R-peak location in the input data. The input data are normal ECG waveforms taken from the PTB-XL dataset. The plotnine library's theme_xkcd is used for this plot.
Figure 3: Boxplots illustrating the distribution of R-peak indices across temporal bins within 10-second ECG signals sampled at 100 Hz (total length: 1000 samples). The x-axis represents sequential bin ranges spanning the full signal duration, while the y-axis denotes the index positions of detected R-peaks. Each boxplot summarizes the spread of R-peak locations within its respective bin, and the count of R-peaks in each bin is displayed above the corresponding box. This visualization provides insight into the temporal density and variability of R-peak occurrences across the input dataset.

