इन्द्रिय ఇంద్రియ ಇಂದ್ರಿಯ
Making sense of what is Sensed

Technology Blog
Artificial Intelligence, Parallel Computing & Space Physics
- BiomedClip - Citation Ripples
If you’ve ever wondered how a research paper’s ideas ripple across disciplines, Litstudy [1] is the tool that brings those connections to life. Rather than simply tallying up citations, Litstudy dives deeper, showing you exactly which domains have picked up on a paper’s concepts. To make the experience even more interactive, I’ve included a Python notebook in HTML format, taking the BiomedCLIP [2] paper as an example. The BiomedCLIP model is featured in AIKosh, the Government of India’s AI repository [3]. This blog post highlights the application of BiomedCLIP's concepts in biomedical AI, emphasizing multimodal learning that combines image and text data—insights gathered from its references on arxiv.org. The influence is clear in areas like vision-language processing, medical image classification, visual question answering, and radiology.

But Litstudy doesn’t stop at just mapping citations—it also brings powerful topic modeling into the mix. With this feature, you can uncover the main themes and research trends that emerge from a paper’s citation network. Topic modeling automatically groups related citations, helping you visualize which areas of research are most influenced by the paper and how its ideas have evolved across disciplines. More specifically, one can narrow down to specific documents addressing a particular topic. This means you’re not only seeing where a paper has been cited, but also gaining a deeper understanding of the conversations and innovations it has sparked. Combined with the interactive Python notebook, Litstudy empowers you to explore these topics hands-on, making your literature reviews and blog posts richer, more insightful, and truly data-driven.

The accompanying Jupyter Notebook is self-explanatory and uses the Litstudy API to access arxiv.org.

References

1. S. Heldens, A. Sclocco, H. Dreuning, B. van Werkhoven, P. Hijma, J. Maassen & R.V. van Nieuwpoort (2022), "litstudy: A Python package for literature reviews", SoftwareX, 20, 101207. DOI: 10.1016/j.softx.2022.101207
2. BiomedCLIP: A Multimodal Biomedical Foundation Model Pretrained from Fifteen Million Scientific Image–Text Pairs; Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Angela Crabtree, Brian Piening, Carlo Bifulco, Matthew P. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon. Year: 2025. Published on arXiv:2303.00915
3. AIKosh
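For readers who want to try this themselves, here is a minimal sketch of the kind of Litstudy workflow the notebook follows. The helper names (search_arxiv, plot_year_histogram, build_corpus, train_nmf_model, plot_topic_clouds) are taken from the litstudy documentation, but the query string and parameters are illustrative rather than the exact ones used in the notebook.

```python
# Illustrative litstudy workflow (function names follow the litstudy docs;
# verify them against your installed version, as APIs can change).
import litstudy
import matplotlib.pyplot as plt

# 1. Collect papers from arXiv that mention BiomedCLIP (query is illustrative).
docs = litstudy.search_arxiv("BiomedCLIP biomedical vision-language")

# 2. Quick bibliometric overview: publications per year.
litstudy.plot_year_histogram(docs)
plt.show()

# 3. Topic modelling over titles and abstracts to surface research themes.
corpus = litstudy.build_corpus(docs)                 # tokenise and clean text
topic_model = litstudy.train_nmf_model(corpus, num_topics=5)
litstudy.plot_topic_clouds(topic_model)              # one word cloud per topic
plt.show()
```

From the topic model one can then drill down to the individual documents that load most heavily on a given topic, which is the narrowing-down step described above.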
- Geomagnetic coordinates, time and field in Centred and Eccentric dipole approximations
FORTRAN code work is in progress and will also be shared via GitHub.

References

1. Geomagnetic coordinates, time, and field in Centered and Eccentric Dipole approximations. Ramana, K V V, Murthy, K S R N and Khan, Ibrahim, Indian Journal of Radio & Space Physics, Vol. 27, February 1998, pp. 35-42. http://nopr.niscpr.res.in/handle/123456789/35273
2. A Theoretical Study of F2-region Equatorial Anomaly in Solar Maximum and Minimum, K S R Narayana Murthy, PhD (Thesis), 1995. Department of Physics, Andhra University, Visakhapatnam
3. Narayana Murthy, K.S.R. and K.V.V. Ramana (1996): A Numerical Study of Quiet-time F-region Anomaly at Solar Maximum and Minimum, Advances in Space Research, 18, 87-90, Pergamon Press, U.K.
4. Met Office. (2010-2015). Cartopy: a cartographic Python library with a Matplotlib interface. Retrieved from https://cartopy.readthedocs.io
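Pending the FORTRAN release, here is a small NumPy sketch of the centred-dipole part of the conversion, i.e. rotating geographic coordinates into a frame whose pole is the geomagnetic dipole pole. The pole coordinates below are an assumed recent IGRF-epoch value and the routine is a generic spherical rotation, not code from the referenced paper.

```python
import numpy as np

# Assumed centred-dipole (geomagnetic) north pole for a recent IGRF epoch;
# substitute the epoch-appropriate value for serious work.
POLE_LAT, POLE_LON = 80.6, -72.7   # degrees (illustrative)

def geographic_to_centred_dipole(lat_deg, lon_deg):
    """Rotate geographic coordinates into the centred-dipole frame."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    plat, plon = np.radians(POLE_LAT), np.radians(POLE_LON)

    # Geographic position as a unit vector.
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)

    # Bring the dipole axis onto the z-axis: rotate about z by plon,
    # then about y by the pole colatitude.
    colat = np.pi / 2 - plat
    x1 = x * np.cos(plon) + y * np.sin(plon)
    y1 = -x * np.sin(plon) + y * np.cos(plon)
    z1 = z
    x2 = x1 * np.cos(colat) - z1 * np.sin(colat)
    y2 = y1
    z2 = x1 * np.sin(colat) + z1 * np.cos(colat)

    mlat = np.degrees(np.arcsin(z2))       # geomagnetic latitude
    mlon = np.degrees(np.arctan2(y2, x2))  # geomagnetic longitude
    return mlat, mlon

print(geographic_to_centred_dipole(17.7, 83.3))  # e.g. Visakhapatnam
```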
- Efficient Containerized Fortran and CUDA Development with Docker in VS Code
In the current high-performance computing environment, developers require tools that are both powerful and easy to configure. Fortran and CUDA remain essential for scientific and engineering applications, but integrating them can be complex and error-prone. Docker addresses this issue by offering a consistent, portable environment that packages code, compilers, libraries, and system tools, thus resolving the typical “it works on my machine” problem. This blog post builds on recent research supporting Fortran’s ongoing importance in scientific computing [1]. Another reason for undertaking this is to revive the code I used for my PhD thesis. This document demonstrates the application of NVIDIA's `nvfortran` compiler to interface with GPUs through structured JSON and YAML configuration files. The entire process is conducted within a containerized development workflow, facilitated by Docker and Visual Studio Code, and is initiated seamlessly from a Windows Subsystem for Linux (WSL) prompt.

Understanding Docker and Its Benefits

Docker is a platform that streamlines the deployment of applications within lightweight, portable containers. These containers encompass everything required for your application to function, such as code, runtime, libraries, and system tools. Here are some key benefits of using Docker:

- Code and Dependencies Together: Typically, running code on different computers can lead to issues due to variations in library versions, tools, or even the operating system. Docker addresses this by allowing you to “package” your code along with all its dependencies into a single container.
- Lightweight and Portable: Unlike virtual machines, containers do not require a full operating system. They share the host system’s kernel, making them much more efficient and quicker to start.
- Consistency Across Environments: Whether you run your container on your laptop, a colleague’s computer, or a cloud server, it will function exactly the same way. This eliminates the classic “it works on my machine” problem.
- Scientific Sandbox: The term “scientific sandbox” refers to a controlled, isolated environment where you can experiment, develop, and test your scientific code without affecting other projects or system settings. You determine precisely what goes into the container—specific compilers, libraries, tools, and even data files.

Setting Up Docker for Fortran and CUDA Development

To get started with Docker, you need to install it on your machine. Follow these straightforward steps:

- Download Docker: Visit the Docker website and download the version suitable for your operating system (Windows 11 with WSL is used in this demonstration).
- Install Docker: Follow the installation instructions specific to your OS. If you are using Windows, enable the WSL 2 feature during installation.
- Verify Installation: Open a terminal and run `docker --version` to confirm that Docker is correctly installed.

The docker-compose.yml file establishes the nvfortran-dev service as defined in your .devcontainer folder by:

- Defining the base Docker image as nvcr.io/nvidia/nvhpc:24.3-devel-cuda12.3-ubuntu22.04, which includes the Ubuntu OS, the nvfortran compiler, and all required NVIDIA libraries.
- Setting up GPU access by configuring the runtime as nvidia and environment variables such as NVIDIA_VISIBLE_DEVICES.
- Linking your local project folder to the container's workspace directory with a volume, allowing local file editing and execution within the container.
- Ensuring the container remains active so that VS Code stays connected to it.

```yaml
services:
  nvfortran-dev:
    image: nvcr.io/nvidia/nvhpc:24.3-devel-cuda12.3-ubuntu22.04
    container_name: nvfortran-dev
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
      - ..:/workspace
    working_dir: /workspace
    command: /bin/bash
    stdin_open: true
    tty: true
```

Connecting to VS Code: devcontainer.json

To enable this container for use within Visual Studio Code, I create a devcontainer.json file. This file instructs VS Code on how to:

- Start the container
- Mount my workspace
- Install useful extensions

The content of the devcontainer.json file is as follows:

```json
{
  "name": "NVIDIA Fortran Dev",
  "dockerComposeFile": "docker-compose.yml",
  "service": "nvfortran-dev",
  "workspaceFolder": "/workspace",
  "settings": {
    "files.associations": {
      "*.f90": "fortran",
      "*.f95": "fortran",
      "*.F95": "fortran",
      "*.for": "fortran",
      "*.FOR": "fortran",
      "*.f": "fortran",
      "*.F": "fortran"
    }
  },
  "extensions": [
    "fortran-lang.fortran",
    "ms-vscode.remote-containers"
  ],
  "remoteUser": "root"
}
```

This devcontainer.json file sets up a Visual Studio Code development container for Fortran programming using NVIDIA's tools. Here is an explanation of each key in the JSON file:

- name: "NVIDIA Fortran Dev" is the display name for the development environment that will show up in the VS Code interface.
- dockerComposeFile: Indicates that the container configuration is defined in the docker-compose.yml file located in the same directory.
- service: Tells VS Code to connect to the nvfortran-dev service specified in your docker-compose.yml file.
- workspaceFolder: Defines the default directory that will open in VS Code when you connect to the container, which is /workspace.
- settings: Configures VS Code settings that apply only when working inside this container.
- files.associations: Ensures that files ending in .f90 and .f95 are identified as Fortran files, allowing for proper syntax highlighting and language features.
- extensions: A list of VS Code extensions that will be automatically installed and activated inside the container. hansec.fortran offers Fortran language support (syntax highlighting, snippets, etc.); however, I do not see syntax highlighting working for me. ms-vscode.remote-containers is the essential extension that enables the development container feature.
- remoteUser: Specifies that commands and processes inside the container will be executed as the root user.

Proceed with the following steps:

- Modify your devcontainer.json file located in [.devcontainer/devcontainer.json] by adding the extended files.associations block. Please refer to the IDE figure below.
- Access the Command Palette using Ctrl+Shift+P.
- Execute the Dev Containers: Rebuild Container command in VS Code, which will recreate and reconnect your development environment using the configuration from your .devcontainer folder. During this process, it will: initiate the nvfortran-dev service (utilizing the NVIDIA HPC SDK image with nvfortran and CUDA support), configure the container with all designated settings and extensions, and link your VS Code session to this active container, providing a fully configured nvfortran environment within Docker.

Consequently, after executing the Dev Containers: Rebuild Container command, you will be operating within the Docker-based nvfortran environment as outlined by your configuration files.

VS Code IDE on a Windows system using WSL, displaying a Fortran program set up for GPU benchmarking.
The terminal confirms (refer to the figure below) successful GPU passthrough, showing one available OpenMP device with default device ID zero.

Screenshot: Details from the Terminal Output

References

1. McKevitt, James, Vorobyov, Eduard I., and Kulikov, Igor. “Accelerating Fortran codes: A method for integrating Coarray Fortran with CUDA Fortran and OpenMP.” Journal of Parallel and Distributed Computing, vol. 195, Elsevier BV, January 2025, article 104977. DOI: 10.1016/j.jpdc.2024.104977. Also available as a preprint at arXiv:2409.02294.
- Generation of Synthetic ECG Data for Enhancing Medical Diagnosis
Electrocardiogram (ECG) signals are pivotal in diagnosing cardiovascular conditions, as they provide critical insights into the heart's electrical activity. Despite their significance, the acquisition of high-quality, annotated ECG data poses challenges, largely due to privacy concerns, as well as substantial financial and time constraints. This MTech dissertation in Data Science and Engineering addresses these challenges by investigating the potential of generating synthetic ECG data through the application of Artificial Intelligence (AI) models, specifically Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN).

Research Objectives

- Develop models using Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN) to generate synthetic ECG data.
- Compare the statistical properties of synthetic and real ECG data to ensure clinical utility.

Methodology

The research is divided into four phases:

- Detailed Literature Survey: This study employs the Python library, Litstudy, to conduct an exhaustive review of existing literature on ECG data and machine learning techniques for synthetic data generation. Litstudy facilitates the analysis by extracting metadata from various scientific sources, standardizing the data, and managing documents through filtering, selection, deduplication, and annotation. Additionally, it provides statistical analysis, generates bibliographic networks, and utilizes natural language processing (NLP) for topic discovery, making it a powerful tool for conducting detailed literature surveys and reviews. The utility program developed for this purpose will be shared through a GitHub link.
- Development of VAE Model: This phase involves creating and training a Variational Autoencoder (VAE) model to generate synthetic ECG data for both normal and abnormal ECGs. The development of the VAE model comprises encoding the input ECG data into a latent space, capturing essential features, and decoding this latent representation to reconstruct the ECG signals. The model utilizes the variational inference approach to approximate posterior distributions, allowing for efficient generation of new data points. The MIT-BIH Arrhythmia Database and the PTB-XL dataset, hosted by PhysioNet, are employed for this purpose. The MIT-BIH dataset is instrumental for developing and evaluating algorithms for cardiac arrhythmia detection, ECG signal processing, and machine learning applications, while the PTB-XL dataset is a comprehensive resource that includes a diverse array of ECG recordings with detailed annotations. To ensure the quality of the generated ECG data, the NeuroKit2 library is utilized for signal processing and quality assessment. This library offers various tools for cleaning ECG signals, detecting peaks, and computing heart rate variability indices, ensuring that the synthetic data closely resembles the original data in terms of quality and statistical properties. Statistical comparisons, including Maximum Mean Discrepancy (MMD) and the Kolmogorov-Smirnov (KS) test, are conducted to evaluate the similarity between the synthetic and real data distributions (a minimal sketch of these checks is shown after this list). This approach addresses the challenges of obtaining high-quality, annotated ECG data and facilitates training machine learning models, enhancing their performance and robustness.
- Development of GAN Model: The development of the GAN model for generating synthetic ECG data involves several key steps. Dataset Utilization: The MIT-BIH Arrhythmia Database and the PTB-XL dataset hosted by PhysioNet are employed to provide diverse ECG recordings. These datasets are instrumental for developing and evaluating algorithms for cardiac arrhythmia detection and ECG signal processing. Training and Testing: The GAN model is trained using these datasets, focusing on generating synthetic data for both normal and abnormal ECGs. The synthetic data is then evaluated using statistical measures like Maximum Mean Discrepancy (MMD) and the Kolmogorov-Smirnov (KS) test to compare the similarity between synthetic and real data distributions. Quality Assurance: Tools like the NeuroKit2 library are utilized for signal processing and quality assessment of the generated ECG data. This ensures that the synthetic data closely resembles the original data in terms of quality and statistical properties.
- Evaluation and Analysis: Perform rigorous evaluations of the generated data and report findings, including statistical tests and visual comparisons to ensure data quality.

Challenges and Future Work

- Addressing noise and artifacts in ECG data to improve the quality of synthetic data.
- Overcoming the scarcity of specific ECG datasets, particularly for Ventricular Tachycardias.
- Expanding the use of various sampling techniques in the latent space for more diverse synthetic data generation.
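The sketch below shows one way the MMD and KS comparisons mentioned in the methodology could be computed. The RBF-kernel MMD implementation and the random arrays standing in for real and synthetic beats are illustrative; ks_2samp comes from SciPy.

```python
import numpy as np
from scipy.stats import ks_2samp

def rbf_mmd2(X, Y, sigma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel.

    X, Y: arrays of shape (n_samples, signal_length)."""
    def k(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Illustrative stand-ins for one batch of real and synthetic beats.
rng = np.random.default_rng(0)
real_ecg = rng.normal(size=(64, 1000))
synthetic_ecg = rng.normal(size=(64, 1000))

print("MMD^2:", rbf_mmd2(real_ecg, synthetic_ecg))

# KS test on the pooled amplitude distributions of the two sets.
stat, p_value = ks_2samp(real_ecg.ravel(), synthetic_ecg.ravel())
print("KS statistic:", stat, "p-value:", p_value)
```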
- Synthetic Sine Waves Using Variational Auto-Encoder
This blog post begins with a playful analogy to the scientific paper by Kingma and Welling [1], likening it to a friend who studies your fashion to create their own distinctive style. Most of the inspiration is drawn from that paper. A VAE is a generative model that learns from a sampled distribution.

Your Quirky Friend: The Encoder

This friend, called "Q," encapsulates the essence of your style and records it in a notebook, symbolizing the latent space in a VAE. Q uses the encoder for this task. They observe your outfit to capture its essence. Instead of merely replicating your outfit, Q identifies the key features that make your style unique. Q has several methods to capture the key features of your style using the encoder, which is a neural network and can take various forms, such as fully connected networks, Convolutional Neural Networks (CNNs) for image data, or Long Short-Term Memory (LSTM) networks for sequential data. Once Q has captured the key features of your style, they record these insights in their notebook. This notebook, or latent space, is where Q stores the essence of your fashion sense. Let's take a closer look at what this latent space represents.

More concretely, the encoder is responsible for mapping the input data into a latent space representation using the neural network forms just mentioned. It takes the input data and compresses it into a smaller, latent variable space, typically characterized by a mean and variance that define a probability distribution.

Q's Notebook: The Latent Space

Q's notebook represents the latent space in the VAE. It's where Q stores their notes, sketches, and ideas inspired by your fashion sense. This notebook contains the essence of your style, distilled into a set of key features. In the VAE, the latent space is a probabilistic representation of the input data. It's a distribution over the possible values of the latent variables, which capture the underlying patterns and structures in the data.

ELBO – Evidence Lower Bound

To make the VAE learn a meaningful latent space representation, the model optimizes the Evidence Lower Bound (ELBO). The ELBO has two main components: the reconstruction term, which checks how well the decoder can recreate the input data from the latent space, and the regularization term, which ensures the latent space distribution matches a prior distribution (usually a standard Gaussian distribution).

Reparameterization: Q's Creative Twist

Now, imagine Q wants to create a new outfit inspired by your style, but with a twist. Instead of directly sampling from their notebook, Q uses a clever trick: they sample features from a standard normal distribution (like a random fashion magazine) and then transform the sample using their notebook's parameters (like applying their fashion sense). This is similar to the reparameterization trick used in VAEs. By sampling from a standard normal distribution and transforming the sample using the latent space's parameters, we can backpropagate through the sampling process and optimize the VAE's parameters efficiently.

Q's Fashion Creations: The Decoder

Q will use their parameterized samples to create new outfits inspired by your fashion sense. Q's creations may not be exact replicas, but they capture the essence of your style. This is the decoder in a VAE. The decoder takes the reparameterized samples from Q's notebook and uses them to generate new data samples similar to the original input data. In the VAE, the decoder is trained to reconstruct the input data from the latent space.
The decoder is another neural network that takes samples from the latent space and reconstructs the input data, mapping the latent variables back to the data space.

Key Functions

- Encoder: Compresses input data into a latent representation.
- Decoder: Reconstructs data from the latent representation.

Q's Fashion Evolution: Training the VAE

As Q continues to study your fashion sense, create new outfits, and refine their reparameterization trick, they develop a unique understanding of your style. This process is like training the VAE. During training, the VAE learns to optimize the encoder, latent space, and decoder. The goal is to find a balance between reconstructing the input data accurately and capturing the underlying patterns and structures in the data.

Training Objective

The VAE is trained to minimize the difference between the original input and the reconstructed output while also ensuring that the latent space follows a desired distribution (usually a Gaussian distribution). This structure allows the VAE to learn meaningful representations of the data while also enabling effective generation of new data samples.

The Result: A Quirky yet Stylish VAE

Through this process, Q develops a unique understanding of your fashion sense, which is reflected in their quirky yet stylish outfits. Similarly, the trained VAE can generate new data samples that capture the essence of the input data, while also introducing new and interesting variations.

Figure 1.0 Variational Autoencoder (VAE) Architecture. The flow of data is indicated by arrows connecting each block, showing the sequence of operations from input to output.

Implementation Details Using PyTorch

The encoder/decoder-based Variational Autoencoder (VAE) shown in Figure 2.0 processes inputs consisting of sine waves. These sine waves have an amplitude of 1 unit and a single frequency, but the phase varies randomly within the range of [-π, +π]. The encoder maps these inputs to a latent representation, characterized by mean and variance, which the decoder uses to reconstruct the sine waves.

Figure 2.0 The Encoder-Decoder architecture of a Variational Autoencoder (VAE) designed to generate synthetic sine waves. It is important to note that the linear layers do not inherently compute the mean and variance of the input (such as averaging the input values in the case of the mean). Instead, they learn to produce a vector that represents the mean and variance of the latent distribution as part of the VAE training process.

VAE Loss Function

The VAE loss function [1] comprises two main components:

- Reconstruction Term: Measures the discrepancy between the input data and its reconstructed version.
- Regularization Term (Kullback-Leibler Divergence): Ensures that the latent space distribution aligns closely with a prior distribution.

By minimizing the VAE loss function, the model learns to encode input data into a meaningful latent space representation, decode this representation into a reconstructed version of the input data, and generate new data samples akin to the input data.

Figure 3.0 This figure illustrates the VAE (Variational Autoencoder) loss function during training. The plot shows the decline in average loss over 25 epochs, highlighting the model's learning progression. The x-axis represents the epochs, and the y-axis represents the average loss, with a noticeable trend of decreasing loss as the training continues, indicating improved model performance.
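To ground the analogy, here is a compact PyTorch sketch of a VAE of this kind with the two-term ELBO loss. The layer sizes, latent dimension, and random-phase sine-wave generator are illustrative choices, not the exact network behind Figures 1.0 to 3.0.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SineVAE(nn.Module):
    """Minimal fully connected VAE for fixed-length sine-wave snippets."""
    def __init__(self, seq_len=128, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(seq_len, 64), nn.ReLU())
        self.fc_mean = nn.Linear(64, latent_dim)      # learns mu
        self.fc_log_var = nn.Linear(64, latent_dim)   # learns log sigma^2
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, seq_len))

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.fc_mean(h), self.fc_log_var(h)
        eps = torch.randn_like(mu)                    # reparameterization trick
        z = mu + eps * torch.exp(0.5 * log_var)
        return self.decoder(z), mu, log_var

def elbo_loss(x, x_hat, mu, log_var):
    recon = F.mse_loss(x_hat, x, reduction="sum")                   # reconstruction term
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # KL regularization
    return recon + kl

# One illustrative training step on random-phase sine waves.
seq_len, batch = 128, 32
t = torch.linspace(0, 2 * math.pi, seq_len)
phase = (torch.rand(batch, 1) * 2 - 1) * math.pi   # phase in [-pi, +pi]
x = torch.sin(t + phase)

model = SineVAE(seq_len)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x_hat, mu, log_var = model(x)
loss = elbo_loss(x, x_hat, mu, log_var)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```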
Generating Synthetic Data Using VAEs

The steps for generating synthetic sine wave data using a trained VAE are given below:

- Set the VAE to evaluation mode after sufficient convergence.
- Disable gradient tracking to save memory and computation.
- Sample latent vectors from a standard normal distribution. This builds on Q's Creative Twist mentioned above.

Q's Synthetic Data Factory Powered by Circular Buffers

A circular buffer, often used for efficient data management, operates like a fixed-size, looping queue. In the context of a VAE system, it functions as follows. The circular buffer comprises two components—the read buffer and the write buffer:

- Read Buffer: This is where Q continuously inputs random fashion patterns drawn from various magazines. These patterns are stored sequentially in the buffer for processing by the VAE encoder. The read buffer ensures a steady supply of data to the encoder, enabling continuous operation.
- Write Buffer: Once the read buffer reaches its capacity (i.e., it becomes full), the write buffer takes over and begins storing new incoming patterns. This mechanism allows the system to handle data without overflow or interruptions. The decoder subsequently reads patterns from the write buffer, transforming them into synthetic sine waves as part of the output.

The circular buffer operates such that when the end of the buffer is reached, it loops back to the beginning. This ensures optimal use of space, as no data is lost or wasted—old data is simply overwritten when the buffer cycles around. In this implementation, the circular buffer enables efficient coordination between the encoder and decoder, ensuring a seamless flow of data and pattern decoding into synthetic sine waves. It is akin to a never-ending loop of pattern generation and transformation.

Video 1.0: "Circular Buffers in Action – Q's Synthetic Data Factory" - The top window showcases Q randomly sampling fashion patterns and placing them into the read buffer. Once the buffer is full, the decoder activates, reading these patterns to produce synthetic sine waves. The green sine wave represents the reference wave with zero phase. The red sine wave is the noisy synthetic output generated by the decoder, while the blue sine wave is a smoothed version of the raw decoder output.

References

[1] Diederik P. Kingma and Max Welling (2019), An Introduction to Variational Autoencoders, Foundations and Trends in Machine Learning, arXiv:1906.02691v3 [cs.LG] 11 Dec 2019, https://doi.org/10.48550/arXiv.1906.02691
- LSTM-VAE: Deep Architecture for GPU-Accelerated ECG Generation
Introduction

In this post, I explore a custom LSTM-based Variational Autoencoder (LSTMVAE) designed for sequential representation learning and generative modeling of ECG signals. The model combines the temporal sensitivity of Long Short-Term Memory (LSTM) networks with the probabilistic structure of a Variational Autoencoder (VAE), enabling it to capture both short-term waveform dynamics and long-range dependencies in cardiac rhythms. By learning a compressed latent representation of ECG sequences, the LSTMVAE can reconstruct realistic signals and generate novel variations—making it a powerful tool for tasks like anomaly detection, synthetic data generation, and physiological modeling.

The input data for the LSTMVAE model comes from the publicly available PTB-XL dataset, specifically the single-lead Normal PTB-XL ECG data—a comprehensive clinical ECG corpus I referred to in an earlier post. Initially, raw ECG signals are processed using the NeuroKit2 library to detect R-peaks, which serve as temporal anchors for constructing analytic waveforms. Based on these R-peak positions, structured ECG sequences are generated using a custom function, generate_ecg_waveform_given_R_wave_pos_modified, which overlays physiologically inspired wave components—P, Q, R, S, T, and U—using Dirac delta–type Gaussian functions positioned at appropriate offsets relative to each R-peak.

Figure 1.0 Annotated ECG signal from the single-lead Normal PTB-XL dataset, with R-peaks detected using NeuroKit2 and labeled by sample index. These R-peaks serve as temporal anchors for waveform synthesis in the LSTMVAE pipeline, where a custom function (generate_ecg_waveform_given_R_wave_pos_modified) overlays physiologically inspired components—P, Q, R, S, T, and U—at appropriate offsets relative to each R-peak using Dirac delta–type Gaussian functions to generate structured ECGs for training the LSTMVAE.

Each wave is modeled using a scaled Gaussian approximation of the Dirac delta function, implemented via a helper function dirac_delta(x, epsilon). The width of each wave (controlled by epsilon) is adaptively scaled according to the sampling rate, ensuring realistic morphology across different resolutions. For example, the P wave is placed ~80 ms before the R-peak, the Q and S waves flank the R wave at ±10 ms, and the T and U waves follow at ~150 ms and ~250 ms respectively. These components are summed to form structured, beat-wise ECG signals that preserve both temporal rhythm and morphological diversity. This approach not only enables precise control over waveform shape and timing but also facilitates the generation of clean, interpretable sequences ideal for downstream modeling. The resulting synthetic ECGs are then fed into the LSTMVAE for representation learning and generative modeling, allowing the model to capture latent dynamics and reconstruct physiologically plausible cardiac signals.

I. Overall Architecture Flow & GPU Utilization

LSTMVAE is a generative model designed to learn the underlying distribution of ECG waveforms, enabling it to both compress (encode) existing signals and generate (decode) new, realistic ones. This entire process, especially the computationally intensive training and inference, leverages the power of a local NVIDIA GPU, which is made accessible through Docker pass-through.
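Before going further into the architecture, here is a simplified sketch of the beat construction described above: narrow Gaussian bumps standing in for Dirac deltas, placed at fixed offsets around each R-peak. The offsets, amplitudes, and widths are illustrative values following the prose, not the exact generate_ecg_waveform_given_R_wave_pos_modified implementation.

```python
import numpy as np

def dirac_delta(x, epsilon):
    """Narrow Gaussian 'Dirac delta'-style bump (peak normalised to 1 here)."""
    return np.exp(-(x / epsilon) ** 2)

def synth_beats(r_peaks, length=1000, fs=100):
    """Overlay P, Q, R, S, T, U components around each R-peak index.

    Offsets, amplitudes, and widths are illustrative values following the prose."""
    t = np.arange(length, dtype=float)
    ecg = np.zeros(length)
    # (offset in seconds relative to R, peak amplitude, width in seconds)
    waves = {"P": (-0.08, 0.15, 0.020), "Q": (-0.01, -0.10, 0.008),
             "R": ( 0.00, 1.00, 0.010), "S": ( 0.01, -0.15, 0.008),
             "T": ( 0.15, 0.30, 0.040), "U": ( 0.25, 0.05, 0.030)}
    for r in r_peaks:
        for offset, amp, width in waves.values():
            # width scaled by the sampling rate, centre shifted by the offset
            ecg += amp * dirac_delta(t - (r + offset * fs), width * fs)
    return ecg

beat_train = synth_beats(r_peaks=[100, 420, 740])   # three beats in a 10 s trace
```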
The VAE operates in three main stages:

- Encoder: Takes a real ECG signal (x) and compresses it into a probability distribution (mean μ and log-variance log σ²) in a lower-dimensional latent space (z).
- Reparameterization Trick: Samples a latent vector (z) from this learned distribution, making the sampling process differentiable.
- Decoder: Takes the sampled latent vector (z) and reconstructs an ECG waveform (x̂).

The model is trained to minimize two things simultaneously: the difference between x̂ and x (reconstruction loss) and the difference between the learned latent distribution and a simple standard normal distribution (KL divergence loss).

II. Detailed Encoder Explanation (Feature Extraction & Compression)

The encoder's job is to intelligently extract meaningful features from the raw ECG signal and summarize them into the latent space. The encoder uses a sophisticated multi-stage approach:

- Initial Input (x): The raw ECG signal enters, typically with a shape like (batch_size, sequence_length, input_dim) (e.g., (N, 1000, 1) for 1000 time points of a single-lead ECG).
- Multi-Scale Convolutional Layer (self.conv_kernels & self.conv_merge):
  Purpose: This serves as a primary feature extractor. Instead of a single convolution, it utilizes a nn.ModuleList comprising Conv1d layers with different kernel_sizes (3, 11, 25, 75, 119) and a dilation=2. This design enables the model to capture ECG features across multiple temporal scales.
  Small Kernels (3, 11): Detect very local, sharp changes and fine details, like the precise onset/offset of waves or the sharp peak of the R-wave.
  Large Kernels (25, 75, 119): Capture broader contextual patterns and the overall morphology of entire PQRSTU complexes, considering longer durations.
  Dilated Convolutions: dilation=2 efficiently expands the receptive field of each kernel without needing more parameters, allowing even small kernels to "see" a wider range of the signal, which is crucial for 1000-point sequences.
  Process: The input x is transposed to (N, 1, 1000) to fit Conv1d. Each Conv1d in self.conv_kernels processes the signal independently, generating 32 output channels for its specific scale. The outputs from all these kernels are concatenated (torch.cat) along the channel dimension, combining the multi-scale features into a rich representation (e.g., (N, 160, 1000) for 5 kernels * 32 channels). self.conv_merge (a Sequential of Conv1d and ReLU layers) then merges and processes these 160 channels, typically reducing them to a more manageable number (e.g., input_dim=1) while preserving important feature combinations. This acts as a bottleneck for the convolutional features.
- Linear Projection (self.merge_project & self.merge_activation): After the convolutional features are merged and transposed back ((N, 1000, 1) or (N, 1000, input_dim)), this layer projects them into a projected_dim (e.g., 32). This further transforms the features into a format suitable for the LSTM.
- LSTM Encoder (self.encoder_lstm):
  Purpose: The core sequential processing unit. It takes the sequence of projected_dim features from the convolutional layers.
  Function: It processes the sequence step-by-step, capturing long-range temporal dependencies and contextual information across the entire ECG beat or rhythm. It's excellent at understanding how P, QRS, T, and U waves relate to each other over time.
  Output: It outputs a sequence of hidden states, but for VAEs, the key output is the final hidden state (h_n[-1]) from its last layer, which acts as a compressed, fixed-size summary of the entire input ECG sequence.
- Encoder Output Processing (self.encoder_norm, self.dropout_latent, self.fc_mean, self.fc_log_var):
  self.encoder_norm: LayerNorm is applied to the LSTM's final hidden state, stabilizing training.
  self.dropout_latent: Dropout is applied here to regularize the latent representation, preventing overfitting by ensuring the network doesn't rely too much on any single feature.
  self.fc_mean & self.fc_log_var: These linear layers take the processed hidden state and output the mean (μ) and log-variance (log σ²) vectors that define the probability distribution of the input ECG in the latent space.

III. Latent Space (The Bottleneck & Generator Seed)

- Reparameterization Trick (reparameterize method): Instead of directly sampling from the distribution, which would not be differentiable, the reparameterization trick is used. This involves the formula z = μ + ε · exp(0.5 · log σ²), where ε is sampled from a standard normal distribution. This allows gradients to flow back through the sampling process, thereby making it differentiable.
- KL Divergence Loss: During training, this loss term forces the encoder's learned latent distributions to resemble a standard normal distribution N(0, I). This is crucial because it ensures the latent space is smooth and continuous, meaning similar points in the latent space correspond to similar ECGs. This also allows for sampling from a simple N(0, I) to generate new data. Taking cues from [1], cyclical beta annealing and a weighted MSE around R-peaks are used.

IV. Detailed Decoder Explanation (Reconstruction & Generation)

The decoder's job is to take a latent vector and "unfold" it back into a full ECG waveform.

LSTM initialization in the Decoder:

- Latent Vector Initialization (self.latent_to_hidden, self.latent_to_cell):
  Purpose: To inject the "essence" or "style" of the desired ECG into the decoder LSTM.
  Process: The sampled latent vector z is passed through two linear layers (self.latent_to_hidden, self.latent_to_cell) to create the initial hidden state (h_0) and cell state (c_0) for the decoder LSTM. These initial states guide the LSTM's generation.
- Decoder Input (decoder_input):
  Purpose: This is a key part of the present design. Instead of just taking a repeated z vector, the decoder is given a more complex input.
  z_repeat: The latent vector z is repeated across the entire sequence_length. This provides consistent "context" or "style" information at every time step of the decoding process.
  noise: Random noise (torch.randn(...)) is concatenated to the z_repeat input. This noise introduces stochasticity, preventing the decoder from generating identical outputs from the same z vector and helping it explore the data manifold. It allows for more diverse and varied generations.
  Shape: This combined input has a shape like (batch_size, sequence_length, latent_dim + input_dim), which is fed into the decoder_lstm.
- LSTM Decoder (self.decoder_lstm):
  Purpose: The core sequence generator.
  Process: It takes the decoder_input and the initialized (h_0, c_0) states. It then processes this input step-by-step, generating a sequence of hidden states that represent the evolving structure of the ECG. It effectively "unrolls" the latent representation over time to construct the output sequence.
- Decoder Output (self.fc_decoder_output & self.final_activation):
  self.fc_decoder_output: A linear layer projects the LSTM's hidden states at each time step back to the original input_dim (e.g., 1 for single-lead ECG).
  self.final_activation: nn.Identity() (no activation). This is suitable if the ECG signal is normalized to a range like [-1, 1] or if using MSE loss on raw values.

V. How Encoder and Decoder Work Together for Synthetic ECG Generation

Training Phase: The Encoder learns to map complex ECG waveforms (with their PQRSTU nuances) into a compact, regularized, and meaningful latent space. The Decoder simultaneously learns how to reconstruct these ECGs from samples taken from that latent space. The Weighted MSE Loss: This is critical! By assigning higher weights to errors in the PQRSTU regions, the model is explicitly incentivized to preserve the crucial morphology of the ECG peaks, overcoming the common blurring issue with standard MSE. The cyclical beta annealing helps balance this reconstruction fidelity with the regularization of the latent space.

Synthetic ECG Generation Phase: Once the model is trained, it can generate entirely new ECGs without any input data. The process:

- Sample a random vector z directly from a standard normal distribution N(0, I) (because training has forced the latent space to conform to this distribution).
- Feed z to the Decoder: this randomly sampled z vector is then passed as input to the trained decoder.
- Generate Output: the decoder uses this z (and the additional noise input) to unfold a completely new, unique, and hopefully realistic ECG waveform.

Outcome: Because the latent space is smooth and meaningful, slightly different z vectors will yield slightly different (but still plausible) synthetic ECGs. This allows one to explore the space of possible ECGs and generate novel examples, which is invaluable for data augmentation or creating synthetic datasets for rare cardiac conditions.

This model is trained using a custom loss function that combines weighted mean squared error (MSE) [1] for reconstruction fidelity with a KL divergence term, modulated by β-annealing. This approach draws from the β-VAE framework introduced by Higgins et al. (2017) [2], which encourages disentangled and interpretable latent representations by adjusting the weight of the KL term during training. Weighted MSE ensures signal-specific emphasis—especially important for biomedical time series like ECG—while cyclical β-annealing allows the model to gradually balance reconstruction accuracy and latent regularization over epochs. This combination enables the model to learn robust, physiologically meaningful representations of cardiac dynamics.

The video below presents a screen recording of synthesized ECG waveforms generated using two distinct loss functions: weighted reconstruction loss and KL divergence with β-annealing. These waveforms are produced by a trained LSTM-VAE model and visualized using a circular buffer mechanism to simulate continuous signal flow. A custom CircularBuffer class is used to manage the temporal storage of synthetic ECG sequences. The buffer holds up to 1000 samples and supports random sampling for visualization. In each cycle, a pre-trained LSTM-VAE model—loaded in evaluation mode—is used to generate synthetic ECG signals by decoding latent vectors sampled from a standard normal distribution. These decoded sequences are then appended to the buffer for retrieval and display.
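A stripped-down version of that buffer-and-decode loop might look like the following. The CircularBuffer here is a hypothetical minimal stand-in for the custom class, and the small feed-forward decoder is only a placeholder for the trained LSTM-VAE decoder loaded in evaluation mode.

```python
import torch
import torch.nn as nn
from collections import deque

class CircularBuffer:
    """Hypothetical minimal stand-in: fixed-size queue that overwrites the
    oldest entries and supports random sampling for visualization."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def append(self, item):
        self.buffer.append(item)

    def sample(self, n):
        idx = torch.randint(len(self.buffer), (n,))
        return [self.buffer[int(i)] for i in idx]

# Placeholder decoder; in practice this is the trained LSTM-VAE in eval mode.
decoder = nn.Sequential(nn.Linear(32, 256), nn.Tanh(), nn.Linear(256, 1000))
decoder.eval()

buffer = CircularBuffer(capacity=1000)
with torch.no_grad():                      # inference only: no gradient tracking
    for _ in range(50):
        z = torch.randn(1, 32)             # latent vector from the N(0, I) prior
        ecg = decoder(z)                   # decode z into a 1000-sample trace
        buffer.append(ecg.squeeze(0).numpy())

batch_for_display = buffer.sample(5)       # e.g. hand these to NeuroKit2 for plotting
```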
The buffer acts as a lightweight, memory-efficient queue that enables smooth, continuous visualization of model outputs, simulating a real-time ECG stream. The visualization loop continuously samples five sequences from the buffer and processes them using NeuroKit2’s ecg_process method. Each cleaned ECG signal is plotted with its corresponding R-peaks annotated, providing insight into the physiological realism of the generated data. The display updates every 0.5 seconds, mimicking a live ECG feed. This setup enables dynamic monitoring of synthetic cardiac signals and serves as a useful tool for evaluating generative model performance in biomedical contexts. The visualization loop—captured in the video below—illustrates the raw synthetic ECG produced by the model, processed using the NeuroKit2 library at a sampling rate of 100 Hz. The nuances of the generated output will be explored further in a subsequent article. This work was developed using PyTorch 2.5.1+cu121 on an NVIDIA GeForce RTX 4060 Laptop GPU, accessed via Docker passthrough in a laptop environment. References Harvey, Christopher J., Sumaiya Shomaji, Zijun Yao, and Amit Noheria. "Comparison of Autoencoder Encodings for ECG Representation in Downstream Prediction Tasks." arXiv , 2024. arXiv:2410.02937. Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., & Lerchner, A. (2017). β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework – ICLR 2017.
- A2TTS-Telugu Speaker Adaptive TTS (Text-to-Speech)-v0.5
In this blog post, I discuss my preliminary experiences with a speaker-adaptive text-to-speech (TTS) system aimed at producing high-quality, natural speech in the low-resource Indian language, Telugu [1]. As I am proficient in reading and writing Telugu (despite my writing skills having declined since completing my class 12), I decided to explore this TTS system. AIKosh is an innovative initiative by the Government of India, aimed at democratizing access to artificial intelligence resources nationwide. Developed by the National e-Governance Division, AIKosh serves as a unified digital platform that consolidates datasets, AI models, computational tools, and sector-specific use cases to empower researchers, startups, and innovators. This short blog post will discuss my preliminary experiments carried out with this system. To access the model, you need to register and obtain permission to download it from AIKosh . Once I received approval from the model contributors, likely linked to IIT, Mumbai [1], I downloaded the model and developed a Python Jupyter notebook using the available TTS model features. Logging into AIKosh is relatively straightforward, and although there are several methods, I opted for the Digilocker route. A2TTS: TTS for Low Resource Indian Languages The model outlined in the publication employs attention mechanisms and diffusion-based mel-spectrogram refinement for synthesizing multi-speaker text-to-speech. The document details that Grad-TTS forms the core of the system, integrating diffusion-based text-to-speech architecture with speaker conditioning techniques. By incorporating speaker conditioning methods from UniSpeech, the model improves the preservation of speaker identity and prosody in the generated speech. Grad-TTS [3,4] retains components like text encoding, duration prediction, and diffusion-based mel spectrogram synthesis, but enhances them with cross-attention mechanisms for better alignment of linguistic and speaker features, resulting in more natural speech synthesis. To prevent overfitting, the reference mel spectrogram is intentionally unrelated to the input text, enabling the model to concentrate on speaker-specific timing and intonation patterns. Speaker identity is meticulously addressed through speaker conditioning strategies, including the application of speaker embeddings and classifier-free guidance during inference to boost expressiveness and ensure speaker consistency. What is GradTTS? Grad-TTS is a cutting-edge text-to-speech (TTS) model that uses diffusion probabilistic modeling to generate high-quality speech from text. Developed by researchers including Vadim Popov and colleagues, it was introduced in a paper accepted to ICML 2021 [2], [3,4]. Key Features Diffusion-based generation: Instead of directly predicting speech features, Grad-TTS gradually transforms random noise into a mel-spectrogram using a score-based decoder. This process is inspired by denoising diffusion probabilistic models. Monotonic Alignment Search: It aligns text input with audio features in a stable and efficient way, improving the quality and consistency of synthesized speech. Flexible inference: Users can control the trade-off between audio quality and generation speed, making it adaptable for different applications. 
- Multilingual and multispeaker support: It can be extended to handle multiple languages and voices, especially when paired with models like HiFi-GAN for vocoding [3,4].

My Experiment with Telugu TTS

I developed this Jupyter notebook based on the code I was permitted to download. While the original model included a Gradio-based user interface for interactive demos, I found that working within Jupyter offered greater flexibility and control for my workflow. This notebook demonstrates a modular pipeline for zero-shot voice cloning, integrating the following components:

- GradTTS: A diffusion-based text-to-speech synthesis model
- HiFi-GAN: A neural vocoder for high-fidelity waveform generation
- SpeakerEncoder: Used to extract speaker embeddings from reference audio—my own voice served as the input here
- Gradio (optional): For web-based interaction, though I opted for notebook-based execution

For the Telugu text inputs, I selected verses from the poems of Yogi Vemana, a celebrated philosopher-poet known for his succinct, thought-provoking couplets. These short poems, rich in moral insight and social commentary, served as ideal linguistic material for testing the TTS model.

Telugu Text Inputs: 'ఉప్పు కప్పురంబు నొక్కపోలికెను౦డు జూడ జూడ రుచుల జూడ వేరు పురుషులందు పుణ్య పురుషులూ వెరయా' (* I beg your pardon if there are mistakes in this Telugu text)

Voice Input: My voice in .wav format

Input waveform for the Encoder: the SpeakerEncoder reference audio—my own voice served as the input here.

Outputs: the text-to-speech generated audio, along with the Encoder (also known as the reference mel spectrogram) and Decoder mel spectrogram waveforms.

The encoder in this context processes the input phoneme sequence to generate dense vector representations that capture contextual relationships among phonemes. These dense embeddings, obtained from the input text, serve as core linguistic features used in downstream modules like duration prediction and mel spectrogram generation. The encoder's role is crucial in extracting and encoding phonetic and linguistic structures to enable high-quality speech synthesis with accurate prosody and speaker-specific characteristics.

The decoder in the described model architecture takes the attention output from the encoder and refines an initial mel spectrogram representation through a de-noising diffusion probabilistic model (DDPM). It iteratively refines the spectrogram through a reverse diffusion process, enhancing the quality of the mel spectrogram for speech synthesis. Additionally, the decoder is conditioned on speaker identity using speaker embeddings from a pre-trained speaker encoder, ensuring speaker consistency and expressive speech generation. The decoder's role is crucial in generating high-quality mel spectrograms that capture both linguistic content and speaker characteristics for natural-sounding speech synthesis. So, the decoder mel spectrogram captures the refined and improved spectrogram representation that incorporates linguistic content and speaker characteristics.

Attention Waveform: Attention refers to a crucial component of the model's architecture used to capture speaker-specific timing and intonation patterns in the speech synthesis process. The attention output is generated based on the queries and the reference mel spectrogram, which serves as both keys and values within the system.
This mechanism allows the model to focus on capturing distinct speaker characteristics and produce speaker-adaptive durations for more expressive and natural-sounding speech synthesis.

Conclusions

It appears that Grad-TTS is effective, but it might require a phoneme sequence more specific to the Telugu language. A more precise input phoneme sequence might be necessary for training this model. This is just an initial assessment, and I could be mistaken!

References

1. AIKosh
2. [2507.15272] A2TTS: TTS for Low Resource Indian Languages
3. https://github.com/huawei-noah/Speech-Backbones/blob/main/Grad-TTS/README.md
4. Original Paper - https://arxiv.org/abs/2105.06337
- ECG Synthesis: From GPU Training to Edge Deployment on Raspberry Pi
Introduction

In my previous blog post, I discussed the input data for the LSTMVAE model, which comes from the publicly available PTB-XL dataset, specifically the single-lead Normal PTB-XL ECG data—a comprehensive clinical ECG corpus. Initially, raw ECG signals are processed using the NeuroKit2 library to detect R-peaks, which serve as temporal anchors for constructing analytic waveforms. Based on these R-peak positions, structured ECG sequences are generated by overlaying physiologically inspired wave components—P, Q, R, S, T, and U—using Dirac delta–type Gaussian functions positioned at appropriate offsets relative to each R-peak. The R-peak data consisted of lists where each inner list contained the indices of all the R-peaks for a single ECG waveform. The idea was to inject controlled, idealistic R-peaks into the input signal and observe how well the model preserves or distorts them during reconstruction.

After training, the generated synthetic data was highlighted in the visualization video of the previous blog post. That visualization reveals that the synthetic data produced by the LSTMVAE model distorted the amplitude of the generated synthetic ECG. However, the NeuroKit2 library was still able to identify the R-peaks in the synthetic data. Methods to minimize amplitude distortion in synthesized ECG waveforms will be discussed in a future blog post. For now, this blog post details how the trained model is employed to generate synthetic ECG waveforms on a resource-limited device such as a Raspberry Pi.

1. The Challenge: Building a Smart Edge Device

Running complex machine learning models on low-power devices like a Raspberry Pi is a significant challenge. Training a sophisticated model like a Variational Autoencoder (VAE) requires massive computational power, but the final application—generating synthetic ECGs for a specific purpose—needs to run efficiently and affordably on a small device. Our solution is a two-step pipeline: build the brain on a powerful machine, then transfer the knowledge to the low-power device.

2. The Training Phase: Forging the Brains of the Operation

This phase focuses on creating a high-fidelity model to understand intricate ECG patterns. Training is performed on a machine with an NVIDIA GPU, utilizing Docker pass-through for direct GPU access.

- Model Architecture (LSTM-VAE): We use a specialized architecture combining convolutional and recurrent networks. The Encoder features a multi-scale Conv1D layer to extract key morphological features (PQRSTU complexes) from the ECG signal, processed by an LSTM layer to capture long-range dependencies. The Decoder employs a similar LSTM to reconstruct the latent representation into a complete waveform.
- Loss Function: The process incorporates a morphology-aware weighted MSE loss that prioritizes accurate reconstruction of critical PQRSTU peaks over flatline segments, ensuring the model learns a high-fidelity representation of the ECG. This is complemented by a cyclical beta annealing schedule to prevent latent collapse and promote diversity in synthetic ECG generation.
- Outcome: Following training, the model's parameters are fine-tuned to accurately encode and decode ECGs, resulting in a smooth, continuous latent space ready for generation. For more in-depth information on this topic, please see the previous posts.

3. The Hand-off: Serializing the Model for Portability

Upon completing the training, it is essential to save the model in a manner that facilitates easy transfer to another device.
The recommended practice in PyTorch is to save only the model's state dictionary, which encompasses all the learned weights and biases.

- Saving the State Dictionary: The torch.save() function is employed to serialize the model's parameters into a file (e.g., trained_lstm_vae_weights.pth). This file is lightweight and can be effortlessly transferred to the Raspberry Pi.
- Advantages of a State Dictionary: This method is both lightweight and robust. It decouples the model's architecture from its parameters, thereby simplifying the process of loading the model into different environments, even if there are minor changes in the surrounding code.

4. The Deployment Phase: Putting It to Work on the Edge

On the Raspberry Pi, although we lack a powerful GPU, we can still perform model inference. The primary task of the Pi is to utilize the trained model to generate synthetic ECGs on demand.

- The Pi's Environment: The Raspberry Pi's Python environment must have a CPU-only version of PyTorch installed, along with the exact class definition of the LSTMVAE model.
- Loading the Model: We instantiate a new LSTMVAE object on the Pi using the same hyperparameters from the training phase. Subsequently, we load the saved state dictionary into this new model using model.load_state_dict().
- Inference on Pi: A final call to model.eval() prepares the model for inference. We can then use its decode() method to generate new synthetic ECGs from random latent vectors. While this process is slower on the Pi's CPU compared to a GPU, it effectively demonstrates how a complex model can be deployed and run on an edge device, making it a cost-effective and portable solution.

5. Conclusion: A Hybrid AI Pipeline

This end-to-end process successfully demonstrates a powerful hybrid AI pipeline. By separating the computationally intensive training from the final inference, we can leverage the power of cloud or local GPUs for development and still deploy a functional, intelligent application on an affordable and low-power edge device like a Raspberry Pi.

In Figure 1.0, which features a video of generated synthetic data on the Pi, the R-peaks were intentionally amplified. This was a deliberate approach to stress-test the amplitude fidelity of the LSTMVAE decoder, assessing how the model preserves or distorts exaggerated R-peaks in the input signal during reconstruction. This also helps uncover latent bottlenecks or biases in the loss function that prioritize smooth reconstruction over amplitude fidelity—a topic to be discussed in a future blog post.

Figure 1: A video demonstration of synthetic ECG signals generated on Raspberry Pi from five randomly selected samples. R-peak positions, identified using the NeuroKit2 library, are highlighted throughout. The visualization reveals that the LSTM-VAE decoder tends to compress or distort sharp R-peak transients. To probe this behavior, controlled amplitude enhancements were introduced at the detected R-peak locations using Gaussian-shaped bumps. Additionally, the median R-peak positions within defined temporal bins across the 1000-sample ECG sequences are displayed. These median values are computed from the entire input training dataset and serve as a reference for comparing synthesized outputs against the original data distribution.

6. Side Note: Histograms of Ingested Data for the LSTM-VAE Model

We have referred to the median values in the previous figure within the context of analyzing the distribution of R-peak locations in input ECG waveforms.
The goal was to understand where the R-peaks tend to occur in the input dataset.

- Raw Data: The R-peak data was a list of lists (e.g., x2_read), where each inner list contained the indices of all the R-peaks for a single ECG waveform.
- Objective: The objective was to get a single statistical value (the median) that represents the central tendency of the R-peak locations within specific temporal ranges or "bins" of the 1000-sample ECG signals. This gives a more nuanced understanding of the data's distribution than just the simple frequency count from a histogram.

Figure 2.0 Histogram of counts at each R-peak location in the input data. The input data is a normal ECG waveform taken from the PTB-XL dataset. The Plotnine library's theme_xkcd is used for this plot.

Figure 3 Boxplots illustrating the distribution of R-peak indices across temporal bins within 10-second ECG signals sampled at 100 Hz (total length: 1000 samples). The x-axis represents sequential bin ranges spanning the full signal duration, while the y-axis denotes the index positions of detected R-peaks. Each boxplot summarizes the spread of R-peak locations within its respective bin, and the count of R-peaks in each bin is displayed above the corresponding box. This visualization provides insight into the temporal density and variability of R-peak occurrences across the input dataset.
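For completeness, here is a small NumPy/pandas sketch of that binning step. The variable name x2_read follows the prose, but the R-peak lists below are synthetic stand-ins and the 100-sample bin width is an illustrative choice.

```python
import numpy as np
import pandas as pd

# x2_read: list of lists, each inner list holding the R-peak indices of one
# 1000-sample (10 s at 100 Hz) ECG trace; synthetic stand-in data here.
rng = np.random.default_rng(1)
x2_read = [np.sort(rng.integers(0, 1000, size=rng.integers(6, 12)))
           for _ in range(200)]

peaks = pd.Series(np.concatenate(x2_read), name="r_peak_index")
bins = np.arange(0, 1001, 100)                     # ten 1 s bins across the trace
binned = pd.cut(peaks, bins=bins, include_lowest=True)

# Per-bin counts and median R-peak index (cf. Figures 2.0 and 3).
summary = peaks.groupby(binned).agg(["count", "median"])
print(summary)
```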
- Exploring ECG Data: PTB-XL for Training Generative Models
Electrocardiogram (ECG) data is essential for assessing heart health. In data science and engineering, particularly when developing AI algorithms, access to a diverse range of high-quality ECG data is crucial. This blog post, inspired by dissertation research on generating synthetic ventricular tachycardias, covers key ECG data sources and the significant challenge posed by noise and artifacts. Building on my previous post about synthetic sine wave generation, this entry introduces a practical example. We will survey publicly available ECG data sources before concentrating on PTB-XL [8] for training the Variational Autoencoder, and later the Generative Adversarial Network, as well as combinations of both (VAE and GAN).

I am excited to share PTB_XL_Data_Overview.ipynb, my Google Colab notebook for an in-depth look at the PTB-XL ECG dataset.

WFDB at its Core: This notebook makes extensive use of WFDB, PhysioNet's [ PhysioNet Databases ] software for managing and processing physiological signals. You'll see how it enables efficient data handling, signal processing (filtering, resampling), and analysis (heartbeat detection, feature extraction) vital for cardiac research.

PTB-XL Data Insights: Explore how I've structured and loaded PTB-XL's ECG data, handled its rich metadata, and processed diagnostic information. The notebook demonstrates flexible data loading with custom sampling rates and aggregation of diagnostic classes using SCP codes. (A minimal loading sketch appears at the end of this section.)

Access by Request: To ensure personalized support and quality, I'm offering access on a request basis. Get access by filling out this quick form: [ Response - Google Sheets ]. Heads up: a Gmail account is preferred for direct access. I'll review requests regularly and notify you via email once access is granted.

Understanding the ECG Waveform and Its Features

A typical ECG signal provides insights into cardiac, cardiovascular, and cardiorespiratory function. It consists of these key components:

P wave: Represents atrial depolarization.
QRS complex: Signifies ventricular depolarization and the heart's powerful pumping action. An abnormally large Q wave can indicate a past heart attack.
T wave: Represents ventricular repolarization (the relaxation and recovery phase) and is vital for cardiac electrical stability.
PR interval: The time from the onset of the P wave to the onset of the QRS complex.
QT interval: The time from the onset of the QRS complex to the end of the T wave.
RR interval: The time between successive R-peaks, often used interchangeably with the NN interval, which emphasizes normal heartbeats.

Deviations from these normal features, known as arrhythmias, can indicate various cardiac issues. These abnormalities can manifest as altered heart rates, irregular rhythms, or changes in wave morphology.

The ECG waveform figure illustrates key cardiac electrical events, including the P wave, QRS complex, T wave, and U wave. It is annotated with essential intervals such as the PR interval (0.12–0.20 sec), QRS duration (0.08–0.10 sec), QT interval (0.40–0.43 sec), and RR interval (0.6–1.0 sec). The ST segment, J point, and TP segment are also marked to aid comprehensive ECG interpretation. Voltage is scaled at 0.1 mV, while time markers indicate 0.04 sec and 0.2 sec intervals.
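Before moving on to the data-source landscape, here is a rough sketch of the kind of record loading the notebook performs (not the notebook's actual code). The local directory path is an assumption, while the file and column names follow the PTB-XL distribution on PhysioNet.

```python
import pandas as pd
import wfdb

PTBXL_DIR = "ptb-xl/1.0.1/"   # assumed local copy of PTB-XL downloaded from PhysioNet

# ptbxl_database.csv holds one row of metadata per ECG record.
meta = pd.read_csv(PTBXL_DIR + "ptbxl_database.csv", index_col="ecg_id")

# filename_lr points to the 100 Hz files; filename_hr to the 500 Hz versions.
record_path = PTBXL_DIR + meta.iloc[0]["filename_lr"]
signal, fields = wfdb.rdsamp(record_path)

print(signal.shape)                        # (1000, 12): 10 seconds at 100 Hz, 12 leads
print(fields["fs"], fields["sig_name"])    # sampling rate and lead names
```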
The Landscape of ECG Data Sources

When working with ECG data, researchers often turn to comprehensive resources like PhysioNet [ PhysioNet Databases ]. Established in 1999 under the National Institutes of Health (NIH), PhysioNet offers free access to a vast collection of physiological and clinical data, coupled with open-source software for analysis. Key components include:

PhysioBank: A digital archive containing ECG recordings from healthy individuals and patients with various cardiac conditions.
PhysioToolkit: A library of open-source software for processing and analyzing physiological signals.

PhysioNet hosts numerous ECG databases tailored to different research needs, such as studying arrhythmias or benchmarking algorithms. Some notable databases include:

MIMIC-III Waveform Database [1]: Contains a wide range of physiological signals, including ECG waveforms, from critical care patients.
Icentia11k [2]: A large-scale ECG dataset with 11,000 patients and over 2 billion labeled beats.
Chapman-Shaoxing 12-Lead ECG Database [3]: Focuses on arrhythmia research with 12-lead ECG recordings.
MIT-BIH Malignant Ventricular Ectopy Database (MVED) [4]: A collection of ECG recordings for detecting and analyzing cardiac arrhythmias, specifically malignant ventricular ectopy.
MIT-BIH Arrhythmia Database [5]: Widely used for developing and evaluating algorithms for arrhythmia detection, ECG signal processing, and machine learning, featuring 48 half-hour recordings with expert annotations of roughly 110,000 heartbeats.
CU Ventricular Tachyarrhythmia Database [6]: Contains 35 eight-minute ECG recordings of patients with sustained ventricular tachycardia, flutter, and fibrillation.
INCART Database [7]: The St. Petersburg INCART 12-lead arrhythmia database, with 75 annotated recordings extracted from 32 Holter records.
PTB-XL [8]: A large publicly available electrocardiography dataset (version 1.0.1).

A Closer Look at the PTB-XL Database [8]

Among these resources, the PTB-XL ECG dataset [8] stands out as a large and freely accessible dataset, providing 21,837 clinical 12-lead ECG records from 18,885 patients, each 10 seconds long. What makes PTB-XL particularly valuable is its inclusion of predefined train-test splits based on stratified sampling, which addresses a limitation of datasets that provide only raw data.

The diagnostic classes in the PTB-XL dataset are interconnected through a structured system that categorizes specific cardiac abnormalities and pathologies based on common characteristics and clinical interpretations. Each diagnostic class represents a broader category of ECG findings, while subclasses provide further granularity by specifying particular types of abnormalities within each class. The linkage between classes is established through a classification scheme that organizes related diagnostic statements into hierarchical relationships. For instance, within the superclass "CD" (Conduction Disorders), subclasses like "LAFB/LPFB" (Left Anterior Fascicular Block/Left Posterior Fascicular Block) and "CRBBB" (Complete Right Bundle Branch Block) are grouped under this overarching category of conduction abnormalities. Similarly, subclasses under the superclass "HYP" (Hypertrophy), such as "LVH" (Left Ventricular Hypertrophy) and "RVH" (Right Ventricular Hypertrophy), are linked by their shared characteristic of hypertrophy in specific regions of the heart. Furthermore, the subclasses within the superclass "MI" (Myocardial Infarction) are connected based on the location and nature of the myocardial ischemic injury.
Subclasses like "IMI" (Inferior Myocardial Infarction), "AMI" (Anterior Myocardial Infarction), and others represent distinct types of myocardial infarctions, each linked to the superclass through their pathological features. By establishing these interconnections between diagnostic classes and subclasses, the PTB-XL dataset offers a comprehensive framework for classifying and understanding diverse ECG abnormalities and cardiac conditions The PTB-XL dataset comprises two main files: ptbxl-database.csv: Contains the main dataset information, including ECG recordings and metadata such as patient information, signal data, and diagnostic annotations. scp_statements.csv: Details the SCP-ECG statements used in the dataset, representing specific findings or characteristics in the ECG recordings. These statements provide structured and standardized information about diagnoses, forms, and rhythms, and are linked to the ECG records for integrated analysis. The dataset is categorized into: Diagnostic statements: Describe abnormalities or specific findings (e.g., non-diagnostic T abnormalities, abnormal QRS, ventricular premature complex). Form statements: Provide information about the overall ECG pattern (e.g., sinus rhythm, atrial fibrillation). Rhythm statements: Show specific rhythm patterns (e.g., sinus tachycardia, sinus arrhythmia). The use of "strat_fold" in PTB-XL indicates stratified sampling, ensuring that subsets of the data maintain a representative distribution of characteristics, which is crucial for training robust machine learning models, especially with imbalanced datasets. Visual overview of the PTB-XL dataset, illustrating the hierarchical organization of diagnostic super-classes and subclasses using a sunburst chart. The chart shows the main diagnostic categories at the center, with subsequent rings providing more detailed levels of diagnostic sub-classification and representing individual data points or subcategories. The numbers represent counts of records for different diagnostic superclass and subclass co-occurrences The black inset window displays diagnostic superclasses, while the plot window features a typical ECG waveform generated using the Neurokit2 [9] Python library based on PTB-XL data set. This visualization provides a clear representation of ECG data, highlighting key diagnostic insights. Training of Generative Models Training Progress of Variational Autoencoder (VAE) — The figure presents the training summary of a VAE model. The left section outlines the model architecture and parameter statistics, while the right section plots the average training loss per epoch. The consistent downward trend in the loss curve illustrates effective convergence and optimization of the network during training Efficient Latent Space Sampling for Synthetic ECG Generation Using Circular Buffers. This video demonstrates the creation of synthetic ECG signals with a Variational Autoencoder (VAE). Random latent codes, representing abstracted ECG features, are drawn from the VAE's latent space and input into a circular read buffer for effective, pseudo-continuous way . Once the buffer is full, the decoder converts these latent codes into synthetic ECG waveforms. The write buffer stores these outputs along with their Empirical Mode Decomposition (EMD) components t₁ and t₂ . The red waveform in the right panel window depicts the synthetic ECG based on the blue t₁ waveform (in the right panel window). 
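The following sketch shows one common way to use these two CSV files together with the strat_fold column, loosely following the example recipe distributed with the PTB-XL paper; the directory path and the 1-8/9/10 fold assignment are conventions assumed here rather than requirements.

```python
import ast
import pandas as pd

PTBXL_DIR = "ptb-xl/1.0.1/"   # assumed local copy of the dataset

meta = pd.read_csv(PTBXL_DIR + "ptbxl_database.csv", index_col="ecg_id")
scp = pd.read_csv(PTBXL_DIR + "scp_statements.csv", index_col=0)

# scp_codes is stored as a string such as "{'NORM': 100.0, 'SR': 0.0}".
meta["scp_codes"] = meta["scp_codes"].apply(ast.literal_eval)

# Map each record's SCP codes to diagnostic superclasses (NORM, MI, STTC, CD, HYP).
diagnostic = scp[scp["diagnostic"] == 1]
def to_superclasses(codes):
    return list({diagnostic.loc[code, "diagnostic_class"]
                 for code in codes if code in diagnostic.index})
meta["diagnostic_superclass"] = meta["scp_codes"].apply(to_superclasses)

# Predefined stratified folds: a common split is folds 1-8 train, 9 validation, 10 test.
train = meta[meta["strat_fold"] <= 8]
val   = meta[meta["strat_fold"] == 9]
test  = meta[meta["strat_fold"] == 10]
print(len(train), len(val), len(test))
```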
Training of Generative Models

Training progress of the Variational Autoencoder (VAE): the figure presents the training summary of the VAE model. The left section outlines the model architecture and parameter statistics, while the right section plots the average training loss per epoch. The consistent downward trend in the loss curve illustrates effective convergence of the network during training.

Efficient Latent Space Sampling for Synthetic ECG Generation Using Circular Buffers

This video demonstrates the creation of synthetic ECG signals with the Variational Autoencoder (VAE). Random latent codes, representing abstracted ECG features, are drawn from the VAE's latent space and placed into a circular read buffer, providing an efficient, pseudo-continuous stream. Once the buffer is full, the decoder converts these latent codes into synthetic ECG waveforms. The write buffer stores these outputs along with their Empirical Mode Decomposition (EMD) components t₁ and t₂. The red waveform in the right panel depicts the synthetic ECG based on the blue t₁ waveform. A modified NeuroKit2 [9] ECG plotting function is used to plot the ECG (t₁) and selectively renders ECGs based on their quality, either Excellent or Barely Acceptable. This procedure repeats at regular intervals to generate diverse ECG signals.

References

1. Moody, B., Moody, G., Villarroel, M., Clifford, G. D., & Silva, I. (2020). MIMIC-III Waveform Database (version 1.0). PhysioNet. https://doi.org/10.13026/c2607m
2. Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952
3. Zheng, J., Guo, H., & Chu, H. (2022). A large scale 12-lead electrocardiogram database for arrhythmia study (version 1.0.0). PhysioNet. https://doi.org/10.13026/wgex-er52
4. Greenwald, S. D. (1986). Development and analysis of a ventricular fibrillation detector. M.S. thesis, MIT Dept. of Electrical Engineering and Computer Science.
5. Moody, G. B., & Mark, R. G. (2001). The impact of the MIT-BIH Arrhythmia Database. IEEE Engineering in Medicine and Biology, 20(3), 45-50. (PMID: 11446209)
6. Nolle, F. M., Badura, F. K., Catlett, J. M., Bowser, R. W., & Sketch, M. H. (1986). CREI-GARD, a new concept in computerized arrhythmia monitoring systems. Computers in Cardiology, 13, 515-518. (CU Ventricular Tachyarrhythmia Database v1.0.0, physionet.org)
7. Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff, J. M., Ivanov, P. Ch., Mark, R. G., Mietus, J. E., Moody, G. B., Peng, C.-K., & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215-e220. http://circ.ahajournals.org/cgi/content/full/101/23/e215 (St.-Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database)
8. Wagner, P., Strodthoff, N., Bousseljot, R., Samek, W., & Schaeffter, T. (2020). PTB-XL, a large publicly available electrocardiography dataset (version 1.0.1). PhysioNet. https://doi.org/10.13026/x4td-x982
9. Makowski, D., Pham, T., Lau, Z. J., Brammer, J. C., Lespinasse, F., Pham, H., Schölzel, C., & Chen, S. A. (2021). NeuroKit2: A Python toolbox for neurophysiological signal processing. Behavior Research Methods, 53(4), 1689-1696. https://doi.org/10.3758/s13428-020-01516-y
- litstudy in Action: A Real-World Example of Scientific Literature Analysis
This blog delves into the science of exploring scientific literature, showcasing how tools like Litstudy [1] and Jupyter Notebook can transform your approach to research. Drawing from my experience using Litstudy to craft the introductory chapter of my M.Tech dissertation's literature survey, this post provides an overview of its features. Designed for readers familiar with Jupyter Notebook [3], it highlights how Litstudy lets you navigate new research domains, analyze scientific papers, and enrich your analysis with interactive visualizations.

Figure 1.0: Software architecture of litstudy, showcasing its integration with Jupyter notebooks and Python scripts. The system uses libraries such as bibtexparser, pandas, and gensim for metadata retrieval, analysis, and visualization.

Search Functionality of Litstudy

The litstudy [2] search feature aggregates metadata from scientific publications, capturing important attributes such as the title, authors, publication date, and DOI; it does not provide access to the full documents. This functionality lets users obtain a detailed overview of literature databases. A DOI serves as a unique link for digital content, offering crucial metadata for reliable citations and access. The literature databases and associated functionalities supported by the library include:

Scopus: Functions to retrieve metadata for documents using identifiers (DOI, PubMed ID, Scopus ID) or approximate title-based searches. Supports submitting queries to the Scopus API and importing CSV files exported from Scopus.
Semantic Scholar: Metadata retrieval using various identifiers (e.g., DOI, arXiv ID, Corpus ID), plus functions for refining metadata, submitting queries to the Semantic Scholar API, and collecting results.
CrossRef: Metadata retrieval for documents by DOI, with timeout settings to manage server communication. Supports querying the CrossRef API with options for sorting and filtering results.
CSV: General-purpose CSV loading with options for field-name customization and filtering; litstudy [2] attempts to infer the purpose of each field or uses explicitly defined fields.
Other platforms: Functions for importing metadata from specific sources, including IEEE Xplore, Springer Link, DBLP, arXiv, BibTeX, and RIS files. Each platform supports specific queries or file formats for metadata retrieval.

Document handling classes:

Document: Represents a single publication, storing metadata such as title, authors, abstract, DOI, and publication source.
DocumentSet: Manages a collection of documents, supporting filtering, deduplication, merging, and set operations (e.g., union, intersection).
Author: Represents an author, containing details like name and affiliations.
PublicationSource: Defines the source of a document (e.g., Scopus, PubMed) and tracks source-specific metadata.

Figure 2.0: UML diagram illustrating the relationships among key classes in Litstudy, including Document, DocumentSet, Author, and PublicationSource. It highlights the attributes, functionality, and interactions within the system for handling scientific publication metadata.

Fetching Documents Using a Search Query

Figure 3.0 illustrates Python code snippets using the litstudy.search_semanticscholar function. It shows the retrieval of academic documents for different search queries [z1-z3], along with the corresponding document counts; the resulting DocumentSets support operations such as union (|) and intersection (&). A minimal sketch of this workflow appears below.
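Here is that sketch. The search strings and the limit value are placeholders (the actual z1-z3 queries from the dissertation are not reproduced here), and the limit keyword should be treated as an assumption based on the litstudy documentation.

```python
import litstudy

# Placeholder queries standing in for the z1-z3 searches referenced in Figure 3.0.
z1 = litstudy.search_semanticscholar("synthetic ECG generation", limit=50)
z2 = litstudy.search_semanticscholar("variational autoencoder ECG", limit=50)
z3 = litstudy.search_semanticscholar("generative adversarial network ECG", limit=50)

docs = z1 | z2 | z3          # union of the three DocumentSets
overlap = z1 & z2            # documents returned by both of the first two queries
print(len(docs), len(overlap))

for doc in docs:             # each item is a Document with title, authors, DOI, ...
    print(doc.title)
```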
Figure 4.0: Documents fetched from Semantic Scholar.

Create a Document Corpus from a Collection of Documents

The build_corpus function prepares text data from documents for tasks like topic modeling, natural language processing, and clustering. It applies preprocessing techniques (e.g., token filtering and n-gram merging) and outputs word-frequency vectors alongside a word-index dictionary, integrating seamlessly with other Litstudy features.

Topic Modeling: Discovers popular topics and trends in scientific publications using NLP methods (e.g., LDA, embedding-based techniques) and generates topic lists with keywords, visualizations, and thematic insights.
NMF Model: Extracts latent topics from document corpora with adjustable parameters (e.g., number of topics, iterations) to support literature reviews, research exploration, and thematic grouping.
N-Gram Merging: Combines frequently co-occurring tokens into meaningful phrases (e.g., bigrams, trigrams) to improve text representation and accuracy in clustering and topic modeling; this retains important context that would be lost if words were analyzed individually.

Topic Clouds

This feature visualizes topics identified through modeling by creating customizable word clouds, where word size indicates significance, helping users intuitively understand key terms and thematic structures. It accepts trained topic models (such as NMF or LDA) and allows adjustments to parameters like font size and color scheme. These visualizations can be saved as images, used for topic labeling, and integrated with Litstudy pipelines and Jupyter Notebooks for interactive exploration.

Figure 5.0: Word clouds illustrating key terms for various topics, with word size denoting significance. These visualizations offer a quick grasp of thematic structures and aid in topic labeling and further analysis.

DocumentSet enables efficient management and analysis of document collections, and the trained topic model's best_documents_for_topic function is used to pinpoint the most pertinent documents for a specific topic.

Figure 6.0: The litstudy.nlp.TopicModel is a trained model for analyzing topics and the relationships between documents and tokens. It holds matrices representing topic-document and topic-token mappings, along with the best_documents_for_topic function that identifies key documents associated with a specific topic.

Plot Embedding

The Litstudy plot-embedding functionality visualizes document relationships in a low-dimensional scatter plot, where proximity indicates similarity. It uses precomputed embeddings generated by techniques such as word or sentence embeddings and topic modeling.

Figure 7.0: A scatter plot visualizing relationships between documents in a low-dimensional space, where the proximity of points reflects document similarity based on embeddings generated through techniques like word and sentence embeddings or topic modeling.

An end-to-end sketch of corpus building and topic modeling appears below, before turning to the citation network.
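This sketch builds a corpus from a DocumentSet, trains an NMF topic model, and asks for the key documents of one topic. The function names follow the litstudy features described above, but the exact signatures, the ngram_threshold value, and the number of topics are assumptions for illustration rather than the dissertation's settings.

```python
import litstudy

# `docs` is a DocumentSet, e.g. the union built from the Semantic Scholar queries above.
corpus = litstudy.build_corpus(docs, ngram_threshold=0.8)   # token filtering + n-gram merging

num_topics = 8                                              # illustrative choice
topic_model = litstudy.train_nmf_model(corpus, num_topics)  # NMF topic extraction

litstudy.plot_topic_clouds(topic_model)                     # one word cloud per topic (Figure 5.0)

# Key documents for a single topic, via the TopicModel shown in Figure 6.0.
best = topic_model.best_documents_for_topic(0)
print(best)
```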
Citation Network

This feature illustrates the citation network of scientific papers by generating a graph in which nodes represent documents and edges indicate citation links. Taking a DocumentSet as input, it permits customization of node size, edge thickness, and color coding according to metadata such as publication year or source. The resulting graph can be either interactive or static, facilitating the analysis of relationships and connections among documents.

Figure 8.0: A graph visualizing the citation network of scientific publications, where nodes represent documents and edges depict citation relationships. The visualization is customizable with parameters such as node size, edge thickness, and metadata-based color coding, highlighting connections and relationships between documents.

Summary

The key ways Litstudy supports M.Tech dissertation work in scientific literature research include:

Comprehensive Metadata Management: Retrieval and management of metadata from literature databases such as Scopus, Semantic Scholar, and CrossRef.
Advanced Text Analysis: Document filtering, n-gram merging, and corpus creation for NLP applications.
Topic Modeling and Visualization: Tools to identify trends using LDA, NMF, word clouds, and embedding plots for thematic insights.
Citation Network Analysis: Citation connections among documents, with customizable graphs for tracing research linkages.
Seamless Workflow Integration: Works with libraries like bibtexparser and pandas in Jupyter Notebooks to enhance analysis and interactivity.

References

1. S. Heldens, A. Sclocco, H. Dreuning, B. van Werkhoven, P. Hijma, J. Maassen & R.V. van Nieuwpoort (2022), "litstudy: A Python package for literature reviews", SoftwareX, 20, 101207. DOI: 10.1016/j.softx.2022.101207
2. Literature Databases — litstudy 0.1 documentation
3. Jupyter Notebook available on request via Google Colab








