February 16, 2021
By Dr. David Kremelberg
As part of a large-scale project for a California medical start-up, DK Statistical Consulting, Inc. carried out an in-depth analysis of several challenges the company was facing. The project required a predictive model connecting the readings taken from the company's medical device to a patient outcome: the presence of fluid in the lungs following a heart attack.
One of these challenges was classification. Following a heart attack, patients can be classified as having congestive heart failure (CHF) with fluid in their lungs, as having CHF without fluid in their lungs, or as “other,” a category that includes asthmatic patients as well as hypochondriacs. Problematically, it is difficult for those in the medical field to accurately classify patients into one of these three categories.
A patient with CHF may present with a variety of related conditions. For example, such a patient may develop another condition, such as pneumonia, which could be the real reason behind symptoms including difficulty breathing.
While the diagnosis of CHF is largely quantitative, relying on measures such as the concentration of brain natriuretic peptide (BNP) and achieving accuracies greater than 90%, diagnosing fluid in the lungs is more qualitative and involves tests such as listening to the lungs. Among emergency room doctors, the accuracy of the latter is only approximately 40-60%, and the diagnosis generally incorporates only a single measurement per subject.
Pulmonary edema is a condition caused by excess fluid in the lungs. This fluid collects in the numerous air sacs in the lungs, making it difficult to breathe.
In most cases, heart problems cause pulmonary edema. But fluid can collect in the lungs for other reasons, including pneumonia, exposure to certain toxins and medications, trauma to the chest wall, and traveling to or exercising at high elevations.
Pulmonary edema that develops suddenly (acute pulmonary edema) is a medical emergency requiring immediate care. Pulmonary edema can sometimes cause death. Treatment for pulmonary edema varies depending on the cause but generally includes supplemental oxygen and medications.
Specific causes of pulmonary edema include heart attack or other heart diseases; leaking, narrowed, or damaged heart valves; sudden high blood pressure; pneumonia; kidney failure; lung damage caused by severe infection; and severe sepsis, or blood poisoning caused by infection.
The collected data presented numerous challenges.
The first challenge was sheer size: the datasets were dozens of gigabytes in size. This was because the company's device, which took electrical readings from the human body using a series of electrodes, took thousands of readings per second per patient. Specifically, 200,000 measurements were taken for each 2.5 minutes of recording, with measurements taken in millivolts, milliamps, and ohms.
In designing the device and the study, the medical device team modeled the human body as a series of capacitors and resistors, and the hardware they developed was built to conform to this model.
These datasets proved easiest to manipulate in SAS, which was used for the initial data cleaning and manipulation. Smaller, more manageable summary datasets were then created without the loss of any important data. This allowed every analysis to be run on a standard laptop, obviating the need to use one of the company’s servers.
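The reduction step can be illustrated with a minimal sketch. It is written in Python rather than SAS, and it is not the project's actual code: the column names (patient_id, millivolts) and the choice of summary statistics are assumptions made for illustration only. The idea it shows is streaming: each reading is folded into a running per-patient summary, so the full dataset never has to fit in memory.

```python
import csv
import io
from collections import defaultdict


class RunningStats:
    """Running count, mean, variance, min and max (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.min = float("inf")
        self.max = float("-inf")

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two readings.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize(rows):
    """Reduce a stream of readings to one RunningStats object per patient."""
    stats = defaultdict(RunningStats)
    for row in rows:
        # Column names are hypothetical, not taken from the project.
        stats[row["patient_id"]].add(float(row["millivolts"]))
    return dict(stats)


# Tiny in-memory stand-in; a real file would be read with open(path, newline="").
sample = io.StringIO(
    "patient_id,millivolts\n"
    "p1,1.0\np1,2.0\np1,3.0\n"
    "p2,10.0\np2,14.0\n"
)
summary = summarize(csv.DictReader(sample))
for pid, s in summary.items():
    print(pid, s.n, s.mean, s.variance, s.min, s.max)
```

Because csv.DictReader yields one row at a time, memory use depends on the number of patients rather than the number of readings, which is what makes laptop-scale processing of multi-gigabyte files plausible.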
It was critical that highly predictive models be developed. One complication in constructing these models was peeling of the electrodes, which was a matter of degree rather than binary and could vary by patient group. Peeling could cause the recorded sine wave to be clipped, so special scripts were developed to detect these types of measurements.
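The detection scripts themselves are not described here, so the following is only a sketch of one simple heuristic, in Python. A clipped sine wave sits flat at its extreme value for several consecutive samples, whereas a clean one touches its peak only briefly. The function name clipped_runs and the parameters min_run and tol are hypothetical choices for this illustration.

```python
import math


def clipped_runs(samples, min_run=3, tol=1e-9):
    """Return (start, end) index pairs where the signal stays flat at its
    maximum or minimum value for at least min_run consecutive samples."""
    if not samples:
        return []
    hi, lo = max(samples), min(samples)
    runs = []
    start = None
    for i, x in enumerate(samples):
        at_extreme = abs(x - hi) <= tol or abs(x - lo) <= tol
        if at_extreme and start is None:
            start = i  # a flat stretch begins here
        elif not at_extreme and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a flat stretch that runs to the end of the recording.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - 1))
    return runs


# Synthetic test signals: one period of a sine wave, 100 samples.
clean = [math.sin(2 * math.pi * i / 100) for i in range(100)]
# The same wave with its peaks cut off at +/-0.8 (an assumed clip level).
clipped = [max(-0.8, min(0.8, x)) for x in clean]

print(clipped_runs(clean))
print(clipped_runs(clipped))
```

On real recordings the noise mentioned elsewhere in this article would call for a looser tolerance, and a flagged recording would still need review, since this heuristic cannot tell electrode peeling apart from other causes of saturation.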
Some patients retained a great deal of fluid, some of which collected in their skin rather than in their lungs; any predictive model that was constructed had to account for this possibility as well.
The data could also contain extreme levels of noise, along with anomalous readings. Factors that needed to be considered included the distributions of the data, resistivity differences, distribution trends and shifts, and the use of different types of hardware on different patients.
Robust models increased prediction accuracies on novel patients.
In addition, the models had to be robust, producing a similar level of prediction accuracy when analyzing data collected from new patients. To achieve this, cross-validation was required to ensure that the models were not overfitting the data. The models also had to be accurate for all types of patients, regardless of gender, age, body type, weight, or height.
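How the folds were constructed is not stated, so the sketch below rests on an assumption: because the goal was accuracy on new patients, it holds out whole patients rather than individual recordings, so no patient contributes to both the training and the test side of a fold. The function name grouped_k_fold and the sample data are hypothetical, and the model fitting and scoring step is left out.

```python
def grouped_k_fold(patient_ids, k):
    """Yield (train_indices, test_indices) pairs in which every patient's
    recordings fall entirely on one side of the split."""
    unique = sorted(set(patient_ids))
    if k < 2 or k > len(unique):
        raise ValueError("k must be between 2 and the number of patients")
    # Assign patients to folds round-robin; a real study might shuffle first.
    fold_of = {pid: n % k for n, pid in enumerate(unique)}
    for fold in range(k):
        test = [i for i, pid in enumerate(patient_ids) if fold_of[pid] == fold]
        train = [i for i, pid in enumerate(patient_ids) if fold_of[pid] != fold]
        yield train, test


# Hypothetical data: six recordings from four patients.
patient_ids = ["a", "a", "b", "c", "c", "d"]
folds = list(grouped_k_fold(patient_ids, k=2))
for train, test in folds:
    held_out = sorted({patient_ids[i] for i in test})
    trained_on = sorted({patient_ids[i] for i in train})
    print("held out:", held_out, "trained on:", trained_on)
```

If a model scores well on the training folds but poorly on the held-out patients, that gap is the sign of overfitting that cross-validation is meant to expose.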
As logistic regression did not produce models with acceptable prediction accuracy, neural networks were used instead to generate the predictive models. The final set of multilayer perceptrons, which incorporated factors including frequency, sensor size, race, gender, age, skin type, and BMI, achieved prediction accuracies of 99-100%. This allowed a level of prediction approaching certainty in determining whether patients had fluid in their lungs, using only the data recorded by the company’s medical device.
About Dr. David Kremelberg: Dr. David Kremelberg is the Principal and Founder of DK Statistical Consulting, Inc. and has worked full-time as a statistical consultant since 2009. In that time, he has helped hundreds of clients in over 25 countries and across multiple fields: psychology, sociology, marketing, management, economics, medicine, biology, political science, chemistry, archaeology, and others.