Data Validation Ranges: Ensuring Accuracy
Hey team, let's dive into the data validation ranges for our maternal health risk classifier! I've been working on the checklist and drafted a range for each column in our dataset. Before we roll out the Pandera schema, I'd love your feedback on whether these ranges make sense. Let's make sure our data is squeaky clean so we can build the best model possible! We'll cover Age, SystolicBP, DiastolicBP, BS, BodyTemp, HeartRate, and the target variable RiskLevel.
Age Range: Striking the Right Balance
First up, let's talk about Age. We've set the validation range to 12-60 years. Age matters because it directly affects the health of both the mother and the baby. The lower limit of 12 accounts for rare but possible early pregnancies. Pregnancies below 15 are often medically concerning, but we want to be inclusive. The upper limit of 60 is a generous ceiling on what's biologically feasible: pregnancies beyond menopause are extremely rare and usually involve medical intervention. These cases are like unicorns! Because they represent special situations, they might skew our model. If we find values outside the range, we should treat them as extreme outliers that aren't really suitable for what we're trying to predict. We're aiming to classify routine maternal health risk, so these extremes would mostly add noise. So, do these limits seem on the mark, or should we adjust them to be a bit more inclusive? Either way, we need to decide whether to include data that falls outside the 12-60 range.
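To make the open question about out-of-range ages a bit more concrete, here's a minimal plain-Python sketch of separating rows inside 12-60 from rows outside it, so we can either drop the second group or keep it with a flag. The helper name `split_by_range` and the sample rows are made up for illustration, and treating both bounds as inclusive is an assumption we should confirm.

```python
from typing import Iterable

AGE_MIN, AGE_MAX = 12, 60  # draft bounds from this post, assumed inclusive


def split_by_range(rows: Iterable[dict], field: str, low: float, high: float):
    """Partition rows into (in_range, out_of_range) for one numeric field."""
    in_range, out_of_range = [], []
    for row in rows:
        target = in_range if low <= row[field] <= high else out_of_range
        target.append(row)
    return in_range, out_of_range


# Hypothetical sample rows, not taken from our dataset.
sample = [{"Age": 25}, {"Age": 12}, {"Age": 63}, {"Age": 10}, {"Age": 60}]
kept, flagged = split_by_range(sample, "Age", AGE_MIN, AGE_MAX)

print([r["Age"] for r in kept])     # [25, 12, 60]
print([r["Age"] for r in flagged])  # [63, 10]
```

Keeping the flagged rows in their own list (rather than deleting them right away) would let us count how many records the 12-60 limits actually affect before we commit to them.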
Blood Pressure: Systolic & Diastolic - Finding the Sweet Spot
Next, let's chat about blood pressure, specifically SystolicBP and DiastolicBP. For SystolicBP, we've chosen a range of 60-200 mmHg. Systolic pressure is the pressure in the arteries when the heart beats. Anything below 60 mmHg can signal severe hypotension (dangerously low blood pressure), while anything above 200 mmHg can indicate severe hypertension (dangerously high blood pressure). Both scenarios are flashing red lights that require immediate medical attention. For DiastolicBP, we have a range of 40-140 mmHg. Diastolic pressure is the pressure in the arteries when the heart rests between beats, and values outside this range can likewise signal serious issues. Since our model is built to classify routine maternal health risks, readings outside either range are out of scope and we treat them as outliers.
Systolic Blood Pressure
Let's get into the nitty-gritty of systolic blood pressure, which we validate against 60-200 mmHg. Systolic pressure reflects the pressure in the arteries while the heart pumps blood. A reading below 60 mmHg is a problem because it might mean the body isn't getting enough blood, and that needs immediate medical care. A reading over 200 mmHg can be just as dangerous, since it indicates the heart is working too hard and raises the risk of serious events like stroke or heart attack. Because we focus on classifying common maternal health risks, values outside this range don't suit our model and are best treated as outliers. If we train on them, the model might perform worse on the regular cases it's meant for, so the plan is to exclude them.
Diastolic Blood Pressure
For diastolic blood pressure, the validation range is 40-140 mmHg. Diastolic pressure is the pressure in the arteries while the heart relaxes between beats, and it matters just as much as the systolic reading. Extremely low (below 40 mmHg) or extremely high (above 140 mmHg) values can be life-threatening and require immediate medical intervention. As with systolic pressure, we treat these values as outliers. They're important to recognize clinically, but they aren't the focus of a model for common maternal health risks, so we exclude them.
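Here's a rough plain-Python sketch of how the two blood pressure ranges could be checked together. It reports which field is out of range instead of returning a bare pass/fail. The bounds are the draft ones from this section; the names `BP_BOUNDS` and `out_of_range_fields` and the sample readings are hypothetical, and inclusive bounds are an assumption.

```python
# Draft inclusive bounds in mmHg, copied from this post.
BP_BOUNDS = {
    "SystolicBP": (60, 200),
    "DiastolicBP": (40, 140),
}


def out_of_range_fields(row: dict, bounds: dict) -> list:
    """Return the names of fields whose value falls outside its bounds."""
    return [
        name
        for name, (low, high) in bounds.items()
        if not (low <= row[name] <= high)
    ]


# Hypothetical readings for illustration only.
readings = [
    {"SystolicBP": 120, "DiastolicBP": 80},  # both in range
    {"SystolicBP": 210, "DiastolicBP": 95},  # systolic too high
    {"SystolicBP": 55, "DiastolicBP": 35},   # both too low
]

for reading in readings:
    print(reading, "->", out_of_range_fields(reading, BP_BOUNDS))
```

Returning field names makes it easy to tally outliers per column, which should help us judge whether 60-200 and 40-140 are too strict or too loose for our data.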
Blood Sugar and Body Temperature: Keeping Things in Balance
Now, let's talk about BS (Blood Sugar) and BodyTemp. For BS, we've set a validation range of 1-25 mmol/L. Extremely low blood sugar (below 1 mmol/L) indicates severe hypoglycemia, and extremely high blood sugar (above 25 mmol/L) indicates severe hyperglycemia. Both situations need emergency care and are therefore unsuitable for our predictive model. For BodyTemp, we've chosen a validation range of 95.0-105.0°F. Anything below 95.0°F falls into hypothermia territory, while anything above 105.0°F indicates severe hyperthermia. Both are emergencies and unsuitable for our model. These ranges are all about making sure we focus on the data that's most relevant to the routine health assessments we're classifying.
Blood Sugar
Here's a bit more on BS, which we validate against 1-25 mmol/L. Very low blood sugar (hypoglycemia) is dangerous because the body doesn't have enough fuel. Very high blood sugar (hyperglycemia) is also dangerous and can be a sign of diabetes or other serious problems. At the extremes outside this range, both are emergencies rather than routine maternal health findings. Since we're trying to assess routine risks, values outside the range are unsuitable, and excluding them keeps the model focused and more accurate on typical cases.
Body Temperature
For body temperature, the validation range is 95.0-105.0°F. If someone's temperature is too low (hypothermia), their body is losing heat faster than it can produce it. If it's too high (hyperthermia), the heat itself can damage the body. Both conditions require immediate medical intervention and aren't typical of routine maternal health scenarios, so they could distract our model from the cases we care about. That's why we exclude values outside this range.
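Since the BodyTemp range is in °F, one thing worth checking is whether a failing value is a real outlier or just a reading recorded in °C. The sketch below converts the draft bounds to Celsius and labels values accordingly. Nothing in the checklist says our data contains Celsius readings, so the `possible_celsius` label is purely a heuristic assumption, and the function names are hypothetical.

```python
TEMP_MIN_F, TEMP_MAX_F = 95.0, 105.0  # draft bounds from this post


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Standard conversion: C = (F - 32) * 5 / 9."""
    return (temp_f - 32.0) * 5.0 / 9.0


def classify_body_temp(value: float) -> str:
    """Label a BodyTemp value against the draft Fahrenheit range.

    'possible_celsius' is a heuristic label (an assumption, not a rule from
    the checklist): the value fails the Fahrenheit check but would fit the
    same range once the bounds are expressed in Celsius.
    """
    if TEMP_MIN_F <= value <= TEMP_MAX_F:
        return "in_range"
    low_c = fahrenheit_to_celsius(TEMP_MIN_F)
    high_c = fahrenheit_to_celsius(TEMP_MAX_F)
    if low_c <= value <= high_c:
        return "possible_celsius"
    return "out_of_range"


print(round(fahrenheit_to_celsius(TEMP_MIN_F), 2))  # 35.0
print(round(fahrenheit_to_celsius(TEMP_MAX_F), 2))  # 40.56
print(classify_body_temp(98.6))   # in_range
print(classify_body_temp(37.0))   # possible_celsius
print(classify_body_temp(110.0))  # out_of_range
```

In Celsius the draft range works out to roughly 35.0-40.6°C, which might be a handy cross-check when we review the numbers.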
Heart Rate and the Target Variable: Wrapping Up
Finally, let's look at HeartRate and the target variable, RiskLevel. For heart rate, we've chosen a validation range of 50-150 bpm (beats per minute). A resting heart rate outside this range might indicate a cardiovascular issue. Such values aren't in line with a normal maternal health assessment, so we can consider them outliers. RiskLevel is a categorical variable representing maternal health risk, and each value must be one of: low risk, mid risk, or high risk. That's the full list of ranges, guys! What do you think? Any adjustments we need to make to ensure our model is top-notch?
Heart Rate
Here's what you need to know about heart rate and why we set the validation range to 50-150 bpm. Think of heart rate as the heart's personal tempo. If it's too slow (below 50 bpm) or too fast (above 150 bpm), that can be a sign of a cardiovascular problem. These aren't common values for a normal maternal health assessment, so we treat them as outliers. Our model is built to assess the most common risks, and this range keeps it focused on the data that's most relevant to the routine assessments we're classifying.
Target Variable
RiskLevel is our target variable. It's a categorical label for the level of maternal health risk, based on clinical assessment. Every value must match one of three options: low risk, mid risk, or high risk. Any other value, such as a typo or a missing label, should fail validation, because the model can only be as reliable as the labels it learns from.