Some Idle Thoughts on a Metric For Threshold Normalized Deviation


July 28th, 2025

A colleague of mine asked me what a good metric would be for the following scenario: they wanted to look at a given patient in a dataset before and after some change that resulted in a different measurement for that patient, and identify how much that measurement changed, but expressed in relation to what was expected of the dataset.

That’s a vague description that makes it sound like the ideal solution would be a difference measured in z-scores, but what they want becomes clearer with the example they gave: if Elon Musk lost one hundred billion dollars of net worth, that’s a lot of money lost, but he’s probably fine and not much changes for him. If the median American family lost one hundred thousand dollars of net worth, that’s really quite bad for them.

From this, my first thought was that an appropriate measure would just be percentage loss, or perhaps something related to the absolute deviation around the median, but something is missing there: even if Elon loses 50% of his net worth of ~400 billion at time of writing, it is unlikely to affect his actual life much, as far as I know. Thinking on this more, what seems comparable between the prediction scenarios we’re interested in and the money-loss scenario above is that there’s a point of interest around which fluctuations matter significantly more. In the money example, there is (in my opinion) a general level of wealth at which all material needs are effectively met; beyond that there are returns in terms of luxury, which eventually diminish, and after that, everything else is gravy.

To bring it into the realm of prediction where we were working, the point of interest is something like a decision threshold in binary classification, where we classify a subject as a case when their decision score s is greater than or equal to the threshold t. Measures of central tendency like the mean are inappropriate in many of these situations because of the warping effect of outliers, like multi-billionaires in the money scenario. We’d also prefer something with directionality in relation to the threshold that still retains information about how the individual subject’s observation compares to the overall distribution. In some cases we may care more about how subjects move in relation to the threshold than about how they move in relation to everyone else, independent of the threshold.

My thought was that we would like something like the z-score but with respect to this threshold or point of interest. From there I thought of the average absolute deviation and put together the following metric, which I haven’t seen elsewhere and so wasn’t sure what it would be called. I suppose it’s the deviation normalized by the mean absolute deviation around the threshold, but it seems like it could be useful in a couple of the contexts I often look at measures in. I suspect that the mean absolute deviation around the mean or around the median is the better choice in most cases, but I’m curious about which cases a metric like this could be useful in.

The definition is as follows. Given an observation s from set G (containing n elements), the threshold normalized deviation of that observation around threshold t is equal to the following:

(s − t) / AAD(G, t)

Where AAD(G,t) is equal to the following:

(1/n) * Σ_{i=1..n} |G_i − t|

Where G_i represents the i-th element of set G.

So, we calculate the average absolute deviation of the dataset in relation to the threshold, and then we use that to normalize the difference of the observation from the threshold (maintaining directionality information for a given subject) by what is effectively the variability of the dataset around the threshold point. That threshold could be chosen for any external reason: an amount of money indicating that an individual’s needs are met, a score above which taking action for a given patient is expected to yield positive results given the prior studies establishing the model, or a clinically relevant value in the units of the predicted output.
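Here’s a minimal Python sketch of the calculation; the function names are just mine for illustration, not established terminology:

```python
from typing import Sequence


def aad_around_threshold(values: Sequence[float], t: float) -> float:
    """Mean absolute deviation of the dataset around the threshold t."""
    return sum(abs(v - t) for v in values) / len(values)


def threshold_normalized_deviation(s: float, values: Sequence[float], t: float) -> float:
    """Signed deviation of observation s from threshold t, normalized by
    the mean absolute deviation of the dataset around t."""
    return (s - t) / aad_around_threshold(values, t)
```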

Using this, one can compare the participant’s value before and after some change with respect to a constant threshold, while also taking into account the dispersion of the dataset in relation to that same threshold, and effectively judge whether they had a “large” change or an insignificant one relative to where they were before. This would, I think, capture both the shift of the dataset around that threshold and whether or not the patient themselves had a large shift.
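As a toy illustration in the prediction setting, reusing the functions sketched above (the cohort scores, threshold, and patient values here are made up purely for illustration):

```python
# Made-up cohort of decision scores and a hypothetical decision threshold of 0.5.
cohort = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.9]
t = 0.5

before, after = 0.10, 0.55  # one patient's score before and after some change
tnd_before = threshold_normalized_deviation(before, cohort, t)  # ~ -1.42
tnd_after = threshold_normalized_deviation(after, cohort, t)    # ~ +0.18
print(tnd_after - tnd_before)  # ~ +1.6
```

The sign flip says the patient crossed the threshold, and the magnitude says how big the move was relative to how spread out the cohort is around that threshold.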

This looked pretty reasonable when playing around with some toy datasets, but I couldn’t find any mentions of people using something similar in the past. The scenario where I would likely consider using something like this is a live study where we intervene at a specific predicted probability from a model, and we’re tracking patients and taking in data to generate a probability for each patient each day. We would often be interested in identifying times when a patient shifted a large amount with respect to the rest of the patient population, and how that shift relates to the intervention threshold we set. For example, patients who were near the bottom of the probabilities in the dataset and then suddenly jumped quite high from where they were before; a related but very different group from patients who simply had a high probability from their first appearance. Usually I just look at absolute differences there, since the measurement is bounded, but a nice property of this particular metric is that, because of the normalization, it can be applied regardless of the range or units of the measurement in question, and its values are comparable across different types of scoring measurements. That means we can compare several measures with different units fairly easily in terms of how much they’ve shifted with respect to their previous distribution and anchor point. It should in theory work just as well for the money example as for a bounded prediction example, in terms of interpretation.
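A sketch of what that kind of day-over-day scanning might look like, again reusing the functions above; the patient scores, intervention threshold, and flag cutoff here are all made up:

```python
# Hypothetical daily model probabilities per patient.
daily_scores = {
    "patient_a": [0.05, 0.08, 0.10, 0.62],
    "patient_b": [0.55, 0.58, 0.60, 0.63],
    "patient_c": [0.20, 0.22, 0.25, 0.24],
}
t = 0.5            # hypothetical intervention threshold
flag_cutoff = 1.0  # arbitrary cutoff for a "large" jump, in normalized units

n_days = len(next(iter(daily_scores.values())))
for day in range(1, n_days):
    today = [scores[day] for scores in daily_scores.values()]
    yesterday = [scores[day - 1] for scores in daily_scores.values()]
    for pid, scores in daily_scores.items():
        jump = (threshold_normalized_deviation(scores[day], today, t)
                - threshold_normalized_deviation(scores[day - 1], yesterday, t))
        if abs(jump) > flag_cutoff:
            print(f"{pid}: day {day} shifted {jump:+.2f} normalized units relative to the threshold")
```

With these toy numbers, only patient_a gets flagged, on the day their score jumps across the threshold relative to where the rest of the cohort sits.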

Does this exist already? Has anyone run into anything similar to this? Is this a dumb idea? Let me know if you have any thoughts or find this interesting.

Theodore Morley