CHAPTER 7 LPV CLOCKING MODELING AND EXPERIMENTAL INTEGRATION
B.2 DATA ANALYSIS-BASIC DEFINITIONS
B.2.5 Uncertainty bands and noise bands
Section B.2.2 discussed the creation of time-average data from the time-resolved data. At that time it was mentioned that the time-average is just one of the statistics that can be generated from the time-accurate data. There are other types of statistics as well that are broadly called measures of dispersion. Two that are used quite often in this work are the standard deviation and the peak-to-peak variation. Often these quantities are defined as the “error”, and while that may be true in certain instances, that is an
interpretation that is not accurate in the context of this research area. To start with, we need to mathematically define these operations. It is important to note that these
operations can be done on any array of more than two data points, although the
interpretation of the results may change significantly as fewer points are used. As a side note, for samples of fewer than 30 points, the Student’s t factor needs to be applied to interpret correctly the dispersion given by the standard deviation as calculated below. This is discussed in detail in various statistics books. One of the best for general review is that by [26].
Average of P(i), i = 1 to N:
$$\bar{P} = \frac{1}{N} \sum_{i=1}^{N} P(i)$$
Standard deviation of P(i), i = 1 to N:
$$\sigma_P = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( P(i) - \bar{P} \right)^2}$$
Peak-to-peak variation of P(i), i = 1 to N:
$$P_{P-P} = \max(P(i)) - \min(P(i))$$
Now returning to the example data shown in Figure B.1, we would like to characterize the data over this time range. Creating the average is easy enough as is the standard deviation, but the interpretation is questionable. The standard deviation is clearly some type of measure of dispersion, but is it an error? From the data shown in Figure B.3 we know that a lot of the energy content is contained in the main blade passing frequency and its harmonics. This component is not a noise, but rather a measure of the true periodic variation that occurs due to blade passing. So, how then do we quantify the variation?
The answer lies in deciding what the information will be used for. As noted previously, one has to make sure that both the underlying data and the processing procedure are well documented, since they can often be confused. The standard deviation of the data in Figure B.1 will not have the same value, nor the same interpretation, as the standard deviation of the same data filtered at 9 KHz or at 25 KHz. A more detailed explanation is in order.
Examining the data in Figure B.1, one can easily generate the average value and the standard deviation for this array of data using the definitions presented above.
Average = 72.25 KPa
Standard Deviation (STD) = 1.29 KPa
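As a minimal sketch of the three definitions presented above (average, standard deviation with the N - 1 denominator, and peak-to-peak variation), the following uses only the Python standard library. The function name `dispersion_stats` and the sample values are hypothetical illustrations; they are not the PTDA data.

```python
import statistics

def dispersion_stats(samples):
    """Return (average, sample standard deviation, peak-to-peak variation)."""
    if len(samples) < 2:
        raise ValueError("at least two samples are needed for the N - 1 denominator")
    average = statistics.fmean(samples)
    # statistics.stdev divides by N - 1, as in the definition of the standard deviation
    std = statistics.stdev(samples)
    peak_to_peak = max(samples) - min(samples)
    return average, std, peak_to_peak

# Hypothetical pressure samples in KPa (illustrative only, not measured data)
pressures = [71.0, 72.0, 73.0, 74.0, 70.0]
avg, std, p2p = dispersion_stats(pressures)
print(f"Average = {avg:.2f} KPa, STD = {std:.2f} KPa, P-P = {p2p:.2f} KPa")
```

Applied to the array of data in the time window, the same three calls would produce the quantities quoted in the text.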
Now if the deviations (the differences between each data sample in the time window and the average of the time window) were the result of a random error, one would expect the deviations from the average to be distributed in a Gaussian manner.
[Figure B.4: histogram of Count versus Deviation from mean (KPa). Annotations: Average = 72.25, STD (σ) = 1.29; 99.76% are within ±3σ, 96.1% are within ±2σ, 68.9% are within ±σ.]
Figure B.4 Distribution of Deviations from Average for PTDA
While the distribution is not particularly Gaussian (note the two bumps at ±1.5), the containment of the data within the spans (±1σ, ±2σ, and ±3σ) is close to the normalized Gaussian curve (68.3%, 95.4%, and 99.7% respectively). In this case the standard deviation is a measure of the dispersion over the time range (as it always is), but it should be thought of as banding the probability. One should say that there exists a 96% chance that the data resides within the band 72.25 ±2.58 KPa, within one revolution of data without any knowledge of when in the time period (or in other words, the blade passage) the data was taken.
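The containment check described here can be sketched as follows: count the fraction of samples within ±kσ of the average and compare it with the Gaussian value erf(k/√2). The function names and the synthetic Gaussian test data are assumptions made for illustration; the measured data is not reproduced.

```python
import math
import random
import statistics

def containment(samples, k):
    """Fraction of samples within k sample standard deviations of the average."""
    average = statistics.fmean(samples)
    std = statistics.stdev(samples)
    inside = sum(1 for x in samples if abs(x - average) <= k * std)
    return inside / len(samples)

def gaussian_containment(k):
    """Fraction of a Gaussian population within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

# Synthetic stand-in data: Gaussian samples using the average and STD quoted in the text
rng = random.Random(0)
samples = [rng.gauss(72.25, 1.29) for _ in range(20000)]
for k in (1, 2, 3):
    print(f"±{k}σ: data {containment(samples, k):.4f}, Gaussian {gaussian_containment(k):.4f}")
```

The Gaussian reference values evaluate to roughly 68.3%, 95.4%, and 99.7%, which are the figures quoted for the normalized Gaussian curve.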
The reason for this last statement is that from Figure B.3 it is known that there exists a repeatable variation in the data due to the rotor blade passing, and this variation has been incorporated into the standard deviation. We can show the effect of these components by performing a filter on the data and removing all the components below a certain frequency. Since there are two harmonics, the filter used is a 25 KHz high-pass filter (see discussion below on filtering). The plot of the resulting deviation is shown below.
[Figure B.5: histogram of Count versus Deviation from mean (KPa). Annotations: STD (σ) = 0.28; 100% are within ±3σ, 95.2% are within ±2σ, 67.3% are within ±1σ.]
Figure B.5 Distribution of Deviations from Average for PTDA filtered at 25KHz
This distribution looks more uniform, which is not surprising, since what is happening here is that we are characterizing the high frequency part of the signal. This is a much closer representation to the “Noise” on the measurement, since there are no underlying frequencies in this range that are due to the flow physics. Note that the calculated standard deviation has gone from 1.29 to 0.28 (a factor of 4.6), as the time scale over which the data was analyzed changed from Figure B.4 to Figure B.5, while the actual data window has not changed.
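The drop in standard deviation after high-pass filtering can be illustrated with a synthetic signal. This is only a sketch under stated assumptions: the filter here is an ideal (brick-wall) DFT filter, which is not necessarily the digital filter used in this work; the 100 KHz sample rate is inferred from the statement that an 11 point average corresponds to 9.09 KHz; and the 10 KHz and 20 KHz periodic components and the noise level are hypothetical.

```python
import cmath
import math
import random
import statistics

def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for short records."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def brickwall_highpass(x, fs, cutoff):
    """Zero every DFT bin below `cutoff` (Hz), including the mean, then invert."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # fold bins above Nyquist back to |f|
        if freq < cutoff:
            spectrum[k] = 0.0
    # Inverse DFT; the imaginary part is round-off for real input
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

FS = 100_000.0  # Hz, assumed sample rate (inferred, not stated explicitly)
N = 400         # samples in the synthetic record
rng = random.Random(1)
# Hypothetical signal: mean + two periodic components + broadband noise
raw = [72.25
       + 1.5 * math.sin(2 * math.pi * 10_000.0 * t / FS)
       + 0.8 * math.sin(2 * math.pi * 20_000.0 * t / FS)
       + rng.gauss(0.0, 0.3)
       for t in range(N)]
high = brickwall_highpass(raw, FS, 25_000.0)
std_raw = statistics.stdev(raw)
std_high = statistics.stdev(high)
print(f"STD raw = {std_raw:.2f}, STD above 25 KHz = {std_high:.2f}")
```

Because the periodic components sit below the cutoff, they are removed along with the mean, and the remaining standard deviation reflects only the high frequency content of the synthetic noise.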
There are ways to “filter” data other than using a traditional digital filter.
One is to use a running average (described in more detail later). For now, think of a running average as a moving window in which the center point of each window is assigned the average over that window. The advantage of this technique is that it is quite easy to apply and that it does not introduce the ringing sometimes associated with digital filters. A comparison of the time-resolved data using these different “filtering” techniques is shown below.
[Figure B.6: time traces of Dev from avg versus TIME (ms). Legend: Raw Data; Filtered at 9 KHz; Filtered at 25 KHz; Running Average over 11 points.]
Figure B.6 Time-Resolved Deviation data from Average for PTDA
The running average and the data filtered at 9 KHz provide similar results (as expected since an 11 point average should produce a filter at 9.09 KHz), but clearly both show some of the higher frequency characteristics. Comparisons of the deviations of the data from the average are shown in the following figures.
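A centered running average of the kind described can be sketched as below. The function names are hypothetical, the handling of the record ends (dropping points where the full window does not fit) is one possible choice rather than the one used in this work, and the 100 KHz sample rate is an assumption inferred from the 9.09 KHz figure quoted for an 11 point window.

```python
def running_average(x, window=11):
    """Centered running average over an odd-length window.

    Returns len(x) - window + 1 values; element j is the average of
    x[j : j + window] and is centered on sample j + window // 2.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if len(x) < window:
        raise ValueError("record is shorter than the window")
    acc = sum(x[:window])
    out = [acc / window]
    for i in range(window, len(x)):
        acc += x[i] - x[i - window]  # slide the window forward by one sample
        out.append(acc / window)
    return out

def equivalent_cutoff_hz(sample_rate_hz, window):
    """Approximate filter frequency of a running average: sample rate / window."""
    return sample_rate_hz / window

# Hypothetical ramp data to show the windowing; assumed 100 KHz sample rate
ramp = [float(v) for v in range(13)]
print(running_average(ramp, 11))
print(f"{equivalent_cutoff_hz(100_000.0, 11) / 1000.0:.2f} KHz")
```

With the assumed sample rate, the 11 point window gives the 9.09 KHz figure quoted in the text.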
[Figure B.7: two histogram panels of Count versus Deviation from mean (KPa), titled “Filtered at 9 KHz” and “Running Average over 11 points”. Panel annotations, in the order extracted: STD (σ) = 0.62, with 99.8% within ±3σ, 96.1% within ±2σ, 67.6% within ±1σ; STD (σ) = 0.9, with 100% within ±3σ, 96.4% within ±2σ, 66.5% within ±1σ.]
Figure B.7 Distribution of Deviations from Average for PTDA (Different Filtering)
The differences between these distributions are not great. One can see that the running average maintains more of a flat-top distribution, which makes sense since it works much more like a step function than the filter can.
The main point of this sub-section is to show that the processing applied before the standard deviation is computed dictates the interpretation of the standard deviation as a measure of the dispersion. In addition, no matter how the data was processed, the percentage of the data within the standard deviation bands closely follows the Gaussian distribution (even if the distribution does not quite look Gaussian). This implies that the confidence bars normally associated with these bands can be used in all of these cases as well.