# Python tips – Simple sensor data handling 3

Last time I covered an easy way to read data from a file into a useful structure. This time I will demonstrate some summary statistics that can be generated from that data.

Remember that `data` is a list that looks something like this:

```
[
    (0, (4, 8, 3)),
    (1, (4, 9, 2)),
    (2, (5, 8, 2)),
    (3, (5, 8, 1)),
    (4, (6, 9, 2))
]
```

The following code sample shows a method of generating the statistics by timestamp:

```
# func is a placeholder for any of the statistics listed further down
for timestamp, values in data:
    print(timestamp, func(values))
```
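As a concrete illustration, the loop above can be run with the built-in `max` standing in for `func`. This is only a minimal sketch: it assumes the sample `data` shown earlier and Python 3's `print` function, and the helper name `summarise_by_timestamp` is made up for this example.

```python
# Sample data in the (timestamp, (value, value, value)) layout used in this series
data = [
    (0, (4, 8, 3)),
    (1, (4, 9, 2)),
    (2, (5, 8, 2)),
    (3, (5, 8, 1)),
    (4, (6, 9, 2)),
]


def summarise_by_timestamp(data, func):
    """Apply func to the sensor values recorded at each timestamp."""
    return [(timestamp, func(values)) for timestamp, values in data]


# max is used here purely as an example statistic
for timestamp, result in summarise_by_timestamp(data, max):
    print(timestamp, result)
```

Collecting the results in a list first, rather than printing inside the loop, makes it easy to reuse them later.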

Alternatively, you can generate them by source sensor:

```
values_only = [x[1] for x in data]  # Extract the values
by_sensor = zip(*values_only)       # Reorganise by sensor
for sensor, values in zip(header[1:], by_sensor):
    print(sensor, func(values))
```

Remember that `header` came from reading the first line of the data file in the previous instalment of this series. We use `header[1:]` because we don’t want to include the time column when we’re looking up sensor names.
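The `zip(*values_only)` step may be the least obvious part, so here is a small sketch of what it produces. The sensor names in `header` are hypothetical stand-ins for whatever the first line of your data file contains, and `max` is again just an example statistic.

```python
data = [
    (0, (4, 8, 3)),
    (1, (4, 9, 2)),
    (2, (5, 8, 2)),
    (3, (5, 8, 1)),
    (4, (6, 9, 2)),
]
# Hypothetical header; the real one comes from the first line of the data file
header = ['time', 'sensor_a', 'sensor_b', 'sensor_c']

values_only = [values for _, values in data]  # drop the timestamps
# zip(*rows) transposes rows into columns: one tuple per sensor
by_sensor = list(zip(*values_only))

# Pair each sensor name with its column and compute one statistic per sensor
per_sensor_max = {name: max(vals) for name, vals in zip(header[1:], by_sensor)}

for name, vals in zip(header[1:], by_sensor):
    print(name, vals, per_sensor_max[name])
```

Wrapping `zip` in `list` is only needed if you want to loop over the columns more than once, because in Python 3 `zip` returns a one-shot iterator.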

Listed below are a few examples of simple summary statistics that can be generated. A call to any one of these would replace `func(values)` in the code samples above. Note that these, and many more, are available in packages such as NumPy. Those packages generally offer more features and more robust implementations than home-cooked ones, but for simple cases (such as here) these examples may be sufficient.
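If you would rather not install anything, Python 3.4 and later also ship a `statistics` module in the standard library. It is not used elsewhere in this series, but a quick sketch shows how it can stand in for `func`; the `values` tuple here is simply the first sensor column of the sample data.

```python
import statistics

# Readings from one sensor, taken from the sample data above
values = (4, 4, 5, 5, 6)

mean_value = statistics.mean(values)      # arithmetic mean
median_value = statistics.median(values)  # middle value of the sorted data

print(mean_value, median_value)
```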

## Mean

```
def mean(values):
    # float() forces true division, even for integer readings on Python 2
    return float(sum(values)) / len(values)
```

## Min / max (already exist)

```
min(values)
max(values)
```

## Median

```
def median(values):
    values_s = sorted(values)
    n = len(values)
    if n % 2 == 0:
        # Even count: average the two middle values
        value1 = values_s[n // 2]
        value2 = values_s[n // 2 - 1]
        return 0.5 * (value1 + value2)
    else:
        # Odd count: take the single middle value
        return values_s[n // 2]
```
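To check both branches, the sketch below restates the function (using floor division, `//`, so that the list index stays an integer under Python 3) and tries one odd-length and one even-length input. The input tuples are made up for the example.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values."""
    values_s = sorted(values)
    n = len(values_s)
    middle = n // 2  # floor division keeps the index an integer
    if n % 2 == 0:
        return 0.5 * (values_s[middle - 1] + values_s[middle])
    return values_s[middle]


odd_result = median((4, 4, 5, 5, 6))  # five readings: one middle value
even_result = median((4, 8, 3, 9))    # four readings: average of 4 and 8

print(odd_result, even_result)
```

Note that the even-length case returns a float even when the inputs are integers, while the odd-length case returns the middle element unchanged.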

## RMS

```
import math
def rms(values):
    sq = [x * x for x in values]      # square each value
    mean_sq = float(sum(sq)) / len(sq)  # mean of the squares
    return math.sqrt(mean_sq)
```

Or, in one line:

```
import math
def rms(values):
    return math.sqrt(sum(x * x for x in values) / float(len(values)))
```
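As a quick sanity check, RMS is the square root of the mean of the squared values, so for the readings 3 and 4 it should be the square root of (9 + 16) / 2 = 12.5. The sketch below restates the function so that it runs on its own; the input values are invented for the example.

```python
import math


def rms(values):
    """Root mean square: square, average, then take the square root."""
    squares = [x * x for x in values]
    return math.sqrt(sum(squares) / float(len(squares)))


two_readings = rms((3, 4))     # sqrt(12.5)
constant = rms((2, 2, 2))      # RMS of a constant signal is that constant

print(two_readings, constant)
```

Unlike the plain mean, RMS ignores the sign of each value, which is why it is often used for signals that swing around zero.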

## More complete example

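Let’s assume you want to write the data out to the terminal (to be redirected to a file) with an extra column that contains a weighted sum of the three values for each timestamp. Because the weights used here add up to 1, the weighted sum is also the weighted mean, hence the name `weighted_mean`.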

```
import sys
import csv

weights = [0.5, 0.3, 0.2]  # One weight per sensor

def weighted_mean(values, weights):
    # Pair each value with its weight, multiply, then add up the products
    vw = zip(values, weights)
    weighted_values = [value * weight for value, weight in vw]
    return sum(weighted_values)

header = get_header()  # Returns the original header
data = get_values()    # Returns the data values

outfile = csv.writer(sys.stdout)
outfile.writerow(header + ['weighted'])
for timestamp, values in data:
    row = [timestamp] + list(values) + [weighted_mean(values, weights)]
    outfile.writerow(row)
```
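If you want to try this without the earlier instalments to hand, here is a self-contained sketch. It makes a few assumptions: the header names are hypothetical, the sample data is hard-coded in place of `get_header()` and `get_values()`, and the output goes to an in-memory buffer so that it can be inspected before printing.

```python
import csv
import io

weights = [0.5, 0.3, 0.2]  # one weight per sensor; these sum to 1

# Hypothetical stand-ins for get_header() and get_values()
header = ['time', 'sensor_a', 'sensor_b', 'sensor_c']
data = [
    (0, (4, 8, 3)),
    (1, (4, 9, 2)),
    (2, (5, 8, 2)),
]


def weighted_sum(values, weights):
    """Multiply each value by its weight and add up the products."""
    return sum(value * weight for value, weight in zip(values, weights))


buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header + ['weighted'])
for timestamp, values in data:
    writer.writerow([timestamp] + list(values) + [weighted_sum(values, weights)])

output_lines = buffer.getvalue().splitlines()
print('\n'.join(output_lines))
```

Swapping `buffer` for `sys.stdout` gives the same behaviour as the version above.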

Next time we’ll look at generating some statistics that will help you see how well the system was working.