Python tips – Simple sensor data handling 3

Last time I covered an easy way to read data from a file into a useful structure. This time I will demonstrate some summary statistics that can be generated from that data.

Remember that data is a list of (timestamp, values) tuples that looks something like this:

[
 (0, (4, 8, 3)),
 (1, (4, 9, 2)),
 (2, (5, 8, 2)),
 (3, (5, 8, 1)),
 (4, (6, 9, 2))
]

The following code sample shows a method of generating the statistics by timestamp:

for timestamp,values in data:
    print timestamp, func(values)

Alternatively, you can generate by source sensor:

values_only = [x[1] for x in data]        # Extract the values
by_sensor = zip(*values_only)             # Reorganise by sensor
for sensor,values in zip(header[1:], by_sensor):
    print sensor, func(values)
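
To make the reorganisation step concrete, here is a small sketch using the sample data from above. The sensor names in header are made up for illustration, and max stands in for func. Each print call takes a single formatted string, so the snippet should behave the same on Python 2 and Python 3.

```python
# Sample data from earlier in this post: (timestamp, (value, value, value))
data = [
    (0, (4, 8, 3)),
    (1, (4, 9, 2)),
    (2, (5, 8, 2)),
    (3, (5, 8, 1)),
    (4, (6, 9, 2)),
]
# Hypothetical header; the real one comes from the first line of the datafile
header = ['time', 'sensor_a', 'sensor_b', 'sensor_c']

values_only = [x[1] for x in data]      # Drop the timestamps
by_sensor = list(zip(*values_only))     # One tuple per sensor (list() is needed on Python 3)

for sensor, values in zip(header[1:], by_sensor):
    print("{0}: {1} max={2}".format(sensor, values, max(values)))
```

zip(*values_only) transposes the rows: each row of three readings becomes one entry in each of three per-sensor tuples.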

Remember that header came from reading the first line of the datafile in the previous instalment of this series. We use header[1:] as we don’t want to include the time column when we’re looking up sensor names.

Listed below are a few examples of simple summary statistics that can be generated. A call to any one of these would replace func(values) in the code samples above. Note that these, and many more, are part of packages such as numpy. These packages inevitably offer more features and more robust implementations than home-cooked ones, but for simple cases (such as here) these examples may be sufficient.
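
If you happen to be on Python 3.4 or later (an assumption, since the snippets in this post are written in Python 2 style), the standard library's statistics module already covers the mean and median without installing anything. A minimal sketch using the first sensor's readings from the sample data:

```python
import statistics

readings = (4, 4, 5, 5, 6)   # first sensor column from the sample data

print(statistics.mean(readings))     # arithmetic mean
print(statistics.median(readings))   # middle value of the sorted readings
```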

Mean

def mean(values):
    # float() avoids integer division truncating the result for integer data
    return float(sum(values)) / len(values)

Min / max (built in)

min(values)
max(values)

Median

def median(values):
    values_s = sorted(values)
    n = len(values_s)
    if n % 2 == 0:
        # Even count: average the two middle values
        value1 = values_s[n // 2]
        value2 = values_s[n // 2 - 1]
        return 0.5 * (value1 + value2)
    else:
        # Odd count: take the single middle value
        return values_s[n // 2]
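
A quick check of both branches may help here. The function is repeated in condensed form, using floor division so that it also runs on Python 3, purely so the snippet is self-contained. The inputs are readings taken from the first sensor in the sample data.

```python
def median(values):
    values_s = sorted(values)
    n = len(values_s)
    if n % 2 == 0:
        # Even count: average the two middle values
        return 0.5 * (values_s[n // 2 - 1] + values_s[n // 2])
    # Odd count: take the single middle value
    return values_s[n // 2]

odd = median((4, 4, 5, 5, 6))   # five readings, so the odd branch runs
even = median((4, 4, 5, 5))     # four readings, so the even branch runs
print("odd: {0}, even: {1}".format(odd, even))
```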

RMS

import math
def rms(values):
    sq = [x * x for x in values]        # Square each value
    mean = float(sum(sq)) / len(sq)     # Mean of the squares
    return math.sqrt(mean)

Or, in one line:

import math
def rms(values):
    return math.sqrt(float(sum(x * x for x in values)) / len(values))
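
As a sanity check, the RMS of a constant series should equal that constant, and the RMS of (3, 4) should be the square root of 12.5. The sketch below defines its own copy of rms so it can be run on its own. It is one way to write the function, not the only one.

```python
import math

def rms(values):
    # Square each value, average the squares, then take the square root
    squares = [x * x for x in values]
    return math.sqrt(float(sum(squares)) / len(squares))

print(rms((2, 2, 2)))    # constant series
print(rms((3, 4)))       # sqrt((9 + 16) / 2)
print(rms((5, 8, 2)))    # readings at timestamp 2 in the sample data
```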

More complete example

Let’s assume you want to write the data out to the terminal (to be redirected to a file) with an extra column that contains a weighted mean of the three values for each timestamp. Because the weights used here add up to 1, the weighted mean is simply the weighted sum.

import sys
import csv

weights = [0.5, 0.3, 0.2]

def weighted_mean(values, weights):
    vw = zip(values, weights)
    weighted_values = [value * weight for value,weight in vw]
    return sum(weighted_values)

header = get_header()  # Returns the original header
data = get_values()    # Returns the data values

outfile = csv.writer(sys.stdout)
outfile.writerow(header + ['weighted'])

for timestamp,values in data:
    row = [timestamp] + list(values) + [weighted_mean(values, weights)]
    outfile.writerow(row)
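
The listing above relies on get_header() and get_values() from the earlier instalments. To try it in isolation, the sketch below swaps in hypothetical stand-ins that return a made-up header and the first three rows of the sample data. The write_report wrapper is also an addition, there only so that the output stream can be changed easily.

```python
import csv
import sys

weights = [0.5, 0.3, 0.2]

def weighted_mean(values, weights):
    # Equals the weighted sum here because the weights add up to 1
    return sum(value * weight for value, weight in zip(values, weights))

# Hypothetical stand-ins for the loader from the previous instalment
def get_header():
    return ['time', 'sensor_a', 'sensor_b', 'sensor_c']

def get_values():
    return [(0, (4, 8, 3)), (1, (4, 9, 2)), (2, (5, 8, 2))]

def write_report(stream):
    outfile = csv.writer(stream)
    outfile.writerow(get_header() + ['weighted'])
    for timestamp, values in get_values():
        row = [timestamp] + list(values) + [weighted_mean(values, weights)]
        outfile.writerow(row)

write_report(sys.stdout)
```

Running the script with its output redirected (for example, python report.py > out.csv, where report.py is whatever you named the file) gives a CSV with one extra column.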

Next time we’ll look at generating some statistics that will help you see how well the system was working.