Last time I covered an easy way to read data from a file into a useful structure. This time I will demonstrate some summary statistics that can be generated from that data.
Remember that data is a list that looks something like this:
[ (0, (4, 8, 3)), (1, (4, 9, 2)), (2, (5, 8, 2)), (3, (5, 8, 1)), (4, (6, 9, 2)) ]
The following code sample shows a method of generating the statistics by timestamp:
for timestamp, values in data:
    print(timestamp, func(values))
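To make the loop above concrete, here is a minimal, self-contained sketch in Python 3 syntax. The sample data is the list shown earlier. The helper name per_timestamp is my own invention for this illustration, and the built-in max simply stands in for func.

```python
# Sample data copied from the list shown earlier:
# each entry is (timestamp, (one reading per sensor)).
data = [(0, (4, 8, 3)), (1, (4, 9, 2)), (2, (5, 8, 2)),
        (3, (5, 8, 1)), (4, (6, 9, 2))]

def per_timestamp(data, func):
    """Apply func to the tuple of readings recorded at each timestamp.

    Hypothetical helper, introduced only for this sketch.
    """
    return [(timestamp, func(values)) for timestamp, values in data]

# max stands in for func; any function of a sequence would do.
for timestamp, result in per_timestamp(data, max):
    print(timestamp, result)
```

Returning a list rather than printing inside the helper keeps the calculation separate from the output, which makes it easier to reuse.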
Alternatively, you can generate by source sensor:
values_only = [values for timestamp, values in data]  # Extract the values
by_sensor = zip(*values_only)  # Reorganise by sensor
for sensor, values in zip(header[1:], by_sensor):
    print(sensor, func(values))
Remember that header came from reading the first line of the datafile in the previous instalment of this series. We use header[1:] as we don’t want to include the time column when we’re looking up sensor names.
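The regrouping step is easier to follow with concrete values. In this sketch the header row and its sensor names ('a', 'b', 'c') are made up for illustration, since the real names come from your own data file.

```python
# Hypothetical header: these column names are assumptions for this sketch.
header = ['time', 'a', 'b', 'c']

# Sample data copied from the list shown earlier.
data = [(0, (4, 8, 3)), (1, (4, 9, 2)), (2, (5, 8, 2)),
        (3, (5, 8, 1)), (4, (6, 9, 2))]

# Drop the timestamps, keeping one tuple of readings per row.
values_only = [values for timestamp, values in data]

# zip(*rows) transposes rows into columns, giving one tuple per sensor.
by_sensor = list(zip(*values_only))

# header[1:] skips the time column so names and columns line up.
for sensor, values in zip(header[1:], by_sensor):
    print(sensor, min(values), max(values))
```

Wrapping the zip in list() is only there so the regrouped data can be inspected or reused; for a single pass the bare zip works just as well.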
Listed below are a few examples of simple summary statistics that can be generated. A call to any one of these would replace func(values) in the code samples above. Note that these, and many more, are part of packages such as numpy. These packages inevitably offer more features and more robust implementations than home-cooked ones, but for simple cases (such as here) these examples may be sufficient.
def mean(values):
    # float() guards against integer division when every value is an int
    return float(sum(values)) / len(values)
Min / max need no definition: the built-ins min(values) and max(values) already exist.
def median(values):
    values_s = sorted(values)
    n = len(values_s)
    if n % 2 == 0:
        # Even count: average the two middle values
        value1 = values_s[n // 2]
        value2 = values_s[n // 2 - 1]
        return 0.5 * (value1 + value2)
    else:
        # Odd count: take the single middle value
        return values_s[n // 2]
import math

def rms(values):
    sq = [x * x for x in values]  # Square each value
    mean = float(sum(sq)) / len(sq)  # Mean of the squares
    return math.sqrt(mean)
Or, in one line:
import math

def rms(values):
    return math.sqrt(float(sum(x * x for x in values)) / len(values))
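If you would rather not maintain these functions yourself and do not need numpy, Python 3's standard library has a statistics module that covers the mean and the median. The sketch below is only a comparison point, not a replacement for the versions above. It uses the third sensor's readings from the sample data, and builds the RMS from statistics.mean and math.sqrt.

```python
import math
import statistics

# Readings from the third sensor in the sample data shown earlier.
readings = (3, 2, 2, 1, 2)

summary = {
    'mean': statistics.mean(readings),
    'median': statistics.median(readings),
    # The RMS is the square root of the mean of the squared values.
    'rms': math.sqrt(statistics.mean(x * x for x in readings)),
}

for name, value in summary.items():
    print(name, value)
```

Running your own functions on the same readings and comparing the results is a quick way to catch mistakes such as accidental integer division.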
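As a more involved example, let’s assume you want to write the data out to the terminal (to be redirected to a file) with an extra column that contains a weighted sum of the three values for each timestamp. Because the weights used here add up to 1, that sum is also the weighted mean, hence the function name.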
import sys
import csv

weights = [0.5, 0.3, 0.2]

def weighted_mean(values, weights):
    # Pair each value with its weight, then sum the products
    vw = zip(values, weights)
    weighted_values = [value * weight for value, weight in vw]
    return sum(weighted_values)

header = get_header()  # Returns the original header
data = get_values()  # Returns the data values

outfile = csv.writer(sys.stdout)
outfile.writerow(header + ['weighted'])
for timestamp, values in data:
    row = [timestamp] + list(values) + [weighted_mean(values, weights)]
    outfile.writerow(row)
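To try this without the data file from the previous instalment, here is a small stand-alone sketch. The header names and the two data rows are stand-ins I made up for illustration, the output goes to an in-memory buffer instead of sys.stdout so it can be inspected, and the rounding to three decimal places is my own addition to keep floating-point noise out of the extra column.

```python
import csv
import io

weights = [0.5, 0.3, 0.2]

def weighted_sum(values, weights):
    """Multiply each value by its weight and add up the products."""
    return sum(value * weight for value, weight in zip(values, weights))

# Stand-in header and data: assumptions for this sketch only.
header = ['time', 'a', 'b', 'c']
data = [(0, (4, 8, 3)), (1, (4, 9, 2))]

# Write into a string buffer; swap in sys.stdout to print to the terminal.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header + ['weighted'])
for timestamp, values in data:
    # Rounding is an addition here, to hide floating-point noise.
    extra = round(weighted_sum(values, weights), 3)
    writer.writerow([timestamp] + list(values) + [extra])

print(buffer.getvalue())
```

If the weights and the values ever differ in length, zip silently stops at the shorter of the two, so it is worth checking the lengths first when working with real data.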
Next time we’ll look at generating some statistics that will help you see how well the system was working.