What We Collect, and What We Refuse To

The student activity model — and the surveillance features we left on the table.

A learning platform that knew nothing about its students would be useless: teachers need to see who is struggling, who has finished, and whether the work is genuinely theirs. A platform that knew everything would be a surveillance system wearing a classroom badge. The interesting design work lives between those two failures, and it comes down to a small number of explicit decisions about what to record, how to store it, and when to throw it away.

What we collect, and why

For each quiz or exercise attempt, we record only what serves a clear pedagogical or security purpose:

Data	Why we keep it	How it’s stored	Retention
Start / completion timestamps	Time-on-task; spotting who is stuck	Plain UTC time	Life of the educational record
Attempt duration	Engagement; integrity signals	Derived value	Life of the educational record
Scores	Progress tracking	Numeric (0–100)	Life of the educational record
Answers given	Item-level feedback and review	Structured response data	Life of the educational record
IP address	Detecting account sharing only	Salted one-way hash — never the raw address	90 days, then auto-deleted

Attempt records link to a student by an internal identifier, not by a name embedded in the record. The student’s display name lives in one place, the roster, so that correcting or removing it is a single operation rather than a hunt across every row of activity they ever generated.
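The record shape described in the table and the paragraph above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical names (`Roster`, `AttemptRecord`) and an in-memory store; the platform's actual schema is not described in this note. It shows the name living only in the roster, UTC timestamps, a 0–100 score, and duration derived from the timestamps rather than stored independently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any


@dataclass
class Roster:
    """The one place a student's display name lives, keyed by internal ID."""

    names: dict[str, str] = field(default_factory=dict)

    def rename(self, student_id: str, new_name: str) -> None:
        # A single update: attempt records never hold a copy of the name.
        self.names[student_id] = new_name

    def remove(self, student_id: str) -> None:
        self.names.pop(student_id, None)


@dataclass(frozen=True)
class AttemptRecord:
    """One quiz or exercise attempt, linked to its student by ID only."""

    student_id: str              # opaque internal identifier, not a name
    quiz_id: str
    started_at: datetime         # timezone-aware, UTC
    completed_at: datetime       # timezone-aware, UTC
    score: int                   # 0-100
    answers: dict[str, Any]      # structured response data, keyed by item ID

    def __post_init__(self) -> None:
        for ts in (self.started_at, self.completed_at):
            # Require an aware datetime at UTC offset zero; naive datetimes
            # return None from utcoffset() and are rejected as well.
            if ts.utcoffset() != timedelta(0):
                raise ValueError("timestamps must be UTC")
        if self.completed_at < self.started_at:
            raise ValueError("completion cannot precede start")
        if not 0 <= self.score <= 100:
            raise ValueError("score must be between 0 and 100")

    @property
    def duration_seconds(self) -> float:
        # Derived from the two timestamps, so it can never disagree with them.
        return (self.completed_at - self.started_at).total_seconds()
```

Renaming a student touches one dictionary entry; every attempt still resolves to the new name through `student_id`.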

The IP address decision

Account sharing is a real problem in classrooms: one student logs in for another, or a code circulates beyond the class. To detect it you need some notion of where a session came from. The naive solution is to store the IP address. We don’t, because under the GDPR an IP address is personal data, and storing it indefinitely to catch occasional cheating fails any reasonable balancing test.

Instead we store a SHA-256 hash of the IP address combined with a per-tenant salt. This is a one-way transformation: we can tell that ten sessions came from the same place (because the hashes match) without storing the place itself. One caveat keeps this honest: there are only about 4.3 billion IPv4 addresses, few enough to hash exhaustively, so the hash is only as hard to reverse as the salt is to obtain, and the salt has to be guarded like a secret key. The per-tenant salt also means the same address in two different schools produces two different hashes, so nothing can be correlated across customers. Finally, the hash is deleted automatically after 90 days: long enough to investigate a live integrity concern, short enough that we are not sitting on a quietly growing pile of network identifiers.
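The hashing step can be sketched with Python's standard library. This is an illustration under stated assumptions, not the production code: `new_tenant_salt`, `hash_ip`, and `sessions_per_origin` are hypothetical names, and the salt length (32 bytes) and address normalisation are choices made for the sketch. It follows the construction described above, SHA-256 over the tenant salt plus the address, and shows how matching digests group sessions by origin without keeping the address.

```python
import hashlib
import ipaddress
import secrets
from collections import Counter


def new_tenant_salt() -> bytes:
    """Hypothetical helper: 32 random bytes, generated once per tenant and kept secret."""
    return secrets.token_bytes(32)


def hash_ip(ip: str, tenant_salt: bytes) -> str:
    """SHA-256 of (tenant salt + normalised IP address), as a hex string.

    Callers persist only this digest, never the raw address.
    """
    # Normalise so equivalent spellings of one address ("::1" and
    # "0:0:0:0:0:0:0:1") hash identically; malformed input raises ValueError.
    canonical = ipaddress.ip_address(ip).compressed
    return hashlib.sha256(tenant_salt + canonical.encode("ascii")).hexdigest()


def sessions_per_origin(session_ips: list[str], tenant_salt: bytes) -> Counter:
    """Count sessions per hashed origin: equal digests mean the same address."""
    return Counter(hash_ip(ip, tenant_salt) for ip in session_ips)
```

A keyed HMAC (`hmac.new(tenant_salt, data, hashlib.sha256)`) is the more conventional construction for a secret-keyed digest like this; swapping it in would change only the last line of `hash_ip`.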

Holding periods: why two clocks, not one

Not all data should age the same way, so we run two retention clocks rather than one blunt policy.

Security data has a short, automatic life. IP hashes are purged by a scheduled job that runs every night and removes any hash older than 90 days. This is not a manual cleanup we promise to remember — it is a background process that runs whether or not anyone is watching, which is the only kind of retention limit that actually holds over years.
Educational records persist while they are educational records. Scores, timestamps, and answers are kept for as long as the school needs them for its legitimate educational purpose, and are removed when a student or group is deleted (see Data Subject Rights in Practice). The school, as controller, decides the lifetime of these records; we provide the tools to honour that decision, including deletion.
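The two clocks can be sketched as two functions over in-memory lists. This assumes hypothetical record types (`IpHashRecord`, `Attempt`) and leaves the scheduler itself out; in a real deployment these would more likely be database deletes triggered by a job runner, which this note does not specify. The security clock is purely time-based; the educational clock only moves when the school deletes a student.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

IP_HASH_RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class IpHashRecord:
    session_id: str
    ip_hash: str
    recorded_at: datetime  # timezone-aware, UTC


@dataclass(frozen=True)
class Attempt:
    student_id: str
    score: int


def purge_expired_ip_hashes(
    records: list[IpHashRecord], now: Optional[datetime] = None
) -> list[IpHashRecord]:
    """Security clock: drop every hash older than 90 days.

    Intended to be run nightly by a scheduler, so the limit holds whether or
    not anyone remembers it.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - IP_HASH_RETENTION
    return [r for r in records if r.recorded_at >= cutoff]


def delete_student(
    attempts: list[Attempt], roster: dict[str, str], student_id: str
) -> list[Attempt]:
    """Educational clock: records live until the school deletes the student."""
    roster.pop(student_id, None)
    return [a for a in attempts if a.student_id != student_id]
```

Note that nothing in `purge_expired_ip_hashes` touches attempts, and nothing in `delete_student` depends on time: keeping the clocks in separate code paths mirrors keeping them separate in policy.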

Separating the two clocks is itself a data-minimisation decision. The data with the highest privacy sensitivity and the lowest long-term value — network identifiers — expires fastest, automatically. The data the teacher actually relies on outlives it but never outlives the student’s relationship with the school.

Our legal basis: legitimate interest, documented

We process this activity data under legitimate interest (GDPR Article 6(1)(f)) rather than consent. Consent is the wrong instrument here: a student cannot meaningfully refuse the basic activity logging that a graded assignment requires, and a “consent” that cannot be declined is not consent. Legitimate interest is honest about what is happening — the school has a genuine educational and integrity interest, and we have minimised the data so that interest is not outweighed by the intrusion. That balancing is written down, not assumed, and the data minimisation described above is what makes it defensible.

What teachers see — and don’t

Collecting data responsibly is only half the job; exposing it responsibly is the other half. Teachers see actionable insight: “Sarah spent 18 minutes on Quiz 3, against a class average of 25,” or a flag that a quiz was completed implausibly fast, or an account-sharing alert. They do not see raw IP hashes, forensic logs, or precise second-by-second timelines. The dashboard is built to answer teaching questions, not to enable monitoring of a child’s every move.
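One way to enforce that boundary in code is to give the dashboard its own narrow type that simply has no field for forensic data. A minimal sketch, assuming a hypothetical `TeacherInsight` type, an illustrative "implausibly fast" threshold of a quarter of the class average (the real rule is not stated in this note), and an account-sharing check computed elsewhere whose details are likewise not described here:

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical threshold: attempts under a quarter of the class average are flagged.
FAST_FRACTION = 0.25


@dataclass(frozen=True)
class TeacherInsight:
    """What the dashboard receives. Deliberately has no IP hash or raw timeline."""

    student_name: str
    quiz_name: str
    minutes_spent: int
    class_average_minutes: int
    implausibly_fast: bool
    sharing_alert: bool

    def summary(self) -> str:
        return (
            f"{self.student_name} spent {self.minutes_spent} minutes on "
            f"{self.quiz_name}, against a class average of {self.class_average_minutes}"
        )


def build_insights(
    quiz_name: str,
    minutes_by_student: dict[str, float],
    roster: dict[str, str],
    sharing_alerts: set[str],
) -> list[TeacherInsight]:
    """Turn per-student durations into teacher-facing insight.

    `sharing_alerts` holds student IDs already flagged by a separate
    account-sharing check; only the yes/no outcome reaches the teacher.
    """
    average = mean(minutes_by_student.values())
    return [
        TeacherInsight(
            student_name=roster[student_id],
            quiz_name=quiz_name,
            minutes_spent=round(minutes),
            class_average_minutes=round(average),
            implausibly_fast=minutes < FAST_FRACTION * average,
            sharing_alert=student_id in sharing_alerts,
        )
        for student_id, minutes in minutes_by_student.items()
    ]
```

Because the insight type carries rounded minutes and booleans only, a later dashboard change cannot accidentally display an IP hash or a second-by-second timeline: there is nothing of the kind to display.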

What we deliberately refuse to collect

The clearest statement of a privacy philosophy is the list of easy features you turned down. We explicitly rejected each of these:

Geolocation — invasive, with no educational payoff.
Keystroke analytics — surveillance dressed up as analytics.
Webcam proctoring — disproportionate for classroom learning.
Persistent device fingerprinting — unnecessary for any use case we have.
Indefinite IP retention — the legal and ethical risk outweighs any forensic value past 90 days.

Each of these would have been straightforward to build, and some would have made a tidy bullet point on a feature comparison. We left them out on purpose. A platform for children should be judged as much by what it declines to know as by what it does.

This note is adapted from Lesson Commons’ internal architecture decision record on student activity tracking. It describes design intent and current implementation; it is not legal advice. Schools remain responsible for their own data-protection obligations as controllers.

This document was written with the assistance of Claude (Anthropic). The author defined the purpose, audience, and main ideas, directed the editorial approach, and edited the final text.