From Spreadsheets to Structured Data: A Migration Playbook

Every project team that decides to move off spreadsheets discovers the same thing: the spreadsheets were never really the problem. They were a symptom. The actual problem is that nobody agreed on what the data model should be before people started entering data into it. The spreadsheet was just flexible enough to accommodate the disagreement.

Migrating to a structured platform doesn't automatically fix that. If you carry the ambiguity of your spreadsheets into a structured system, you'll have the same problems with a better interface. The migration itself is the opportunity to resolve the ambiguity — to decide what your data model actually is, what the fields mean, what the validation rules should be, and whose responsibility it is to keep the records current.

This playbook reflects what we've seen work. It's not the only way, but it's a reliable path.

Phase 0: Don't start with the data

The instinct is to export the spreadsheet and start mapping columns to fields. Resist it. Before you touch the data, answer four questions:

What are the asset classes? A pump is different from a valve is different from a control panel. Each needs its own attribute schema. Lumping them together because they're all in the same spreadsheet was the original mistake — don't repeat it.
What are the mandatory fields? Which attributes must be populated before a record can be considered complete? "Complete" needs a definition before migration starts, not after.
Who owns the data? For every asset class, there should be a named person or role who is responsible for the accuracy of the records. If nobody is responsible, the data will drift.
What does the validation logic look like? Tag numbers: what format? Pressure ratings: what units? Define the validation rules before you migrate, not as a cleanup exercise afterwards.

These four questions will typically surface more disagreement than you expect. That disagreement is valuable — it's the reason the spreadsheets were always partially wrong. Get it resolved before the migration, and the migration becomes straightforward.

Phase 1: Audit the source data

Pull the spreadsheet apart before you try to import it. A structured data audit has three parts:

Completeness audit

For every column that maps to a mandatory field in the target schema, count the null and empty values. A column that's 30% populated is not ready to migrate — it's a data gap that will follow you into the new system and be harder to fix once it's buried in a platform with hundreds of records.

Consistency audit

Look at the distinct values in every field that should have a controlled vocabulary. Tag number formats that don't follow the convention. Units entered as "kPa", "kpa", "KPA", and "kilopascal" by four different people. Date formats that mixed DD/MM/YYYY and MM/DD/YYYY. These need to be normalised before migration, not after.

Duplication audit

Find the duplicate rows. Construction asset registers almost always have them — the same piece of equipment entered twice under different tag numbers, or the same tag number referencing two different physical assets. Duplicates imported into a structured system create confused records that are difficult to clean up because the system treats them as legitimate.

The output of the audit is a data quality report: a list of rows that are ready to migrate and a list of rows that need remediation. The remediation work usually takes longer than the import itself. Plan for it.

Phase 2: Design the schema, not the import

The most common migration mistake is designing the import to fit the source data. The import should be designed to fit the target schema — and the target schema should be designed to fit what the data needs to do operationally.

In practice, this means building the attribute schema for each asset class from first principles rather than from the spreadsheet columns. Ask: if a maintenance engineer needs to diagnose a fault on this asset in five years, what data do they need? That's your attribute schema. The spreadsheet columns are a starting point for discovery, not a specification.

Build the schema in the platform first, with validation rules applied. Then map the source columns to the target fields. Some source columns won't map to anything — they were spreadsheet metadata (row numbers, entered-by columns, colour coding that never made it into a formal field). Let them go. Some target fields won't have source data — they'll need to be populated after migration. Note them in the data gap report.

Phase 3: Import in batches, validate as you go

Don't import everything at once. Start with one asset class — the one your team knows best, where the source data is cleanest, and where validation failures are easiest to diagnose. Import it, review the results, and fix any mapping issues before moving to the next class.

After each batch import, run the validation report. Every validation failure is a data quality issue in the source data that wasn't caught in the audit, or a schema design decision that needs revisiting. Resolve them before the next batch — don't let failures accumulate.

The batch approach also gives you a natural checkpoint for team sign-off. After each batch, the asset owner for that class reviews the records and confirms that the data looks right. This is much easier to do on a batch of 50 records than on a migration of 2,000 records that's already complete.

Phase 4: Run parallel — briefly

There's always pressure to cut over immediately and shut down the spreadsheets. Resist this too, but only briefly. Run both systems in parallel for one to two weeks after the initial migration. During that period, all new data goes into the structured platform; no changes are made to the spreadsheets. At the end of the parallel period, compare the records. Any discrepancy is a migration gap that needs to be resolved before the spreadsheets are retired.

The parallel period is not a safety net for going back. It's a validation exercise. If the comparison surfaces systematic differences rather than isolated gaps, that's a signal that the schema design or the import mapping needs revisiting — and it's better to catch that in week two than in month six.

Phase 5: Define the ongoing data maintenance regime

The migration is not the end. The end is a platform where the data stays current. That requires three things:

Clear ownership. Every asset class has a named owner who reviews the records on a cadence. Not "the project team" — a specific person.

Enforced validation. The platform should make it impossible to save a record that fails mandatory validation. If it can be bypassed, it will be bypassed. Configure the validation rules to be blocking, not advisory.

A change log. Every modification to every record should be logged with a timestamp and an author. The change log is the audit trail that makes the data trustworthy. Without it, you're back to spreadsheet-level confidence in the data quality.

The spreadsheet wasn't the problem. The lack of structure, ownership, and validation was the problem. The migration is the opportunity to fix all three at once — but only if you treat it as a data design project rather than a file conversion exercise.

From Spreadsheets to Structured Data: A Migration Playbook

Phase 0: Don't start with the data

Phase 1: Audit the source data

Completeness audit

Consistency audit

Duplication audit

Phase 2: Design the schema, not the import

Phase 3: Import in batches, validate as you go

Phase 4: Run parallel — briefly

Phase 5: Define the ongoing data maintenance regime

Why Your Asset Register Should Be a Graph, Not a Spreadsheet

Validation by Description: Letting AI Define Data Quality Rules

The Construction Industry's Data Problem Is Bigger Than You Think

Phase 0: Don't start with the data

Phase 1: Audit the source data

Completeness audit

Consistency audit

Duplication audit

Phase 2: Design the schema, not the import

Phase 3: Import in batches, validate as you go

Phase 4: Run parallel — briefly

Phase 5: Define the ongoing data maintenance regime

Why Your Asset Register Should Be a Graph, Not a Spreadsheet

Validation by Description: Letting AI Define Data Quality Rules

The Construction Industry's Data Problem Is Bigger Than You Think

Stay in the loop