Reliable Conversion of Legacy Test and Material Data into Standardised Data Models
Legacy Data as a Structural Barrier to Digitalisation
The digitalisation of materials quality assurance in the automotive and plastics industries
depends on the availability of structured, machine-readable data. However, a substantial
proportion of existing test and product data resides in formats that were never designed for
automated processing: PDF data sheets, proprietary Excel templates, laboratory reports in
unstructured text, and isolated database exports with heterogeneous schemas.
When organisations seek to adopt new data standards—such as VDA 231-301 or
comparable structured data models—they inevitably face the challenge of converting these
legacy data inventories into the target format. In practice, this conversion process is one of
the most resource-intensive steps in any digitalisation initiative.
Typical obstacles include:
- Inconsistent document structures across suppliers, projects, and time periods
- Ambiguous or incomplete descriptions of testing conditions and parameters
- Implicit domain knowledge embedded in naming conventions and abbreviations
- Varying granularity of recorded results, from aggregated summaries to raw measurement values
- Absence of unique identifiers linking test results to defined testing requirements
The cumulative effect is a significant bottleneck: organisations cannot leverage their
historical data assets within modern, interoperable systems without first undertaking a
costly and error-prone transformation effort. This delays the realisation of efficiency gains
that standardised data models are designed to deliver.
AI and Algorithmic Mapping: Potential and Limitations
A widely discussed approach to legacy data conversion involves the use of Artificial
Intelligence and algorithmic methods to automatically map existing data into a target data
model. Techniques such as natural language processing, pattern recognition, and
rule-based extraction engines can identify and classify testing requirements, extract
parameter values, and propose mappings to standardised fields.
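To illustrate the kind of pattern-based extraction such engines perform, here is a minimal sketch in Python. The data-sheet line, the regular expression, and the captured fields are hypothetical; real legacy documents vary far more widely, which is exactly the limitation discussed below.

```python
import re

# Hypothetical line as it might appear in a legacy PDF data sheet.
line = "Tensile modulus 2350 MPa (ISO 527-1/-2, 1 mm/min, 23 °C)"

# Capture a property name, a numeric value, a unit, and an optional
# parenthesised block carrying the test standard and conditions.
pattern = re.compile(
    r"(?P<property>[A-Za-z ]+?)\s+"       # e.g. "Tensile modulus"
    r"(?P<value>\d+(?:\.\d+)?)\s*"        # e.g. "2350"
    r"(?P<unit>[A-Za-z%°/]+)"             # e.g. "MPa"
    r"(?:\s*\((?P<conditions>[^)]*)\))?"  # e.g. "ISO 527-1/-2, 1 mm/min, 23 °C"
)

match = pattern.search(line)
if match:
    print(match.groupdict())
    # {'property': 'Tensile modulus', 'value': '2350', 'unit': 'MPa',
    #  'conditions': 'ISO 527-1/-2, 1 mm/min, 23 °C'}
```

A pattern like this works only as long as the source text follows the expected layout; the moment a supplier phrases the same requirement differently, extraction silently fails or misfires.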
These methods offer clear advantages in terms of throughput and scalability. For large
document volumes, automated approaches can process orders of magnitude more data
than manual review. However, purely automated solutions encounter well-documented
limitations in the materials testing domain:
- Normative texts and OEM specifications frequently contain ambiguous formulations that require domain expertise to interpret correctly
- Variations in document layout, terminology, and language across suppliers and time periods reduce the reliability of pattern-based extraction
- Customer-specific parameterisations and non-standard testing conditions are often documented in ways that resist automated classification
- Errors in automated mapping, particularly false-positive assignments, can propagate through downstream systems and compromise data integrity
- Confidence in the output diminishes when the source data deviates from the formats and structures on which the models were trained
As a result, relying exclusively on AI or algorithmic approaches introduces risks that are
difficult to accept in quality-critical environments. The automotive industry, in particular,
demands traceability, auditability, and correctness—requirements that purely automated
pipelines cannot consistently guarantee without additional safeguards.
Brain of Materials: A Hybrid Approach to Reliable Legacy Data Conversion
Brain of Materials addresses the fundamental tension between automation efficiency and
domain reliability by researching and implementing multiple complementary approaches for
the extraction and standardisation of legacy data. Rather than relying on a single method,
the platform employs a hybrid strategy that combines algorithmic processing with
structured domain validation.
The core methodology integrates the following elements (a simplified sketch of how they might interact follows the list):
- AI-based text and structural analysis for the automated detection and classification of testing requirements within heterogeneous source documents
- Algorithmic extraction engines that identify parameter values, testing conditions, and result structures from legacy formats including PDF data sheets, Excel files, and proprietary templates
- Rule-based mapping logic that assigns extracted data elements to the appropriate fields within target data models such as VDA 231-301
- Systematic assignment of unique TestIDs (TIDs) to each identified testing requirement, ensuring unambiguous referencing and eliminating interpretative leeway
- Domain-specific validation through continuous feedback loops with subject matter experts, ensuring that automated suggestions are verified against established technical knowledge
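The following sketch shows how these elements might interact in a minimal hybrid pipeline. The rule table, field names, confidence scores, and TID numbering scheme are assumptions for illustration and do not reproduce the platform's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ExtractedRequirement:
    """One testing requirement recovered from a legacy document."""
    source_document: str
    raw_text: str
    mapped_field: str | None = None  # target field in the standard data model
    tid: str | None = None           # unique TestID, set once mapping is accepted
    confidence: float = 0.0
    needs_expert_review: bool = True

# Hypothetical rule table: phrases found in legacy documents -> target fields.
# A production system would hold far richer rules plus learned classifiers.
MAPPING_RULES = {
    "tensile": "mechanical.tensile_test",
    "melt volume": "rheological.mvr_test",
    "density": "physical.density_test",
}

REVIEW_THRESHOLD = 0.8  # below this score, a domain expert must confirm the mapping

def map_requirement(req: ExtractedRequirement) -> ExtractedRequirement:
    """Rule-based mapping step: propose a target field and a confidence score."""
    text = req.raw_text.lower()
    for phrase, target in MAPPING_RULES.items():
        if phrase in text:
            req.mapped_field = target
            req.confidence = 0.9  # placeholder; a real system scores each match
            break
    # Unmapped or low-confidence items are routed to expert validation rather
    # than being written into the target model automatically.
    req.needs_expert_review = req.confidence < REVIEW_THRESHOLD
    return req

def assign_tid(req: ExtractedRequirement, sequence: int) -> ExtractedRequirement:
    """Assign a unique TestID once the mapping has been accepted."""
    if req.mapped_field and not req.needs_expert_review:
        req.tid = f"TID-{sequence:06d}"  # illustrative ID scheme
    return req

req = ExtractedRequirement("datasheet_0815.pdf", "Tensile modulus 2350 MPa")
print(assign_tid(map_requirement(req), sequence=1))
```

The key design point is the explicit review flag: automation proposes, but nothing below the confidence threshold enters the target model without an expert decision.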
This hybrid approach—combining algorithmic evaluation with expert-guided quality
assurance—ensures that even extensive, heterogeneous legacy data inventories can be
transformed into a structured, standardised data basis with a high degree of reliability. The
feedback mechanisms simultaneously serve to refine and optimise the underlying
algorithms, improving extraction accuracy over successive iterations.
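One plausible realisation of such a feedback mechanism, sketched here with hypothetical names, is to record expert corrections and promote repeatedly confirmed ones into the rule base:

```python
from collections import Counter

# Expert decisions collected during review: (phrase, confirmed target field).
corrections: Counter[tuple[str, str]] = Counter()

def record_correction(phrase: str, confirmed_target: str) -> None:
    """Store an expert's correction so the rule base can learn from it."""
    corrections[(phrase.lower(), confirmed_target)] += 1

def promote_rules(rules: dict[str, str], min_confirmations: int = 3) -> None:
    """Turn repeatedly confirmed corrections into new mapping rules."""
    for (phrase, target), count in corrections.items():
        if count >= min_confirmations and phrase not in rules:
            rules[phrase] = target

# After three independent confirmations, the pipeline maps "charpy" on its own.
rules: dict[str, str] = {}
for _ in range(3):
    record_correction("Charpy", "mechanical.charpy_impact_test")
promote_rules(rules)
print(rules)  # {'charpy': 'mechanical.charpy_impact_test'}
```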
Enabling Digitalisation of Existing Systems and Processes
The ability to reliably convert legacy data into standardised formats unlocks a range of
operational benefits that extend well beyond the immediate conversion task. By
transforming historically accumulated document repositories into a consistent,
interoperable data source, organisations establish the preconditions for meaningful
digitalisation of their entire materials quality assurance landscape.
Concrete applications include:
- Digitalisation of existing systems: Legacy test and product data becomes accessible within modern CAQ, PLM, and ERP platforms, enabling organisations to integrate historical data into current digital workflows without rebuilding their data foundations from scratch
- Accessibility for new formats and standards: Data originally captured in obsolete or proprietary formats is made available in standardised, machine-readable structures such as VDA 231-301, facilitating cross-system processing and automated validation
- Simplification of process digitalisation: With a structured, standardised data basis in place, subsequent digitalisation steps, from automated result comparison to cross-supplier benchmarking to audit-proof documentation, become significantly more efficient and scalable
- Cost-effective standardisation: By combining automated extraction with targeted expert validation, the conversion effort per data record is reduced substantially compared to fully manual approaches, making standardisation economically viable even for large legacy inventories
- Preservation of institutional knowledge: Testing data and associated metadata that might otherwise be lost due to format obsolescence or organisational changes are systematically captured in a durable, interoperable form
Integration with VDA 231-301 and the TestID Concept
The conversion of legacy data gains particular significance in the context of industry-wide
standardisation efforts. VDA 231-301 defines a generic, machine-readable data model for
the structured description of testing requirements, testing conditions, result structures, and
references to standards and specifications. However, the value of such a standard is fully
realised only when existing data—not merely newly generated data—can be represented
within it.
Brain of Materials facilitates this integration by mapping extracted legacy data directly into
the VDA 231-301 data model and enriching it with TestIDs. The TestID provides a unique,
machine-readable identifier for each testing requirement, including its methodology,
parameterisation, and conditions. Within the VDA 231-301 framework, the TestID functions
as a BusinessKey that enhances the data model with unambiguous references—bridging
the gap between normative text and operational testing practice.
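The sketch below illustrates this structural role of the TestID as a BusinessKey. The field names and values are purely illustrative assumptions and do not reproduce the actual VDA 231-301 schema:

```python
import json

# Illustrative structure only: these field names do NOT reproduce the actual
# VDA 231-301 schema. The point is the role of the TestID (TID) as a
# BusinessKey that uniquely references one testing requirement.
requirement = {
    "businessKey": {"testId": "TID-000042"},  # unique, machine-readable reference
    "method": {
        "standard": "ISO 527-2",
        "description": "Tensile test on injection-moulded specimens",
    },
    "parameters": {
        "testSpeed": {"value": 1, "unit": "mm/min"},
        "temperature": {"value": 23, "unit": "°C"},
    },
    "resultStructure": {"property": "tensile modulus", "unit": "MPa"},
}

print(json.dumps(requirement, indent=2, ensure_ascii=False))
```

Because every record carries such a key, any downstream system can reference the same testing requirement without re-interpreting free text.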
This combination of standardised data structures and unique identification transforms
legacy data from a static archive into an active, queryable, and automatable resource
within the digital supply chain.
Summary
The conversion of existing test and product data into new data standards and data models
is one of the most significant practical challenges in the digitalisation of materials quality
assurance. Purely manual conversion is prohibitively expensive at scale; purely automated
approaches lack the domain-specific reliability required in quality-critical environments.
Brain of Materials addresses this challenge by researching and implementing a hybrid
methodology that combines AI-based extraction, algorithmic mapping, and expert-validated
structuring. This enables:
- Cost-effective and reliable standardisation of legacy data inventories
- Digitalisation of existing systems by making historical data accessible in modern formats
- Seamless integration with standardised data models such as VDA 231-301
- Unambiguous identification of testing requirements through systematic TestID assignment
- Continuous improvement of extraction accuracy through adaptive feedback mechanisms
Organisations that address their legacy data challenge systematically will not only
accelerate their transition to data-driven quality assurance but also unlock the full value of
their historical data assets for future digitalisation initiatives.
Curious?
Would you like to understand how legacy data conversion can be concretely implemented
in your existing system and process landscape—and what efficiency potentials can be
realised through structured, standardised data exchange?
In our complimentary webinar, we will demonstrate practical applications of how Brain of
Materials can serve as an operational infrastructure for testing and material data. Together,
we will analyse typical integration scenarios, automation potentials, and specific use cases
along the supply chain.
Secure your appointment now and discuss your individual requirements directly with our
experts.