Data Analysis and Data Profiling

This phase involves analysing and profiling the potential source systems.

Data analysis (also called Data Discovery) involves:

  • Analysing and documenting the data structures (databases, tables, columns, primary keys, foreign keys). Note that tools are available to automate this activity.
  • Identifying data volumes
  • Identifying Data Custodians, Data Stewards and Data Owners
  • Identifying data standards and the data governance approach
  • Identifying the infrastructure and technologies that the system uses
  • Identifying the software in use, such as COTS solutions, Data Management tools and custom software
  • Analysing the data being passed to and from interfacing systems
  • Identifying the Source of Truth for data elements
  • Identifying peak usage times 
  • Identifying times when backups and interfaces are run (to identify a window to run regular data extracts and potential access issues during trial migrations and at go-live)
  • Identifying any data retention requirements
  • Analysing the archiving approach and the location of historical data  
  • Identifying potential Data Migration Issues, Risks, Constraints and Dependencies
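
As an illustration of how documenting data structures and data volumes can be automated, the following is a minimal sketch only. It assumes the source system is a SQLite database, which is purely an assumption for the example; other database products expose the same information through their own catalogue views. The function name inventory_schema and the sample tables are hypothetical.

```python
import sqlite3


def inventory_schema(conn):
    """Return a dict describing each user table: columns, keys and row count."""
    inventory = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f"PRAGMA foreign_key_list({quoted})").fetchall()
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        inventory[table] = {
            "columns": [(c[1], c[2]) for c in columns],
            # pk > 0 gives the column's position within the primary key.
            "primary_key": [c[1] for c in sorted(columns, key=lambda c: c[5]) if c[5] > 0],
            # (local column, referenced table, referenced column)
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
            "row_count": row_count,
        }
    return inventory


if __name__ == "__main__":
    # Hypothetical sample source system, built in memory for the demonstration.
    demo = sqlite3.connect(":memory:")
    demo.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            total REAL
        );
        INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO orders VALUES (10, 1, 9.5);
        """
    )
    for name, details in inventory_schema(demo).items():
        print(name, details)
```

The output of a script like this can be pasted into the report as a starting point. Items such as data ownership, the Source of Truth and retention requirements still need to be gathered from people rather than from the database.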

Data Profiling (also called Data Quality Analysis) involves analysing the quality of the data by applying the applicable Data Quality Dimensions:

  • Correctness
  • Validity
  • Duplication
  • Consistency
  • Non-standard values
  • Obsolete Data
  • Timeliness
  • Completeness
  • Missing values
  • Integrity
  • Precision
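
To make a few of these dimensions concrete, the sketch below profiles a single column for Completeness and Missing values, Duplication and Validity. It is an illustration under stated assumptions, not a full profiling tool: the function name profile_column, the record layout, the treatment of None and empty strings as missing, and the simple email rule are all hypothetical choices made for the example. Dimensions such as Correctness and Timeliness usually need reference data or business input and are not covered.

```python
import re
from collections import Counter

# Assumption: None and the empty string both count as missing values.
MISSING = (None, "")


def profile_column(records, column, is_valid=None):
    """Return basic data quality measures for one column of a list of dicts."""
    values = [record.get(column) for record in records]
    present = [v for v in values if v not in MISSING]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        # Completeness: share of rows that hold a value.
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(counts),
        # Duplication: occurrences beyond the first of each repeated value.
        "duplicates": sum(n - 1 for n in counts.values() if n > 1),
        # Validity: present values that fail the supplied rule, if any.
        "invalid": sum(1 for v in present if is_valid is not None and not is_valid(v)),
    }


def looks_like_email(value):
    """Hypothetical validity rule: something@something.something."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


if __name__ == "__main__":
    # Hypothetical sample extract from a source system.
    sample = [
        {"id": 1, "email": "ann@example.com"},
        {"id": 2, "email": "bob@example"},
        {"id": 3, "email": ""},
        {"id": 4, "email": "ann@example.com"},
        {"id": 5, "email": None},
    ]
    print(profile_column(sample, "email", looks_like_email))
```

Running the same function over every column of an extract gives a simple table of measures that can be included in the profiling report and compared between trial migrations.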

A Data Analysis and Data Profiling report should be produced for each source system. Generally this is a single combined report per system. Alternatively, a separate Data Analysis Report and Data Profiling Report can be produced for each system. To keep the explanation simple, separate reports will be assumed.