IBM i Test Data Management – A Best Practice Guide. This guide explores the key principles and techniques as they relate to the creation, maintenance, validation, use, and re-use of test data environments on the IBM i, IBM iSeries & IBM AS/400.
TestBench for IBM i is the definitive product for test data management and in each section, we will highlight ‘How TestBench Does It’ and share some of the secrets behind its success.
What are the key challenges of test data management?
A full copy of the live database is unwieldy, slowing testing and making validation almost impossible. The ideal test database contains a representative cross-section of the data which maintains referential integrity. If this can be achieved, many more test environments can be stored concurrently to better support multiple teams and scenarios.
The GDPR legislation changed the rules for any organization processing data on EU citizens and extended the requirements of industry-specific initiatives such as PCI compliance and HIPAA in the US. What was best practice, became an unavoidable necessity. However, the need for meaningful test data still exists and simply replacing sensitive data with random characters does not work, especially in system and UAT testing.
The creation of test data environments is often difficult, so once complete there is a reluctance to repeat the process. While understandable, the downside is that the data loses connection to current processing dates and is inevitably corrupted as testing is performed.
IBM i shops know that, while errors in the UI are painful, the real problems start when the database gets corrupted. Such errors require the affected applications to be taken offline and in the worst-case scenario, the most recent uncorrupted back-up must be restored, and any subsequent transactions repeated.
You will find a section on each of these areas in this guide to facilitate your research.
Developers, Project Managers, QA Test Teams, UAT Teams, and Trainers all need test data and ideally in an environment dedicated to them. So, what do you need to consider?
‘Everything’ sounds like a good starting point and that’s where every IT shop started. It is an approach that has worked in the past. But the chances are that you will run out of disk space and even if the disk is available, you will have only created one test environment when many are needed.
Anyway, big test databases are a bad idea. Each test will take too long – you don’t want to wait five hours every time you test the ‘end of day’ run. Big data sets are also impossible to validate – how can you hope to check the database state if thousands of rows have been affected?
So, a sub-set is the only way to go, but there’s a lot more to it. The first 100 rows from every table will result in a smaller database but will be useless as it ignores the need to maintain referential integrity. Once you’ve solved referential integrity and remembered to include the foreign keys, you’re still not there. To be useful the test environment needs to include a representative selection of master records and transaction data to support all the test scenarios.
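The idea of a referentially intact subset can be sketched in a few lines. This is a minimal illustration, not TestBench's implementation: the table names (`customers`, `orders`) and the foreign key (`customer_id`) are assumptions for the example, standing in for whatever parent/child relationships your schema defines.

```python
# Minimal sketch of referentially intact subsetting: take a sample of
# parent rows, then pull every child row that references them, so no
# child is left orphaned. Table and column names are illustrative.

def subset_with_integrity(customers, orders, sample_ids):
    """Keep the sampled customers plus all orders that reference them."""
    kept_customers = [c for c in customers if c["id"] in sample_ids]
    kept_ids = {c["id"] for c in kept_customers}
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]
    return kept_customers, kept_orders

customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
orders = [{"order_no": 10, "customer_id": 1},
          {"order_no": 11, "customer_id": 2},
          {"order_no": 12, "customer_id": 1}]

subset_c, subset_o = subset_with_integrity(customers, orders, {1})
```

The key point is the direction of selection: the child rows are chosen *because* their parents were sampled, which is exactly what the naive "first 100 rows from every table" approach gets wrong.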
GDPR, HIPAA, or simply good practice? It does not matter why, what matters is that you can no longer allow users to access personal or sensitive information held in your live environment. So, what do you need to consider?
GDPR is the broadest piece of data protection legislation to date and has sharp teeth in the shape of significant fines for non-compliance. In broad terms, if you hold any personal data on a citizen of the European Union you must comply with GDPR, and that means the data must be kept confidential.
In the US, the Privacy Act of 1974 covers the confidentiality of personal data, while HIPAA specifically addresses the health insurance industry and the Gramm-Leach-Bliley Act addresses financial services; in the UK, the FSA regulates financial services.
In light of these regulations, internal and external auditors are increasingly focused on data confidentiality, and IT shops need to demonstrate that they have followed best practice.
It is difficult to effectively mask test data without destroying its usefulness:
TestBench provides several options to mask data and respect data confidentiality requirements. Vertical scrambling is the starting point for many accounts and may be sufficient for your needs. This moves data between the rows in a table so that the information remains meaningful but untraceable. For example, after scrambling, a row may contain a First Name of ‘David’, a Surname of ‘Smith’, and a City of ‘New York’, however, in the original data all these items came from different rows.
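The vertical scrambling idea can be illustrated with a short sketch. This is a conceptual toy, not TestBench's algorithm: each sensitive column is shuffled independently across the rows, so every value is real and realistic, but the combination in any one row no longer describes a real person.

```python
import random

# Sketch of vertical scrambling: shuffle each sensitive column
# independently so values stay meaningful but untraceable.
# Column names are illustrative.

def vertical_scramble(rows, columns, seed=None):
    rng = random.Random(seed)
    scrambled = [dict(r) for r in rows]       # leave the originals intact
    for col in columns:
        values = [r[col] for r in scrambled]
        rng.shuffle(values)                   # new permutation per column
        for r, v in zip(scrambled, values):
            r[col] = v
    return scrambled

rows = [{"first": "David", "last": "Smith", "city": "New York"},
        {"first": "Maria", "last": "Jones", "city": "London"},
        {"first": "Ken",   "last": "Brown", "city": "Paris"}]

out = vertical_scramble(rows, ["first", "last", "city"], seed=42)
```

Because only the pairing between columns changes, distributions and formats survive, which is why scrambled data remains usable in system and UAT testing.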
For more specific needs, TestBench provides several custom masking routines such as US Social Security Numbers and Credit Card Numbers. Finally, you can also develop your own masking algorithms and embed them within the TestBench framework.
Uniquely, these masking choices can be synchronized across multiple tables so that the same data is always masked in the same way wherever it occurs.
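One common way to achieve this kind of synchronization (an assumption for illustration, not a description of TestBench's internals) is deterministic masking: derive the masked value from a keyed hash of the original, so the same input always produces the same output in every table.

```python
import hashlib

# Sketch of synchronized (deterministic) masking: a keyed hash means
# 'Smith' masks to the same token wherever it occurs, so joins across
# tables still work. The key and format are illustrative only.
SECRET = b"demo-key"  # in real use, keep the key out of source control

def mask_value(value: str) -> str:
    digest = hashlib.sha256(SECRET + value.encode("utf-8")).hexdigest()
    return "X" + digest[:8].upper()
```

A deterministic mask preserves referential relationships between masked columns, but note it is pseudonymization rather than anonymization: anyone holding the key could recompute the mapping.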
IBM i IT shops cover a wide spectrum, from pure green screen, through mixed UI technologies, to those for whom the IBM i is purely a server supporting a range of APIs. Wherever you are on that spectrum, the IBM i is holding data critical to your company and it is self-evident that database validation should be a requirement of every test.
IBM i applications are rich in database activity and much of that processing is performed by APIs or batch processes that do not have any form of user interface. Given the damage that a poorly written program can inflict on this key corporate asset, validating that every database activity is correct is difficult, and IBM provides few tools to help.
This is a challenging area for any IT shop and the lack of technical knowledge in your test teams may be an issue with any of the following techniques:
TestBench monitors every update to every table and data area, including multiple updates to a single row, so that it is easy to see at what point in a test a data row became corrupted. This can be a large volume of information, so TestBench provides three key analysis techniques:
1. Data Rules can be created that specify the expected state of each field after every update, dependent upon its status.
2. TestBench can compare every element of the database activity monitored during a test to a baseline set of results and highlight the differences.
3. A static comparison can be run of the resultant file against an expected version of that same file, and the differences highlighted. A similar comparison is available for reports.
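The second technique above, comparing a test run against a baseline, reduces to a set difference over captured update events. The sketch below is a toy model of that idea, with event tuples of (table, row key, operation) as an assumed format, not TestBench's actual data structure.

```python
# Sketch of baseline comparison: diff the database activity captured
# during a test run against a saved baseline set of results, and
# report events that disappeared or appeared. Event shape is illustrative.

def diff_activity(baseline, current):
    base_set, cur_set = set(baseline), set(current)
    return {"missing": sorted(base_set - cur_set),      # in baseline only
            "unexpected": sorted(cur_set - base_set)}   # in this run only

baseline = [("ORDERS", 10, "INSERT"), ("STOCK", 5, "UPDATE")]
current  = [("ORDERS", 10, "INSERT"), ("STOCK", 5, "DELETE")]

report = diff_activity(baseline, current)
```

Identical runs produce an empty report, so a tester only has to look at the differences rather than at thousands of individually correct rows.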
It takes time and effort to create a test environment but what can you do when that environment becomes corrupted through a failed test or you simply run out of data at the correct initial state?
What you need is the ability to take your data back to a previous state so that the same test can be re-run. The importance of knowing the state of the data at the start of a test cannot be overstated. Without a known starting point, the expected state of the data at the end of the test cannot be defined, and without that definition it is impossible to verify the database-level activity.
The reality for many IT shops is that they live with test environments which they know are corrupted to a greater or lesser extent. Due diligence at the database level is therefore significantly compromised or simply impossible. At best, allowances must be made for the final state of the data given the less than perfect state of the start data.
They are the only options, but they take time, and given that the test environment is a shared resource, it is difficult to find opportunities to execute. This explains why many test environments are frequently out of date and have a level of data corruption.
TestBench provides an all-encompassing ‘Undo’ button. This capability is based on the IBM journals but extended to cover other data objects such as data areas. In addition, the IBM journaling support does not cover operations such as Clear Physical File, which are handled automatically by TestBench.
You can create as many Checkpoints as required during your use of the test environment, perhaps setting one each time a significant part of the testing has been completed. Once created, you can simply return the test environment to the state that existed at the time the Checkpoint was taken.
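The checkpoint-and-undo workflow can be pictured with a toy model. TestBench builds this on IBM i journals; the sketch below simply snapshots a small in-memory "database" at each named checkpoint, so the class, method names, and data are all illustrative assumptions.

```python
# Toy model of checkpoint/undo: snapshot the environment's state at
# each named checkpoint, then roll back to any of them on demand.
# The real mechanism (journal-based) is far more efficient than copying.

class TestEnvironment:
    def __init__(self, data):
        self.data = dict(data)
        self.checkpoints = {}

    def checkpoint(self, name):
        """Record the current state under a name."""
        self.checkpoints[name] = dict(self.data)

    def undo_to(self, name):
        """Return the environment to a previously recorded state."""
        self.data = dict(self.checkpoints[name])

env = TestEnvironment({"balance": 100})
env.checkpoint("before_eod_run")   # e.g. before the 'end of day' test
env.data["balance"] = 0            # a failed test corrupts the data
env.undo_to("before_eod_run")      # roll back and re-run the same test
```

The practical payoff is that a test always starts from a known state, which is the precondition for defining, and therefore verifying, its expected end state.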
In summary, the creation and management of test data environments is a challenge but let us finish this paper by reviewing the key steps to success:
Create a representative linked subset of live data
Identify every column containing sensitive data
Mask that data while maintaining meaning for the user
Ensure the test executes in the correct environment
Consider UI, batch & API testing
Ensure every action on every column in every row is correct
Ensure parent/child summaries are correct
Set multiple Checkpoints as critical steps are achieved
Reset rapidly every element of your test data environment
We understand the challenges associated with test data management and the problems you may be facing. So why not reach out to one of our specialists and explore the options available.