Since the launch of Libelle DataMasking seven years ago, we have carried out many varied and interesting customer projects. Each project presents us with new and exciting challenges. But it would not be us if we did not accept and master these challenges.
In this blog post, I have summarized the top 10 mistakes in anonymization.
One of the key features of Libelle DataMasking is the preservation of cross-system and cross-landscape consistency of data, regardless of whether only SAP, only other systems, or a combination of both worlds is considered in the anonymization process. This functionality is achieved by entering the so-called anonymization key.
In most projects, great importance is attached to maintaining this cross-system and cross-landscape consistency. A uniform anonymization key is therefore defined at a very early stage of the project and applies to all participants from then on.
It becomes problematic if, despite this definition, a different key is suddenly used. It is enough for the deviating key to be used in just one of the systems involved. The workflow of a test can then quickly come to a standstill: the anonymized data diverges between systems, so relevant information about a test case can no longer be found.
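The effect of a deviating key can be illustrated with a minimal sketch. It assumes a keyed, deterministic mapping built on HMAC-SHA256; this is a stand-in chosen for illustration and not a description of how Libelle DataMasking works internally. The function name, the key strings and the token format are hypothetical.

```python
import hashlib
import hmac

def pseudonymize(value: str, key: str) -> str:
    """Derive a replacement token from the value and the key (illustrative only)."""
    digest = hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return "P-" + digest.hexdigest()[:12]

# Two systems that share the agreed key produce the same replacement.
crm = pseudonymize("Mustermann", "agreed-project-key")
erp = pseudonymize("Mustermann", "agreed-project-key")

# A third system that uses a deviating key produces a different replacement,
# so the record can no longer be matched across systems.
billing = pseudonymize("Mustermann", "deviating-key")

print(crm == erp, crm == billing)
```

With the shared key, a test case can follow the same customer through every system. As soon as one system uses another key, the trail breaks at exactly that system.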
In relational databases, there are usually dependencies between the tables based on referential integrity in order to ensure the consistency and integrity of the data. These dependencies must of course be maintained during the anonymization.
With Libelle DataMasking, you can define the sequence in which the tables are anonymized. The software offers up to ten different levels for this purpose. An example: tables A and B contain related data records.
In a first step (sequence 0), the fields of both tables are anonymized independently of each other. A second step then restores the relatedness of the data records by means of an SQL SELECT statement that JOINs the tables with each other. The JOIN must be performed in a different sequence (sequence 1 in the example). Otherwise, there is a risk that consistency will be violated. In extreme cases, a deadlock could even occur during anonymization, because two statements may try to access and change the very same data records.
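The two-step procedure for tables A and B can be sketched with SQLite from the Python standard library. The table layout, the column names and the placeholder values are assumptions made for this illustration. The statements are plain SQL and do not reproduce the statements that Libelle DataMasking generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id), name TEXT);
    INSERT INTO a VALUES (1, 'Anna Schmidt'), (2, 'Jonas Weber');
    INSERT INTO b VALUES (10, 1, 'Anna Schmidt'), (11, 2, 'Jonas Weber'), (12, 1, 'Anna Schmidt');
""")

def count_mismatches() -> int:
    """Count related records whose name differs between table a and table b."""
    row = conn.execute(
        "SELECT COUNT(*) FROM a JOIN b ON b.a_id = a.id WHERE a.name <> b.name"
    ).fetchone()
    return row[0]

# Sequence 0: both tables are anonymized independently of each other.
conn.execute("UPDATE a SET name = 'Person A' || id")
conn.execute("UPDATE b SET name = 'Person B' || id")
mismatches_after_sequence_0 = count_mismatches()

# Sequence 1: the relatedness is restored by joining b to a.
conn.execute("UPDATE b SET name = (SELECT a.name FROM a WHERE a.id = b.a_id)")
mismatches_after_sequence_1 = count_mismatches()

print(mismatches_after_sequence_0, mismatches_after_sequence_1)
```

Running the join step only after both independent updates have finished is what keeps the two statements from working on the same records at the same time.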
The product scope of Libelle DataMasking includes the so-called reference database. It contains the possible target values that anonymization draws on. In addition to Libelle's own reference database, customer-specific reference files can also be used as a basis. Depending on the project concept, these reference files can be provided via a separate workflow or generated automatically with the help of the software. A common error is that the file is either not registered as a reference file or not activated in the configuration that requires it. Another possible source of error is that, with automatic generation, the file was never created by the software as a so-called anonymization activity.
Out of the box, the Libelle DataMasking software currently contains 40 anonymization algorithms. Some algorithms manage without additional parameters. These include, for example, the first name algorithm. Others, such as the address algorithm, require additional parameters. The parameters are used to specify exactly how a field is to be anonymized.
In our projects, we see time and again that parameters are defined incorrectly or that parameters are specified that the respective algorithm does not support. In addition, some algorithms require ID values to be assigned. For example, in the case of addresses, a data set consisting of street, house number, postal code and city forms a group. This group is identified by the ID value. If a table contains, for example, both the primary and the secondary residence, i.e. two address groups, a unique ID must be assigned to each group.
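Both kinds of mistake, unsupported parameters and address parts that end up in the wrong group, can be caught by a simple check before a run. The following sketch uses a purely hypothetical rule format. The algorithm names, the parameter names `part` and `group_id`, and the field names are assumptions and do not reflect the actual configuration of Libelle DataMasking.

```python
from collections import Counter

# Hypothetical catalogue: algorithm name -> parameters it accepts.
SUPPORTED = {
    "first_name": set(),
    "address": {"part", "group_id"},
}

def validate(rules):
    """Return a list of human-readable problems found in the field rules."""
    problems = []
    for field, rule in rules.items():
        algorithm = rule["algorithm"]
        params = rule.get("params", {})
        if algorithm not in SUPPORTED:
            problems.append(f"{field}: unknown algorithm {algorithm!r}")
            continue
        unsupported = set(params) - SUPPORTED[algorithm]
        if unsupported:
            problems.append(f"{field}: unsupported parameters {sorted(unsupported)}")
    # Within one address group, each part (street, city, ...) may occur only once.
    seen = Counter(
        (rule["params"].get("group_id"), rule["params"].get("part"))
        for rule in rules.values()
        if rule["algorithm"] == "address" and "params" in rule
    )
    for (group_id, part), count in seen.items():
        if count > 1:
            problems.append(f"address group {group_id}: part {part!r} assigned {count} times")
    return problems

rules = {
    "FIRST_NAME": {"algorithm": "first_name", "params": {"gender": "f"}},
    "STREET_1": {"algorithm": "address", "params": {"part": "street", "group_id": 1}},
    "CITY_1": {"algorithm": "address", "params": {"part": "city", "group_id": 1}},
    # Secondary residence: the street was given group 1 by mistake instead of 2.
    "STREET_2": {"algorithm": "address", "params": {"part": "street", "group_id": 1}},
    "CITY_2": {"algorithm": "address", "params": {"part": "city", "group_id": 2}},
}

problems = validate(rules)
for problem in problems:
    print(problem)
```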
I count this phenomenon among the classics in SAP-specific projects. It is always interesting and astonishing to see how customers fill the search and matchcode fields in their SAP systems and what kinds of values these fields can hold.
A fairly harmless example: only the first search field is filled, but with both the first and the last name. In our SAP standard templates, we have chosen a certain way of filling these fields. This setting must be adapted to the specific customer's needs.
SAP systems that do not yet run on SAP HANA usually contain cluster and pool tables in addition to transparent tables. These tables can only be read at ABAP level. To make this possible, a Libelle-specific function module must be transported into the systems.
The pitfall here is that software updates also contain updates of the function module, and importing the new transport file is often forgotten. Libelle DataMasking checks the version of the function module and issues an error message in the event of a discrepancy.
An anonymization run can fail for a variety of reasons. The causes can also lie outside the software, for example a full file system or a full tablespace. After the error has been corrected, the anonymization can be continued. It is important to check whether restart points are activated. With their help, the anonymization can resume exactly at the data record where the error occurred. If the restart points are not active, the last table being processed is anonymized again from the start, although part of its data has already been anonymized.
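The difference between a run with and without restart points can be shown with a small simulation. It is a generic checkpointing sketch, not the restart mechanism of Libelle DataMasking: the `checkpoint` dictionary, the `fail_at` parameter and the simulated error are all hypothetical.

```python
rows = ["r0", "r1", "r2", "r3", "r4"]

def anonymize(rows, processed, checkpoint, fail_at=None):
    """Process rows in order and record the next row to handle in the checkpoint."""
    for i in range(checkpoint.get("next_row", 0), len(rows)):
        if i == fail_at:
            raise OSError("simulated failure: file system full")
        processed.append(i)           # stands in for masking row i
        checkpoint["next_row"] = i + 1

# With restart points: the checkpoint survives the failure.
processed_with = []
checkpoint = {}
try:
    anonymize(rows, processed_with, checkpoint, fail_at=3)
except OSError:
    pass
anonymize(rows, processed_with, checkpoint)

# Without restart points: the second run starts at the first row again.
processed_without = []
try:
    anonymize(rows, processed_without, {}, fail_at=3)
except OSError:
    pass
anonymize(rows, processed_without, {})

print(processed_with)
print(processed_without)
```

In the second variant, rows 0 to 2 are handled twice, which corresponds to anonymizing data that has already been anonymized.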
In some projects, we reach quite a high degree of customizing. Extensions that deviate from the standard (e.g. scripts) are taken into account during a software update. However, if the software is reinstalled in parallel, the customizing has to be transferred "laboriously" to the new environment: on the one hand, additional files must be copied, and on the other hand, the settings must be adjusted again within the tool itself.
I also count this case among the classic errors in projects. Sometimes fields that are part of a table's primary key have to be anonymized. A typical example is the table TIBAN in SAP. Our algorithm in Libelle DataMasking recalculates IBAN values so that they remain valid. But the quality of the original data often throws a spanner in the works.
To stay with the example: missing check digits occur again and again, so the system contains the same data record once with and once without check digits. The algorithm, however, recalculates a valid check digit in every case, even where it was missing in the original record. This constellation creates duplicates, with the result that the primary key, which was dropped beforehand in order to process these fields, can no longer be recreated.
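How a recalculated check digit turns two distinct records into duplicates can be reproduced in a few lines. The sketch uses the standard IBAN check digit calculation (mod 97) and the widely published example IBAN DE89 3704 0044 0532 0130 00. The record layout as a tuple of country, check digits and BBAN is an assumption for illustration and does not mirror the actual structure of TIBAN. The masking of the account data itself is left out so that only the effect of the recalculation is visible.

```python
def iban_check_digits(country: str, bban: str) -> str:
    """Compute the two IBAN check digits with the mod 97 method."""
    rearranged = bban + country + "00"
    # Letters become two-digit numbers: A=10, B=11, ..., Z=35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return f"{98 - int(numeric) % 97:02d}"

def recalculate(record):
    """Return the record with freshly calculated, valid check digits."""
    country, _old_check, bban = record
    return (country, iban_check_digits(country, bban), bban)

originals = [
    ("DE", "89", "370400440532013000"),  # check digits maintained
    ("DE", "", "370400440532013000"),    # check digits missing in the source data
]
recalculated = [recalculate(record) for record in originals]

unique_before = len(set(originals))
unique_after = len(set(recalculated))
print(unique_before, unique_after)
```

Before the recalculation, the two records differ in the key. Afterwards they are identical, so a unique key over these fields can no longer be created.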
In many projects, it is important that address data is not changed arbitrarily but remains in its original region. This can also be implemented with Libelle DataMasking.
However, this requires high-quality original data. If the region (e.g. state) or country is not maintained cleanly, the anonymized values end up in a completely different area. If the data in the system is incomplete, customizing can be used to create a mapping so that the addresses remain in their original region even after anonymization.
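The idea of such a mapping can be sketched as follows. The region codes, the city lists and the mapping entries are invented for this example, and the hash-based selection is only a stand-in for the actual address algorithm.

```python
import hashlib

# Hypothetical reference data: replacement cities per region code.
CITIES_BY_REGION = {
    "BY": ["Augsburg", "Regensburg", "Passau"],
    "NW": ["Bielefeld", "Bonn", "Aachen"],
}

# Hypothetical customer-specific mapping that cleans up inconsistent region values.
REGION_MAPPING = {"BAYERN": "BY", "BAVARIA": "BY", "NRW": "NW"}

def normalize_region(raw):
    """Trim and upper-case the raw value, then translate known variants to a code."""
    value = (raw or "").strip().upper()
    return REGION_MAPPING.get(value, value)

def replacement_city(original_city, raw_region):
    """Pick a replacement city from the same region, or None if the region is unknown."""
    pool = CITIES_BY_REGION.get(normalize_region(raw_region))
    if pool is None:
        return None  # without a usable region, the address cannot stay in its region
    digest = hashlib.sha256(original_city.encode("utf-8")).digest()
    return pool[digest[0] % len(pool)]

clean = replacement_city("Munich", "BY")
mapped = replacement_city("Munich", " Bayern ")
unknown = replacement_city("Munich", "Bavaria-South")
print(clean, mapped, unknown)
```

Values that neither match a region code nor appear in the mapping are reported as unknown here, so gaps in the original data become visible instead of silently moving the address to another region.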
With Libelle DataMasking, Libelle IT Group has developed a solution for the required anonymization and pseudonymization. The solution was designed to produce anonymized, logically consistent data on development, test and QA systems across all platforms. Learn more about our solutions and get your free whitepaper.