Some scenarios call for one-off or recurring data analysis on very large data pools, such as predicting shopping behaviour, analysing customer loyalty, retention management, or early risk detection.
Beyond the purely technical requirements and dependencies, such analyses pose two challenges in particular:
Depending on the complexity of the data structures, such analyses either take considerable runtime or require powerful systems to shorten it. But on which systems can you actually run such data analyses efficiently? In many cases, the production systems intended for operational use are unsuitable: the performance demands of complex analyses slow down day-to-day operations, which becomes a recurring annoyance when analyses run regularly.
Data protection authorities increasingly insist that sensitive personal data may not be used for extensive profiling at the individual level without the consent of the data subjects. The GDPR has brought a significant tightening of the regulations, and above all of the potential penalties for non-compliance. This raises the question: forgo data analysis altogether, or neutralise the data, even if its analytical value suffers massively?
Use dedicated systems that are reserved exclusively for data analysis and reporting. This way you avoid investing in hardware that sits largely idle most of the time yet is still too weak when analyses are due, and you can restrict the circle of users for these analysis systems. Operate an analysis system that is regularly refreshed with current and sensibly (!) anonymised production data.
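What "sensibly anonymised" can mean in practice is sketched below: direct identifiers are dropped, join keys are pseudonymised with a salted hash so tables can still be linked, and quasi-identifiers such as age and postcode are coarsened while the analytically relevant figures remain intact. The column names, salt handling and coarsening rules are illustrative assumptions, not part of any Libelle product.

```python
import hashlib

def pseudonymise(value: str, salt: str = "rotate-this-salt") -> str:
    """Replace an identifier with a stable, non-reversible token so that
    records can still be joined across tables without exposing the person."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

def anonymise_record(record: dict) -> dict:
    """Drop direct identifiers, pseudonymise join keys, coarsen quasi-identifiers."""
    decade = (record["age"] // 10) * 10
    return {
        # joinable across tables, but not reversible without the salt
        "customer_key": pseudonymise(record["customer_id"]),
        # age coarsened to a ten-year band
        "age_band": f"{decade}-{decade + 9}",
        # postcode coarsened to its first two digits
        "region": record["postcode"][:2] + "xxx",
        # the analytically relevant figure is kept as-is
        "total_spend": record["total_spend"],
    }

record = {"customer_id": "C-100042", "name": "Jane Doe",
          "age": 34, "postcode": "70173", "total_spend": 1280.50}
print(anonymise_record(record))
```

Note that salted hashing is pseudonymisation rather than full anonymisation; for a refresh pipeline the salt would need to be managed (and rotated) outside the analysis system.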
Rely on data analysis or reporting in the cloud. The advantage: resources are provisioned as needed and billed by usage. In concrete terms, this means that you operate - and pay for - your cloud-based analysis and reporting systems only when you actually need them.
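The economics behind usage-based billing can be made concrete with a back-of-the-envelope comparison of an always-on analysis server versus a cloud system that is started only for analysis runs. All prices and run times here are hypothetical assumptions chosen purely to illustrate the billing model.

```python
HOURS_PER_MONTH = 730  # average hours in a month

def always_on_cost(hourly_rate: float) -> float:
    """Dedicated hardware/VM running around the clock, mostly idle."""
    return hourly_rate * HOURS_PER_MONTH

def on_demand_cost(hourly_rate: float, analysis_hours: float) -> float:
    """Cloud system started for analysis runs only, billed per hour used."""
    return hourly_rate * analysis_hours

# Hypothetical large instance at 2.50 per hour, needed for
# roughly 40 hours of analysis work per month:
print(always_on_cost(2.50))      # always-on: 1825.0 per month
print(on_demand_cost(2.50, 40))  # usage-based: 100.0 per month
```

The gap narrows, of course, the more hours per month the analysis system is actually in use.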
Setting up and permanently operating such systems is much easier than many companies and departments still believe, because Libelle and BasisTeam provide support with reliable best practices.
During functional planning, we typically work with your application, process and data protection managers to define the optimal operating model. This includes:
The technical planning is followed by the implementation phase, in which we work with you either to build the analysis/reporting system in the cloud from scratch (greenfield) or to migrate an existing system to the cloud (Migrate2Cloud). In doing so, we draw on the comprehensive know-how of our consultants and, depending on the implementation option, on software solutions such as Libelle DBShadow or Libelle SystemClone.
Regular operation is usually divided into four main phases: