
Data, Hardware, Public Cloud

Technical Requirements for the Application of Artificial Intelligence

An article by Niels Pothmann, Head of AI, and Andreas Tamm, Lead Enterprise Architect, at Arvato Systems.


To benefit from Artificial Intelligence, companies must first create the necessary conditions for it. That concerns the corporate strategy as well as the overall organization and the available technology, which plays an essential role in the implementation of AI projects. The following article will outline which technological prerequisites are necessary and what companies should pay attention to in this context.

Increasing digitization as a data generator

As digitization progresses, new data is created in companies in a wide variety of places. This data comes, among other things, from process monitoring and processing, customer touchpoints, and production lines. With the Internet of Things, data is increasingly being generated directly by products and services. Typically, the information obtained from this data is stored in a wide variety of databases that are not connected. However, the potential of AI applications can often only be realized by connecting these individual databases, for example by combining master data and transaction data for predictive maintenance applications.
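To make the idea of such a connection concrete, here is a minimal, purely illustrative Python sketch that joins machine master data with transaction-style sensor records on a shared machine ID, as a predictive maintenance application would require. All field names and values are invented for this example.

```python
# Hypothetical example: enriching transaction (sensor) records with
# machine master data via a shared machine ID -- the kind of linkage a
# predictive maintenance application needs. All names are illustrative.

master_data = {
    "M-001": {"type": "press", "install_year": 2015},
    "M-002": {"type": "lathe", "install_year": 2019},
}

transactions = [
    {"machine_id": "M-001", "vibration_mm_s": 4.8},
    {"machine_id": "M-002", "vibration_mm_s": 1.2},
    {"machine_id": "M-001", "vibration_mm_s": 5.3},
]

def join_records(master: dict, txns: list) -> list:
    """Enrich each transaction with the machine's master-data attributes."""
    return [
        {**txn, **master[txn["machine_id"]]}
        for txn in txns
        if txn["machine_id"] in master  # skip records with unknown machines
    ]

enriched = join_records(master_data, transactions)
```

In practice this join would happen in a database or data pipeline rather than in application code, but the principle is the same: only the combined view contains enough context for the AI model.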


Flexible data access

Machine Learning (ML) specialists first need access to data points in the various databases. In most cases, the individual databases are based on different technologies, are often not compatible with each other, and are in productive use. Given these starting conditions, it must be ensured that the databases can be accessed flexibly but with a high level of data security. For these reasons, many companies insert a new data layer containing previously prepared data relevant to Artificial Intelligence and Machine Learning. One frequently used option in this context is the so-called Data Lake. Such a layer ensures that data can be accessed flexibly while all data security requirements, such as the German Federal Data Protection Act (BDSG) and the GDPR (DSGVO), are met.
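As an illustration of what such a prepared data layer can do, the following Python sketch normalizes a record from a hypothetical source system into a shared schema and pseudonymizes the personal identifier before ML specialists get access. The schema and field names are assumptions for this example, not a reference implementation.

```python
# Minimal sketch of a prepared data layer: records from heterogeneous
# source systems are mapped onto one shared schema, and personal fields
# are pseudonymized for data protection. Field names are assumptions.

import hashlib

def pseudonymize(value: str) -> str:
    """Replace a personal identifier with a stable, non-reversible token."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def to_data_layer(record: dict, source: str) -> dict:
    """Map a source-specific record onto the shared data-layer schema."""
    return {
        "source": source,
        "customer_token": pseudonymize(record["customer_email"]),
        "amount": float(record["amount"]),  # normalize to a numeric type
    }

# A record as it might arrive from a CRM system (invented example data).
crm_record = {"customer_email": "jane@example.com", "amount": "129.90"}
layer_row = to_data_layer(crm_record, source="crm")
```

Because the pseudonymization is one-way, the data layer can be opened up to ML teams without exposing the personal identifiers held in the source systems.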



Big Data alone is not decisive: quality is what counts

The procedures used in AI, such as deep learning networks, cannot be trained and set up in the required quality without data. However, more data does not necessarily lead to higher-quality results. What is needed is data whose quality and content suit the underlying AI objective, ideally in large quantities. An AI can, therefore, only be as good as the data with which it is fed. In reality, however, it is precisely this data quality that often poses a significant challenge. Frequently, the essential information in the data fields of applications is not maintained as intended but is ambiguous or stored in different ways. A further, technical challenge is that the various source systems are not updated in the same cycle: the available information then refers to different points in time, which reduces data quality further. Achieving the required data quality therefore takes considerable maintenance effort. Data must either be cleaned up at the root, i.e., in the source system in which it is created, or later by Machine Learning and data quality experts in the course of building the individual AI models.
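The following small Python sketch illustrates cleaning of this kind: the same information stored in different ways is normalized to one representation, and records missing essential fields are rejected. The field names and alias table are invented examples.

```python
# Illustrative data-quality step: the same information stored in
# different ways ("DE", "germany", "Deutschland") is normalized, and
# records missing essential fields are rejected. Names are examples.

COUNTRY_ALIASES = {"de": "DE", "germany": "DE", "deutschland": "DE"}

def clean(record: dict):
    """Return a normalized record, or None if essential fields are missing."""
    if not record.get("customer_id"):
        return None  # essential information not maintained -> unusable
    country = record.get("country", "").strip().lower()
    record["country"] = COUNTRY_ALIASES.get(country, country.upper())
    return record

raw = [
    {"customer_id": "C1", "country": "Deutschland"},
    {"customer_id": "",   "country": "DE"},       # missing essential field
    {"customer_id": "C2", "country": "germany"},
]
cleaned = [r for r in (clean(rec) for rec in raw) if r is not None]
```

Real cleaning pipelines also handle the timing problem mentioned above, for example by stamping each record with the extraction time of its source system so that snapshots from different cycles can be reconciled.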



AI in the development process

The development of AI solutions often begins with prototypes, in which the feasibility of the target is tested against the company's own data. AI solutions can make or prepare "intelligent" decisions based on data. They are always part of a larger whole, for example an application, a service, or a technical backend, through which the underlying AI logic is made available to the end-user or a process. Added business value is therefore only created when the applications, IT systems, and services developed are integrated into the company's productive processes.

The training of AI models that then starts is an iterative process: the earlier the Machine Learning experts receive feedback on model quality during training, the faster further improvements can be introduced. In addition to suitable data sources, this also requires the right hardware. Particularly in Big Data, natural language, and computer vision applications, computing power and the resulting speed play a key role. If model training can be shortened from several days to a few hours, the development speed and productivity of the AI teams improve considerably. The challenge is to set up high-performance hardware clusters that are easily accessible via machine-learning-friendly tool stacks and match the technology of the data sources. Solving these technical challenges is essential for an effective AI development process.
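The iterative feedback loop described above can be sketched as follows. This toy Python example fits a single parameter with gradient descent, scores it on held-out data after every round, and stops as soon as validation quality stops improving; it stands in for real model training only schematically, and all numbers are invented.

```python
# Sketch of the iterative feedback loop: after each training round the
# model is scored on held-out data, so quality feedback arrives early
# and training can stop once it no longer improves. Toy 1-D model only.

def train_step(weight, data, lr):
    """One gradient step for the model y ~ weight * x (squared error)."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad

def validation_error(weight, data):
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)

train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # roughly y = 2x
val = [(1.5, 3.0), (2.5, 5.1)]                 # held-out data

weight, best, patience = 0.0, float("inf"), 0
for epoch in range(200):
    weight = train_step(weight, train, lr=0.05)
    err = validation_error(weight, val)        # early feedback each round
    if err < best - 1e-6:
        best, patience = err, 0
    else:
        patience += 1
        if patience >= 5:                      # quality stopped improving
            break
```

The same pattern, with validation feedback after every epoch and early stopping, is what faster hardware accelerates: each loop iteration gets shorter, so the team sees model-quality feedback sooner.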

Public Cloud as an enabler

All the requirements described so far pose significant challenges for traditional IT departments, including both the provision of the right infrastructure and the need to always be able to offer the latest technologies and services. For this reason, it makes sense to rely on the public clouds of the major technology providers (Amazon AWS, Microsoft Azure, Google Cloud). These platforms offer scalable, state-of-the-art services at comparatively low prices. In the area of data, these include, for example, data factories for preparing data, the data lakes mentioned above for storing large amounts of data, and services for visualizing data. The cloud providers also offer a wide range of services specifically for Artificial Intelligence and Machine Learning: from pre-configured virtual machines with the corresponding AI/ML tools, through fully managed services for creating models, to ready-trained models for voice, text, and object recognition in images and videos. All these services are available on demand, highly scalable, and highly available. In this way, teams can concentrate on the essentials when developing AI solutions and thus achieve the greatest benefit for the company.



Conclusion

In summary, the technical requirements for implementing AI projects can be described as follows: access to high-quality data, competence in system integration, and high-performance hardware combined with public cloud services are essential. Structured, professional support in using AI and operating the corresponding systems can also help to overcome potential challenges, especially as access to AI technologies is becoming increasingly standardized and straightforward.


This article is part of the Luenendonk Magazin, which can be found for download in its full length here (German only).