Machine Learning / Python

"Machine Learning" is where Big Data meets Artificial Intelligence. Data analysts have, in recent years, demonstrated some spectacular successes in the innovative interpretation of large data volumes, but subtle correlations and weak signals often escape the human eye. When quantities of interest are correlated with a large number of data streams, some of them having opposing effects, it becomes necessary to apply computing techniques to identify and evaluate these correlations. "Supervised" Machine Learning systems rely on a large quantity of "examples" to "learn" to recognize such correlations. Modern programming languages such as Python have made it possible to create such models with relative ease. With the analysis written in Python and the speed-critical code libraries written in C/C++, the data scientist gets an optimal combination of speed of coding and speed of execution.

Using an example many might recognize from school days and/or work, the figure below shows the age and height (the x-markings/data points) for a set of trees. Trees generally grow with age, making it possible to draw an upwards sloping straight line closely matching the entire set of data points. This method is known as a linear regression.
Figure: An example of a linear regression where the points representing a tree's age and height are approximated by a line.
As a result, it is now possible to make a good prediction of a tree's height, given its age. When a machine determines the height by first calculating the line of best fit through the input data, this can be seen as a very simple example of machine learning. By adding further parameters, such as the fertility of the soil, to determine the height more accurately, we carry out a multiple regression (often loosely called a multivariate regression). Many machine learning models use exactly this technique.
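
The tree example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the measurements, the soil fertility scores and the names `predict_height` and `predict_height_multi` are all hypothetical and chosen only to show the idea. NumPy's `polyfit` and `linalg.lstsq` are used for the least-squares fits.

```python
import numpy as np

# Hypothetical tree measurements (illustrative values only).
age_years = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
height_m = np.array([1.1, 2.3, 2.9, 4.2, 4.8, 6.1])

# Simple linear regression: height is approximated by slope * age + intercept.
# For degree 1, np.polyfit returns the coefficients as [slope, intercept].
slope, intercept = np.polyfit(age_years, height_m, deg=1)


def predict_height(age):
    """Predict a tree's height from its age using the fitted line."""
    return slope * age + intercept


# Adding a second explanatory variable (a hypothetical soil fertility score)
# turns the problem into a regression with several predictors.
soil_fertility = np.array([0.3, 0.6, 0.4, 0.8, 0.5, 0.9])
design = np.column_stack([np.ones_like(age_years), age_years, soil_fertility])
coefficients, *_ = np.linalg.lstsq(design, height_m, rcond=None)


def predict_height_multi(age, fertility):
    """Predict height from age and soil fertility."""
    return float(coefficients @ np.array([1.0, age, fertility]))


# Sum of squared residuals for both models, to compare the quality of fit.
rss_simple = float(np.sum((height_m - predict_height(age_years)) ** 2))
rss_multi = float(np.sum((height_m - design @ coefficients) ** 2))

print(f"Predicted height at age 7: {predict_height(7.0):.2f} m")
print(f"Residual sum of squares: simple={rss_simple:.4f}, multi={rss_multi:.4f}")
```

Because the second model contains the first as a special case, its residual sum of squares on the training data can never be larger. Whether the extra parameter improves predictions for new trees is a separate question that would need held-out data.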

Data Science / Big Data

"Big Data" is a generic expression that refers to large amounts of unstructured and semi-structured data produced at high velocity. Modern mass-data handling technology, such as Hadoop, has enabled companies and institutions to capture all data produced either in-house or in their dealings with their environment. A small part of this data is well known and has 'always' been processed, e.g. financial transaction data used for financial accounting.

With the digitalization of virtually all business processes, many of these have started to generate their own data. Important business opportunities may lie hidden in this data. Recording individual customer transactions yields an overview of revenue over time for each customer and obviates the need for sampling and statistics ("n = all"). This may provide important insights for honing the marketing strategy and/or for more exact production planning. Keeping track of production parameters for individual products and analyzing these against rejects and/or customer complaints may allow for root-cause defect analysis.

Not all of this data is readily accessible for analysis. Much of it lies hidden in company ERP systems, system logs or even the internal data storage of equipment. Furthermore, only a small proportion of the data is immediately useful. Finding value in the analysis of this data is the domain of the data scientist/analyst. The activity of generating insights from such data is referred to as data mining.

Corporate executive management must carefully embed and monitor big data projects within the corporate strategy so as to give them the necessary credibility to succeed. Past experience has shown that, without top management buy-in, these projects generally fail or remain far below their potential. Typical pitfalls such as irrelevant results, departmental data egoism and business/IT language barriers can only be avoided by executive attention and managerial "business/strategy/IT bridge-building".

Artificial Intelligence

"Artificial Intelligence" (A.I.) is any form of intelligence exhibited by computing systems. The concept is somewhat poorly defined, which can be illustrated by the so-called Turing test for artificial intelligence. The test is used to determine a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. As early as the 1960s, machines that turned statements typed by test persons into questions were deemed intelligent by many.

At present, A.I. is most practically defined through its principal areas of development which include:
  • Pattern recognition, including the processing of human speech (Natural Language Processing, NLP) and advanced forms of Business Intelligence
  • Game solving (e.g. Chess/Go)
  • Steering equipment and machines in e.g. production and households (robots), traffic (self-driving cars) etc.
  • "General A.I.", i.e. the ability of a machine to successfully perform any intellectual task that a human being can perform (presently still decades away).
Many of these applications use "machine learning" and/or "deep learning" (see the corresponding sections) and have already begun to be implemented in business. NLP helps improve the service perceived by users, from call centers to human-machine interfaces. Game-solving strategies underlie modern navigation systems, among other things. Leading law firms have started using pattern recognition systems to analyze texts and identify relevant legal precedents.

Autonomous Vehicles

Perhaps the most spectacular application of machine learning technology is the ongoing research and development of "autonomous vehicles". These are motor vehicles whose control is taken over by Machine Learning systems. They rely on input data from cameras, LIDAR* and other sensors. Through "training" during many hours of human-controlled driving, they identify the connection between the situation on the road and the appropriate action. Such actions include setting the correct direction, braking, accelerating etc. Similar systems are under development for ships and planes. The market-readiness of these systems is a subject of controversy. Presumably they will start to make a substantial market impact in the early 2020s. The introduction of these vehicles raises urgent questions regarding road regulation, the dimensioning of transport systems, and urban and housing development.

Deep Learning / Neural Networks

A special type of M.L. system is the neural network, which loosely mimics the functioning of the brain and is better suited to identifying non-linear relationships than the more 'traditional' machine learning techniques. The recent construction and use of neural networks with many layers has become known as deep learning.
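
The point about non-linear relationships can be illustrated with the classic XOR problem. The sketch below is an assumption-laden toy example rather than a trained model: the weights of the small network are set by hand (no learning takes place) purely to show that one hidden layer with a non-linear activation can represent a relationship that no straight-line fit can. All names (`hidden_weights`, `network`, etc.) are hypothetical.

```python
import numpy as np

# XOR: the output is 1 when exactly one of the two inputs is 1.
inputs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0.0, 1.0, 1.0, 0.0])

# Best possible linear model (least squares with an intercept column).
design = np.column_stack([np.ones(len(inputs)), inputs])
linear_coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
linear_pred = design @ linear_coef  # the same value for every input row

# A tiny neural network: 2 inputs -> 2 hidden units (ReLU) -> 1 output.
# The weights are hand-picked for illustration, not learned from data.
hidden_weights = np.array([[1.0, 1.0],
                           [1.0, 1.0]])   # shape: (inputs, hidden units)
hidden_bias = np.array([0.0, -1.0])
output_weights = np.array([1.0, -2.0])


def relu(x):
    """Rectified linear unit: the non-linearity between the layers."""
    return np.maximum(x, 0.0)


def network(x):
    """Forward pass of the hand-built two-layer network."""
    hidden = relu(x @ hidden_weights + hidden_bias)
    return hidden @ output_weights


network_pred = network(inputs)

print("Linear model predictions: ", linear_pred)
print("Network predictions:      ", network_pred)
print("Targets:                  ", targets)
```

In practice the weights would be found by training on examples, typically with gradient descent and backpropagation, and deep learning stacks many such layers.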

*Light Detection And Ranging
Selected Experience
  • Predictive Maintenance: Predictive Maintenance Feasibility Study, Cologne 2017
  • Big Data: Addressable Telco Spend Analysis (100,000-entry purchase order book), SEE 2016
  • Machine Learning: Sales support for self-learning churn reduction system, Bonn 2012
  • Data Analytics: Construction of mathematical model for market size projection, Frankfurt 2008
© 2017 Dr. Eduard H. van Kleef