## Successful Machine Learning …

#### … requires a comprehensive understanding of the data ecosystem and a concerted effort across seven research tracks: 1) data integration, 2) learning algorithms, 3) graphs, 4) topology, 5) entropy, 6) visualization, and 7) privacy, data protection, safety and security (see Conference CD-MAKE):

01 TRACK DAT – DATA INTEGRATION

*Motivation for our research:* Before we can apply machine learning algorithms to heterogeneous and complex data sets, we have to work carefully on data integration, data fusion, data mapping and data pre-processing, to avoid the danger of modelling artifacts or of losing information. Data integration in the life sciences is a hot topic in the international research community; to date there is no solution for fusing, e.g., *omics data with the electronic patient record.

*Our goal:* We want to understand the underlying physics of complex, high-dimensional and weakly-structured data sets.

*Keywords:* data preprocessing, data fusion, data integration, data mapping, data cleansing, multivariate data, high-dimensional data, complex data, sparse data, heterogeneous data, relevant data, dirty data, noisy data, big data, little data, medical data, biological data, biomedical data, *omics data (e.g. from genomics, proteomics, metabolomics, etc.).

*Key conferences:* CD-MAKE, DILS, ICDE, DEXA, ITBAM, DATA

*Key journals:* KAIS

**02 CENTRAL TRACK \*ML – MACHINE LEARNING (\*aML = automatic; iML = interactive)**

*Motivation for our research:* Machine learning evolved from statistical learning theory, and we are undertaking theoretical, algorithmic and experimental machine learning studies to try to understand the problem of knowledge discovery. We use aML for big data and computationally solvable problems, and iML for small and complex data and NP-hard problems. We are interested in evolutionary algorithms, nature-inspired computing and multi-agent systems.

*Our goal:* Our long-term goal is to contribute to advances in ML and to apply them to knowledge discovery for health.

*Keywords:* Deep learning, nature-inspired computing, Boltzmann machines, statistical learning theory, evolutionary algorithms, genetic programming, ensemble learning, optimization, natural computation, swarm intelligence, multi-agents, neural computation, integrative data mining.

*Key conferences:* CD-MAKE, NIPS, ICMLA, ICML, ECML, PKDD, DEXA, BIH, KDD, COLT, DS, ICDM, SIGKDD, topic can be found at many conferences …

*Key journals:* KAIS, JMLR, MACH, INS, topic can be found in many journals

*Applied journals:* BMC Medical Informatics and Decision Making, and many others

03 TRACK GDM – GRAPH-BASED DATA MINING

*Motivation for our research:* Graph theory provides powerful tools to map data structures and to discover novel connections among data sets. Graph structures can be analyzed by using statistical and machine learning techniques.

*Our goal:* Our goal is to develop promising novel approaches for interactive knowledge discovery by blending graph-based methods with graph entropy and multi-touch interaction, which requires investigations beyond small-world and random networks.

*Keywords:* Watershed algorithm, region splitting, graph cuts, region merging, mathematical morphology, network, relative neighborhood graph, multi-touch, graph manipulation, graph interaction, sphere-of-influence graph, small world theory, cyclicity, branching.

*Key conferences:* ICDM, HPGM, MLDM

*Key journals:* KAIS, INS

04 TRACK TOP – TOPOLOGICAL DATA MINING

*Motivation for our research:* Often we are confronted with point cloud data sets sampled from an unknown high-dimensional space. We use the shape of the data to identify features, aiming to recover the topology of that space. Topological data analysis has a strong connection to machine learning.

*Our goal:* We seek to gain a deep understanding of the underlying data structures and to contribute towards the application of topological data analysis for advances in machine learning.

*Keywords:* Topology and data, point cloud data sets, Euclidean space, metrics, measure theory, topological text mining, algebraic topology, alpha shapes, Betti number, homology group, computational geometry, contour, Delaunay, Gromov norm, Hausdorff space, homomorphism, homological algebra, homotopy, isometry, Reeb graph, simplex, simplicial complex, combinatorics.

*Key conferences:* CTIC, TOPOVIS

*Key journals:* topic can be found in many different journals

05 TRACK EDM – ENTROPY-BASED DATA MINING

*Motivation for our research:* Information entropy, originally used by Shannon (1949) as a measure of uncertainty in data, has evolved into a vast research area of its own, with advances in many different directions.
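As a minimal illustration of entropy as a measure of uncertainty, the sketch below computes the Shannon entropy (in bits) of the empirical symbol distribution of a sequence. The function name `shannon_entropy` and the toy inputs are assumptions made for this example only; they are not taken from our tooling.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of `symbols`.

    Hypothetical helper for illustration: H = sum_i p_i * log2(1 / p_i),
    where p_i is the relative frequency of the i-th distinct symbol.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty sequence carries no uncertainty
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # A constant sequence is fully predictable; a uniform one is maximally uncertain.
    for sample in ("AAAA", "ABAB", "ABCD"):
        print(sample, shannon_entropy(sample))
```

A sequence with a single repeated symbol yields zero entropy, while entropy grows as the symbols become more evenly spread over more distinct values.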

*Our goal:* Our goal is to contribute towards advances in the application of learning algorithms and entropy for use in knowledge discovery and data mining, in order to discover unknown unknowns in complex data sets, e.g. for biomarker discovery in biomedical data sets.

*Keywords:* Topological entropy, graph entropy, sample entropy, FiniteTopEn, approximate entropy, information theory, artifact, data quality, dirty data, dirty time-oriented data, longitudinal data, dynamical system, heart-rate variability, noise, Shannon entropy, Lebesgue, measure theory.

*Key conferences:* topic can be found at many conferences

*Key journals:* ENTROPY, KAIS, INS

06 TRACK DAV – DATA VISUALIZATION

*Motivation for our research:* Humans are excellent at pattern recognition in dimensions of less than 3; however, most biomedical data sets are in dimensions much higher than 3, which often makes manual analysis impossible.

*Our goal:* Our goal is to map results from these arbitrarily high-dimensional spaces into lower dimensions.
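One common way to perform such a reduction is principal component analysis (PCA). The sketch below is only an illustration under stated assumptions: it uses NumPy, the helper name `pca_project` is hypothetical, and the random input stands in for real data. PCA is named here as one example technique, not as the method used in this track.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their first `n_components` principal axes.

    Hypothetical helper for illustration; a plain PCA via singular value
    decomposition of the mean-centered data matrix.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # remove the per-feature mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    high_dim = rng.normal(size=(100, 10))  # 100 synthetic samples, 10 features
    low_dim = pca_project(high_dim, n_components=2)
    print(low_dim.shape)
```

The two resulting columns can be drawn directly as a scatter plot; nonlinear alternatives such as multi-dimensional scaling follow the same input/output pattern.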

*Keywords:* Data Visualization, Human-computer interaction, dimensionality reduction, visual analytics, visual data mining, scalable network visualization, graph visualization, networks, topological visualization, interactive visual analytics, IVA, factor analysis, decision tree, multi-dimensional scaling, MDS, regression analysis.

*Key conferences:* CGVC, EuroVis, VISIGRAPP

*Key journals:* topic can be found in many journals

07 TRACK DAP – DATA PRIVACY, SAFETY, SECURITY AND DATA PROTECTION

*Motivation for our research:* As soon as you deal with any sort of human personal data (e.g. medical data sets), addressing issues of privacy, data protection, safety, and security is mandatory. This also includes issues of ethics and acceptance.

*Our goal:* We want to support the international research community by generating open data sets and by making results openly available and replicable, which is a major goal in fundamental science.

*Keywords:* Privacy, Data Protection, Safety, Security, Anonymization, data-driven sciences, big data, pseudonymization, k-anonymity, business intelligence, data linkage, data precision metric, generalization, identifier, l-diversity, permutation, perturbation, quasi-identifier, open data, open data sets, open source frameworks, rule mining, structured data, suppression, t-closeness, privacy preserving data mining, fair use of data, life sciences, biological sciences, medical practice, health services, clinical workplace.

*Key conferences:* ARES, CD-MAKE

*Key journals:* topic can be found in many journals