### Machine Learning for Biomedical Knowledge Discovery

### Springer Lecture Notes in Artificial Intelligence (LNAI)

As one outcome of our BIRS workshop 15w2181, we are currently producing a volume in the Springer LNAI State-of-the-Art (SOTA) series. All papers will be carefully reviewed by at least three members of the international expert network HCI-KDD to ensure Springer's high quality standards.

##### Machine learning and Health

The life sciences are progressively turning into data-centric sciences, and machine learning can help to make sense of complex data. Consequently, major challenges in today’s biomedical domain lie in the development of new methods, algorithms and tools for the effective analysis and interpretation of complex, high-dimensional biomedical data. Within such data sets, relevant structural and/or temporal patterns (“knowledge”) are often hidden and difficult to extract, and are thus not directly accessible to a biomedical expert. A major challenge is therefore interactive Knowledge Discovery and Data Mining, which relies heavily on machine learning approaches. However, many classical methods assume that the data objects under consideration are represented as feature vectors, or collections of attribute values, whereas it has been argued that graphs have a significantly higher representational power than feature vectors. Moreover, graph theory provides powerful tools to map data structures, to find novel connections between data objects and to enable the application of statistical and machine learning techniques. Methods from computational geometry and algebraic topology may also be of great help here and could be combined with machine learning approaches, e.g. with evolutionary algorithms. A promising future research route in this field is interactive visual data mining.
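A feature vector describes each object in isolation, while a graph also records the relations between objects, so indirect connections can be found simply by traversing it. The following is a minimal sketch, assuming a small hypothetical graph of biomedical entities (the names `gene_A`, `protein_X`, etc. are illustrative only) stored as an adjacency list: breadth-first search returns a shortest chain of links between two objects that are not directly connected.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_connection(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: return a shortest chain of links from start to goal, or None."""
    previous: Dict[str, Optional[str]] = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # walk back along the recorded predecessors to rebuild the path
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:  # visit each node at most once
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # goal is not reachable from start


# Hypothetical toy graph of biomedical entities (undirected, stored in both directions).
toy_graph = {
    "gene_A": ["protein_X"],
    "protein_X": ["gene_A", "pathway_P", "drug_Z"],
    "pathway_P": ["protein_X", "disease_D"],
    "drug_Z": ["protein_X"],
    "disease_D": ["pathway_P"],
}

print(shortest_connection(toy_graph, "gene_A", "disease_D"))
```

Here `gene_A` and `disease_D` share no direct link, yet the search reports the chain `gene_A`, `protein_X`, `pathway_P`, `disease_D`; a pure feature-vector representation of the two objects would not expose this relation.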

The papers collected in this LNAI State-of-the-Art (SOTA) volume report on various challenges, problems, methods, algorithms and tools for discovering knowledge from complex data sets, applicable to the life sciences, biomedicine and health.

##### The Banff Meeting:

The 7th International Special Session of the expert network HCI-KDD took place at the Banff International Research Station for Mathematical Innovation and Discovery (BIRS) from July 24 to 26, 2015.

**Full Title:** Advances in interactive Knowledge Discovery and Data Mining in complex data

**Short Title:** Interactive Knowledge Discovery (iKDD)

**Type:** 2-day workshop, BIRS approval no. 15w2181

**Dates:** Friday, July 24 to Sunday, July 26, 2015

**AMS Subject Area:** Data Mining

**AMS Subject Code 1:** 68-Computer Science

**AMS Subject Code 2:** 92-Biology and other Natural Sciences

Videos from BIRS workshop 15w2181

**Overview:**

In the past, research in data mining has concentrated on developing efficient machine learning algorithms for analysing big data sets arising in the biomedical domain, for example for discovering patterns in biomedical data or for modelling uncertainty in clinical decision-making. With the increasing amount of biomedical and health-care data becoming available every day, structured learning and graphical models, such as probabilistic dependency networks, probabilistic decision trees, Bayesian networks and Markov random fields, are becoming popular tools in biomedical data mining research. Many problems in the area can be formulated as probabilistic inference problems.
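To illustrate what "probabilistic inference" means here, the following is a minimal sketch, assuming a hypothetical three-node Bayesian network (a disease `D` with two children, a test result `T` and a symptom `S`) and made-up probabilities that are not clinical data. It performs exact inference by enumeration: the joint distribution factorizes along the network, and the posterior is obtained by summing over all assignments consistent with the evidence. This is only feasible for tiny networks; realistic models need more efficient inference algorithms.

```python
from itertools import product
from typing import Dict

# Hypothetical conditional probability tables (illustrative numbers only).
P_D = {1: 0.01, 0: 0.99}          # prior P(D)
P_T_given_D = {1: 0.90, 0: 0.05}  # P(T=1 | D)
P_S_given_D = {1: 0.80, 0: 0.10}  # P(S=1 | D)


def joint(d: int, t: int, s: int) -> float:
    """P(D=d, T=t, S=s), factorized along the network D -> T, D -> S."""
    pt = P_T_given_D[d] if t == 1 else 1.0 - P_T_given_D[d]
    ps = P_S_given_D[d] if s == 1 else 1.0 - P_S_given_D[d]
    return P_D[d] * pt * ps


def posterior_disease(evidence: Dict[str, int]) -> float:
    """P(D=1 | evidence), where evidence may fix the observed variables "T" and "S"."""
    numerator = 0.0
    denominator = 0.0
    for d, t, s in product((0, 1), repeat=3):
        observed = {"T": t, "S": s}
        if any(observed[name] != value for name, value in evidence.items()):
            continue  # assignment contradicts the evidence
        p = joint(d, t, s)
        denominator += p
        if d == 1:
            numerator += p
    return numerator / denominator


print(posterior_disease({"T": 1, "S": 1}))
```

With these assumed numbers, a positive test alone raises the probability of disease from the 1% prior to 2/13 (about 0.15), and a positive test together with the symptom raises it to 16/27 (about 0.59).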

**Objectives:**

This workshop will borrow and adapt diverse theoretical innovations on probabilistic models and related machine learning methods from other areas. It will focus on probabilistic data mining methods, including graph-based data mining, topological data mining and other information-theoretic approaches (e.g., entropy-based data mining), as well as on the “human-in-the-loop” concept, supported by an interactive learning and optimization component, and on the visual analysis of heterogeneous and dynamic data sets. In network-based approaches, for example, open challenges include statistical extensions of graph-theoretical approaches, network visualization, the epistemological meaning of inferred networks, the structural and comparative analysis of networks, and network-based biomarkers, to mention only a few.

Classical mathematical techniques often do not fit the tasks of analyzing, comparing, classifying and retrieving complex data sets well. Topology (and in particular algebraic topology) is, by its very nature, the part of mathematics that formalizes qualitative aspects of objects; topological data processing and topological data mining therefore integrate well with more classical mathematical tools. For example, persistent homology combines geometry and algebraic topology in the study of pairs $(X, f)$, where $X$ is an object (a topological space) and $f$ is a continuous function defined on $X$, typically real-valued. One application is the extraction of topological features of an object from a cloud of sample points; features are key to learning and understanding. Another class of applications uses $f$ as the formalization of a classification criterion; in this case, different functions give different criteria, which can cooperate in a complex classifier.

Several problems arise from such settings. The first, in the application context, is the choice of suitable functions $f$. This is generally done heuristically, but parametrized spaces of such functions would be needed, and eventually a self-driving, optimized choice of $f$ for statistical learning. A second challenge is the construction of good distances: the ones presently available require exponential computation. A third problem concerns functions with a multidimensional range: functions $f\colon X \to \mathbb{R}$ give rise to diagrams whose information is condensed in a discrete (mostly finite) set of points in the plane, but if the range is $\mathbb{R}^k$, the same information is carried by $(2k-2)$-dimensional patches in $\mathbb{R}^{2k}$. A one-dimensional reduction is available, but it raises computational problems in applications.
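To make the pair $(X, f)$ concrete for the real-valued case, the following is a minimal sketch, assuming $X$ is discretized as a graph and $f$ is given by its values on the vertices, with each edge entering the filtration at the larger of its endpoint values (a lower-star filtration, one common convention). It computes only 0-dimensional persistence, i.e. the birth and death of connected components of the sublevel sets, using union-find and the "elder rule"; the function name `sublevel_persistence_0d` and the sample values are hypothetical, and loops or higher-dimensional features are not handled.

```python
from typing import List, Tuple


def sublevel_persistence_0d(values: List[float], edges: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """0-dimensional persistence pairs (birth, death) of the sublevel-set filtration of f on a graph.

    values[i] is f at vertex i; an edge enters the filtration at max(f(u), f(v)).
    Components that never merge get death = inf.
    """
    n = len(values)
    parent = list(range(n))
    birth = list(values)  # birth[r]: minimum of f over the component with root r

    def find(x: int) -> int:
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs: List[Tuple[float, float]] = []
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a loop: connected components are unchanged
        t = max(values[u], values[v])
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # make ru the older component (smaller minimum)
        # elder rule: the younger component dies at t; skip zero-length pairs
        if birth[rv] < t:
            pairs.append((birth[rv], t))
        parent[rv] = ru
    # every remaining root is a component that survives the whole filtration
    for r in {find(i) for i in range(n)}:
        pairs.append((birth[r], float("inf")))
    return sorted(pairs)


# f sampled on a path graph 0-1-2-3-4 (hypothetical values)
f_values = [0.0, 3.0, 1.0, 4.0, 2.0]
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(sublevel_persistence_0d(f_values, path_edges))
```

For this path the sketch returns `[(0.0, inf), (1.0, 3.0), (2.0, 4.0)]`: one pair per local minimum of $f$. The finite pairs are exactly points in the plane of the kind described above, and their distance from the diagonal (death minus birth) measures how prominent the corresponding minimum is.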

**Press Release:**

A worldwide problem in health systems is how to deal with increasingly large and complex sets of heterogeneous, high-dimensional data and increasing amounts of unstructured information. The trend towards preventive, predictive, participatory and personalized (P4) medicine, i.e., precision medicine, has resulted in an explosion in the amount of omics data generated, which is rarely used in clinical routine. This workshop will bring together researchers with diverse backgrounds and complementary competencies, but with common interests and a shared vision: to make sense of big data by using machine learning with the “human-in-the-loop”. In particular, this workshop aims to contribute advances in promising novel areas such as interactive graph-based, entropy-based and topological data mining.

**Keywords:**

Interactive machine learning, complex data, human-in-the-loop, cognitive computing, personalized medicine

**Organizers:**

Andreas HOLZINGER, Medical University Graz, Institute for Medical Informatics, AT

Randy GOEBEL, University of Alberta, Centre for Machine Learning, Department of Computer Science, CA

Vasile PALADE, University of Oxford and Coventry, Cogent Computing Applied Research Centre, UK

Massimo FERRI, University of Bologna, Department of Mathematics, IT

##### Confirmed Participants:

Mirko CESARINI, Università di Milano Bicocca, Department of Statistics and Quantitative Methods, IT

Nitesh V. CHAWLA, University of Notre Dame, Data, Inference, Analytics and Learning Lab, US

Sou-Cheng (Terry) CHOI, NORC and Department of Applied Mathematics, Illinois Institute of Technology, US

Massimo FERRI, University of Bologna, Department of Mathematics, IT

Randy GOEBEL, University of Alberta, Centre for Machine Learning, Department of Computer Science, CA

Sibylle HESS, Dortmund University of Technology, Artificial Intelligence Unit, Student at the SFB 876, DE

Andreas HOLZINGER, Medical University Graz, Institute for Medical Informatics and CBmed Center for Biomarker Research, AT

Katharina HOLZINGER, Karl Franzens University Graz, Student at the Faculty of Natural Sciences, AT

Mateusz JUDA, MROZEK Group, Jagiellonian University, Institute of Computer Science, Krakow, PL

Lek-Heng LIM, University of Chicago, Department of Statistics, Computational Mathematics Initiative, US

Katharina MORIK, Dortmund University of Technology, Artificial Intelligence Unit, DE

Sayan MUKHERJEE, Duke University, Department of Statistical Science, Computer Science and Mathematics, US

Monica NICOLAU, Stanford School of Medicine, Center for Cancer Systems Biology, US

Vasile PALADE, University of Oxford and Coventry, Cogent Computing Applied Research Centre, UK

Yuzuru TANAKA, Hokkaido University, Information Science and Technology, JP

**Supporters (original list):**

Mikhail BELKIN, Ohio State University, Computer Science and Engineering, Center for Cognitive Science, US

Mounir BEN AYED, Ecole Nationale d’Ingenieurs de Sfax, Research Group Intelligent Machines, TN

Jean-Francois BOULICAUT, Universite de Lyon, Data Mining and Machine Learning Team, FR

Mirko CESARINI, Università di Milano Bicocca, Department of Statistics and Quantitative Methods, IT

Polo CHAU, Georgia Tech, College of Computing, School of Computational Science & Engineering, US

Nitesh V. CHAWLA, University of Notre Dame, Data, Inference, Analytics and Learning Lab, US

Lidija COMIC, University of Novi Sad, Faculty of Technical Sciences, Department for Fundamental Disciplines, RS

Matthias DEHMER, University for Health and Life Sciences, Department of Bioinformatics, AT

Frank EMMERT-STREIB, Queen's University Belfast, Computational Biology and Machine Learning Lab, UK

Barbara Di FABIO, University of Bologna, Department of Mathematics, IT

Dimitrios GUNOPULOS, University of Athens, Knowledge Discovery in Databases Lab, GR

Michael HOULE, National Institute of Informatics, Tokyo, JP

Jesse JOHNSON, Oklahoma State University, Department of Mathematics, US

Mateusz JUDA, MROZEK Group, Jagiellonian University, Institute of Computer Science, Krakow, PL

Igor JURISICA, IBM Life Sciences Discovery Centre and Princess Margaret Cancer Centre, Toronto, CA

Mei KOBAYASHI, formerly IBM Tokyo Laboratory, Tokyo, JP

Claudia LANDI, University of Modena, Department of Engineering Science and Methods, IT

Sangkyun LEE, Dortmund University of Technology, Artificial Intelligence Unit, DE

Lek-Heng LIM, University of Chicago, Department of Statistics, Computational Mathematics Initiative, US

Sayan MUKHERJEE, Duke University, Department of Statistical Science, Computer Science and Mathematics, US

Katharina MORIK, Dortmund University of Technology, Artificial Intelligence Unit, DE

Monica NICOLAU, Stanford School of Medicine, Center for Cancer Systems Biology, US

Zoran OBRADOVIC, Temple University, Data Analytics and Biomedical Informatics Center, US

Massimiliano PONTIL, University College London, Centre for Computational Statistics and Machine Learning, UK

Joerg SANDER, University of Alberta, Department of Computing Science, CA

Raul RABADAN, Columbia University, Center for Computational Biology and Bioinformatics, US

Michele SEBAG, Universite Paris Sud, Machine Learning Group, FR

Arno SIEBES, Universiteit Utrecht, Artificial Intelligence and Algorithmic Data Analysis Group, NL

Dan SIMOVICI, University of Massachusetts, Department of Computer Science, US

Yongtang SHI, Nankai University, Center for Combinatorics, CN

Yuzuru TANAKA, Hokkaido University, Information Science and Technology, JP

David WINDRIDGE, University of Surrey, Machine Learning Group, UK

Gane Ka-Shu WONG, University of Alberta, BGI-Shenzhen, Department of Biological Sciences, Edmonton, CA

Juan D. VELASQUEZ, Universidad de Chile, Web Intelligence Research Centre, CL

Karin VERSPOOR, University of Melbourne, Department of Computing and Information Systems, AU

Osmar ZAIANE, University of Alberta, Department of Computing Science, Edmonton, CA

Ning ZHONG, Maebashi Institute of Technology, Knowledge Information Systems Laboratory, JP

Eric XING, Carnegie Mellon University, Machine Learning Department, US
