Interactive machine learning for health informatics: when do we need the human-in-the-loop?
Machine learning (ML) is the fastest growing field in computer science, and health informatics is among the greatest challenges. The goal of ML is to develop algorithms which can learn and improve over time and can be used for predictions. Most ML researchers concentrate on automatic machine learning (aML), where great advances have been made, for example, in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches greatly benefit from big data with many training sets. However, in the health domain, we are sometimes confronted with a small number of data sets or rare events, where aML-approaches suffer from insufficient training samples. Here interactive machine learning (iML) may be of help, having its roots in reinforcement learning, preference learning, and active learning. The term iML is not yet well established, so we define it as “algorithms that can interact with agents and can optimize their learning behavior through these interactions, where the agents can also be human.” This “human-in-the-loop” can be beneficial in solving computationally hard problems, e.g., subspace clustering, protein folding, or k-anonymization of health data, where human expertise can help to reduce an exponential search space through heuristic selection of samples. Therefore, what would otherwise be an NP-hard problem reduces greatly in complexity through the input and assistance of a human agent involved in the learning phase.
We define iML-approaches as algorithms that can interact with both computational agents and human agents *) and can optimize their learning behavior through these interactions.
*) In active learning such agents are referred to as the so-called “oracles”
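The oracle interaction can be illustrated with a minimal pool-based active learning sketch. Everything in it is an assumption made for illustration: the one-dimensional data, the simple threshold model, the names `fit_threshold` and `active_learn`, and the `oracle` callable, which stands in for a human expert answering label queries. It is not the method of the article, only one simple instance of a learner that asks an agent for the labels it is most uncertain about.

```python
from typing import Callable, List, Sequence, Tuple

Labeled = Tuple[float, int]  # (feature value, label in {0, 1})


def fit_threshold(labeled: Sequence[Labeled]) -> float:
    """Toy model: midpoint between the largest negative and smallest positive."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def active_learn(
    pool: Sequence[float],
    oracle: Callable[[float], int],  # hypothetical stand-in for the human expert
    seed: Sequence[Labeled],         # must contain one negative and one positive
    budget: int,
) -> Tuple[float, int]:
    """Query the oracle for the most uncertain pool points, up to `budget` times."""
    labeled: List[Labeled] = list(seed)
    seen = {x for x, _ in seed}
    unlabeled = [x for x in pool if x not in seen]
    queries = 0
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: the point closest to the current decision boundary.
        query = min(unlabeled, key=lambda v: abs(v - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # the agent supplies the label
        queries += 1
    return fit_threshold(labeled), queries
```

In this toy setting, with a pool of 101 evenly spaced points and an oracle whose true boundary lies at 0.37, the learner localizes the boundary after a handful of queries instead of requiring labels for the whole pool.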
From black-box to glass-box: where is the human-in-the-loop?
The first question we have to answer is: “What is the difference between the iML-approach and the aML-approach, i.e., unsupervised, supervised, or semi-supervised learning?”
Scenario D – see slide below – shows the iML-approach, where the human expert is seen as an agent directly involved in the actual learning phase, step-by-step influencing measures such as distance, cost functions, etc.
Obvious concerns may emerge immediately, and one can argue: what about the robustness of this approach, the subjectivity, the transfer of the (human) agents? Many questions remain open and are subjects for future research, particularly in evaluation, replicability, robustness, etc.
Read full article here:
We are organizing a special session on Privacy Aware Machine Learning for Health Data Science at the 11th International Conference on Availability, Reliability and Security (ARES and CD-ARES), Salzburg, Austria, August 29 – September 2, 2016.
Machine learning is the fastest growing field in computer science [Jordan, M. I. & Mitchell, T. M. 2015. Machine learning: Trends, perspectives, and prospects. Science, 349, (6245), 255-260], and it is well accepted that health informatics is amongst the greatest challenges [LeCun, Y., Bengio, Y. & Hinton, G. 2015. Deep learning. Nature, 521, (7553), 436-444], e.g. large-scale aggregate analyses of anonymized data can yield valuable insights addressing public health challenges and provide new avenues for scientific discovery [Horvitz, E. & Mulligan, D. 2015. Data, privacy, and the greater good. Science, 349, (6245), 253-255]. Privacy is becoming a major concern for machine learning tasks, which often operate on personal and sensitive data. Consequently, privacy, data protection, safety, information security and fair use of data are of utmost importance for health data science.
The amount of patient-related data produced in today’s clinical setting poses many challenges with respect to collection, storage and responsible use. For example, in research and public health care analysis, data must be anonymized before transfer, for which the k-anonymity measure was introduced and successively enhanced by further criteria. As achieving optimal k-anonymity is an NP-hard problem, which cannot be solved by automatic machine learning (aML) approaches, we must often make use of approximation and heuristics. As data security is not guaranteed given a certain k-anonymity degree, additional measures have been introduced in order to refine results (l-diversity, t-closeness, delta-presence). This motivates methods, methodologies and algorithmic machine learning approaches to tackle the problem. As the resulting data set will be a tradeoff between utility, usability and individual privacy and security, we need to optimize those measures to individual (subjective) standards. Moreover, the efficacy of an algorithm strongly depends on the background knowledge of a potential attacker as well as the underlying problem domain. One possible solution is to make use of interactive machine learning (iML) approaches and put a human-in-the-loop, where the central question remains open: “could human intelligence lead to general heuristics we can use to improve heuristics?”
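To make the two measures concrete, here is a small sketch that computes the k-anonymity degree and the (distinct) l-diversity of an already generalized table. The record layout, the column names and the helper names `k_anonymity` and `l_diversity` are assumptions chosen for illustration. The sketch only measures a given table; it does not search for an optimal generalization, which is the hard part.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Record = Dict[str, str]


def k_anonymity(records: List[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing all quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())  # raises ValueError on an empty table


def l_diversity(
    records: List[Record], quasi_identifiers: Sequence[str], sensitive: str
) -> int:
    """Smallest number of distinct sensitive values within any such group."""
    values = defaultdict(set)
    for r in records:
        values[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return min(len(v) for v in values.values())


# Hypothetical, already generalized example data.
table = [
    {"age": "30-39", "zip": "80**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "80**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "80**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "80**", "diagnosis": "flu"},
]

k = k_anonymity(table, ["age", "zip"])               # 2
l = l_diversity(table, ["age", "zip"], "diagnosis")  # 1
```

The example table is 2-anonymous, yet everyone in the 40-49 group shares the same diagnosis, so knowing a person's age range and zip prefix already reveals the sensitive value. This is the kind of gap that l-diversity is meant to expose.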
Research topics covered by this special session include but are not limited to the following topics:
– Production of Open Data Sets
– Synthetic data sets for learning algorithm testing
– Privacy preserving machine learning, data mining and knowledge discovery
– Data leak detection
– Data citation
– Differential privacy
– Anonymization and pseudonymization
– Securing expert-in-the-loop machine learning systems
– Evaluation and benchmarking
This special session will bring together scientists with diverse backgrounds, interested in both the underlying theoretical principles as well as the application of such methods for practical use in the biomedical, life sciences and health care domain. The cross-domain integration and appraisal of different fields will provide an atmosphere to foster different perspectives and opinions; it will offer a platform for novel crazy ideas and a fresh look at the methodologies to put these ideas into business.
Accepted papers will be published in a Springer Lecture Notes in Computer Science (LNCS) volume.
I) Deadline for submissions: April 30, 2016
Paper submission via:
II) Camera Ready deadline: July 4, 2016
The International Scientific Committee, consisting of experts from the international expert network HCI-KDD dealing with area (7), privacy, data protection, safety and security, and additionally invited international experts, will ensure the highest possible scientific quality. Each paper will be reviewed by at least three reviewers (the paper acceptance rate of the last special session was 35 %).
In January 2016, Yahoo announced the public release of the largest-ever machine learning data set to the international research community. The data set stands at a massive ~110B events (13.5TB uncompressed) of anonymized user-news item interaction data, collected by recording the user-news item interactions of about 20M users from February 2015 to May 2015.
Mastering the game of Go with deep neural networks and tree search – a very recent paper in Nature:
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529, (7587), 484-489.
Go (in Chinese: 圍棋, in Japanese: 囲碁) is a board strategy game (EXPTIME-complete, resp. PSPACE-complete) for two players aiming to surround more territory than the opponent; despite its simple rules, the number of possible games is enormous (10^761 on a 19 x 19 board, compared to approximately 10^120 in chess on an 8 x 8 board).
According to the new article by Silver et al. (2016), Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. The authors introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. The authors introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, the program AlphaGo (see: http://deepmind.com/alpha-go.html) achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
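How the two kinds of networks can enter a tree search is easiest to see in a tiny sketch. The selection rule below follows the general form described in the paper (an action value plus an exploration bonus that is proportional to the policy prior and decays with the visit count), and the leaf evaluation mixes a value-network estimate with a rollout outcome. The class and function names, the data layout and the default constants are assumptions for illustration; this is not the AlphaGo implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float               # P(s, a): move probability from the policy network
    visits: int = 0            # N(s, a): how often the search took this move
    total_value: float = 0.0   # sum of leaf evaluations backed up through this move

    @property
    def q(self) -> float:
        """Mean action value Q(s, a); zero for unvisited moves."""
        return self.total_value / self.visits if self.visits else 0.0


def select_action(edges: Dict[str, Edge], c_puct: float = 1.0) -> str:
    """Pick the move maximizing Q(s, a) + u(s, a)."""
    total_visits = sum(e.visits for e in edges.values())

    def score(edge: Edge) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves.
        bonus = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.q + bonus

    return max(edges, key=lambda move: score(edges[move]))


def leaf_value(value_estimate: float, rollout_outcome: float, lam: float = 0.5) -> float:
    """Mix the value-network estimate with the outcome of a fast rollout."""
    return (1 - lam) * value_estimate + lam * rollout_outcome
```

With one well-explored move and one unvisited move that still has a sizeable prior, the bonus term steers the search towards the unvisited move, which is the intended effect of combining the prior with the visit counts.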
There is also a news report on BBC:
Congrats to the Google Deepmind people!
Date: Tuesday, 26th January 2016, Start: 10:00, End: 17:00; Venue: Graz University of Technology,
Institute of Computer Graphics and Knowledge Visualization CGV, hosted by Prof. Tobias SCHRECK
Address: Inffeldgasse 16c, A-8010 Graz
Machine learning is the fastest growing field in computer science [Jordan, M. I. & Mitchell, T. M. 2015. Machine learning: Trends, perspectives, and prospects. Science, 349, (6245), 255-260], and it is well accepted that health informatics is amongst the greatest challenges [LeCun, Y., Bengio, Y. & Hinton, G. 2015. Deep learning. Nature, 521, (7553), 436-444].
Successful Machine Learning for Health Informatics requires a comprehensive understanding of the data ecosystem and a multi-disciplinary skill-set, from seven specializations: 1) data science, 2) algorithms, 3) network science, 4) graphs/topology, 5) time/entropy, 6) data visualization and visual analytics, and 7) privacy, data protection, safety and security – as supported by the international expert network HCI-KDD.
We wish you a prosperous scientific 2016 with a lot of crazy ideas and successful breakthrough discoveries!
Machine learning has again made it to the title page of Science. Further nice evidence for the importance of the human-in-the-loop comes from a paper by
Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. 2015. Human-level concept learning through probabilistic program induction. Science, 350, (6266), 1332-1338.
Whilst humans can often learn new concepts from very few examples, automated machine learning (aML) methods usually need many examples (often called: big data) to perform with similar accuracy (and with the danger of modelling artefacts, e.g. through overfitting). The authors present a computational model which captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. Very interesting is the fact that on a challenging one-shot classification task, this model achieves human-level performance and outperforms recent deep learning approaches!
The authors also present several “visual Turing tests” probing the model’s creative generalization abilities, which in many cases are indistinguishable from human behavior – a must read at: http://www.sciencemag.org/content/350/6266/1332.full
Machine Learning for Health Informatics
Machine learning is a large and rapidly developing subfield of computer science that evolved from artificial intelligence (AI) and is tightly connected with data mining and knowledge discovery. The ultimate goal of machine learning is to design and develop algorithms which can learn from data. Consequently, machine learning systems learn and improve with experience over time and their trained models can be used to predict outcomes of questions based on previously seen knowledge. In fact, the process of learning intelligent behaviour from noisy examples is one of the major questions in the field. The ability to learn from noisy, high dimensional data is highly relevant for many applications in the health informatics domain. This is due to the inherent nature of biomedical data, and health will increasingly be the focus of machine learning research in the near future.
Title: Coordination of post-translational modifications in human protein interaction network
Lecturer: Ulrich Stelzl, Network Pharmacology, Institute of Pharmaceutical Sciences, Karl-Franzens University Graz
Abstract: Comprehensive protein interaction networks are prerequisite for a better understanding of complex genotype to phenotype relationships. Post-translational modifications (PTMs) regulate protein activity, stability and protein interaction (PPI) profiles critical for cellular functioning. In combined experimental and computational approaches, we want to elucidate the role of post-translational protein modifications, such as phosphorylation, for these dynamic processes and investigate how the large number of changing PTMs is coordinated in cellular protein networks and likewise how PTMs may modulate protein-protein interaction networks. We identified hundreds of protein complexes that selectively accumulate different PTMs i.e. phosphorylation, acetylation and ubiquitination. Also protein regions of very high PTM densities, termed PTMi spots, were characterized and show domain-like features. The analysis of phosphorylation-dependent interactions provides clues on how these PPIs are dynamically and spatially constrained to separate simultaneously triggered growth signals which are often altered in oncogenic conditions. Our data indicate coordinated targeting of specific molecular functions via PTMs at different levels emphasizing a protein network approach as requisite to better understand modification impact on cellular signaling and cancer phenotypes.
Short bio: Ulrich Stelzl studied Chemistry/Biochemistry at the TU Vienna and ETH Zürich. His PhD thesis (MPIMG, Berlin) and first PostDoc (MSKCC, New York) addressed detailed biochemical questions of RNA-protein recognition, such as the assembly and dynamics of ribonucleo-protein complexes in gene expression and regulation. Then at the MDC Berlin, Ulrich Stelzl contributed significantly to well recognized protein-protein interaction (PPI) studies such as the generation and analysis of the first human proteome scale PPI networks or the development of an empirical framework for human interactome mapping. The importance of the work and its interdisciplinary character was recognized by the Erwin Schrödinger Prize 2008 of the German Helmholtz Society. From 2007 on, Ulrich Stelzl headed the Max-Planck Research Group “Molecular Interaction Networks” at the MPIMG, Berlin and recently joined the Department of Pharmaceutical Sciences of the University of Graz.