Machine Learning for Health Informatics

LNAI 9605 Machine Learning for Health Informatics available

14.12.2016 LNAI 9605 just appeared

Machine Learning for Health Informatics Lecture Notes in Artificial Intelligence LNAI 9605

Holzinger, Andreas (ed.) 2016. Machine Learning for Health Informatics: State-of-the-Art and Future Challenges. Cham: Springer International Publishing, doi:10.1007/978-3-319-50478-0

[book homepage]

Machine learning (ML) is the fastest-growing field in computer science, and Health Informatics (HI) is amongst its greatest application challenges, promising future benefits in improved medical diagnoses, disease analyses, and pharmaceutical development. However, successful ML for HI needs a concerted effort, fostering integrative research among experts from diverse disciplines, ranging from data science to visualization.

Tackling complex challenges needs both disciplinary excellence and cross-disciplinary networking without any boundaries. Following the HCI-KDD approach and combining the best of two worlds, the aim is to support human intelligence with machine intelligence.

This state-of-the-art survey is an output of the international HCI-KDD expert network and features 22 carefully selected and peer-reviewed chapters on hot topics in machine learning for health informatics; they discuss open problems and future challenges in order to stimulate further research and international progress in this field.

Fun with Machine Learning

Google Research hosts a number of very interesting so-called A.I. Experiments, where you can play with machine learning algorithms in a very amusing way. A recent example is QUICK, DRAW! *). This is an online guessing game that challenges humans to hand-sketch (doodle) a given object. The game uses a neural network to learn from the input data

https://quickdraw.withgoogle.com

which is part of the A.I. Experiments platform:

https://aiexperiments.withgoogle.com

and here is the explanatory video:
https://www.youtube.com/watch?v=oOwfiYnRi5c

Have fun and enjoy!

Here you can see more than 100,000 hedgehog drawings made by humans on the internet:

https://quickdraw.withgoogle.com/data/hedgehog
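If you want to explore these doodles programmatically, here is a minimal Python sketch for loading and plotting them. It assumes the drawings are available locally as a newline-delimited JSON file in the simplified stroke format (one JSON object per line with a "drawing" field holding one [x-coordinates, y-coordinates] pair per stroke); the file name and the format details are assumptions for illustration, not part of the game itself.

```python
# Minimal sketch: load and plot doodles from a local ndjson file.
# Assumption: each line is a JSON object with a "drawing" field containing a
# list of strokes, each stroke being a pair [x-coordinates, y-coordinates].
import json
import matplotlib.pyplot as plt

def load_doodles(path, limit=5):
    """Read up to `limit` doodles from a newline-delimited JSON file."""
    doodles = []
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            doodles.append(json.loads(line))
    return doodles

def plot_doodle(doodle):
    """Render one doodle stroke by stroke."""
    for xs, ys in doodle["drawing"]:
        plt.plot(xs, ys)
    plt.gca().invert_yaxis()  # image coordinates: y grows downwards
    plt.axis("equal")
    plt.show()

doodles = load_doodles("hedgehog.ndjson")  # hypothetical local file name
plot_doodle(doodles[0])
```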

*) Not to be confused with QuickDraw [1], a sketch-based drawing tool that facilitates drawing precise geometry diagrams. It can automatically recognize sketched diagrams containing components such as line segments and circles, infer geometric constraints relating the recognized components, and use this information to “beautify” the sketched diagram. This “beautification” is based on an algorithm that iteratively computes various sub-components of the components using an extensible set of deductive rules.

[1] Cheema, S., Gulwani, S. & Laviola, J. QuickDraw: improving drawing experience for geometric diagrams. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2012. ACM, 1037-1064. doi: 10.1145/2207676.2208550

[2] https://experiments.withgoogle.com/ai

Visualization of High Dimensional Data

Google is experimenting with the visualization of high-dimensional data. This experiment helps visualize what’s happening in machine learning and allows coders to see and explore their high-dimensional data. The goal is to eventually make this an open-source tool within TensorFlow, so that any coder can use these visualization techniques to explore their data.
Built by Daniel Smilkov, Fernanda Viégas, Martin Wattenberg, and the Big Picture team at Google.
This work is based on a method developed by Laurens van der Maaten & Geoffrey Hinton in 2008:
van der Maaten, L. & Hinton, G. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9, (11), 2579-2605, https://www.jmlr.org/papers/v9/vandermaaten08a.html
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) nonlinear technique for dimensionality reduction that is particularly well suited for visualizing high-dimensional data sets by embedding them into R2 or R3. The technique can be implemented via Barnes-Hut approximations, allowing it to be applied to large real-world data sets (“big data”).
For details, please refer directly to the paper above, and compare this method to our own work on subspace clustering.
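To get a feeling for the method, here is a minimal t-SNE sketch using the scikit-learn implementation and its small digits data set (an illustrative choice; this is not Google's visualization tool itself):

```python
# Minimal sketch: Barnes-Hut t-SNE on the 64-dimensional digits data set.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 dimensions

# method="barnes_hut" approximates the gradient in O(N log N),
# which is what makes t-SNE feasible on large real-world data sets.
tsne = TSNE(n_components=2, perplexity=30.0, method="barnes_hut",
            random_state=0)
X_2d = tsne.fit_transform(X)  # embed into R^2

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5, cmap="tab10")
plt.title("t-SNE embedding of the 64-dimensional digits data")
plt.show()
```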

Obama on humans-in-the-loop

How artificial intelligence will affect jobs

In a discussion with Barack OBAMA [1] on how artificial intelligence will affect jobs, he emphasized how important human-in-the-loop machine learning will become in the future. Trust, transparency and explainability will be THE driving factors of future AI solutions! The interview was conducted by Wired [2] editor Scott DADICH and MIT Media Lab [3] director Joi ITO. I recommend that my students watch the full video. Barack Obama demonstrates a good understanding of the field and indirectly indicates the importance of our research on the human-in-the-loop approach [4], despite all the progress towards fully automatic approaches and autonomous systems.

For more information see:

[1] Barack Obama was the 44th President of the United States of America and was in office from January 20, 2009 to January 20, 2017. He was born on August 4, 1961 in Honolulu, Hawaii.

[2] Wired is a monthly tech magazine which has reported since 1993 on how emerging technologies may affect culture, politics, and economics. It is interesting to note that Wired is known for coining the popular terms “long tail” and “crowdsourcing”. https://www.wired.com

[3] The MIT Media Lab is an interdisciplinary research lab at the Massachusetts Institute of Technology in Cambridge (MA), part of the Boston metropolitan area, just across the Charles River and not far away from the Harvard campus.

[4] Holzinger, A., Plass, M., Holzinger, K., Crisan, G.C., Pintea, C.-M. & Palade, V. 2017. A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop. arXiv:1708.01104


Google releases its Syntactic Parser as Open Source

Google researchers spend a lot of time thinking about how computer systems can read and understand human language in order to process it in intelligent ways. On May 12, 2016, Slav Petrov, who is based in New York and leads Google's machine learning for natural language group, announced the release of SyntaxNet, an open-source neural network framework implemented in TensorFlow that provides a new foundation for Natural Language Understanding (NLU). The release includes all the code needed to train new SyntaxNet models on your own data, as well as Parsey McParseface, an English parser that the Googlers have trained and that can be used to analyze English text. Parsey McParseface is built on powerful machine learning algorithms that learn to analyze the linguistic structure of language and can explain the functional role of each word in a given sentence.

Read more:
https://googleresearch.blogspot.co.at/2016/05/announcing-syntaxnet-worlds-most.html
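As a rough illustration, the released parser can also be driven from Python. The sketch below assumes you have built SyntaxNet from the open-source repository and that syntaxnet/demo.sh is the entry point described in its README; both are assumptions about your local setup.

```python
# Minimal sketch: pipe a sentence through the Parsey McParseface demo script.
# Assumption: SyntaxNet has been built locally and syntaxnet/demo.sh exists
# as documented in the repository's README.
import subprocess

sentence = "Bob brought the pizza to Alice."
result = subprocess.run(
    ["syntaxnet/demo.sh"],
    input=sentence.encode(),
    stdout=subprocess.PIPE,
    check=True,
)
# The script prints a CoNLL-style dependency parse: one token per line with
# its part-of-speech tag and its head in the dependency tree.
print(result.stdout.decode())
```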

Literature:

Andor, D., Alberti, C., Weiss, D., Severyn, A., Presta, A., Ganchev, K., Petrov, S. & Collins, M. 2016. Globally normalized transition-based neural networks. arXiv preprint arXiv:1603.06042.

Petrov, S., McDonald, R. & Hall, K. 2016. Multi-source transfer of delexicalized dependency parsers. US Patent 9,305,544.

Weiss, D., Alberti, C., Collins, M. & Petrov, S. 2015. Structured Training for Neural Network Transition-Based Parsing. arXiv:1506.06158.

Vinyals, O., Kaiser, Ł., Koo, T., Petrov, S., Sutskever, I. & Hinton, G. Grammar as a foreign language. Advances in Neural Information Processing Systems, 2015. 2755-2763.

Human-in-the-Loop

Interactive machine learning for health informatics: when do we need the human-in-the-loop?

Machine learning (ML) is the fastest-growing field in computer science, and health informatics is among its greatest challenges. The goal of ML is to develop algorithms which can learn and improve over time and can be used for predictions. Most ML researchers concentrate on automatic machine learning (aML), where great advances have been made, for example, in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches greatly benefit from big data with many training sets. However, in the health domain we are sometimes confronted with a small number of data sets or rare events, where aML approaches suffer from insufficient training samples. Here interactive machine learning (iML) may be of help, having its roots in reinforcement learning, preference learning, and active learning. The term iML is not yet well established, so we define it as “algorithms that can interact with agents and can optimize their learning behavior through these interactions, where the agents can also be human.” This “human-in-the-loop” can be beneficial in solving computationally hard problems, e.g., subspace clustering, protein folding, or k-anonymization of health data, where human expertise can help to reduce an exponential search space through heuristic selection of samples. Therefore, what would otherwise be an NP-hard problem can be greatly reduced in complexity through the input and assistance of a human agent involved in the learning phase. Most of all, the human in the loop can bring in conceptual knowledge, “intuition”, expertise, and explicit knowledge which current AI completely lacks!

We define iML-approaches as algorithms that can interact with both computational agents and human agents *) and can optimize their learning behavior through these interactions.

*) In active learning such agents are referred to as the so-called “oracles”
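To make this definition concrete, the following minimal Python sketch casts the iML idea as pool-based active learning: the model repeatedly queries the sample it is most uncertain about, and an oracle supplies the label. Here the oracle is simulated by stored labels; in a real iML setting a human expert would answer instead (the data set and model are illustrative assumptions).

```python
# Minimal sketch: pool-based active learning with a (simulated) human oracle.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
rng = np.random.default_rng(0)

labeled = list(rng.choice(len(X), size=10, replace=False))  # tiny seed set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):  # 20 interaction rounds
    model.fit(X[labeled], y[labeled])
    # Query the pool sample the model is most uncertain about ...
    probs = model.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(probs - 0.5)))]
    # ... and ask the oracle for its label (here: the stored label;
    # in iML: a human expert answering the query).
    labeled.append(query)
    pool.remove(query)

print("accuracy after 30 labels:", model.score(X, y))
```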

From black-box to glass-box: where is the human-in-the-loop?

The first question we have to answer is: “What is the difference between the iML approach and the aML approach, i.e., unsupervised, supervised, or semi-supervised learning?”

Scenario D – see slide below – shows the iML approach, where the human expert is seen as an agent directly involved in the actual learning phase, influencing measures such as distance and cost functions step by step.

Obvious concerns emerge immediately, and one can argue: what about the robustness of this approach, its subjectivity, and the transfer of the (human) agents? Many questions remain open and are the subject of future research, particularly regarding evaluation, replicability, and robustness.

Human-in-the-loop - Interactive Machine Learning

The iML-approach

Read full article here:
https://link.springer.com/article/10.1007/s40708-016-0042-6/fulltext.html
https://www.mendeley.com/catalog/interactive-machine-learning-health-informatics-we-need-humanintheloop

Yahoo Labs released the largest-ever anonymized machine learning data set for researchers

In January 2016, Yahoo announced the public release of the largest-ever machine learning data set to the international research community. The data set stands at a massive ~110 billion events (13.5 TB uncompressed) of anonymized user-news item interaction data, collected by recording the interactions of about 20 million users with news items from February 2015 to May 2015.

see: https://yahoolabs.tumblr.com/post/137281912191/yahoo-releases-the-largest-ever-machine-learning


January 27, 2016: Major breakthrough in AI research …

Mastering the game of Go with deep neural networks and tree search – a very recent paper in Nature:

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529, (7587), 484-489.

https://www.nature.com/nature/journal/v529/n7587/full/nature16961.html

Go (in Chinese: 圍棋, in Japanese: 囲碁) is a two-player board strategy game (EXPTIME-complete, resp. PSPACE-complete) in which each player aims to surround more territory than the opponent. Despite its simple rules, the number of possible move sequences is enormous (about 10^761 on a 19 x 19 board, compared to approximately 10^120 in chess on an 8 x 8 board).

According to the new article by Silver et al. (2016), Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. The authors introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. The authors introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, the program AlphaGo (see: https://deepmind.com/alpha-go.html) achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
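To illustrate how the policy and value networks can steer the tree search, here is a small Python sketch of a PUCT-style selection rule in the spirit of AlphaGo; the constant, the numbers, and the data structure are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch: combine a policy prior P and a value estimate Q in the
# tree-search selection step (PUCT-style rule; constants are illustrative).
import math

def select_action(stats, c_puct=1.0):
    """stats: {action: dict with prior P, visit count N, mean value Q}."""
    total_visits = sum(s["N"] for s in stats.values())
    def puct(s):
        # exploration bonus is large for high-prior, rarely visited moves
        u = c_puct * s["P"] * math.sqrt(total_visits) / (1 + s["N"])
        return s["Q"] + u
    return max(stats, key=lambda a: puct(stats[a]))

# Example: the value estimate slightly prefers move "b", but the high policy
# prior and low visit count make "c" worth exploring first.
stats = {
    "a": {"P": 0.2, "N": 10, "Q": 0.1},
    "b": {"P": 0.3, "N": 12, "Q": 0.3},
    "c": {"P": 0.5, "N": 1,  "Q": 0.0},
}
print(select_action(stats))  # -> "c"
```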

There is also a news report on BBC:

https://www.bbc.com/news/technology-35420579

Congrats to the Google DeepMind people!


Science Magazine Vol.350, Issue 6266

A proof of the importance of the human-in-the-loop

Once again machine learning made it to the title page of Science: a nice further proof of the importance of the human-in-the-loop comes from a paper by

Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. 2015. Human-level concept learning through probabilistic program induction. Science, 350, (6266), 1332-1338.

Whilst humans can often learn new concepts from very few examples, automated machine learning (aML) methods usually need many examples (often called: big data) to perform with similar accuracy (and with the danger of modelling artefacts, e.g. through overfitting). The authors present a computational model which captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. Very interesting is the fact that on a challenging one-shot classification task, this model achieves human-level performance and outperforms recent deep learning approaches!

The authors also present several “visual Turing tests” probing the model’s creative generalization abilities, which in many cases are indistinguishable from human behavior – a must read at: https://www.sciencemag.org/content/350/6266/1332.full

Machine Learning in Nature again

LeCun, Y., Bengio, Y. & Hinton, G. 2015. Deep learning. Nature, 521, (7553), 436-444.

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

More information: https://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
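To make the backpropagation idea tangible, here is a minimal Python sketch of a tiny two-layer network learning XOR, with the gradients written out by hand (a toy illustration, not one of the architectures discussed in the paper).

```python
# Minimal sketch: backpropagation by hand in a two-layer network on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)  # layer 1 parameters
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)  # layer 2 parameters
lr = 0.5

for step in range(5000):
    # forward pass: each layer computes its representation from the previous one
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output
    # backward pass: propagate the error back to get parameter gradients
    d_out = (out - y) / len(X)        # grad of cross-entropy w.r.t. the logits
    d_W2, d_b2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * (1 - h ** 2) # tanh derivative
    d_W1, d_b1 = X.T @ d_h, d_h.sum(axis=0)
    # gradient step: change internal parameters to reduce the error
    W1 -= lr * d_W1; b1 -= lr * d_b1
    W2 -= lr * d_W2; b2 -= lr * d_b2

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]
```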

Nature Issue 7553 contains a special about computational intelligence!

https://www.nature.com/nature/current_issue.html