Papers due April 30, 2016: Privacy Aware Machine Learning (PAML) for Health Data Science

We are organizing a special session on Privacy Aware Machine Learning for Health Data Science at the 11th International Conference on Availability, Reliability and Security (ARES and CD-ARES), Salzburg, Austria, August 29 – September 2, 2016,

supported by the International Federation for Information Processing (IFIP) TC5, WG 8.4 and WG 8.9
http://cd-ares-conference.eu
http://www.ares-conference.eu

Keynote Talk by Bernhard SCHÖLKOPF, Max Planck Institute for Intelligent Systems, Empirical Inference Department

We are proud to welcome Bernhard Schölkopf as Keynote Speaker to the ARES/CD-ARES conference in Salzburg

Machine learning is the fastest growing field in computer science [Jordan, M. I. & Mitchell, T. M. 2015. Machine learning: Trends, perspectives, and prospects. Science, 349, (6245), 255-260], and it is well accepted that health informatics is amongst the greatest challenges [LeCun, Y., Bengio, Y. & Hinton, G. 2015. Deep learning. Nature, 521, (7553), 436-444]. For example, large-scale aggregate analyses of anonymized data can yield valuable insights addressing public health challenges and provide new avenues for scientific discovery [Horvitz, E. & Mulligan, D. 2015. Data, privacy, and the greater good. Science, 349, (6245), 253-255]. Privacy is becoming a major concern for machine learning tasks, which often operate on personal and sensitive data. Consequently, privacy, data protection, safety, information security and fair use of data are of utmost importance for health data science.

The amount of patient-related data produced in today’s clinical setting poses many challenges with respect to collection, storage and responsible use. For example, in research and public health care analysis, data must be anonymized before transfer, for which the k-anonymity measure was introduced and successively enhanced by further criteria. As optimal k-anonymization is an NP-hard problem, which cannot be solved by automatic machine learning (aML) approaches, we must often make use of approximation and heuristics. As data security is not guaranteed by a given degree of k-anonymity, additional measures have been introduced in order to refine results (l-diversity, t-closeness, delta-presence). This motivates methods, methodologies and algorithmic machine learning approaches to tackle the problem. As the resulting data set will be a tradeoff between utility, usability and individual privacy and security, we need to optimize those measures to individual (subjective) standards. Moreover, the efficacy of an algorithm strongly depends on the background knowledge of a potential attacker as well as the underlying problem domain. One possible solution is to make use of interactive machine learning (iML) approaches and put a human-in-the-loop, where the central question remains open: “could human intelligence lead to general heuristics we can use to improve heuristics?”
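
To make the two measures mentioned above concrete, here is a minimal Python sketch that computes the k-anonymity degree and a simple (distinct-value) l-diversity of a small table. The records, the column names and the helper functions `k_anonymity` and `l_diversity` are hypothetical and chosen for illustration only; the sketch only measures an already generalized table and does not attempt the NP-hard search for an optimal anonymization.

```python
from collections import Counter, defaultdict


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same
    quasi-identifier values (0 for an empty table)."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any
    quasi-identifier group (distinct l-diversity, 0 for an empty table)."""
    if not records:
        return 0
    values = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values[key].add(r[sensitive])
    return min(len(v) for v in values.values())


# Hypothetical, already generalized toy data (not real patient data).
records = [
    {"age": "30-39", "zip": "80**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "80**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "81**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "81**", "diagnosis": "flu"},
]

qi = ["age", "zip"]
k = k_anonymity(records, qi)               # 2: every group has two records
l = l_diversity(records, qi, "diagnosis")  # 1: one group only contains "flu"
print(f"k-anonymity: {k}, l-diversity: {l}")
```

In this toy table every record is indistinguishable from one other record (k = 2), yet anyone who knows that a person falls into the 40-49 / 81** group learns the diagnosis, which is why a given k-anonymity degree alone does not guarantee data security.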

Research topics covered by this special session include, but are not limited to:

– Production of Open Data Sets
– Synthetic data sets for learning algorithm testing
– Privacy preserving machine learning, data mining and knowledge discovery
– Data leak detection
– Data citation
– Differential privacy
– Anonymization and pseudonymization
– Securing expert-in-the-loop machine learning systems
– Evaluation and benchmarking

This special session will bring together scientists with diverse backgrounds, interested in both the underlying theoretical principles and the application of such methods for practical use in the biomedical, life sciences and health care domain. The cross-domain integration and appraisal of different fields will provide an atmosphere to foster different perspectives and opinions; it will offer a platform for novel crazy ideas and a fresh look at the methodologies to put these ideas into business.

Accepted papers will be published in a Springer Lecture Notes in Computer Science (LNCS) volume.

Schedule:

I) Deadline for submissions: April 30, 2016
Paper submission via:
http://cd-ares-conference.eu/?page_id=43

II) Camera-ready deadline: July 4, 2016

III) Special Session: August 30, 2016

The International Scientific Committee, consisting of experts from the international expert network HCI-KDD dealing with area (7), privacy, data protection, safety and security, and additionally invited international experts, will ensure the highest possible scientific quality. Each paper will be reviewed by at least three reviewers (the paper acceptance rate of the last special session was 35 %).

 

Yahoo Labs released the largest-ever anonymized machine learning data set for researchers

In January 2016, Yahoo announced the public release of the largest-ever machine learning data set to the international research community. The data set comprises about 110B events (13.5TB uncompressed) of anonymized user-news item interaction data, collected by recording the interactions of about 20M users from February 2015 to May 2015.

see: http://yahoolabs.tumblr.com/post/137281912191/yahoo-releases-the-largest-ever-machine-learning

 

January 27, 2016: Major breakthrough in AI research …

Mastering the game of Go with deep neural networks and tree search – a very recent paper in Nature:

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529, (7587), 484-489.

http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html

Go (in Chinese: 圍棋, in Japanese: 囲碁) is a board strategy game (EXPTIME-complete, resp. PSPACE-complete) in which two players aim to surround more territory than the opponent. Despite its simple rules, the number of possible move sequences is enormous (about 10^761 on a 19 x 19 board, compared to approximately 10^120 in chess on an 8 x 8 board).

According to the new article by Silver et al. (2016), Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. The authors introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. The authors introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, the program AlphaGo (see: http://deepmind.com/alpha-go.html) achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

There is also a news report on BBC:

http://www.bbc.com/news/technology-35420579

Congrats to the Google DeepMind people!

 

January 26, 2016: Workshop “Machine Learning for Biomedicine” TU Graz

Date: Tuesday, 26th January 2016, Start: 10:00, End: 17:00; Venue: Graz University of Technology,
Institute of Computer Graphics and Knowledge Visualization CGV, hosted by Prof. Tobias SCHRECK
Address: Inffeldgasse 16c, A-8010 Graz

Machine learning is the fastest growing field in computer science [Jordan, M. I. & Mitchell, T. M. 2015. Machine learning: Trends, perspectives, and prospects. Science, 349, (6245), 255-260], and it is well accepted that health informatics is amongst the greatest challenges [LeCun, Y., Bengio, Y. & Hinton, G. 2015. Deep learning. Nature, 521, (7553), 436-444].

Successful Machine Learning for Health Informatics requires a comprehensive understanding of the data ecosystem and a multi-disciplinary skill-set from seven specializations: 1) data science, 2) algorithms, 3) network science, 4) graphs/topology, 5) time/entropy, 6) data visualization and visual analytics, and 7) privacy, data protection, safety and security, as supported by the international expert network HCI-KDD.

For the program, see: http://hci-kdd.org/machine-learning-for-biomedicine-tugraz/

Happy Scientific 2016

We wish you a prosperous scientific 2016 with a lot of crazy ideas and successful breakthrough discoveries!

Happy New 2016

Happy New Year from the Holzinger Group HCI-KDD

Science Magazine Vol.350, Issue 6266

A proof of the importance of the human-in-the-loop

Machine learning has again made it to the title page of Science: further evidence for the importance of the human-in-the-loop comes from a paper by

Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. 2015. Human-level concept learning through probabilistic program induction. Science, 350, (6266), 1332-1338.

Whilst humans can often learn new concepts from very few examples, automated machine learning (aML) methods usually need many examples (often called big data) to perform with similar accuracy (and with the danger of modelling artefacts, e.g. through overfitting). The authors present a computational model which captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. Very interesting is the fact that on a challenging one-shot classification task, this model achieves human-level performance and outperforms recent deep learning approaches!

The authors also present several “visual Turing tests” probing the model’s creative generalization abilities, which in many cases are indistinguishable from human behavior – a must read at: http://www.sciencemag.org/content/350/6266/1332.full

Workshop “Machine Learning for Health Informatics” November 30, 2015

Machine learning is a large and rapidly developing subfield of computer science that evolved from artificial intelligence (AI) and is tightly connected with data mining and knowledge discovery. The ultimate goal of machine learning is to design and develop algorithms which can learn from data. Consequently, machine learning systems learn and improve with experience over time and their trained models can be used to predict outcomes of questions based on previously seen knowledge. In fact, the process of learning intelligent behaviour from noisy examples is one of the major questions in the field. The ability to learn from noisy, high dimensional data is highly relevant for many applications in the health informatics domain. This is due to the inherent nature of biomedical data, and health will increasingly be the focus of machine learning research in the near future.

Program

http://hci-kdd.org/machine-learning-for-health-informatics/

December 3, 2015: Seminar Talk on human protein interaction networks

Title: Coordination of post-translational modifications in human protein interaction network

Lecturer: Ulrich Stelzl, Network Pharmacology, Institute of Pharmaceutical Sciences, Karl-Franzens University Graz

Abstract: Comprehensive protein interaction networks are a prerequisite for a better understanding of complex genotype to phenotype relationships. Post-translational modifications (PTMs) regulate protein activity, stability and protein interaction (PPI) profiles critical for cellular functioning. In combined experimental and computational approaches, we want to elucidate the role of post-translational protein modifications, such as phosphorylation, for these dynamic processes and investigate how the large number of changing PTMs is coordinated in cellular protein networks and likewise how PTMs may modulate protein-protein interaction networks. We identified hundreds of protein complexes that selectively accumulate different PTMs, i.e. phosphorylation, acetylation and ubiquitination. Also, protein regions of very high PTM densities, termed PTMi spots, were characterized and show domain-like features. The analysis of phosphorylation-dependent interactions provides clues on how these PPIs are dynamically and spatially constrained to separate simultaneously triggered growth signals which are often altered in oncogenic conditions. Our data indicate coordinated targeting of specific molecular functions via PTMs at different levels, emphasizing a protein network approach as requisite to better understand modification impact on cellular signaling and cancer phenotypes.

Short bio: Ulrich Stelzl studied Chemistry/Biochemistry at the TU Vienna and ETH Zürich. His PhD thesis (MPIMG, Berlin) and first PostDoc (MSKCC, New York) addressed detailed biochemical questions of RNA-protein recognition, such as the assembly and dynamics of ribonucleo-protein complexes in gene expression and regulation. Then at the MDC Berlin, Ulrich Stelzl contributed significantly to well recognized protein-protein interaction (PPI) studies such as the generation and analysis of the first human proteome scale PPI networks or the development of an empirical framework for human interactome mapping. The importance of the work and its interdisciplinary character was recognized by the Erwin Schrödinger Prize 2008 of the German Helmholtz Society. From 2007 on, Ulrich Stelzl headed the Max-Planck Research Group “Molecular Interaction Networks” at the MPIMG, Berlin and recently joined the Department of Pharmaceutical Sciences of the University of Graz.

November 9, 2015: Welcome Seminar Machine Learning for Mitochondria Research

We welcome Irina KUZNETSOVA to our group, who will do her PhD with us on the topic of machine learning for mitochondria research

Her inaugural talk is on

Mitochondrial Interactions

Mitochondrial diseases are progressive and debilitating multi-system disorders that occur at a frequency of up to 1 in 5,000 live births, with no known cure. A variety of different complex mechanisms disrupt normal mitochondrial functions and lead to the development of mitochondrial diseases. Identification of the molecular and pathophysiological mechanisms that cause mitochondrial disease remains challenging. However, establishing mouse models of mitochondrial disease would enable the study of the onset, progression and penetrance of mitochondrial disease as well as investigation of the tissues specifically affected in mitochondrial disease. Consequently, this will enable the development of pre-clinical models of mitochondrial disease that could be used for testing a range of treatments for these diseases.

Irina did her Bachelor in computing sciences in St. Petersburg and her Masters in Bioinformatics at the Tampere University of Technology in Finland. Currently she is working at the Mitochondrial Medicine and Biology laboratory at the University of Western Australia in Perth, where she is co-supervised by Professor Aleksandra Filipovska.

 

July 7, 2015: Seminar Metabolomics data types

The potential of metabolomics and its various data types

Lecturer: Natalie BORDAG, CBmed – Center for Biomarker Research in Medicine Graz

Abstract: Metabolomics is one of the youngest -omics technologies, primarily concerned with the identification and quantification of small molecules (<1500 Da). The specific advantage of metabolomics in biomarker research lies in the concept that metabolites fall downstream of genetic, transcriptomic, proteomic, microbiomic and environmental variation, thus providing the most integrated and dynamic measure of phenotype and medical condition. Thus metabolomics can deliver biologically highly valuable results, enabling for example early diagnostic biomarkers, optimization of biotechnological production, deep insights into pathological mechanisms, identification of new therapeutic targets and many more. Metabolomics, especially MS (mass spectrometry) based metabolomics, delivers highly diverse data types along the flow from measurement towards knowledge generation, with most of their potential yet to be exploited. The biological potential for knowledge generation by metabolomics will be shown with a real life example. The different data types and common data aggregation steps (e.g. peak detection, identification), transformations, statistical analyses and visualizations will be shown, and open potentials will be jointly discussed.