Privacy Aware Machine Learning (PAML)
for Health Data Science


Special Session organized by Andreas HOLZINGER, Peter KIESEBERG, Edgar WEIPPL and A Min TJOA

PAML – September 1, 2017:

12th International Conference on Availability, Reliability and Security (ARES and CD-ARES), Reggio di Calabria, Italy, August 29 – September 2, 2017

supported by the International Federation for Information Processing (IFIP) TC 5, WG 8.4 and WG 8.9
http://cd-ares-conference.eu
http://www.ares-conference.eu

Keynote Talk by Neil D. LAWRENCE, University of Sheffield and Amazon
<machine learning, computational biology, dimensionality reduction, Gaussian processes, probabilistic modelling>

Mini Bio: Neil Lawrence is a Professor of Machine Learning and Computational Biology at the University of Sheffield. He holds a PhD in Computer Science from Cambridge University and had a postdoctoral stay with Microsoft Research Cambridge. He has served as the Chair of the NIPS Conference, the premier Machine Learning conference in the world, and was the founding editor of the Journal of Machine Learning Research (JMLR) Workshop and Conference Proceedings. He is a fellow of the Royal Society in the working group for machine learning. He is considered one of the foremost experts on probabilistic modeling of real-world phenomena, specifically using Gaussian Processes. With his group, he is leading efforts to apply machine learning techniques for health informatics.

Machine learning is the fastest growing field in computer science [Jordan, M. I. & Mitchell, T. M. 2015. Machine learning: Trends, perspectives, and prospects. Science, 349, (6245), 255-260], and it is well accepted that health informatics is amongst the greatest challenges [LeCun, Y., Bengio, Y. & Hinton, G. 2015. Deep learning. Nature, 521, (7553), 436-444]. For example, large-scale aggregate analyses of anonymized data can yield valuable insights addressing public health challenges and provide new avenues for scientific discovery [Horvitz, E. & Mulligan, D. 2015. Data, privacy, and the greater good. Science, 349, (6245), 253-255]. Privacy is becoming a major concern for machine learning tasks, which often operate on personal and sensitive data. Consequently, privacy, data protection, safety, information security and fair use of data are of utmost importance for health data science.

Background of the PAML Session

The amount of patient-related data produced in today’s clinical setting poses many challenges with respect to collection, storage and responsible use. For example, in research and public health care analysis, data must be anonymized before transfer, for which the k-anonymity measure was introduced and successively enhanced by further criteria. As optimal k-anonymization is an NP-hard problem that cannot be solved exactly by automatic machine learning (aML) approaches, we must often make use of approximations and heuristics. As data security is not guaranteed by a given degree of k-anonymity, additional measures have been introduced in order to refine results (l-diversity, t-closeness, delta-presence). This motivates methods, methodologies and algorithmic machine learning approaches to tackle the problem. As the resulting data set will be a tradeoff between utility, usability and individual privacy and security, we need to optimize those measures to individual (subjective) standards. Moreover, the efficacy of an algorithm strongly depends on the background knowledge of a potential attacker as well as on the underlying problem domain. One possible solution is to make use of interactive machine learning (iML) approaches and put a human in the loop, where the central question remains open: “could human intelligence lead to general heuristics we can use to improve heuristics?”
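
To make the k-anonymity and l-diversity measures mentioned above concrete, the following is a minimal sketch in Python, not taken from any paper of this session. It assumes a table that has already been generalized, and the column names ("age", "zip", "diagnosis"), the toy records and the helper function names are hypothetical and chosen only for illustration. The sketch only measures k and l for a given table; finding an optimal generalization is the hard part and is not attempted here.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest equivalence class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(records, quasi_identifiers, sensitive):
    """Return l: the smallest number of distinct sensitive values
    found in any equivalence class (distinct l-diversity)."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in groups.values())


# Hypothetical toy table (assumption): age and zip are already generalized.
records = [
    {"age": "30-39", "zip": "80**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "80**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "81**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "81**", "diagnosis": "diabetes"},
]

k = k_anonymity(records, ["age", "zip"])
l = l_diversity(records, ["age", "zip"], "diagnosis")
print(k, l)  # prints "2 1"
```

In this toy table every record shares its quasi-identifiers with one other record (k = 2), yet the first group contains only a single diagnosis (l = 1), so anyone who knows that a person falls into that group learns the diagnosis. This is one way to see why a given degree of k-anonymity alone does not guarantee data security.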

Research topics covered by this special session include but are not limited to:

– Production of Open Data Sets
– Synthetic data sets for learning algorithm testing
– Privacy preserving machine learning, data mining and knowledge discovery
– Data leak detection
– Data citation
– Differential privacy
– Anonymization and pseudonymization
– Securing expert-in-the-loop machine learning systems
– Evaluation and benchmarking

This special session will bring together scientists with diverse backgrounds, interested in both the underlying theoretical principles and the application of such methods for practical use in the biomedical, life sciences and health care domain. The cross-domain integration and appraisal of different fields will provide an atmosphere to foster different perspectives and opinions; it will offer a platform for novel crazy ideas and a fresh look at the methodologies to put these ideas into business.

Accepted papers will be published in a Springer Lecture Notes in Computer Science (LNCS) volume.

We are planning to invite excellent contributions for extension in journals (Springer MACH, BMC MIDM).

Schedule:

1) Deadline for submissions: April 1, 2017
Paper submission via:
http://cd-ares-conference.eu/?page_id=43

2) Notification: May 1, 2017

3) Camera Ready deadline: June 1, 2017

4) Special Session: September 1, 2017
> Conference Venue: Università Mediterranea di Reggio Calabria

The International Scientific Committee, consisting of experts from the international expert network HCI-KDD dealing with area (7), privacy, data protection, safety and security, and additionally invited international experts, will ensure the highest possible scientific quality. Each paper will be reviewed by at least three reviewers (the paper acceptance rate of the last special session was 35 %).

International Scientific Committee:

Call for Papers

PAML-call-for-papers-2017 (pdf, 76kB)

PAML-call-for-papers-2017 (Word docx 41 kB)

PAML-call-for-papers-2017 (txt 4 kB)

cfp in wikicfp

Technical Program from PAML 2016 – September 1, 2016, Session PAML I – 11:30 – 13:00

Speaker #1: Yoan MICHE, Nokia Bell Labs, Helsinki, FI
Data Anonymization as a Vector Quantization Problem: Control Over Privacy for Health Data. doi:10.1007/978-3-319-45507-5_13

Speaker #2: Andre CALERO-VALDEZ, Ziefle-Group RWTH Aachen, DE
An Open-Source Object-Graph-Mapping Framework for Neo4j and Scala: Renesca. doi:10.1007/978-3-319-45507-5_14

Technical Program September 1, 2016, Session PAML II – 14:00 – 16:00

Speaker #3: Sigal SHAKED, Rokach Group, Department of Information Systems Engineering, Ben Gurion University, IL
Publishing Differentially Private Medical Events Data. doi:10.1007/978-3-319-45507-5_15

Speaker #4: Katerina ZAMANI, Institute of Informatics and Telecommunications, NCSR Demokritos, GR
A Peer-to-Peer Protocol and System Architecture for Privacy-Preserving Statistical Analysis. doi:10.1007/978-3-319-45507-5_16

Speaker #5: Bernd MALLE, Holzinger Group HCI-KDD, Institute for Medical Informatics, Medical University Graz, AT
Towards Machine Learning on Perturbed Knowledge Bases. doi:10.1007/978-3-319-45507-5_17