Machine Learning and Knowledge Extraction (MAKE) following the HCI-KDD approach requires a concerted effort across topics ranging from data pre-processing to visualization. Together with our international research colleagues, we are interested in theoretical, algorithmic, and experimental studies in machine learning that address the problem of extracting knowledge from complex data in order to discover unknown unknowns.
We aim for excellence at an international level within an inspiring group atmosphere. We follow our guiding motto: Science is to test crazy ideas – Engineering is to put these ideas into Business. We enjoy thinking and we consider ourselves problem solvers. Most of all: We just do it!
If you are crazy enough: We are always scouting for talent. If you are interested in joining our group, please go to www.aholzinger.at and
1) watch the introduction video on [Youtube],
2) read the research statement [Research 5p.], and
3) read the teaching statement [Teaching 5p.]
If you are then still crazy enough to join our group, please send
one single pdf file containing:
1) your two-page scientific resume with a short paragraph on why you want to work with us,
2) one sample paper from your previous work, and, in case you want to do your PhD with us,
3) the completed PhD proposal, which can be found here:
and can be saved under a different name (your personal project name),
directly to email@example.com
Note: We are always happy to receive applications, but please understand that we do not respond to non-personalized or non-English applications, or to applications that obviously do not follow the criteria listed above or are not in our field of interest. Thank you!
Of course your own ideas and application domains (from Astronomy to Zoology 🙂) are highly welcome; thinking is not bound to any limits. However, in case you are interested in principle but have no idea where to start, you will find some starting points below.
Current as of April 5, 2017
A sample curriculum for the course Machine Learning for Health Informatics can be found in 
 Holzinger, A. 2016. Machine Learning for Health Informatics. In: Holzinger, A. (ed.) Machine Learning for Health Informatics: State-of-the-Art and Future Challenges, Lecture Notes in Artificial Intelligence LNAI 9605, Cham: Springer International Publishing, pp. 1-24, doi:10.1007/978-3-319-50478-0_1
 Agrawal, R., Golshan, B. & Papalexakis, E. 2016. Toward Data-Driven Design of Educational Courses: A Feasibility Study. Journal of Educational Data Mining (JEDM), 8, (1), 1-21.
 Calero Valdez, A., Ziefle, M., Verbert, K., Felfernig, A. & Holzinger, A. 2016. Recommender Systems for Health Informatics: State-of-the-Art and Future Perspectives. In: Holzinger, A. (ed.) Machine Learning for Health Informatics: State-of-the-Art and Future Challenges. Cham: Springer International Publishing, pp. 391-414, doi:10.1007/978-3-319-50478-0_20
 Thrun, S. 1996. Is learning the n-th thing any easier than learning the first? Advances in neural information processing systems (NIPS), 640-646.
Zonderland, M. E., Boucherie, R. J., Litvak, N. & Vleggeert-Lankamp, C. L. A. M. 2010. Planning and scheduling of semi-urgent surgeries. Health Care Management Science, 13, (3), 256-267, doi:10.1007/s10729-010-9127-6.
In particular, look at the work done by Maartje E. Zonderland, see:
Her doctoral thesis was published in the Springer Briefs in Health Care Management and Economics Series:
Zonderland, M. E. 2014. Appointment Planning in Outpatient Clinics and Diagnostic Facilities. Boston, MA: Springer,
and is available online here: http://doc.utwente.nl/79465/
A good example of current relevant work is:
Kshirsagar, M., Carbonell, J. & Klein-Seetharaman, J. 2013. Multitask learning for host–pathogen protein interactions. Bioinformatics, 29, (13), i217-i226.
General information about multi-task learning can be found in:
Caruana, R. 1997. Multitask Learning. Machine Learning, 28, (1), 41-75.
Lecture LV 706.315 Interactive Machine Learning
Please read this paper BEFORE you decide to do this work:
Akrour, R., Schoenauer, M. & Sebag, M. 2012. APRIL: Active Preference Learning-Based Reinforcement Learning. In: Flach, P. A., De Bie, T. & Cristianini, N. (eds.) Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science LNCS 7524. Berlin Heidelberg: Springer, pp. 116-131. Paper can be found here.
In order to tackle this problem, your work will consist of researching and evaluating potential approaches for merging several graphs into a single result graph and eliminating (likely) false information while retaining the (likely) true attributes needed to accurately describe the original image.
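As a possible starting point for the merging step, the following minimal sketch uses majority voting: an edge is kept only if it appears in a sufficient fraction of the input graphs. It assumes that every extracted graph is given as a collection of undirected edges over shared node identifiers. The function name merge_graphs_by_vote and the min_support threshold are hypothetical and not part of our existing framework; more elaborate approaches are exactly what this work should research and evaluate.

```python
from collections import Counter


def merge_graphs_by_vote(graphs, min_support=0.5):
    """Merge several edge sets into one, keeping edges seen in enough graphs.

    graphs: list of iterables of (u, v) pairs, treated as undirected edges.
    min_support: fraction of input graphs that must contain an edge
                 (hypothetical threshold, to be tuned experimentally).
    """
    if not graphs:
        return set()
    counts = Counter()
    for edges in graphs:
        # Normalise each edge so (a, b) and (b, a) count as the same edge,
        # and use a set so duplicates inside one graph are counted once.
        normalised = {tuple(sorted(edge)) for edge in edges}
        counts.update(normalised)
    needed = min_support * len(graphs)
    return {edge for edge, n in counts.items() if n >= needed}


# Toy input: three graphs that agree on some edges and disagree on others.
g1 = [("a", "b"), ("b", "c")]
g2 = [("b", "a"), ("c", "d")]
g3 = [("a", "b"), ("b", "c"), ("d", "e")]
merged = merge_graphs_by_vote([g1, g2, g3], min_support=0.5)
print(sorted(merged))
```

Edges that occur in only one of the three toy graphs are treated as likely false information and dropped, while edges supported by at least half of the graphs are retained.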
Traditionally, the browser has been unable to accomplish complex visualization tasks. However, since the advent of WebGL and the HTML canvas element some years ago, the field has gained traction and a cornucopia of visualization libraries has been developed, enabling activities ranging from simple 2D chart visualization of datasets to web-based game programming. As part of our ongoing research, students have already implemented 3D graph visualization using the three.js and scene.js libraries. Although they succeeded in their task, many performance problems and a lack of interaction possibilities remain. Therefore, in this work you will evaluate different routes to utilizing Canvas and WebGL, building upon the work already done but extending it to writing low-level WebGL yourself and to alternatives such as X3DOM (loosely speaking, a way to write WebGL like HTML). The emphasis of this project lies on performance, as we will need to visualize several thousand nodes (and about an order of magnitude more edges) in near real time. After comparing the alternatives, you will pick a winner and implement a proof of concept. As our framework for extracting graphs out of images inside the browser already exists, the results of your efforts could have a direct impact on our real-world application!
This is a great opportunity for hardcore programmers; further details (including payment) to be elaborated in a personal meeting!
 Davenport, T. H. & Glaser, J. (2002) Just-in-time delivery comes to knowledge management. Harvard Business Review, 80, 7, 107-111.
 Holzinger, A.; Simonic, K.-M.; Yildirim, P. (2012) Disease-disease relationships for rheumatic diseases Web-based biomedical textmining and knowledge discovery to assist medical decision making. In: IEEE COMPSAC, 36th Annual International Computer Software and Applications Conference [Preprint of the paper for download]
Biomedical discoveries are documented mostly in scientific articles. Web-based collections of such biomedical articles contain an exponentially increasing amount of text data. For example, MEDLINE (Medical Literature Analysis and Retrieval System Online) is the National Library of Medicine’s (NLM) premier bibliographic database and contains 19+ million references to journal articles in the life sciences and biomedicine. The records of this database are indexed with NLM Medical Subject Headings (MeSH). Even ten years ago, a typical medical practitioner had to stay up to date on more than 10,000 diseases and syndromes, more than 3,000 medications and more than 1,000 lab tests. It is therefore easy to understand that no human medical expert can manually turn all articles containing relevant information into knowledge; what experts can do, however, is ask questions and state hypotheses. Consequently, there is an urgent need for methods and tools that enable the expert to discover relevant hidden knowledge in those massive data sets. For this purpose, intelligent interactive text mining methods can be applied to assist the reader, e.g. to look for similarities or anomalies within these large volumes of non-standardized textual data. Statistical models can be used to evaluate the significance of the relationship between entities such as disease names, drug names, and keywords in titles, abstracts or the entire publication.
The volume of data in medical documents is increasing vastly, making it more and more difficult to discover relevant knowledge. One possible approach to harnessing this data deluge is topic modeling. Topic modeling algorithms (e.g. Latent Dirichlet Allocation, LDA) can be applied to large collections of documents, e.g. to find previously unknown patterns. A promising future direction for topic modeling is to develop new methods of interacting with and visualizing topics and corpora. Topic models provide new exploratory structures in large collections; the question remains how we can best exploit that structure to aid interactive knowledge discovery.
A big problem is how to visualize the topics. Generally, topics are visualized just by listing the most frequent keywords of a document; however, new ways of labeling such topics may be more effective. A further problem is how to best display a document with a topic model. At the document level, topic models provide potentially useful information about the structure of the document.
Topic modeling algorithms show much promise for uncovering meaningful thematic structure in large collections of documents. But making this structure useful requires careful attention to information visualization and the corresponding user interfaces. Consequently, the goal of this work is to develop possible solutions using medical documents and to test them experimentally in the medical domain.
Gong, X., Yang, Y., Lin, J. H. & Li, T. R. (2011) Expression Detection Based on a Novel Emotion Recognition Method. International Journal of Computational Intelligence Systems, 4, 1, 44-53.
Lopatovska, I. & Arapakis, I. (2011) Theories, methods and current research on emotions in library and information science, information retrieval and human–computer interaction. Information Processing & Management, 47, 4, 575-592.
Emotions are regarded as important mental and physiological states that influence perception and cognition, and thus behaviour. Consequently, emotion has an enormous influence on cognitive performance during all types of human activities. Example applications include decision making, recommender systems, learning etc. The concept of emotion has therefore been a topic of enormous interest in Human-Computer Interaction (HCI) for some time. Popular examples include stress detection or affective computing.
According to Johnson (2004), developers of visualization technologies do not spend enough (or indeed any) time endeavoring to understand the underlying science they are trying to represent, just as application scientists sometimes create crude visualizations without understanding the algorithms and the science of visualization. It is necessary to understand the underlying science, engineering, and medical applications. There is no substitute for working together with end users to create better techniques and tools for solving challenging scientific problems. Issues of cognition, perception and reasoning are of great importance, and tasks such as discovering patterns of change in the data will involve not only visualizing the data but also understanding how the data is changing. Such high-level discoveries can be used by the domain analyst to form, confirm, or refute a hypothesis, expand or correct mental models, and provide confidence in decision making processes, which is still the core research area in biomedical informatics.
 Foldit [Start page]
 Foldit [Nature Video]
 Cooper, S., Khatib, F., Treuille, A., Barbero, J., Lee, J., Beenen, M., Leaver-Fay, A., Baker, D., Popovic, Z. & Foldit players (2010) Predicting protein structures with a multiplayer online game. Nature, 466, 7307, 756-760.
 Kawrykow, A., Roumanis, G., Kam, A., Kwak, D., Leung, C., Wu, C., Zarour, E., Sarmenta, L., Blanchette, M., Waldispühl, J. & Phylo players (2012) Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment. Plos One, 7, 3, e31362.
 Ebner, M. & Holzinger, A. (2007) Successful Implementation of User-Centered Game Based Learning in Higher Education – an Example from Civil Engineering. Computers & Education, 49, 3, 873-890.
 Mayo, M. J. (2009) Video Games: A Route to Large-Scale STEM Education? Science, 323, 5910, 79-82.
 Holzinger, A., Nischelwitzer, A., Friedl, S. & Hu, B. (2010) Towards life long learning: three models for ubiquitous applications. Wireless Communications and Mobile Computing, 10, 10, 1350-1365.
 The concept of Life Long Learning [Video]
An excellent example of a science game is Foldit (Strauss, 2012). Protein structures are important for many purposes in medicine and the life sciences and are specifically studied in molecular biology and bioinformatics. Successful identification of the structural topologies of proteins enables experts to study and understand protein-protein interactions (PPI), which may lead to the creation of new proteins along with advances in the treatment of diseases and many other biomedical problems (Cooper et al., 2010). Foldit is an online puzzle game with the goal of folding the structure of selected proteins to the best of the player’s ability, using various tools provided within the game. The highest-scoring solutions are analyzed by researchers, who determine whether and to what extent there is a native structural configuration (or native state) that can be applied to the relevant proteins in the real world. This is also a good example of crowdsourcing. A further outstanding example is Phylo, a human-based computing framework that applies crowdsourcing techniques to the Multiple Sequence Alignment (MSA) problem; the key idea is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context (Kawrykow et al., 2012). The power of games for educational purposes has been demonstrated in several areas (Ebner & Holzinger, 2007), (Mayo, 2009), and games are also useful for inclusion in Life Long Learning scenarios (Holzinger et al., 2010).
The bedside paper patient chart (colloquially called the “fever curve”) is of invaluable importance for the daily work of the clinician. It is an important summary of medically relevant data and a substantial part of the patient documentation, necessary for the physician and for everyone concerned with patient care (nurses, therapists, shared care etc.). However, to date, no convincing, clinically usable and useful electronic solution is available. Previous attempts mostly failed because of the insufficient display resolution of the devices. Consequently, the recent Super AMOLED capacitive touchscreens with a pixel density of 300+ ppi could be a first chance to make such a chart of benefit for the clinician (previous work available; more information on demand).
 Pincus, S. M. (1991) Approximate Entropy as a measure of system complexity. Proceedings of the National Academy of Sciences of the United States of America, 88, 6, 2297-2301.
(more information on demand)
We can see time series data as a set of data points representing observations made over time. If the observations are made at regular intervals, we speak of an equally spaced time series. In medical practice, it is often not possible to obtain equally spaced time series (Simonic et al., 2011). The main problem is that standard methods may then model artefacts. By applying approximate entropy (ApEn), we are able to classify complex systems, which may include both deterministic chaotic and stochastic processes (Pincus, 1991).
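To make the idea of approximate entropy more concrete, here is a minimal sketch of ApEn(m, r) as defined by Pincus (1991), i.e. the difference between the average log-frequency of matching runs of length m and of length m + 1. The sketch assumes an equally spaced series and an absolute tolerance r; the default values m = 2 and r = 0.2 as well as the two toy series are assumptions chosen for illustration only, and handling unequally spaced clinical data is not covered here.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a list of numbers (Pincus, 1991).

    m: length of the compared runs (embedding dimension).
    r: absolute tolerance for two runs to count as similar (assumed value).
    """
    n = len(series)
    if n < m + 1:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        # All overlapping runs (template vectors) of length `dim`.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count runs within Chebyshev distance r of `a`; the run itself
            # is included, so the count is never zero.
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# A strictly periodic series versus a chaotic one (logistic map, toy data).
regular = [1.0, 2.0] * 30

chaotic = []
x = 0.4
for _ in range(60):
    chaotic.append(x)
    x = 3.9 * x * (1.0 - x)

regular_apen = approximate_entropy(regular)
chaotic_apen = approximate_entropy(chaotic)
print(regular_apen, chaotic_apen)
```

The periodic series yields a value close to zero, because knowing a run of length m almost fully determines the next value, whereas the chaotic series yields a clearly larger value.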
Computational problems in biology, medicine and the life sciences create masses of data, and the primary goal is sensemaking, i.e. to extract relevant patterns and trends and to gain knowledge from this data. Hastie, Tibshirani and Friedman call this “learning from data” (Hastie, Tibshirani & Friedman, 2009). The learning problems they consider are generally categorized as supervised learning (e.g. Bayesian statistics, nearest neighbor algorithm, support vector machines, etc.) or unsupervised learning (e.g. hierarchical clustering, principal component analysis, neural network models, etc.). In supervised learning, the goal is to predict the value of an outcome measure based on a number of input measures (set of training data), whereas in unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures (typically vectors).
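The supervised setting can be illustrated with the nearest neighbor algorithm mentioned above. The following minimal sketch predicts the outcome of a new input vector by copying the label of the closest training vector (Euclidean distance). The function name and the toy training data are hypothetical and serve only as an illustration.

```python
import math


def nearest_neighbour_predict(train_inputs, train_labels, query):
    """Return the label of the training vector closest to `query`."""
    if not train_inputs or len(train_inputs) != len(train_labels):
        raise ValueError("need one label per training vector")
    distances = [math.dist(x, query) for x in train_inputs]
    return train_labels[distances.index(min(distances))]


# Toy training data: input measures (vectors) with a known outcome measure.
train_inputs = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = ["low", "low", "high", "high"]

print(nearest_neighbour_predict(train_inputs, train_labels, (5.5, 4.8)))
print(nearest_neighbour_predict(train_inputs, train_labels, (0.2, 0.1)))
```

In the unsupervised setting the labels would be missing, and the task would instead be to describe the structure of the input vectors themselves, for example by grouping the two pairs of nearby points into clusters.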
 Auinger, A., Ebner, M., Nedbal, D. & Holzinger, A. (2009) Mixing Content and Endless Collaboration – MashUps: Towards Future Personal Learning Environments. In: Stephanidis, C. (Ed.) Universal Access in Human-Computer Interaction HCI, Part III: Applications and Services, HCI International 2009, Lecture Notes in Computer Science (LNCS 5616). Berlin, Heidelberg, New York, Springer, 14-23.
 Aghaee, S. & Pautasso, C. (2012) End-User Programming for Web Mashups. In: Harth, A. & Koch, N. (Eds.) Current Trends in Web Engineering. LNCS 7059. Berlin, Heidelberg, Springer, 347-351.
Cao, J., Riche, Y., Wiedenbeck, S., Burnett, M. & Grigoreanu, V. (2010). End-User Mashup Programming: Through the Design Lens. CHI 2010: 28th Annual Conference on Human Factors in Computing Systems, New York, Association of Computing Machinery, 1009-1018.
Holzinger, A., Mayr, S., Slany, W. & Debevc, M. (2010). The influence of AJAX on Web Usability. ICE-B 2010 - ICETE The International Joint Conference on e-Business and Telecommunications, Athens (Greece), INSTIC IEEE, 124-127.
Wong, J. & Hong, J. I. (2007). Making Mashups with Marmite: Towards End-User Programming for the Web. ACM Conference on Human Factors in Computing Systems, Vol 1 and 2, New York, Association of Computing Machinery, 1435-1444.
“The Web is filled with millions of customized applications, most created by end users themselves” (Adams, 2008). Hybrid web applications, so-called mash-ups, are a promising web service concept, allowing end users with very low computer literacy (i.e. with minimal or no programming experience) to tailor their software applications exactly to their tasks, their needs and their environment (Auinger et al., 2009). Whilst there are some examples from diverse areas, research in the area of clinical medicine and health care is highly necessary (Aghaee & Pautasso, 2012).
This work evaluates important information retrieval methods for use in medical informatics. The survey includes the following methods: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), Hierarchical Latent Dirichlet Allocation (hLDA), Vector Space Model (VSM), Semantic Vector Space Model (SVSM), Latent Semantic Mapping (LSM), and Principal Component Analysis (PCA). The work shall discuss their applicability for practical use in medicine and health care.
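As a small illustration of the simplest of these methods, the Vector Space Model, the following sketch represents each document as a sparse TF-IDF vector and compares documents by cosine similarity. Whitespace tokenisation, the plain log(N / df) weighting and the three toy documents are assumptions made for brevity; a real evaluation would use proper medical corpora and preprocessing.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn whitespace-tokenised documents into sparse TF-IDF vectors (dicts)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents, invented for illustration.
docs = [
    "insulin dosing for diabetes patients",
    "glucose management in diabetes care",
    "protein folding puzzle game",
]
vectors = tfidf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])    # share "diabetes"
sim_unrelated = cosine_similarity(vectors[0], vectors[2])  # no shared terms
print(sim_related, sim_unrelated)
```

The first two documents obtain a positive similarity because they share a term, while documents without common terms obtain a similarity of zero. This limitation of exact term matching is what latent methods such as LSA or LDA try to overcome.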
 Hawn, C. (2009) Take two aspirin and tweet me in the morning: how Twitter, Facebook, and other social media are reshaping health care. Health affairs, 28, 2, 361-368.
 Cheng, Z., Caverlee, J. & Lee, K. (2010). You are where you tweet: a content-based approach to geo-locating twitter users. ACM, 759-768.
Web Science encompasses the study of social networks, for example to gain knowledge from big social data. Social media make it possible to get an impression of the thoughts and activities of society at large. Consequently, Twitter data may help to answer questions or to support hypotheses that are otherwise hard to approach, because manually polling large groups of people is too time-consuming, too expensive or simply impossible (Savage, 2011), (Hawn, 2009). Analyzing such data is of high interest for public health measures. A typical example is the correlation between mined Twitter messages and actual influenza rates. There are many research and development possibilities for public health apps for Twitter, e.g. tracking illnesses over time. We call this syndromic surveillance, i.e. monitoring clinical syndromes that may have a significant impact on public health, which in turn affects medical resource allocation, health policy and education. Moreover, there are possibilities to measure behavioral risk factors, localize illnesses by geographic region, and/or analyze symptoms and medication usage (Cheng, Caverlee & Lee, 2010). Technical approaches include probabilistic topic models, e.g. Latent Dirichlet Allocation (LDA).
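The correlation mentioned above can be quantified, for instance, with the Pearson correlation coefficient between a weekly count of flu-related tweets and the officially reported influenza rate. The sketch below shows one way to compute it; the function name is hypothetical and the weekly numbers are invented toy values, not real surveillance data.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values: flu-related tweets per week and reported cases
# per 100,000 inhabitants for the same weeks.
weekly_flu_tweets = [120, 150, 310, 480, 450, 260]
weekly_flu_rate = [1.1, 1.4, 2.9, 4.2, 4.4, 2.5]

r = pearson_correlation(weekly_flu_tweets, weekly_flu_rate)
print(round(r, 3))
```

A coefficient close to 1 would indicate that the tweet volume rises and falls together with the reported rate; it does not by itself show that tweets predict influenza activity ahead of time.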
The decreasing general state of health caused by a lack of health consciousness, sedentary lifestyles and demographic changes will have dramatic effects on our health care system in years to come. An unhealthy lifestyle will lead to an increase in chronic diseases and eventually to increased costs in health care. A preventive measure against such a development can be to reinforce health awareness through the use of mobile applications supporting self-observation and behavior change. The aim of this project was the design and development of a mobile web application that assists people in changing their lifestyles by providing the means to manage their wellness-related activities and health risks. The application not only offers the means for wellness management but also attempts to create high motivation by adopting design goals created especially for supporting behavior change. A user study on the final prototype, including a questionnaire, showed very good usability ratings with a SUS score of 83.75. The majority of our respondents stated that the functions offered by the system could be useful for them and that they could imagine that using the application might motivate them to lead a healthier life. The goal-reward system and the summarizing feedback page were the most popular features among our test users. Usability issues discovered in the study included button size and spacing as well as the system’s reaction to tapping events. Neither age nor previous experience with computers or smartphones showed a significant influence on the users’ perceived usability or on their motivation to use the application.
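For readers unfamiliar with the System Usability Scale (SUS), the following sketch shows how a SUS score is usually computed: each of the ten items is answered on a scale from 1 to 5, odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the sum is multiplied by 2.5 to give a value between 0 and 100. A study-level score is then the mean over all respondents. The answers below are invented for illustration and are not the data of the study described above.

```python
def sus_score(responses):
    """Score one SUS questionnaire (ten answers, each from 1 to 5)."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - response
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score over several respondents."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Two invented respondents.
respondent_a = [5, 1, 5, 2, 4, 1, 5, 1, 4, 2]
respondent_b = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(respondent_a), sus_score(respondent_b))
print(mean_sus([respondent_a, respondent_b]))
```

Because each individual score is a multiple of 2.5, a value such as 83.75 can only arise as an average over several respondents.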
This master thesis deals with the design and development of a mobile Android application to support the inpatient glucose management of patients with diabetes at the University Hospital Graz, in order to optimize the current paper-based glucose management. An integrated decision support service for insulin dosing should provide additional security and support for clinicians. The master thesis was carried out in the course of the EU project REACTION at the Joanneum Research institute HEALTH (Institute for Biomedicine and Health Sciences) in Graz. The thesis is divided into two parts. The first part deals with an extensive requirements analysis, in order to get an idea of the design of the application’s user interface as well as to understand clinical workflow patterns. The design phase followed a user-centered design approach, which means that the end users were involved in every step of the design process. In the second part, the results of the requirements analysis were used to set up the implementation of the inpatient glucose management system. For reasons of maintainability and extensibility, it was decided to distinguish between a frontend application for user interactions and a platform-independent backend application, which contains the business logic for the decision support as well as the data storage and interfaces to the hospital information system. The exchange of data between the backend and the frontend is done via encrypted web services to provide data security. This master thesis primarily deals with the development of the frontend application and illustrates the experience gathered during the design and development process. It demonstrates the requirements and challenges of implementing safety-critical medical software and shows how the end user can be involved in the engineering process.
“The most rewarding research is the one that delights the thinker and at the same time is beneficial to humankind” (Christian Doppler, 1803‐1853). Consequently, we are devoted to our guiding principle: “Science is to test crazy ideas – Engineering is to put these ideas into Business” (A.Holzinger, 2011, Successful Management of R&D).