The goal of this exam is to perform an analysis of data related to heart disease.
In particular, we want to explore the relationship between a
target variable (whether a patient has heart disease or not) and several other variables such as cholesterol level, age, …
The data is in the file
'heartData_simplified.csv', which is a cleaned and simplified version of the UCI heart disease data set.
We ask that you explore the data set and answer the questions in commented R code (or Rmd if you know how). Send your code to email@example.com by the 7th of August.
Do not hesitate to comment your code to explain your thought process to us and to detail the conclusions you draw from the analysis.
heartData <- read.csv('heartData_simplified.csv')
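As a starting point, a first look at a freshly loaded table could resemble the sketch below. So that it runs on its own, it writes a small simulated table to a temporary file and reads that back instead of the real file. The column names `age` and `chol` and all simulated values are assumptions for illustration; only `target` and `oldpeak` are names taken from this text.

```r
# Toy stand-in for 'heartData_simplified.csv'.
# ASSUMPTION: `age` and `chol` are hypothetical column names; values are simulated.
set.seed(1)
toy <- data.frame(
  age     = round(rnorm(50, mean = 55, sd = 9)),
  chol    = round(rnorm(50, mean = 245, sd = 50)),
  oldpeak = round(rexp(50, rate = 1), 1),
  target  = rbinom(50, size = 1, prob = 0.5)
)
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)

# Same call as for the real file, pointed at the toy file
heartToy <- read.csv(toy_path)

str(heartToy)               # column names and types
summary(heartToy)           # ranges and quartiles of each column
colSums(is.na(heartToy))    # number of missing values per column
table(heartToy$target)      # balance between the two outcome classes
```

Running the same four inspection calls on `heartData` should tell you which columns are numeric, which are categorical, and whether any values are missing.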
Perform a PCA on the
oldpeak features. Do the PCA axes help you visually distinguish patients along different categorical features such as
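One possible way to approach the PCA question is sketched below, using `prcomp` from base R on simulated data. The numeric columns `age` and `chol` and the categorical column `sex` are assumed names chosen for illustration; replace them with the features requested above.

```r
# Simulated data. ASSUMPTION: `age`, `chol` and `sex` are hypothetical columns.
set.seed(2)
n <- 100
toy <- data.frame(
  age     = rnorm(n, mean = 55, sd = 9),
  chol    = rnorm(n, mean = 245, sd = 50),
  oldpeak = rexp(n, rate = 1),
  sex     = factor(sample(c("F", "M"), n, replace = TRUE))
)

# PCA on the numeric features only; scaling puts them on a comparable footing
num_cols <- c("age", "chol", "oldpeak")
pca <- prcomp(toy[, num_cols], center = TRUE, scale. = TRUE)

# Proportion of variance carried by each axis
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Patients projected on the first two axes, colored by a categorical feature
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(toy$sex), pch = 19,
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(toy$sex),
       col = seq_along(levels(toy$sex)), pch = 19)
```

If the colors form distinct groups along an axis, that axis helps separate the categories; if they overlap completely, it does not.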
Perform a hierarchical clustering on all features but
target. Evaluate the quality of your clustering and explore the different options (distances, clustering methods).
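The comparison of distances and clustering methods could be organised as in the sketch below, which uses only base R (`dist`, `hclust`, `cophenetic`, `cutree`) on simulated data. The cophenetic correlation is just one possible quality measure, chosen here as an assumption; the column names `age` and `chol` are hypothetical.

```r
# Simulated data. ASSUMPTION: `age` and `chol` are hypothetical column names.
set.seed(3)
n <- 60
toy <- data.frame(
  age     = rnorm(n, mean = 55, sd = 9),
  chol    = rnorm(n, mean = 245, sd = 50),
  oldpeak = rexp(n, rate = 1),
  target  = rbinom(n, size = 1, prob = 0.5)
)

# All features except `target`, scaled so that no feature dominates the distance
features <- scale(toy[, setdiff(names(toy), "target")])

# Grid of distance / linkage combinations to compare
results <- expand.grid(distance = c("euclidean", "manhattan"),
                       linkage  = c("single", "complete", "average"),
                       stringsAsFactors = FALSE)
results$cophenetic <- NA_real_

for (i in seq_len(nrow(results))) {
  d  <- dist(features, method = results$distance[i])
  hc <- hclust(d, method = results$linkage[i])
  # Correlation between original distances and dendrogram (cophenetic) distances
  results$cophenetic[i] <- cor(as.vector(d), as.vector(cophenetic(hc)))
}
print(results[order(-results$cophenetic), ])

# Cut one tree into two groups and compare the groups with the held-out target
hc <- hclust(dist(features, method = "euclidean"), method = "average")
clusters <- cutree(hc, k = 2)
print(table(cluster = clusters, target = toy$target))
plot(hc, labels = FALSE)
```

A higher cophenetic correlation means the dendrogram preserves the original distances more faithfully; the cross-table shows whether the clusters line up with `target`, which was left out of the clustering.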
target using the other features with the training set
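A minimal sketch of predicting `target` from the other features is given below. It assumes a logistic regression fitted with `glm` and a random 70/30 split; both are illustrative choices, not requirements stated here, and if a training set has already been defined you should use that one. The data are simulated and the column names `age` and `chol` are hypothetical.

```r
# Simulated data. ASSUMPTION: `age` and `chol` are hypothetical column names,
# and the link between `oldpeak` and `target` is invented for the example.
set.seed(4)
n <- 200
toy <- data.frame(
  age     = rnorm(n, mean = 55, sd = 9),
  chol    = rnorm(n, mean = 245, sd = 50),
  oldpeak = rexp(n, rate = 1)
)
toy$target <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * toy$oldpeak))

# ASSUMPTION: random 70/30 split into training and test sets
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Logistic regression of target on all other features, fitted on the training set
fit <- glm(target ~ ., data = train, family = binomial)
print(summary(fit))

# Predicted probabilities on the test set, thresholded at 0.5
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

confusion <- table(predicted = pred, observed = test$target)
accuracy  <- mean(pred == test$target)
print(confusion)
print(accuracy)
```

Evaluating on rows the model has not seen gives a fairer picture of predictive performance than the fit on the training rows.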