Practical Exercise I: Performance Evaluation with K-fold Cross-Validation

Published

October 22, 2024

Compute In-sample Performance Estimate

In our first practical exercise, we will demonstrate how to use mlr3 to fit a standard linear regression model to the PhoneStudy dataset by predicting the sociability personality trait score based on all variables of aggregated smartphone usage behavior. Then we compare in-sample and out-of-sample predictive performance based on \(R^2\) and \(RMSE\).

First, we load the PhoneStudy dataset and remove some administrative variables we do not use in our tutorial. We also remove 4 participants who did not report their gender. You can download the dataset here. Note that the dataset is not a .csv file but instead a .RDS file that can be loaded into R with the readRDS function.

phonedata <- readRDS(file = "clusterdata.RDS")
phonedata <- phonedata[complete.cases(phonedata$gender),]
phonedata <- phonedata[, c(1:1821, 1823, 1837)]

We load the mlr3verse R package (Lang and Schratz 2021), which conveniently loads mlr3 and the most important companion packages. Then we create a task object with a unique id (Sociability_Regr), which is mlr3’s way of storing the raw data along with some meta-information for modeling. In mlr3, a task defines a specific prediction problem, here supervised regression with the sociability trait score (named E2.Sociableness in our dataset) as the target.[1]

library(mlr3verse)
Loading required package: mlr3
task_Soci <- as_task_regr(phonedata, id = "Sociability_Regr",
  target = "E2.Sociableness")

The meta-data can be displayed by printing the task object (type task_Soci). When training a model on a task, mlr3 by default uses all variables except the target as features. We do not want to use gender as a feature although we want to check our models for gender fairness in a later module. Therefore we remove gender from the set of features but keep it within the task object.

task_Soci$set_col_roles("gender", remove_from = "feature")

We recommend always double-checking which variables are actually used as features (you can get the full list of feature names with task_Soci$col_roles$feature), because including the wrong variables is a common source of embarrassing mistakes that can completely invalidate the whole analysis.
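The check recommended above can be tried out on a small toy table. This is only a sketch: the data frame toy and its column names are made up for illustration and are not part of the PhoneStudy dataset; the mlr3 calls are the same ones used in this exercise, plus $feature_names and the task backend.

```r
library(mlr3)

# Hypothetical toy data (NOT the PhoneStudy dataset)
toy <- data.frame(
  score  = c(1.2, -0.4, 0.8, 0.1, -1.3, 0.5),
  calls  = c(10, 3, 8, 5, 1, 7),
  apps   = c(40, 22, 35, 30, 12, 28),
  gender = factor(c("f", "m", "f", "m", "f", "m"))
)

task_toy <- as_task_regr(toy, id = "toy_regr", target = "score")

# Remove gender from the feature role; the column itself stays in the task's backend
task_toy$set_col_roles("gender", remove_from = "feature")

# Inspect which columns will actually be used as features
print(task_toy$feature_names)

# Check that gender is still stored in the underlying data
print("gender" %in% task_toy$backend$colnames)
```

Printing the feature names right after changing column roles is a cheap habit that catches accidental inclusion of identifiers, targets of other analyses, or protected attributes.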

Next we create a learner object, which specifies the ML model to apply later. mlr3 does not implement its own ML models but links to available implementations in other R packages. For example, the id regr.lm links to the ordinary lm function in the stats package. You can find a list of mlr3 ids for the most popular ML models in the mlr3 e-book (Becker et al. 2022).

lm <- lrn("regr.lm")

We try to train the learner on the task (i.e., estimate the model parameters). In mlr3, objects have “abilities” (also called methods) that are invoked with the $-syntax shown in the following line; here, the train method of the learner object is used to train the learner on a specified task.

lm$train(task = task_Soci)
Error: Task 'Sociability_Regr' has missing values in column(s) 'AR_num_calls_in1', 'AR_num_calls_in12', 'AR_num_calls_in24', 'AR_num_calls_in4', 'AR_num_calls_in8', 'AR_num_calls_out1', 'AR_num_calls_out12', 'AR_num_calls_out24', 'AR_num_calls_out4', 'AR_num_calls_out8', 'AR_num_calls_ring1', 'AR_num_calls_ring12', 'AR_num_calls_ring24', 'AR_num_calls_ring4', 'AR_num_calls_ring8', 'IVI_call_in', 'IVI_call_in_week', 'IVI_call_in_weekend', 'IVI_call_miss', 'IVI_call_miss_week', 'IVI_call_miss_weekend', 'IVI_call_out', 'IVI_call_out_week', 'IVI_call_out_weekend', 'IVI_call_ring', 'IVI_call_ring_week', 'IVI_call_ring_weekend', 'IVI_call_week', 'IVI_call_weekend', 'IVI_calls', 'IV_Academia', 'IV_Artistic_Hobby', 'IV_Beauty', 'IV_Betting_Risk', 'IV_Calculator', 'IV_Calendar_Apps', 'IV_Calling', 'IV_Camera', 'IV_Checkup_Monitoring', 'IV_ComicsBooks', 'IV_Dating_Mating', 'IV_E_Mail', 'IV_Eating', 'IV_Education', 'IV_Emergency_Warning', 'IV_Entertainment', 'IV_Financial', 'IV_Gallery', 'IV_Gaming_Action', 'IV_Gaming_Adventure', 'IV_Gaming_Casual', 'IV_Gaming_Knowledge', 'IV_Gaming_Logic', 'IV_Gaming_Role_Playing', 'IV_Gaming_Simulation', 'IV_Gaming_Sports', 'IV_Gaming_Strategy', 'IV_Gaming_Tools_Community', 'IV_Health_SelfMonitoring', 'IV_Image_And_Video_Editing', 'IV_Internet_Browser', 'IV_Jobs_Additional_Income', 'IV_Language_Learning', 'IV_Messaging', 'IV_Music_Audio_Radio', 'IV_News_Magazines', 'IV_Note_Apps', 'IV_Office_Tools', 'IV_Organisation', 'IV_Orientation', 'IV_Personalization', 'IV_Private_Transportation', 'IV_Provider_Services', 'IV_Religion_Spirituality_Esotersim', 'IV_Security', 'IV_Settings', 'IV_Shared_Transportation', 'IV_Sharing_Cloud', 'IV_Shop_Sell_Rent', 'IV_Shopping', 'IV_Sleep', 'IV_Social_Media_Tools', 'IV_Social_Networks', 'IV_Sport_News', 'IV_Sports', 'IV_System_App', 'IV_TVVideo_Apps', 'IV_TV_Film_Guide', 'IV_Timer_Clocks', 'IV_Tools', 'IV_Travel', 'IV_Weather', 'IV_Womens_Apps', 'IV_Workout', 'IV_apps', 'IV_unlock', 
'Qn_dur_.app.me.phonestudy_restructured', 'Qn_dur_.canvasm.myo2', 'Qn_dur_.cc.dict.dictcc', 'Qn_dur_.ch.threema.app', 'Qn_dur_.cn.wps.moffice_eng', 'Qn_dur_.com.adobe.reader', 'Qn_dur_.com.airbnb.android', 'Qn_dur_.com.amazon.avod.thirdpartyclient', 'Qn_dur_.com.amazon.mShop.android.shopping', 'Qn_dur_.com.amazon.mp3', 'Qn_dur_.com.android.bluetooth', 'Qn_dur_.com.android.browser', 'Qn_dur_.com.android.calculator2', 'Qn_dur_.com.android.calendar', 'Qn_dur_.com.android.chrome', 'Qn_dur_.com.android.contacts', 'Qn_dur_.com.android.deskclock', 'Qn_dur_.com.android.dialer', 'Qn_dur_.com.android.email', 'Qn_dur_.com.android.launcher3', 'Qn_dur_.com.android.mediacenter', 'Qn_dur_.com.android.phone', 'Qn_dur_.com.android.printspooler', 'Qn_dur_.com.android.settings', 'Qn_dur_.com.android.soundrecorder', 'Qn_dur_.com.android.systemui', 'Qn_dur_.com.android.vending', 'Qn_dur_.com.antivirus', 'Qn_dur_.com.appseleration.android.selfcare', 'Qn_dur_.com.audible.application', 'Qn_dur_.com.avast.android.mobilesecurity', 'Qn_dur_.com.avira.android', 'Qn_dur_.com.baviux.pillreminder', 'Qn_dur_.com.bitstrips.imoji', 'Qn_dur_.com.cleanmaster.mguard', 'Qn_dur_.com.cleanmaster.security', 'Qn_dur_.com.comuto', 'Qn_dur_.com.devuni.flashlight', 'Qn_dur_.com.dn.drivenow', 'Qn_dur_.com.doodle.android', 'Qn_dur_.com.dropbox.android', 'Qn_dur_.com.duolingo', 'Qn_dur_.com.ebay.kleinanzeigen', 'Qn_dur_.com.ebay.mobile', 'Qn_dur_.com.estrongs.android.pop', 'Qn_dur_.com.evernote', 'Qn_dur_.com.example.android.notepad', 'Qn_dur_.com.facebook.katana', 'Qn_dur_.com.facebook.orca', 'Qn_dur_.com.google.android.GoogleCamera', 'Qn_dur_.com.google.android.apps.docs', 'Qn_dur_.com.google.android.apps.docs.editors.docs', 'Qn_dur_.com.google.android.apps.docs.editors.sheets', 'Qn_dur_.com.google.android.apps.genie.geniewidget', 'Qn_dur_.com.google.android.apps.magazines', 'Qn_dur_.com.google.android.apps.maps', 'Qn_dur_.com.google.android.apps.messaging', 'Qn_dur_.com.google.android.apps.paidtasks', 
'Qn_dur_.com.google.android.apps.photos', 'Qn_dur_.com.google.android.apps.plus', 'Qn_dur_.com.google.android.apps.translate', 'Qn_dur_.com.google.android.calculator', 'Qn_dur_.com.google.android.calendar', 'Qn_dur_.com.google.android.deskclock', 'Qn_dur_.com.google.android.gm', 'Qn_dur_.com.google.android.googlequicksearchbox', 'Qn_dur_.com.google.android.keep', 'Qn_dur_.com.google.android.music', 'Qn_dur_.com.google.android.play.games', 'Qn_dur_.com.google.android.talk', 'Qn_dur_.com.google.android.youtube', 'Qn_dur_.com.groupon', 'Qn_dur_.com.here.app.maps', 'Qn_dur_.com.hm', 'Qn_dur_.com.htc.Weather', 'Qn_dur_.com.htc.album', 'Qn_dur_.com.htc.android.worldclock', 'Qn_dur_.com.htc.calendar', 'Qn_dur_.com.htc.camera', 'Qn_dur_.com.htc.contacts', 'Qn_dur_.com.htc.launcher', 'Qn_dur_.com.htc.music', 'Qn_dur_.com.htc.sense.browser', 'Qn_dur_.com.htc.sense.mms', 'Qn_dur_.com.htc.video', 'Qn_dur_.com.huawei.android.launcher', 'Qn_dur_.com.huawei.android.totemweather', 'Qn_dur_.com.huawei.camera', 'Qn_dur_.com.huawei.hidisk', 'Qn_dur_.com.huawei.hwvplayer', 'Qn_dur_.com.huawei.vassistant', 'Qn_dur_.com.infraware.polarisviewer4', 'Qn_dur_.com.infraware.polarisviewer5', 'Qn_dur_.com.instagram.android', 'Qn_dur_.com.intsig.camscanner', 'Qn_dur_.com.king.candycrushsaga', 'Qn_dur_.com.king.candycrushsodasaga', 'Qn_dur_.com.mdv.AVVCompanion', 'Qn_dur_.com.mdv.companion', 'Qn_dur_.com.melodis.midomiMusicIdentifier.freemium', 'Qn_dur_.com.microsoft.office.excel', 'Qn_dur_.com.microsoft.office.outlook', 'Qn_dur_.com.microsoft.office.powerpoint', 'Qn_dur_.com.microsoft.office.word', 'Qn_dur_.com.microsoft.skydrive', 'Qn_dur_.com.mobisystems.fileman', 'Qn_dur_.com.mobisystems.office', 'Qn_dur_.com.motorola.cameraone', 'Qn_dur_.com.netbiscuits.kicker', 'Qn_dur_.com.netflix.mediaclient', 'Qn_dur_.com.nianticlabs.pokemongo', 'Qn_dur_.com.ninegag.android.app', 'Qn_dur_.com.nintendo.zaca', 'Qn_dur_.com.oneplus.camera', 'Qn_dur_.com.paypal.android.p2pmobile', 
'Qn_dur_.com.picsart.studio', 'Qn_dur_.com.pinterest', 'Qn_dur_.com.pons.onlinedictionary', 'Qn_dur_.com.popularapp.periodcalendar', 'Qn_dur_.com.runtastic.android', 'Qn_dur_.com.samsung.android.app.galaxyfinder', 'Qn_dur_.com.samsung.android.app.memo', 'Qn_dur_.com.samsung.android.app.scrollcapture', 'Qn_dur_.com.samsung.android.calendar', 'Qn_dur_.com.samsung.android.contacts', 'Qn_dur_.com.samsung.android.email.provider', 'Qn_dur_.com.samsung.android.incallui', 'Qn_dur_.com.samsung.android.lool', 'Qn_dur_.com.samsung.android.messaging', 'Qn_dur_.com.samsung.android.qconnect', 'Qn_dur_.com.samsung.android.sm', 'Qn_dur_.com.samsung.android.themestore', 'Qn_dur_.com.samsung.android.video', 'Qn_dur_.com.samsung.android.weather', 'Qn_dur_.com.sec.android.app.camera', 'Qn_dur_.com.sec.android.app.clockpackage', 'Qn_dur_.com.sec.android.app.controlpanel', 'Qn_dur_.com.sec.android.app.fm', 'Qn_dur_.com.sec.android.app.launcher', 'Qn_dur_.com.sec.android.app.memo', 'Qn_dur_.com.sec.android.app.music', 'Qn_dur_.com.sec.android.app.myfiles', 'Qn_dur_.com.sec.android.app.popupcalculator', 'Qn_dur_.com.sec.android.app.samsungapps', 'Qn_dur_.com.sec.android.app.sbrowser', 'Qn_dur_.com.sec.android.app.shealth', 'Qn_dur_.com.sec.android.app.taskmanager', 'Qn_dur_.com.sec.android.app.videoplayer', 'Qn_dur_.com.sec.android.app.voicenote', 'Qn_dur_.com.sec.android.app.voicerecorder', 'Qn_dur_.com.sec.android.app.wallpaperchooser', 'Qn_dur_.com.sec.android.emergencylauncher', 'Qn_dur_.com.sec.android.gallery3d', 'Qn_dur_.com.sec.android.mimage.photoretouching', 'Qn_dur_.com.sec.android.wallpapercropper2', 'Qn_dur_.com.sec.android.widgetapp.ap.hero.accuweather', 'Qn_dur_.com.sec.android.widgetapp.diotek.smemo', 'Qn_dur_.com.shazam.android', 'Qn_dur_.com.shpock.android', 'Qn_dur_.com.skype.raider', 'Qn_dur_.com.snapchat.android', 'Qn_dur_.com.socialnmobile.dictapps.notepad.color.note', 'Qn_dur_.com.sonyericsson.advancedwidget.weather', 'Qn_dur_.com.sonyericsson.album', 
'Qn_dur_.com.sonyericsson.android.camera', 'Qn_dur_.com.sonyericsson.android.socialphonebook', 'Qn_dur_.com.sonyericsson.conversations', 'Qn_dur_.com.sonyericsson.extras.liveware', 'Qn_dur_.com.sonyericsson.home', 'Qn_dur_.com.sonyericsson.music', 'Qn_dur_.com.sonyericsson.organizer', 'Qn_dur_.com.sonyericsson.photoeditor', 'Qn_dur_.com.sonyericsson.video', 'Qn_dur_.com.sonymobile.android.contacts', 'Qn_dur_.com.sonymobile.android.dialer', 'Qn_dur_.com.sonymobile.calendar', 'Qn_dur_.com.sonymobile.entrance', 'Qn_dur_.com.soundcloud.android', 'Qn_dur_.com.spotify.music', 'Qn_dur_.com.starfinanz.mobile.android.pushtan', 'Qn_dur_.com.starfinanz.smob.android.sfinanzstatus', 'Qn_dur_.com.supercell.clashofclans', 'Qn_dur_.com.supercell.clashroyale', 'Qn_dur_.com.surpax.ledflashlight.panel', 'Qn_dur_.com.tellm.android.app', 'Qn_dur_.com.tinder', 'Qn_dur_.com.tumblr', 'Qn_dur_.com.twitter.android', 'Qn_dur_.com.urbandroid.sleep', 'Qn_dur_.com.valvesoftware.android.steam.community', 'Qn_dur_.com.viber.voip', 'Qn_dur_.com.vlingo.midas', 'Qn_dur_.com.wetter.androidclient', 'Qn_dur_.com.whatsapp', 'Qn_dur_.com.wunderkinder.wunderlistandroid', 'Qn_dur_.com.xing.android', 'Qn_dur_.com.yahoo.mobile.client.android.mail', 'Qn_dur_.com.yopeso.lieferando', 'Qn_dur_.com.zdf.android.mediathek', 'Qn_dur_.de.amazon.mShop.android', 'Qn_dur_.de.axelspringer.yana.zeropage', 'Qn_dur_.de.bfv.android', 'Qn_dur_.de.burgerking.kingfinder', 'Qn_dur_.de.cellular.focus', 'Qn_dur_.de.cellular.tagesschau', 'Qn_dur_.de.eplus.mappecc.client.android.alditalk', 'Qn_dur_.de.fiducia.smartphone.android.banking.vr', 'Qn_dur_.de.flixbus.app', 'Qn_dur_.de.gmx.mobile.android.mail', 'Qn_dur_.de.hafas.android.db', 'Qn_dur_.de.hafas.android.sbm', 'Qn_dur_.de.is24.android', 'Qn_dur_.de.kicktipp.mbookmark', 'Qn_dur_.de.kleiderkreisel', 'Qn_dur_.de.lieferheld.android', 'Qn_dur_.de.lineas.lit.ntv.android', 'Qn_dur_.de.lmuroomfinder.release', 'Qn_dur_.de.mensaplan.app.android.muenchen', 'Qn_dur_.de.mobile.android.app', 
'Qn_dur_.de.motain.iliga', 'Qn_dur_.de.payback.client.android', 'Qn_dur_.de.pixelhouse', 'Qn_dur_.de.schildbach.oeffi', 'Qn_dur_.de.sde.mobile', 'Qn_dur_.de.spiegel.android.app.spon', 'Qn_dur_.de.swm.mvgfahrinfo.muenchen', 'Qn_dur_.de.tagesschau', 'Qn_dur_.de.telekom.mds.mbp', 'Qn_dur_.de.tum.in.tumcampus', 'Qn_dur_.de.tvspielfilm', 'Qn_dur_.de.web.mobile.android.mail', 'Qn_dur_.de.wetteronline.wetterapp', 'Qn_dur_.de.zalando.mobile', 'Qn_dur_.de.zeit.online', 'Qn_dur_.flipboard.boxer.app', 'Qn_dur_.kik.android', 'Qn_dur_.net.lovoo.android', 'Qn_dur_.org.leo.android.dict', 'Qn_dur_.org.mozilla.firefox', 'Qn_dur_.org.telegram.messenger', 'Qn_dur_.org.thoughtcrime.securesms', 'Qn_dur_.org.wikipedia', 'Qn_dur_.pm.lamm.myandroidlogger', 'Qn_dur_.se.feomedia.quizkampen.de.lite', 'Qn_dur_.se.feomedia.quizkampen.de.premium', 'Qn_dur_.tunein.player', 'Qn_dur_.tv.peel.app', 'Qn_dur_.tv.peel.smartremote', 'Qn_dur_.tv.twitch.android.app', 'Qn_dur_.uk.amazon.mShop.android', 'Qn_dur_Academia', 'Qn_dur_Activism_Charity', 'Qn_dur_Artistic_Hobby', 'Qn_dur_Beauty', 'Qn_dur_Betting_Risk', 'Qn_dur_Calculator', 'Qn_dur_Calendar_Apps', 'Qn_dur_Calling', 'Qn_dur_Camera', 'Qn_dur_Checkup_Monitoring', 'Qn_dur_ComicsBooks', 'Qn_dur_Dating_Mating', 'Qn_dur_E_Mail', 'Qn_dur_Eating', 'Qn_dur_Education', 'Qn_dur_Emergency_Warning', 'Qn_dur_Entertainment', 'Qn_dur_Financial', 'Qn_dur_Gallery', 'Qn_dur_Gaming_Action', 'Qn_dur_Gaming_Adventure', 'Qn_dur_Gaming_Casual', 'Qn_dur_Gaming_Knowledge', 'Qn_dur_Gaming_Logic', 'Qn_dur_Gaming_Role_Playing', 'Qn_dur_Gaming_Simulation', 'Qn_dur_Gaming_Sports', 'Qn_dur_Gaming_Strategy', 'Qn_dur_Gaming_Tools_Community', 'Qn_dur_Group_Activity', 'Qn_dur_Health_SelfMonitoring', 'Qn_dur_Image_And_Video_Editing', 'Qn_dur_Internet_Browser', 'Qn_dur_Jobs_Additional_Income', 'Qn_dur_Language_Learning', 'Qn_dur_Messaging', 'Qn_dur_Music_Audio_Radio', 'Qn_dur_News_Magazines', 'Qn_dur_Note_Apps', 'Qn_dur_Office_Tools', 'Qn_dur_Organisation', 'Qn_dur_Orientation', 
'Qn_dur_Personalization', 'Qn_dur_Private_Transportation', 'Qn_dur_Provider_Services', 'Qn_dur_Public_Events', 'Qn_dur_Religion_Spirituality_Esotersim', 'Qn_dur_Security', 'Qn_dur_Settings', 'Qn_dur_Shared_Transportation', 'Qn_dur_Sharing_Cloud', 'Qn_dur_Shop_Sell_Rent', 'Qn_dur_Shopping', 'Qn_dur_Sleep', 'Qn_dur_Social_Media_Tools', 'Qn_dur_Social_Networks', 'Qn_dur_Sport_News', 'Qn_dur_Sports', 'Qn_dur_System_App', 'Qn_dur_TVVideo_Apps', 'Qn_dur_TV_Film_Guide', 'Qn_dur_Timer_Clocks', 'Qn_dur_Tools', 'Qn_dur_Travel', 'Qn_dur_Weather', 'Qn_dur_Womens_Apps', 'Qn_dur_Workout', 'Qn_dur_call_ring', 'Qn_firstevent', 'Qn_firstevent_weekdays', 'Qn_firstevent_weekends', 'Qn_lastevent', 'Qn_lastevent_weekdays', 'Qn_lastevent_weekend', 'Qn_rog', 'Qn_rog_weekdays', 'Qn_rog_weekends', 'Responses_calls', 'Responses_sms', 'SDD', 'SDD_daytime', 'SDD_nighttime', 'SDD_sat', 'SDD_weekday', 'SDD_weekend', 'app_simi_dayNight', 'app_simi_weekWeekend', 'contact_simi_call_inOut', 'contact_simi_call_weekWeekend', 'contact_simi_smsPhone', 'contact_simi_sms_inOut', 'daily_huber_homevisits', 'daily_huber_homevisits_weekday', 'daily_huber_homevisits_weekend', 'daily_mean_duration_music', 'daily_mean_duration_music_weekdays', 'daily_mean_duration_music_weekend', 'daily_mean_elev_change', 'daily_mean_elev_change_weekdays', 'daily_mean_elev_change_weekend', 'daily_mean_neg_elev_change', 'daily_mean_num_clusters', 'daily_mean_num_clusters_week', 'daily_mean_num_clusters_weekend', 'daily_mean_pos_elev_change', 'daily_sd_duration_music', 'daily_sd_elev_change', 'daily_sd_homevisits', 'daily_sd_homevisits_weekday', 'daily_sd_homevisits_weekend', 'daily_sd_num_song', 'daily_sd_num_uniq_alb', 'daily_sd_num_uniq_art', 'daily_sd_num_uniq_song', 'daily_sd_sum_intereventall', 'durationHome', 'entropy_contacts_call_in', 'entropy_contacts_call_miss', 'entropy_contacts_call_out', 'entropy_contacts_call_ring', 'entropy_contacts_sms_in', 'entropy_contacts_sms_sent', 'entropy_duration_clusters', 
'excess_music_acousticness', 'excess_music_danceability', 'excess_music_energy', 'excess_music_instrumentalness', 'excess_music_liveness', 'excess_music_loudness', 'excess_music_popularity', 'excess_music_speechiness', 'excess_music_tempo', 'excess_music_valence', 'fav1_acousticness', 'fav1_daily_mean_duration', 'fav1_daily_mean_num', 'fav1_danceability', 'fav1_energy', 'fav1_instrumentalness', 'fav1_liveness', 'fav1_loudness', 'fav1_popularity', 'fav1_speechiness', 'fav1_tempo', 'fav1_valence', 'fav2_acousticness', 'fav2_daily_mean_duration', 'fav2_daily_mean_num', 'fav2_danceability', 'fav2_energy', 'fav2_instrumentalness', 'fav2_liveness', 'fav2_loudness', 'fav2_popularity', 'fav2_speechiness', 'fav2_tempo', 'fav2_valence', 'fav3_acousticness', 'fav3_daily_mean_duration', 'fav3_daily_mean_num', 'fav3_danceability', 'fav3_energy', 'fav3_instrumentalness', 'fav3_liveness', 'fav3_loudness', 'fav3_popularity', 'fav3_speechiness', 'fav3_tempo', 'fav3_valence', 'fav4_acousticness', 'fav4_daily_mean_duration', 'fav4_daily_mean_num', 'fav4_danceability', 'fav4_energy', 'fav4_instrumentalness', 'fav4_liveness', 'fav4_loudness', 'fav4_popularity', 'fav4_speechiness', 'fav4_tempo', 'fav4_valence', 'fav5_acousticness', 'fav5_daily_mean_duration', 'fav5_daily_mean_num', 'fav5_danceability', 'fav5_energy', 'fav5_instrumentalness', 'fav5_liveness', 'fav5_loudness', 'fav5_popularity', 'fav5_speechiness', 'fav5_tempo', 'fav5_valence', 'huberM_daily_max_dist_home', 'huberM_daily_max_dist_home_weekday', 'huberM_daily_max_dist_home_weekend', 'huberM_daily_time_spent_home', 'huberM_distance_covered_daily', 'huberM_distance_covered_weekday', 'huberM_distance_covered_weekend', 'huberM_dur_.com.android.deskclock', 'huberM_dur_.com.android.phone', 'huberM_dur_.com.android.settings', 'huberM_dur_.com.android.systemui', 'huberM_dur_.com.estrongs.android.pop', 'huberM_dur_.com.facebook.orca', 'huberM_dur_.com.google.android.calendar', 'huberM_dur_.com.google.android.talk', 
'huberM_dur_.com.huawei.vassistant', 'huberM_dur_.com.netflix.mediaclient', 'huberM_dur_.com.samsung.android.app.galaxyfinder', 'huberM_dur_.com.samsung.android.incallui', 'huberM_dur_.com.samsung.android.messaging', 'huberM_dur_.com.sec.android.app.fm', 'huberM_dur_.com.sec.android.app.taskmanager', 'huberM_dur_.com.sonymobile.entrance', 'huberM_dur_.com.surpax.ledflashlight.panel', 'huberM_dur_.flipboard.app', 'huberM_dur_.tv.peel.smartremote', 'huberM_dur_Calendar_Apps', 'huberM_dur_Checkup_Monitoring', 'huberM_dur_Gaming_Tools_Community', 'huberM_dur_Music_Audio_Radio', 'huberM_dur_Personalization', 'huberM_dur_Settings', 'huberM_dur_Timer_Clocks', 'huberM_dur_Tools', 'huberM_dur_night_Calculator', 'huberM_dur_night_Calendar_Apps', 'huberM_dur_night_Calling', 'huberM_dur_night_Camera', 'huberM_dur_night_Gallery', 'huberM_dur_night_Gaming_Strategy', 'huberM_dur_night_Gaming_Tools_Community', 'huberM_dur_night_Language_Learning', 'huberM_dur_night_Music_Audio_Radio', 'huberM_dur_night_Organisation', 'huberM_dur_night_Private_Transportation', 'huberM_dur_night_Settings', 'huberM_dur_night_TVVideo_Apps', 'huberM_dur_night_Timer_Clocks', 'huberM_dur_night_Tools', 'huberM_firstevent', 'huberM_firstevent_weekdays', 'huberM_firstevent_weekends', 'huberM_lastevent', 'huberM_lastevent_weekdays', 'huberM_lastevent_weekend', 'huberM_max_dist_two_points_daily', 'huberM_max_dist_two_points_weekday', 'huberM_max_dist_two_points_weekend', 'huberM_rog_daily', 'huberM_rog_nightly', 'huberM_rog_weekdays', 'huberM_rog_weekends', 'huberM_time_spent_home', 'huberM_time_spent_home_weekday', 'huberM_time_spent_home_weekend', 'maxDistance', 'max_distance_home', 'max_elevation', 'max_elevation_weekdays', 'max_elevation_weekends', 'max_music_acousticness', 'max_music_danceability', 'max_music_energy', 'max_music_instrumentalness', 'max_music_liveness', 'max_music_loudness', 'max_music_popularity', 'max_music_speechiness', 'max_music_tempo', 'max_music_valence', 'mean_charge_conn', 
'mean_charge_dis', 'mean_dur_wakeLeave', 'mean_dur_wakeLeaveHome', 'mean_dur_wakeLeaveHome_weekday', 'mean_dur_wakeLeaveHome_weekend', 'mean_elevation', 'mean_elevation_weekdays', 'mean_elevation_weekends', 'mean_music_acousticness', 'mean_music_acousticness_weekday', 'mean_music_acousticness_weekend', 'mean_music_danceability', 'mean_music_danceability_weekday', 'mean_music_danceability_weekend', 'mean_music_energy', 'mean_music_energy_weekday', 'mean_music_energy_weekend', 'mean_music_explicit', 'mean_music_explicit_weekday', 'mean_music_explicit_weekend', 'mean_music_instrumentalness', 'mean_music_instrumentalness_weekday', 'mean_music_instrumentalness_weekend', 'mean_music_liveness', 'mean_music_liveness_weekday', 'mean_music_liveness_weekend', 'mean_music_loudness', 'mean_music_loudness_weekday', 'mean_music_loudness_weekend', 'mean_music_mode', 'mean_music_mode_weekday', 'mean_music_mode_weekend', 'mean_music_popularity', 'mean_music_popularity_weekday', 'mean_music_popularity_weekend', 'mean_music_speechiness', 'mean_music_speechiness_weekday', 'mean_music_speechiness_weekend', 'mean_music_tempo', 'mean_music_tempo_weekday', 'mean_music_tempo_weekend', 'mean_music_valence', 'mean_music_valence_weekday', 'mean_music_valence_weekend', 'mean_time_LeaveHome', 'mean_time_LeaveHome_weekday', 'mean_time_LeaveHome_weekend', 'mean_time_callback', 'mean_time_firstLeave', 'mean_time_lastHome', 'mean_time_lastHome_weekday', 'mean_time_lastHome_weekend', 'min_music_acousticness', 'min_music_danceability', 'min_music_energy', 'min_music_instrumentalness', 'min_music_liveness', 'min_music_loudness', 'min_music_popularity', 'min_music_speechiness', 'min_music_tempo', 'min_music_valence', 'perc_music_key_0', 'perc_music_key_0_weekday', 'perc_music_key_0_weekend', 'perc_music_key_1', 'perc_music_key_10', 'perc_music_key_10_weekday', 'perc_music_key_10_weekend', 'perc_music_key_1_weekday', 'perc_music_key_1_weekend', 'perc_music_key_2', 'perc_music_key_2_weekday', 
'perc_music_key_2_weekend', 'perc_music_key_3', 'perc_music_key_3_weekday', 'perc_music_key_3_weekend', 'perc_music_key_4', 'perc_music_key_4_weekday', 'perc_music_key_4_weekend', 'perc_music_key_5', 'perc_music_key_5_weekday', 'perc_music_key_5_weekend', 'perc_music_key_6', 'perc_music_key_6_weekday', 'perc_music_key_6_weekend', 'perc_music_key_7', 'perc_music_key_7_weekday', 'perc_music_key_7_weekend', 'perc_music_key_8', 'perc_music_key_8_weekday', 'perc_music_key_8_weekend', 'perc_music_key_9', 'perc_music_key_9_weekday', 'perc_music_key_9_weekend', 'responserate_calls', 'responserate_sms', 'rog', 'sd_dur_down', 'sd_dur_down_Fri_and_Sat', 'sd_dur_down_Sun_until_Thu', 'sd_dur_wakeLeave', 'sd_dur_wakeLeaveHome', 'sd_dur_wakeLeaveHome_weekday', 'sd_dur_wakeLeaveHome_weekend', 'sd_music_acousticness', 'sd_music_acousticness_weekday', 'sd_music_acousticness_weekend', 'sd_music_danceability', 'sd_music_danceability_weekday', 'sd_music_danceability_weekend', 'sd_music_energy', 'sd_music_energy_weekday', 'sd_music_energy_weekend', 'sd_music_instrumentalness', 'sd_music_instrumentalness_weekday', 'sd_music_instrumentalness_weekend', 'sd_music_liveness', 'sd_music_liveness_weekday', 'sd_music_liveness_weekend', 'sd_music_loudness', 'sd_music_loudness_weekday', 'sd_music_loudness_weekend', 'sd_music_popularity', 'sd_music_popularity_weekday', 'sd_music_popularity_weekend', 'sd_music_speechiness', 'sd_music_speechiness_weekday', 'sd_music_speechiness_weekend', 'sd_music_tempo', 'sd_music_tempo_weekday', 'sd_music_tempo_weekend', 'sd_music_valence', 'sd_music_valence_weekday', 'sd_music_valence_weekend', 'sd_time_LeaveHome', 'sd_time_LeaveHome_weekday', 'sd_time_LeaveHome_weekend', 'sd_time_firstLeave', 'sd_time_lastHome', 'sd_time_lastHome_weekday', 'sd_time_lastHome_weekend', 'skew_music_acousticness', 'skew_music_danceability', 'skew_music_energy', 'skew_music_instrumentalness', 'skew_music_liveness', 'skew_music_loudness', 'skew_music_popularity', 
'skew_music_speechiness', 'skew_music_tempo', 'skew_music_valence', but learner 'regr.lm' does not support this

Unfortunately, this fails because there are missing values in the dataset and regr.lm cannot handle them. We will use mlr3pipelines (Binder et al. 2021) to build a simple analysis pipeline, called GraphLearner in mlr3 (a learner consisting of several consecutive analysis steps that can be visualized as a graph). Prior to fitting our linear model, this pipeline automatically replaces the missing values of each feature with the median of that feature in the respective training set (i.e., median imputation). We do not recommend using mean or median imputation in real applications.[2] A tutorial on how to build more complex analysis pipelines with the mlr3pipelines package can be found in the mlr3 e-book (Becker et al. 2022).

imputer <- po("imputemedian") # po defines a single pipeline operation
lm <- as_learner(imputer %>>% lm) # combine po and learner into a pipeline
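To make the idea of median imputation "from the training set" concrete, here is a minimal base R sketch. The vector x and the train/test split are hypothetical; the point is only that the median is estimated from the training rows and then reused for every missing value, including those in the test rows.

```r
# Hypothetical feature with missing values (illustration only)
x <- c(2, NA, 4, 10, NA, 6, 8, NA)
train_idx <- 1:5
test_idx  <- 6:8

# The imputation value is estimated from the training rows only
med_train <- median(x[train_idx], na.rm = TRUE)

# The same training median fills missing values in training and test rows
x_imputed <- x
x_imputed[is.na(x_imputed)] <- med_train

print(med_train)
print(x_imputed[test_idx])

# For comparison: the median over ALL rows would use test information
print(median(x, na.rm = TRUE))
```

In this toy example the training median differs from the median over all rows, which shows why the imputation step has to be part of the pipeline that is retrained within each resampling iteration.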

Now, training the augmented learner on the task works just fine.

lm$train(task = task_Soci)

The previous line trained the model and automatically stored it inside the learner object. One great advantage of mlr3 is that we can use the same modeling functions for ML models from different R packages without having to remember the peculiarities of their modeling syntax. We can use the trained model to make predictions, which we have to store in a separate object.[3]

prediction <- lm$predict(task = task_Soci)

We just predicted the same data that we already used for model training, but we could also compute predictions for new observations. In the Sociability task, we did not include four individuals with missing values on the gender variable. Because we do not use gender as a feature here, we can treat these individuals as new data and predict their sociability score with $predict_newdata().

phonedata_new <- readRDS(file = "clusterdata.RDS")
phonedata_new <- phonedata_new[
  !complete.cases(phonedata_new$gender), c(1:1821, 1837)]
lm$predict_newdata(newdata = phonedata_new)$response
[1]  602.66204 -373.82706  -44.80488  -27.41789
attr(,"non-estim")
1 2 3 4 
1 2 3 4 
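The same train-then-predict workflow can be reproduced on simulated data. The following sketch assumes a made-up data frame toy with two features and artificially inserted missing values; the names toy, glrn, and new_obs are hypothetical. It shows that a pipeline with median imputation can also predict new observations that contain missing feature values.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)
# Simulated toy data (NOT the PhoneStudy dataset)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(30, sd = 0.1)
toy$x2[c(3, 7)] <- NA  # introduce missing feature values

task_toy <- as_task_regr(toy, id = "toy_regr", target = "y")

# Median imputation followed by linear regression, wrapped as one learner
glrn <- as_learner(po("imputemedian") %>>% lrn("regr.lm"))
glrn$train(task_toy)

# New observations; the first one has a missing value in x2
new_obs <- data.frame(x1 = c(0.5, -1), x2 = c(NA, 0.2))
pred_new <- glrn$predict_newdata(new_obs)
print(pred_new$response)
```

Because the imputation step belongs to the learner, the missing value in new_obs is filled with the median learned during training, so no manual preprocessing of new data is needed.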

With this functionality, it would be possible to use the model in a practical application. However, it would be irresponsible to apply any predictive model for which the expected predictive performance is unknown. Therefore, we now demonstrate how to evaluate predictive performance with mlr3.

If we wanted to compute in-sample performance based on the predictions for all observations included in our task (which we stored in prediction), we could calculate the estimates with the score function and specify the performance measures we are interested in (\(R^2\) and \(RMSE\)) with their respective ids. For an exhaustive list of all performance measures available in mlr3, type as.data.table(mlr_measures) or check out the mlr3 e-book (Becker et al. 2022).

mes <- msrs(c("regr.rsq", "regr.rmse"))
prediction$score(mes)
    regr.rsq    regr.rmse 
1.000000e+00 2.433205e-11 

The performance on the training data is almost perfect. \(R^2\) is \(1\) and the \(RMSE\) is numerically indistinguishable from \(0\) (see all.equal(0, 2.433205e-11)). We should always be skeptical when we observe very high in-sample performance, because this can be a sign that the model overfitted to the training data. In general, we should never trust in-sample performance but estimate out-of-sample performance instead.
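Why a perfect in-sample fit is not evidence of a good model can be illustrated with pure noise. The sketch below is a base R illustration under made-up assumptions (20 observations, 30 random predictors, a target unrelated to any of them) and does not use the PhoneStudy data. It computes \(RMSE\) and \(R^2\) by hand on the training data.

```r
set.seed(42)
n <- 20  # observations
p <- 30  # predictors (more than observations)

# Predictors and target are independent random noise
X <- as.data.frame(matrix(rnorm(n * p), nrow = n))
y <- rnorm(n)
train <- cbind(y = y, X)

# lm() drops the coefficients it cannot estimate (set to NA)
fit <- lm(y ~ ., data = train)

pred_in <- fitted(fit)
rmse_in <- sqrt(mean((y - pred_in)^2))
rsq_in  <- 1 - sum((y - pred_in)^2) / sum((y - mean(y))^2)

print(rmse_in)  # numerically zero
print(rsq_in)   # numerically one
```

Although the target is unrelated to every predictor, the model reproduces the training data essentially perfectly, because it has at least as many free parameters as observations.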

Compute Out-of-sample Performance Estimate

Next we want to use CV to compute an out-of-sample performance estimate. We specify a resampling strategy (here 5-fold CV). You can run as.data.table(mlr_resamplings) to get a table of available resampling strategies.

rdesc <- rsmp("cv", folds = 5)

The resample function randomly splits our dataset according to our resampling description, retrains our learner on each training set, and computes predictions on the corresponding test set. Before running resample, we set an arbitrary seed to make our results reproducible. Afterwards, we use aggregate to compute the out-of-sample performance estimate for our preferred measures, aggregated across our 5 test sets.

set.seed(1)
res <- resample(learner = lm, task = task_Soci, resampling = rdesc)
INFO  [11:22:30.262] [mlr3] Applying learner 'imputemedian.regr.lm' on task 'Sociability_Regr' (iter 1/5)
INFO  [11:22:42.474] [mlr3] Applying learner 'imputemedian.regr.lm' on task 'Sociability_Regr' (iter 2/5)
INFO  [11:22:54.633] [mlr3] Applying learner 'imputemedian.regr.lm' on task 'Sociability_Regr' (iter 3/5)
INFO  [11:23:06.931] [mlr3] Applying learner 'imputemedian.regr.lm' on task 'Sociability_Regr' (iter 4/5)
INFO  [11:23:19.358] [mlr3] Applying learner 'imputemedian.regr.lm' on task 'Sociability_Regr' (iter 5/5)
res$aggregate(mes)
   regr.rsq   regr.rmse 
-2340.92709    79.31954 
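The steps that resample and aggregate carry out can be sketched by hand in base R. This is an illustration only: it uses the built-in mtcars data and a small linear model with two predictors (both choices are assumptions for the example), and it averages the per-fold \(RMSE\) values, which is one common way to aggregate across folds.

```r
set.seed(1)
k <- 5
n <- nrow(mtcars)

# Randomly assign each row to one of k folds of (nearly) equal size
fold <- sample(rep(seq_len(k), length.out = n))

rmse_per_fold <- numeric(k)
for (i in seq_len(k)) {
  train <- mtcars[fold != i, ]  # k - 1 folds for training
  test  <- mtcars[fold == i, ]  # held-out fold for testing
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  rmse_per_fold[i] <- sqrt(mean((test$mpg - pred)^2))
}

# Aggregate the per-fold estimates into one out-of-sample estimate
cv_rmse <- mean(rmse_per_fold)
print(rmse_per_fold)
print(cv_rmse)
```

Each observation is used for testing exactly once and is never part of the training set of the model that predicts it, which is what makes the estimate an out-of-sample one.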

When we compare the out-of-sample to the in-sample estimates, we realize that the predictions of our model are expected to be really bad. This might be no surprise to many readers because we used ordinary linear regression with 620 observations and 1821 predictor variables, which results in an unidentified model. As a consequence, the \(RMSE\) is huge: a typical deviation between true and predicted sociability scores is about 79, but the true sociability scores in the dataset range only from -4.5 to 5.64. The negative \(R^2\) also implies that the predictive model should not be used in practice.

Remember that in contrast to the well-known in-sample estimate for linear regression, out-of-sample \(R^2\) can be negative. Negative \(R^2\) indicates that the model performs worse than a simple baseline model that completely ignores all features and merely predicts the mean target value in the test data. The concrete values of negative \(R^2\) do not have any intuitive interpretation. We give a better intuition on why \(R^2\) can become negative in ESM 3.1. The important message here is that with a poorly designed ML model, it is easy to produce worse predictions compared to simple guessing. The naive notion that using any predictive model might still be better than using no formal predictions at all is wrong. However, estimating predictive performance with resampling can prevent us from applying inappropriate models in practice, without relying on expert knowledge about the specific model class (e.g., identification issues in linear regression).
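A tiny worked example shows how \(R^2\) becomes negative. The sketch assumes the common definition \(R^2 = 1 - SSE / SST\), where \(SST\) is computed around the mean of the observed test values; the vectors truth, pred_bad, and pred_mean are made up for illustration.

```r
# Hypothetical test-set values and predictions (illustration only)
truth     <- c(1, 2, 3, 4, 5)
pred_bad  <- c(5, 4, 3, 2, 1)                      # systematically wrong
pred_mean <- rep(mean(truth), length(truth))       # baseline: always the mean

rsq <- function(truth, pred) {
  sse <- sum((truth - pred)^2)         # squared error of the model
  sst <- sum((truth - mean(truth))^2)  # squared error of the mean baseline
  1 - sse / sst
}

print(rsq(truth, pred_bad))   # -3
print(rsq(truth, pred_mean))  # 0
```

The bad predictions have four times the squared error of the mean baseline (40 versus 10), so \(R^2 = 1 - 4 = -3\), while the baseline itself scores exactly 0.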

Practical Exercise II:

Follow up here with Practical Exercise II on how to Train a Random Forest and Estimate Predictive Performance.

References

Becker, Marc, Przemyslaw Biecek, Martin Binder, Bernd Bischl, Lukas Burk, Giuseppe Casalicchio, Sebastian Fischer, et al. 2022. Flexible and Robust Machine Learning Using mlr3 in R. https://mlr3book.mlr-org.com/.
Binder, Martin, Florian Pfisterer, Michel Lang, Lennart Schneider, Lars Kotthoff, and Bernd Bischl. 2021. "mlr3pipelines - Flexible Machine Learning Pipelines in R". Journal of Machine Learning Research 22 (184): 1–7.
Lang, Michel, and Patrick Schratz. 2021. mlr3verse: Easily Install and Load the 'mlr3' Package Family. https://CRAN.R-project.org/package=mlr3verse.
Masters, Geoff N. 1982. "A Rasch model for partial credit scoring". Psychometrika 47 (2): 149–74. https://doi.org/10.1007/BF02296272.
Stachl, Clemens, Quay Au, Ramona Schoedel, Samuel D Gosling, Gabriella M Harari, Daniel Buschek, Sarah Theres Völkel, et al. 2020. "Predicting personality from patterns of behavior collected with smartphones". Proceedings of the National Academy of Sciences of the United States of America 117 (30): 17680–87. https://doi.org/10.1073/pnas.1920484117.

Footnotes

  1. The E2.Sociableness variable is the estimated person parameter of a Partial Credit Model (Masters 1982) for the sociability facet of the personality trait extraversion in the BFSI. For details, see Stachl et al. (2020).

  2. In Stachl et al. (2020), a more advanced analysis pipeline and imputation strategy was used compared to this tutorial. For a description, see the supplementary information for that paper.

  3. R issues a warning that the predictions may be misleading, but they are computed nonetheless.