Commit ecbe1e6

Add the challenge instructions
1 parent d9bc322 commit ecbe1e6

File tree

1 file changed: +117 −0 lines changed


analysis/challenge.Rmd

Lines changed: 117 additions & 0 deletions
---
title: "Reproducibility challenge"
author: "John Blischak"
date: "2020-06-11"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
---

## Introduction

For the reproducibility challenge, you will attempt to re-run an analysis of
Spotify song genres that was inspired by the blog post
[Understanding + classifying genres using Spotify audio features][blog-post]
by Kaylin Pavlik ([\@kaylinquest][kaylinquest]).

[kaylinquest]: https://twitter.com/kaylinquest
[blog-post]: https://www.kaylinpavlik.com/classifying-songs-genres/

<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">NEW: Understanding song genres using Spotify audio features and decision trees in <a href="https://twitter.com/hashtag/rstats?src=hash&amp;ref_src=twsrc%5Etfw">#rstats</a>. Basically: <br><br>rap: speechy 🗣️<br>rock: can’t dance to it 🤟<br>EDM: high tempo ⏩<br>R&amp;B: long songs ⏱️<br>latin: very danceable 💃<br>pop: everything else.<a href="https://t.co/q57ZDdROf7">https://t.co/q57ZDdROf7</a> <a href="https://t.co/sfxRPKvpp2">pic.twitter.com/sfxRPKvpp2</a></p>&mdash; Kaylin Pavlik (@kaylinquest) <a href="https://twitter.com/kaylinquest/status/1213138536570015745?ref_src=twsrc%5Etfw">January 3, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

The code includes a minimal machine-learning-style analysis with the following
steps (a short sketch of the full pipeline appears after the list):

* Import the songs data
* Split the songs data into training and test sets
* Build a tree model to classify songs into genres based on the song characteristics
* Assess the accuracy of the model on the training and test sets
* Compute the accuracy of a model based on random guessing
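To orient yourself, here is a minimal sketch of what such a pipeline might
look like. The file path `data/spotify.csv`, the outcome column `genre`, the
variable names, and the 80/20 split are all assumptions for illustration; the
actual code in `spotify.Rmd` may differ.

```r
# Sketch of the analysis pipeline; the file path, column names, and the 80/20
# split are assumptions and may not match the actual spotify.Rmd
library(rpart)

# Import the songs data
spotify <- read.csv("data/spotify.csv", stringsAsFactors = TRUE)

# Split the songs data into training and test sets
numTrainingSamples <- round(nrow(spotify) * 0.8)
trainingRows <- sample(nrow(spotify), size = numTrainingSamples)
spotifyTraining <- spotify[trainingRows, ]
spotifyTesting <- spotify[-trainingRows, ]

# Build a tree model to classify songs into genres
treeModel <- rpart(genre ~ ., data = spotifyTraining)

# Assess accuracy on the training and test sets
mean(predict(treeModel, spotifyTraining, type = "class") == spotifyTraining$genre)
mean(predict(treeModel, spotifyTesting, type = "class") == spotifyTesting$genre)

# Compare against the accuracy of random guessing
predictRandom <- sample(spotifyTraining$genre, size = nrow(spotifyTesting),
                        replace = TRUE)
mean(predictRandom == spotifyTesting$genre)
```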
## Getting started

The analysis purposefully contains various issues that make it difficult to
reproduce. Open the file `spotify.Rmd` by clicking on it in the RStudio Files
pane. Click the Knit button to re-run the analysis and find the first issue.
## File paths

The first error you will encounter is below:

```
Quitting from lines 16-21 (spotify.Rmd)
Error in file(file, "rt") : cannot open the connection
Calls: <Anonymous> ... withVisible -> eval -> eval -> read.csv -> read.table -> file
Execution halted
```
47+
48+
The function `read.csv()` is unable to open the data file. What's wrong with the
49+
path to the file? Apply what you know about absolute and relative paths to
50+
update the path and re-run the analysis.
51+
52+
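The specific broken path isn't shown here, but a common pattern looks
something like this (both paths below are hypothetical, not the ones in
`spotify.Rmd`):

```r
# Hypothetical illustration: an absolute path only exists on the original
# author's machine, while a path relative to the project root works anywhere
spotify <- read.csv("C:/Users/original-author/Desktop/spotify.csv")  # fails elsewhere
spotify <- read.csv("data/spotify.csv")                              # portable
```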
## Undefined variable

The next error you encounter is:

```
Quitting from lines 27-30 (spotify.Rmd)
Error in sample.int(length(x), size, replace, prob) :
  object 'numTrainingSamples' not found
Calls: <Anonymous> ... withVisible -> eval -> eval -> sample -> sample.int
Execution halted
```
It looks like the variable `numTrainingSamples` isn't defined in the Rmd file.
This error often occurs when you create a variable interactively in the R
console but forget to add its definition to the script.

Based on the description above the code chunk, can you define the variable
`numTrainingSamples`? Hint: You can obtain the total number of samples with
`nrow(spotify)`.
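If, for example, the description above the chunk says to use 80% of the songs
for training (the exact fraction is an assumption here), the definition might
look like:

```r
# Number of songs to use for training; the 80% fraction is an assumed value,
# so use whatever the text above the chunk in spotify.Rmd specifies
numTrainingSamples <- round(nrow(spotify) * 0.8)
```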
## Missing package

The next error you encounter is:

```
Quitting from lines 36-39 (spotify.Rmd)
Error in rpart(genre ~ ., data = spotifyTraining) :
  could not find function "rpart"
Calls: <Anonymous> ... handle -> withCallingHandlers -> withVisible -> eval -> eval
Execution halted
```
The function `rpart()` can't be found. This can occur when you load a package
in the current R session, but forget to put the call to `library()` in the
script.

Based on the text above the code chunk, can you figure out which package needs
to be loaded?
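To check your answer: `rpart()` ships with the rpart package, so the fix is a
`library()` call that runs as part of the script, for example:

```r
# install.packages("rpart")  # only needed once, if the package is not installed
library(rpart)               # makes rpart() available when the script is knit
```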
## Renamed variable

The next error you encounter is:

```
Quitting from lines 61-66 (spotify.Rmd)
Error in mean(spotifyTesting[, 1] == predict_random) :
  object 'predict_random' not found
Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> mean
Execution halted
```
R can't find the variable named `predict_random`. Look at the surrounding code:
what do you think the name of this variable should be?

Renaming variables during an analysis can lead to these subtle errors. Because
both the original and the renamed versions of the variable are still defined in
your current R session, the code keeps running for you. But when you or someone
else tries to run the code in a clean R session, it unexpectedly fails.
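The surrounding code isn't reproduced here, but the pattern usually looks
something like the sketch below, where the variable was defined under a newer
name while an older name lingers downstream (all names other than those in the
error message are assumptions):

```r
# Hypothetical sketch: the random predictions are created under one name ...
predictRandom <- sample(spotifyTraining$genre, size = nrow(spotifyTesting),
                        replace = TRUE)

# ... but a later line still uses the old name and fails in a clean session
mean(spotifyTesting[, 1] == predict_random)

# Fix: make the two names agree
mean(spotifyTesting[, 1] == predictRandom)
```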
## Compare results

Success! The analysis now runs. Compare your prediction results to those of
your partners and/or re-run the analysis. Are the results always identical?
Why not? What could you do if you wanted to publish these results and allow
others to reproduce your findings exactly?
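One common approach, if you want the random train/test split and the
random-guess baseline to come out identically on every run and on every
machine, is to set R's random seed before any call to `sample()`:

```r
# Fix the state of R's random number generator so that sample() returns the
# same training/test split and the same random guesses every time it is run
# (the seed value itself is arbitrary)
set.seed(20200611)
```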
