* Use air
* Reformat with air
* Format on save with air
* Remove empty lines after code cell yaml
* Fix typo, closes #1665
* Use source editor
* Fix logical error, closes #1603
@@ -386,7 +372,6 @@ A visually simpler plot for exploring this relationship is using side-by-side bo
 #|   prices is right skewed for each cut (Fair, Good, Very Good, Premium, and
 #|   Ideal). The medians are close to each other, with the median for Ideal
 #|   diamonds lowest and that for Fair highest.
-
 ggplot(diamonds, aes(x = cut, y = price)) +
   geom_boxplot()
 ```
@@ -407,7 +392,6 @@ You might be interested to know how highway mileage varies across classes:
 #|   Side-by-side boxplots of highway mileages of cars by class. Classes are
 #|   on the x-axis (2seaters, compact, midsize, minivan, pickup, subcompact,
 #|   and suv).
-
 ggplot(mpg, aes(x = class, y = hwy)) +
   geom_boxplot()
 ```
@@ -419,7 +403,6 @@ To make the trend easier to see, we can reorder `class` based on the median valu
 #|   Side-by-side boxplots of highway mileages of cars by class. Classes are
 #|   on the x-axis and ordered by increasing median highway mileage (pickup,
 #|   suv, minivan, 2seater, subcompact, compact, and midsize).
-
 ggplot(mpg, aes(x = fct_reorder(class, hwy, median), y = hwy)) +
   geom_boxplot()
 ```
@@ -431,7 +414,6 @@ You can do that by exchanging the x and y aesthetic mappings.
 #| fig-alt: |
 #|   Side-by-side boxplots of highway mileages of cars by class. Classes are
 #|   on the y-axis and ordered by increasing median highway mileage.
-
 ggplot(mpg, aes(x = hwy, y = fct_reorder(class, hwy, median))) +
   geom_boxplot()
 ```
@@ -473,7 +455,6 @@ One way to do that is to rely on the built-in `geom_count()`:
 #|   and color (D, E, F, G, G, I, and J). The sizes of the points represent
 #|   the number of observations for that combination. The legend indicates
 #|   that these sizes range between 1000 and 4000.
-
 ggplot(diamonds, aes(x = cut, y = color)) +
   geom_count()
 ```
@@ -497,7 +478,6 @@ Then visualize with `geom_tile()` and the fill aesthetic:
 #|   observations in each tile. There are more Ideal diamonds than other cuts,
 #|   with the highest number being Ideal diamonds with color G. Fair diamonds
 #|   and diamonds with color I are the lowest in frequency.
-
 diamonds |>
   count(color, cut) |>
   ggplot(aes(x = color, y = cut)) +
@@ -530,7 +510,6 @@ The relationship is exponential.
 #| fig-alt: |
 #|   A scatterplot of price vs. carat. The relationship is positive, somewhat
 #|   strong, and exponential.
-
 ggplot(smaller, aes(x = carat, y = price)) +
   geom_point()
 ```
@@ -547,7 +526,6 @@ You've already seen one way to fix the problem: using the `alpha` aesthetic to a
 #|   strong, and exponential. The points are transparent, showing clusters where
 #|   the number of points is higher than other areas, The most obvious clusters
 #|   are for diamonds with 1, 1.5, and 2 carats.
-
 ggplot(smaller, aes(x = carat, y = price)) +
   geom_point(alpha = 1 / 100)
 ```
@@ -569,7 +547,6 @@ You will need to install the hexbin package to use `geom_hex()`.
 #|   Plot 1: A binned density plot of price vs. carat. Plot 2: A hexagonal bin
 #|   plot of price vs. carat. Both plots show that the highest density of
 #|   diamonds have low carats and low prices.
-
 ggplot(smaller, aes(x = carat, y = price)) +
   geom_bin2d()
 
@@ -591,7 +568,6 @@ For example, you could bin `carat` and then for each group, display a boxplot:
 #|   roughly symmetric price distributions, and diamonds that weigh more have
 #|   left skewed distributions. Cheaper, smaller diamonds have outliers on the
 #|   higher end, more expensive, bigger diamonds have outliers on the lower end.
-
 ggplot(smaller, aes(x = carat, y = price)) +
   geom_boxplot(aes(group = cut_width(carat, 0.1)))
 ```
@@ -672,7 +648,6 @@ Then, we exponentiate the residuals to put them back in the scale of raw prices.
 #|   to 5, the y-axis ranges from 0 to almost 4. Much of the data are clustered
 #|   around low values of carat and residuals. There is a clear, curved pattern
 #|   showing decrease in residuals as carat increases.
-
 library(tidymodels)
 
 diamonds <- diamonds |>
@@ -699,7 +674,6 @@ Once you've removed the strong relationship between carat and price, you can see
 #|   cuts (Fair to Ideal), the y-axis ranges from 0 to almost 5. The medians are
 #|   quite similar, between roughly 0.75 to 1.25. Each of the distributions of
 #|   residuals is right skewed, with many outliers on the higher end.
-
 ggplot(diamonds_aug, aes(x = cut, y = .resid)) +
   geom_boxplot()
 ```
@@ -712,4 +686,4 @@ In this chapter you've learned a variety of tools to help you understand the var
 You've seen techniques that work with a single variable at a time and with a pair of variables.
 This might seem painfully restrictive if you have tens or hundreds of variables in your data, but they're the foundation upon which all other techniques are built.
 
-In the next chapter, we'll focus on the tools we can use to communicate our results.
+In the next chapter, we'll focus on the tools we can use to communicate our results.