Author: Jose Carlos Molano de Oro
University: Pontificia Universidad Javeriana
Course: Linear Regression Analysis
Semester: 2022-3
Professor: Mario Gregorio Saavedra Rodriguez
Author Email: jose_molano@javeriana.edu.co
Professor Email: saavedrarmg@javeriana.edu.co
library(DT)
library(Hmisc)
library(reactablefmtr)
library(dplyr)
library(aplpack)
library(corrplot)
library(ggplot2)
library(MASS)
library(broom)
library(tidyverse)
library(ggfortify)
library(mosaic)
library(jtools)
library(latex2exp)
library(pubh)
library(sjlabelled)
library(sjPlot)
library(sjmisc)
library(Ecdat)
library(PerformanceAnalytics)
library(regclass)
library(pdp)
library(gridExtra)
library(olsrr)
This dataset contains the monthly number of reported arrests in the US for various offenses reported by participating law enforcement agencies. The arrests are by offense and broken down by age and sex or age and race. Not all agencies report race and/or ethnicity for arrests but they must report age and sex. Note that only agencies that have reported arrests for 12 months of the year are represented in the annual counts that are included in the database. Download this dataset to see totals of reported arrests for the nation from 1995–2016.
The dataset was taken from the Federal Bureau of Investigation (FBI) Crime Data Explorer.
FBI <- read.csv(url("https://s3-us-gov-west-1.amazonaws.com/cg-d3f0433b-a53e-4934-8b94-c678aa2cbaf3/arrests_national.csv"), row.names = 2, header = TRUE)
reactable(FBI,rownames = TRUE)
Since the id attribute is not used, it is removed from the dataset.
FBI$id<-NULL
reactable(FBI,rownames = TRUE)
summary(FBI)
## population total_arrests homicide rape
## Min. :262803276 Min. :10662252 Min. :10231 Min. :16863
## 1st Qu.:282395819 1st Qu.:12586911 1st Qu.:11348 1st Qu.:21701
## Median :297952772 Median :13839754 Median :13331 Median :25032
## Mean :295487602 Mean :13418226 Mean :13710 Mean :25205
## 3rd Qu.:311023417 3rd Qu.:14180570 3rd Qu.:14134 3rd Qu.:28083
## Max. :323127513 Max. :15284300 Max. :21230 Max. :34650
## robbery aggravated_assault burglary larceny
## Min. : 94403 Min. :358860 Min. :207325 Min. :1050058
## 1st Qu.:105863 1st Qu.:400402 1st Qu.:288660 1st Qu.:1160498
## Median :108921 Median :442990 Median :295372 Median :1210490
## Mean :116045 Mean :445464 Mean :294936 Mean :1241126
## 3rd Qu.:126438 3rd Qu.:478265 3rd Qu.:304564 3rd Qu.:1279616
## Max. :171870 Max. :568480 Max. :386500 Max. :1530200
## motor_vehicle_theft arson violent_crime property_crime
## Min. : 64566 Min. : 8834 Min. :480360 Min. :1353283
## 1st Qu.: 78934 1st Qu.:11519 1st Qu.:539047 1st Qu.:1606177
## Median :139978 Median :15834 Median :597236 Median :1630406
## Mean :120921 Mean :14733 Mean :600415 Mean :1671715
## 3rd Qu.:148814 3rd Qu.:16759 3rd Qu.:626632 3rd Qu.:1677062
## Max. :191900 Max. :20000 Max. :796250 Max. :2128600
## other_assault forgery fraud embezzlement
## Min. :1078808 Min. : 55333 Min. :128531 Min. :15200
## 1st Qu.:1242966 1st Qu.: 72184 1st Qu.:173134 1st Qu.:16065
## Median :1293424 Median :107777 Median :281816 Median :17100
## Mean :1259624 Mean : 95762 Mean :273572 Mean :17620
## 3rd Qu.:1310566 3rd Qu.:115451 3rd Qu.:343650 3rd Qu.:18852
## Max. :1395800 Max. :122300 Max. :465000 Max. :22381
## stolen_property vandalism weapons prostitution
## Min. : 88576 Min. :191015 Min. :137779 Min. : 38306
## 1st Qu.: 95519 1st Qu.:241417 1st Qu.:157338 1st Qu.: 58676
## Median :121936 Median :275064 Median :167153 Median : 78640
## Mean :118191 Mean :265275 Mean :174857 Mean : 74418
## 3rd Qu.:128090 3rd Qu.:289934 3rd Qu.:190173 3rd Qu.: 87809
## Max. :166500 Max. :320900 Max. :243900 Max. :101600
## other_sex_offenses drug_abuse gambling against_family
## Min. : 51063 Min. :1476100 Min. : 3705 Min. : 88748
## 1st Qu.: 70076 1st Qu.:1533853 1st Qu.: 8900 1st Qu.:111938
## Median : 89082 Median :1576072 Median :10630 Median :127032
## Mean : 81231 Mean :1617127 Mean :10736 Mean :126231
## 3rd Qu.: 93149 3rd Qu.:1674540 3rd Qu.:11916 3rd Qu.:143487
## Max. :101900 Max. :1889810 Max. :21000 Max. :155800
## dui liquor_laws drunkenness disorderly_conduct
## Min. :1017808 Min. :234899 Min. :376433 Min. :369733
## 1st Qu.:1305198 1st Qu.:503684 1st Qu.:537818 1st Qu.:590412
## Median :1434117 Median :611335 Median :566726 Median :647346
## Mean :1364988 Mean :548852 Mean :573149 Mean :626903
## 3rd Qu.:1461434 3rd Qu.:635714 3rd Qu.:632832 3rd Qu.:693571
## Max. :1511300 Max. :683124 Max. :734800 Max. :842600
## vagrancy other suspicion curfew_loitering
## Min. :24851 Min. :3218880 Min. : 576 Min. : 34176
## 1st Qu.:27316 1st Qu.:3553687 1st Qu.: 1451 1st Qu.: 81406
## Median :29076 Median :3724251 Median : 3018 Median :139116
## Mean :29909 Mean :3668659 Mean : 3909 Mean :122666
## 3rd Qu.:33056 3rd Qu.:3832337 3rd Qu.: 5562 3rd Qu.:152130
## Max. :36471 Max. :4022068 Max. :12100 Max. :187800
datatable(as.matrix(sapply(FBI,function(x) mean(x, na.rm=TRUE))))
reactable(var(FBI,use = "complete.obs"))
reactable(cov(FBI,use = "complete.obs"))
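Note that var() applied to a data frame returns the full covariance matrix, so the two calls above display the same table. The following is a minimal sketch on synthetic data (the data frame d is made up for illustration and is not the FBI data) that shows the equivalence and one way to obtain the per-variable variances instead.

```r
# Synthetic data, for illustration only (not the FBI data)
set.seed(1)
d <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))

# On a data frame, var() and cov() return the same covariance matrix
same_matrix <- isTRUE(all.equal(var(d), cov(d)))

# Per-variable variances are the diagonal of that matrix
variances <- sapply(d, var)
diag_matches <- isTRUE(all.equal(unname(variances), unname(diag(cov(d)))))

print(same_matrix)
print(variances)
```

With the real data, sapply(FBI, var) would give one variance per offense column.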
# Correlation matrix; the name avoids masking the base function c()
cor_mat <- cor(FBI)
# Blank out the diagonal (self-correlations of 1) so it does not dominate the highlighting
diag(cor_mat) <- NA
cor_df <- as.data.frame(cor_mat)
reactable(cor_df,
defaultColDef = colDef(
style = highlight_min_max(cor_df)))
corrplot(cor(FBI,use = "complete.obs"),method="circle")
FBI.rcorr = rcorr(as.matrix(FBI))
FBI.p=FBI.rcorr$P
reactable(as.data.frame.array(FBI.p),
defaultColDef = colDef(
style = highlight_min_max(as.data.frame.array(FBI.p))))
As the correlation plot shows, almost all of the variables are correlated with each other.
The values highlighted in green in the correlation matrix mark the most strongly correlated pairs of variables.
The values highlighted in green in the correlation test table are the highest p-values. For these pairs the null hypothesis of zero correlation is not rejected, so they are the pairs with the weakest evidence of correlation, not the strongest.
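To make the reading of a correlation test concrete, the sketch below runs base R's cor.test() on synthetic vectors (they are illustrative assumptions, not FBI columns). The null hypothesis is that the true correlation is zero: a small p-value rejects it, while a large p-value does not.

```r
set.seed(42)
x <- rnorm(30)
y_related   <- 2 * x + rnorm(30, sd = 0.5)  # constructed to depend strongly on x
y_unrelated <- rnorm(30)                    # constructed independently of x

test_related   <- cor.test(x, y_related)
test_unrelated <- cor.test(x, y_unrelated)

# Estimated correlation and p-value for each pair
print(c(r = unname(test_related$estimate), p = test_related$p.value))
print(c(r = unname(test_unrelated$estimate), p = test_unrelated$p.value))
```

The strongly related pair yields a p-value far below 0.05, which is the situation of most variable pairs in this dataset.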
Among the attributes in the correlation matrix that are most positively correlated with each other (values greater than 0.95), some are used in the plots that follow.
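One way to list the pairs above a correlation threshold programmatically is sketched here. It uses a small synthetic data frame so that it runs on its own; with the real data, d would be replaced by FBI. The helper name high_cor_pairs is hypothetical and not part of any package.

```r
# Hypothetical helper: return the variable pairs whose correlation exceeds a threshold
high_cor_pairs <- function(data, threshold = 0.95) {
  cm <- cor(data, use = "complete.obs")
  # Look only at the upper triangle so each pair is reported once
  idx <- which(upper.tri(cm) & cm > threshold, arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
}

# Synthetic example: x_copy is x plus a little noise, noise is independent
set.seed(7)
x <- rnorm(50)
d <- data.frame(x = x, x_copy = x + rnorm(50, sd = 0.05), noise = rnorm(50))

pairs_found <- high_cor_pairs(d)
print(pairs_found)
```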
ggplot(FBI, aes(x=violent_crime, y=aggravated_assault)) + geom_point()+labs(title = "Violent Crime vs Aggravated Assault")
ggplot(FBI, aes(x=violent_crime, y=aggravated_assault,label=rownames(FBI))) + geom_text()+labs(title = "Violent Crime vs Aggravated Assault using Row Names")
ggplot(FBI, aes(x=violent_crime, y=homicide)) + geom_point()+labs(title = "Violent Crime vs Homicide")
ggplot(FBI, aes(x=violent_crime, y=homicide,label=rownames(FBI))) + geom_text()+labs(title = "Violent Crime vs Homicide using Row Names")
ggplot(FBI, aes(x=violent_crime, y=aggravated_assault,size=homicide)) + geom_point(alpha=0.5)+scale_size(range=c(.1,15))+labs(title = "Violent Crime vs Aggravated Assault and Homicide")
ggplot(FBI, aes(x=violent_crime, y=aggravated_assault,size=fraud)) + geom_point(alpha=0.5)+scale_size(range=c(.1,15))+labs(title = "Violent Crime vs Aggravated Assault and Fraud")
df<-data.frame(FBI$aggravated_assault,FBI$violent_crime,FBI$homicide,FBI$stolen_property,FBI$fraud,FBI$arson,FBI$prostitution,FBI$other_sex_offenses,FBI$drunkenness,FBI$dui,FBI$liquor_laws,FBI$drug_abuse,FBI$curfew_loitering,FBI$embezzlement,FBI$vagrancy)
faces(df, main="United States FBI Arrest Data",face.type=0, print.info=TRUE,labels = rownames(FBI))
## effect of variables:
## modified item Var
## "height of face " "FBI.aggravated_assault"
## "width of face " "FBI.violent_crime"
## "structure of face" "FBI.homicide"
## "height of mouth " "FBI.stolen_property"
## "width of mouth " "FBI.fraud"
## "smiling " "FBI.arson"
## "height of eyes " "FBI.prostitution"
## "width of eyes " "FBI.other_sex_offenses"
## "height of hair " "FBI.drunkenness"
## "width of hair " "FBI.dui"
## "style of hair " "FBI.liquor_laws"
## "height of nose " "FBI.drug_abuse"
## "width of nose " "FBI.curfew_loitering"
## "width of ear " "FBI.embezzlement"
## "height of ear " "FBI.vagrancy"
FBI.train <- sample_frac(tbl = df, replace = FALSE, size = 0.80)
FBI.test <- anti_join(df, FBI.train)
## Joining, by = c("FBI.aggravated_assault", "FBI.violent_crime", "FBI.homicide",
## "FBI.stolen_property", "FBI.fraud", "FBI.arson", "FBI.prostitution",
## "FBI.other_sex_offenses", "FBI.drunkenness", "FBI.dui", "FBI.liquor_laws",
## "FBI.drug_abuse", "FBI.curfew_loitering", "FBI.embezzlement", "FBI.vagrancy")
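Because sample_frac() draws rows at random, every run produces a different training set, and anti_join() matches on column values rather than on row identity. A hedged alternative is to fix the random seed and split by row index. In the sketch below the seed value and the synthetic 22-row data frame (one row per year, 1995 to 2016) are arbitrary assumptions for illustration.

```r
set.seed(2022)  # arbitrary seed, used only to make the split repeatable

# Stand-in data with 22 rows; with the real data this would be df
d <- data.frame(x = rnorm(22), y = rnorm(22))

n_train   <- round(0.80 * nrow(d))                   # 80% of the rows
train_idx <- sample(seq_len(nrow(d)), size = n_train)

d_train <- d[train_idx, ]   # rows drawn for training
d_test  <- d[-train_idx, ]  # all remaining rows

print(c(train = nrow(d_train), test = nrow(d_test)))
```

Splitting by index guarantees that the two sets are disjoint and together cover every row, even if two rows happened to hold identical values.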
chart.Correlation(df)
model_norm <- lm(FBI.aggravated_assault ~ FBI.violent_crime + FBI.homicide, data = FBI.train)
autoplot(model_norm)
model_norm %>% augment() %>% as_tibble()
| FBI.aggravated_assault | FBI.violent_crime | FBI.homicide | .fitted | .resid | .hat | .sigma | .cooksd | .std.resid |
|---|---|---|---|---|---|---|---|---|
| 372685 | 498666 | 10571 | 3.71e+05 | 2.04e+03 | 0.203 | 9.17e+03 | 0.00565 | 0.258 |
| 376154 | 505681 | 11092 | 3.75e+05 | 1.6e+03 | 0.231 | 9.18e+03 | 0.00424 | 0.206 |
| 472290 | 620510 | 14158 | 4.6e+05 | 1.21e+04 | 0.07 | 8.56e+03 | 0.0505 | 1.42 |
| 383977 | 515151 | 11788 | 3.8e+05 | 4.12e+03 | 0.302 | 9.1e+03 | 0.0444 | 0.555 |
| 429969 | 594911 | 12955 | 4.43e+05 | -1.31e+04 | 0.08 | 8.43e+03 | 0.069 | -1.54 |
| 421215 | 581765 | 12418 | 4.34e+05 | -1.28e+04 | 0.0916 | 8.46e+03 | 0.0772 | -1.52 |
| 438033 | 586558 | 13467 | 4.34e+05 | 4.05e+03 | 0.0735 | 9.13e+03 | 0.00593 | 0.474 |
| 408488 | 552077 | 11201 | 4.14e+05 | -5.11e+03 | 0.144 | 9.08e+03 | 0.0217 | -0.622 |
| 433945 | 597447 | 13480 | 4.43e+05 | -9.25e+03 | 0.0563 | 8.84e+03 | 0.0229 | -1.07 |
| 397707 | 534704 | 10832 | 4e+05 | -2.57e+03 | 0.131 | 9.17e+03 | 0.00484 | -0.31 |
| 388362 | 521196 | 11075 | 3.88e+05 | 541 | 0.128 | 9.19e+03 | 0.000209 | 0.0652 |
| 447948 | 611523 | 13435 | 4.55e+05 | -7.4e+03 | 0.0971 | 8.96e+03 | 0.0276 | -0.877 |
| 449297 | 603503 | 14062 | 4.46e+05 | 3.25e+03 | 0.0775 | 9.15e+03 | 0.00405 | 0.38 |
| 478417 | 625132 | 13227 | 4.68e+05 | 1.07e+04 | 0.259 | 8.58e+03 | 0.226 | 1.39 |
| 534920 | 717750 | 18290 | 5.27e+05 | 8.37e+03 | 0.353 | 8.77e+03 | 0.249 | 1.17 |
| 521570 | 729900 | 19020 | 5.34e+05 | -1.24e+04 | 0.47 | 7.98e+03 | 1.1 | -1.92 |
| 477809 | 627132 | 13653 | 4.68e+05 | 1e+04 | 0.166 | 8.71e+03 | 0.101 | 1.24 |
| 449933 | 597026 | 13190 | 4.44e+05 | 5.95e+03 | 0.0656 | 9.05e+03 | 0.0112 | 0.693 |
model_norm %>% tidy()
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -1.19e+04 | 2.99e+04 | -0.398 | 0.696 |
| FBI.violent_crime | 0.851 | 0.115 | 7.4 | 2.2e-06 |
| FBI.homicide | -3.95 | 3.22 | -1.23 | 0.238 |
model_norm %>% confint() %>% as_tibble()
| 2.5 % | 97.5 % |
|---|---|
| -7.56e+04 | 5.18e+04 |
| 0.606 | 1.1 |
| -10.8 | 2.91 |
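For a linear model, each interval above is the estimate plus or minus a t quantile times the standard error. The sketch below reproduces the interval for FBI.violent_crime from the rounded values in the coefficient table (estimate 0.851, standard error 0.115), assuming 15 residual degrees of freedom (18 training rows minus 3 coefficients).

```r
estimate  <- 0.851   # from the coefficient table (rounded)
std_error <- 0.115   # from the coefficient table (rounded)
df_resid  <- 15      # 18 observations - 3 estimated coefficients

t_crit <- qt(0.975, df = df_resid)               # two-sided 95% critical value
ci     <- estimate + c(-1, 1) * t_crit * std_error
t_stat <- estimate / std_error                   # the "statistic" column

print(round(ci, 3))
print(round(t_stat, 2))
```

Up to rounding of the inputs, this matches the reported interval (0.606, 1.1) and the reported statistic of 7.4.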
plot_model(model_norm,colors = "Accent",
show.values = TRUE,
value.offset = .4,
value.size = 4,
dot.size = 3,
line.size = 1.5,
vline.color = "blue",
width = 1.5
)
model_norm %>% VIF() %>% as_tibble()
| value |
|---|
| 11.8 |
| 11.8 |
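With exactly two predictors, the variance inflation factor is the same for both and equals 1 / (1 - r^2), where r is the correlation between them. The sketch below checks that identity on synthetic data (x1 and x2 are made up) and then inverts it to see what correlation a VIF of 11.8 implies. As a commonly used rule of thumb, values above roughly 5 to 10 are read as a sign of problematic collinearity.

```r
# Synthetic, deliberately collinear predictors (illustration only)
set.seed(3)
x1 <- rnorm(40)
x2 <- 0.9 * x1 + rnorm(40, sd = 0.3)

# VIF from the auxiliary regression of one predictor on the other
r2_aux     <- summary(lm(x1 ~ x2))$r.squared
vif_manual <- 1 / (1 - r2_aux)

# Same quantity from the pairwise correlation
vif_from_cor <- 1 / (1 - cor(x1, x2)^2)

# Correlation implied by the VIF of 11.8 reported above
implied_r <- sqrt(1 - 1 / 11.8)

print(c(vif_manual = vif_manual, vif_from_cor = vif_from_cor))
print(round(implied_r, 3))
```

The implied correlation of about 0.957 between FBI.violent_crime and FBI.homicide in the training rows is consistent with the large standard error of the homicide coefficient.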
# Residuals against selected columns, referenced by name instead of by position
p1 <- ggplot(FBI.train, aes(FBI.violent_crime, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
p2 <- ggplot(FBI.train, aes(FBI.homicide, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
p3 <- ggplot(FBI.train, aes(FBI.dui, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
grid.arrange(p1, p2, p3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
model_norm %>% ols_plot_cooksd_bar()
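Cook's distance can be rebuilt from two columns of the augment() table: D = (std.resid^2 / p) * hat / (1 - hat), where p is the number of estimated coefficients (3 here). The sketch below applies this to two rows of the table above and then checks the formula against cooks.distance() on the built-in mtcars data. The function name cooks_from_parts is hypothetical.

```r
# Hypothetical helper implementing the Cook's distance identity
cooks_from_parts <- function(std_resid, hat, p) {
  (std_resid^2 / p) * hat / (1 - hat)
}

# Rows from the first model's augment() table (rounded values)
d_high_leverage <- cooks_from_parts(std_resid = -1.92, hat = 0.47,  p = 3)
d_other_row     <- cooks_from_parts(std_resid =  1.17, hat = 0.353, p = 3)

# Check against R's own computation on a built-in dataset
fit    <- lm(mpg ~ wt + hp, data = mtcars)
manual <- cooks_from_parts(rstandard(fit), hatvalues(fit), length(coef(fit)))

print(round(c(d_high_leverage, d_other_row), 3))
print(isTRUE(all.equal(manual, cooks.distance(fit))))
```

The first value comes out near 1.09, matching the reported 1.1 for the observation with leverage 0.47; values of this size are often flagged as influential.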
model_norm %>%
glm_coef(se_rob = TRUE, labels = model_labels(model_norm))
| Parameter | Coefficient | Pr(>|t|) |
|---|---|---|
| Constant | -11887.45 (-71104.23, 47329.34) | 0.675 |
| FBI.violent_crime | 0.85 (0.56, 1.14) | < 0.001 |
| FBI.homicide | -3.95 (-13.81, 5.91) | 0.406 |
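The intervals in this table are wider than those from confint() because they are based on robust (heteroskedasticity-consistent) standard errors. Which robust variant glm_coef(se_rob = TRUE) applies is not stated here, so the following is only a sketch of the simplest one, the White (HC0) sandwich estimator, computed by hand in base R on the built-in mtcars data.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)   # design matrix, one row per observation
e <- residuals(fit)      # ordinary residuals

bread <- solve(crossprod(X))   # (X'X)^-1
meat  <- crossprod(X * e)      # X' diag(e^2) X: each row of X scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

se_classic <- sqrt(diag(vcov(fit)))   # usual OLS standard errors
se_robust  <- sqrt(diag(vcov_hc0))    # HC0 robust standard errors

print(round(cbind(se_classic, se_robust), 4))
```

The point estimates are unchanged by this choice; only the standard errors, intervals, and p-values differ, which is why the coefficients in the table above agree with the earlier output.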
model_norm %>% glance()
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.968 | 0.964 | 8.88e+03 | 230 | 5.58e-12 | 2 | -188 | 383 | 387 | 1.18e+09 | 15 | 18 |
model_norm %>% aov() %>% tidy()
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| FBI.violent_crime | 1 | 3.62e+10 | 3.62e+10 | 458 | 1.17e-12 |
| FBI.homicide | 1 | 1.19e+08 | 1.19e+08 | 1.51 | 0.238 |
| Residuals | 15 | 1.18e+09 | 7.89e+07 |
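Each F statistic in the table is the term's mean square divided by the residual mean square, and for the last term entered the F statistic equals the square of its t statistic from the coefficient table. The sketch below redoes that arithmetic with the rounded values reported above.

```r
# Mean squares from the ANOVA table (rounded)
ms_violent  <- 3.62e10
ms_homicide <- 1.19e8
ms_resid    <- 7.89e7

f_violent  <- ms_violent  / ms_resid
f_homicide <- ms_homicide / ms_resid

# t statistic of FBI.homicide from the coefficient table
t_homicide <- -1.23
f_from_t   <- t_homicide^2

# p-value of the homicide term: upper tail of F with 1 and 15 degrees of freedom
p_homicide <- pf(f_homicide, df1 = 1, df2 = 15, lower.tail = FALSE)

print(round(c(f_violent, f_homicide, f_from_t), 2))
print(round(p_homicide, 3))
```

Up to rounding, the results agree with the reported statistics (458 and 1.51) and with the p-value of 0.238 shown for FBI.homicide in both tables.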
model_norm_AIC <- stepAIC(model_norm, trace = 0)
AIC(model_norm, model_norm_AIC)
| df | AIC |
|---|---|
| 4 | 383 |
| 3 | 383 |
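The df column in this table counts estimated parameters: the regression coefficients plus the residual variance. A value of 4 therefore corresponds to the full model with two predictors, and 3 to the model that stepAIC() kept after dropping one of them. The sketch below verifies the definition AIC = -2 * logLik + 2 * k on the built-in mtcars data, which stands in for the FBI data.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)
k  <- attr(ll, "df")   # 3 coefficients + 1 residual variance

aic_manual <- -2 * as.numeric(ll) + 2 * k

print(k)
print(c(manual = aic_manual, builtin = AIC(fit)))
```

Applied to the first model's reported log-likelihood of -188, the same formula gives -2 * (-188) + 2 * 4 = 384, which agrees with the reported 383 up to rounding of the log-likelihood.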
model_norm <- lm(FBI.homicide ~ FBI.drug_abuse + FBI.violent_crime, data = FBI.train)
autoplot(model_norm)
model_norm %>% augment() %>% as_tibble()
| FBI.homicide | FBI.drug_abuse | FBI.violent_crime | .fitted | .resid | .hat | .sigma | .cooksd | .std.resid |
|---|---|---|---|---|---|---|---|---|
| 10571 | 1561231 | 498666 | 1.02e+04 | 403 | 0.188 | 710 | 0.0319 | 0.642 |
| 11092 | 1488707 | 505681 | 1.05e+04 | 596 | 0.228 | 697 | 0.0936 | 0.975 |
| 14158 | 1538813 | 620510 | 1.44e+04 | -228 | 0.114 | 717 | 0.00518 | -0.348 |
| 11788 | 1572579 | 515151 | 1.07e+04 | 1.07e+03 | 0.146 | 651 | 0.157 | 1.66 |
| 12955 | 1702537 | 594911 | 1.33e+04 | -355 | 0.0713 | 713 | 0.00716 | -0.529 |
| 12418 | 1663582 | 581765 | 1.29e+04 | -486 | 0.0592 | 708 | 0.0109 | -0.72 |
| 13467 | 1746570 | 586558 | 1.3e+04 | 497 | 0.102 | 706 | 0.0216 | 0.754 |
| 11201 | 1638846 | 552077 | 1.19e+04 | -712 | 0.0762 | 692 | 0.0311 | -1.06 |
| 13480 | 1841182 | 597447 | 1.32e+04 | 249 | 0.216 | 716 | 0.015 | 0.404 |
| 10832 | 1531251 | 534704 | 1.14e+04 | -612 | 0.137 | 698 | 0.0473 | -0.946 |
| 11075 | 1552432 | 521196 | 1.1e+04 | 121 | 0.144 | 719 | 0.00199 | 0.188 |
| 13435 | 1889810 | 611523 | 1.37e+04 | -222 | 0.304 | 717 | 0.0213 | -0.382 |
| 14062 | 1846351 | 603503 | 1.34e+04 | 629 | 0.225 | 694 | 0.102 | 1.03 |
| 13227 | 1579566 | 625132 | 1.45e+04 | -1.27e+03 | 0.0907 | 626 | 0.122 | -1.91 |
| 18290 | 1583600 | 717750 | 1.77e+04 | 613 | 0.312 | 693 | 0.171 | 1.06 |
| 19020 | 1506200 | 729900 | 1.82e+04 | 833 | 0.435 | 657 | 0.651 | 1.59 |
| 13653 | 1586902 | 627132 | 1.46e+04 | -903 | 0.0892 | 674 | 0.0604 | -1.36 |
| 13190 | 1678192 | 597026 | 1.34e+04 | -221 | 0.0618 | 718 | 0.00237 | -0.329 |
model_norm %>% tidy()
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -5.12e+03 | 2.63e+03 | -1.95 | 0.0705 |
| FBI.drug_abuse | -0.0012 | 0.00139 | -0.861 | 0.403 |
| FBI.violent_crime | 0.0344 | 0.00263 | 13.1 | 1.34e-09 |
model_norm %>% confint() %>% as_tibble()
| 2.5 % | 97.5 % |
|---|---|
| -1.07e+04 | 484 |
| -0.00416 | 0.00176 |
| 0.0288 | 0.04 |
plot_model(model_norm,colors = "Accent",
show.values = TRUE,
value.offset = .4,
value.size = 4,
dot.size = 3,
line.size = 1.5,
vline.color = "blue",
width = 1.5
)
model_norm %>% VIF() %>% as_tibble()
| value |
|---|
| 1.01 |
| 1.01 |
# Residuals against selected columns, referenced by name instead of by position
p1 <- ggplot(FBI.train, aes(FBI.violent_crime, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
p2 <- ggplot(FBI.train, aes(FBI.homicide, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
p3 <- ggplot(FBI.train, aes(FBI.dui, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
grid.arrange(p1, p2, p3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
model_norm %>% ols_plot_cooksd_bar()
model_norm %>%
glm_coef(se_rob = TRUE, labels = model_labels(model_norm))
| Parameter | Coefficient | Pr(>|t|) |
|---|---|---|
| Constant | -5115.68 (-10372.07, 140.7) | 0.056 |
| FBI.drug_abuse | 0 (0, 0) | 0.43 |
| FBI.violent_crime | 0.03 (0.03, 0.04) | < 0.001 |
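The entry 0 (0, 0) for FBI.drug_abuse is a rounding effect: the coefficient is about -0.0012 per arrest, which disappears at two decimals. Expressing a predictor in larger units, for example per 1,000 arrests, makes such coefficients readable without changing the fit. The sketch below illustrates the idea on the built-in mtcars data; the rescaling factor of 100 and the column name hp_100 are assumptions chosen for that dataset.

```r
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)

# Same model with horsepower expressed in hundreds
mtcars_scaled <- transform(mtcars, hp_100 = hp / 100)
fit_scaled    <- lm(mpg ~ wt + hp_100, data = mtcars_scaled)

coef_raw    <- unname(coef(fit_raw)["hp"])
coef_scaled <- unname(coef(fit_scaled)["hp_100"])

t_raw    <- summary(fit_raw)$coefficients["hp", "t value"]
t_scaled <- summary(fit_scaled)$coefficients["hp_100", "t value"]

print(c(raw = coef_raw, scaled = coef_scaled))
print(c(t_raw = t_raw, t_scaled = t_scaled))
```

The coefficient is multiplied by the rescaling factor while the t statistic, and hence the p-value, stays the same.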
model_norm %>% glance()
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.919 | 0.909 | 696 | 85.4 | 6.33e-09 | 2 | -142 | 291 | 295 | 7.26e+06 | 15 | 18 |
model_norm %>% aov() %>% tidy()
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| FBI.drug_abuse | 1 | 1.09e+05 | 1.09e+05 | 0.225 | 0.642 |
| FBI.violent_crime | 1 | 8.26e+07 | 8.26e+07 | 171 | 1.34e-09 |
| Residuals | 15 | 7.26e+06 | 4.84e+05 |
model_norm_AIC <- stepAIC(model_norm, trace = 0)
AIC(model_norm, model_norm_AIC)
| df | AIC |
|---|---|
| 4 | 291 |
| 3 | 290 |
model_norm <- lm(FBI.homicide ~ FBI.prostitution + FBI.drug_abuse, data = FBI.train)
autoplot(model_norm)
model_norm %>% augment() %>% as_tibble()
| FBI.homicide | FBI.prostitution | FBI.drug_abuse | .fitted | .resid | .hat | .sigma | .cooksd | .std.resid |
|---|---|---|---|---|---|---|---|---|
| 10571 | 47598 | 1561231 | 1.07e+04 | -148 | 0.167 | 1.24e+03 | 0.00122 | -0.135 |
| 11092 | 41877 | 1488707 | 1.04e+04 | 687 | 0.253 | 1.23e+03 | 0.0494 | 0.661 |
| 14158 | 79733 | 1538813 | 1.45e+04 | -374 | 0.123 | 1.24e+03 | 0.0052 | -0.333 |
| 11788 | 38306 | 1572579 | 9.59e+03 | 2.19e+03 | 0.261 | 1.04e+03 | 0.532 | 2.12 |
| 12955 | 75004 | 1702537 | 1.32e+04 | -252 | 0.0713 | 1.24e+03 | 0.00121 | -0.217 |
| 12418 | 71355 | 1663582 | 1.3e+04 | -553 | 0.0589 | 1.23e+03 | 0.0047 | -0.475 |
| 13467 | 87872 | 1746570 | 1.45e+04 | -1.01e+03 | 0.122 | 1.21e+03 | 0.0375 | -0.9 |
| 11201 | 62668 | 1638846 | 1.21e+04 | -886 | 0.0743 | 1.22e+03 | 0.0157 | -0.767 |
| 13480 | 77607 | 1841182 | 1.28e+04 | 634 | 0.22 | 1.23e+03 | 0.0335 | 0.598 |
| 10832 | 57345 | 1531251 | 1.2e+04 | -1.15e+03 | 0.121 | 1.2e+03 | 0.0484 | -1.02 |
| 11075 | 56575 | 1552432 | 1.18e+04 | -722 | 0.113 | 1.23e+03 | 0.0172 | -0.637 |
| 13435 | 79673 | 1889810 | 1.29e+04 | 583 | 0.307 | 1.23e+03 | 0.0501 | 0.583 |
| 14062 | 84891 | 1846351 | 1.37e+04 | 401 | 0.226 | 1.24e+03 | 0.014 | 0.379 |
| 13227 | 87620 | 1579566 | 1.52e+04 | -2.02e+03 | 0.132 | 1.1e+03 | 0.166 | -1.81 |
| 18290 | 101600 | 1583600 | 1.68e+04 | 1.45e+03 | 0.264 | 1.16e+03 | 0.237 | 1.41 |
| 19020 | 99000 | 1506200 | 1.69e+04 | 2.11e+03 | 0.335 | 1.03e+03 | 0.778 | 2.15 |
| 13653 | 80854 | 1586902 | 1.44e+04 | -779 | 0.0893 | 1.22e+03 | 0.0151 | -0.68 |
| 13190 | 75190 | 1678192 | 1.33e+04 | -154 | 0.0617 | 1.24e+03 | 0.000384 | -0.132 |
model_norm %>% tidy()
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 1.27e+04 | 3.92e+03 | 3.23 | 0.0056 |
| FBI.prostitution | 0.115 | 0.0168 | 6.87 | 5.28e-06 |
| FBI.drug_abuse | -0.00477 | 0.00251 | -1.9 | 0.0772 |
model_norm %>% confint() %>% as_tibble()
| 2.5 % | 97.5 % |
|---|---|
| 4.31e+03 | 2.1e+04 |
| 0.0796 | 0.151 |
| -0.0101 | 0.000589 |
plot_model(model_norm,colors = "Accent",
show.values = TRUE,
value.offset = .4,
value.size = 4,
dot.size = 3,
line.size = 1.5,
vline.color = "blue",
width = 1.5
)
model_norm %>% VIF() %>% as_tibble()
| value |
|---|
| 1.11 |
| 1.11 |
# Residuals against selected columns, referenced by name instead of by position
p1 <- ggplot(FBI.train, aes(FBI.violent_crime, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
p2 <- ggplot(FBI.train, aes(FBI.homicide, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
p3 <- ggplot(FBI.train, aes(FBI.dui, residuals(model_norm))) +
geom_point() + geom_smooth(color = "blue")
grid.arrange(p1, p2, p3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
model_norm %>% ols_plot_cooksd_bar()
model_norm %>%
glm_coef(se_rob = TRUE, labels = model_labels(model_norm))
| Parameter | Coefficient | Pr(>|t|) |
|---|---|---|
| Constant | 12671.62 (3431.43, 21911.81) | 0.01 |
| FBI.prostitution | 0.12 (0.05, 0.18) | 0.002 |
| FBI.drug_abuse | 0 (-0.01, 0) | 0.166 |
model_norm %>% glance()
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.759 | 0.727 | 1.2e+03 | 23.7 | 2.29e-05 | 2 | -152 | 311 | 315 | 2.17e+07 | 15 | 18 |
model_norm %>% aov() %>% tidy()
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| FBI.prostitution | 1 | 6.31e+07 | 6.31e+07 | 43.7 | 8.25e-06 |
| FBI.drug_abuse | 1 | 5.2e+06 | 5.2e+06 | 3.6 | 0.0772 |
| Residuals | 15 | 2.17e+07 | 1.44e+06 |
model_norm_AIC <- stepAIC(model_norm, trace = 0)
AIC(model_norm, model_norm_AIC)
| df | AIC |
|---|---|
| 4 | 311 |
| 4 | 311 |
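The held-out set FBI.test is created earlier but is not used by any of the models above. The following sketch shows how out-of-sample error could be measured with predict() and a root mean squared error. It runs on synthetic data; the variable names, the seed, and the simulated relationship are assumptions for illustration.

```r
set.seed(2022)  # arbitrary seed

# Synthetic data with a known linear relationship (illustration only)
n <- 22
d <- data.frame(x = rnorm(n))
d$y <- 3 + 2 * d$x + rnorm(n, sd = 0.5)

# 80/20 split by row index
train_idx <- sample(seq_len(n), size = round(0.8 * n))
d_train   <- d[train_idx, ]
d_test    <- d[-train_idx, ]

# Fit on the training rows only, then predict the held-out rows
fit  <- lm(y ~ x, data = d_train)
pred <- predict(fit, newdata = d_test)

rmse_train <- sqrt(mean(residuals(fit)^2))
rmse_test  <- sqrt(mean((d_test$y - pred)^2))

print(c(rmse_train = rmse_train, rmse_test = rmse_test))
```

With the real data, the same pattern would apply with FBI.train and FBI.test in place of d_train and d_test. With only four held-out rows, the test error is a rough indication rather than a precise estimate.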
Federal Bureau of Investigation, ed. “Crime Data Explorer.” FBI. crime-data-explorer.fr.cloud.gov. Accessed August 21, 2022. https://crime-data-explorer.fr.cloud.gov/pages/downloads#datasets
Federal Bureau of Investigation, ed. “FBI National Arrests.” FBI. Accessed August 21, 2022. https://s3-us-gov-west-1.amazonaws.com/cg-d3f0433b-a53e-4934-8b94-c678aa2cbaf3/arrests_national.csv
GeeksforGeeks. “Conditioning Plot - GeeksforGeeks.” www.geeksforgeeks.org, March 7, 2021. https://www.geeksforgeeks.org/conditioning-plot/
Yi, Mike. “Scatter Plots | A Complete Guide to Scatter Plots.” Chartio. chartio.com. Accessed August 21, 2022. https://chartio.com/learn/charts/what-is-a-scatter-plot/
Viliam Simko, Taiyun Wei. “An Introduction to Corrplot Package.” An Introduction to corrplot Package. cran.r-project.org, November 18, 2021. https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html
Kabacoff, Rob. “Data Visualization with R.” Data Visualization with R. rkabacoff.github.io. Accessed August 21, 2022. https://rkabacoff.github.io/datavis/Multivariate.html
Vitor A. A. Marchi, Francisco A. R. Rojas, Francisco Louzada. “The Chi-plot and Its Asymptotic Confidence Interval for Analyzing Bivariate Dependence: An Application to the Average Intelligence and Atheism Rates across Nations Data.” J. Data Sci. 10 (2012), no. 4, 711-722. DOI 10.6339/JDS.2012.10(4).1094
Saavedra, Mario. “Análisis de Regresión.” RPubs by RStudio. Accessed October 17, 2022. https://rpubs.com/mgsaavedraro/925582