Autor: Jose Carlos Molano de Oro
Universidad: Pontificia Universidad Javeriana
Curso: Linear Regression Analysis
Semestre: 2022-3
Profesor: Marisol Garcia Peña
Email del Autor: jose_molano@javeriana.edu.co
Email del Profesor: marisolgarcia@javeriana.edu.co
El conjunto de datos USArrests que ofrece R, contiene estadísticas, en arrestos por cada 100,000 residentes por agresión, asesinato y violación en cada uno de los 50 estados de EE. UU. en 1973. También se da el porcentaje de la población que vive en áreas urbanas (Urban Pop).
Arrests <- USArrests
Arrests <- scale(Arrests)
reactable(Arrests)
summary(Arrests)
## Murder Assault UrbanPop Rape
## Min. :-1.6044 Min. :-1.5090 Min. :-2.31714 Min. :-1.4874
## 1st Qu.:-0.8525 1st Qu.:-0.7411 1st Qu.:-0.76271 1st Qu.:-0.6574
## Median :-0.1235 Median :-0.1411 Median : 0.03178 Median :-0.1209
## Mean : 0.0000 Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.7949 3rd Qu.: 0.9388 3rd Qu.: 0.84354 3rd Qu.: 0.5277
## Max. : 2.2069 Max. : 1.9948 Max. : 1.75892 Max. : 2.6444
La función agnes puede obtener el coeficiente de aglomeración, que mide la cantidad de estructura de agrupamiento encontrada (los valores más cercanos a 1 sugieren una fuerte estructura de agrupamiento). Se usa mediante el paquete cluster la función agnes.
dArrests<-dist(Arrests, method = "euclidean")
#dArrests
#plot(cs<-hclust(dArrests,method="single"))
knitr::include_graphics("~/Documents/Analisis Multivariado/C1.png")
#plot(cc<-hclust(dArrests,method="complete"))
knitr::include_graphics("~/Documents/Analisis Multivariado/C2.png")
#plot(ca<-hclust(dArrests,method="average"))
#plot(ca<-agnes(dArrests,method="average"))
knitr::include_graphics("~/Documents/Analisis Multivariado/C3.png")
#plot(cw<-agnes(dArrests,method="ward"))
knitr::include_graphics("~/Documents/Analisis Multivariado/C4.png")
Mediante la función fviz_nbclust se puede determinar y visualizar el número óptimo de clústeres. Se pueden distintos métodos para estimar el número óptimo de conglomerados. Los valores posibles son “silhouette” (para el ancho promedio de la silueta), “wss” (para el total dentro de la suma del cuadrado) y “gap_stat” (para estadísticas de brecha). Para determinar se usa el método wws.
#fviz_nbclust(Arrests, FUN = hcut, method = "wss")
knitr::include_graphics("~/Documents/Analisis Multivariado/C5.png")
De lo anterior el número óptimo de clusters serian 4.
cw <- agnes(Arrests, method = "ward" )
subgrp<-cutree(as.hclust(cw), k = 4)
#plot(cw)
#rect.hclust(cw, k = 4, border = 2:5)
knitr::include_graphics("~/Documents/Analisis Multivariado/C6.png")
#fviz_cluster(list(data = Arrests, cluster = subgrp))
knitr::include_graphics("~/Documents/Analisis Multivariado/C7.png")
El conjunto de datos USArrests que ofrece R, contiene estadísticas, en arrestos por cada 100,000 residentes por agresión, asesinato y violación en cada uno de los 50 estados de EE. UU. en 1973. También se da el porcentaje de la población que vive en áreas urbanas (Urban Pop).
Para el análisis de correspondencias, solo se usan los datos de arrestos por cada 100000 residentes por agresión, asesinato y violación.
Arrests <- USArrests
Arrests$UrbanPop<-NULL
reactable(Arrests)
summary(Arrests)
## Murder Assault Rape
## Min. : 0.800 Min. : 45.0 Min. : 7.30
## 1st Qu.: 4.075 1st Qu.:109.0 1st Qu.:15.07
## Median : 7.250 Median :159.0 Median :20.10
## Mean : 7.788 Mean :170.8 Mean :21.23
## 3rd Qu.:11.250 3rd Qu.:249.0 3rd Qu.:26.18
## Max. :17.400 Max. :337.0 Max. :46.00
ca(Arrests)
##
## Principal inertias (eigenvalues):
## 1 2
## Value 0.01306 0.003948
## Percentage 76.79% 23.21%
##
##
## Rows:
## Alabama Alaska Arizona Arkansas California Colorado
## Mass 0.027070 0.031785 0.033347 0.021854 0.032596 0.025088
## ChiDist 0.100858 0.114253 0.089828 0.055084 0.080929 0.158687
## Inertia 0.000275 0.000415 0.000269 0.000066 0.000213 0.000632
## Dim. 1 -0.733596 0.918108 -0.462162 -0.473639 0.455975 1.323698
## Dim. 2 0.892302 -0.719653 -1.156340 0.162536 -0.985404 -0.762784
## Connecticut Delaware Florida Georgia Hawaii Idaho Illinois
## Mass 0.012454 0.025999 0.038272 0.025448 0.007158 0.013695 0.028371
## ChiDist 0.087938 0.175118 0.074109 0.152323 0.613368 0.104330 0.072023
## Inertia 0.000096 0.000797 0.000210 0.000590 0.002693 0.000149 0.000147
## Dim. 1 -0.558066 -1.387067 -0.640966 0.038670 5.214601 -0.188638 -0.626871
## Dim. 2 -0.963511 -1.184354 0.178902 2.423106 2.310528 -1.624521 -0.118023
## Indiana Iowa Kansas Kentucky Louisiana Maine Maryland
## Mass 0.014136 0.006958 0.013915 0.013515 0.028692 0.009300 0.033947
## ChiDist 0.155316 0.184455 0.080002 0.179732 0.116733 0.115405 0.086170
## Inertia 0.000341 0.000237 0.000089 0.000437 0.000391 0.000124 0.000252
## Dim. 1 1.276950 1.556287 0.684368 0.605534 -0.731250 -0.730917 -0.723671
## Dim. 2 0.846118 -0.778117 0.267845 2.639803 1.297079 -1.267254 -0.385095
## Massachusetts Michigan Minnesota Mississippi Missouri Montana
## Mass 0.016989 0.030253 0.008970 0.029252 0.021544 0.013154
## ChiDist 0.077393 0.032943 0.197397 0.171151 0.082876 0.071553
## Inertia 0.000102 0.000033 0.000350 0.000857 0.000148 0.000067
## Dim. 1 -0.367777 0.286675 1.652521 -1.261143 0.720283 0.566019
## Dim. 2 -1.034202 0.054927 -0.914179 1.468985 0.153112 0.486778
## Nebraska Nevada New Hampshire New Jersey New Mexico New York
## Mass 0.012294 0.031054 0.006868 0.018540 0.032886 0.029152
## ChiDist 0.092237 0.136792 0.110555 0.015962 0.036772 0.054642
## Inertia 0.000105 0.000581 0.000084 0.000005 0.000044 0.000087
## Dim. 1 0.774404 1.195561 0.865334 -0.129592 -0.268510 -0.478036
## Dim. 2 -0.413558 -0.105453 -0.786545 0.094755 -0.322455 -0.017656
## North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania
## Mass 0.036650 0.005316 0.014886 0.017780 0.019341 0.012734
## ChiDist 0.204673 0.154580 0.136446 0.022059 0.159109 0.067074
## Inertia 0.001535 0.000127 0.000277 0.000009 0.000490 0.000057
## Dim. 1 -1.790326 0.744644 1.129026 0.169196 1.208430 0.370995
## Dim. 2 -0.085468 -2.053691 0.706232 -0.168951 -1.257478 0.827126
## Rhode Island South Carolina South Dakota Tennessee Texas Utah
## Mass 0.018590 0.031625 0.010271 0.022835 0.023946 0.014626
## ChiDist 0.233586 0.116728 0.060251 0.107328 0.073162 0.180971
## Inertia 0.001014 0.000431 0.000037 0.000263 0.000128 0.000479
## Dim. 1 -1.871775 -0.956625 0.513422 0.442902 0.092898 1.332410
## Dim. 2 -1.493264 0.651012 -0.217841 1.506184 1.152002 -1.556362
## Vermont Virginia Washington West Virginia Wisconsin Wyoming
## Mass 0.006147 0.018540 0.017539 0.009611 0.006647 0.018360
## ChiDist 0.247035 0.041112 0.158640 0.107890 0.183425 0.070387
## Inertia 0.000375 0.000031 0.000441 0.000112 0.000224 0.000091
## Dim. 1 2.144060 0.197016 1.133417 -0.146313 1.602439 -0.613995
## Dim. 2 -0.499992 0.547432 -1.457591 1.696258 -0.165296 -0.088148
##
##
## Columns:
## Murder Assault Rape
## Mass 0.038983 0.854740 0.106277
## ChiDist 0.313348 0.043908 0.329427
## Inertia 0.003828 0.001648 0.011533
## Dim. 1 0.305408 -0.371405 2.875024
## Dim. 2 4.955705 -0.178899 -0.378969
summary(ca(Arrests))
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.013060 76.8 76.8 *******************
## 2 0.003948 23.2 100.0 ******
## -------- -----
## Total: 0.017009 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Albm | 27 1000 16 | -84 691 15 | 56 309 22 |
## 2 | Alsk | 32 1000 24 | 105 843 27 | -45 157 16 |
## 3 | Arzn | 33 1000 16 | -53 346 7 | -73 654 45 |
## 4 | Arkn | 22 1000 4 | -54 966 5 | 10 34 1 |
## 5 | Clfr | 33 1000 13 | 52 415 7 | -62 585 32 |
## 6 | Clrd | 25 1000 37 | 151 909 44 | -48 91 15 |
## 7 | Cnnc | 12 1000 6 | -64 526 4 | -61 474 12 |
## 8 | Dlwr | 26 1000 47 | -159 819 50 | -74 181 36 |
## 9 | Flrd | 38 1000 12 | -73 977 16 | 11 23 1 |
## 10 | Gerg | 25 1000 35 | 4 1 0 | 152 999 149 |
## 11 | Hawa | 7 1000 158 | 596 944 195 | 145 56 38 |
## 12 | Idah | 14 1000 9 | -22 43 0 | -102 957 36 |
## 13 | Illn | 28 1000 9 | -72 989 11 | -7 11 0 |
## 14 | Indn | 14 1000 20 | 146 883 23 | 53 117 10 |
## 15 | Iowa | 7 1000 14 | 178 930 17 | -49 70 4 |
## 16 | Knss | 14 1000 5 | 78 956 7 | 17 44 1 |
## 17 | Kntc | 14 1000 26 | 69 148 5 | 166 852 94 |
## 18 | Losn | 29 1000 23 | -84 513 15 | 82 487 48 |
## 19 | Main | 9 1000 7 | -84 524 5 | -80 476 15 |
## 20 | Mryl | 34 1000 15 | -83 921 18 | -24 79 5 |
## 21 | Mssc | 17 1000 6 | -42 295 2 | -65 705 18 |
## 22 | Mchg | 30 1000 2 | 33 989 2 | 3 11 0 |
## 23 | Mnns | 9 1000 21 | 189 915 24 | -57 85 7 |
## 24 | Msss | 29 1000 50 | -144 709 47 | 92 291 63 |
## 25 | Mssr | 22 1000 9 | 82 987 11 | 10 13 1 |
## 26 | Mntn | 13 1000 4 | 65 817 4 | 31 183 3 |
## 27 | Nbrs | 12 1000 6 | 89 921 7 | -26 79 2 |
## 28 | Nevd | 31 1000 34 | 137 998 44 | -7 2 0 |
## 29 | NwHm | 7 1000 5 | 99 800 5 | -49 200 4 |
## 30 | NwJr | 19 1000 0 | -15 861 0 | 6 139 0 |
## 31 | NwMx | 33 1000 3 | -31 696 2 | -20 304 3 |
## 32 | NwYr | 29 1000 5 | -55 1000 7 | -1 0 0 |
## 33 | NrtC | 37 1000 90 | -205 999 117 | -5 1 0 |
## 34 | NrtD | 5 1000 7 | 85 303 3 | -129 697 22 |
## 35 | Ohio | 15 1000 16 | 129 894 19 | 44 106 7 |
## 36 | Oklh | 18 1000 1 | 19 768 1 | -11 232 1 |
## 37 | Orgn | 19 1000 29 | 138 753 28 | -79 247 31 |
## 38 | Pnns | 13 1000 3 | 42 400 2 | 52 600 9 |
## 39 | RhdI | 19 1000 60 | -214 839 65 | -94 161 41 |
## 40 | SthC | 32 1000 25 | -109 877 29 | 41 123 13 |
## 41 | SthD | 10 1000 2 | 59 948 3 | -14 52 0 |
## 42 | Tnns | 23 1000 15 | 51 222 4 | 95 778 52 |
## 43 | Texs | 24 1000 8 | 11 21 0 | 72 979 32 |
## 44 | Utah | 15 1000 28 | 152 708 26 | -98 292 35 |
## 45 | Vrmn | 6 1000 22 | 245 984 28 | -31 16 2 |
## 46 | Vrgn | 19 1000 2 | 23 300 1 | 34 700 6 |
## 47 | Wshn | 18 1000 26 | 130 667 23 | -92 333 37 |
## 48 | WstV | 10 1000 7 | -17 24 0 | 107 976 28 |
## 49 | Wscn | 7 1000 13 | 183 997 17 | -10 3 0 |
## 50 | Wymn | 18 1000 5 | -70 994 7 | -6 6 0 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Mrdr | 39 1000 225 | 35 12 4 | 311 988 957 |
## 2 | Assl | 855 1000 97 | -42 934 118 | -11 66 27 |
## 3 | Rape | 106 1000 678 | 329 995 878 | -24 5 15 |
#plot(ca(Arrests))
knitr::include_graphics("~/Documents/Analisis Multivariado/C8.png")
#plot(ca(Arrests), what = c("all", "all"), mass = TRUE, contrib = "relative")
knitr::include_graphics("~/Documents/Analisis Multivariado/C9.png")
El dataset eurodist da las distancias por carretera (en km) entre 21 ciudades de Europa. Los datos están tomados de una tabla en The Cambridge Encyclopedia.
eurodist.mat <- as.matrix(eurodist)
reactable(eurodist.mat)
colnames(eurodist.mat) <- rownames(eurodist.mat)
dist.eurodist <- as.dist(eurodist.mat)
stress.fun <- function(datadist,fitteddist) {
sqrt(sum((datadist-fitteddist)^2)/sum(datadist^2))
}
MDSdim2eurodist.cmd <- cmdscale(dist.eurodist, eig=TRUE, k=2) # EscMult cl?sico ; eig=T devuelve autovalores
MDSdim3eurodist.cmd <- cmdscale(dist.eurodist, eig=TRUE, k=3)
MDSdim4eurodist.cmd <- cmdscale(dist.eurodist, eig=TRUE, k=4)
#plot(MDSdim3eurodist.cmd$points[,1],MDSdim3eurodist.cmd$points[,2], pch=16, col="blue", xlab="Dimensión 1",ylab="Dimensión 2", xlim=c(-2300,2300), ylim=c(-2500,2500), main="Función cmdscale")
#text(MDSdim3eurodist.cmd$points[,1],MDSdim3eurodist.cmd$points[,2],labels=rownames(MDSdim3eurodist.cmd$points), cex=0.7, pos=3)
#abline(h=0)
#abline(v=0)
knitr::include_graphics("~/Documents/Analisis Multivariado/C10.png")
MDSdim2eurodist.iso <- isoMDS(dist.eurodist) # k=2
## initial value 7.505733
## final value 7.505688
## converged
MDSdim3eurodist.iso <- isoMDS(dist.eurodist, k=3)
## initial value 6.581788
## final value 6.581747
## converged
MDSdim4eurodist.iso <- isoMDS(dist.eurodist, k=4)
## initial value 7.697729
## final value 7.697672
## converged
# Dimension 1 vs Dimension 2
#plot(MDSdim3eurodist.iso$points[,1],MDSdim3eurodist.iso$points[,2], pch=16, col="blue", xlab="Dimensión 1",ylab="Dimensión 2", xlim=c(-2300,2300), ylim=c(-2500,2500),main="Función isoMDS")
#text(MDSdim3eurodist.iso$points[,1],MDSdim3eurodist.iso$points[,2], labels=rownames(MDSdim3eurodist.iso$points), cex=0.7, pos=3)
#abline(h=0)
#abline(v=0)
knitr::include_graphics("~/Documents/Analisis Multivariado/C11.png")