class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[

<br><br><br>

# Revision

## .black[STAT3022 Applied Linear Models Lecture 22]

<br><br><br>

### .black[2020/02/20]

]]

.column.bg-brand-charcoal[.content.white[

## Today

1. Assignment question.

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# The Avocado Lover's Great Problem
]]

.row[
.split-two[
.column.bg-gray.white[.content[
Dear statistician,

Thanks for agreeing to help me. I'm a millennial that LOVES Avocado toast. I eat it, like every day. Because of my job, I need to work in the U.S.A. for the whole month in July 2019. The company I work for has an office in every state in the US and I get to choose which state to work in. My salary is not too high so I need to make sure that avocado is least expensive where I go. I like my avocado organic but if it's too costly, I'm willing to settle for non-organic avocado. Please help me decide where I should go!

Thanks statistician!

Kind regards,

Avocado Lover
]]
.column[.content.center.black.vmiddle[
# What is the aim?
]]
]]

---
class: split-10
count: false

.row.bg-brand-blue.white[.content.vmiddle[
# The Avocado Lover's Great Problem
]]

.row[
.split-two[
.column.bg-gray.white[.content[
Dear statistician,

Thanks for agreeing to help me. I'm a millennial that LOVES Avocado toast. I eat it, like every day. Because of my job, I need to work in the U.S.A. for the whole month in .yellow[**July 2019**]. The company I work for has an office in every state in the US and I get to .yellow[**choose which state**] to work in. **My salary is not too high** so I need to make sure that avocado is least expensive where I go. .yellow[**I like my avocado organic but if it's too costly, I'm willing to settle for non-organic avocado.**] Please help me decide where I should go!

Thanks statistician!

Kind regards,

Avocado Lover
]]
.column[.content[
First identify what the client wants.

* Only the price in **July 2019** matters to the client.
* The client wants a recommendation of which **state** to go to. But there may be a lot of variation within a state. In that case, even though the client says state, it is more useful to identify the city (or even suburb) in which his/her workplace has an office.
* The client cares moderately about **organic** avocados. How much extra is he/she willing to pay for them?

The points above are what the client makes explicit, but implicitly the client would also care about:

* cost of living other than avocados (rent, electricity, water bills, etc.),
* lifestyle and so many other factors!

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# The Avocado Lover's Great Problem
]]

.row[.split-50[
.column[.content[
### What does the result mean for the client?

* In the real world, the questions are often not clearly defined.
* Keep in mind that the client may find it hard to pinpoint exactly what they want.
* It is your job as a statistician to work together with the client to frame the problem numerically/statistically.

.bottom_abs.width100[
What is shown in lecture is by no means "correct" (definitive), but given the right data, it should shed more light on the "truth" than having no information (informative).
]]]
.column.bg-brand-gray[.content[
### Do you have the data to answer the question?

* Even after you form the question analytically, you may have insufficient data to answer the question.
* You may need to make the best of the data, acknowledging the potential flaws in the analysis.
* In some cases, you simply cannot gain any information from the data.
* Remember the GIGO principle: garbage-in, garbage-out.
* "If you torture the data long enough, it will confess."

.pull-right[ -Ronald H.
Coase]
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Recall: the avocado price dataset
]]

.row[.content[

```r
str(dat_ncr)
```

```
tibble [11,154 x 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ date         : Date[1:11154], format: "2015-12-27" "2015-12-20" ...
 $ average_price: num [1:11154] 1.33 1.35 0.93 1.08 1.28 1.26 0.99 0.98 1.02 1.07 ...
 $ total_volume : num [1:11154] 64237 54877 118220 78992 51040 ...
 $ x4046        : num [1:11154] 1037 674 795 1132 941 ...
 $ x4225        : num [1:11154] 54455 44639 109150 71976 43838 ...
 $ x4770        : num [1:11154] 48.2 58.3 130.5 72.6 75.8 ...
 $ total_bags   : num [1:11154] 8697 9506 8145 5811 6184 ...
 $ small_bags   : num [1:11154] 8604 9408 8042 5677 5986 ...
 $ large_bags   : num [1:11154] 93.2 97.5 103.1 133.8 197.7 ...
 $ x_large_bags : num [1:11154] 0 0 0 0 0 0 0 0 0 0 ...
 $ type         : chr [1:11154] "conventional" "conventional" "conventional" "conventional" ...
 $ year         : num [1:11154] 2015 2015 2015 2015 2015 ...
 $ region       : chr [1:11154] "Albany" "Albany" "Albany" "Albany" ...
 $ pop          : num [1:11154] 93576 93576 93576 93576 93576 ...
 $ lat          : num [1:11154] 42.7 42.7 42.7 42.7 42.7 ...
 $ long         : num [1:11154] -73.8 -73.8 -73.8 -73.8 -73.8 -73.8 -73.8 -73.8 -73.8 -73.8 ...
 $ state        : chr [1:11154] "NY" "NY" "NY" "NY" ...
```

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Avocado price over time by region and type
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Avocado price over time by region and type
]]

.row[.content[
<br>
<img src="images/average_price_over_time.gif" width="100%"/>
<br><br>

## Do we need data for other days besides July?
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Avocado price for cities
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Avocado price for cities
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Avocado price for combined region
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Means for region by type by year via modelling
]]

.row[.split-40[
.column.bg-brand-gray[.content[

```r
julydat_ncr <- julydat_ncr %>%
  mutate(year=factor(year))
# `- 1` removes the intercept, so each coefficient is a cell mean
*M0 <- lm(average_price ~
*          region:type:year - 1,
*          data=julydat_ncr)
outM0 <- broom::tidy(M0) %>%
  arrange(estimate) %>%
  select(term, estimate) %>%
  separate(term, c("region", "type", "year")) %>%
  mutate(region=gsub("region", "", region),
         type=gsub("type", "", type),
         year=gsub("year", "", year)) %>%
  mutate(tyear=
    paste0(substring(type, 1, 1), year)) %>%
  select(region, tyear, estimate) %>%
  spread(tyear, estimate)
outM0
```

```
# A tibble: 33 x 7
   region     c2015 c2016 c2017 o2015 o2016 o2017
   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Albany      1.19 1.38   1.51  2.04  1.49  1.80
 2 Atlanta     1.05 0.936  1.23  1.49  1.34  1.73
 3 Boise       1.09 0.748  1.34  1.94  1.48  2.04
 4 Boston      1.22 1.48   1.55  2.10  1.52  1.82
 5 California  1.14 1.15   1.38  1.82  1.56  1.96
 6 Charlotte   1.19 1.37   1.37  2.10  1.67  2.04
 7 Chicago     1.22 1.52   1.55  1.65  1.89  1.92
 8 Columbus    1.02 1.11   1.15  1.27  1.24  1.57
 9 Denver      1.16 1.09   1.22  1.44  1.41  1.77
10 Detroit     1.03 1.17   1.25  1.25  1.21  1.57
# ... with 23 more rows
```

]]
.column.bg-white[.content[
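
Why does `region:type:year - 1` return one mean per cell? A short sketch, using notation introduced here for illustration rather than taken from the lecture: write `\(y_{ijkl}\)` for the `\(l\)`-th July price in region `\(i\)`, type `\(j\)` and year `\(k\)`, and `\(n_{ijk}\)` for the number of prices in that cell. With no intercept and only the three-way term, the model has one free parameter per cell:

$$y_{ijkl} = \mu_{ijk} + \varepsilon_{ijkl}, \qquad l = 1, \ldots, n_{ijk}.$$

The residual sum of squares splits into one separate sum per cell, so least squares minimises each cell on its own and gives

$$\hat{\mu}_{ijk} = \frac{1}{n_{ijk}} \sum_{l=1}^{n_{ijk}} y_{ijkl} = \bar{y}_{ijk\cdot},$$

which is the sample mean of that cell.
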
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Means for region by type by year via means
]]

.row[.split-40[
.column.bg-brand-gray[.content[

```r
outsum <- julydat_ncr %>%
  group_by(region, type, year) %>%
*  summarise(estimate=mean(average_price)) %>%
  ungroup() %>%
  mutate(tyear=
    paste0(substring(type, 1, 1), year)) %>%
  select(region, tyear, estimate) %>%
  spread(tyear, estimate)
outsum
```

```
# A tibble: 33 x 7
   region     c2015 c2016 c2017 o2015 o2016 o2017
   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Albany      1.19 1.38   1.51  2.04  1.49  1.80
 2 Atlanta     1.05 0.936  1.23  1.49  1.34  1.73
 3 Boise       1.09 0.748  1.34  1.94  1.48  2.04
 4 Boston      1.22 1.48   1.55  2.10  1.52  1.82
 5 California  1.14 1.15   1.38  1.82  1.56  1.96
 6 Charlotte   1.19 1.37   1.37  2.10  1.67  2.04
 7 Chicago     1.22 1.52   1.55  1.65  1.89  1.92
 8 Columbus    1.02 1.11   1.15  1.27  1.24  1.57
 9 Denver      1.16 1.09   1.22  1.44  1.41  1.77
10 Detroit     1.03 1.17   1.25  1.25  1.21  1.57
# ... with 23 more rows
```

]]
.column.bg-white[.content[
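
The table here matches the one obtained from the no-intercept `lm` fit. A minimal sketch of that agreement on made-up data follows; the data frame `toy` and its `price` column are hypothetical and only stand in for `julydat_ncr`.

```r
library(dplyr)

# Hypothetical toy data (not the avocado data): 2 regions x 2 types, 3 prices each
set.seed(2020)
toy <- expand.grid(region = c("A", "B"),
                   type = c("conventional", "organic"),
                   rep = 1:3) %>%
  mutate(price = 1 + 0.5 * (type == "organic") + rnorm(n(), sd = 0.1))

# Cell-means model: no intercept, one coefficient per region-by-type cell
fit <- lm(price ~ region:type - 1, data = toy)

# Plain group means for the same cells
means <- toy %>%
  group_by(region, type) %>%
  summarise(estimate = mean(price)) %>%
  ungroup()

# Compare the two sets of estimates, ignoring order and names
all.equal(sort(unname(coef(fit))), sort(means$estimate))
```

The comparison should return `TRUE`, since both approaches compute the same cell averages. The modelling route additionally gives standard errors and a framework for testing.
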
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Interaction effects
]]

.row[.split-50[
.column[.content[

```r
gather(outsum, "typeyear", "estimate", -region) %>%
  mutate(type=case_when(
    substring(typeyear, 1, 1)=="c" ~ "conventional",
    substring(typeyear, 1, 1)=="o" ~ "organic"),
    year=substring(typeyear, 2, nchar(typeyear))) %>%
  ggplot(aes(year, estimate, group=region, color=region)) +
  facet_wrap(~type) +
  geom_point(size=2) + geom_line() +
  guides(color="none") + scale_y_log10() +
  labs(x="year (July only)")
```

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />

]]
.column.bg-brand-gray[.content[
* We already know that `type` is an important factor for determining the `average_price` even without a formal statistical test (you can always formally test this).
* Is `year` or `date` important?
* Is `year:region` important?
* Is `year:type` important?
* Is `year:type:region` important?
* What about `date:region:type`?
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Interaction effects
]]

.row[.split-50[
.column[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-15-1.svg" style="display: block; margin: auto;" />

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" />

]]
.column.bg-brand-gray[.content[
* Is `year` or `date` important?
    * Yes! E.g. the early-year price appears to dip across all `region` and `type`.
* Is `year:region` important?
    * This can be thought of as the average for each `year` by `region` combination. At specific dates the regions seemed to behave differently, but after averaging over the year the difference may not be much.
* Is `year:type` important? Maybe.
* Is `year:type:region` important? Maybe.
* What about `date:region:type`?
    * We cannot include this in the model because it indexes the residual: each combination identifies a single observation!
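
To see why a term that indexes the residual cannot be fitted, here is a minimal sketch on made-up data. It assumes, as the last bullet implies, exactly one price per cell; `toy` and its columns are hypothetical.

```r
# Hypothetical toy data: exactly one price for each date-by-region cell
set.seed(1)
toy <- expand.grid(date = factor(1:4), region = c("A", "B"))
toy$price <- rnorm(nrow(toy), mean = 1.2, sd = 0.2)

# One coefficient per cell means one coefficient per observation
fit <- lm(price ~ date:region - 1, data = toy)

df.residual(fit)          # no degrees of freedom left to estimate the error
max(abs(residuals(fit)))  # the fit reproduces the data, so residuals vanish
```

With as many coefficients as observations the model only reproduces the data, so no error variance can be estimated and no test can be carried out.
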
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Testing for interaction effects
]]

.row[.split-30[
.column[.content[
* Start by testing the highest-order interaction term, `year:region:type`.
* If a higher-order interaction term is included, you cannot drop the main effects (`year`, `region`, `type`) or the lower-order interaction terms (`year:region`, `year:type`, `region:type`).
]]
.column.bg-brand-gray[.content[

```r
M1 <- lm(average_price ~ year*region*type, data=dat_ncr)
anova(M1)
```

```
Analysis of Variance Table

Response: average_price
                    Df Sum Sq Mean Sq   F value    Pr(>F)    
year                 1  16.95   16.95  228.9442 < 2.2e-16 ***
region              32 286.35    8.95  120.8897 < 2.2e-16 ***
type                 1 667.87  667.87 9022.6804 < 2.2e-16 ***
year:region         32  16.15    0.50    6.8167 < 2.2e-16 ***
year:type            1   9.07    9.07  122.4796 < 2.2e-16 ***
region:type         32  59.85    1.87   25.2664 < 2.2e-16 ***
year:region:type    32  15.18    0.47    6.4100 < 2.2e-16 ***
Residuals        11022 815.87    0.07
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Model diagnostic
]]

.row[.split-50[
.column[.content[

```r
autoplot(M1)
```

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" />

]]
.column.bg-brand-gray[.content[
* Model diagnostics are difficult to read for large data, as seen here.
* With a large sample size, `\(p\)`-values tend to all become significant.
* Thus it is more useful to check the effect sizes instead.

.scroll-box-10[

```r
#print(broom::tidy(M1), n=Inf)
print(summary(M1))
```

```

Call:
lm(formula = average_price ~ year * region * type, data = dat_ncr)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.10050 -0.15987 -0.00684  0.13996  1.51631 

Coefficients:
                                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)                          -2.482e+02  4.489e+01  -5.529 3.29e-08 ***
year                                  1.238e-01  2.227e-02   5.559 2.77e-08 ***
regionAtlanta                         1.683e+02  6.349e+01   2.650 0.008050 **
regionBoise                           6.268e+01  6.349e+01   0.987 0.323510
regionBoston                         -9.625e+00  6.349e+01  -0.152 0.879501
regionCalifornia                      9.249e+01  6.349e+01   1.457 0.145191
regionCharlotte                       6.243e+01  6.349e+01   0.983 0.325487
regionChicago                        -8.891e+01  6.349e+01  -1.400 0.161431
regionColumbus                        1.719e+02  6.349e+01   2.707 0.006799 **
regionDenver                          1.372e+02  6.349e+01   2.161 0.030715 *
regionDetroit                         1.318e+02  6.349e+01   2.077 0.037850 *
regionGrandRapids                    -1.167e+02  6.349e+01  -1.839 0.065971 .
regionHouston                         2.246e+02  6.349e+01   3.537 0.000406 ***
regionIndianapolis                    2.000e+02  6.349e+01   3.150 0.001640 **
regionJacksonville                    5.361e+01  6.349e+01   0.844 0.398470
regionLasVegas                        1.384e+02  6.349e+01   2.180 0.029261 *
regionLosAngeles                      6.812e+01  6.349e+01   1.073 0.283281
regionLouisville                      1.371e+02  6.349e+01   2.160 0.030775 *
regionNashville                       2.341e+02  6.349e+01   3.688 0.000227 ***
regionNewYork                         3.941e+01  6.349e+01   0.621 0.534794
regionOrlando                         6.309e+01  6.349e+01   0.994 0.320361
regionPhiladelphia                    9.511e+01  6.349e+01   1.498 0.134152
regionPittsburgh                      2.720e+02  6.349e+01   4.284 1.85e-05 ***
regionPortland                        1.577e+02  6.349e+01   2.485 0.012982 *
regionRoanoke                         1.799e+02  6.349e+01   2.834 0.004607 **
regionSacramento                      1.016e+02  6.349e+01   1.601 0.109514
regionSanDiego                        6.248e+01  6.349e+01   0.984 0.325074
regionSanFrancisco                    1.537e+02  6.349e+01   2.421 0.015480 *
regionSeattle                         7.416e+01  6.349e+01   1.168 0.242775
regionSouthCarolina                   9.320e+01  6.349e+01   1.468 0.142148
regionSpokane                         1.501e+02  6.349e+01   2.364 0.018105 *
regionStLouis                         1.394e+02  6.349e+01   2.196 0.028140 *
regionSyracuse                        1.890e+02  6.349e+01   2.977 0.002915 **
regionTampa                           1.432e+01  6.349e+01   0.225 0.821604
typeorganic                           4.395e+02  6.349e+01   6.922 4.68e-12 ***
year:regionAtlanta                   -8.360e-02  3.149e-02  -2.655 0.007946 **
year:regionBoise                     -3.122e-02  3.149e-02  -0.992 0.321409
year:regionBoston                     4.752e-03  3.149e-02   0.151 0.880053
year:regionCalifornia                -4.600e-02  3.149e-02  -1.461 0.144136
year:regionCharlotte                 -3.100e-02  3.149e-02  -0.984 0.324916
year:regionChicago                    4.411e-02  3.149e-02   1.401 0.161334
year:regionColumbus                  -8.538e-02  3.149e-02  -2.712 0.006708 **
year:regionDenver                    -6.818e-02  3.149e-02  -2.165 0.030382 *
year:regionDetroit                   -6.551e-02  3.149e-02  -2.080 0.037524 *
year:regionGrandRapids                5.789e-02  3.149e-02   1.838 0.066026 .
year:regionHouston                   -1.116e-01  3.149e-02  -3.545 0.000394 ***
year:regionIndianapolis              -9.928e-02  3.149e-02  -3.153 0.001622 **
year:regionJacksonville              -2.667e-02  3.149e-02  -0.847 0.397106
year:regionLasVegas                  -6.882e-02  3.149e-02  -2.186 0.028871 *
year:regionLosAngeles                -3.397e-02  3.149e-02  -1.079 0.280659
year:regionLouisville                -6.814e-02  3.149e-02  -2.164 0.030480 *
year:regionNashville                 -1.163e-01  3.149e-02  -3.693 0.000222 ***
year:regionNewYork                   -1.952e-02  3.149e-02  -0.620 0.535347
year:regionOrlando                   -3.136e-02  3.149e-02  -0.996 0.319332
year:regionPhiladelphia              -4.715e-02  3.149e-02  -1.497 0.134347
year:regionPittsburgh                -1.349e-01  3.149e-02  -4.285 1.84e-05 ***
year:regionPortland                  -7.839e-02  3.149e-02  -2.489 0.012810 *
year:regionRoanoke                   -8.936e-02  3.149e-02  -2.838 0.004550 **
year:regionSacramento                -5.044e-02  3.149e-02  -1.602 0.109253
year:regionSanDiego                  -3.113e-02  3.149e-02  -0.989 0.322855
year:regionSanFrancisco              -7.622e-02  3.149e-02  -2.421 0.015512 *
year:regionSeattle                   -3.687e-02  3.149e-02  -1.171 0.241640
year:regionSouthCarolina             -4.632e-02  3.149e-02  -1.471 0.141283
year:regionSpokane                   -7.455e-02  3.149e-02  -2.367 0.017927 *
year:regionStLouis                   -6.922e-02  3.149e-02  -2.198 0.027956 *
year:regionSyracuse                  -9.373e-02  3.149e-02  -2.977 0.002921 **
year:regionTampa                     -7.173e-03  3.149e-02  -0.228 0.819801
year:typeorganic                     -2.178e-01  3.149e-02  -6.916 4.91e-12 ***
regionAtlanta:typeorganic            -3.152e+02  8.978e+01  -3.510 0.000450 ***
regionBoise:typeorganic              -3.302e+02  8.978e+01  -3.677 0.000237 ***
regionBoston:typeorganic             -2.437e+02  8.978e+01  -2.714 0.006653 **
regionCalifornia:typeorganic         -4.455e+02  8.978e+01  -4.962 7.10e-07 ***
regionCharlotte:typeorganic          -2.616e+02  8.978e+01  -2.913 0.003585 **
regionChicago:typeorganic            -2.060e+02  8.978e+01  -2.295 0.021775 *
regionColumbus:typeorganic           -2.921e+02  8.978e+01  -3.254 0.001143 **
regionDenver:typeorganic             -4.978e+02  8.978e+01  -5.544 3.02e-08 ***
regionDetroit:typeorganic            -1.434e+02  8.978e+01  -1.597 0.110355
regionGrandRapids:typeorganic         1.078e+02  8.978e+01   1.200 0.230089
regionHouston:typeorganic            -4.664e+02  8.978e+01  -5.195 2.08e-07 ***
regionIndianapolis:typeorganic       -2.292e+02  8.978e+01  -2.553 0.010699 *
regionJacksonville:typeorganic       -2.529e+02  8.978e+01  -2.817 0.004852 **
regionLasVegas:typeorganic           -2.952e+02  8.978e+01  -3.288 0.001011 **
regionLosAngeles:typeorganic         -4.850e+02  8.978e+01  -5.401 6.75e-08 ***
regionLouisville:typeorganic         -2.925e+02  8.978e+01  -3.258 0.001126 **
regionNashville:typeorganic          -3.641e+02  8.978e+01  -4.056 5.03e-05 ***
regionNewYork:typeorganic            -1.537e+02  8.978e+01  -1.712 0.086863 .
regionOrlando:typeorganic            -2.266e+02  8.978e+01  -2.524 0.011607 *
regionPhiladelphia:typeorganic       -2.267e+02  8.978e+01  -2.525 0.011572 *
regionPittsburgh:typeorganic         -3.644e+02  8.978e+01  -4.059 4.97e-05 ***
regionPortland:typeorganic           -4.890e+02  8.978e+01  -5.447 5.24e-08 ***
regionRoanoke:typeorganic            -5.122e+02  8.978e+01  -5.705 1.20e-08 ***
regionSacramento:typeorganic         -4.997e+02  8.978e+01  -5.565 2.68e-08 ***
regionSanDiego:typeorganic           -5.577e+02  8.978e+01  -6.212 5.43e-10 ***
regionSanFrancisco:typeorganic       -5.706e+02  8.978e+01  -6.355 2.17e-10 ***
regionSeattle:typeorganic            -3.495e+02  8.978e+01  -3.893 9.96e-05 ***
regionSouthCarolina:typeorganic      -7.648e+01  8.978e+01  -0.852 0.394319
regionSpokane:typeorganic            -5.180e+02  8.978e+01  -5.769 8.20e-09 ***
regionStLouis:typeorganic            -3.703e+02  8.978e+01  -4.125 3.74e-05 ***
regionSyracuse:typeorganic           -2.298e+02  8.978e+01  -2.559 0.010505 *
regionTampa:typeorganic              -2.929e+02  8.978e+01  -3.263 0.001107 **
year:regionAtlanta:typeorganic        1.564e-01  4.453e-02   3.511 0.000447 ***
year:regionBoise:typeorganic          1.638e-01  4.453e-02   3.679 0.000236 ***
year:regionBoston:typeorganic         1.209e-01  4.453e-02   2.715 0.006647 **
year:regionCalifornia:typeorganic     2.210e-01  4.453e-02   4.963 7.03e-07 ***
year:regionCharlotte:typeorganic      1.298e-01  4.453e-02   2.916 0.003555 **
year:regionChicago:typeorganic        1.022e-01  4.453e-02   2.294 0.021807 *
year:regionColumbus:typeorganic       1.449e-01  4.453e-02   3.253 0.001145 **
year:regionDenver:typeorganic         2.468e-01  4.453e-02   5.543 3.04e-08 ***
year:regionDetroit:typeorganic        7.105e-02  4.453e-02   1.595 0.110654
year:regionGrandRapids:typeorganic   -5.348e-02  4.453e-02  -1.201 0.229810
year:regionHouston:typeorganic        2.314e-01  4.453e-02   5.195 2.08e-07 ***
year:regionIndianapolis:typeorganic   1.136e-01  4.453e-02   2.552 0.010728 *
year:regionJacksonville:typeorganic   1.256e-01  4.453e-02   2.820 0.004817 **
year:regionLasVegas:typeorganic       1.466e-01  4.453e-02   3.292 0.000999 ***
year:regionLosAngeles:typeorganic     2.406e-01  4.453e-02   5.402 6.73e-08 ***
year:regionLouisville:typeorganic     1.450e-01  4.453e-02   3.257 0.001129 **
year:regionNashville:typeorganic      1.806e-01  4.453e-02   4.055 5.04e-05 ***
year:regionNewYork:typeorganic        7.637e-02  4.453e-02   1.715 0.086399 .
year:regionOrlando:typeorganic        1.125e-01  4.453e-02   2.526 0.011549 *
year:regionPhiladelphia:typeorganic   1.125e-01  4.453e-02   2.526 0.011555 *
year:regionPittsburgh:typeorganic     1.806e-01  4.453e-02   4.056 5.02e-05 ***
year:regionPortland:typeorganic       2.426e-01  4.453e-02   5.448 5.20e-08 ***
year:regionRoanoke:typeorganic        2.540e-01  4.453e-02   5.703 1.21e-08 ***
year:regionSacramento:typeorganic     2.480e-01  4.453e-02   5.569 2.63e-08 ***
year:regionSanDiego:typeorganic       2.768e-01  4.453e-02   6.215 5.33e-10 ***
year:regionSanFrancisco:typeorganic   2.832e-01  4.453e-02   6.359 2.11e-10 ***
year:regionSeattle:typeorganic        1.734e-01  4.453e-02   3.894 9.90e-05 ***
year:regionSouthCarolina:typeorganic  3.798e-02  4.453e-02   0.853 0.393766
year:regionSpokane:typeorganic        2.570e-01  4.453e-02   5.771 8.07e-09 ***
year:regionStLouis:typeorganic        1.837e-01  4.453e-02   4.126 3.73e-05 ***
year:regionSyracuse:typeorganic       1.139e-01  4.453e-02   2.557 0.010558 *
year:regionTampa:typeorganic          1.453e-01  4.453e-02   3.262 0.001108 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2721 on 11022 degrees of freedom
Multiple R-squared:  0.5677, Adjusted R-squared:  0.5626
F-statistic: 110.5 on 131 and 11022 DF,  p-value: < 2.2e-16
```

]
]]
]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Estimate for Conventional Avocado Price in July 2015
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-20-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Estimate for Conventional Avocado Price in July 2016
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-21-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Estimate for Conventional Avocado Price in July 2017
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-22-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Estimate for Organic Avocado Price in July 2015
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-23-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Estimate for Organic Avocado Price in July 2016
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-24-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Estimate for Organic Avocado Price in July 2017
]]

.row[.content[

<img src="lecture22_2020JC_files/figure-html/unnamed-chunk-25-1.svg" style="display: block; margin: auto;" />

]]

---
class: split-10

.row.bg-brand-blue.white[.content.vmiddle[
# Wait...
]]

.row[.split-50[
.column[.content[
* What about the different PLUs?
* What about the different total volumes, etc.?
* You can try to predict these values for July 2019.
* I made the simplifying assumption that the volume is constant across time or, if it is not, that all regions are increasing their volume by the same amount.
* So while I could adjust the price according to the volume, under the above assumptions it would not change the **ranking** of the cities.
]]
.column.bg-brand-gray[.content.vmiddle.center[
{{content}}
]]
]]

--