I gained an understanding of the assumptions underlying linear regression (linearity, homoscedasticity, multivariate normality, independence of observations, and lack of multicollinearity) and the situations where it may not be applicable. I also explored the importance of avoiding the dummy variable trap. Additionally, I conducted multiple linear regression analysis on sample data using the scikit-learn library in Python within JupyterLab. Furthermore, I familiarized myself with five methods of model building: All-in (including every predictor at once), Backward Elimination, Forward Selection, Bidirectional Elimination, and All-possible-models.
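As a rough sketch of what that exercise looked like, here is a minimal multiple linear regression in scikit-learn on synthetic data (the variable names `spend` and `region` and all values are made up for illustration). Dropping one dummy column per categorical variable is what avoids the dummy variable trap, since keeping all levels would create perfect multicollinearity with the intercept.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
spend = rng.uniform(10, 100, n)                    # numeric predictor
region = rng.choice(["east", "west", "north"], n)  # categorical predictor

# One-hot encode the categorical predictor, dropping the first level
# to avoid the dummy variable trap (perfect multicollinearity).
dummies = pd.get_dummies(pd.Series(region), drop_first=True).astype(float)

X = pd.concat([pd.Series(spend, name="spend"), dummies], axis=1)
# Synthetic target: a true effect of 3 per unit of spend, plus 5 for "west".
y = 3.0 * spend + 5.0 * (region == "west") + rng.normal(0, 1, n)

model = LinearRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_.round(2))))
```

The fitted coefficients should land close to the true values of 3 and 5, since the noise is small relative to the signal.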
Week 1 - Wednesday
I read about hypothesis testing and the null and alternative hypotheses (H0 and H1). Essentially, hypothesis testing helps us assess whether our sample is extreme enough to reject the null hypothesis (H0). This understanding also helped me grasp p-values better, which measure how extreme our sample is under the assumption that H0 is true. I also delved into how p-values are calculated: the smaller the p-value, the more extreme our sample is relative to what we would expect if H0 held. Then, I did some cursory reading on the additional material provided on the Breusch-Pagan test. I learned the difference between R-squared and the Standard Error of the Estimate.
Week 1 - Mon, Tue
I went through all the basics of statistics, like data types, distributions, sampling and estimation, hypothesis testing, and p-values. In addition to that, I read about new topics like kurtosis and heteroskedasticity. I went through the notes and learnt why extreme values or outliers in diabetes data might require a statistician to consider alternative statistical methods that are more robust to deviations from normality.
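A quick illustration of that last point, using entirely synthetic glucose-like values (the numbers are made up): a handful of extreme readings sharply inflates the excess kurtosis and pulls the mean, while a robust statistic like the median barely moves.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal = rng.normal(100, 15, 1000)                         # roughly normal readings
with_outliers = np.concatenate([normal, [400, 450, 500]])  # a few extreme values

# Fisher (excess) kurtosis is ~0 for a normal distribution;
# the heavy tail from the outliers makes it much larger.
print(stats.kurtosis(normal))
print(stats.kurtosis(with_outliers))

# The mean shifts noticeably, while the median (a robust estimate) barely moves.
print(np.mean(with_outliers) - np.mean(normal))
print(np.median(with_outliers) - np.median(normal))
```

This is one reason heavy-tailed data can push an analyst toward robust or nonparametric methods instead of ones that assume normality.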