MTH522 Snapshot – My Dive into Advanced Stats (Part 1)

In this brief overview, we’ll delve into the core concepts that define this course, shedding light on the heightened complexities of probability, inference, and statistical methodologies. Join me in this concise recap as we unravel the key insights gained from the advanced mathematical statistics course, a journey that elevated my statistical understanding to new heights.

  1. Descriptive statistics are a group of statistical approaches used to summarise and characterise the main features of a dataset. These methods go beyond simple numerical values, providing a more nuanced view of data distribution, central tendency, and variability. In this section, we dig into the world of descriptive statistics, looking at the measurements and approaches statisticians use to extract valuable insights from large datasets (a short Python sketch after the bullets below illustrates several of these measures).
    • Central Tendency Measures: Provide insights into the typical or average value of a dataset. The mean, or average, is a simple measure of central tendency, whereas the median represents the centre point and is less influenced by extreme values. The mode indicates the most frequently occurring value, giving a fuller picture of the dataset’s key trends.
    • Dispersion Measures: Reveal the spread or variability within a dataset. The range is a basic yet informative statistic that quantifies the difference between the maximum and minimum values. In contrast, variance and standard deviation provide a more sophisticated view of how data points differ from the mean. A low variance indicates that the values tend to be close to the mean. The standard deviation represents the typical distance of data points from the mean, providing a clearer understanding of the spread. These metrics allow a more in-depth examination of the distribution’s shape and concentration.
    • Shape and Distribution: Kurtosis and skewness are important factors in determining the shape of a distribution. Kurtosis evaluates tail behaviour, identifying distributions with heavier or lighter tails than the normal distribution. Skewness, on the other hand, is a measure of asymmetry that reveals whether a distribution is skewed to the left or right. Together, these measurements help paint a complete picture of the dataset’s structural features. A normal distribution is perfectly symmetric (zero skewness), and when data approximates a normal distribution, many statistical methods and tests tend to be more valid and reliable.
    • Visualization Techniques: Using graphical representations to aid comprehension. Frequency distributions and histograms depict the distribution of values across categories or ranges. The five-number summary is encapsulated in box-and-whisker charts, which provide a short view of the dataset’s variability. These graphic tools make interpretation and communication more accessible.
    • Additional Dimensions: Descriptive statistics go beyond the basics, including concepts like interquartile range (IQR), coefficient of variation (CV), and position measurements like z-scores and percentile ranks.
      • A higher IQR indicates more variability in the central 50% of the dataset (the values between the first and third quartiles), whereas a lower IQR shows more concentrated data.
      • Positive z-scores indicate data points above the mean, while negative z-scores indicate points below. A z-score of 0 means the data point is at the mean. This measure helps identify outliers and understand the relative position of individual data points within the distribution.
      • A percentile rank of 75% indicates that the data point is greater than or equal to 75% of the values in the dataset. It gives a relative location measure, which is particularly useful when comparing individual data points across datasets.
      • The coefficient of variation is the ratio of the standard deviation to the mean. If the CV is high, the standard deviation is a large proportion of the mean. In a dataset of monthly incomes, a high CV would indicate that individual incomes vary widely relative to the average income.
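To make these measures concrete, here is a minimal Python sketch, assuming NumPy and SciPy are available; the incomes array is made-up illustrative data, and the routines used are general-purpose library functions rather than anything specific to the course.

```python
import numpy as np
from scipy import stats

# Made-up monthly incomes (illustrative data only)
incomes = np.array([3200, 3500, 3700, 4000, 4000, 4300, 4800, 5200, 6000, 12000])

# Central tendency
mean = incomes.mean()
median = np.median(incomes)
mode = stats.mode(incomes, keepdims=False).mode    # most frequently occurring value

# Dispersion
data_range = incomes.max() - incomes.min()
variance = incomes.var(ddof=1)                     # sample variance
std_dev = incomes.std(ddof=1)                      # sample standard deviation

# Shape
skewness = stats.skew(incomes)                     # > 0 here: long right tail
excess_kurtosis = stats.kurtosis(incomes)          # relative to the normal distribution

# Position and additional dimensions
q1, q3 = np.percentile(incomes, [25, 75])
iqr = q3 - q1                                      # spread of the middle 50%
z_scores = (incomes - mean) / std_dev              # distance from the mean in SD units
pct_rank = stats.percentileofscore(incomes, 5200, kind="weak")  # % of values <= 5200
cv = std_dev / mean                                # coefficient of variation

print(f"mean={mean:.0f}, median={median:.0f}, mode={mode}")
print(f"range={data_range}, std={std_dev:.0f}, IQR={iqr:.0f}, CV={cv:.2f}")
print(f"skew={skewness:.2f}, excess kurtosis={excess_kurtosis:.2f}")
print(f"z-score of largest value: {z_scores.max():.2f}, percentile rank of 5200: {pct_rank:.0f}%")
```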
  2. Multivariate Probability Distributions
    • While univariate probability distributions describe a single random variable, multivariate probability distributions involve several variables at the same time. Exploring multivariate probability distributions is critical for understanding the intricate interdependencies and interactions among variables, and it provides a rich toolkit for advanced statistical analysis.
    • The joint probability density function (PDF) or probability mass function (PMF) expresses the likelihood of various outcomes for the complete collection of variables and serves as the cornerstone of multivariate probability. For a PDF, the area under the curve between a and b gives the probability of the random variable falling within the interval [a, b]. While a PDF is used for continuous variables, a PMF is used for discrete random variables, which take distinct values; it gives the probability of the random variable taking on a specific value. For a fair six-sided die, the PMF assigns a probability of 1/6 to each of the six possible outcomes (1, 2, 3, 4, 5, 6).
    • Multivariate Normal Distribution: Provides a way to model joint distributions of two or more random variables. The multivariate normal distribution is characterized by a mean vector and a covariance matrix, and it plays a crucial role in various statistical analyses (see the first sketch after this list).
      • Any linear combination of the variables is itself normally distributed.
      • The Central Limit Theorem (CLT) states that the sum (or average) of a large number of independent and identically distributed random variables is approximately normally distributed. Because of this, the multivariate normal is a natural choice for modelling the joint distribution of numerous variables.
      • To put it simply, the multivariate normal distribution allows us to model the combined behaviour of several variables: the mean vector captures the central tendencies, while the covariance matrix describes how the variables vary together.
    • Correlation and covariance: The covariance matrix represents the degree of linear dependency between variables. A positive covariance suggests a positive linear relationship, whereas a negative covariance suggests a negative linear relationship. The correlation matrix, which is derived from the covariance matrix, standardises these relationships to values ranging from -1 to 1, with 0 signifying no linear correlation.
    • Multinomial Distribution: Important among discrete multivariate distributions. It extends the binomial distribution to situations with more than two possible outcomes: the binomial handles two outcomes (success or failure), while the multinomial handles multiple categories. Using the distribution, we can calculate the expected number of occurrences for each category over a series of trials, which helps in understanding the average or expected distribution of outcomes. The multinomial distribution, frequently used in categorical data analysis, gives the probability of observing different counts across numerous categories, making it useful in domains as diverse as genetics, marketing, and survey analysis.
      • Let’s consider an example: imagine rolling a six-sided die multiple times. The categories, in this case, are the numbers 1 through 6 on the die (see the second sketch after this list).
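To illustrate the multivariate normal distribution discussed above, here is a minimal Python sketch; the mean vector and covariance matrix are made-up values chosen only for demonstration, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up parameters for two jointly normal variables X and Y
mean = np.array([5.0, 10.0])            # mean vector: central tendencies
cov = np.array([[4.0, 2.4],             # var(X) = 4,  cov(X, Y) = 2.4
                [2.4, 9.0]])            # var(Y) = 9

# Draw many samples from the multivariate normal distribution
samples = rng.multivariate_normal(mean, cov, size=100_000)

# The empirical covariance and correlation matrices recover the parameters
print(np.cov(samples, rowvar=False))        # approximately cov
print(np.corrcoef(samples, rowvar=False))   # correlation = 2.4 / (2 * 3) = 0.4

# Any linear combination of the variables is itself (approximately) normal
combo = 0.5 * samples[:, 0] + 2.0 * samples[:, 1]
print(combo.mean(), combo.std())            # sample moments of the combination
```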
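A companion sketch for the multinomial die example, again just an assumed illustration: one multinomial draw counts how often each face appears over a series of rolls, which can then be compared with the expected counts.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rolls = 600
probs = np.full(6, 1 / 6)             # fair die: the PMF assigns 1/6 to each face

# One multinomial draw: counts of faces 1-6 over 600 independent rolls
counts = rng.multinomial(n_rolls, probs)
expected = n_rolls * probs            # expected count per face = 100

print("observed:", counts)
print("expected:", expected)
```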
  3. Statistical Inference: Statistical inference, at its core, bridges the gap between raw observations and meaningful conclusions by offering a framework for making informed judgements based on uncertain information.
    • Probability is the language that statisticians use to measure uncertainty and model the unpredictability that exists in data. The shift from probability theory to statistical inference is highlighted by a change from explaining chance events to making population-level judgements based on sample data.
    • Estimation theory: Estimation, a fundamental part of statistical inference, addresses the problem of deriving useful information about population parameters from a sample. Maximum Likelihood Estimation (MLE) and Bayesian estimation are two popular methods: MLE seeks parameter values that maximise the likelihood of the observed data, whereas Bayesian estimation uses prior knowledge to update beliefs about the parameters.
    • Hypothesis testing: Provides a disciplined process for making judgements in the face of uncertainty. Statisticians formulate hypotheses about population parameters and use sample data to assess the evidence against a null hypothesis.
      • Uncovering Mean Differences with the T-Test: The t-test is very useful when working with small sample sizes, where the standard z-test may not apply because the population standard deviation is unknown. There are several types of t-tests, each suited to a particular situation. The independent two-sample t-test, which compares the means of two independent groups, is the most commonly used (see the t-test sketch after this list). In a typical scenario, the null hypothesis (H0) states that there is no significant difference between the means of the two groups.
      • P-Values as a Measure of Evidence: The concept of p-values is critical for interpreting t-test findings. The p-value is the probability of obtaining the observed results, or more extreme ones, if the null hypothesis is true; it measures the strength of evidence against the null hypothesis. A low p-value (usually less than a preset significance level, such as 0.05) indicates that the observed data is unlikely under the null hypothesis.
    • Asymptotic Theory: As data sizes grow, statisticians turn to asymptotic theory to understand how estimators and tests behave. Under specific conditions, the Law of Large Numbers and the Central Limit Theorem become guiding principles, assuring practitioners that estimators converge to the true values and that suitably standardised statistics follow approximately normal distributions.
    • Bayesian Statistics: By incorporating prior knowledge into the inference process, Bayesian statistics introduces a paradigm shift. Bayesian inference provides a consistent framework for combining existing knowledge with new information, which is especially useful in sectors with limited data.
    • Challenges and Solutions: Statistical inference is not without its challenges. Among the challenges statisticians face are overfitting, model misspecification, and the p-value debate. In the face of these obstacles, robust statistics and resampling methods, such as bootstrapping, offer ways to improve the reliability of inference (see the bootstrap sketch after this list).
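To ground the t-test and p-value discussion, here is a minimal sketch using SciPy; the two groups are simulated, made-up data, and the 0.05 threshold is simply the conventional significance level mentioned above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up scores for two independent groups
group_a = rng.normal(loc=70, scale=8, size=25)
group_b = rng.normal(loc=75, scale=8, size=25)

# Independent two-sample t-test; H0: the two group means are equal
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:                    # preset significance level
    print("Reject H0: the observed difference is unlikely under H0.")
else:
    print("Fail to reject H0: not enough evidence of a difference in means.")
```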
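And a sketch of bootstrapping, one of the resampling remedies mentioned above, again on made-up data: resampling with replacement yields a confidence interval for the mean without relying on distributional assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(scale=3.0, size=50)   # made-up, skewed sample

# Bootstrap: resample with replacement and recompute the mean many times
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(10_000)
])

# Percentile-based 95% confidence interval for the population mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean = {data.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```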
  4. Asymptotic Theory: Provides a powerful framework for analysing the behaviour of statistical procedures as sample sizes increase indefinitely. Asymptotic results, rooted in probability theory, shed light on the limiting behaviour of statistical inference, providing essential insights into the stability and reliability of estimators and tests. The theory extends probability theory’s foundations by investigating the convergence behaviour of random variables. The Law of Large Numbers and the Central Limit Theorem serve as foundational principles, describing the tendency of sample averages to converge to population means and of sums of independent random variables to approach normal distributions (a brief simulation sketch follows this list).
    • However, this goes beyond the immediate scope of the current curriculum.
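Although the deeper theory is outside the course’s immediate scope, a small simulation with assumed parameters makes the Law of Large Numbers and the Central Limit Theorem tangible: averages of samples from a skewed distribution settle near the population mean, and their distribution becomes less skewed, i.e. closer to normal, as the sample size grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

pop_mean = 2.0    # skewed population: exponential distribution with mean 2

for n in (5, 50, 500):
    # 10,000 sample means, each computed from a sample of size n
    sample_means = rng.exponential(scale=pop_mean, size=(10_000, n)).mean(axis=1)
    print(f"n={n:4d}: mean of sample means = {sample_means.mean():.3f}, "  # LLN: -> 2.0
          f"skew of sample means = {stats.skew(sample_means):.2f}")        # CLT: -> 0
```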
