AP Statistics [STATS Modeling the World 4th ED Ch 5-8] (2024)

The Normal Model

Exploring Relationships Between Variables - Scatterplots and Association

Correlation

Linear Modeling and Regression

Regression Wisdom

100

If the distribution of a quantitative variable is ___ and ____, then we can replace the histogram by approximating the distribution with ________.

What is unimodal and roughly symmetric?

What is a normal model?

100

For a Scatterplot, _____ is plotted on the x-axis, and ________ is plotted on the y-axis, showing _______________.

What is the Explanatory (independent/input) variable, and Response (dependent/output) variable?

Showing the relationship between two quantitative variables on the same cases (individuals).

100

Correlation (the correlation coefficient) is a number that measures the direction and strength of a linear association between two quantitative variables.

Before calculating a correlation, you must first look for ___________ in the scatterplot.

What is straightness?

100

A ______ model is an equation of a line that captures the essence of a linear relationship between two ___________ variables.

What is Linear?

What is Quantitative?

100

These have x-values far from x-bar ((x-bar,y-bar) is the fulcrum) and pull more strongly on the regression line.

With enough leverage, the ____ can appear deceptively small.

What are high-leverage points?

What are residuals?

200

A normal model is constructed from a rather complicated equation that depends only on two parameters: the ____ and the ____. It is denoted as: _______.

What are the mean and standard deviation?

What is N(mu,sigma)?

200

Once we make a scatterplot, we describe association by telling about: 1.__2.__3.___4.___

1. Form: straight, curved, no pattern, other?

2. Direction: + or - slope?

3. Strength: how much scatter {how closely points follow the form}

4. Unusual Features: outliers, clusters, subgroups?

200

Correlation describes the ____ and ____ of the _____ relationship between two _____ variables, without significant ______.

What are strength, direction, and linear?

What are quantitative and outliers?


200

Observed (response variable data) value minus (linear model) predicted value is ...

If positive, then the linear model makes an _____.

If negative, then the linear model makes an _____.

The standard deviation of the residuals, s_e, gives ...

What is a residual?

What is an underestimate?

What is an overestimate?

s_e gives a measure of how much the points spread around the regression line.
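The residual card can be checked with a few lines of arithmetic. The sketch below uses five made-up points and the line y-hat = 2.2 + 0.6 * x (the least-squares line for those points); neither comes from the textbook. It assumes the usual textbook convention of dividing by n - 2 when computing s_e.

```python
import math

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Assumed linear model: y-hat = b0 + b1 * x (least-squares line for the data above).
b0, b1 = 2.2, 0.6


def predict(x):
    """Predicted value from the linear model."""
    return b0 + b1 * x


# Residual = observed value minus predicted value.
residuals = [y - predict(x) for x, y in zip(xs, ys)]

# Positive residual: the model underestimates; negative: it overestimates.
labels = [
    "underestimate" if e > 0 else "overestimate" if e < 0 else "exact"
    for e in residuals
]

# Standard deviation of the residuals, dividing by n - 2 (assumed convention).
s_e = math.sqrt(sum(e * e for e in residuals) / (len(xs) - 2))

for x, e, label in zip(xs, residuals, labels):
    print(f"x={x}: residual={e:+.2f} ({label})")
print(f"s_e = {s_e:.4f}")
```

The residuals of a least-squares line sum to zero, so s_e, not the mean residual, is the useful measure of spread around the line.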

200

Omitting these from the data results in a very different regression model. They are often difficult to detect because they _______ the model, which makes their residuals small.

What are influential points?

What is distort?


300

The distribution of each normal model is ______, ______, and _____ as shown by its density curve. We call it a density curve because the equation for the normal model adjusts the scale (of y, height) so that the area under the curve = ____ and gives the _______ for the distribution.

What is unimodal, symmetric, and bell-shaped?

What is 1?

What is relative frequency?
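The "area under the curve = 1" claim can be checked numerically. This is a rough sketch using Python's standard-library `statistics.NormalDist` and a simple midpoint rule; the function name `area_under_density` and the example parameters are assumptions made for illustration.

```python
from statistics import NormalDist


def area_under_density(mu, sigma, half_width_sds=8.0, steps=20000):
    """Approximate the area under the N(mu, sigma) density with the midpoint rule.

    Integrates from mu - 8 sigma to mu + 8 sigma by default; the tails
    beyond that range are negligibly small.
    """
    model = NormalDist(mu, sigma)
    lo = mu - half_width_sds * sigma
    hi = mu + half_width_sds * sigma
    dx = (hi - lo) / steps
    return sum(model.pdf(lo + (i + 0.5) * dx) for i in range(steps)) * dx


# Two hypothetical normal models with very different centers and spreads.
area_standard = area_under_density(0, 1)
area_shifted = area_under_density(100, 15)
print(round(area_standard, 6), round(area_shifted, 6))
```

Both areas should come out very close to 1, whatever mean and standard deviation are chosen, which is what lets areas under the curve be read as relative frequencies.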

300

________ is a deliberately vague term describing the relationship between two variables. If positive then ____________.

What is Association?

Then increases in one variable generally correspond to increases in the other.

300

3 conditions needed for Correlation:

1. Quantitative Variables

2. Straight Enough

3. (No) Outlier(s)

300

The unique line that minimizes the variance of the residuals (sum of the squared residuals).

Equation for standardized values: __________.

Equation for actual x and y values: ________.

What is the regression line?

What is the line of best fit?

z-hat_y = r * z_x

y-hat = b0 + b1 * x
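The two equations describe the same line. A minimal sketch, assuming the standard relationships b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar (not stated on the card itself) and using made-up data:

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Return (b0, b1, r) for the least-squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # r is the sum of the products of z-scores divided by n - 1.
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)
    b1 = r * s_y / s_x        # slope in original units
    b0 = y_bar - b1 * x_bar   # line passes through (x-bar, y-bar)
    return b0, b1, r


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r = fit_line(xs, ys)

# Predict at x = 4 using the equation in actual units...
x_new = 4
y_hat = b0 + b1 * x_new

# ...and again through standardized values: z-hat_y = r * z_x.
z_x = (x_new - mean(xs)) / stdev(xs)
y_hat_from_z = mean(ys) + r * z_x * stdev(ys)

print(f"y-hat = {b0:.2f} + {b1:.2f} * x")
print(f"prediction at x={x_new}: {y_hat:.2f} (via z-scores: {y_hat_from_z:.2f})")
```

Converting the standardized prediction back to original units gives the same answer as the b0 + b1 * x form.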

300

Calculating the regression line with and without this is the surest way to verify whether it affects the model.

What is a suspect point / outlier?
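The with-and-without comparison is easy to sketch. The data below are made up: five points that rise together plus one point far out at x = 20. The helper name `least_squares` is hypothetical.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up data; the last point has an x-value far from x-bar (high leverage).
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 3]
suspect = 5  # index of the suspect point

# Fit once with every point and once with the suspect point left out.
b0_all, b1_all = least_squares(xs, ys)
xs_rest = xs[:suspect] + xs[suspect + 1:]
ys_rest = ys[:suspect] + ys[suspect + 1:]
b0_rest, b1_rest = least_squares(xs_rest, ys_rest)

# Residual of the suspect point under each model.
resid_in_full_fit = ys[suspect] - (b0_all + b1_all * xs[suspect])
resid_vs_rest_fit = ys[suspect] - (b0_rest + b1_rest * xs[suspect])

print(f"slope with point:    {b1_all:+.3f}")
print(f"slope without point: {b1_rest:+.3f}")
print(f"residual in full fit:       {resid_in_full_fit:+.2f}")
print(f"residual vs. fit without it: {resid_vs_rest_fit:+.2f}")
```

In this example the slope changes sign when the point is removed, yet its residual in the full fit is small because the point has pulled the line toward itself.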

400

We do this to avoid having to work with the complicated normal model equation or lug around a myriad of tables for every possible N(mu,sigma).

We convert our data to z-scores and use just one Standard Normal Model N(0,1) and its associated table. We read a Normal percentile from a table of normal probabilities, giving the percentage of values in a standard normal distribution found lying below a particular z-score.
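A short sketch of the same idea in code, using Python's standard-library `statistics.NormalDist` in place of a printed table. The model N(500, 100) and the value 650 are made-up numbers for illustration.

```python
from statistics import NormalDist


def z_score(value, mu, sigma):
    """Standardize a value: how many standard deviations it lies from the mean."""
    return (value - mu) / sigma


# One Standard Normal Model, N(0, 1), serves every N(mu, sigma).
standard_normal = NormalDist(0, 1)

# Hypothetical example: a value of 650 from a N(500, 100) model.
z = z_score(650, 500, 100)
percentile = standard_normal.cdf(z)  # fraction of values below this z-score

# Same answer without standardizing, as a cross-check.
direct = NormalDist(500, 100).cdf(650)

print(f"z = {z}, percentile = {percentile:.4f}, direct = {direct:.4f}")
```

The `cdf` call plays the role of the table lookup: it returns the proportion of a standard normal distribution lying below the given z-score.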

400

Adjectives and adverbs that can be used to help describe the association between two quantitative variables in context.

What is: likely ___, likely to be ___, no dominant ____, weak ___, strong ____. In Context: maybe ...., usually ...., eventually ..., some people ...

400

The correlation coefficient is found by __________.

Its value ranges from __ to ___, it has no ____, and is immune to changes of _____.

finding the average product of the z-scores (standardized values).

-1 to +1

units

scale or order
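These properties can be verified directly. The sketch below computes r from z-scores on made-up data, dividing the sum of products by n - 1 (an assumed convention; the card only says "average product"), then rescales and swaps the variables.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value using the sample mean and standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)

# Changing units (shift and positive rescale) leaves r unchanged.
r_rescaled = correlation([10 * x + 3 for x in xs], [y / 2 for y in ys])

# Swapping which variable is x and which is y leaves r unchanged.
r_swapped = correlation(ys, xs)

print(f"r = {r:.4f}, rescaled = {r_rescaled:.4f}, swapped = {r_swapped:.4f}")
```

Because z-scores have no units, any change of center or positive scale cancels out before the products are taken.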

400

1. Quantitative Variables

2. Straight Enough - check original scatterplot & residual scatterplot

3. Outlier (clusters) - points with large residuals and/or high leverage

What are the 3 conditions needed for linear regression models? (Same as for correlation.)

400

Do this when you have:

1) Points with large residuals (they might not influence the model much but aren't consistent with the overall form) and/or high leverage (e.g. extreme conformers that don't influence the model but do inflate R^2)

2) A change in the scatterplot pattern as a result of changes over time, or subsets that behave differently

What is: Consider comparing two or more regressions

500

______________ is a more precise method than a histogram of checking the nearly normal condition, that the shape of the data's distribution is _______ and ______.

What is a normal probability plot?

What is unimodal and roughly symmetric?

A Normal probability plot works by comparing the data values' actual z-scores with those we'd expect to find in a data set of this size. When they match up well, the line is straight and the data can be considered nearly normal. (see page 126).
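The coordinates of such a plot can be sketched as follows. The expected z-scores here use the plotting position (i - 0.5) / n, which is one common choice and an assumption of this example; textbooks and software differ slightly. The sample values are made up.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each sorted data value with the z-score expected for its rank.

    Uses the plotting position (i - 0.5) / n for the i-th smallest value
    (an assumed convention; other conventions exist).
    """
    n = len(data)
    std_normal = NormalDist()  # N(0, 1)
    expected_z = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(expected_z, sorted(data)))


# Made-up sample for illustration only.
sample = [12.1, 9.8, 10.4, 11.0, 10.1, 9.5, 10.9, 10.6]
points = normal_plot_points(sample)

for z, value in points:
    print(f"expected z = {z:+.3f}   data value = {value}")
```

Plotting the data values against the expected z-scores and looking for a straight line is the check the card describes.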

500

The first rule of data analysis strongly applies to the best way to start observing the relationship between two quantitative variables.

What is: Make a picture! A scatterplot can reveal a lot about the relationship between two variables, including whether the relationship is linear and whether there are any outliers.

500

Perfect correlation (when r= +/- 1) occurs only when the points ____________, and enables you to perfectly _______ one variable knowing the other.

No correlation (when r = 0), means that knowing one variable gives you ______.

You should give the ____ and ____ of x and y along with the correlation because ....

the points lie exactly on a straight line

predict

no information about the other variable

Mean, Standard deviation

Correlation is not a complete description of two-variable data and its formula uses means and standard deviations in the z-scores.


500

The _______ is the key to assessing how well the model fits (extracts the form).

What is variation in the residuals?

500

Regressions based on summaries of the data ______ because _______.

tend to look stronger than the regression on the original data.

summary statistics are less variable than the underlying data.
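A small made-up example shows the effect. Each x-value below has two y-values; replacing them with their mean removes the scatter within each group, so the correlation on the means is much stronger than on the raw data.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Made-up data: each x-value maps to the individual y-values observed there.
raw = {1: [1, 5], 2: [2, 6], 3: [3, 7]}

# Individual cases: one (x, y) pair per observation.
raw_x = [x for x, ys in raw.items() for _ in ys]
raw_y = [y for ys in raw.values() for y in ys]

# Summaries: one (x, mean of y) pair per group.
mean_x = list(raw)
mean_y = [mean(ys) for ys in raw.values()]

r_raw = correlation(raw_x, raw_y)
r_means = correlation(mean_x, mean_y)

print(f"r on individual cases: {r_raw:.3f}")
print(f"r on group means:      {r_means:.3f}")
```

The group means fall exactly on a line here, while the individual cases scatter widely around it.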


Author: Amb. Frankie Simonis