Discussion 4

Assigned Readings:

Chapter 5. Linear Regression as a Fundamental Descriptive Tool

Chapter 6. Correlation vs. Causality in Regression Analysis

Initial Postings: Read and reflect on the assigned readings for the week. Then post what you thought was the most important concept(s), method(s), term(s), and/or any other thing that you felt was worthy of your understanding in each assigned textbook chapter. Your initial post should be based upon the assigned reading for the week, so the textbook should be a source listed in your reference section and cited within the body of the text. Other sources are not required, but feel free to use them if they aid in your discussion.

Also, provide a graduate-level response to each of the following questions:

What are the types of regression? For each type of regression, give an application. Does your job use any? If so, how? Please cite examples according to APA standards.

[Your post must be substantive and demonstrate insight gained from the course material. Postings must be in the student’s own words – do not provide quotes!]

[Your initial post should be at least 450 words and in APA format (Times New Roman, 12-point font, double-spaced). Post the actual body of your paper in the discussion thread, then attach a Word version of the paper for APA review]

Linear Regression as a Fundamental Descriptive Tool

Chapter 5

Learning Objectives

Construct a regression line for a dichotomous treatment

Construct a regression line for a multi-level treatment

Explain both intuitively and formally the formulas generating a regression line for a single treatment

Distinguish the use of sample moment equations from estimation via least squares

Distinguish regression equations for single and multiple treatments

Describe a dataset with multiple treatments using multiple regression

Explain the difference between linear regression and a regression line

Scatterplot of Price and Sales

How do we summarize the relationship between these two variables?

The Regression Line for a Dichotomous Treatment
Dichotomous treatment
Two treatment statuses: treated and untreated
Regression analysis
The process of using a function to describe the relationship among variables

The Regression Line for a Dichotomous Treatment: An Intuitive Approach

Draw a line through these data that will best describe the relationship between Price and Treatment

The Regression Line for a Dichotomous Treatment: An Intuitive Approach

In general, the formula for a line is: Y = f(X) = b + mX,
where b is the intercept and m is the slope of the line

Line Describing the Relationship Between Profits and Treatment

What is the equation for the line shown here?
Profits = 208.33 – 20 × Treatment

Line Describing the Relationship Between Profits and Price

Knowing the two points on the Profits/Price line, solve for the slope and intercept
Profits = 248.33 – 40 × Price

The Regression Line for a Dichotomous Treatment
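Whenever there is a dichotomous treatment, a line describing the relationship between the treatment and the outcome can be built from the mean outcome for each treatment status; this line is called the regression line for a dichotomous treatment
Set \(f(0) = \bar{Y}_0\) and \(f(1) = \bar{Y}_1\), where \(\bar{Y}_0\) and \(\bar{Y}_1\) are the mean outcomes for the untreated and treated observations
The equation for the line is:
\(\text{Outcome} = \bar{Y}_0 + (\bar{Y}_1 - \bar{Y}_0) \times \text{Treatment}\)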
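A minimal Python sketch of this construction, assuming the six profit figures used in this chapter's example (240, 200, and 185 at $1.00, recovered from the residual example; 205, 170, and 190 at $1.50). The function name `dichotomous_line` is illustrative, not from the textbook.

```python
# Observed profits by treatment status (Treatment = 0 is Price $1.00,
# Treatment = 1 is Price $1.50), taken from the chapter's worked example.
untreated = [240, 200, 185]
treated = [205, 170, 190]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def dichotomous_line(y0, y1):
    """Return (intercept, slope) of Outcome = Ybar0 + (Ybar1 - Ybar0) * Treatment."""
    ybar0, ybar1 = mean(y0), mean(y1)
    return ybar0, ybar1 - ybar0


intercept, slope = dichotomous_line(untreated, treated)
print(f"Profits = {intercept:.2f} + ({slope:.2f}) x Treatment")
# Profits = 208.33 + (-20.00) x Treatment

# The same two points expressed against Price: Treatment = (Price - 1.00) / 0.50,
# so the slope per dollar is slope / 0.50 and the intercept shifts accordingly.
price_slope = slope / 0.50
price_intercept = intercept - price_slope * 1.00
print(f"Profits = {price_intercept:.2f} + ({price_slope:.2f}) x Price")
# Profits = 248.33 + (-40.00) x Price
```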

The Regression Line for a Dichotomous Treatment: A Formal Approach
Observed outcomes in terms of two points on a line:
Profiti = f(1.00) + ei if Pricei = 1.00
Profiti = f(1.50) + ei if Pricei = 1.50
The subscript i distinguishes the individual observations
ei is the residual for observation i
The residual is the difference between the observed outcome and the corresponding point on the regression line for a given observation:
ei = Yi – f(Xi)

Scatterplot of Residuals for Price of $1.00 when f(1.00) = $220
FIRST RESIDUAL IS 20. THIS MEANS THE ACTUAL PROFIT WE OBSERVE (240) IS 20 HIGHER THAN THE VALUE ON THE LINE (220).

SECOND RESIDUAL IS -20. THIS MEANS THE ACTUAL PROFIT WE OBSERVE (200) IS 20 LOWER THAN THE VALUE ON THE LINE (220).

THIRD RESIDUAL IS -35. THIS MEANS THE ACTUAL PROFIT WE OBSERVE (185) IS 35 LOWER THAN THE VALUE ON THE LINE (220).

The Regression Line for a Dichotomous Treatment: A Formal Approach
Residuals for price of $1.00 when f(1.00) = $220

The average residual is [20 + (-20) + (-35)]/3 = -11.67
A choice for f(1.00) is best if it tends to neither overshoot nor undershoot the observed outcomes; that is, a choice for f(1.00) is best if the corresponding residuals average zero.

The Regression Line for a Dichotomous Treatment: A Formal Approach
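For the residuals to average zero means:
\(\frac{1}{3}\left[(240 - f(1.00)) + (200 - f(1.00)) + (185 - f(1.00))\right] = 0\)

FOR THE RESIDUALS TO AVERAGE ZERO, THE BEST CHOICE FOR f(1.00) IS THE AVERAGE PROFIT AT $1.00:
\(f(1.00) = (240 + 200 + 185)/3 = 208.33\)

Similarly, when the price is $1.50, the best choice is the average of profits when the price is $1.50 = (205 + 170 + 190)/3 = 188.33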
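The averaging argument can be checked numerically. This sketch (the helper `average_residual` is hypothetical) reuses the three $1.00 profits and compares the candidate f(1.00) = 220 with the sample mean:

```python
profits_at_1_00 = [240, 200, 185]  # observed profits when Price = $1.00


def average_residual(observed, fitted_value):
    """Mean of e_i = Y_i - f(1.00) for a single candidate fitted value."""
    return sum(y - fitted_value for y in observed) / len(observed)


print(round(average_residual(profits_at_1_00, 220.0), 2))  # -11.67: line overshoots

best = sum(profits_at_1_00) / len(profits_at_1_00)  # the sample mean, 208.33
# At the sample mean the residuals average zero (up to floating-point rounding).
print(round(best, 2), average_residual(profits_at_1_00, best))
```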

The Regression Line for a Multi-Level Treatment: An Intuitive Approach
Multi-level treatment is a treatment that can be administered in more than one quantity
HERE, THE PRICES ARE $1.00, $1.50, AND $2.00. A PRICE OF $1.00 IS UNTREATED, AND A $0.50 PRICE INCREASE IS THE TREATMENT.

The Regression Line for a Multi-Level Treatment: An Intuitive Approach
The approach we used for the dichotomous treatment generally does not work for a multi-level treatment
The problem is that when three or more points are plotted on a graph, they generally will not all fall on the same line

The Regression Line for a Multi-Level Treatment: An Intuitive Approach
Line attempting to connect average profits to the following price levels:
f(1.00) = 208.33
f(1.50) = 188.33
f(2.00) = 160

The Regression Line for a Multi-Level Treatment: An Intuitive Approach
When there are more than two treatment levels, plotting the average outcome at each level generally produces points that cannot all be connected by a single line:
\(f(1.00) = b + m \times 1.00 \;\Rightarrow\; 208.33 = b + m \times 1.00\)
\(f(1.50) = b + m \times 1.50 \;\Rightarrow\; 188.33 = b + m \times 1.50\)
\(f(2.00) = b + m \times 2.00 \;\Rightarrow\; 160 = b + m \times 2.00\)
In general, no single pair of values for m and b satisfies all three equations, because there are three equations but only two unknowns
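As a quick check using the three averages listed above: solving the first two equations and substituting into the third shows the conflict.

```latex
% Use the $1.00 and $1.50 equations to solve for m and b
m = \frac{188.33 - 208.33}{1.50 - 1.00} = -40, \qquad b = 208.33 - (-40)(1.00) = 248.33
% Substitute into the $2.00 equation
b + m \times 2.00 = 248.33 - 80 = 168.33 \neq 160
```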

The Regression Line for a Multi-Level Treatment: An Intuitive Approach
Rather than plot an “ideal” point for each treatment level and then solve for the corresponding slope and intercept, try to solve directly for the slope and intercept of the line believed to best describe the data
It should not generally overshoot or undershoot the data
Its tendency to over or undershoot the data across specific price levels should not depend on the price level

Two Candidate Lines for Describing Profits and Price Data

The Regression Line for a Multi-Level Treatment: A Formal Approach
For our example, we have three levels and nine points. Expressing them in terms of intercept and slope:
Profiti = b + m × 1.00 + ei, if Pricei = 1.00
Profiti = b + m × 1.50 + ei, if Pricei = 1.50
Profiti = b + m × 2.00 + ei, if Pricei = 2.00

Here i takes on the values one through nine, since there are nine points. Residuals, ei, are the difference between the observed profit and the corresponding point on the line for a given observation.
ei = Profiti – b – m × Pricei

The Regression Line for a Multi-Level Treatment: A Formal Approach
Applying the same approach used for a dichotomous treatment, solve for the “best” line by finding a slope and intercept that make the residuals average zero at each price point.

THIS AGAIN GIVES US THREE EQUATIONS AND TWO UNKNOWNS.

The Regression Line for a Multi-Level Treatment: A Formal Approach
An alternative way of defining which line best describes the data uses the following criteria:
It should not generally overshoot or undershoot the data
Its tendency to over or undershoot the data across specific price levels should not depend on the price level

The Regression Line for a Multi-Level Treatment: A Formal Approach
Translating these criteria in terms of residuals:
The residuals for all data points average to zero
The size of the residuals is not correlated with the treatment level
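Expressing these two criteria in equation form:
\(\frac{1}{9}\sum_{i=1}^{9} (\text{Profit}_i - b - m \times \text{Price}_i) = 0\)
\(\frac{1}{9}\sum_{i=1}^{9} \text{Price}_i \times (\text{Profit}_i - b - m \times \text{Price}_i) = 0\)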

The Regression Line for a Multi-Level Treatment: A Formal Approach
The first equation ensures that the residuals average zero across all observations, and the second ensures that the size of the residuals is not related to the price level
Solving these two equations yields:
m = -48.33
b = 258.06
The line that best fits the data, where “best” implies residuals that average zero and are not correlated with the treatment:
Profit = 258.06 – 48.33 × Price
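A sketch of solving the two equations in Python, written as a 2×2 linear system and solved with Cramer's rule. Assumption: the text reports only the average profit at $2.00 (160), so three copies of 160 stand in for those stores; both equations depend on the $2.00 profits only through their sum, so this reproduces the same b and m. The function name is illustrative.

```python
# Nine (Price, Profit) observations. The $2.00 rows use the reported average
# (160) three times, since the individual $2.00 profits are not listed here.
data = [(1.00, 240), (1.00, 200), (1.00, 185),
        (1.50, 205), (1.50, 170), (1.50, 190),
        (2.00, 160), (2.00, 160), (2.00, 160)]


def solve_moment_equations(points):
    """Solve sum(e_i) = 0 and sum(x_i * e_i) = 0 for (b, m), where
    e_i = y_i - b - m * x_i. Rearranged, this is the linear system
        n * b     + sum_x * m  = sum_y
        sum_x * b + sum_xx * m = sum_xy
    which Cramer's rule solves directly."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xx = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    det = n * sum_xx - sum_x ** 2
    b = (sum_y * sum_xx - sum_x * sum_xy) / det
    m = (n * sum_xy - sum_x * sum_y) / det
    return b, m


b, m = solve_moment_equations(data)
print(f"Profit = {b:.2f} + ({m:.2f}) x Price")  # Profit = 258.06 + (-48.33) x Price
residuals = [y - b - m * x for x, y in data]    # both criteria hold at (b, m)
```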

The Regression Line for a Multi-Level Treatment: A Formal Approach
Simple regression line
The slope is the sample covariance of the treatment and outcome divided by the sample variance of the treatment
The intercept is the mean value of the outcome minus the slope times the mean value of the treatment
Y = b + mX
Solving for m and b yields the following formulas for the slope and intercept of a simple regression line:
\(m = \dfrac{\text{sCov}(X, Y)}{\text{sVar}(X)}\)

\(b = \bar{Y} - m\bar{X}\)

The Regression Line for a Multi-Level Treatment: A Formal Approach
Applying these generalized formulas for our dichotomous price/profit example:

USING THE FORMULAS FOR VARIANCE AND COVARIANCE:
sCov(Profit, Price) = -3,
sVar(Price) = 0.075,

\(\overline{\text{Price}} = 1.25\) and \(\overline{\text{Profit}} = 198.33\)

Plugging these into our formulas,
m = -3/0.075 = -40, and
b = 198.33 – (-40)(1.25) = 248.33.
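The same numbers fall out of the covariance and variance formulas. A minimal sketch, assuming an n - 1 denominator (which is what reproduces sCov = -3 and sVar = 0.075 for these six observations); the helper names are illustrative:

```python
def sample_mean(v):
    return sum(v) / len(v)


def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator; sVar(X) is sCov(X, X)."""
    xbar, ybar = sample_mean(x), sample_mean(y)
    return sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / (len(x) - 1)


def simple_regression(x, y):
    """Slope = sCov(X, Y) / sVar(X); intercept = Ybar - slope * Xbar."""
    m = sample_cov(x, y) / sample_cov(x, x)
    b = sample_mean(y) - m * sample_mean(x)
    return b, m


price = [1.00, 1.00, 1.00, 1.50, 1.50, 1.50]
profit = [240, 200, 185, 205, 170, 190]

print(round(sample_cov(profit, price), 4), round(sample_cov(price, price), 4))  # -3.0 0.075
b, m = simple_regression(price, profit)
print(round(b, 2), round(m, 2))  # 248.33 -40.0
```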

Sample Moments and Least Squares
Sample moment
The mean of a function of one or more random variables for a given sample
For example, for a sample of size 20 that contains information on salaries, \(\frac{1}{20}\sum_{i=1}^{20} \text{Salary}_i^3\) is a sample moment, where \(\text{Salary}_i\) is the random variable and the function is defined as \(f(a) = a^3\)
Ordinary least squares
The process of solving for the slope and intercept that minimize the sum of the squared residuals:
\(\min_{b,m} \sum_{i=1}^{N} (Y_i - b - mX_i)^2\)

Sample Moments and Least Squares
Objective function
A function ultimately wished to be maximized or minimized
For ordinary least squares, the objective function is the sum of squared residuals (\(\sum_{i} e_i^2\))
Least absolute deviations (LAD)
Use the sum of the absolute values of the residuals as the objective function and solve for the slope and intercept that minimize it

Ordinary Least Squares vs. Least Absolute Deviations for Describing a Dataset
LINE A IS PULLED TOWARD THE OUTLIER, SO IT COMES FROM OLS; LINE B COMES FROM LAD.
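To see why the two lines differ, here is a small sketch on made-up data (five points on y = 1 + 2x plus one outlier). OLS uses the closed-form slope; LAD is solved by brute force over pairs of points, which relies on the fact that, with a single regressor, some LAD-optimal line passes through at least two observations. That shortcut only suits tiny samples; larger problems would normally be solved as a linear program.

```python
from itertools import combinations

# Hypothetical data: five points on y = 1 + 2x plus one outlier at x = 5.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 40]


def ols(x, y):
    """Minimize the sum of squared residuals (closed form, one regressor)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (c - ybar) for a, c in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    m = sxy / sxx
    return ybar - m * xbar, m


def lad(x, y):
    """Minimize the sum of absolute residuals by trying every line that
    passes through two data points (valid for a single regressor)."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # vertical pair: no finite slope
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        loss = sum(abs(c - b - m * a) for a, c in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, b, m)
    return best[1], best[2]


print("OLS:", ols(xs, ys))  # slope pulled up toward the outlier (about 6.14)
print("LAD:", lad(xs, ys))  # (1.0, 2.0): follows the other five points
```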

Regression for Multiple Treatments
CHOLESTEROL LEVEL AND DRUG DOSES FOR 15 INDIVIDUALS.

Regression for Multiple Treatments
Single vs. Multiple Treatments
Cholesterol = 235.17 – 0.997 × Drug A
Cholesterol = 205.83 – 0.107 × Drug B
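Model the cholesterol outcome as follows:
\(\text{Cholesterol}_i = b + m_1\text{DrugA}_i + m_2\text{DrugB}_i + e_i\)
Expressing the OLS criterion in equation form:
\(\min_{b, m_1, m_2} \sum_{i=1}^{15} (\text{Cholesterol}_i - b - m_1\text{DrugA}_i - m_2\text{DrugB}_i)^2\)
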
Regression Output in Excel for Cholesterol Regressed on Drug A and Drug B
HERE WE HAVE THE VALUES FOR:

b = 256.20,
m1 = -1.259, AND
m2 = -0.514.

Regression Plane for Cholesterol Regressed on Drug A and Drug B

Multiple regression
Solving for a function that best describes the data, where “best” implies the use of OLS (or, equivalently, the sample moment equations)
Simple regression
The process that produces the simple regression line for a single treatment
Multiple Regression

Multiple Regression
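For a sample size of N with K treatments, the associated sample moment equations are:
\(\frac{1}{N}\sum_{i=1}^{N} e_i = 0\)
\(\frac{1}{N}\sum_{i=1}^{N} X_{ki}\, e_i = 0\) for each \(k = 1, \dots, K\)
where \(e_i = Y_i - b - m_1X_{1i} - \dots - m_KX_{Ki}\)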
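A sketch of the K + 1 moment equations in matrix form, using NumPy. Adding a column of ones turns the moment equations into the normal equations, so the solution coincides with OLS. The dose data below are hypothetical, not the textbook's 15-person cholesterol sample, and `multiple_regression` is an illustrative name.

```python
import numpy as np


def multiple_regression(X, y):
    """Solve the K + 1 sample moment equations for (b, m_1, ..., m_K).

    With a leading column of ones, sum(e_i) = 0 and sum(X_ki * e_i) = 0
    are the normal equations A'A beta = A'y, which OLS also solves."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # intercept column + treatments
    return np.linalg.solve(A.T @ A, A.T @ np.asarray(y, dtype=float))


# Hypothetical doses of two drugs for eight people; outcomes are built from
# Y = 250 - 1.2*A - 0.5*B plus small made-up deviations.
doses = np.array([[0, 0], [10, 0], [0, 20], [10, 20],
                  [20, 10], [5, 15], [15, 5], [20, 20]], dtype=float)
noise = np.array([1.0, -1.0, 0.5, -0.5, 0.0, 1.5, -1.5, 0.0])
y = 250 - 1.2 * doses[:, 0] - 0.5 * doses[:, 1] + noise

beta = multiple_regression(doses, y)  # [b, m1, m2]
residuals = y - np.column_stack([np.ones(len(doses)), doses]) @ beta
print(beta)
print(residuals.mean(), doses.T @ residuals)  # all approximately 0
```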

What Makes Regression Linear?
Linear regression is the process of fitting a function that is linear in its parameters to a given dataset
Y = b + m1X1 + m2X2 + … + mKXK
Here {b, m1, …, mK} are the parameters for this function
The use of linear regression does not at all imply construction of a line to fit the data
Linear regression is linear in the parameters but not necessarily the treatment(s)
It allows for an unlimited number of possible “shapes” for the relationship between the outcome and any particular treatment
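A short illustration of this point, assuming hypothetical data that follow Y = 1 + 2X + 3X² exactly: adding X² as an extra column lets ordinary linear regression recover a curved relationship, because the model is still linear in the parameters (b, m1, m2).

```python
import numpy as np

# Hypothetical data with a curved relationship: Y = 1 + 2X + 3X^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1 + 2 * x + 3 * x ** 2

# Y = b + m1*X + m2*X^2 is non-linear in X but linear in (b, m1, m2),
# so least squares on the columns [1, X, X^2] fits it directly.
design = np.column_stack([np.ones_like(x), x, x ** 2])
params, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(params, 6))  # approximately 1, 2, 3
```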

Correlation vs Causality in Linear Regression Analysis

Chapter 6

Learning Objectives

Differentiate between correlation and causality in general and in the regression environment

Calculate partial and semi-partial correlation

Execute inference for correlational regression analysis

Execute passive prediction using regression analysis

Execute inference for determining functions

Execute active prediction using regression analysis

Distinguish the relevance of model fit between active and passive prediction

The Difference Between Correlation and Causality

Yi = fi(X1i, X2i, …, XKi) + Ui

We define fi(X1i, X2i, …, XKi) as the determining function, since it comprises the part of the outcome that we can explicitly determine

Ui can only be inferred by computing Yi – fi(X1i, X2i, …, XKi)

Data-generating process as a framework for modeling causality

The reasoning established to measure an average treatment effect using sample means easily maps to this framework

Easily extends into modeling causality for multi-level treatments and multiple-treatments

A causal relationship between two variables clearly implies co-movement.

If X causally impacts Y, then when X changes, we expect a change in Y

However, variables often move together even when there is no causal relationship between them

For example, consider the heights of two different children, ages 5 and 10. Because both children are still growing, their heights will generally move together over time. This co-movement is not due to causality: an increase in one child's height will not change the other child's height.

The Difference Between Correlation and Causality

Measurement of the co-movement between two variables in a dataset is captured through sample covariance or correlation:

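Covariance: \(\text{sCov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})\)

Correlation: \(\text{sCorr}(X,Y) = \frac{\text{sCov}(X,Y)}{s_X\, s_Y}\)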

The Difference Between Correlation and Causality
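A minimal sketch of both formulas, applied to the Chapter 5 price/profit data. It assumes the n - 1 denominator used in Chapter 5; function names are illustrative.

```python
import math


def sample_cov(x, y):
    """sCov(X, Y) with an n - 1 denominator."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """sCorr(X, Y) = sCov(X, Y) / (s_X * s_Y); always between -1 and 1."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


# Price/profit data from Chapter 5 (three stores at each price).
price = [1.00, 1.00, 1.00, 1.50, 1.50, 1.50]
profit = [240, 200, 185, 205, 170, 190]
print(round(sample_cov(price, profit), 2), round(sample_corr(price, profit), 3))  # -3.0 -0.46
```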

When there are more than two variables, e.g., Y, X1, X2, we can also measure partial correlation between two of the variables

Partial correlation between two variables is their correlation after holding one or more other variables fixed

The Difference Between Correlation and Causality

Causality implies that a change in one variable or variables causes a change in another

Data analysis attempting to measure causality generally involves an attempt to measure the determining function within the data-generating process

Correlation implies that variables move together

Data analysis attempting to measure correlation is not concerned with the data-generating process and determining function; it uses standard statistical formulas (sample correlation, partial correlation) to assess how variables move together

The Difference Between Correlation and Causality

The dataset is a cross-section of 230 grocery stores

AvgPrice = Average Price
AvgHHSize = Average Size of Households of Customers at that Grocery Store.
Regression Analysis for Correlation

Sales = b + m1AvgPrice + m2AvgHHSize

Solving for b, m1, and m2:

Sales = 1591.54 – 181.66 × AvgPrice + 128.09 × AvgHHSize

This equation provides us information about how the variables in the equation are correlated within our sample.

Regression Analysis for Correlation

Unconditional correlation is the standard measure of correlation between two variables X and Y
Corr(X,Y) = \(\dfrac{\text{sCov}(X,Y)}{S_X S_Y}\)

SX = sample standard deviation of X, and
SY = sample standard deviation of Y

Partial correlation between X and Y is a measure of the relationship between these two variables, holding at least one other variable fixed
Semi-partial correlation between X and Y is a measure of the relationship between these two variables, holding at least one other variable fixed for only X or Y

Different Ways to Measure Correlation Between Two Variables
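One way to compute these, sketched on hypothetical data: "holding Z fixed" is implemented by taking residuals from a simple regression on Z. The partial correlation correlates the residualized Y with the residualized X; the semi-partial correlation residualizes only X. This residual approach is a standard construction, not necessarily the textbook's own formula.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(x, y):
    xbar, ybar = mean(x), mean(y)
    return sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / (len(x) - 1)


def corr(x, y):
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


def residualize(v, z):
    """Residuals from a simple regression of v on z, i.e. the part of v
    that does not move linearly with z ('holding z fixed')."""
    m = cov(z, v) / cov(z, z)
    b = mean(v) - m * mean(z)
    return [a - b - m * c for a, c in zip(v, z)]


def partial_corr(y, x, z):
    """Correlation of y and x after holding z fixed for both."""
    return corr(residualize(y, z), residualize(x, z))


def semipartial_corr(y, x, z):
    """Correlation of y and x after holding z fixed for x only."""
    return corr(y, residualize(x, z))


# Hypothetical data (illustration only).
y = [3.0, 5.0, 4.0, 7.0, 8.0, 6.0, 9.0, 10.0]
x1 = [1.0, 2.0, 2.0, 3.0, 4.0, 3.0, 5.0, 5.0]
x2 = [2.0, 1.0, 3.0, 2.0, 4.0, 5.0, 4.0, 6.0]
print(round(partial_corr(y, x1, x2), 4), round(semipartial_corr(y, x1, x2), 4))
```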

For the general regression equation Y = b + m1X1 + … + mKXK, the solutions for m1 through mK from the sample moment equations are proportional to the partial and semi-partial correlations between Y and the respective Xs
Regression Analysis for Correlation

Suppose we have data for the entire population of grocery stores. Then we have:
Sales = B + M1AvgPrice + M2AvgHHSize
Capital letters are used to indicate that these are the intercept and slopes for the population, rather than the sample
Solve for B, M1, and M2 by solving the sample moment equations using the entire population of data
Regression and Population Correlation

Regression and Population Correlation
In practice, we do not have data for the entire population, only a sample drawn from it, whose regression equation is:
Sales = b + m1AvgPrice + m2AvgHHSize
Solve for b, m1 and m2
The intercept and slope(s) of the regression equation describing a sample are estimators for the intercept and slope(s) of the corresponding regression equation describing the population.

A consistent estimator is an estimator whose realized value gets close to its corresponding population parameter as the sample size gets large.
Regression and Population Correlation

Regression Line for Full Population

Regression Lines for Three Samples of Size 10

Regression Lines for Three Samples of Size 30
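The pattern in these figures can be reproduced with a small simulation. The population relationship (Y = 4 + 1.5X plus normal noise) and the seed are assumptions chosen for illustration; the estimated slope should settle near 1.5 as n grows.

```python
import random


def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (c - ybar) for a, c in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


TRUE_SLOPE = 1.5          # hypothetical population slope
rng = random.Random(42)   # fixed seed so the run is reproducible


def draw_sample(n):
    """Draw n observations from the hypothetical population Y = 4 + 1.5X + noise."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [4 + TRUE_SLOPE * a + rng.gauss(0, 3) for a in x]
    return x, y


estimates = {}
for n in (10, 30, 1000, 100_000):
    x, y = draw_sample(n)
    estimates[n] = ols_slope(x, y)
    print(n, round(estimates[n], 3))  # estimates tighten around 1.5 as n grows
```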

To conduct hypothesis tests or build confidence intervals for the population parameters of a regression equation, we need to know the distribution of the estimators
Each estimator becomes very close to its corresponding population parameter in a large sample
In a large sample, these estimators are approximately normally distributed

Confidence Interval and Hypothesis Testing for the Population Parameters

A large random sample implies that:
\(b \sim N(B, \sigma_b)\)
\(m_1 \sim N(M_1, \sigma_{m_1})\)
\(\vdots\)
\(m_K \sim N(M_K, \sigma_{m_K})\)
If we write each element in the population as
Yi = B + M1X1i + … + MKXKi + Ei,
where Ei is the residual, then Var(Y|X) is equal to Var(E|X)
A common assumption is that this variance is constant across all values of X, so Var(Y|X) = Var(E|X) = Var(E) = σ²
This constancy of variance is called homoscedasticity
Confidence Interval and Hypothesis Testing for the Population Parameters
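Under homoscedasticity, a common large-sample standard error for the slope of a simple regression is s / sqrt(Σ(Xi - X̄)²), with s² = SSRes / (N - 2). The sketch below uses it with the normal critical value 1.96 to form an approximate 95% confidence interval and a z statistic for H0: M1 = 0. The data-generating values and the function name are hypothetical.

```python
import math
import random


def slope_confidence_interval(x, y, z=1.96):
    """Large-sample CI for the slope of a simple regression, assuming
    homoscedastic errors: se(m) = s / sqrt(sum((x - xbar)^2)),
    where s^2 = SSRes / (n - 2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    m = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / sxx
    b = ybar - m * xbar
    ss_res = sum((c - b - m * a) ** 2 for a, c in zip(x, y))
    se = math.sqrt(ss_res / (n - 2)) / math.sqrt(sxx)
    return m, se, (m - z * se, m + z * se)


# Hypothetical sample of 200 observations from Y = 2 + 0.8X + E, Var(E) constant.
rng = random.Random(7)
x = [rng.uniform(0, 20) for _ in range(200)]
y = [2 + 0.8 * a + rng.gauss(0, 2) for a in x]

m, se, (lo, hi) = slope_confidence_interval(x, y)
print(f"m = {m:.3f}, se = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
print("z statistic for H0: M1 = 0:", round(m / se, 1))
```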

Sales = 1591.54 – 181.66 × AvgPrice + 128.09 × AvgHHSize
If Store A has an average price $0.50 higher than Store B, and an average household size 0.40 smaller than Store B, then:
\(\Delta\text{Sales} = -181.66 \times 0.50 + 128.09 \times (-0.40) \approx -142\)
We predict that Store A has about 142 fewer sales than Store B
When using correlational regression analysis to make predictions, we must consider a population that spans time, and we assume that the population regression equation also best describes the future population
Prediction Using Regression
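The arithmetic above as a small function, using the reported coefficients. The intercept cancels when two stores are compared, so only the slopes matter; the constant and function names are illustrative.

```python
# Coefficients from the grocery-store regression reported above.
M_PRICE = -181.66   # change in Sales per one-unit ($1) increase in AvgPrice
M_HHSIZE = 128.09   # change in Sales per one-unit increase in AvgHHSize


def predicted_sales_difference(d_price, d_hhsize):
    """Predicted Sales(A) - Sales(B); the intercept 1591.54 cancels out."""
    return M_PRICE * d_price + M_HHSIZE * d_hhsize


diff = predicted_sales_difference(0.50, -0.40)
print(round(diff, 2))  # -142.07
```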

Regression and Causation
Data-generating process of an outcome Y can be written as:
Yi = fi(X1i, X2i, …, XKi) + Ui
We assume the determining function can be written as:
fi(X1i, X2i, …, XKi) = α + β1X1i + β2X2i + … + βKXKi
Combining these assumptions into a single assumption, the data-generating process can be written as:
Yi = α + β1X1i + β2X2i + … + βKXKi + Ui
The error term, Ui, represents unobserved factors that determine the outcome

Regression and Causation
Yi = B + M1X1i + … + MKXKi + Ei (Correlation model)
Yi = α + β1X1i + … + βKXKi + Ui (Causality model)
Correlational model residuals (Ei) have a mean of zero and are uncorrelated with each of the Xs. For this model, we simply plot all the data points in the population and write each observation in terms of the equation that best describes these points.
For the causality model, the data-generating process is the process that actually generates the data we observe, and the determining function need not be the equation that best describes the data.

CONSIDER THE FOLLOWING DATA FOR Y, X, AND U, WHICH COVER THE ENTIRE POPULATION:

THESE DATA WERE GENERATED USING THE DATA-GENERATING PROCESS: Yi = 5 + 3.2Xi + Ui
MEANING WE HAVE A DETERMINING FUNCTION: f(X) = 5 + 3.2X
The Difference Between the Correlation Model and the Causality Model: An Example

Scatterplot, Regression Line, and Determining Function of X and Y
IN THIS FIGURE, WE PLOT Y AND X ALONG WITH THE DETERMINING FUNCTION (BLUE LINE) AND THE POPULATION REGRESSION EQUATION (RED LINE).

Regression and Causation
The correlation model describes the data best but need not coincide with the causal mechanism generating the data
The causality model provides the causal mechanism but need not describe the data best
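A simulation sketch of this distinction, under stated assumptions: the determining function is the example's f(X) = 5 + 3.2X, but the hypothetical unobserved factors U are constructed to rise with X (U = 0.5(X - 5) plus noise). The population regression line then fits the data best yet has a slope near 3.7, not 3.2.

```python
import random

rng = random.Random(1)
n = 5000

# Hypothetical population: determining function f(X) = 5 + 3.2X, but the
# unobserved factors U move with X, so U is correlated with X.
x = [rng.uniform(0, 10) for _ in range(n)]
u = [0.5 * (a - 5) + rng.gauss(0, 2) for a in x]
y = [5 + 3.2 * a + e for a, e in zip(x, u)]  # data-generating process

# Regression line (correlation model) fitted to the same data.
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
slope = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / sxx
intercept = ybar - slope * xbar

print("determining function: Y = 5.00 + 3.20 X")
print(f"regression line:      Y = {intercept:.2f} + {slope:.2f} X")
```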

The Relevance of Model Fit for Passive and Active Prediction
Total sum of squares (TSS): the sum of the squared differences between each observation of Y and the average value of Y
\(TSS = \sum_{i=1}^{N} (Y_i - \bar{Y})^2\)
Sum of squared residuals (SSRes): the sum of the squared residuals
\(SSRes = \sum_{i=1}^{N} e_i^2\)
R-squared: the fraction of the total variance in Y that can be attributed to variation in the Xs
\(R^2 = 1 - SSRes/TSS\)
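A minimal check of these definitions on the Chapter 5 price/profit data and its regression line Profits = 248.33 - 40 × Price (745/3 is the unrounded intercept). For a simple regression, the result matches the squared sample correlation between price and profit.

```python
price = [1.00, 1.00, 1.00, 1.50, 1.50, 1.50]
profit = [240, 200, 185, 205, 170, 190]


def r_squared(x, y, b, m):
    """R^2 = 1 - SSRes / TSS for the line Y = b + m X."""
    ybar = sum(y) / len(y)
    tss = sum((c - ybar) ** 2 for c in y)
    ss_res = sum((c - b - m * a) ** 2 for a, c in zip(x, y))
    return 1 - ss_res / tss


r2 = r_squared(price, profit, 745 / 3, -40)
print(round(r2, 3))  # 0.212
```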

The Relevance of Model Fit for Passive and Active Prediction
A high R-squared implies a good fit, meaning the points on the regression equation tend to be close to the actual Y values
R-squared for passive prediction (correlation): finding a high R-squared implies the prediction is close to reality
R-squared for active prediction (causality): R-squared is not a primary consideration when evaluating predictions
