Discussion 7

 Chapter 10. Identification and Data Assessment

Initial Postings: Read and reflect on the assigned readings for the week. Then post what you thought was the most important concept(s), method(s), term(s), and/or anything else that you felt was worth understanding in each assigned textbook chapter. Your initial post should be based on the assigned reading for the week, so the textbook should be listed in your reference section and cited within the body of the text. Other sources are not required, but feel free to use them if they aid your discussion.

Also, provide a graduate-level response to each of the following questions:

  1. In Chapter 10 the focus of the material is identifying and assessing data. One of the chief concerns of identifying and assessing data is extrapolation and interpolation. Please explain both of these concepts and give a reason why these scenarios would occur. Please address each component of the discussion board. Also, cite examples according to APA standards.

[Your post must be substantive and demonstrate insight gained from the course material. Postings must be in the student’s own words – do not provide quotes!] 

[Your initial post should be at least 450 words and in APA format (including Times New Roman, font size 12, and double spacing). Post the body of your paper in the discussion thread, then attach a Word version of the paper for APA review.]

Identification and Data Assessment

Chapter 10

© 2019 McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or distribution without the prior written consent of McGraw-Hill Education

Learning Objectives

Explain what it means for a variable’s effect to be identified in a model

Explain extrapolation and interpolation and how each inherently suffers from an identification problem

Distinguish between functional form assumptions and enhanced data coverage as remedies for identification problems stemming from extrapolation and interpolation

Differentiate between endogeneity and types of multicollinearity as identification problems due to variable co-movement

Articulate remedies for identification problems and inference challenges due to variable co-movement

Solve for the direction of bias in cases of variable co-movement


Assessing Data via Identification

[Table: a subsample of the rocking chair data]

Your goal is to estimate the average treatment effect of price on sales. On average, when price increases by $1, what is the effect on sales of rocking chairs?

A parameter (e.g., β) is identified within a given model if it can be estimated with any level of precision given a large enough sample from the population.
Suppose we assume the data-generating process is:
\[ \text{Sales}_i = \alpha + \beta\,\text{Price}_i + U_i \]
Within this model, we are interested in accurately estimating β.
A parameter is identified if, for a given confidence level K (< 100%) and a given “length” L, we can build a confidence interval that contains β with length less than L and confidence level K, given a large enough sample of data.

Identification Example

Define p as the probability of rolling a 3 on any single roll of a die.
Define X to be the number of 3s observed on a single roll of a die (X = 1 for a roll of 3 and X = 0 for any other number), so E[X] = p.
It can be shown that Var[X] = p(1 – p).
Using this framework, the parameter p is identified.
We can estimate p as precisely as we want given enough data on the roll of the die (given enough rolls of the die).

The fact that p is identified follows directly from the central limit theorem.
Suppose the die is rolled N times. Define \(x_1\) as the observed value of X for the first roll, \(x_2\) for the second, and so on.
Then, define:
\[ \bar{X} = \frac{1}{N}\left[x_1 + x_2 + \dots + x_N\right] \]
the sample mean for X, or equivalently, the proportion of the N rolls that showed a 3.
Given these definitions, the central limit theorem states that:
\[ \bar{X} \sim N\!\left(p, \frac{p(1-p)}{N}\right) \quad \text{as } N \text{ gets large} \]

[Figure: Distribution of Mean of X for N = 50 and N = 5,000]

Extrapolation and Interpolation

Note how the variables “Sales” and “Price” move together in the price range of $210 to $225 and in the price range of $275 to $300.
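Returning to the die-rolling identification example, the claim that p can be estimated as precisely as we like can be checked with a small simulation. The sketch below is only illustrative: it assumes a fair die, uses a hypothetical helper named estimate_p, and builds an approximate 95% confidence interval from the central limit theorem result, with the sample proportion standing in for p in the variance.

```python
import math
import random


def estimate_p(n_rolls, seed=0):
    """Simulate n_rolls of a fair die; return (p_hat, 95% CI half-width)."""
    rng = random.Random(seed)
    # X = 1 when the roll shows a 3, and X = 0 otherwise
    threes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 3)
    p_hat = threes / n_rolls  # sample mean of X
    # CLT: the sample mean is approximately N(p, p(1 - p) / N) for large N
    std_err = math.sqrt(p_hat * (1 - p_hat) / n_rolls)
    return p_hat, 1.96 * std_err


if __name__ == "__main__":
    for n in (50, 5000):
        p_hat, half_width = estimate_p(n)
        print(f"N = {n}: p_hat = {p_hat:.3f}, CI half-width = {half_width:.3f}")
```

Because the standard error falls with the square root of N, any target interval length L can be reached by rolling the die enough times, which is what identification of p means here.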
Suppose we want to know how Sales move with Prices in other price ranges.
Interpolation involves drawing conclusions where there are “gaps” in the data.
A data gap is any place where there are missing data for a variable over an interval of values, but data are not missing for at least some values on both ends of the interval.
Extrapolation involves drawing conclusions beyond the extent of the data.

Identification Problems

Identification problems must be considered when engaging in interpolation and/or extrapolation.
The determining factor is whether the gap(s) in, or extent of, the data are due to random limitations in the sample or limitations in the population.
If it is the former, there may be no identification problem.
If it is the latter, then there is an identification problem that must be addressed.

Attempt to draw f(.) and g(.) without any mathematical formulas.
[Figure: we are attempting to interpolate (fill in the data gap) and attempting to extrapolate (extend beyond the data’s range).]

When interpolation or extrapolation is used to fill in gaps or the limited extent of the data sample, but not the population, there is not an identification problem.
When interpolation or extrapolation is used to fill gaps or the limited extent of the population, there is an identification problem.
No matter how much data is collected from the population, it will not help to draw any conclusions about what is happening in the unobserved range(s).
Remedies

Suppose you want to engage in interpolation and/or extrapolation when there exists an identification problem.
For a general model of the data-generating process, where no assumptions are made about the determining function, sampling more data from the same population cannot resolve the problem.
There are two key approaches toward solving this type of identification problem:
  • Changes in the population
  • A functional form assumption

Remedies: An Example

Changing the population to alleviate an identification problem:
A new singer has been promoting her music by selling physical copies of her music at various high schools. She charges the same price to everyone and finds that the seniors buy the most often, freshmen the least, and sophomores and juniors are in between.
This tells her that her sales appear to be increasing with the age of her customers.
She would like to extrapolate this relationship beyond just high school-aged kids.
Using only data from high schools, she has an identification problem.

[Figure: possible ways to extrapolate past age 18; there are no data to sort through the options.]
A clear solution to this identification problem would be to try selling her music at colleges and collect data on her sales performance among this group. This simple expansion of the population will alleviate the identification problem.

Imposing a functional form assumption to alleviate an identification problem:
Standard practice is to assume a functional form of the determining function that applies for all relevant price levels.
Assume a data-generating process with a linear functional form for the determining function:
\[ \text{Sales}_i = \alpha + \beta\,\text{Price}_i + U_i \]
This assumption imposes a linear shape on the relationship between Sales and Price, and it also dictates how to interpolate and/or extrapolate.
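To make the functional form remedy concrete, the sketch below fits a straight line to observations that exist only in two price ranges and then applies the fitted line at prices with no data. The price and sales numbers are invented for illustration and are not the textbook's rocking chair data; fit_line and predict are hypothetical helper names.

```python
def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical observations: prices only in [210, 225] and [275, 300]
prices = [210, 215, 220, 225, 275, 280, 290, 300]
sales = [62, 60, 59, 56, 41, 40, 36, 33]

alpha, beta = fit_line(prices, sales)


def predict(price):
    """Apply the assumed linear determining function at any price."""
    return alpha + beta * price


if __name__ == "__main__":
    print("interpolated sales at $250:", round(predict(250), 1))
    print("extrapolated sales at $350:", round(predict(350), 1))
```

The predictions at $250 and $350 come entirely from the linearity assumption. If the true relationship bends inside the gap or beyond $300, no amount of data from the observed ranges would reveal it.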
Regression Line for Rocking Chair Sales and Price Data

[Figure: regression line for rocking chair sales and price data.] Here, we are estimating α and β using only data with Price in the ranges ($210, $225) and ($275, $300). We are applying these estimated values across many other price levels: we use them to interpolate between $225 and $275 and to extrapolate all the way to $350.

Variable Co-Movement

Another circumstance in which identification problems typically arise is when there is variable co-movement in the population.
We use the broader term “co-movement” rather than correlation, since simple correlation alone does not encompass all the ways variables may move together in a population that result in identification problems.

Three types of variable co-movement:
  • Perfect multicollinearity
  • Imperfect multicollinearity
  • Endogeneity

Consider the following data-generating process:
\[ Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + U_i \]
Use regression analysis to estimate the parameters α, β1, …, βK.
We have assumed a functional form, so as long as there is some variation in the Xs, there will not be identification problems stemming from voids in the data.
There may still be an identification problem when there is co-movement among the Xs and/or co-movement between one or more X and U.

Perfect multicollinearity is a condition in which two or more independent variables have an exact linear relationship.
If we can write one independent variable as an exact linear function of one or more of the others, there is perfect multicollinearity.
Perfect multicollinearity in our model is equivalent to being able to express one X as an exact linear function of other Xs for all i in the population.
Perfect multicollinearity implies a special type of correlation among two or more independent variables.
Imperfect multicollinearity is a condition in which two or more independent variables have nearly an exact linear relationship.
When this condition exists for a data-generating process, we cannot express one X as an exact linear function of other Xs for all i in the population.
Imperfect multicollinearity is equivalent to there being at least one semi-partial correlation that is “high”, that is, nearly equal to 1.
It is common to characterize a correlation above 0.8 as high.

Endogeneity, in the context of identification problems, involves co-movement between one or more independent variables and the error term in a data-generating process.

Identification Problems

Perfect multicollinearity always leads to an identification problem in regression analysis.
As an example, suppose we believe that Sales of rocking chairs depends not only on Price but also on Distance from the designer’s location.
We follow the data-generating process:
\[ \text{Sales}_i = \alpha + \beta_1 \text{Price}_i + \beta_2 \text{Distance}_i + U_i \]
The population from which we are drawing suffers from perfect multicollinearity, creating an identification problem, particularly for β1 and β2.

Perfect Multicollinearity

The presence of perfect multicollinearity is clear, since we can write one independent variable as a linear function of another for every element in the population:
\[ \text{Price}_i = 200 + 0.04 \times \text{Distance}_i \]
The identification problem comes from the fact that we cannot separately estimate β1 and β2, the marginal effects of Price and Distance on Sales.
The data-generating process becomes:
\[ \text{Sales}_i = \alpha + \beta_1 (200 + 0.04 \times \text{Distance}_i) + \beta_2 \text{Distance}_i + U_i \]
\[ \text{Sales}_i = (\alpha + 200\beta_1) + (0.04\beta_1 + \beta_2)\,\text{Distance}_i + U_i \]

Three ways to detect perfect multicollinearity:
  • A known linear relationship among two or more independent variables
  • Recognize misuse of dummy variables
  • Let the data reveal it
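One way to “let the data reveal it” in the two-variable Price and Distance case is to check whether the squared correlation between the two regressors equals 1 up to rounding. The sketch below assumes hypothetical distances and the pricing rule Price = 200 + 0.04 × Distance; r_squared and is_perfectly_collinear are illustrative names, and the tolerance is an arbitrary choice.

```python
def r_squared(x, y):
    """Squared sample correlation between x and y (R-squared of y on x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy ** 2 / (s_xx * s_yy)


def is_perfectly_collinear(x1, x2, tol=1e-9):
    """True when x1 is (numerically) an exact linear function of x2."""
    return abs(1.0 - r_squared(x1, x2)) < tol


# Hypothetical distances (miles) and prices set exactly by the pricing rule
distance = [100, 400, 900, 1500, 2000, 2500]
price_exact = [200 + 0.04 * d for d in distance]

# Adding other factors (V) breaks the exact relationship
v = [1, -2, 2, -1, 3, -3]
price_noisy = [p + vi for p, vi in zip(price_exact, v)]

if __name__ == "__main__":
    print("exact rule collinear:", is_perfectly_collinear(price_exact, distance))
    print("noisy rule collinear:", is_perfectly_collinear(price_noisy, distance))
```

With more than two regressors the same idea applies to the R-squared from regressing one X on all the others; statistical packages typically signal this case by dropping a variable or reporting a singular matrix.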
Imperfect Multicollinearity

Imperfect multicollinearity does not cause an identification problem, but it can create challenges with inference.
Imperfect multicollinearity can generate inflated p-values and confidence intervals, making it difficult to make any strong inductive arguments about population parameters.
Because there is not an identification problem, these challenges go away with enough data.

Imperfect Multicollinearity: An Example

To illustrate imperfect multicollinearity, suppose Price has a near-perfect linear relationship with Distance:
\[ \text{Price}_i = 200 + 0.04 \times \text{Distance}_i + V_i \]
where \(V_i\) contains other factors such as local fuel costs, etc.
A customer at a Distance of 2,000 miles might have a value for V of 3 and so face a Price of 200 + 0.04 × 2,000 + 3 = $283.
A customer at a Distance of 400 miles might have a value for V of -2 and so face a Price of 200 + 0.04 × 400 - 2 = $214.
Price and Distance have imperfect multicollinearity.

Assume the following data-generating process:
\[ \text{Sales}_i = \alpha + \beta_1 \text{Price}_i + \beta_2 \text{Distance}_i + U_i \]
There is not perfect multicollinearity, so we can get estimates of all the parameters when regressing Sales on Price and Distance.

Ways to check whether there is imperfect multicollinearity, and thus the possibility that this condition is inflating p-values and confidence intervals:
  • Calculate semi-partial correlations among independent variables and check whether they are close to 1
  • Variance inflation factor (VIF)
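The VIF check can be sketched for the two-regressor Price and Distance case, where the VIF for Price is 1 / (1 − R²) and R² comes from regressing Price on Distance. The data below are hypothetical (two of the six points reuse the $283 and $214 customers from the example), and vif_two_regressors is an illustrative helper, not a library function.

```python
def r_squared(x, y):
    """R-squared from a simple regression of y on x (squared correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy ** 2 / (s_xx * s_yy)


def vif_two_regressors(x1, x2):
    """VIF for x1 when x2 is the only other regressor: 1 / (1 - R^2)."""
    return 1.0 / (1.0 - r_squared(x2, x1))


# Hypothetical data: Price = 200 + 0.04 * Distance + V
distance = [100, 400, 900, 1500, 2000, 2500]
v = [1, -2, 2, -1, 3, -3]
price = [200 + 0.04 * d + vi for d, vi in zip(distance, v)]

if __name__ == "__main__":
    print("VIF for Price:", round(vif_two_regressors(price, distance), 1))
```

A VIF of 1 means the other regressor explains none of the variation in Price; values far above 1, as in this made-up sample, indicate that Distance explains almost all of it, so the Price coefficient will be estimated with a lot of noise unless the sample is large.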
Variance Inflation Factor (VIF)

The variance inflation factor (VIF) for an independent variable, say \(X_1\), is equal to \(\frac{1}{1 - R_1^2}\), where \(R_1^2\) is the R-squared from regressing that independent variable (\(X_1\)) on all other independent variables (\(X_2, \dots, X_K\)) for a given determining function.
A higher VIF for a given variable implies more noise (less certainty) in its coefficient estimator.
VIF also tells us how much uncertainty this co-movement in the Xs is injecting into our estimators.

Endogeneity as an Identification Problem

Endogeneity can lead to estimators that are not consistent.
Assume the following data-generating process:
\[ Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + U_i \]
and there is a non-zero correlation between \(X_1\) and U.
This correlation means \(\hat{\beta}_1\) from a regression of Y on \(X_1, \dots, X_K\) need not be consistent.
The inconsistency of \(\hat{\beta}_1\) due to endogeneity is what makes endogeneity an identification problem.

[Figure: Example of Inconsistent Estimator.] Here, \(\hat{\beta}_1\) approaches a number \(c \neq \beta_1\) as the sample gets large.

The Effects of Variable Co-Movement on Identification

For the data-generating process \(Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + U_i\):
  • If there exists an exact linear relationship between at least two of the independent variables (Xs), defined as perfect multicollinearity, then there is an identification problem.
  • In contrast, if there is no exact linear relationship among the Xs, it is always possible to distinguish the effects of the independent variables on the outcome (Y) with any level of precision with sufficient data, even if some Xs exhibit imperfect multicollinearity.
  • If there is correlation between any independent variable and the error term, defined as endogeneity, then there is an identification problem, no matter whether the correlation is via an exact linear relationship or not.
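The inconsistency caused by endogeneity can be illustrated with a simulation. The design below is an assumption made purely for illustration: X = Z + U with Z and U independent standard normals, and Y = 2 + 1·X + U, so the true slope is 1 while Cov(X, U)/Var(X) = 1/2. Under that design the regression slope settles near 1.5 rather than 1, however large the sample.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy / s_xx


def simulate_endogenous_slope(n, beta=1.0, seed=0):
    """Estimate beta when the regressor is correlated with the error term."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        u = rng.gauss(0, 1)  # error term
        xi = z + u           # X moves with U: this is the endogeneity
        x.append(xi)
        y.append(2.0 + beta * xi + u)
    return ols_slope(x, y)


if __name__ == "__main__":
    for n in (100, 20000):
        print(f"N = {n}: estimated slope = {simulate_endogenous_slope(n):.3f}")
```

Collecting more observations tightens the estimate around the wrong number, which is why endogeneity is treated as an identification problem rather than an inference problem.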
Remedies for Identification Problems

For perfect multicollinearity:
  • As long as our goal is to estimate the treatment effect and we have no particular interest in distinguishing the effects of controls, dropping one of the control variables contributing to perfect multicollinearity is an effective remedy.
  • The only viable remedy when the treatment contributes to a perfect multicollinearity problem is to change the population from which you are sampling.

For imperfect multicollinearity:
  • If data are suffering from noisy estimates and VIF calculations suggest imperfect multicollinearity, the simple solution is to gather more data.
  • If the imperfect multicollinearity involves only controls and there is no interest in estimating the effects of the controls per se, then collecting more data will not necessarily be worthwhile.

For endogeneity:
  • The only viable remedy is to change the population from which you are sampling.
  • It does not matter whether the endogeneity involves the treatment or not.
  • Options include: collecting controls, finding a proxy variable(s), finding an instrument(s), and/or transforming cross-sectional data to become a panel.

Identification Damage Control: Signing the Bias

Suppose we have assumed the following data-generating process:
\[ Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + U_i \]
Let \(X_1\) be the treatment and \(X_2, \dots, X_K\) be controls.
Suppose that there is an omitted variable \(X_{K+1}\) that affects Y (and so is part of U) and is correlated with \(X_1\).
The data-generating process can be written as:
\[ Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \beta_{K+1} X_{K+1,i} + V_i \]
Let \(X_{K+1,i} = \hat{\gamma} + \hat{\delta}_1 X_{1i} + \dots + \hat{\delta}_K X_{Ki}\) be the estimated regression equation we get if we were to regress \(X_{K+1}\) on \(X_1, \dots, X_K\).
Within this framework, define \(\beta_{K+1} \times \hat{\delta}_1\) as the omitted variable bias.
Omitted variable bias is the product of the effect of the omitted variable on the outcome (\(\beta_{K+1}\)) and the (semi-partial) correlation between the omitted variable and the treatment (\(\hat{\delta}_1\)).

Since we do not observe the omitted variable, we cannot estimate either of the components of omitted variable bias.
We often can use theory to guide us with regard to the sign of each component. The basic relationship is:
\[ \operatorname{sign}(\beta_{K+1} \times \hat{\delta}_1) = \operatorname{sign}(\beta_{K+1}) \times \operatorname{sign}(\hat{\delta}_1) \]

The four possibilities for the sign of the omitted variable bias are shown in the table below:

Sign of effect of omitted variable    Sign of its relationship with treatment    Sign of omitted variable bias
Positive                              Positive                                   Positive
Positive                              Negative                                   Negative
Negative                              Positive                                   Negative
Negative                              Negative                                   Positive
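The omitted variable bias formula can be checked numerically in the simplest case of one treatment and one omitted variable. In the sketch below all numbers are assumptions chosen for illustration: the omitted variable X2 has a positive effect on Y (coefficient 3) and is positively related to the treatment X1, so the bias should be positive. The short regression leaves X2 out, the long regression includes it, and delta1 is the slope from regressing X2 on X1.

```python
import random


def cross(a, b):
    """Sum of products of deviations from the means of a and b."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def omitted_variable_demo(n=2000, seed=0):
    """Return (b1_short, b1_long, b2_long, delta1) for simulated data."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # Omitted variable, positively related to the treatment
    x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
    y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2

    b1_long = (s22 * s1y - s12 * s2y) / det  # Y on X1 and X2
    b2_long = (s11 * s2y - s12 * s1y) / det
    b1_short = s1y / s11                     # Y on X1 only (X2 omitted)
    delta1 = s12 / s11                       # X2 on X1
    return b1_short, b1_long, b2_long, delta1


if __name__ == "__main__":
    b1_short, b1_long, b2_long, delta1 = omitted_variable_demo()
    print("bias in short regression:", round(b1_short - b1_long, 3))
    print("b2_long * delta1:        ", round(b2_long * delta1, 3))
```

In-sample, the gap between the short and long estimates of the treatment coefficient equals the product of the omitted variable's estimated effect and delta1. In practice the omitted variable is unobserved, so only the sign of each component can be argued from theory.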
