In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?

3 years ago

Manpreet Singh

Answers (2)

manpreet Best Answer 3 years ago

Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?

manpreet 3 years ago

I always hesitate to jump into a thread with as many excellent responses as this, but it strikes me that few of the answers provide any reason to prefer the logarithm to some other transformation that "squashes" the data, such as a root or reciprocal.

Before getting to that, let's recapitulate the wisdom in the existing answers in a more general way. Some non-linear re-expression of the dependent variable is indicated when any of the following apply:

The residuals have a skewed distribution. The purpose of a transformation is to obtain residuals that are approximately symmetrically distributed (about zero, of course).
The spread of the residuals changes systematically with the values of the dependent variable ("heteroscedasticity"). The purpose of the transformation is to remove that systematic change in spread, achieving approximate "homoscedasticity."
To linearize a relationship.
When scientific theory indicates. For example, chemistry often suggests expressing concentrations as logarithms (giving activities or even the well-known pH).
When a more nebulous statistical theory suggests the residuals reflect "random errors" that do not accumulate additively.
To simplify a model. For example, sometimes a logarithm can simplify the number and complexity of "interaction" terms.

(These indications can conflict with one another; in such cases, judgment is needed.)
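As a rough illustration of the first two indications, here is a minimal Python sketch (standard library only) that fits a straight line and then checks the residuals for skewness and for a systematic change in spread. The synthetic data, the helper names (`fit_line`, `skewness`, `spread_ratio`), and the half-split used to compare spreads are all assumptions made for illustration; none of them come from the answers in this thread.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def skewness(values):
    """Moment-based sample skewness (near 0 for a symmetric sample)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def spread_ratio(fitted, resid):
    """SD of residuals in the upper half of fitted values over the SD in the lower half."""
    pairs = sorted(zip(fitted, resid))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return statistics.pstdev(high) / statistics.pstdev(low)


# Synthetic data (an assumption): exponential trend with multiplicative noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(2000)]
y = [math.exp(0.5 + 0.3 * xi + rng.gauss(0, 0.5)) for xi in x]

# Fit a straight line to the untransformed response and inspect the residuals.
a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

resid_skew = skewness(resid)
resid_spread = spread_ratio(fitted, resid)
print(f"residual skewness: {resid_skew:.2f}")
print(f"spread ratio (high fitted / low fitted): {resid_spread:.2f}")
```

A skewness well above zero and a spread ratio well above one are the two symptoms described above. Neither number says which re-expression to use; they only suggest that one is worth trying.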

So, when is a logarithm specifically indicated instead of some other transformation?

The residuals have a "strongly" positively skewed distribution. In his book on EDA, John Tukey provides quantitative ways to estimate the transformation (within the family of Box-Cox, or power, transformations) based on rank statistics of the residuals. It really comes down to the fact that if taking the log symmetrizes the residuals, it was probably the right form of re-expression; otherwise, some other re-expression is needed.
When the SD of the residuals is directly proportional to the fitted values (and not to some power of the fitted values).
When the relationship is close to exponential.
When residuals are believed to reflect multiplicatively accumulating errors.
When you really want a model in which marginal changes in the explanatory variables are interpreted as multiplicative (percentage) changes in the dependent variable.
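To make the "log versus some other squashing transformation" comparison concrete, here is a small sketch that tries several members of the Box-Cox (power) family and reports how symmetric the residuals are under each one. It is not Tukey's rank-based procedure; it simply compares residual skewness. The synthetic data (close to exponential, with multiplicative errors), the candidate powers, and the helper names are assumptions made for illustration.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def skewness(values):
    """Moment-based sample skewness (near 0 for a symmetric sample)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def box_cox(y, p):
    """Power re-expression of a positive value; p = 0 is the logarithm."""
    return math.log(y) if p == 0 else (y ** p - 1) / p


# Synthetic data (an assumption): exponential trend, multiplicative errors.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(2000)]
y = [math.exp(0.5 + 0.3 * xi + rng.gauss(0, 0.5)) for xi in x]

# Powers tried: 1 = untransformed, 0.5 = root, 0 = log, -1 = reciprocal.
skew_by_power = {}
for p in (1, 0.5, 0, -1):
    z = [box_cox(yi, p) for yi in y]
    a, b = fit_line(x, z)
    resid = [zi - (a + b * xi) for xi, zi in zip(x, z)]
    skew_by_power[p] = skewness(resid)
    print(f"power {p:>4}: residual skewness = {skew_by_power[p]:.2f}")

# The power whose residuals are closest to symmetric.
best_power = min(skew_by_power, key=lambda p: abs(skew_by_power[p]))
print("most symmetric residuals at power", best_power)

# Under the log model, the slope reads as a percentage change in y per unit of x.
a_log, b_log = fit_line(x, [math.log(yi) for yi in y])
pct_change = 100 * (math.exp(b_log) - 1)
print(f"one extra unit of x multiplies y by about {math.exp(b_log):.2f} "
      f"({pct_change:.0f}% change)")
```

The last lines illustrate the interpretation point: with log(y) = a + b*x, a one-unit change in x multiplies y by exp(b), which is where the percentage reading comes from.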

Finally, some non-reasons to use a re-expression:

Making outliers not look like outliers. An outlier is a datum that does not fit some parsimonious, relatively simple description of the data. Changing one's description in order to make outliers look better is usually an incorrect reversal of priorities: first obtain a scientifically valid, statistically good description of the data and then explore any outliers. Don't let the occasional outlier determine how to describe the rest of the data!
Because the software automatically did it. (Enough said!)
Because all the data are positive. (Positivity often implies positive skewness, but it does not have to. Furthermore, other transformations can work better. For example, a root often works best with counted data.)
To make "bad" data (perhaps of low quality) appear well behaved.
To be able to plot the data. (If a transformation is needed to be able to plot the data, it's probably needed for one or more good reasons already mentioned. If the only reason for the transformation truly is for plotting, go ahead and do it--but only to plot the data. Leave the data untransformed for analysis.)
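On the remark above that a root often works best with counted data, here is a minimal simulation sketch. It assumes the counts are Poisson (so the variance equals the mean), which is an assumption of this example rather than something stated in the thread; the sampler, the chosen means, and the sample size are likewise illustrative.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


rng = random.Random(2)
means = (20, 80, 320)  # illustrative means, large enough that zero counts are not expected
sd_raw, sd_root, sd_log = {}, {}, {}

for lam in means:
    counts = [poisson(rng, lam) for _ in range(3000)]
    sd_raw[lam] = statistics.pstdev(counts)
    sd_root[lam] = statistics.pstdev([math.sqrt(c) for c in counts])
    sd_log[lam] = statistics.pstdev([math.log(c) for c in counts])
    print(f"mean {lam:>3}: SD raw = {sd_raw[lam]:.2f}, "
          f"SD root = {sd_root[lam]:.3f}, SD log = {sd_log[lam]:.3f}")
```

Under these assumptions the spread of the raw counts grows with the mean, the spread of the roots stays roughly constant, and the spread of the logs shrinks as the mean grows, so the log over-corrects. Real count data are often more dispersed than Poisson, in which case the comparison can come out differently.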
