# Car Price Prediction Analysis

Introduction:
We carried out this study to predict a car's price from features such as horsepower, body type, engine size and fuel type. We began with 12 features, including price. Based on our analysis, we selected the features that affect price most strongly and summarised the result as a linear regression model. We also examined how the features are correlated with one another.

Final Model:
`Price = -6252.36*Vehicle_type_Car + 204.93*Horsepower - 319.63*Length + 1102.71*Fuel_capacity + 31096.61`

Data Description:

Our data consist of 153 observations, each representing one car. The dependent variable is price. The independent variables include vehicle type, horsepower, width, length, fuel capacity and fuel efficiency, among others. The ranges and means (M) of the numeric variables are:

| Variable | Range | M |
|---|---|---|
| Price | 9235 to 85500 | 27444 |
| Engine size | 1 to 8 | 3.06 |
| Horsepower | 55 to 450 | 175 |
| Wheelbase | 82.6 to 138.7 | 107.4 |
| Width | 62.6 to 79.9 | 73.1 |
| Length | 149.4 to 224.5 | 187.2 |
| Fuel capacity | 10.3 to 32 | 17.96 |
| Fuel efficiency | 15 to 45 | 23.83 |
| Power_perf_factor | 23.28 to 188.14 | 76.96 |
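The ranges and means above are simple per-column summaries. The following is a minimal Python sketch of how one column could be summarised. It is not the original code: the helper name `describe` and the convention that missing values are stored as `None` are assumptions.

```python
from statistics import fmean


def describe(values):
    """Return (minimum, maximum, mean) of one numeric column.

    Missing entries (None) are skipped for this column only.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-missing values")
    return min(present), max(present), fmean(present)


# Toy column (hypothetical values, not the real data)
print(describe([55, None, 175, 450, 120]))  # (55, 450, 200.0)
```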

Correlation:

Price has its strongest positive correlations with Power_perf_factor (r = 0.90), horsepower (r = 0.84) and engine size (r = 0.63). Its strongest negative correlation is with fuel efficiency (r = -0.49).
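Table 3 does not state which correlation coefficient was used. Assuming it is Pearson's r (the default of pandas `DataFrame.corr()`), the ranking above can be reproduced with a small sketch. The function names and the toy columns are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(columns, target):
    """Correlate every column with `target`, strongest (largest |r|) first."""
    y = columns[target]
    scores = {name: pearson(x, y) for name, x in columns.items() if name != target}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy columns (hypothetical values, not the real data)
columns = {
    "Price": [10, 20, 30, 40],
    "Width": [1, 3, 2, 4],
    "Fuel_efficiency": [5, 4, 2, 1],
}
for name, r in rank_by_correlation(columns, "Price"):
    print(f"{name}: {r:.2f}")  # Fuel_efficiency: -0.99, then Width: 0.80
```

Ranking by absolute value keeps strong negative correlations, such as the one with fuel efficiency, visible next to the positive ones.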

Figure 1: Correlation Heat Map

Figure 2: Distribution Plot

Results:

We fitted two linear regression models:

In the first model (Table 4), we used eight predictors: vehicle type, engine size, horsepower, wheelbase, width, length, fuel capacity and fuel efficiency.

R-squared = 0.78; F(8, 144) = 64.25, p < .001.
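The models were fitted with R's `lm()` (Tables 4 and 5). The same ordinary least squares estimate can be sketched in Python with NumPy. This illustration is not the authors' code and runs on hypothetical toy data. It assumes `Vehicle_type_Car` is coded 1 for "Car" and 0 for "Passenger". It returns only point estimates, not the standard errors or p-values that `lm()` reports.

```python
import numpy as np


def encode_vehicle_type(vehicle_types):
    """Dummy-code Vehicle_type: 1.0 for "Car", 0.0 for anything else.

    Only one dummy enters the model. Using both Vehicle_type_Car and
    Vehicle_type_Passenger together with an intercept would make the
    design matrix perfectly collinear.
    """
    return np.array([1.0 if t == "Car" else 0.0 for t in vehicle_types])


def fit_ols(X, y):
    """Ordinary least squares; returns [intercept, b1, ..., bp]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical toy data generated exactly from price = 2 + 3*x1 - 1*car
vehicle = ["Passenger", "Car", "Passenger", "Car", "Passenger"]
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
car = encode_vehicle_type(vehicle)
price = 2 + 3 * x1 - 1 * car
print(fit_ols(np.column_stack([x1, car]), price))  # approximately [2, 3, -1]
```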

In the second model (Table 5), we kept only the predictors that were significant at the 5% level in the first model: vehicle type, horsepower, length and fuel capacity.
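The selection step for the second model keeps the first-model predictors whose p-value in Table 4 is below 0.05. The sketch below uses the p-values copied from Table 4. The function name is hypothetical, and the reported "< 2e-16" is entered as 2e-16.

```python
# p-values of the Model 1 predictors, copied from Table 4 (intercept excluded).
# Horsepower is reported as "< 2e-16"; 2e-16 is used as an upper bound.
MODEL1_PVALUES = {
    "Vehicle_type_Car": 0.007306,
    "Engine_size": 0.072006,
    "Horsepower": 2e-16,
    "Wheelbase": 0.536035,
    "Width": 0.192207,
    "Length": 0.002260,
    "Fuel_capacity": 0.000103,
    "Fuel_efficiency": 0.728853,
}


def significant_predictors(pvalues, alpha=0.05):
    """Names of predictors whose p-value is below alpha, in input order."""
    return [name for name, p in pvalues.items() if p < alpha]


print(significant_predictors(MODEL1_PVALUES))
# ['Vehicle_type_Car', 'Horsepower', 'Length', 'Fuel_capacity']
```

Dropping several predictors at once based on individual p-values is a heuristic. A predictor's p-value can change once other, correlated predictors are removed, so a joint test of the dropped terms is a useful complement.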

R-squared = 0.77; F(4, 148) = 124.1, p < .001.
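Both reported F-statistics follow from R-squared, the number of observations n and the number of predictors p. The first model has 144 residual degrees of freedom, eight predictors and an intercept, which implies n = 144 + 8 + 1 = 153. The second model gives the same n (148 + 4 + 1). The sketch below recomputes the F-statistics and the adjusted R-squared; the function names are illustrative.

```python
def overall_f(r2, n, p):
    """Overall F-statistic of a regression with p predictors plus an intercept."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


def adjusted_r2(r2, n, p):
    """Adjusted R-squared: penalises R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


N_OBS = 144 + 8 + 1  # residual df + predictors + intercept = 153

for label, r2, p in [("Model 1", 0.7812, 8), ("Model 2", 0.7704, 4)]:
    print(label, round(overall_f(r2, N_OBS, p), 2), round(adjusted_r2(r2, N_OBS, p), 3))
```

R-squared is rounded to four decimals in the output, so the recomputed F-statistics (about 64.27 and 124.15) differ slightly from the reported 64.25 and 124.1. Table 5 does not report an adjusted R-squared. The formula gives about 0.764 for the second model, against 0.769 for the first.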

The second model has a larger F-statistic than the first (124.1 versus 64.25). It also explains almost the same share of the variance in price with half as many predictors (R-squared = 0.770 versus 0.781). In addition, every predictor in the second model is statistically significant at p < .001. We therefore prefer the second model.
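The second model is nested in the first, since its predictors are a subset. The two models can therefore also be compared with a partial F-test of the four dropped predictors. Each model's residual sum of squares can be recovered from the printed output as RSS = (residual standard error)² × residual df. The sketch below uses the values from Tables 4 and 5; the function name is hypothetical.

```python
def nested_f_test(rse_full, df_full, rse_reduced, df_reduced):
    """Partial F-statistic for dropping predictors from a nested linear model.

    rse_* are residual standard errors and df_* residual degrees of freedom,
    as printed by R's summary() of an lm fit.
    """
    q = df_reduced - df_full  # number of predictors dropped
    if q <= 0:
        raise ValueError("reduced model must have more residual df than the full model")
    rss_full = rse_full ** 2 * df_full  # RSS = RSE^2 * residual df
    rss_reduced = rse_reduced ** 2 * df_reduced
    return ((rss_reduced - rss_full) / q) / (rss_full / df_full)


# Model 1 (full): RSE 6939 on 144 df; Model 2 (reduced): RSE 7011 on 148 df
f_stat = nested_f_test(6939, 144, 7011, 148)
print(round(f_stat, 2))  # 1.77
```

The result is about 1.77 on 4 and 144 degrees of freedom, computed from the rounded standard errors. This is below the 5% critical value of roughly 2.4, so the four dropped predictors do not jointly improve the fit significantly at that level. This supports choosing the smaller model. In R, `anova(model2, model1)` runs the same test directly; the object names here are hypothetical names for the two fitted models.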

Final Model:
`Price = -6252.36*Vehicle_type_Car + 204.93*Horsepower - 319.63*Length + 1102.71*Fuel_capacity + 31096.61`
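The fitted equation can be evaluated directly using the Model 2 coefficients from Table 5. The function name in this minimal Python sketch is hypothetical. The example inputs are the sample means of horsepower, length and fuel capacity from the data description, for a vehicle with `Vehicle_type_Car = 0`.

```python
# Model 2 coefficients, copied from Table 5
COEFFICIENTS = {
    "Intercept": 31096.61,
    "Vehicle_type_Car": -6252.36,
    "Horsepower": 204.93,
    "Length": -319.63,
    "Fuel_capacity": 1102.71,
}


def predict_price(vehicle_type_car, horsepower, length, fuel_capacity):
    """Point prediction of Price from the fitted Model 2 equation."""
    c = COEFFICIENTS
    return (
        c["Intercept"]
        + c["Vehicle_type_Car"] * vehicle_type_car
        + c["Horsepower"] * horsepower
        + c["Length"] * length
        + c["Fuel_capacity"] * fuel_capacity
    )


print(round(predict_price(0, 175, 187.2, 17.96), 2))  # 26929.3
```

This gives a point prediction only. For a prediction interval, R's `predict()` on the fitted `lm` object accepts `interval = "prediction"`.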

Appendix:

Table 1: Attribute Information:

| # | Attribute | Attribute Range |
|---|---|---|
| 1 | Manufacturers | alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar, mazda, mercedes-benz, mercury, mitsubishi, nissan, peugot, plymouth, porsche, renault, saab, subaru, toyota, volkswagen, volvo |
| 2 | wheel-base | continuous from 86.6 to 120.9 |
| 3 | length | continuous from 141.1 to 208.1 |
| 4 | width | continuous from 60.3 to 72.3 |
| 5 | engine-type | dohc, dohcv, l, ohc, ohcf, ohcv, rotor |
| 6 | engine-size | continuous from 61 to 326 |
| 7 | horsepower | continuous from 48 to 288 |
| 8 | peak-rpm | continuous from 4150 to 6600 |
| 9 | Vehicle_type | Passenger and Car |
| 10 | price | continuous from 5118 to 45400 |

Table 2: Descriptive Statistics

Table 3: Correlation Matrix

| # | Variable | Correlation with Price |
|---|---|---|
| 1 | Sales_in_thousands | -0.30 |
| 2 | Vehicle_type_Car | -0.05 |
| 3 | Vehicle_type_Passenger | 0.05 |
| 4 | Price_in_thousands | 1.00 |
| 5 | Engine_size | 0.63 |
| 6 | Horsepower | 0.84 |
| 7 | Wheelbase | 0.11 |
| 8 | Width | 0.33 |
| 9 | Length | 0.16 |
| 10 | Fuel_capacity | 0.42 |
| 11 | Fuel_efficiency | -0.49 |
| 12 | Power_perf_factor | 0.90 |
| 13 | Price | 1.00 |

Table 4: Regression Model 1 result:

`lm(formula = Price ~ Vehicle_type_Car + Engine_size + Horsepower + Wheelbase + Width + Length + Fuel_capacity + Fuel_efficiency, data = df)`

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -17684.5 | -3123.0 | -449.4 | 1947.6 | 29432.0 |

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 41829.16 | 18759.15 | 2.230 | 0.027308 | * |
| Vehicle_type_Car | -5897.94 | 2167.36 | -2.721 | 0.007306 | ** |
| Engine_size | -2352.35 | 1297.91 | -1.812 | 0.072006 | . |
| Horsepower | 239.26 | 21.28 | 11.244 | < 2e-16 | *** |
| Wheelbase | 101.21 | 163.17 | 0.620 | 0.536035 | |
| Width | -371.07 | 283.21 | -1.310 | 0.192207 | |
| Length | -285.72 | 91.89 | -3.109 | 0.002260 | ** |
| Fuel_capacity | 1183.28 | 296.23 | 3.995 | 0.000103 | *** |
| Fuel_efficiency | -95.41 | 274.71 | -0.347 | 0.728853 | |

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6939 on 144 degrees of freedom

Multiple R-squared: 0.7812, Adjusted R-squared: 0.769

F-statistic: 64.25 on 8 and 144 DF, p-value: < 2.2e-16

Table 5: Regression Model 2 result:

`lm(formula = Price ~ Vehicle_type_Car + Horsepower + Length + Fuel_capacity, data = df)`

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -19715.6 | -3480.9 | -431.2 | 2250.7 | 29771.9 |

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 31096.61 | 8294.27 | 3.749 | 0.000254 | *** |
| Vehicle_type_Car | -6252.36 | 1836.62 | -3.404 | 0.000854 | *** |
| Horsepower | 204.93 | 12.69 | 16.147 | < 2e-16 | *** |
| Length | -319.63 | 53.81 | -5.940 | 1.95e-08 | *** |
| Fuel_capacity | 1102.71 | 267.02 | 4.130 | 6.05e-05 | *** |

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7011 on 148 degrees of freedom

Multiple R-squared: 0.7704

F-statistic: 124.1 on 4 and 148 DF, p-value: < 2.2e-16

June 26, 2022

June 22, 2022

June 18, 2022