Introduction:
This study predicts a car's price from features such as horsepower, body type, engine size and fuel type. We began with 12 features, including price, then selected the features that affected price most and summarized the result as a linear regression model. We also examined how the features relate to one another.
Final Model:
Price = 31096.61 - 6252.36*Vehicle_type_Car + 204.93*Horsepower - 319.63*Length + 1102.71*Fuel_capacity
Data Description:
Our data consist of 153 observations, each representing a car. Price is the dependent variable; the independent variables include vehicle type, engine size, horsepower, wheelbase, width, length, fuel capacity and fuel efficiency. The ranges and means (M) of the numeric variables are:
Price: 9235 to 85500 (M = 27444)
Engine size: 1 to 8 (M = 3.06)
Horsepower: 55 to 450 (M = 175)
Wheelbase: 82.6 to 138.7 (M = 107.4)
Width: 62.6 to 79.9 (M = 73.1)
Length: 149.4 to 224.5 (M = 187.2)
Fuel capacity: 10.3 to 32 (M = 17.96)
Fuel efficiency: 15 to 45 (M = 23.83)
Power_perf_factor: 23.28 to 188.14 (M = 76.96)
Correlation:
Car price has its highest positive correlations with Power_perf_factor (0.90), horsepower (0.84) and engine size (0.63); see Table 3.
Figure 1: Correlation Heat Map

Figure 2: Distribution Plot

Result:
We fitted two linear regression models for the analysis.
The first model, shown in Table 4, uses all eight predictors: vehicle type, engine size, horsepower, wheelbase, width, length, fuel capacity and fuel efficiency.
R-squared = 0.78, F(8, 144) = 64.25, p < .001.
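The p-values reported for Model 1 in Table 4 can be screened mechanically. The following is a minimal sketch, assuming a 0.05 significance threshold (the text does not state one) and using 2e-16 as an upper bound for the Horsepower p-value, which Table 4 reports only as "< 2e-16". The names `MODEL1_P_VALUES`, `ALPHA` and `significant_predictors` are illustrative and not part of the original analysis.

```python
# p-values copied from the Model 1 coefficient table (Table 4).
# Horsepower is reported only as "< 2e-16", so 2e-16 serves as an upper bound.
MODEL1_P_VALUES = {
    "Vehicle_type_Car": 0.007306,
    "Engine_size": 0.072006,
    "Horsepower": 2e-16,
    "Wheelbase": 0.536035,
    "Width": 0.192207,
    "Length": 0.002260,
    "Fuel_capacity": 0.000103,
    "Fuel_efficiency": 0.728853,
}

ALPHA = 0.05  # assumed threshold; not stated in the report


def significant_predictors(p_values, alpha=ALPHA):
    """Return the predictors whose p-value is below alpha, in table order."""
    return [name for name, p in p_values.items() if p < alpha]


selected = significant_predictors(MODEL1_P_VALUES)
print(selected)
```

Under these assumptions the screen keeps Vehicle_type_Car, Horsepower, Length and Fuel_capacity, and drops Engine_size, Wheelbase, Width and Fuel_efficiency.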
The second model, shown in Table 5, keeps only the predictors that were significant at the 0.05 level in Model 1: vehicle type, horsepower, length and fuel capacity.
R-squared = 0.77, F(4, 148) = 124.1, p < .001.
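As a consistency check, the overall F-statistic and the adjusted R-squared of each model can be recomputed from the multiple R-squared and the degrees of freedom listed in Tables 4 and 5. This is a sketch using the standard formulas for a linear model with an intercept; the function names are illustrative, and small differences from the reported values are expected because the R-squared inputs are rounded to four decimals.

```python
def f_statistic(r_squared, n_predictors, residual_df):
    """Overall F-statistic: explained variance per predictor divided by
    unexplained variance per residual degree of freedom."""
    return (r_squared / n_predictors) / ((1.0 - r_squared) / residual_df)


def adjusted_r_squared(r_squared, n_predictors, residual_df):
    """Adjusted R-squared for a model with an intercept.
    residual_df = n - k - 1, so n - 1 = residual_df + n_predictors."""
    n_minus_1 = residual_df + n_predictors
    return 1.0 - (1.0 - r_squared) * n_minus_1 / residual_df


# (multiple R-squared, number of predictors, residual df) from Tables 4 and 5
models = {
    "Model 1": (0.7812, 8, 144),
    "Model 2": (0.7704, 4, 148),
}

for name, (r2, k, df) in models.items():
    print(name,
          "F =", round(f_statistic(r2, k, df), 2),
          "adjusted R-squared =", round(adjusted_r_squared(r2, k, df), 4))
```

The recomputed values agree with the reported F-statistics (64.25 and 124.1) and adjusted R-squared values (0.769 and 0.7642) up to rounding.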
Comparing the two models, we prefer the second. It explains almost as much variance as the first (R-squared 0.77 versus 0.78) with half as many predictors, its overall F-statistic is larger (124.1 versus 64.25), and every predictor in it is statistically significant at p < .001.
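The comparison above rests on the overall F-statistics. A complementary check, which is not reported in the original analysis, is a partial F-test for nested models: it asks whether the four predictors dropped from Model 1 jointly explain a meaningful amount of extra variance. The sketch below reconstructs each residual sum of squares from the residual standard errors and degrees of freedom in Tables 4 and 5; the function names are illustrative.

```python
def residual_sum_of_squares(residual_std_error, residual_df):
    """RSS recovered from R's 'Residual standard error' line:
    RSE = sqrt(RSS / df), so RSS = RSE**2 * df."""
    return residual_std_error ** 2 * residual_df


def partial_f(rss_reduced, df_reduced, rss_full, df_full):
    """Partial F-statistic for a reduced model nested in a full model."""
    n_dropped = df_reduced - df_full  # number of predictors removed
    return ((rss_reduced - rss_full) / n_dropped) / (rss_full / df_full)


# Residual standard errors and degrees of freedom from Tables 4 and 5
rss_full = residual_sum_of_squares(6939, 144)     # Model 1, eight predictors
rss_reduced = residual_sum_of_squares(7011, 148)  # Model 2, four predictors

f_value = partial_f(rss_reduced, 148, rss_full, 144)
print(round(f_value, 2))
```

With these inputs the statistic comes out at about 1.8 on 4 and 144 degrees of freedom, below the 5% critical value of roughly 2.4. That is consistent with preferring the smaller model, although the inputs are rounded and the exact p-value would need the fitted models themselves.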
Final Model:
Price = 31096.61 - 6252.36*Vehicle_type_Car + 204.93*Horsepower - 319.63*Length + 1102.71*Fuel_capacity
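To illustrate how the final model is applied, the sketch below evaluates it with the Model 2 coefficients from Table 5. The input car is hypothetical: it uses the sample means from the Data Description (horsepower 175, length 187.2, fuel capacity 17.96), and it assumes that Vehicle_type_Car is a 0/1 dummy equal to 1 for the "Car" vehicle type and 0 for "Passenger". The names `COEFFICIENTS` and `predict_price` are illustrative.

```python
# Model 2 coefficients as reported in Table 5
COEFFICIENTS = {
    "intercept": 31096.61,
    "Vehicle_type_Car": -6252.36,
    "Horsepower": 204.93,
    "Length": -319.63,
    "Fuel_capacity": 1102.71,
}


def predict_price(vehicle_type_car, horsepower, length, fuel_capacity):
    """Predicted price from the Model 2 equation.
    vehicle_type_car is assumed to be a 0/1 dummy (1 = 'Car' vehicle type)."""
    c = COEFFICIENTS
    return (c["intercept"]
            + c["Vehicle_type_Car"] * vehicle_type_car
            + c["Horsepower"] * horsepower
            + c["Length"] * length
            + c["Fuel_capacity"] * fuel_capacity)


# Hypothetical car at the sample means, once for each vehicle type
passenger = predict_price(0, 175, 187.2, 17.96)
car = predict_price(1, 175, 187.2, 17.96)
print(round(passenger, 2), round(car, 2))
```

For this hypothetical input the prediction is about 26,929 for the "Passenger" type and about 20,677 for the "Car" type; the difference equals the Vehicle_type_Car coefficient.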
Appendix:
Table 1: Attribute Information:
    Attribute:       Attribute Range:
    ---------------  -----------------------------------------------
 1. Manufacturer:    alfa-romero, audi, bmw, chevrolet, dodge, honda,
                     isuzu, jaguar, mazda, mercedes-benz, mercury,
                     mitsubishi, nissan, peugot, plymouth, porsche,
                     renault, saab, subaru, toyota, volkswagen, volvo
 2. wheel-base:      continuous from 86.6 to 120.9.
 3. length:          continuous from 141.1 to 208.1.
 4. width:           continuous from 60.3 to 72.3.
 5. engine-type:     dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
 6. engine-size:     continuous from 61 to 326.
 7. horsepower:      continuous from 48 to 288.
 8. peak-rpm:        continuous from 4150 to 6600.
 9. Vehicle_type:    Passenger and Car.
10. price:           continuous from 5118 to 45400.
Table 2: Descriptive Statistics
Table 3: Correlation Matrix
    Variable                 Correlation with Price
 1  Sales_in_thousands       -0.30
 2  Vehicle_type_Car         -0.05
 3  Vehicle_type_Passenger    0.05
 4  Price_in_thousands        1.00
 5  Engine_size               0.63
 6  Horsepower                0.84
 7  Wheelbase                 0.11
 8  Width                     0.33
 9  Length                    0.16
10  Fuel_capacity             0.42
11  Fuel_efficiency          -0.49
12  Power_perf_factor         0.90
13  Price                     1.00
Table 4: Regression Model 1 result:
lm(formula = Price ~ Vehicle_type_Car + Engine_size + Horsepower + Wheelbase + Width + Length + Fuel_capacity + Fuel_efficiency, data = df)
Residuals:
      Min        1Q    Median        3Q       Max
 -17684.5   -3123.0    -449.4    1947.6   29432.0
Coefficients:
                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        41829.16    18759.15    2.230  0.027308 *
Vehicle_type_Car   -5897.94     2167.36   -2.721  0.007306 **
Engine_size        -2352.35     1297.91   -1.812  0.072006 .
Horsepower           239.26       21.28   11.244   < 2e-16 ***
Wheelbase            101.21      163.17    0.620  0.536035
Width               -371.07      283.21   -1.310  0.192207
Length              -285.72       91.89   -3.109  0.002260 **
Fuel_capacity       1183.28      296.23    3.995  0.000103 ***
Fuel_efficiency      -95.41      274.71   -0.347  0.728853
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6939 on 144 degrees of freedom
Multiple R-squared: 0.7812, Adjusted R-squared: 0.769
F-statistic: 64.25 on 8 and 144 DF, p-value: < 2.2e-16
Table 5: Regression Model 2 result:
lm(formula = Price ~ Vehicle_type_Car + Horsepower + Length + Fuel_capacity, data = df)
Residuals:
      Min        1Q    Median        3Q       Max
 -19715.6   -3480.9    -431.2    2250.7   29771.9
Coefficients:
                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        31096.61     8294.27    3.749  0.000254 ***
Vehicle_type_Car   -6252.36     1836.62   -3.404  0.000854 ***
Horsepower           204.93       12.69   16.147   < 2e-16 ***
Length              -319.63       53.81   -5.940  1.95e-08 ***
Fuel_capacity       1102.71      267.02    4.130  6.05e-05 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7011 on 148 degrees of freedom
Multiple R-squared: 0.7704, Adjusted R-squared: 0.7642
F-statistic: 124.1 on 4 and 148 DF, p-value: < 2.2e-16