Difference between revisions of "Team:AHUT China/Model"


Revision as of 07:22, 12 October 2018


    At present, our project is still at the laboratory stage and has not yet reached large-scale application. To find suitable scenarios for large-scale application, and with the help of our instructor group, we held a seminar with the Astronautics Model Team of Anhui University of Technology to develop a social practice program.

    We got in touch with a factory in Maanshan (Maanshan Steel Plant), and the model team provided a six-rotor drone, as shown in Figure 1:




Figure 1 Six-rotor drone




    Zhao Lei, a member of our team who has studied embedded programming, used several gas sensors (which measure the mass of carbon dioxide, sulfur dioxide and other gases per liter of air) and an STM32 ARM microcontroller to develop a carbon dioxide detection device that can be mounted on the drone, as shown in Figure 2:




Figure 2 Gas detector




    We used the drone to carry the detection device into the air and sampled the air near the outlet of the factory's exhaust system. The concentration of carbon dioxide near the smoke extraction device was extremely high, 5 to 20 times the normal value. According to the eleven national standards for greenhouse gas management, which include Greenhouse Gas Accounting and Reporting for Industrial Enterprises, the mass ratio of the gases in the exhaust from the production process of such factories is approximately oxygen : carbon dioxide : sulfur dioxide : hydrogen sulfide : carbon monoxide : hydrogen chloride : fluoride : nitrogen oxides : other = 14 : 10 : 3 : 3 : 3 : 2 : 3 : 8 : 54. After several measurements and averaging, we obtained the composition, content and mass percentage of each gas in the factory exhaust gas:



Gas name             Content under standard conditions (mg/L)    Mass percentage
Carbon dioxide       123.0025                                    10.2599%
Oxygen               155.1683                                    12.9429%
Sulfur dioxide       48.5526                                     4.0499%
Hydrogen sulfide     56.8314                                     4.7404%
Carbon monoxide      38.2593                                     3.1913%
Hydrogen chloride    25.9654                                     2.1658%
Fluoride             46.9342                                     3.9149%
Nitrogen oxides      96.2349                                     8.0272%
Other                607.9167                                    50.7077%
Total                1198.8653                                   100.0000%
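
    The mass percentages follow directly from the measured contents: each content is divided by the total. The short Python sketch below recomputes them as a consistency check. The variable and function names are illustrative and not part of the original analysis; only the numeric contents come from the measurements above.

```python
# Measured contents in mg/L, copied from the exhaust gas measurements above.
content_mg_per_l = {
    "carbon dioxide": 123.0025,
    "oxygen": 155.1683,
    "sulfur dioxide": 48.5526,
    "hydrogen sulfide": 56.8314,
    "carbon monoxide": 38.2593,
    "hydrogen chloride": 25.9654,
    "fluoride": 46.9342,
    "nitrogen oxides": 96.2349,
    "other": 607.9167,
}


def mass_percentages(contents):
    """Return each gas's share of the total content, in percent."""
    total = sum(contents.values())
    return {gas: 100.0 * value / total for gas, value in contents.items()}


total_content = sum(content_mg_per_l.values())
percentages = mass_percentages(content_mg_per_l)

print(f"total content: {total_content:.4f} mg/L")
for gas, pct in percentages.items():
    print(f"{gas:18s} {pct:8.4f}%")
```

    Recomputed this way, the hydrogen sulfide share is about 4.74%, and the nine shares add up to 100%.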



    The measured gas contents agree with the literature data. Following the proportions of gases in the exhaust gas reported in the literature, we ran a simulation in the laboratory: a gas mixture in these proportions was prepared manually and bubbled through water to produce an unsaturated solution. The amount of carbon dioxide in the solution was measured repeatedly as a function of the bubbling time, and these measurements are the data source for our mathematical modeling.

    First, a correlation analysis of the experimental data in MATLAB showed that the color readings (five dimensions: B, G, R, H, S) are linearly correlated, to a certain extent, with the concentration of carbon dioxide. This is consistent with the literature [1], which derives the same conclusion from the Beer-Lambert absorption law: there is a definite relationship between the concentration of a substance and the color reading. Second, we applied multiple regression to the data to obtain the relationship between the concentration and the five-dimensional color reading, and took the resulting mathematical expression (the mathematical model) as the empirical formula, or regression equation.

    The mathematical model relating color readings to carbon dioxide concentration was first taken to be a linear regression equation. For this linear regression model between carbon dioxide concentration and color reading, the residuals are large and the fit is poor.

    We therefore considered a nonlinear quadratic regression model, built with the rstool function of the MATLAB statistics toolbox, and evaluated the models by their residual standard deviation and residuals. For the final quadratic regression model the residual standard deviation is small, the predictions are very good, and the residuals are about an order of magnitude smaller than those of the multiple linear regression model. The quadratic regression model is therefore better than the linear regression model, and the comparison of the errors of the two models shows that the quadratic regression equation has higher precision.
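
    As a rough illustration of such a correlation analysis, the Python sketch below computes the Pearson correlation coefficient between the concentration and each color channel, using the 25 experimental data points listed in Table 1. This is not the team's MATLAB code; the function and variable names are assumptions made for the example.

```python
from math import sqrt

# (concentration mg/L, B, G, R, H, S), copied from Table 1.
ROWS = [
    (0, 153, 148, 157, 138, 14), (0, 153, 147, 157, 138, 16),
    (0, 153, 146, 158, 137, 20), (0, 153, 146, 158, 137, 20),
    (0, 154, 145, 157, 141, 19), (20, 144, 115, 170, 135, 82),
    (20, 144, 115, 169, 136, 81), (20, 145, 115, 172, 135, 83),
    (30, 145, 114, 174, 135, 87), (30, 145, 114, 176, 135, 89),
    (30, 145, 114, 175, 135, 89), (30, 146, 114, 175, 135, 88),
    (50, 142, 99, 175, 137, 110), (50, 141, 99, 174, 137, 109),
    (50, 142, 99, 176, 136, 110), (80, 141, 96, 181, 135, 119),
    (80, 141, 96, 182, 135, 119), (80, 140, 96, 182, 135, 120),
    (100, 139, 96, 175, 136, 115), (100, 139, 96, 174, 136, 114),
    (100, 139, 96, 176, 136, 116), (150, 139, 86, 178, 136, 131),
    (150, 139, 87, 177, 137, 129), (150, 138, 86, 177, 137, 130),
    (150, 139, 86, 178, 137, 131),
]
CHANNELS = ("B", "G", "R", "H", "S")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


concentration = [row[0] for row in ROWS]
correlations = {
    name: pearson([row[i + 1] for row in ROWS], concentration)
    for i, name in enumerate(CHANNELS)
}

for name, r in correlations.items():
    print(f"{name}: r = {r:+.3f}")
```

    On these data the G channel is strongly negatively correlated with concentration and the S channel strongly positively correlated, which matches the trend visible in the table.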







   Model establishment and solution:


    Following the previous analysis, we first established a linear regression model, which matches the structure of the problem. Applying linear regression in MATLAB to the experimental data (i.e., Table 1), we obtained a linear regression equation between carbon dioxide concentration and color reading.


   1. Multiple linear regression model

    Using multiple linear regression (see Appendix 1 for the code), we plotted the residuals (see Figure 3). As the residual plot shows, except for the 15th data point, the residuals are relatively close to zero and their confidence intervals contain zero, which indicates that the regression model fits the original data reasonably well; the 15th data point can be regarded as an outlier and was removed. After removing it, the multiple linear regression was performed again to obtain a new residual plot (see Figure 4), the significance test indicators of the regression equation, and the individual residual values (both tabulated below). The coefficient of determination is R^2 = 0.9250310882931, so the regression equation explains most of the variation in the data. In the F test, the probability p corresponding to F satisfies p < α, so H0 is rejected and the regression model (VIII) is established. However, the estimated error variance is too large.
$y = 2910.630153554265 + 3.587352490846 x_1 - 21.155917919245 x_2 + 4.796418968805 x_3 - 6.750902382498 x_4 - 10.532016102969 x_5$   (Ⅷ)
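
    To make model (VIII) concrete, the following Python sketch evaluates it for two of the measured color readings and computes the residual (observed concentration minus predicted concentration). It assumes that x1 to x5 stand for B, G, R, H, S, as stated for model (IX); the function name is hypothetical.

```python
# Coefficients of model (VIII), copied from the equation above.
INTERCEPT = 2910.630153554265
COEFFS = {
    "B": 3.587352490846,
    "G": -21.155917919245,
    "R": 4.796418968805,
    "H": -6.750902382498,
    "S": -10.532016102969,
}


def predict_viii(b, g, r, h, s):
    """Predicted CO2 concentration (mg/L) from model (VIII)."""
    reading = {"B": b, "G": g, "R": r, "H": h, "S": s}
    return INTERCEPT + sum(COEFFS[name] * reading[name] for name in COEFFS)


# First data row: concentration 0 mg/L, reading (153, 148, 157, 138, 14).
pred_blank = predict_viii(153, 148, 157, 138, 14)
residual_blank = 0 - pred_blank

# 13th data row: concentration 50 mg/L, reading (142, 99, 175, 137, 110).
pred_50 = predict_viii(142, 99, 175, 137, 110)
residual_50 = 50 - pred_50

print(f"blank: predicted {pred_blank:.3f}, residual {residual_blank:.3f}")
print(f"50 mg/L: predicted {pred_50:.3f}, residual {residual_50:.3f}")
```

    The two residuals come out at about -2.38 and -31.58, which agrees with the corresponding entries of the linear-model residual table and illustrates how large the error of the linear model is at 50 mg/L.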




concentration(mg/L) B G R H S
0 153 148 157 138 14
0 153 147 157 138 16
0 153 146 158 137 20
0 153 146 158 137 20
0 154 145 157 141 19
20 144 115 170 135 82
20 144 115 169 136 81
20 145 115 172 135 83
30 145 114 174 135 87
30 145 114 176 135 89
30 145 114 175 135 89
30 146 114 175 135 88
50 142 99 175 137 110
50 141 99 174 137 109
50 142 99 176 136 110
80 141 96 181 135 119
80 141 96 182 135 119
80 140 96 182 135 120
100 139 96 175 136 115
100 139 96 174 136 114
100 139 96 176 136 116
150 139 86 178 136 131
150 139 87 177 137 129
150 138 86 177 137 130
150 139 86 178 137 131

Table 1 Experimental data of carbon dioxide




Figure 3 Linear regression residual plot




Figure 4 Regression residuals of carbon dioxide concentration and color readings after eliminating abnormal points




R^2                F                   p-value of F         Estimated error variance
0.9250310882931    44.4199047583412    0.0000000016617      270.6516543935724

Significance test indicators of the carbon dioxide linear regression equation




concentration(mg/L) Residual value
0 -2.384256481544441
0 -2.476142194851974
0 6.948682946475856
0 6.948682946475856
0 3.473424932212367
20 -14.672434139092047
20 -13.657128890758031
20 -17.320608464579323
30 4.058700090441562
30 15.529894358769411
30 20.326313327574439
30 6.206944733759315
50 -31.576255061221445
50 -33.724499704539312
80 -8.948829979216725
80 -13.745248948021299
80 0.374119645794281
100 11.627226785928087
100 5.891629651764106
100 17.362823920092069
150 4.191048334563675
150 15.830255399174575
150 8.793706073744261
150 10.941950717061900
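
    The significance indicators can be recovered from the residuals listed above. The Python sketch below recomputes R^2, the F statistic and the estimated error variance for the 24 remaining data points and 5 predictors, assuming an ordinary least-squares fit with an intercept (so that the regression sum of squares equals the total sum of squares minus the residual sum of squares). The names are illustrative; this is a cross-check, not the team's MATLAB code.

```python
# (observed concentration mg/L, residual), copied from the residual table above.
POINTS = [
    (0, -2.384256481544441), (0, -2.476142194851974),
    (0, 6.948682946475856), (0, 6.948682946475856),
    (0, 3.473424932212367), (20, -14.672434139092047),
    (20, -13.657128890758031), (20, -17.320608464579323),
    (30, 4.058700090441562), (30, 15.529894358769411),
    (30, 20.326313327574439), (30, 6.206944733759315),
    (50, -31.576255061221445), (50, -33.724499704539312),
    (80, -8.948829979216725), (80, -13.745248948021299),
    (80, 0.374119645794281), (100, 11.627226785928087),
    (100, 5.891629651764106), (100, 17.362823920092069),
    (150, 4.191048334563675), (150, 15.830255399174575),
    (150, 8.793706073744261), (150, 10.941950717061900),
]
N_PREDICTORS = 5  # B, G, R, H, S


def regression_summary(points, n_predictors):
    """Return (R^2, F statistic, estimated error variance)."""
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    sst = sum((y - mean_y) ** 2 for y, _ in points)  # total sum of squares
    sse = sum(e ** 2 for _, e in points)             # residual sum of squares
    dof = n - n_predictors - 1                       # residual degrees of freedom
    error_variance = sse / dof
    r_squared = 1.0 - sse / sst
    f_stat = ((sst - sse) / n_predictors) / error_variance
    return r_squared, f_stat, error_variance


r_squared, f_stat, error_variance = regression_summary(POINTS, N_PREDICTORS)
print(f"R^2 = {r_squared:.6f}, F = {f_stat:.4f}, variance = {error_variance:.4f}")
```

    The recomputed values agree with the reported indicators (R^2 about 0.925, F about 44.42, error variance about 270.65), which confirms that the error variance is taken over 24 - 6 = 18 degrees of freedom.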


    The residual values show that the model still needs to be optimized. The multiple linear regression model could be refined further by removing the outliers that appear in the new residual plot, but the gain from doing so is limited and each removal reduces the integrity of the data. We therefore tried a multivariate quadratic regression instead.










   2. Multiple quadratic regression model


2.1 Establishment and solution of multiple quadratic regression models

    A multivariate quadratic regression equation is established using rstool(x, y, 'model', alpha). The 'model' argument is a string selecting one of the following four models (the default is the linear model):
linear: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m$
purequadratic: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \sum_{j=1}^{m} \beta_{jj} x_j^2$
interaction: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \sum_{1 \le j < k \le m} \beta_{jk} x_j x_k$
quadratic: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \sum_{1 \le j \le k \le m} \beta_{jk} x_j x_k$


    The function output includes the regression coefficients, the residual standard deviation, and the residuals. By changing the value of 'model' and comparing the residual standard deviations of the resulting models, one can determine which model is best.
    For this problem we adopted the full quadratic option for the multivariate quadratic regression, that is, model (IX):
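$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_1 x_2 + \beta_7 x_1 x_3 + \beta_8 x_1 x_4 + \beta_9 x_1 x_5 + \beta_{10} x_2 x_3 + \beta_{11} x_2 x_4 + \beta_{12} x_2 x_5 + \beta_{13} x_3 x_4 + \beta_{14} x_3 x_5 + \beta_{15} x_4 x_5 + \beta_{16} x_1^2 + \beta_{17} x_2^2 + \beta_{18} x_3^2 + \beta_{19} x_4^2 + \beta_{20} x_5^2$   (Ⅸ)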

(In model (IX), y represents the concentration and x_1, x_2, x_3, x_4, x_5 represent B, G, R, H, S, respectively.)
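
    The structure of model (IX) can be illustrated by building its regressor terms explicitly. The Python sketch below expands a five-dimensional color reading into the constant, linear, interaction and squared terms, in the same order as the coefficients β_0 to β_20. It only constructs the terms and does not perform the fit; the function name is an assumption for illustration and is unrelated to rstool's internals.

```python
from itertools import combinations


def quadratic_terms(x):
    """Expand x = (x1, ..., xm) into the terms of a full quadratic model.

    Order: constant, linear terms, pairwise interactions (j < k), squares.
    Returns (names, values) as two parallel lists.
    """
    m = len(x)
    names, values = ["1"], [1.0]
    for j in range(m):
        names.append(f"x{j + 1}")
        values.append(float(x[j]))
    for j, k in combinations(range(m), 2):
        names.append(f"x{j + 1}*x{k + 1}")
        values.append(float(x[j] * x[k]))
    for j in range(m):
        names.append(f"x{j + 1}^2")
        values.append(float(x[j] ** 2))
    return names, values


# First data row of Table 1: (B, G, R, H, S) = (153, 148, 157, 138, 14).
names, values = quadratic_terms([153, 148, 157, 138, 14])
print(len(names), "terms")
for name, value in zip(names, values):
    print(f"{name:6s} {value:10.1f}")
```

    For five color channels this gives 1 + 5 + 10 + 5 = 21 terms. With 25 observations, only 4 residual degrees of freedom remain after fitting, which is worth keeping in mind when interpreting the very small residuals of the quadratic model.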




Substituting the data into the multivariate quadratic regression fit gives the results shown in Figure 5, the residual table below, and model (X).




Figure 5




$y = -229171.315611749 - 5684.26671497298 B - 304.823653309202 G + 4983.90599969629 R + 4477.91841203602 H - 1706.63700374936 S - 2.89310790233349 BG - 0.437552876691341 BR + 15.7179303724566 BH + 2.17845769137645 BS - 5.35551688411491 GR + 2.95991056538886 GH + 4.38735845493663 GS - 26.5095405408683 RH - 2.69440081387518 RS + 8.05373713434564 HS + 12.9717944022357 B^2 + 3.83546582450501 G^2 - 1.40181633583134 R^2 - 11.9129488163979 H^2 + 1.54303389696168 S^2$   (Ⅹ)




concentration(mg/L) Residual value
0 -0.183023306644486
0 0.295698988685444
0 -0.0573058051979842
0 -0.0573058051979842
0 -0.0112789762431476
20 0.483971591391310
20 0.0544367711663654
20 -0.617107404175840
30 0.438573762845408
30 0.223558673598745
30 -0.471712868260511
30 0.0136040522083931
50 0.551662284327904
50 -0.775350805535709
50 0.0956137145112734
80 0.0435429326025769
80 0.243119500199100
80 -0.357436173322640
100 -1.73816418466959
100 0.0231545045717212
100 1.60688363169902
150 0.0231545045717212
150 0.875543694299267
150 0.637870694485173
150 -1.34395668067009



2.2 Test of multiple quadratic regression model

    We test the quadratic regression model with the residual standard deviation. The regression residual $e_i = Y_i - \hat{Y}_i$ (observed value minus fitted value) measures how well the regression model fits the sample data. To assess a regression analysis, the residual standard deviation needs to be calculated: it is the accuracy index of the regression equation when it is used for prediction, and it can be used to test the reliability of the model's predictions. It is written $S_Y$ and defined as $S_Y = \sqrt{\sum (Y_i - \hat{Y}_i)^2 / (n-2)} = \sqrt{Q/(n-2)}$, where Q is the residual sum of squares. The denominator n - 2 applies to a regression with two fitted coefficients; for a model with p fitted coefficients it becomes n - p, and the RMSE reported below corresponds to n - p = 25 - 21 = 4.

    If $S_Y$ is close to 0, the model deviates little from the sample data and the predictions are more reliable (accurate). The larger $S_Y$ is, the more the model deviates from the sample data and the less reliable the predictions are. In practical problems $S_Y$ tends to be large, so models are usually evaluated with the ratio $S_Y / \bar{Y}$ of the residual standard deviation to the mean of the observed values; when $S_Y / \bar{Y} < 15\%$, the prediction model can be considered good.

    From the results, the regression residual standard deviation is RMSE = 1.65062261369908 and $S_Y / \bar{Y}$ = 0.02821577117, so the prediction model is very good. The residual values also show that the model fits very well. No original data points were removed, which preserves the integrity of the data; the drawback is the complexity of the equation.
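
    The residual standard deviation can be checked against the residuals of the quadratic model listed in section 2.1. The Python sketch below computes it from the 25 residuals, once with the denominator n - 21 (25 observations minus the 21 coefficients of model (IX)) and once with the textbook denominator n - 2, to show which convention reproduces the reported RMSE. The names are illustrative assumptions, not part of the original MATLAB analysis.

```python
from math import sqrt

# Residuals of the quadratic model, copied from the residual table in section 2.1.
RESIDUALS = [
    -0.183023306644486, 0.295698988685444, -0.0573058051979842,
    -0.0573058051979842, -0.0112789762431476, 0.483971591391310,
    0.0544367711663654, -0.617107404175840, 0.438573762845408,
    0.223558673598745, -0.471712868260511, 0.0136040522083931,
    0.551662284327904, -0.775350805535709, 0.0956137145112734,
    0.0435429326025769, 0.243119500199100, -0.357436173322640,
    -1.73816418466959, 0.0231545045717212, 1.60688363169902,
    0.0231545045717212, 0.875543694299267, 0.637870694485173,
    -1.34395668067009,
]
N_COEFFICIENTS = 21  # constant + 5 linear + 10 interaction + 5 squared terms


def residual_std(residuals, n_fitted):
    """Residual standard deviation sqrt(Q / (n - n_fitted))."""
    q = sum(e * e for e in residuals)  # residual sum of squares Q
    return sqrt(q / (len(residuals) - n_fitted))


rmse_full = residual_std(RESIDUALS, N_COEFFICIENTS)  # denominator n - 21
rmse_n_minus_2 = residual_std(RESIDUALS, 2)          # denominator n - 2

print(f"sqrt(Q / (n - 21)) = {rmse_full:.5f}")
print(f"sqrt(Q / (n - 2))  = {rmse_n_minus_2:.5f}")
```

    The first value (about 1.6506) matches the reported RMSE, so the reported figure uses the residual degrees of freedom of the full quadratic model rather than n - 2.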



References

[1] Yang Haiyan, Jia Guiru. A Method for Rapid Detection of Colored and Transparent Solution Concentration Based on Digital Colorimetry [J]. Journal of China Agricultural University, 2006, 11(3): 47-50.
[2] Wang Yan, Yan Silian, Wang Aiqing. Mathematical Statistics and MATLAB Engineering Data Analysis [M]. Beijing: Tsinghua University Press, 2006: 126-177.
[3] National Standards Committee. "National Greenhouse Gas Emissions Accounting and Reporting and Other 11 National Standards for Greenhouse Gas Management".
[4] Zhang Li, Wang Pretty, Li Wei, Li Sujing. "Situation Analysis of Greenhouse Gas Emissions in China's Steel Industry", 2015, 12 (source of the gas content proportions).