At present, our project is still at the laboratory stage and has not yet reached large-scale application. To find suitable scenarios for large-scale application, and with the help of our instructors, we held a seminar with the Astronautics Model Team of Anhui University of Technology to develop a social practice program.
We got in touch with a factory in Maanshan (Maanshan Steel Plant), and the model team provided a six-rotor drone, as shown in Figure 1:
Figure 1 Six-rotor drone
Zhao Lei, a member of our team who has studied embedded programming, used several gas sensors (which measure the mass of carbon dioxide, sulfur dioxide and other gases per liter) and an STM32 ARM microcontroller to develop a carbon dioxide detection device that can be mounted on the drone, as shown in Figure 2:
Figure 2 Gas detector
We used the drone to carry the detection device into the air and sampled the air near the outlet of the factory's exhaust system. The concentration of carbon dioxide near the smoke extraction device was extremely high, 5 to 20 times the normal value. According to the eleven national standards for greenhouse gas management, including Greenhouse Gas Accounting and Reporting for Industrial Enterprises [3], and the literature [4], the mass ratio of the gases in the exhaust from the production process of such factories is approximately oxygen : carbon dioxide : sulfur dioxide : hydrogen sulfide : carbon monoxide : hydrogen chloride : fluoride : nitrogen oxides : other = 14:10:3:3:3:2:3:8:54. After several measurements and averaging, we obtained the composition, content and mass percentage of each gas in the factory exhaust:
Gas name | Content under standard conditions (mg/L) | Mass percentage |
---|---|---|
Carbon dioxide | 123.0025 | 10.2599% |
Oxygen | 155.1683 | 12.9429% |
Sulfur dioxide | 48.5526 | 4.0499% |
Hydrogen sulfide | 56.8314 | 4.7404% |
Carbon monoxide | 38.2593 | 3.1913% |
Hydrogen chloride | 25.9654 | 2.1658% |
Fluoride | 46.9342 | 3.9149% |
Nitrogen oxides | 96.2349 | 8.0272% |
Other | 607.9167 | 50.7077% |
Total | 1198.8653 | 100.0000% |
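Each mass percentage in the table is the gas's measured content divided by the total content. The following is a minimal Python sketch of that arithmetic using the contents listed above; the variable names are illustrative and are not taken from the original analysis.

```python
# Measured content of each gas under standard conditions (mg/L), from the table above.
contents = {
    "carbon dioxide": 123.0025,
    "oxygen": 155.1683,
    "sulfur dioxide": 48.5526,
    "hydrogen sulfide": 56.8314,
    "carbon monoxide": 38.2593,
    "hydrogen chloride": 25.9654,
    "fluoride": 46.9342,
    "nitrogen oxides": 96.2349,
    "other": 607.9167,
}

# Total content and the mass percentage of each gas.
total = sum(contents.values())
percent = {gas: 100.0 * value / total for gas, value in contents.items()}

for gas, share in percent.items():
    print(f"{gas:18s} {contents[gas]:10.4f} mg/L  {share:8.4f}%")
print(f"{'total':18s} {total:10.4f} mg/L")
```

Recomputing the shares this way is a quick check that the percentages add up to 100%.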
The measured gas content is consistent with the literature data. Following the gas proportions reported in the literature, we ran a simulation in the laboratory: a gas mixture in those proportions was prepared manually and passed through water to produce an unsaturated solution. As the gas was passed in for longer times, the amount of carbon dioxide in the solution was measured repeatedly, and these measurements served as the data source for our mathematical modeling.
First, a correlation analysis of the experimental data in MATLAB showed that the color readings (five dimensions: B, G, R, H, S) are linearly correlated to some degree with the carbon dioxide concentration. This is consistent with the literature [1], which reaches the same conclusion from the Lambert-Beer absorption law: there is a definite relationship between the concentration of a substance and its color reading. Second, multiple regression analysis was applied to the data to obtain the relationship between the concentration and the five-dimensional color reading, and a suitable mathematical expression (the mathematical model) was determined as the empirical formula, or regression equation.
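For reference, the Lambert-Beer law in its usual textbook form is written out below. The symbols are the standard ones and are not notation used elsewhere in this report. How a camera color channel maps onto transmitted intensity is an assumption here, which the regression is meant to capture empirically.

```latex
A = \log_{10}\frac{I_0}{I} = \varepsilon \, l \, c
```

Here $A$ is the absorbance, $I_0$ and $I$ are the incident and transmitted light intensities, $\varepsilon$ is the absorption coefficient, $l$ is the optical path length, and $c$ is the concentration. With $\varepsilon$ and $l$ fixed, absorbance is proportional to concentration, which motivates looking for a regression relationship between color readings and concentration.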
A mathematical model relating color readings to carbon dioxide concentration: a linear regression equation. First, a linear regression model between carbon dioxide concentration and color reading is established. The residuals of this model are large and the fit is poor.
We therefore considered a nonlinear quadratic regression model, built with the rstool function in the MATLAB Statistics Toolbox, and evaluated the models by their residual standard deviation and residuals. In the final quadratic regression model the residual standard deviation is small, the predictions are very good, and the residuals are an order of magnitude smaller than those of the multiple linear regression model. The quadratic regression model is therefore better than the linear regression model, and the comparison of the errors of the two models shows that the quadratic regression equation has higher precision.
Model establishment and solution:
Following the analysis above, we first established a linear regression model, in line with the problem. Using the experimental data (Table 1) and linear regression in MATLAB, we obtained a linear regression equation between carbon dioxide concentration and color reading.
1. Multiple linear regression model
Using multiple linear regression (see Appendix 1 for the code), we plotted the residuals (see Figure 3). As the residual plot shows, except for the 15th data point, the residuals are close to zero and their confidence intervals contain zero, which indicates that the regression model matches the original data reasonably well; the 15th data point can be regarded as an outlier and removed. After removing it, the multiple linear regression was performed again to obtain a new residual plot (see Figure 4), the significance test indicators of the regression equation, and the individual residual values (both tabulated after the figures). From the significance test table, the correlation coefficient R^2 = 0.9250310882931 indicates that the regression equation is significant. In the F test, the probability p corresponding to F satisfies p < α, so H0 is rejected and the regression model (VIII) holds. However, the estimated error variance is too large.
$$y = 2910.630153554265 + 3.587352490846\,x_1 - 21.155917919245\,x_2 + 4.796418968805\,x_3 - 6.750902382498\,x_4 - 10.532016102969\,x_5 \tag{VIII}$$
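To make model (VIII) concrete, the sketch below evaluates it on two rows of Table 1 and prints the residual (observed minus predicted concentration). It is written in Python for illustration only, since the original analysis used MATLAB, and the function and variable names are hypothetical.

```python
# Coefficients of model (VIII): intercept, then B, G, R, H, S.
COEFFICIENTS = (
    2910.630153554265,
    3.587352490846,
    -21.155917919245,
    4.796418968805,
    -6.750902382498,
    -10.532016102969,
)

def predict_viii(b, g, r, h, s):
    """Predicted CO2 concentration (mg/L) from the five color readings."""
    intercept, *slopes = COEFFICIENTS
    return intercept + sum(c * x for c, x in zip(slopes, (b, g, r, h, s)))

# First and last rows of Table 1: (concentration, B, G, R, H, S).
samples = [
    (0, 153, 148, 157, 138, 14),
    (150, 139, 86, 178, 137, 131),
]

residuals = []
for concentration, *colors in samples:
    residual = concentration - predict_viii(*colors)
    residuals.append(residual)
    print(f"observed {concentration:3d} mg/L, residual {residual:+.4f}")
```

The two residuals agree with the first and last entries of the linear-model residual table, which confirms that the residuals there are defined as observed minus predicted.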
Concentration (mg/L) | B | G | R | H | S |
---|---|---|---|---|---|
0 | 153 | 148 | 157 | 138 | 14 |
0 | 153 | 147 | 157 | 138 | 16 |
0 | 153 | 146 | 158 | 137 | 20 |
0 | 153 | 146 | 158 | 137 | 20 |
0 | 154 | 145 | 157 | 141 | 19 |
20 | 144 | 115 | 170 | 135 | 82 |
20 | 144 | 115 | 169 | 136 | 81 |
20 | 145 | 115 | 172 | 135 | 83 |
30 | 145 | 114 | 174 | 135 | 87 |
30 | 145 | 114 | 176 | 135 | 89 |
30 | 145 | 114 | 175 | 135 | 89 |
30 | 146 | 114 | 175 | 135 | 88 |
50 | 142 | 99 | 175 | 137 | 110 |
50 | 141 | 99 | 174 | 137 | 109 |
50 | 142 | 99 | 176 | 136 | 110 |
80 | 141 | 96 | 181 | 135 | 119 |
80 | 141 | 96 | 182 | 135 | 119 |
80 | 140 | 96 | 182 | 135 | 120 |
100 | 139 | 96 | 175 | 136 | 115 |
100 | 139 | 96 | 174 | 136 | 114 |
100 | 139 | 96 | 176 | 136 | 116 |
150 | 139 | 86 | 178 | 136 | 131 |
150 | 139 | 87 | 177 | 137 | 129 |
150 | 138 | 86 | 177 | 137 | 130 |
150 | 139 | 86 | 178 | 137 | 131 |
Table 1 Experimental data of carbon dioxide
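The correlation analysis mentioned earlier can be sketched on the data in Table 1. The example below computes the Pearson correlation between each color channel and the concentration in plain Python. It is an illustrative stand-in for the MATLAB analysis, not the authors' code, and the helper name `pearson` is hypothetical.

```python
from math import sqrt

# Table 1: (concentration in mg/L, B, G, R, H, S)
ROWS = [
    (0, 153, 148, 157, 138, 14), (0, 153, 147, 157, 138, 16),
    (0, 153, 146, 158, 137, 20), (0, 153, 146, 158, 137, 20),
    (0, 154, 145, 157, 141, 19), (20, 144, 115, 170, 135, 82),
    (20, 144, 115, 169, 136, 81), (20, 145, 115, 172, 135, 83),
    (30, 145, 114, 174, 135, 87), (30, 145, 114, 176, 135, 89),
    (30, 145, 114, 175, 135, 89), (30, 146, 114, 175, 135, 88),
    (50, 142, 99, 175, 137, 110), (50, 141, 99, 174, 137, 109),
    (50, 142, 99, 176, 136, 110), (80, 141, 96, 181, 135, 119),
    (80, 141, 96, 182, 135, 119), (80, 140, 96, 182, 135, 120),
    (100, 139, 96, 175, 136, 115), (100, 139, 96, 174, 136, 114),
    (100, 139, 96, 176, 136, 116), (150, 139, 86, 178, 136, 131),
    (150, 139, 87, 177, 137, 129), (150, 138, 86, 177, 137, 130),
    (150, 139, 86, 178, 137, 131),
]
CHANNELS = ("B", "G", "R", "H", "S")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

concentration = [row[0] for row in ROWS]
correlations = {
    name: pearson([row[i + 1] for row in ROWS], concentration)
    for i, name in enumerate(CHANNELS)
}

for name, r in correlations.items():
    print(f"{name}: r = {r:+.3f}")
```

A coefficient near +1 or -1 indicates a strong linear trend for that channel. Channels with weaker coefficients may still contribute to the multivariate fit.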
Figure 3 Linear regression residual plot
Figure 4 Regression residuals of carbon dioxide concentration and color readings after eliminating abnormal points
Correlation coefficient R^2 | F | Probability P corresponding to F | Estimated error variance |
---|---|---|---|
0.9250310882931 | 44.4199047583412 | 0.0000000016617 | 270.6516543935724 |
Significance test indicators of the carbon dioxide linear regression equation
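The F value in this table can be recovered from R^2 with the standard relation between the two statistics. The sketch below assumes n = 24 observations (Table 1 with the 15th point removed) and k = 5 predictors. The function name is illustrative.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic of a linear regression, computed from R^2."""
    df_model = n_predictors
    df_error = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_error)

# R^2 from the table; 24 observations and 5 color channels are assumed.
f_value = f_statistic(0.9250310882931, n_obs=24, n_predictors=5)
print(f"F = {f_value:.4f}")
```

The result agrees with the tabulated F to the printed precision, which supports the reading that the fit used 24 points and 18 error degrees of freedom.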
Concentration (mg/L) | Residual value |
---|---|
0 | -2.384256481544441 |
0 | -2.476142194851974 |
0 | 6.948682946475856 |
0 | 6.948682946475856 |
0 | 3.473424932212367 |
20 | -14.672434139092047 |
20 | -13.657128890758031 |
20 | -17.320608464579323 |
30 | 4.058700090441562 |
30 | 15.529894358769411 |
30 | 20.326313327574439 |
30 | 6.206944733759315 |
50 | -31.576255061221445 |
50 | -33.724499704539312 |
80 | -8.948829979216725 |
80 | -13.745248948021299 |
80 | 0.374119645794281 |
100 | 11.627226785928087 |
100 | 5.891629651764106 |
100 | 17.362823920092069 |
150 | 4.191048334563675 |
150 | 15.830255399174575 |
150 | 8.793706073744261 |
150 | 10.941950717061900 |
The residual values show that the model still needs to be optimized. The multivariate linear regression model could be optimized further by removing the outliers that appear in the new residual plot, but the benefit of continuing this way is limited and the integrity of the data deteriorates with each removal. We therefore tried a multivariate nonlinear quadratic regression.
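As a rough illustration of the linear fit discussed in this subsection, the sketch below performs ordinary least squares on Table 1 with the 15th observation removed. It uses plain Python with centered predictors and Gaussian elimination. This is an assumption-laden stand-in for the MATLAB code in Appendix 1, which is not reproduced here, and the helper names are hypothetical.

```python
# Table 1 without the 15th observation: (concentration, B, G, R, H, S)
DATA = [
    (0, 153, 148, 157, 138, 14), (0, 153, 147, 157, 138, 16),
    (0, 153, 146, 158, 137, 20), (0, 153, 146, 158, 137, 20),
    (0, 154, 145, 157, 141, 19), (20, 144, 115, 170, 135, 82),
    (20, 144, 115, 169, 136, 81), (20, 145, 115, 172, 135, 83),
    (30, 145, 114, 174, 135, 87), (30, 145, 114, 176, 135, 89),
    (30, 145, 114, 175, 135, 89), (30, 146, 114, 175, 135, 88),
    (50, 142, 99, 175, 137, 110), (50, 141, 99, 174, 137, 109),
    (80, 141, 96, 181, 135, 119), (80, 141, 96, 182, 135, 119),
    (80, 140, 96, 182, 135, 120), (100, 139, 96, 175, 136, 115),
    (100, 139, 96, 174, 136, 114), (100, 139, 96, 176, 136, 116),
    (150, 139, 86, 178, 136, 131), (150, 139, 87, 177, 137, 129),
    (150, 138, 86, 177, 137, 130), (150, 139, 86, 178, 137, 131),
]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear(data):
    """Least-squares fit y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    n = len(data)
    y = [row[0] for row in data]
    xs = [row[1:] for row in data]
    k = len(xs[0])
    y_mean = sum(y) / n
    x_mean = [sum(x[j] for x in xs) / n for j in range(k)]
    # Centering the predictors keeps the normal equations well conditioned.
    xc = [[x[j] - x_mean[j] for j in range(k)] for x in xs]
    yc = [v - y_mean for v in y]
    xtx = [[sum(r[i] * r[j] for r in xc) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(xc, yc)) for i in range(k)]
    slopes = solve(xtx, xty)
    intercept = y_mean - sum(b * m for b, m in zip(slopes, x_mean))
    residuals = [v - intercept - sum(b * xj for b, xj in zip(slopes, x))
                 for v, x in zip(y, xs)]
    sse = sum(e * e for e in residuals)
    sst = sum(v * v for v in yc)
    return intercept, slopes, residuals, 1.0 - sse / sst

intercept, slopes, residuals, r2 = fit_linear(DATA)
print("intercept:", round(intercept, 4))
print("slopes (B, G, R, H, S):", [round(b, 4) for b in slopes])
print("R^2:", round(r2, 4))
```

The fitted coefficients and R^2 can be compared with model (VIII) and the significance test table. Because the model includes an intercept, the residuals of a least-squares fit sum to approximately zero, which is a convenient sanity check.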
2. Multiple quadratic regression model
2.1 Establishment and solution of multiple quadratic regression models
A multivariate quadratic regression equation is established using rstool(x, y, 'model', alpha). The 'model' argument selects one of the following four models (given as a string; the default is the linear model):
linear: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m$
purequadratic: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \sum_{j=1}^{m} \beta_{jj} x_j^2$
interaction: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \sum_{1 \le j \ne k \le m} \beta_{jk} x_j x_k$
quadratic: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \sum_{1 \le j,k \le m} \beta_{jk} x_j x_k$
The function outputs the regression parameters, the residual standard deviation, and the residuals. By changing the 'model' argument and comparing the residual standard deviations of the resulting models, you can determine which one fits best.
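To see how the four model types differ in size, the sketch below lists the terms each one contains for m predictors, counting every cross product once. It is a plain Python illustration of the term structure only. It does not call or emulate rstool, and the function name `model_terms` is hypothetical.

```python
from itertools import combinations

MODELS = ("linear", "purequadratic", "interaction", "quadratic")

def model_terms(m, model="linear"):
    """Return the term labels of a response-surface model with m predictors."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    names = [f"x{j}" for j in range(1, m + 1)]
    terms = ["1"] + names                      # constant and linear terms
    if model in ("interaction", "quadratic"):  # pairwise cross products
        terms += [f"{a}*{b}" for a, b in combinations(names, 2)]
    if model in ("purequadratic", "quadratic"):  # squared terms
        terms += [f"{a}^2" for a in names]
    return terms

for name in MODELS:
    terms = model_terms(5, name)
    print(f"{name:14s} {len(terms):2d} terms")
```

With five color channels the full quadratic model has 21 coefficients, so a data set of 25 observations leaves only a few residual degrees of freedom.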
For this problem we use the full quadratic form of multivariate nonlinear quadratic regression, that is, model (IX):
$$\begin{aligned} y = {} & \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 \\ & + \beta_6 x_1 x_2 + \beta_7 x_1 x_3 + \beta_8 x_1 x_4 + \beta_9 x_1 x_5 + \beta_{10} x_2 x_3 + \beta_{11} x_2 x_4 + \beta_{12} x_2 x_5 \\ & + \beta_{13} x_3 x_4 + \beta_{14} x_3 x_5 + \beta_{15} x_4 x_5 + \beta_{16} x_1^2 + \beta_{17} x_2^2 + \beta_{18} x_3^2 + \beta_{19} x_4^2 + \beta_{20} x_5^2 \end{aligned} \tag{IX}$$
(In model (IX), y represents the concentration, and x1, x2, x3, x4, x5 represent B, G, R, H, S, respectively.)
Substituting the data into the multivariate quadratic regression fit gives the results shown in Figure 5, model (X), and the residual table that follows.
Figure 5
$$\begin{aligned} y = {} & -229171.315611749 - 5684.26671497298\,B - 304.823653309202\,G + 4983.90599969629\,R \\ & + 4477.91841203602\,H - 1706.63700374936\,S - 2.89310790233349\,BG - 0.437552876691341\,BR \\ & + 15.7179303724566\,BH + 2.17845769137645\,BS - 5.35551688411491\,GR + 2.95991056538886\,GH \\ & + 4.38735845493663\,GS - 26.5095405408683\,RH - 2.69440081387518\,RS + 8.05373713434564\,HS \\ & + 12.9717944022357\,B^2 + 3.83546582450501\,G^2 - 1.40181633583134\,R^2 - 11.9129488163979\,H^2 \\ & + 1.54303389696168\,S^2 \end{aligned} \tag{X}$$
Concentration (mg/L) | Residual value |
---|---|
0 | -0.183023306644486 |
0 | 0.295698988685444 |
0 | -0.0573058051979842 |
0 | -0.0573058051979842 |
0 | -0.0112789762431476 |
20 | 0.483971591391310 |
20 | 0.0544367711663654 |
20 | -0.617107404175840 |
30 | 0.438573762845408 |
30 | 0.223558673598745 |
30 | -0.471712868260511 |
30 | 0.0136040522083931 |
50 | 0.551662284327904 |
50 | -0.775350805535709 |
50 | 0.0956137145112734 |
80 | 0.0435429326025769 |
80 | 0.243119500199100 |
80 | -0.357436173322640 |
100 | -1.73816418466959 |
100 | 0.0231545045717212 |
100 | 1.60688363169902 |
150 | 0.0231545045717212 |
150 | 0.875543694299267 |
150 | 0.637870694485173 |
150 | -1.34395668067009 |
2.2 Test of multiple quadratic regression model
We test the quadratic regression model with the residual standard deviation. The regression residual $e_i = Y_i - \hat{Y}_i$ measures how well the regression model fits the sample data. The regression residual standard deviation is the accuracy index of predictions made with the regression equation, and it can be used to test the reliability of the model's predictions. It is denoted $S_Y$ and defined as $S_Y = \sqrt{\sum (Y_i - \hat{Y}_i)^2 / (n-2)} = \sqrt{Q/(n-2)}$, where $Q$ is the residual sum of squares. The denominator $n-2$ applies to a model with two fitted parameters; in general it is $n$ minus the number of fitted coefficients. If $S_Y$ is close to 0, the model deviates little from the sample data and the reliability (accuracy) of the prediction is high. The larger $S_Y$ is, the more the model deviates from the sample data and the less reliable the prediction.
In practical problems $S_Y$ tends to be large, so the ratio $S_Y/\bar{Y}$ is usually used to evaluate the model: when $S_Y/\bar{Y} < 15\%$, the prediction model can be considered good. From the results, the regression residual standard deviation is RMSE = 1.65062261369908 and $S_Y/\bar{Y}$ = 0.02821577117, so the prediction model is very good. The residual values likewise show that the model fits very well. No original data points were removed, which preserves the integrity of the data; the drawback is the complexity of the equation.
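The reported RMSE can be checked against the residuals of the quadratic model. The sketch below assumes that the residual sum of squares is divided by n - p, with n = 25 observations and p = 21 fitted coefficients, which is the usual degrees-of-freedom convention and appears consistent with the value quoted above. The ratio is taken against the mean of the 25 concentrations in Table 1, which is also an assumption about how the index was formed.

```python
from math import sqrt

# Residuals of the quadratic model (X), in the order of the residual table.
RESIDUALS = [
    -0.183023306644486, 0.295698988685444, -0.0573058051979842,
    -0.0573058051979842, -0.0112789762431476, 0.483971591391310,
    0.0544367711663654, -0.617107404175840, 0.438573762845408,
    0.223558673598745, -0.471712868260511, 0.0136040522083931,
    0.551662284327904, -0.775350805535709, 0.0956137145112734,
    0.0435429326025769, 0.243119500199100, -0.357436173322640,
    -1.73816418466959, 0.0231545045717212, 1.60688363169902,
    0.0231545045717212, 0.875543694299267, 0.637870694485173,
    -1.34395668067009,
]

# Concentrations of the 25 observations in Table 1 (mg/L).
CONCENTRATIONS = [0] * 5 + [20] * 3 + [30] * 4 + [50] * 3 + [80] * 3 + [100] * 3 + [150] * 4

N_COEFFICIENTS = 21  # beta_0 .. beta_20 in model (IX)

n = len(RESIDUALS)
sse = sum(e * e for e in RESIDUALS)          # residual sum of squares Q
rmse = sqrt(sse / (n - N_COEFFICIENTS))      # residual standard deviation
mean_y = sum(CONCENTRATIONS) / len(CONCENTRATIONS)
ratio = rmse / mean_y

print(f"Q = {sse:.4f}, RMSE = {rmse:.4f}, RMSE / mean = {ratio:.4f}")
print("meets the 15% criterion:", ratio < 0.15)
```

Under these assumptions the ratio is roughly 0.028, well below the 15% threshold.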
References
[1] Yang Haiyan, Jia Guiru. A Method for Rapid Detection of Colored and Transparent Solution Concentration Based on Digital Colorimetry [J]. Journal of China Agricultural University, 2006, 11(3): 47-50.
[2] Wang Yan, Yan Silian, Wang Aiqing. Mathematical Statistics and MATLAB Engineering Data Analysis [M]. Beijing: Tsinghua University Press, 2006: 126-177.
[3] National Standards Committee. "National Greenhouse Gas Emissions Accounting and Reporting and Other 11 National Standards for Greenhouse Gas Management".
[4] Zhang Li, Wang Pretty, Li Wei, Li Sujing. Situation Analysis of Greenhouse Gas Emissions in China's Steel Industry. 2015, 12. (Source of the gas content proportions.)