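Abstract:
Using the n-octanol/water partition coefficient (logKow), the energy of the lowest unoccupied molecular orbital (ELUMO), the energy of the highest occupied molecular orbital (EHOMO), and other parameters as descriptor variables, support vector machine (SVM) classification models were built for the aquatic toxicity modes of action of three sets of compounds containing 190, 221, and 88 samples, respectively. The SVM modeling parameters were optimized by selecting the highest leave-one-out (LOO) cross-validation recognition rate while also taking into account the number of support vectors and the number of bounded support vectors. The results show that, with the RBF kernel and the C-SVC method, and with the three parameter sets C=512, γ=2.048; C=512, γ=2.048; and C=512, γ=0.512, the SVM classification models built for the three systems misclassified 0, 2, and 0 of all samples, respectively, while the models built on the training sets misclassified 9, 17, and 7 of all samples, respectively. Classification performance depends on the choice and number of compound descriptors; adding suitable molecular descriptors would improve the classification results accordingly.
Keywords: organic compounds; toxic action; classification; support vector machine
DOI: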
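Received: 2005-06-02  Revised: 2005-08-30
Funding: Jointly supported by the Guangxi New Century Ten-Hundred-Thousand Talents Program; the Guangxi University Hundred Young and Middle-aged Academic Leaders Program (Gui Jiao Ren [2003] No. 97); and the Guilin Institute of Technology Youth Support Fund (Gui Gong Yuan Ke [2004] No. 8).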
|
Study of Support Vector Machine Classification of the Toxic Modes of Action of Organic Compounds
|
|
Abstract: |
Taking the octanol-water partition coefficient (logKow), the energy of the lowest unoccupied molecular orbital (ELUMO), and the energy of the highest occupied molecular orbital (EHOMO) as descriptor variables, support vector machine (SVM) classification models were built for the aquatic toxicity modes of action of 3 sets of organic compounds (190, 221, and 88 compounds, respectively). The SVM parameters were optimized using the highest leave-one-out (LOO) cross-validation recognition rate, together with low numbers of support vectors and bounded support vectors, as the selection criteria. The results show that, for C-SVC with the RBF kernel and the parameter sets C=512, γ=2.048; C=512, γ=2.048; and C=512, γ=0.512, the models built on all samples misclassified 0, 2, and 0 compounds, respectively, whereas the models built on the training sets misclassified 9, 17, and 7 compounds across all samples. The quality of the SVM classification models depends on the selection and number of descriptors and would improve if suitable descriptors were added.
Key words: organic compounds; toxicity action; classification; support vector machine
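
The parameter selection described in the abstract (highest LOO recognition rate, with fewer support vectors and bounded support vectors preferred) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, whose `SVC` class is a libsvm-based C-SVC implementation, and it runs on synthetic three-descriptor data standing in for logKow, ELUMO, and EHOMO. The function name `select_svm_params`, the grid values, and the choice to use the support vector counts only as tie-breakers are all illustrative assumptions, and the example uses two classes for simplicity.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC


def select_svm_params(X, y, c_grid, gamma_grid):
    """Pick (C, gamma) with the highest LOO recognition rate.

    Ties are broken by fewer support vectors, then by fewer bounded
    support vectors (an assumed reading of the abstract's criterion).
    """
    best_key, best_params = None, None
    for C in c_grid:
        for gamma in gamma_grid:
            model = SVC(kernel="rbf", C=C, gamma=gamma)
            # Each LOO fold scores 0 or 1; the mean is the recognition rate.
            loo_rate = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
            # Refit on all samples to count support vectors.
            model.fit(X, y)
            n_sv = model.support_.size
            # A support vector is bounded when its dual coefficient reaches C.
            at_bound = np.isclose(np.abs(model.dual_coef_), C)
            n_bsv = int(at_bound.any(axis=0).sum())
            key = (-loo_rate, n_sv, n_bsv)
            if best_key is None or key < best_key:
                best_key, best_params = key, (C, gamma)
    return best_params, -best_key[0]


# Synthetic stand-in data: two classes, three descriptors per compound.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(3.0, 1.0, (20, 3))])
y = np.array([0] * 20 + [1] * 20)

c_grid = [8, 64, 512]
gamma_grid = [0.128, 0.512, 2.048]
best_params, best_rate = select_svm_params(X, y, c_grid, gamma_grid)
print("selected (C, gamma):", best_params, "LOO rate:", best_rate)
```

Using the support vector counts only to break ties keeps the LOO rate as the primary criterion; a weighted combination of the three quantities would be another plausible interpretation.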