XGBoost

2019/1/30 posted in  MachineLearning

GBDT

2019/1/30 posted in  MachineLearning

Adaboost

AdaBoost算法原理简述

Adaptive Boosting is an iterative algorithm. In each round a new learner is trained on the training set and then used to predict every sample, in order to assess each sample's importance. In other words, the algorithm assigns a weight to every sample and, after each round, uses the trained learner to label/predict the samples: the more correctly a sample is predicted, the more its weight is lowered; otherwise its weight is raised. Samples with higher weights carry more influence in the next round of training, so the harder a sample is to classify, the more important it becomes during training.
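The weight-update loop described above can be sketched with decision stumps on a toy 1-D dataset. This is a minimal illustration of the idea, not the exact AdaBoost.M1 implementation; all names here (`adaboost_train`, `adaboost_predict`) are illustrative.

```python
import numpy as np

def adaboost_train(X, y, n_rounds=5):
    """X: (n,) 1-D features; y: labels in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)            # start with uniform sample weights
    stumps = []                        # list of (threshold, polarity, alpha)
    for _ in range(n_rounds):
        # pick the stump (threshold, polarity) with the lowest weighted error
        best = None
        for thr in X:
            for pol in (1, -1):
                pred = np.where(X >= thr, pol, -pol)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, thr, pol, pred)
        err, thr, pol, pred = best
        err = max(err, 1e-10)          # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this learner
        # raise the weights of misclassified samples, lower the rest
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()                   # renormalize to a distribution
        stumps.append((thr, pol, alpha))
    return stumps

def adaboost_predict(stumps, X):
    # final prediction: sign of the alpha-weighted vote of all stumps
    agg = np.zeros(len(X))
    for thr, pol, alpha in stumps:
        agg += alpha * np.where(X >= thr, pol, -pol)
    return np.sign(agg)
```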

Read more   2019/1/30 posted in  MachineLearning

Objective vs. Cost vs. Loss vs. Error Function

The function we want to minimize or maximize is called the objective function, or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function; these terms are often used synonymously. "Cost function" is more common in optimization problems, while "loss function" is more common in parameter estimation.
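One common way the terms are distinguished in practice: the loss is measured per sample, and the cost (the objective actually minimized) aggregates the loss over the dataset. A small sketch with squared error; the function names are illustrative, not standard API.

```python
import numpy as np

def squared_loss(y_true, y_pred):
    # loss for a single sample (or elementwise over arrays)
    return (y_true - y_pred) ** 2

def cost(y_true, y_pred):
    # objective to minimize: mean loss over the whole training set
    return np.mean(squared_loss(y_true, y_pred))
```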

Read more   2018/11/13 posted in  MachineLearning

Isolation Forest

scikit-learn returns: anomaly (-1) or not an anomaly (1).

Isolation Forest: take all columns of the original data together with the predicted label for each sample (whether it deviates; a deviation is -1) and the degree of deviation, i.e. the value computed by decision_function, which returns the anomaly score of each sample; the lower the score, the more likely the sample is an anomaly.

data, clf.predict(X_train), clf.decision_function(X_train)

df = pd.concat([pd.DataFrame(X_train),
                pd.Series(clf.predict(X_train)),
                pd.Series(clf.decision_function(X_train))], axis=1)

df.columns = ['a', 'b', 'c', 'd']
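A self-contained version of the snippet above, assuming toy 2-D data with one obvious outlier (the data and column names here are illustrative): fit an IsolationForest, then place the -1/1 predictions and the decision_function scores next to the original features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# 20 normal points near the origin plus one obvious outlier
X_train = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
                     [[10.0, 10.0]]])

clf = IsolationForest(random_state=42).fit(X_train)

df = pd.concat([pd.DataFrame(X_train, columns=['x1', 'x2']),
                pd.Series(clf.predict(X_train), name='label'),          # -1 = anomaly
                pd.Series(clf.decision_function(X_train), name='score') # lower = more anomalous
                ], axis=1)
```

The outlier row ends up with label -1 and the lowest score in the `score` column.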
2018/10/18 posted in  MachineLearning