First Experience with Kaggle

20140907

To try out liblinear, I signed up for a Kaggle account to play around with.

For the first problem I picked Titanic for practice: http://www.kaggle.com/c/titanic-gettingStarted

Approach:
Extract the sex column, then train an SVM on it with liblinear.
The corresponding MATLAB code:

%% read train.csv
%1 PassengerId
%2 Survived
%3 Pclass
%4 Name
%5 Sex
%6 Age
%7 SibSp
%8 Parch
%9 Ticket
%10 Fare
%11 Cabin
%12 Embarked
fid = fopen('train.csv');
traindata = textscan(fid,'%d %d %d %q %s %f %d %d %s %f %s %s','Delimiter', ',','HeaderLines',1);
fclose(fid);

%% get label and sex(0 as male, 1 as female)
label = traindata{2};
sex = strcmp('female', traindata{5});

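%% train a SVM model
% liblinear requires a sparse double instance matrix and a double label vector
% -s 3: L2-regularized L1-loss SVC (dual), -B 1: append a bias feature of 1
X = double(sparse(sex));
y = double(label);
model = train(y, X, '-s 3 -B 1');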

%% read the test.csv
%1 PassengerId
%2 Pclass
%3 Name
%4 Sex
%5 Age
%6 SibSp
%7 Parch
%8 Ticket
%9 Fare
%10 Cabin
%11 Embarked
fid = fopen('test.csv');
testdata = textscan(fid,'%d %d %q %s %f %d %d %s %f %s %s','Delimiter', ',','HeaderLines',1);
fclose(fid);

%% get sex
sex = strcmp('female', testdata{4});

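%% predict by SVM
% the test labels are unknown, so a dummy all-zero label vector is passed;
% the accuracy that predict() reports is therefore meaningless here
X = double(sparse(sex));
predict_labels = predict(zeros(size(X,1), 1), X, model);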

%% write the submission csv
ofile = fopen('sex.predict.csv', 'w');
fprintf(ofile, 'PassengerId,Survived\n');
for i = 1:numel(testdata{1})
    pid = testdata{1}(i);
    fprintf(ofile, '%d,%d\n', pid, predict_labels(i));
end
fclose(ofile);

The final score was 0.76555. In effect, the model simply predicts that every woman survived and every man died... a bit embarrassing.
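
With a single binary feature plus a bias term, a linear classifier can only give one answer per sex, so the "all women survive, all men die" rule is the most it can learn here. A minimal sketch of scoring that rule directly, without any SVM, is below. The toy vectors are hypothetical and not taken from train.csv; with the real data one would presumably use `strcmp('female', traindata{5})` and `double(traindata{2})` instead.

```matlab
% Hypothetical toy data, NOT taken from train.csv.
% sex: 1 = female, 0 = male; label: 1 = survived, 0 = died.
sex   = [1; 0; 1; 0; 0; 1];
label = [1; 0; 1; 0; 1; 0];

% The gender-only rule predicts survival exactly when the passenger is female.
rule_pred = double(sex);

% Fraction of passengers for which the rule matches the true label.
rule_acc = mean(rule_pred == label);
fprintf('gender-rule accuracy: %.4f\n', rule_acc);
```

If the SVM really has collapsed to this rule, `isequal(predict_labels, double(sex))` on the test set should return true.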
I tried adding other features (such as age and passenger class), but that did not improve the accuracy.
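
The post does not show how the extra features were built, so the following is only one possible sketch, not the original code. It assumes missing ages (which `textscan` reads as NaN with `%f`) are filled with the median, and that each column is scaled to roughly [0, 1]. The toy columns are hypothetical stand-ins for `traindata{3}`, `traindata{5}` and `traindata{6}`.

```matlab
% Hypothetical toy columns standing in for Pclass, Sex and Age.
% In the real data Pclass is read as int32 (%d), hence the double() cast.
pclass  = double([3; 1; 3; 1; 2]);
sex_str = {'male'; 'female'; 'female'; 'female'; 'male'};
age     = [22; 38; NaN; 35; NaN];   % NaN marks an empty Age field

% Fill missing ages with the median of the known ages (one simple choice).
known = ~isnan(age);
age(~known) = median(age(known));

% Scale every feature to roughly [0, 1] so no single column dominates.
female   = double(strcmp('female', sex_str));
pclass_s = (pclass - 1) / 2;        % classes 1..3 -> 0, 0.5, 1
age_s    = age / max(age);

% liblinear's MATLAB interface expects a sparse double matrix.
X = sparse([female, pclass_s, age_s]);

% Training would then look the same as before (requires liblinear):
% model = train(y, X, '-s 3 -B 1');
```

Whether the scaling or the median fill helps is an open question; the post only reports that the extra features did not raise the score.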

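I have not updated the site for a long time and found quite a few new comments and questions. I cannot reply to them one by one, so if you still have a question, please leave a comment again :) 2016.03.29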