Matlab Deep Learning Study Notes
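Lately I have been fascinated by deep learning, so it is time to use the all-purpose Matlab to put my DL studies into practice. I chose Matlab because it is truly powerful! I have used it since my university days, and it is the programming tool I know best. On top of that, thanks to the steady work of the engineers at MathWorks, the R2017b release (out in the second half of this year) adds many new features, especially for DL: a few simple lines of code are enough to implement complex functionality. With that in mind, this post walks through several examples of learning Deep Learning in Matlab: 1. handwritten digit recognition; 2. building a network to classify CIFAR10; 3. building a Resnet. Make sure Matlab R2017b or later is installed, because the examples use layers and training options introduced in that release.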
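Handwritten Digit Recognition
This experiment uses a CNN to classify digits.
The steps below show how to:
- load the image data
- design the network architecture
- set the training options
- train the network
- predict the classes of new data
Load the image data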
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos',...
    'nndatasets','DigitDataset');
% imageDatastore builds an ImageDatastore object and labels each image
% automatically from the name of the folder that contains it
digitData = imageDatastore(digitDatasetPath,...
    'IncludeSubfolders',true,'LabelSource','foldernames');
% Display some of the images in the datastore.
figure;
perm = randperm(10000,25);
for i = 1:25
    subplot(5,5,i);
    imshow(digitData.Files{perm(i)});
end
Below is a sample of the handwritten digit data:
Create the training and validation sets
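trainNumFiles = 750;
[trainDigitData,valDigitData] = splitEachLabel(digitData,trainNumFiles,'randomize');
% Each class has 1000 images: 750 of them are picked at random for training
% and the remaining 250 are used for validation. The count 750 can also be
% replaced by a proportion, 0.75.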
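To make the per-label split concrete, here is a minimal sketch of the idea in base Matlab. It is not how splitEachLabel is actually implemented. The label vector is synthetic (10 classes with 1000 samples each, matching the counts given above), and the variable names are made up for illustration.

```matlab
% Sketch of a per-label randomized split (illustration only, not the
% toolbox implementation). Assumption: 10 classes, 1000 samples per class.
rng(0);                                          % fix the seed so the split is repeatable
labels = reshape(repmat(0:9, 1000, 1), [], 1);   % hypothetical 10000-by-1 label vector
numTrainPerLabel = 750;

isTrain = false(size(labels));
classes = unique(labels);
for k = 1:numel(classes)
    idx = find(labels == classes(k));            % all samples of this class
    idx = idx(randperm(numel(idx)));             % shuffle within the class
    isTrain(idx(1:numTrainPerLabel)) = true;     % first 750 go to training
end

trainIdx = find(isTrain);                        % indices of the training samples
valIdx   = find(~isTrain);                       % indices of the validation samples
fprintf('train: %d, validation: %d\n', numel(trainIdx), numel(valIdx));
```

Because the shuffle happens inside each class, every class contributes the same number of training samples, so the class balance of the full set is preserved in both subsets.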
Note the layer types that Matlab supports, which include the following:
| Layer Type | Function |
| --- | --- |
| Image input layer | imageInputLayer |
| Sequence input layer | sequenceInputLayer |
| 2-D convolutional layer | convolution2dLayer |
| 2-D transposed convolutional layer | transposedConv2dLayer |
| Fully connected layer | fullyConnectedLayer |
| Long short-term memory (LSTM) layer | lstmLayer |
| Rectified linear unit (ReLU) layer | reluLayer |
| Leaky rectified linear unit (ReLU) layer | leakyReluLayer |
| Clipped rectified linear unit (ReLU) layer | clippedReluLayer |
| Batch normalization layer | batchNormalizationLayer |
| Channel-wise local response normalization (LRN) layer | crossChannelNormalizationLayer |
| Dropout layer | dropoutLayer |
| Addition layer | additionLayer |
| Depth concatenation layer | depthConcatenationLayer |
| Average pooling layer | averagePooling2dLayer |
| Max pooling layer | maxPooling2dLayer |
| Max unpooling layer | maxUnpooling2dLayer |
| Softmax layer | softmaxLayer |
| Classification layer | classificationLayer |
| Regression layer | regressionLayer |
Define your own network architecture
%% Define Network Architecture
% Define the convolutional neural network architecture.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,16,'Padding',1)
    batchNormalizationLayer()
    reluLayer()
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,32,'Padding',1)
    batchNormalizationLayer()
    reluLayer()
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,64,'Padding',1)
    batchNormalizationLayer()
    reluLayer()
    fullyConnectedLayer(10)
    softmaxLayer()
    classificationLayer()
    ];
The resulting network structure and its settings are listed below:
Image Input 28x28x1 images with 'zerocenter' normalization
Convolution 16 3x3 convolutions with stride [1 1] and padding [1 1 1 1]
Batch Normalization Batch normalization
ReLU ReLU
Max Pooling 2x2 max pooling with stride [2 2] and padding [0 0 0 0]
Convolution 32 3x3 convolutions with stride [1 1] and padding [1 1 1 1]
Batch Normalization Batch normalization
ReLU ReLU
Max Pooling 2x2 max pooling with stride [2 2] and padding [0 0 0 0]
Convolution 64 3x3 convolutions with stride [1 1] and padding [1 1 1 1]
Batch Normalization Batch normalization
ReLU ReLU
Fully Connected 10 fully connected layer
Softmax softmax
Classification Output crossentropyex
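As a sanity check on the summary above, the feature-map sizes and the number of learnable parameters can be worked out by hand. The sketch below does this in base Matlab using the usual output-size formula floor((in + 2*pad - k)/stride) + 1. The helper names outSize and convParams are made up for illustration, and the count assumes each batch normalization layer learns one offset and one scale per channel.

```matlab
% Output size of a conv/pool layer along one spatial dimension.
outSize = @(in, k, pad, stride) floor((in + 2*pad - k) / stride) + 1;

s = 28;                       % input is 28x28x1
s = outSize(s, 3, 1, 1);      % first 3x3 convolution, padding 1  -> 28
s = outSize(s, 2, 0, 2);      % first 2x2 max pooling, stride 2   -> 14
s = outSize(s, 3, 1, 1);      % second 3x3 convolution            -> 14
s = outSize(s, 2, 0, 2);      % second 2x2 max pooling            -> 7
s = outSize(s, 3, 1, 1);      % third 3x3 convolution             -> 7

fcInputs = s * s * 64;        % flattened 7x7x64 input of the fully connected layer

% Learnable parameters: weights plus one bias per output channel.
convParams = @(k, cin, cout) k*k*cin*cout + cout;
nConv = convParams(3,1,16) + convParams(3,16,32) + convParams(3,32,64);
nBN   = 2 * (16 + 32 + 64);   % assumed: one offset and one scale per channel
nFC   = fcInputs*10 + 10;     % 10 output classes
total = nConv + nBN + nFC;

fprintf('final feature map: %dx%dx64, learnable parameters: %d\n', s, s, total);
```

Most of the parameters sit in the fully connected layer, which is typical for a small network that flattens its last feature map directly.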
Set the training options
options = trainingOptions('sgdm',...
    'MaxEpochs',3, ...                 % maximum number of training epochs
    'ValidationData',valDigitData,...  % validation set
    'ValidationFrequency',30,...       % validate every 30 iterations
    'Verbose',false,...
    'Plots','training-progress');
Start training
net = trainNetwork(trainDigitData,layers,options);
Test on new data
predictedLabels = classify(net,valDigitData);
valLabels = valDigitData.Labels;
accuracy = sum(predictedLabels == valLabels)/numel(valLabels)
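A single accuracy number hides which classes get confused with each other. The following sketch shows one way to build a confusion matrix and per-class accuracy in base Matlab. The label vectors are hypothetical toy data (three classes coded as 1..3), not output from the network above.

```matlab
% Hypothetical true and predicted labels for a 3-class toy problem.
trueLabels = [1 1 1 2 2 2 3 3 3 3]';
predLabels = [1 1 2 2 2 3 3 3 3 1]';
numClasses = 3;

% confMat(i,j) counts the samples of true class i that were predicted as class j.
confMat = accumarray([trueLabels predLabels], 1, [numClasses numClasses]);

overallAcc  = sum(diag(confMat)) / sum(confMat(:));   % same quantity as the accuracy above
perClassAcc = diag(confMat) ./ sum(confMat, 2);       % fraction correct within each true class

disp(confMat);
fprintf('overall accuracy: %.2f\n', overallAcc);
```

For the digit network, the categorical labels would first have to be mapped to integer indices before calling accumarray.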
Inspect a layer's parameters
For example, to view the weights of layer 2, run the following commands:
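% Rescale the weights to [0,1], enlarge each filter, and tile them in one figure
montage(imresize(mat2gray(net.Layers(2).Weights),[128 128]));
set(gcf,'color',[1 1 1]);            % white figure background
frame = getframe(gcf);               % capture the current figure
img = frame.cdata;                   % named img to avoid shadowing the built-in image function
[img,map] = rgb2ind(img,256);
imwrite(img,map,'weight-layer2.png');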
The resulting image is shown below:
Now look at the parameters of layer 10:
[~,~,numChannels,~] = size(net.Layers(10).Weights);
name = 'weight.gif';
dt = 0.4;                            % delay between GIF frames, in seconds
for i = 1:numChannels
    % Show all filters for input channel i
    montage(imresize(mat2gray(net.Layers(10).Weights(:,:,i,:)),[128 128]));
    set(gcf,'color',[1 1 1]);        % white figure background
    title(['Layer(10), Channel: ',num2str(i)]);
    axis normal
    truesize
    % Create the GIF
    frame = getframe(gcf);           % capture the current figure
    [img,map] = rgb2ind(frame.cdata,256);
    if i == 1
        imwrite(img,map,name,'gif','DelayTime',dt);
    else
        imwrite(img,map,name,'gif','WriteMode','append','DelayTime',dt);
    end
end
Build a network to classify CIFAR10
CIFAR10 and CIFAR100 are subsets of the 80 million tiny images dataset, collected by Geoffrey Hinton's students Alex Krizhevsky and Vinod Nair.
CIFAR10
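CIFAR10 consists of 60000 32*32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains 1000 randomly selected images from each class. The training batches contain the remaining images.
Each batch is stored as a 10000*3072 array, where 3072=32*32*3. Each row holds one image: the first 1024 values are the R channel, the next 1024 the G channel, and the last 1024 the B channel, and within each channel the pixels are stored in row-major order. Suppose the raw data is data and we want to rearrange it into the layout we need. First transpose it, then use reshape to rebuild the images. Finally (optional), swap the first two dimensions of each image: Matlab fills arrays column by column, so without this step every image appears transposed when displayed.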
XBatch = data';
XBatch = reshape(XBatch, 32,32,3,[]);
XBatch = permute(XBatch, [2 1 3 4]);
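The three lines above can be checked on synthetic data. The sketch below builds a fake image with distinct pixel values, flattens it into the row layout described in the text (channel by channel, row-major within each channel), and confirms that transpose, reshape, and permute recover the original image. The variable names img, row, and withoutPermute are made up for this check, and the data is double rather than the uint8 of the real dataset.

```matlab
% One synthetic 32x32x3 "image" whose pixel values are all distinct.
img = reshape(1:32*32*3, 32, 32, 3);

% Flatten it as described: R, then G, then B, each channel in row-major order.
row = zeros(1, 3072);
for ch = 1:3
    plane = img(:, :, ch)';                    % transpose so that (:) walks along image rows
    row((ch-1)*1024 + (1:1024)) = plane(:)';
end
data = [row; row];                             % pretend batch of 2 identical images (2-by-3072)

% The transformation from the text.
XBatch = data';
XBatch = reshape(XBatch, 32, 32, 3, []);
withoutPermute = XBatch;                       % kept only to show what skipping permute does
XBatch = permute(XBatch, [2 1 3 4]);

fprintf('recovered original image: %d\n', isequal(XBatch(:,:,:,1), img));
fprintf('transposed without permute: %d\n', isequal(withoutPermute(:,:,1,1), img(:,:,1)'));
```

The second check shows why the permute step matters for display: without it, each channel is the transpose of the original.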
Below is a sample of the cifar10 data.
There are 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
Just run it
Next we run the code to train our network. Without further ado: I have put the code on Github, and you are welcome to take a look.
1 'imageinput' Image Input 28x28x1 images with 'zerocenter' normalization
2 'conv_1' Convolution 16 3x3x1 convolutions with stride [1 1] and padding [1 1 1 1]
3 'batchnorm_1' Batch Normalization Batch normalization with 16 channels
4 'relu_1' ReLU ReLU
5 'maxpool_1' Max Pooling 2x2 max pooling with stride [2 2] and padding [0 0 0 0]
6 'conv_2' Convolution 32 3x3x16 convolutions with stride [1 1] and padding [1 1 1 1]
7 'batchnorm_2' Batch Normalization Batch normalization with 32 channels
8 'relu_2' ReLU ReLU
9 'maxpool_2' Max Pooling 2x2 max pooling with stride [2 2] and padding [0 0 0 0]
10 'conv_3' Convolution 64 3x3x32 convolutions with stride [1 1] and padding [1 1 1 1]
11 'batchnorm_3' Batch Normalization Batch normalization with 64 channels
12 'relu_3' ReLU ReLU
13 'fc' Fully Connected 10 fully connected layer
14 'softmax' Softmax softmax
15 'classoutput' Classification Output crossentropyex with '0', '1', and 8 other classes
Below is the output of the training process:
| Epoch | Iteration | Time Elapsed (seconds) | Mini-batch Loss | Mini-batch Accuracy | Base Learning Rate |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 0.06 | 2.3026 | 8.59% | 0.0020 |
| 1 | 50 | 1.27 | 2.3026 | 14.06% | 0.0020 |
| 1 | 100 | 2.52 | 2.3024 | 7.81% | 0.0020 |
| 1 | 150 | 3.73 | 2.2999 | 20.31% | 0.0020 |
| 1 | 200 | 5.01 | 2.2740 | 15.63% | 0.0020 |
| 1 | 250 | 6.28 | 2.1194 | 21.09% | 0.0020 |
| 1 | 300 | 7.58 | 1.9100 | 23.44% | 0.0020 |
| 1 | 350 | 8.86 | 1.8892 | 28.13% | 0.0020 |
| 2 | 400 | 10.08 | 1.7490 | 29.69% | 0.0020 |
| 2 | 450 | 11.32 | 1.8377 | 31.25% | 0.0020 |
| 2 | 500 | 12.57 | 1.6073 | 39.84% | 0.0020 |
| … | … | … | … | … | … |
| 20 | 7650 | 407.74 | 0.2858 | 93.75% | 2.00e-05 |
| 20 | 7700 | 409.06 | 0.3127 | 89.84% | 2.00e-05 |
| 20 | 7750 | 410.38 | 0.3254 | 87.50% | 2.00e-05 |
| 20 | 7800 | 411.64 | 0.2456 | 92.19% | 2.00e-05 |
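Finally we test the model: the accuracy is about 76%. During training, however, the mini-batch accuracy had already exceeded 90%, which indicates that the model has overfit. This is clearly not the result we want; further parameter tuning will be added here later.
Visualize a layer's parameters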
% Extract the first convolutional layer weights
w = cifar10Net.Layers(2).Weights;
% rescale and resize the weights for better visualization
w = mat2gray(w);
w = imresize(w, [100 100]);
figure
montage(w)
name='cifar10-weight-layer2';
set(gcf,'color',[1 1 1]);
frame=getframe(gcf); % get the frame
image=frame.cdata;
[image,map] = rgb2ind(image,256);
imwrite(image,map,[name,'.png']);
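Build a Resnet
Next, to see just how powerful this DL toolbox is, I plan to build a Resnet entirely by hand. For convenience I built a Resnet34 (stay tuned for deeper networks). Here is its prototxt, and the structure of resnet34 can be viewed with a network visualization tool. Below is part of Resnet34 (it is too long to capture in a single screenshot).
Define each layer and the connections
We build the part of the network from pool1 to res2a as an example.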
layers_example = [
    % pool1 - res2a
    maxPooling2dLayer(3,'Stride',2,'Name','pool1')
    % branch2a
    convolution2dLayer(3,64,'Stride',1,'Padding',1,'Name','res2a_branch2a')
    batchNormalizationLayer('Name','bn2a_branch2a')
    reluLayer('Name','res2a_branch2a_relu')
    % branch2b
    convolution2dLayer(3,64,'Stride',1,'Padding',1,'Name','res2a_branch2b')
    batchNormalizationLayer('Name','bn2a_branch2b')
    % add together
    additionLayer(2,'Name','res2a')
    reluLayer('Name','res2a_relu')
    ];
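The code above only completes one small branch of the network. Next we need to connect the res2a_branch1 part, which calls for some of the DAG functions: add the new layers and create the new connections, as follows.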
lgraph = layerGraph(layers_example);
figure
plot(lgraph)
%% add some connections (shortcut)
layers_2a=[
convolution2dLayer(1,64,'Stride', 1,'Padding', 0,'Name','res2a_branch1')
batchNormalizationLayer('Name','bn2a_branch1')
];
lgraph = addLayers(lgraph,layers_2a);
lgraph = connectLayers(lgraph,'pool1','res2a_branch1');
lgraph = connectLayers(lgraph,'bn2a_branch1','res2a/in2');
% show net
plot(lgraph)
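An addition layer can only sum inputs of the same size, so the shortcut branch and the main branch must produce matching feature maps. The sketch below checks this by hand for the layers defined above, using the usual output-size formula. The input size 56 is a hypothetical value chosen for illustration, and outSize is a made-up helper name.

```matlab
% Output size of a convolution along one spatial dimension.
outSize = @(in, k, pad, stride) floor((in + 2*pad - k) / stride) + 1;

inSize = 56;   % hypothetical spatial size of the pool1 output

% Main branch: res2a_branch2a then res2a_branch2b (both 3x3, stride 1, padding 1).
mainSize     = outSize(outSize(inSize, 3, 1, 1), 3, 1, 1);
mainChannels = 64;

% Shortcut branch: res2a_branch1 (1x1, stride 1, padding 0).
shortcutSize     = outSize(inSize, 1, 0, 1);
shortcutChannels = 64;

canAdd = (mainSize == shortcutSize) && (mainChannels == shortcutChannels);
fprintf('main: %dx%dx%d, shortcut: %dx%dx%d, can add: %d\n', ...
    mainSize, mainSize, mainChannels, shortcutSize, shortcutSize, shortcutChannels, canAdd);
```

With stride 1 on both branches the spatial size is unchanged, so the check passes for any input size. A block that downsamples in the main branch would need the same stride on its shortcut convolution.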
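The remaining parts are built the same way. After a series of repetitive steps we end up with this not-so-deep Resnet34; the full code is on my Github.
Some basic questions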
- Basic format of the parameters
- What is SGD? See the blog post written by a friend.
- What is an epoch? Training usually uses stochastic gradient descent (SGD), where each iteration updates the model with one batch. One epoch is complete when the number of iterations times the batch size equals the number of training samples, that is, after one full pass over the training data.
- Why use BN? Batch normalization layers normalize the activations and gradients propagating through a network, making network training an easier optimization problem. Use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers, to speed up network training and reduce the sensitivity to network initialization.
- What does ReLU do?
- What does the max pooling layer do? Convolutional layers (with activation functions) are sometimes followed by a down-sampling operation that reduces the spatial size of the feature map and removes redundant spatial information. Down-sampling makes it possible to increase the number of filters in deeper convolutional layers without increasing the required amount of computation per layer. One way of down-sampling is using max pooling. The max pooling layer returns the maximum values of rectangular regions of inputs.
- How is the scale layer in Resnet defined, and what is it for?
- Why are residuals easier to learn in Resnet?
- add more
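The epoch definition can be checked against the CIFAR10 training log. The sketch below assumes a mini-batch size of 128, which the post never states; it is inferred from logged accuracies such as 8.59%, which is about 11/128. It also assumes that a final incomplete mini-batch in each epoch is dropped.

```matlab
numTrain      = 50000;   % CIFAR10 training images (from the text)
miniBatchSize = 128;     % assumption: inferred from the log, not stated in the post
maxEpochs     = 20;      % last epoch shown in the training log

% Iterations in one pass over the data, dropping a final incomplete batch.
itersPerEpoch = floor(numTrain / miniBatchSize);
totalIters    = itersPerEpoch * maxEpochs;

% Epoch that a given iteration belongs to.
epochOfIter = @(it) ceil(it / itersPerEpoch);

fprintf('iterations per epoch: %d, total iterations: %d\n', itersPerEpoch, totalIters);
fprintf('iteration 350 is in epoch %d, iteration 400 is in epoch %d\n', ...
    epochOfIter(350), epochOfIter(400));
```

Under these assumptions the numbers agree with the log: training ends at iteration 7800 in epoch 20, and the switch from epoch 1 to epoch 2 falls between iterations 350 and 400.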