Super Simple! How to Start Building Your First Neural Network - 云科研 - 广西壮族自治区亚热带作物研究所

Original text: 5,505 characters. Estimated reading time: 11 minutes or more.

Image source: Unsplash/Dewang Gupta

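This article shows how to get started building a first neural network. It covers rules of thumb for choices such as the number of hidden layers, the number of nodes, and the activations, and shows how they are applied in TensorFlow 2.

Deep learning offers a wide variety of models, and with them we can build highly accurate predictive models. However, because there are so many settings and parameters to choose from, finding a starting point can be difficult.

This article helps you find a starting point for building a neural network, using a multilayer perceptron (MLP) as the example. Although the example is an MLP, most of the rules apply to neural networks in general.

The overall approach is to use the rules of thumb to build a first model. If the first model performs reasonably (reaches a minimum acceptable accuracy), tune and optimize it. Otherwise, it is better to go back to the data and the problem formulation, or to try a different method.

This article covers:

· Rules of thumb for building a neural network.

· Implementation code for binary classification in TensorFlow 2.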

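Neural networks

Neural networks have made great progress through architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Over time, several subtypes of each have been developed, and each advance has improved the predictive power of these models.

At the same time, finding a starting point for building a model has become harder.

Every model is different, with its own characteristics and capabilities, so choosing a suitable one can feel like picking the right product from a forest of billboards.

In what follows, we use rules of thumb to build a first model and cut through this noise.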

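Rules of thumb

Among the many kinds of neural networks, the multilayer perceptron is the stepping stone into deep learning. It is therefore a good starting point when learning about, or building, a deep learning model.

The following are rules of thumb for building an MLP. Most of them also apply to other deep learning models.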

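1. Number of layers: start with two hidden layers (not counting the last layer).

2. Number of nodes (size) of the intermediate layers: use a power of 2, such as 4, 8, 16, 32, and so on. The first layer should be about half the number of input features, and each subsequent layer half the size of the previous one.

3. Number of nodes (size) of the output layer for classification: for binary classification, the size is 1. For a multi-class classifier, the size equals the number of classes.

4. Size of the output layer for regression: for a single response, the size is 1. For multi-response regression, the size equals the number of responses.

5. Activation for the intermediate layers: use relu.

6. Activation for the output layer: use sigmoid for binary classification, softmax for a multi-class classifier, and linear for regression. For autoencoders, use linear if the input data is continuous; use sigmoid or softmax if the input is binary or multi-level categorical.

7. Dropout layers: add Dropout after every layer except the input layer (if the input layer is defined separately). Set the dropout rate to 0.5. A rate above 0.5 is counterproductive. If you feel that 0.5 is regularizing too many nodes, increase the size of the layer rather than lowering the rate below 0.5. I prefer not to apply any Dropout to the input layer, but if you think it is necessary, keep the rate below 0.2.

8. Data preprocessing: assume the predictors X are numeric and all categorical columns have been converted to one-hot encoding. Scale the data with MinMaxScaler before training. If MinMaxScaler does not work well, use StandardScaler from the same library (sklearn.preprocessing). The target y is not scaled.

9. Split the data into training, validation, and test sets: use train_test_split from sklearn.model_selection. See the example below.

10. Class weights: if the data is imbalanced, set class weights in model.fit to balance the loss. For a binary classifier, the weights should be {0: number of 1s / size of the data, 1: number of 0s / size of the data}. For extremely imbalanced data (rare events), class weights may not work, so add them with care.

11. Optimizer: use adam with its default learning rate.

12. Loss for classification: for binary classification, use binary_crossentropy. For multi-class classification, use categorical_crossentropy if the labels are one-hot encoded and sparse_categorical_crossentropy if the labels are integers.

13. Loss for regression: use mean squared error (mse).

14. Metrics for classification: use accuracy, which shows the percentage of correct classifications. For imbalanced data, also include tf.keras.metrics.Recall() and tf.keras.metrics.FalsePositives().

15. Metric for regression: use tf.keras.metrics.RootMeanSquaredError().

16. Epochs: start with 20 and check whether training shows the loss decreasing or the accuracy improving. If there is no sign of progress after 20 epochs, move on to a different approach. If there is some progress, increase the number of epochs to 100.

17. Batch size: choose a power of 2. For imbalanced data, use a larger value such as 128; otherwise start with 16.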
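
As a rough illustration of rules 1 and 2, the hidden layer sizes can be worked out from the number of input features. The helper name hidden_layer_sizes, the min_size floor, and the choice to round down to a power of 2 are assumptions made for this sketch, not part of the rules themselves.

```python
def hidden_layer_sizes(n_features, n_hidden=2, min_size=4):
    """Suggest hidden layer sizes following rules 1 and 2 (a sketch)."""
    # Aim for about half the number of features, but never below min_size.
    target = max(n_features // 2, min_size)
    # Largest power of 2 that does not exceed the target.
    size = 1
    while size * 2 <= target:
        size *= 2
    sizes = []
    for _ in range(n_hidden):
        sizes.append(max(size, min_size))
        size //= 2  # each layer is half the size of the previous one
    return sizes


# A hypothetical dataset with 70 input features.
print(hidden_layer_sizes(70))   # [32, 16]
print(hidden_layer_sizes(200))  # [64, 32]
```

The result is only a starting point; the layer sizes are among the first things to tune once the initial model shows some promise.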

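Some additional rules of thumb for advanced practitioners

1. Oscillating loss: if the loss oscillates during training, there is a convergence problem. Try reducing the learning rate and/or changing the batch size.

2. Oversampling and undersampling: if the data is imbalanced, use SMOTE from imblearn.over_sampling.

3. Curve shifting: if you need a shifted prediction, such as an early prediction, use curve shifting. An implementation, curve_shift, is shown below.

4. Custom metrics: the false positive rate is an important metric in imbalanced binary classification. You can build it, and other custom metrics, in the way shown by the class FalsePositiveRate() implementation below.

5. selu activation: selu is regarded by some as the best of the current activation functions. I do not entirely agree, but if you want to use selu, use kernel_initializer='lecun_normal' together with AlphaDropout at a rate of 0.1, that is, AlphaDropout(0.1). An example implementation is given below.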

Multilayer perceptron (MLP) example in TensorFlow 2

This implementation puts the rules of thumb above into practice.

The implementation is done in TensorFlow 2. I strongly recommend that everyone move to TensorFlow 2: it has the simplicity of Keras and significantly improves computational efficiency.

The goal here is not to find the best model but to learn how to implement a neural network. No steps are skipped for the sake of brevity; instead, the steps are given in detail so that readers can apply them directly.

Libraries

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pylab import rcParams
from collections import Counter

import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.models import Model, load_model, Sequential
from tensorflow.keras.layers import Input, Dense, Dropout, AlphaDropout
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard

from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, precision_recall_curve
from sklearn.metrics import recall_score, classification_report, auc, roc_curve
from sklearn.metrics import precision_recall_fscore_support, f1_score

from numpy.random import seed
seed(1)

SEED = 123  # used to help randomly select the data points
DATA_SPLIT_PCT = 0.2

rcParams['figure.figsize'] = 8, 6
LABELS = ["Normal", "Break"]

To check that the correct TensorFlow version is in use, run: tf.__version__

Reading and preparing the data

Download the data here: https://docs.google.com/forms/d/e/1FAIpQLSdyUk3lfDl7I5KYK_pw285LCApc-_RcoC0Tf9cnDnZ_TWzPAw/viewform

df = pd.read_csv("data/processminer-rare-event-mts - data.csv")
df.head(n=5)  # visualize the data.

Convert categorical columns to one-hot encoding

hotencoding1 = pd.get_dummies(df['x28'])  # Grade&Bwt
hotencoding1 = hotencoding1.add_prefix('grade_')
hotencoding2 = pd.get_dummies(df['x61'])  # EventPress
hotencoding2 = hotencoding2.add_prefix('eventpress_')

# Replace the original categorical columns with their one-hot columns.
df = df.drop(['x28', 'x61'], axis=1)
df = pd.concat([df, hotencoding1, hotencoding2], axis=1)

Curve shifting

This is time series data in which the event (y = 1) must be predicted ahead of time. Consecutive rows are two minutes apart. We shift the labels in column y up by two rows so that the prediction is made up to four minutes in advance.

sign = lambda x: (1, -1)[x < 0]

def curve_shift(df, shift_by):
    '''
    This function will shift the binary labels in a dataframe.
    The curve shift will be with respect to the 1s.
    For example, if shift is -2, the following process
    will happen: if row n is labeled as 1, then
    - Make row (n+shift_by):(n+shift_by-1) = 1.
    - Remove row n.
    i.e. the labels will be shifted up to 2 rows up.

    Inputs:
    df       A pandas dataframe with a binary labeled column.
             This labeled column should be named as 'y'.
    shift_by An integer denoting the number of rows to shift.

    Output
    df       A dataframe with the binary labels shifted by shift.
    '''
    vector = df['y'].copy()
    for s in range(abs(shift_by)):
        tmp = vector.shift(sign(shift_by))
        tmp = tmp.fillna(0)
        vector += tmp  # accumulate so every row within the shift window is marked
    labelcol = 'y'
    # Add vector to the df
    df.insert(loc=0, column=labelcol+'tmp', value=vector)
    # Remove the rows with labelcol == 1.
    df = df.drop(df[df[labelcol] == 1].index)
    # Drop labelcol and rename the tmp col as labelcol
    df = df.drop(labelcol, axis=1)
    df = df.rename(columns={labelcol+'tmp': labelcol})
    # Make the labelcol binary
    df.loc[df[labelcol] > 0, labelcol] = 1
    return df

Shift up by two rows

df = curve_shift(df, shift_by = -2)
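
To make the effect of the shift concrete, here is a minimal pure-Python sketch of the same idea on a short list of labels. The function shift_labels_up is a hypothetical stand-in written for illustration; it is not the curve_shift function above and it ignores the other columns of the dataframe.

```python
def shift_labels_up(labels, rows=2):
    """Return (original_index, new_label) pairs after shifting 1s up by `rows`.

    For each row n labeled 1, rows n-rows .. n-1 become 1 and row n is dropped.
    """
    new_labels = list(labels)
    for n, label in enumerate(labels):
        if label == 1:
            for k in range(max(n - rows, 0), n):
                new_labels[k] = 1
    # Drop the rows that were originally labeled 1.
    return [(i, new_labels[i]) for i, label in enumerate(labels) if label != 1]


labels = [0, 0, 0, 1, 0, 0]
print(shift_labels_up(labels, rows=2))
# [(0, 0), (1, 1), (2, 1), (4, 0), (5, 0)]
```

With rows two minutes apart, the two rows before the event now carry the label 1, so a model trained on them learns to flag the event up to four minutes ahead.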

The time column is no longer needed from here on, so drop it.

df = df.drop(['time'], axis=1)

Split the data into training, validation, and test sets.

df_train, df_test = train_test_split(df, test_size=DATA_SPLIT_PCT, random_state=SEED)
df_train, df_valid = train_test_split(df_train, test_size=DATA_SPLIT_PCT, random_state=SEED)
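
Because the second split is applied to what remains after the first, the three sets are not 60/20/20. A quick check of the approximate fractions, assuming DATA_SPLIT_PCT = 0.2 as defined in the setup code (the exact row counts also depend on rounding inside train_test_split):

```python
DATA_SPLIT_PCT = 0.2  # same value as in the setup code

test_frac = DATA_SPLIT_PCT
valid_frac = (1 - DATA_SPLIT_PCT) * DATA_SPLIT_PCT
train_frac = (1 - DATA_SPLIT_PCT) * (1 - DATA_SPLIT_PCT)

print(f"train={train_frac:.2f}, valid={valid_frac:.2f}, test={test_frac:.2f}")
# train=0.64, valid=0.16, test=0.20
```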

Separate X and y

x_train = df_train.drop(['y'], axis=1)
y_train = df_train.y.values

x_valid = df_valid.drop(['y'], axis=1)
y_valid = df_valid.y.values

x_test = df_test.drop(['y'], axis=1)
y_test = df_test.y

Scaling the data

# Fit the scaler on the training data only, then apply it to all three sets.
scaler = MinMaxScaler().fit(x_train)
# scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_valid_scaled = scaler.transform(x_valid)
x_test_scaled = scaler.transform(x_test)
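
The scaler is fitted on the training data only and then applied unchanged to the validation and test data. The following pure-Python sketch shows the arithmetic for a single feature. The function names are made up for illustration; MinMaxScaler itself handles many columns and edge cases such as constant features.

```python
def fit_min_max(train_values):
    """Learn the minimum and maximum from the training values only."""
    return min(train_values), max(train_values)


def transform_min_max(values, lo, hi):
    """Scale values with the training minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]


train = [10.0, 20.0, 30.0]
valid = [15.0, 35.0]

lo, hi = fit_min_max(train)
print(transform_min_max(train, lo, hi))  # [0.0, 0.5, 1.0]
print(transform_min_max(valid, lo, hi))  # [0.25, 1.25]
```

Values outside the training range, such as 35.0 here, land outside [0, 1]. That is expected, and it is the price of not letting validation or test data influence the scaling.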

MLP models

Custom metric: FalsePositiveRate()

We develop a FalsePositiveRate() metric that is used in all the models below.

class FalsePositiveRate(tf.keras.metrics.Metric):
    def __init__(self, name='false_positive_rate', **kwargs):
        super().__init__(name=name, **kwargs)
        self.negatives = self.add_weight(name='negatives', initializer='zeros')
        self.false_positives = self.add_weight(name='false_positives', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        '''
        Arguments:
        y_true  The actual y. Passed by default to Metric classes.
        y_pred  The predicted y. Passed by default to Metric classes.
        '''
        # Compute the number of negatives.
        y_true = tf.cast(y_true, tf.bool)
        negatives = tf.reduce_sum(tf.cast(tf.equal(y_true, False), self.dtype))
        self.negatives.assign_add(negatives)

        # Compute the number of false positives.
        # Using default threshold of 0.5 to call a prediction as positive labeled.
        y_pred = tf.greater_equal(y_pred, 0.5)
        false_positive_values = tf.logical_and(tf.equal(y_true, False), tf.equal(y_pred, True))
        false_positive_values = tf.cast(false_positive_values, self.dtype)
        if sample_weight is not None:
            # Weight each false positive (assumes one weight per sample).
            sample_weight = tf.cast(sample_weight, self.dtype)
            sample_weight = tf.reshape(sample_weight, tf.shape(false_positive_values))
            false_positive_values = tf.multiply(false_positive_values, sample_weight)

        false_positives = tf.reduce_sum(false_positive_values)
        self.false_positives.assign_add(false_positives)

    def result(self):
        return tf.divide(self.false_positives, self.negatives)
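
As a sanity check on what the metric computes, the same quantity can be calculated by hand for a small batch: the number of negatives predicted as positive, divided by the number of negatives. This sketch uses plain Python and made-up labels and scores.

```python
def false_positive_rate(y_true, y_score, threshold=0.5):
    """FPR = false positives / actual negatives, at the given threshold."""
    negatives = sum(1 for t in y_true if t == 0)
    false_positives = sum(
        1 for t, s in zip(y_true, y_score) if t == 0 and s >= threshold
    )
    return false_positives / negatives if negatives else 0.0


y_true = [0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.7, 0.4, 0.5, 0.9, 0.2]
print(false_positive_rate(y_true, y_score))  # 0.5
```

Two of the four negatives (scores 0.7 and 0.5) reach the 0.5 threshold, which gives a rate of 0.5.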

Utility functions for plotting performance

def plot_loss(model_history):
    train_loss = [value for key, value in model_history.items() if 'loss' in key.lower()][0]
    valid_loss = [value for key, value in model_history.items() if 'loss' in key.lower()][1]

    fig, ax1 = plt.subplots()

    color = 'tab:blue'
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss', color=color)
    ax1.plot(train_loss, '--', color=color, label='Train Loss')
    ax1.plot(valid_loss, color=color, label='Valid Loss')
    ax1.tick_params(axis='y', labelcolor=color)
    plt.legend(loc='upper left')
    plt.title('Model Loss')

    plt.show()

def plot_model_recall_fpr(model_history):
    train_recall = [value for key, value in model_history.items() if 'recall' in key.lower()][0]
    valid_recall = [value for key, value in model_history.items() if 'recall' in key.lower()][1]

    train_fpr = [value for key, value in model_history.items() if 'false_positive_rate' in key.lower()][0]
    valid_fpr = [value for key, value in model_history.items() if 'false_positive_rate' in key.lower()][1]

    fig, ax1 = plt.subplots()

    color = 'tab:red'
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Recall', color=color)
    ax1.set_ylim([-0.05, 1.05])
    ax1.plot(train_recall, '--', color=color, label='Train Recall')
    ax1.plot(valid_recall, color=color, label='Valid Recall')
    ax1.tick_params(axis='y', labelcolor=color)
    plt.legend(loc='upper left')
    plt.title('Model Recall and FPR')

    ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

    color = 'tab:blue'
    ax2.set_ylabel('False Positive Rate', color=color)  # we already handled the x-label with ax1
    ax2.plot(train_fpr, '--', color=color, label='Train FPR')
    ax2.plot(valid_fpr, color=color, label='Valid FPR')
    ax2.tick_params(axis='y', labelcolor=color)
    ax2.set_ylim([-0.05, 1.05])

    fig.tight_layout()  # otherwise the right y-label is slightly clipped
    plt.legend(loc='upper right')
    plt.show()
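
Both plotting functions pick the training and validation series out of the history dictionary by matching key names and relying on the training entry coming first. A small sketch with a made-up history dictionary shows what that selection returns:

```python
# A made-up history dictionary with the kind of keys Keras records.
history = {
    "loss": [0.9, 0.6, 0.4],
    "recall": [0.1, 0.3, 0.5],
    "val_loss": [1.0, 0.7, 0.6],
    "val_recall": [0.0, 0.2, 0.4],
}

# Same substring filter as in plot_loss: training entry first, validation second.
loss_series = [value for key, value in history.items() if "loss" in key.lower()]
train_loss, valid_loss = loss_series[0], loss_series[1]

print(train_loss)  # [0.9, 0.6, 0.4]
print(valid_loss)  # [1.0, 0.7, 0.6]
```

Matching by substring rather than by exact key is convenient because Keras may add a numeric suffix to metric names (for example recall_1) when several models are built in the same session.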

Model 1. Baseline

n_features = x_train_scaled.shape[1]

mlp = Sequential()
mlp.add(Input(shape=(n_features, )))
mlp.add(Dense(32, activation='relu'))
mlp.add(Dense(16, activation='relu'))
mlp.add(Dense(1, activation='sigmoid'))
mlp.summary()

mlp.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy', tf.keras.metrics.Recall(), FalsePositiveRate()])

history = mlp.fit(x=x_train_scaled,
                  y=y_train,
                  batch_size=128,
                  epochs=100,
                  validation_data=(x_valid_scaled, y_valid),
                  verbose=0).history

Look at how the loss and the accuracy measures (recall and false positive rate) change during training.

plot_loss(history)

plot_model_recall_fpr(history)

Model 2. Class weights

Set the class weights according to the rule of thumb given above.

class_weight = {0: sum(y_train == 1)/len(y_train), 1: sum(y_train == 0)/len(y_train)}
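
To see what these weights look like, here is the same rule applied to a small made-up label vector using only the standard library. With 2 positives out of 10 rows, the positive class gets weight 0.8 and the negative class 0.2, so the two classes carry the same total weight in the loss.

```python
from collections import Counter

y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # made-up labels: 8 negatives, 2 positives

counts = Counter(y)
n = len(y)
# Each class is weighted by the share of the *other* class.
class_weight = {0: counts[1] / n, 1: counts[0] / n}

print(class_weight)  # {0: 0.2, 1: 0.8}

# Total weight carried by each class is the same.
print(counts[0] * class_weight[0], counts[1] * class_weight[1])  # 1.6 1.6
```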

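Now train the model. The training code is not reproduced here; judging from Model 3 below, it is presumably Model 1 with class_weight=class_weight passed to mlp.fit.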

plot_model_recall_fpr(history)

Model 3. Dropout regularization

n_features = x_train_scaled.shape[1]

mlp = Sequential()
mlp.add(Input(shape=(n_features, )))
mlp.add(Dense(32, activation='relu'))
mlp.add(Dropout(0.5))
mlp.add(Dense(16, activation='relu'))
mlp.add(Dropout(0.5))
mlp.add(Dense(1, activation='sigmoid'))
mlp.summary()

mlp.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy', tf.keras.metrics.Recall(), FalsePositiveRate()])

history = mlp.fit(x=x_train_scaled,
                  y=y_train,
                  batch_size=128,
                  epochs=100,
                  validation_data=(x_valid_scaled, y_valid),
                  class_weight=class_weight,
                  verbose=0).history

plot_loss(history)
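
For intuition about what a Dropout(0.5) layer does during training, here is a minimal pure-Python sketch of dropout on a list of activations: each value is zeroed with probability equal to the rate, and the survivors are scaled up so the expected sum stays the same. The function name dropout and the use of Python's random module are assumptions for this illustration; the Keras layer operates on tensors and is inactive at inference time.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1/(1-rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 8
print(dropout(activations, rate=0.5, rng=rng))
```

At a rate of 0.5 roughly half of the nodes are silenced on each training step, which is why the rules of thumb suggest enlarging the layer rather than lowering the rate if that feels too aggressive.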

plot_model_recall_fpr(history)

Model 4. Oversampling-undersampling

Resample with SMOTE.

from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=212)
x_train_scaled_resampled, y_train_resampled = smote.fit_resample(x_train_scaled, y_train)
print('Resampled dataset shape %s' % Counter(y_train_resampled))
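
SMOTE creates new minority-class rows by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below shows only that interpolation step in plain Python, with made-up points and a hypothetical helper name; the neighbour search and the bookkeeping that imblearn performs are left out.

```python
import random


def interpolate(sample, neighbour, rng):
    """Create a synthetic point on the segment between two minority samples."""
    u = rng.random()  # uniform in [0, 1)
    return [s + u * (n - s) for s, n in zip(sample, neighbour)]


rng = random.Random(212)
sample = [1.0, 2.0]      # a made-up minority sample
neighbour = [3.0, 6.0]   # one of its (assumed) nearest minority neighbours
synthetic = interpolate(sample, neighbour, rng)
print(synthetic)
```

Because the synthetic point always lies between two real minority samples, the resampled data should be used for training only, while validation and test sets keep their original class balance.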

n_features = x_train_scaled.shape[1]

mlp = Sequential()
mlp.add(Input(shape=(n_features, )))
mlp.add(Dense(32, activation='relu'))
mlp.add(Dropout(0.5))
mlp.add(Dense(16, activation='relu'))
mlp.add(Dropout(0.5))
mlp.add(Dense(1, activation='sigmoid'))
mlp.summary()

mlp.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy', tf.keras.metrics.Recall(), FalsePositiveRate()])

history = mlp.fit(x=x_train_scaled_resampled,
                  y=y_train_resampled,
                  batch_size=128,
                  epochs=100,
                  # validate on scaled data, matching the scaled training inputs
                  validation_data=(x_valid_scaled, y_valid),
                  class_weight=class_weight,
                  verbose=0).history

plot_loss(history)

plot_model_recall_fpr(history)

Model 5. selu activation

We use the selu activation function, which is known for its self-normalizing property.

Note:

· Use kernel_initializer='lecun_normal'.

· Use AlphaDropout in place of Dropout, with a rate of 0.1: AlphaDropout(0.1).
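
For reference, selu is a scaled exponential linear unit. A minimal sketch of the function in plain Python follows. The two constants are the values commonly quoted for selu, and the function defined here is a local helper for illustration, not the Keras implementation.

```python
import math

# Commonly quoted SELU constants.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    if x > 0:
        return SCALE * x
    return SCALE * ALPHA * (math.exp(x) - 1.0)


for x in (-2.0, 0.0, 1.0):
    print(x, round(selu(x), 4))
```

Unlike relu, the function is not zero for negative inputs: it saturates at about -1.76, which is part of what keeps activations centred as they pass through the layers.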

n_features = x_train_scaled.shape[1]

mlp = Sequential()
mlp.add(Input(shape=(n_features, )))
mlp.add(Dense(32, kernel_initializer='lecun_normal', activation='selu'))
mlp.add(AlphaDropout(0.1))
mlp.add(Dense(16, kernel_initializer='lecun_normal', activation='selu'))
mlp.add(AlphaDropout(0.1))
mlp.add(Dense(1, activation='sigmoid'))
mlp.summary()

mlp.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy', tf.keras.metrics.Recall(), FalsePositiveRate()])

history = mlp.fit(x=x_train_scaled,
                  y=y_train,
                  batch_size=128,
                  epochs=100,
                  # validate on scaled data, matching the scaled training inputs
                  validation_data=(x_valid_scaled, y_valid),
                  class_weight=class_weight,
                  verbose=0).history

plot_loss(history)

plot_model_recall_fpr(history)

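Conclusion

· Although deep learning provides powerful predictive models, it can be hard to know where to start.

· The rules of thumb in this article provide a starting point for building a first neural network.

· Models built on this basis should be tuned further to improve performance.

· If a model built with these rules shows no promise at all, further tuning is unlikely to bring much improvement, and a different approach should be tried.

· This article showed the steps for implementing a neural network in TensorFlow 2.

· If you are not yet using TensorFlow 2, I recommend starting. It has the simplicity of Keras together with high performance.

TensorFlow 2 installation guide: https://towardsdatascience.com/step-by-step-guide-to-install-tensorflow-2-0-67bc73e79b82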

Translation team: 段昌蓉, 蒋馨怡
