I recently got my hands on ESIM, a fairly influential model in the natural language inference field, and milked some free Colab GPU time while I was at it.
A brief introduction to ESIM

The paper Enhanced LSTM for Natural Language Inference proposes a model for computing the similarity of two sentences. The model consists of three parts:
Input Encoding

First, the word vectors of the two input sentences, the premise $a=(a_1,\dots,a_{l_a})$ and the hypothesis $b=(b_1,\dots,b_{l_b})$, are passed through a BiLSTM, yielding the new representations $(\bar{a_1}, \dots, \bar{a_{l_a}})$ and $(\bar{b_1}, \dots, \bar{b_{l_b}})$.
Local Inference

According to the paper, a good way to measure how related two words are is the inner product of their encoded vectors, i.e. $e_{ij}=\bar{a_i}^T\bar{b_j}$. Computing this similarity (attention) for every word pair across the two sentences yields a matrix
$$(e_{ij})_{l_a \times l_b} = (\bar{a_i}^T\bar{b_j})_{l_a \times l_b}$$
Then comes an interesting idea: to judge how similar two sentences are, check whether each can be expressed in terms of the other. That is, the word vectors $\bar{a_i}$ of the premise and $\bar{b_j}$ of the hypothesis are each re-expressed using the other sentence's word vectors.
The formulas in the paper are:
$$\widetilde{a_i} = \sum_{j=1}^{l_b}{\frac{\exp(e_{ij})}{\sum_{k=1}^{l_b}{\exp(e_{ik})}}\bar{b_j}}$$ $$\widetilde{b_j} = \sum_{i=1}^{l_a}{\frac{\exp(e_{ij})}{\sum_{k=1}^{l_a}{\exp(e_{kj})}}\bar{a_i}}$$
In plain terms: since the model does not know which pairs $a_i$ and $b_j$ are actually similar or opposed, it enumerates all combinations, and the similarity matrix computed earlier supplies the weights. The weight at each position is the softmax over the corresponding row of the matrix (for computing $\widetilde{a_i}$; over the column for computing $\widetilde{b_j}$).
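A minimal PyTorch sketch of this soft alignment may make it concrete. All tensor names and sizes here are invented for illustration; `a_bar` and `b_bar` stand for the BiLSTM outputs:

```python
import torch
import torch.nn.functional as F

# Hypothetical BiLSTM outputs: batch=2, la=5, lb=7, dim=600
a_bar = torch.randn(2, 5, 600)
b_bar = torch.randn(2, 7, 600)

e = torch.matmul(a_bar, b_bar.transpose(1, 2))      # attention scores, (2, 5, 7)
a_tilde = torch.matmul(F.softmax(e, dim=2), b_bar)  # row-wise softmax -> (2, 5, 600)
b_tilde = torch.matmul(F.softmax(e, dim=1).transpose(1, 2), a_bar)  # column-wise -> (2, 7, 600)
```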
To strengthen the local inference information (the paper calls this "Enhancement of local inference information"), the intermediate results obtained so far are stacked together:
$$m_a = [\bar{a};\widetilde{a};\bar{a}-\widetilde{a};\bar{a} \odot \widetilde{a}]$$ $$m_b = [\bar{b};\widetilde{b};\bar{b}-\widetilde{b};\bar{b} \odot \widetilde{b}]$$
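Continuing the sketch above, this enhancement is a single concatenation along the feature dimension ($\odot$ is the element-wise product):

```python
# 600 features per position become 4 * 600 = 2400 after stacking
m_a = torch.cat([a_bar, a_tilde, a_bar - a_tilde, a_bar * a_tilde], dim=-1)  # (2, 5, 2400)
m_b = torch.cat([b_bar, b_tilde, b_bar - b_tilde, b_bar * b_tilde], dim=-1)  # (2, 7, 2400)
```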
Inference Composition

The inference composition stage works on the $m_a$ and $m_b$ obtained above, once again using a BiLSTM to capture the contextual information of the two sequences.
After combining all of this information, the composed sequences are pooled into fixed-length vectors and sent together to the fully connected layers for the final blending.
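Concretely, the paper applies both average and max pooling to the composed sequences $v_a$ and $v_b$ and concatenates the results as the classifier input:

$$v_{a,\text{ave}} = \sum_{i=1}^{l_a}{\frac{v_{a,i}}{l_a}}, \quad v_{a,\max} = \max_{i=1}^{l_a}{v_{a,i}}$$ $$v = [v_{a,\text{ave}}; v_{a,\max}; v_{b,\text{ave}}; v_{b,\max}]$$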
Importing the required libraries

```python
import os
import time
import logging
import pickle
from tqdm import tqdm_notebook as tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchtext
from torchtext import data, datasets
from torchtext.vocab import GloVe

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import nltk
from nltk import word_tokenize
import spacy
from keras_preprocessing.text import Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```
cuda
Mounting Google Drive
```python
from google.colab import drive
drive.mount('/content/drive')
```
Go to this URL in a browser: https://accounts.google.com/o/oauth2/xxxxxxxx
Enter your authorization code:
··········
Mounted at /content/drive
Fri Aug 9 04:45:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 60C P0 62W / 149W | 6368MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Preparing the data with torchtext

The torchtext usage here follows: https://github.com/pytorch/examples/blob/master/snli/train.py
The GloVe vectors in torchtext can be used directly, but unlike torchvision it offers no way to read the raw source files, only its own cache. So it is best to:
- Download GloVe to a local directory first;
- Open a terminal in the download directory and let torchtext build its cache there once;
- From then on, pass the cache parameter when constructing GloVe, so torchtext reads from the cache instead of downloading the huge GloVe archive again (see the sketch below).
Though if you are freeloading on Colab anyway, none of this really matters (~ ̄▽ ̄)~
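A rough sketch of that workflow; the cache directory name is arbitrary:

```python
from torchtext.vocab import GloVe

# If the unzipped glove.6B.100d.txt already sits in the cache directory,
# torchtext builds its .pt cache from it instead of downloading the archive;
# subsequent calls just load the cached tensors.
glove_vectors = GloVe(name="6B", dim=100, cache="./glove_cache")
```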
torchtext can also load the SNLI dataset directly, with the following expected directory layout:
```
root
└── snli_1.0
    ├── snli_1.0_train.jsonl
    ├── snli_1.0_dev.jsonl
    └── snli_1.0_test.jsonl
```
```python
TEXT = data.Field(batch_first=True, lower=True, tokenize="spacy")
LABEL = data.Field(sequential=False)

tic = time.time()
train, dev, test = datasets.SNLI.splits(TEXT, LABEL)
print(f"Cost: {(time.time() - tic) / 60:.2f} min")

tic = time.time()
glove_vectors = GloVe(name='6B', dim=100)
print(f"Create GloVe done. Cost: {(time.time() - tic) / 60:.2f} min")

tic = time.time()
TEXT.build_vocab(train, dev, test, vectors=glove_vectors)
LABEL.build_vocab(train)
print(f"Build vocab done. Cost: {(time.time() - tic) / 60:.2f} min")

print(f"TEXT.vocab.vectors.size(): {TEXT.vocab.vectors.size()}")
num_words = int(TEXT.vocab.vectors.size()[0])

# Save the string-to-index mappings so prediction can reuse them later
if os.path.exists("/content/drive/My Drive/Colab Notebooks"):
    glove_stoi_path = "/content/drive/My Drive/Colab Notebooks/vocab_label_stoi.pkl"
else:
    glove_stoi_path = "./vocab_label_stoi.pkl"
pickle.dump([TEXT.vocab.stoi, LABEL.vocab.stoi], open(glove_stoi_path, "wb"))

batch_sz = 128
train_iter, dev_iter, test_iter = data.BucketIterator.splits(
    datasets=(train, dev, test),
    batch_sizes=(batch_sz, batch_sz, batch_sz),
    shuffle=True,
    device=device
)
```
Cost: 7.94 min
Create GloVe done. Cost: 0.00 min
Build vocab done. Cost: 0.12 min
TEXT.vocab.vectors.size(): torch.Size([34193, 100])
General configuration

When doing the usual deep-learning alchemy, it is best to keep a single global recipe so the knobs are easy to adjust.
```python
class Config:
    def __init__(self):
        self.batch_first = True
        try:
            self.batch_size = batch_sz
        except NameError:
            self.batch_size = 512
        self.n_embed = len(TEXT.vocab)
        self.d_embed = TEXT.vocab.vectors.size()[-1]
        self.linear_size = self.d_embed
        self.hidden_size = 300
        self.d_out = len(LABEL.vocab)
        self.dropout = 0.5

        # Save to Drive when running on Colab, otherwise to the local directory
        self.save_path = r"/content/drive/My Drive/Colab Notebooks" if os.path.exists(
            r"/content/drive/My Drive/Colab Notebooks") else "./"
        self.snapshot = os.path.join(self.save_path, "ESIM.pt")

        self.device = device
        self.epoch = 64
        self.scheduler_step = 3
        self.lr = 0.0004
        self.early_stop_ratio = 0.985


args = Config()
```
Implementing the ESIM model

The code is adapted from: https://github.com/pengshuang/Text-Similarity/blob/master/models/ESIM.py
Using nn.BatchNorm1d

Normalizing the data removes the problem of different dimensions having different distributions. Geometrically, it reshapes an "ellipsoid" in n-dimensional space into a "sphere", which makes the model easier and faster to train.
But normalizing the entire input data set would take a lot of time. Batch Normalization is a compromise: it normalizes only the batch_size samples currently passing through. Probabilistically speaking, it estimates the distribution of all samples from the distribution of those batch_size samples.
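A tiny demonstration of what nn.BatchNorm1d computes (the feature count and data here are made up):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)
x = torch.randn(8, 4) * 5 + 3        # 8 samples, 4 features, shifted and scaled
y = bn(x)                            # normalized with this batch's statistics
print(y.mean(dim=0))                 # ~0 per feature
print(y.std(dim=0, unbiased=False))  # ~1 per feature (up to eps)
```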
As the name suggests, PyTorch's nn.BatchNorm1d does batch normalization over one-dimensional data, which brings two constraints:

- During training (i.e. with model.train() on), each batch must contain at least 2 samples; in evaluation mode (model.eval()) there is no batch-size restriction.
- The input is expected as (batch, num_features) or (batch, num_features, length), i.e. the normalized feature dimension sits right after the batch dimension.
However, each batch produced by my data pipeline, after the embedding lookup, has the shape batch * seq_len * embed_dim, i.e. three dimensions. Moreover, with torchtext's data.BucketIterator.splits, each batch's seq_len is dynamic (it equals the length of the longest sentence in the current batch). Feeding such tensors to BatchNorm1d without further processing typically produces an error like:
RuntimeError: running_mean should contain xxx elements not yyy
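A minimal reproduction of that mismatch (the shapes are invented for illustration):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(20)       # expects 20 features at dimension 1
x = torch.randn(32, 25, 100)  # batch * seq_len * embed_dim with seq_len=25
bn(x)                         # RuntimeError: running_mean should contain 25 elements not 20
```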
Should there be a BatchNorm1d after the Embedding?

The reference implementation is very clean and shows the author's skill. However, that author apparently does not initialize the Embedding with pretrained vectors, whereas I use pretrained GloVe vectors and never train them. So is an nn.BatchNorm1d layer really necessary here?
Blindly adding layers rarely helps, so the best first step is to check whether each dimension of the GloVe vectors is already "normalized".
```python
glove = TEXT.vocab.vectors
# Per-dimension mean and standard deviation across the whole vocabulary
means, stds = glove.mean(dim=0).numpy(), glove.std(dim=0).numpy()
dims = [i for i in range(glove.shape[1])]

plt.scatter(dims, means)
plt.scatter(dims, stds)
plt.legend(["mean", "std"])
plt.xlabel("Dims")
plt.ylabel("Features")
plt.show()

print(f"mean(means)={means.mean():.4f}, std(means)={means.std():.4f}")
print(f"mean(stds)={stds.mean():.4f}, std(stds)={stds.std():.4f}")
```
mean(means)=0.0032, std(means)=0.0809
mean(stds)=0.4361, std(stds)=0.0541
The plot shows that the per-dimension distributions are fairly stable, so I decided against an nn.BatchNorm1d after the Embedding layer.
Using nn.LSTM

```python
nn.LSTM(input_size, hidden_size, num_layers, bias=True,
        batch_first=False, dropout=0, bidirectional=False)
```
nn.LSTM's default is batch_first=False, which feels very unnatural to someone used to CV-style data layouts, so it is best to set it to True.
The LSTM input/output formats are listed below. The inputs may omit h_0 and c_0, in which case the LSTM automatically creates all-zero h_0 and c_0.
```
Inputs:  input, (h_0, c_0)
Outputs: output, (h_n, c_n)

input:  (seq_len, batch, input_size)
output: (seq_len, batch, num_directions * hidden_size)
h / c:  (num_layers * num_directions, batch, hidden_size)
```

(These shapes are for the default batch_first=False.)
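A quick shape check with batch_first=True, the setting used throughout this post (all sizes here are arbitrary):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=300,
               batch_first=True, bidirectional=True)
x = torch.randn(4, 10, 100)   # batch * seq_len * input_size
output, (h_n, c_n) = lstm(x)  # h_0/c_0 omitted -> default to zeros
print(output.shape)           # torch.Size([4, 10, 600]): batch * seq_len * (2 * hidden)
print(h_n.shape)              # torch.Size([2, 4, 300]): (num_layers * 2) * batch * hidden
```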
```python
class ESIM(nn.Module):
    def __init__(self, args):
        super(ESIM, self).__init__()
        self.args = args
        self.embedding = nn.Embedding(args.n_embed, args.d_embed)
        self.lstm1 = nn.LSTM(args.d_embed, args.hidden_size, num_layers=1,
                             batch_first=True, bidirectional=True)
        self.lstm2 = nn.LSTM(args.hidden_size * 8, args.hidden_size, num_layers=1,
                             batch_first=True, bidirectional=True)

        self.fc = nn.Sequential(
            nn.BatchNorm1d(args.hidden_size * 8),
            nn.Linear(args.hidden_size * 8, args.linear_size),
            nn.ELU(inplace=True),
            nn.BatchNorm1d(args.linear_size),
            nn.Dropout(args.dropout),
            nn.Linear(args.linear_size, args.linear_size),
            nn.ELU(inplace=True),
            nn.BatchNorm1d(args.linear_size),
            nn.Dropout(args.dropout),
            nn.Linear(args.linear_size, args.d_out),
            nn.Softmax(dim=-1)  # outputs class probabilities
        )

    def submul(self, x1, x2):
        # Enhancement: element-wise difference and product
        mul = x1 * x2
        sub = x1 - x2
        return torch.cat([sub, mul], -1)

    def apply_multiple(self, x):
        # Average and max pooling over the sequence dimension
        p1 = F.avg_pool1d(x.transpose(1, 2), x.size(1)).squeeze(-1)
        p2 = F.max_pool1d(x.transpose(1, 2), x.size(1)).squeeze(-1)
        return torch.cat([p1, p2], 1)

    def soft_attention_align(self, x1, x2, mask1, mask2):
        '''
        x1: batch_size * seq_len * dim
        x2: batch_size * seq_len * dim
        '''
        # attention: batch_size * seq_len * seq_len
        attention = torch.matmul(x1, x2.transpose(1, 2))
        # Masked positions get -inf so that softmax assigns them zero weight
        mask1 = mask1.float().masked_fill_(mask1, float('-inf'))
        mask2 = mask2.float().masked_fill_(mask2, float('-inf'))

        weight1 = F.softmax(attention + mask2.unsqueeze(1), dim=-1)
        x1_align = torch.matmul(weight1, x2)
        weight2 = F.softmax(attention.transpose(
            1, 2) + mask1.unsqueeze(1), dim=-1)
        x2_align = torch.matmul(weight2, x1)
        return x1_align, x2_align

    def forward(self, sent1, sent2):
        """
        sent1: batch * la
        sent2: batch * lb
        """
        mask1, mask2 = sent1.eq(0), sent2.eq(0)
        x1, x2 = self.embedding(sent1), self.embedding(sent2)

        # Input encoding
        o1, _ = self.lstm1(x1)
        o2, _ = self.lstm1(x2)

        # Local inference
        q1_align, q2_align = self.soft_attention_align(o1, o2, mask1, mask2)

        # Enhancement of local inference information
        q1_combined = torch.cat([o1, q1_align, self.submul(o1, q1_align)], -1)
        q2_combined = torch.cat([o2, q2_align, self.submul(o2, q2_align)], -1)

        # Inference composition
        q1_compose, _ = self.lstm2(q1_combined)
        q2_compose, _ = self.lstm2(q2_combined)

        # Pooling into fixed-length representations
        q1_rep = self.apply_multiple(q1_compose)
        q2_rep = self.apply_multiple(q2_compose)

        similarity = self.fc(torch.cat([q1_rep, q2_rep], -1))
        return similarity


def take_snapshot(model, path):
    """Save the weights to Drive so they survive a Colab reset"""
    torch.save(model.state_dict(), path)
    print(f"Snapshot has been saved to {path}")


def load_snapshot(model, path):
    model.load_state_dict(torch.load(path))
    print(f"Load snapshot from {path} done.")


model = ESIM(args)
model.embedding.weight.data.copy_(TEXT.vocab.vectors)
model.embedding.weight.requires_grad = False  # freeze the pretrained GloVe vectors
model.to(args.device)
```
ESIM(
(embedding): Embedding(34193, 100)
(lstm1): LSTM(100, 300, batch_first=True, bidirectional=True)
(lstm2): LSTM(2400, 300, batch_first=True, bidirectional=True)
(fc): Sequential(
(0): BatchNorm1d(2400, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): Linear(in_features=2400, out_features=100, bias=True)
(2): ELU(alpha=1.0, inplace)
(3): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(4): Dropout(p=0.5)
(5): Linear(in_features=100, out_features=100, bias=True)
(6): ELU(alpha=1.0, inplace)
(7): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): Dropout(p=0.5)
(9): Linear(in_features=100, out_features=4, bias=True)
(10): Softmax()
)
)
Training

There are a few details worth noting here:
The shape of batch.label

batch.label is a one-dimensional vector of shape (batch), while Y_pred is a two-dimensional $batch \times 4$ tensor, and it remains two-dimensional after extracting the argmax with .topk(1).indices.

So if batch.label is not expanded, PyTorch broadcasts it automatically and the comparison result ends up $batch \times batch$ instead of $batch \times 1$, making the computed accuracy absurdly large. That is what the following code is for:
```python
(Y_pred.topk(1).indices == batch.label.unsqueeze(1))
```
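A toy example of the broadcasting pitfall (the numbers are made up):

```python
import torch

label = torch.tensor([0, 1, 2])            # shape (3,), like batch.label
pred = torch.tensor([[0], [1], [0]])       # shape (3, 1), like Y_pred.topk(1).indices
print((pred == label).shape)               # torch.Size([3, 3]) -- broadcast, wrong
print((pred == label.unsqueeze(1)).shape)  # torch.Size([3, 1]) -- intended
```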
Division between tensors and scalars

In Python 3.6 the division operator / defaults to float division, but PyTorch does not follow suit; this is another easily overlooked detail.
```python
(Y_pred.topk(1).indices == batch.label.unsqueeze(1))
```
The result of the code above can be treated as boolean (it is actually torch.uint8). Calling .sum() on it yields a torch.LongTensor. And in PyTorch, dividing integers does not produce a float:
```
In [2]: torch.LongTensor([1]) / torch.LongTensor([5])
Out[2]: tensor([0])
```
The variable acc accumulates the number of correctly classified samples over the batches; through automatic type promotion it ends up pointing at a torch.LongTensor. So when computing the final accuracy, the integer value must first be extracted with .item(). Overlook this detail, and the reported accuracy is always 0.
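For example (with the 2019-era PyTorch used here; newer releases return a float tensor from / instead):

```python
import torch

acc = torch.LongTensor([8530])  # hypothetical number of correct predictions
cnt = 10000                     # hypothetical number of samples
print(acc / cnt)                # tensor([0]): integer division truncates
print(acc.item() / cnt)         # 0.853: Python int / int -> float
```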
```python
def training(model, data_iter, loss_fn, optimizer):
    """Train for one epoch"""
    model.train()
    data_iter.init_epoch()
    acc, cnt, avg_loss = 0, 0, 0.0

    for batch in data_iter:
        Y_pred = model(batch.premise, batch.hypothesis)
        loss = loss_fn(Y_pred, batch.label)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        avg_loss += loss.item() / len(data_iter)
        acc += (Y_pred.topk(1).indices == batch.label.unsqueeze(1)).sum()
        cnt += len(batch.premise)
    return avg_loss, (acc.item() / cnt)


def validating(model, data_iter, loss_fn):
    """Evaluate without gradient tracking"""
    model.eval()
    data_iter.init_epoch()
    acc, cnt, avg_loss = 0, 0, 0.0

    with torch.set_grad_enabled(False):
        for batch in data_iter:
            Y_pred = model(batch.premise, batch.hypothesis)
            avg_loss += loss_fn(Y_pred, batch.label).item() / len(data_iter)
            acc += (Y_pred.topk(1).indices == batch.label.unsqueeze(1)).sum()
            cnt += len(batch.premise)
    return avg_loss, (acc.item() / cnt)


def train(model, train_data, val_data):
    """The whole training procedure"""
    optimizer = optim.Adam(model.parameters(), lr=args.lr)
    loss_fn = nn.CrossEntropyLoss()
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5,
        patience=args.scheduler_step, verbose=True)

    train_losses, val_losses, train_accs, val_accs = [], [], [], []

    # Evaluate once before training as the epoch-0 baseline
    tic = time.time()
    train_loss, train_acc = validating(model, train_data, loss_fn)
    val_loss, val_acc = validating(model, val_data, loss_fn)
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accs.append(train_acc)
    val_accs.append(val_acc)
    min_val_loss = val_loss
    print(f"Epoch: 0/{args.epoch}\t"
          f"Train loss: {train_loss:.4f}\tacc: {train_acc:.4f}\t"
          f"Val loss: {val_loss:.4f}\tacc: {val_acc:.4f}\t"
          f"Cost time: {(time.time()-tic):.2f}s")

    try:
        for epoch in range(args.epoch):
            tic = time.time()
            train_loss, train_acc = training(
                model, train_data, loss_fn, optimizer)
            val_loss, val_acc = validating(model, val_data, loss_fn)
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            train_accs.append(train_acc)
            val_accs.append(val_acc)
            scheduler.step(val_loss)

            print(f"Epoch: {epoch + 1}/{args.epoch}\t"
                  f"Train loss: {train_loss:.4f}\tacc: {train_acc:.4f}\t"
                  f"Val loss: {val_loss:.4f}\tacc: {val_acc:.4f}\t"
                  f"Cost time: {(time.time()-tic):.2f}s")

            # Only snapshot the model when the validation loss improves
            if val_loss < min_val_loss:
                min_val_loss = val_loss
                take_snapshot(model, args.snapshot)
    except KeyboardInterrupt:
        print("Interrupted by user")

    return train_losses, val_losses, train_accs, val_accs


train_losses, val_losses, train_accs, val_accs = train(
    model, train_iter, dev_iter)
```
Epoch: 0/64 Train loss: 1.3871 acc: 0.3335 Val loss: 1.3871 acc: 0.3331 Cost time: 364.32s
Epoch: 1/64 Train loss: 1.0124 acc: 0.7275 Val loss: 0.9643 acc: 0.7760 Cost time: 998.41s
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 2/64 Train loss: 0.9476 acc: 0.7925 Val loss: 0.9785 acc: 0.7605 Cost time: 1003.32s
Epoch: 3/64 Train loss: 0.9305 acc: 0.8100 Val loss: 0.9204 acc: 0.8217 Cost time: 999.49s
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 4/64 Train loss: 0.9183 acc: 0.8227 Val loss: 0.9154 acc: 0.8260 Cost time: 1000.97s
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 5/64 Train loss: 0.9084 acc: 0.8329 Val loss: 0.9251 acc: 0.8156 Cost time: 996.99s
....
Epoch: 21/64 Train loss: 0.8236 acc: 0.9198 Val loss: 0.8912 acc: 0.8514 Cost time: 992.48s
Epoch: 22/64 Train loss: 0.8210 acc: 0.9224 Val loss: 0.8913 acc: 0.8514 Cost time: 996.35s
Epoch 22: reducing learning rate of group 0 to 5.0000e-05.
Epoch: 23/64 Train loss: 0.8195 acc: 0.9239 Val loss: 0.8940 acc: 0.8485 Cost time: 1000.48s
Epoch: 24/64 Train loss: 0.8169 acc: 0.9266 Val loss: 0.8937 acc: 0.8490 Cost time: 1006.78s
Interrupted by user
Plotting the loss-accuracy curves

```python
# Use the shorter history in case training was interrupted mid-epoch
min_len = min(len(train_losses), len(val_losses))
iters = [i + 1 for i in range(min_len)]

fig, ax1 = plt.subplots()
ax1.plot(iters, train_losses[:min_len], '-', label='train loss')
ax1.plot(iters, val_losses[:min_len], '-.', label='val loss')
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")

# Accuracies go on a second y-axis
ax2 = ax1.twinx()
ax2.plot(iters, train_accs[:min_len], ':', label='train acc')
ax2.plot(iters, val_accs[:min_len], '--', label='val acc')
ax2.set_ylabel("Accuracy")

# Merge the legends of both axes
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
plt.legend(handles1 + handles2, labels1 + labels2, loc='center right')
plt.show()
```
Prediction

Besides producing training metrics, the model also needs to be usable in practice.
```python
nlp = spacy.load("en")

load_snapshot(model, args.snapshot)
# Run prediction on the CPU
model.to(torch.device("cpu"))

with open(r"/content/drive/My Drive/Colab Notebooks/vocab_label_stoi.pkl", "rb") as f:
    vocab_stoi, label_stoi = pickle.load(f)
```
Load snapshot from /content/drive/My Drive/Colab Notebooks/ESIM.pt done.
```python
def sentence2tensor(stoi, sent1: str, sent2: str):
    """Convert two sentences into index tensors"""
    sent1 = [str(token) for token in nlp(sent1.lower())]
    sent2 = [str(token) for token in nlp(sent2.lower())]

    tokens1, tokens2 = [], []

    for token in sent1:
        tokens1.append(stoi[token])
    for token in sent2:
        tokens2.append(stoi[token])

    # Pad the shorter sentence with the <pad> index (1)
    delt_len = len(tokens1) - len(tokens2)
    if delt_len > 0:
        tokens2.extend([1] * delt_len)
    else:
        tokens1.extend([1] * (-delt_len))

    tensor1 = torch.LongTensor(tokens1).unsqueeze(0)
    tensor2 = torch.LongTensor(tokens2).unsqueeze(0)
    return tensor1, tensor2


def use(model, premise: str, hypothsis: str):
    """Run the model on a single premise/hypothesis pair"""
    label_itos = {0: '<unk>', 1: 'entailment',
                  2: 'contradiction', 3: 'neutral'}
    model.eval()
    with torch.set_grad_enabled(False):
        tensor1, tensor2 = sentence2tensor(vocab_stoi, premise, hypothsis)
        predict = model(tensor1, tensor2)
        top1 = predict.topk(1).indices.item()

    print(f"The answer is '{label_itos[top1]}'")

    # Show each label's probability as a bar chart
    prob = predict.cpu().squeeze().numpy()
    plt.bar(["<unk>", "entailment", "contradiction", "neutral"], prob)
    plt.ylabel("probability")
    plt.show()
```
Given two sentences, this prints the most likely inference label and shows a bar chart of the probability of each class.
```python
use(model,
    "A statue at a museum that no seems to be looking at.",
    "There is a statue that not many people seem to be interested in.")

use(model,
    "A land rover is being driven across a river.",
    "A sedan is stuck in the middle of a river.")

use(model,
    "A woman with a green headscarf, blue shirt and a very big grin.",
    "The woman is young.")
```
The answer is 'entailment'
The answer is 'contradiction'
The answer is 'neutral'