the first attempt at the network
fix the problem of processing being too slow (with the sequence embedded first)
- do not use a `DataFrame` when building the `DataLoader`; convert the `DataFrame` into a `tensor` before putting it into the `DataLoader`
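```python
import numpy as np
import torch
from torch import tensor
from torch.utils.data import DataLoader, TensorDataset

# train (train_data is the DataFrame loaded earlier)
# columns 305 onward -> sequence features, 5:305 -> molecule features, 4 -> Ki label
train_seq = tensor(np.array(train_data.iloc[:, 305:])).unsqueeze(dim=1).to(torch.float32)
train_mol = tensor(np.array(train_data.iloc[:, 5:305])).unsqueeze(dim=1).to(torch.float32)
train_Ki = tensor(np.array(train_data.iloc[:, 4]))
trainDataset = TensorDataset(train_mol, train_seq, train_Ki)
trainDataLoader = DataLoader(trainDataset, batch_size=128)
```
- do not use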
add normalization layer after ReLU
need to do tomorrow
- check the embedding results
- improve the network accuracy
notice
- when building the CNN part of the network, add a normalization layer after every Conv layer
- don't put a `DataFrame` into the `DataSet`
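A minimal sketch of the Conv-then-normalization pattern noted above, assuming 1D convolutions over inputs shaped like the `train_mol` batches (batch, 1, 300). The helper name `conv_block`, the channel sizes, the kernel size, the choice of `BatchNorm1d`, and placing the normalization before the ReLU are all assumptions for illustration, not the actual network.

```python
import torch
from torch import nn


def conv_block(in_channels: int, out_channels: int, kernel_size: int = 3) -> nn.Sequential:
    """Conv1d followed directly by a normalization layer, then ReLU."""
    return nn.Sequential(
        # padding = kernel_size // 2 keeps the sequence length unchanged for odd kernels
        nn.Conv1d(in_channels, out_channels, kernel_size, padding=kernel_size // 2),
        nn.BatchNorm1d(out_channels),  # normalization right after every Conv
        nn.ReLU(),
    )


# Hypothetical two-block CNN branch; channel sizes are placeholders.
cnn = nn.Sequential(conv_block(1, 16), conv_block(16, 32))

x = torch.randn(128, 1, 300)  # (batch, channel, features), like one train_mol batch
out = cnn(x)  # shape: (128, 32, 300)
```

Wrapping the pattern in one helper makes it hard to forget the normalization layer when more Conv layers are added later.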
figures and references for paper
about the data:
- statistical overview of protein length (excluding a single data point longer than 7k)
  - x label: protein sequence length
  - y label: number of sequences
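A possible way to draw this figure, as a sketch only: a histogram of sequence lengths with anything above the 7k cutoff dropped. The function name `plot_length_histogram`, the bin count, the output file name, and the example lengths are hypothetical; the real lengths would come from the dataset.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt


def plot_length_histogram(lengths, max_len=7000, bins=50, path=None):
    """Plot the protein length distribution, dropping sequences longer than max_len."""
    lengths = np.asarray(lengths)
    kept = lengths[lengths <= max_len]  # exclude the outlier(s) above the cutoff
    fig, ax = plt.subplots()
    ax.hist(kept, bins=bins)
    ax.set_xlabel("protein sequence length")
    ax.set_ylabel("number of sequences")
    if path is not None:
        fig.savefig(path)  # only write a file when a path is given
    plt.close(fig)
    return kept


# Hypothetical lengths for illustration only.
example_lengths = [120, 350, 512, 980, 7500]
kept = plot_length_histogram(example_lengths)
# To save the figure: plot_length_histogram(example_lengths, path="protein_length_hist.png")
```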