Machine Learning NoteBook 0817

Technique Sharing

8/17: done

  • 9:00 - 10:00 testing the model
  • 10:00 - 11:30 nothing happened
  • 1:00 - 4:00 finished main Figure 1

current status

  • yesterday's results fluctuated strongly, so we retrained the model with a lower learning rate (lr):
    Figure
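A minimal sketch of why a lower lr can reduce fluctuation, assuming a toy setup rather than the actual model: SGD on the quadratic f(w) = 0.5 * w^2 with Gaussian noise added to the gradient. The function names, the noise level, and the two lr values (0.5 and 0.05) are illustrative assumptions, not the settings used in the experiment.

```python
import random
import statistics


def noisy_sgd_losses(lr, steps=2000, noise_std=1.0, seed=0):
    """Run SGD on f(w) = 0.5 * w**2 with noisy gradients; return the loss after each step."""
    rng = random.Random(seed)
    w = 5.0
    losses = []
    for _ in range(steps):
        grad = w + rng.gauss(0.0, noise_std)  # true gradient (w) plus Gaussian noise
        w -= lr * grad
        losses.append(0.5 * w * w)
    return losses


def tail_fluctuation(losses, tail=500):
    """Standard deviation of the loss over the last `tail` steps."""
    return statistics.pstdev(losses[-tail:])


high = tail_fluctuation(noisy_sgd_losses(lr=0.5))
low = tail_fluctuation(noisy_sgd_losses(lr=0.05))
print(f"lr=0.5  tail std of loss: {high:.4f}")
print(f"lr=0.05 tail std of loss: {low:.4f}")
```

In this toy case the stationary variance of w is lr * noise_std^2 / (2 - lr), so it shrinks roughly in proportion to lr. Whether the same holds for the real model depends on where its fluctuation comes from.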

need to do tomorrow

notice

figures and reference for paper

  • table used to draw the figure: number of data points, molecules (mol), and sequences (seq) remaining after each step:

    | step | description | data | mol | seq |
    |------|-------------|------|-----|-----|
    | 0 | original | 2278226 | 986143 | 8005 |
    | 1 | drop multichain | 2169710 | 944576 | 7850 |
    | 2 | only keep data with a $K_i$ value | 490605 | 204901 | 3404 |
    | 3 | count how often each molecule and sequence occurs; remove data with molecules occurring fewer than 3 times and sequences occurring fewer than 6 times | 288115 | 55924 | 1872 |
    | 4 | remove invalid $K_i$ values (e.g. $K_i = 0$) | 250481 | 54216 | 1846 |
    | 5 | embed molecules and sequences; remove data that cannot be embedded | 250344 | 54177 | 1844 |
    | 6 | remove data with $pK_i$ ($\log_{10} K_i$) higher than $8$ | 249517 | 54135 | 1844 |
    • the figure:

    Figure
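A rough sketch of steps 3 and 4 from the table, plus the per-step counts (data, mol, seq) needed for the figure. The record fields `mol`, `seq`, `ki`, the helper names, and the toy records are all hypothetical. The sketch also assumes that a record is dropped if either its molecule or its sequence is too rare, and that occurrences are counted once rather than re-counted after filtering; the actual pipeline may differ on both points.

```python
from collections import Counter

# Hypothetical toy records; field names "mol", "seq", "ki" are assumptions.
rows = [
    {"mol": "A", "seq": "S1", "ki": 12.0},
    {"mol": "A", "seq": "S1", "ki": 0.0},   # invalid K_i, removed in step 4
    {"mol": "A", "seq": "S1", "ki": 3.5},
    {"mol": "B", "seq": "S1", "ki": 40.0},
    {"mol": "B", "seq": "S1", "ki": 7.1},
    {"mol": "C", "seq": "S1", "ki": 9.9},   # molecule occurs only once
    {"mol": "A", "seq": "S2", "ki": 2.2},   # sequence occurs only twice
    {"mol": "B", "seq": "S2", "ki": 5.0},
]


def summarize(records):
    """Return (number of records, unique molecules, unique sequences)."""
    return (
        len(records),
        len({r["mol"] for r in records}),
        len({r["seq"] for r in records}),
    )


def filter_by_occurrence(records, min_mol=3, min_seq=6):
    """Step 3: keep records whose molecule and sequence both occur often enough.

    Occurrences are counted once on the input (single pass).
    """
    mol_counts = Counter(r["mol"] for r in records)
    seq_counts = Counter(r["seq"] for r in records)
    return [
        r for r in records
        if mol_counts[r["mol"]] >= min_mol and seq_counts[r["seq"]] >= min_seq
    ]


def drop_invalid_ki(records):
    """Step 4: remove records with a missing or non-positive K_i."""
    return [r for r in records if r["ki"] is not None and r["ki"] > 0]


step3 = filter_by_occurrence(rows)
step4 = drop_invalid_ki(step3)
for name, records in [("input", rows), ("step 3", step3), ("step 4", step4)]:
    print(name, summarize(records))
```

Calling `summarize` after every step gives one row of the table per step, which can then be plotted directly.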