OpenLAM 2024Q1 | DPA-2 model is compatible with DeePMD-kit-v3 ! #3772
AnguseZhang started this conversation in Show and tell
-
I am curious: during finetuning the embedding net is fixed, so why can the information in the pretrained model be "forgotten" during single-task finetuning (as stated in the Multi-task Finetune section)? Is the reason that multi-task finetuning prevents overfitting that the multiple fitting nets can verify each other?
-
First of all, we are excited to announce the first alpha version of DeePMD-kit v3. DeePMD-kit v3 allows you to train and run deep potential models on top of TensorFlow or PyTorch. DeePMD-kit v3 also supports the DPA-2 model, a novel architecture for large atomic models. If you have any problems, ideas, or suggestions about OpenLAM, you are welcome to start a discussion here!
The pretrained model provided is compatible with DeePMD-kit v3 2024Q1 version.
On this page, we systematically report the data used for pretraining, the evaluation of the model, and how to use the pretrained model. The model can be downloaded from AIS-Square: https://www.aissquare.com/models/detail?pageType=models&name=DPA-2.1.0-2024Q1&id=244
Data included in the pretraining
A general overview is provided below.
- Alloy: model branch `Domains_Alloy`. More details and data access can be found in Alloy-data.
- SemiCond: model branch `Domains_SemiCond`
- Cathode: model branch `Domains_Anode`
- Cluster: model branch `Domains_Cluster`
- Drug: model branch `Domains_Drug`
- FerroEle: model branch `Domains_FerroEle`
- OC2M: model branch `Domains_OC2M`
- SSE-PBE: model branch `Domains_SSE-PBE`
- H2O-PD: model branch `H2O_H2O-PD`
- AgAu-PBE: model branch `Metals_AgAu-PBE`
- AlMgCu: model branch `Metals_AlMgCu`
- ANI-1x: model branch `Domains_ANI`
- Transition-1x: model branch `Domains_Transition1x`
Other datasets
Additional datasets include Cu, Sn, Ti, V, W, C12H26, HfO2, In2Se3, H2O-DPLR, H2O-SCAN0, H2O-PBE0TS, and H2O-PBE0TS-MD.
Model Evaluation
We test on the 18 baseline systems from the Q0 report.
To make sure that the migration of the code does not affect training performance, we first trained the model with 18 heads; the test error is very close to that of the previous version (refer to https://aissquare.com/openlam?tab=Statistics).
Then, we added all downstream data (including ANI-1x and Transition-1x) to the pretraining, yielding a 27-head training. We also extended the 27-head training to 5 million steps on 8 GPU cards to achieve better accuracy.
How to use
The pretrained model is compatible with DeePMD-kit v3!
To use the pretrained model, you first need to install the corresponding version of DeePMD-kit, namely the 2024Q1 branch: https://github.com/deepmodeling/deepmd-kit/tree/2024Q1
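As a minimal sketch, one way to install this branch from source is shown below; it assumes a working Python environment with pip and a PyTorch installation already available, and it omits CUDA- and backend-specific options, so please follow the official DeePMD-kit installation guide for the authoritative steps.

```bash
# Sketch: source install of the 2024Q1 branch (assumes Python, pip, and PyTorch
# are already set up; see the official install docs for backend-specific details).
git clone -b 2024Q1 https://github.com/deepmodeling/deepmd-kit.git
cd deepmd-kit
pip install .

# Verify that the dp entry point is available.
dp -h
```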
Zero-shot
Zero-shot learning is a machine learning scenario in which an AI model is trained to recognize and categorize objects or concepts without having seen any examples of those categories or concepts beforehand.
In the context of large atomic models, zero-shot performance can be considered a test of the model's generalization. For example, we can compare it with the standard deviation of the original data: if the zero-shot RMSE is smaller than the corresponding standard deviation, the model shows zero-shot generalization ability.
Furthermore, if we want to know whether the pretrained model is suitable for a new circumstance, or which head should be selected for finetuning, we can also perform the zero-shot test.
Instructions for zero-shot testing can be found at https://bohrium.dp.tech/notebooks/57552161357 via Bohrium Notebook, a cloud-native computing platform.
To be specific, given a new downstream system, we perform a zero-step training with DeePMD-kit via `dp --pt train` to determine the energy bias of the downstream task, instead of using the bias from the pretraining stage. Then we can directly test the zero-shot performance of the selected head via `dp --pt test`.
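A minimal sketch of this workflow is shown below. The model file name, checkpoint name, test-data path, and the branch `Domains_SemiCond` are illustrative, and the exact spelling of the branch-selection flag may differ between versions, so check `dp --pt train --help`.

```bash
# 1. Zero-step "training": set "numb_steps" to 0 in input.json so that only the
#    energy bias of the downstream system is re-fitted, while the network
#    parameters of the chosen pretrained branch are kept unchanged.
dp --pt train input.json --finetune DPA-2.1.0-2024Q1.pt -m Domains_SemiCond

# 2. Evaluate the resulting checkpoint on the downstream test systems
#    (checkpoint name and data path are illustrative).
dp --pt test -m model.ckpt.pt -s ./GST_test_systems -n 100
```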
We also use GST_GAP_22 as a practical example. The main training dataset for GST_GAP_22 is calculated using the PBEsol functional. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. The data were used to train a machine-learning interatomic potential for simulating a range of germanium-antimony-tellurium compositions under realistic device conditions. More details can be found at https://materials.colabfit.org/id/DS_r3hav37ufnmb_0. Notice that the SemiCond branch of the pretrained model contains structure-to-energy-force labels for 20 semiconductors, namely Si, Ge, SiC, BAs, BN, AlN, AlP, AlAs, InP, InAs, InSb, GaN, GaP, GaAs, CdTe, InTe-In2Te3, CdSe-CdSe2, InSe-In2Se3, ZnS, and CdS-CdS2. We first test the zero-shot generalization of the 18-head 1M-step model.
The energy and force standard deviations of this system are 0.96 eV/atom and 0.73 eV/Angstrom. For the 18-head 1M-step model, the SemiCond branch's zero-shot RMSEs for energy and force are 0.69 eV/atom and 0.38 eV/Angstrom, which indicates good zero-shot generalization of the pretrained model. This test also demonstrates that the choice of model branch has a large influence on zero-shot performance; meanwhile, the OC2M head also performs well on the GeTe-Sb2Te3 (GST) system. Last but not least, for the 27-head 5M-step model, the SemiCond branch's zero-shot RMSEs are 0.59 eV/atom and 0.34 eV/Angstrom, illustrating better accuracy.
Single-task Finetune
Pretraining-and-finetuning is a widely used approach in other fields such as Computer Vision (CV) and Natural Language Processing (NLP) to vastly reduce training cost, but it is not trivial for potential models. Compositions and configurations of data samples, or even computational parameters in the upstream software (such as VASP), may differ between the pretrained and target datasets, leading to energy shifts or other discrepancies in the training data.
The finetune procedure inherits the neural-network parameters of the descriptor from the pretrained multi-task model. The fitting net can either be re-initialized or inherited from any branch of the pretrained model, depending on the argument -m.
If this is your first time trying finetuning, we suggest that you perform an se_e2_a from-scratch training, a DPA-2 from-scratch training, and a finetuning run with a properly chosen head (you can also perform the zero-shot test to choose it), and then compare their performance.
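As a sketch, the single-task finetune command looks like the following; the model file name and the branch `Domains_SemiCond` (the branch chosen in the GST example above) are illustrative, and the behavior when -m is omitted may depend on the version, so check `dp --pt train --help`.

```bash
# Single-task finetune: inherit the descriptor from the pretrained multi-task
# model, and inherit the fitting net of the Domains_SemiCond branch via -m.
# Omitting -m re-initializes the fitting net instead (version-dependent).
dp --pt train input_single.json --finetune DPA-2.1.0-2024Q1.pt -m Domains_SemiCond
```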
Multi-task Finetune
In some circumstances, single-task finetuning may not be enough. First, in few-shot cases where the downstream training data are scarce, single-task finetuning may "forget" the information in the pretrained model and cause an overfitting problem. Second, if there are multiple datasets and multiple pretrained branches suitable for the new task, we may want to include all of this information. For these reasons, DeePMD-kit also supports a multi-task finetuning mode. In the following parts, we provide instructions on how to perform the multi-task finetuning procedure.
Now we have a multi-task model `model_.pt` with 18 fitting heads, covering different branches such as `Domains_Alloy`, `Domains_Anode`, `Domains_Cluster`, `Domains_Drug`, etc.
The content of `input.json` is different from single-task training: it includes a `model/shared_dict` shared by all models, such as the `dpa2_descriptor`, and multiple model definitions `model/model_dict/model_key` instead of a single model definition `model`.
For example, suppose we want to finetune on a new system while continuing to train on a relevant dataset of the pretrained model, such as `Domains_Alloy`, to prevent overfitting. We can define `model_dict` as follows.
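The snippet below is only a sketch: the new head `my_new_system`, the type maps, and the fitting-net sizes are illustrative, the detailed descriptor hyper-parameters are elided, and the exact schema (including how `shared_dict` entries are referenced) should be checked against the DeePMD-kit v3 multi-task training documentation.

```json
"model": {
    "shared_dict": {
        "dpa2_descriptor": {
            "type": "dpa2",
            "_comment": "descriptor hyper-parameters elided in this sketch"
        }
    },
    "model_dict": {
        "Domains_Alloy": {
            "type_map": ["..."],
            "descriptor": "dpa2_descriptor",
            "fitting_net": {"neuron": [240, 240, 240], "resnet_dt": true}
        },
        "my_new_system": {
            "type_map": ["..."],
            "descriptor": "dpa2_descriptor",
            "fitting_net": {"neuron": [240, 240, 240], "resnet_dt": true}
        }
    }
}
```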
Correspondingly, we need to define `loss_dict` and `training/data_dict` for each task, and control the weights between the different heads through `training/model_prob`:
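Continuing the same sketch, with illustrative data paths, loss prefactors, sampling weights, and step count:

```json
"loss_dict": {
    "Domains_Alloy": {
        "type": "ener",
        "start_pref_e": 0.02, "limit_pref_e": 1,
        "start_pref_f": 1000, "limit_pref_f": 1
    },
    "my_new_system": {
        "type": "ener",
        "start_pref_e": 0.02, "limit_pref_e": 1,
        "start_pref_f": 1000, "limit_pref_f": 1
    }
},
"training": {
    "model_prob": {"Domains_Alloy": 0.5, "my_new_system": 0.5},
    "data_dict": {
        "Domains_Alloy": {
            "training_data": {"systems": ["path/to/alloy/data"], "batch_size": "auto"}
        },
        "my_new_system": {
            "training_data": {"systems": ["path/to/new/system"], "batch_size": "auto"}
        }
    },
    "numb_steps": 200000
}
```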
Once we have finished setting up `input.json`, we are ready to train; a sketch of the command is given at the end of this section.
Finally, we also use the GST case to compare the performance of from-scratch training, single-task finetuning, and multi-task finetuning. In this case, we construct a few-shot training set containing only 40 frames taken from an iteration of previous data-cleaning work; the test set contains 144 frames. All runs are trained for 200,000 steps. The poor performance of from-scratch training and single-task finetuning is attributed to the overfitting problem in the few-shot case.
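For completeness, here is a sketch of the multi-task finetune command referenced above; the input and model file names are illustrative, and how each head in `model_dict` is matched to a pretrained branch should be checked in the multi-task finetune documentation.

```bash
# Multi-task finetune from the pretrained multi-task model: the heads defined in
# model_dict can inherit parameters from the corresponding pretrained branches,
# while newly defined heads start from scratch (see the multi-task finetune docs).
dp --pt train input_multi.json --finetune DPA-2.1.0-2024Q1.pt
```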