can not find all_ict_samples.jsonl_dev #5

Open
rahulmool opened this issue Apr 2, 2023 · 6 comments

Comments

@rahulmool

run_xict.sh contains this command:
python -m torch.distributed.launch \
    --nproc_per_node 1 run_xict.py \
    --max_grad_norm 2.0 \
    --encoder_model_type hf_bert \
    --pretrained_model_cfg bert-base-multilingual-uncased \
    --seed 12345 --sequence_length 256 \
    --warmup_steps 300 --batch_size 4 --do_lower_case \
    --train_file "../../data/bbc_passages/all_ict_samples.jsonl_[0,1,2]" \
    --dev_file ../../data/bbc_passages/all_ict_samples.jsonl_dev \
    --output_dir xict_outputs \
    --checkpoint_file_name xICT_biencoder.pt \
    --learning_rate 2e-05 --num_train_epochs 40 \
    --dev_batch_size 6 --val_av_rank_start_epoch 30
but I don't know where to find all_ict_samples.jsonl_dev.
Instead of this file I am using all_ict_samples-trans100.jsonl,
but it gives me the error described in #4 (comment).

@khuangaf
Owner

khuangaf commented Apr 2, 2023

Can you check whether all_ict_samples-trans100.jsonl contains any data, or is it empty?

@rahulmool
Author

Yes, it does contain data.

@khuangaf
Owner

khuangaf commented Apr 2, 2023

Can you check if this data variable is not an empty list? https://github.com/khuangaf/CONCRETE/blob/master/CORA/mDPR/run_xict.py#L94

@rahulmool
Author

Yes, for validation the data variable is empty.

@rahulmool
Author

This is the exact output:

init using bert-base-multilingual-uncased
loading weights file https://cdn.huggingface.co/bert-base-multilingual-uncased-pytorch_model.bin from cache at /home/22cs60r72/.cache/torch/transformers/b72dd13aa8437628227c4928499f2476a84af4ab7ed75eb1365c5ae9fdbd7638.54b4dad9cc3db9aa8448458b782d11ab07c80dedb951906fd2f684a00ecdb1ee
All model checkpoint weights were used when initializing HFBertEncoder.

All the weights of HFBertEncoder were initialized from the model checkpoint at bert-base-multilingual-uncased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use HFBertEncoder for predictions without further training.
loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-config.json from cache at /home/22cs60r72/.cache/torch/transformers/33b56ce0f312e47e4d77a57791a4fc6233ae4a560dd2bdd186107058294e58ab.fcb1786f49c279f0e0f158c9972b9bd9f6c0edb5d893dcb9b530d714d86f0edc
Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"type_vocab_size": 2,
"vocab_size": 105879
}

init using bert-base-multilingual-uncased
loading weights file https://cdn.huggingface.co/bert-base-multilingual-uncased-pytorch_model.bin from cache at /home/22cs60r72/.cache/torch/transformers/b72dd13aa8437628227c4928499f2476a84af4ab7ed75eb1365c5ae9fdbd7638.54b4dad9cc3db9aa8448458b782d11ab07c80dedb951906fd2f684a00ecdb1ee
All model checkpoint weights were used when initializing HFBertEncoder.

All the weights of HFBertEncoder were initialized from the model checkpoint at bert-base-multilingual-uncased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use HFBertEncoder for predictions without further training.
loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt from cache at /home/22cs60r72/.cache/torch/transformers/bb773818882b0524dc53a1b31a2cc95bc489f000e7e19773ba07846011a6c711.535306b226c42cebebbc0dabc83b92ab11260e9919e21e2ab0beb301f267b4c7
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_1
Aggregated data size: 12250
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_2
Aggregated data size: 24500
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_0
Aggregated data size: 36750
Total cleaned data size: 36750
Total iterations per epoch=9188
Total updates=367520
Eval step = 9188
***** Training *****
***** Epoch 0 *****
Epoch: 0: Step: 1/9188, loss=7.311617, lr=0.000000
Train batch 100
Avg. loss per last 100 batches: 5.303617
Epoch: 0: Step: 101/9188, loss=0.816039, lr=0.000007
Train batch 200
Avg. loss per last 100 batches: 1.082258
Epoch: 0: Step: 201/9188, loss=1.185385, lr=0.000013
Train batch 300
Avg. loss per last 100 batches: 1.143644
Epoch: 0: Step: 301/9188, loss=2.163596, lr=0.000020
Train batch 400
Avg. loss per last 100 batches: 1.020407
Epoch: 0: Step: 401/9188, loss=0.395899, lr=0.000020
Train batch 500
Avg. loss per last 100 batches: 0.950857
Epoch: 0: Step: 501/9188, loss=1.575688, lr=0.000020
Train batch 600
Avg. loss per last 100 batches: 0.833716
Epoch: 0: Step: 601/9188, loss=0.066809, lr=0.000020
Train batch 700
Avg. loss per last 100 batches: 0.844092
Epoch: 0: Step: 701/9188, loss=2.120656, lr=0.000020
Train batch 800
Avg. loss per last 100 batches: 0.758489
Epoch: 0: Step: 801/9188, loss=0.041512, lr=0.000020
Train batch 900
Avg. loss per last 100 batches: 0.760439
Epoch: 0: Step: 901/9188, loss=0.701504, lr=0.000020
Train batch 1000
Avg. loss per last 100 batches: 0.725864
Epoch: 0: Step: 1001/9188, loss=0.466521, lr=0.000020
Train batch 1100
Avg. loss per last 100 batches: 0.820713
Epoch: 0: Step: 1101/9188, loss=1.519754, lr=0.000020
Train batch 1200
Avg. loss per last 100 batches: 0.722633
Epoch: 0: Step: 1201/9188, loss=0.267421, lr=0.000020
Train batch 1300
Avg. loss per last 100 batches: 0.803082
Epoch: 0: Step: 1301/9188, loss=0.025564, lr=0.000020
Train batch 1400
Avg. loss per last 100 batches: 0.688916
Epoch: 0: Step: 1401/9188, loss=0.268893, lr=0.000020
Train batch 1500
Avg. loss per last 100 batches: 0.694311
Epoch: 0: Step: 1501/9188, loss=0.423493, lr=0.000020
Train batch 1600
Avg. loss per last 100 batches: 0.733490
Epoch: 0: Step: 1601/9188, loss=0.326050, lr=0.000020
Train batch 1700
Avg. loss per last 100 batches: 0.777383
Epoch: 0: Step: 1701/9188, loss=1.815521, lr=0.000020
Train batch 1800
Avg. loss per last 100 batches: 0.654307
Epoch: 0: Step: 1801/9188, loss=0.704480, lr=0.000020
Train batch 1900
Avg. loss per last 100 batches: 0.791680
Epoch: 0: Step: 1901/9188, loss=0.410724, lr=0.000020
Train batch 2000
Avg. loss per last 100 batches: 0.658655
Epoch: 0: Step: 2001/9188, loss=0.005747, lr=0.000020
Train batch 2100
Avg. loss per last 100 batches: 0.762728
Epoch: 0: Step: 2101/9188, loss=0.768077, lr=0.000020
Train batch 2200
Avg. loss per last 100 batches: 0.724533
Epoch: 0: Step: 2201/9188, loss=0.725896, lr=0.000020
Train batch 2300
Avg. loss per last 100 batches: 0.682972
Epoch: 0: Step: 2301/9188, loss=1.073155, lr=0.000020
Train batch 2400
Avg. loss per last 100 batches: 0.648425
Epoch: 0: Step: 2401/9188, loss=0.473070, lr=0.000020
Train batch 2500
Avg. loss per last 100 batches: 0.625523
Epoch: 0: Step: 2501/9188, loss=0.043014, lr=0.000020
Train batch 2600
Avg. loss per last 100 batches: 0.701965
Epoch: 0: Step: 2601/9188, loss=0.006406, lr=0.000020
Train batch 2700
Avg. loss per last 100 batches: 0.710023
Epoch: 0: Step: 2701/9188, loss=1.481423, lr=0.000020
Train batch 2800
Avg. loss per last 100 batches: 0.562529
Epoch: 0: Step: 2801/9188, loss=0.711672, lr=0.000020
Train batch 2900
Avg. loss per last 100 batches: 0.823689
Epoch: 0: Step: 2901/9188, loss=1.403012, lr=0.000020
Train batch 3000
Avg. loss per last 100 batches: 0.713877
Epoch: 0: Step: 3001/9188, loss=1.028094, lr=0.000020
Train batch 3100
Avg. loss per last 100 batches: 0.655354
Epoch: 0: Step: 3101/9188, loss=0.650727, lr=0.000020
Train batch 3200
Avg. loss per last 100 batches: 0.707570
Epoch: 0: Step: 3201/9188, loss=0.115641, lr=0.000020
Train batch 3300
Avg. loss per last 100 batches: 0.521763
Epoch: 0: Step: 3301/9188, loss=0.057539, lr=0.000020
Train batch 3400
Avg. loss per last 100 batches: 0.611837
Epoch: 0: Step: 3401/9188, loss=0.220680, lr=0.000020
Train batch 3500
Avg. loss per last 100 batches: 0.687215
Epoch: 0: Step: 3501/9188, loss=0.117760, lr=0.000020
Train batch 3600
Avg. loss per last 100 batches: 0.612891
Epoch: 0: Step: 3601/9188, loss=1.465150, lr=0.000020
Train batch 3700
Avg. loss per last 100 batches: 0.850417
Epoch: 0: Step: 3701/9188, loss=0.035678, lr=0.000020
Train batch 3800
Avg. loss per last 100 batches: 0.789871
Epoch: 0: Step: 3801/9188, loss=0.646053, lr=0.000020
Train batch 3900
Avg. loss per last 100 batches: 0.752498
Epoch: 0: Step: 3901/9188, loss=0.282335, lr=0.000020
Train batch 4000
Avg. loss per last 100 batches: 0.567328
Epoch: 0: Step: 4001/9188, loss=0.012028, lr=0.000020
Train batch 4100
Avg. loss per last 100 batches: 0.548741
Epoch: 0: Step: 4101/9188, loss=1.539706, lr=0.000020
Train batch 4200
Avg. loss per last 100 batches: 0.734413
Epoch: 0: Step: 4201/9188, loss=0.000198, lr=0.000020
Train batch 4300
Avg. loss per last 100 batches: 0.548030
Epoch: 0: Step: 4301/9188, loss=2.856502, lr=0.000020
Train batch 4400
Avg. loss per last 100 batches: 0.707106
Epoch: 0: Step: 4401/9188, loss=0.536959, lr=0.000020
Train batch 4500
Avg. loss per last 100 batches: 0.601878
Epoch: 0: Step: 4501/9188, loss=0.015160, lr=0.000020
Train batch 4600
Avg. loss per last 100 batches: 0.766000
Epoch: 0: Step: 4601/9188, loss=0.040841, lr=0.000020
Train batch 4700
Avg. loss per last 100 batches: 0.767216
Epoch: 0: Step: 4701/9188, loss=1.521197, lr=0.000020
Train batch 4800
Avg. loss per last 100 batches: 0.615036
Epoch: 0: Step: 4801/9188, loss=1.145796, lr=0.000020
Train batch 4900
Avg. loss per last 100 batches: 0.671538
Epoch: 0: Step: 4901/9188, loss=2.099149, lr=0.000020
Train batch 5000
Avg. loss per last 100 batches: 0.632023
Epoch: 0: Step: 5001/9188, loss=0.254401, lr=0.000020
Train batch 5100
Avg. loss per last 100 batches: 0.654933
Epoch: 0: Step: 5101/9188, loss=0.479718, lr=0.000020
Train batch 5200
Avg. loss per last 100 batches: 0.542308
Epoch: 0: Step: 5201/9188, loss=0.670710, lr=0.000020
Train batch 5300
Avg. loss per last 100 batches: 0.565748
Epoch: 0: Step: 5301/9188, loss=0.003618, lr=0.000020
Train batch 5400
Avg. loss per last 100 batches: 0.620327
Epoch: 0: Step: 5401/9188, loss=1.348403, lr=0.000020
Train batch 5500
Avg. loss per last 100 batches: 0.600770
Epoch: 0: Step: 5501/9188, loss=0.152233, lr=0.000020
Train batch 5600
Avg. loss per last 100 batches: 0.494991
Epoch: 0: Step: 5601/9188, loss=0.047969, lr=0.000020
Train batch 5700
Avg. loss per last 100 batches: 0.647839
Epoch: 0: Step: 5701/9188, loss=0.137598, lr=0.000020
Train batch 5800
Avg. loss per last 100 batches: 0.566338
Epoch: 0: Step: 5801/9188, loss=0.314973, lr=0.000020
Train batch 5900
Avg. loss per last 100 batches: 0.639788
Epoch: 0: Step: 5901/9188, loss=0.009664, lr=0.000020
Train batch 6000
Avg. loss per last 100 batches: 0.509846
Epoch: 0: Step: 6001/9188, loss=0.239767, lr=0.000020
Train batch 6100
Avg. loss per last 100 batches: 0.629419
Epoch: 0: Step: 6101/9188, loss=0.000416, lr=0.000020
Train batch 6200
Avg. loss per last 100 batches: 0.539567
Epoch: 0: Step: 6201/9188, loss=0.014686, lr=0.000020
Train batch 6300
Avg. loss per last 100 batches: 0.730408
Epoch: 0: Step: 6301/9188, loss=0.434280, lr=0.000020
Train batch 6400
Avg. loss per last 100 batches: 0.466699
Epoch: 0: Step: 6401/9188, loss=0.750414, lr=0.000020
Train batch 6500
Avg. loss per last 100 batches: 0.699479
Epoch: 0: Step: 6501/9188, loss=1.802849, lr=0.000020
Train batch 6600
Avg. loss per last 100 batches: 0.700786
Epoch: 0: Step: 6601/9188, loss=0.286703, lr=0.000020
Train batch 6700
Avg. loss per last 100 batches: 0.851789
Epoch: 0: Step: 6701/9188, loss=0.336051, lr=0.000020
Train batch 6800
Avg. loss per last 100 batches: 0.531913
Epoch: 0: Step: 6801/9188, loss=0.021930, lr=0.000020
Train batch 6900
Avg. loss per last 100 batches: 0.556301
Epoch: 0: Step: 6901/9188, loss=0.018971, lr=0.000020
Train batch 7000
Avg. loss per last 100 batches: 0.567307
Epoch: 0: Step: 7001/9188, loss=2.772715, lr=0.000020
Train batch 7100
Avg. loss per last 100 batches: 0.660842
Epoch: 0: Step: 7101/9188, loss=0.124819, lr=0.000020
Train batch 7200
Avg. loss per last 100 batches: 0.487308
Epoch: 0: Step: 7201/9188, loss=1.259964, lr=0.000020
Train batch 7300
Avg. loss per last 100 batches: 0.732755
Epoch: 0: Step: 7301/9188, loss=0.739729, lr=0.000020
Train batch 7400
Avg. loss per last 100 batches: 0.629111
Epoch: 0: Step: 7401/9188, loss=1.745674, lr=0.000020
Train batch 7500
Avg. loss per last 100 batches: 0.488628
Epoch: 0: Step: 7501/9188, loss=0.006047, lr=0.000020
Train batch 7600
Avg. loss per last 100 batches: 0.533195
Epoch: 0: Step: 7601/9188, loss=0.000862, lr=0.000020
Train batch 7700
Avg. loss per last 100 batches: 0.575115
Epoch: 0: Step: 7701/9188, loss=2.283909, lr=0.000020
Train batch 7800
Avg. loss per last 100 batches: 0.538137
Epoch: 0: Step: 7801/9188, loss=0.102370, lr=0.000020
Train batch 7900
Avg. loss per last 100 batches: 0.648084
Epoch: 0: Step: 7901/9188, loss=0.504715, lr=0.000020
Train batch 8000
Avg. loss per last 100 batches: 0.672259
Epoch: 0: Step: 8001/9188, loss=0.274606, lr=0.000020
Train batch 8100
Avg. loss per last 100 batches: 0.593986
Epoch: 0: Step: 8101/9188, loss=0.006901, lr=0.000020
Train batch 8200
Avg. loss per last 100 batches: 0.586650
Epoch: 0: Step: 8201/9188, loss=3.564882, lr=0.000020
Train batch 8300
Avg. loss per last 100 batches: 0.718681
Epoch: 0: Step: 8301/9188, loss=0.956982, lr=0.000020
Train batch 8400
Avg. loss per last 100 batches: 0.715578
Epoch: 0: Step: 8401/9188, loss=1.803287, lr=0.000020
Train batch 8500
Avg. loss per last 100 batches: 0.649944
Epoch: 0: Step: 8501/9188, loss=0.170287, lr=0.000020
Train batch 8600
Avg. loss per last 100 batches: 0.497341
Epoch: 0: Step: 8601/9188, loss=0.182119, lr=0.000020
Train batch 8700
Avg. loss per last 100 batches: 0.561773
Epoch: 0: Step: 8701/9188, loss=0.013547, lr=0.000020
Train batch 8800
Avg. loss per last 100 batches: 0.665971
Epoch: 0: Step: 8801/9188, loss=0.314558, lr=0.000020
Train batch 8900
Avg. loss per last 100 batches: 0.533789
Epoch: 0: Step: 8901/9188, loss=0.389280, lr=0.000020
Train batch 9000
Avg. loss per last 100 batches: 0.620023
Epoch: 0: Step: 9001/9188, loss=0.113274, lr=0.000020
Train batch 9100
Avg. loss per last 100 batches: 0.567672
Epoch: 0: Step: 9101/9188, loss=0.535228, lr=0.000020
Validation: Epoch: 0 Step: 9188/9188
NLL validation ...
Total cleaned data size: 0
0.0
Traceback (most recent call last):
  File "run_xict.py", line 602, in <module>
    main()
  File "run_xict.py", line 592, in main
    trainer.run_train()
  File "run_xict.py", line 132, in run_train
    self._train_epoch(scheduler, epoch, eval_step, train_iterator)
  File "run_xict.py", line 365, in _train_epoch
    self.validate_and_save(epoch, train_data_iterator.get_iteration(), scheduler)
  File "run_xict.py", line 148, in validate_and_save
    validation_loss = self.validate_nll()
  File "run_xict.py", line 189, in validate_nll
    correct_ratio = float(total_correct_predictions / total_samples)
ZeroDivisionError: division by zero
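
The traceback ends in a division by total_samples, and the log reports "Total cleaned data size: 0" just before it, so the validation set was empty when the ratio was computed. A minimal sketch of a guard that would surface this earlier with a clearer message is shown below. safe_correct_ratio is a hypothetical helper written for illustration; it is not part of run_xict.py, and only the two variable names are taken from the traceback.

```python
def safe_correct_ratio(total_correct_predictions, total_samples):
    """Return the fraction of correct predictions.

    Hypothetical helper (not from run_xict.py): raises a descriptive
    error instead of ZeroDivisionError when no validation samples exist.
    """
    if total_samples == 0:
        raise ValueError(
            "Validation set is empty after cleaning; "
            "check the file passed as --dev_file."
        )
    return float(total_correct_predictions / total_samples)


if __name__ == "__main__":
    # Normal case: 3 correct out of 4 samples.
    print(safe_correct_ratio(3, 4))
    # Empty validation set: a readable error instead of a bare traceback.
    try:
        safe_correct_ratio(0, 0)
    except ValueError as err:
        print(f"error: {err}")
```

This only changes how the failure is reported; the empty dev data still has to be fixed.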

@khuangaf
Owner

khuangaf commented Apr 6, 2023

It looks like data is an empty list because the positive_ctxs field is empty. Most likely there were some mistakes when you ran create_ict_samples.py. Can you check if this line was run properly (i.e. the positive_ctxs field should be assigned a non-empty list) for the dev data?
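
One way to run that check is to count how many records in the dev file carry a non-empty positive_ctxs list. The sketch below assumes the file holds one JSON object per line (as the .jsonl extension suggests) and that each object may have a positive_ctxs key; count_positive_ctxs is a hypothetical helper, not something from the CONCRETE repository.

```python
import json
import sys
from pathlib import Path


def count_positive_ctxs(path):
    """Return (total_records, records_with_non_empty_positive_ctxs).

    Assumes one JSON object per line; blank lines are skipped.
    """
    total = 0
    non_empty = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            total += 1
            # A missing key and an empty list both count as "empty".
            if record.get("positive_ctxs"):
                non_empty += 1
    return total, non_empty


if __name__ == "__main__":
    # Pass one or more .jsonl paths on the command line.
    for file_path in sys.argv[1:]:
        total, non_empty = count_positive_ctxs(file_path)
        print(f"{file_path}: {non_empty}/{total} records have positive_ctxs")
```

If the second number is 0 for the dev file, that would be consistent with the "Total cleaned data size: 0" line in the log above.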
