Do you know how to calculate the parameter volume suitable for a specific GPU size? #4

Open
whuhxb opened this issue Sep 6, 2022 · 0 comments

whuhxb commented Sep 6, 2022

Hi @yanx27 @BIRlz

Do you know how to calculate the parameter volume (model size) that fits a specific GPU memory size? After I revised the model, training with batch size 12 on four 32 GB GPU cards fails with an OOM error. Before I revised the model, a single 32 GB GPU card could run it with batch size 12. Thanks.

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[70395,26,128,1] and type float on /job:localhost/replica:0/task:0/device:GPU:3 by allocator GPU_3_bfc
[[node gpu_3/SceneSegModel/resnext_backbone/res1_bottleneck_resnext0/split_layer/transform_layer_1/conv2/local_aggregation_card_1/Mul (defined at /export/home//SensatUrban_sol_tf_GPU_90_batch_12/models/local_aggregation_operators.py:453) = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:3"](gpu_3/SceneSegModel/resnext_backbone/res1_bottleneck_resnext0/split_layer/transform_layer_1/conv2/local_aggregation_card_1/ExpandDims_1, gpu_3/SceneSegModel/resnext_backbone/res1_bottleneck_resnext0/split_layer/transform_layer_1/conv2/local_aggregation_card_1/Reshape)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node Mean_156/_8673}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_37211_Mean_156", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
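
For reference, here is a rough sketch of how the size of the single tensor named in the error can be estimated from its shape. It assumes that "type float" means 32-bit floats (4 bytes per element); the helper names `tensor_bytes` and `to_gib` are made up for illustration and are not part of the repository. This covers only one intermediate tensor, so it is a lower bound on what the layer needs, not a formula for total model memory.

```python
from math import prod

# Bytes per element for a few common dtypes (assumption: DT_FLOAT is float32).
DTYPE_BYTES = {"float16": 2, "float32": 4, "float64": 8}


def tensor_bytes(shape, dtype="float32"):
    """Return the memory needed to hold one dense tensor of this shape."""
    return prod(shape) * DTYPE_BYTES[dtype]


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Shape copied from the OOM message above.
oom_shape = (70395, 26, 128, 1)
size = tensor_bytes(oom_shape)
print(f"{size} bytes = {to_gib(size):.2f} GiB")  # 937098240 bytes = 0.87 GiB
```

So this one allocation is a bit under 1 GiB by itself. The training step also has to keep many other activations and gradients alive at the same time, which this estimate does not count.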

Best wishes.
