Hi @yanx27 @BIRlz
Do you know how to estimate how large a model (in parameters) will fit in a given amount of GPU memory? After I modified the model, training with batch size 12 on four 32 GB GPUs fails with an OOM error. Before the modification, a single 32 GB GPU could run the model with batch size 12. Thanks.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[70395,26,128,1] and type float on /job:localhost/replica:0/task:0/device:GPU:3 by allocator GPU_3_bfc
[[node gpu_3/SceneSegModel/resnext_backbone/res1_bottleneck_resnext0/split_layer/transform_layer_1/conv2/local_aggregation_card_1/Mul (defined at /export/home//SensatUrban_sol_tf_GPU_90_batch_12/models/local_aggregation_operators.py:453) = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:3"](gpu_3/SceneSegModel/resnext_backbone/res1_bottleneck_resnext0/split_layer/transform_layer_1/conv2/local_aggregation_card_1/ExpandDims_1, gpu_3/SceneSegModel/resnext_backbone/res1_bottleneck_resnext0/split_layer/transform_layer_1/conv2/local_aggregation_card_1/Reshape)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Best wishes.
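As a rough starting point for the sizing question, the memory of the tensor named in the error can be worked out from its shape and dtype. The sketch below is only an illustration: the helper names `tensor_bytes` and `to_gib` are hypothetical, and it assumes the "float" in the message is a 32-bit float (DT_FLOAT, 4 bytes per element).

```python
from math import prod  # Python 3.8+

# Bytes per element for a few common dtypes (assumed mapping for this sketch).
DTYPE_BYTES = {"float16": 2, "float32": 4, "float64": 8, "int32": 4, "int64": 8}


def tensor_bytes(shape, dtype="float32"):
    """Return the memory footprint in bytes of one dense tensor."""
    return prod(shape) * DTYPE_BYTES[dtype]


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Shape taken from the OOM message above.
oom_shape = (70395, 26, 128, 1)
size = tensor_bytes(oom_shape)
print(f"{size} bytes = {to_gib(size):.3f} GiB")  # 937098240 bytes = 0.873 GiB
```

A single tensor of this shape is under 1 GiB, so the failure probably means GPU:3 was already nearly full when this allocation was requested. The footprint is then the sum of many such intermediate activations plus weights and gradients, and the parameter count alone will likely not predict whether the model fits.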