Hubspot cell balancer & normalizer #126

Open
wants to merge 134 commits into
base: hubspot-2.5
Changes from 126 commits
Commits
9c62045
HubSpotCellCostFunction
Oct 16, 2024
e4f5a14
Adjust to handle little endian cell encoding
Oct 17, 2024
aad121f
Mark as private
Oct 17, 2024
9b5002b
Revert to big endian, simplify heuristics
Oct 17, 2024
9a954dd
Fix NPE, add logging, run spotless
Oct 29, 2024
6271c26
Clean up
Oct 29, 2024
995b8cb
Add init debug
Oct 30, 2024
d94d862
Clarify expectations via preconditions
Oct 30, 2024
8202674
Update debug and add guard for non default tables
Oct 30, 2024
f83dc2e
Emit setup at info level to ensure we see it
Oct 30, 2024
8c6c48c
Add info state dump on every cost calc call
Oct 30, 2024
ab52ea6
Add some debug so we can see why regionlocation would be null
Oct 31, 2024
0ac73bb
emit if we disable locationfinder
Oct 31, 2024
275ba6b
Ensure the region finder is set if the cell cost function exists
Oct 31, 2024
1f58743
Emit the multiplier
Oct 31, 2024
57205da
Missed one spot
Oct 31, 2024
a9e1547
Fix debug
Oct 31, 2024
b59c17c
skip any that snuck in, emit better logs, and fail more obviously here
Oct 31, 2024
840496d
include count w/o servers
Oct 31, 2024
d7081eb
Make it legible
Oct 31, 2024
856e440
list details of the unknown region
Oct 31, 2024
7df5fc2
Emit which table
Oct 31, 2024
a1d849f
Skip if empty region server mapping, assume it's empty for now
Oct 31, 2024
45c8182
Emit the cells in the region here
Oct 31, 2024
1a67f81
Tell us about which cells this region holds
Oct 31, 2024
5328064
Add emission for region size
Oct 31, 2024
4490e1b
Make clear if we skip any non-empty regions
Oct 31, 2024
580e31c
Include this
Oct 31, 2024
ba831d9
If the first two bytes of start/stop are the same, the region holds e…
Oct 31, 2024
cdd6e77
Correct how we calculate the cells
Oct 31, 2024
426aeca
add a version identifier here
Nov 5, 2024
4a0a7fa
This isn't really an edge case so much as the main case
Nov 5, 2024
867f492
Update logging & calcs
Nov 5, 2024
6120bc8
Use shaded version
Nov 5, 2024
b3a664b
emit the table name and namespace
Nov 5, 2024
a3b9b88
Switch to multidimensional array to reduce allocations
Nov 5, 2024
45ed606
Optimize the balancer eval function
Nov 6, 2024
941d2d2
Deps
Nov 6, 2024
0123942
Add custom step generator for stochastic load to prioritize shuffling…
Nov 6, 2024
43fb3da
Use our own candidate generator for the stochastic balancer
Nov 6, 2024
3d45b4d
Yep that sure is a 5
Nov 6, 2024
cf1064d
It has to be the order of the ordinal of course
Nov 6, 2024
04fecfe
Correct reservoir sampling seed, and use boolean[] instead of set
Nov 7, 2024
35adc46
Cost is invoked 2-3 times per use, memoize it
Nov 7, 2024
3a744ab
Filter out non-default regions
Nov 7, 2024
fe67705
Prevent being out of bounds
Nov 7, 2024
1d9b7ef
Correct off-by-1
Nov 7, 2024
0ac31f5
Add guards here
Nov 7, 2024
06c2ad5
Include the tables
Nov 7, 2024
d25a9ac
Do not emit null actions
Nov 7, 2024
235fe34
only use these balancer tools on objects-3
Nov 7, 2024
3517598
This is a bug - only add the cost of the function if it's needed
Nov 7, 2024
c4c6296
Add a lot of trace logging
Nov 7, 2024
1b1bc44
More logging
Nov 7, 2024
6ac45ba
Trace enough to figure out why cell count is 0
Nov 7, 2024
583aca9
Fix the subtle array access bug
Nov 7, 2024
26b4f21
Undo memoization on cluster state change, and allow to trace the bala…
Nov 8, 2024
bbed188
Also emit the full cost breakdown per step
Nov 8, 2024
dbe3263
Rework the cost function to be the number of cells (over all servers)…
Nov 8, 2024
41a0b41
Update debug to focus on which region/cells are getting picked
Nov 8, 2024
1873181
Tweak down to trace
Nov 8, 2024
adf28a6
Rework how the cost function calculates and updates cost
Nov 8, 2024
06e6b83
Fix edge case for short rowkeys
Nov 8, 2024
e270d6a
Merge pull request #120 from HubSpot/isolate-generator-cost-mismatch
szabowexler Nov 8, 2024
4f61691
Add debug and fix the state error here
Nov 8, 2024
c6a84ec
No noop
Nov 8, 2024
f40e83b
Print which generator we've selected
Nov 8, 2024
7011733
Tweak logs to allow for local run
Nov 18, 2024
bd83602
use shaded version
Nov 18, 2024
5d69edc
Use gson
Nov 18, 2024
d8cef32
Try exposing only specific fields
Nov 18, 2024
e1f1da1
Only emit objects-3, and include the full region info
Nov 18, 2024
eb52446
Mark as exposed
Nov 19, 2024
eca1dfe
Refine when we print, and what
Nov 19, 2024
d68f91a
Update serde for int2int map so we can run the balancer locally
Nov 19, 2024
aad1e1b
Stash partial balancer rework
Nov 20, 2024
5a1dc79
Stash2 -- gets to a balance of 1-6 cells/RS
Nov 20, 2024
9199cd8
First cleanup
Nov 21, 2024
a986195
Disable automatic logging for local runs
Nov 21, 2024
0677ff3
Merge pull request #122 from HubSpot/rework-cell-balancer
szabowexler Nov 21, 2024
258caf9
Fix test
Nov 21, 2024
e95ef44
Stash work
Nov 26, 2024
221a8c4
cost is actually how far we are from having as many cells as possible
Nov 26, 2024
20cbb95
add custom normalizer
Nov 26, 2024
9eedc86
Revert "add custom normalizer"
Nov 26, 2024
3526567
add hubspot normalizer
Nov 26, 2024
83dad7f
Prioritize spreading cells out
Nov 26, 2024
378ad33
Not for inclusion
Nov 26, 2024
faf41c9
Merge pull request #124 from HubSpot/cell-spread-out
szabowexler Nov 27, 2024
c0895ba
Extract static methods, simplify
Dec 2, 2024
5ba25ae
Clean up the candidate generator
Dec 2, 2024
3bd6efb
Elevate to higher package so normalizer can share common cell ops
Dec 2, 2024
61b89c5
Clean up + normalize cost
Dec 2, 2024
66b2adf
Update the normalizer to avoid merging across cell lines
Dec 2, 2024
ddb3017
Mark addition
Dec 2, 2024
bf9fe28
Finish clean up
Dec 2, 2024
6044fc7
Fix import
Dec 2, 2024
edc89ff
Print error if cell id is out of bounds
Dec 2, 2024
6a7511b
Move this up
Dec 2, 2024
ff1ef1f
Improve debug output
Dec 2, 2024
e0695c0
Cap max cells per RS to be 10% of all cells
Dec 3, 2024
e6a9d7c
Do not install unless multiplier is positive
Dec 3, 2024
7b69247
Target a specific capped cell count
Dec 4, 2024
9a41c05
Merge pull request #127 from HubSpot/target-specific-cell-count
szabowexler Dec 4, 2024
6b2df75
Simplify when we fill underloaded
Dec 4, 2024
4cae482
include target
Dec 4, 2024
28b0271
Emit which generator
Dec 4, 2024
f8085d5
randomize the under-/overloaded server picked
Dec 4, 2024
e8361df
Mark if we keep or reject
Dec 4, 2024
ca45b36
Print region counts
Dec 4, 2024
b8ac383
Prioritize balance by region and THEN evening out cell isolation
Dec 4, 2024
ce825c9
Merge pull request #128 from HubSpot/add-unbalance-dominating-factor
szabowexler Dec 4, 2024
5a42bb3
Add guard in case of error computing online cost, and do a deep reset…
Dec 5, 2024
b685173
Clean MutableRegionInfo
Dec 5, 2024
cf40b93
Clean up ServerName
Dec 5, 2024
b7f108a
Clean up TableName
Dec 5, 2024
2a22aea
Clean up Address
Dec 5, 2024
d87742d
Clean up BalancerClusterState
Dec 5, 2024
0f67a71
Clean up RegionLocationFinder
Dec 5, 2024
b425e9a
Clean up StochasticLoadBalancer
Dec 5, 2024
d8e9f9a
More cleanup StochasticLoadBalancer
Dec 5, 2024
2337b4c
Clean up RegionNormalizerFactory
Dec 5, 2024
d5b2b9a
Merge pull request #129 from HubSpot/clean-up-for-merge
szabowexler Dec 5, 2024
6caeb48
clean imports
Dec 5, 2024
e865850
style
Dec 5, 2024
f8d48d9
Style
Dec 5, 2024
70bc20f
Small clusters may not have enough regions/cell to support lower isol…
Dec 6, 2024
36d4fd2
Merge pull request #130 from HubSpot/handle-small-clusters
szabowexler Dec 6, 2024
f0cf9ff
Fix which Ints
Dec 6, 2024
71ecc2b
Emit the cluster state at the end of balance
Dec 16, 2024
da0834e
Revert "Remove all the debugging changes, generally make ready for re…
szabowexler Dec 16, 2024
59b41c2
Merge pull request #133 from HubSpot/revert-129-clean-up-for-merge
szabowexler Dec 16, 2024
25e0cf9
Measure this distance by region count from balanced
Dec 18, 2024
c2fde1d
Set target to 20% of cells
Jan 7, 2025
@@ -0,0 +1,252 @@
package org.apache.hadoop.hbase.hubspot;

import org.agrona.collections.Int2IntCounterMap;
Review comment:
What is agrona? Is it already on the master classpath?

Author reply:
I'm not sure - I used this after seeing it in the balancer cluster state representation. It seems to be a fairly efficient counter map.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.client.RegionInfoBuilder;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hbase.thirdparty.com.google.common.collect.ImmutableSet;
import org.apache.hbase.thirdparty.com.google.common.primitives.Shorts;
import org.apache.hbase.thirdparty.com.google.gson.ExclusionStrategy;
import org.apache.hbase.thirdparty.com.google.gson.FieldAttributes;
import org.apache.hbase.thirdparty.com.google.gson.Gson;
import org.apache.hbase.thirdparty.com.google.gson.GsonBuilder;
import org.apache.hbase.thirdparty.com.google.gson.JsonArray;
import org.apache.hbase.thirdparty.com.google.gson.JsonDeserializationContext;
import org.apache.hbase.thirdparty.com.google.gson.JsonDeserializer;
import org.apache.hbase.thirdparty.com.google.gson.JsonElement;
import org.apache.hbase.thirdparty.com.google.gson.JsonObject;
import org.apache.hbase.thirdparty.com.google.gson.JsonParseException;
import org.apache.hbase.thirdparty.com.google.gson.JsonSerializationContext;
import org.apache.hbase.thirdparty.com.google.gson.JsonSerializer;
import org.apache.yetus.audience.InterfaceAudience;
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

@InterfaceAudience.Private
public final class HubSpotCellUtilities {
  // TODO: this should be dynamically configured, not hard-coded, but this dramatically simplifies the initial version
  public static final short MAX_CELL_COUNT = 360;
  public static final int MAX_CELLS_PER_RS = 36;

  public static final Gson OBJECT_MAPPER = new GsonBuilder()
    .excludeFieldsWithoutExposeAnnotation()
    .enableComplexMapKeySerialization()
    .registerTypeAdapter(Int2IntCounterMap.class, new Int2IntCounterMapAdapter())
    .registerTypeAdapter(RegionInfo.class, (JsonDeserializer) (json, typeOfT, context) -> {
      JsonObject obj = json.getAsJsonObject();

      boolean split = obj.get("split").getAsBoolean();
      long regionId = obj.get("regionId").getAsLong();
      int replicaId = obj.get("replicaId").getAsInt();
      JsonObject tableName = obj.get("tableName").getAsJsonObject();
      JsonArray startKey = obj.get("startKey").getAsJsonArray();
Review comment:
An array of bytes? Wouldn't this be better as a Base64 encoded string or some such?

Review comment:
Actually, you may want to extend/use/copy the existing Json configuration produced by GsonFactory; it provides byte[] serialization.

      JsonArray endKey = obj.get("endKey").getAsJsonArray();

      byte[] startKeyBytes = new byte[startKey.size()];
      byte[] endKeyBytes = new byte[endKey.size()];

      for (int i = 0; i < startKey.size(); i++) {
        startKeyBytes[i] = startKey.get(i).getAsByte();
      }
      for (int i = 0; i < endKey.size(); i++) {
        endKeyBytes[i] = endKey.get(i).getAsByte();
      }

      TableName tb = TableName.valueOf(
        tableName.get("namespaceAsString").getAsString(),
        tableName.get("qualifierAsString").getAsString()
      );

      RegionInfo result =
        RegionInfoBuilder.newBuilder(tb).setSplit(split).setRegionId(regionId)
          .setReplicaId(replicaId).setStartKey(startKeyBytes).setEndKey(endKeyBytes).build();
      return result;
    })
    .addDeserializationExclusionStrategy(new ExclusionStrategy() {
      @Override public boolean shouldSkipField(FieldAttributes f) {
        return f.getName().equals("serversToIndex")
          || f.getName().equals("regionsToIndex")
          || f.getName().equals("clusterState")
          ;
      }

      @Override public boolean shouldSkipClass(Class<?> clazz) {
        return false;
      }
    })
    .create();
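
One way to read the Base64 suggestion from the review thread on `startKey`/`endKey`, as a minimal sketch using only `java.util.Base64` from the JDK. The class and method names (`KeyCodecSketch`, `encodeKey`, `decodeKey`) are hypothetical and are not part of this PR or of HBase's `GsonFactory`; how the encoded string would be wired into Gson is left out.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of the PR: round-trips a region key through Base64
// so that it serializes as one compact JSON string instead of an array of numbers.
final class KeyCodecSketch {
  private KeyCodecSketch() {}

  // Encodes a key; a null key is treated as the empty key (assumption for this sketch).
  static String encodeKey(byte[] key) {
    return Base64.getEncoder().encodeToString(key == null ? new byte[0] : key);
  }

  // Decodes a string produced by encodeKey back into the raw key bytes.
  static byte[] decodeKey(String encoded) {
    return Base64.getDecoder().decode(encoded);
  }

  public static void main(String[] args) {
    byte[] startKey = { 0x00, 0x2A, (byte) 0xFF };
    String encoded = encodeKey(startKey);
    System.out.println(encoded);                                     // ACr/
    System.out.println(Arrays.equals(startKey, decodeKey(encoded))); // true
  }
}
```

Compared with the per-byte `JsonArray` loop in the diff, a single string avoids signed-byte handling on the reading side, at the cost of keys no longer being readable as numbers in the JSON dump.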

  public static final ImmutableSet<String> CELL_AWARE_TABLES = ImmutableSet.of("objects-3");

  private HubSpotCellUtilities() {}

  public static String toCellSetString(Set<Short> cells) {
Review comment:
We have a name conflict -- HBase already uses the term "Cell" extensively, to mean a key-value pair with all of its trimmings. If you want this code to colocate in HBase, I recommend a name like "tenant partition" or something like that.

    return cells.stream().sorted().map(x -> Short.toString(x)).collect(Collectors.joining(", ", "{", "}"));
  }

  public static boolean isStopInclusive(byte[] endKey) {
    return (endKey == null || endKey.length != 2) && (endKey == null || endKey.length <= 2
Review comment:
What's with this magic number 2? Oh, your cell prefix is byte[2]. Maybe put this in a named constant?

      || !areSubsequentBytesAllZero(endKey, 2));
  }
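
The named-constant suggestion for `isStopInclusive` could look roughly like the sketch below. `CELL_PREFIX_LENGTH` and `StopKeySketch` are hypothetical names, not identifiers from the PR. The condition is also flattened into an early return; checked case by case (null, shorter than the prefix, exactly the prefix, longer than the prefix), it appears to give the same result as the expression in the diff, but that equivalence is this sketch's claim rather than something the PR states.

```java
// Hypothetical rewrite, not part of the PR: the magic number 2 becomes a named constant.
final class StopKeySketch {
  // Assumed name for the length of the cell prefix at the front of the rowkey.
  static final int CELL_PREFIX_LENGTH = 2;

  private StopKeySketch() {}

  static boolean isStopInclusive(byte[] endKey) {
    // A missing or too-short end key does not land on a cell boundary.
    if (endKey == null || endKey.length < CELL_PREFIX_LENGTH) {
      return true;
    }
    // Exactly the prefix, or the prefix followed only by zero bytes, is a clean boundary.
    return !areSubsequentBytesAllZero(endKey, CELL_PREFIX_LENGTH);
  }

  private static boolean areSubsequentBytesAllZero(byte[] key, int offset) {
    for (int i = offset; i < key.length; i++) {
      if (key[i] != (byte) 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isStopInclusive(new byte[] { 0, 5 }));       // false
    System.out.println(isStopInclusive(new byte[] { 0, 5, 0, 1 })); // true
  }
}
```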

  public static short calcNumCells(RegionInfo[] regionInfos, short totalCellCount) {
    if (regionInfos == null || regionInfos.length == 0) {
      return 0;
    }

    Set<Short> cellsInRegions = Arrays.stream(regionInfos)
Review comment:
In general, I wonder if it'll be easier to represent the cells as a fixed-length BitSet.

      .map(region -> toCells(region.getStartKey(), region.getEndKey(), totalCellCount))
      .flatMap(Set::stream).collect(Collectors.toSet());
    return Shorts.checkedCast(cellsInRegions.size());
  }
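
To illustrate the fixed-length `BitSet` idea raised in the review of `calcNumCells`, here is a small sketch using `java.util.BitSet`. The class `CellBitSetSketch` and its methods are hypothetical; only the value 360 is taken from `MAX_CELL_COUNT` in the diff. Each region's cells become a range of set bits, and the distinct-cell count is the cardinality of the union.

```java
import java.util.BitSet;

// Hypothetical alternative to Set<Short>, not part of the PR.
final class CellBitSetSketch {
  // Mirrors MAX_CELL_COUNT from the diff.
  static final int MAX_CELL_COUNT = 360;

  private CellBitSetSketch() {}

  // Marks the cells in [startInclusive, stopExclusive) as present.
  static BitSet cellsInRange(int startInclusive, int stopExclusive) {
    BitSet cells = new BitSet(MAX_CELL_COUNT);
    cells.set(startInclusive, stopExclusive);
    return cells;
  }

  // Counts distinct cells across regions by OR-ing the per-region bit sets.
  static int countDistinctCells(BitSet... perRegionCells) {
    BitSet union = new BitSet(MAX_CELL_COUNT);
    for (BitSet cells : perRegionCells) {
      union.or(cells);
    }
    return union.cardinality();
  }

  public static void main(String[] args) {
    BitSet regionA = cellsInRange(0, 3); // cells 0, 1, 2
    BitSet regionB = cellsInRange(2, 5); // cells 2, 3, 4
    System.out.println(countDistinctCells(regionA, regionB)); // 5
  }
}
```

This avoids boxing each cell id into a `Short` and allocating a hash set per region, which may matter if the count is recomputed on every balancer step.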

  public static Set<Short> toCells(byte[] rawStart, byte[] rawStop, short numCells) {
    return range(padToTwoBytes(rawStart, (byte) 0), padToTwoBytes(rawStop, (byte) -1), numCells);
Review comment:
You've assumed no change to the rowkey? Or, how do you know that the first two bytes are the cell? You've pre-migrated the schema and hard-coded cell assignments to tenant id?

  }
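
The question about the first two bytes of the rowkey comes down to how a cell id is read from a key. The sketch below shows that decoding step on its own, assuming (as the code in the diff does via `Bytes.toShort`) that the cell id is a big-endian `short` at the start of the rowkey. `CellPrefixSketch`, `cellOf`, and `CELL_PREFIX_LENGTH` are hypothetical names; `ByteBuffer` stands in for HBase's `Bytes` so the example runs without HBase.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the PR: reads the assumed 2-byte cell prefix of a rowkey.
final class CellPrefixSketch {
  static final int CELL_PREFIX_LENGTH = 2;

  private CellPrefixSketch() {}

  static short cellOf(byte[] rowKey) {
    if (rowKey == null || rowKey.length < CELL_PREFIX_LENGTH) {
      throw new IllegalArgumentException("Row key must carry a 2-byte cell prefix");
    }
    // Big-endian: the first byte is the high byte of the cell id.
    return ByteBuffer.wrap(rowKey, 0, CELL_PREFIX_LENGTH).order(ByteOrder.BIG_ENDIAN).getShort();
  }

  public static void main(String[] args) {
    System.out.println(cellOf(new byte[] { 0x01, 0x02, 0x7F })); // 258
    System.out.println(cellOf(new byte[] { 0x00, 0x05 }));       // 5
  }
}
```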

  public static byte[] padToTwoBytes(byte[] key, byte pad) {
    if (key == null || key.length == 0) {
      return new byte[] { pad, pad };
    }

    if (key.length == 1) {
      return new byte[] { pad, key[0] };
    }

    return key;
  }

  public static Set<Short> range(byte[] start, byte[] stop) {
    return range(start, stop, MAX_CELL_COUNT);
  }

  public static Set<Short> range(byte[] start, byte[] stop, short numCells) {
    short stopCellId = toCell(stop, (byte) -1, (short) (numCells - 1));
    if (stopCellId < 0 || stopCellId > numCells) {
      stopCellId = numCells;
    }
    short startCellId = toCell(start, (byte) 0, (short) 0);

    if (startCellId == stopCellId) {
      return ImmutableSet.of(startCellId);
    }

    boolean isStopExclusive = areSubsequentBytesAllZero(stop, 2);
Review comment:
I suggest adding a unit test over this stop cell exclusivity stuff.

I assume that you've seen that the RegionInfo#endKey is exclusive.


    final IntStream cellStream;
    if (isStopExclusive) {
      cellStream = IntStream.range(startCellId, stopCellId);
    } else {
      int stopCellIdForcedToIncludeStart = Math.max(stopCellId, startCellId + 1);
      cellStream = IntStream.rangeClosed(startCellId, stopCellIdForcedToIncludeStart);
    }

    return cellStream.mapToObj(val -> (short) val).collect(Collectors.toSet());
  }
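
As a starting point for the unit test requested in the review of the stop-cell exclusivity, the sketch below exercises a condensed, standalone copy of `toCells`/`range` so it can run without HBase or a test framework. `CellRangeSketch` and `check` are hypothetical names. The copy replaces `Bytes.toShort` with a plain big-endian decode and only goes through `toCells`, so keys are always padded to at least two bytes before decoding. The expected sets were worked out by hand from the logic in the diff; a real test would call `HubSpotCellUtilities` directly.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Condensed standalone copy of the range logic in the diff, for illustration only.
final class CellRangeSketch {
  private CellRangeSketch() {}

  static Set<Short> toCells(byte[] rawStart, byte[] rawStop, short numCells) {
    byte[] start = padToTwoBytes(rawStart, (byte) 0);
    byte[] stop = padToTwoBytes(rawStop, (byte) -1);
    short stopCellId = toCell(stop);
    if (stopCellId < 0 || stopCellId > numCells) {
      stopCellId = numCells;
    }
    short startCellId = toCell(start);
    if (startCellId == stopCellId) {
      return Collections.singleton(startCellId);
    }
    // A stop key that is only the 2-byte prefix (or prefix plus zeros) excludes its own cell.
    IntStream cells = areSubsequentBytesAllZero(stop, 2)
      ? IntStream.range(startCellId, stopCellId)
      : IntStream.rangeClosed(startCellId, Math.max(stopCellId, startCellId + 1));
    return cells.mapToObj(val -> (short) val).collect(Collectors.toSet());
  }

  static byte[] padToTwoBytes(byte[] key, byte pad) {
    if (key == null || key.length == 0) {
      return new byte[] { pad, pad };
    }
    return key.length == 1 ? new byte[] { pad, key[0] } : key;
  }

  // Big-endian decode of the first two bytes, standing in for Bytes.toShort(key, 0, 2).
  static short toCell(byte[] key) {
    return (short) (((key[0] & 0xFF) << 8) | (key[1] & 0xFF));
  }

  static boolean areSubsequentBytesAllZero(byte[] key, int offset) {
    for (int i = offset; i < key.length; i++) {
      if (key[i] != 0) {
        return false;
      }
    }
    return true;
  }

  static void check(Set<Short> actual, Short... expected) {
    if (!actual.equals(new HashSet<>(Arrays.asList(expected)))) {
      throw new AssertionError("expected " + Arrays.toString(expected) + " but got " + actual);
    }
  }

  public static void main(String[] args) {
    short n = 360;
    // Stop key is exactly the prefix of cell 3: cell 3 is excluded.
    check(toCells(new byte[] { 0, 1 }, new byte[] { 0, 3 }, n), (short) 1, (short) 2);
    // Stop key reaches into cell 3: cell 3 is included.
    check(toCells(new byte[] { 0, 1 }, new byte[] { 0, 3, 0, 1 }, n), (short) 1, (short) 2, (short) 3);
    // Start and stop share a prefix: the region holds a single cell.
    check(toCells(new byte[] { 0, 1 }, new byte[] { 0, 1, 5 }, n), (short) 1);
    // Empty stop key (last region): runs to the end of the cell space.
    check(toCells(new byte[] { 1, 0x66 }, new byte[0], n), (short) 358, (short) 359);
    System.out.println("all stop-key cases passed");
  }
}
```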

  private static boolean areSubsequentBytesAllZero(byte[] stop, int offset) {
    for (int i = offset; i < stop.length; i++) {
      if (stop[i] != (byte) 0) {
        return false;
      }
    }
    return true;
  }

  private static short toCell(byte[] key, byte pad, short ifAbsent) {
    if (key == null) {
      throw new IllegalArgumentException(
        "Key must be nonnull");
    }

    return key.length == 0
      ? ifAbsent
      : (key.length >= 2
        ? Bytes.toShort(key, 0, 2)
        : Bytes.toShort(new byte[] { pad, key[0] }));
  }

  static class Int2IntCounterMapAdapter implements JsonSerializer<Int2IntCounterMap>,
    JsonDeserializer<Int2IntCounterMap> {
    @Override public JsonElement serialize(Int2IntCounterMap src, Type typeOfSrc,
      JsonSerializationContext context) {
      JsonObject obj = new JsonObject();

      obj.addProperty("loadFactor", src.loadFactor());
      obj.addProperty("initialValue", src.initialValue());
      obj.addProperty("resizeThreshold", src.resizeThreshold());
      obj.addProperty("size", src.size());

      Field entryField = null;
      try {
        entryField = Int2IntCounterMap.class.getDeclaredField("entries");
      } catch (NoSuchFieldException e) {
        throw new RuntimeException(e);
      }
      entryField.setAccessible(true);
      int[] entries = null;
      try {
        entries = (int[]) entryField.get(src);
      } catch (IllegalAccessException e) {
        throw new RuntimeException(e);
      }
      JsonArray entryArray = new JsonArray(entries.length);
      for (int entry : entries) {
        entryArray.add(entry);
      }
      obj.add("entries", entryArray);

      return obj;
    }

    @Override public Int2IntCounterMap deserialize(JsonElement json, Type typeOfT,
      JsonDeserializationContext context) throws JsonParseException {
      JsonObject obj = json.getAsJsonObject();

      float loadFactor = obj.get("loadFactor").getAsFloat();
      int initialValue = obj.get("initialValue").getAsInt();
      int resizeThreshold = obj.get("resizeThreshold").getAsInt();
      int size = obj.get("size").getAsInt();

      JsonArray entryArray = obj.get("entries").getAsJsonArray();
      int[] entries = new int[entryArray.size()];

      for (int i = 0; i < entryArray.size(); i++) {
        entries[i] = entryArray.get(i).getAsInt();
      }

      Int2IntCounterMap result = new Int2IntCounterMap(0, loadFactor, initialValue);

      Field resizeThresholdField = null;
      Field entryField = null;
      Field sizeField = null;

      try {
        resizeThresholdField = Int2IntCounterMap.class.getDeclaredField("resizeThreshold");
        entryField = Int2IntCounterMap.class.getDeclaredField("entries");
        sizeField = Int2IntCounterMap.class.getDeclaredField("size");
      } catch (NoSuchFieldException e) {
        throw new RuntimeException(e);
      }

      resizeThresholdField.setAccessible(true);
      entryField.setAccessible(true);
      sizeField.setAccessible(true);

      try {
        resizeThresholdField.set(result, resizeThreshold);
        entryField.set(result, entries);
        sizeField.set(result, size);
      } catch (IllegalAccessException e) {
        throw new RuntimeException(e);
      }

      return result;
    }
  }
}