# Tensor Splicing
My 2018 project at pre-IPO Uber was the first-ever adopter of Michelangelo, an opportunity we owed to the good people on the Rider-Pricing-ML Team.
While working on a large AI model, trying to predict beta-elasticity estimates of riders to price across population hexagons, I noticed an inordinate amount of time being spent during model training, even on small datasets, in the model weight update step.
On deeper inspection, I found that at a regular cadence GPU utilization would drop while RAM usage climbed. This meant something was being loaded into memory, and that something likely had little to do with actual model training.
The answer lay in how the model weights were being updated: tensor splicing versus updates against full tensor rows. The mathematical operation was identical, but one path loaded the entire weight-set into memory before the relevant variables could be accessed or edited, while the other fetched only the precise segment needed to perform the identical action.
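Here is a minimal NumPy sketch of the difference, assuming file-backed weights; the file name, shapes, and indices are illustrative, not Michelangelo's actual API:

```python
import numpy as np

# Illustrative setup (hypothetical shapes and file name): a weight
# matrix large enough that loading it on every update step is wasteful.
weights_path = "weights.npy"
np.save(weights_path, np.zeros((100_000, 64), dtype=np.float32))

rows_to_update = [10, 42, 99_999]  # rows touched by this update step
grad_step = np.float32(0.01)

# Whole-tensor approach: np.load materializes the full array in RAM
# before the three relevant rows can even be indexed.
full = np.load(weights_path)
full[rows_to_update] -= grad_step
np.save(weights_path, full)

# Spliced approach: memory-map the file and touch only the needed rows.
# The indexing expression is identical; only the pages containing those
# rows are actually read from disk and written back.
spliced = np.load(weights_path, mmap_mode="r+")
spliced[rows_to_update] -= grad_step
spliced.flush()
```

Both paths leave identical numbers on disk; the only difference is the I/O pattern, which is exactly the GPU-idles-while-RAM-climbs signature from the profiling above.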
The runtime of the model update step fell from ~20 minutes to 4 minutes per epoch. The entire gap traced back to a single section of code that had used the whole-tensor approach; switching it changed nothing mathematically, only how the weights were fetched.