#build log

###->Hello! I finally put the triple 3090 build in a case. It was a bit of a struggle, and I cut the fuck out of my hands, but I got it done! Quite happy with the result, and it's good peace of mind that a water spill won't fry $3,000 in hardware.<-

***

-> ![](https://i.imgur.com/dbxsKmml.jpg) <-

->The lads<-

-> ![](https://i.imgur.com/k7Vjj9vl.jpg) <-

->The lads pt2<-

***

-> ![](https://i.imgur.com/NEwE9GBl.jpg) <-

->CPU + Mobo in.<-

-> ![](https://i.imgur.com/xxxhCyZl.jpg) <-

->Mobo closeup<-

-> ![](https://i.imgur.com/p34jIN9l.jpg) <-

->Empty right side.<-

***

-> ![](https://i.imgur.com/fuKzrk7l.jpg) <-

->GPUs in.<-

-> ![](https://i.imgur.com/a1D181Sl.jpg) <-

->GPUs in, pt2.<-

-> ![](https://i.imgur.com/Zct8HrRl.jpg) <-

->Close-up on the vertically mounted GPU.<-

***

-> ![](https://i.imgur.com/8qtUPl1l.jpg) <-

->Finished build<-

-> ![](https://i.imgur.com/cfHLOPDl.jpg) <-

->Finished build pt2<-

***

#info Specs:

- ASRock X670E PG Lightning
- Ryzen 5 7600 (CPU inference is cringe)
- 64GB DDR5 6000MHz
- 1500W Corsair PSU
- Anidees Raider XL
- 6TB storage
- RTX 3090 MSI X Trio
- RTX 3090 GIGABYTE OC
- RTX 3090 EVGA FTW3

***

Inference information:

All tests done using TabbyAPI, with cached prompts (generate once, then regenerate).

- Mixtral-8x7b @ 8bpw | 24 tokens per second, 13000 context
- goliath-120b @ 4.5bpw | 5.4 tokens per second, 4096 context
- Xwin-70b @ 7bpw | 13.1 tokens per second, 4096 context

I wanted to test the slowdown from splitting a model across GPUs as a control:

- Mistral-7b @ fp (split) | 24 tokens per second, 13000 context
- Mistral-7b @ fp (one card) | 41 tokens per second, 13000 context

It would seem there is a significant drop in speed. The cards are not NVLinked, and not all of them are in x16 slots, which is probably why the drop-off is so massive. It's not a big deal though; almost every model still runs at a usable speed. Although if we end up getting good 175b's, I might look into getting a better motherboard with more lanes.
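If you want to reproduce these numbers, here's a minimal sketch of how I'd time them. It assumes TabbyAPI is running with its default OpenAI-compatible endpoint on `127.0.0.1:5000`; the `benchmark` helper and the prompt are my own illustration, not part of TabbyAPI.

```python
import json
import time
import urllib.request


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """The throughput metric quoted above: generated tokens / wall time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s


def benchmark(base_url: str, prompt: str, max_tokens: int = 256) -> float:
    """Time one completion against TabbyAPI's OpenAI-compatible endpoint.

    Call it twice with the same prompt: the first call warms the prompt
    cache, the second matches the "generate once, then regenerate" method.
    """
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    req = urllib.request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    elapsed = time.perf_counter() - start
    return tokens_per_second(data["usage"]["completion_tokens"], elapsed)


# Example (hypothetical local setup):
#   benchmark("http://127.0.0.1:5000", "Once upon a time")  # warm the cache
#   print(benchmark("http://127.0.0.1:5000", "Once upon a time"))  # cached run
```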
***

GPU draw during inference (Goliath):

![](https://i.imgur.com/FpS2j4Z.png)

What's nice about GPU inference is that it's pretty hard to make your computer, and thus your room, hot with it. It only pushes power through the cards for a few seconds at a time, and even less so with faster, smaller models like Mixtral. The speed is mostly determined by your "main" graphics card, which in my case is the MSI X Trio. I made sure this card was not used for mining and was refurbished; even if it did cost me an extra $70, it was good peace of mind. The other two... well, they're just glorified RAM sticks; they aren't doing much compute.
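If you want to watch the per-card draw yourself during a generation, a small sketch like this works, assuming `nvidia-smi` is on your PATH (the helper names here are my own):

```python
import subprocess


def parse_power_draw(csv_text: str) -> list[float]:
    """Parse `nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits`
    output (one wattage per line) into a list of floats, one per GPU."""
    return [float(line) for line in csv_text.strip().splitlines() if line.strip()]


def gpu_power_draw() -> list[float]:
    """Current board power draw in watts for each installed GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_power_draw(out)


# Poll once a second while a generation runs, e.g.:
#   while True:
#       print(gpu_power_draw())
#       time.sleep(1)
```

On a triple-3090 split like this you'd expect the "main" card to spike while the other two idle near their floor, which matches the graph above.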