#build log

###->Hello! I finally put the triple 3090 build in a case. It was a bit of a struggle, and I cut the fuck out of my hands, but I got it done! Quite happy with the result, and it's good peace of mind that a water spill won't fry $3,000 in hardware.<-

***

-> ![](https://i.imgur.com/dbxsKmml.jpg) <-

->The lads<-

-> ![](https://i.imgur.com/k7Vjj9vl.jpg) <-

->The lads pt2<-

***

-> ![](https://i.imgur.com/NEwE9GBl.jpg) <-

->CPU + Mobo in.<-

-> ![](https://i.imgur.com/xxxhCyZl.jpg) <-

->Mobo closeup<-

-> ![](https://i.imgur.com/p34jIN9l.jpg) <-

->Empty right side.<-

***

-> ![](https://i.imgur.com/fuKzrk7l.jpg) <-

->GPUs in.<-

-> ![](https://i.imgur.com/a1D181Sl.jpg) <-

->GPUs in, pt2.<-

-> ![](https://i.imgur.com/Zct8HrRl.jpg) <-

->Close-up on the vertically mounted GPU.<-

***

-> ![](https://i.imgur.com/8qtUPl1l.jpg) <-

->Finished build<-

-> ![](https://i.imgur.com/cfHLOPDl.jpg) <-

->Finished build pt2<-

***

#info Specs:

- ASRock X670E PG Lightning
- Ryzen 5 7600 (CPU inference is cringe)
- 64GB DDR5 6000MHz
- 1500W Corsair PSU
- Anidees Raider XL
- 6TB storage
- RTX 3090 MSI X Trio
- RTX 3090 GIGABYTE OC
- RTX 3090 EVGA FTW3

***

Inference information:

All tests done using TabbyAPI, with cached prompts (generate once, then regenerate).

- Mixtral-8x7b @ 8bpw | 24 tokens per second, 13000 context
- goliath-120b @ 4.5bpw | 5.4 tokens per second, 4096 context
- Xwin-70b @ 7bpw | 13.1 tokens per second, 4096 context

I wanted to test the slowdown from splitting a model across GPUs as a control:

- Mistral-7b @ fp (split) | 24 tokens per second, 13000 context
- Mistral-7b @ fp (one card) | 41 tokens per second, 13000 context

It would seem there is a significant drop in speed. The cards are not NVLinked, and not all of them are in x16 slots, which is probably why the drop-off is so massive. It's not a big deal though; almost every model still runs at a usable speed. Although if we end up getting good 175b's, I might look into getting a better motherboard with more lanes.
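If you want to reproduce these numbers, here's a minimal sketch of how I'd time them. It assumes TabbyAPI is running with its default OpenAI-compatible endpoint on `127.0.0.1:5000`; the `benchmark` helper and the prompt are my own illustration, not part of TabbyAPI.

```python
import json
import time
import urllib.request


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """The throughput metric quoted above: generated tokens / wall time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s


def benchmark(base_url: str, prompt: str, max_tokens: int = 256) -> float:
    """Time one completion against TabbyAPI's OpenAI-compatible endpoint.

    Call it twice with the same prompt: the first call warms the prompt
    cache, the second matches the "generate once, then regenerate" method.
    """
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    req = urllib.request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    elapsed = time.perf_counter() - start
    return tokens_per_second(data["usage"]["completion_tokens"], elapsed)


# Example (hypothetical local setup):
#   benchmark("http://127.0.0.1:5000", "Once upon a time")  # warm the cache
#   print(benchmark("http://127.0.0.1:5000", "Once upon a time"))  # cached run
```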
***

GPU draw during inference (Goliath):

![](https://i.imgur.com/FpS2j4Z.png)

What's nice about GPU inference is that it's pretty hard to make your computer, and thus your room, hot with it. It only pushes power through the cards for a few seconds at a time, and even less so with faster, smaller models like Mixtral. The speed is mostly determined by your "main" graphics card, which in my case is the MSI X Trio. I made sure this card was not used for mining and was refurbished; even if it did cost me an extra $70, it was good peace of mind. The other two... well, they're just glorified RAM sticks; they aren't doing much compute.
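If you want to watch the per-card draw yourself during a generation, a small sketch like this works, assuming `nvidia-smi` is on your PATH (the helper names here are my own):

```python
import subprocess


def parse_power_draw(csv_text: str) -> list[float]:
    """Parse `nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits`
    output (one wattage per line) into a list of floats, one per GPU."""
    return [float(line) for line in csv_text.strip().splitlines() if line.strip()]


def gpu_power_draw() -> list[float]:
    """Current board power draw in watts for each installed GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_power_draw(out)


# Poll once a second while a generation runs, e.g.:
#   while True:
#       print(gpu_power_draw())
#       time.sleep(1)
```

On a triple-3090 split like this you'd expect the "main" card to spike while the other two idle near their floor, which matches the graph above.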