Speaker

Fabian Klemm
TNG Technology Consulting GmbH

Fabian Klemm completed his doctorate at TU Munich in the field of discrete optimization and applied geometry. In 2020 he joined TNG as a software consultant. Besides extensive DevOps experience, Fabian has developed a strong interest in AI and particularly in LLM topics. He contributes to the TNG Skainet team, which operates TNG's internal AI server rack, and since 2024 he has been working in the internal TNG AI research team responsible for the successful TNG DeepSeek Chimera models.

Tales from the Machinery Room - Customizing LLMs
Conference (INTERMEDIATE level)

"The Germans just Frankensteined DeepSeek's R1 and V3 into something called R1T Chimera".

Far beyond this X post, the R1T and R1T2 Chimera models published by TNG gained considerable attention, with daily usage of more than 10 billion tokens via OpenRouter. So, what kind of "Frankensteining" is going on there? How can a small software consultancy such as TNG produce its own models?

Our internal research team has been experimenting and publishing results with a focus on Mixture-of-Experts Large Language Models and adaptations of the DeepSeek model family. First, we manipulated the way experts work within a model, under the name "Mixture of Tunable Experts". We then continued with the "Assembly-of-Experts" merging process, resulting in the successful Chimera models.
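
To give a rough idea of what tensor-level model merging can look like, here is a minimal, hypothetical sketch that interpolates corresponding weight tensors from two parent checkpoints. The function name, key names, and selection rule are illustrative assumptions only; this is not TNG's actual Assembly-of-Experts implementation.

```python
import torch

def merge_state_dicts(parent_a, parent_b, lam=0.5, key_filter=None):
    """Linearly interpolate matching tensors from two parent checkpoints.

    parent_a, parent_b: state dicts with identical keys and shapes.
    lam: interpolation weight (1.0 keeps parent_a, 0.0 keeps parent_b).
    key_filter: optional predicate; keys it rejects are copied from parent_a.
    """
    merged = {}
    for name, tensor_a in parent_a.items():
        tensor_b = parent_b[name]
        if key_filter is None or key_filter(name):
            merged[name] = lam * tensor_a + (1.0 - lam) * tensor_b
        else:
            merged[name] = tensor_a.clone()
    return merged

# Toy usage (hypothetical key names): merge only "expert" tensors,
# keep everything else from parent A.
a = {"experts.0.w": torch.ones(2, 2), "router.w": torch.zeros(2, 2)}
b = {"experts.0.w": torch.zeros(2, 2), "router.w": torch.ones(2, 2)}
chimera = merge_state_dicts(a, b, lam=0.7, key_filter=lambda k: "experts" in k)
```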

This talk aims to give some technical insights into the work of TNG AI Research. We recall the theoretical basics and our most important results, provide technical details on how things were done, and share some anecdotes about the successes and setbacks along this journey.

