#OpenSource #LLM Teuken-7B has been officially launched and is available for download. This milestone is part of the OpenGPT-X initiative, dedicated to creating large AI language models “Made in Germany” 🇩🇪, designed for both business and research needs. Even with limited resources, just 18% of the computing power used to train models like Meta’s Llama3 8B, Teuken-7B offers some interesting features.
Here’s what makes Teuken-7B stand out:
Key Features:
• Multilingual & Open Source: Built to support all 24 official EU languages, emphasizing European linguistic diversity.
• Trustworthy & Versatile: Tailored to Europe’s wide range of cultures and businesses, with a strong focus on openness and community collaboration.
⚙️ Technical Innovations:
• Custom Multilingual Tokenizer: Specifically optimized for European languages for enhanced efficiency and performance. The approach might also help with including other languages in LLMs.
• Efficient Training: Developed using just 18% of the computing power required for models such as Meta’s Llama3 8B, and less than 1% of that required for larger models such as Llama3 405B, making this an interesting approach for resource-constrained environments and for lowering the energy consumed in training.
Training Data:
• Over 50% non-English content, including training content in languages like Maltese.
• Dedicated benchmarks for multilingual performance, aimed at comparable output quality across all supported languages.
With Teuken-7B, Europe appears to be showing some degree of resilience in the geostrategic competition for influence in the emerging AI market.
For more details, check out the project:
Teuken-7B: opengpt-x.de/en/models…
#AI #OpenSource #Teuken7B #Innovation #MultilingualAI #OpenGPTX #MachineLearning #EuropeanTech