Start-Up DeepSeek Launches A More Efficient AI Model

Posted by Kirhat | Tuesday, December 31, 2024

A Chinese start-up called DeepSeek has released a new large language model (LLM) that is making waves in the global artificial intelligence (AI) industry, after benchmark tests showed it outperforming rival models from the likes of Meta Platforms and ChatGPT creator OpenAI.

The Hangzhou-based company said in a WeChat post on 26 December that its namesake LLM, DeepSeek V3, comes with 671 billion parameters and was trained in around two months at a cost of US$5.58 million, using significantly fewer computing resources than models developed by bigger tech firms.

LLM refers to the technology underpinning generative AI services such as ChatGPT. In general, a higher parameter count enables an LLM to capture more complex data patterns and make more accurate predictions.

Reacting to the Chinese start-up's technical report on its new AI model, computer scientist Andrej Karpathy - a founding team member at OpenAI - said in a post on social-media platform X: "DeepSeek making it look easy ... with an open weights release of a frontier-grade LLM trained on a joke of a budget."

Open weights refers to releasing only the pretrained parameters, or weights, of an AI model, which allows a third party to use the model for inference and fine-tuning only. The model's training code, original data set, architecture details and training methodology are not provided.
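As a rough illustration of what that means in practice, the sketch below shows a third party loading published weights for inference with the Hugging Face transformers library. This is a minimal example, not taken from DeepSeek's report; the model identifier and generation settings are assumptions for the sketch.

```python
# Minimal sketch: third-party inference against an open-weights LLM.
# Only the pretrained weights are needed; the training code, data set
# and training methodology remain unavailable to the user.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-V3"  # assumed Hub repo id, for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    device_map="auto",  # shard the 671B parameters across available GPUs
)

# Run inference: the downloaded weights are used as-is.
inputs = tokenizer(
    "Explain mixture-of-experts in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same published weights could also serve as the starting point for fine-tuning, which is why an open-weights release is useful even without the training pipeline.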

DeepSeek's development of a powerful LLM - at a fraction of the capital outlay that bigger companies like Meta and OpenAI typically invest - shows how far Chinese AI firms have progressed, despite US sanctions that have blocked their access to advanced semiconductors used for training models.

Leveraging a new architecture designed for cost-effective training, DeepSeek required just 2.78 million GPU hours - the total amount of time that a graphics processing unit is used to train an LLM - for its V3 model. The start-up's training process used Nvidia's China-tailored H800 GPUs.

That figure is substantially lower than the 30.8 million GPU hours that Facebook parent Meta needed to train its Llama 3.1 model on Nvidia's more advanced H100 chips, which cannot be exported to China under US restrictions.
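A quick back-of-the-envelope calculation puts the gap in perspective. The input figures are the ones quoted above; the ratio and the implied per-hour cost are derived here, not taken from either company's report.

```python
# Back-of-the-envelope comparison of the reported training budgets.
deepseek_v3_gpu_hours = 2.78e6  # H800 GPU hours (DeepSeek technical report)
llama_3_1_gpu_hours = 30.8e6    # H100 GPU hours (Meta, Llama 3.1)

ratio = llama_3_1_gpu_hours / deepseek_v3_gpu_hours
print(f"Llama 3.1 used roughly {ratio:.1f}x more GPU hours")  # ~11.1x

# Implied average cost per GPU hour from DeepSeek's reported US$5.58M budget
cost_per_hour = 5.58e6 / deepseek_v3_gpu_hours
print(f"Implied cost: ~${cost_per_hour:.2f} per GPU hour")    # ~$2.01
```

In other words, by its own reported numbers DeepSeek used roughly one-eleventh of Meta's GPU time, on less capable hardware.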

"DeepSeek V3 looks to be a stronger model at only 2.8 million GPU hours," Karpathy wrote in his X post.
