What is DeepSeek AI and Why It Matters
DeepSeek is a collection of cutting-edge AI models developed by a Chinese startup. It's making waves in the AI community due to its competitive performance compared to models like those from OpenAI, all while boasting significantly lower training and inference costs.
This article delves into the technological, economic, and geopolitical implications of DeepSeek, including a crucial discussion on safety, especially for users considering its use with sensitive data.
Is DeepSeek Safe to Use? Guidance for Notre Dame Users
Notre Dame faculty and staff should prioritize approved AI tools such as Google Gemini, available through the Approved AI Tools page. AI Enablement also offers access to a variety of AI models via AWS that have been vetted for security and legal compliance.
Given the interest in DeepSeek, it's crucial to understand the distinction between DeepSeek-controlled services and the open-source DeepSeek models. Consider the model the 'engine' and the interface (chatbot) the 'car.' This guidance, developed with OIT Information Security, focuses on safely accessing the engine.
Currently, directly accessing DeepSeek's web or mobile services is not recommended due to security concerns. The API is also not approved for campus use. Explore safer alternatives below.
Approved Safe Ways to Use DeepSeek
Chat Through US-Based Providers (Public Data Only): Use DeepSeek via domestic chat services like Perplexity with public data. These providers manage the infrastructure and security.
Programmer Options: Local Open Source Model Use: Download DeepSeek models from Hugging Face and run them locally using tools like Ollama. Limit use to devices with restricted internet access, and avoid building the model into end-user services. A minimal local-use sketch appears after this list.
API Access through AWS Bedrock: Programmers and researchers can access DeepSeek via AWS Bedrock, which keeps data private for models run through its platform. Contact AI Enablement for access. A minimal Bedrock sketch also appears after this list.
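For the local option above, the interaction can be as simple as sending a prompt to Ollama's local REST API. The following is a minimal sketch, assuming Ollama is installed, its daemon is running on the default port (11434), and a DeepSeek model has already been pulled with `ollama pull`; the model tag shown is only an example, not a recommendation.

```python
# Minimal sketch: querying a locally running DeepSeek model through Ollama's
# REST API. Assumes the Ollama daemon is running on its default port and the
# model tag below (an example) has already been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "deepseek-r1:7b",   # example tag; substitute the model you pulled
    "prompt": "Summarize the difference between training and inference costs.",
    "stream": False,             # ask for one complete JSON response
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(result["response"])
```

Because the model runs entirely on the local machine, prompts and outputs never leave the device, which is the point of this option.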
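For the Bedrock option, a call with boto3 might look roughly like the following. This is a sketch under assumptions: the model identifier is a placeholder, and the region, credentials, and model access should be confirmed with AI Enablement before use.

```python
# Minimal sketch of calling a DeepSeek model through AWS Bedrock with boto3.
# The model ID is a placeholder/example; confirm the exact ID, region, and
# access with AI Enablement, and only send data appropriate for the service.
import boto3

# Assumes AWS credentials and region are already configured for your account.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

MODEL_ID = "us.deepseek.r1-v1:0"  # example/placeholder model identifier

response = client.converse(
    modelId=MODEL_ID,
    messages=[
        {"role": "user",
         "content": [{"text": "Explain mixture-of-experts in one paragraph."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```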
Currently, there are no approved non-programmer options for using DeepSeek with non-public data. Refer to Notre Dame's data sensitivity classifications for more information.
For a detailed discussion on DeepSeek's security implications, listen to the latest episode of the Practical AI podcast.
“DeepSeek represents a tremendous breakthrough in training efficiency.”
— AI Expert
The Secret of DeepSeek's Efficiency: Training and Inference Costs
AI model costs break down into training costs (one-time) and runtime 'inference' costs. DeepSeek has drastically reduced both compared to US-made models.
DeepSeek's training costs are reported to be under $6 million, compared to an estimated $100 million for OpenAI's GPT-4o. Inference costs are roughly 1/50th of those for Anthropic's Claude 3.5 Sonnet.
DeepSeek claims to have used older NVIDIA chips for training, despite export control laws limiting the sale of high-powered chips to China. While the exact costs and hardware remain debated, DeepSeek represents a significant breakthrough in training efficiency.
DeepSeek's 'mixture of experts' architecture enables chat-time efficiency. Rather than activating the full model for every query, a gating mechanism routes each query to a small subset of specialized 'expert' sub-networks, requiring less 'brainpower' per query and thus saving compute and energy costs.
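As a toy illustration of the routing idea (not DeepSeek's actual implementation), the sketch below scores a set of "experts" with a gating function and runs only the top-scoring few, so most of the network stays idle for any single input.

```python
# Toy illustration of mixture-of-experts routing: a gating function scores
# each expert for a given input, and only the top-k experts run.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # total specialized sub-networks
TOP_K = 2         # experts actually activated per query
DIM = 16          # toy feature dimension

# Each "expert" is just a random linear layer in this sketch.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
gate = rng.standard_normal((DIM, NUM_EXPERTS))  # gating network weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route input x through only the top-k experts, weighted by gate scores."""
    scores = x @ gate                      # one score per expert
    top = np.argsort(scores)[-TOP_K:]      # indices of the best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the selected experts
    # Only TOP_K of NUM_EXPERTS experts do any work for this input.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

output = moe_forward(rng.standard_normal(DIM))
print(output.shape)  # (16,) -- same output size, a fraction of the compute
```

Here only 2 of 8 experts are evaluated per query; scaled up to a large model, that is where the per-query compute and energy savings come from.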
Explore DeepSeek Further
Dive deeper into the world of DeepSeek and AI innovation.
Hugging Face Models
Download and experiment with DeepSeek models on Hugging Face.
Practical AI Podcast
Listen to a detailed discussion on DeepSeek's security and implications.
Synthetic Data and Open Source Benefits
OpenAI accused DeepSeek of using data from its models for training. DeepSeek disclosed using training data from OpenAI's o1 'reasoning' model, demonstrating the power of synthetic training data.
Instead of relying solely on human-created text, DeepSeek used o1 to generate 'thinking' scripts for training. This raises questions about the true cost of DeepSeek's development.
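The general pattern, heavily simplified, looks something like the sketch below: a stronger "teacher" model writes step-by-step answers that become training examples for another model. This illustrates the concept only, not DeepSeek's actual pipeline; `ask_teacher_model` is a hypothetical stand-in for whatever API produces the teacher's output.

```python
# Highly simplified sketch of synthetic "reasoning" training data:
# a teacher model's step-by-step answers become training examples.
import json

def ask_teacher_model(question: str) -> str:
    # Hypothetical placeholder: in practice this would call a hosted model.
    return f"Step 1: restate '{question}'. Step 2: reason it through. Step 3: answer."

questions = [
    "What is 17 * 24?",
    "Why does ice float on water?",
]

# Each synthetic example pairs a prompt with the teacher's 'thinking' script.
synthetic_dataset = [
    {"prompt": q, "completion": ask_teacher_model(q)} for q in questions
]

with open("synthetic_training_data.jsonl", "w") as f:
    for example in synthetic_dataset:
        f.write(json.dumps(example) + "\n")

print(f"Wrote {len(synthetic_dataset)} synthetic training examples.")
```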
DeepSeek's open-source approach is remarkable. By publishing methods and making models freely available, it encourages global collaboration and innovation.
The open-source nature also allows for inspection and derivation of new models. A Hong Kong team fine-tuned Alibaba Cloud's Qwen model using DeepSeek's approach, achieving similar results with fewer resources.
“DeepSeek is working completely in the open, publishing their methodology in detail.”
— AI Researcher
Impact on US Companies, AI Investments, and NVIDIA's Stock
DeepSeek's efficiency sent shockwaves through US AI companies, with NVIDIA's stock taking a hit.
This raises questions about the necessity of massive AI infrastructure investments like Project Stargate, a $500 billion project involving OpenAI, Oracle, SoftBank, and MGX. If state-of-the-art AI can be achieved with fewer resources, is such spending necessary?