Norway's National Library Leverages 2 Petabytes of Huawei Flash Storage for Sovereign Norwegian LLM Development
Industry News, Sovereign AI, Huawei, Norway

Norway’s National Library (Nasjonalbiblioteket) is developing a sovereign Large Language Model (LLM) designed to understand the Norwegian language, culture, and history. To support the AI training data pipeline, the library has deployed 2 petabytes of Huawei OceanStor Dorado flash storage. Marius Husnes, Head of IT Platform, made the case for the project at the Huawei ID Forum 2026, noting that global, English-centric LLMs lack the local context required for national linguistic sovereignty. With a digital archive spanning 20 petabytes of unique data, including books, newspapers, and web content, the library is uniquely positioned to train a model on copyrighted materials through exclusive agreements. The initiative reflects a growing trend of nations seeking to preserve their cultural heritage through localized artificial intelligence infrastructure.

Hacker News

Key Takeaways

  • Sovereign AI Development: Norway is building its own Large Language Model (LLM) so that national history, news, and culture are accurately represented, addressing the gaps left by global, English-centric models.
  • High-Performance Infrastructure: The project utilizes 2 petabytes of Huawei OceanStor Dorado flash storage to manage the intensive AI training data pipeline.
  • Massive Data Repository: The National Library possesses a 20 PB unique digital archive (60 PB total under a 3-2-1 storage strategy), including books, broadcasts, and web content digitized since 2005.
  • Exclusive Data Access: Through a legal deposit mandate and specific agreements with newspapers, the library has access to copyrighted content for AI training that private companies do not possess.

In-Depth Analysis

The Case for Linguistic and Cultural Sovereignty

At the Huawei ID Forum 2026 in Paris, Marius Husnes, the Head of IT Platform at Norway’s National Library, articulated a critical challenge facing non-English speaking nations: the lack of localized Large Language Models. Husnes argued that any country possessing its own language is at a distinct disadvantage if it relies solely on globally trained, English-centric LLMs. These commercial models often lack the depth of knowledge regarding a specific country’s history, contemporary news, and cultural nuances that are primarily documented in the local tongue.

To bridge this gap, Norway’s Ministry of Culture tasked the National Library with the creation of a sovereign LLM. The library is the ideal candidate for this task as it houses the single largest digital collection of Norwegian-language materials in existence. By developing a model in-house, Norway aims to ensure that its AI tools are deeply rooted in the nation's specific linguistic and cultural context, rather than being filtered through the lens of a foreign-trained algorithm.

Data Infrastructure and the 60 Petabyte Archive

The scale of the data involved in this project is significant. Since 2005, the National Library has been digitizing its vast collection, amassing 20 petabytes of unique data. To ensure the safety and longevity of this cultural heritage, the library employs a 3-2-1 storage strategy—maintaining three copies of the data across two different media types, with one copy stored off-site. This results in a total storage footprint of approximately 60 petabytes.
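The arithmetic behind the 60 PB figure can be sketched in a few lines of Python. This is an illustration only: the function name and parameters are hypothetical, and it assumes each of the three copies is a full, uncompressed replica of the 20 PB of unique data, which the article implies but does not spell out.

```python
def total_footprint_pb(unique_pb: float, copies: int = 3) -> float:
    """Total storage needed when every copy is a full replica.

    Under a 3-2-1 strategy, `copies` is 3. The "2" (media types) and
    the "1" (off-site copy) describe where the copies live, not how
    much capacity they consume, so they do not enter the calculation.
    """
    if unique_pb < 0 or copies < 1:
        raise ValueError("unique_pb must be >= 0 and copies must be >= 1")
    return unique_pb * copies


# 20 PB of unique data, three full copies
print(total_footprint_pb(20))  # 60
```

Compression, deduplication, or partial replicas would lower the real footprint, which may be why the total is described as approximate.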

For the AI training data pipeline, the library has integrated 2 petabytes of Huawei OceanStor Dorado flash storage, which provides the rapid data access and processing speeds that LLM training requires. The pipeline involves extensive scanning and processing of raw text, sound, moving pictures, and still images, including OCR (Optical Character Recognition) of scanned text. This processing generates significant metadata and supports APIs for online access, turning a preservation archive into a training set for artificial intelligence.

Legal Mandates and Competitive Advantages

One of the most significant advantages the National Library holds over private AI developers is its legal standing and existing agreements. As a state library, it operates under a legal deposit mandate, which entitles it to receive copies of every book published and every broadcast aired in Norway. This mandate was specifically extended to cover the preservation of all Norwegian cultural heritage.

Furthermore, the library has secured a unique agreement with Norwegian newspapers that permits the use of copyrighted content for LLM training. As Husnes noted, no private company currently possesses this level of access to high-quality, copyrighted Norwegian text. This legal framework allows the library to train its AI on a more comprehensive and authoritative dataset than any commercial provider could legally acquire, further cementing the model's status as a sovereign national asset.

Industry Impact

The Rise of National AI Initiatives

Norway's move to build a sovereign LLM reflects a broader global trend where nations are beginning to view AI as a critical component of cultural and linguistic preservation. By investing in localized models, countries can protect their digital sovereignty and ensure that their citizens have access to AI tools that understand their specific societal context. This shift may lead to a more fragmented but culturally diverse AI landscape, moving away from the dominance of a few global models.

Storage Requirements for Modern AI Pipelines

The use of 2 PB of flash storage specifically for the AI pipeline highlights the evolving infrastructure needs of the industry. As LLMs grow in complexity and their training datasets expand, demand for high-speed, reliable storage such as the Huawei OceanStor Dorado will likely increase. The project suggests that for large-scale AI training, the bottleneck is often not compute power alone but also the storage system's ability to feed data into the training pipeline efficiently.
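A back-of-the-envelope sketch in Python shows why storage throughput matters at this scale. Only the 2 PB capacity comes from the article. The throughput value in the example is a hypothetical placeholder, not a figure reported by the library or Huawei, and the function name is likewise illustrative.

```python
PB = 10**15  # bytes in a petabyte (decimal units)
GB = 10**9   # bytes in a gigabyte (decimal units)


def hours_for_one_pass(dataset_pb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to stream the whole dataset once at a sustained rate."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (dataset_pb * PB) / (throughput_gb_per_s * GB)
    return seconds / 3600


# Hypothetical sustained read rate of 100 GB/s (an assumption, not a measured value)
hours = hours_for_one_pass(dataset_pb=2, throughput_gb_per_s=100)
print(f"{hours:.1f} hours per full pass")
```

Halving the assumed throughput doubles the time per pass. If the accelerators can consume data faster than storage can deliver it, they sit idle, which is the bottleneck described above.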

Frequently Asked Questions

Question: Why is Norway building its own LLM instead of using existing commercial models?

Existing commercial LLMs are primarily trained on English-language data and often lack a deep understanding of Norwegian history, culture, and local news. By building a sovereign LLM, Norway ensures that its AI tools are culturally and linguistically accurate for its citizens.

Question: What kind of data is being used to train the Norwegian LLM?

The training data comes from the National Library’s 20 PB unique digital archive, which includes books, newspapers, web pages, sound recordings, moving pictures, and still images. This includes copyrighted newspaper content made available through special agreements.

Question: What role does Huawei storage play in this project?

The library uses 2 petabytes of Huawei OceanStor Dorado flash storage specifically for the AI training data pipeline. This high-performance storage is necessary to handle the intensive data processing and OCR scanning required to prepare the library's massive archive for LLM training.
