OFC 2025 | Comprehensive Applications of Generative AI in Network Operations

Introduction

Generative artificial intelligence (GAI) is transforming how the telecommunications industry manages network operations. This article explores how AT&T Labs implements GAI solutions across every stage of the network lifecycle—from initial planning to equipment decommissioning. By understanding these applications, network engineers and operations teams can better leverage AI to enhance efficiency, reduce costs, and improve service quality [1].

Understanding the Network Technology Lifecycle

Network technologies follow a predictable lifecycle that spans decades. AT&T Labs defines this lifecycle in four key phases, labeled "Day 0" through "Day 3", each with distinct activities and challenges. The "Day 0" phase (forecasting and planning) spans 1 to 6 years; this initial phase involves forecasting network demand and planning resources accordingly. The "Day 1" phase (design, build, configure) spans 1 to 5 years and focuses on implementing network designs, building infrastructure, and configuring systems. The "Day 2" phase (operate and maintain) spans 10 to 20 years; it is the longest phase, encompassing ongoing management, troubleshooting, and optimization. The "Day 3" phase (sunset and migration) spans 5 to 10 years; this final phase manages the transition from legacy technologies to updated solutions.

Each phase presents unique challenges and opportunities for GAI applications. These stages often overlap, forming a continuous cycle of technological evolution. This cyclical nature allows planners to manage multiple technology generations simultaneously, retiring older technologies while deploying new ones—ensuring service continuity and optimal resource use.

Figure 1: Four key phases of the network technology lifecycle – activities and durations.

A real-world example illustrates this lifecycle well: consider AT&T’s IP-based DSL technology. Planning began around 2005, with design and deployment occurring between 2007–2011. Operations and maintenance peaked around 2015 with over 10 million users. Eventually, the technology entered the sunset phase, transitioning rapidly to xPON. This case clearly demonstrates the full lifecycle—from concept to retirement—and how the phases overlap. Notably, user numbers declined as newer technologies were introduced post-peak, illustrating the natural rhythm of technology evolution.

Figure 2: IP-based DSL user growth from 2003 to 2023 – illustrating planning, deployment, peak usage, and sunset phase.
GAI Applications Across the Network Lifecycle

AT&T Labs has identified specific use cases for generative AI at each stage of the network lifecycle. These applications leverage large language models (LLMs), generative adversarial networks (GANs), and other AI techniques to address critical operational challenges. Each is tailored to the unique needs of a particular phase.

Day 0: Traffic Demand Forecasting

Traffic demand forecasting relies heavily on historical traffic data, which is often degraded by collection gaps, outages, and misconfigurations. This poses significant challenges for planning teams: poor data quality can lead to inaccurate forecasts, affecting decisions around network capacity planning and resource allocation. Traditional techniques rely on statistical imputation or interpolation, but these methods struggle to capture the complex temporal patterns in traffic data.

AT&T Labs applies GAI models to infer missing values and correct anomalous traffic data, ensuring continuity and improving forecast accuracy. Specifically, the team implemented a GAN model that enhances data quality by correcting anomalies, filling in missing values, and generating realistic synthetic data. The model uses deep learning architectures, including transformer- or LSTM-based generators and discriminators, to handle complex time series data.

Technically, the system first encodes the input time series into a latent space representation, then processes it through the generator and discriminator networks. The generator produces reconstructed data, while the discriminator evaluates its realism. By combining reconstruction loss and discriminator loss, the system calculates an anomaly score to identify data points needing repair. Once anomalies are detected, a multi-scale U-Net architecture is used to accurately infer missing values, considering patterns across different time scales. This approach is particularly effective for network traffic data, as it captures daily, weekly, and seasonal trends.
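
To make the scoring step concrete, the following is a minimal sketch of how a reconstruction loss and a discriminator term can be combined into a per-point anomaly score. The LSTM architectures, dimensions, and the weighting factor lambda_disc are illustrative assumptions, not AT&T's actual model.

```python
# Sketch of GAN-style anomaly scoring for a traffic time series (PyTorch).
# Architectures and the weighting factor are illustrative assumptions only.
import torch
import torch.nn as nn

class LSTMGenerator(nn.Module):
    """Encodes a traffic window into a latent state and reconstructs it."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.LSTM(input_size=1, hidden_size=latent_dim, batch_first=True)
        self.decoder = nn.LSTM(input_size=latent_dim, hidden_size=1, batch_first=True)

    def forward(self, x):                      # x: (batch, time, 1)
        z, _ = self.encoder(x)                 # latent representation
        recon, _ = self.decoder(z)             # reconstructed series
        return recon

class Discriminator(nn.Module):
    """Scores how realistic a traffic window looks (closer to 1 = more realistic)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.head(out[:, -1, :])        # one realism score per window

def anomaly_score(x, generator, discriminator, lambda_disc=0.3):
    """Blend per-point reconstruction error with a realism penalty (assumed weighting)."""
    recon = generator(x)
    recon_err = (x - recon).abs().squeeze(-1)          # (batch, time)
    realism = discriminator(recon)                     # (batch, 1)
    return recon_err + lambda_disc * (1.0 - realism)   # broadcast over time steps

# Points whose score exceeds a threshold are routed to the imputation stage.
gen, disc = LSTMGenerator(), Discriminator()
window = torch.randn(4, 96, 1)                 # e.g. four days of 15-minute samples
scores = anomaly_score(window, gen, disc)
suspect = scores > scores.mean() + 2 * scores.std()
```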

Figure 3: GAN-based traffic anomaly detection and inference with latent space, generator/discriminator, and multi-scale U-Net.

This system has demonstrated significant operational benefits. It can identify and repair large-scale missing data, maintaining continuity and consistency. It also effectively detects anomalous patterns, preventing outliers from skewing forecast results. Most importantly, it can simulate the impact of introducing new services or devices on network traffic, providing valuable insights for planning decisions.

Figure 4: Examples of inferred time series with gaps and detected anomalies in reconstructed sequences.
Day 0: Ask Planning

Traditional network planning tools have significant usability barriers that limit their effectiveness. These tools are often designed for professional use, with complex interfaces requiring deep domain expertise to operate effectively. This high entry threshold prevents non-experts from participating in the planning process, creating decision silos and communication gaps.

To address this, AT&T developed Ask Planning, a chatbot-style tool. It can answer questions about fiber and wireless investment strategies, helping decision-makers evaluate financial returns and performance criteria. It uses a natural language interface, allowing users to explore complex planning scenarios conversationally, without needing to understand the underlying technical details.

This tool employs LLMs to integrate disparate data sources and tools into an end-to-end solution. The architecture comprises three components: GAI semantic processing, GAI agent/planning, and GAI reasoning. Semantic analysis extracts intent, the planning agent coordinates data/tools, and the reasoning engine generates business insights.

The workflow begins with understanding the planning question (via GAI semantic analysis), then proceeds to data/tool orchestration, processing the interaction between network data and planning tools. Finally, it generates customized results and offers business recommendations. This enables non-experts to pose complex planning questions like, “How can I invest $50 million in fiber service in the Dallas area?” and receive comprehensive analyses and advice.
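
The following is a minimal sketch of that three-stage flow. Every helper here (llm_complete, run_roi_model) is a hypothetical stub standing in for AT&T's internal LLM service and planning tools; it only illustrates how intent extraction, tool orchestration, and reasoning might be chained.

```python
# Minimal sketch of the three-stage Ask Planning flow: semantic analysis,
# data/tool orchestration, and reasoning. All helpers are hypothetical stubs.
import json
from dataclasses import dataclass

def llm_complete(prompt: str) -> str:
    """Stub for an LLM completion call; a real system would invoke a hosted model."""
    if "Extract" in prompt:
        return json.dumps({"technology": "fiber", "budget_usd": 50_000_000,
                           "region": "Dallas"})
    return "Projected payback in roughly 6 years; prioritize high-density areas first."

def run_roi_model(technology: str, budget_usd: float, region: str) -> dict:
    """Stub for a planning tool that scores an investment scenario."""
    return {"homes_passed": 120_000, "irr_pct": 11.5, "payback_years": 6.2}

@dataclass
class PlanningIntent:
    technology: str
    budget_usd: float
    region: str

def semantic_analysis(question: str) -> PlanningIntent:
    """Stage 1: extract structured intent from the natural-language question."""
    fields = json.loads(llm_complete(
        f"Extract technology, budget_usd, and region as JSON from: {question}"))
    return PlanningIntent(**fields)

def orchestrate(intent: PlanningIntent) -> dict:
    """Stage 2: route the intent to the appropriate data sources and tools."""
    return run_roi_model(intent.technology, intent.budget_usd, intent.region)

def reasoning(intent: PlanningIntent, results: dict) -> str:
    """Stage 3: turn raw tool output into a business-level recommendation."""
    return llm_complete(
        f"Summarize ROI results {results} for a ${intent.budget_usd:,.0f} "
        f"{intent.technology} plan in {intent.region} and recommend next steps.")

def ask_planning(question: str) -> str:
    intent = semantic_analysis(question)
    return reasoning(intent, orchestrate(intent))

print(ask_planning("How can I invest $50 million in fiber service in the Dallas area?"))
```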

Figure 5: Ask planning system architecture – semantic, planning, and reasoning components.

Compared to traditional planning tools, Ask Planning significantly improves user experience. Conventional tools require users to select complex parameters such as DMAs, central offices, counties, and polygons, and to manually enter various technical thresholds. In contrast, Ask Planning lets users express intent in natural dialogue; the system handles complex parameter settings automatically. Users can easily ask follow-up questions, adjust assumptions, or request further explanations, creating a dynamic, iterative planning experience.

Figure 6: Ask planning interface screenshot showing Dallas fiber investment query and structured financial response.
Day 1: RAN RF Design

Radio frequency (RF) planning for wireless networks has traditionally been a time-consuming process that relies on expensive, measurement-calibrated models. Accurately predicting RF signal propagation is crucial for optimizing cell coverage and minimizing interference. However, existing methods often require extensive field measurements and manual tuning, making large-scale deployment inefficient and costly.

AT&T Labs developed an innovative approach using 3D geospatial ray-tracing, accelerated by generative AI and reinforcement learning models, to predict and optimize network RF coverage and interference. This method applies physical principles of ray tracing to 3D geospatial data and uses generative AI to speed up inference, enabling detailed RF propagation analysis in fractions of a second.

Technically, the system includes a data model component and a network state prediction component. The data model integrates geospatial information, network state data, and user equipment (UE) status. The network state prediction module contains an AI-accelerated ray tracer and modules for coverage and interference analysis. The ray tracer accounts for buildings, terrain, and other physical obstacles that affect RF signals, while the coverage and interference modules evaluate network performance metrics.

A key innovation in this system is the use of generative AI to accelerate ray-tracing computations. Traditional ray tracing is computationally intensive, especially in complex urban environments. By training generative models to simulate ray propagation paths, the system can complete inference in under 10 milliseconds—dramatically improving speed. Reinforcement learning is also used to optimize antenna parameters and other network configurations to maximize coverage and minimize interference.
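
As a rough illustration of the acceleration idea, the sketch below replaces per-ray computation with a small convolutional surrogate that maps a rasterized scene to a path-loss grid in a single forward pass. The feature channels, network layout, and thresholds are assumptions for illustration, not the published design.

```python
# Sketch of a learned surrogate for ray-tracing-style RF prediction (PyTorch).
# The feature set and CNN layout are illustrative assumptions; the real system
# couples a physics-based ray tracer with generative acceleration.
import torch
import torch.nn as nn

class PathLossSurrogate(nn.Module):
    """Maps a rasterized local scene (e.g. building height, terrain, antenna mask)
    to a per-pixel path-loss grid in dB."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),   # predicted path loss (dB) per pixel
        )

    def forward(self, scene):                  # scene: (batch, channels, H, W)
        return self.net(scene)

# Once trained against ray-traced ground truth, inference over a tile is a single
# forward pass, which is what makes rapid "what-if" studies feasible.
model = PathLossSurrogate()
tile = torch.rand(1, 3, 256, 256)              # e.g. one tile of rasterized geodata
with torch.no_grad():
    path_loss_db = model(tile)                 # (1, 1, 256, 256)

# Coverage map: received power = transmit EIRP minus predicted path loss.
eirp_dbm = 53.0                                # assumed transmit EIRP for illustration
coverage_dbm = eirp_dbm - path_loss_db
served = coverage_dbm > -110.0                 # simple RSRP-style threshold (assumption)
```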

Figure 7: AI-accelerated ray-tracing architecture for RAN RF design, combining geospatial data with network and UE state information.

In practical applications, this system has shown significant operational benefits. Computations for a single 1x1 km area take just 6 seconds and achieve an accuracy of 7 dB compared to field data, which is considered highly accurate in RF prediction. More impressively, through parallel computation, it can simulate RF coverage for the entire island of Manhattan in just 30 seconds—a task that could take days using traditional methods. This speed enables engineers to quickly perform “what-if” analyses, test different network configurations, and identify potential issues before deployment.

Figure 8: RF design visualization showing color-coded signal coverage and interference analysis in an urban environment.
Day 1: Cell Site Auto-Configuration

Modern cellular networks face unprecedented configuration complexity. Networks comprise millions of cells, each with hundreds of configurable parameters related to hardware, software, environment, and workload. Layer management, mobility management, and cross-layer interactions further complicate matters, making manual optimization nearly impossible.

AT&T addressed this challenge by implementing a data-driven, GAI-based system for tuning RAN configuration parameters. The solution uses neural network foundation models that are pre-trained on vast amounts of data and fine-tuned for individual cell sites. This approach mirrors how large language models are trained, but is focused on the network configuration domain.

Technically, the system processes a variety of input data types: tabular data for bandwidth, power, and antenna configurations; numerical time series data for KPIs; event time series data from network call traces; and drive test data. These diverse data types are preprocessed and standardized through feature extraction, then used to train the network foundation model.

Training occurs in two main stages: pre-training and fine-tuning. During pre-training, the model learns general relationships between network configurations, performance, and environmental conditions—developing a broad understanding of wireless network behavior. In the fine-tuning phase, the model is optimized for a specific target site or scenario, considering its unique characteristics and requirements.
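
A minimal sketch of that two-stage regime follows, assuming a simple feed-forward model that predicts KPI outcomes from configuration and environment features. The dimensions, loss, optimizer settings, and the backbone-freezing strategy are illustrative choices, not AT&T's training recipe.

```python
# Sketch of the pretrain-then-fine-tune pattern for a configuration foundation
# model (PyTorch). All data here is synthetic and the recipe is an assumption.
import torch
import torch.nn as nn

class ConfigFoundationModel(nn.Module):
    """Predicts KPI outcomes (e.g. throughput, drop rate) from a concatenation
    of configuration, environment, and workload features."""
    def __init__(self, n_features=64, n_kpis=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU())
        self.head = nn.Linear(256, n_kpis)

    def forward(self, x):
        return self.head(self.backbone(x))

def train(model, features, targets, epochs, lr):
    """Plain full-batch regression loop, kept minimal for illustration."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(features), targets)
        loss.backward()
        opt.step()

model = ConfigFoundationModel()

# Stage 1: pretrain on pooled data from many cell sites (synthetic here).
fleet_x, fleet_y = torch.randn(10_000, 64), torch.randn(10_000, 4)
train(model, fleet_x, fleet_y, epochs=20, lr=1e-3)

# Stage 2: fine-tune on the target site with a smaller learning rate, freezing
# the backbone so only the KPI head adapts to local conditions.
for p in model.backbone.parameters():
    p.requires_grad = False
site_x, site_y = torch.randn(500, 64), torch.randn(500, 4)
train(model, site_x, site_y, epochs=50, lr=1e-4)
```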

Once trained, the system can automatically generate optimal configuration recommendations, predict the impact of different configuration changes, and continuously adapt to changing network conditions. This method is particularly effective because it not only learns from historical data but also improves over time using reinforcement learning, adjusting its recommendations based on actual performance feedback.

Figure 9: Cell site auto-configuration method showing data types used for pretraining the network foundation model and fine-tuning for specific sites.
Day 2: RAN Engineer Collaboration Tool for Root Cause Analysis

When network issues arise, RAN engineers are under pressure to quickly diagnose and resolve customer complaints. Traditionally, this requires manual searching through multiple data sources, interpreting complex logs, and applying domain expertise to identify root causes. This process is time-consuming, dependent on expert availability, and often results in prolonged resolution times and customer dissatisfaction.

AT&T has improved this situation by developing a RAN engineer collaboration tool that enables engineers to troubleshoot and resolve problems more efficiently. The tool leverages agent-based AI, conversational AI, and large language models, combined with deep domain knowledge, to perform fault isolation and root cause analysis for wireless network customer complaints.

From a technical standpoint, the system follows a five-step processing flow: First, engineers ask a question through a user-friendly interface, such as “Which 5 cells had the strongest signal for IMSI=123 on September 1?” Then, the GAI LLM analyzes the query, interprets the intent, and identifies the required data. In the third step, the system generates an API URL to retrieve data from the relevant tools. In step four, data is fetched through the tool’s API. Finally, the LLM analyzes the data and generates a comprehensive answer presented in an easy-to-understand format for the engineer.
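
A condensed sketch of those five steps appears below. The endpoint URL, tool names, and the llm() and fetch() helpers are hypothetical stubs used only to show how a query flows from intent interpretation to API call to generated answer.

```python
# Condensed sketch of the five-step collaboration-tool flow; all helpers are
# hypothetical stubs standing in for the LLM and the internal RAN tool APIs.
import json

def llm(prompt: str) -> str:
    """Stub for the GAI LLM; a real deployment would call a hosted model."""
    if "map the question" in prompt:
        return json.dumps({"tool": "signal_history",
                           "params": {"imsi": "123", "date": "09-01", "top": 5}})
    return "Cells 0451, 0738, 0102, 0990, and 0217 served IMSI 123 with the strongest RSRP."

def fetch(url: str) -> dict:
    """Stub for the tool API call (step 4); normally an authenticated HTTP GET."""
    return {"cells": ["0451", "0738", "0102", "0990", "0217"]}

def answer(question: str) -> str:
    # Step 1: the engineer's question arrives from the UI.
    # Step 2: the LLM interprets intent and picks the data it needs.
    plan = json.loads(llm(f"map the question to a tool and parameters: {question}"))
    # Step 3: build the API URL for the selected tool (hypothetical endpoint).
    query = "&".join(f"{k}={v}" for k, v in plan["params"].items())
    url = f"https://ran-tools.example.internal/{plan['tool']}?{query}"
    # Step 4: fetch the data through the tool's API.
    data = fetch(url)
    # Step 5: the LLM turns raw data into an engineer-readable answer.
    return llm(f"Explain these results for the engineer: {data}")

print(answer("Which 5 cells had the strongest signal for IMSI=123 on September 1?"))
```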

The strength of this system lies in its ability to interpret natural language queries and translate them into the appropriate technical operations. For instance, when an engineer asks about signal issues related to a specific user experience, the system can automatically determine which KPIs to query, which systems to access, and how to interpret the results. Moreover, it not only provides answers but also offers contextual information and suggested next steps to guide the troubleshooting process.

Figure 10: Architecture of the RAN engineer collaboration tool, showing the flow from user query through LLM analysis, data retrieval, and answer generation.

In real-world deployments, this tool has demonstrated significant operational benefits. It has improved mean time to resolution (MTTR) for mobile network issues by 50%, directly translating into cost savings and enhanced customer experience. Engineers can perform complex analyses faster, reduce reliance on domain experts, and resolve intricate issues more effectively. The system also facilitates knowledge retention and sharing, enabling new engineers to learn and become productive more quickly.

Figure 11: Screenshot of the RAN engineer collaboration tool interface, showing a query about cell signal strength and the system’s detailed technical response.
Day 2: Trouble Ticket Analysis

Trouble tickets contain valuable operational information, but due to their volume, variety, inconsistent reporting, and human error, the data is often “noisy” and difficult to correlate with structured instrumentation data. This makes systematic analysis and trend identification challenging, limiting the ability of service providers to derive insights from these rich data sources.

AT&T has addressed this challenge by using GAI models to classify trouble ticket data and convert unstructured image data into structured formats. This approach processes two primary types of unstructured data: text and images. For text data, the system uses generative language models to analyze ticket descriptions, technician notes, and customer feedback, extracting key information such as issue categories, affected services, and fault symptoms. For image data, the system uses generative vision models to analyze photos taken by field technicians, identifying device identifiers, connection status, and signs of physical damage.
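
For the text side, a minimal sketch of schema-guided extraction might look like the following; the field schema and the complete() helper are assumptions stubbed for illustration, not the production pipeline.

```python
# Sketch of turning a free-text trouble ticket into structured fields with a
# generative language model. The schema and the complete() stub are assumptions.
import json

TICKET_SCHEMA = {
    "issue_category": "one of: fiber_cut, ont_failure, signal_degradation, other",
    "affected_service": "e.g. broadband, voice, video",
    "fault_symptoms": "short list of symptoms mentioned in the ticket text",
}

def complete(prompt: str) -> str:
    """Stub for a telecom-tuned LLM; returns a canned structured example."""
    return json.dumps({"issue_category": "signal_degradation",
                       "affected_service": "broadband",
                       "fault_symptoms": ["intermittent drops", "weak optical signal"]})

def structure_ticket(ticket_text: str) -> dict:
    """Prompt the model to emit JSON that matches the target schema."""
    prompt = (
        "Extract the following fields from this trouble ticket and answer in "
        f"JSON matching this schema: {json.dumps(TICKET_SCHEMA)}\n"
        f"Ticket: {ticket_text}")
    return json.loads(complete(prompt))

record = structure_ticket(
    "Customer reports internet dropping every evening; tech notes weak optical "
    "signal at the ONT, splice tray appears intact.")
print(record["issue_category"])   # structured field, ready for trend analysis
```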

Technically, this multimodal approach combines the text interpretation capabilities of large language models with the image analysis strengths of computer vision models. The language models are fine-tuned on telecom-specific language, technical terms, and common issue types, enabling them to understand technical context and recognize problem descriptions that might otherwise be ambiguous.

The image models are trained to identify various types of network devices, connection types, and infrastructure components. For example, when a technician uploads a photo of a fiber connection, the system can extract the device identifier from the fiber label—even if the label is partially obscured or photographed from an angle. This automation reduces manual data entry errors and ensures that critical identification information is accurately captured.

Figure 12: Illustration showing how GenAI text and image models process trouble ticket descriptions and technician photos to extract structured information such as issue category and device ID.

By converting unstructured data into structured formats, the system creates machine-readable, standardized datasets that can be used for advanced analysis. This yields significant operational benefits, including improved accuracy of trouble classification, enhanced trend analysis, and more precise root cause identification. Most importantly, structured data enables integration with other network monitoring systems, creating a holistic view of network health that supports more proactive maintenance strategies and resource optimization.

Day 3: Access Device Upgrade

Over the long lifecycle of network equipment, migrating from one vendor to another is a particularly challenging task. Engineers become deeply familiar with a specific vendor’s ecosystem over time, creating a “vendor lock-in” effect that makes transitions difficult and risky. Different vendors use different naming conventions, parameters, and configuration methods, requiring retraining and adaptation periods that can result in service interruptions and performance degradation.

AT&T developed the Access Device Upgrade Collaboration Tool to address this challenge. This tool enables smoother vendor transitions by providing real-time expert support. It integrates knowledge from multiple sources, including documentation from vendors A and B, standards (such as 3GPP for wireless), AT&T internal operational procedures, as well as network configuration and KPI data.

From a technical perspective, the system employs an LLM-based reasoning framework for planning and execution. When an engineer submits a query such as “What is the closest equivalent of Vendor B’s parameter XXX to Vendor A’s parameter YYY?”, the LLM first generates a multi-step plan to answer the question. For each step, the system selects the appropriate tools—such as SQL queries, CSV processing, or RAG agents. The system then executes these steps and, if necessary, replans additional ones until it can provide a complete answer.
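
A minimal sketch of that plan-and-execute loop is shown below; the planner output, tool registry, and the example parameter names are illustrative assumptions rather than the production framework.

```python
# Sketch of a plan-then-execute loop for a vendor parameter-mapping query.
# Tool names, planner output, and example parameters are illustrative only.
from typing import Callable

def sql_lookup(step: str) -> str:
    """Stub: query the parameter dictionary tables for candidate matches."""
    return "Vendor A candidates: cellIndividualOffset, qOffsetCell"

def rag_standards(step: str) -> str:
    """Stub: retrieve 3GPP / internal-procedure passages for grounding."""
    return "3GPP TS 36.331 defines qOffsetCell as a neighbor-cell signal offset."

TOOLS: dict[str, Callable[[str], str]] = {"sql": sql_lookup, "rag": rag_standards}

def make_plan(question: str) -> list[tuple[str, str]]:
    """Stub for the LLM planner: returns (tool, step description) pairs."""
    return [("sql", "find Vendor A parameters with a similar function to XXX"),
            ("rag", "check the 3GPP definition to confirm functional equivalence")]

def run(question: str) -> str:
    evidence = []
    for tool, step in make_plan(question):
        evidence.append(TOOLS[tool](step))        # execute each planned step
    # A real system would let the LLM replan here if the evidence is insufficient,
    # then draft the final mapping with units, ranges, and affected standards.
    return " | ".join(evidence)

print(run("What is the closest equivalent of Vendor B's parameter XXX "
          "to Vendor A's parameter YYY?"))
```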

A key innovation of the system lies in its ability to map vendor-specific terminology, parameters, and configurations into a standardized functional understanding. For example, different vendors may use entirely different parameter names to describe the same network functionality, such as neighboring cell signal offsets. The collaboration tool not only maps parameters but also explains their functions, units, ranges, and the standards they affect—helping engineers understand equivalence across systems.

Figure 13: LLM-based reasoning framework for device upgrades, showing query analysis, plan generation, tool selection, and execution.

This tool provides significant operational value for network operations. It reduces the time engineers spend looking for device transition information and helps them more quickly adapt to new features and methods for optimizing the network. Additionally, the system records knowledge and creates mappings, becoming smarter over time and improving its recommendations and explanations. This reduces reliance on vendor experts, accelerates migration processes, and lowers operational risks.

Figure 14: Screenshot demonstrating how the system helps engineers map parameters across vendors and provides detailed technical explanations for new parameters.
Conclusion

Generative AI applications span the entire network lifecycle, offering specialized solutions for the unique challenges of each phase. From forecasting network demand to facilitating device transitions, these AI-driven systems help telecom operators increase efficiency, reduce costs, and enhance service quality.

The work by AT&T Labs demonstrates that GAI can go beyond basic automation and fundamentally transform network operations—enabling more intelligent, adaptive, and responsive network management. These technologies not only solve current operational challenges but also create new capabilities that allow operators to manage their networks in ways never before possible.

As these technologies continue to mature, deeper integration between GAI and telecom operations will evolve. Future applications may include end-to-end network optimization, autonomous network maintenance, and predictive service assurance. By embracing these innovations, telecom providers can maintain a competitive edge while delivering more reliable, efficient, and adaptive services to their customers.

Reference

[1] J. Wang, "Generative AI for Network Operations (OFC 2025 Workshop)," in Optical Fiber Communication Conference (OFC) 2025, Mar. 2025, Paper Th1A.1.
