Chapter 03

Hype vs. reality: what AI actually does in oil & gas today

By Michael Atkin, P.Eng · May 1, 2026 · 10 min read

Every AI use case in oil & gas is being pitched somewhere right now. The pitches range from real-and-shipping to real-but-narrow to real-only-on-stage. The buyer’s job is to sort what lands in front of them into four buckets: working at scale, promising but unproven, looks good in a deck but fails at the wellhead, and pure fiction. We do this exercise with operators almost weekly, and the patterns are specific enough to be useful as a checklist.

Working at scale today

These are the use cases we have seen ship measurable value across multiple operators, multiple basins, multiple commodity prices. They share one feature: each of them changes the morning work loop, not just the tooling around it.

Per-asset anomaly detection on production assets. Per-well models trained on each well’s own history flag deviations 48 to 72 hours before failure. The reason this works is that no two wells are the same, and a fixed-threshold alarm system generates more noise than signal at scale. The reason it fails when it does fail is that the anomaly score is generated but never wired into the daily work plan, which puts the operator in Era 3 (see Chapter 2). The fix is to feed anomaly scores into the same ranked queue that drives the truck-cab plan.
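
The per-well part is the whole trick, and it does not require exotic modeling to illustrate. A minimal sketch, assuming an hourly rate series per well in pandas (the window length and the |z| > 3 cutoff are illustrative choices, not our production model):

```python
import pandas as pd

def anomaly_scores(rate: pd.Series, window: int = 24 * 14) -> pd.Series:
    """Score deviations against the well's OWN trailing history.

    rate:   hourly production rate for one well (illustrative input)
    window: trailing window; here, 14 days of hourly samples
    """
    mean = rate.rolling(window, min_periods=window // 2).mean()
    std = rate.rolling(window, min_periods=window // 2).std()
    return (rate - mean) / std  # |z| > 3 is a candidate anomaly, not an alarm

# Each well gets its own baseline, so a deviation that is noise on one well
# can be a 48-to-72-hour early warning on another.
```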

Hydraulic model auto-build. For midstream, gas utility, and water operators, the manual model build (GIS export, network re-keying, calibration trial-and-error) takes 200+ hours per study. Auto-extraction from GIS topology plus live SCADA reconciliation produces simulator-ready models in minutes. We see this live across 3.5 million meters of pipeline infrastructure at a regulated gas-utility deployment today. It works because the underlying data sources are authoritative; the AI is simply doing the data integration that humans did badly.
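
A minimal sketch of the topology-extraction step, assuming the GIS export yields one from-node/to-node record per pipe segment (the field names are hypothetical, and the real work is the SCADA reconciliation that follows):

```python
import networkx as nx

# Hypothetical GIS export: one record per pipe segment.
segments = [
    {"from_node": "N1", "to_node": "N2", "diameter_mm": 200, "length_m": 850},
    {"from_node": "N2", "to_node": "N3", "diameter_mm": 150, "length_m": 420},
    {"from_node": "N4", "to_node": "N5", "diameter_mm": 150, "length_m": 310},
]

G = nx.Graph()
for seg in segments:
    G.add_edge(seg["from_node"], seg["to_node"],
               diameter_mm=seg["diameter_mm"], length_m=seg["length_m"])

# Disconnected components are usually digitization errors in the GIS,
# exactly the defects that used to consume manual calibration hours.
islands = list(nx.connected_components(G))
if len(islands) > 1:
    print(f"{len(islands)} disconnected islands to reconcile:", islands)
```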

Engineering drawing extraction. P&IDs, isometric drawings, and equipment datasheets contain a staggering amount of structured information that historically lived only in the heads of senior engineers. Computer vision plus domain-tuned NLP can extract pump curves, valve Cv, compressor maps, instrument tags, and topology. The accuracy is now good enough that the engineer’s job shifts from re-keying to verifying.
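
For the last mile of that pipeline, once OCR has turned the drawing into text, ISA-style instrument tags can be pulled with a pattern like the one below. A sketch only: tag grammars vary by operator, and this regex is an illustration, not a standard.

```python
import re

# ISA-5.1-style instrument tags: a function prefix (PT, FIC, PSV...)
# followed by a loop number and an optional suffix letter.
TAG_PATTERN = re.compile(r"\b([A-Z]{1,4})-?(\d{2,5})([A-Z]?)\b")

ocr_text = "Relief via PSV-1402 upstream of FIC-2045; level on LT-310A."
tags = [m.group(0) for m in TAG_PATTERN.finditer(ocr_text)]
print(tags)  # ['PSV-1402', 'FIC-2045', 'LT-310A'] -- candidates for verification
```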

Daily ranked work plans, scored by economic impact. The work loop itself: ingest, score, route, execute, learn. This is the agentic stack. The reason it works is that the optimization problem is too large to solve by hand and structured enough to be tractable for an agent. A 26-pumper team chooses from roughly 10⁶⁹⁹ route combinations every morning; the math is in Chapter 6. The pumper still drives the truck. The agent does the math.
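
A toy sketch of the score step, assuming each candidate task carries a deferred-production estimate, a visit cost, and an anomaly score (all numbers and field names invented; the real scoring in Chapter 6 folds in route geometry and crew constraints):

```python
from dataclasses import dataclass

@dataclass
class Task:
    well: str
    deferred_boepd: float   # production at risk if the well is not visited
    visit_cost_usd: float   # truck roll plus crew time
    anomaly_score: float    # from the per-well models above

OIL_USD_PER_BOE = 70.0  # assumed flat price; a real system pulls live pricing

def economic_value(t: Task) -> float:
    # Expected value of the visit: production protected minus cost of going.
    return t.anomaly_score * t.deferred_boepd * OIL_USD_PER_BOE - t.visit_cost_usd

tasks = [
    Task("W-114", deferred_boepd=45, visit_cost_usd=350, anomaly_score=0.8),
    Task("W-207", deferred_boepd=12, visit_cost_usd=350, anomaly_score=0.9),
    Task("W-033", deferred_boepd=80, visit_cost_usd=600, anomaly_score=0.3),
]
ranked = sorted(tasks, key=economic_value, reverse=True)
for t in ranked:
    print(f"{t.well}: ${economic_value(t):,.0f}")
```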

Route optimization with operational constraints. Constraint-based solvers (the same class of optimization that runs UPS and Amazon last-mile) handle crew qualifications, vehicle capacity, geography, time windows, and value density. We see 35% fewer site visits and 25% less drive time on the same production at scale.

35%
fewer site visits at the same production, on the same wells, with the same crews
Top 25 private producer · 5,000+ wells · Western Anadarko + Permian + Wyoming
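
For flavor, here is the skeleton of that solver class in Google OR-Tools, stripped down to a drive-time-only objective. The matrix is made up; a real deployment layers crew qualifications, vehicle capacity, and time windows onto this same skeleton.

```python
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

# Symmetric drive-time matrix (minutes) between a yard (node 0) and 4 wells.
DRIVE_MIN = [
    [0, 20, 30, 25, 40],
    [20, 0, 15, 35, 30],
    [30, 15, 0, 20, 25],
    [25, 35, 20, 0, 15],
    [40, 30, 25, 15, 0],
]
NUM_PUMPERS = 2  # vehicles
DEPOT = 0        # everyone starts and ends at the yard

manager = pywrapcp.RoutingIndexManager(len(DRIVE_MIN), NUM_PUMPERS, DEPOT)
routing = pywrapcp.RoutingModel(manager)

def drive_cost(from_index, to_index):
    # OR-Tools passes internal indices; map them back to matrix nodes.
    return DRIVE_MIN[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

transit = routing.RegisterTransitCallback(drive_cost)
routing.SetArcCostEvaluatorOfAllVehicles(transit)

params = pywrapcp.DefaultRoutingSearchParameters()
params.first_solution_strategy = (
    routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)

solution = routing.SolveWithParameters(params)
if solution:
    for v in range(NUM_PUMPERS):
        idx, route = routing.Start(v), []
        while not routing.IsEnd(idx):
            route.append(manager.IndexToNode(idx))
            idx = solution.Value(routing.NextVar(idx))
        print(f"pumper {v}: {route}")
```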

Promising but unproven at scale

These have real shape, real teams behind them, and real wins on individual deployments. Whether they compound across an industry is still a 12 to 24-month question. We deploy several of them ourselves and we still treat them as “earning their place” rather than “proven moats.”

Natural language query over operational data. “Show me wells in the Permian producing under target with high lift cost.” The user experience is great. The failure mode is data-quality dependence: a query that returns the wrong wells because the GIS asset ID drifted from the SCADA tag is more harmful than no query at all. Useful when paired with a strong data reconciliation layer; risky without one.
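
A minimal sketch of the reconciliation check that has to sit underneath the query layer, assuming GIS and SCADA each export a table keyed on what should be the same well identifier (both frames invented):

```python
import pandas as pd

gis = pd.DataFrame({"well_id": ["W-001", "W-002", "W-003"]})
scada = pd.DataFrame({"well_id": ["W-001", "W-002", "W-04"]})  # "W-04" drifted

merged = gis.merge(scada, on="well_id", how="outer", indicator=True)
drifted = merged[merged["_merge"] != "both"]
print(drifted)
# Any row here is a well the NL query layer will silently get wrong:
# block or flag these before letting users ask questions in plain English.
```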

Auto-generated incident reports. Pulling structured fields from a SCADA-and-witness-statement timeline into a draft report saves real time. The trap is treating the draft as final. Regulators and insurance still want a human in the loop on the substance of an incident report. The right pattern is “draft, route, edit, sign,” not “auto-generate, file.”
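
The guardrail for “draft, route, edit, sign” can be as blunt as a transition table with no edge from draft to filed. A sketch, with state names of our own invention:

```python
ALLOWED = {
    "draft":  {"routed"},            # AI output always enters human review
    "routed": {"edited", "signed"},  # a person can edit or sign off directly
    "edited": {"signed"},
    "signed": {"filed"},             # only a signed report can be filed
}

def advance(state: str, to: str) -> str:
    if to not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {to}")
    return to

state = advance("draft", "routed")  # fine
# advance("draft", "filed")         # raises: no auto-generate-and-file path
```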

Reinforcement-learning loops on prediction quality. Anomaly and predictive-maintenance models that improve on outcome data (did the intervention prevent a failure?) are the right architectural shape. The data flywheel works. The proof at scale is still emerging.
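
The flywheel does not need full reinforcement-learning machinery to illustrate. A deliberately simpler stand-in, recalibrating an alert threshold on logged intervention outcomes, shows the shape (the field names and target precision are assumptions):

```python
def recalibrate_threshold(outcomes: list[tuple[float, bool]],
                          threshold: float,
                          target_precision: float = 0.5) -> float:
    """outcomes: (anomaly_score, intervention_prevented_failure) pairs."""
    fired = [(s, hit) for s, hit in outcomes if s >= threshold]
    if not fired:
        return threshold
    precision = sum(hit for _, hit in fired) / len(fired)
    # Too many wasted truck rolls -> demand a stronger signal; and vice versa.
    return threshold * 1.1 if precision < target_precision else threshold * 0.95

history = [(0.9, True), (0.7, False), (0.8, True), (0.6, False), (0.95, True)]
print(recalibrate_threshold(history, threshold=0.65))
```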

AI-generated JSAs and pre-job hazard assessments. Auto-pre-populating a JSA from the asset hazard profile and weather saves the foreman 20 minutes per crew per day. The risk is the generated text becoming generic enough that the field stops reading it. We see this work when paired with operator-edit workflows and fail when shipped as “set and forget.”
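
A sketch of the pre-population step, with a deliberately mandatory crew field so the document cannot ship as set-and-forget (the hazard profile and weather thresholds are invented):

```python
HAZARD_PROFILE = {"W-114": ["H2S potential", "high-pressure wellhead"]}

def draft_jsa(well: str, weather: dict) -> dict:
    hazards = list(HAZARD_PROFILE.get(well, []))
    if weather.get("temp_c", 20) < -10:
        hazards.append("cold-weather exposure")
    if weather.get("wind_kph", 0) > 40:
        hazards.append("high wind: no overhead lifts")
    return {
        "well": well,
        "auto_hazards": hazards,
        "crew_additions": None,  # REQUIRED: foreman must fill before sign-off
    }

jsa = draft_jsa("W-114", {"temp_c": -15, "wind_kph": 25})
assert jsa["crew_additions"] is None  # workflow blocks until the crew edits
```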

Looks good in a deck, fails at the wellhead

These are the deployments where we have watched operators spend real money for no measurable change to the morning. They share a common shape: a real but narrow capability bolted onto a workflow that wasn’t modernized.

Generic copilot panels grafted onto existing tools. A chat interface that summarizes the screen the user is already looking at. Demo-friendly. Operationally pointless. The CMMS dashboard tells you what tickets are open; the copilot tells you the same thing in a sentence. Neither dispatches the right crew to the right asset.

Domain LLMs producing reports nobody reads. We have seen seven-figure programs whose primary deliverable is a 12-page weekly summary that goes into a folder. The diagnostic is simple: who acts on this report? If the answer is “the team reviews it on Friday,” you have a deliverable, not an outcome.

Anomaly detection without economic ranking. ML anomaly detection that fires on every minor SCADA blip and forwards 200 alerts a day to a foreman who already had 200 alerts a day. The model is doing its job. The deployment is failing because anomalies are not the unit of work; ranked tasks are.

“AI-powered” CMMS that’s mostly dashboards with chat. The CMMS shipped in 2018. The 2024 release added a chat panel and a keyword in the marketing. Underneath, the same workflow with the same data model. Buyers who were happy with the 2018 version are paying 2x for the 2024 version with no measurable lift.

The diagnostic, every time, is: did the morning change?

Pure fiction (or arrives in 2035)

We hear these in pitches. We hear them at conferences. We hear them on LinkedIn. They are not arriving in the next 24 months. They may not arrive in the next decade.

“Autonomous oil field.” Hand-waves the integration realities (Chapter 4), the OT security realities (Chapter 5), the regulatory framework, and the fact that most field work involves a human with a wrench. There are autonomous components (automated chemical injection, automated choke management, automated artificial-lift control), but the field as a whole is not within the planning horizon of any operator we work with.

“AI replaces the foreman.” The foreman’s job is judgment under uncertainty with stakes. Models can amplify the foreman; they cannot replace the accountability. Anyone selling otherwise is selling a regulatory problem.

“Self-healing infrastructure.” Real engineering term in software contexts. Marketing term in oil & gas contexts. Pipes do not heal. They get inspected, repaired, or replaced.

Three diagnostic questions before you sign

We give buyers these three questions to ask any AI vendor in oil & gas, including us. They sort the four buckets above with very high reliability.

  1. Does this change the morning work loop? If the answer is no, the deployment will look exactly like Era 3. Ask the vendor to walk you through the foreman’s morning before and after. If the morning is unchanged, the program is decorative.
  2. Has the vendor deployed at a comparable operator at scale, not pilot? A pilot is a controlled environment with a customer success team attached. Production at scale is a different thing. Ask explicitly: how many wells, how many basins, how many concurrent integrations, how long in production. Pilots are fine and necessary; pilots are not proof.
  3. Does the metric the vendor optimizes match the metric you’re measured on? If the vendor brags about model accuracy and you are measured on free cash flow per well, the connection between those two metrics is the integration the vendor isn’t shipping. Make them draw the chain from their KPI to your KPI.

Three questions. Five minutes per pitch. The hit rate on weeding out programs that would have failed in the field is high enough that we have stopped pitching past these three when the answer to any of them is bad.

Up next
Chapter 04

Integration realities: SCADA, ERP, CMMS, GIS, historians

Each system has its own idea of what a well is. The schema reconciliation problem is the AI readiness problem. Protocol depth, read-only by default, and what a “1-week integration” actually means.