Autonomous vehicles, robots, industrial machines, and other physical AI systems are producing more video than most teams can realistically review. Every test drive, warehouse run, delivery route, factory workflow, or field deployment can create hours of footage. Somewhere inside that footage may be the exact moment that explains why a model failed, why a robot hesitated, or why a rare safety event needs closer attention.
The challenge is not that autonomy teams lack data. In many cases, they have too much of it. The harder problem is finding the right data at the right time.
That is the problem Mustafa Bal is taking on through NomadicML, an AI company focused on visual intelligence for physical autonomy. Instead of treating raw fleet video as a storage problem, NomadicML is working to make it searchable, structured, and useful for the teams building real-world autonomous systems.
Bal’s work sits at an important point in the AI market. The industry has spent years talking about larger models and better demos, but physical AI needs something more grounded. It needs systems that can learn from messy, real-world environments. It needs teams that can quickly find edge cases, understand failure patterns, and turn operational footage into training signals.
That is where NomadicML is trying to create value.
Who Is Mustafa Bal?
Mustafa Bal is the co-founder and CEO of NomadicML, a San Francisco-based AI startup building technology for autonomous vehicle and robotics teams. His work is tied to one of the most practical challenges in modern AI: helping machines learn from the physical world.
Before NomadicML, Bal built experience in machine learning infrastructure and large-scale AI systems. That background matters because the problem NomadicML is solving is not just about watching videos. It is about processing massive amounts of visual data, understanding what is happening inside that data, and making it usable for engineering teams that need accuracy, speed, and context.
In autonomy, the smallest detail can matter. A cyclist moving between cars, a pedestrian stepping off a curb, a construction vehicle entering a lane, a robot arm missing an object by a few centimeters, or a vehicle reacting to unusual lighting can all become important learning moments. The difficulty is that these moments are rare. They are often buried inside thousands or millions of hours of footage.
Bal’s achievement with NomadicML comes from focusing on that hidden layer of the autonomy stack. While many companies focus on building the robot, the vehicle, or the end-user product, NomadicML is focused on the data layer that helps those systems improve.
The Founder Behind NomadicML’s Vision
The vision behind NomadicML is simple to understand but difficult to build. Autonomous systems collect huge amounts of visual data, but most of that data does not automatically become useful. It has to be searched, labeled, organized, reviewed, and connected to model training workflows.
For years, that process has depended heavily on manual review and traditional labeling pipelines. Human reviewers can help, but the approach does not scale well when fleets are producing enormous volumes of video every day. Teams may know that important clips exist inside their archives, but finding them can feel like searching for a needle in a moving haystack.
Mustafa Bal’s approach is built around the belief that AI can help autonomy teams search video by meaning, not just by file name, timestamp, or basic sensor trigger. Instead of asking teams to manually scan footage, NomadicML aims to help them ask for the moments they care about and surface the relevant clips much faster.
That shift matters because physical AI is not trained only on clean examples. It improves when teams can study the strange, rare, and difficult situations that models struggle with in the real world.
What NomadicML Is Building
NomadicML is building a visual AI platform for physical autonomy. In practical terms, that means it helps robotics and autonomous vehicle teams turn raw video footage into structured, searchable datasets.
The company’s platform is designed to help teams identify the moments that matter inside large-scale fleet footage. These moments can include rare road events, safety-critical scenes, unusual object movement, difficult driving conditions, robot failures, and clips that may be useful for model training.
Instead of letting footage sit unused in storage, NomadicML helps transform it into a more active part of the machine learning workflow. A clip is no longer just a video file. It can become an edge case, a training example, a fleet monitoring signal, or a clue about where a model needs improvement.
This makes the company part of a larger movement around physical AI, where the goal is to build systems that can operate in the real world, not just perform well in controlled digital environments.
Turning Fleet Footage Into Searchable Data
The core idea behind NomadicML is that fleet video becomes far more valuable when it can be searched intelligently.
For an autonomous vehicle team, that might mean searching for every example of a car turning left in heavy rain, a pedestrian crossing against a signal, a cyclist weaving through traffic, or a vehicle moving under a certain kind of bridge. For a robotics team, it might mean finding every case where a robot gripper failed to pick up an object, moved too slowly, or reacted poorly to an unexpected obstacle.
These examples are not always easy to detect through simple rules. A hard brake or sudden stop may show that something happened, but many important edge cases do not come with obvious signals. Sometimes the most useful clips are the quiet ones, where the system technically kept moving but showed hesitation, confusion, or weak understanding.
NomadicML’s value is in helping teams find those moments faster. By turning video into structured data, it gives engineers a clearer way to search, review, and use footage that would otherwise remain difficult to access.
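NomadicML's actual system is not public, but the core idea of searching video by meaning can be sketched in miniature. In the illustrative snippet below, each clip has already been reduced to an embedding vector (in practice this would come from a video-text model such as a CLIP-style encoder; the clip names, vectors, and `search` function here are hypothetical), and a text query is matched against clips by cosine similarity rather than by file name or timestamp.

```python
import math

# Hypothetical, hand-made embeddings standing in for the output of a
# video-text embedding model. Each clip is reduced to a tiny vector;
# real systems use hundreds of dimensions.
clip_embeddings = {
    "clip_0412": [0.9, 0.1, 0.0],   # cyclist weaving through traffic
    "clip_0977": [0.1, 0.9, 0.1],   # routine highway driving
    "clip_1203": [0.8, 0.2, 0.1],   # cyclist near parked cars
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_embedding, top_k=2):
    """Rank stored clips by semantic closeness to the query embedding."""
    scored = sorted(
        clip_embeddings.items(),
        key=lambda item: cosine(query_embedding, item[1]),
        reverse=True,
    )
    return [clip_id for clip_id, _ in scored[:top_k]]

# In a real pipeline this query vector would come from encoding a phrase
# like "cyclist moving between cars" with the same model as the clips.
query = [0.85, 0.15, 0.05]
print(search(query))  # the two cyclist clips rank above routine driving
```

The point of the sketch is the workflow, not the model: once clips live in a shared embedding space, "find every cyclist squeezing between cars" becomes a ranking query instead of a manual scrubbing session.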
Why Video Data Matters in Physical AI
Video is one of the richest sources of information for physical AI. It shows movement, context, timing, distance, object behavior, scene complexity, and environmental change. For autonomous systems, video can reveal what happened before, during, and after a key event.
But raw video by itself is messy. It is expensive to store, difficult to review, and hard to convert into clean training data. Without the right tools, autonomy teams may only use a small portion of what they collect.
That is a serious bottleneck. For a company building robots or autonomous vehicles, the advantage often depends on how quickly it can learn from the real world. The faster field data turns into model improvements, the faster the product gets better.
Mustafa Bal and NomadicML are building around this exact need. Their work is not just about organizing files. It is about helping AI teams move from raw footage to practical learning.
The Problem Mustafa Bal Is Trying to Solve
The problem NomadicML is solving can be summed up in one question: how do autonomy teams find the few clips that matter inside massive amounts of video?
Autonomous fleets can generate enormous volumes of footage. Most of that footage may show normal behavior, routine environments, and uneventful operations. But the most valuable training signals often come from rare events. These are the moments that show a system where its understanding is weak.
That could be a pedestrian behaving unpredictably, a delivery robot facing a blocked sidewalk, an autonomous mining truck operating in unusual terrain, or a robot arm struggling with an object it has not seen before. These situations are valuable because they expose gaps in the model.
The challenge is that rare events are rare by definition. Teams cannot improve from them if they cannot find them.
Why Manual Video Review Does Not Scale
Manual review has limits. A human reviewer can watch footage, tag events, and identify important clips, but this approach becomes slow and expensive as data grows. Even if a team speeds through video, the process still takes time and can miss subtle patterns.
It also creates a gap between data collection and model improvement. If engineers have to wait too long to find the right clips, the feedback loop slows down. In fast-moving autonomy teams, that delay matters.
This is why searchable video data is becoming more important. Teams need to move from passive archives to active data systems. They need tools that help them ask sharper questions, surface relevant examples, and feed those examples back into training and evaluation.
NomadicML’s platform is designed for that kind of workflow.
The Hidden Value Inside Raw Fleet Video
A large video archive can look like a cost center. It takes storage, compute, and operational effort to manage. But when the right clips are found and organized, that same archive can become a valuable source of intelligence.
The hidden value sits inside the moments that explain how a system behaves in the real world. A model may perform well in common conditions but struggle with long-tail scenarios. Those long-tail scenarios are often the difference between a promising demo and a reliable product.
For Mustafa Bal, the opportunity is in helping teams unlock that buried value. NomadicML is built around the idea that raw fleet video should not sit unused. It should help teams understand what happened, why it happened, and what data can help improve the next model version.
That is a strong founder-market fit. Bal is not building around a vague AI trend. He is building around a real operational pain point faced by companies working on autonomy, robotics, and machine perception.
How NomadicML Helps Autonomous Fleets Learn Faster
Autonomous systems improve through iteration. Teams collect data, test models, find weaknesses, train again, and deploy better versions. The tighter that loop becomes, the faster the system improves.
NomadicML supports that loop by making video data easier to search and use. Instead of waiting for manual review or relying only on basic triggers, teams can identify meaningful examples more directly.
This can help across several workflows. Engineers can find edge cases for testing. Machine learning teams can build better training datasets. Fleet operators can monitor behavior across deployments. Safety teams can study rare events with more context.
The common thread is speed. Better data discovery helps teams spend less time hunting for clips and more time improving models.
Finding Edge Cases Faster
Edge cases are one of the most important parts of autonomy development. They are the unusual situations that test whether a system can handle the real world.
For autonomous vehicles, edge cases may include unusual pedestrian behavior, heavy rain, glare, construction zones, emergency vehicles, cyclists in tight spaces, blocked intersections, or confusing lane markings. For robotics, they may include object slips, unusual surfaces, cluttered workspaces, poor lighting, unexpected human movement, or tasks that require fine motion understanding.
Finding these moments quickly can help teams understand where their systems need more training. It can also help them test whether a new model version handles difficult situations better than the last one.
NomadicML’s focus on edge case discovery is important because it helps move autonomy development away from broad data collection and toward more targeted learning.
Helping Teams Build Better Training Datasets
Not all data is equally useful. A million hours of routine footage may not help as much as a smaller set of high-quality clips that reveal model weaknesses.
This is where structured video data becomes valuable. If teams can search for specific events, conditions, and behaviors, they can build training datasets with more purpose. Instead of feeding models random footage, they can select clips that address known gaps.
That kind of dataset building is especially important for physical AI. Real-world systems need examples that reflect the complexity of roads, warehouses, factories, construction sites, and public spaces. The more precise the data pipeline becomes, the better teams can train for the situations that actually matter.
Mustafa Bal’s work with NomadicML fits into this shift. The company is helping teams move from more data to better data.
Supporting Safer Real-World Autonomy
Safety in autonomy is not built through one feature or one model update. It comes from many small improvements across perception, planning, control, testing, monitoring, and data quality.
NomadicML contributes to that process by helping teams see more clearly into their own fleet data. When rare events are easier to find, engineers can study them sooner. When weak spots are easier to identify, teams can build more focused training sets. When video archives become searchable, the entire development cycle becomes more informed.
This does not mean any single platform can guarantee safety. But better data visibility can help teams make more careful decisions. In a field where real-world performance matters, that visibility is a meaningful advantage.
Why NomadicML’s Funding Signals Market Confidence
In March 2026, NomadicML announced an $8.4 million seed round. For an early-stage AI company, that funding is a sign that investors see a clear need for better infrastructure in the physical AI market.
The funding also points to a broader trend. As robotics and autonomous systems grow, companies are realizing that the future of autonomy is not only about better models. It is also about the data systems that help those models learn.
A model is only as useful as the feedback loop behind it. If teams cannot find the right examples, diagnose failures, or turn real-world footage into training data, progress slows down. NomadicML is positioning itself as part of that infrastructure layer.
That is why Mustafa Bal’s work is worth watching. He is building in a space where the demand is practical, technical, and tied directly to how autonomy teams operate.
Why Investors Are Paying Attention to Physical AI Infrastructure
Physical AI has become one of the most important areas in artificial intelligence because it connects models to the real world. Robots, vehicles, drones, industrial machines, and autonomous equipment all need to understand dynamic environments.
That creates a different kind of challenge from purely digital AI. A chatbot can be tested through prompts and text outputs. A robot or vehicle has to understand motion, timing, space, physics, safety, and unpredictable human behavior.
This is why infrastructure matters. Companies building physical AI need tools for data management, video understanding, sensor integration, model evaluation, and training workflows. NomadicML fits into that layer by helping teams make sense of visual data at scale.
Investors are paying attention because the companies that learn fastest from real-world data may have a major advantage. In physical AI, better learning loops can become a competitive edge.
Mustafa Bal’s Achievement With NomadicML
Mustafa Bal’s achievement is not just raising money or launching another AI startup. It is identifying a hard, specific, and valuable problem inside the autonomy market.
Many AI companies focus on general productivity tools or broad software use cases. NomadicML is focused on a narrower but deeper challenge: helping autonomous systems teams understand the video data they already collect.
That focus gives the company a clear role. It is not trying to replace the robot, the vehicle, or the fleet operator. It is helping those teams learn from their own real-world footage faster.
This is a strong example of building around an industry pain point instead of chasing hype. The problem is technical, but the business need is easy to understand. Teams are collecting huge amounts of footage. Most of it is hard to use. The companies that can turn that footage into better training data may improve faster.
Bal’s success with NomadicML comes from building at that exact intersection.
Building Around a Real Industry Pain Point
The best AI infrastructure companies usually solve problems that become more painful as customers grow. NomadicML fits that pattern.
The more vehicles, robots, or machines a company deploys, the more video it collects. The more video it collects, the harder it becomes to find the important moments. The harder that process becomes, the more valuable a searchable data layer can be.
This gives NomadicML a practical reason to exist. It is not built around a nice-to-have feature. It is built around a workflow that can affect model training, evaluation, fleet operations, and safety review.
For Mustafa Bal, that creates a clear founder story. He is building a company that helps autonomy teams make better use of the data they already own.
Creating a Data Layer for the Next Era of Autonomy
Autonomy is moving into a new phase. The early race was about proving that machines could operate in controlled or limited environments. The next phase is about scaling those systems into more complex, unpredictable settings.
That shift makes data infrastructure more important. Teams need to know how their systems behave across different conditions, geographies, weather patterns, object types, and user behaviors. They need to test rare events, not just average performance.
NomadicML is building for that world. Its work points toward a future where fleet footage is not a passive archive. It becomes a living data layer that helps teams monitor, train, and improve physical AI systems.
If NomadicML can make that process faster and more reliable, it could become an important part of how robotics and autonomous vehicle companies build smarter systems.
Why This Work Matters Beyond Autonomous Vehicles
Although NomadicML is closely connected to autonomous vehicles and robotics, the broader idea can apply across many physical AI markets.
Delivery robots, warehouse robots, industrial automation systems, mining vehicles, construction equipment, drones, and smart city systems all generate visual data. Each of these industries faces the same basic question: how can teams turn real-world video into useful intelligence?
The use cases may look different, but the underlying problem is similar. Teams need to understand what happened in the field. They need to find rare events. They need to detect patterns. They need to improve models without drowning in footage.
This is why Bal’s work has a larger market relevance. NomadicML is not only about cars. It is about helping machines learn from the physical world.
From Roads to Robots
The phrase “fleet video” often brings autonomous vehicles to mind, but robots also create valuable visual data. A warehouse robot moving through aisles, a robotic arm sorting objects, a drone inspecting infrastructure, or an industrial machine operating in a factory can all produce footage that helps explain performance.
For these systems, video can show more than whether a task succeeded or failed. It can show how the machine approached the task, where it hesitated, what confused it, and which environmental details affected the outcome.
That context is important. Physical AI systems need more than labels. They need understanding. NomadicML’s focus on visual reasoning and structured video data fits into that need.
The Bigger Picture for Mustafa Bal and NomadicML
Mustafa Bal is building NomadicML around a clear idea: the future of autonomy depends on how well teams can learn from real-world data.
Raw fleet video is often treated as a difficult byproduct of autonomous systems. NomadicML is trying to turn it into something more useful. By making footage searchable and structured, the company helps teams find edge cases, build better datasets, monitor fleets, and improve models with more precision.
That is what makes Bal’s work meaningful. He is not only building an AI platform. He is building a data layer for the companies trying to bring physical AI into the real world.
As autonomous vehicles, robotics, and intelligent machines continue to grow, the companies that learn fastest from their own data may have the strongest advantage. Mustafa Bal and NomadicML are working to make that learning loop faster, cleaner, and more useful.