Autonomous Vehicle Training & Tesla's Data Engine Explained

The idea of learning how to drive may take you back to your teenage days behind the wheel with your parents in the passenger seat. But for autonomous vehicle engineers, learning to drive means diving into massive data sets, creating complex neural network algorithms, and making years of incremental improvements.

In autonomous vehicle training, large teams of expert hardware and software engineers utilize data, simulation systems, and state-of-the-art Artificial Intelligence (AI) training infrastructures to prepare autonomous vehicles for the road.

As a leader in the autonomous-vehicle space, Tesla has developed a complex machine learning infrastructure to iteratively train its Full Self-Driving (FSD) computers to overcome real-world challenges while simultaneously improving its training datasets.

Here, we'll examine the fundamentals of Tesla's "Data Engine" workflow to see how the data generated by Tesla's cars is fed back to retrain those same cars.

What is Tesla's Data Engine?

Tesla's "Data Engine" is a fundamental pillar in the race to full autonomy. This data workflow collects real-world driving examples and uses them, iteration after iteration, to train self-driving neural networks.

Tesla does this in an elegant way: each car is equipped with an FSD computer that runs two FSD systems in tandem. One system drives the vehicle when Autopilot is on, while the other runs constantly in "shadow mode."

The shadow-mode system runs as if it were truly controlling the car. When the driver does something different from what it would have done, or when the neural network signals that it doesn't know what to do in the scenario at hand, the car notes that event as an inaccuracy. Because the car logs these inaccuracies in memory, Tesla can collect them later.
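The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Tesla's implementation: the Controls and ShadowEvent types, the check_shadow function, and every threshold value are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Controls:
    steering_deg: float  # steering angle in degrees (hypothetical signal)
    accel_mps2: float    # longitudinal acceleration in m/s^2 (hypothetical signal)


@dataclass
class ShadowEvent:
    timestamp: float
    reason: str  # "low_confidence" or "disagreement"


def check_shadow(timestamp: float, driver: Controls, shadow: Controls,
                 confidence: float, steer_tol: float = 5.0,
                 accel_tol: float = 1.0,
                 min_conf: float = 0.5) -> Optional[ShadowEvent]:
    """Return an event if the shadow system is unsure or disagrees with the driver.

    All tolerances are illustrative assumptions, not real calibration values.
    """
    # Case 1: the network signals that it does not know what to do.
    if confidence < min_conf:
        return ShadowEvent(timestamp, "low_confidence")
    # Case 2: the driver did something different from what the shadow system proposed.
    if (abs(driver.steering_deg - shadow.steering_deg) > steer_tol
            or abs(driver.accel_mps2 - shadow.accel_mps2) > accel_tol):
        return ShadowEvent(timestamp, "disagreement")
    # Otherwise the shadow system matched the driver closely enough.
    return None
```

In this sketch, a returned ShadowEvent stands in for the logged inaccuracy; a return value of None means there is nothing to log for that moment.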

If Tesla detects enough inaccuracies under similar circumstances, it can search for similar driving conditions recorded by other cars in the fleet, even cars that did not detect an inaccuracy.
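As a rough sketch of that trigger, assume each logged inaccuracy carries a scenario tag and each recorded clip carries a set of tags. The tag names, the min_events threshold, and the clip dictionary layout are all assumptions made for illustration; the article does not describe how Tesla actually represents or queries this data.

```python
from collections import Counter


def scenarios_to_query(event_tags, min_events=3):
    """Return scenario tags that occur often enough to justify a fleet-wide search.

    min_events is an arbitrary illustrative threshold.
    """
    counts = Counter(event_tags)
    return sorted(tag for tag, n in counts.items() if n >= min_events)


def matching_clips(fleet_clips, tag):
    """Select every clip recorded under the given scenario.

    The 'flagged' field is deliberately ignored: clips are returned
    whether or not the car that recorded them detected an inaccuracy.
    """
    return [clip for clip in fleet_clips if tag in clip["tags"]]
```

Ignoring whether a clip was flagged is the important design choice here: it mirrors the point that the search covers similar conditions across the fleet, not only the cars that reported a problem.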

Tesla can then harvest those similar examples. Using this newly formed, well-labeled data set, Tesla can retrain its neural network to react better to the scenario in which the inaccuracies occurred. Once the neural network is retrained, Tesla can deploy the revised network in "shadow mode" and collect new examples of inaccuracies.

Tesla Data Collection at a High Level

The image below illustrates the cyclical nature of Tesla's data collection and iteration strategy. First, data is collected at the source (the FSD computer in the Tesla). Next, the vehicle identifies an inaccuracy. That inaccuracy enters Tesla's unit tests, which verify that it is legitimate and not the result of subpar driving by the human driver.

If the inaccuracy is deemed legitimate, Tesla then asks its fleet for more examples of where the inaccuracies are found. Those examples are then correctly labeled by a human, and used to train the neural network. The network is then redeployed to the data source to collect more inaccuracies.
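One pass through this cycle can be written as a small Python function. The sketch below takes each stage as a caller-supplied function, because the article does not describe how any stage is implemented; data_engine_iteration and all of its parameter names are hypothetical.

```python
def data_engine_iteration(events, is_legitimate, request_examples, label, train, model):
    """Run one pass of the collect, verify, gather, label, retrain cycle.

    Every stage is passed in as a callable; this function only fixes the order.
    """
    # Verify each inaccuracy, e.g. rule out subpar driving by the human driver.
    confirmed = [event for event in events if is_legitimate(event)]
    if not confirmed:
        return model  # nothing legitimate to learn from; keep the current network
    # Ask the fleet for more examples of each confirmed inaccuracy.
    examples = [ex for event in confirmed for ex in request_examples(event)]
    # Have a human labeler annotate every example.
    labeled = [(ex, label(ex)) for ex in examples]
    # Retrain; the returned network is what gets redeployed to collect more data.
    return train(model, labeled)
```

Calling this function repeatedly, each time with the network returned by the previous call, gives the loop shown in the diagram.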

Competition in the Automotive Industry

When Tesla first introduced its "Data Engine" at its Autonomy Day in 2019, it naturally created envy within the rest of the automotive industry.

In a conversation with Reuters in 2020, Audi's CEO stated, "Tesla is two years ahead in terms of computing and software architecture, and autonomous driving." From its inception, this massive infrastructure for data generation, training-material collection, analysis, retraining, deployment, and reruns was extremely well designed.

The "Data Engine" featured supporting integrations at every step of the data chain, which required a ground-up architecture for data collection, transfer, and computing. For example, Tesla automobiles can connect wirelessly to Tesla's central database and handle shadow-mode computation and data collection, all while running the production-deployed autonomous vehicle algorithms. At its unveiling, no other automotive manufacturer had an autonomous vehicle data collection infrastructure as robust and mature as Tesla's.

Edge Case Identification and Iteration

One of the most critical functions of the Data Engine is its ability to detect discrepancies between the shadow-deployed neural network and either the human driver or the production neural network. In its Autonomy Day presentation, Tesla used the example of driving inaccuracies in the presence of bicycles on the road to illustrate the complexity of its machine learning infrastructure and its ability to identify these inaccuracies.

In its production neural networks, Tesla deemed bicycles critical to avoid, given that a bicycle on the road usually carries a rider. If a Tesla were forced to collide with one of two objects, it would collide with another automobile instead of a bicycle, since a car is likely to protect its occupants better than a bicycle protects its rider.

However, Tesla noticed that in some cases bicycles appeared in the middle of the road, very close to other cars, and the shadow neural network indicated that it didn't know what to do, while the human driver carried on normally. Alternatively, Tesla may have noticed that when its production Autopilot neural network was running and a bicycle was identified "in the middle of the road," human drivers intervened and "corrected" the course of the car.

When Tesla was notified, data labeling technicians audited the inaccuracy and discovered it occurred when cars had bike racks with bikes attached to them.

To humans, this event would simply have been a novel occurrence, but the training data revealed a critical challenge for the production neural network in self-driving cars. In this scenario, Tesla then asked its fleet of FSD-enabled vehicles to search for other events that may have contained a bicycle on or near a car.

These data examples were then sent to data-labeling technicians, who labeled each one as an actual bicycle, a bicycle mounted behind a car, a bicycle mounted on top of a car, or none of the above. Using the newly collected and labeled data set, the beta-version neural network could be retrained in Tesla's machine learning data centers and redeployed across the fleet to run in shadow mode and be iterated upon.
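The four labeling outcomes map naturally onto an enumeration. The following Python sketch is illustrative only: BikeLabel, label_distribution, and needs_avoidance are hypothetical names, and the rule that only an actual bicycle calls for avoidance behavior is an assumption drawn from the example in the text, not a documented Tesla policy.

```python
from collections import Counter
from enum import Enum


class BikeLabel(Enum):
    ACTUAL_BICYCLE = "actual bicycle"
    REAR_MOUNTED = "bicycle mounted behind a car"
    ROOF_MOUNTED = "bicycle mounted on top of a car"
    NONE = "none of the above"


def label_distribution(labels):
    """Count how many examples received each label, e.g. to check class balance."""
    return Counter(labels)


def needs_avoidance(label):
    """Assumed rule: only an actual bicycle is treated as something to avoid."""
    return label is BikeLabel.ACTUAL_BICYCLE
```

Separating mounted bicycles from actual ones gives the retrained network distinct classes to learn, which is what lets it stop treating a bike rack as a cyclist in the middle of the road.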

With its Data Engine, Tesla can create new software releases using what it learns from shadow-mode deployments and fast training iterations. The genius of Tesla's data collection infrastructure is that it enlists real-life data examples and uses human drivers to train its machine learning models. Ultimately, the Data Engine accelerates neural network training by quickly collecting well-labeled, real-life driving data and actively using it in iterative machine learning trials.

Tesla's Data Collection for the Win

Other companies in the autonomous vehicle sector have proprietary methods of training their autonomous vehicles, but none operate at the scale of Tesla. With cars all over the globe, Tesla can collect more safe-driving data and more edge-case inaccuracies and use them to train its neural networks.

Many other companies have extremely robust training simulations and data collection mechanisms, but often the scope of their work is limited to a single city or region. Regardless, the data generated by these cars, which at this point are only semi-autonomous, is the key that will open the door to fully autonomous cars.

