Making Connected Vehicles Generates ~10x as Much Data as Driving Them

This article has been written and contributed to engineering.com by Greta Cutulenco, CEO and Cofounder of Acerta Analytics Solutions.

We keep hearing about the massive amounts of data autonomous cars will generate. One estimate from Intel a few years back pegged the number at 4,000 GB per vehicle, per day. More recently, Gartner predicted that by next year, the average connected vehicle will generate the same amount of data, to the tune of 280 petabytes annually.

Automotive industry data is Acerta’s lifeblood; our machine learning models depend on quality data to help automotive engineers reduce scrap and rework rates and accelerate root cause analysis (RCA). So, we decided to crunch the numbers for ourselves. What we found might seem counterintuitive to the uninitiated, but anyone who’s seen automotive industry data shouldn’t be the least bit surprised.

Automotive Manufacturing & On-Road Data

Estimates for the total number of vehicles manufactured in 2019 vary, but, conservatively, it’s roughly 80 million worldwide. Each of those 80 million vehicles has approximately 30,000 parts.

The total number of vehicles on the road in 2019 has been estimated at 1.25 billion, but obviously those aren’t all connected vehicles, so we have to make some assumptions. For simplicity’s sake, we made three sets of estimates regarding the proportion of on-road vehicles with built-in connectivity and those with connectivity via OBD2:

  • In-Built Connectivity: Avg: 12% / Low: 5% / High: 16%
  • OBD2 Connectivity: Avg: 40% / Low: 20% / High: 50%
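
To make those proportions concrete, here is a minimal Python sketch that converts them into vehicle counts. It assumes the two shares apply to the full 1.25 billion on-road fleet and do not overlap, so they can simply be added; the function and variable names are ours, for illustration only.

```python
# Fleet size and connectivity shares as stated in the article.
VEHICLES_ON_ROAD = 1_250_000_000

# Shares of all on-road vehicles, per scenario.
IN_BUILT_SHARE = {"low": 0.05, "avg": 0.12, "high": 0.16}
OBD2_SHARE = {"low": 0.20, "avg": 0.40, "high": 0.50}


def connected_vehicles(scenario: str) -> dict:
    """Return the implied vehicle counts for one scenario.

    Assumption: in-built and OBD2 populations do not overlap,
    so the combined total is a plain sum.
    """
    in_built = round(VEHICLES_ON_ROAD * IN_BUILT_SHARE[scenario])
    obd2 = round(VEHICLES_ON_ROAD * OBD2_SHARE[scenario])
    return {"in_built": in_built, "obd2": obd2, "total": in_built + obd2}


for name in ("low", "avg", "high"):
    print(name, connected_vehicles(name))
```

Under these assumptions, the average scenario works out to 150 million vehicles with in-built connectivity and 500 million connected via OBD2.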

On the manufacturing side, we distinguished between simple parts and complex systems in terms of the amount of data they generate. Based on our combined experience with automotive industry data, we once again made three sets of estimates, varying the number of complex systems per vehicle in addition to the amount of data they generate:

  • Data Per Simple Part: Avg: 0.000004 GB / Low: 0.000004 GB / High: 0.001 GB
  • Data Per Complex System: Avg: 0.25 GB / Low: 0.25 GB / High: 1 GB
  • Number of Complex Systems: Avg: 40 / Low: 20 / High: 100
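
The manufacturing side can be sketched in a few lines of Python. This is our reading of the arithmetic, not necessarily how the spreadsheet is laid out: it assumes every one of the 30,000 parts contributes the simple-part figure, that complex systems are counted on top of that, and that 1 PB = 1,000,000 GB. Under those assumptions the totals come out at roughly 409.6, 809.6 and 10,400 PB, in line with the manufacturing figures reported in the results.

```python
# Inputs taken from the article.
VEHICLES_BUILT = 80_000_000   # vehicles manufactured in 2019
PARTS_PER_VEHICLE = 30_000    # approximate parts per vehicle
GB_PER_PB = 1_000_000         # assumption: decimal units

SCENARIOS = {
    "low":  {"gb_per_part": 0.000004, "gb_per_system": 0.25, "systems": 20},
    "avg":  {"gb_per_part": 0.000004, "gb_per_system": 0.25, "systems": 40},
    "high": {"gb_per_part": 0.001,    "gb_per_system": 1.0,  "systems": 100},
}


def manufacturing_pb(scenario: dict) -> float:
    """Estimate annual manufacturing data in PB for one scenario.

    Assumption: all parts count as simple parts, and complex systems
    add their data on top.
    """
    per_vehicle_gb = (
        PARTS_PER_VEHICLE * scenario["gb_per_part"]
        + scenario["systems"] * scenario["gb_per_system"]
    )
    return VEHICLES_BUILT * per_vehicle_gb / GB_PER_PB


for name, scenario in SCENARIOS.items():
    print(f"{name}: {manufacturing_pb(scenario):,.1f} PB")
```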

Based on these figures, along with the number of seconds in a day (86,400) and the number of days in a year (365), we’re ready to make our comparison (almost). The last thing we need is an estimate of how much data each connected vehicle generates. That involves estimates of the number of signals being monitored, their information content, and the proportion of the day the vehicle is driving. You can see a breakdown of these figures in this Google Sheet.

As to the sources of all this data, for manufacturing they include product specifications, sensors on the production line, on-machine measurements, testing data and outputs from the electronic control units (ECUs). Regarding the sources of on-road data, we’re including everything generated by the vehicle’s sensors, ECUs, etc., that is accessible on the CAN network, with one notable exception: image data.

Now, before you cry “Foul!”, hear us out.

Obviously, images will account for the majority of data autonomous vehicles generate while driving, but that’s not the case for connected vehicles, and our analysis focuses on the latter. More to the point, we’re excluding image data for manufacturing as well as driving. While it might seem obvious that driving a connected vehicle would generate far more image data than manufacturing one, it’s worth remembering that the machine vision market for manufacturing grew ten percent in North America last year alone.

Manufacturing vs On-Road Data: The Results

Chart courtesy of Acerta.

By our average estimate, connected vehicles (in-built and OBD2 combined) generate 259 TB of data per day. Our low estimate puts the number at 34 TB and our high estimate puts it at 12 PB. Here are the total estimates for data generated by connected vehicles on the road in 2019:

  • Low: 12 PB
  • Average: 94 PB
  • High: 4399 PB
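
The step from daily to annual totals is a multiplication by 365. The short Python sketch below redoes it from the rounded daily figures quoted above, assuming 1 PB = 1,000 TB. The low and average results round to the listed 12 PB and 94 PB; the high result lands slightly under the listed total, presumably because the 12 PB daily figure is itself rounded.

```python
DAYS_PER_YEAR = 365
TB_PER_PB = 1_000  # assumption: decimal units

# Rounded daily estimates from the article, in TB (high is stated as 12 PB).
DAILY_TB = {"low": 34, "avg": 259, "high": 12_000}


def annual_pb(daily_tb: float) -> float:
    """Convert a daily volume in TB into an annual volume in PB."""
    return daily_tb * DAYS_PER_YEAR / TB_PER_PB


for name, daily in DAILY_TB.items():
    print(f"{name}: {annual_pb(daily):,.1f} PB per year")
```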

And here are our total estimates for data generated by vehicle manufacturing in 2019:

  • Low: 409 PB
  • Average: 809 PB
  • High: 10400 PB

For all the emphasis that’s placed on connected vehicle data, it’s worth noting that there are only two cases in which connected vehicles generate more data than manufacturing, both using the high estimate for connected vehicles. In all other combinations, making cars generates significantly more data than driving them.
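
That comparison is easy to check by enumerating all nine pairings of the two sets of totals. The sketch below uses the rounded annual figures listed above; the names are illustrative only.

```python
from itertools import product

# Annual totals in PB, as listed in the article.
ON_ROAD_PB = {"low": 12, "avg": 94, "high": 4399}
MANUFACTURING_PB = {"low": 409, "avg": 809, "high": 10400}


def road_exceeds_manufacturing() -> list:
    """List the (on-road, manufacturing) scenario pairs where driving
    generates more data than manufacturing."""
    return [
        (road, mfg)
        for road, mfg in product(ON_ROAD_PB, MANUFACTURING_PB)
        if ON_ROAD_PB[road] > MANUFACTURING_PB[mfg]
    ]


print(road_exceeds_manufacturing())

# Ratio of manufacturing to on-road data for the average estimates.
avg_ratio = MANUFACTURING_PB["avg"] / ON_ROAD_PB["avg"]
print(f"average-case ratio: {avg_ratio:.1f}x")
```

Only the high on-road estimate paired with the low or average manufacturing estimate comes out ahead. For the average estimates, manufacturing leads by a factor of roughly 8.6.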

Of course, this will change as the proportion of connected vehicles on the road increases—and a connected vehicle is a far cry from an autonomous one in terms of data volume. Nevertheless, the flood of automotive manufacturing data will continue, and unlike autonomous vehicles, it’s impacting revenue for automakers today.

Check out the Acerta Blog for more information.
