The VW Data Leak
We Know Where Your Car (and You) Slept Last Night!
It appears VW subsidiary Cariad, tasked with building a futuristic software platform for the company's electric vehicles, made a whoopsie. A massive amount of data, several terabytes' worth, was left unprotected in Amazon cloud storage.
Of course this data was behind authentication, but it was not encrypted, and a team of hackers from Germany was able to get their hands on it. So what exactly happened?
Before we get into that, let us talk a bit about VW.
Volkswagen is a conglomerate of car brands which use a common platform. Having a common platform to build on has advantages in terms of reduced costs and accelerated learning, but in this case it was a disadvantage: a single vulnerability in the platform exposed all the sub-brands under the VW name.
With that said, let us look at what happened here.
APIs leaking Information
A group of hackers found that a certain API, which should have been turned off, was actually on. This was the GET actuator/heapdump API, which dumps a large amount of binary data: a snapshot of the running application's memory. Someone forgot to turn this off in the system serving customer cars, letting the hackers download a huge binary dump.
Using strings, a command-line tool commonly used by hackers to pull readable text out of binary files, they found credentials for an AWS server. Logging in with these credentials revealed a treasure trove of data. At this stage, VW has already made a few mistakes: not turning off the heapdump API, not encrypting the stored data, and/or not having two-factor authentication in place.
Now let's look at the data, which is where the fun comes in.
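To make the credential hunt concrete, here is a minimal sketch of what a strings-style scan does. It is not the researchers' actual tooling. The helper names and the fake dump are made up for illustration, and the pattern assumes the usual shape of a long-lived AWS access key ID ("AKIA" followed by 16 uppercase letters or digits). The key shown is the placeholder value from AWS documentation, not a real credential.

```python
import re

def extract_strings(blob: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII, roughly what the Unix `strings` tool prints."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]

# Assumed shape of an AWS access key ID: "AKIA" plus 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

def find_aws_key_ids(blob: bytes) -> list[str]:
    """Scan every readable string in the blob for something shaped like a key ID."""
    hits = []
    for text in extract_strings(blob):
        hits.extend(AWS_KEY_ID.findall(text))
    return hits

# Synthetic stand-in for a heap dump: binary junk with readable text in between.
fake_dump = (
    b"\x00\x01JAVA PROFILE 1.0.2\x00\xff\xfe"
    b"aws.accessKeyId=AKIAIOSFODNN7EXAMPLE\x00\x07noise\x00"
)

print(find_aws_key_ids(fake_dump))
```

The point of the sketch is that a memory dump needs no clever parsing: anything the application held in memory as plain text, secrets included, falls out of a simple pattern search.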
The Data
There are 3 data buckets, which are stored separately.
User Data
This contains user data for 600,000 customers, including fields such as user ID, name, email, birth date and phone number.
Enrolment Data
VIN, model year, car model and user ID.
Event Data
Events with a K-anonymous ID. Events include charging information and location.
Before we look into the data itself, it is important to note that, according to VW, they do not combine these data sources internally. Such practices are fairly standard in a lot of companies when it comes to handling data. While data siloes can be bad for business, leading to limited insights, they can be great for data privacy. Thus, companies often define their own siloes to reduce privacy issues. A dataset collected for a certain purpose is used only for that purpose, and while it can technically be combined with other data sources, doing so would violate the terms of use the company put in place when collecting the data from customers.
Besides using data siloes, VW does a few other things for customer privacy, which we can infer from the dataset:
Have K-anonymity in place for event data: This means that no event can be attributed to a unique vehicle. At best, it can be narrowed down to a group of at least K vehicles. In the case of VW, we know that K >= 5.
Shortening of GPS coordinates: As per their terms of use, VW is supposed to shorten GPS coordinates, leading to a 10 km window of uncertainty when it comes to location. The issue is that they do this anonymisation for Skoda and Audi, but seem to have forgotten to do it for the VW and SEAT brands.
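Both measures are easy to sketch. The code below is only an illustration under assumptions, not VW's implementation: it assumes coordinates are shortened by truncating to one decimal place (a cell about 11 km tall, in the same ballpark as the 10 km window), and it measures K as the size of the smallest group of vehicles sharing one anonymous ID. The function names and sample data are hypothetical.

```python
import math
from collections import Counter

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude

def shorten(lat: float, lon: float, decimals: int = 1) -> tuple[float, float]:
    """Truncate coordinates to a grid cell (assumed scheme, for illustration only)."""
    factor = 10 ** decimals
    return (math.floor(lat * factor) / factor, math.floor(lon * factor) / factor)

def smallest_group(vehicle_to_anon_id: dict[str, str]) -> int:
    """Return the K actually achieved: the smallest number of vehicles sharing one ID."""
    sizes = Counter(vehicle_to_anon_id.values())
    return min(sizes.values())

# Hypothetical precise position and its shortened form.
precise = (52.4227, 10.7865)
coarse = shorten(*precise)
cell_height_km = 0.1 * KM_PER_DEGREE_LAT  # height of a one-decimal cell

# Hypothetical mapping: 5 vehicles share one anonymous ID, 6 share another.
mapping = {f"car{i}": "group-a" for i in range(5)}
mapping.update({f"car{i}": "group-b" for i in range(5, 11)})

print(coarse, round(cell_height_km, 1), smallest_group(mapping))
```

Skipping the shorten step for a brand is exactly the kind of omission described above: the pipeline still runs, it just ships coordinates accurate to a parking space.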
Please note that all the information presented in this article, including the plots that follow, is taken from this video on the CCC website. For anyone looking to dig deeper into this hack, I recommend watching the video.
So at this point we have 3 mistakes from VW: Leaving an API on, not encrypting personal data and not anonymising location.
Now let us actually look at the data:
So first off you can see where cars are in different cities or even across the whole of Europe:
Besides this, you can narrow down to see who is visiting which location:
For example, embassies:
You can also narrow down to other spots like police stations, brothels and religious sites. Furthermore, you can also find where people with certain email addresses live by combining the data sources.
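Here is a rough sketch of what combining the three buckets could look like. Joining user data to enrolment data on user ID follows from the fields listed earlier. How events were tied back to a vehicle is not spelled out in this article, so the "vin" key on each event below is purely an assumption for illustration. All records are invented.

```python
from collections import Counter

# Invented sample records mirroring the three buckets.
users = {
    "u1": {"name": "Alice Example", "email": "alice@example.org"},
    "u2": {"name": "Bob Example", "email": "bob@example.org"},
}
enrolments = [
    {"vin": "VIN-0001", "user_id": "u1"},
    {"vin": "VIN-0002", "user_id": "u2"},
]
# ASSUMPTION: events can be keyed to a VIN; the real linkage is not described here.
events = [
    {"vin": "VIN-0001", "hour": 23, "lat": 52.42, "lon": 10.79},
    {"vin": "VIN-0001", "hour": 2, "lat": 52.42, "lon": 10.79},
    {"vin": "VIN-0001", "hour": 14, "lat": 52.27, "lon": 10.52},
    {"vin": "VIN-0002", "hour": 1, "lat": 48.14, "lon": 11.58},
]

def is_night(hour: int) -> bool:
    """Treat 22:00 to 06:00 as night."""
    return hour >= 22 or hour < 6

def overnight_spot(vin: str):
    """Most frequent night-time position of a vehicle, a crude guess at 'home'."""
    spots = Counter(
        (e["lat"], e["lon"]) for e in events if e["vin"] == vin and is_night(e["hour"])
    )
    return spots.most_common(1)[0][0] if spots else None

def locate_by_email(email: str) -> dict:
    """Email -> user ID -> VIN -> most common overnight position."""
    user_ids = [uid for uid, u in users.items() if u["email"] == email]
    vins = [e["vin"] for e in enrolments if e["user_id"] in user_ids]
    return {vin: overnight_spot(vin) for vin in vins}

print(locate_by_email("alice@example.org"))
```

Each bucket on its own looks fairly harmless. The privacy damage comes from the joins, which is why keeping the siloes separate only helps as long as nobody gets hold of all of them at once.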
The Aftermath
Volkswagen has acknowledged a significant data leak involving its software subsidiary, Cariad, which exposed the personal information of approximately 800,000 electric vehicle (EV) owners. The breach allowed access to sensitive data, including precise location information for about 460,000 vehicles, and was reportedly accessible for several months before being discovered by the Chaos Computer Club (CCC) on November 26, 2024.
In an official statement, Volkswagen clarified that the exposed data did not include passwords or payment information. They emphasized that accessing the data required bypassing multiple security mechanisms, which involved a high level of expertise and considerable time investment. The company reassured users that the CCC was able to access only pseudonymized vehicle data, meaning it did not allow for easy identification of individual users.
Volkswagen has since rectified the error and is conducting a thorough investigation to determine further necessary actions.
According to German news magazine Der Spiegel, the EU Data Act, which becomes applicable in the latter half of 2025, should help prevent such situations in the future. The act would require automakers to share data with suppliers much more readily than today. The hope is that this brings more transparent data collection practices and some standardisation across the industry. While this might work, providing data to suppliers would also mean a wider surface area for data leaks, which could actually make the situation worse.
Europe’s data landscape is already overregulated with GDPR, and bringing in the EU Data Act would only make the problem worse. This might encourage automakers to look to other markets to harvest data for understanding their customers. However, this would mean the European customer ends up with a car designed primarily for an American or a Chinese customer.
While the data leak reflects poorly on Volkswagen, I am not sure the nature of the leak can be solved with another piece of regulation. There seem to be a few genuine issues in Volkswagen’s handling of data, which they need to resolve. An audit would be advisable to find other potential issues in Volkswagen’s data processes that the hack did not uncover.
Privacy By Design
Furthermore, automakers need to focus more on Privacy by Design. Even Volkswagen’s intended practices, K-anonymity combined with a location uncertainty of 10 km, could be bad for privacy in certain cases. One such case is the lonely farmer problem, where there is only one house within a 10 km radius, allowing the company to identify a customer if a car is parked overnight in that location. Such edge cases, while challenging, can be resolved with clever anonymisation techniques.
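One way to handle the lonely farmer case is suppression: only release a shortened location if enough homes fall into the same grid cell. The sketch below assumes a one-decimal grid and a threshold of 5 homes per cell. Both values, the function names and the sample coordinates are illustrative choices, not anything Volkswagen has described.

```python
import math
from collections import Counter

def cell(lat: float, lon: float, decimals: int = 1) -> tuple[float, float]:
    """Map a position to its grid cell by truncating the coordinates."""
    factor = 10 ** decimals
    return (math.floor(lat * factor) / factor, math.floor(lon * factor) / factor)

def safe_cells(homes: list[tuple[float, float]], k: int = 5) -> set:
    """Cells that contain at least k homes and are therefore safe to release."""
    counts = Counter(cell(lat, lon) for lat, lon in homes)
    return {c for c, n in counts.items() if n >= k}

def release(lat: float, lon: float, allowed: set):
    """Return the shortened location, or None if the cell is too sparsely populated."""
    c = cell(lat, lon)
    return c if c in allowed else None

# Invented example: six homes in one town cell, one isolated farm elsewhere.
town = [(52.41 + i * 0.001, 10.71 + i * 0.001) for i in range(6)]
farm = [(53.95, 9.15)]
allowed = safe_cells(town + farm, k=5)

print(release(52.412, 10.713, allowed))  # town resident: cell is released
print(release(53.95, 9.15, allowed))     # lonely farmer: suppressed
```

The trade-off is that sparsely populated regions disappear from the dataset, so a real system would more likely merge small cells into larger ones than drop them outright.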
Bringing together a combination of anonymisation techniques such as suppression, adding noise to data, federated analytics and encryption can enable automakers to maximise business value from data without risking customer privacy. I hope this incident is a wake-up call for Volkswagen to fix their issues.
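As a small illustration of adding noise, the sketch below perturbs an aggregate count with Laplace noise, which is one common choice. The article does not name a specific scheme, so treat this as one possible reading. The count of 120 charging events, the scale of 2.0 and the function names are all made up.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample, built as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_count(true_count: int, scale: float, rng: random.Random) -> float:
    """Report a count with noise added, so that no single record stands out."""
    return true_count + laplace_noise(scale, rng)

rng = random.Random(42)  # seeded only to make the demo repeatable

# Hypothetical aggregate: 120 charging events in some region.
samples = [noisy_count(120, scale=2.0, rng=rng) for _ in range(10_000)]
average = sum(samples) / len(samples)

print(round(samples[0], 2), round(average, 2))
```

Any single reported value is off by a few events, but the average over many reports stays close to the truth, which is what lets the business keep its insights while individual customers blur into the noise.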





