What can we guess from an email address?

This blog post shows how much data you can infer from an arbitrary email sign-up. You will be surprised how much you can learn, and how accurately, with a good Bayesian guessing approach.

My IP address

So let's see what we get when you sign up for a newsletter. First, we have your IP address. In this example I will use my current address, 91.64.150.78.

Services like MaxMind tell you where the person behind a given IP address is located, with an accuracy of about 15 km. Here is what I get:

IP Address: 91.64.150.78
Country Code: DE
Location: Teltow, Brandenburg, Germany, Europe
Postal Code: 14513
Coordinates: 52.4, 13.2667
ISP: Kabel Deutschland

In reality I am currently at Dortustr. 57 in Potsdam, which is 18.5 km away. So MaxMind's guess is pretty accurate.
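To check a guess like this yourself, you can compute the great-circle distance between the coordinates MaxMind returns and a known position. The following is a minimal sketch using the haversine formula. The Potsdam coordinates are a rough assumption for the city centre and are not taken from a geocoder, so the result is a straight-line estimate that need not match a route distance or the figure quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates reported by the lookup above.
MAXMIND_GUESS = (52.4, 13.2667)
# Assumption: rough coordinates for central Potsdam, for illustration only.
POTSDAM_APPROX = (52.40, 13.06)

distance = haversine_km(*MAXMIND_GUESS, *POTSDAM_APPROX)
print(f"Straight-line distance: {distance:.1f} km")
```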

My HTTP request

Next comes the HTTP request that my browser sends when I submit the sign-up form:

POST /semRecSys-rest/NewsletterService/6ecb5bf5-3580-445c-a3d2-bc64493e19b7 HTTP/1.1
Host: recsys.incentergy.de
Connection: keep-alive
Content-Length: 46
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: http://www.incentergy.de
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: http://www.incentergy.de/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en,de;q=0.8
Cookie: evertimeCookie=d6d46657-31d9-4713-8e81-09695591f8e5; AWSELB=B7DB2B2316675BAFCE6F4ECBED33D738EADC2F06DB1CED99806F8AB0AD854BD143BE1BD276D82B2570B0C1BC51175259A6C92A8F7DF56A3C17A112737899AA98ED97141989; __utma=124758746.791261515.1383593642.1389706837.1390039665.40; __utmb=124758746.2.10.1390039665; __utmc=124758746; __utmz=124758746.1389028775.37.6.utmcsr=incentergy|utmccn=newsletter_2624FE5B-F07F-4EC5-8095-2DEE366ACA5C|utmcmd=email

mbox=manuel.blechschmidt%40gmail.com&submit=go

You can see that my browser automatically transmits a lot of information about me. The most important part is that I am using a Mac with Chrome. In addition, Google Analytics did its job and already added some more information about me through the __utm cookies, for example that I arrived via a newsletter email campaign.
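As a rough illustration of how these signals can be pulled out of the request, here is a minimal sketch using only the Python standard library. The substring checks on the User-Agent are a deliberate simplification and the helper names are hypothetical; a real system would more likely use a dedicated user-agent parser.

```python
from urllib.parse import parse_qs

# Values copied from the request shown above.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
)
BODY = "mbox=manuel.blechschmidt%40gmail.com&submit=go"


def guess_platform(user_agent):
    """Very rough platform guess based on well-known User-Agent tokens."""
    if "Macintosh" in user_agent:
        return "Mac"
    if "Windows" in user_agent:
        return "Windows"
    if "Linux" in user_agent:
        return "Linux"
    return "unknown"


def guess_browser(user_agent):
    """Very rough browser guess. Chrome also sends 'Safari', so check Chrome first."""
    if "Chrome/" in user_agent:
        return "Chrome"
    if "Firefox/" in user_agent:
        return "Firefox"
    if "Safari/" in user_agent:
        return "Safari"
    return "unknown"


# parse_qs decodes the form body, turning %40 back into '@'.
form = parse_qs(BODY)
email = form["mbox"][0]

print(guess_platform(USER_AGENT), guess_browser(USER_AGENT), email)
```

The order of the checks matters: Chrome's User-Agent string also contains the token Safari, so testing for Safari first would misclassify it.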

My email address itself

As you can see, my email address contains my first and last name. It is easy to guess that my first name is Manuel and my last name is Blechschmidt. With this data we can do more. Let's start:
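Before moving on, the name guess itself can be sketched in a few lines. This is only a heuristic, assuming the common firstname.lastname pattern; the function name guess_name is hypothetical, and addresses that do not follow the pattern simply yield no guess.

```python
import re


def guess_name(email):
    """Guess (first_name, last_name) from an address like first.last@example.com.

    Returns None when the local part does not look like two or more name parts.
    """
    local_part, _, _domain = email.partition("@")
    # Split on common separators and keep only purely alphabetic pieces.
    parts = [p for p in re.split(r"[._-]+", local_part) if p.isalpha()]
    if len(parts) < 2:
        return None
    return parts[0].capitalize(), parts[-1].capitalize()


print(guess_name("manuel.blechschmidt@gmail.com"))
```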

Gender

That is the easiest one. Manuel is a male given name. Easy 🙂

Age

As in fashion, there are trends in naming: in some years parents are more likely to pick a particular name than in others. A very famous example is Kevin. There is even a term for the phenomenon: Kevinism.

So let's have a look at how old I might be, given my name Manuel.

Wolfram Alpha has some really nice stats for given names:

[Image: Wolfram Alpha statistics for the first name Manuel]

So if I were American, I would most likely be around 5 or around 18 years old. But stop: we already figured out from my IP address that I am German. So let's get the data for Germany:

http://www.beliebte-vornamen.de/9264-manuel.htm

[Image: popularity of the German given name Manuel]

Hmm, so Manuel was quite common between 1970 and 1992. In reality I was born in 1986 and am 27.
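This kind of reasoning is where the Bayesian guessing comes in: the probability of a birth year given the name is proportional to how popular the name was in that year times how likely that birth year is in general. The sketch below uses made-up numbers purely for illustration. The name shares, the uniform prior, the year range and the reference year are all assumptions, not data from beliebte-vornamen.de.

```python
def posterior_birth_year(name_share, prior):
    """Bayes' rule: P(year | name) is proportional to P(name | year) * P(year)."""
    unnormalized = {year: name_share[year] * prior[year] for year in name_share}
    total = sum(unnormalized.values())
    return {year: weight / total for year, weight in unnormalized.items()}


YEARS = range(1950, 2001)
# Hypothetical share of newborn boys named Manuel: higher in 1970-1992, low otherwise.
NAME_SHARE = {y: (0.01 if 1970 <= y <= 1992 else 0.001) for y in YEARS}
# Assumption: every birth year is equally likely before we see the name.
PRIOR = {y: 1 / len(YEARS) for y in YEARS}
# Assumption: the year in which the guess is made.
REFERENCE_YEAR = 2014

posterior = posterior_birth_year(NAME_SHARE, PRIOR)
mass_in_window = sum(p for y, p in posterior.items() if 1970 <= y <= 1992)
expected_age = sum((REFERENCE_YEAR - y) * p for y, p in posterior.items())

print(f"P(born 1970-1992 | Manuel) = {mass_in_window:.2f}")
print(f"Expected age = {expected_age:.0f}")
```

With real per-year name statistics and a prior taken from the actual age distribution of a site's audience, the same function would give a sharper estimate.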

Creditworthiness

OK, I have an age and a location. Can I now say something about creditworthiness? Yes, I can. It is called geo scoring. Furthermore, salary is correlated with age.

Because I don't have accurate scoring data for zip codes, I will just use the average rental prices for apartments (Mietspiegel), which most German municipalities publish according to § 558c Abs. 4 BGB. Immobilienscout24 also collects and visualizes this data.

[Image: rent index (Mietspiegel) for Teltow and the surrounding area]

As you can see, Teltow is not that expensive, so my creditworthiness based on this information is not that good. This information is also a little unreliable.

Wait, we figured out something in the HTTP request. I am using a Mac. Can we use this? Yes, we can.

Orbitz found that people who use a Mac spend up to 30% more on a hotel room.

Marital status

OK, the next thing would be marital status. Again we can use age and location. The older I get, the more likely I am to be married. And if I live in a city, I am more likely to be single.

Children

Currently I have no idea how to guess how many children a person has based on the given data. Any ideas are appreciated.

Data from other networks

Ad networks like DoubleClick or iAd already offer to transmit specific data about a user, and this data can later be used for bidding on advertisements. It would be possible to use this data too.

Context of the website

Every website has its target group. This can also be taken into account for guessing information about you.

If you are eager to learn more and how to use all this information about internet users to make them happier and increase your revenue, subscribe to our newsletter in the upper right corner or contact us.
