Round-Robin Balancing Implementation Question
biolizard89
- Posts: 2001
- Joined: Tue Jun 05, 2012 6:25 am
- os: linux
Round-Robin Balancing Implementation Question
The .bit spec includes round-robin load balancing for IPv4 and IPv6, and there was consensus for adding this feature for other resolvers such as Tor and I2P. I have a question about implementation.
With typical DNS, I'm under the impression that a name that uses round-robin load balancing will resolve to one randomly picked IP at the DNS client, and that choice will persist for several hours due to caching. However, with Namecoin, the DNS server is often on localhost, and there is often no need to cache the result of the random selection. The simplest way to implement the random selection in Convergence for Namecoin is to do the selection when each TCP request is made (instead of picking result[0], we pick result[rand()]).
So, my question is: will web servers freak out if a single user sends TCP requests to multiple IPs that are load-balanced? For example, I might load the main HTML page from result[0], and the images on the page would be evenly split between result[0] and result[1]. This isn't something that web servers commonly encounter now... so I wonder if it will cause problems in some cases, particularly for dynamic sites.
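As a minimal sketch of that per-request selection (function and variable names are hypothetical, not Convergence's actual API):

```python
import random

def pick_address(addresses):
    """Pick one of the resolved A/AAAA records for each new TCP
    connection, instead of always using addresses[0]."""
    if not addresses:
        raise ValueError("name resolved to no addresses")
    return random.choice(addresses)

# Two load-balanced servers; each call may return either one.
ips = ["192.0.2.1", "192.0.2.2"]
chosen = pick_address(ips)
```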
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
biolizard89 wrote:So, my question is, will web servers freak out if a single user sends TCP requests to multiple IP's that are load-balanced? For example, I might load the main HTML page from result[0], and the images on the page will be evenly split between result[0] and result[1]. This definitely isn't something that web servers encounter now... so I wonder if it will cause problems in some cases, particularly for dynamic sites.
Good catch! Yes, I think it should be cached for some reasonable amount of time. You should throw this in as a NEP!
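Caching the choice for a reasonable amount of time could look something like the sketch below: a small TTL cache in the local resolver that keeps the randomly chosen IP sticky per name (the class name and TTL value are made up for illustration):

```python
import random
import time

class StickyPicker:
    """Cache the randomly chosen IP per name for a fixed TTL,
    mimicking the stickiness a remote caching resolver would give."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.cache = {}  # name -> (chosen_ip, expiry_time)

    def pick(self, name, addresses):
        now = time.monotonic()
        hit = self.cache.get(name)
        # Reuse the cached choice while it is fresh and still valid.
        if hit and hit[1] > now and hit[0] in addresses:
            return hit[0]
        choice = random.choice(addresses)
        self.cache[name] = (choice, now + self.ttl)
        return choice
```

Within one TTL window every connection to the same name lands on the same server, which addresses the consistency concern raised below.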
DNS is much more than a key->value datastore.
gigabytecoin
- Posts: 67
- Joined: Tue May 10, 2011 12:49 am
- os: linux
- Location: Behind 50 Proxies
Re: Round-Robin Balancing Implementation Question
Web servers are fairly dumb. They respond to HTTP requests. That's about it.
There shouldn't be any inherent problems in loading various parts of a single website from different web servers.
If the two web servers have different domain names then you get into TLS problems. But as for downloading various pieces of a web page from multiple web servers, that's exactly what web servers are designed to do.
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
gigabytecoin wrote:Web servers are fairly dumb. They respond to HTTP requests. That's about it.
There shouldn't be any inherent problems in loading various parts of a single website from different web servers.
If the two web servers have different domain names then you get into TLS problems. But as for downloading various pieces of a web page from multiple web servers, that's exactly what web servers are designed to do.
There could be different backend databases with eventual consistency between them. Throwing the client to a new server every page load could give them inconsistent results.
Last edited by indolering on Fri Jan 03, 2014 3:40 am, edited 1 time in total.
DNS is much more than a key->value datastore.
biolizard89
- Posts: 2001
- Joined: Tue Jun 05, 2012 6:25 am
- os: linux
Re: Round-Robin Balancing Implementation Question
indolering wrote:There could be different backend databases with eventual consistency between them. Throwing the client to a new server every page load could give them inconsistent results.
Would you happen to have a link to some info documenting this?
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
biolizard89 wrote:Would you happen to have a link to some info documenting this?
It happens on Facebook all the time. The delay is usually a few minutes, and since it's all AJAX it appears instant for one user, but it may take a few minutes to propagate elsewhere.
But no, I have no hard data on this nor would I expect there to be any rules of thumb here. You might post to some of the NoSQL mailing lists.
DNS is much more than a key->value datastore.
biolizard89
- Posts: 2001
- Joined: Tue Jun 05, 2012 6:25 am
- os: linux
Re: Round-Robin Balancing Implementation Question
indolering wrote:It happens on Facebook all the time. The delay is usually a few minutes and since it's all ajax it appears instant for one user but it may take a few to propagate elsewhere. But no, I have no hard data on this nor would I expect there to be any rules of thumb here. You might post to some of the NoSQL mailing lists.
I'll take your word for it.
Unfortunately, this complicates things a lot from a privacy perspective, because I imagine it would be easy for a malicious website to fingerprint users based on which random IPs get used. Let's say I possess 1024 .bit domains and 32 IP addresses (probably quite cheap for an attacker). If I'm doing my math right, I can now reduce your anonymity set by a factor of 32^1024 for the duration of the caching, across all websites. 32^1024 = 2^5120, an enormously large number, far greater than the number of atoms in the universe if I'm not mistaken, i.e. all of your anonymity is destroyed. Furthermore, unless the DNS caches for all websites expire simultaneously, I can still fingerprint you with a nearest-neighbor search as long as you stay connected to at least one website I control (which could be as long as you leave your browser open).
It's not a problem with ICANN domains via Tor, because the randomness will change whenever you change Tor circuits; but since we're doing the resolution locally, fingerprinting becomes much harder to avoid.
Do you (or anyone else) have suggestions for an algorithm to choose which IP gets used for a .bit domain that doesn't enable fingerprinting? If we can hook the identity variable being passed to Tor's SOCKS proxy, that might be sufficient... any thoughts?
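The arithmetic behind that estimate: each of the 1024 domains leaks log2(32) = 5 bits of identifying information, for 5120 bits of fingerprint in total, vastly more than is needed to single out any user on Earth:

```python
import math

domains = 1024        # .bit names the attacker controls
ips_per_domain = 32   # addresses each name round-robins over

fingerprint_bits = domains * math.log2(ips_per_domain)
distinct_fingerprints = ips_per_domain ** domains  # 32**1024

print(fingerprint_bits)                # 5120.0
print(distinct_fingerprints > 10**80)  # True: exceeds the ~10^80 atoms in the observable universe
```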
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
biolizard89 wrote:Unfortunately, this complicates things a lot from a privacy perspective, because I imagine it would be easy for a malicious website to fingerprint users based on which random IP's get used. Let's say I possess 1024 .bit domains and 32 IP addresses (probably quite cheap for an attacker). If I'm doing my math right, now I can reduce your anonymity set by a factor of 32^1024 for the duration of the caching, across all websites.
Maybe I've had one too many beers, but wouldn't the client have to be connected to those websites at the same time? Then your 1024 number is meaningless. And why wouldn't they just perform timing analysis if they control both websites?
biolizard89 wrote:It's not a problem with ICANN domains via Tor because the randomness will change whenever you change Tor circuits, but since we're doing the resolution locally, this becomes way harder to avoid fingerprinting.
So why not just tie the cache expiry to changing circuits? Perhaps that's what you mean by the identity variable.... My main thought is that you should engage on the Tor mailing list and IRC. They were kind enough to clear up misconceptions I had a few years ago; good people.
DNS is much more than a key->value datastore.
biolizard89
- Posts: 2001
- Joined: Tue Jun 05, 2012 6:25 am
- os: linux
Re: Round-Robin Balancing Implementation Question
indolering wrote:Maybe I've had one too many beers, but wouldn't the client have to be connected to those websites at the same time? Then your 1024 number is meaningless. And why wouldn't they just perform timing analysis if they control both websites?
If I own a website that the user visits, I can embed 1024 <img> elements which each connect to a different .bit domain, thus allowing me to do high-res fingerprinting. It wouldn't be hard for an ad network to embed such code.
indolering wrote:So why not just tie the cache expiry to changing circuits? Perhaps that's what you mean by the identity variable.... My main thought is that you should engage on the Tor mailing list and IRC. They were kind enough to clear up misconceptions I had a few years ago; good people.
Yeah, I've been meaning to talk to the Tor guys for a while... I had a partial conversation with one of the Tor people a couple months ago and then got distracted by upcoming final exams. And yes, tying it to circuits is what I meant by the identity variable... except I'm not sure whether the identity variable changes every 10 minutes along with the circuit rotation, and I'm unaware of another way to detect changing circuits from the Firefox end of things.
In any event, this is complex enough that I'm not going to attempt load balancing for a while.
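For whenever this does get attempted, one candidate algorithm along the lines discussed above: derive the choice deterministically from the current Tor identity plus the domain, so the selection is stable within one circuit but re-randomizes when the circuit rotates. This is only a sketch; `circuit_identity` stands in for whatever isolation token (e.g. the SOCKS username) turns out to be hookable.

```python
import hashlib

def pick_for_circuit(domain, addresses, circuit_identity):
    """Deterministically pick one of `addresses` for `domain`, keyed on
    the current Tor circuit identity. The choice is stable for the life
    of the circuit and re-randomizes on rotation, so per-domain choices
    can't serve as a long-lived cross-site fingerprint."""
    digest = hashlib.sha256(
        circuit_identity.encode() + b"|" + domain.encode()
    ).digest()
    index = int.from_bytes(digest[:8], "big") % len(addresses)
    return addresses[index]
```

Because every site observed through the same circuit sees selections derived from the same key, the attacker's 1024 domains all rotate together, instead of each leaking independent long-lived bits.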
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
biolizard89 wrote:If I own a website that the user visits, I can embed 1024 <img> elements which each connect to a different .bit domain, thus allowing me to do high-res fingerprinting. It wouldn't be hard for an ad network to embed such code.
So, yes?
biolizard89 wrote:In any event, this is complex enough that I'm not going to attempt load balancing for a while.
Tor is always very complicated :p
DNS is much more than a key->value datastore.