Round-Robin Balancing Implementation Question
biolizard89
- Posts: 2001
- Joined: Tue Jun 05, 2012 6:25 am
- os: linux
Round-Robin Balancing Implementation Question
The .bit spec includes round-robin load balancing for IPv4 and IPv6, and there was consensus for adding this feature for other resolvers such as Tor and I2P. I have a question about implementation.
With typical DNS, I'm under the impression that a name that uses round-robin load balancing will resolve to one randomly picked IP at the DNS client, and that choice will persist for several hours due to caching. However, with Namecoin, the DNS server is often on localhost, and there is often no need to cache the result of the random selection. The simplest way to implement the random selection in Convergence for Namecoin is to do the selection when each TCP request is made (instead of picking result[0], we pick result[rand()]).
So, my question is: will web servers freak out if a single user sends TCP requests to multiple IPs that are load-balanced? For example, I might load the main HTML page from result[0], and the images on the page would be evenly split between result[0] and result[1]. This isn't something that web servers commonly encounter now... so I wonder if it will cause problems in some cases, particularly for dynamic sites.
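As a minimal sketch of that per-request selection (function and variable names are hypothetical, not Convergence's actual API):

```python
import random

def pick_address(addresses):
    """Pick one of the resolved A/AAAA records for each new TCP
    connection, instead of always using addresses[0]."""
    if not addresses:
        raise ValueError("name resolved to no addresses")
    return random.choice(addresses)

# Two load-balanced servers; each call may return either one.
ips = ["192.0.2.1", "192.0.2.2"]
chosen = pick_address(ips)
```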
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
biolizard89 wrote:So, my question is, will web servers freak out if a single user sends TCP requests to multiple IP's that are load-balanced? For example, I might load the main HTML page from result[0], and the images on the page will be evenly split between result[0] and result[1]. This definitely isn't something that web servers encounter now... so I wonder if it will cause problems in some cases, particularly for dynamic sites.
Good catch! Yes, I think it should be cached for some reasonable amount of time. You should throw this in as a NEP!
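Caching the choice for a reasonable amount of time could look something like the sketch below: a small TTL cache in the local resolver that keeps the randomly chosen IP sticky per name (the class name and TTL value are made up for illustration):

```python
import random
import time

class StickyPicker:
    """Cache the randomly chosen IP per name for a fixed TTL,
    mimicking the stickiness a remote caching resolver would give."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.cache = {}  # name -> (chosen_ip, expiry_time)

    def pick(self, name, addresses):
        now = time.monotonic()
        hit = self.cache.get(name)
        # Reuse the cached choice while it is fresh and still valid.
        if hit and hit[1] > now and hit[0] in addresses:
            return hit[0]
        choice = random.choice(addresses)
        self.cache[name] = (choice, now + self.ttl)
        return choice
```

Within one TTL window every connection to the same name lands on the same server, which addresses the consistency concern raised below.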
DNS is much more than a key->value datastore.
gigabytecoin
- Posts: 67
- Joined: Tue May 10, 2011 12:49 am
- os: linux
- Location: Behind 50 Proxies
Re: Round-Robin Balancing Implementation Question
Web servers are fairly dumb. They respond to HTTP requests. That's about it.
There shouldn't be any inherent problems in loading various parts of a single website from different web servers.
If the two web servers have different domain names then you get into TLS problems. But as for downloading various pieces of a web page from multiple web servers, that's exactly what web servers are designed to do.
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
gigabytecoin wrote:Web servers are fairly dumb. They respond to HTTP requests. That's about it.
There shouldn't be any inherent problems in loading various parts of a single website from different web servers.
If the two web servers have different domain names then you get into TLS problems. But as for downloading various pieces of a web page from multiple web servers, that's exactly what web servers are designed to do.
There could be different backend databases with eventual consistency between them. Throwing the client to a new server every page load could give them inconsistent results.
Last edited by indolering on Fri Jan 03, 2014 3:40 am, edited 1 time in total.
DNS is much more than a key->value datastore.
biolizard89
- Posts: 2001
- Joined: Tue Jun 05, 2012 6:25 am
- os: linux
Re: Round-Robin Balancing Implementation Question
indolering wrote:There could be different backend databases with eventual consistency between them. Throwing the client to a new server every page load could give them inconsistent results.
Would you happen to have a link to some info documenting this?
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
biolizard89 wrote:Would you happen to have a link to some info documenting this?
It happens on Facebook all the time. The delay is usually a few minutes, and since it's all AJAX it appears instant for one user, but it may take a few minutes to propagate elsewhere.
But no, I have no hard data on this nor would I expect there to be any rules of thumb here. You might post to some of the NoSQL mailing lists.
DNS is much more than a key->value datastore.
biolizard89
- Posts: 2001
- Joined: Tue Jun 05, 2012 6:25 am
- os: linux
Re: Round-Robin Balancing Implementation Question
indolering wrote:It happens on Facebook all the time. The delay is usually a few minutes and since it's all ajax it appears instant for one user but it may take a few to propagate elsewhere. But no, I have no hard data on this nor would I expect there to be any rules of thumb here. You might post to some of the NoSQL mailing lists.
I'll take your word for it.
Unfortunately, this complicates things a lot from a privacy perspective, because I imagine it would be easy for a malicious website to fingerprint users based on which random IPs get used. Let's say I possess 1024 .bit domains and 32 IP addresses (probably quite cheap for an attacker). If I'm doing my math right, I can now reduce your anonymity set by a factor of 32^1024 for the duration of the caching, across all websites. 32^1024 = 2^5120, an enormously large number, far greater than the number of atoms in the universe if I'm not mistaken, i.e. all of your anonymity is destroyed. Furthermore, unless the DNS caches for all websites expire simultaneously, I can still fingerprint you with a nearest-neighbor search as long as you stay connected to at least one website I control (which could be as long as you leave your browser open).
It's not a problem with ICANN domains via Tor, because the randomness will change whenever you change Tor circuits; but since we're doing the resolution locally, fingerprinting becomes much harder to avoid.
Do you (or anyone else) have suggestions for an algorithm to choose which IP gets used for a .bit domain that doesn't enable fingerprinting? If we can hook the identity variable being passed to Tor's SOCKS proxy, that might be sufficient... any thoughts?
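The arithmetic behind that estimate: each of the 1024 domains leaks log2(32) = 5 bits of identifying information, for 5120 bits of fingerprint in total, vastly more than is needed to single out any user on Earth:

```python
import math

domains = 1024        # .bit names the attacker controls
ips_per_domain = 32   # addresses each name round-robins over

fingerprint_bits = domains * math.log2(ips_per_domain)
distinct_fingerprints = ips_per_domain ** domains  # 32**1024

print(fingerprint_bits)                # 5120.0
print(distinct_fingerprints > 10**80)  # True: exceeds the ~10^80 atoms in the observable universe
```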
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
biolizard89 wrote:Unfortunately, this complicates things a lot from a privacy perspective, because I imagine it would be easy for a malicious website to fingerprint users based on which random IP's get used. Let's say I possess 1024 .bit domains and 32 IP addresses (probably quite cheap for an attacker). If I'm doing my math right, now I can reduce your anonymity set by a factor of 32^1024 for the duration of the caching, across all websites.
Maybe I've had one too many beers, but wouldn't the client have to be connected to those websites at the same time? Then your 1024 number is meaningless. And why wouldn't they just perform timing analysis if they control both websites?
biolizard89 wrote:It's not a problem with ICANN domains via Tor because the randomness will change whenever you change Tor circuits, but since we're doing the resolution locally, this becomes way harder to avoid fingerprinting.
So why not just tie the cache expiry to changing circuits? Perhaps that's what you mean by the identity variable.... My main thought is that you should engage on the Tor mailing list and IRC. They were kind enough to clear up misconceptions I had a few years ago; good people.
DNS is much more than a key->value datastore.
biolizard89
- Posts: 2001
- Joined: Tue Jun 05, 2012 6:25 am
- os: linux
Re: Round-Robin Balancing Implementation Question
indolering wrote:Maybe I've had one too many beers, but wouldn't the client have to be connected to those websites at the same time? Then your 1024 number is meaningless. And why wouldn't they just perform timing analysis if they control both websites?
If I own a website that the user visits, I can embed 1024 <img> elements which each connect to a different .bit domain, thus allowing me to do high-res fingerprinting. It wouldn't be hard for an ad network to embed such code.
indolering wrote:So why not just tie the cache expiry to changing circuits? Perhaps that's what you mean by the identity variable.... My main thought is that you should engage on the Tor mailing list and IRC. They were kind enough to clear up misconceptions I had a few years ago; good people.
Yeah, I've been meaning to talk to the Tor guys for a while... I had a partial conversation with one of the Tor people a couple months ago and then got distracted by upcoming final exams. And yes, tying it to circuits is what I meant by the identity variable... except I'm not sure whether the identity variable changes every 10 minutes along with the circuit rotation, and I'm unaware of another way to detect changing circuits from the Firefox end of things.
In any event, this is complex enough that I'm not going to attempt load balancing for a while.
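For whenever this does get attempted, one candidate algorithm along the lines discussed above: derive the choice deterministically from the current Tor identity plus the domain, so the selection is stable within one circuit but re-randomizes when the circuit rotates. This is only a sketch; `circuit_identity` stands in for whatever isolation token (e.g. the SOCKS username) turns out to be hookable.

```python
import hashlib

def pick_for_circuit(domain, addresses, circuit_identity):
    """Deterministically pick one of `addresses` for `domain`, keyed on
    the current Tor circuit identity. The choice is stable for the life
    of the circuit and re-randomizes on rotation, so per-domain choices
    can't serve as a long-lived cross-site fingerprint."""
    digest = hashlib.sha256(
        circuit_identity.encode() + b"|" + domain.encode()
    ).digest()
    index = int.from_bytes(digest[:8], "big") % len(addresses)
    return addresses[index]
```

Because every site observed through the same circuit sees selections derived from the same key, the attacker's 1024 domains all rotate together, instead of each leaking independent long-lived bits.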
indolering
- Posts: 801
- Joined: Sun Aug 18, 2013 8:26 pm
- os: mac
Re: Round-Robin Balancing Implementation Question
biolizard89 wrote:If I own a website that the user visits, I can embed 1024 <img> elements which each connect to a different .bit domain, thus allowing me to do high-res fingerprinting. It wouldn't be hard for an ad network to embed such code.
So, yes?
biolizard89 wrote:In any event, this is complex enough that I'm not going to attempt load balancing for a while.
Tor is always very complicated :p
DNS is much more than a key->value datastore.