We aren't going to tell anyone which bank you are working for ;-)
OK. We are in a slightly different situation here. We don't have hundreds
of stupid, unrelated applications running on a bunch of different
web servers. No, instead we have one application: well integrated,
well thought out, well designed (hopefully). What's important is that we
will design and write it with scalability in mind from the ground up.
Managing multiple instances is still an issue - yes, we'd need to distribute
code to all nodes in the cluster. This is true for any cluster though, and a
good clustering solution has built-in support to help with this issue.
In our case, the code is just a set of DSOs and a config file - distributing
it is a matter of running a simple rsync call, followed by a restart of each
app server instance (or potentially not even that, if I add back servlet
reloading).
Now, performance is a tough one. We have four potential bottlenecks:
1. Client<->web server: we need a fat pipe in front of our cluster.
I think the day this becomes a real issue will be a happy day for me - it'll
mean we have enough visitors.
2. Web server. Again - there is next to no processing in the web server
itself, so the only cost is the communication overhead between the web server
and the app server. In most setups there will be a single front-end web
server (unless we want HA, to avoid the web server becoming a single point of
failure - but that is a different story).
3. App server. This is where it gets interesting. The app server can become
CPU-hungry, even if we don't put too much of the object-state processing in
it. Remember - we have plans for the so-called problem generator? This is
a thing that definitely does not belong in the database, will be quite
complex, and might be used quite often. Some other aspects, like generating
graphical representations of math formulas for display purposes, will also go
into the app server: basically, in this particular application, there will be
real work for the app server to do.
4. Database. Well, 'nuff said. In any case, the database will be working
quite a bit, so we need some way of distributing it.
I can think of two approaches. One is to rely on the internal clustering
support of the database. From what I've heard, there are interesting
solutions in that world, and they even work. Sort of. I'm not sure how well
they would suit us, though.
Another approach is to build the support at the application level, i.e. (and
this is not a well-thought-out idea ;-):
- make user tables replicated (e.g. through triggers) across multiple servers
- make a course<->server mapping table and replicate it across as well
- a single course runs on a specific server, so all course-related processing
happens on that server only.
This is a bit over-simplified, especially with our object-sharing
requirements, but hey, we can make it happen.
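To make the mapping idea a bit more concrete, here is a minimal C++ sketch of
the routing half. All names here are illustrative, not part of any existing
code; the real course<->server mapping would live in a replicated table
rather than an in-memory map:

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative sketch only: route all work for a course to the one
// database server that owns it. The course<->server mapping would be
// replicated to every node; here it is just an in-memory map.
class CourseRouter {
public:
    CourseRouter(std::unordered_map<long, std::string> mapping,
                 std::string fallback)
        : mapping_(std::move(mapping)), fallback_(std::move(fallback)) {}

    // All course-related processing happens on the returned server only.
    const std::string& serverFor(long courseId) const {
        auto it = mapping_.find(courseId);
        return it != mapping_.end() ? it->second : fallback_;
    }

private:
    std::unordered_map<long, std::string> mapping_;
    std::string fallback_;
};
```

A lookup miss falls back to a designated default server, so a course created
before its mapping row has replicated everywhere still routes somewhere sane.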
So, to summarize:
- we need to make it possible to distribute application server load across
multiple machines, regardless of the approach we take
- the same is true for the database
- we are still unclear on the pros and cons of the original question ;-)
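For the app-server side of that summary, a sticky balancer in the front end
is about all it takes. A minimal sketch - this is not the cppserv API, and
the hash-on-session-id policy is my own assumption:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch: pick an app-server backend per request.
// Hashing the session id keeps one user's requests (and any cached
// object state) on the same backend instance.
class Balancer {
public:
    explicit Balancer(std::vector<std::string> backends)
        : backends_(std::move(backends)) {}

    const std::string& pick(const std::string& sessionId) const {
        std::size_t h = std::hash<std::string>{}(sessionId);
        return backends_[h % backends_.size()];
    }

private:
    std::vector<std::string> backends_;
};
```

Session stickiness matters only if we end up caching object state in the app
server; a plain round-robin would be even shorter.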
Alexey Parshin wrote:
> I have a perfect chance now to observe the strategy "multiple
> app-servers + single DB server" in action.
> The place is a bank here, in AU, and I'm looking at this from inside.
> They have about 20 application servers and a single 8-way DB server.
> The primary problems:
> 1) Performance. They came to the point where adding more app servers
> slows down the database.
> 2) Code distribution. A fix in the database in many cases requires C++
> code re-distribution, and updated parts of the system have to be taken
> offline.
> 3) Code manageability. They have tons of C++ code (250+ CGI programs)
> that all have little bells and whistles. If something doesn't work in
> such an app - they debug it, recompile it, ship it. It has to be tested
> thoroughly. A part of the problem is that sometimes it isn't clear what
> the code is doing. The support is constantly busy. Overall - it is a mess.
> The most important conclusion so far: by adding a lot of application
> servers, we move the bottleneck (only partially!) from data processing
> to data exchange, but we keep the same single DB server as a bottleneck.
> That scheme doesn't give good performance.
> 2006/9/14, Ilya A. Volynets-Evenbakh < email@example.com
> I will agree with the general approach. At least with going in that
> direction.
> Let's look at a few specific future requirements though.
> 1. Horizontal scalability. I.e. when we run out of power, we add more
> machines.
> 2. Database and application servers on different machines - as a result,
> we potentially need to be able to cache object data in the application
> server for efficiency reasons.
> Let's take a look at #1.
> In case we need to distribute the application due to lack of power, we
> will need to have multiple machines for the most power-hungry part. In
> your case that is the database.
> How easy is it to do with our database server? How efficient is it? What
> other constraints does it impose?
> In the all-C++ logic case, distributing will be fairly easy (almost
> trivial: load balancing can be implemented in the cppserv front-end web
> server with ~30 lines of code).
> Issues: data consistency, data coherency. Network efficiency might be an
> issue in this case as well.
> #2. This is a really unclear thing. Let's see what could be achieved by
> caching objects in app server memory:
> On one hand, caching objects in memory might seem to save roundtrips.
> However, in reality, when an object is requested, it is often requested
> in order to modify it.
> Now, we still have to have some sort of a handle on the object in order
> to modify it, so caching _might_ save us roundtrips. Meh... Needs more
> thought.
> Anyways - #1 seems like an important aspect that needs to be addressed.
> How does
> your approach fit with it?
> Alexey Parshin wrote:
> > The C++ module shouldn't have its own copy of the data, especially - data
> > relations. Instead, it assumes this to be implemented in the DB. Methods
> > in C++ classes, in that case, simply call stored procedures.
> > For instance, we need to add a relation between a student and a
> > problem. We should have a stored proc like student_assign_problem that
> > does the job and reports the result. Another stored proc should return
> > the list of problems assigned to the student.
> > Pros:
> > - the business logic is concentrated in one place
> > - maximum possible performance, if we are not doing high-level math
> > - data processing in SQL is pretty simple
> > - bugs in stored procs are easier to fix (we just replace a proc in
> > real time)
> > - in the general case (I didn't check enough with Postgres yet), the SQL
> > server controls the DML in stored procs, preventing some runtime errors
> > Cons:
> > - the SQL languages we can use require C-style programming
> > - debugging stored procs is more difficult than C/C++ code
> > The opposite implementation, where we have the logic implemented in
> > C++, basically treats the SQL server as dBase, keeping a data copy on
> > the client and doing most of the processing there. That approach is
> > acceptable when only a few people are using the database. As the number
> > of users grows, it leads to a slowdown of the database.
> > Pros:
> > - the code may be taken to a pretty high abstraction level,
> > encapsulating everything inside the classes
> > - easy to debug data processing
> > - the language (C/C++) is a sweetie
> > Cons:
> > - processing data involves a data copy on the client, and therefore a
> > read-change-write round trip - this creates extra steps in processing
> > (performance issues)
> > - data modifications can be made from different places. Often (not
> > necessarily true for us), people loosen the database security and
> > integrity constraints to have more possibilities for data manipulation -
> > this leads to broken data integrity and poor performance
> > - fixing bugs in the logic requires recompiling the C/C++ program and
> > shipping it to the customer. That is more difficult than patching a
> > stored proc
> > There are some pros and cons I'm probably missing.
> > It's possible to use a combined approach. In my experience, it
> > combines the problems of both prior approaches and advantages of
> > none :(
> Ilya A. Volynets-Evenbakh
> Total Knowledge. CTO
> Alexey Parshin,
Ilya A. Volynets-Evenbakh
Total Knowledge. CTO
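P.S. To make the "methods simply call stored procedures" pattern from
Alexey's message concrete, here is a minimal sketch. Only the
student_assign_problem name comes from the message above; the class and the
injected executor (standing in for a real database client call) are purely
illustrative:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch: the C++ class owns no data and implements no
// business logic - every method just forwards to a stored procedure.
// The executor is injected so the sketch runs without a live server;
// in real code it would wrap the actual database client library.
class StudentGateway {
public:
    using Exec = std::function<std::string(const std::string& sql,
                                           const std::vector<std::string>& args)>;

    explicit StudentGateway(Exec exec) : exec_(std::move(exec)) {}

    // Adds the student<->problem relation via the stored procedure
    // and returns whatever result the procedure reports.
    std::string assignProblem(const std::string& studentId,
                              const std::string& problemId) {
        return exec_("SELECT student_assign_problem($1, $2)",
                     {studentId, problemId});
    }

private:
    Exec exec_;
};
```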
Authoright © Total Knowledge: 2001-2008