Saturday, January 23, 2010

How to provide space for caching content

A technician called me recently for help estimating the storage capacity needed for an Internet cache. The technician is part of a team that is building the Internet gateway for an ISP in a small country in the Middle East.

The technician had very few figures available to make the necessary estimate. He could only express the question very simply:

"The provider will serve 40,000 users. What should be the storage capacity for content caching?"

As part of the team that develops the SafeSquid Content Filtering proxy, I often get this kind of query, with one major difference. Most are phrased as: "We have an Internet pipe of X Mbps. What is the recommended space for efficient caching?"

A sensible answer to such a query can be obtained if we allow some assumptions and focus on a few simple facts.

1. Only content downloaded over HTTP can be cached.

2. The maximum rate at which content can be retrieved depends on the Internet pipe.

3. A lot of HTTP traffic is uncacheable, for example streaming audio/video, dynamic pages that are the results of SQL or other queries (including search engine queries), and HTML content in web mail.

4. The main cacheable content is HTML pages, embedded images, style sheets, JavaScript files, and other files that are downloaded and then run on the local desktop or opened by another application for display, such as PDF and (some) Flash files.

5. A simple request to display a web page normally makes the browser automatically download a variety of content, such as cookies, images and other embedded objects. The browser needs these to display the page as it was designed. Not all the components that make up a web page necessarily come from the website that served the requested page.

6. Modern browsers have their own cache, which the user can manage and which follows much the same principles as a caching proxy. So not every object necessarily has to be requested again. However, these browser caches depend on the local storage available on client systems, usually no more than a few hundred MB. And in any case, these local caches cannot be shared between different users.

7. Demand on Internet resources varies with the time of day, between peak and off-peak hours.

So, if we have a 10 Mbps Internet pipe, the maximum amount of data we can transfer is:

= 10 Mbps x 60 seconds = 600 Mbits of data per minute

= 600 x 60 = 36,000 Mbits of data per hour
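The same pipe arithmetic as a minimal Python sketch. The function name is illustrative, and it assumes the pipe is 100% utilised for the whole period:

```python
def max_transfer_mbits(pipe_mbps: float, seconds: float) -> float:
    """Upper bound on data a pipe can move in `seconds`, in Mbits, at full utilisation."""
    return pipe_mbps * seconds

per_minute = max_transfer_mbits(10, 60)     # 10 Mbps x 60 s
per_hour = max_transfer_mbits(10, 60 * 60)  # 600 Mbits/minute x 60 minutes
print(per_minute, per_hour)  # 600 36000
```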

Now assume the company uses QoS bandwidth reservation for each application (or protocol). Typically, applications such as SMTP and VPN take the lion's share, almost 50%, and the rest is distributed between HTTP/HTTPS and others.

But I know very few people who would invest in pipes dedicated exclusively to SMTP and/or VPN, plus a separate (cheaper) Internet connection for HTTP/HTTPS.

If the company has chosen to host its web server on its business premises, the entire distribution changes radically.

Even if the company does not manage bandwidth and traffic is served on a first-come, first-served basis, we can still estimate the share of traffic by application or protocol.

To build our algorithm, it helps to introduce a concept, HTTP_Share, such that HTTP_Share = x% of the Internet pipe.

HTTP_Share then gives the maximum data that HTTP traffic would transfer.

Therefore, applying the factor HTTP_Share to our previous derivation of 36,000 Mbits of throughput per hour:

HTTP_Traffic = x% of data throughput

Now, if x = 35 (35% of the total data transfer is HTTP):

HTTP_Traffic/hour = 0.35 x 36,000 Mbits/hour = 12,600 Mbits/hour
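Applying HTTP_Share to that hourly figure, sketched in Python. This is again a hypothetical helper, and `http_share` is a fraction, so 35% is passed as 0.35:

```python
def http_traffic_per_hour(pipe_mbps: float, http_share: float) -> float:
    """Mbits of HTTP traffic per hour when HTTP gets `http_share` (0..1) of a fully used pipe."""
    seconds_per_hour = 3600
    return http_share * pipe_mbps * seconds_per_hour

print(f"{http_traffic_per_hour(10, 0.35):,.0f} Mbits/hour")  # 12,600 Mbits/hour
```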

Now suppose the company has peak and off-peak hours of Internet use, such that 40% of the day (about 9.6 hours) is peak hours and 60% is off-peak. Peak hours are when we would use our TOTAL Internet line. And if we assume that utilization drops during off-peak hours, i.e. the load during off-peak hours is about 25% of the peak, we can estimate, based on the derivation above:

HTTP_Traffic / day = ((12,600 x 0.4) + (12,600 x 0.6 x 0.25)) x 24

HTTP_Traffic / day = ((0.4 x 1) + (0.6 x 0.25)) x 12,600 x 24 = 166,320 Mbits
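The two-level peak/off-peak weighting as a Python sketch. `peak_fraction` and `off_peak_load` are names chosen here for the 40% and 25% factors in the text:

```python
def http_traffic_per_day(hourly_mbits: float, peak_fraction: float, off_peak_load: float) -> float:
    """Daily HTTP traffic: full load in peak hours, a reduced load off-peak.

    peak_fraction: share of the 24 hours that is peak (0.4 = 9.6 hours)
    off_peak_load: off-peak load relative to peak (0.25 = 25%)
    """
    weight = peak_fraction * 1.0 + (1 - peak_fraction) * off_peak_load
    return weight * hourly_mbits * 24

daily = http_traffic_per_day(12_600, peak_fraction=0.4, off_peak_load=0.25)
print(f"{daily:,.0f} Mbits/day")  # 166,320 Mbits/day
```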

This is admittedly a very simplistic model. A more realistic one would require a reasonable hourly stepping, i.e. a suitable model of how the load is distributed throughout the day.
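Here is a sketch of that hourly-stepping idea. It assumes a hypothetical 24-entry load profile in which each value is the fraction of peak HTTP load in that hour. The profile values are illustrative only, not measured data.

```python
# Hypothetical load profile: fraction of peak HTTP load in each hour, 00:00 to 23:00.
HOURLY_PROFILE = (
    [0.1] * 7     # 00:00-06:59, night
    + [0.5] * 2   # 07:00-08:59, morning ramp-up
    + [1.0] * 10  # 09:00-18:59, business peak
    + [0.6] * 3   # 19:00-21:59, evening
    + [0.3] * 2   # 22:00-23:59, late evening
)

def http_traffic_per_day_stepped(peak_hourly_mbits: float, profile: list[float]) -> float:
    """Sum hour-by-hour traffic instead of using a flat peak/off-peak split."""
    if len(profile) != 24:
        raise ValueError("profile must have one entry per hour")
    return sum(load * peak_hourly_mbits for load in profile)

print(f"{http_traffic_per_day_stepped(12_600, HOURLY_PROFILE):,.0f} Mbits/day")  # 177,660 Mbits/day
```

With real traffic logs, the profile would come from measured hourly volumes rather than guesses.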

Now we come to the more difficult and more controversial part!

What would be the ratio of cacheable_content to HTTP_Traffic?

Based on my experience at various customer premises, I prefer to assume 30%.

That would mean 166,320 x 0.3 = 49,896 Mbits of content per day that may be cached.

Standard practice is to store content for at least 72 hours (the Store-Age).

Namely, we would need storage of at least 3 x 49,896 = 149,688 Mbits.

Thus, the conventional conversion of 8 bits = 1 byte tells me that we need storage of at least 18,711 MBytes.
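The same chain (cacheable ratio, then store-age, then bits to bytes) as a small Python function. The function and parameter names are hypothetical and were chosen to mirror the terms in the text. It multiplies the daily cacheable volume by the 72-hour (3-day) store-age before converting bits to bytes:

```python
def cache_storage_mbytes(daily_http_mbits: float, cacheable_ratio: float, store_age_days: float) -> float:
    """Cache size in MBytes needed to hold `store_age_days` worth of cacheable content."""
    cacheable_per_day = daily_http_mbits * cacheable_ratio  # Mbits/day worth caching
    storage_mbits = cacheable_per_day * store_age_days      # keep each object for the store-age
    return storage_mbits / 8                                # 8 bits = 1 byte

mb = cache_storage_mbytes(166_320, cacheable_ratio=0.30, store_age_days=3)
print(f"{mb:,.0f} MBytes")  # 18,711 MBytes
```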

Another interesting point, not directly visible from this, is what happens during peak hours. If HTTP_Traffic is measured as the data downloaded by the proxy server, it should be less than the data delivered to the clients, and the difference is a measure of cache efficiency. In other words, that difference is the cached content used to serve the clients.
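The following expresses that difference as a single number. It assumes you can measure the bytes delivered to clients and the bytes fetched upstream by the proxy over the same period. The function name and the sample figures are hypothetical:

```python
def cache_byte_savings(delivered_to_clients: float, fetched_upstream: float) -> float:
    """Fraction of client-bound data served from cache rather than fetched over the pipe."""
    if delivered_to_clients <= 0:
        raise ValueError("delivered_to_clients must be positive")
    return (delivered_to_clients - fetched_upstream) / delivered_to_clients

# Hypothetical peak-hour sample: 12,600 Mbits delivered, 10,080 Mbits fetched upstream.
print(f"{cache_byte_savings(12_600, 10_080):.0%}")  # 20%
```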

In this discussion we have ignored the loss of throughput due to factors like network latency.

The method described, however, still does not answer the original question, because in the original query the Internet pipe was not defined. So I was pretty skeptical that such a calculation could ever be done. It was the number of users (customers) that had been defined, rather than the Internet_Pipe that my approach is based on. My argument rested on two points. First, cacheable content can only be a fraction of the probable HTTP download. Second, the maximum that can be downloaded is limited by the Internet_Pipe, whether you have one user or one million users. Tushar Dave of Reliance Infocomm helped me complete the puzzle with an interesting algorithm that turned out to be its missing piece!

Suppose your ISP serves its customers with 256 Kbps connections. Then for 40,000 users it would seem to require an Internet pipe of almost 10 Gbit/s.

But in reality that is almost never the case (indeed, in most cases an ISP serving 40,000 users may well have a web pipe of less than 1 Gbit/s!). An ISP never provisions for every user to be downloading at full speed all the time. There is what is known as OFF time, i.e. the time when a user is viewing content that has already been downloaded. An ISP can expect at least 50% OFF time.

OFF time can even exceed 75% if the ISP serves mostly home users and small businesses, whose Internet connections are not shared by several users. Also, most of these user accounts are governed by a bandwidth cap. For example, a user may choose an account that allows a download of only a few GB.
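A small Python sketch of how OFF time shrinks the nominal subscriber bandwidth. The helper name is hypothetical. It uses the binary convention 1 Gbit = 1024 x 1024 Kbits, which is what makes 40,000 x 256 Kbps come out at "almost 10 Gbit/s":

```python
def aggregate_kbps(users: int, per_user_kbps: float, on_time: float = 1.0) -> float:
    """Subscriber bandwidth in Kbits/s; on_time < 1 models OFF time (users reading, not downloading)."""
    return users * per_user_kbps * on_time

KBITS_PER_GBIT = 1024 * 1024  # binary convention

for label, on_time in [("all users active", 1.0), ("50% OFF time", 0.5), ("75% OFF time", 0.25)]:
    gbps = aggregate_kbps(40_000, 256, on_time) / KBITS_PER_GBIT
    print(f"{label}: {gbps:.2f} Gbit/s")
# all users active: 9.77 Gbit/s
# 50% OFF time: 4.88 Gbit/s
# 75% OFF time: 2.44 Gbit/s
```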

In the derivations above, we estimated HTTP_Traffic/day from the Internet pipe. We could just as easily derive HTTP_Traffic/day from the HTTP_Traffic expected per month.

Therefore, the expected throughput can be derived without knowing the Internet pipe! And the derivation above remains valid!
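The following sketches that monthly route. It assumes usage is spread evenly over a 30-day month and that a fixed HTTP_Share of each subscriber's monthly download is HTTP. The figure of 3 GB per user per month is hypothetical, standing in for the "few GB" cap mentioned above:

```python
def http_traffic_per_day_from_month(users: int, monthly_gbytes_per_user: float,
                                    http_share: float, days_in_month: int = 30) -> float:
    """Average daily HTTP traffic in GBytes, derived from expected monthly usage."""
    monthly_http = users * monthly_gbytes_per_user * http_share
    return monthly_http / days_in_month

daily_gb = http_traffic_per_day_from_month(40_000, monthly_gbytes_per_user=3, http_share=0.35)
print(f"{daily_gb:,.0f} GBytes/day")  # 1,400 GBytes/day
```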

So let's see if we can do some calculations (empirical, of course!)

connections = 40000

User_connection = 256 Kbps

HTTP_Share = 35%

ON_time = 50%

peak_hours = 60%

off_peak_utilisation = 25%

cacheable_content = 35%

store_age = 3 days

PEAK_HTTP_LOAD (Kbits/s) = connections x User_connection x HTTP_Share = 3584000

NORMAL_HTTP_LOAD (Kbits/s) = PEAK_HTTP_LOAD x ON_time = 1792000

HTTP_Traffic/hour (Kbits) = NORMAL_HTTP_LOAD x 3600 = 6451200000

Cache_Increment/hour (Kbits) = cacheable_content x (HTTP_Traffic/hour) = 2257920000

Total_Cache_Increment/day (Kbits) = 24 x ((1 - peak_hours) x off_peak_utilisation + peak_hours) x (Cache_Increment/hour) = 24 x 0.7 x 2257920000 = 37933056000

Storage requirement (Kbits) = store_age x (Total_Cache_Increment/day) = 113799168000

Storage requirement (Mbits) = 111,132,000

Storage requirement (Gbits) = 108,527.34375

Given 8 bits = 1 byte, it seems that we need a little more than 13,500 GB (about 13.6 TB) of storage.

However, I would provision extra storage capacity for a possible 35% increase in cacheable content in each store_age cycle, over at least 3 cycles, i.e. 13,566 x 1.35^3 ≈ 33,377 GB.
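The whole calculation fits in one Python sketch, with the parameter values listed above as defaults. The class and function names are hypothetical. The conversion 1 Mbit = 1024 Kbits (and 1 Gbit = 1024 Mbits) is used to match the Mbits and Gbits figures in the text. Note that the daily step multiplies the hourly increment by 24 x ((1 - 0.6) x 0.25 + 0.6) = 16.8 equivalent full-load hours:

```python
from dataclasses import dataclass


@dataclass
class CacheSizingInputs:
    """Parameters from the worked example; fractions are 0..1."""
    connections: int = 40_000
    user_connection_kbps: float = 256
    http_share: float = 0.35
    on_time: float = 0.50
    peak_hours: float = 0.60
    off_peak_utilisation: float = 0.25
    cacheable_content: float = 0.35
    store_age_days: float = 3


def storage_required_gbytes(p: CacheSizingInputs) -> float:
    peak_http_load = p.connections * p.user_connection_kbps * p.http_share  # Kbits/s
    normal_http_load = peak_http_load * p.on_time                            # Kbits/s
    http_traffic_per_hour = normal_http_load * 3600                          # Kbits
    cache_increment_per_hour = p.cacheable_content * http_traffic_per_hour   # Kbits
    # Peak hours count at full load, off-peak hours at off_peak_utilisation.
    day_weight = (1 - p.peak_hours) * p.off_peak_utilisation + p.peak_hours
    cache_increment_per_day = 24 * day_weight * cache_increment_per_hour     # Kbits
    storage_kbits = p.store_age_days * cache_increment_per_day
    storage_gbits = storage_kbits / 1024 / 1024  # Kbits -> Mbits -> Gbits
    return storage_gbits / 8                     # 8 bits = 1 byte


base = storage_required_gbytes(CacheSizingInputs())
with_growth = base * 1.35 ** 3  # 35% growth per store_age cycle, over 3 cycles
print(f"{base:,.0f} GB, {with_growth:,.0f} GB with growth")  # 13,566 GB, 33,377 GB with growth

# Storage scales linearly with connections: 20% more users, 20% more space.
bigger = storage_required_gbytes(CacheSizingInputs(connections=48_000))
print(f"{bigger / base:.2f}x")  # 1.20x
```

Because every step is a product, changing any single input by some percentage changes the result by the same percentage, which is what makes the estimate easy to adjust.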

The derivation above depends on the right set of assumptions. But adjusting it as those assumptions change should be very simple.

For example, if the number of connections increases by 20%, then we would need 20% more space!

But, more importantly, it allows anyone who disagrees with my assumptions to still arrive at an approximation of the storage required.

It all seems so easy now. Thank you, Tushar!
