Sunday, September 5, 2010

office without computers

I was going through this article today

Well, Larry was really ahead of his time. This idea is worth giving a thought.
Think of this - you got an office with a staff of 10. You need 10 computers for the staff. 1 laptop for yourself and a server which helps you with all your storage requirements. you also need a Harddisk some where of 500 G or 1 T to back your data up.

Let us calculate the cost.
10 PCs 20 K each = 200000
1 Latptop = 50000
1 TB HD = 10000
1 server = 100000

So over all initial investment is arround 3,60,000 Rs.

maintenance cost per month from a local vendor = atleast 10000 Rs.
plus one employee who knows the back and recovery task and other general computers related jobs who can help the people arround = 15,000 Rs.
So monthly cost : 25,000.

It would be so wonderful if I can reduce this cost of 25,000 to 10,000 per month with initial invest ment less than 10,00,000 ?
Cloud can help us here. or may be we would need something more than cloud.
What if some one on internet can provide me with the office requirements, storage and backup requirements , online support and other stuff. If we can also get some hardware stuuf described in the article with minimal hardware requirement and virtually no processing unit. I think I need to get deeper into this and get back. Saurabh Acharya likes this :) (The FB Effect)

Saturday, September 4, 2010

River to the cloud.


Everyone is talking about the cloud – The On-Demand-Computing. Indeed it is the next big thing in here. And due to its tremendous calibre, it is going to be here. Globalization has affected the business model in a huge manner. Organizations are forced to think and move beyond their traditional boundaries. IT in a way is the supply chain of the thoughts, rather intelligence, which feeds the movement of organizations beyond their boundaries. And cloud is going to be the new IT delivery model of the future. The organizations can now concentrate more on ‘how to use to information’ than on ‘how to get the information’

Cloud computing is a big pie. And we are here to get our share of the pie. So are the big players. Two of the very first companies to provide the cloud platform are Amazon and GoGrid. The Big blue is out with its IBM Smart Desktop and IBM LotusLive products. The social networking site FaceBook runs on Cloud. And so is the Germany based ERP giant SAP.

SAP has several cloud projects to be released in the future. One of the ready-to-be-moved projects is “River”. Well, this is not yet the official name; it’s the code name of the project. River is a cloud-computing platform that supports simple extensions to the already deployed or live ERP systems. SAP has been working on River for about 18 months, said CTO Vishal Sikka in a recent interview. River applications are meant to be simple, with only a few dozen screens or so, and are easy to put together and extend, Sikka said. Still, SAP has done "a significant amount of work" on River in areas such as data security and authentication, "which frankly have been missing in many cases in the cloud atmosphere," he said.

River would be a middleware between the existing lightweight solutions and the cloud. The first application by SAP based on river is Carbon Impact 5.0. It is set to be unveiled in near future and will be running on Amazon's Elastic Compute Cloud. “Carbon Impact can help global organizations credibly report and profitably reduce the enterprise carbon footprint” as written on SAP’s website. This product would help the organizations to maintain the carbon inventories, quantify and compare carbon intensity across your operations – by process, product, and/or facilities and reducing their carbon footprint and thus releasing more carbon credits and generating revenue out of the released carbon credits. Think of this - Carbon Impact on a Cloud- amazing futuristic product on an equally amazing and futuristic technology. I must say SAP has got some very sharp minds at top!

SAP went with Amazon at first because it has "a great expertise in this area," Sikka added. "It's a great way for us to deliver the first application, but by no means the only one." SAP is also planning to build more applications on platform behind its Business ByDesign, SAP's on-demand ERP suite for the midmarket. Business ByDesign is a SaaS solution from SAP. SAP is planning a bigger release of Business ByDesign by the end of 2010. Probably with these releases, it may also reveal some new platform for cloud! Bye the time let us wait for the River. Its yet to be seen how this river helps SAP and its customers leverage the cloud.

Performance Problems with Data processing in Mid-size businesses


Data is growing by leaps and bounds. You throw a query on Google about world's largest database and the returns would give numbers ranging from a few hundred terabytes to yahoo's 2 petabyte DB. They are known as Extremely Large Databases (ELDB). Yahoo and other giants have enough money to spend for turning that big data pie into intelligent information. The concern of this article is not the ELDBs. It is the second and the third tier of databases, where the DB sizes range from single digit TBs (called Large Databases or LDB) to a few hundred TBs (called Very Large Databases or VLDB), which the article would discuss. Most of the enterprises having an IT system have databases of the scale of LDB or VLDB.

The big player

According to IDC’s Database Market Share Analysis (2008), Oracle has the highest market share (44.3 %) followed by IBM (21%), Microsoft (18.5%), Sybase (3.5%) and Teradata(3.3%). Though Gartner's numbers are different, but even in their report, Oracle is the market leader with 47.1 % market share (as in 2007). The problem here is that these traditional database vendors, to

remain in the ascendency, keep on pumping budgets for research on BI for ELDB (Extremely Large Databases). This article attempts to give an insight for those not so big databases in the midsize businesses. And as Oracle database has the highest share, let us concentrate more on oracle database.

The evolution

The IT system of any organization evolves over a period of time. A typical IT system would evolve in three stages viz. Initial stage (Capturing transaction), intermediate stage (Process optimization) and Advanced stage (Business Intelligence). In the initial stage, the enterprise would try to build an application which can help in performing their day-to-day operations and capturing the events. Events like capturing the materials purchased by the purchase department and payment made to the vendor are examples of it. In intermediate stage, they try to find the trends in the processes followed in the organization and accordingly optimize those processes if required. For example, the vendor's payment clearance from the accounts department might be taking two days due to some batch processing or due to the clearance from other departments. The third stage or the advanced stage is the one where the realistic and intelligent information is to be squeezed out of the captured data. We know it as Business intelligence. The trends in the sales during the monsoon sale or effect of discounts on the buying behavior of the customer are the examples of the referred intelligence.


In this approach, the advanced stage and the intermediate stage, to some extent, require large amount of data processing and number crunching. At this point of time, the organization does not have a BI system neither it has a data warehouse to process its data intrinsic queries. At most the IT department could do is a) Replicate the database so that the processing does not affect production environment and b) merge, scrub, aggregate and organize the data to give some kind of OLAP(Online Analytical Processing)- like replica of the OLTP databases to the users for running their large queries. On top of this, they may also try to de-normalize the tables. But these activities do have a lot of complexities and hurdles if it needs to be done on a daily basis. It would be a nightmare for the DBAs to perform the replication and re-arranging the data. One option is to write a job, but even then, maintaining the job and keeping on tuning the job to match the growing volume of data would be tough. Is there a sustainable and inexpensive solution to this problem? Let us see. Let us try to study a few aspects which can be applied by the organizations at the second and third stage to improve the database performance without moving to the high-end solutions. Let me remind here, we are here referring to the oracle database.

Some Problems and some solutions

First, let us check out what oracle thinks that would be better to do with large tables and the complex queries it has to run. Oracle uses a tool called optimizer. Every time a query is fired in the database, it goes to the optimizer and the optimizer chooses the best plan to fetch the data. There are two types of optimizers- Rule based optimizer (RBO) and Cost based optimizer (CBO). Rule based optimizers are almost obsolete from oracle DBs. Each oracle query has multiple choices for execution plans. The execution plan includes a pointer to the index which is to be used for a particular column, the order of the tables to be joined, the process to be followed while joining these tables(or which type of join is to be used- nested loop join, hash join, etc.), search for any parallel executions if possible and so on. These execution plans are selected based on the CBO algorithm. To get a glimpse of the execution plan, there are certain methods which Oracle provides, for example: ‘explain plan’ utility, SQLT or TKPROF.

The question here is- how does this CBO makes these plans? Every database environment is different. So the CBO algorithm needs some inputs from the environment for generating these execution plans. These inputs are nothing but statistics generated by oracle for the tables. This statistics include the basic information about the table like the number of records and some calculated information like ‘skewness’ (elaborate later) of data. This algorithm of getting inputs and finding the best execution plan, works pretty good most of the times. But there are certain conditions where it fails to deliver. One of the flaws in calculating the statistics is that it needs a parameter in percentage which says that how much percentage of data is to be taken as sample for calculating those statistics. As we all would agree that sampled data is always subject to error. It has been observed that most of the DBAs prefer 10-15 % of data as sample. During an experiment we calculated it with 30% and there was a drastic difference in the results. Fortunately they were positive, but be prepared to expect the other way round.

One more problem while analyzing the DB objects is that it tries calculating the skewness of the data. Skewness here is defined as the difference between the largest and the smallest value of the column. If the data is more skewed, then oracle tries building a histogram for that column. This histogram helps in distributing the column values in different buckets and help while retrieving the results. The limitation here is that oracle does not consider the values in between. For example, if column values are in range 1-10000 and 100000-200000 then oracle would try building histogram for range 1-200000. It does not understand that the range 10001-99999 is empty. We need to make oracle understand that indeed oracle is an efficient and intelligent system but there are such limitations to its intelligence.

One solution to the above discussed problems is to retain the statistics of the database. Other solution is to increase the sample size used to analyze the database objects. This may take time but the results would improve. Moreover changing the environment parameters may also work out. But while doing so, one need to take utmost care and should thoroughly understand the impact of that parameter. There are some more solutions to such problems which are beyond the scope of this article.

Any talk about performance improvement is incomplete without reference to OCI (Oracle Call Interface). We all know about PL/SQL. We make procedures and functions in PL/SQL to process and present the data. OCI can be said as the helping hand when PL/SQL objects become a bottleneck in performance. OCI is a wonderful C based API provided by Oracle for giving an interface to the Oracle database in a low level language like C. OCI provides all the facilities needed to access the database ranging from database administrative tasks to firing queries. Moreover it also fully supports the data types, calling conventions, syntax, and semantics of C and C++. Let us see how OCI can make a difference in overall performance over PL/SQL.

The figure shows the working of a PL/SQL block. It would require ‘PL/SQL Engine’ to dismantle the block and fetch the data from the database. If the PL/SQL blocks are bulkier, then engine has to do more work. Now here is the key. Let such bulky operations be performed by a C compiler instead of giving it to PL/SQL engine, the former being faster. We can altogether by-pass the engine and directly throw the queries to the oracle server. It is apparent that OCI would work faster. But here too, we need to take utter care for building such programs. OCI is definitely a very powerful tool. And as with every powerful thing, even this can devastate the environment if not used cautiously.

These were the few aspects which can be helpful when enterprise does not have a full fledged BI system. But again, these measures would not be long lasting. Tuning the databases is very tricky and so is using programming languages to access the data. With the growth of data and complexity of requirement being increased, the organizations would definitely need one of the two – a BI system or an enormously efficient team of administrators and programmer. The choice is yours.