
Economical Petabyte Scaling
In today’s business climate, every leading organization finds itself in the data business. Companies are storing increasingly detailed information about customers and business processes, keeping it longer, and analyzing it more deeply than ever before. It is no surprise that typical data volumes are growing to 1.5 to 2.5 times their size every year. Given these new realities, traditional data warehouse solutions are failing to provide the scalability and cost-effectiveness that businesses demand. Businesses need a solution that can grow from terabytes to petabytes without a hitch, and without exorbitantly expensive proprietary hardware. The answer is Greenplum Database. Greenplum Database uses a shared-nothing, massively parallel processing architecture, in which each server has its own processors, memory, and disks. The architecture is optimized for data warehousing, business intelligence, and analytical processing. Customers can take advantage of the disruptive price-efficiency of commodity servers, storage, and networking to scale economically to petabytes and meet the challenges of today and tomorrow.
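To see how quickly the quoted 1.5x to 2.5x annual growth turns terabytes into petabytes, here is a small compound-growth calculation. The 10 TB starting size and the 1 PB (1,000 TB, decimal) target are hypothetical values chosen for illustration. At 2.5x per year, 10 TB grows to about 977 TB in five years and passes 1 PB in the sixth year. At 1.5x per year, reaching 1 PB takes twelve years.

```python
# Compound growth of a data warehouse under the 1.5x-2.5x annual growth
# range quoted in the article. The 10 TB start and 1 PB target are
# hypothetical illustration values, not figures from the article.

def projected_volume_tb(start_tb: float, annual_factor: float, years: int) -> float:
    """Volume after `years` of growth, multiplying by `annual_factor` each year."""
    return start_tb * annual_factor ** years


def years_to_reach(start_tb: float, target_tb: float, annual_factor: float) -> int:
    """Smallest whole number of years until the volume is >= target_tb."""
    if annual_factor <= 1.0:
        raise ValueError("annual_factor must be greater than 1")
    years, volume = 0, start_tb
    while volume < target_tb:
        volume *= annual_factor
        years += 1
    return years


if __name__ == "__main__":
    start, petabyte = 10.0, 1000.0
    for factor in (1.5, 2.5):
        after_five = projected_volume_tb(start, factor, 5)
        print(f"{factor}x/year: {after_five:,.0f} TB after 5 years, "
              f"1 PB reached in {years_to_reach(start, petabyte, factor)} years")
```

Growth compounds, so the higher rate closes the terabyte-to-petabyte gap in half the time.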
Massively Parallel Query Execution
The real test of an analytical database is how quickly it can answer complex questions against large volumes of data. These are the queries on which traditional data warehouse solutions reveal their limits. Their parallelism is restricted, and bottlenecks can slow processing to a crawl.
Greenplum Database uses state-of-the-art parallel processing techniques to answer such queries at unmatched speed, often 10 to 100 times faster than traditional solutions.
The key is Greenplum's parallel dataflow engine, which connects tens, hundreds, or thousands of processing cores and disks into a massively parallel query processing supercomputer. Greenplum makes full use of every core and scales linearly, so processing keeps pace as your data volumes grow.
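The article does not describe the engine's internals. The sketch below shows the general shared-nothing pattern such engines build on, and it is not Greenplum's implementation. Rows are hash-distributed across hypothetical "segments" by a distribution key. Each segment aggregates only its own rows, and a coordinator then merges the small partial results. The names `distribute`, `local_sum`, and `parallel_sum_by_customer`, and the (customer, amount) schema, are all illustrative assumptions. Threads stand in for independent segment servers.

```python
# Generic sketch of shared-nothing parallel aggregation (hypothetical
# names and schema; not Greenplum code). Threads stand in for segment
# servers that would each own their own CPU, memory, and disk.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

Row = tuple[str, float]  # (customer_id, sale_amount): hypothetical schema


def distribute(rows: Iterable[Row], num_segments: int) -> list[list[Row]]:
    """Assign each row to a segment by hashing its distribution key."""
    segments: list[list[Row]] = [[] for _ in range(num_segments)]
    for row in rows:
        segments[hash(row[0]) % num_segments].append(row)
    return segments


def local_sum(segment: list[Row]) -> Counter:
    """Partial aggregate computed on one segment's local rows only."""
    totals: Counter = Counter()
    for customer, amount in segment:
        totals[customer] += amount
    return totals


def parallel_sum_by_customer(rows: Iterable[Row], num_segments: int = 4) -> dict:
    """Total sales per customer: local aggregation, then a coordinator merge."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds values key by key
    return dict(result)


if __name__ == "__main__":
    sales = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0)]
    print(parallel_sum_by_customer(sales))
```

Because rows are hashed on the grouping key, all of a customer's rows land on the same segment, so each segment's work is independent. Only the compact partial totals cross the "network" to the coordinator. Adding segments splits the same work into smaller local pieces, which is the intuition behind the linear-scaling claim above.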
Unified Analytical Processing
Most enterprises have a patchwork of platforms and tools for storing and analyzing data. Structured data tends to be stored in databases, while unstructured data often lives in file systems. Each group that wants to analyze data (DBAs and analysts, software developers, and statisticians) is likely to use its own languages and to specialize in particular types of analysis.
The result is siloed inefficiency and lost opportunities. Developers or statisticians who want to apply new analysis algorithms to data in the database must find their own servers and storage on which to run the analysis, spend hours or days copying over slices of the data, and then slowly churn through it one record at a time.
Greenplum Database dramatically improves on this status quo by providing the first unified analytical processing platform. Its architecture lets all users of the system mix and match data sources (structured data in the database, unstructured data in external sources) and programming styles (SQL, MapReduce, R, Perl, Python, etc.) and run them all on a common massively parallel infrastructure. Developers and statisticians can now analyze any data in the system directly, without extracting or moving it, and take full advantage of the massively parallel performance of the Greenplum system.
This innovation makes it easy for companies to apply the latest machine learning, graph analysis, statistical computing, and text analysis techniques to any of their data.
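As a rough illustration of running user-supplied analysis code where the data lives, here is a generic MapReduce-style word count over text held in hypothetical partitions. It is a sketch of the MapReduce programming style the article mentions, not Greenplum's MapReduce interface. `run_mapreduce`, `word_mapper`, and `count_reducer` are illustrative names. For clarity the phases run sequentially here; on an MPP system every segment would run each phase concurrently on its local data.

```python
# Generic MapReduce-style sketch (hypothetical names; not Greenplum's API).
# User-supplied mapper and reducer functions are applied to text that stays
# in its partitions; only (key, value) pairs are shuffled between them.
from collections import defaultdict
from typing import Callable, Hashable, Iterator


def run_mapreduce(
    segments: list[list[str]],
    mapper: Callable[[str], Iterator[tuple[Hashable, object]]],
    reducer: Callable[[Hashable, list], object],
) -> dict:
    num = len(segments)
    buckets: list[defaultdict] = [defaultdict(list) for _ in range(num)]
    # Map phase: each partition's records are mapped in place (sequentially
    # here; concurrently on a real MPP system).
    for segment in segments:
        for record in segment:
            for key, value in mapper(record):
                # Shuffle: route each key to the partition that will reduce it.
                buckets[hash(key) % num][key].append(value)
    # Reduce phase: each partition reduces only the keys routed to it.
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = reducer(key, values)
    return result


def word_mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word: Hashable, counts: list) -> int:
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


if __name__ == "__main__":
    docs = [["the quick fox", "the lazy dog"], ["quick thinking"]]
    print(run_mapreduce(docs, word_mapper, count_reducer))
```

The same driver accepts any mapper and reducer pair. This flexibility lets one parallel infrastructure serve different analysis styles without copying the underlying data elsewhere.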
By: Paul Salazar, Greenplum
January 9, 2009