Saturday, July 10, 2010

My comments to "Cassandra at Twitter Today"

Someone said the twitter blog post "Cassandra at Twitter Today" is a big blow to the reputation of Cassandra.


Here are my comments:

1. Cassandra is very young! Especially, the design and implementation of local storage and local indexing are junior and not good.

2. Pool read-performance is also due to the poor local storage implementation.

3. The local storage, indexing and persistence structures are not stable. They need to be re-designed /re-implemented. If Twitter move data to current Cassandra, they should do another move later for a new local storage, indexing and persistence structure.

4. There are many good techniques in Cassandra and other open-sourced projects (such as Hadoop, HBase ...), etc. But, they are not ready for production. Understand the detail of these techniques and implement them in your projects/products.

Monday, April 19, 2010

Cassandra Insert Throughput

** 0.5.1

Test Cluster:
DELL 2950 1*CPU Intel Xeon 5310 (4 cores)
5 nodes
1 node: 2GB heap for Cassandra JVM
4 nodes: 4GB heap for Cassandra JVM

Commit-log and Data stored on same disks.
25 client threads run on 5 nodes.

Data Model:
Keyspace Name = “Test”
Column Family Name = “ABC”
CompareWith for Column = LongType
Column Name = Timestamp (LongType), Value = 400 bytes binary
Billions of keys, thousands of columns.

Partitioner = dht.RandomPartitioner
MemtableSizeInMB = 64MB
ReplicationFactor = 3

Use Thrift Client Interface
Client.insert(..)
Consistency Level (write) = 1

Total inserted 1,076,333,461 columns.
Disk Use: 302GB+283GB+335GB+186GB+276GB=1,382GB (~~400B*1G=400GB *3= 1200GB)

On inserting: 1000 SSTables on each node. The latency of a query is about 1~3 seconds.
Quiet for long time: 10 SSTables (very big files, such as there is one 144GB SSTable data file)
The latency of a query is in ms.

Result: 18,000 columns/second


** 0.6.0
Only 4 nodes.

JVM GC for big heap.
Memory, GC..., always to be the bottleneck and big issue of java-based infrastructure software!
https://issues.apache.org/jira/browse/CASSANDRA-896 (LinkedBlockingQueue issue, fixed in jdk-6u19)

Seems 0.5.1 performed better.
0.6.0 eat more memory.

Tuesday, March 30, 2010

Please don't puzzle on Column-Stores

Daniel Abadi have a blog post here:
http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html

I want to leave a comment and to correct it here:

It is meaningless to compare the two groups, they target to different applications. I think the post just make more confusion. And what Mr. Stonebraker said is also not right.

I think the only thing which make these confusions is the term "column" in Group A. In fact, it is not traditional "column" of RDBMS area. And your example of a traditional spreadsheet table is also not the real target of Group A. The "column" name in Group A is in fact data (not schema).

If have got to change the term, I think we can change the term "column" in Group A to "end-key". In fact, in Bigtable, there is no column, it is "qualifier".

In short:
(1) Group A's "column" is in data, not schema.
(2) Group B's "column" is in schema.
They are different in conception and application target.

Wednesday, January 6, 2010

Jeff Dean and Sanjay Ghemawat's good advices on MapReduce


I'd like to put a copy here, since this paper[1] matchs my opinions so much on MapReduce model and the pratices about large-dataset management/processing implementations.

In the paper, Jeffrey Dean and Sanjay Ghemawat reply Stonebrake and DeWitt's misconceptions about MapReduce. In fact, these misconceptions are so obvious and easy to understand for us.

It is also a good guide to improve the implementation of Hadoop and other members in the family. Suggest you reading it carefully.

Dean and other scientists from Google always bring us clear and reasonable explains about their technologies and pratices. But sometimes, someones from other organizations bring use puzzles.

Except for the five witchcrafts which Google exposed in following papers:
GFS: http://labs.google.com/papers/gfs.html
MapReduce: http://labs.google.com/papers/mapreduce.html
Bigtable: http://labs.google.com/papers/bigtable.html
Chubby: http://labs.google.com/papers/chubby.html
幻灯片 6 Google Cluster and WorkQueue Cluster Management

Following papers/articles/keynotes are very worthy of careful reading:
Jeff Dean Keynotes on LADIS09 (Designs, Lessons and Advice from Building Large
Distributed Systems): http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
Jeff Dean Keynotes on WSDM09(Challenges in Building Large-Scale Information Retrieval Systems): http://research.google.com/people/jeff/WSDM09-keynote.pdf
Jeff Dean Stanford-295-talk (Software Engineering Advice from Building Large-Scale Distributed Systems): http://research.google.com/people/jeff/stanford-295-talk.pdf
Jeff Dean "Handling Large Datasets at Google": http://hepix.caspur.it/storage/hep_pdf/2008/Spring/handling-large-datasets-20080507.pdf
Jeff Dean "A Behind the ScenesTour": http://www.slideshare.net/rawwell/googleabehindthescenestourjeffdean

And following so called GFS-II articals:
Sean Quinlan: GFS: Evolution on Fast-forward (http://queue.acm.org/detail.cfm?id=1594206)