Friday, November 01, 2013

Eclipse 4.2 in Ubuntu 12.04

Eclipse 4.2 in Ubuntu 12.04 | Bruno Braga:

  1. Install the version of Eclipse that comes with the Ubuntu repositories (for 12.04 this is Eclipse 3.8); a minimal command sketch follows this list.
  2. Then follow the steps below to replace that version with the latest Eclipse. After that, clicking the original Eclipse icon will launch the updated Eclipse.
  3. Note: we will need to redo these steps every time we want to update the Eclipse installation.
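
Step 1 can be done from a terminal as well; a minimal sketch, assuming the stock eclipse package from the official Ubuntu 12.04 repositories:

# Install the repo-provided Eclipse (3.8 on Ubuntu 12.04)
$ sudo apt-get update
$ sudo apt-get install eclipse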

# Get the Eclipse installer for Linux 
# (if you do not know the flavour, just choose "Classic")
# http://www.eclipse.org/downloads/
 
 
# Unpack it
$ tar -zxvf eclipse-SDK-4.2-linux-gtk.tar.gz
 
 
# if you have a previous Eclipse version installed
# just move it (in case anything goes terribly wrong, 
# you can just rollback)
$ sudo mv /usr/lib/eclipse /usr/lib/eclipse-old
 
 
# move the unpacked directory to lib
$ sudo mv eclipse /usr/lib/
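 
 
A quick sanity check and rollback, sketched with the same paths used in the steps above (the direct launcher path /usr/lib/eclipse/eclipse is assumed from the unpacked SDK layout):
 
# confirm the new build launches
$ /usr/lib/eclipse/eclipse &
 
# if anything goes terribly wrong, restore the previous installation
$ sudo rm -rf /usr/lib/eclipse
$ sudo mv /usr/lib/eclipse-old /usr/lib/eclipse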
 

Why OpenTSDB chose HBase for Time Series data storage? - Stack Overflow

Why OpenTSDB chose HBase for Time Series data storage? - Stack Overflow:

A nice reply to this question by the OpenTSDB author explaining why StumbleUpon uses HBase (Facebook, in fact, does the same for its big data analytics):

I chose HBase because it scales. Whisper, much like RRD, is a fixed-size database: it must destroy data in order to work within its space constraints. HBase offers the following properties that make it very well suited for large scale time series databases:
  1. Linear scaling. Want to store more data? Add more nodes. At StumbleUpon, where I wrote OpenTSDB, our time series data was co-located on a 20-node cluster that was primarily used for analytics and batch processing. The cluster grew to 120 nodes fairly quickly, and meanwhile OpenTSDB, which makes up only a tiny fraction of the cluster's workload, grew to half a trillion data points.
  2. Automatic replication. Your data is stored in HDFS, which by default means 3 replicas on 3 different machines. If a machine or a drive dies, no big deal. Drives and machines die all the time when you build commodity servers. But the thing is: you don't really care.
  3. Efficient scans. Most time series data is used to answer questions like "what are the data points between time X and Y?". If you structure your keys properly, you can implement this very efficiently in HBase with a simple scan operation (a sketch follows the quoted answer below).
The fact that HBase is column-oriented wasn't nearly as important a consideration as the fact that it's a big sorted key-value system that really scales.
None of the RRD-based or RRD-derived tools could satisfy the scale requirement of accurately storing billions and billions of data points forever, very cheaply (just a few bytes of actual disk space per data point).
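
To make the third point concrete, a bounded scan over a row-key range is a one-liner in the HBase shell; a minimal sketch, assuming a hypothetical table named tsdb and human-readable illustrative row keys (OpenTSDB's real row keys encode a metric ID, a base timestamp, and tags in binary form):

# scan only the rows whose keys fall between the start and stop keys
$ hbase shell
hbase> scan 'tsdb', {STARTROW => 'metric01:1383260400', STOPROW => 'metric01:1383264000'}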

TempoDB docs:

  1. https://tempo-db.com/docs/modeling-time-series-data/
  2. http://www.plainlystated.com/2013/01/bulk-loading-time-series-data-tempodb/
