Apache Cassandra Notes

Russell Bateman
August 2017
last update:

It turns out that Cassandra was so named as an allusion to a curse on an oracle; the pun is aimed at the latter, the software giant's famous RDBMS.

The actual history of the prophetess is rather murky and sordid coming as it does from myriad sources and inspirations. The synthesis I'm used to is that she was given the prophetic ability by Apollo in a hoped-for exchange of her womanly pleasures, but she backed out at the last minute whereupon the god spat in her mouth condemning her always to prophesy and never be believed. She was simply thought to be mad.

So it is that she famously warned against bringing the Achaean offering left behind into the gates of Troy. Despite her being the first-family daughter of Priam and Hecuba, her warning was ignored, leading to the well-known city's infamous downfall.


Cassandra's strengths are:

Down-sides, if important to you:

Setting up to unit-test with Cassandra...

Test first, code second, that's the order...

Here's what I'm using in pom.xml:



It's notoriously difficult to unit-test code that calls into a database's APIs. Cassandra (by way of the cassandra-unit helper) provides an embedded, stand-alone database. Calling it isn't like using a real instance in that there's no need to set up a local instance, let alone separate cluster-node instances.

Here is a simple test to see if this embedded Cassandra will start up. It does nothing except demonstrate that Cassandra's unit-testing helper will work.

package com.etretatlogiciels.cassandra;

import java.io.IOException;

import org.junit.BeforeClass;
import org.junit.Test;

import org.apache.cassandra.exceptions.ConfigurationException;
import org.apache.thrift.transport.TTransportException;
import org.cassandraunit.utils.EmbeddedCassandraServerHelper;

/**
 * To run this, you must add a Run/Debug Configuration in the form
 * of an Environment Variable:
 * LD_LIBRARY_PATH=/home/russ/dev/cassandra/target/classes
 * This is so that libsigar-amd64-linux.so can be found and loaded
 * by the Cassandra code.
 */
public class CassandraExampleTest
{
  @BeforeClass
  public static void startCassandra()
      throws TTransportException, IOException, InterruptedException, ConfigurationException
  {
    // start the embedded server from the YAML below, waiting up to 20 seconds
    EmbeddedCassandraServerHelper.startEmbeddedCassandra( "another-cassandra.yaml", 20000 );
  }

  @Test
  public void test()
  {
    System.out.println( "This is a test!" );
  }
}

An early project...


cluster_name: 'Test Cluster'
hints_directory: target/embeddedCassandra/hints
cdc_raw_directory: target/embeddedCassandra/data/cdc_raw
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner

# directories where Cassandra should store data on disk.
commitlog_directory: target/embeddedCassandra/commitlog
disk_failure_policy: stop
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
saved_caches_directory: target/embeddedCassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: ""
concurrent_reads: 32
concurrent_writes: 32
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7010
ssl_storage_port: 7011
start_native_transport: true
native_transport_port: 9152
start_rpc: true
rpc_address: localhost
rpc_port: 9175
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: false
column_index_size_in_kb: 64
compaction_throughput_mb_per_sec: 16
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
server_encryption_options:
  internode_encryption: none
  keystore: conf/.keystore
  keystore_password: cassandra
  truststore: conf/.truststore
  truststore_password: cassandra

Basic Cassandra connection

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.Session;

public class CassandraConnector
{
  private Cluster cluster;
  private Session session;

  public void connect( final String node, final int port )
  {
    cluster = Cluster.builder()
              .addContactPoint( node )
              .withPort( port )
              .build();

    Metadata metadata = cluster.getMetadata();

    System.out.println( String.format( "Connected to cluster: %s",
                          metadata.getClusterName() ) );

    // one line per host known to the cluster
    for( Host host : metadata.getAllHosts() )
      System.out.println( String.format( "Datacenter: %s, Host: %s, Rack: %s",
                            host.getDatacenter(), host.getAddress(), host.getRack() ) );

    session = cluster.connect();
  }

  public Session getSession() { return session; }
  public void    close()      { cluster.close(); }
}

Cassandra data types

Nothing too surprising here...

ascii counter float list text tinyint varint
bigint date frozen map time tuple
blob decimal inet set timestamp uuid
boolean double int smallint timeuuid varchar

Assuming I'll ever need to do so, here's a Java enumeration for internal use. However, this code is really premature and may not be of much use ultimately.

import java.util.ArrayList;
import java.util.List;

public enum CassandraType
{
  c_text,       // UTF-8 encoded string
  c_ascii,      // US_ASCII 7-bit
  c_varchar,    // UTF-8 encoded string

  c_int,        // 32-bit signed
  c_bigint,     // 64-bit signed
  c_smallint,   // 2-byte signed
  c_tinyint,    // 1-byte signed
  c_varint,     // arbitrary-precision

  c_decimal,    // variable-precision
  c_float,      // 32-bit IEEE-754
  c_double,     // 64-bit IEEE-754

  c_boolean,    // true/false
  c_counter,    // distributed, 64-bit

  c_date,       // 32-bit day since Epoch
  c_time,       // 64-bit nanoseconds since midnight
  c_timestamp,  // 8 bytes since Epoch; date and time with millisecond precision
  c_timeuuid,   // version 1 (time-based) UUID

  c_inet,       // IPv4 or IPv6
  c_tuple,      // 2-3 fields
  c_uuid,       // 128-bit globally unique identifier

  c_list,       // collection of 1+ elements (performance impact)
  c_map,        // JSON-style array of literals
  c_set,        // collection of 1+ literal elements

  c_blob,       // arbitrary bytes (no validation), in hexadecimal
  c_frozen;     // multiple types in single value, treated as blob

  /**
   * Useful to determine whether potential enum type,
   * in string form, is a Cassandra type.
   */
  public static boolean contains( String type )
  {
    try
    {
      CassandraType.valueOf( type );
      return true;
    }
    catch( IllegalArgumentException e )
    {
      return false;
    }
  }

  /**
   * Useful to convert a potential type, in string form and with or
   * without the "c_" prefix, to a Cassandra type; null if no match.
   */
  public static CassandraType stringToCassandraType( String string )
  {
    try
    {
      return CassandraType.valueOf( "c_" + string );
    }
    catch( IllegalArgumentException e )
    {
      // valueOf() throws rather than returning null: try the string as-is
    }

    try
    {
      return CassandraType.valueOf( string );
    }
    catch( IllegalArgumentException e )
    {
      return null;
    }
  }

  /**
   * Useful to return a list of Cassandra types.
   */
  public static List< String > getCassandraTypes()
  {
    List< String > list = new ArrayList<>( CassandraType.values().length );

    for( CassandraType type : CassandraType.values() )
      list.add( type.name() );

    return list;
  }
}

Friday, 18 August 2017

public class CassandraConnector
{
  private Cluster cluster;
  private Session session;

  public void connect( String node, Integer port )
  {
    Cluster.Builder b = Cluster.builder().addContactPoint( node );
    if( port != null )
      b.withPort( port );
    cluster = b.build();
    session = cluster.connect();
  }

  public Session getSession() { return this.session; }
  public void close() { session.close(); cluster.close(); }
}

In Cassandra, there's something called a keyspace. This is a little like the schema in a relational context. Remember, Cassandra isn't a document database like MongoDB, but a wide-column (column-family) database. The keyspace is the outermost container for data in Cassandra. The main attributes to set per keyspace are the...

Another important notion in Cassandra is the column, a data structure that contains a column name, value and timestamp. The columns and the number of columns in each row may vary, in contrast with the contents of a relational database table where data are rigidly structured.

Creating a keyspace...

For the example I'm studying, the keyspace to create is "library":

public void createKeyspace( String keyspaceName, String replicationStrategy, int replicationFactor )
{
  StringBuilder sb = new StringBuilder();

  sb.append( "CREATE KEYSPACE " )
    .append( keyspaceName )
    .append( " WITH replication = {" )
    .append( "'class':'" )
    .append( replicationStrategy )
    .append( "','replication_factor':" )
    .append( replicationFactor )
    .append( "};" );

  String query = sb.toString();
  session.execute( query );
}
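
To see what statement the method above hands to session.execute(), here's a minimal sketch that only builds the CQL string and prints it, so neither the driver nor a running Cassandra is needed. The class name KeyspaceCql and the method createKeyspaceCql() are made up for illustration, and the "CREATE KEYSPACE " prefix is my assumption of how the statement begins.

```java
// Hypothetical helper (not part of the DataStax driver): builds the kind of
// CQL string that createKeyspace() passes to session.execute().
final class KeyspaceCql
{
  static String createKeyspaceCql( String keyspaceName, String replicationStrategy, int replicationFactor )
  {
    if( replicationFactor < 1 )
      throw new IllegalArgumentException( "replication factor must be at least 1" );

    return new StringBuilder()
        .append( "CREATE KEYSPACE " )          // assumed statement prefix
        .append( keyspaceName )
        .append( " WITH replication = {" )
        .append( "'class':'" )
        .append( replicationStrategy )
        .append( "','replication_factor':" )
        .append( replicationFactor )
        .append( "};" )
        .toString();
  }

  public static void main( String[] args )
  {
    // the "library" keyspace of the example, one replica, simplest strategy
    System.out.println( createKeyspaceCql( "library", "SimpleStrategy", 1 ) );
  }
}
```

Keeping the string-building separate from the session makes the statement easy to eyeball (or unit-test) before it ever reaches a cluster.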

Tuesday, 22 August 2017

I reposted to the Cassandra users' forum asking for a reply so that I know my posts are even getting there. I finally got an answer back, but the suggestion was just a pile of code that merged unit testing and production together without ultimately providing a solution around the problem I'm having:

Exception (java.lang.ExceptionInInitializerError) encountered during startup: null
	at org.apache.cassandra.transport.Server.start(Server.java:128)
	at java.util.Collections$SingletonSet.forEach(Collections.java:4767)
	at org.apache.cassandra.service.NativeTransportService.start(NativeTransportService.java:128)
	at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:649)
	at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:511)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:616)
	at org.cassandraunit.utils.EmbeddedCassandraServerHelper$1.run(EmbeddedCassandraServerHelper.java:129)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: name
	at io.netty.util.internal.logging.AbstractInternalLogger.<init>(AbstractInternalLogger.java:39)
	at io.netty.util.internal.logging.Slf4JLogger.<init>(Slf4JLogger.java:30)
	at io.netty.util.internal.logging.Slf4JLoggerFactory.newInstance(Slf4JLoggerFactory.java:73)
	at io.netty.util.internal.logging.InternalLoggerFactory.getInstance(InternalLoggerFactory.java:84)
	at io.netty.util.internal.logging.InternalLoggerFactory.getInstance(InternalLoggerFactory.java:77)
	at io.netty.bootstrap.ServerBootstrap.<clinit>(ServerBootstrap.java:46)
	... 10 more

I've since read other attempts to explain using this helper class, but no matter how hard I've tried, I keep coming back to the error above. I worried originally that the error was saying that I had done something stupid, but I don't believe that now. It means that I don't know how to start the Cassandra unit-test helper up. The articles I've read all assert that I need only call it:


...but this is not true. I've tried to supply a YAML file and, I think, have done so; it came from step 2 in this article. Though the file is required (and hardly any of the authors allude to it), it doesn't work the magic. I'm using one I got from someplace. I've also added log4j-embedded-cassandra.properties to no avail.

I bottled up some simple test code from Testing Cassandra repositorys using Cassandra Unit. I didn't use the Spring Boot code, but just the basic Java code. It worked; it's the early project above. This means there's some crapola going on, likely slf4j in my greater nifi-pipeline project.

This Cassandra unit-test stuff works. Sadly, the thrust of the tutorial is Spring Boot, and the useful code is overly infected by it and therefore pretty useless when there are other tutorials around.

Wednesday, 23 August 2017

Monday, 28 August 2017

Setting up Cassandra as local to my development host:

https://www.tutorialspoint.com/cassandra/cassandra_installation.htm (following along with this)

http://cassandra.apache.org/download/ (Browser download to ~/dev/cassandra)

~/dev/cassandra $ tar -zxf apache-cassandra-3.11.0-bin.tar.gz
~/dev/cassandra $ ll
total 37060
drwxr-xr-x   3 russ russ     4096 Aug 28 12:59 .
drwxrwxr-x. 96 russ russ     4096 Aug 28 12:58 ..
drwxr-xr-x  10 russ russ     4096 Aug 28 12:59 apache-cassandra-3.11.0
-rw-rw-r--   1 russ russ 37929669 Aug 28 12:58 apache-cassandra-3.11.0-bin.tar.gz

~/dev/cassandra/apache-cassandra-3.11.0/bin $ gvim cassandra.yaml
  (insert https://svn.apache.org/repos/asf/cassandra/trunk/conf/cassandra.yaml)

export CASSANDRA_HOME=~/dev/cassandra/apache-cassandra-3.11.0

~/dev/cassandra/apache-cassandra-3.11.0/bin $ sudo bash
[root@localhost bin]# mkdir -p /var/lib/cassandra/data
[root@localhost bin]# mkdir -p /var/lib/cassandra/commitlog
[root@localhost bin]# mkdir -p /var/lib/cassandra/saved_caches
[root@localhost bin]# mkdir -p /var/log/cassandra
[root@localhost bin]# chmod 777 /var/lib/cassandra/
[root@localhost bin]# chmod 777 /var/log/cassandra/

~/dev/cassandra/apache-cassandra-3.11.0 $ ./bin/cassandra -f
(lots of fun stuff...)

(output from my Cassandra connector test code...)
Connected to cluster: Test Cluster
Datacenter: datacenter1, Host: /, Rack: rack1

I want to connect to Cassandra, and do some stuff like use prepared statements. Here's my Cassandra code...


package com.etretatlogiciels.cassandra;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.Session;

public class CassandraConnector
{
  private Cluster cluster;
  private Session session;

  public void connect( final String node, final int port )
  {
    cluster = Cluster.builder()
               .addContactPoint( node )
               .withPort( port )
               .build();
    session = cluster.connect();
  }

  public Session  getSession()  { return session; }
  public Metadata getMetadata() { return cluster.getMetadata(); }
  public void     close()       { cluster.close(); }
}


package com.etretatlogiciels.cassandra;

import org.junit.After;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestName;

import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;
import com.etretatlogiciels.testing.TestUtilities;

public class CassandraConnectorTest
{
  // @formatter:off
  @Rule   public TestName name = new TestName();
  @After  public void tearDown() { }
  @Before public void setUp() throws Exception { TestUtilities.setUp( name ); }
  // @formatter:on

  @Test
  public void testConnector()
  {
    // only run this from inside IntelliJ IDEA...
    if( !TestUtilities.runningInsideIntelliJ() )
      return;

    // connects to Cassandra instance running on local box...
    CassandraConnector client = new CassandraConnector();
    client.connect( "", 9042 );

    Metadata metadata = client.getMetadata();

    System.out.println( String.format( "Connected to cluster: %s",
                          metadata.getClusterName() ) );

    for( Host host : metadata.getAllHosts() )
      System.out.println( String.format( "Datacenter: %s, Host: %s, Rack: %s",
                            host.getDatacenter(), host.getAddress(), host.getRack() ) );

    client.close();
  }
}

This test also appears to work...


package com.etretatlogiciels.cassandra;

import org.junit.After;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestName;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.LocalDate;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class TryPreparedStatementsTest
{
  @After  public void tearDown() { client.close(); }
  @Before public void setUp() throws Exception
  {
    // connects to Cassandra instance running on local box...
    client = new CassandraConnector();   // assign the field; don't shadow it with a local
    client.connect( "", 9042 );
    session = client.getSession();
  }

  private static final String DROP_KEYSPACE   = "drop keyspace if exists product";
  private static final String CREATE_KEYSPACE = "create keyspace product with replication = { 'class' : 'SimpleStrategy',"
                             + " 'replication_factor' : 1 };";
  private static final String USE_KEYSPACE    = "use product;";
  private static final String DROP_TABLE      = "drop table if exists product.sku_list;";
  private static final String CREATE_TABLE    = "create table "
                             + "product.sku_list( sku text, description text, when date, primary key( sku ) );";
  private static final String INSERT_SKU      = "insert into sku_list( sku, description, when ) values( ?, ?, ? );";

  private CassandraConnector client;
  private Session            session;

  @Test
  public void testPreparedStatement()
  {
    PreparedStatement statement;
    BoundStatement    bound;

    statement = session.prepare( DROP_KEYSPACE );
    bound     = statement.bind();
    session.execute( bound );

    statement = session.prepare( CREATE_KEYSPACE );
    bound     = statement.bind();
    session.execute( bound );

    statement = session.prepare( USE_KEYSPACE );
    bound     = statement.bind();
    session.execute( bound );

    statement = session.prepare( CREATE_TABLE );
    bound     = statement.bind();
    session.execute( bound );

    // bind the three ? markers by position, then insert the row
    statement = session.prepare( INSERT_SKU );
    bound     = statement.bind();
    bound.setString( 0, "665892" );
    bound.setString( 1, "LCD screen" );
    bound.setDate( 2, LocalDate.fromMillisSinceEpoch( System.currentTimeMillis() ) );
    session.execute( bound );
  }
}

Here's evidence:

~/dev/cassandra/apache-cassandra-3.11.0 $ ./bin/cqlsh
cqlsh> show host;
Connected to Test Cluster at

cqlsh> describe keyspaces;

system_schema  system_auth  product  system  system_distributed  system_traces

cqlsh> use product;

cqlsh:product> describe tables;


cqlsh> describe product;

CREATE KEYSPACE product WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes =

CREATE TABLE product.sku_list (
    sku text PRIMARY KEY,
    description text,
    when date
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32',
        'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

cqlsh:product> select * from sku_list;

 sku    | description | when
--------+-------------+------------
 665892 |  LCD screen | 2017-08-28

	(1 rows)

cqlsh:product> exit;

Friday, 8 September 2017

Time out for debugging Cassandra...

I worked with David for half an hour on setting up to debug on Cassandra. Decidedly, trying to do development work under Windows is sorely limiting and greatly lengthens the amount of research one must do to accomplish what are simple actions under a UNIX/Linux shell. This said, it's not going to be a piece of cake on Linux either the first time. Here are some links I looked at:

Note that if JVM_OPTS isn't defined in the environment, it can be defined for the process starting Cassandra. That means the options will be present. Also, note that there are option-order problems with... See notes from late last year and early this year for running NiFi remotely.

Based on what I've read of the Cassandra-Lucene Index plug-in, the lib subdirectory is guaranteed to be on Cassandra's classpath.

The installation of the Cassandra-Lucene Index plug-in, which must be done by cloning and building the source, is done thus:

mvn clean package -Ppatch -Dcassandra_home=<CASSANDRA_HOME>

Stuff to figure out:

  1. Do Cassandra plug-ins follow a framework (à la Tomcat, NiFi, etc.) in the sense that they must be dropped into a specific subdirectory?
  2. Instead, are they simply code resources (.jar) that a loader knows to load on condition that they be findable in the effective ${CLASSPATH}?
  3. Are we really going to have to build Cassandra because the only way to figure out how this plug-in stuff works is to step through Cassandra itself?
  4. What is the Cassandra client? How is it used relevant to plug-ins?

Tuesday, 12 September 2017

Setting up Cassandra on Linux Mint...

  1. I created /etc/apt/sources.list.d/cassandra.list to contain:
    # from http://cassandra.apache.org/download/
    deb http://www.apache.org/dist/cassandra/debian 311x main
  2. Add the Cassandra repository keys:
    # curl https://www.apache.org/dist/cassandra/KEYS | apt-key add -
  3. Install Cassandra:
    # apt-get install cassandra
    nargothrond sources.list.d # dpkg --list | grep [c]assandra
    ii  cassandra     3.11.0   all  distributed storage system for structured data
  4. Start up the service:
    # service cassandra start
    russ@nargothrond ~ $ sudo service --status-all | grep [c]assandra
     [ + ]  cassandra
  5. I found these subdirectories. In bold are things that I think I want most to know about:
    • /etc/cassandra (cassandra-env.sh, cassandra.yaml, jvm.options where I expect to get remote debugging going)
    • /var/lib/cassandra (commit log, data, hints)
    • /var/log/cassandra (debug log, system log)
    • /usr/share/cassandra (JAR, cassandra.in.sh, lib where I expect to drop my index plug-in)

Important things learned...

I was able to copy my plug-in to Cassandra, bounce it, then connect IntelliJ IDEA via remote session to my Cassandra service. Presumably, it will kick into the debugger once I figure out how to get Cassandra to call through the plug-in.

Next up, how to tell Cassandra to call my plug-in?

Wednesday, 13 September 2017

To execute a cql command file (i.e.: a text file containing a cql command) from the cql shell, do this:

cqlsh> source 'create-keyspace.cql'

Of course, a full or relative path to the command file works as well (although, if Cassandra is installed as a service, what would the current working directory be?).

Comments in Cassandra cql command files can be:

In the IntelliJ IDEA editor, however, only the first one gives the warm fuzzies of grey text; the other two will not stop IntelliJ from highlighting SQL/CQL keywords found in the comment.

Tuesday, 26 September 2017

Cassandra custom index-relevant links...

Wednesday, 27 September 2017

Secondary indices in Cassandra...

If you attempt to query on a column in a table that's not part of the PRIMARY key, an error will be returned (let's do this in cqlsh). In this example, assume that first_name and not last_name is the primary key:

cqlsh:some_keyspace> SELECT * FROM some_table WHERE last_name = 'Schwartz';
InvalidRequest: code=2200 [Invalid query] message="No supported secondary index found for the non primary key columns restrictions"

As the error suggests, a secondary index must be created that covers at least last_name. By definition, a secondary index is one created on/for a column that's not in the primary key:

cqlsh:some_keyspace> CREATE INDEX last_name_index ON some_table( last_name );

(Note: the name last_name_index is completely optional.)

...whereupon the original query begins to work:

cqlsh:some_keyspace> SELECT * FROM some_table WHERE last_name = 'Schwartz';

 first_name | last_name
------------+-----------
        Joe |  Schwartz

(1 rows)

The secondary index is a different concept than the custom index that I'm working on.

Cassandra partitions data across multiple nodes in a cluster. For this reason, a secondary index, being built from the data it refers to, must be kept on every relevant node alongside that data. So, queries using a secondary index are significantly more expensive.
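
To convince myself of the cost, here's a toy model in plain Java, under heavy assumptions: rows land on a node according to a hash of the partition key (first_name) and each node keeps a local index of last_name for only its own rows. The class SecondaryIndexSketch and everything in it are invented for illustration; real Cassandra uses tokens, replicas and on-disk index tables.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of this is Cassandra's actual implementation.
final class SecondaryIndexSketch
{
  private final int nodeCount;
  // per node: partition key (first_name) -> last_name
  private final List< Map< String, String > > data = new ArrayList<>();
  // per node: local index, last_name -> first_names held on that same node
  private final List< Map< String, List< String > > > index = new ArrayList<>();

  int nodesContacted;   // how many nodes the last query had to ask

  SecondaryIndexSketch( int nodeCount )
  {
    this.nodeCount = nodeCount;
    for( int i = 0; i < nodeCount; i++ )
    {
      data.add( new HashMap<>() );
      index.add( new HashMap<>() );
    }
  }

  private int nodeFor( String partitionKey )
  {
    return Math.floorMod( partitionKey.hashCode(), nodeCount );
  }

  void insert( String firstName, String lastName )
  {
    int node = nodeFor( firstName );
    data.get( node ).put( firstName, lastName );
    index.get( node ).computeIfAbsent( lastName, k -> new ArrayList<>() ).add( firstName );
  }

  String findByFirstName( String firstName )
  {
    nodesContacted = 1;       // the partition key alone names the node
    return data.get( nodeFor( firstName ) ).get( firstName );
  }

  List< String > findByLastName( String lastName )
  {
    List< String > found = new ArrayList<>();
    nodesContacted = 0;
    for( Map< String, List< String > > local : index )
    {
      nodesContacted++;       // every node must consult its own index
      found.addAll( local.getOrDefault( lastName, Collections.emptyList() ) );
    }
    return found;
  }

  public static void main( String[] args )
  {
    SecondaryIndexSketch cluster = new SecondaryIndexSketch( 4 );
    cluster.insert( "Joe", "Schwartz" );
    cluster.insert( "Ann", "Jones" );

    System.out.println( cluster.findByFirstName( "Joe" ) + " via " + cluster.nodesContacted + " node(s)" );
    System.out.println( cluster.findByLastName( "Schwartz" ) + " via " + cluster.nodesContacted + " node(s)" );
  }
}
```

The lookup by first_name asks one node; the lookup by last_name asks all four, no matter how few rows match.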

Because of how secondary indices are built and maintained, there are cases in which they are not recommended:

An index is built using:

Query-first design and design notes

In Cassandra, in contrast to RDBMS practice, begin design by laying out what queries are to be used instead of what the data and data relationships are to be. Organize the data to satisfy the queries. I see this as being a little like test-driven development, so it's a good thing.

Keep related columns together in the same Cassandra table. Queries that search a single partition will yield the best performance.

Monday, 2 October 2017

The SSTable, or "sorted-strings table," in Cassandra is created when the data of a column family (in memory) is flushed to disk.

The reason a disk needs to be left with 50% space free is so that Cassandra has space to rebuild SSTables to optimize them.
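
Here's a rough sketch, in plain Java, of the two ideas: flushing a sorted in-memory table to an immutable, sorted one, and rebuilding (compacting) several of those into a new one. Everything here (SSTableSketch, Cell, flush(), compact()) is a name I made up, and "newest timestamp wins" is the only merge rule modeled. Note that compact() builds its output while its inputs still exist, which is my understanding of why the free space is needed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only: maps in memory stand in for files on disk.
final class SSTableSketch
{
  static final class Cell
  {
    final String value;
    final long   timestamp;

    Cell( String value, long timestamp )
    {
      this.value     = value;
      this.timestamp = timestamp;
    }
  }

  // Flush: snapshot the sorted memtable as an immutable "SSTable", then empty it.
  static SortedMap< String, Cell > flush( TreeMap< String, Cell > memtable )
  {
    SortedMap< String, Cell > sstable =
        Collections.unmodifiableSortedMap( new TreeMap< String, Cell >( memtable ) );
    memtable.clear();
    return sstable;
  }

  // Compact: merge several SSTables into a new one; the newest timestamp wins.
  static SortedMap< String, Cell > compact( List< SortedMap< String, Cell > > sstables )
  {
    TreeMap< String, Cell > merged = new TreeMap<>();

    for( SortedMap< String, Cell > sstable : sstables )
      for( Map.Entry< String, Cell > entry : sstable.entrySet() )
        merged.merge( entry.getKey(), entry.getValue(),
                      ( a, b ) -> a.timestamp >= b.timestamp ? a : b );

    return Collections.unmodifiableSortedMap( merged );
  }

  public static void main( String[] args )
  {
    TreeMap< String, Cell > memtable = new TreeMap<>();
    memtable.put( "b", new Cell( "two", 1L ) );
    memtable.put( "a", new Cell( "one", 1L ) );
    SortedMap< String, Cell > first = flush( memtable );

    memtable.put( "a", new Cell( "uno", 2L ) );   // later write to the same key
    SortedMap< String, Cell > second = flush( memtable );

    for( Map.Entry< String, Cell > entry : compact( Arrays.asList( first, second ) ).entrySet() )
      System.out.println( entry.getKey() + " = " + entry.getValue().value );
  }
}
```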

Tuesday, 3 October 2017

Materialized views

Today, I'm looking into this topic.

When you move from an RDBMS to Cassandra, whether really or conceptually because you're adopting Cassandra and, like most, have a sort of solidly SQL mindset, you must denormalize data into separate tables based on the queries that will be run against your database (keyspace and tables).

Thinking about how to organize data in Cassandra requires different thoughts and approaches.

For example, the only way to query a column in a table without specifying the partition key is to use a secondary index. This method is not fit for data of high cardinality, that is, columns that contain values that are very uncommon or unique, like a GUID, e-mail address, user name, etc. This is very slow because high-cardinality, secondary-index queries can require all nodes in the ring to respond, adding considerable latency to the action.

One solution to this problem has been to make the client (the one making the query) perform denormalization as a part of his processing of queries into multiple, independent tables. This means that such code, in an application, would be running at the hands of many users on many hosts (instead of just one place).

Cassandra 3.0 introduced a new feature, materialized views, that handles automated, server-side denormalization. This feature takes the form of a statement that's sort of a combination of index creation and select query. For example, suppose this table:

CREATE TABLE scores(
  user TEXT,
  game TEXT,
  year INT,
  month INT,
  day INT,
  score INT,
  PRIMARY KEY( user, game, year, month, day ) );

We want some way to get the all-time high scores from the data in this table:

CREATE MATERIALIZED VIEW alltimehigh AS              -- name the view
  SELECT user                                        -- must identify the columns to be contained
  FROM scores                                        -- must identify the base table
    WHERE game  IS NOT NULL                          -- filter must be specified for each column
      AND score IS NOT NULL
      AND user  IS NOT NULL
      AND year  IS NOT NULL
      AND month IS NOT NULL
      AND day   IS NOT NULL
  PRIMARY KEY( game, score, user, year, month, day ) -- must include all of the base table's primary-key columns
  WITH CLUSTERING ORDER BY ( score DESC );           -- highest score first, so that LIMIT 1 yields the high score

In this example, we prime the table with some data:

INSERT INTO scores( user, game, year, month, day, score ) VALUES( 'pcmanus', 'Coup', 2015, 05, 01, 4000 );
INSERT INTO scores( user, game, year, month, day, score ) VALUES( 'jbellis', 'Coup', 2015, 05, 03, 1750 );
INSERT INTO scores( user, game, year, month, day, score ) VALUES( 'yukim', 'Coup', 2015, 05, 03, 2250 );
INSERT INTO scores( user, game, year, month, day, score ) VALUES( 'tjake', 'Coup', 2015, 05, 03, 500 );
INSERT INTO scores( user, game, year, month, day, score ) VALUES( 'jmckenzie', 'Coup', 2015, 06, 01, 2000 );
INSERT INTO scores( user, game, year, month, day, score ) VALUES( 'iamaleksey', 'Coup', 2015, 06, 01, 2500 );
INSERT INTO scores( user, game, year, month, day, score ) VALUES( 'tjake', 'Coup', 2015, 06, 02, 1000 );
INSERT INTO scores( user, game, year, month, day, score ) VALUES( 'pcmanus', 'Coup', 2015, 06, 02, 2000 );

...and here's how we search for the all-time high score:

SELECT user, score FROM alltimehigh WHERE game = 'Coup' LIMIT 1

The result is:

 user    | score
---------+-------
 pcmanus |  4000

A lot of the magic happens at write time, i.e.: when the view is built and kept up to date. Consequently, there is a performance penalty at write and query time. Low-cardinality data will create hotspots around the ring. In our example, because the only game is 'Coup', only the nodes storing 'Coup' have any data stored on them. If there are tombstoned entries, the materialized view must query for and generate a tombstone for each entry. This is all overhead.
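
To picture the write-time work, here's a toy model in plain Java of a base table plus a view that is kept sorted by score on every insert. It is only a sketch under my own assumptions: MaterializedViewSketch, Row, insert() and highScore() are invented names, one process stands in for the whole ring, and ordering by descending score is assumed so that the first view row is the high score.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Toy model only; the server does all of this across nodes, not in one map.
final class MaterializedViewSketch
{
  static final class Row
  {
    final String user, game;
    final int    year, month, day, score;

    Row( String user, String game, int year, int month, int day, int score )
    {
      this.user = user; this.game  = game; this.year  = year;
      this.month = month; this.day = day;  this.score = score;
    }

    // mirrors PRIMARY KEY( user, game, year, month, day ) of the base table
    String baseKey() { return user + '|' + game + '|' + year + '|' + month + '|' + day; }
  }

  // order within a game: score descending, base key as tie-breaker
  private static final Comparator< Row > VIEW_ORDER = ( a, b ) ->
  {
    int byScore = Integer.compare( b.score, a.score );
    return ( byScore != 0 ) ? byScore : a.baseKey().compareTo( b.baseKey() );
  };

  private final Map< String, Row >            scores      = new HashMap<>(); // base table
  private final Map< String, TreeSet< Row > > alltimehigh = new HashMap<>(); // view, per game

  void insert( Row row )
  {
    Row replaced = scores.put( row.baseKey(), row );            // write to the base table
    TreeSet< Row > view =
        alltimehigh.computeIfAbsent( row.game, g -> new TreeSet< Row >( VIEW_ORDER ) );

    if( replaced != null )
      view.remove( replaced );                                  // extra work: retire the stale view row
    view.add( row );                                            // extra work: keep the view sorted
  }

  Row highScore( String game )
  {
    TreeSet< Row > view = alltimehigh.get( game );
    return ( view == null || view.isEmpty() ) ? null : view.first();
  }

  public static void main( String[] args )
  {
    MaterializedViewSketch db = new MaterializedViewSketch();
    db.insert( new Row( "pcmanus",    "Coup", 2015, 5, 1, 4000 ) );
    db.insert( new Row( "jbellis",    "Coup", 2015, 5, 3, 1750 ) );
    db.insert( new Row( "iamaleksey", "Coup", 2015, 6, 1, 2500 ) );
    db.insert( new Row( "pcmanus",    "Coup", 2015, 6, 2, 2000 ) );

    Row top = db.highScore( "Coup" );
    System.out.println( top.user + " | " + top.score );
  }
}
```

Every insert touches two structures instead of one, which is the overhead in miniature; the read, on the other hand, is a single lookup of the first row for the game.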

Monday, 6 November 2017

Setting up a cluster...

It's possible to use something called Cassandra Cluster Manager (CCM), but for practice and deep learning about configuration aspects and details in administration, do each box manually as a separate node. This comes from Jeff Jirsa, who says that official, first-time set-up documents are pretty lacking and gives the following steps:

  1. Install the Debian package from Configuring Cassandra.
  2. Configure following Configuring Cassandra:
    1. Pick a cluster name. (You cannot change this later.)
    2. Set the listen_address (and maybe the broadcast_address).
    3. Put the IP address of the first node as the seed. Once the cluster is up you can change the seeds. The first time a node joins the ring (and for some other stuff not to worry about), this seed is used. Thereafter, as long as the cluster isn't growing, the actual seeds don't matter very much. People tend to think of seeds as being more important than they really are. They should be the same, but, if different across nodes for a while, it's not likely to hurt the cluster much.
  3. Start the node just installed and configured.
  4. Wait 2 minutes.
  5. Proceed with next node (start these instructions over).

Another good reference is How To Run a Multi-Node Cluster Database with Cassandra on Ubuntu 14.04.

Wednesday, 8 November 2017

cqlsh> CREATE ROLE cassadmn WITH PASSWORD = 'Cassadmn' AND LOGIN = true;
NoHostAvailable: ('Unable to complete the operation against any hosts', {})

"Unavailable" indicates that the number of nodes Cassandra needs for the query to succeed isn't available. Too many nodes are down. Either it's a single node that thinks it's more than one node and others are down (you added/removed nodes to/from that cluster in the past), or the replication strategy for system_auth is wrong.

Thursday, 16 November 2017

The Cassandra Coordinator...

...or, more properly, coordinator node, is what sends the client's search request (or query) to each node in the cluster. Each node then returns its result whereupon the coordinator combines these partial results, then gives the n (where n is prescribed in the query by a limit) most highly ranked. This avoids a full scan of all the data.

Cassandra says that the client read or write requests can go to any node in the cluster because all nodes in Cassandra are peers. When a client connects to a node and issues a read or write request, that node serves as the coordinator for that particular client operation.

The job of the coordinator is to act as a proxy between the client application and the nodes (or replicas) that own the data being requested. The coordinator determines which nodes in the ring should get the request based on the cluster configured partitioner and replica placement strategy.
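
To make "which nodes in the ring should get the request" concrete for myself, here's a toy token ring in plain Java. It's a sketch under assumptions of my own: TokenRingSketch and its methods are invented, tokenFor() is a stand-in hash and not Cassandra's Murmur3Partitioner, each node owns exactly one token, and replicas are simply the next nodes walking clockwise around the ring (roughly what I understand SimpleStrategy to do).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only; not how the server or driver is actually coded.
final class TokenRingSketch
{
  private final TreeMap< Integer, String > ring = new TreeMap<>();   // token -> node

  void addNode( String node, int token )
  {
    ring.put( token, node );
  }

  // Stand-in partitioner: maps a partition key onto tokens 0..99.
  static int tokenFor( String partitionKey )
  {
    return Math.floorMod( partitionKey.hashCode(), 100 );
  }

  // The first node whose token is >= the key's token, then its successors
  // clockwise (wrapping around), until replicationFactor nodes are chosen.
  List< String > replicasFor( int token, int replicationFactor )
  {
    List< String > replicas = new ArrayList<>();
    int            wanted   = Math.min( replicationFactor, ring.size() );
    Integer        current  = ring.ceilingKey( token );

    while( replicas.size() < wanted )
    {
      if( current == null )
        current = ring.firstKey();             // wrap around the ring
      replicas.add( ring.get( current ) );
      current = ring.higherKey( current );
    }

    return replicas;
  }

  public static void main( String[] args )
  {
    TokenRingSketch ring = new TokenRingSketch();
    ring.addNode( "nodeA", 0 );
    ring.addNode( "nodeB", 25 );
    ring.addNode( "nodeC", 50 );
    ring.addNode( "nodeD", 75 );

    // whichever node coordinates, it forwards the request to these replicas
    System.out.println( ring.replicasFor( 30, 2 ) );
    System.out.println( ring.replicasFor( tokenFor( "Coup" ), 2 ) );
  }
}
```

The point is that the computation needs only the ring layout and the key, which is why any node holding the cluster's metadata can play coordinator.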

In my mind, this raises a number of questions: "Will every node offer a coordinator, or only some nodes?" "Does the coordinator consist of universal code or code that isn't installed everywhere?"

My hypothesis is that every "stock" Cassandra node is a potential coordinator for mere Cassandra purposes: Any node that receives a client query is referred to as the coordinator for that client operation (query).

The coordinator node is typically chosen by an algorithm that takes network distance into account. Any node can act as the coordinator. At first requests will be sent to the nodes the client driver knows about. (Remember, a client application initiates its connection to Cassandra by passing a list of one or more contact points which are hostname plus port.)

It's also useful to know that each client request may be coordinated by a different node and there is no single point of failure (fundamental to Cassandra's architecture).

However, once the client connects and understands the topology of the cluster, the driver may change to a closer coordinator, i.e.: choose a different node including one that wasn't in the original list of contact points. This is because each node contains the metadata of all the other nodes, meaning that as long as one is connected, the driver can get information on all the nodes in the cluster. The driver will then use the metadata of the entire cluster, obtained from the connected node, to create the connection pool. This also means that it's not necessary to set the addresses of all the nodes in the cluster in the contact-points list. Best practice is to set the nodes (in the contact-point list) that respond the quickest to the client application when it starts up. This can be difficult if not impossible to predict at the finest level.

How is a coordinator chosen? How your application sets up its own load-balancing policy has an effect.

In configuring Cassandra load-balancing policy for your client application, the options are:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.RoundRobinPolicy;

public class ClientApplicationStub
{
  public static void main( String[] args )
  {
    // the load-balancing policy is chosen while the cluster is being built
    Cluster cluster = Cluster.builder()
                       .addContactPoint( "" )
                       .withLoadBalancingPolicy( new RoundRobinPolicy() )
                       .build();
  }
}

Once the cluster is built, it's not possible to change the policy set.
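
To picture what a round-robin policy amounts to, here's a small, self-contained sketch that hands out coordinators in rotation. RoundRobinSketch is a made-up class that only imitates the idea; it isn't the driver's RoundRobinPolicy, and the node names are just my two VMs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical imitation of round-robin coordinator selection; not driver code.
final class RoundRobinSketch
{
  private final List< String > nodes;
  private final AtomicInteger  next = new AtomicInteger();

  RoundRobinSketch( List< String > nodes )
  {
    this.nodes = nodes;
  }

  /** Each request goes to the next node in the list, wrapping at the end. */
  String nextCoordinator()
  {
    int index = Math.floorMod( next.getAndIncrement(), nodes.size() );
    return nodes.get( index );
  }

  public static void main( String[] args )
  {
    RoundRobinSketch policy = new RoundRobinSketch( Arrays.asList( "scylla", "charybdis" ) );

    for( int request = 0; request < 4; request++ )
      System.out.println( "request " + request + " coordinated by " + policy.nextCoordinator() );
  }
}
```

Round-robin ignores network distance and data ownership altogether, which is why the choice of policy affects which node ends up coordinating.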

Friday, 8 December 2017

A "microcluster" for development...

Here's a summary of installing a two-node cluster for my personal, development use. I'm developing a custom, secondary-index plug-in as if I were developing Stratio's Lucene index plug-in, which happens to be the model for what I'm doing. So, imagine that when following these steps.

Given that I'm doing development work, I need lightning-fast turn-around. Hence, I give scripts, short-cuts on the VMs, etc. for that purpose. Remember that these are VMs running on a private host with 32 GB of memory and ample SSD, to which no one has access but me.

  1. Use Sun VirtualBox virtualizer. Except as noted, I left all VirtualBox defaults in place. I would point out, however, that in the case I'm using this cluster for, developing a custom index à la Stratio-Lucene, allowing only 1 CPU per VM cripples these nodes so much that either they cannot do any work at all or will not be able to do very much fast enough. Increasing each VM to two CPUs in VirtualBox seemed to solve the problem adequately for the tiny testing and debugging I needed to do (real testing being done in a far more real cluster).

  2. VM OS
  3. Choice of VM OS: Ubuntu Xenial 16.04 LTS (server). I use a DVD containing the OS; see below. This is easily installed in a matter of minutes and it already has all the system software that will be wanted for the rest of this Cassandra cluster installation: Additional considerations after launching the (first) VM...

  4. Cassandra
  5. Cassandra 3.11.0 installation using Debian package from this link. Why not the latest via apt-get install cassandra? Because the system Update Manager will keep changing it. Moreover, my example (the Stratio-Lucene code) will not run on 3.11.1 and later (so far). This gives me total control over the version/package of Cassandra running.
    # dpkg --install cassandra_3.11.0_all.deb
    (Note: Cassandra is running at this point.)

  6. Block Ubuntu from ever updating Cassandra. This is because I'm working on a plug-in that only works with this, specific version:
    # apt-mark hold cassandra
    cassandra set on hold.
  7. Configure Cassandra: establish ports for cluster. This means punching holes through the firewall. Here is a script:
    # Open these ports for Cassandra.
    # Internode ports:
    iptables -A INPUT -p tcp --dport 7000 --jump ACCEPT  # internode communication
    iptables -A INPUT -p tcp --dport 7001 --jump ACCEPT  # SSL
    iptables -A INPUT -p tcp --dport 7199 --jump ACCEPT  # JMX monitoring
    # Client ports:
    iptables -A INPUT -p tcp --dport 9042 --jump ACCEPT  # client
    iptables -A INPUT -p tcp --dport 9160 --jump ACCEPT  # Thrift
    iptables -A INPUT -p tcp --dport 9142 --jump ACCEPT  # native-transport port (SSL)
  8. Halt Cassandra for remaining configuration. Clear data now. (It might be useful to make this a script, but I haven't so far used it a lot.)
    # systemctl stop cassandra
    # rm -rf /var/lib/cassandra/data/system/*

  9. Establish properties beyond existing defaults already in /etc/cassandra/cassandra.yaml. I may reinforce default settings in this list even though I may not change anything when I do:
    1. cluster_name: odyssey
    2. num_tokens: 2
    3. - seeds: ","
    4. listen_address: (*
    5. rpc_address: (*
    6. endpoint_snitch: SimpleSnitch (note that, whatever your schema in real, production mode might be for your Cassandra keyspace, SimpleSnitch will correspond to the SimpleStrategy replication), e.g.:
      cqlsh> CREATE KEYSPACE stratio
          ... WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 1 };
    * Differences in cassandra.yaml between the two nodes.

  10. If I want to attach a remote debugger, from my local, development host, to either node, I'll want to uncomment a line in /etc/cassandra/jvm.options:
    # uncomment to have Cassandra JVM listen for remote debuggers/profilers on port 1414
  11. At this point, I shut down scylla cleanly, then, in VirtualBox, I make a snapshot of it, and clone charybdis from it.
    # shutdown -h now (on scylla)
    Remember, there are a couple of settings in /etc/cassandra/cassandra.yaml specific to charybdis that must be done after launching that new VM. These include - seeds, listen_address and rpc_address. (I found out that you can't cheat and use localhost in any of these places.) Moreover, you'll likely discover charybdis's IP address which figures in configuration for both VMs, so these instructions were just a little bit simplified and made assumptions.

  12. Development host and IntelliJ IDEA
  13. Now, I'd have to punch this port through the firewall too, but I prefer to handle this simply via ssh and a tunnel from my local, development host, something like this:
    $ ssh [email protected] -L 1717: # (scylla)
    $ ssh [email protected]   -L 1818: # (charybdis)
    This way, in IntelliJ IDEA, I configure the debugger with Settings:

  14. It's necessary to bounce Cassandra to install my new plug-in. When I do this, because I'm often debugging, perusing logs, etc., I use this script to do the work—which I place on the path /usr/local/bin/bounce-cassandra.sh because that makes it accessible to root:
    rm -rf /var/log/cassandra/*   # zap log files
    systemctl restart cassandra   # toss Cassandra
  15. To access Cassandra, via (targeting) my "primary" node, I do this:
    $ cqlsh scylla --request-timeout=3600
    This is where I'll set up all my schema, enter data and begin conducting the cqlsh-based development.

  16. Because I'm developing a custom, secondary-index plug-in for Cassandra, I want to update the lib subdirectory of Cassandra's installation on both VMs. I want this to be as quick and painless as possible. I have a script on my local, development host.
    # ------------------------------------------------------
    # Replace the Stratio Lucene plug-in on the mini-cluster
    # or walk the cluster's nodes just to "do stuff."
    # ------------------------------------------------------
    # NODE_1, NODE_2, STRATIO_JAR and CASSANDRA_LIB_PATH are assumed to be set above.
    CLUSTER="${NODE_1} ${NODE_2}"
    args="$1"   # assumption: the first command-line argument
    if [ "$args" = "--walk" ]; then
    	# walk the cluster's nodes doing whatever you need, like bouncing Cassandra...
    	for node in ${CLUSTER}; do
    		echo "ssh root@${node}"
    		ssh root@${node}
    	done
    	exit 0
    fi
    for node in $CLUSTER; do
    	echo "scp ${STRATIO_JAR} root@${node}:${CASSANDRA_LIB_PATH}"
    	scp ${STRATIO_JAR} root@${node}:${CASSANDRA_LIB_PATH}
    done
    # vim: set tabstop=2 shiftwidth=2 noexpandtab:
  17. Again, because I'm developing a custom, secondary-index plug-in for Cassandra, I violate the permissions on the /usr/share/cassandra/lib subdirectory so that I can easily replace my plug-in from a script on my development host without multiple commands:
    # chmod ga+w /usr/share/cassandra/lib


Tuesday, 2 January 2018

Eventually, clearing out Cassandra data and reloading it, over and over again, I reached a point at which my two-node microcluster broke:

russ@gondolin ~ $ cqlsh scylla
Connection error: ('Unable to connect to any servers', {'': error(111, "Tried connecting to [('',
9042)]. Last error: Connection refused")})

Alerted by cqlsh, I looked at the nodes and both began to give me this:

root@charybdis:/etc/cassandra# nodetool status
error: No nodes present in the cluster. Has this node finished starting up?
-- StackTrace --
java.lang.RuntimeException: No nodes present in the cluster. Has this node finished starting up?
  at org.apache.cassandra.dht.Murmur3Partitioner.describeOwnership(Murmur3Partitioner.java:262)
  at org.apache.cassandra.service.StorageService.effectiveOwnership(StorageService.java:4725)
  at org.apache.cassandra.service.StorageService.effectiveOwnership(StorageService.java:114)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

On the verge of reinstalling Cassandra, I got brave and removed more than just what was under the data subdirectory:

rm -rf /var/lib/cassandra/data/*
rm -rf /var/lib/cassandra/commitlog/*
rm -rf /var/lib/cassandra/hints/*
rm -rf /var/lib/cassandra/saved-caches/*

Wednesday, 3 January 2018

Creating a microcluster for private, development use...

If creating a micro- (or mini-) cluster using, say, VirtualBox, and if the purpose is beyond mere Cassandra (in my case, using a custom-index plug-in), then leaving the default one CPU per VM will only result in complete frustration. Making it two CPUs seems to solve the log-jam I experienced. Of course, where possible, having even more performs better, but the point was that a single CPU per VM when trying to run the plug-in results in not merely slowness, but flat-out brokenness, exhibited here by an inability to write. (The red below is issued from the plug-in.)

cqlsh> INSERT INTO myaddressspace.table ( mpid, date_of_service, uri, data )
...     VALUES ( 4,
...              '2017-01-03',
...              '/home/russ/document/Folder010/665892_004.xml.4',
...              '<document>This is a test for mpid 4</document>' );
WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses]
    message="Operation timed out - received only 0 responses."
    info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

This also would tend to suggest that you cannot create a useful (even only two-node) microcluster using an i5 with only 2 cores although I think it's possible to give a set of VMs (2 in the case of a private microcluster of the sort discussed here) a number of cores that adds up to or perhaps even exceeds the actual number the supporting hardware has to offer. It may result in only slowness. This is my case at home although I intend to enroll the services of another unused i5 I've got sitting around for the second VM of my two-node cluster.

Maintaining a private microcluster is very useful for development. This is because it removes all possible interference from the competing needs placed on a minicluster shared across a team of people.

Wednesday, 10 January 2018

Graceful shut-down...

On the mailing list, I read someone's suggestion to run the following nodetool commands before restarting a Cassandra node in order to produce a very careful and graceful Cassandra shut-down:

$ nodetool disablethrift && sleep 5
$ nodetool disablebinary && sleep 5
$ nodetool disablegossip && sleep 5
$ nodetool drain

Kurt Greaves countered with "[This is not] essential. Cassandra will gracefully shut down in any scenario as long as it's not killed with a SIGKILL. However, drain does have a few benefits over just a normal shut-down. It will stop a few extra services (batchlog, compactions) and importantly it will also force recycling of dirty commitlog segments, meaning there will be [fewer] commitlog files to replay on startup and reduc[ed] start-up time.

"A comment in the code for drain also indicates that it will wait for in-progress streaming to complete, but I haven't managed to find 1) where this occurs, or 2) if it actually differs to a normal shut-down. Note that this is all with respect to Cassandra 2.1. In 3.0.10 and 3.10, drain and shut-down more or less do exactly the same thing, [though] drain will log some extra messages."
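
Should I ever want to drive the suggested sequence from Java rather than from the shell, a sketch might look like this. GracefulShutdownSketch is a made-up name, it assumes nodetool is on the PATH of whoever runs it, and, per Kurt's remarks, none of it is strictly essential.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the nodetool sequence above; assumes nodetool is on the PATH.
final class GracefulShutdownSketch
{
  /** The nodetool invocations, in the order they were suggested. */
  static List< List< String > > commands()
  {
    return Arrays.asList(
        Arrays.asList( "nodetool", "disablethrift" ),
        Arrays.asList( "nodetool", "disablebinary" ),
        Arrays.asList( "nodetool", "disablegossip" ),
        Arrays.asList( "nodetool", "drain" ) );
  }

  /** Run each command, stopping at the first failure and pausing between steps. */
  static void run( long pauseMillis ) throws IOException, InterruptedException
  {
    List< List< String > > commands = commands();

    for( int step = 0; step < commands.size(); step++ )
    {
      List< String > command = commands.get( step );
      Process        process = new ProcessBuilder( command ).inheritIO().start();
      int            status  = process.waitFor();

      if( status != 0 )
        throw new IOException( command + " exited with status " + status );

      if( step < commands.size() - 1 )
        Thread.sleep( pauseMillis );      // the "sleep 5" between steps
    }
  }
}
```

Calling GracefulShutdownSketch.run( 5000 ) would mirror the five-second sleeps; the node still has to be stopped or restarted afterward.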

Tuesday, 13 February 2018

On JVM memory and how heap settings are arrived at plus cautionaries on arbitrary, custom settings, see Tuning [Cassandra] Java resources.

See also my general Java notes Notes on JVM heap memory.

When Cassandra starts up, you can examine what it chooses as heap sizes assuming you're not telling it (via /etc/cassandra/jvm.options) what to use:

# ps fuww `pgrep java` | egrep -- '-Xms|-Xmx'
# egrep -- '-Xms|-Xmx' /var/log/cassandra/system.log

Explanation of some of the command options:

-f  Do full-format listing. This option can be combined with many other UNIX-style options to add
    additional columns. It also causes the command arguments to be printed.  When used with -L, the NLWP
    (number of threads) and LWP (thread ID) columns will be added. See the c option, the format keyword
    args, and the format keyword comm.

-u  Display user-oriented format.

-w  Wide output. Use this option twice for unlimited width.

--  Tell grep not to interpret -Xms and -Xmx as options (flags).
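
As for what Cassandra chooses when left alone, my understanding of the stock cassandra-env.sh is that it takes the larger of "half of RAM, capped at 1024 MB" and "a quarter of RAM, capped at 8192 MB." Here is that arithmetic as a sketch; DefaultHeapSketch is a made-up name and the formula is my reading of the script, so check it against the copy installed on the node.

```java
// Hypothetical rendering of the default heap calculation as I read it in cassandra-env.sh.
final class DefaultHeapSketch
{
  /** max( min( 1/2 RAM, 1024 MB ), min( 1/4 RAM, 8192 MB ) ) */
  static long maxHeapMb( long systemMemoryMb )
  {
    long halfCapped    = Math.min( systemMemoryMb / 2, 1024 );
    long quarterCapped = Math.min( systemMemoryMb / 4, 8192 );

    return Math.max( halfCapped, quarterCapped );
  }

  public static void main( String[] args )
  {
    for( long memoryMb : new long[] { 2048, 4096, 16384, 32768 } )
      System.out.println( memoryMb + " MB of RAM -> " + maxHeapMb( memoryMb ) + " MB of heap" );
  }
}
```

By this reckoning, a small VM gets a 1 GB heap and nothing ever gets more than 8 GB without explicit -Xms/-Xmx settings.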

On a related note, here's someone's 84Gb heap settings and GC consequences in Cassandra.

# Simpler, new generation G1GC settings.
JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=50"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"

# GC logging options -- uncomment to enable
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/home/vchadoop/var/logs/cassandra/cassandra-gc.log"
JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"


The only issue that we currently have, and are looking to fix soon, is the need to upgrade our old JDK version and to set metaspace to a higher value.

We found that when the Java runtime reaches the high watermark, it induces a full GC even if there is plenty of memory to expand the heap.

{Heap before GC invocations=1 (full 1):
garbage-first heap   total 88080384K, used 655025K [0x00007fdd60000000, 0x00007ff260000000, 0x00007ff260000000)
  region size 32768K, 20 young (655360K), 0 survivors (0K)
Metaspace       used 34166K, capacity 35325K, committed 35328K, reserved 36864K
2018-01-05T08:10:31.491+0000: 81.789: [Full GC (Metadata GC Threshold) 651M->30M(84G), 0.6598667 secs]
      [Eden: 640.0M(2048.0M)->0.0B(2048.0M) Survivors: 0.0B->0.0B Heap: 651.4M(84.0G)->30.4M(84.0G)],
      [Metaspace: 34166K->34162K(36864K)]
Heap after GC invocations=2 (full 2):
garbage-first heap   total 88080384K, used 31140K [0x00007fdd60000000, 0x00007ff260000000, 0x00007ff260000000)
  region size 32768K, 0 young (0K), 0 survivors (0K)
Metaspace       used 34162K, capacity 35315K, committed 35328K, reserved 36864K
[Times: user=0.67 sys=0.00, real=0.66 secs]
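
Spotting these in a long GC log by eye is tedious. A minimal sketch of pulling the cause and pause time out of a "Full GC" line with a regular expression follows; FullGcLineSketch is a made-up name and the pattern is fitted only to the line format shown above, not to GC logs in general.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser fitted to the "Full GC" line format in the excerpt above.
final class FullGcLineSketch
{
  // group 1: the cause in parentheses; group 2: the pause in seconds
  private static final Pattern FULL_GC =
      Pattern.compile( "\\[Full GC \\(([^)]+)\\) .*?, ([0-9.]+) secs\\]" );

  /** The reason the JVM gave for the full collection, if this is such a line. */
  static Optional< String > cause( String line )
  {
    Matcher matcher = FULL_GC.matcher( line );
    return matcher.find() ? Optional.of( matcher.group( 1 ) ) : Optional.empty();
  }

  /** How long the full collection took, in seconds, if this is such a line. */
  static Optional< Double > pauseSeconds( String line )
  {
    Matcher matcher = FULL_GC.matcher( line );
    return matcher.find() ? Optional.of( Double.parseDouble( matcher.group( 2 ) ) ) : Optional.empty();
  }
}
```

Fed the line above, cause() yields "Metadata GC Threshold," which is the tell-tale that metaspace, not the heap proper, triggered the collection.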

Thursday, 22 February 2018

Enabling DEBUG-level logging in /etc/cassandra/logback.xml will turn on dumps of CQL commands to Cassandra, i.e.: records of what queries are made begin to appear in /var/log/cassandra/debug.log. This is where it's done in that configuration file (look for this paragraph):

<root level="INFO">                    <!-- change INFO to DEBUG -->
  <appender-ref ref="SYSTEMLOG" />
  <appender-ref ref="STDOUT" />
  <appender-ref ref="ASYNCDEBUGLOG" />
</root>

Tuesday, 27 February 2018

Slow queries are queries that take longer than a configured threshold. This threshold is established in the /etc/cassandra/cassandra.yaml file:

# How long before a node logs slow queries. Select queries that take longer than
# this timeout to execute, will generate an aggregated log message, so that slow queries
# can be identified. Set this value to zero to disable slow query logging.
slow_query_log_timeout_in_ms: 300

(Note: default in Cassandra 3.11.0 was 500. I'm setting it to 300 in order to accompany the example below.)
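
The rule the comment describes (report what takes longer than the threshold, report nothing when the threshold is zero) is simple enough to sketch. SlowQuerySketch and its message wording are made up for illustration; this isn't Cassandra's logging code.

```java
import java.util.Optional;

// Hypothetical illustration of the slow-query threshold rule; not Cassandra's code.
final class SlowQuerySketch
{
  /** A report if the query ran longer than the threshold; nothing if it didn't or if logging is disabled. */
  static Optional< String > report( String query, long tookMs, long thresholdMs )
  {
    if( thresholdMs <= 0 )                // zero disables slow-query logging
      return Optional.empty();

    if( tookMs <= thresholdMs )           // fast enough
      return Optional.empty();

    return Optional.of( "slow query (" + tookMs + " ms > " + thresholdMs + " ms): " + query );
  }

  public static void main( String[] args )
  {
    String query = "SELECT * FROM fun.users WHERE user_id=42;";

    System.out.println( report( query, 329, 300 ).orElse( "not slow" ) );
    System.out.println( report( query, 329, 500 ).orElse( "not slow" ) );
  }
}
```

At my setting of 300, a 329-millisecond query is reported; at the default of 500, it would pass unremarked.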

The queries one runs may not always execute as quickly as desired, some worse than others. Set performance expectations (see above). Search queries are the slowest. Reads and writes against primary keys that are designed properly should generally execute with single-digit, millisecond latency. Search will always be slower because there's a great deal more to do: there are multiple index tables to consider, etc.

A rule of thumb for how long searches take is tens of milliseconds on the low end or a couple of seconds on the high end. Anything above the 2-second figure may indicate a problem to be looked into. Note: a search query, as opposed to simply reading data from Cassandra which is done by simple SELECT, would be a SELECT plus a WHERE clause like the following (this is using DataStax' Solr integration, which would be expected to take longer than most, and this would be true also of the stuff I'm working on at present):

cqlsh> SELECT * FROM killervideo.videos
   ... WHERE solr_query = '{ "q" : "title:Terminator", "query.name": "Ahnold" }';

A more tame example might be:

cqlsh> SELECT * FROM fun.users WHERE user_id=42;

Logging queries...

<logger name="com.datastax.driver.core.QueryLogger.SLOW"> <!-- or NORMAL or ERROR -->
  <level value="DEBUG" />                                <!-- or TRACE -->
</logger>

(You can also set this in Java.) This will print messages like

DEBUG [cluster1] [/] Query too slow, took 329 ms: SELECT * FROM users WHERE user_id=?;

...for every slow query. To get query-parameter values, resort to TRACE instead of merely DEBUG. You'll see something like this:

TRACE [cluster1] [/] Query too slow, took 329 ms: SELECT * FROM users WHERE user_id=? [user_id=42];

Thursday, 5 April 2018

openjdk-8-8u162-b12 is broken for Cassandra 3.11.0

The upgrade of openjdk-8-8u162-b12 broke Cassandra 3.11.0 on my cluster. It created a situation (in Cassandra dæmon start-up code) in which some method has become abstract and no longer has an executable body defined. See https://docs.oracle.com/javase/8/docs/api/java/lang/AbstractMethodError.html. This isn't in my code; I can't easily find and fix it.

root@scylla:/var/log/cassandra# bounce-cassandra.sh && sleep 3 && tail -f debug.log
ERROR [main] 2018-04-05 11:11:56,873 o.a.c.s.CassandraDaemon:706 - Exception encountered during startup
java.lang.AbstractMethodError: org.apache.cassandra.utils.JMXServerUtils$Exporter.exportObject(Ljava/rmi/Remote; \
    at javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:150) ~[na:1.8.0_162]
    at javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:135) ~[na:1.8.0_162]
    at javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:405) ~[na:1.8.0_162]
    at org.apache.cassandra.utils.JMXServerUtils.createJMXServer(JMXServerUtils.java:104) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.maybeInitJmx(CassandraDaemon.java:143) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:188) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) [apache-cassandra-3.11.0.jar:3.11.0]

I had to do this:

  1. Look for what version was installed before the update that happened yesterday morning in /var/log/:
    root@scylla:/var/log/# ll -d dpkg*
  2. Explode the compressed log file, then look for the version:
    root@scylla:/var/log/# gunzip dpkg.log.4.gz
    root@scylla:/var/log/# fgrep openjdk-8 dpkg.log.4
    2017-12-06 14:08:44 install openjdk-8-jre-headless:amd64  8u151-b12-0ubuntu0.16.04.2
  3. Go find openjdk-8-8u151-b12. I found one at https://launchpad.net/~openjdk-r/+archive/ubuntu/security/+build/13634136.

  4. Download it, then install it back in place of the newer one (the JRE first, then the JDK):
    root@scylla:/# apt -f install /home/russ/Downloads/openjdk-8-jre-headless_8u151-b12-0ubuntu0.16.04.2_amd64.deb
    root@scylla:/# apt -f install /home/russ/Downloads/openjdk-8-jdk-headless_8u151-b12-0ubuntu0.16.04.2_amd64.deb
  5. Bounce Cassandra.

  6. Put this version on hold so you don't experience the problem again. First, look for the package in apt-get to see its name and some information about it. You'll see how the next, scheduled upgrade is going to do the same thing all over again:
    root@scylla:/# apt list --installed | grep [o]penjdk
    openjdk-8-jdk-headless/now 8u151-b12-0ubuntu0.16.04.2 amd64 [installed,upgradable to: 8u162-b12-0ubuntu0.16.04.2]
    openjdk-8-jre-headless/now 8u151-b12-0ubuntu0.16.04.2 amd64 [installed,upgradable to: 8u162-b12-0ubuntu0.16.04.2]
    root@scylla:/# apt-mark hold openjdk-8-jre-headless/now
    openjdk-8-jre-headless set on hold.
    root@scylla:/# apt-mark hold openjdk-8-jdk-headless/now
    openjdk-8-jdk-headless set on hold.
    Of course, to hold the JRE/JDK from being updated is not something to do lightly given that updates likely include security-hole patches.

Friday, 6 April 2018

One of our number joined a Cassandra mailing list only to learn that, when discussing using vnodes, especially lots of vnodes, developers are a little incredulous.

Indeed, our experience without vnodes has been excellent, but, when we've instituted vnodes, we've had no end to problems in running node repairs.

We've known that vnodes, especially greater numbers of them, let us grow our clusters easily and reduce the likelihood of gross misbalancing in terms of how much of our data ends up on the nodes in the ring. The more vnodes, the more evenly the data is distributed.

We had no idea that resorting to vnodes had such a bad reputation among developers though we've been fretting about problems with Cassandra for some weeks now (our product has not been released yet, we're still in research and development).

Without using vnodes, imagining growing a customer's cluster predicts an entirely hand-wrought experience by our IT folk. Without vnodes, adding a node or two to a cluster promises a potential of failure. Instead, doubling the size of the cluster seems the safest solution.

We've found Cassandra to be very robust at data entry with no vnodes. With vnodes, we have to repair nodes frequently and, sometimes, abandon and rebuild the cluster from scratch.
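
Why doubling looks safest becomes clearer when computing single tokens by hand. The sketch below uses the commonly cited calculation for Murmur3Partitioner, spreading tokens evenly from -2^63 upward; InitialTokenSketch is a made-up name, and the values would go into each node's initial_token setting only after checking them against your own cluster.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical calculator of evenly spaced single tokens, assuming Murmur3Partitioner's range.
final class InitialTokenSketch
{
  private static final BigInteger MIN   = BigInteger.valueOf( Long.MIN_VALUE );   // -2^63
  private static final BigInteger RANGE = BigInteger.ONE.shiftLeft( 64 );         // 2^64 tokens

  /** Token for node i of n is -2^63 + i * 2^64 / n. */
  static List< Long > tokens( int nodeCount )
  {
    List< Long > tokens = new ArrayList<>();

    for( int node = 0; node < nodeCount; node++ )
    {
      BigInteger offset = RANGE.multiply( BigInteger.valueOf( node ) )
                               .divide( BigInteger.valueOf( nodeCount ) );
      tokens.add( MIN.add( offset ).longValueExact() );
    }

    return tokens;
  }

  public static void main( String[] args )
  {
    System.out.println( "2 nodes: " + tokens( 2 ) );
    System.out.println( "3 nodes: " + tokens( 3 ) );
    System.out.println( "4 nodes: " + tokens( 4 ) );
  }
}
```

Going from two nodes to four keeps both existing tokens and merely bisects each range, whereas going from two to three moves an existing token, which is the hand-wrought work alluded to above.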

Stay tuned.

Wednesday, 11 April 2018

When a new node is added to a cluster, it's necessarily added to some data center. As this happens, the "Cassandra shuffle," i.e.: what data is shifted from other nodes to the new one as the nodes equalize the data load they're sharing, goes on only in the data center ring to which the new node has been added, and not across all nodes in the cluster.

Thinking about it, it's important to understand too that when a new data center is added, there's no rebalancing of tokens (tokens being what determines what's called sharding in other database paradigms) since the data center will keep whatever replicas it's set up to keep.

When configuring a data center, it's stated how many replicas there are to be per data center. This will result in a lot of copying between data centers, but doesn't change the replicas for existing data centers or involve movement with them (the existing data centers).

Wednesday, 18 April 2018

A new microcluster for development...

Now that we know that vnodes are Satan's spawn (at least, for now) and because I need another microcluster that simulates multiple data centers, here's a remake of my 8 December 2017 microcluster.

I cloned scylla to sampo and louhi. Here are the differences between scylla's and sampo's /etc/cassandra/cassandra.yaml files. (louhi's IP address is

$ diff scylla.cassandra.yaml sampo.cassandra.yaml
< cluster_name: 'odyssey'
> cluster_name: 'kalevala'
< num_tokens: 2
> num_tokens: 1
<           - seeds: ","
>           - seeds: ","
< listen_address:
> listen_address:
< rpc_address:
> rpc_address:
< endpoint_snitch: SimpleSnitch
> endpoint_snitch: GossipingPropertyFileSnitch

Now, I had to smoke all the subdirectories under /var/lib/cassandra, including:

Here's the schema and everything else I created for testing and learning a few things I need to know about the Stratio-Lucene plug-in. Now, for Lucene use, this is idiot schema, but I'm not trying to learn about Lucene, only about how this plug-in works its (non Lucene-specific) magic.

CREATE KEYSPACE stratio
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1'}
    AND durable_writes = true;

CREATE TABLE stratio.lucene (
    mpid bigint,
    dos text,
    uri text,
    data text,
    PRIMARY KEY ( mpid, dos, uri )
);

CREATE CUSTOM INDEX lucene_index ON stratio.lucene (data)
    USING 'com.stratio.cassandra.lucene.Index'
    WITH OPTIONS = {
        'refresh_seconds' : '10',
        'schema' : '{
        fields : {
            mpid : { type : "bigint" },
            dos :  { type : "text" },
            uri :  { type : "text" },
            data : { type : "text" }
        }
    }'
    };

INSERT INTO stratio.lucene ( mpid, dos, uri, data )
    VALUES ( 1,
            'This is a test for mpid 1' );
INSERT INTO stratio.lucene ( mpid, dos, uri, data )
    VALUES ( 2,
            'This is a test for mpid 2' );
INSERT INTO stratio.lucene ( mpid, dos, uri, data )
    VALUES ( 3,
            'This is a test for mpid 3' );
INSERT INTO stratio.lucene ( mpid, dos, uri, data )
    VALUES ( 4,
            'This is a test for mpid 4' );
INSERT INTO stratio.lucene ( mpid, dos, uri, data )
    VALUES ( 5,
            'This is a test for mpid 5' );
INSERT INTO stratio.lucene ( mpid, dos, uri, data )
    VALUES ( 13,
            'This is a test for mpid 13' );
INSERT INTO stratio.lucene ( mpid, dos, uri, data )
    VALUES ( 69,
            'This is a test for mpid 69' );

SELECT * FROM stratio.lucene
    WHERE expr( lucene_index, '{ query: { type: "phrase", field: "data", value: "for mpid" } }' );

Friday, 20 April 2018

I've been doing some very heavy-duty debugging of the Stratio-Lucene plug-in to see how it works, in particular, what happens when nodetool repair is run on a node after it's been down and new data's been added on other nodes. I make these observations for later use.

A number of assumptions exist that are, I hope, established in previous days' work (see notes above).

This is pretty tedious, painful and fraught with error. For example, if you don't erase the hints files from both nodes, restarting Cassandra on the downed node will result in repairs happening right away which is what I'm trying to step through. The factual entry points into the code are important too. Here are some exact steps:

  1. Set up breakpoints (these are in Stratio-Lucene code):
    • SecondaryIndexManager+290
    • Index+276
    • IndexServiceWide+450
    • IndexServiceWide+76
    • IndexWriterWide+51
    • IndexWriter+672
    • FSIndex+115
  2. Edit IDEA run configuration to add remote (port 1414) to louhi. Ensure no mistakes:
    1. Host: louhi (no: use the IP address; see below)
    2. Port: 1414
  3. Drop Cassandra on louhi:
    # systemctl stop cassandra
  4. Add new row (it's the 504 below that changes from other entries):
    INSERT INTO stratio.lucene ( mpid, dos, uri, data )
        VALUES ( 504,
              'This is a test for mpid 504' );
  5. Erase hints on both sampo and louhi:
    # rm /var/lib/cassandra/hints/*
  6. Restart Cassandra on louhi:
    # systemctl restart cassandra
  7. Attach (reattach) IDEA debugger to louhi.
  8. Groom sampo and louhi log files* after they settle down, i.e.:
    root@sampo:/var/log/cassandra# tail -f debug.log
  9. Run nodetool repair -full -pr on louhi.
  10. This should cause a breakpoint to be hit, but it does not for me. Why? I found out that I needed to use the IP address in the remote-debugger configuration. I've not had to do this before; the entry in /etc/hosts has always sufficed.

* Note on grooming the debug.log:

Wait for sampo and louhi logs to settle down, then add blank lines. Note: this is an on-going requirement—before everything that causes lines to be added to debug.log, add blank lines so that, ultimately, you've got blank lines before the log entries you really want to see (and these aren't lost among all the other lines of the log file). If your objective is to debug, maybe this doesn't matter so much, but what's happening will be much clearer.