Cassandra

A highly scalable, high-performance distributed database

About Cassandra

Cassandra distributes data evenly across the nodes of its cluster. Each node is responsible for a portion of the data.

Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Tunably consistent
Fault-tolerant
Scalable
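
To make the idea of each node owning part of the data more concrete, here is a minimal Python sketch of a token ring. It is an illustration only, not Cassandra's implementation: the `TokenRing` class, the node names, and the use of MD5 as the hash are all assumptions made for this example. Each node is given several positions ("virtual nodes") on a ring of hash values, and a partition key belongs to the node that owns the first position at or after the key's hash.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a string to a 64-bit integer token (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Hypothetical ring that maps partition keys to node names."""

    def __init__(self, nodes, vnodes=16):
        # Give every node several tokens so ownership is spread around the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        # Find the first ring position at or after the key's token, wrapping
        # around to the start of the ring when the key hashes past the last one.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[i % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
for key in ("alice", "bob", "carol"):
    print(key, "->", ring.owner(key))
```

The same key always maps to the same node, so any client can work out where a row lives without asking a central coordinator.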

Cassandra Topics

The following topics are covered:

Cassandra Architecture

Cassandra's big picture

Data Modeling

How to make square pegs fit round holes

Cassandra Query Language

CQL reference documentation

Cassandra Development

Learn how to improve Cassandra and contribute patches

Configuration

Cassandra's handles and knobs

Operating Cassandra

The operator's corner

Course Contents

The Cassandra course covers the following:

What is big data?
Challenges of modern applications
Why not relational databases?
How to handle big data
What is Cassandra and why Cassandra?
Horizontal scalability
High availability
Write optimization
Structured records
Secondary indexes
Materialized views
Efficient result ordering
Immediate consistency
Discretely writable collections
Relational joins
MapReduce and Spark
Rich and flexible data model
Lightweight transactions
Multi-data center replication
Comparing Cassandra to the alternatives
Installing Cassandra
Installing the JDK
Installing on Debian-based systems (Ubuntu)
Installing on RHEL-based systems
Installing on Windows
Installing on Mac OS X
Installing the binary tarball
Bootstrapping the project
CQL—the Cassandra Query Language
Interacting with Cassandra
Getting started with CQL
Creating a keyspace
Selecting a keyspace
Creating a table
Inserting and reading data
New features in Cassandra 2.2

How to configure keyspaces
Creating the users table
Structuring of tables
Table and column options
The type system
Strings
Integers
Floating point and decimal numbers
Timestamp
UUIDs
Booleans
Blobs
Collections
Other data types
The purpose of types
Inserting data
Writing data does not yield feedback
Partial inserts
Selecting data
Missing rows
Selecting more than one row
Retrieving all the rows
Paginating through results
Inserts are always upserts
Developing a mental model for Cassandra

A table for status updates
Creating a table with a compound primary key
The structure of the status updates table
UUIDs and timestamps
Working with status updates
Extracting timestamps
Looking up a specific status update
Automatically generating UUIDs
Anatomy of a compound primary key
Anatomy of a single-column primary key
Beyond two columns
Multiple clustering columns
Composite partition keys
Composite partition key table
Structure of composite partition key tables
Composite partition key with multiple clustering columns
Compound keys represent parent-child relationships
Coupling parents and children using static columns
Defining static columns
Working with static columns
Interacting only with the static columns
Static-only inserts
Static columns act like predefined joins
When to use static columns
Refining our mental model

Looking up rows by partition
The limits of the WHERE keyword
Restricting by clustering column
Restricting by part of a partition key
Retrieving status updates for a specific time range
Creating time UUID ranges
Selecting a slice of a partition
Paginating over rows in a partition
Counting rows
Reversing the order of rows
Reversing clustering order at query time
Reversing clustering order in the schema
Limitations of ORDER BY
ORDER BY summary
Paginating over multiple partitions
JSON support
INSERT JSON
SELECT JSON
Building an autocomplete function

Modeling follow relationships
Outbound follows
Inbound follows
Storing follow relationships
Cassandra data modeling
Conceptual data model (entity relationship model)
Logical data model (query-driven design)
Physical data model
Denormalization
Looking up follow relationships
Unfollowing users
Using secondary indexes to avoid denormalization
The form of the single table
Adding a secondary index
Other uses of secondary indexes
Limitations of secondary indexes
Secondary indexes can only have one column
Secondary indexes can only be tested for equality
Secondary index lookup is not as efficient as primary key lookup
Materialized views
Adding a view

A normalized approach
Generating the timeline
Ordering and pagination
Multiple partitions and read efficiency
Partial denormalization
Displaying the home timeline
Read performance and write complexity
Fully denormalizing the home timeline
Creating a status update
Displaying the home timeline
Write complexity and data integrity
Batching in Cassandra
Logged batches
Unlogged batches
When to use unlogged batches
Misuse of BATCH statements

Viewing a keyspace schema
Viewing a table schema in cqlsh
Adding columns to tables
Deleting columns
Updating the existing rows
Updating multiple columns
Updating multiple rows
Removing a value from a column
Missing columns in Cassandra
Deleting specific columns
Syntactic sugar for deletion
Deleting table data (TRUNCATE)
Deleting table/keyspace with schema (DROP)
Inserts
Inserts can overwrite existing data
Checking before inserting isn't enough
Another advantage of UUIDs
Conditional inserts and lightweight transactions
Updates can create new rows
Optimistic locking with conditional updates
Optimistic locking in action
Optimistic locking and accidental updates
Lightweight transactions and their cost
When lightweight transactions aren't necessary

The problem with concurrent updates
Serializing the collection
Introducing concurrency
Collection columns and concurrent updates
Defining collection columns
Reading and writing sets
Advanced set manipulation
Removing values from a set
Sets and uniqueness
Collections and upserts
Using lists for ordered, non-unique values
Defining a list column
Writing a list
Discrete list manipulation
Writing data at a specific index
Removing elements from the list
Using maps to store key-value pairs
Writing a map
Updating discrete values in a map
Removing values from maps
Collections in inserts
Collections and secondary indexes
Secondary indexes on map columns
The limitations of collections
Reading discrete values from collections
Collection size limit
Reading a collection column from multiple rows
Unable to reuse collection names
Performance of collection operations
Working with tuples
Creating a tuple column
Writing to tuples
Indexing tuples
User-defined types
Creating a user-defined type
Assigning a user-defined type to a column
Adding data to a user-defined column
Indexing and querying user-defined types
Partial selection of user-defined types
Choosing between tuples and user-defined types
Nested collections
Nested tuples/UDTs
Comparing data structures

Recording discrete analytics observations
Using discrete analytics observations
Slicing and dicing our data
Recording aggregate analytics observations
Answering the right question
Precomputation versus read-time aggregation
The many possibilities for aggregation
The role of discrete observations
Recording analytics observations
Updating a counter column
Counters and upserts
Setting and resetting counter columns
Counter columns and deletion
Counter columns need their own table
Cassandra configuration
Configuration location
Modifying configuration
Restarting Cassandra
User-defined functions
User-defined aggregate functions
Standard aggregate functions

Data distribution in Cassandra
Cassandra's partitioning strategy - partition key tokens
Distributing partition tokens
Partitioners
Partition keys group data on the same node
Virtual nodes
Virtual nodes facilitate redistribution
Data replication in Cassandra
Masterless replication
Replication without a master
Gossip protocol
Multi-data center cluster
Snitch
Replication strategy
Durable writes
Consistency
Immediate and eventual consistency
Consistency in Cassandra
The anatomy of a successful request
Tuning consistency
Eventual consistency with ONE
Immediate consistency with ALL
Fault-tolerant immediate consistency with QUORUM
Local consistency levels
Comparing consistency levels
Choosing the right consistency level
The CAP theorem
Handling conflicting data
Last-write-wins conflict resolution
Introspecting write timestamps
Overriding write timestamps
Distributed deletion
Stumbling on tombstones
Expiring columns with TTL
Table configuration options

3-node cluster
Prerequisites
Tuning configuration options for setting up a 3-node cluster
Tuning configuration
cassandra.yaml
cassandra-env.sh
Starting the 3-node cluster
Consistency in action
Write consistency
Consistency QUORUM
Consistency ANY
Cassandra internals
The write path
Compaction
The read path
Cassandra repair mechanisms
Hinted handoff
Read repair
Anti-entropy repair

A simple query
Cluster API
Getting metadata
Querying
Prepared statements
QueryBuilder API
Building an INSERT statement
Building an UPDATE statement
Building a SELECT statement
Asynchronous querying
Execute asynchronously
Processing future results
Driver policies
Load-balancing policy
RoundRobinPolicy
DCAwareRoundRobinPolicy
TokenAwarePolicy
Retry policy

Using cassandra-cli
The structure of a simple primary key table
Exploring cells
A model of column families: RowKey and cells
Compound primary keys in column families
A complete mapping
The wide row data structure
The empty cell
Collection columns in column families
Set columns in column families
Map columns in column families
List columns in column families
Appending and prepending values to lists
Other list operations

Enabling authentication and authorization
Authentication
Authentication with cqlsh
Authentication in your application
Setting up a user
Changing a user's password
Viewing user accounts
Controlling access
Viewing permissions
Revoking access
Authorization in action
Authorization as a hedge against mistakes
Security beyond authentication and authorization
Security protects against vulnerabilities

Download

Download Cassandra course plan

WeCanDoNow
