10/5/2018

Reading time: 6 min

Intro to Cassandra

by Tyler Hobbs


1. Intro to Cassandra Tyler Hobbs
2. History Dynamo (clustering) and BigTable (data model) combined in Cassandra
3. Users
4. Clustering Every node plays the same role – No masters, slaves, or special nodes – No single point of failure
5. Consistent Hashing [ring diagram: nodes at 0, 10, 20, 30, 40, 50]
6. Consistent Hashing Key: “www.google.com” [same ring diagram]
7–9. Consistent Hashing Key: “www.google.com” md5(“www.google.com”) → 14 [ring diagram: the key is placed at 14, between the nodes at 10 and 20]
10. Consistent Hashing Key: “www.google.com” md5(“www.google.com”) → 14 Replication Factor = 3
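
The ring on slides 5–10 can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's actual partitioner. The six node tokens come from the slides; the ring size of 60, the `token_for` helper, and the rule that a key belongs to the first node at or after its token (with further replicas on the next nodes clockwise) are assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

# Node tokens taken from the slides; RING_SIZE is an assumed token space.
NODE_TOKENS = [0, 10, 20, 30, 40, 50]
RING_SIZE = 60


def token_for(key: str) -> int:
    """Hash a key onto the ring (hypothetical helper: md5 reduced mod RING_SIZE)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for_token(token: int, replication_factor: int) -> list[int]:
    """Walk clockwise from the first node at or after `token`."""
    start = bisect_left(NODE_TOKENS, token) % len(NODE_TOKENS)
    return [NODE_TOKENS[(start + i) % len(NODE_TOKENS)]
            for i in range(replication_factor)]


# The slides place md5("www.google.com") at 14, so use that token directly.
print(replicas_for_token(14, 3))  # [20, 30, 40]
# A token past the last node wraps around to the start of the ring.
print(replicas_for_token(55, 2))  # [0, 10]
```

Because any node can compute the same answer from the token list, a client can send a request to any node and have it routed to the right replicas.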
11. Clustering Client can talk to any node
12. Scaling RF = 2 [ring diagram: nodes at 0, 10, 20, 30, 50] The node at 50 owns the red portion
13–14. Scaling RF = 2 Add a new node at 40 [ring diagram: nodes at 0, 10, 20, 30, 40, 50]
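
To see why adding a node moves only a slice of the data, the sketch below computes which token range each node owns before and after a node joins at 40. It assumes each node owns the range from the previous node's token (exclusive) up to its own token (inclusive). The `owned_ranges` helper is hypothetical, and replication (RF = 2) is left out to keep the example short.

```python
def owned_ranges(tokens: list[int]) -> dict[int, tuple[int, int]]:
    """Map each node token to the (exclusive start, inclusive end) range it owns."""
    ordered = sorted(tokens)
    # ordered[i - 1] wraps to the last token when i == 0, closing the ring.
    return {tok: (ordered[i - 1], tok) for i, tok in enumerate(ordered)}


before = owned_ranges([0, 10, 20, 30, 50])
after = owned_ranges([0, 10, 20, 30, 40, 50])

print(before[50])  # (30, 50)
print(after[40])   # (30, 40)
print(after[50])   # (40, 50)

# Only the node at 50 gives up part of its range; every other range is unchanged.
changed = [tok for tok in before if before[tok] != after[tok]]
print(changed)     # [50]
```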
15–16. Node Failures RF = 2 [ring diagram: nodes at 0, 10, 20, 30, 40, 50, with the replicas marked]
17. Node Failures RF = 2 [ring diagram: nodes at 0, 10, 20, 30, 40, 50]
18. Consistency, Availability Consistency – Can I read stale data? Availability – Can I write/read at all? Tunable Consistency
19. Consistency N = Total number of replicas R = Number of replicas read from – (before the response is returned) W = Number of replicas written to – (before the write is considered a success)
20. Consistency N = Total number of replicas R = Number of replicas read from – (before the response is returned) W = Number of replicas written to – (before the write is considered a success) W + R > N gives strong consistency
21. Consistency W + R > N gives strong consistency N=3 W=2 R=2 2 + 2 > 3 ==> strongly consistent
22. Consistency W + R > N gives strong consistency N=3 W=2 R=2 2 + 2 > 3 ==> strongly consistent Only 2 of the 3 replicas must be available.
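
The W + R > N rule on slides 20–22 works because any set of R replicas must then share at least one replica with any set of W replicas, so a read always touches a replica that saw the latest successful write. The brute-force check below is only an illustration of that overlap argument; the function name is made up for the example.

```python
from itertools import combinations


def always_overlaps(n: int, r: int, w: int) -> bool:
    """Check that every R-replica read set shares at least one replica
    with every W-replica write set, out of N replicas in total."""
    replicas = range(n)
    return all(set(reads) & set(writes)
               for reads in combinations(replicas, r)
               for writes in combinations(replicas, w))


print(always_overlaps(n=3, r=2, w=2))  # True
print(always_overlaps(n=3, r=1, w=1))  # False

# For N = 3 the brute-force answer agrees with W + R > N for every R and W.
for r in range(1, 4):
    for w in range(1, 4):
        assert always_overlaps(3, r, w) == (w + r > 3)
```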
23. Consistency Tunable Consistency – Specify N (Replication Factor) per data set – Specify R, W per operation
24. Consistency Tunable Consistency – Specify N (Replication Factor) per data set – Specify R, W per operation – Quorum: N/2 + 1 • R = W = Quorum • Strong consistency • Tolerate the loss of N – Quorum replicas – R, W can also be 1 or N
25. Availability Can tolerate the loss of: – N – R replicas for reads – N – W replicas for writes
26. CAP Theorem During node or network failure: [chart: Availability (up to 100%) against Consistency (up to 100%), divided into “Possible” and “Not Possible” regions]
27. CAP Theorem During node or network failure: [same chart, with “Cassandra” labeled on it]
28. Clustering No single point of failure Replication that works Scales linearly – 2x nodes = 2x performance • For both writes and reads – Up to 100s of nodes Operationally simple Multi-Datacenter Replication
29. Data Model Comes from Google BigTable Goals – Minimize disk seeks – High throughput – Low latency – Durable
30. Data Model Keyspace – A collection of Column Families – Controls replication settings Column Family – Kinda resembles a table
31. Column Families Static – Object data – Similar to a table in a relational database Dynamic – Pre-calculated query results – Materialized views
32. Static Column Families Users – zznate (password: *, name: Nate) – driftx (password: *, name: Brandon) – thobbs (password: *, name: Tyler) – jbellis (password: *, name: Jonathan, site: riptano.com)
33. Dynamic Column Families Rows – Each row has a unique primary key – Sorted list of (name, value) tuples • Like a sorted map or dictionary – The (name, value) tuple is called a “column”
34. Dynamic Column Families Following [table: row keys zznate, driftx, thobbs, jbellis; the column names are usernames with empty values, e.g. driftx:, mdennis:, pcmanus:, thobbs:, xedin:, zznate:]
35. Dynamic Column Families Column Timestamps – Each column (tuple) has a timestamp – In the case of a collision, the latest timestamp wins – Client specifies timestamp with write – Writes are idempotent • Infinite retries allowed
36. Dynamic Column Families Other Examples: – Timeline of tweets by a user – Timeline of tweets by all of the people a user is following – List of comments sorted by score – List of friends grouped by state
37. The Data API Two choices – RPC-based API – CQL • Cassandra Query Language
38. Inserting Data INSERT INTO users (KEY, 'name', 'age') VALUES ('thobbs', 'Tyler', 24);
39. Updating Data Updates are the same as inserts: INSERT INTO users (KEY, 'age') VALUES ('thobbs', 34); Or UPDATE users SET 'age' = 34 WHERE KEY = 'thobbs';
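
The point of slide 39, that an update is just another insert, can be modeled with a dictionary of rows. The `upsert` helper is hypothetical and ignores timestamps and replication; it only shows that writing a column creates it if missing and overwrites it otherwise.

```python
users: dict[str, dict[str, object]] = {}


def upsert(table: dict, key: str, **columns) -> None:
    """Create the row if needed, then set or overwrite the given columns."""
    table.setdefault(key, {}).update(columns)


upsert(users, "thobbs", name="Tyler", age=24)  # plays the role of the INSERT
upsert(users, "thobbs", age=34)                # plays the role of the UPDATE
print(users)  # {'thobbs': {'name': 'Tyler', 'age': 34}}
```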
40. Fetching Data Whole row select: SELECT * FROM users WHERE KEY = 'thobbs';
41. Fetching Data Explicit column select: SELECT 'name', 'age' FROM users WHERE KEY = 'thobbs';
42. Fetching Data Get a slice of columns UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e' WHERE KEY = 'key'; SELECT 1..3 FROM letters WHERE KEY = 'key'; Returns [(1, a), (2, b), (3, c)]
43. Fetching Data Get a slice of columns SELECT FIRST 2 FROM letters WHERE KEY = 'key'; Returns [(1, a), (2, b)] SELECT FIRST 2 REVERSED FROM letters WHERE KEY = 'key'; Returns [(5, e), (4, d)]
44. Fetching Data Get a slice of columns SELECT 3.. FROM letters WHERE KEY = 'key'; Returns [(3, c), (4, d), (5, e)] SELECT FIRST 2 REVERSED 4.. FROM letters WHERE KEY = 'key'; Returns [(4, d), (3, c)]
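
The slice queries on slides 42–44 follow from a row being a sorted list of (name, value) columns. The function below is an illustrative model of those semantics in Python, not a client API: `slice_columns` and its parameters are made up for the example, and the row is a plain dict.

```python
def slice_columns(row, start=None, end=None, first=None, reverse=False):
    """Return (name, value) pairs from a row, modeling the slice queries.

    With reverse=True the scan runs from high names to low, so `start` is
    the highest name returned and `end` the lowest.
    """
    def keep(name):
        if reverse:
            return ((start is None or name <= start)
                    and (end is None or name >= end))
        return ((start is None or name >= start)
                and (end is None or name <= end))

    columns = sorted(row.items(), reverse=reverse)
    selected = [(name, value) for name, value in columns if keep(name)]
    return selected if first is None else selected[:first]


letters = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}

# SELECT 1..3
print(slice_columns(letters, start=1, end=3))   # [(1, 'a'), (2, 'b'), (3, 'c')]
# SELECT FIRST 2
print(slice_columns(letters, first=2))          # [(1, 'a'), (2, 'b')]
# SELECT FIRST 2 REVERSED
print(slice_columns(letters, first=2, reverse=True))  # [(5, 'e'), (4, 'd')]
# SELECT 3..
print(slice_columns(letters, start=3))          # [(3, 'c'), (4, 'd'), (5, 'e')]
# SELECT FIRST 2 REVERSED 4..
print(slice_columns(letters, start=4, first=2, reverse=True))  # [(4, 'd'), (3, 'c')]
```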
45. Deleting Data Delete a whole row: DELETE FROM users WHERE KEY = 'thobbs'; Delete specific columns: DELETE 'age' FROM users WHERE KEY = 'thobbs';
46. Secondary Indexes Built-in basic indexes CREATE INDEX ageIndex ON users (age); SELECT name FROM users WHERE age = 24 AND state = 'TX';
47. Performance Writes – 10k – 30k per second per node – Sub-millisecond latency Reads – 1k – 10k per second per node – Depends on data set, caching – Usually 0.1 to 10ms latency
48. Other Features Distributed Counters – Can support millions of high-volume counters Excellent Multi-datacenter Support – Disaster recovery – Locality Hadoop Integration – Isolation of resources – Hive and Pig drivers Compression
49. What Cassandra Can't Do Transactions – Unless you use a distributed lock – Atomicity, Isolation – These aren't needed as often as you'd think Limited support for ad-hoc queries – Know what you want to do with the data
50. Not One-size-fits-all Use alongside an RDBMS – Use the RDBMS for highly-transactional or highly-relational data • Usually a small set of data – Let Cassandra scale to handle the rest
51. Language Support Good: – Java – Python – Ruby – PHP – C# Coming Soon: – Everything else, now that we have CQL
52. Questions? Tyler Hobbs @tylhobbs tyler@datastax.com
