
Is there a way to delete some records based on a select query?

I have this query:

Select min(id) from ID having count(*) > 1

which shows the duplicates. I need to get those ids and delete them. How can I do it in Spark SQL?

asked Apr 28 '16 at 4:48
ashK

1 Answer


Spark SQL does not support DELETE.
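You can still use Spark SQL to collect the ids of the duplicates. A minimal sketch, assuming a SQLContext and that the data is registered as a temporary table named my_table with a numeric id column and a hypothetical name column that defines what counts as a duplicate; the result can serve as the idsToDelete placeholder below:

// Minimal sketch: collect the smallest id of every duplicated "name" value.
// "my_table", "name" and the Long type of "id" are assumptions, not part of
// the original question.
val idsToDelete: Array[Long] = sqlContext.sql(
  """SELECT min(id) AS id
    |FROM my_table
    |GROUP BY name
    |HAVING count(*) > 1""".stripMargin
).collect().map(_.getAs[Long]("id"))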

If the number of ids to delete is small, you can do it using the Cassandra driver instead of through Spark:

import scala.collection.JavaConverters._
import com.datastax.driver.core.{BatchStatement, Cluster, Session}
import com.datastax.driver.core.querybuilder.QueryBuilder

val cluster = Cluster.builder().addContactPoint(host_ip).build()
val session = cluster.connect(keyspace)

val idsToDelete = ... // perform your query and collect the ids

// build one DELETE per id and send them all as a single batch
val queries = idsToDelete.map { id =>
  QueryBuilder.delete().from(keyspace, table).where(QueryBuilder.eq("id", id))
}
val batch = new BatchStatement().addAll(queries.asJava)
session.execute(batch)

session.close()
cluster.close()
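
If the list of ids grows large, a single batch can hit Cassandra's batch size thresholds; a sketch of the same deletes issued individually and asynchronously instead of in one batch (same session, keyspace, table and idsToDelete as above):

// Fire one async DELETE per id rather than a single batch, then wait for all
// of them to finish; for very large lists you would also want to throttle how
// many requests are in flight at once.
val futures = idsToDelete.map { id =>
  session.executeAsync(
    QueryBuilder.delete().from(keyspace, table).where(QueryBuilder.eq("id", id)))
}
futures.foreach(_.getUninterruptibly())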
answered Oct 27 '16 at 15:18