Category: HBASE
A Deep Dive Into Couchbase N1QL Query Optimization
Feed: Planet NoSQL. Author: Keshav Murthy. [Reposting of the article published with Sitaram Vemulapalli on DZone. https://dzone.com/articles/a-deep-dive-into-couchbase-n1ql-query-optimization] SQL is the declarative language for manipulating data in a relational database system. N1QL is the declarative language for JSON data and metadata. Similar to SQL, N1QL has DML statements to manipulate JSON data: SELECT, INSERT, UPDATE, DELETE, MERGE, EXPLAIN. It also introduces a new statement, INFER, which samples the data to describe the schema and show data samples. Execution of a N1QL query by the engine involves multiple steps. Understanding these steps will help you write queries, design for performance, and tune the query engine efficiently. The ... Read More
How to Speed Up Spatial Search in Couchbase N1QL
Feed: Planet NoSQL. Author: Keshav Murthy. Location-based services like Yelp, Uber, and Foursquare are ubiquitous. This article shows an easy way to use Couchbase N1QL to issue spatial queries and use a GSI index to speed up those queries to meet your SLAs. Whether you just finished a long SF Giants extra-innings win or spent the afternoon running up and down Lombard Street in San Francisco, sometimes you just want a cold one, really fast. When you search on your app, you expect the answer quickly, with ratings, distance, and all. Thanks to GPS, triangulation, and IP-based location detection, apps can determine your current location. Google does ... Read More
More than LIKE: Efficient JSON search with Couchbase N1QL
Feed: Planet NoSQL. Author: Keshav Murthy. [Reposting of my article at DZone: https://dzone.com/articles/more-than-like-efficient-json-search-with-couchbas] Enterprise applications require both exact search (equality, range predicate) and pattern search (LIKE or text search). The B-Tree based indexes in databases help perform an exact search in milliseconds. Pattern search is a whole different problem. Pattern predicates (name LIKE "%turin%") scan the whole index, which is unsuitable for application use. Recognizing the need for speed, we've introduced a TOKENS() function in Couchbase 4.6 which returns an array of tokens from each field or document. We then index the array using the array indexing in Couchbase N1QL. This transforms the pattern search problem ... Read More
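The idea in the excerpt above, turning a pattern search into an exact lookup over indexed tokens, can be sketched in plain Python. This is only an illustration under stated assumptions: the `tokens` and `build_index` helpers, the tokenization rule, and the sample hotel documents are all hypothetical, and none of this is Couchbase's actual TOKENS() or array index implementation.

```python
import re
from collections import defaultdict


def tokens(value):
    """Collect lowercase word tokens from every string in a JSON-like value.

    Hypothetical stand-in for a tokenizer; the real TOKENS() rules may differ.
    """
    found = set()
    if isinstance(value, str):
        found.update(re.findall(r"[a-z0-9]+", value.lower()))
    elif isinstance(value, dict):
        for item in value.values():
            found |= tokens(item)
    elif isinstance(value, list):
        for item in value:
            found |= tokens(item)
    return found


def build_index(docs):
    """Map each token to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, doc in docs.items():
        for tok in tokens(doc):
            index[tok].add(key)
    return index


# Made-up sample documents, keyed by document ID.
docs = {
    "h1": {"name": "Hotel Turin Palace", "city": "Turin"},
    "h2": {"name": "Grand Hotel Milano", "city": "Milan"},
}

index = build_index(docs)

# An exact lookup on one token replaces a scan over every name.
matches = sorted(index.get("turin", set()))
print(matches)
```

Note the trade-off in this sketch: a token lookup matches whole tokens only, so it does not find "turin" inside a longer word the way a LIKE "%turin%" predicate would.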
SPLIT and CONQUER: Efficient String Search With N1QL in Couchbase

Feed: Planet NoSQL. Author: Keshav Murthy. [Repost of my article: https://dzone.com/articles/split-and-conquer-efficient-string-search-with-n1q] Consider my DZone article: Concurrency Behavior: MongoDB vs. Couchbase. It’s a 10+ page article. For any application, indexing and supporting searching within the full text is a big task. The best practice for search is to have tags or labels for each article and then search on those tags. For this article, tags are: CONCURRENCY, MONGODB, COUCHBASE, INDEX, READ, WRITE, PERFORMANCE, SNAPSHOT, and CONSISTENCY. Let’s put this into a JSON document. Document Key: "k3" { "tags": "CONCURRENCY,MONGODB,COUCHBASE,INDEX,READ,WRITE,PERFORMANCE,SNAPSHOT,CONSISTENCY", "title": "Concurrency Behavior: MongoDB vs. Couchbase" } It's one thing to store ... Read More
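As a rough illustration of searching on tags rather than full text, the comma-separated "tags" string from the document above can be split into individual tags and used as lookup keys. The sketch below is plain Python, not N1QL; the `split_tags` helper and the in-memory `tag_index` are assumptions made for the example and say nothing about how Couchbase stores or indexes the data.

```python
from collections import defaultdict

# The document from the excerpt, stored under key "k3".
doc_key = "k3"
doc = {
    "tags": "CONCURRENCY,MONGODB,COUCHBASE,INDEX,READ,WRITE,"
            "PERFORMANCE,SNAPSHOT,CONSISTENCY",
    "title": "Concurrency Behavior: MongoDB vs. Couchbase",
}


def split_tags(tag_string, sep=","):
    """Split a delimited tag string into a list of non-empty, trimmed tags."""
    return [tag.strip() for tag in tag_string.split(sep) if tag.strip()]


# Hypothetical in-memory index: tag -> list of document keys carrying it.
tag_index = defaultdict(list)
for tag in split_tags(doc["tags"]):
    tag_index[tag].append(doc_key)

# Finding articles by tag is now an exact lookup on one tag.
print(tag_index.get("COUCHBASE", []))
```

Storing the tags as one string keeps the document compact, but a search then has to look inside the string; splitting it first is what makes each tag individually addressable.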
COUNT and GROUP Faster With N1QL

Feed: Planet NoSQL. Author: Keshav Murthy. [Repost from my article at DZone: https://dzone.com/articles/count-amp-group-faster-using-n1ql] Humans have counted things for a very long time. In database applications, COUNT() is frequently used in various contexts. COUNT performance affects both application performance and user experience. Keeping this in mind, Couchbase supports generalized COUNT() and has improved its performance in the Couchbase 4.5 release. Two Couchbase 4.5 Features: Two features in Couchbase 4.5 help COUNT performance. When the query is interested only in the COUNT of a range of data that’s indexed, the indexer does the counting itself. In other words, the query pushes the counting ... Read More
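To make the first feature concrete: when the keys of interest are already held in sorted order by an index, a range count can be computed from the index alone, without touching the documents. The Python sketch below uses the standard `bisect` module on a sorted list as a stand-in for an index; the `count_in_range` helper and the sample ages are hypothetical, and this is not how the Couchbase indexer is implemented.

```python
from bisect import bisect_left, bisect_right


def count_in_range(sorted_keys, low, high):
    """Count keys k with low <= k <= high using two binary searches.

    No key in the range is visited individually; only the two
    boundary positions in the sorted list are located.
    """
    return bisect_right(sorted_keys, high) - bisect_left(sorted_keys, low)


# Made-up indexed field values (for example, an "age" field).
ages = [41, 23, 35, 23, 58, 35, 35, 19]

# A sorted list stands in for the index on that field.
index_keys = sorted(ages)

n = count_in_range(index_keys, 23, 35)
print(n)
```

The point of the sketch is where the work happens: the count comes from the index structure itself, so the query layer receives one number instead of every qualifying entry.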
What’s in a New York Name? Unlock data.gov Using N1QL
Feed: Planet NoSQL. Author: Keshav Murthy. [Reposting my article from https://dzone.com/articles/json-files-whats-in-a-new-york-name-unlocking-data] Data.gov, started in 2009, has about 189,000 datasets. Data is published in XML, CSV, JSON, HTML, PDF, and other formats. Data.gov aims to improve public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government. Lots of this data comes from the Socrata database. They also provide Socrata APIs to retrieve the subset of the data that you need. Data is valuable. Insights are more valuable. Instead of working with a trickle of data, let’s load all the data and analyze it. We start this series using a dataset on a simple ... Read More
Try the latest innovations in the Apache Hadoop ecosystem with Hortonworks 2.5 Sandbox – Hortonworks

Feed: Hortonworks Blog – Hortonworks. Author: Rafael Coss. It’s never been easier to get started with Apache Hadoop. The Hortonworks Sandbox combines 100% open-source Apache Hadoop and its data access engines (Apache Spark, Apache Hive, Apache HBase, Apache Solr, Apache Pig) with enterprise-grade Operations (Apache Ambari), Security (Apache Ranger and Apache Knox) and Governance (Apache Atlas). The Sandbox also provides tools for DevOps, exploration (Ambari Views) and web notebook development (Apache Zeppelin). Learn more about the key enhancements in HDP 2.5. The Hortonworks Sandbox provides the fastest onramp to Apache Hadoop and the extended ecosystem with an easy-to-use, integrated learning environment ... Read More
SQL on Twitter: Analysis Made Easy Using N1QL
Feed: Planet NoSQL. Author: Keshav Murthy. [This is the article published on DZone: https://dzone.com/articles/sql-on-twitter-twitter-analysis-made-easy] "If I had more time, I would have written a shorter letter" - Blaise Pascal. There have been lengthy articles on analyzing Twitter data. From Cloudera: here, here, and here. More from Hortonworks here and here. This one from Couchbase is going to be short, save the examples and results. Step 1: Install Couchbase 4.5. Use the Couchbase console to create a bucket called Twitter, and CREATE PRIMARY INDEX on Twitter using the query workbench or cbq shell. CREATE PRIMARY INDEX ON twitter; Step 2: Request your Twitter archive. Once you receive it, unzip it. (You can use larger Twitter archives as well): cd ... Read More
KEEP CALM and JSON

Feed: Planet NoSQL. Author: Keshav Murthy. JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for software to parse and generate. Here’s a JSON representation of customer data. This has served well for interchange and integration. So far, so good. As long as JSON was used for data interchange between multiple layers of an application, everything was good. The moment people started using JSON as the database storage format, all hell broke loose. When I first saw JSON as the data format in a database, I was surprised databases ... Read More
Is Oracle’s Larry Ellison Wrong on Object Stores?

Feed: Planet NoSQL. Author: Keshav Murthy. It's All About the DATA. Here's Larry Ellison's critique of Workday. "Workday does not use a database, they use an object store. So, they can't really do reporting. They use flash. So, they can't run on iPhones or iPads. Besides that, they're great!" Here's an overview of the Workday application architecture (from Workday). Workday uses an object model and stores the objects in a MySQL table as blobs instead of normalizing and storing the data in multiple tables, tuples, and columns. An object store makes object-relational mapping and object disassembly/assembly unnecessary. Because you store objects, you lose some benefits of RDBMS: query ... Read More