05.07.15 Starcounter - NoSQL (NewSQL)

What is Starcounter?

○ It is an in-memory application server containing a NoSQL database engine, built for ultra-rapid development of high-performance business applications.
○ It was founded by Joachim Wester in 2006; the development team was led by Dan Skatov.
○ It has a RESTful web server for simple communication with remote components.
○ It supports SQL queries, which is already odd for a NoSQL server.
○ Amazingly enough, it is fully ACID compliant (ACID = Atomicity, Consistency, Isolation, Durability – the guarantee of transaction reliability within a database).
○ It includes a native .NET object API (C#, VB.NET, managed C++ etc.).
○ It supports .NET Framework 4.5.

Why are we talking about Starcounter?

Let us first start with a discussion of contemporary databases. For a year I worked at Starcounter's sister company, Heads from Stockholm, on the integration of Starcounter into their ERP system. There I had the opportunity to gain insight into the performance of this system, which raised a series of questions.

Contemporary databases, Part 1

Data processing is OMNIPRESENT, and that is really not surprising at all. From money transfers to phone contacts and basically everything stored in the "cloud", the average consumer, even unknowingly, participates in the processing of various data ON A DAILY BASIS. At the same time, software and hardware solutions for data management multiply with no end in sight. Underneath all of those applications, however, the data remains, waiting to be processed and stored in an efficient manner. The contemporary world is truly driven by data.

Let us have a look at Database Management Systems, i.e. DBMSs. According to TechTerms, a DBMS is a "software system that uses a standard method of cataloging, retrieving, and running queries on data". Established around 1970, the DBMS is a very well-known concept and a fairly well-studied field. Yet it still offers no single simple answer, only numerous open questions, and that creates interesting dynamics in the field of DBMS development. Today there are more than 150 vertical, special-purpose DBMS solutions; that is more than an average engineer can keep track of! One way to reduce this complexity is to observe DBMSs through the lens of certain dilemmas and trade-offs. No DBMS is ultimately brilliant in every field; each of them is built on compromises between various extremes.

Compromise #1 – Human-generated data versus the rest

The rapid development of the Internet has triggered exponential growth of digital data. Every day, new data is created on top of previously generated data: blocks of data about blocks of data. With so much data around, it is nearly impossible to grasp its quantity. Google CEO Eric Schmidt put it this way: "There were 5 exabytes of information created between the dawn of civilization and 2003, but that much information is now created every two days!"

Out of all this data, we will focus on data generated by humans: e-mails, videos, Internet shopping, banking transactions. We will exclude scientific data, such as photos from the Hubble telescope or measurements from the Large Hadron Collider, simply because such experiments can produce an effectively unlimited amount of information once the object of measurement is defined; the emphasis there is on building unique, special-purpose solutions for storing and processing the data. An example is MatrixNet, a machine-learning tool used by physicists at CERN. Unlike scientific data, human-generated data presents information in a form that maps naturally onto DBMS concepts and is of vast significance for everyday use.

Compromise #2 – Structured versus unstructured

The concept of structure is significant when it comes to human-generated data. Take a name, "Peter" for example: if we observe that information as a series of five letters, or as one word in natural language, it is an unstructured datum. So is most of the Internet. If, on the other hand, we consider it the value of the "Name" field in a CRM system, then it is a structured datum, since it has a defined meaning in a particular context. Another good example is business-critical transactional databases, because the content of a transaction is always strictly defined.

Although each individual data set grows exponentially, the growth exponent of structured data is several orders of magnitude smaller than that of the numerous exabytes of the unstructured Internet. In other words, structured data sets are far smaller than unstructured ones: the estimated size of Amazon's annual transactions, for example, is 56 GB. We can work with that.

Compromise #3 – Transactions versus analytics

The evolving phenomenon of Big Data has emphasized the classification of DBMSs into transactional systems in the classical sense (OLTP – online transaction processing) and systems for data analysis, which derive measurements and new observations from existing data sets (OLAP – online analytical processing). These two technologies serve two different ways of looking at data, and each solves its own class of tasks efficiently. While write conflicts are frequent during transaction processing (when two agents withdraw money from the same account simultaneously, say), analytics treats the data set as static and performs high-performance read queries over it (in many business applications, serious analysis is performed overnight, on a snapshot of the transactional data). Occasionally, as in search engines or high-frequency trading, analytics has to be performed in parallel with transactions, and such hybrid solutions are currently cutting-edge technology. In order to update hundreds of parameters concurrently and in real time, an ultra-high-performance transactional database is necessary.

Compromise #4 – In-memory versus disk-based

OLTP DBMSs attract a large share of the attention, because OLTP is considered a foundational technology for many existing and growing businesses (e.g. digital advertising or e-commerce). The pioneer of the DBMS R&D space, Dr. Michael Stonebraker, said that the best way to organize data management is to use a specialized OLTP engine over the current data. Given today's division into memory-based and disk-based databases, serious data processing no longer requires powerful disk-bound OLTP machines: around 2005 the price of RAM dropped drastically, prompting business-critical transactional databases to move into memory. Today a standard server comes with 128 GB of RAM as a matter of course, while disks remain important mainly in the field of OLAP machines. Memory-optimized databases still use disks to implement persistence and durability, but unlike disk-based DBMSs they do it in a much "softer" and more efficient manner.

Compromise #5 – Scale-in versus scale-out

In fact, this comes down to a compromise between consistent and inconsistent. Why? Partitioning data across several machines is a good way to increase speed, and many NoSQL solutions work that way (MongoDB, for example). This scaling-out practice is based on the assumption that performance grows linearly with the number of servers, which is indeed the case. However, such systems give up transactional consistency, which means the system can no longer guarantee that it is in a conflict-free state at all times. When money is transferred from one account to another, for example, the operation must either finish with the same amount arriving in the other account, or be discarded completely. A sketch of such an all-or-nothing transfer follows below.
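To make the requirement concrete, here is a minimal sketch of an atomic transfer, written with the Starcounter API that is introduced later in this article (Db.Transact runs a delegate as one transaction); the Account class and the balance check are hypothetical illustrations, not part of the original material.

using System;
using Starcounter;

[Database]
public class Account
{
    public string Owner;
    public decimal Balance;
}

public static class Transfers
{
    // Either both balance changes are committed, or neither is.
    public static void Move(Account from, Account to, decimal amount)
    {
        Db.Transact(() =>
        {
            if (from.Balance < amount)
                throw new InvalidOperationException("Insufficient funds"); // aborts and rolls back

            from.Balance -= amount;  // both writes belong to the same
            to.Balance += amount;    // atomic transaction scope
        });
    }
}

If the exception is thrown, neither account is modified; the transaction scope is what turns two separate writes into one all-or-nothing operation.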
Compromise #6 – SQL versus NoSQL

In other words, classical versus contemporary. Another way to improve performance is to abandon the rich query (SELECT) functionality of the classical DBMS. Between 2007 and 2012, "classical" DBMSs with rich SQL syntax gave way to NoSQL DBMSs: quite light, inconsistent and fast in-memory stores which sometimes offer nothing more than key-value functionality. This shift provided an alternative to the widely used, expensive corporate DBMSs. However, DBMS users now demand an ever richer syntax, one that supports operations on graphs and basic OLAP. The result is the appearance of NewSQL systems, which offer performance, consistency and SQL functionality at the same time. NewSQL is now a growing trend that is driving even greater competition and innovation in the market.

How does it work?

VMDBMS is the integration of the application runtime virtual machine (VM) with the database management system (DBMS). As a result of this integration, the data sits in one place in RAM at all times and is not copied back and forth between the database and the application. A virtual machine, such as the JVM or the .NET CLR (Common Language Runtime), hosts and executes code: when Java or C# code is compiled, the compiler produces binary code that is executed not by the operating system but by the specific virtual machine, which converts it to native code. When the application accesses data, it accesses it directly in the part of memory occupied by the database. Unlike with Entity Framework, NHibernate or other OR-mappers, objects do not carry a local copy of the data into the part of RAM occupied by the application, which means there is no serialization and deserialization of objects. The disk is still used, but only to keep the transaction log and to record snapshots of the database image.

Why is that good?

The fundamental point: moving both the model and the controller logic inside the database means less strain on the database. Moving memory around is the bottleneck! The clock cycles spent on business logic are far fewer than the clock cycles spent on communicating and moving data.

Starcounter database features

Do not think relationally! Classes are tables and instances are rows. Database objects live in the database from the beginning (the new operator):

using Starcounter;

[Database]
public class Persistent
{
    public string Name;
}
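As a minimal usage sketch (my illustration, not from the original material): since instances are rows, creating an object inside a transaction is all it takes to persist it, and the row is immediately visible to SQL. Db.Transact and Db.SQL are the standard Starcounter entry points.

using System;
using Starcounter;

class Program
{
    static void Main()
    {
        // Creating the object writes the row; there is no Save() call.
        Db.Transact(() =>
        {
            new Persistent() { Name = "Peter" };
        });

        // The new row is immediately queryable.
        var p = Db.SQL<Persistent>(
            "SELECT p FROM Persistent p WHERE p.Name = ?", "Peter").First;
        Console.WriteLine(p.Name);
    }
}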
Public fields, public auto-created properties and public properties getting and setting private fields all become columns:

using Starcounter;

[Database]
public class Person
{
    public string FirstName;
    public string LastName { get; set; }
    public string FullName { get { return FirstName + " " + LastName; } }
}

Starcounter offers relational, graph, object-oriented and document access, all rolled into one. Relations are established through object references rather than explicit keys:

[Database]
public class Quote
{
    public Person Who;

    private string _Text;
    public string Text
    {
        get { return _Text; }
        set { _Text = value; }
    }
}

The Database attribute is inherited from base classes to subclasses. Any class that directly or indirectly inherits a class with the Database attribute implicitly becomes a database class:

[Database]
public class Vehicle
{
    public int YearOfManufacturing;
}

public class Car : Vehicle   // implicitly a database class
{
    public int Wheels;
}

Every database class is available to standard SQL expressions from C#; no ORM is needed. You can use queries with path expressions (such as q.Who.City.Name) to increase performance and replace complex joins:

using System.Collections.Generic;
using Starcounter;

[Database]
public class Person
{
    public string Name;

    public IEnumerable<Quote> Quotes()
    {
        return Db.SQL<Quote>("SELECT q FROM Quote q WHERE q.Who = ?", this);
    }
}

You can exclude fields and auto-created properties from becoming columns by using the Transient custom attribute: such a member remains a regular .NET field or property, with its value stored on the CLR heap and garbage-collected as usual. A row is removed using the Delete() method (see the sketch after the list below). Starcounter supports:

○ Transaction scopes
○ Long-running transactions
○ Nested transactions
○ Parallel transactions
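A minimal sketch tying these last features together (my illustration; the Visitor class and its members are hypothetical): a [Transient] member stays on the CLR heap instead of becoming a column, Delete() removes the row, and Db.Transact provides the transaction scope.

using Starcounter;

[Database]
public class Visitor
{
    public string Name;          // becomes a column

    [Transient]
    public int RequestCount;     // regular CLR field: never persisted, garbage-collected as usual
}

class Demo
{
    static void Main()
    {
        // Transaction scope: everything inside commits atomically.
        Db.Transact(() =>
        {
            var v = new Visitor() { Name = "Peter" };
            v.RequestCount = 1;  // lives only in application memory
            v.Delete();          // removes the row again
        });
    }
}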
Author: Bojan Brankov (game design, game development, business software development, web development, board games, improv)