Abstracts

Nathan Marz – A call for sanity in NoSQL

The techniques, algorithms, and technologies for managing data have progressed greatly since the advent of the relational database 40 years ago, yet applications are getting HARDER to build, not easier. NoSQL provides scalability at the cost of sanity. The complexities we face as software engineers run deep inside the industry. We have become so accustomed to painful schema implementations that people consider "schemaless" to be a feature. We use eventually consistent databases that require extremely intricate read-repair algorithms in order to work properly, and we fail to see that this is a huge red flag that something is seriously wrong. Finally, we build systems that fall apart due to the slightest human mistake. That anyone – and seemingly everyone – runs systems with any of these complexities is insane. Sanity is within your reach, but you need to work for it. You need to stop placing the relational database on a holy pedestal and instead think about data systems from first principles. It turns out the technology is already here to build data systems that are scalable AND easy to reason about. In this talk, you'll learn how.

view the slides

Christian Kvalheim – A Journey through the MongoDB Internals

Join us on a voyage into the MongoDB internals and learn how the database actually works under the covers, how the data is structured and how it leverages memory mapping to its advantage. Learn how write concerns, replication and sharding work under the covers.
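As background for the memory-mapping theme, here is a minimal, hedged sketch of the mechanism itself: a file mapped into the process address space, with writes landing in memory and the OS (or an explicit flush) pushing pages to disk. This illustrates the general technique MongoDB's classic storage engine relied on, not MongoDB's actual code; the file name and sizes are arbitrary.

```python
import mmap
import os
import tempfile

# Create a pre-sized file, loosely analogous to a pre-allocated data file.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * 4096)
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)  # map the whole file into memory
        mm[0:5] = b"hello"             # a "write" is just a memory store
        mm.flush()                     # force dirty pages to disk
        data = bytes(mm[0:5])
        mm.close()
finally:
    os.remove(path)
```

The appeal of this design is that caching and eviction are delegated to the operating system's virtual memory subsystem; the trade-off is that the database has little direct control over when pages are written back.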

view the slides

Clinton Gormley – Getting down and dirty with Elasticsearch

We’ve had talks about how Elasticsearch can scale search and analytics to hundreds of nodes and terabytes of data. We know it is up to the job, big or small. The question now is: How do I use it? How do I go from a list of requirements to a functioning, performant application? This talk will take you from the beginning — how to get your data into Elasticsearch — through the process of tweaking the schema to support your search requirements. We will talk about how to construct complex queries using the query DSL, and the tools available for understanding search results. Finally, we will talk about using facets to help your users navigate your data. By the end of the talk, you should have a solid understanding of the tools available to you in Elasticsearch.
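To give a flavour of the query DSL mentioned above, here is a hedged sketch (not taken from the talk) of a request body combining a full-text match, an exact-value constraint, and a terms facet for navigation. The field names ("title", "status", "tags") and values are hypothetical.

```python
def build_search_body(text, status):
    """Build a bool query: full-text match on title plus an exact
    status constraint, with a terms facet over tags for navigation."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": text}},   # scored full-text clause
                    {"term": {"status": status}}, # exact-value clause
                ]
            }
        },
        # Facets summarise the matching documents, e.g. counts per tag.
        "facets": {"tags": {"terms": {"field": "tags"}}},
    }

body = build_search_body("nosql matters", "published")
```

The same dictionary would be sent as the JSON body of a search request against an index.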

view the slides

Daniel Villatoro – Cicerone: A Real-Time social venue recommender

Smart devices with information-sharing capabilities anytime and anywhere have opened up a wide range of ubiquitous applications. Within urban environments citizens have a plethora of locations to choose from, and with the advent of the smart-cities paradigm it is the role of location-based recommender systems to provide citizens with adequate suggestions. In this work we present the design of an in-situ, location-based, scalable recommender system, where venue recommendations are built upon the user’s location at request time, while also incorporating the social dimension and the expertise of neighbouring users into the recommendations. Moreover, we propose a highly scalable architecture that builds on NoSQL technologies such as MongoDB and Neo4j. Our system constructs its knowledge base from accessible social data; this implies working with data streams and huge amounts of historical data, which is why the project is designed as a Big Data-ready platform.

view the slides

David Mytton – NoSQL Infrastructure

NoSQL databases are often touted for their performance, and whilst it’s true that they usually offer great performance out of the box, much still depends on how you deploy your infrastructure. Dedicated vs cloud? In memory vs on disk? Spindle vs SSD? Replication lag? Multi data centre deployment? This talk will consider all the requirements of a successful, high-performance infrastructure, with hints and tips that can be applied to any NoSQL technology. It will cover things like OS tweaks, disk benchmarks, replication, monitoring and backups.

view the slides

Doug Turnbull – Database History from Codd to Brewer and Beyond

There are innumerable technical lessons to learn from database history. It’s easy to go with what’s new and trendy. It’s harder to appreciate the technical reasons why one approach suddenly became more favored than another. History highlights the limitations and power behind database solutions. If we don’t learn from history we are doomed to repeat it: – What were the first databases like (Codasyl, etc)? Why did they start out this way? – Why was the RDBMS the right technical response to the non-RDBMS databases back in the day? – Why is the move away from the RDBMS to NoSQL the right technical solution for many problems today? A great introduction to the basic technical scaffolding and historical context for NoSQL; after this talk, you’ll have a deeper appreciation of the transition from vertically scaling Big Metal to horizontally scaling Big Data.

view the slides

Gianmarco De Francisci Morales – SAMOA: A Platform for Mining Big Data Streams

Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends helping to improve their performance. In this talk, we present SAMOA, an upcoming platform for mining big data streams. SAMOA is a platform for online mining in a cluster/cloud environment. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as S4 and Storm. SAMOA includes algorithms for the most common machine learning tasks such as classification and clustering.

view the slides

Iván de Prado – Splout SQL: Web-latency SQL View for Hadoop

There are many Big Data problems whose output is also Big Data. In this presentation we will show Splout SQL, which allows serving an arbitrarily big dataset by partitioning it. Splout is to Hadoop + SQL what Voldemort or ElephantDB are to Hadoop + Key/Value. When the output of a Hadoop process is big, there isn’t a satisfying solution for serving it. Splout decouples database creation from database serving and makes it efficient and safe to deploy Hadoop-generated datasets. Splout is not a “fast analytics” engine. Splout is made for demanding web or mobile applications where query performance is critical. On top of that, Splout is scalable, flexible, RESTful & open-source.
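The core idea of serving a big dataset by partitioning it can be sketched in a few lines: each row is routed to a shard by hashing its partition key, so that at query time only one shard needs to be consulted. This is a generic illustration of hash partitioning, not Splout SQL's actual implementation; keys and shard count are made up.

```python
import hashlib

def partition_for(key, n_partitions):
    """Deterministically map a partition key to one of n shards.
    The same key always lands on the same shard, so lookups for
    that key only touch one partition."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_partitions

# Route some example rows into 4 shards at dataset-generation time.
rows = [("alice", 1), ("bob", 2), ("carol", 3)]
shards = {}
for key, value in rows:
    shards.setdefault(partition_for(key, 4), []).append((key, value))
```

Because the partitioning is computed offline, when the dataset is generated, deployment becomes a matter of shipping each pre-built partition to its serving node.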

view the slides

Javier Ramirez – API Analytics with Redis and Bigquery

At teowaki we have a system for API usage analytics, using Redis as a fast intermediate store and BigQuery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in just milliseconds, and we can find usage patterns that wouldn’t be obvious otherwise. In this session I will talk about the alternatives we evaluated and how we are using Redis and BigQuery to solve our problem.
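The "fast intermediate store" pattern can be sketched as a two-speed pipeline: the request path pushes one small event onto a buffer, and a background job drains the buffer in batches for bulk loading into the analytics backend. This is a hedged, in-memory illustration of the pattern only: a real deployment would use a Redis list (LPUSH/RPOP via a client library) instead of the `deque` below, and would stream each flushed batch to BigQuery.

```python
from collections import deque

# In-memory stand-in for a Redis list holding pending usage events.
buffer = deque()

def record_api_call(endpoint, status):
    """Fast path: push one usage event onto the intermediate store.
    This is the only work done on the request path."""
    buffer.append({"endpoint": endpoint, "status": status})

def flush_batch(max_items=1000):
    """Slow path: drain up to max_items events for bulk loading
    into the big data backend."""
    batch = []
    while buffer and len(batch) < max_items:
        batch.append(buffer.popleft())
    return batch

for _ in range(3):
    record_api_call("/v1/users", 200)
batch = flush_batch()
```

Decoupling ingestion from loading this way keeps request latency low while still feeding an analytics store built for large scans rather than many tiny writes.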

view the slides

Jeroen Reijn – Realtime visitor analysis with Couchbase and Elasticsearch

The time when Web Content Management products were just for managing “static” content is long gone. Right now the WCM market is all about delivering relevant content. Information about each individual visitor needs to be stored and processed in real time to be able to deliver the most relevant content to each individual visitor. This brings new and interesting challenges, which can be solved with today’s emerging technologies. During this presentation I will go into detail about how we’re building a high-performance relevance platform at Hippo with Couchbase and Elasticsearch. The talk will also cover why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics. I will share how we integrate and leverage both products full-circle from within our Hippo CMS product.

view the slides

Joel Jacobsen – Killing pigs and saving Danish bacon with Riak

NoSQL offers us a multitude of new database options. Choosing the right one can mean the difference between the success or failure of your project. In this talk we’ll look at four vastly different organisations and why they chose Riak. At the end of this talk you should have an idea of when you’d use Riak and, just as importantly, when not to use Riak.

view the slides

Lucas Dohmen – ArangoDB – a different approach to NoSQL

ArangoDB is an open source NoSQL database, offering developers lots of flexibility. While still being schema-free, the database automatically recognizes similarities in data, thus allowing storage space reductions. ArangoDB is primarily useful for storing semi-structured documents, making it an ideal backing store for JSON data. Modelling and traversal of graph data structures are also supported with dedicated graph functionality. The database also comes with AQL, a query language for complex data retrieval operations including joins, subqueries etc. Additionally, the database provides ACID transactions for executing multi-document/multi-collection operations. ArangoDB also allows custom JavaScript functionality to be deployed server-side, allowing developers to provide easy-to-use APIs with dedicated control over the data.

view the slides

Michael Hausenblas – Harnessing the Internet of Things with NoSQL

The emerging Internet of Things (IoT) comes in different shapes, and one can easily claim that it has already started to penetrate our everyday life: from consumer devices such as smartphones and wearables, through buildings, to entire cities where sensors are increasingly deployed; estimates range from 40 billion to 50 billion IoT devices by 2020. In order to manage the data torrent from IoT devices we need to rethink our data processing options. Hence, in this talk we will discuss the requirements to capture, store and process data from IoT devices. We will also look into real-world use cases and deployments and make the case why NoSQL solutions are a great fit for harnessing the IoT.

view the slides

Michel Müller – Building information systems using rapid application development methods

view the slides

Niklas Bjorkman – Big Memory – Scale-in vs. Scale-out

A single PC is capable of handling over a hundred billion instructions per second. That is more than a hundred thousand IBM mainframes of the era when Oracle v2 was launched. A single high-end network card can handle 10 Gbps, more than the total output of Wikipedia. Are we really deploying all of this power? This presentation will combine a practical hands-on demonstration with a visionary outlook:
• Disk and communication latency is still high. Can we remove it?
• RAM is a hundred thousand times faster than disk, and prices have dropped from $1 to 0.5 cents.
• If datacenters move into a single box with Big RAM, how many users can it serve? The answer might surprise you.
• What are the downsides?
• A practical example of how to serve millions of web users with real-time web database requests using a single PC.

view the slides

Pablo Enfedaque and Javier Arias – Sprayer: low latency, reliable multichannel messaging for Telefonica Digital

At Telefonica Product Development and Innovation we are developing an internal messaging service to be used by our own products. Sprayer is a low latency, reliable messaging system supporting delivery of messages to a single receiver, a predefined group of receivers or a specific list of receivers over different channels (SMS, HTTP, WebSockets, Email, Android, iOS and Firefox OS native push…). We are using Redis, MongoDB and RabbitMQ to implement Sprayer. In this talk we will review Sprayer’s architecture. For each of these technologies we will see why, where and how it is used, as well as some performance figures.

view the slides

Patrick Heneise – Bringing NoSQL to your mobile!

Mobile devices are the preferred means of data access today, but databases are stuck in the mainframe era. The NoSQL document model can be leveraged for off-line synchronization. See example code to quickly get up to speed building off-line capable applications for major mobile platforms, and learn how you can contribute to the open source projects behind this movement.

view the slides

Rubén Casado – Lambdoop, a framework for easy development of Big Data applications

Most of the existing Big Data technologies are focused on managing large amounts of static data (e.g. Hadoop, Hive, Pig). On the other hand, trending approaches try to deal with real-time processing of dynamic data (e.g. Storm, S4). Batch processing of massive static data produces strong results, since it can take more information into account and, for example, perform better training of predictive models. But batch processing takes time and is not feasible for domains where response time is a critical issue. Real-time processing solves this issue, but it uses a weaker approach where the analysed information is limited in order to achieve low latency. Many domains require the benefits of both the batch and the real-time processing approaches. It is not easy to develop a software architecture by tailoring suitable technologies, software layers, data sources, data storage solutions, smart algorithms and so on into a good, scalable solution. That is where Lambdoop comes in. Lambdoop is a software framework for easing the development of Big Data applications by combining real-time and batch processing approaches. It implements a Lambda-based architecture that provides an abstraction layer to developers. Developers do not have to deal with different technologies, configurations, data formats …: they just use the Lambdoop framework as the only needed API. Lambdoop also includes extra tools such as input/output drivers, visualization tools, cluster management tools and widely accepted AI algorithms. To evaluate the effectiveness of Lambdoop we have applied our framework to different real scenarios: 1) analysis and prediction of air quality data; 2) social-network-based identification of emergent situations; and 3) quantum chemistry molecular dynamics simulations. The conclusions of these evaluations provide good feedback for improving the development of the framework.
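The Lambda idea of combining both approaches can be sketched very compactly: a query answers from the precomputed batch view plus the small real-time view accumulated since the last batch run. This is a generic illustration of the Lambda architecture, not Lambdoop's API; the pollutant names and counts are invented.

```python
def merge_views(batch_view, realtime_view):
    """Combine the batch layer's precomputed view with the speed
    layer's incremental view, so a query sees complete results
    without waiting for the next batch job."""
    merged = dict(batch_view)
    for key, delta in realtime_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged

batch_view = {"pm10": 120, "no2": 45}   # counts from the last batch job
realtime_view = {"pm10": 3, "o3": 1}    # counts arrived since that job
merged = merge_views(batch_view, realtime_view)
```

The batch layer periodically recomputes its view from all historical data and the real-time view is then reset, which keeps the low-latency path small and the authoritative computation simple.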

view the slides

Stefan Armbruster – Introduction to Graph Databases

Join this presentation for a high-level introduction to graph databases. This talk demonstrates how graph databases fit within the NOSQL space, and where they are most appropriately used. In this session you will learn: * Overview of NOSQL * Why graphs matter * Overview of Neo4j
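As a hedged taste of "why graphs matter": the queries graph databases excel at are hop-by-hop expansions, such as friends-of-friends. The toy traversal below over a hypothetical social graph (names and edges invented) shows the shape of such a query; a graph database like Neo4j evaluates this kind of expansion natively rather than via relational joins.

```python
from collections import deque

# Adjacency-list representation of a tiny directed social graph.
graph = {"ann": ["bob"], "bob": ["carol", "dan"], "carol": [], "dan": []}

def within_hops(start, hops):
    """Return every node reachable from start in at most `hops` edges,
    via breadth-first expansion."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # don't expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen - {start}
```

In a relational schema the same question requires one self-join per hop, which is why variable-depth relationship queries are the classic motivating case for graph databases.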

view the slides

Steffen Krause – DynamoDB – on-demand NoSQL scaling as a service

Scaling a distributed NoSQL database and making it resilient to failure can be hard. With Amazon DynamoDB, you just specify the desired throughput and consistency level, then upload your data; DynamoDB does all the heavy lifting for you. Come to this session to get an overview of an automated, self-managed key-value store that can seamlessly scale to hundreds of thousands of operations per second.
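"Specify the desired throughput" concretely means declaring provisioned read and write capacity when creating a table. As a hedged sketch, the dictionary below mirrors the parameters a table-creation call takes (in the shape used by AWS SDKs such as boto3); no AWS call is made, and the table name and capacity numbers are invented.

```python
# Declarative table definition: key schema plus desired throughput.
table_spec = {
    "TableName": "sessions",
    "KeySchema": [
        {"AttributeName": "session_id", "KeyType": "HASH"},  # partition key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "session_id", "AttributeType": "S"},  # string key
    ],
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 500,   # desired reads per second
        "WriteCapacityUnits": 200,  # desired writes per second
    },
}
```

Everything else (partitioning, replication, failure handling) is the service's job; scaling up is a matter of raising the capacity numbers rather than re-architecting the cluster.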

view the slides

Uwe Friedrichsen – How to survive in a BASE world

NoSQL, Big Data and Scale-out in general are leaving the hype plateau and starting to become enterprise reality. This usually means no more ACID transactions, but BASE transactions instead. When confronted with BASE, many developers just shrug and think “Okay, no more SQL, but that’s basically it, isn’t it?”. They are terribly wrong! BASE transactions no longer guarantee data consistency at all times, a property we became so used to in the ACID years that we barely think about it anymore. But if we continue to design and implement our applications as if there were still ACID transactions, system crashes and corrupt data will become our daily companions. This session gives a quick introduction to the challenges of BASE transactions and explains how to design and implement a BASE-aware application using real code examples. Additionally, we extract some concrete patterns in order to preserve the ideas in a concise way. Let’s get ready to survive in a BASE world!
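One concrete pattern for surviving without ACID is the idempotent receiver: in a BASE world messages can be redelivered, so every state change carries an id and is skipped if already applied. The sketch below is a generic illustration of that pattern (not taken from the talk); account names, ids and amounts are invented.

```python
# State: applied operation ids and account balances.
processed = set()
balances = {"acct-1": 100}

def apply_transfer(op_id, account, amount):
    """Apply a balance change at most once. A redelivered message
    with a known op_id is ignored instead of double-applied."""
    if op_id in processed:
        return balances[account]  # already applied; no-op
    processed.add(op_id)
    balances[account] += amount
    return balances[account]

first = apply_transfer("op-1", "acct-1", 10)   # applies the change
second = apply_transfer("op-1", "acct-1", 10)  # redelivery: ignored
```

Because the second delivery is a no-op, the system converges to the same state regardless of how many times the message arrives, which is exactly the guarantee ACID used to hand us for free.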

view the slides