Nathan Marz – A call for sanity in NoSQL
The techniques, algorithms, and technologies for managing data have progressed greatly since the advent of the relational database 40 years ago, yet applications are getting HARDER to build, not easier. NoSQL provides scalability at the cost of sanity. The complexities we face as software engineers run deep inside the industry. We have become so accustomed to painful schema implementations that people consider "schemaless" to be a feature. We use eventually consistent databases that require extremely intricate read-repair algorithms in order to work properly, and we fail to see that this is a huge red flag of something seriously wrong. Finally, we build systems that fall apart due to the slightest human mistake. That anyone – and seemingly everyone – runs systems with any of these complexities is insane. Sanity is within your reach, but you need to work for it. You need to stop placing the relational database on a holy pedestal and instead think of data systems from first principles. It turns out the technology already exists to build data systems that are scalable AND easy to reason about. In this talk, you'll learn how.
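To see why Marz calls read repair a red flag, here is a deliberately simplified sketch of last-write-wins read repair across replicas. All names are illustrative; real systems need vector clocks, quorums and conflict resolution, which is exactly the intricacy the abstract warns about:

```python
# Minimal sketch of last-write-wins read repair. Each replica stores
# (timestamp, value) pairs; a read consults every replica, returns the
# newest value, and writes it back to any replica found to be stale.

class Replica:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)  # (timestamp, value) or None

    def put(self, key, versioned):
        self.store[key] = versioned


def read_with_repair(replicas, key):
    versions = [r.get(key) or (0, None) for r in replicas]
    newest = max(versions, key=lambda v: v[0])
    for r in replicas:
        if (r.get(key) or (0, None))[0] < newest[0]:
            r.put(key, newest)  # repair the stale replica on the read path
    return newest[1]
```

Even this toy version couples reads to writes; with concurrent updates and real clocks, the logic gets far hairier, which is the talk's point.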
Christian Kvalheim – A Journey through the MongoDB Internals
Join us in a voyage into the MongoDB internals and learn how the database actually works under the covers, how the data is structured and how it leverages memory mapping to its advantage. Learn how write concerns, replication and sharding work under the covers.
Clinton Gormley – Getting down and dirty with Elasticsearch
We’ve had talks about how Elasticsearch can scale search and analytics to hundreds of nodes and terabytes of data. We know it is up to the job, big or small. The question now is: How do I use it? How do I go from a list of requirements to a functioning, performant application? This talk will take you from the beginning — how to get your data into Elasticsearch — through the process of tweaking the schema to support your search requirements. We will talk about how to construct complex queries using the query DSL, and the tools available for understanding search results. Finally, we will talk about using facets to help your users navigate your data. By the end of the talk, you should have a solid understanding of the tools available to you in Elasticsearch.
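As a taste of what the talk covers, here is roughly what a query DSL body looks like when built as a plain Python dict. The field names ("title", "published", "tags") and the facet shape are hypothetical, and the exact JSON varies by Elasticsearch version:

```python
# Sketch of an Elasticsearch query DSL body: a bool query combining a
# full-text match with a date range, plus a terms facet for navigation.

def build_search(text, after_date):
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": text}},
                    {"range": {"published": {"gte": after_date}}},
                ]
            }
        },
        # Facet over a hypothetical "tags" field to drive navigation.
        "facets": {"tags": {"terms": {"field": "tags"}}},
    }
```

The body would be sent as JSON to the `_search` endpoint of an index; building it as a dict keeps the nesting explicit and easy to tweak.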
Daniel Villatoro – Cicerone: A Real-Time social venue recommender
Smart devices capable of sharing information anytime and anywhere have opened up a wide range of ubiquitous applications. Within urban environments, citizens have a plethora of locations to choose from, and with the advent of the smart-cities paradigm it is the job of location-based recommender systems to provide citizens with adequate suggestions. In this work we present the design of an in-situ, location-based, scalable recommender system, in which venue recommendations are built upon the user’s location at request time while also incorporating the social dimension and the expertise of neighbouring users. Moreover, we propose a highly scalable architecture that builds on NoSQL technologies such as MongoDB and Neo4j. Our system constructs its knowledge base from accessible social data, which means working with data streams and huge amounts of historical data; for this reason the project is designed as a Big Data-ready platform.
David Mytton – NoSQL Infrastructure
NoSQL databases are often touted for their performance, and whilst it’s true that they usually offer great performance out of the box, much still depends on how you deploy your infrastructure. Dedicated vs cloud? In memory vs on disk? Spinning disk vs SSD? Replication lag. Multi data centre deployment. This talk will consider all the infrastructure requirements of a successful high performance deployment, with hints and tips that can be applied to any NoSQL technology. It will cover things like OS tweaks, disk benchmarks, replication, monitoring and backups.
Doug Turnbull – Database History from Codd to Brewer and Beyond
There are innumerable technical lessons to learn from database history. It’s easy to go with what’s new and trendy; it’s harder to appreciate the technical reasons why one approach suddenly became more favored than another. History highlights the limitations and the power behind database solutions. If we don’t learn from history we are doomed to repeat it: – What were the first databases like (CODASYL, etc.)? Why did they start out this way? – Why was the RDBMS the right technical response to the non-RDBMS databases of its day? – Why is the move away from the RDBMS to NoSQL the right technical solution for many problems today? A great introduction to the basic technical scaffolding and historic context of NoSQL; from this talk, you’ll have a deeper appreciation of the transition from vertically scaling Big Metal to horizontally scaling Big Data.
Gianmarco De Francisci Morales – SAMOA: A Platform for Mining Big Data Streams
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. In this talk, we present SAMOA, an upcoming platform for mining big data streams. SAMOA is a platform for online mining in a cluster/cloud environment. It features a pluggable architecture that allows it to run on several distributed stream processing engines, such as S4 and Storm. SAMOA includes algorithms for the most common machine learning tasks, such as classification and clustering.
Iván de Prado – Splout SQL: Web-latency SQL View for Hadoop
There are many Big Data problems whose output is also Big Data. In this presentation we will show Splout SQL, which allows serving an arbitrarily big dataset by partitioning it. Splout is to Hadoop + SQL what Voldemort or ElephantDB are to Hadoop + Key/Value. When the output of a Hadoop process is big, there isn’t a satisfying solution for serving it. Splout decouples database creation from database serving and makes it efficient and safe to deploy Hadoop-generated datasets. Splout is not a “fast analytics” engine. Splout is made for demanding web or mobile applications where query performance is critical. On top of that, Splout is scalable, flexible, RESTful & open-source.
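The core serving idea — partition the dataset at generation time, then route each query to exactly one partition — can be sketched as follows. This is purely illustrative and not Splout SQL's actual API; the `user` key is a hypothetical partitioning column:

```python
# Sketch of serving a partitioned dataset: rows are hash-partitioned
# by key when the dataset is generated, and each lookup touches only
# the single partition that owns the key.

import hashlib

def partition_for(key, n_partitions):
    # Stable hash so generation and serving agree on the partition.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_partitions

def build_partitions(rows, key_fn, n_partitions):
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[partition_for(key_fn(row), n_partitions)].append(row)
    return parts

def query(parts, user):
    # Route the lookup to the owning partition; never scan the rest.
    part = parts[partition_for(user, len(parts))]
    return [r for r in part if r["user"] == user]
```

Decoupling `build_partitions` (a batch job) from `query` (the serving path) mirrors how Splout separates database creation from database serving.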
Javier Ramirez – API Analytics with Redis and Bigquery
At teowaki we have a system for API usage analytics, using Redis as a fast intermediate store and BigQuery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in just milliseconds, and we can try to find usage patterns that wouldn’t be obvious otherwise. In this session I will talk about the alternatives we evaluated and how we are using Redis and BigQuery to solve our problem.
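The two-tier pattern described — count hits in a fast in-memory store, then periodically flush aggregates to a big-data backend — can be sketched like this. The dict and list below are in-memory stand-ins for Redis and a BigQuery table, and all names are hypothetical:

```python
# Sketch of a hot-counter + warehouse pipeline. In a real deployment
# `counts` would be Redis hash increments (HINCRBY) and `flush` would
# batch-load rows into BigQuery.

class HitCounter:
    def __init__(self):
        self.counts = {}      # stand-in for Redis counters
        self.warehouse = []   # stand-in for a BigQuery table

    def record(self, endpoint):
        # Hot path: one cheap in-memory increment per API call.
        self.counts[endpoint] = self.counts.get(endpoint, 0) + 1

    def flush(self):
        # Cold path, run periodically: ship aggregates, reset counters.
        self.warehouse.extend(
            {"endpoint": e, "hits": n} for e, n in self.counts.items()
        )
        self.counts.clear()
```

The split keeps per-request latency in microseconds while the heavy analytical queries run against the warehouse tier.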
Jeroen Reijn – Realtime visitor analysis with Couchbase and Elasticsearch
The days when Web Content Management products were just for managing “static” content are long gone. Right now the WCM market is all about delivering relevant content. Information about each individual visitor needs to be stored and processed in real time to be able to deliver the most relevant content to that visitor. This brings new and interesting challenges, which can be solved with today’s emerging technologies. During this presentation I will go into detail about how we’re building a high performance relevance platform at Hippo with Couchbase and Elasticsearch. The talk will also cover why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics. I will share how we integrate and leverage both products full-circle from within our Hippo CMS product.
Joel Jacobsen – Killing pigs and saving Danish bacon with Riak
NoSQL offers us a multitude of new database options. Choosing the right one can mean the difference between the success and failure of your project. In this talk we’ll look at four vastly different organisations and why they chose Riak. By the end of this talk you should have an idea of when you’d use Riak and, just as importantly, when not to use Riak.
Lucas Dohmen – ArangoDB – a different approach to NoSQL
Michael Hausenblas – Harnessing the Internet of Things with NoSQL
The emerging Internet of Things (IoT) comes in different shapes, and one can easily claim that it has already started to penetrate our everyday life: from consumer devices such as smartphones and wearables, through buildings, to entire cities where sensors are increasingly deployed; estimates range from 40 billion to 50 billion IoT devices by 2020. In order to manage the data torrent from IoT devices we need to re-think our data processing options. Hence, in this talk we will discuss the requirements for capturing, storing and processing data from IoT devices. We will also look into real-world use cases and deployments and make the case for why NoSQL solutions are a great fit for harnessing the IoT.
Michel Müller – Building information systems using rapid application development methods
Niklas Bjorkman – Big Memory – Scale-in vs. Scale-out
A single PC is capable of handling over a hundred billion instructions per second. That is more than a hundred thousand IBM mainframes from when Oracle v2 was launched. A single high-end network card can handle 10 Gbps, more than the total output of Wikipedia. Are we really harnessing all of this power? This presentation will combine a practical hands-on demonstration with a visionary outlook: • Disk and communication latency is still slow. Can we remove it? • RAM is a hundred thousand times faster than disk, and prices have dropped from $1 to 0.5 cents. • If datacenters move into a single box with Big RAM, how many users can it serve? The answer might surprise you. • What are the downsides? • A practical example of how to serve millions of web users with real-time web database requests using a single PC.
Pablo Enfedaque and Javier Arias – Sprayer: low latency, reliable multichannel messaging for Telefonica Digital
At Telefonica Product Development and Innovation we are developing an internal messaging service to be used by our own products. Sprayer is a low latency, reliable messaging system supporting delivery of messages to a single receiver, a predefined group of receivers or a specific list of receivers over different channels (SMS, HTTP, WebSockets, email, Android, iOS and Firefox OS native push…). We are using Redis, MongoDB and RabbitMQ to implement Sprayer. In this talk we will review Sprayer’s architecture and see, for each of these technologies, why, where and for what it is used, as well as some performance figures.
Patrick Heneise – Bringing NoSQL to your mobile!
Mobile devices are the preferred means of data access today, but databases are stuck in the mainframe era. The NoSQL document model can be leveraged for off-line synchronization. See example code to quickly get up to speed building off-line capable applications for major mobile platforms, and learn how you can contribute to the open source projects behind this movement.
Rubén Casado – Lambdoop, a framework for easy development of Big Data applications
Most of the existing Big Data technologies focus on managing large amounts of static data (e.g. Hadoop, Hive, Pig). On the other hand, trending approaches try to deal with real-time processing of dynamic data (e.g. Storm, S4). Batch processing of massive static data provides strong results, since it can take more information into account and, for example, perform better training of predictive models. But batch processing takes time and is not feasible in domains where response time is critical. Real-time processing solves this issue, but takes a weaker approach in which the analysed information is limited in order to achieve low latency. Many domains require the benefits of both batch and real-time processing. It is not easy to develop a software architecture that tailors together suitable technologies, software layers, data sources, data storage solutions, smart algorithms and so on into a good, scalable solution. That is where Lambdoop comes in. Lambdoop is a software framework that eases the development of Big Data applications by combining real-time and batch processing approaches. It implements a Lambda-based architecture that provides an abstraction layer for developers. Developers do not have to deal with different technologies, configurations, data formats … they just use the Lambdoop framework as the only API they need. Lambdoop also includes extra tools such as input/output drivers, visualization tools, cluster management tools and widely accepted AI algorithms. To evaluate the effectiveness of Lambdoop we have applied the framework to different real scenarios: 1) analysis and prediction of air quality data; 2) social-network-based identification of emergent situations; and 3) quantum chemistry molecular dynamics simulations. The conclusions of these evaluations provide good feedback for improving the framework.
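The Lambda idea at Lambdoop's core — merging a complete-but-stale batch view with a fresh-but-partial realtime view at query time — reduces to a very small sketch. Names are illustrative, not Lambdoop's API:

```python
# Minimal sketch of a Lambda-architecture query. The batch view holds
# precomputed counts up to the last batch run; the realtime view holds
# increments seen since then. A query merges the two.

def lambda_query(batch_view, realtime_view, key):
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

def ingest(realtime_view, key):
    # Speed layer: apply each incoming event immediately.
    realtime_view[key] = realtime_view.get(key, 0) + 1
```

When the next batch job finishes, its output replaces `batch_view` and the realtime view is reset; the abstraction layer Lambdoop provides hides exactly this bookkeeping from the developer.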
Stefan Armbruster – Introduction to Graph Databases
Join this presentation for a high level introduction to graph databases. This talk demonstrates how graph databases fit within the NOSQL space, and where they are most appropriately used. In this session you will learn: * Overview of NOSQL * Why graphs matter * Overview of Neo4j
Steffen Krause – DynamoDB – on-demand NoSQL scaling as a service
Scaling a distributed NoSQL database and making it resilient to failure can be hard. With Amazon DynamoDB, you just specify the desired throughput and consistency level, upload your data, and DynamoDB does all the heavy lifting for you. Come to this session for an overview of an automated, self-managed key-value store that can seamlessly scale to hundreds of thousands of operations per second.
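Provisioned throughput can be pictured as a token bucket refilled at the rate you have specified, with requests beyond it throttled. The sketch below is illustrative only, not DynamoDB's actual throttling algorithm:

```python
# Sketch of provisioned throughput as a token bucket: each simulated
# second the bucket refills to the provisioned capacity, and each
# request consumes capacity units or is throttled.

class ProvisionedTable:
    def __init__(self, capacity_per_sec):
        self.capacity = capacity_per_sec
        self.tokens = capacity_per_sec

    def tick(self):
        # Called once per simulated second: refill to the provisioned cap.
        self.tokens = self.capacity

    def request(self, units=1):
        if self.tokens >= units:
            self.tokens -= units
            return True   # served
        return False      # throttled; a client would back off and retry
```

Raising the provisioned rate is then just constructing the table with a bigger `capacity_per_sec`, which is the knob the service exposes.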
Uwe Friedrichsen – How to survive in a BASE world
NoSQL, Big Data and scale-out in general are leaving the hype plateau and starting to become enterprise reality. This usually means no more ACID transactions, but BASE transactions instead. When confronted with BASE, many developers just shrug and think “Okay, no more SQL, but that’s basically it, isn’t it?”. They are terribly wrong! BASE transactions no longer guarantee data consistency at all times, a property we became so used to in the ACID years that we barely think about it anymore. But if we continue to design and implement our applications as if there were still ACID transactions, system crashes and corrupt data will become our daily companions. This session gives a quick introduction to the challenges of BASE transactions and explains how to design and implement a BASE-aware application using real code examples. Additionally, we extract some concrete patterns in order to preserve the ideas in a concise way. Let’s get ready to survive in a BASE world!
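One concrete BASE survival pattern is making writes idempotent, so that retrying after a partial failure cannot corrupt data. A minimal sketch, with hypothetical names not taken from the talk:

```python
# Sketch of an idempotent write: each transfer carries a unique id,
# and applying the same transfer twice (e.g. after a timeout + retry)
# has no further effect on the balance.

class Account:
    def __init__(self):
        self.balance = 0
        self.applied = set()  # ids of transfers already applied

    def credit(self, transfer_id, amount):
        if transfer_id in self.applied:
            return  # duplicate delivery: safe to ignore
        self.applied.add(transfer_id)
        self.balance += amount
```

Without the `applied` set, a client that retries a timed-out credit could double-book money; with it, "at least once" delivery behaves like "exactly once" for this operation.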