High Scalability

Paper: ZooKeeper: Wait-free coordination for Internet-scale systems

High Scalability - Thu, 2014-07-31 15:56

Do you really need to roll your own? ZooKeeper: Wait-free coordination for Internet-scale systems: In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high performance kernel for building more complex coordination primitives at the client. It incorporates elements from group messaging, shared registers, and distributed lock services in a replicated, centralized service. The interface exposed by Zoo-Keeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service.

 

The ZooKeeper interface enables a high-performance service implementation. In addition to the wait-free property, ZooKeeper provides a per client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. These design decisions enable the implementation of a high performance processing pipeline with read requests being satisfied byvlocal servers. We show for the target workloads, 2:1 to 100:1 read to write ratio, that ZooKeeper can handle tens to hundreds of thousands of transactions per second. This performance allows ZooKeeper to be used extensively by client applications.

ZooKeeper achieves throughput values of hundreds of thousands of operations per second for read-dominant workloads by using fast reads with watches, both of which served by local replicas. Although our consistency guarantees for reads and watches appear to be weak, we have shown with our use cases that this combination allows us to implement efficient and sophisticated coordination protocols at the client even though reads are not precedence-ordered and the implementation of data objects is wait-free. The wait-free property has proved to be essential for high performance.

Although we have described only a few applications, there are many others using ZooKeeper. We believe such a success is due to its simple interface and the powerful abstractions that one can implement through this interface. Further, because of the high-throughput of ZooKeeper, applications can make extensive use of it, not only course-grained locking.

Related Articles
Categories: High Scalability

Preventing the Dogpile Effect - Problem and Solution

High Scalability - Wed, 2014-07-30 15:56

This is a guest repost Przemek Sobstel, who believes that dogpile effect issue is not covered enough, especially in the PHP world. Orignal article: Preventing dogpile effect.

The Dogpile effect occurs when cache expires and websites are hit by numerous requests the same time. From my own experiences working on big-traffic websites this is what I consider best the best solution. It was used sucessfully in the wild and it worked. Many people mention storing two redundant values FRESH + STALE, but for big traffic websites it was killing our network. We thought it worth sharing our solution and starting a discussion for sharing experiences.

Preventing Dogpiles
Categories: High Scalability

The Great Microservices vs Monolithic Apps Twitter Melee

High Scalability - Mon, 2014-07-28 15:56

Once upon a time a great Twitter melee was fought for the coveted title of Consensus Best Way to Structure Systems. The competition was between Microservices and Monolithic Apps. 

Flying the the logo of Microservices, from a distant cloud covered land, is the Kingdom of Netflix, whose champion was Sir Adrian Cockcroft (who has pledged fealty to another). And for the Kingdom of ThoughtWorks we have Sir Sam Newman as champion.

Flying the logo of the Monolithic App is champion Sir John Allspaw, from the fair Kingdom of Etsy.

Knights from the Kingdom of Digital Ocean and several independent realms filled out the list.

To the winner goes a great prize: developer mindshare and the favor of that most fickle of ladies, Lady Luck.

May the best paradigm win.

The opening blow was wielded by the highly ranked Sir Cockcroft, a veteran of many tournaments:

Categories: High Scalability

Stuff The Internet Says On Scalability For July 25th, 2014

High Scalability - Fri, 2014-07-25 16:15

Hey, it's HighScalability time:


It's systems all the way down. Bugs That Call Us Home.
  • 1 million users in just 4 days: Yo;  30 billion: Pinterest Pins
  • Quotable Quotes:
    • @GlennF: Amazon still dreams it is a startup, like a dog dreaming of chasing rabbits, twitching its legs while asleep.
    • @mfdii: Nobody knows how git works. We all just type in commands like monkeys trying to write Shakespeare. #devopsdays 
    • Benedict Evans: When you pull these strands together, smartphones don't just increase the size of the internet by 2x or 3x, but more like 5x or 10x. It's not just how many devices, but how different those devices are, that has the multiplier effect.
    • @Aaronontheweb: @codinghorror I broke this rule for myself last week. Spent 3 days fixing a problem that we finally solved by a $0.06/hour AWS bill increase
    • Physicist George Ellis: Barring something very unforeseen – the possible tests of the very large and the very small are coming towards the limits of whatever will be possible.
    • The Master Switch: Once the industry had concluded that its profits could be maximized if more people listened to fewer stations, the government, acting as if the business of America were only business, did the industry’s bidding, showing only the most feeble awareness of its consequences for the American ideal of free expression.

  • Ex-Googlers try to recreate Spanner with CockroachDB (awesome name!), which is A Scalable, Geo-Replicated, Transactional Datastore. The design is here and looks good. There's an article on Wired. Good discussion on HackerNews. A globally distributed transactional database it's not, yet, but it's early days yet. After all, they can only work in the dark.

  • Useful post on Handling 1 Billion requests a week with Symfony2. Symfony2 provides good performance and a nice development environment. HAProxy distributes to application servers. Varnish in every application’s server to keep high availability – without having a single point of failure (SPOF). Redis and MySQL for storing data. MySQL is mostly used as a third-tier cache layer (Varnish > Redis > MySQL) for non-expiring resources. 

  • The truest form of the Interest Graph on the net? Details on how Pinterest scales their data infrastructure to create a personalized discovery engine. 20 terabytes of new data each day. 10 petabytes of data in S3. 100 regular Mapreduce users run over 2,000 jobs each day through Qubole. 6 standing Hadoop clusters comprised of over 3,000 nodes. 

  • Just like Captain Kirk. Shifts In Algorithm Design: Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful. The brilliance of the current approach is that we can change the problem. 

  • Wet Computing Could Put a Terabyte in a Tablespoon: Researchers from the University of Michigan and New York University demonstrated how plastic nanoparticles, deposited in a liquid, can form a one-bit cluster—the essential building block for information storage. It's called "wet computing," and the technique mimics other biological processes found in nature, like DNA in living cells.

  • Daniel Eloff: The world is not just going massively multicore, it's going heterogeneous core. The one core fits all model of programming is going away. Big performance and efficiency gains can be had from splitting your application among different types of specialized processors. Programmable hardware with FPGAs seems like a natural extension of this trend.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Sponsored Post: Apple, Asana, FoundationDB, Cie Games, BattleCry, Surge, Dreambox, Chartbeat, Monitis, Netflix, Salesforce, Cloudant, CopperEgg, Logentries, Gengo, Couchbase, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, ManageEngin

High Scalability - Tue, 2014-07-22 16:00

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Software Developer in Test. The iOS Systems team is looking for a Quality Assurance engineer. In this role you will be expected to work hand-in-hand with the software engineering team to find and diagnose software defects. The ideal candidate will also seek out ways to further automate all aspects of our existing process. This is a highly technical role and requires in-depth knowledge of both white-box and black-box testing methodologies. Please apply here
    • Senior Software Engineer -iOS Systems. Do you love building highly scalable, distributed web applications? Does the idea of a fast-paced environment make your heart leap? Do you want your technical abilities to be challenged every day, and for your work to make a difference in the lives of millions of people? If so, the iOS Systems Carrier Services team is looking for a talented software engineer who is not afraid to share knowledge, think outside the box, and question assumptions. Please apply here.
    • Software Engineering Manager, IS&T WWDR Dev Systems. The WWDR development team is seeking a hands-on engineering manager with a passion for building large-scale, high-performance applications. The successful candidate will be collaborating with Worldwide Developer Relations (WWDR) and various engineering teams throughout Apple. You will lead a team of talented engineers to define and build large-scale web services and applications. Please apply here.
    • C++ Senior Developer and Architect- Maps. The Maps Team is looking for a senior developer and architect to support and grow some of the core backend services that support Apple Map's Front End Services. Ideal candidate would have experience with system architecture, as well as the design, implementation, and testing of individual components but also be comfortable with multiple scripting languages. Please apply here.

  • Cie Games, small indie developer and publisher in LA, is looking for rock star Senior Game Java programmers to join our team! We need devs with extensive experience building scalable server-side code for games or commercial-quality applications that are rich in functionality. We offer competitive comp, great benefits, interesting projects, and exceptional growth opportunities. Check us out at http://www.ciegames.com/careers.

  • BattleCry, the newest ZeniMax studio in Austin, is seeking a qualified Front End Web Engineer to help create and maintain our web presence for AAA online games. This includes the game accounts web site, enhancing the studio website, our web and mobile- based storefront, and front end for support tools. http://jobs.zenimax.com/requisitions/view/540

  • FoundationDB is seeking outstanding developers to join our growing team and help us build the next generation of transactional database technology. You will work with a team of exceptional engineers with backgrounds from top CS programs and successful startups. We don’t just write software. We build our own simulations, test tools, and even languages to write better software. We are well-funded, offer competitive salaries and option grants. Interested? You can learn more here.

  • Asana. As an infrastructure engineer you will be designing software to process, query, search, analyze, and store data for applications that are continually growing in scale. You will work with a world-class team of engineers on deploying and operating existing systems, and building new ones for problems that are unique to our problem space. Please apply here.

  • Operations Engineer - AWS Cloud. Want to grow and extend a cutting-edge cloud deployment? Take charge of an innovative 24x7 web service infrastructure on the AWS Cloud? Join DreamBox Learning’s creative team of engineers, designers, and educators. Help us radically change education in an environment that values collaboration, innovation, integrity and fun. Please apply here. http://www.dreambox.com/careers

  • Chartbeat measures and monetizes attention on the web. Our traffic numbers are growing, and so is our list of product and feature ideas. That means we need you, and all your unparalleled backend engineer knowledge to help up us scale, extend, and evolve our infrastructure to handle it all. If you've these chops: www.chartbeat.com/jobs/be, come join the team!

  • The Salesforce.com Core Application Performance team is seeking talented and experienced software engineers to focus on system reliability and performance, developing solutions for our multi-tenant, on-demand cloud computing system. Ideal candidate is an experienced Java developer, likes solving real-world performance and scalability challenges and building new monitoring and analysis solutions to make our site more reliable, scalable and responsive. Please apply here.

  • Sr. Software Engineer - Distributed Systems. Membership platform is at the heart of Netflix product, supporting functions like customer identity, personalized profiles, experimentation, and more. Are you someone who loves to dig into data structure optimization, parallel execution, smart throttling and graceful degradation, SYN and accept queue configuration, and the like? Is the availability vs consistency tradeoff in a distributed system too obvious to you? Do you have an opinion about asynchronous execution and distributed co-ordination? Come join us

  • Human Translation Platform Gengo Seeks Sr. DevOps Engineer. Build an infrastructure capable of handling billions of translation jobs, worked on by tens of thousands of qualified translators. If you love playing with Amazon’s AWS, understand the challenges behind release-engineering, and get a kick out of analyzing log data for performance bottlenecks, please apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • OmniTI has a reputation for scalable web applications and architectures, but we still lean on our friends and peers to see how things can be done better. Surge started as the brainchild of our employees wanting to bring the best and brightest in Web Operations to our own backyard. Now in its fifth year, Surge has become the conference on scalability and performance. Early Bird rate in effect until 7/24!
Cool Products and Services
  • A third party vendor benchmarked the performance of three databases: Couchbase Server, MongoDB and DataStax Enterprise. The databases were benchmarked with two different workloads (read-intensive / balanced) via YCSB on dedicated servers. Read.

  • Now track your log activities with Log Monitor and be on the safe side! Monitor any type of log file and proactively define potential issues that could hurt your business' performance. Detect your log changes for: Error messages, Server connection failures, DNS errors, Potential malicious activity, and much more. Improve your systems and behaviour with Log Monitor.

  • The NoSQL "Family Tree" from Cloudant explains the NoSQL product landscape using an infographic. The highlights: NoSQL arose from "Big Data" (before it was called "Big Data"); NoSQL is not "One Size Fits All"; Vendor-driven versus Community-driven NoSQL.  Create a free Cloudant account and start the NoSQL goodness

  • Finally, log management and analytics can be easy, accessible across your team, and provide deep insights into data that matters across the business - from development, to operations, to business analytics. Create your free Logentries account here.

  • CopperEgg. Simple, Affordable Cloud Monitoring. CopperEgg gives you instant visibility into all of your cloud-hosted servers and applications. Cloud monitoring has never been so easy: lightweight, elastic monitoring; root cause analysis; data visualization; smart alerts. Get Started Now.

  • Whitepaper Clarifies ACID Support in Aerospike. In our latest whitepaper, author and Aerospike VP of Engineering & Operations, Srini Srinivasan, defines ACID support in Aerospike, and explains how Aerospike maintains high consistency by using techniques to reduce the possibility of partitions.  Read the whitepaper: http://www.aerospike.com/docs/architecture/assets/AerospikeACIDSupport.pdf.

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: High Scalability

StackOverflow Update: 560M Pageviews a Month, 25 Servers, and It's All About Performance

High Scalability - Mon, 2014-07-21 16:00

The folks at Stack Overflow remain incredibly open about what they are doing and why. So it’s time for another update. What has Stack Overflow been up to?

The network of sites that make up StackExchange, which includes StackOverflow, is now ranked 54th for traffic in the world; they have 110 sites and are growing at a rate of 3 or 4 a month; 4 million users; 40 million answers; and 560 million pageviews a month.

This is with just 25 servers. For everything. That’s high availability, load balancing, caching, databases, searching, and utility functions. All with a relative handful of employees. Now that’s quality engineering.

This update is based on The architecture of StackOverflow (video) by Marco Cecconi and What it takes to run Stack Overflow (post) by Nick Craver. In addition, I’ve merged in comments from various sources. No doubt some of the details are out of date as I meant to write this article long ago, but it should still be representative. 

Stack Overflow still uses Microsoft products. Microsoft infrastructure works and is cheap enough, so there’s no compelling reason to change. Yet SO is pragmatic. They use Linux where it makes sense. There’s no purity push to make everything Linux or keep everything Microsoft. That wouldn’t be efficient. 

Stack Overflow still uses a scale-up strategy. No clouds in site. With their SQL Servers loaded with 384 GB of RAM and 2TB of SSD, AWS would cost a fortune. The cloud would also slow them down, making it harder to optimize and troubleshoot system issues. Plus, SO doesn’t need a horizontal scaling strategy. Large peak loads, where scaling out makes sense, hasn’t  been a problem because they’ve been quite successful at sizing their system correctly.

So it appears Jeff Atwood’s quote: "Hardware is Cheap, Programmers are Expensive", still seems to be living lore at the company.

Marco Ceccon in his talk says when talking about architecture you need to answer this question first: what kind of problem is being solved?

First the easy part. What does StackExchange do? It takes topics, creates communities around them, and creates awesome question and answer sites. 

The second part relates to scale. As we’ll see next StackExchange is growing quite fast and handles a lot of traffic. How does it do that? Let’s take a look and see….

Stats
Categories: High Scalability

Stuff The Internet Says On Scalability For July 18th, 2014

High Scalability - Fri, 2014-07-18 16:01

Hey, it's HighScalability time:


The one is many. Lichen composed of possibly 400 distinct species.
  • Quotable Quotes:
    • The Master Switch: Selling radio sets—the old revenue model—was a good if limited business, for ultimately few households would need more than one radio every few years. But advertising revenues could expand indefinitely—or so it seemed then.
    • Larry Page: It’s pretty difficult to solve big problems in four years. I think it’s probably pretty easy to do it in 20 years. I think our whole system is setup in a way that makes it difficult for leaders of really big companies.

  • The Master Switch: The inventors we remember are significant not so much as inventors, but as founders of “disruptive” industries, ones that shake up the technological status quo. Through circumstance or luck, they are exactly at the right distance both to imagine the future and to create an independent industry to exploit it.

  • You thought you were clever and safe? Reality doesn't like that. The fallacy of distributed transactions: In other words, as far as the queue is concerned, the transaction committed, and the message is gone. As far as the database is concerned, that transaction was rolled back, and never happened. Of course, the chance that something like that can happen in one of your systems? Probably one in a million.

  • How do you make Cassandra 50% faster? You add batching of replies. You fix your thread pools. And you get rid of unecessary endcoding and decoding phases like with Thrift. Of course now the bottleneck has moved and the process starts again.

  • Everyday Algorithms: Elevator Allocation. Though I'm pretty sure my contextualized and personalized elevator scheduling goes something like: for him, slow as possible. And they and the crosswalk lights must be in cahoots, because I get the same fine service.

  • Simon Wardley: Anyhow, this is what I don't get. Micro services has become a big thing - good. So, why do we have to continuously create 'new' terms to describe what is already happening? < Why have new songs when they use all the same words? What changes is the context. A new binding requires a new word. It's like a linguistic method of versioning. 10 years ago all the words around that eras version of microservices would be different, so we need a new word now to reflect a new world.

  • Programmers really want the database to work as a queue. It just never works in the end. But if you are Antirez and are a programmer and have your own database then you can make that happen. Queues and databases

  • Another case of breaking one thing in to two parts and then arguing which part is more important. In reference to the debate over Linus' quote on data structures: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." < Good and evil. Light and dark. Mind and Body. Starsky and Hutch. They are defined in terms of each other and make no sense without each other. They are a single system. 

  • New russian 8-core CPU. It may not be the fastest CPU, but it won't break in the field and hardly ever jams when covered in mud or sand.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

10 Program Busting Caching Mistakes

High Scalability - Wed, 2014-07-16 16:00

While Ten Caching Mistakes that Break your App by Omar Al Zabir is a few years old, it is still a great source of advice on using caches, especially on the differences between using a local in-memory cache and when using a distributed cache.

Here are the top 10 mistakes (summarized):
  1. Relying on a default serializer. Default serializers can use a lot of CPU, especially for complex types. Give some thought to the best serialization and deserialization method for your language and environment.
  2. Storing large objects in a single cache item. Because of serialization and deserialization costs, under concurrent load, frequent access to large object graphs can kill your server's CPU. Instead, break up the larger graph into smaller subgraphs and cache them separately. Retrieve only the smallest unit you need.
  3. Using cache to share objects between threads. Race conditions, when writes are involved, develop if parts of a program are accessing the same cached items simultaneously. Some sort of external locking mechanism is needed. 
  4. Assuming items will be in cache immediately after storing them. Never assume an item will be in a cache, even after it was just written, because a cache can flush items when memory gets tight. Code should always check for a null return value from a cache.
  5. Storing entire collection with nested objects. Storing an entire collection when you need to get a particular item results in poor performance because of the serialization overhead. Cache individual items separately so they can be retrieved separately. 
  6. Storing parent-child objects together and also separately. Sometimes an object will simultaneously be contained in two or more parent objects. To not have the same object stored in two different places in the cache store it on its own under its own key. The parent objects will then read the objects when access is needed.
  7. Caching Configuration settings. Store configuration data in a static variable that is local to your process. Accessing cached data is expensive so you want to avoid that cost when possible.
  8. Caching Live Objects that have open handle to stream, file, registry, or network. Don't cache objects the have references to resources like files, streams, memory, etc. When the cached item is removed from the cache those resources will not be deleted and system resources will leak. 
  9. Storing same item using multiple keys. It can be convenient to access an item by a key and an index number.  This can work when a cache is in-memory because the cache can contain a reference to the same object which means changes to the object will be seen through both access paths. When using a remote cache any updates won't be visible so the objects will get out of sync. 
  10. Not updating or deleting items in cache after updating or deleting them on persistent storage. Items in a remote cache are stored as a copy, so updating an object won't update the cache. The cache must specifically be updated for the changes to be seen by anyone else. With an in-memory cache changes to an object will be seen by everyone. Same for deletion. Deleting an object won't delete it from the cache. It's up to the program make sure cached items are deleted correctly.
Categories: High Scalability

Bitly: Lessons Learned Building a Distributed System that Handles 6 Billion Clicks a Month

High Scalability - Mon, 2014-07-14 16:00

Have you ever wondered how bitly makes money? A URL shortener can’t be that hard to write, right? Sean O'Connor, Lead Application Developer at bitly, answers the how can bitly possibly make money question immediately in a talk he gave on bitly at the Bacon conference.

Writing a URL shortner that works is easy, says Sean, writing one that scales and is highly available, is not so easy.

Bitly doesn’t make money with a Shortening as a Service service, bitly makes money on an analytics product that mashes URL click data with with data they crawl from the web to help customers understand what people are paying attention to on the web. 

Analytics products began as a backend service that crawled web server logs. Logs contained data from annotated links along with cookie data to indicate where on a page a link was clicked, who clicked it, what the link was, etc. But the links all went back to the domain of the web site. The idea of making links go to a different domain than your own so that a 3rd party can do the analytics is a scary proposition, but it’s also kind of genius.

While this talk is not on bitly’s architecture, it is a thoughtful exploration on the nature of distributed systems and how you can solve bigger than one box problems with them.

Perhaps my favorite lesson from his talk is this one (my gloss):

SOA + queues + async messaging is really powerful. This approach isolates components, lets work happen concurrently, lets boxes fail independently, while still having components be easy to reason about.

I also really like his explanation for why event style messages are better than command style messages. I’ve never heard it put that way before.

Sean talks from a place of authentic experience. If you are trying to make a jump from a single box mindset to a multibox way of thinking, this talk is well worth watching. 

So let’s see what Sean has to say about distributed systems...

Stats
Categories: High Scalability

Stuff The Internet Says On Scalability For July 11th, 2014

High Scalability - Fri, 2014-07-11 16:01

Hey, it's HighScalability time:


Yesterday in history: Nikola Tesla's Birthday, born in 1856. The greatest geek who ever lived?
  • 10Gbps: New world record broadband speed of 10 Gbps over copper.
  • Quotable Quotes:
    • @BenedictEvans: There were 40m internet users when Netscape IPOed. The time's not far off when a startup with 40m users will be too small to get funded.
    • Scott Aaronson: In any case, the question I asked myself about CLEVER/PageRank was not the one that, maybe in retrospect, I should have asked: namely, “how can I leverage the fact that I know the importance of this idea before most people do, in order to make millions of dollars?”
    • chub79: µservices aren't technological as much as they are cultural.
    • @Elmood: I thought of a new term when talking about code: "It's made from unmaintainium."
    • @lxt: Amazing how quickly a bunch of nines go up in smoke.
    • @martinrue: Knock knock. Race condition. Who's there?

  • The Master Switch: The Rise and Fall of Information Empires by Tim Wu: History shows a typical progression of information technologies: from somebody’s hobby to somebody’s industry; from jury-rigged contraption to slick production marvel; from a freely accessible channel to one strictly controlled by a single corporation or cartel—from open to closed system. History also shows that whatever has been closed too long is ripe for ingenuity’s assault: in time a closed industry can be opened anew, giving way to all sorts of technical possibilities and expressive uses for the medium before the effort to close the system likewise begins again.

  • Tim Freeman indulges a well developed Technothantos Complex and comes up with a great big list of outage postmortems. You'll find the usual, outages from configuration issues, failover failures, quorumnesia, protocol flapping, bugs in not your stuff that causes bugs in your stuff, power outages, capacity problems, JPOBs (just plain old bugs), DDOS attacks, and good old operator error. 

  • Pinterest describes PinLater, An asynchronous job execution system. PinLater executes hundreds of different job types at a processing rate of over 100,000 per second. So you may say yet another async job system, but it's clear keeping such a critical part of their infrastructure in house makes sense. The article is a good explanation of a fairly standard approach. It used Thrift for the API, it's written in Java, Twitter’s Finagle is used for the RPC framework. MySQL is "used for relatively low throughput use cases and those that schedule jobs over long periods and thus can benefit from storing jobs on disk rather than purely in memory." Redis is "used for high throughput job queues that are normally drained in real time." Horizontal scaling is via sharding. 

  • In science class we did this one day, but I just couldn't do it. Dissecting Message Queues. Tyler Treat looks at both brokerless and brokered queues by looking a throughput benchmarks, latency benchmarks, and through qualitative analysis. No winner was declared, but if you are making a choice in this area it's well worth reading. 

  • 40 Million hits a day on WordPress using a $10 VPS. Sure, it's a static site, but still a good example of what can be done these days. Stack: Nginx + PHP-FPM (aka LEMP Stack) + Microcaching. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

High Scalability - Wed, 2014-07-09 16:00

“You just can't have it all” is a phrase that most of us are accustomed to hearing and that many still believe to be true when discussing the speed, scale and cost of processing data. To reach high speed data processing, it is necessary to utilize more memory resources which increases cost. This occurs because price increases as memory, on average, tends to be more expensive than commodity disk drive. The idea of data systems being unable to reliably provide you with both memory and fast access—not to mention at the right cost—has long been debated, though the idea of such limitations was cemented by computer scientist, Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Categories: High Scalability

Sponsored Post: Surge, Apple, Dreambox, Chartbeat, Monitis, Netflix, Salesforce, Blizzard Entertainment, Cloudant, CopperEgg, Logentries, Gengo, ScaleOut Software, Couchbase, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, ManageEngin

High Scalability - Tue, 2014-07-08 16:00

Who's Hiring?

  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here.
    • Senior Security Engineer. As a Senior Security Engineer on our team, you will be the ‘tip of the spear’ and will have direct impact on the Point-of-Sale system that powers Apple Retail globally. You will contribute to implementing standards and processes across multiple groups within the organization. You will also help lead the organization through a continuous process of learning and improving secure practices. Please apply here
    • Quality Assurance Engineer - Mobile Platforms. Apple’s Mobile Services/Emerging Technology group is looking for a highly motivated, result-oriented Quality Assurance Engineer. You will be responsible for overseeing quality engineering of mobile server and client platforms and applications in a fast-paced dynamic environment. Your job is to exceed our business customer's aggressive quality expectations and take the QA team forward on a path of continuous improvement. Please apply here.
    • Sr Software Engineer. Join Apple's Internet Applications Team, within the Information Systems and Technology group, as a Senior Software Engineer. Be involved in challenging and fast paced projects supporting Apple's business by delivering Java based IS Systems. Please apply here.
    • Senior Payment Engineer.  you will be responsible for working with cross-functional teams and developing Java server-based solutions to address business and technological needs. You will be helping design and build next generation retail solutions. You will be reviewing design and code developed by others on the team.You will build services and integrate with both internal as well as external services in a SOA environment. You will design and develop frameworks to be used by a large community of developers within the organization. Please apply here
    • Software Developer in Test. The iOS Systems team is looking for a Quality Assurance engineer. In this role you will be expected to work hand-in-hand with the software engineering team to find and diagnose software defects. The ideal candidate will also seek out ways to further automate all aspects of our existing process. This is a highly technical role and requires in-depth knowledge of both white-box and black-box testing methodologies. Please apply here
    • Senior Software Engineer -iOS Systems.Do you love building highly scalable, distributed web applications? Does the idea of a fast-paced environment make your heart leap? Do you want your technical abilities to be challenged every day, and for your work to make a difference in the lives of millions of people? If so, the iOS Systems Carrier Services team is looking for a talented software engineer who is not afraid to share knowledge, think outside the box, and question assumptions. Please apply here.

  • Asana. As an infrastructure engineer you will be designing software to process, query, search, analyze, and store data for applications that are continually growing in scale. You will work with a world-class team of engineers on deploying and operating existing systems, and building new ones for problems that are unique to our problem space. Please apply here.

  • Operations Engineer - AWS Cloud. Want to grow and extend a cutting-edge cloud deployment? Take charge of an innovative 24x7 web service infrastructure on the AWS Cloud? Join DreamBox Learning’s creative team of engineers, designers, and educators. Help us radically change education in an environment that values collaboration, innovation, integrity and fun. Please apply here. http://www.dreambox.com/careers

  • Chartbeat measures and monetizes attention on the web. Our traffic numbers are growing, and so is our list of product and feature ideas. That means we need you, and all your unparalleled backend engineer knowledge to help up us scale, extend, and evolve our infrastructure to handle it all. If you've these chops: www.chartbeat.com/jobs/be, come join the team!

  • The Salesforce.com Core Application Performance team is seeking talented and experienced software engineers to focus on system reliability and performance, developing solutions for our multi-tenant, on-demand cloud computing system. Ideal candidate is an experienced Java developer, likes solving real-world performance and scalability challenges and building new monitoring and analysis solutions to make our site more reliable, scalable and responsive. Please apply here.

  • Sr. Software Engineer - Distributed Systems. Membership platform is at the heart of Netflix product, supporting functions like customer identity, personalized profiles, experimentation, and more. Are you someone who loves to dig into data structure optimization, parallel execution, smart throttling and graceful degradation, SYN and accept queue configuration, and the like? Is the availability vs consistency tradeoff in a distributed system too obvious to you? Do you have an opinion about asynchronous execution and distributed co-ordination? Come join us

  • Java Software Engineers of all levels, your time is now. Blizzard Entertainment is leveling up its Battle.net team, and we want to hear from experienced and enthusiastic engineers who want to join them on their quest to produce the most epic customer-facing site experiences possible. As a Battle.net engineer, you'll be responsible for creating new (and improving existing) applications in a high-load, high-availability environment. Please apply here.

  • Human Translation Platform Gengo Seeks Sr. DevOps Engineer. Build an infrastructure capable of handling billions of translation jobs, worked on by tens of thousands of qualified translators. If you love playing with Amazon’s AWS, understand the challenges behind release-engineering, and get a kick out of analyzing log data for performance bottlenecks, please apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • OmniTI has a reputation for scalable web applications and architectures, but we still lean on our friends and peers to see how things can be done better. Surge started as the brainchild of our employees wanting to bring the best and brightest in Web Operations to our own backyard. Now in its fifth year, Surge has become the conference on scalability and performance. Early Bird rate in effect until 7/24! 

  • FoundationDB has announced a new course on concurrency which is free and fully browser-accessible. The course is targeted at developers who are familiar with the FoundationDB Key-Value Store API and want to achieve high throughput in their applications.
Cool Products and Services
  • Now track your log activities with Log Monitor and be on the safe side! Monitor any type of log file and proactively define potential issues that could hurt your business' performance. Detect your log changes for: Error messages, Server connection failures, DNS errors, Potential malicious activity, and much more. Improve your systems and behaviour with Log Monitor.

  • The NoSQL "Family Tree" from Cloudant explains the NoSQL product landscape using an infographic. The highlights: NoSQL arose from "Big Data" (before it was called "Big Data"); NoSQL is not "One Size Fits All"; Vendor-driven versus Community-driven NoSQL.  Create a free Cloudant account and start the NoSQL goodness

  • Finally, log management and analytics can be easy, accessible across your team, and provide deep insights into data that matters across the business - from development, to operations, to business analytics. Create your free Logentries account here.

  • CopperEgg. Simple, Affordable Cloud Monitoring. CopperEgg gives you instant visibility into all of your cloud-hosted servers and applications. Cloud monitoring has never been so easy: lightweight, elastic monitoring; root cause analysis; data visualization; smart alerts. Get Started Now.

  • Aerospike in-Memory NoSQL database is now Open Source. Read the news and see who scales with Aerospike. Check out the code on github!

  • consistent: to be, or not to be. That’s the question. Is data in MongoDB consistent? It depends. It’s a trade-off between consistency and performance. However, does performance have to be sacrificed to maintain consistency? more.

  • Do Continuous MapReduce on Live Data? ScaleOut Software's hServer was built to let you hold your daily business data in-memory, update it as it changes, and concurrently run continuous MapReduce tasks on it to analyze it in real-time. We call this "stateful" analysis. To learn more check out hServer.

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: High Scalability

Scaling the World Cup - How Gambify runs a massive mobile betting app with a team of 2

High Scalability - Mon, 2014-07-07 16:00

This is a guest post by Elizabeth Osterloh and Tobias Wilke of cloudControl.

Startups face very different issues than big companies when they build software. Larger companies develop projects over much longer time frames and often have entire IT-departments to support them in creating customized architecture. It’s an entirely different story when a startup has a good idea, it gets popular, and they need to scale fast.

This was the situation for Gambify, an app for organizing betting games released just in time for the soccer World Cup. The company was founded and is run in Germany by only two people. When they managed to get a few major endorsements (including Adidas and the German team star Thomas Müller), they had to prepare for a sudden deluge of users, as well as very specific peak times.

The Gambify App: Basic Architecture
Categories: High Scalability

Stuff The Internet Says On Scalability For July 4th, 2014

High Scalability - Fri, 2014-07-04 16:00

Hey, it's HighScalability time:


Beauty is everywhere. Household dust magnified 22 million times.
  • Let's play a game of guess the company. They have: >100 billion searches per month; > 60 trillion known URLs; > 50 billion facts in knowledge graph; > 100 hours of video uploaded every minute; > 2 billion containers; > 6 trillion Cloud Datastore ops/month. Who is it? Why it's Google, of course. 
  • Billions of events every day: Twitter. One billion active users: Android.
  • Quotable quotes:
    • PeterGriffin: I don't know why the author called this "Multi-process architectures suck :(" when he really meant "I suck at multi-process architectures :("
    • @khrabrov: Experienced startup engineers are looking for a full-stack Business Guy to be CEO, COO, PM, marketer, account manager, HR, and receptionist.
    • @PatrickMcFadin: 30x perf over #hadoop by running #spark over #cassandra The crowd was stunned. 
    • @jcoglan: A programmer is someone who can simultaneously entertain the ideas that tight coupling is bad and fridges should be connected to the 'net
    • @BenedictEvans: Consumers spend more on apps (~$20bn run rate) than on recorded music ($17bn). 
    • @solarce: "You achieve nirvana when all failures are viewed as normal operations and not as apocalyptic events"
    • Rudiger Moller: Yup. As memory keeps getting cheaper, Java cannot profit except going off heap or use Azul Zing. Either improve concurrent GC or reduce the amount of references required to model data structures in Java.
    • @PatrickMcFadin: OH: "idompotency is better than beefalo"
  • I started listening to Songza about 6 weeks ago. Loved its emotional intelligence. And now I find Google went and acquired it. A coincidence? This is not a case of megalomania. It occurred to me that Google is in the perfect position to let some algorithms loose on its data to see if a service like Songza is gaining mind share. If you look at DNS access, G+, Gmail, Chrome, web trends, etc you have a pretty good proxy for actual usage data. In fact, your algorithms could just look at everything and identify acquisition targets by ranking what services are rising above the noise. And in double fact Google can probably estimate future growth trends better than Songza because they have historical data on many other services. 
  • Concurrency Improvements in HyperLevelDB. Taking single threaded code and making multithreaded is not for the faint of heart. Deadlocks await each new access pattern. By reducing time locks are held, using lock free data structures, and using fine grained locking HyperDex was able to reach 400K operations per second, better than LevelDB's 275K operations per second.
  • The Lambda Architecture has nothing to do with The Secret, in case you were wondering. To see why Jay Kreps has an excellent article Questioning the Lambda Architecture based on his experiences at LinkedIn. The main objection is double processing, concluding: These days, my advice is to use a batch processing framework like MapReduce if you aren’t latency sensitive, and use a stream processing framework if you are, but not to try to do both at the same time unless you absolutely must. Great discussion in the comment section. For me it's as simple as never mix read and write streams. They have completely different purposes. More on Hacker News.
  • Videos from the Velocity Conference 2014 on YouTube.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so keep on going)...

Categories: High Scalability

Why does data need to have sex?

High Scalability - Wed, 2014-07-02 16:00

Data needs the ability to combine with other data in new ways to reach maximum value. So data needs to have the equivalent of sex.

That's why I used sex in the title of my previous article, Data Doesn't Need To Be Free, But It Does Need To Have Sex. So it wasn't some sort of click-bait title as some have suggested.

Sex is nature's way of bringing different data sets together, that is our genome, and creating something new that has a chance to survive and thrive in changing environments.

Currently data is cloistered behind Walled Gardens and thus has far less value than it could have. How do we coax data from behind these walls? With money. So that's where the bit about "data doesn't need to be free" comes from. How do we make money? Through markets. What do we have as a product to bring to market? Data. What do services need to keep producing data as a product? Money.

So it's a virtuous circle. Services generate data from their relationship with users. That data can be sold for the money services need to make a profit. Profit keeps the service that users like running. A running service  produces even more data to continue the cycle.

Why do we even care about data having a sex?

Historically one lens we can use to look at the world is to see everything in terms of how resources have been exploited over the ages. We can see the entire human diaspora as largely being determined by the search for and exploitation of different resource reservoirs.

We live near the sea for trade and access to fisheries. Early on we lived next to rivers for water, for food, for transportation, and later for power. People move to where there is lumber to harvest, gold to mine, coal to mine, iron to mine, land to grow food, steel to process, and so on. Then we build roads, rail roads, canals and ports to connect resource reservoirs to consumers.

In Nova Scotia, where I've been on vacation, a common pattern was for England and France to fight each other over land and resources. In the process they would build forts, import soldiers, build infrastructure, and make it relatively safe to trade. These forts became towns which then became economic hubs. We see these places as large cities now, like Halifax Nova Scotia, but it's the resources that came first.

When you visit coves along the coast of Nova Scotia they may tell you with interpretive signage, spaced out along a boardwalk, about the boom and bust cycles of different fish stocks as they were discovered, exploited, and eventually fished out.

In the early days in Nova Scotia great fortunes were made on cod. Then when cod was all fished out other resource reservoirs like sardines, halibut, and lobster were exploited. Atlantic salmon was over fished. Production moved to the Pacific where salmon was once again over fished. Now a big product is scallops and what were once trash fish, like redfish, is now the next big thing because that's what's left.

During these cycles great fortunes were made. But when a resource runs out people move on and find another. And when that runs out people move on and keep moving on until they find a place to make make living.

Places associated with old used up resources often just fade away. Ghosts of the original economic energy that created them. As a tourist I've noticed what is mined now as a resource is the history of the people and places that were created in the process of exploiting previous resources. We call it tourism.

Data is a resource reservoir like all the other resource reservoirs we've talked about, but data is not being treated like a resource. It's as if forts and boats and fishermen all congregated to catch cod, but then didn't sell the cod on an open market. If that were the case limited wealth would have been generated, but because all these goods went to market as part of a vast value chain, a decent living was made by a great many people.

If we can see data as a resource reservoir, as natural resources run out, we'll be able to switch to unnatural resources to continue the great cycle of resource exploitation.

Will this work? I don't know. It's just a thought that seems worth exploring.

Categories: High Scalability

Data Doesn't Need to Be Free, But it Does Need to Have Sex

High Scalability - Mon, 2014-06-30 16:32

How do we pay for the services we want to create and use? That is the question. Systems like Twitter, Instagram, Pinterest and all the other services you love are not cheap to build at scale. Grow now and figure out your business model later as the VC funding disappears, like hope, is not a sustainable strategy. If we want new services that stick around we are going to have to figure out a way for them to make money.

I’m going to argue here that a business model that could make money for software companies, while benefiting users, is creating an open market for data. Yes, your data. For sale. On an open market. For anyone to buy. Privacy is dead. Isn’t it time we leverage the death of privacy for our own gain?

The idea is to create an ecosystem around the production, consumption, and exploitation of data so that all the players can get the energy they need to live and prosper.

The proposed model:

Categories: High Scalability

The Secret of Scaling: You Can't Linearly Scale Effort with Capacity

High Scalability - Wed, 2014-06-25 15:56

The title is a paraphrase of something Raymond Blum, who leads a team of Site Reliability Engineers at Google, said in his talk How Google Backs Up the Internet. I thought it a powerful enough idea that it should be pulled out on its own:

Mr. Blum explained common backup strategies don’t work for Google for a very googly sounding reason: typically they scale effort with capacity.

If backing up twice as much data requires twice as much stuff to do it, where stuff is time, energy, space, etc., it won’t work, it doesn’t scale. 

You have to find efficiencies so that capacity can scale faster than the effort needed to support that capacity.

A different plan is needed when making the jump from backing up one exabyte to backing up two exabytes.

When you hear the idea of not scaling effort with capacity it sounds so obvious that it doesn't warrant much further thought. But it's actually a profound notion. Worthy of better treatment than I'm giving it here:

Categories: High Scalability

Sponsored Post: Apple, Chartbeat, Monitis, Netflix, Salesforce, Blizzard Entertainment, Cloudant, CopperEgg, Logentries, Wargaming.net, PagerDuty, Gengo, ScaleOut Software, Couchbase, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, Ma

High Scalability - Tue, 2014-06-24 15:58

Who's Hiring?

  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here.
    • Mobile Services Software Engineer. The Emerging Technologies/Mobile Services team is looking for a proactive and hardworking software engineer to join our team. The team is responsible for a variety of high quality and high performing mobile services and applications for internal use. Please apply here
    • Senior Software Engineer. Join Apple's Internet Applications Team, within the Information Systems and Technology group, as a Senior Software Engineer. Be involved in challenging and fast paced projects supporting Apple's business by delivering Java based IS Systems. Please apply here.
    • Sr Software Engineer. Join Apple's Internet Applications Team, within the Information Systems and Technology group, as a Senior Software Engineer. Be involved in challenging and fast paced projects supporting Apple's business by delivering Java based IS Systems. Please apply here.
    • Senior Security Engineer. You will be the ‘tip of the spear’ and will have direct impact on the Point-of-Sale system that powers Apple Retail globally. You will contribute to implementing standards and processes across multiple groups within the organization. You will also help lead the organization through a continuous process of learning and improving secure practices. Please apply here.
    • Quality Assurance Engineer - Mobile Platforms. Apple’s Mobile Services/Emerging Technology group is looking for a highly motivated, result-oriented Quality Assurance Engineer. You will be responsible for overseeing quality engineering of mobile server and client platforms and applications in a fast-paced dynamic environment. Your job is to exceed our business customer's aggressive quality expectations and take the QA team forward on a path of continuous improvement. Please apply here.

  • Chartbeat measures and monetizes attention on the web. Our traffic numbers are growing, and so is our list of product and feature ideas. That means we need you, and all your unparalleled backend engineer knowledge to help up us scale, extend, and evolve our infrastructure to handle it all. If you've these chops: www.chartbeat.com/jobs/be, come join the team!

  • The Salesforce.com Core Application Performance team is seeking talented and experienced software engineers to focus on system reliability and performance, developing solutions for our multi-tenant, on-demand cloud computing system. Ideal candidate is an experienced Java developer, likes solving real-world performance and scalability challenges and building new monitoring and analysis solutions to make our site more reliable, scalable and responsive. Please apply here.

  • Sr. Software Engineer - Distributed Systems. Membership platform is at the heart of Netflix product, supporting functions like customer identity, personalized profiles, experimentation, and more. Are you someone who loves to dig into data structure optimization, parallel execution, smart throttling and graceful degradation, SYN and accept queue configuration, and the like? Is the availability vs consistency tradeoff in a distributed system too obvious to you? Do you have an opinion about asynchronous execution and distributed co-ordination? Come join us

  • Java Software Engineers of all levels, your time is now. Blizzard Entertainment is leveling up its Battle.net team, and we want to hear from experienced and enthusiastic engineers who want to join them on their quest to produce the most epic customer-facing site experiences possible. As a Battle.net engineer, you'll be responsible for creating new (and improving existing) applications in a high-load, high-availability environment. Please apply here.

  • Engine Programmer - C/C++. Wargaming|BigWorld is seeking Engine Programmers to join our team in Sydney, Australia. We offer a relocation package, Australian working visa & great salary + bonus. Your primary responsibility will be to work on our PC engine. Please apply here

  • Human Translation Platform Gengo Seeks Sr. DevOps Engineer. Build an infrastructure capable of handling billions of translation jobs, worked on by tens of thousands of qualified translators. If you love playing with Amazon’s AWS, understand the challenges behind release-engineering, and get a kick out of analyzing log data for performance bottlenecks, please apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Your event here.
Cool Products and Services
  • Now track your log activities with Log Monitor and be on the safe side! Monitor any type of log file and proactively define potential issues that could hurt your business' performance. Detect your log changes for: Error messages, Server connection failures, DNS errors, Potential malicious activity, and much more. Improve your systems and behaviour with Log Monitor.

  • The NoSQL "Family Tree" from Cloudant explains the NoSQL product landscape using an infographic. The highlights: NoSQL arose from "Big Data" (before it was called "Big Data"); NoSQL is not "One Size Fits All"; Vendor-driven versus Community-driven NoSQL.  Create a free Cloudant account and start the NoSQL goodness

  • Finally, log management and analytics can be easy, accessible across your team, and provide deep insights into data that matters across the business - from development, to operations, to business analytics. Create your free Logentries account here.

  • CopperEgg. Simple, Affordable Cloud Monitoring. CopperEgg gives you instant visibility into all of your cloud-hosted servers and applications. Cloud monitoring has never been so easy: lightweight, elastic monitoring; root cause analysis; data visualization; smart alerts. Get Started Now.

  • PagerDuty helps operations and DevOps engineers resolve problems as quickly as possible. By aggregating errors from all your IT monitoring tools, and allowing easy on-call scheduling that ensures the right alerts reach the right people, PagerDuty increases uptime and reduces on-call burnout—so that you only wake up when you have to. Thousands of companies rely on PagerDuty, including Netflix, Etsy, Heroku, and Github.

  • Aerospike in-Memory NoSQL database is now Open Source. Read the news and see who scales with Aerospike. Check out the code on github!

  • consistent: to be, or not to be. That’s the question. Is data in MongoDB consistent? It depends. It’s a trade-off between consistency and performance. However, does performance have to be sacrificed to maintain consistency? more.

  • Do Continuous MapReduce on Live Data? ScaleOut Software's hServer was built to let you hold your daily business data in-memory, update it as it changes, and concurrently run continuous MapReduce tasks on it to analyze it in real-time. We call this "stateful" analysis. To learn more check out hServer.

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: High Scalability

Performance at Scale: SSDs, Silver Bullets, and Serialization

High Scalability - Mon, 2014-06-23 16:00

This is a guest post by Aaron Sullivan, Director & Principal Engineer at Rackspace.

We all love a silver bullet. Over the last few years, if I were to split the outcomes that I see with Rackspace customers who start using SSDs, the majority of the outcomes fall under two scenarios. The first scenario is a silver bullet—adding SSDs creates near-miraculous performance improvements. The second scenario (the most common) is typically a case of the bullet being fired at the wrong target—the results fall well short of expectations.

With the second scenario, the file system, data stores, and processes frequently become destabilized. These demoralizing results, however, usually occur when customers are trying to speed up the wrong thing.

A common phenomena at the heart of the disappointing SSD outcomes is serialization. Despite the fact that most servers have parallel processors (e.g. multicore, multi-socket), parallel memory systems (e.g. NUMA, multi-channel memory controllers), parallel storage systems (e.g. disk striping, NAND), and multithreaded software, transactions still must happen in a certain order. For some parts of your software and system design, processing goes step by step. Step 1. Then step 2. Then step 3. That’s serialization.

And just because some parts of your software or systems are inherently parallel doesn’t mean that those parts aren’t serialized behind other parts. Some systems may be capable of receiving and processing thousands of discrete requests simultaneously in one part, only to wait behind some other, serialized part. Software developers and systems architects have dealt with this in a variety of ways. Multi-tier web architecture was conceived, in part, to deal with this problem. More recently, database sharding also helps to address this problem. But making some parts of a system parallel doesn’t mean all parts are parallel. And some things, even after being explicitly enhanced (and marketed) for parallelism, still contain some elements of serialization.

How far back does this problem go? It has been with us in computing since the inception of parallel computing, going back at least as far as the 1960s(1). Over the last ten years, exceptional improvements have been made in parallel memory systems, distributed database and storage systems, multicore CPUs, GPUs, and so on. The improvements often follow after the introduction of a new innovation in hardware. So, with SSDs, we’re peering at the same basic problem through a new lens. And improvements haven’t just focused on improving the SSD, itself. Our whole conception of storage software stacks is changing, along with it. But, as you’ll see later, even if we made the whole storage stack thousands of times faster than it is today, serialization will still be a problem. We’re always finding ways to deal with the issue, but rarely can we make it go away.

Parallelization and Serialization
Categories: High Scalability

Migrating to XtraDB Cluster in EC2 at PagerDuty

High Scalability - Mon, 2014-06-16 16:01

This is a guest post by Doug Barth, a software generalist who has currently found himself doing operations work at PagerDuty. Prior to joining PagerDuty, he worked for Signal in Chicago and Orbitz, an online travel company.

A little over six months ago, PagerDuty switched its production MySQL database to XtraDB Cluster running inside EC2. Here's the story of why and how we made the change.

How the Database Looked Before

PagerDuty's MySQL database was a fairly typical deployment. We had:

  • A pair of Percona Servers writing data to a DRBD-backed volume.

  • EBS disks for both the primary and secondary servers backing the DRBD volume.

  • Two synchronous copies of the production database. (In the event of a failure of the primary, we wanted to be able to quickly switch to the secondary server without losing any data.)

  • A number of asynchronous replication slaves for disaster recovery, backups and accidental modification recovery.

Problems With the Old Setup
Categories: High Scalability
Syndicate content