High Scalability

Sponsored Post: Contentful, Stream, Loupe, New York Times, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Tue, 2017-01-17 18:05

Who's Hiring?
  • Contentful is looking for a JavaScript BackEnd Engineer to join our team in their mission of getting new users - professional developers - started on our platform within the shortest time possible. We are a fun and diverse family of over 100 people from 35 nations with offices in Berlin and San Francisco, backed by top VCs (Benchmark, Trinity, Balderton, Point Nine), growing at an amazing pace. We are working on a content management developer platform that enables web and mobile developers to manage, integrate, and deliver digital content to any kind of device or service that can connect to an API. See job description.

  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com
Fun and Informative Events
  • Your event here!
Cool Products and Services
  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Categories: High Scalability

Wouldn't it be nice if everyone knew a little queuing theory?

High Scalability - Tue, 2017-01-17 16:56

After many days of rain, one lane of this two-lane road collapsed into the canyon. It's been out for a month and it will be many more months before it is fixed. Thanks to Google Maps, way too many drivers take this once sleepy local road.

How do you think drivers go through this chokepoint? 

 

 

One hundred experience points to you if you answered one at a time.

One at a time! Through a half-duplex pipe following a first in first out discipline takes forever!

Yes, there is a stop sign. And people default to this mode because it appeals to our innate sense of fairness. What could be fairer than alternating one at a time?

The problem is it's stupid.

While waiting, stewing, growing angrier, I often think if people just knew a little queueing theory we could all be on our way a lot faster.

We can't make the pipe full duplex, so that's out. Let's assume there's no priority involved, vehicles are roughly the same size and take roughly the same time to transit the network. Then what do you do?

Why can't people figure out it's faster to drive through in batches? If we went in groups of, say, three, the throughput would be much higher. And when one side's queue depth grows larger because people are driving to or from work, that side's batch size should increase.
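To put rough numbers on the batching argument, here is a minimal sketch of a one-lane alternating model. The figures are assumptions for illustration, not measurements from this road: a 10-second penalty each time the direction flips (the time for the lane to clear and the next driver to get going) and a 2-second gap between cars that follow each other in the same batch.

```python
def cars_per_minute(batch_size, switch_seconds=10.0, headway_seconds=2.0):
    """Throughput of a one-lane chokepoint that alternates directions.

    Each turn costs one switch penalty plus one headway for every extra
    car in the batch. All numbers are illustrative assumptions.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    turn_seconds = switch_seconds + (batch_size - 1) * headway_seconds
    return 60.0 * batch_size / turn_seconds


if __name__ == "__main__":
    for batch in (1, 3, 6):
        print(f"batch of {batch}: {cars_per_minute(batch):.1f} cars/minute")
```

Under these assumptions, going in threes roughly doubles throughput over one at a time, because the switch penalty is paid once per batch instead of once per car. The gain flattens as batches grow, and bigger batches make the other side wait longer, which is one reason to tie batch size to queue depth.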

Since this condition will last a long time we have a possibility to learn because the same people take this road all the time. So what happens if you try to change the culture by showing people what a batch is by driving right behind someone as they take their turn?

You got it. Honking. There's a simple heuristic, a deeply held ethic against line cutting, so people honk, flip you off, and generally make their displeasure known.

It's your classic battle of reason versus norms. The smart thing is the thing we can't do by our very natures. So we all just keep doing the dumb thing.

 


Stuff The Internet Says On Scalability For January 13th, 2017

High Scalability - Fri, 2017-01-13 16:56

Hey, it's HighScalability time:

 

So you think you're early to market! The Man Who Invented VR Goggles 50 Years Too Soon
If you like this sort of Stuff then please support me on Patreon.
  • 99.9: Percent PCs cheaper than in 1980; 300x20 miles: California megaflood; 7.5 million: articles published on Medium; 1 million: Amazon paid eBook downloads per day; 121: pages on P vs. NP; 79%: Americans use Facebook; 1,600: SpaceX satellites to fund a city on Mars; 

  • Quotable Quotes:
    • @GossiTheDog: How corporate security works: A) buy a firewall B) add a rule allowing all traffic C) the end
    • @caitie: Distributed Systems PSA: your regular reminder that the operational cost of a system should be included & considered when designing a system
    • @jimpjorps: 1998: the internet means you can "telecommute" to a tech job from anywhere on Earth 2017: everyone works in the same one square mile of SF
    • Jessi Hempel: [re: BitTorrent] Perhaps the lesson here is that sometimes technologies are not products. And they’re not companies. They’re just damn good technologies.
    • giltene: My new pet peeve: "how to make X faster: do less of X" recommendations.
    • peterwwillis: It used to be you had to actually break into a system to exfiltrate all its data. Now you just make an HTTP query.
    • Laralyn McWilliams: Identify problems but focus on solutions. If you become more about problems than solutions, that negativity infects your work, your team, and how you think about your career.
    • Chris Fox: Apple is 100% a boutique retailer, meaning that a human chooses which books to promote. Without that, there was no organic discovery tool where readers could find your book.
    • vytah: In fact, the 1986 [Chernobyl] disaster happened because the engineers decided to get rid of safeguards and run tests.
    • Eric Elliott: Breaking into a user’s top 5 apps is like getting struck by lightning or winning the lottery. Don’t bank on it.
    • Peter: I say the super-intelligent aliens will be powered by hyper-computation, a technology that makes our concept of computation look like counting on your fingers; and they’ll have not only qualia, but hyper-qualia, experiential phenomenologica whose awesomeness we cannot even speak of.
    • SEJeff: LVS is pretty much the undisputed king for serious business load balancing. I've heard (anecdotally) that Uber uses gorb[1] and google has released seesaw, which are both fancy wrappers ontop of LVS for load balancing.
    • k__: I have the feeling this is haunting my life. Jobs, relationships, everything. When I got something, it didn't feel that hard to get it. When I try to get something it feels impossible.
    • Nelson Elhage: One of my favorite concepts when thinking about instrumenting a system to understand its overall performance and capacity is what I call “time utilization”. By this I mean: If you look at the behavior of a thread over some window of time, what fraction of its time is spent in each “kind” of work that it does?
    • Bart Sano (Google): I can say that we are committed to the choice of these different architectures, including X86 – and that includes AMD – as well as Power and ARM. The principle that we are investing in heavily is that competition breeds innovation, 
    • aaron-lebo: This is a larger issue with developer burnout I suspect. You master one thing and there's someone standing on the corner saying..."well, actually, I've got something better" and there's a very real anxiety in that evaluation process. Does object-oriented programming suck? Are functional languages the future? Do you really want an SPA? Should you replace your C codebase with Rust... or Go? Is Bitcoin worth getting in on? etc etc
    • StorageMojo: [re: Violin’s bankruptcy] The race is not always to the swift, nor riches to the wise. By starting with software, other companies built an early lead, and now have the money and time to optimize hardware for flash.
    • nocarrier: [Why no datacenters in India?] Cost was a smaller factor than politics; the Indian government wanted the private keys for our certs in order to let FB put a POP there. That was an absolute dealbreaker, so we served India from Singapore and other POPs in nearby countries.
    • RDX: So that original post, although long and full of real examples, was not about Javascript fatigue really. Its change fatigue. Let’s be clear, if you’re picking something new, you’re making a conscious choice to grow up with it.
    • @jamesurquhart: Amazing that emergent tech that’ll revolutionize software dev is already almost a commodity utility service. #streaming #serverless #events

  • The Ethics of Autonomous Cars. The obvious revenue model is highest bidder lives. During the first few milliseconds of a crash response a real-time bidding session is created and the lowest bidder assumes the risk. That at least captures the zeitgeist of the times.

  • First Go. Now poker. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker. Thank the force humans are still unbeatable at Sabacc. 

  • Medium may be the first YA (Young Adult, think Hunger Games) style publishing outlet. YA is often written in first-person present. It's a good way to fake authenticity. Traditional publications use third-person past tense, but that's not what works best on Medium. What I learned from analyzing the top 252 Medium stories of 2016: The words “you” and “I” were by far the most common, which suggests that addressing the reader directly as an individual person is a better writing strategy than writing in third person.

  • Ben Kehoe says AWS Step Functions is not the cheap, high-scale state machines using an event-driven paradigm he has been looking for. FaaS is stateless, and AWS Step Functions provides state as-a-Service: at $0.025 per 1,000 executions, it’s 125 times more expensive per invocation than Lambda; it’s not going to be cost-effective to replace existing roll-your-own Lambda solutions; the default throttling limit for a state machine is two executions per second...it’s not built to handle massively scaled but transient event scheduling.

  • Ransomware has shifted to being a reproducible strategy. @SteveD3: Since I first covered the MongoDB hacking on Jan 3, the number of compromised DBs has surpassed 32,000. Now possibly Elasticsearch. Basically anything you can find with Shodan. Which is why we now have @GossiTheDog: Found out today firms have started doing legal contracts which specifically rule out liability if they get hit by ransomware, naming it.
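The 125x figure in the Step Functions item above can be checked with a little arithmetic. The sketch below uses the $0.025 per 1,000 price quoted there. The Lambda request price of $0.20 per million is an assumption taken from AWS's published pricing of the time, and the comparison ignores Lambda's duration charges and any free tier.

```python
STEP_FUNCTIONS_USD_PER_1000 = 0.025   # price quoted in the item above
LAMBDA_USD_PER_MILLION = 0.20         # assumed request price, excludes duration charges


def cost_ratio():
    """How many times more one Step Functions unit costs than one Lambda request."""
    step_unit = STEP_FUNCTIONS_USD_PER_1000 / 1_000
    lambda_unit = LAMBDA_USD_PER_MILLION / 1_000_000
    return step_unit / lambda_unit


def monthly_cost_usd(events_per_second, usd_per_unit, days=30):
    """Cost of paying usd_per_unit for every event, sustained over a month."""
    return events_per_second * 86_400 * days * usd_per_unit


if __name__ == "__main__":
    print(f"ratio: {cost_ratio():.0f}x")
    print(f"2/sec for a month: ${monthly_cost_usd(2, 0.025 / 1_000):.2f}")
```

At the two-per-second limit quoted above, a month of continuous use is about 5.2 million units, or roughly $130 at the Step Functions price versus about $1 in Lambda request charges under the same assumptions.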

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Stuff The Internet Says On Scalability For January 6th, 2017

High Scalability - Fri, 2017-01-06 16:56

Hey, it's HighScalability time:

 

Hot rods in space. The Smith Cloud plummets towards our galaxy at nearly 700,000 mph. Vroom!
  • 3 of top 5: Stackoverflow questions are about Git; 3,000: four-passenger cars could serve 98 percent of NYC taxi demand; 44%: US population lives within 20 miles of Amazon fulfillment center; 72%: Amazon customers shopped using mobile device; 110%: increase in industrial control system attacks; 455: Number of scripted television series aired this year; $28.5 billion/yr: App downloads on iOS;

  • Quotable Quotes:
    • @ValaAfshar: Number of robots working in Amazon warehouses: 2016: 45,000 / 2015: 30,000 2014: 15,000 / 2013: 1,000 — @JonErlichman
    • @jason_kint: updated duopoly #s. new IAB data came out yesterday. easy to run vs earnings for goog and fb, it's evident everyone else is zero sum game. 
    • rb2k_: I also haven't seen one [company in Germany] that isn't riddled with MBA grads that mainly push Jira tickets around.
    • Joe McCann: The best software developers I know are always hacking over the holidays. True story.
    • @kaffeecoder: Sigh. Async vs blocking protocol is irrelevant. What matters is communicating with other services outside your own req/response cycle.
    • Eric Jang: It's not a coincidence that Nvidia, the literal arms-dealer of deep learning, has had a good year in the stock market.
    • @markimbriaco: Just read a comment that said "Any good codebase has every part perfectly isolated". Oh, to be young and optimistic about software again.
    • @swardley: Asked "What do I think is the biggest impact AI would have?" ... hmmm, the largest erosion of social mobility in human history?
    • The Attention Merchants: It is therefore more effective for the State to intervene before options are seen to exist. This creates less friction with the State but requires a larger effort: total attention control.
    • StorageMojo: The cloud’s collateral damage to the legacy IT vendors continues to spread. A few billion here and a few billion there, and pretty soon you’re talking real money.
    • Janakiram MSV: The key takeaway is that Amazon wants enterprises to consume EC2 while it is pushing startups and developers towards Lambda. This move from Amazon will fuel the growth of serverless computing in the industry. 
    • Maxime Chevalier-Boisvert: Edsger Dijkstra famously said, “The question of whether machines can think is about as relevant as the question of whether submarines can swim.”
    • @karlseguin: Microservices without asynchronous messaging (queues) is actually a monolith with really slow and error prone method invocation.
    • AshleysBrain: We've been using WebRTC Datachannels for multiplayer gaming in the browser in our game editor Construct 2 (www.scirra.com) for a couple of years now. Generally they work great! However the main problem we have is switching tab suspends the game, which if you're acting as the host, freezes the game for everybody. This is really inconvenient. 
    • @lstoll: 2017: Year of the return of three tier architecture.
    • @tealtan: “I will never make a racial profiling database!” *continues working on social networks, analytics, ad tech*
    • @abt_programming: Inverse bus factor: "how many developers have to be hit by a bus before a project starts to proceed smoothly?” - @gasproni
    • M.G. Siegler: The numbers speak for themselves. 2 billion words written on Medium in the last year. 7.5 million posts during that time. 60 million monthly readers now. Pageviews galore. So step 2 is simply to slap some banner ads on the site, while step 3 is to profit, right?
    • snarf21: Writing software is hard but to me the hardest part is always taking a random abstract concept from someone's mind (or worse, several people) and converting that into something "real" in a fixed timeline and budget. There will have to be lots of tradeoffs and miscues by definition. We are always making something that doesn't already exist, it is creation and creation is hard.
    • @Pinboard: Who could have foreseen the always-on home microphone might be of interest to the cops?
    • @ThePracticalDev: I heard a rumor that Santa moved over to AWS this year. Big if true.
    • Drew Purves~ “intelligence” extends beyond brains; something as simple as self-replicating RNA exhibit intelligent behavior at the evolutionary scale. The natural world is fractal, cyclic, and fuzzy...in a biosphere, every organism is a resource to another organism. That is, learning and adaptation of each organism is not independent of other organisms
    • @meatcomputer: System clocks are always accurate and increase monotonically. Timestamps from remote machines are reliable
    • pjmlp: Everything on web development feels like an hack.
    • @kelseyhightower: In my opinion Serverless does not mean FaaS. I consider any platform that hides the management of servers from the user to be Serverless.
    • Amit: one of the lessons I learned from this journey was that the tutorials work best when I've needed that technology for a real project
    • @Carnage4Life: Snapchat's copied all the worst parts of Apple's culture & seen success. More copycats to come
    • ch: So all that's missing with the decentralized web is a centralized service to aggregate the decentralized streams?
    • @mathiasverraes: "Separation of intent and implementation" is probably a much more useful programming principle than all of SOLID combined.
    • doh: We moved back and forth between AWS and GCE (based on who gave us free credits). Once we ran out, we chose GCE and never regretted it. GCE has many quirks, for instance the inconsistency between API and the UI, it misses the richness of the services offered by AWS but everything GCE does offer is just faster, more stable and much more consistent.
    • Exponential Laws: We have argued that exponential growth would not have succeeded without sustained exponential growth at three levels of the computing ecosystem—chip, system, and adopting community. Growth (progress) feeds on itself up to the inflection point.

  • Measuring a gnat's eyebrow at a billion miles. Ivan Linscott tells the thrilling story behind the development of the New Horizons probe to Pluto. a16z Podcast: New Year, New Horizons — Pluto! Completing the probe was a close thing. Finding enough plutonium to power the system almost didn't happen. Enough wasn't found so the probe had a much lower power budget than originally spec'ed, which caused the communication system to use one FPGA instead of two. You have to use radiation-hardened parts. The chips sit right next to a pile of plutonium pumping out gamma rays and neutrons. The FPGAs had a capacity of a million gates, were hardened by design, and had triple redundancy. Each gate in the array is implemented in threes. They are voted in pairs. If three agree then fine. If two agree that's the value used. They fit all the code with 5 gates of margin. They also had a hero's journey sourcing a high precision oscillator. And then the frightening story of when the watchdog timer timed out and put the probe in safe mode. It turned out the JPEG compression algorithm took too long to compress an image of Pluto and that caused the timeout to fire. The reason is one of those crazy testing stories. When this feature was tested the picture of the sky was darker so it took less time to compress!

  • The impulse for folks at Twitter to delay Trump's tweets and insider trade on that information must be overwhelming.

  • 33C3 (Chaos Computer Congress) videos are now available. Great overview by Chris Hager. Lots of interesting talks. You might like: Dissecting modern (3G/4G) cellular modems; Edible Soft Robotics - An exploration of candy as an engineered material; Software Defined Emissions - A hacker’s review of Dieselgate; Rebel Cities - Towards A Global Network Of Neighbourhoods And Cities Rejecting Surveillance.

  • A compelling break down of the DNC phishing attack. Making everything viewable through a generic UI and everything programmable through a scriptable API has interesting consequences @pwnallthethings: Could have hacked? Sure. Did hack? No. Let me go through why not..The hackers weren't hacking one-by-one; so URL contraction wasn't done manually. It was done via the Bitly API...Why did the hackers include this info? Same reason they contracted links via API. Because they're not hacking 1-by-1. Are hacking at scale...When hackers hack at scale, they reuse infrastructure. They make mistakes. This isn't unusual. You can piece the bits together.

  • In the game of data you want to be at the top of the data gravity well. When you are down the well, nothing escapes without great cost. AWS Snowball
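The triple redundancy described in the New Horizons item above is a form of majority voting. The sketch below shows the general idea on integers, bit by bit. It illustrates the technique only, not the probe's actual gate-level logic, and the function name is made up for this example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit is the value at least two inputs share."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    good = 0b1011_0010
    flipped = good ^ 0b0000_1000   # simulate a single bit flip in one of the three copies
    assert majority_vote(good, good, flipped) == good
    assert majority_vote(flipped, good, good) == good
    print("single upset masked")
```

A single upset in any one copy is outvoted by the other two. If the same bit flips in two copies, the wrong value wins, which is why redundancy of this kind is usually combined with hardened parts rather than used in place of them.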



How is Writing Lord of the Rings Like Writing Software?

High Scalability - Wed, 2017-01-04 17:16

 

Have you ever read a book and wondered how any human could have written something so brilliant? For me it was Lord of the Rings. I despaired that in a hundred lifetimes I could never write a book so rich, so deep, so beautiful. Since then I've learned a few things about how LoTr was created that have made me reconsider. The kick-in-the-head is that it's the same lesson I learned long ago about writing software.

I've always been amazed how a program can start as a single source file and after years of continued effort turn into a working system that is so large no human can come close to understanding it. If you had tried from the start to build the system you ended up with you would have never ever got there. That's just not how it works. Software is path dependent.

I've experienced this growth from a single cell to a Cambrian explosion many times so I know it's a thing. What I hadn't considered is how it's also a thing for writing books too. 

Creating good software is a process of evolution through the mechanism of constant iteration for the purpose of survival. This is also how good stories are made. What both have in common is creation through thought.

Thought needs an object to contemplate. Each intermediate state of a project is that object. By linking together a series of state inspired creative jumps something wonderful can be created that may contain only the faintest trace of its beginnings.

Here's how Lord of the Rings is a good example of this process...

LoTr started out as a sequel to the Hobbit. Tolkien's publisher wanted to cash in on the success of the Hobbit with a sequel. And The Silmarillion wasn't it. So Tolkien began with the intention of writing a sequel to the Hobbit. It was horrible. 

The first book title was The Return of the Shadow, not Lord of the Rings. The prose was still written for children. Frodo was called Bingo. Strider was a hobbit called Trotter. Bilbo planned to get married. And the ring was still just a ring. The story had no clear motive or direction. "What more can hobbits do?" asked Tolkien. The ideas of the Hobbit were played out. The LoTr we know and love was far far away.

In draft after draft Tolkien probed and searched for a direction to take the story. It all turned when Tolkien wrote the scene with the Black Rider. At first the Black Rider was really a White Rider. It was Gandalf coming to talk to Bingo. But then some insight happened. A dizzying array of neurons conspired and the color of the horse changed from white to black and Gandalf transformed into a man wrapped in a great black cloak and hood. A new framework was creating itself.

How do we know? Fortunately, from Christopher Tolkien, we have the history of changes his father made to LoTr. Dr. Corey Olsen in a great series—The Return of the Shadow, Session 1 - In Search of a Sequel—walks us through what is essentially the git log for LoTr. Imagine a kind of Papers We Love treatment from a true Tolkien expert and gifted analyst. It's magical.

We see idea after idea worked through in the text. It was a continuous process of refactoring and new development. Some ideas were kept from beginning to end. Many were cut. Many morphed. Much dialogue was kept, but was given to different characters to say in different circumstances. 

The whole feeling was very much like seeing software being developed, only the result wasn't a working app, but one of the most influential stories of all time.

The lesson for me was a deep and dramatic reconfirmation of an old idea: All successful large systems started as successful small systems.

This applies to us as writers and programmers. It's easy to get down on yourself during the creation process. Neither your story nor your program has to start out great; greatness is something that evolves.

In this new year, that's the lesson of Lord of the Rings for me.


Sponsored Post: Loupe, New York Times, ScaleArc, Aerospike, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Tue, 2017-01-03 16:56

Who's Hiring?
  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com
Fun and Informative Events
  • Your event here!
Cool Products and Services
  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...


Efficient storage: how we went down from 50 PB to 32 PB

High Scalability - Mon, 2017-01-02 16:56

As the Russian rouble exchange rate slumped two years ago, it drove us to think of cutting hardware and hosting costs for the Mail.Ru email service. To find ways of saving money, let’s first take a look at what emails consist of.

Indexes and bodies account for only 15% of the storage size, whereas 85% is taken up by files. So, files (that is attachments) are worth exploring in more detail in terms of optimization. At that time, we didn’t have file deduplication in place, but we estimated that it could shrink the total storage size by 36% since many users receive the same messages, such as price lists from online stores or newsletters from social networks containing images and so on. In this article, I’m going to describe how we implemented a deduplication system under the supervision of Albert Galimov.
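As a rough illustration of the idea (not Mail.Ru's implementation, which this excerpt does not describe), attachment deduplication can be sketched as a content-addressed store: each file is keyed by a hash of its bytes, and identical attachments share one stored copy with a reference count. The class and method names below are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


class DedupStore:
    """In-memory content-addressed store with reference counting (illustrative only)."""

    def __init__(self):
        self._blobs = {}      # digest -> file bytes, stored once
        self._refcount = {}   # digest -> number of messages referencing the file

    def put(self, data: bytes) -> str:
        """Store an attachment and return its key; a duplicate only adds a reference."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
        self._refcount[digest] = self._refcount.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        """Return the stored bytes for a key."""
        return self._blobs[digest]

    def release(self, digest: str) -> None:
        """Drop one reference; delete the bytes once no message uses them."""
        self._refcount[digest] -= 1
        if self._refcount[digest] == 0:
            del self._refcount[digest]
            del self._blobs[digest]

    def stored_bytes(self) -> int:
        """Total bytes physically held after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())
```

The headline numbers are consistent with the estimate above: a 36% reduction from 50 PB leaves 50 × 0.64 = 32 PB.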


Stuff The Internet Says On Scalability For December 23rd, 2016

High Scalability - Fri, 2016-12-23 16:56

Hey, it's HighScalability time:

 

A wondrous ethereal mix of technology and art. Experience of "VOID"
  • 2+ billion: Google lines of code distributed over 9+ million source files; $3.6 bn: lower Google taxes using Dutch Sandwich; $14.6 billion: aggregate value of all cryptocurrencies; 2x: graphene-fed silkworms produce silk that conducts electricity; < 100: scientists looking for extraterrestrial life; 48: core Qualcomm server SoC; 455: original TV series in 2016;

  • Quotable Quotes:
    • Ben Thompson~ It's so easy to think of tech with an 80s mindset with all the upstarts. We still glorify people in garages. The garage is gone...Our position in the world is not the scrappy upstart. It is the establishment.
    • The Attention Merchants: True brand advertising is therefore an effort not so much to persuade as to convert. At its most successful, it creates a product cult, whose loyalists cannot be influenced by mere information
    • @seldo: Speed of development always wins. Performance problems will (eventually) get engineered away. This is nearly always how technology changes.
    • @evgenymorozov: How Silicon Valley can support basic income: give everyone a bot farm so that we can make advertising $ from fake traffic to their platforms
    • @avdi: Apple has 33 Github repos and 56 contributors. Microsoft now has ~1,200 repos and 2,893 contributors.
    • Peter Norvig: Understanding the brain is a fascinating problem but I think it’s important to keep it separate from the goal of AI which is solving problems ... If you conflate the two it’s like aiming at two mountain peaks at the same time—you usually end up in the valley between them .... We don’t need to duplicate humans ... We want humans and machines to partner and do something that they cannot do on their own.
    • Brave New Geek: Unbounded anything (whether it's queues, message sizes, queries, or traffic) is a resilience engineering anti-pattern. Without explicit limits, things fail in unexpected and unpredictable ways. Remember, the limits exist, they’re just hidden. By making them explicit, we restrict the failure domain giving us more predictability, longer mean time between failures, and shorter mean time to recovery at the cost of more upfront work or slightly more complexity.
    • Naren Shankar (Expanse): Everybody feels like they can look at the show and find parts of themselves in it. When you can give people collective ownership of the creative product you get the best from people. At the end of the day it shows. People work their asses off and accomplish the impossible.
    • Richard Jones: a corollary of Moore’s law (sometimes called Rock’s Law). This states that the capital cost of new generations of semiconductor fabs is also growing exponentially
    • Waterloo: His [Napoleon] strategy was simple. It was to divide his enemies, then pin one down while the other was attacked hard and, like a boxing match, the harder he punched the quicker the result. Then, once one enemy was destroyed, he would turn on the next. The best defense for Napoleon in 1815 was attack, and the obvious enemy to attack was the closest.
    • Daniel Lemire: beyond a certain point, reducing the possibility of a fault becomes tremendously complicated and expensive… and it becomes far more economical to minimize the harm due to expected faults
    • @greglinden: “For some products at Baidu, the main purpose is to acquire data from users, not revenue.” — @stuhlmueller
    • strebler:  Deep Learning has made some very fundamental advances, but that doesn't mean it's going to make money just as magically!
    • sulam: Twitter clearly doesn't have growth magic (or they'd be growing faster) -- but is that an engineer's fault? At the end of the day, any user facing engineering is beholden to the product team. Engineers at Twitter can run experiments, but they can't get those experiments shipped unless a PM is behind it.
    • Gil Tene: The right way to read "99%'ile latency of a" is "1 in 100 occurrences of 'a' took longer than this. And we have no idea how long". That is the only information captured by that metric. It can be used to roughly deduce "what is the likelihood that 'a' will take longer than that?". But deducing other stuff from it usually simply doesn't work.
    • @esh: Unheralded tiny features like AWS Lambda inside Kinesis Firehose streams replace infrastructure monstrosities with a few lines of code
    • @postwait: Listening to this twitter caching talk... *so* glad my OS doesn't even contemplate OOMs. How is that shit still in Linux? A literal WTF.
    • SomeStupidPoint: Mostly, it was just a choice to save $1-2k on a laptop (every 1-2 years) and spend the money on cellphone data and lattes.
    • @timbray: Oracle trying to monetize Java... Golang/Rust/Elixir all looking better. Assume all JVM langs are potential targets.
    • Kathryn S. McKinley: In programming languages research, the most revolutionary change on the horizon is probabilistic programming, in which developers produce models that estimate the real world and explicitly reason about uncertainty in data and computations. 
    • cindy sridharan: Four Golden Signals 1) Latency 2) Traffic 3) Errors 4) Saturation
    • @FioraAeterna: as a tech company grows in size, the probability of it developing its own in-house bug tracking system approaches 1
    • The Attention Merchants: In 1928, Paley made a bold offer to the nation’s many independent radio stations. The CBS network would provide any of them all of its sustaining content for free—on the sole condition that they agree to carry the sponsored content as well

  • philips: Essentially I see the world broken down into four potential application types: 1) Stateless applications: trivial to scale at a click of a button with no coordination. These can take advantage of Kubernetes deployments directly and work great behind Kubernetes Services or Ingress Services. 2) Stateful applications: postgres, mysql, etc which generally exist as single processes and persist to disks. These systems generally should be pinned to a single machine and use a single Kubernetes persistent disk. These systems can be served by static configuration of pods, persistent disks, etc or utilize StatefulSets. 3) Static distributed applications: zookeeper, cassandra, etc which are hard to reconfigure at runtime but do replicate data around for data safety. These systems have configuration files that are hard to update consistently and are well-served by StatefulSets. 4) Clustered applications: etcd, redis, prometheus, vitess, rethinkdb, etc are built for dynamic reconfiguration and modern infrastructure where things are often changing. They have APIs to reconfigure members in the cluster and just need glue to be operated natively and seamlessly on Kubernetes, and thus the Kubernetes Operator concept

  • Top 5 uses for Redis: content caching; user session store; job & queue management; high speed transactions; notifications.

  • Is machine learning being used in the wild? The answer appears to be yes. Ask HN: Where is AI/ML actually adding value at your company? Many uses you might expect and some unexpected: predicting if a part scanned with an acoustic microscope has internal defects; find duplicate entries in a large, unclean data set; product recommendations; course recommendations; topic detection; pattern clustering; understand the 3D spaces scanned by customers; dynamic selection of throttle threshold; EEG interpretation; predict which end users are likely to churn for our customers; automatic data extraction from web pages; model complex interactions in electrical grids in order to make decisions that improve grid efficiency; sentiment classification; detecting fraud; credit risk modeling; Spend prediction; Loss prediction; Fraud and AML detection; Intrusion detection; Email routing; Bandit testing; Optimizing planning/task scheduling; Customer segmentation; Face- and document detection; Search/analytics; Chat bots; Topic analysis; Churn detection; phenotype adjudication in electronic health records; asset replacement modeling; lead scoring; semantic segmentation to identify objects in the user's environment to build better recommendation systems and to identify planes (floor, wall, ceiling) to give us better localization of the camera pose for height estimates; classify bittorrent filenames into media categories; predict how effective a given CRISPR target site will be; check volume, average ticket $, credit score and things of that nature to determine the quality and lifetime of a new merchant account; anomaly detection; identify available space in kit from images; optimize email marketing campaigns; investigate & correlate events, initially for security logs; moderate comments; building models of human behavior to provide interactive intelligent agents with a conversational interface; automatically grading kids' essays; predict probability of car accidents based on the sensors of your smartphone; predict how long JIRA tickets are going to take to resolve; voice keyword recognition; produce digital documents in legal proceedings; PCB autorouting.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Sponsored Post: Loupe, New York Times, ScaleArc, Aerospike, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Wed, 2016-12-21 17:30

Who's Hiring?
  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com
Fun and Informative Events
  • Your event here!
Cool Products and Services
  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Categories: High Scalability

Has Amazon Overthrown Apple as the 'I Hate Buttons' Leader?

High Scalability - Mon, 2016-12-19 16:56

 

Steve Jobs is notorious for hating buttons. Here's Jobs explaining the foulness of buttons during his famous iPhone introduction:

What's wrong with their [other phones] user interface? The problem with them is really sort of in the bottom 40. They all have these keyboards that are there whether you need them or not to be there. And they all have these control buttons that are fixed in plastic and are the same for every application. Well every application wants a slightly different user interface, a slightly optimized set of buttons just for it. And what happens if you think of a great idea six months from now? You can't run around and add a button to these things. They're already shipped. So what do you do? It doesn't work because the buttons and the controls can't change.

The iPhone solved the button problem with a new multi-touch screen and by using your finger as the pointing device (not a nasty nasty stylus). We all know how this works now, but it was novel back in the olden days.

The iPhone was one of three new products based on revolutionary user interface development: the mouse and the Macintosh; the click-wheel and the iPod; multi-touch and the iPhone.

 

UI innovation is not enough on its own. Creating a new product category requires a combination of advanced hardware and new supporting software. The Mac was a completely new everything. The iPod paired with iTunes. And the iPhone leveraged OS X, iTunes, and a lot of very smart code for dealing with touch.

That's the history lesson.

Something curious has happened. Amazon. Amazon has escaped the land of misfit phones and has developed three brilliant new products based on revolutionary UIs and sophisticated software systems:

Categories: High Scalability

Stuff The Internet Says On Scalability For December 16th, 2016

High Scalability - Fri, 2016-12-16 16:56

Hey, it's HighScalability time:

 

This is the entire internet. In 1973! David Newbury found the map going through his dad's old papers.
If you like this sort of Stuff then please support me on Patreon.
  • 2.5 billion+: smartphones on earth; $36,000: loss making a VR game; $1 million: spent playing Game of War; 2000 terabytes: saved downloading Font Awesome's fonts per day; 14TB: new hard drives; 19: Systems We Love talks; 4,600Mbps: new 802.11ad Wi-Fi standard; 

  • Quotable Quotes:
    • Thomas Friedman: [John] Doerr immediately volunteered to start a fund that would support creation of applications for this device by third-party developers, but Jobs wasn’t interested at the time. He didn’t want outsiders messing with his elegant phone.
    • Fastly: For every problem in computer networking there is a closed-box solution that offers the correct abstraction at the wrong cost. 
    • ben stopford: The Data Dichotomy. Data systems are about exposing data. Services are about hiding it.
    • Ernie: just as Amazon invaded the CDN ecosystem with CloudFront and S3, CDNs are going to invade the cloud compute space of AWS.
    • The Attention Merchants: When not chronicling death in its many forms, Bennett loved to gain attention for his paper by hurling insults and starting fights. Once he managed in a single issue to insult seven rival papers and their editors. He was perhaps the media’s first bona fide “troll.” As with contemporary trolls, Bennett’s insults were not clever.
    • @swardley: "Serving 2.1 million API requests for $11" not bad at all. My company site used to cost £19 pcm
    • hibikir: I don't know about Uber, but I've worked at a lot of places that had sensitive data. A common pattern is to fail to treat employees like attackers, and protect data in ways that are very beatable by a motivated employee. 
    • @davecheney: OH: lambdas are stored procedures for millennials.
    • @jamesurquhart: This. Containers will play a huge role in low-level service deployments, but not user facing (e.g. “consumer”) app deployments (5-7 years).
    • theptip: Geo-redundancy seems like a luxury, until your entire site comes down due to a datacenter-level outage. (E.g. the power goes down, or someone cuts the internet lines when doing construction work on the street outside).
    • Resilience Thinking: The ruling paradigm-that we can optimize components of a system in isolation of the rest of the system-is proving inadequate to deal with the dynamic complexity of the real world.
    • Eliezer Steinbock: Disconnect users when they’ve just left their tab open. It’s so simple to do and saves precious resources
    • @ieatkillerbees: In 20 years of engineering I've never said, "thank goodness we hired someone who can reverse a b tree on a whiteboard while strangers watch"
    • Rushkoff: I think as people realize they can’t get jobs in this highly centralized digital economy, as companies realize that it might be better to beat them than join them, I think we will see the retrieval of some of these earlier networking values.
    • Darren Cibis: I think BigQuery is the better product at this stage, however, it’s had a big head start over Athena which has a lot of catching up to do.
    • Fastly: Over the span of a day, IoT devices were probed for vulnerabilities 800 times per hour by attackers from across the globe.
    • Quantum Gravity Research Could Unearth the True Nature of Time: somehow, you can emerge time from timeless degrees of freedom using entanglement.
    • @SystemsWeLove: "You can think of the OS as the bouncer at Club CPU: if a VIP comes in and buys up the place, you're out." -- @arunthomas #systemswelove
    • Erik Darling: When starting to index temp tables, I usually start with a clustered index, and potentially add nonclustered indexes later if performance isn’t where I want it to be.
    • Customers Don’t Give a Shit About Your Data Centers: My youngest daughter co-developed an Alexa skill called PotterHead. By taking advantage of the templates and how-to instructions, the skill was designed, developed, tested, and deployed within 24 hours — without a data center or any knowledge of ansible, git, jenkins, chef, or kubernetes.

  • In summary: mobile is [still] eating the world, everything is changing, nobody knows where it will all end up. And people are scared. Interesting observation on the new scale: Facebook, Amazon, Apple, and Google are 10x bigger than Microsoft & Intel when they were changing the world. 

  • Bigger is not always better when it comes to datacenters. AWS re:Invent 2016: Tuesday Night Live with James Hamilton. Amazon could easily build 200 megawatt (MW) facilities, yet they choose to build mostly 32MW facilities. Why? The data tells them to. What does the data say? The law of diminishing returns. The cost savings don't justify having a larger failure domain. When you start small and scale up a datacenter you get really big gains in cost advantage. As you get bigger and bigger the curve is logarithmic: the gains of going bigger are relatively small. The negative gain of a big datacenter is linear. If you have a 32MW datacenter, that's about 80k servers; it's bad if it goes down, but it can be handled so that it's unnoticeable. If a datacenter with 500K servers goes down the amount of network traffic needed to heal all the problems is difficult to handle.
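
The two server counts in the datacenter item line up if server density per megawatt is roughly constant. That constant density is an assumption of this back-of-the-envelope sketch, not something the talk is quoted as saying, and the variable names are made up for the example.

```python
# Figures quoted in the item above.
SMALL_FACILITY_MW = 32
SMALL_FACILITY_SERVERS = 80_000
LARGE_FACILITY_MW = 200

# Assumption: servers per megawatt does not change with facility size.
servers_per_mw = SMALL_FACILITY_SERVERS / SMALL_FACILITY_MW


def servers_at(megawatts: float) -> int:
    """Estimate the server count of a facility of the given power budget."""
    return round(megawatts * servers_per_mw)


large_servers = servers_at(LARGE_FACILITY_MW)

# How many times more servers are lost at once if the large facility fails.
failure_domain_ratio = large_servers / SMALL_FACILITY_SERVERS

print(servers_per_mw, large_servers, failure_domain_ratio)
```

Under that assumption a 200MW facility holds about 500K servers, so a single outage takes out a bit more than six times as many machines as the loss of a 32MW facility.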

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Ask High Scalability: How to build anonymous blockchain communication?

High Scalability - Wed, 2016-12-14 17:09

This question came in over the Internets. If you have any ideas please consider sharing them if you have the time...

I am building a two-way subscription model. I am working on a blockchain project in which I have to build an information/data portal with two types of users, data providers and data receivers, such that there is anonymity between the two.

Please guide me on how I can leverage blockchain (I think Ethereum would be useful in this context, but I'm not sure) so that data providers in my system can send messages to data receivers anonymously and, vice versa, data receivers can request data from data providers through my system.

I believe it would work if we can create a system in which a user who has data sends a description of it to the server. The system will host this description of the data without revealing the data provider's details.

Simultaneously, the server will store which user has the data. When a data receiver logs in to the system, sees the description of the data, and wants to analyze that data, it will send a request to the server for that data. This request is stored on the server and it will allow access to data without receiver knowing who wants to access that data, but it will trigger a message to receiver that an anonymous user wants to access data and would data.

Can you please guide me how to build architecture of this system and how to proceed to do a POC?
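
As a starting point for discussion, the flow in the question can be sketched as a plain broker that hands out opaque IDs and relays messages, so that providers and receivers never see each other's identity. Everything here (the `AnonymousBroker` class, its method names, the in-memory dictionaries) is hypothetical. It also leaves the blockchain part out entirely: the broker in this sketch is a trusted party that knows both identities, which is exactly the trust a blockchain-based design would be trying to reduce.

```python
import uuid


class AnonymousBroker:
    """Relays listings and requests so neither side learns the other's identity.

    Hypothetical sketch: identities live only inside the broker.
    """

    def __init__(self):
        self._listing_owner = {}   # listing id -> provider id (never exposed)
        self._descriptions = {}    # listing id -> public description
        self._request_sender = {}  # request id -> receiver id (never exposed)
        self._inbox = {}           # user id -> list of messages

    def publish(self, provider_id: str, description: str) -> str:
        """A provider advertises data; only the description becomes public."""
        listing_id = uuid.uuid4().hex
        self._listing_owner[listing_id] = provider_id
        self._descriptions[listing_id] = description
        return listing_id

    def browse(self) -> dict:
        """What a receiver sees: listing ids and descriptions, no owners."""
        return dict(self._descriptions)

    def request_data(self, receiver_id: str, listing_id: str) -> str:
        """A receiver asks for data; the provider gets an anonymous notice."""
        request_id = uuid.uuid4().hex
        self._request_sender[request_id] = receiver_id
        provider_id = self._listing_owner[listing_id]
        self._inbox.setdefault(provider_id, []).append(
            {"type": "request", "request_id": request_id, "listing_id": listing_id}
        )
        return request_id

    def fulfil(self, request_id: str, payload: bytes) -> None:
        """The provider answers a request id; the broker routes the payload."""
        receiver_id = self._request_sender[request_id]
        self._inbox.setdefault(receiver_id, []).append(
            {"type": "data", "request_id": request_id, "payload": payload}
        )

    def inbox(self, user_id: str) -> list:
        return list(self._inbox.get(user_id, []))
```

A proof of concept could start with this shape and then ask which of the broker's tables, if any, actually need to move onto a chain.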

Categories: High Scalability

A Scalable Alternative to RESTful Communication: Mimicking Google’s Search Autocomplete with a Single MigratoryData Server

High Scalability - Tue, 2016-12-13 16:56

This is a guest post by Mihai Rotaru, CTO of MigratoryData.

Using the RESTful HTTP request-response approach can become very inefficient for websites requiring real-time communication. We propose a new approach and exemplify it with a well-known feature that requires real-time communication, and which is included by most websites: search box autocomplete.

Google, which is one of the most demanding web search environments, handles about 40,000 searches per second according to an estimate by Internet Live Stats. Assuming that each search triggers 6 autocomplete requests, we show that MigratoryData can handle this load using a single 1U server.

More precisely, we show that a single MigratoryData server running on a 1U machine can handle 240,000 autocomplete requests per second from 1 million concurrent users with a mean round-trip latency of 11.82 milliseconds.
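
The headline numbers follow from simple arithmetic, sketched below. The constant names are invented for the example, and the per-user rate is only a derived average; the post does not say how requests are distributed across users.

```python
SEARCHES_PER_SECOND = 40_000   # Internet Live Stats estimate cited above
AUTOCOMPLETE_PER_SEARCH = 6    # assumption stated in the post
CONCURRENT_USERS = 1_000_000   # benchmark population from the post

# Total autocomplete load the server has to absorb.
requests_per_second = SEARCHES_PER_SECOND * AUTOCOMPLETE_PER_SEARCH

# Derived averages, assuming the load is spread evenly over all users.
per_user_rate = requests_per_second / CONCURRENT_USERS
seconds_between_requests = 1 / per_user_rate

print(requests_per_second, per_user_rate, round(seconds_between_requests, 2))
```

That is 240,000 requests per second in total, or on average one autocomplete request from each connected user roughly every four seconds.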

The Current Approach and Its Limitations
Categories: High Scalability

Stuff The Internet Says On Scalability For December 9th, 2016

High Scalability - Fri, 2016-12-09 16:56

Hey, it's HighScalability time:

 

Here's a 1 TB hard drive in 1937. Twenty workers operated the largest vertical letter file in the world. 4000 SqFt. 3000 drawers, 10 feet long. (from @BrianRoemmele)
If you like this sort of Stuff then please support me on Patreon.
  • 98%~ savings in green house gases using Gmail versus local servers; 2x: time spent on-line compared to 5 years ago; 125 million: most hours of video streamed by Netflix in one day; 707.5 trillion: value of trade in one region of Eve Online; $1 billion: YouTube's advertisement pay-out to the music industry; 1 billion: Step Functions predecessor state machines run per week in AWS retail; 15.6 million: jobs added over last 81 months;

  • Quotable Quotes:
    • Gerry Sussman~ in the 80s and 90s, engineers built complex systems by combining simple and well-understood parts. The goal of SICP was to provide the abstraction language for reasoning about such systems...programming today is more like science. You grab this piece of library and you poke at it. You write programs that poke it and see what it does. And you say, ‘Can I tweak it to do the thing I want?’
    • @themoah: Last year Black Friday weekend: 800 Windows servers with .NET. This year: 12 Linux servers with Scala/Akka. #HighScalability #Linux #Scala
    • @swardley: If you're panicking over can't find AWS skills / need to go public cloud - STOP! You missed the boat. Focus now on going serverless in 5yrs.
    • @jbeda: Nordstrom is running multitenant Kubernetes cluster with namespace per team. Using RBAC for security.
    • Tim Harford: What Brailsford says is, he is not interested in team harmony. What he wants is goal harmony. He wants everyone to be focused on the same goal. He doesn’t care if they like each other and indeed there are some pretty famous examples of people absolutely hating each other. 
    • @brianhatfield: SUB. MILLISECOND. PAUSE. TIME. ON. AN. 18. GIG. HEAP. (Trying out Go 1.8 beta 1!)
    • haberman: If you can make your system lock-free, it will have a bunch of nice properties: - deadlock-free - obstruction-free (one thread getting scheduled out in the middle of a critical section doesn't block the whole system) - OS-independent, so the same code can run in kernel space or user space, regardless of OS or lack thereof 
    • Neil Gunther: The world of performance is curved, just like the real world, even though we may not always be aware of it. What you see depends on where your window is positioned relative to the rest of the world. Often, the performance world looks flat to people who always tend to work with clocked (i.e., deterministic) systems, e.g., packet networks or deep-space networks.
    • @yoz: I liked Westworld, but if I wanted hours of watching tech debt and no automated QA destroy a virtual world, I’d go back to Linden Lab
    • @adrianco: I think we are seeing the usual evolution to utility services, and new higher order (open source) functionality emerges /cc @swardley
    • Neil Gunther: a buffer is just a queue and queues grow nonlinearly with increasing load. It's queueing that causes the throughput (X) and latency (R) profiles to be nonlinear.
    • Juho Snellman: I think [QUIC] encrypting the L4 headers is a step too far. If these protocols get deployed widely enough (a distinct possibility with standardization), the operational pain will be significant.
    • @Tobarja: "anyone who is doing microservices is spending about 25% of their engineering effort on their platform" @jedberg 
    • @cdixon: 2016 League of Legends finals: 43M viewers 2016 NBA finals: 30.8M viewers 
    • @mikeolson: 7 billion people on earth; 3 billion images shared on social media every day. @setlinger at #StrataHadoop
    • @swardley: When you think about AWS Lambda, AWS Step Functions et al then you need to view this through the lens of automating basic doctrine i.e. not just saying it and codifying in maps and related systems but embedding it everywhere. At scale and at the speed of competition that I expect us to reach then this is going to be essential.
    • Jakob Engblom: hardware accelerators for particular  common expensive tasks seems to be the right way to add performance at the smallest cost in silicon area and power consumption.
    • Joe Duffy: The future for our industry is a massively distributed one, however, where you want simple individual components composed into a larger fabric. In this world, individual nodes are less “precious”, and arguably the correctness of the overall orchestration will become far more important. I do think this points to a more Go-like approach, with a focus on the RPC mechanisms connecting disparate pieces
    • @cmeik: AWS Lambda is cool if you never had to worry about consistency, availability and basically all of the tradeoffs of distributed systems.
    • prions: As a Civil Engineer myself, I feel like people don't realize the amount of underlying stuff that goes into even basic infrastructure projects. There's layers of planning, design, permitting, regulations and bidding involved. It usually takes years to finally get to construction and even then there's a whole host of issues that arise that can delay even a simple project. 
    • Netflix: If you can cache everything in a very efficient way, you can often change the game. 
    • The Attention Merchants: One [school] board in Florida cut a deal to put the McDonald’s logo on its report cards (good grades qualified you for a free Happy Meal). In recent years, many have installed large screens in their hallways that pair school announcements with commercials. “Take your school to the digital age” is the motto of one screen provider: “everyone benefits.” What is perhaps most shocking about the introduction of advertising into public schools is just how uncontroversial and indeed logical it has seemed to those involved.

  • Just how big is Netflix? The story of the tape is told in Another Day in the Life of a Netflix Engineer. Netflix runs way more than 100K EC2 instances and more than 80,000 CPU cores. They use both predictive and reactive autoscaling, aiming for not too much or too little, just the right amount. Of those 100K+ instances they will autoscale up and down 20% of that capacity every day. More than 50Gbps ELB traffic per region. More than 25Gbps is telemetry data from devices sending back customer experience data. At peak Netflix is responsible for over 37% of Internet traffic. The monthly billing file for Netflix is hundreds of megabytes with over 800 million lines of information. There's a hadoop cluster at Amazon whose only purpose is to load Netflix's bill. Netflix considers speed of innovation to be a strategic advantage. About 4K code changes are put into production per day. At peak over 125 million hours of video were streamed in a day. Support for 130 countries was added in one day. That last one is the kicker. Reading about Netflix over all these years you may have got the idea Netflix was over-engineered, but going global in one day was what it was all about. Try that if you are racking and stacking. 

  • Oh how I miss stories that began Once upon a time. The start of so many stories these days is The attack sequence begins with a simple phishing scheme. This particular cautionary tale is from Technical Analysis of Pegasus Spyware, a very, almost lovingly, detailed account of the total ownage of the "secure" iPhone. The exploit made use of three zero-day vulnerabilities: CVE-2016-4657: Memory Corruption in WebKit, CVE-2016-4655: Kernel Information Leak, CVE-2016-4656: Kernel Memory corruption leads to Jailbreak. Do not read if you would like to keep your Security Illusion cherry intact. 

  • Composing RPC calls gets harder as the graph of calls and dependencies explodes. Here's how Twitter handles it: Simplify Service Dependencies with Nodes. Here's their library on GitHub. It's basically just a way to set up a dependency graph in code and have all the RPCs executed according to the plan. It's interesting how parallel this is to setting up distributed services in the first place. They like it: We have saved thousands of lines of code, improved our test coverage and ended up with code that’s more readable and friendly for newcomers. Also, AWS Step Functions.
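
The general idea of declaring a dependency graph and letting a runner schedule the calls can be sketched in a few lines of Python with `asyncio`. This is not Twitter's Nodes library or its API; `run_graph`, the node functions, and the graph below are invented for illustration, and the sketch assumes the graph has no cycles.

```python
import asyncio


async def run_graph(nodes, deps):
    """Run every node once, starting each as soon as its dependencies finish.

    nodes: name -> async function taking a dict of its dependencies' results
    deps:  name -> list of node names it depends on (must be acyclic)
    """
    tasks = {}

    def task_for(name):
        # Create each node's task at most once so shared dependencies
        # (such as "user" below) are only executed a single time.
        if name not in tasks:
            tasks[name] = asyncio.create_task(run_node(name))
        return tasks[name]

    async def run_node(name):
        dep_names = deps.get(name, [])
        dep_tasks = [task_for(d) for d in dep_names]  # start all deps first
        results = [await t for t in dep_tasks]
        return await nodes[name](dict(zip(dep_names, results)))

    all_tasks = [task_for(n) for n in nodes]
    values = [await t for t in all_tasks]
    return dict(zip(nodes, values))


# Hypothetical "RPCs"; the sleep stands in for a network call.
async def fetch_user(_):
    await asyncio.sleep(0.01)
    return {"id": 7, "name": "ada"}


async def fetch_timeline(d):
    return [f"tweet by {d['user']['name']}"]


async def fetch_ads(d):
    return [f"ad for user {d['user']['id']}"]


async def render(d):
    return d["timeline"] + d["ads"]


NODES = {"user": fetch_user, "timeline": fetch_timeline,
         "ads": fetch_ads, "render": render}
DEPS = {"timeline": ["user"], "ads": ["user"], "render": ["timeline", "ads"]}

result = asyncio.run(run_graph(NODES, DEPS))
print(result["render"])
```

The caller only states what depends on what; ordering and concurrency fall out of the graph, which is the property the Twitter post credits for the smaller, more readable code.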

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Sponsored Post: Loupe, New York Times, ScaleArc, Aerospike, Scalyr, Gusto, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Tue, 2016-12-06 16:56

Who's Hiring?
  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com

  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.
Fun and Informative Events
  • Your event here!
Cool Products and Services
  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Categories: High Scalability

The Tech that Turns Each of Us Into a Walled Garden

High Scalability - Mon, 2016-12-05 17:05

How we treat each other is based on empathy. Empathy is based on shared experience. What happens when we have nothing in common?

Systems are now being constructed so we’ll never see certain kinds of information. Each of us lives in our own algorithmically created Skinner Box/silo/walled garden, fed only information AIs think will be simultaneously most rewarding to us and to their creators (Facebook, Google, etc).

We are always being manipulated, granted, but how we are being manipulated has taken a sharp technology-driven change and we should be aware of it. This is different. Scary different. And the technology behind it all is absolutely fascinating.

Divided We Are Exploitable
Categories: High Scalability

Stuff The Internet Says On Scalability For December 2nd, 2016

High Scalability - Fri, 2016-12-02 16:56

Hey, it's HighScalability time:

 

A phrase you've probably heard a lot this week: AWS announces...

 

If you like this sort of Stuff then please support me on Patreon.
  • 18 minutes: latency to Mars; 100TB: biggest DynamoDB table; 55M: visits to Kaiser were virtual; $2 Billion: yearly Uber losses; 91%: Apple's take of smartphone profits; 825: AI patents held by IBM; $8: hourly cost of a spot welding in the auto industry; 70%: Walmart website traffic was mobile; $3 billion: online Black Friday sales; 80%: IT jobs replaceable by automation; $7500: cost of the one terabit per second DDoS attack on Dyn

  • Quotable Quotes:
    • @BotmetricHQ: #AWS is deploying tens of thousands of servers every day, enough to power #Amazon in 2005 when it was a $8.5B Enterprise. #reInvent
    • bcantrill: From my perspective, if this rumor is true, it's a relief. Solaris died the moment that they made the source proprietary -- a decision so incredibly stupid that it still makes my head hurt six years later.
    • Dropbox: it can take up to 180 milliseconds for data traveling by undersea cables at nearly the speed of light to cross the Pacific Ocean. Data traveling across the Atlantic can take up to 90 milliseconds.
    • @James_R_Holmes: The AWS development cycle: 1) Have fun writing code for a few months 2) Delete and use new AWS service that replaces it
    • @swardley: * asked "Can Amazon be beaten?" Me : of course * : how? Me : ask your CEO * : they are asking Me : have you thought about working at Amazon?
    • @etherealmind: Whatever network vendors did to James Hamilton at AWS, he is NEVER going to forgive them.
    • Stratechery: the flexibility and modularity of AWS is the chief reason why it crushed Google’s initial cloud offering, Google App Engine, which launched back in 2008. Using App Engine entailed accepting a lot of decisions that Google made on your behalf; AWS let you build exactly what you needed.
    • @jbeda: AWS Lambda@Edge thing is huge. It is the evolution of the CDN. We'll see this until there are 100s of DCs available to users.
    • erikpukinskis: Everyone in this subthread is missing the point of open source industrial equipment. The point is not to get a cheap tractor, or even a good one. The point is not to have a tractor you can service. The point is to have a shared platform.
    • John Furrier: Mark my words, if Amazon does not start thinking about the open-source equation, they could see a revolt that no one’s ever seen before in the tech industry. If you’re using open source to build a company to take territory from others, there will be a revolt.
    • @toddtauber: As we've become more sophisticated at quantifying things, we've become less willing to take risks. via @asymco
    • Resilience Thinking: Being efficient, in a narrow sense, leads to elimination of redundancies-keeping only those things that are directly and immediately beneficial. We will show later that this kind of efficiency leads to drastic losses in resilience.
    • Connor Gibson: By placing advertisements around the outside of your game (in the header, footer and sidebars) as well as the possibility video overlays it is entirely possible to earn up to six figures through this platform.
    • Google Analytics: And maybe, if nothing else, I guess it suggests that despite the soup du jour — huge seed/A rounds, massive valuations, binary outcomes— you can sometimes do alright by just taking less money and more time.
    • badger_bodger: I'm starting to get Frontend Fatigue Fatigue.
    • Steve Yegge: But now, thanks to Moore's Law, even your wearable Android or iOS watch has gigs of storage and a phat CPU, so all the decisions they made turned out in retrospect to be overly conservative.  And as a result, the Android APIs and frameworks are far, far, FAR from what you would expect if you've come from literally any other UI framework on the planet.  They feel alien. 
    • David Rosenthal: Again we see that expensive operations with cheap requests create a vulnerability that requires mitigation. In this case rate limiting the ICMP type 3 code 3 packets that get checked is perhaps the best that can be done.
    • @IAmOnDemand: Private on public cloud means the you can burst public/private workloads intothe public and shut down yr premise or... #reinvent
    • @allingeek: It isn’t “serverless" if you own the server/device. It is just a functional programing framework. #reinvent
    • brilliantcode: If you told me to use Azure two years ago I would've laughed you out of the room. But here I am in 2016, using Azure, using ASP.net + IIS on Visual Studio. that's some powerful shit and currently AWS has cost leadership and perceived switching cost as their edge.
    • seregine: Having worked at both places for ~4 years each, I would say Amazon is much more of a product company, and a platform is really a collection of compelling products. Amazon really puts customers first...Google really puts ideas (or technology) first.
    • api: Amazon seems to be trying to build a 100% proprietary global mainframe that runs everywhere.
    • Athas: No, it [Erlang] does not use SIMD to any great extent. Erlang uses message passing, not data parallelism. Erlang is for concurrency, not parallelism, so it would benefit little from these kinds of massively parallel hardware.
    • @chuhnk: @adrianco @cloud_opinion funnily those of us who've built platforms at various startups now think a cloud provider is the best place to be.
    • @jbeda: So the guy now in charge of building OSS communities at @awscloud says you should just join Amazon? Communities are built on diversity.
    • @JoeEmison: There's also an aspect of some of these AWS services where they only exist because of problems with other AWS services.
    • logmeout: Until bandwidth pricing is fixed rather than nickel and dimeing us to death; a lot of us will choose fixed pricing alternatives to AWS, GCP and Rackspace.
    • arcticfox: 100%. I can't stand it [AWS]. It's unlimited liability for anyone that uses their service with no way to limit it. If you were able to set hard caps, you could have set yours at like $5 or even $0 (free tier) and never run into that.
    • @edw519: I hate batch processing so much that I won't even use the dishwasher. I just wash, dry, and put away real time.
    • @CodeBeard: it could be argued that games is the last real software industry. Libraries have reduced most business-useful code to glue.
    • Gall's Law: A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
    • @mathewlodge: AWS now also designing its own ASICs for networking #Reinvent
    • @giano: From instances to services, AWS better than anybody else understood that use case specific wins over general purpose every day. #reinvent
    • @ben11kehoe: AWS hitting breadth of capability hard. Good counterpoint to recent "Google is 50% cheaper" news #reinvent
    • Michael E. Smith: But there are also positive effects of energized crowding. Urban economists and economic geographers have known for a long time that when businesses and industries concentrate themselves in cities, it leads to economies of scale and thus major gains in productivity. These effects are called agglomeration effects.
    • Andrew Huang: The inevitable slowdown of Moore’s Law may spell trouble for today’s technology giants, but it also creates an opportunity for the fledgling open-hardware movement to grow into something that potentially could be very big. 
    • Stratechery: This is Google’s bet when it comes to the enterprise cloud: open-sourcing Kubernetes was Google’s attempt to effectively build a browser on top of cloud infrastructure and thus decrease switching costs; the company’s equivalent of Google Search will be machine learning.

  • Just what has Amazon been up to?

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

How to Make Your Database 200x Faster Without Having to Pay More?

High Scalability - Mon, 2016-11-28 16:56

This is a guest repost by Barzan Mozafari, an assistant professor at University of Michigan and an advisor to a new startup, snappydata.io, that recently launched an open source OLTP + OLAP Database built on Spark.

Almost everyone these days is complaining about performance in one way or another. It’s not uncommon for database administrators and programmers to constantly find themselves in a situation where their servers are maxed out, or their queries are taking forever. This frustration is way too common for all of us. The solutions are varied. The most typical one is squinting at the query and blaming the programmer for not being smarter with their query. Maybe they could have used the right index or materialized view, or just rewritten their query in a better way. Other times, you might have to spin up a few more nodes if your company is using a cloud service. In other cases, when your servers are overloaded with too many slow queries, you might set different priorities for different queries so that at least the more urgent ones (e.g., CEO queries) finish faster. When the DB does not support priority queues, your admin might even cancel your queries to free up some resources for the more urgent queries.

No matter which one of these experiences you’ve had, you’re probably familiar with the pain of having to wait for slow queries or having to pay for more cloud instances or buying faster and bigger servers. Most people are familiar with traditional database tuning and query optimization techniques, which come with their own pros and cons. So we’re not going to talk about those here. Instead, in this post, we’re going to talk about more recent techniques that are far less known to people and in many cases actually lead to much better performance and saving opportunities.

To start, consider these scenarios:

Categories: High Scalability

Stuff The Internet Says On Scalability For November 25th, 2016

High Scalability - Fri, 2016-11-25 17:40

Hey, it's HighScalability time:

 

Margaret Hamilton was honored with the Presidential Medal of Freedom for writing Apollo guidance software. Oddly, she's absent from best programmers of all time lists.

 

If you like this sort of Stuff then please support me on Patreon.
  • 98 seconds: before camera infected with malware; zeptosecond: smallest fragment of time ever measured; 50%: Google Cloud cheaper than AWS; 50%: of the world is on-line;

  • Quotable Quotes:
    • @skamille: Sometimes I think that human societies just weren't meant to scale to billions of people sharing arbitrary information
    • @joshk0: At @GetArbor we use #kubernetes to host a 30K QPS ad-tech serving platform. Maybe smaller than Pokemon Go but nothing to sneeze at.
    • HFT Guy: 2016 should be remembered as the year Google became a better choice than AWS. If 50% cheaper is not a solid argument, I don’t know what is.
    • Glenn Marcus: Hybrid [Progressive Web App] development takes 260% more effort man hours than Native development.
    • Bruce Schneier: I want to suggest another way of thinking about it in that everything is now a computer: This is not a phone. It’s a computer that makes phone calls. A refrigerator is a computer that keeps things cold. ATM machine is a computer with money inside. Your car is not a mechanical device with a computer. It’s a computer with four wheels and an engine… And this is the Internet of Things, and this is what caused the DDoS attack we’re talking about.
    • Bruce Schneier: I don’t like this. I like the world where the internet can do whatever it wants, whenever it wants, at all times. It’s fun. This is a fun device. But I’m not sure we can do that anymore.
    • southpolesteve: [Lambda] is cheaper and simpler to operate than our previous ec2+Opsworks setup. We get code to production faster and spend more time on actual business problems vs infrastructure problems.
    • Carlo Rovelli: Meaning = Information + Evolution
    • chadscira: We have been using Rancher as well... It allowed us to move away from DO and AWS. Now most of our infra is from OVH :). It's been smooth sailing. Because of massive costs savings we were able to just reinvest it in our own redundancy. Also 12-factor apps are pretty damn resilient.
    • Fiahil: Making separate [Google] accounts might not be enough considering they allegedly banned accounts related to each others by recovery address. Why would you think they would not do the same with accounts sharing occasionally the same laptop, the same ip address, and the same first and last name ?
    • @swardley: Arghhh, one of those "can IBM beat Amazon?" .... the answer has three parts 1) the game has become harder  2) yes it could  3) no it won't
    • fest: Replaying the sensor inputs and evaluating new estimated state is a really good way of debugging failures (because you can't just stop the system mid-air and evaluate internal state). It also helps with regression test suite and trying out new algorithms quickly.
    • @Tibocut: «Institutions prefer to have trillions sitting still than redistributing them towards opportunities» @asymco https://youtu.be/nD8QszyiVTY  at 2h45
    • @AlanaMassey: A gathering of two or more average looking white men is referred to by biologists as "a podcast."
    • @RyanHoliday: "How slow men are in matters when they believe they have time and how swift they are when necessity drives them to it." Machiavelli
    • agataygurturk: We use route53 health checks to invoke API gateway and thus the backend Lambda.
    • Paul Biggar: Yeah, BDSM. It’s San Francisco. Everyone’s into distributed systems and BDSM.
    • @mims: Since the Apollo program, we've privatized the R&D that drives all innovation. That might be a problem.
    • Backblaze:  We have fewer drives because over the last quarter we swapped out more than 3,500 2 terabyte (TB) HGST and WDC hard drives for 2,400 8 TB Seagate drives. So we have fewer drives, but more data.
    • @lee_newcombe: Fun finding from my talk earlier.  40 attendees: 37 on cloud, 3 about to start.  Only one trying serverless.  There's your opportunity folks
    • Resilience Thinking: In resilient systems everything is not necessarily connected to everything else. Overconnected systems are susceptible to shocks and they are rapidly transmitted through the system. A resilient system opposes such a trend; it would maintain or create a degree of modularity.

  • Security expert Rob Graham with a stunning blow-by-blow Twitter story of a botnet infecting his brand new security camera. The whole process starts within 98 seconds of putting the camera on the internet, which is far faster than an ordinary mortal can configure the device to be secure. This was a cheap camera that had good reviews. At some point we need to think about all this too-cheap equipment as being funded by a Botnet Subsidy. It's almost too much of a coincidence that all these cheap devices, meant to be bought like candy in the mass consumer market, have such obviously poor security. Maybe it's not an accident? See also, Pre-installed Backdoor On 700 Million Android

  • Their profit margin is your opportunity. With The Era of Cloud Price Discounts Is Fading and the cost of metal continuing to decrease, is now a good time to consider transitioning to bare metal on-premise type infrastructures? The incentives are now coming into alignment. Kubernetes: Finally...A True Cloud Platform by Sam Ghods, Co-founder, Box makes a good case for Kubernetes as the only truly portable infrastructure option.

  • This is both pure genius and a sure sign of the apocalypse. Exclusive Interview: How Jared Kushner Won Trump The White House. Democrats may have thought they had a technological lead because of the last presidential election, but it turns out they were fighting the last war. Technology changed and they did not. Old: targeting, organizing and motivating voters. New: Moneyball meets Social Media with a twist of message tailoring, sentiment manipulation and machine learning. If this presidential election could be represented as a battle between Peter Thiel and Eric Schmidt: Thiel triumphed. Traditional microtargeting is almost quaint. Now, using Facebook's ability to target users with dark posts, a newsfeed message seen by no one aside from the users being targeted, each user can be shown a world specifically tailored to push and prod their particular buttons. For an explanation see The Secret Agenda of a Facebook Quiz. That's why it's both genius and apocalyptical. Things will never be the same. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

IT Hare: Ultimate DB Heresy: Single Modifying DB Connection. Part I. Performanc

High Scalability - Tue, 2016-11-22 16:56

Sergey Ignatchenko continues his excellent book series with a new chapter on databases. This is a guest repost.

The idea of single-write-connection is used extensively in the post. Because it's defined elsewhere, I asked Sergey for a definition so the article would make a little more sense...

As for single-write-connection - I mean that there is just one app (named "DB Server" in the article) holding the single DB connection that is allowed to issue modifying statements (UPDATEs/INSERTs/DELETEs). This enables several important simplifications: first, all fundamentally non-testable concurrency issues (such as a missing SELECT FOR UPDATE, or deadlocks) are eliminated entirely; second, the whole thing becomes deterministic (a significant help in figuring out bugs: even simple text logging has been seen to make the system quite debuggable, including post-mortem); and last but not least, this monopoly on updates can be used in quite creative ways to improve performance (in particular, to keep an always-coherent app-level cache, which can be something like 100x-1000x more efficient than going to the DB).
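As a rough illustration of that definition (not code from Sergey's book), here is a minimal Python sketch using the standard `sqlite3`, `queue`, and `threading` modules. The `DBServer` class, the `accounts` table, and the `deposit`/`transfer`/`read_balance` handlers are all hypothetical names invented for this example. One thread owns the only connection; every modifying request is queued and applied strictly one at a time, and the app-level cache is updated only after the transaction commits.

```python
import queue
import sqlite3
import threading


class DBServer:
    """Hypothetical 'DB Server': one thread owns the only DB connection."""

    def __init__(self, path=":memory:"):
        self._path = path
        self._requests = queue.Queue()
        self._cache = {}  # player -> balance; touched only by the writer thread
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        self._ready.wait()

    def _run(self):
        conn = sqlite3.connect(self._path)  # created and used in this thread only
        conn.execute("CREATE TABLE IF NOT EXISTS accounts "
                     "(player TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
        conn.commit()
        # Warm the cache from whatever is already stored.
        self._cache = dict(conn.execute("SELECT player, balance FROM accounts"))
        self._ready.set()
        while True:
            item = self._requests.get()
            if item is None:  # shutdown signal
                break
            func, args, reply = item
            try:
                with conn:  # commits on success, rolls back on exception
                    result, updates = func(conn, self._cache, *args)
                self._cache.update(updates)  # only after a successful commit
                reply.put((True, result))
            except Exception as exc:
                reply.put((False, exc))
        conn.close()

    def submit(self, func, *args):
        """Queue a request and block until the writer thread has handled it."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._requests.put(None)
        self._thread.join()


# Handlers return (result, cache_updates). They run serially, so no locking
# and no SELECT FOR UPDATE is needed.
def deposit(conn, cache, player, amount):
    balance = cache.get(player, 0) + amount
    conn.execute("INSERT OR REPLACE INTO accounts (player, balance) VALUES (?, ?)",
                 (player, balance))
    return balance, {player: balance}


def transfer(conn, cache, src, dst, amount):
    if cache.get(src, 0) < amount:
        raise ValueError("insufficient funds")
    new_src = cache[src] - amount
    new_dst = cache.get(dst, 0) + amount
    conn.execute("UPDATE accounts SET balance = ? WHERE player = ?", (new_src, src))
    conn.execute("INSERT OR REPLACE INTO accounts (player, balance) VALUES (?, ?)",
                 (dst, new_dst))
    return (new_src, new_dst), {src: new_src, dst: new_dst}


def read_balance(conn, cache, player):
    row = conn.execute("SELECT balance FROM accounts WHERE player = ?",
                       (player,)).fetchone()
    return (row[0] if row else None), {}
```

Any number of threads can call `server.submit(deposit, "alice", 10)` at once; the requests are still applied in a single, well-defined order, which is what makes the read-modify-write in `deposit` safe without locks and lets the handlers trust the cache instead of re-reading the DB.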

Now that we have finished with all the preliminaries, we can get to the interesting part – implementing our transactional DB and DB Server. We already mentioned implementing DB Server briefly in Chapter VII, but now we need a much more detailed discussion of this all-important topic.

First of all, let’s re-iterate what we’re speaking about. Transactional/operational DB is a place where all the automated decisions are made about your game (stock exchange, bank, etc.).

It stores things such as player accounts, with all their persistent attributes etc. etc.; it also stores communications related to payment processing, and so on, and so forth. And “DB Server” is our app handling access to DBMS (as noted in Chapter VII, I am firmly against having SQL statements issued directly by your Game Servers/Game Logic, so an intermediary such as DB Server is necessary).

As discussed above, ACID properties tend to be extremely important for transactional/operational DB. We don’t want money – or that artifact which is sold for real $20K on eBay – to be lost or duplicated. For this and some other reasons, we’ll be speaking about SQL databases for our transactional/operational DB (while it is possible to use NoSQL for transactional/operational DB, achieving strict guarantees is usually difficult, in particular because of the lack of multi-object ACID transactions in most NoSQL DBs out there; see discussion in [[TODO]] section above).

And now, we’re finally ready to start discussing interesting things.

Multi-Connection DB Access
Categories: High Scalability