High Scalability

Sponsored Post: Etleap, Pier 1, Aerospike, Loupe, Clubhouse, Stream, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Tue, 2017-05-23 15:56

Who's Hiring? 
  • Pier 1 Imports is looking for an amazing Sr. Website Engineer to join our growing team!  Our customer continues to evolve the way she prefers to shop, speak to, and engage with us at Pier 1 Imports.  Driving us to innovate more ways to surprise and delight her expectations as a Premier Home and Decor retailer.  We are looking for a candidate to be another key member of a driven agile team. This person will inform and apply modern technical expertise to website site performance, development and design techniques for Pier.com. To apply please email cmwelsh@pier1.com. More details are available here.

  • Etleap is looking for Senior Data Engineers to build the next-generation ETL solution. Data analytics teams need solid infrastructure and great ETL tools to be successful. It shouldn't take a CS degree to use big data effectively, and abstracting away the difficult parts is our mission. We use Java extensively, and distributed systems experience is a big plus! See full job description and apply here.

  • Advertise your job here! 
Fun and Informative Events
  • DBTA Roundtable OnDemand Webinar: Leveraging Big Data with Hadoop, NoSQL and RDBMS. Watch this recent roundtable discussion hosted by DBTA to learn about key differences between Hadoop, NoSQL and RDBMS. Topics include primary use cases, selection criteria, when a hybrid approach will best fit your needs and best practices for managing, securing and integrating data across platforms. Brian Bulkowski, CTO and Co-founder of Aerospike, presented along with speakers from Cask Data and Splice Machine. View now.

  • Advertise your event here!
Cool Products and Services
  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • Etleap provides a SaaS ETL tool that makes it easy to create and operate a Redshift data warehouse at a small fraction of the typical time and cost. It combines the ability to do deep transformations on large data sets with self-service usability, and no coding is required. Sign up for a 30-day free trial.

  • InMemory.Net provides a .NET-native in-memory database for analysing large amounts of data. It runs natively on .NET and provides native .NET, COM, and ODBC APIs for integration. It also has an easy-to-use language for importing data and supports standard SQL for querying data. http://InMemory.Net

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Working on a software product? Clubhouse is a project management tool that helps software teams plan, build, and deploy their products with ease. Try it free today or learn why thousands of teams use Clubhouse as a Trello alternative or JIRA alternative.

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5-minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring DevOps and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Categories: High Scalability

Stuff The Internet Says On Scalability For May 19th, 2017

High Scalability - Fri, 2017-05-19 15:56

Hey, it's HighScalability time:

 

 

Who wouldn't want to tour the Garden of Mathematical Sciences with Plato as their guide?
If you like this sort of Stuff then please support me on Patreon.
  • 2 billion: Android users; 1,000: cloud TPUs freely available to researchers; 11.5 petaflops: in Google's machine learning pod; 86 billion: neurons in the human brain, not 100 billion; 1,300: Amazon's new warehouses across Europe; $1 trillion: China self-investment; 1/7th: California's portion of US GDP; more: repetition in songs; 99.999%: Spanner availability, strong consistency, good latency; 6: successful SpaceX launches in 4 months; 160TB: RAM in HPE computer; 40,000+ workers: private offices > open offices

  • Quotable Quotes:
    • Tim Bray: without exception, I observed that they [Personal computers, Unix, C, the Internet and Web, Java, REST, mobile, public cloud] were initially loaded in the back door by geeks, without asking permission, because they got shit done and helped people with their jobs. That’s not happening with blockchain. Not in the slightest. Which is why I don’t believe in it.
    • @swardley: Amazon continues to take industry after industry not because those companies lack engineering talent but executive talent.
    • @RichRogersIoT: "I bought my boss two copies of The Mythical Man Month so that he could read it twice as fast." - @rkoutnik
    • @GossiTheDog: Seeing ATMs and banks go down here suggests fundamental issues which flashing boxes can't fix. Design, architect a security model.
    • @stevesi: Is Google's TPU investment the biggest advantage ever or laying groundwork for being disrupted? Can Google out-innovate sum of industry?
    • Ryan Mac: Last year, Craigslist took in upwards of $690 million in revenue, most of which is net profit
    • @dberkholz: Capex vs opex budget for tools is a bigger deal than I'd fully appreciated. Welcome to the enterprise!
    • Vint Cerf: AI stands for artificial idiot. 
    • Douglas Hofstadter: In the end, we are self-perceiving, self-inventing, locked-in mirages that are little miracles of self-reference. 
    • cocktailpeanuts: I feel like the term "Serverless" has been hijacked to a point that it will soon become meaningless just like "AI", "IoT", etc. Basically "Serverless" in 2017 has become just a hype friendly marketing friendly way of saying "Saas".
    • @skupor: Over last 20 years, m&a exits for venture backed companies has gone from 60% to 90% of exits (was 20% in 1990)
    • bpicolo: C# with visual studio is, I think, the most productive environment I've come across in programming. It's ergonomically sound, straightforward, and the IDE protects me from all sorts of relevant errors. Steve mentioned Intellij is a bit slower than he'd hope typing sometimes. I totally agree with that. I think Visual Studio doesn't quite suffer from that.
    • @codepitbull: A good developer is like a werewolf: Afraid of silver bullets.
    • @sehnaoui: Coffee shop. People next to me are loud and rude. They just found the perfect name for their new business. I just bought the domain name.
    • David Robinson: Python and Javascript developers start and end the day a little later than C# users, and are a little less likely than C programmers to work in the evening.
    • Ben Thompson: The fatal flaw of software, beyond the various technical and strategic considerations I outlined above, is that for the first several decades of the industry software was sold for an up-front price, whether that be for a package or a license. The truth is that software — and thus security — is never finished; it makes no sense, then, that payment is a one-time event.
    • boulos: Spanner does things for you that MySQL et al. don't. Having an automagic Regional (and eventually Global if you'd like) database without dealing with sharding is worth $8k/year even to me. So even if it could fit on $10/month of hardware, I don't begrudge them for charging a service fee, rather than saying "This is how much cores, RAM, disk and flash this eats".
    • codedokode: One of the reasons why such attack was possible is poor security in Windows. Port 445 that was used in an attack is opened by a kernel driver (at least that is what netstat says on WinXP) that runs in ring 0. This driver is enabled by default even if the user doesn't need SMB server and it cannot be easily disabled.
    • @RichRogersIoT: Job interview:  Implement Large Hadron Collider on whiteboard / Actual job:  Jira bug-id #2342: Move login button 3 pixels to left
    • slackingoff2017: This is part of a worrying new trend. Increasingly you can't buy software anymore, only rent. Innovation is being kept from scrutiny hidden behind closed doors. The kind of thing patents were meant to prevent back when the system wasn't broken.
    • Scott Borg~ Engineers need to look at their products from the standpoint of the attacker, and consider how attacker would benefit from cyberattack and how to make undertaking that attack more expensive. It’s all about working to increase an attacker’s costs
    • @tottinge: "A code base isn't a thing we build, it's a place we live. We don't seek to finish it and move on, but to make it liveable"  @sarahmei
    • Sam Kroonenburg: We Believe …Don’t do the things that someone else can do. Do the things that only we can do. [re: Serverless]
    • Anush Mohandass: What you’re starting to see are different architectures for different workloads. There will be chips for image recognition, SQL, machine learning acceleration. 
    • Craig McLuckie: Given the current state-of-the-art, most users will achieve best day-to-day top line availability by just picking a single public cloud provider and running their app on one infrastructure.
    • watmough: Chromebooks work, and I am a big fan of them in education. I have a pretty good idea how hard our teachers work, and I'd hate to think of the Windows bullshit being imposed them, like it's imposed on me and my coworkers.
    • axilmar: It [React Native] is the future! But you need experience to make it work, and navigation/routing is still being worked out, and it is native, but it is Javascript, and it is crossplatform, but you need to be aware of the differences of the two platforms, and styling uses something that is like css but not entirely, you have to learn all the intricate details... Thank god software engineering "practices" are not used in other engineering disciplines...
    • Anton Howes: So without the British acceleration of innovation, the Industrial Revolution would likely have happened elsewhere within a few decades. France and the Low Countries and Switzerland and the United States were by the eighteenth century well on their way towards sustained modern economic growth. 
    • Dr. Suzana Herculano-Houzel~ evolution is not progress, all that evolution means is change over geological time, it's not for the better, it's not for the worse, it's just different. All it has to do with is generating diversity. We have ample evidence we are not descendants of reptiles, we are close cousins. We could not have a basic reptile brain to which something else was added. We know now that every reptile has a neo-cortex. There is no such thing as a triune brain. There is no such thing as a reptilian brain on top of which a new structure appeared only in mammals. We all have it. The brain is very much the same in its essence, the difference lies in the quantities. 
    • James Clear: The great mistake of Hurricane Katrina was that the levees and flood walls were not built with a proper “margin of safety.” The engineers miscalculated the strength of the soil the walls were built upon. As a result, the walls buckled and the surging waters poured over the top, eroding the soft soil and magnifying the problem. Within a few minutes, the entire system collapsed.
    • elvinyung: This "modern" Spanner feels very different from the one we saw in 2012 [1]. Some interesting takeaways: * There is a native SQL interface in Spanner, rather than relying on a separate upper-layer SQL layer, a la F1 [2] * Spanner is no longer on top of Bigtable! Instead, the storage engine seems to be a heavily modified Bigtable with a column-oriented file format * Data is resharded frequently and concurrently with other operations -- the shard layout is abstracted away from the query plan using the "distributed union" operator * Possible explanation for why Spanner doesn't support SQL DML writes: writes are required to be the last step of a transaction, and there is currently no support for reading uncommitted writes (this is in contrast to F1, which does support DML) * Spanner supports full-text search (!)

  • Cautionary tale number 1000 on depending on someone else's service. Firebase Costs Increased by 7,000%! Google changed something (billing for SSL overhead) and HomeAutomation's bill spiked. There was no warning. There were no tools to tell why. Support stopped replying. There's no one to call. The recommendation is to protect yourself from being trapped by a service from the very beginning. They've moved to Lambda/DynamoDB, which many point out is also a potential service trap. The Firebase Founder responded with an explanation, saying he was "embarrassed by the level of communication on our side." Good discussion on HackerNews and on reddit. Lots of people with similar stories, complaints about lack of support from Google, complaints about lack of transparency, and the usual advice to never rely on anything ever. 

  • Serverlessconf Austin '17 videos are now available (most of them anyway). 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Is Serverless the New Visual Basic?

High Scalability - Mon, 2017-05-15 16:35

With Serverless, hiring less experienced developers can work out better than hiring experienced cloud developers. That's an interesting point I haven't heard before, and it was made by Paul Johnston, CTO of movivo, in The ServerlessCast #6 - Event-Driven Design Thinking.

The thought process goes something like this...

An experienced cloud developer will probably think procedurally, in terms of transactional systems, frameworks, and big fat containers that do lots of work. 

That's not how a Serverless developer needs to think. A Serverless developer needs to think in terms of small functions that do one thing linked together by events; and they need to grok asynchronous and distributed thinking.
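
To make "small functions that do one thing linked together by events" concrete, here is a minimal sketch. It is not any cloud provider's API: the `on` and `emit` helpers, the event names, and the in-process dispatch are all assumptions made up for illustration, and a real Serverless platform would deliver events asynchronously rather than by a direct function call.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical in-process stand-in for a cloud event bus (illustrative only).
_subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
log: List[str] = []  # records what each function did, in order


def on(event_type: str):
    """Register a function as a handler for one event type."""
    def register(fn: Callable[[dict], None]) -> Callable[[dict], None]:
        _subscribers[event_type].append(fn)
        return fn
    return register


def emit(event_type: str, payload: dict) -> None:
    """Deliver an event to every handler subscribed to its type.

    Synchronous here for simplicity; real platforms deliver asynchronously.
    """
    for fn in list(_subscribers[event_type]):
        fn(payload)


@on("order.placed")
def charge_card(event: dict) -> None:
    # Does one thing, then announces the result as a new event.
    log.append(f"charged {event['order_id']}")
    emit("payment.succeeded", {"order_id": event["order_id"]})


@on("payment.succeeded")
def send_receipt(event: dict) -> None:
    # Knows nothing about charge_card; it only reacts to the event.
    log.append(f"receipt sent for {event['order_id']}")


emit("order.placed", {"order_id": "A1"})
```

Neither function calls the other directly. The only thing they share is the event name, which is the shift in thinking described above.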

So the idea is you don't need typical developer skills. Paul finds people with sysadmin skills have the right stuff. Someone with a sysadmin background is more likely than a framework developer to understand the distributed thinking that goes with building an entire system of events.

Paul also makes the point that once a system has been built, experienced developers will get bored because Serverless systems don't require the same amount of maintenance.

For example, they had good success hiring a person with two years of vo-tech on-the-job training because they didn't have the baggage of working with frameworks and servers and all of those kinds of things. That baggage gets in the way.

So hire younger, hungrier developers who don't have that experience behind them. 

Obviously "younger, hungrier" and "less experienced" also mean cheaper, not that there's anything wrong with that. Developers are hard to find.

We've seen this kind of thing before. With Visual Basic, relatively inexperienced people built lots of systems that did real and important work for companies, because VB made it so easy to write a Windows program. Before VB, writing a Windows program was really difficult and time-consuming, just as writing a cloud program is today. Like VB, Serverless radically reduces the expertise needed to write a cloud program. 

Though they got the job done, most of those VB programs were technical debt bombs. Over time as more and more functionality was bolted on they became hard to understand, hard to change, hard to test, and were poorly designed. Your classic Big Ball of Mud.

A lot of the problem was that VB made it easy to include business logic in event handlers, so there was no layering; the GUI was the orchestrator. This made VB programs hard to test. Serverless also has this problem. Inexperienced programmers also used a lot of global variables in VB programs, so there wasn't a clean separation of concerns. Coupling was high and cohesion was low. Serverless also has this problem: though obviously there are no global variables in the code, the database effectively becomes a store for global variables that can be accessed from any Serverless function.
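
One common way to push back on both problems is to keep the event handler thin and pass state in explicitly. The sketch below assumes a hypothetical discount rule and a hypothetical `CartStore` interface; none of these names come from a real framework, and this is one possible layering rather than a prescribed one.

```python
from typing import Dict, Protocol


class CartStore(Protocol):
    """Hypothetical narrow interface to the database (illustrative only)."""
    def get_total(self, cart_id: str) -> float: ...


def apply_discount(total: float, code: str) -> float:
    """Pure business rule: no event, no database, so it is easy to unit test."""
    rates = {"SPRING10": 0.10}  # made-up discount table for the example
    return round(total * (1 - rates.get(code, 0.0)), 2)


def handler(event: Dict[str, str], store: CartStore) -> Dict[str, float]:
    """Thin event handler: unpack the event, fetch state, delegate to the rule."""
    total = store.get_total(event["cart_id"])
    return {"total": apply_discount(total, event.get("code", ""))}


class InMemoryStore:
    """Test double standing in for the real database."""
    def __init__(self, totals: Dict[str, float]) -> None:
        self._totals = totals

    def get_total(self, cart_id: str) -> float:
        return self._totals[cart_id]
```

Because the handler receives its store as an argument and only sees one narrow method, the function can be tested with `InMemoryStore`, and the database stops behaving like a bag of globals that any function can reach into.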

It will be interesting to see if Serverless can avoid VB's fate.

On HackerNews

Categories: High Scalability

Stuff The Internet Says On Scalability For May 12th, 2017

High Scalability - Fri, 2017-05-12 15:56

Hey, it's HighScalability time:

 

 

Earth's surface is covered with accidental hidden letters. Can you find them? (ABC: The Alphabet from the Sky)

 

If you like this sort of Stuff then please support me on Patreon.
  • 1 million: cord cutters in Q1; 500 billion: FINRA validations of stock trades every day on Lambda; 100k: messages sent per hour at Airbnb; 21.1 billion: transistors in GV100 GPU; 11,500: crashes to train a drone; 84,469: Backblaze hard drives; 8,000: questions per day asked on StackOverflow; 

  • Quotable Quotes:
    • Jonathan Taplin: Google Is as Close to a Natural Monopoly as the Bell System Was in 1956
    • Tom Goldenberg: more companies on the site [StackShare.io] use JavaScript on the back-end (6,000) than Python (4,100) or Java (3,900).
    • Andrew Shafer: The dark ages of the relational database and the Java middleware stack paused everything for a decade. 
    • @Taytus: "We are early stage investors. Call me when you hit 1 million monthly active users"
    • @chrisjrn: "At this point I was drunk on Perl" @bradfitz #tweetsincontext #oscon
    • Bryan Cantrill: AWS is underwriting a war on big box retail. 
    • Paul Gilster: You’re reading that right — one-tenth of a milliwatt is enough to create error-free communications between the Sun and Alpha Centauri through two FOCAL antennas [gravitational lens].
    • Vadim Markovtsev: There is a productivity peak between 2 pm and 5 pm for all the languages, when the commit frequency is the highest. This is the industry’s golden time. Managers should never distract coders during this interval.
    • Patrick Tucker: The goal, one day, is a neural net that can learn instantaneously, continuously, and in real-time, by observing the brainwaves and eye movement of highly trained soldiers doing their jobs.
    • @alicegoldfuss: it is incredibly difficult to balance "don't burn out and become a statistic" with "get as far as you can fast so they can't take it away"
    • Jonathan Taplin: With the advent of YouTube and other streaming services, revenue for musicians has fallen 70%. If you had a song that had a million downloads on iTunes, you would get $900,000. On YouTube, you’d get $900.
    • David Robinson: In short, if we had to summarize the average story [after analyzing 100,000 stories] that humans tell, it would go something like Things get worse and worse until at the last minute they get better.
    • Confucius: He who cannot describe the problem will never find the solution to that problem
    • Peter Thiel: competition is for losers
    • Jason McGee~ Serverless adoption is moving 10x faster than Container adoption.
    • Max Ehrenfreund: On average, workers born in 1942 earned as much or more over their careers than workers born in any year since
    • Michael Elad: To put it bluntly, your grandchild is likely to have a robot spouse. And here is the punch line: much of the technology behind this bizarre future is likely to emerge from deep learning and its descendant fields.
    • aliostad: We just did a benchmarking for a PoC on DocumentDB side-by-side Cassandra. It does the job, I have not yet seen anything revolutionary. Cassandra benchmarks seemed better.
    • AWS Lambda Engineer: When you develop a Lambda function that uses SQS, SNS, Dynamo and other stuff in the cloud.. you can’t really debug it on your local. People just need to change their mindset
    • sbuttgereit: What looks compelling about the PostgreSQL offering as compared to AWS RDS is that it looks like you get a PostgreSQL cluster rather than a single database in a shared cluster.
    • Warren Toomey: Simulated hardware is infinitely easier to obtain, configure and diagnose than real hardware.
    • Kate Kaye: The mistake companies have made, he says, is to rely too much on targeted advertising, cutting too far back on broader advertising that builds brand awareness with people outside the existing customer base and eventually leads to new sales.
    • cbanek: I've had to work on mission critical projects with 100% code coverage (or people striving for it). The real tragedy isn't mentioned though - even if you do all the work, and cover every line in a test, unless you cover 100% of your underlying dependencies, and cover all your inputs, you're still not covering all the cases.
    • There's just too many quotes. Please read the full article to see them all.

  • Is bundling a race to the bottom for content creators? What's the future of game monetization?: the value of games seems to keep falling...The fact that we want everything free now because it costs less (not 'nothing', remember) to produce each additional unit is a fairly entitled view and, I suggest, it would lead to the destruction of the  games industry in the same way that it's gutted the music industry...The success of Spotify and Netflix's models in other industries worries me and we see a bit of a move in that direction with things like Humble Bundles...If we're not careful, we'll get to where there's no money to be made in games and only the most trite, generic, relatively low cost and mass-appealing titles (the Call of Duties and FIFAs) will be financially viable...it's worth noting that these titans are resorting to F2P to try and shore up their player numbers. Will we ever see subscription models in new games again?

  • A 10,000+ phone Chinese click farm looks a lot like Facebook's mobile device testing lab


Categories: High Scalability

Sponsored Post: Etleap, Pier 1, Aerospike, Loupe, Clubhouse, Stream, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Tue, 2017-05-09 15:59


Categories: High Scalability

Privacy: Bartering Data for Services

High Scalability - Mon, 2017-05-08 15:56

Data is the new currency: a phrase we’ve heard frequently in the wake of the story of Unroll.me selling user data to Uber.

Two keys to that story:

  • Users didn’t realize their data was being sold.
  • Free services can be considered a sophisticated form of phishing attack.

In both cases prevention requires user awareness. How do we get user awareness? Force meaningful disclosure. How do we force meaningful disclosure? Here’s an odd thought: use the tax system.

If data is the new currency then why isn’t exchanging data for use of a service a barter transaction? If a doctor exchanges medical services for chickens, for example, that is a taxable event at fair market value. It's a barter arrangement. A free service that sells user data is similarly bartering the service for data, otherwise said service would not be offered. 

How would it work?

  • Service providers send out 1099-Bs to users for the fair market value of the service. Fair market value could be determined using the price of a similar for-pay service or as a percentage of the income generated from the data being sold.

  • The IRS treats barter transactions as income received. Users would need to pay income tax for the “free” services they use that sell their data.

What would it accomplish?

  • Force disclosure by services. Businesses making money selling data would be forced to inform their users that they are doing so because it’s required for tax accounting.

  • Eyes Wide Open. Users would know for certain that the services they are using are selling their data. They could then determine if the relationship is worth the cost.

This would not prevent free service for data arrangements. There’s nothing wrong with exchanging data for a service, but everyone should enter such a transaction knowingly.

Categories: High Scalability

Stuff The Internet Says On Scalability For May 5th, 2017

High Scalability - Fri, 2017-05-05 16:20

Hey, it's HighScalability time:

GPUs and CPUs run hot hot hot. See them in action with thermal imaging. (Tested)

If you like this sort of Stuff then please support me on Patreon.
  • 25ms: SpaceX satellite latency; 17 million: tax returns received by IRS during week ending April 21; 1.94 billion: Facebook users; 1.2 billion: Lambda requests by Expedia / month; ~$91.5K: Capital One's yearly Serverless TCO; 1.2 billion: Facebook Messenger users; 215 petabytes: storage per gram of DNA; 1/2: households in US are Amazon Prime members; 50.8%: households in US that are mobile phone only; 80 billion: street view images; 3 million: open sourced Instacart orders; $175: RaaS (ransomware-as-a-service); 350,000+: Amazon employees; 

  • Quotable Quotes:
    • Paul Barnum: You can have a second computer when you've shown you know how to use the first one
    • @chrisalbon: 2007: “You are the product.”  2017: “You are the training data.”
    • shitloadofbooks: As an Ops guy, I preach Ansible + systemd all day everyday, but so many of our Devs (and Ops) have drunk the containerization Kool-aid.
    • roland-s: Like you, I'm sometimes unsure if this is the right choice. Maybe a monolithic server or traditional VMs + Puppet would be easier, simpler, better? In the end, I think Docker just fit with the way I conceptualized my problem so I went for it.
    • Venki Ramakrishnan: each experiment generates several terabytes of data, which is then massaged, analyzed, and reduced, and finally you get a structure. 
    • @dberkholz: A 19-line sample pulled in 190,000 lines of code in dependencies. Is that what you call a 10000x programmer? #ServerlessConf
    • @asymco: Apple Watch continues to struggle as unit sales more than doubled in six of top 10 markets
    • @pomeranian99: Memory leaks on missiles don't matter, so long as the missile explodes before too much leaks. A 1995 memo: 
    • Paul Johnston: Most of these vendors can cope with what you throw at them so just go for it and stop trying to keep your options open. That way lies madness and mediocrity for your solution (at present).
    • @BrewersStats: 0.3% of the largest breweries make 69.3% of the beer. Conversely, 76.5% of the smallest make 0.7% of the beer.
    • @howardlindzon: Apple is 12.3 billion away from being the first Trillion dollar company
    • @michael_adda: Completely agree with the #serverless async/sync argument "Concurrency within a flow? it needs to move into our infrastructure"
    • resident_ninja: making literally EVERYTHING a stored proc creates a very bad, tight coupling between the app and db, kills scalability, and increases the pain of app and website deployments 
    • Impact Lab: There are about 1,200 malls in America today. In a decade, there might be about 900. That’s not quite the “the death of malls.” But it is decline, and it is inevitable.
    • Joel Frohlich: at that point in history, no other human being had ever experienced a focused beam of radiation at such high energy
    • Shazam: Whenever a user Shazams a song, our algorithm uses GPUs to search that database until it finds a match. This happens successfully over 20 million times per day.
    • Dmitri Zimine: You will rewrite your app, not to move to the other provider but by the progress of your cloud provider. They change existing services and introduce new ones
    • There's just too much. Read more by clicking through to the full article.

  • Filed under the coolest use of machine learning category. Algorithmic ‘Printed’ Fields Could Make Farms More Productive and Resilient: UK-based designer Benedikt Groß has created algorithmic models that enable him to plant various crops in complex patterns in a field. This improves ecological resilience and diversity through fascinating patterns that are best appreciated from above.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Homegrown master-master replication for a NoSQL database

High Scalability - Wed, 2017-05-03 16:03

Many of you may have already heard about the high performance of the Tarantool DBMS, about its rich toolset and certain features. Say, it has a really cool on-disk storage engine called Vinyl, and it knows how to work with JSON documents. However, most articles out there tend to overlook one crucial thing: usually, Tarantool is regarded simply as storage, whereas its killer feature is the possibility of writing code inside it, which makes working with your data extremely effective. If you’d like to know how igorcoding and I built a system almost entirely inside Tarantool, read on.

If you’ve ever used the Mail.Ru email service, you probably know that it allows collecting emails from other accounts. If the OAuth protocol is supported, we don’t need to ask a user for third-party service credentials to do that — we can use OAuth tokens instead. Besides, Mail.Ru Group has lots of projects that require authorization via third-party services and need users’ OAuth tokens to work with certain applications. That’s why we decided to build a service for storing and updating tokens.

I guess everybody knows what an OAuth token looks like. To refresh your memory, it’s a structure consisting of 3–4 fields:

Categories: High Scalability

The AdStage Migration from Heroku to AWS

High Scalability - Mon, 2017-05-01 16:03

This is a guest repost by G Gordon Worley III, Head of Site Reliability Engineering at AdStage.

When I joined AdStage in the Fall of 2013 we were already running on Heroku. It was the obvious choice: super easy to get started with, less expensive than full-sized virtual servers, and flexible enough to grow with our business. And grow we did. Heroku let us focus exclusively on building a compelling product without the distraction of managing infrastructure, so by late 2015 we were running thousands of dynos (containers) simultaneously to keep up with our customers.

We needed all those dynos because, on the backend, we look a lot like Segment, and like them many of our costs scale linearly with the number of users. At $25/dyno/month, our growth projections put us breaking $1 million in annual infrastructure expenses by mid-2016 when factored in with other technical costs, and that made up such a large proportion of COGS that it would take years to reach profitability. The situation was, to be frank, unsustainable. The engineering team met to discuss our options, and some quick calculations showed us we were paying more than $10,000 a month for the convenience of Heroku over what similar resources would cost directly on AWS. That was enough to justify an engineer working full-time on infrastructure if we migrated off Heroku, so I was tasked to become our first Head of Operations and spearhead our migration to AWS.
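To put those figures in perspective, here is a back-of-the-envelope sketch using only the numbers quoted above. It assumes, for simplicity, that the whole $1 million went to dynos; the article says that total also included other technical costs, so the dyno count is an upper bound rather than AdStage's actual fleet size.

```python
DYNO_PRICE_PER_MONTH = 25          # dollars per dyno per month (from the article)
ANNUAL_SPEND = 1_000_000           # projected annual infrastructure spend (from the article)
HEROKU_PREMIUM_PER_MONTH = 10_000  # "more than $10,000 a month" (from the article)


def dynos_for_annual_spend(annual_spend: float,
                           price_per_month: float = DYNO_PRICE_PER_MONTH) -> float:
    """Number of always-on dynos that a given annual spend would buy."""
    return annual_spend / (price_per_month * 12)


def annual_premium(monthly_premium: float = HEROKU_PREMIUM_PER_MONTH) -> float:
    """Yearly cost of the Heroku convenience premium over raw AWS resources."""
    return monthly_premium * 12


# Simplifying assumption: all spend is dynos, so this is an upper bound.
print(f"Dynos at $1M/year: about {round(dynos_for_annual_spend(ANNUAL_SPEND))}")
print(f"Heroku premium:    at least ${annual_premium():,} per year")
```

A premium of at least $120,000 a year is the number that made a full-time infrastructure engineer look affordable.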

It was good timing, too, because Heroku had become our biggest constraint. Our engineering team had adopted a Kanban approach, so ideally we would have a constant flow of stories moving from conception to completion. At the time, though, we were generating lots of work-in-progress that routinely clogged our release pipeline. Work was slow to move through QA and often got sent back for bug fixes. Too often things “worked on my machine” but would fail when exposed to our staging environment. Because AdStage is a complex mix of interdependent services written on different tech stacks, it was hard for each developer to keep their workstation up-to-date with production, and this also made deploying to staging and production a slow process requiring lots of manual intervention. We had little choice in the matter, though, because we had to deploy each service as its own Heroku application, limiting our opportunities for automation. We desperately needed to find an alternative that would permit us to automate deployments and give developers earlier access to reliable test environments.

So in addition to cutting costs by moving off Heroku, we also needed to clear the QA constraint. I otherwise had free rein in designing our AWS deployment so long as it ran all our existing services with minimal code changes, but I added several desiderata:

Categories: High Scalability

Stuff The Internet Says On Scalability For April 28th, 2017

High Scalability - Fri, 2017-04-28 16:05

Hey, it's HighScalability time:

 

Do you understand the power symbol? I always think of O as a circuit being open, or off, and the | as the circuit being closed, or on. Wrong! Really the symbols are binary, 0 for false, or off, 1 for true, or on. Mind blown.
If you like this sort of Stuff then please support me on Patreon.
  • 220,000-Core: largest Google Compute Engine job; 100 million: Netflix subscribers; 1.3M: Sling TV subscribers; 200: Downloadable Modern Art Books; 25%: Americans Won't Subscribe To Traditional Cable; 84%: image payload savings using smart CDN; 10^5: number of world-wide cloud data centers needed; 63%: more Facebook clicks using personality targeting; 2.5 million: red blood cells created per second; 

  • Quotable Quotes:
    • Silicon Valley~ The only reason Gilfoyle and I stayed up 48 f*cking straight hours was to decrease server load, not keep it the same. 
    • Robert Graham: In other words, if the entire Mirai botnet of 2.5 million IoT devices was furiously mining bitcoin, its total earnings would be $0.25 (25 cents) per day.
    • @BoingBoing: John Deere just told US Copyright office that only corporations can own property, humans merely license it
    • mattbillenstein: Lin Clark's talk makes this sound like they implemented a scheduler in React -- basically JS is single-threaded, so they're implementing their own primitives and a scheduler for executing those on that main thread.
    • Robert M. Pirsig: When analytic thought, the knife, is applied to experience, something is always killed in the process.
    • @vornietom: I honestly feel bad for the people on the Placebo March who thought they were at the Science March but double blind testing is important
    • MIT: we can capture and monitor human breathing and heart rates by relying on wireless reflections off the human body.
    • Mohamed Zahran~ Surprisingly enough traditional homogenous multi-core are really heterogeneous. Why is that? Every core is running at its own frequency. Many processors are now a traditional core and a GPU. FPGAs are already with us. Automata Processor is a specialized processor that can execute non-deterministic finite automata (regular expressions) orders of magnitude faster than a GPU.  Neuromorphic brain inspired chips. Fancy GPUs. 
    • @craigbuj: amazing how fast China Internet companies can scale: ofo: 10+ million daily rides in China Uber: ~6 million daily rides globally
    • knz: CockroachDB's architecture is an emergent property of its source code. 
    • @Jason: Good news: over 70b spent on digital ads in 2016.  Terrifying news: 89% of growth was Facebook & Google. Via @iab
    • @swardley: I think we need to stop thinking about AMZN as a future $1T biz and more think about it as a future $10T biz, possibly much more.
    • @timoreilly: "Algorithms are opinions embedded in code." @mathbabedotorg #TED2017 
    • Google: I think we [Google Cloud] have a pretty good shot at being No. 1 in five years
    • limitless__: Folks who think programmer skill declines when you're 40+ are 100% wrong. What declines is your willingness to put up with stupidity and what increases is your willingness and ability to tell someone to fly a kite when they tell you to work stupid hours and do stupid things.
    • @nicusX: "Don't worry about X. X is transparently managed for you". Reads: "When things go wrong you'll never be able to fix it" #mechanicalSympathy
    • defined: What's up is the rampant ageism in the industry - the perception that you are washed up as a "dinosaur" developer after a certain age, maybe 40 or so, and belong in management. We "dinosaurs" - we happy few - are living evidence to the contrary.
    • user5994461: AWS Spot Instances are under bid. The highest bidder takes the instances, the price changes all the time. Google Spot Instances (preemptibles) are 80% off and that's it. It's simple.
    • James Hamilton: in 10 years, ML will be more than 1/2 the world's server side footprint.
    • qnovo: if we examine the average capacity in smartphones over the past 5 years, we see that it has grown at about 8% annually. A battery in a 2017 smartphone contains about 40 – 50% more capacity (mAh) than it did in 2012.
    • StorageMojo: Bottom line: the NVRAM market is heating up. And that’s a very good thing for the IT industry.
    • Crazycontini: We need a lot more help to clean up the world’s crypto mess.
    • Pramati Muthalaxe: Irrespective of what Facebook says, all of them have one objective — to get more money out of potential advertisers. That requires a constant decay of your reach.
    • danluu: It looks like, for a particular cache size, the randomized algorithms do better when miss rates are relatively high and worse when miss rates are relatively low,
    • There's just too much. To see all Quotable Quotes please click through to the full article.

  • Is Kubernetes the next OpenStack? The Cloudcast #296. No. The core architecture team for Kubernetes ensures there's a consistency across the project...

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Sponsored Post: Etleap, Pier 1, Aerospike, Loupe, Clubhouse, Stream, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Tue, 2017-04-25 16:00

Who's Hiring? 
  • Pier 1 Imports is looking for an amazing Sr. Website Engineer to join our growing team!  Our customer continues to evolve the way she prefers to shop, speak to, and engage with us at Pier 1 Imports.  Driving us to innovate more ways to surprise and delight her expectations as a Premier Home and Decor retailer.  We are looking for a candidate to be another key member of a driven agile team. This person will inform and apply modern technical expertise to website site performance, development and design techniques for Pier.com. To apply please email cmwelsh@pier1.com. More details are available here.

  • Etleap is looking for Senior Data Engineers to build the next-generation ETL solution. Data analytics teams need solid infrastructure and great ETL tools to be successful. It shouldn't take a CS degree to use big data effectively, and abstracting away the difficult parts is our mission. We use Java extensively, and distributed systems experience is a big plus! See full job description and apply here.

  • Advertise your job here! 
Fun and Informative Events
  • DBTA Roundtable OnDemand Webinar: Leveraging Big Data with Hadoop, NoSQL and RDBMS. Watch this recent roundtable discussion hosted by DBTA to learn about key differences between Hadoop, NoSQL and RDBMS. Topics include primary use cases, selection criteria, when a hybrid approach will best fit your needs and best practices for managing, securing and integrating data across platforms. Brian Bulkowski, CTO and Co-founder of Aerospike, presented along with speakers from Cask Data and Splice Machine. View now.

  • Advertise your event here!
Cool Products and Services
  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • Etleap provides a SaaS ETL tool that makes it easy to create and operate a Redshift data warehouse at a small fraction of the typical time and cost. It combines the ability to do deep transformations on large data sets with self-service usability, and no coding is required. Sign up for a 30-day free trial.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Working on a software product? Clubhouse is a project management tool that helps software teams plan, build, and deploy their products with ease. Try it free today or learn why thousands of teams use Clubhouse as a Trello alternative or JIRA alternative.

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Categories: High Scalability

Stuff The Internet Says On Scalability For April 21st, 2017

High Scalability - Fri, 2017-04-21 16:06

Hey, it's HighScalability time:

 

Which do you see: Machines freeing people? Lost jobs? Slavery? Hyperactive Skittles?
If you like this sort of Stuff then please support me on Patreon.
  • year 1899: “Nobody has to use the Internet”; 12MPH: Speed news of Lincoln's assassination traveled the US; $200 million: Lyft tips; 500: data structures and algorithms interview questions; %0.00244140625: Odds of 13 straight male Dr. Who regens; 100: gigafactories could power the world; 100K: bots on Messenger; 1 million: containers Netflix launched in one week; 5.2 trillion: 2014 US revenue; 52,129: iterations to converge on NFL schedule; 36 Gbps: Facebook's network in the sky; 

  • Quotable Quotes:
    • @mipsytipsy: "That doesn't sound hard. I could build that in a weekend."
    • @Noahpinion: The Elon Musk Future is the good future. The Peter Thiel Future is the bad future. But honestly you'll probably get the Jeff Bezos Future.
    • @BenedictEvans: In 2007 Google, Apple, Facebook & Amazon had maybe 50k staff between them. Today it's more like 400k.
    • @AWSonAir: @Expedia inserting 70,000 rows per second of hotel data with Amazon Aurora.
    • @swardley: STOP! If you're thinking of moving to cloud today (as in IaaS), you are so late that you need to consider moving to serverless ->
    • David Rosenthal: Silicon Valley would not exist but for Ph.D.s leaving research to create products in industry.
    • @cmeik: Distributed applications today treat the database like shared memory, and that's why we love things like Spanner.  This is a flawed design.
    • @Jason: Apple's cash hoard swells to five Teslas / four Ubers / 25 Twitters
Categories: High Scalability

Stuff The Internet Says On Scalability For April 14th, 2017

High Scalability - Fri, 2017-04-14 16:03

Hey, it's HighScalability time:

 

After 20 years, Cassini will not go gently into that good night, it will burn and rave at close of day. (nasa)
If you like this sort of Stuff then please support me on Patreon.
  • 10^15: synapses activated per second in human brain (2/3rds fail); $4.5B: Amazon spend on video (Netflix $6 billion); 22,000: AWS database migrations served; ~15%: Dropbox reduced CPU usage using Brotli; $3.5 trillion: IT spending in 2017; 10%: reduction in QoQ hard drive shipments; 33.3%: Nginx share of webserver market; 37.2 trillion: human cells in a Cell Atlas; 6.2 miles: journey to the center of the earth; 200: lines of code for blockchain; 95%: Wikipedia pages end up at philosophy; 1.2 billion: Messenger monthly users; 

  • Quotable Quotes:
    • Jeff Bezos: Day 2 is stasis. Followed by irrelevance. Followed by excruciating, painful decline. Followed by death. And that is why it is always Day 1.
    • Bob Schmidt: If debugging is the process of removing errors from a design, then designing must be the process of putting errors into a design!
    • @swardley: the gap between where the cutting edge is and where the majority are just seems to increase year on year.
    • Riot Games: We need to provide resources when it's time to grow, we need to react when it gets sick, and we need to do it all as fast as possible at a global scale.
    • masklinn: High-performance native code already does these specialisation, generally on a per-project basis (some projects include multiple allocators for different bits of data), and possibly using a non-OS allocator in the first place
    • @erikbryn: MT: @DKThomp : there are 950k warehouse workers —6X the number of steel workers and miners combined
    • Joeri: The challenge of a rewrite is not in mapping the core architecture and core use case, it's mapping all the edge cases and covering all the end user needs. You need people intimately familiar with the old system to make sure the new system does all the weird stuff which nobody understood but had good reasons that was in the corners of the old system's code. 
    • @redblobgames: 2016 GDC Diablo talk: let's switch from turn-based to real-time 2017 GDC Civilization talk: let's switch from real-time to turn-based
    • @random_walker: Encrypted traffic has a fingerprint—enough to distinguish among 200 Netflix vids with 99.5% accuracy in < 2.5 mins.
    • Sophie Wilson: You’re going to buy a 10-way, 18-way multi-core processor that’s the latest, all because we told you you could buy it and made it available, and we’re going to turn some of those processors off most of the time. So you’re going to pay for logic and we’re going to turn it off so you can’t use it.
    • qq66: But is there anything more personal than a computer programmer writing a bot to send messages for him?
    • Anu Hariharan: Unlike other social products, WeChat does not only measure growth by number of users or messages sent. Instead they also focus on measuring how deeply is the product engaged in every aspect of daily life (e.g., the number of tasks WeChat can help with in a day).
    • @fredwilson: "The real issue here is Facebook’s market power. And we face similar market power issues in search (Google) and commerce (Amazon)"
    • There are so many quotable quotes I couldn't include them all here. Click through to read the full article.

  • Luna Duclos on Game Development and Rebuilding Microservices. Switching from PHP/Python to Go. Go is much faster and uses less CPU. As big as the switch to Go is the switch from Google App Engine to VMs. GAE servers are small and CPU constrained despite the relatively high cost. Their Go cluster runs in the Google Cloud on Google Container Engine.

  • Werner Against the Machine. Wait, aren't you the machine now?

  • Kwabena Boahen on Stanford Seminar: Neuromorphic Chips: Addressing the Nanotransistor Challenge. A dollar bought more and more transistors until 2014, when for the first time the price for transistors went up. Fundamental constraints at the physical level are the cause. The challenge is to continually shrink the footprint of the transistor so it occupies less space. A traffic metaphor is used to explain the difficulty of continually shrinking transistors. Shrinking gives you fewer lanes, and electrons can block a lane by being trapped in a pothole. When you get down to one lane and an electron is trapped, the current flows slowly. Our brains work with ultimately scaled devices...

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Sponsored Post: Pier 1, Aerospike, Clubhouse, Stream, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Tue, 2017-04-11 16:05

Categories: High Scalability

Five things we’ve learned about monitoring containers and their orchestrators

High Scalability - Mon, 2017-04-10 15:56

This is a guest post by Apurva Davé, who is part of the product team at Sysdig.

Having worked with hundreds of customers on building a monitoring stack for their containerized environments, we’ve learned a thing or two about what works and what doesn’t. The outcomes might surprise you - including the observation that instrumentation is just as important as the application when it comes to monitoring.

In this post, I wanted to cover some details around what it takes to build a scale-out, highly reliable monitoring system to work across tens of thousands of containers. I’ll share a bit about what our infrastructure looks like, the design choices we made, and tradeoffs. The five areas I’ll cover:

  • Instrumenting the system

  • Relating your data to your applications, hosts, and containers

  • Leveraging orchestrators

  • Deciding what data to store

  • How to enable troubleshooting in containerized environments

For context, Sysdig is the container monitoring company. We’re based on the open source Linux troubleshooting project by the same name. The open source project allows you to see every single system call down to process, arguments, payload, and connection on a single host. The commercial offering turns all this data into thousands of metrics for every container and host, aggregates it all, and gives you dashboarding, alerting, and an htop-like exploration environment.

Ok, let’s get into the details, starting with the impact containers have had on monitoring systems.

Why do containers change the rules of the monitoring game?
Categories: High Scalability

Stuff The Internet Says On Scalability For April 7th, 2017

High Scalability - Fri, 2017-04-07 15:56

Hey, it's HighScalability time:

 

Visualization of the magic system behind software infrastructure. (eyezmaze, @ThePracticalDev)
If you like this sort of Stuff then please support me on Patreon.
  • 10-20: amino acids can be made per second; 64800x: faster DDL Aurora vs MySQL; 25 TFLOPS: cap for F1 simulations; 15x to 30x: Tensor Processing Unit faster than GPUs and CPUs; 100 Million: Intel transistors per square millimeter; 25%: Internet traffic generated by Google; $1 million: Tim Berners-Lee wins Turing Award; 43%: phones FBI couldn't open because of crypto;

  • Quotable Quotes:
    • @adulau: To summarize the discussions of yesterday. All tor exit nodes are evil except the ones I operate.
    • @sinavaziri: Let's say a data center costs $1-2B. Then the TPU saved Google $15-30B of capex?
    • Vinton G. Cerf: While it would be a vast overstatement to ascribe all this innovation to genetic disposition, it seems to me inarguable that much of our profession was born in the fecund minds of emigrants coming to America and to the West over the past century.
    • Alan Bundy: AI systems are not just narrowly focused by design, because we have yet to accomplish artificial general intelligence, a goal that still looks distant. 
    • JamesBarney: Soo much this, just worked on a project that sacrificed reliability, maintainability, and scalability to use a real time database to deal with loads that were on the order of 70 values or 7 writes a second.
    • bobdole1234: 3.5x faster than CPU doesn't sound special, but when you're building inference capacity by the megawatt, you get a lot more of that 3.5x faster TPU inside that hard power constraint.
    • Eugenio Culurciello: As we have been predicting for 10 years, in SoC you can achieve > 10x more performance that current GPUs and > 100x more performance per watt.
    • Google: The TPU’s deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. 
    • visarga: TPU excited me too at first, but when I realized that it is not related to training new networks (research) and is useful only for large scale deployment, I toned down my enthusiasm a little. 
    • Julian Friedman: Kube is being designed by system administrators who like distributed systems, not for programmers who want to focus on their apps.
    • shadowmint: Given what I've seen, I'd argue that clojure has an inherent complexity that results in poor code quality outcomes during the software maintenance cycle.
    • weberc2: I like Go, but it's not dramatically faster than Java. Any contest between the two of them will probably just be a back and forth of optimizations. They share pretty much the same upper bound.
    • adrianratnapala: All this means is that we should stop thinking of this stuff as RAM. Only the L1 cache is really RAM. Everything else is just a kind of fast, volatile, solid state disk that just happens to share an address space with the RAM.
    • pbreit: Getting a million users is infinitely harder than scaling a system to handle a million users. Most systems could run comfortably on a Raspberry Pi.
    • @sustrik: If you want your protocol to be fully reliable in the face of either peer shutting down, the terminal handshake has to be asymmetric. As we've seen above, TCP protocol has symmetric termination algorithm and thus can't, by itself, guarantee full reliability.
    • @damonedwards: Unit tests are critical for good dev, but aren't really ops concern. Integration tests are critical for good ops. Ops wants more int tests.
    • mannigfaltig: the brain appears to spend about 4.7 bits per synapse (26 discernible states, given the noisy computation environment of the brain); so it seems to be plenty enough for general intelligence. This could, of course, merely be a biological limit and on silicon more fine-grained weights might be the optimum.
    • marwanad: The main power of GraphQL is for client developers and lies in the decoupling it provides between the client and server and the ability to fulfill the client needs in a single round trip. This is great for mobile devices with slower networks.
    • kyleschiller: As a pretty good rule of thumb, a system that fails 1/nth of the time and has n opportunities to fail has ~.63 probability of failure, where n is more than ~10.
    • jjirsa: databases aren't where you want to have hipster tech. You want boring things that work. For me, Cassandra is the boring thing that works. 
    • @etherealmind: "rule #1 of Enterprise IT: easier to spend 10 million on equipment than 100k for a person. A third person would increase capacity by 30%"
    • @SwiftOnSecurity: “Just pick a good VPN” is like telling thirsty people to “go to a store and drink clear liquid.” They drank bleach, but at least you helped.
    • falsedan: There's 2 secrets to scaling to millions of users: 1. You aren't going to have millions of users so any work you do to support it is stopping you from delivering features that will make your existing 10 clients happier. 2. Write code that can be replaced (i.e. design for change). 
    • X86BSD: Have you tested running it on a FreeBSD box with ZFS? It has lz4 compression by default and makes such a great storage solution for PG. You get compression, snapshots, replication (not quite realtime but close), self healing, etc etc in a battled hardened and easy to manage filesystem and storage manager. I've found you can't beat ZFS and PG for most applications. Edge cases exist of course everywhere.

  • Worried about too much infrastructure? Only 2% of DNA codes for proteins, the other 98% codes for RNA. Harry Noller Lecture. Maybe lots of infrastructure is not a bad thing. One of the key differences between programming and biology is how in biology form completely determines function. Just amazing to watch in action: mRNA Translation (Advanced). Programming is the complete opposite.
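
The rule of thumb kyleschiller gives in the Quotable Quotes above can be checked numerically. This is a minimal sketch, assuming each of the n opportunities fails independently with probability 1/n (the quote does not state independence, so that part is an assumption); the function name is made up for illustration.

```python
import math


def failure_probability(n: int) -> float:
    """Chance of at least one failure across n independent opportunities,
    each of which fails with probability 1/n (independence is assumed)."""
    return 1.0 - (1.0 - 1.0 / n) ** n


# As n grows, (1 - 1/n)**n approaches 1/e, so the result approaches 1 - 1/e.
limit = 1.0 - 1.0 / math.e

for n in (10, 100, 1000):
    print(n, round(failure_probability(n), 3), "limit:", round(limit, 3))
```

Under that assumption the value settles near 0.63 once n is past roughly 10, which matches the quoted figure.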

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

Stuff The Internet Says On Scalability For March 31st, 2017

High Scalability - Fri, 2017-03-31 15:56

Hey, it's HighScalability time:

 

What lies beneath? Networks...of blood vessels. (Wellcome Image Awards)
If you like this sort of Stuff then please support me on Patreon.
  • 5000: node (150,000 pod) clusters in Kubernetes 1.6; 15 years: time to @spacex launch with a recycled rocket booster; 174 Mbps: Internet speed in Dublin; 10 nm: Intel’s new Moore approved process; 30 minutes: to create Samsung's S8; 50 billion: of your cells replaced each day; 2 million: new red blood cells per second; 3dbm: attenuation of human body, same as a wall; 12: hours of tardis sounds; 350: pages to stop a bullet; 2: meters of DNA packed in a space .000006m wide;

  • Quotable Quotes:
    • @swardley: Having met many "leaders" in technology & business, I wouldn't bet on the future survival of humanity. If anything AI might help the odds
    • Francis Pouliot: Any contentious hard fork of the Bitcoin blockchain shall be considered an alternative cryptocurrency (altcoin), regardless of the relative hashing power on the forked chain.
    • @coda: WhatsApp: 900M users, built w/ < 35 devs, using #erlang Krispy Kreme: 1004 locations, 3700 employees, original glazed is 190 #calories
    • @BenedictEvans: Still think it's interesting Instagram shifted emphasis from interests to friends. Is that a law of nature for social if you want scale?
    • @johnrobb: "each robot per thousand workers decreased employment by 6.2 workers and wages by 0.7 percent"
    • Alex Woodie: The Hadoop dream of unifying data and compute in a distributed manner has all but failed in a smoking heap of cost and complexity, according to technology experts and executives who spoke to Datanami.
    • @RichRogersIoT: "First you learn the value of abstraction, then you learn the cost of abstraction, then you are ready to engineer." - @KentBeck
    • @codemanship: Don't explain code quality to execs. Explain high cost of change. Explain slowing down of innovation. Explain longer cycle times.
    • @malwareunicorn: Bad malware pickup lines: Hey girl, I heard you like sandboxes. I would never try to escape yours ;)
    • dkhenry: The selling of data isn't the policy you need to fight. The monopoly power of ISP's is the problem you must push back on. 
    • @MaxWendkos: An SEO expert walks into a bar, bars, pub, tavern, public house, Irish pub, drinks, beer, alcohol
    • Barry Lampert: the point of Amazon isn't to offer a consumer the absolute lowest price possible; it's to offer the lowest price possible given the convenience that Amazon offers
    • Daniel Lemire: Let us make the statement precise: Most performance or memory optimizations are useless.
    • @sarahmei: People run into trouble with DRY because it doesn't tell you *what* not to repeat. People assume syntax, but it's actually concepts.
    • Dan Rayburn: China suffers from 9.2% transfer failure rate (similar to Malaysia, India and Brazil), and a high packet loss.  These two parameters have severe impact on content download time and overall performance.
    • Daniel Lemire: I submit to you that it is no accident if the StackOverflow list of top-paying programming languages is made of obscure languages. They are comparing the average of a niche against the average of a large population
    • For even more Quotable Quotes please click through to the main article.

  • For good WiFi you don't necessarily need one big powerful router bristling with antennas like a radiation-mutated ant. 802.eleventy what? A deep dive into why Wi-Fi kind of sucks and New Screen Savers (@20 min). You want a true mesh network (Plume). WiFi should whisper: use 5 GHz to create pools of WiFi in each room so signals don't penetrate between rooms. Lots of little access points can automatically find a path through your house. Use a wired backhaul for best performance. Raw throughput isn't the best measure. How does it perform with many people using many devices? Roaming isn't always well supported. Consider how well the system hands off devices as you walk through the house.

  • BloomCON 2017 Videos are now available. You might like Honey, I Stole Your C2 [Command-and-control] Server: A dive into attacker infrastructure.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: High Scalability

How to speed up your MySQL with replication to in-memory database

High Scalability - Wed, 2017-03-29 15:56

Original article available at https://habrahabr.ru/company/mailru/blog/323870/

I’d like to share with you an article based on my talk at Tarantool Meetup (the video is in Russian, though). It’s a short story of why Mamba, one of the biggest dating websites in the world and the largest one in Russia, started using Tarantool. Why did we decide to busy ourselves with MySQL-to-Tarantool replication?

First, we had to migrate to MySQL 5.7 at some point, but this version didn’t have HandlerSocket that was being actively used on our MySQL 5.6 servers. We even contacted the Percona team — and they confirmed MySQL 5.6 is the last version to have HandlerSocket.

Second, we gave Tarantool a try and were pleased with its performance. We compared it against Memcached as a key-value store and saw response time drop from 0.6 ms to 0.3 ms on the same hardware. In relative terms, Tarantool’s twice as fast as Memcached. In absolute terms, a 0.3 ms gain is not that cool, but still impressive.

Third, we wanted to keep the whole existing architecture. There’s a MySQL master server and its slaves — we didn’t want to change anything in this structure. Can MySQL 5.6 slaves with HandlerSocket be replaced with something else without having to make significant architectural changes?

We learned that the Mail.Ru Group team has a replicator they created for their own purposes. The idea of replicating data from MySQL to Tarantool belongs to them. We asked the team to share the source code, which they did. We had to rewrite the code, though, since it worked with MySQL 5.1 and Tarantool 1.5, not 1.7. The replicator uses libslave, an open-source solution for reading events from a MySQL master server, and is built statically without any of MySQL’s system libraries. It’s been open-sourced under the BSD license, so anyone can use it for free.

Replication constraints
Categories: High Scalability

Sponsored Post: ButterCMS, Aerospike, Loupe, Clubhouse, Stream, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

High Scalability - Wed, 2017-03-29 15:56

Who's Hiring? 
  • Etleap is looking for Senior Data Engineers to build the next-generation ETL solution. Data analytics teams need solid infrastructure and great ETL tools to be successful. It shouldn't take a CS degree to use big data effectively, and abstracting away the difficult parts is our mission. We use Java extensively, and distributed systems experience is a big plus! See full job description and apply here.

  • Advertise your job here! 
Fun and Informative Events
  • Analyst Webinar: Forrester Study on Hybrid Memory NoSQL Architecture for Mission-Critical, Real-Time Systems of Engagement. Thursday, March 30, 2017 | 11 AM PT / 2 PM ET. In today’s digital economy, enterprises struggle to cost-effectively deploy customer-facing, edge-based applications with predictable performance, high uptime and reliability. A new, hybrid memory architecture (HMA) has emerged to address this challenge, providing real-time transactional analytics for applications that require speed, scale and a low total cost of ownership (TCO). Forrester recently surveyed IT decision makers to learn about the challenges they face in managing Systems of Engagement (SoE) with traditional database architectures and their adoption of an HMA. Join us as our guest speaker, Forrester Principal Analyst Noel Yuhanna, and Aerospike’s VP Marketing, Cuneyt Buyukbezci, discuss the survey results and implications for your business. Learn and register

  • Advertise your event here!
Cool Products and Services
  • Etleap provides a SaaS ETL tool that makes it easy to create and operate a Redshift data warehouse at a small fraction of the typical time and cost. It combines the ability to do deep transformations on large data sets with self-service usability, and no coding is required. Sign up for a 30-day free trial.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • ButterCMS is an API-based CMS that seamlessly drops into your app or website. Great for blogs, dynamic pages, knowledge bases, and more. Butter works with any language/framework including Ruby, Rails, Node.js, .NET, Python, Django, Flask, React, Angular, Go, PHP, Laravel, Elixir, Phoenix, and Meteor.

  • Working on a software product? Clubhouse is a project management tool that helps software teams plan, build, and deploy their products with ease. Try it free today or learn why thousands of teams use Clubhouse as a Trello alternative or JIRA alternative.

  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5-minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips.  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high-value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost-effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30-day trial here: http://www.memsql.com/

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Categories: High Scalability

Faster Networks + Cheaper Messages => Microservices => Functions => Edge

High Scalability - Mon, 2017-03-27 16:25

When Adrian Cockcroft, the guy who helped put the loud in Cloud through his energetic evangelism of Cloud Native and Microservice architectures, talks about what’s next, it pays to listen. And you can listen: here’s a fascinating forward-looking talk he gave at microXchg 2017: Shrinking Microservices to Functions. It’s typically Cockcroftian: understated, thoughtful, and full of insight drawn from experience.

Adrian makes a compelling case that the same technology drivers, faster networking and cheaper messaging, that drove the move to Microservices are now driving the move to Functions.

The payoffs are all those you’ve no doubt heard about Serverless for some time, but Adrian develops them in an interesting way. He traces how architectures have evolved over time. Take a look at my gloss of his talk for more details.

What’s next after Functions? Adrian talks about pushing Lambda functions to the edge. It’s a topic I’m excited about and have been interested in for some time, though I didn’t quite see it playing out like this.

Datacenters disappear. Functions are not running in an AWS region anymore, code is placed near the customer using a CDN at CDN endpoints. Now you have a fully distributed, at the edge, low latency, milliseconds from the customer way of running code. Now you can build architectures that are partly in the datacenter, partly at the edge, and partly at the customer premises. And since this is AWS, it’s all, of course, built around Lambda. AWS Greengrass and Snowball Edge are peeks into what the future might look like.

There’s a hidden tension here. Once you put code at the edge you violate two of Lambda’s key assumptions: functions are composed using scalable backend services; low latency messaging. The edge will have a high latency path back to services in the datacenter, so how do you make a function based distributed application at the edge? Does edge computing argue for a more retro architecture with fewer messages back to a more monolithic core?
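
The tension above can be made concrete with a back-of-the-envelope latency model. This is only a sketch: the function name and every number in it are hypothetical, chosen to illustrate the shape of the problem rather than taken from the talk or from any AWS measurement. It assumes a function makes its backend calls one after another.

```python
def request_latency_ms(client_hop_ms: float, backend_calls: int,
                       backend_round_trip_ms: float) -> float:
    """Total latency for one request: the hop from the customer to the
    function, plus sequential round trips from the function to backend
    services. All inputs are illustrative assumptions."""
    return client_hop_ms + backend_calls * backend_round_trip_ms


# Function in a region: far from the customer, close to its services.
in_region = request_latency_ms(client_hop_ms=80.0, backend_calls=5,
                               backend_round_trip_ms=1.0)

# Same chatty function at the edge: close to the customer, far from services.
at_edge = request_latency_ms(client_hop_ms=5.0, backend_calls=5,
                             backend_round_trip_ms=80.0)

# Edge function that makes a single coarse-grained call to a monolithic core.
at_edge_batched = request_latency_ms(client_hop_ms=5.0, backend_calls=1,
                                     backend_round_trip_ms=80.0)

print(in_region, at_edge, at_edge_batched)
```

With these made-up numbers the chatty edge function is several times slower than the in-region one, and only recovers when it makes fewer, larger calls back to the datacenter, which is the "more retro architecture" question the paragraph raises.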

Or does edge computing require something completely different? Here’s one thought as to what that something completely different might look like: Datanet: A New CRDT Database That Let's You Do Bad Bad Things To Distributed Data.
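
For readers who haven't met CRDTs, here is a minimal sketch of one of the simplest, a grow-only counter. It is a generic textbook illustration and not Datanet's API; the class and replica names are invented for the example. The point is that replicas can accept writes independently and still converge when they eventually exchange state, which is why the idea is attractive over a high latency edge-to-datacenter path.

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merging takes the per-replica maximum."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> count observed for that replica

    def increment(self, amount: int = 1) -> None:
        # A replica only ever modifies its own entry.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Max is commutative, associative and idempotent, so merges can
        # arrive in any order, or more than once, without changing the result.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two hypothetical replicas accept writes without talking to each other.
edge = GCounter("edge")
core = GCounter("core")
edge.increment(3)
core.increment(2)

# Later they exchange state, in either direction, and agree on the total.
edge.merge(core)
core.merge(edge)
print(edge.value(), core.value())
```

Neither replica had to wait on the other to accept a write; agreement comes from the merge rule rather than from a round trip.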

Now, let’s see the future by first taking a tour of the past….

From Monoliths, to Microservices, to Functions
Categories: High Scalability