tag:blogger.com,1999:blog-89775746026340676962024-02-07T13:44:02.228-08:00Anton KharenkoAntonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comBlogger11125tag:blogger.com,1999:blog-8977574602634067696.post-1652360621578147022015-09-29T10:57:00.000-07:002015-10-12T11:38:53.384-07:00SWIM Distributed Group Membership Protocol<div dir="ltr" style="text-align: left;" trbidi="on">
<h2 style="text-align: left;">
Overview</h2>
<div>
<br /></div>
Many distributed applications require reliable and scalable implementation of membership protocol. Membership protocol provides each process (“member”) of the group with a locally-maintained list of other non-faulty processes in the group. The protocol ensures that the membership list is updated with changes resulting from new members joining the group, or dropping out either voluntarily or through a failure. The SWIM (Scalable Weakly-consistent Infection-style Process Group Membership Protocol) is motivated by unscalability of traditional heart-beating protocols, which either impose network loads that grow quadratically with group size, or compromise response times or false positive frequency of failure detection. It is aiming to implement a membership protocol that provides stable failure detection time, stable rate of false positives and low message load per group member, thus allowing distributed applications that use it to scale well. <br />
<br />
Applications which rely on reliability and scalability of the distributed membership maintenance protocol: <br />
<ul style="text-align: left;">
<li>Reliable multicast</li>
<li>Epidemic-style information dissemination (gossip)</li>
<li>Distributed databases</li>
<li>Publish-subscribe systems</li>
<li>Large scale peer-to-peer systems</li>
<li>Large scale cooperative gaming</li>
<li>Other collaborative distributed applications</li>
</ul>
<br />
Requirements to protocol: <br />
<ul style="text-align: left;">
<li>Low probability of false positive detection of process fail</li>
<li>Do not rely on central server</li>
<li>Low network and computational load</li>
<li>Low time to detect failure</li>
<li>Weakly consistent</li>
</ul>
Heartbeating-based protocol do not fit well under such requirements since it suffers from scalability limitations. The unscalability of the popular class of all-to-all heartbeating protocols arises from the implicit decision to join two principal functions of the membership problem: <br />
<ul style="text-align: left;">
<li>Membership update dissemination</li>
<li>Failure detection</li>
</ul>
<br />
SWIM address non-scalability problems of traditional protocols for membership maintenance by: <br />
<ul style="text-align: left;">
<li>Designing the failure detection and membership update dissemination components separately</li>
<li>Using a non-heartbeat based strategy for failure detection. It uses random-probing based failure detector protocol.</li>
</ul>
<br />
Properties of SWIM protocol: <br />
<ol style="text-align: left;">
<li>Imposes a constant message load per group member;</li>
<li>Detects a process failure in an expected constant time at some non-faulty process in the group;</li>
<li>Provides a deterministic bound (as a function of group size) on the local time that a non-faulty process takes to detect failure of another process;</li>
<li>Propagates membership updates, including information about failures, in gossip-style; the dissemination latency in the group grows slowly (logarithmically) with the number of members;</li>
<li>Provides a mechanism to reduce the rate of false positives by “suspecting” a process before “declaring” it as failed within the group.</li>
</ol>
While 1-2 are properties of the used failure detection protocol, 3-5 represents properties introduced in membership protocol.<br />
<br />
<h2 style="text-align: left;">
SWIM Protocol Design</h2>
<br />
SWIM protocol consists of two main components: <br />
<ul style="text-align: left;">
<li>Failure Detector Component</li>
<li>Dissemination Component</li>
</ul>
<br />
<b>Failure Detector Component</b><br />
<br />
Quality of service <a href="http://www.antonkharenko.com/2015/07/quality-of-service-of-failure-detectors.html">properties of failure detection</a>: <br />
<ul style="text-align: left;">
<li>Strong completeness</li>
<li>Speed of failure detection</li>
<li>Accuracy</li>
<li>Network message load</li>
</ul>
It is proved that strong completeness and strong accuracy is impossible to achieve at the same time over asynchronous unreliable network. However, since a typical distributed application relies on Strong Completeness (in order to maintain up to date information in dynamic groups), most failure detectors, including heartbeating-based solutions, guarantee this property while attempting to maintain a low rate of false positives. SWIM takes the same approach. <br />
<br />
SWIM uses Failure detection protocol described at [<a href="http://www.antonkharenko.com/2015/08/scalable-and-efficient-distributed.html">2</a>] (see Fig. 1). <br />
<br />
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQKs3ekX6xyDaPnqUgeBY4zv36KYc7Uxs8FbdPNeLGfyc5Yx0NgtHg3LgIYlS0GFBe2J_5nVllxe0KXMOZOItKFAxJu2JxOu6WV8-r-mMAWmTxA4W_my1AvA_EGtHQRtuuRU_g1aVlLI4/s1600/Screen+Shot+2015-09-29+at+13.35.36.png"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQKs3ekX6xyDaPnqUgeBY4zv36KYc7Uxs8FbdPNeLGfyc5Yx0NgtHg3LgIYlS0GFBe2J_5nVllxe0KXMOZOItKFAxJu2JxOu6WV8-r-mMAWmTxA4W_my1AvA_EGtHQRtuuRU_g1aVlLI4/s320/Screen+Shot+2015-09-29+at+13.35.36.png" /></a></div>
<br />
<div style="text-align: center;">
Fig. 1 SWIM failure detection: Example protocol period at Mi. This shows all the possible messages that a protocol period may initiate. </div>
<br />
<b>Dissemination Component</b><br />
<br />
Basic implementation of Membership Dissemination component consist of: <br />
<ul style="text-align: left;">
<li>On detecting failed member by failure detector process multicast it to the group. Other members delete failed member.</li>
<li>Joined or voluntarily leaving member multicast it in the same way. But on join need to know at least one contact member of the group. </li>
</ul>
In order for a new member to join the group it can use one of the following approaches: <br />
<ul style="text-align: left;">
<li>If the group is associated with a well known server or IP multicast address, all joins could be directed to the associated address.</li>
<li>In the absence of such infrastructure, join messages could be broadcasted. Group members will reply with some probability to this broadcast request.</li>
</ul>
<br />
The basic implementation described above shows the general idea of the protocol, but in order to make protocol robust and fault tolerant following improvements may be applied: <br />
<ul style="text-align: left;">
<li>Do not use multicast to disseminate membership updates, but spread membership information by piggybacking it on top of failure detection protocol (ping, ping-req and ack messages). This approach results in an infection-style of membership updates dissemination, with the associated benefits of robustness to packet losses, and of low latency.</li>
<li>Use suspicion mechanism. Process that unresponsive to ping marked as “suspected”. Suspected member treated as normal for ping selection, but if respond then spread alive information. Only after predefined timeout suspected member declared as faulty. This timeout effectively trades off an increase in failure detection time for a reduction in frequency of false failure detections. Suspicion mechanism is applicable to any system with the distinct Failure Detector and Membership Dissemination components.</li>
<li>Basic implementation of SWIM protocol guarantee only eventual detection of the failure. Round-Robin probe target selection will guarantee Time Bounded Completeness property; the time interval between the occurrence of a failure and its detection at member Mj is no more than two times the group size in number of protocol periods. This can be solved by the following modification to the protocol. The failure detection protocol at member Mi selecting ping targets not randomly from the list of members, but in a round robin fashion. A newly joining member is inserted in the membership list at a position that is chosen uniformly at random. On completing a traversal of the entire list, Mi rearranges the membership list to a random reordering.</li>
</ul>
<div>
<br /></div>
In order to implement gossip-style membership update dissemination each group member Mi maintains a buffer of recent membership updates with count of each buffered element how much it was piggybacked (sent) so far by Mi and is used to choose which element to piggyback next. Each membership update is piggybacked at most some a*log(n) [e.g. 3 * |log(n + 1)|] in order to ensure reliable arrival of gossip to each member of the group. <br />
<br />
SWIM can be extended to a Wide-area network (WAN) or a virtual private network (VPN), by weighing ping target choices with topological information, thus reducing bandwidth usage on core network elements inside the network. <br />
<br />
<h2 style="text-align: left;">
SWIM Implementations</h2>
<b><br /></b>
<b>Consul</b><br />
<br />
<a href="https://www.consul.io/">Consul</a> is a distributed, highly available, and datacenter-aware service discovery and configuration management system. It <a href="https://www.consul.io/docs/internals/gossip.html">uses SWIM</a> as a basis for implementing Gossip Protocol and for managing membership information. <br />
<br />
<b>ScaleCube</b><br />
<br />
<a href="https://github.com/scalecube/scalecube">ScaleCube</a> (<b>disclaimer:</b> I am one of the authors) is an open source microservices framework for a rapid development of a distributed, resilient, reactive applications that scales. It uses a variant of SWIM protocol implementation in Java and uses it as a basis for managing the cluster of connected microservices. It uses suspicion mechanism over the failure detector events and also separate membership updates dissemination component. But it introduces separate Gossip Protocol component instead of piggybacking membership updates on top of failure detector messages. It is done in order to reuse gossip component for other platform events and have more fine grained control over time intervals used for gossiping and failure detection pinging. New members to the cluster joins via the configuration provided seed members addresses (it is an analogous of well known servers described above). And also it extends SWIM protocol with the introduction of periodic SYNC messages in order to improve recovery from network partitioning and message losses.<br />
<br />
<b>Reference:</b><br />
<ol style="text-align: left;">
<li><a href="http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf">SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol (2002)</a></li>
<li><a href="http://www.antonkharenko.com/2015/08/scalable-and-efficient-distributed.html">Scalable and Efficient Distributed Failure Detectors</a></li>
<li><a href="http://www.antonkharenko.com/2015/07/quality-of-service-of-failure-detectors.html">Quality of Service of Failure Detectors</a></li>
<li><a href="https://www.consul.io/">Consul</a></li>
<li><a href="https://github.com/scalecube/scalecube">ScaleCube</a></li>
</ol>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-7384904540012273872015-09-10T12:40:00.000-07:002015-09-10T12:44:16.352-07:00Time, Clocks and Ordering in a Distributed System<div dir="ltr" style="text-align: left;" trbidi="on">
A distributed system consist of a collection of distinct processes which are spatially separated, and which communicate via messaging. Here considered the concept of events ordering in a distributed systems. First, discussed the partial ordering defined by "happened before" relation and give a distributed algorithm for extending it to a consistent total ordering of all events.<br />
<div>
<br />
<h2 style="text-align: left;">
Partial Ordering</h2>
<br />
The relation <b>"happened before"</b> (->) on the set of events of a system is the smallest relation satisfying the following three conditions: <br />
<ul style="text-align: left;">
<li>If a and b are events in the same process, and a comes before b, then a -> b.</li>
<li>If a is the sending of a message by one process and b is the receipt of the same message by another process, then a -> b.</li>
<li>If a -> b and b -> c then a -> c.</li>
</ul>
Two distinct events a and b are said to be <b>concurrent</b> if a !-> b and b !-> a. <br />
<br />
The "happens before" relation describes <b>partial ordering</b> of the events in the system. <br />
<br />
Logical Clock can be represented by the following condition (it doesn't specifically means physical clock):</div>
<div>
<br />
<div>
<b>Clock Condition.</b> For any events a, b: if a -> b then C(a) < C(b). </div>
<div>
<br /></div>
<div>
Note that converse condition doesn't hold so if C(a) < C(b) then we cannot expect that a -> b and two cases are possible either a -> b or a and b are concurrent events. Which means that by knowing just the clock values you cannot distinguish concurrent events. <br />
<br />
From the definition of "happens before", Clock Condition is satisfied if the following two conditions hold:<br />
<ul style="text-align: left;">
<li>C1. If a and b are events in process Pi, and a comes before b, then Ci(a) < Ci(b).</li>
<li>C2. If a is the sending of a message by process Pi and b is the receipt of that message by process Pj, then Ci(a) < Cj(b).</li>
</ul>
<br />
Implementation rules: <br />
<ul style="text-align: left;">
<li>IR1. Each process Pi increments Ci between any two successive events.</li>
<li>IR2. (a) If event a is the sending of a message m by process Pi, then the message m contains a timestamp Tm = Ci(a). (b) Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm.</li>
</ul>
<br />
Java implementation of Clock Condition as an illustration to this post can be found at <a href="https://github.com/antonkharenko/logical-clocks">Logical Clocks</a> repository. The base classes are <a href="https://github.com/antonkharenko/logical-clocks/blob/blog-references/src/main/java/com/antonkharenko/logicalclocks/LogicalTimestamp.java">LogicalTimestamp</a> which represent specific moment of time and <a href="https://github.com/antonkharenko/logical-clocks/blob/blog-references/src/main/java/com/antonkharenko/logicalclocks/LogicalClock.java">LogicalClock</a> class which provides thread-safe methods to store and update time according to implementation rules and conditions described above during events at each process of distributed system. </div>
<div>
<br />
<h2 style="text-align: left;">
Total Ordering</h2>
<br />
We can use a system of clocks that satisfying Clock Condition to define a <b>total ordering</b> (=>). <br />
If a is an event in process Pi and b is an event in process Pj, then a => b if and only if either: <br />
<ul style="text-align: left;">
<li>(i) Ci(a) < Cj(b) or </li>
<li>(ii) Ci(a) = Cj(b) and Pi < Pj. </li>
</ul>
<br />
We can use use the concept of total ordering to solve simple synchronization problem. We wish to find the algorithm for granting access to shared resource to the process which satisfy following conditions: <br />
<ul style="text-align: left;">
<li>(I) A process which has been granted resource must release it before it can be granted to another process;</li>
<li>(II) Different requests for the resource must be granted in the order in which they are made;</li>
<li>(III) If every process which is granted the resource eventually release it, then every request is eventually granted.</li>
</ul>
<br />
Algorithm which satisfy conditions I-III and uses concept of total ordering can be described with the following 5 rules: <br />
<ol style="text-align: left;">
<li>To request the resource, process Pi sends the request resource message Tm:Pi to every other process, and puts that message on its request queue, where Tm is the timestamp of the message. </li>
<li>When process Pj receives the request resource message Tm:Pi, it places it on its request queue and sends a (timestamped) acknowledgment message to Pi. </li>
<li>To release the resource, process Pi removes any Tm:Pi request resource message from its request queue and sends a (timestamped) Pi release resource message to every other process.</li>
<li>When process Pj receives a Pi release resource message, it removes any Tm:Pi request resource message from its request queue.</li>
<li>Process Pi is granted the resource when the following two conditions are satisfied: (i) There is a Tm:Pi request resource message in its request queue which is ordered before any other request in its queue by the total ordering relation (=>); (ii) Pi has received a message from every other process timestamped later than Tm.</li>
</ol>
It is a distributed algorithm. Each process independently follows these rules to perform an emergent function (in this case granting synchronized access to the shared resource) , and there is no central coordinating process or central storage. <br />
<br />
Such approach and principles can be generalized and allow to implement any desired form of multiprocess synchronization in a distributed system. But it depends from operating of all processes and failure of one process will halt other processes. The problem of failure not considered here. </div>
<div>
<br />
<h2 style="text-align: left;">
Anomalous Behavior</h2>
<br />
There can be an <b>anomalous behaviour</b> if some events which satisfy condition a -> b happens outside the system then system can assign resource access in the wrong order since from the point of view of the system a !-> b which can contradict to expected behaviour by the users. This can be prevented by the use of properly synchronized <b>physical clocks</b>. It can be formally shown how closely the clocks can be synchronized. <br />
<br />
<b>References:</b> <br />
<ol style="text-align: left;">
<li><a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf">Time, Clocks and the Ordering of Events in a Distributed System (1978)</a> </li>
<li><a href="https://github.com/antonkharenko/logical-clocks">Logical Clocks github repository</a></li>
</ol>
</div>
</div>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-18295363037250643762015-09-07T02:50:00.000-07:002015-09-07T02:50:24.198-07:00Monolithic vs. Microservices Architecture<div dir="ltr" style="text-align: left;" trbidi="on">
<h2 style="text-align: left;">
Monolithic Architecture</h2>
When developing a server-side application you can start it with a modular hexagonal or layered architecture which consists of different types of components: <br /><ul style="text-align: left;">
<li>Presentation - responsible for handling HTTP requests and responding with either HTML or JSON/XML (for web services APIs).</li>
<li>Business logic - the application’s business logic.</li>
<li>Database access - data access objects responsible for access the database.</li>
<li>Application integration - integration with other services (e.g. via messaging or REST API).</li>
</ul>
Despite having a logically modular architecture, the application is packaged and deployed as a monolith. <div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br clear="none" /></div>
<b>Benefits of Monolithic Architecture</b><br /><ul style="text-align: left;">
<li>Simple to develop.</li>
<li>Simple to test. For example you can implement end-to-end testing by simply launching the application and testing the UI with Selenium.</li>
<li>Simple to deploy. You just have to copy the packaged application to a server.</li>
<li>Simple to scale horizontally by running multiple copies behind a load balancer.</li>
</ul>
In the early stages of the project it works well and basically most of the big and successful applications which exist today were started as a monolith.<br /><br /><b>Drawbacks of Monolithic Architecture</b><br /><ul style="text-align: left;">
<li>This simple approach has a limitation in size and complexity. </li>
<li>Application is too large and complex to fully understand and made changes fast and correctly. </li>
<li>The size of the application can slow down the start-up time.</li>
<li>You must redeploy the entire application on each update.</li>
<li>Impact of a change is usually not very well understood which leads to do extensive manual testing.</li>
<li>Continuous deployment is difficult.</li>
<li>Monolithic applications can also be difficult to scale when different modules have conflicting resource requirements.</li>
<li>Another problem with monolithic applications is reliability. Bug in any module (e.g. memory leak) can potentially bring down the entire process. Moreover, since all instances of the application are identical, that bug will impact the availability of the entire application.</li>
<li>Monolithic applications has a barrier to adopting new technologies. Since changes in frameworks or languages will affect an entire application it is extremely expensive in both time and cost.</li>
</ul>
<div>
<br /></div>
<h2 style="text-align: left;">
Microservices Architecture</h2>
The idea is to split your application into a set of smaller, interconnected services instead of building a single monolithic application. Each microservice is a small application that has its own hexagonal architecture consisting of business logic along with various adapters. Some microservices would expose a REST, RPC or message-based API and most services consume APIs provided by other services. Other microservices might implement a web UI. <br /><br />The Microservice architecture pattern significantly impacts the relationship between the application and the database. Instead of sharing a single database schema with other services, each service has its own database schema. On the one hand, this approach is at odds with the idea of an enterprise-wide data model. Also, it often results in duplication of some data. However, having a database schema per service is essential if you want to benefit from microservices, because it ensures loose coupling. Each of the services has its own database. Moreover, a service can use a type of database that is best suited to its needs, the so-called <a href="http://www.infoq.com/presentations/The-Evolving-Panorama-of-Data">polyglot persistence</a> architecture. <br /><br />Some APIs are also exposed to the mobile, desktop, web apps. The apps don’t, however, have direct access to the back-end services. Instead, communication is mediated by an intermediary known as an <a href="https://www.linkedin.com/pulse/api-gateway-pattern-ronen-hamias">API Gateway</a>. The API Gateway is responsible for tasks such as load balancing, caching, access control, API metering, and monitoring. <br /><br /><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_xFogrjQE1bDKqtpb7c8YgAkxaMl_ltbvgDb19rcGT0h30UbyDyszZAY_sBJpdy4vKJqk11jhC7STfBNEEpCfeRG1fnBP5szOCLGIjPZ992maU1tbWy-bXLN8s3c27SvUv7Si_jOEwSw/s1600/typical_modern_architecture.png"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_xFogrjQE1bDKqtpb7c8YgAkxaMl_ltbvgDb19rcGT0h30UbyDyszZAY_sBJpdy4vKJqk11jhC7STfBNEEpCfeRG1fnBP5szOCLGIjPZ992maU1tbWy-bXLN8s3c27SvUv7Si_jOEwSw/s640/typical_modern_architecture.png" /></a><br /><br />The Microservice architecture pattern corresponds to the Y-axis scaling of the <a href="http://microservices.io/articles/scalecube.html">Scale Cube</a> model of scalability. <br /><br /><b>Benefits of Microservices Architecture</b><br /><ul style="text-align: left;">
<li>It tackles the problem of complexity by decomposing application into a set of manageable services which are much faster to develop, and much easier to understand and maintain.</li>
<li>It enables each service to be developed independently by a team that is focused on that service.</li>
<li>It reduces barrier of adopting new technologies since the developers are free to choose whatever technologies make sense for their service and not bounded to the choices made at the start of the project.</li>
<li>Microservice architecture enables each microservice to be deployed independently. As a result, it makes continuous deployment possible for complex applications.</li>
<li>Microservice architecture enables each service to be scaled independently.</li>
</ul>
<br /><b>Drawbacks of Microservices Architecture</b><br /><ul style="text-align: left;">
<li>Microservices architecture adding a complexity to the project just by the fact that a microservices application is a <a href="http://www.antonkharenko.com/2015/06/notes-on-distributed-vs-non-distributed.html">distributed system</a>. You need to choose and implement an inter-process communication mechanism based on either messaging or RPC and write code to handle partial failure and take into account other <a href="http://www.antonkharenko.com/2015/06/notes-on-fallacies-of-distributed.html">fallacies of distributed computing</a>.</li>
<li>Microservices has the partitioned database architecture. Business transactions that update multiple business entities in a microservices-based application need to update multiple databases owned by different services. Using distributed transactions is usually not an option and you end up having to use an eventual consistency based approach, which is more challenging for developers.</li>
<li><a href="http://martinfowler.com/articles/microservice-testing/">Testing a microservices</a> application is also much more complex then in case of monolithic web application. For a similar test for a service you would need to launch that service and any services that it depends upon (or at least configure stubs for those services).</li>
<li>It is more difficult to implement changes that span multiple services. In a monolithic application you could simply change the corresponding modules, integrate the changes, and deploy them in one go. In a Microservice architecture you need to carefully plan and coordinate the rollout of changes to each of the services.</li>
<li>Deploying a microservices-based application is also more complex. A monolithic application is simply deployed on a set of identical servers behind a load balancer. In contrast, a microservice application typically consists of a large number of services. Each service will have multiple runtime instances. And each instance need to be configured, deployed, scaled, and monitored. In addition, you will also need to implement a service discovery mechanism. Manual approaches to operations cannot scale to this level of complexity and successful deployment a microservices application requires a high level of automation.</li>
</ul>
<div>
<br /></div>
<h2 style="text-align: left;">
Summary</h2>
Building complex applications is inherently difficult. A Monolithic architecture better suits simple, lightweight applications. There are opinions which suggest to start from the <a href="http://martinfowler.com/bliki/MonolithFirst.html">monolith first</a> and others which recommend <a href="http://martinfowler.com/articles/dont-start-monolith.html">not to start with monolith</a> when your goal is a microservices architecture. But anyway it is important to understand Monolithic architecture since it is the basis for microservices architecture where each service by itself is implemented according to monolithic architecture. The Microservices architecture pattern is the better choice for complex, evolving applications. Actually the microservices approach is all about handling a complex system, but in order to do so the approach introduces its own set of complexities and implementation challenges.<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br clear="none" /></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<strong>References:</strong></div>
<ol style="font-family: 'Helvetica Neue'; font-size: 14px; text-align: left;">
<li><a href="https://www.nginx.com/blog/introduction-to-microservices/">Introduction to Microservices</a></li>
<li><a href="http://microservices.io/articles/scalecube.html">The Scale Cube</a></li>
<li><a href="http://microservices.io/patterns/microservices.html">Microservices Architecture Pattern</a></li>
<li><a href="http://microservices.io/patterns/monolithic.html">Monolithic Architecture Pattern</a></li>
<li><a href="http://martinfowler.com/microservices/">Microservices Guide</a></li>
<li><a href="http://martinfowler.com/bliki/MonolithFirst.html">Monolith First</a></li>
<li><a href="http://martinfowler.com/articles/dont-start-monolith.html%20">Don't start with a monolith</a></li>
<li><a href="http://www.infoq.com/presentations/The-Evolving-Panorama-of-Data">The Evolving Panorama of Data</a></li>
<li><a href="https://www.linkedin.com/pulse/api-gateway-pattern-ronen-hamias">API Gateway Pattern</a></li>
<li><a href="http://martinfowler.com/articles/microservice-testing/">Testing Strategies in a Microservice Architecture</a></li>
<li><a href="http://martinfowler.com/bliki/MicroservicePremium.html">Microservice Premium</a> </li>
<li><a href="http://www.antonkharenko.com/2015/06/notes-on-distributed-vs-non-distributed.html">Distributed vs. Non-Distributed Computing</a></li>
<li><a href="http://www.antonkharenko.com/2015/06/notes-on-fallacies-of-distributed.html">Fallacies of Distributed Computing</a></li>
</ol>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-57288002457963031912015-08-21T10:41:00.001-07:002015-08-21T10:42:00.838-07:00Netty Best Practices Distilled<div dir="ltr" style="text-align: left;" trbidi="on">
<ul style="font-family: 'Helvetica Neue'; font-size: 14px;">
<li><span style="font-family: gotham, helvetica, sans-serif;"><strong>Pipeline optimizations</strong>. Write do not make syscalls. Flush makes syscalls to flush out to the socket all previous writes. Limit flushes as much as possible, but also limit writes as well since it need to traverse pipeline.</span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><strong>GS-Pressure</strong>. Use VoidChannelPromise to reduce object creation if not interested in future result and no need to write listener in any channel outbound handler.</span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><strong>Correctly write with respect to slow receivers.</strong> Make use of Channel.isWritable() to prevent out of memory error.</span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><strong>Configure low and high write watermarks.</strong></span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><strong>Pass custom events through pipeline. </strong>Good fit for handshake notifications and more.</span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><b>Prefer ByteToMessageDecoder over ReplayingDecoder.</b> ReplayingDecoder is slower because of more overhead in methods and needs to handle ReplayingError. Use ByteToMessageDecoder if it is possible without making things complicated. </span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><strong>Use pooled direct buffers.</strong></span></li>
<li><b><span style="font-family: gotham, helvetica, sans-serif;">Write direct buffers… always.</span></b></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><b>Use ByteBufProcessor when need to find pattern in a ByteBuf.</b> It is faster because it can eliminate range checks, can be created and shared, easier to inline by the JIT.</span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><b>Prefer alloc() over Unpooled.</b></span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><b>Prefer slice() and duplicate() over copy. </b>Since they do not create extra copy of the buffer.</span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><b>Prefer bulk operations over loops. </b>Because otherwise need range checks on each get.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Use DefaultByteBufHolder for messages with payload. </span></strong><span style="font-family: gotham, helvetica, sans-serif;">Gets reference-counting and release resources for free.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">File transfer. </span></strong><span style="font-family: gotham, helvetica, sans-serif;">Use zero-memory-copy for efficient transfer of raw file content with DefaultFileRegion.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Never block or perform computationally intensive operations on the EventLoop.</span></strong></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">EventLoop extends ScheduledExecturoService, so use it!</span></strong> <span style="font-family: gotham, helvetica, sans-serif;">Schedule and execute tasks in EventLoop.</span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><b>Reuse EventLoopGroup if you can.</b> Sharing the same EventLoopGroup allows to keep the resource usage (like Thread-usage) to a minimum. </span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><b>Share EventLoop for proxy like applications to reduce context-switching.</b></span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><b>Combine operations when call outside EventLoop. </b>To reduce overhead of wakeups and object creation.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Operations from inside ChannelHandler. </span></strong><span style="font-family: gotham, helvetica, sans-serif;">Use shortest path on pipeline if possible.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Share ChannelHandlers if stateless.</span></strong></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Remove ChannelHandler once not needed anymore. </span></strong><span style="font-family: gotham, helvetica, sans-serif;">This keeps the pipeline as short as possible and so eliminate overhead of traversing as much as possible.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Use proper buffer type in MessageToByteEncoder. </span></strong><span style="font-family: gotham, helvetica, sans-serif;">This saves extra byte copies.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Use auto-read flag to control flow. </span></strong><span style="font-family: gotham, helvetica, sans-serif;">This can also be quite useful when writing proxy like applications.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Don't use JDKs SSLEngine if performance matters. </span></strong><span style="font-family: gotham, helvetica, sans-serif;">Use Twitters OpenSSL based SSLEngine. Netty will ship it by its own.</span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><strong>Prefer inner static classes over anonymous classes for channel future listeners.</strong> You never know when ChannelFuture will be notified so it can prevent objects from garbage collection.</span></li>
<li><strong><span style="font-family: gotham, helvetica, sans-serif;">Native Transport (epoll) for less GC and lower latency. </span></strong><span style="font-family: gotham, helvetica, sans-serif;">Only works on Linus as epoll is supported atm.</span></li>
</ul>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<span style="font-family: gotham, helvetica, sans-serif;"><br /></span></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<span style="font-family: gotham, helvetica, sans-serif;">References:</span></div>
<!--?xml version="1.0" encoding="UTF-8" standalone="no"?-->
<br />
<ol style="font-family: 'Helvetica Neue'; font-size: 14px;">
<li><span style="font-family: gotham, helvetica, sans-serif;"><a href="http://normanmaurer.me/presentations/2014-facebook-eng-netty/slides.html">Netty Best Practices a.k.a. Faster == Better (slides)</a> </span></li>
<li><span style="font-family: gotham, helvetica, sans-serif;"><a href="https://www.youtube.com/watch?v=_GRIyCMNGGI">Netty Best Practices with Norman Mauer (video)</a></span></li>
</ol>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-46180414925197929002015-08-19T04:46:00.000-07:002015-09-29T04:51:28.569-07:00Scalable and Efficient Distributed Failure Detectors<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="border: 0px; color: #383838; font-family: gotham, helvetica, arial, sans-serif; font-size: 14px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
Failure detectors are a central component in fault-tolerant distributed systems running over unreliable asynchronous networks e.g., group membership protocols, supercomputers, computer clusters etc. The ability of the failure detector to detect process failures completely and efficiently, in the presence of unreliable messaging as well as arbitrary process crashes and recoveries, can have a major impact on the performance of these systems<br />
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<strong style="line-height: 1.571428em;">Properties of failure detector:</strong></div>
<ul style="border: 0px; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">{Strong/Weak} Completeness:</em> is the guarantee that failure of group member will be eventually detected by {all/some} non-faulty members.</li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Strong Accuracy:</em> no non-faulty group member is declared as failed by any other non-faulty member.</li>
</ul>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
It is proved that achieving both strong completeness and strong accuracy is impossible on fault-prone networks. So if required strong completeness then need to accept weak accuracy (some possibility of false positive failure detection) while trying to reduce it to minimum.<br />
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<strong style="line-height: 1.571428em;">Requirements to Failure Detector:</strong></div>
<ol style="border: 0px; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Completeness:</em> satisfy eventual Strong Completeness.</li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Efficiency:</em></li>
</ol>
<ul style="border: 0px; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Speed:</em> quick (within some given time T) detection of member failure by some (not all) non-faulty member</li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Accuracy:</em> low probability (below given PM(T) which is much below of probability of message loss Pml) of failure detection mistakes.</li>
</ul>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
3. <em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span data-mce-style="font-size: 14px;" style="line-height: 1.571428em;">Scalability</span>: </em>equal expected worst-case network load per member</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<strong style="line-height: 1.571428em;">Theorem.</strong> Any distributed failure detector algorithm for a group of size n (>> 1) that deterministically satisfies the <em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Completeness, Speed and Accuracy</em> requirements above, for a given values of T and PM(T) (<< Pml), imposes a minimal worst-case network load (messages per time unit, as defined above) of:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj944TXI4OQH6dy7yELUodYh_L_pwNqeefV4myrp7dyg0CbSUDDGl1rGZinsrFpskqTNtgqc7v3eVLLdVDnWaOhLXEsmX9OBjN6cLkKe6lXQ7cRsPE71aDla4PQ9hqUPFYza83m9sER9uo/s1600/Picture.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="59" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj944TXI4OQH6dy7yELUodYh_L_pwNqeefV4myrp7dyg0CbSUDDGl1rGZinsrFpskqTNtgqc7v3eVLLdVDnWaOhLXEsmX9OBjN6cLkKe6lXQ7cRsPE71aDla4PQ9hqUPFYza83m9sER9uo/s200/Picture.jpeg" width="200" /></a></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
Furthermore, there is a failure detector that achieves this minimal worst-case bound while satisfying the <em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Completeness, Speed, Accuracy</em> requirements. L* is thus the optimal worst-case network load required to satisfy the <em style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Completeness, Speed, Accuracy</em> requirements.</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<strong style="line-height: 1.571428em;">Heartbeat-based</strong> approaches provides completeness, but have shortcomings:</div>
<ul style="border: 0px; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Centralized - creates hot-spots and prevent them from scaling</li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">Distributed - inherently not very efficient and scalable.</li>
</ul>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<strong style="line-height: 1.571428em;">Randomized Distributed Failure Detector Algorithm</strong><strong style="line-height: 1.571428em;">:</strong></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
It takes as assumption that list of members are same and already known on the nodes. Member recovers from failure in distinguishable new incarnation. Each message also contains current incarnation number of the sender.</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br data-mce-bogus="1" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
At each member Mi:</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">Integer pr; /* local period number */</span></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<br clear="none" /></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">Every T' time units at Mi:</span></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">0. pr := pr + 1</span><br />
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">1. Select random member Mj from view.</span></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Send a ping (Mi, Mj, pr) message to Mj</span><br />
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Wait for the worst case message round trip time for an ack(Mi, Mj, pr) message.</span><br />
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">2. If not received ack yet:</span></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Select k members randomly from view.</span></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Send each of them a ping-req(Mi,Mj, pr) message</span></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Wait for an ack (Mi, Mj, pr) until the end of period pr.</span><br />
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">3. If not received ack(Mi, Mj, pr) message yet then declare Mj as failed.</span><br />
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<br clear="none" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">Anytime at Mi:</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">4. On receipt of a ping-req(Mm,Mj, pr) (Mj != Mi)</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Send a ping(Mi, Mj, Mm, pr) message to Mj</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> On receipt of an ack(Mi, Mj, Mm, pr) message from Mj</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Send an ack(Mm, Mj, pr) message to Mm</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">Anytime at Mi:</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">5. On receipt of a ping(Mm, Mi, Ml, pr) message from member Mm</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Reply with an ack(Mm, Mi, Ml, pr) message to Mm</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">Anytime at Mi:</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">6. On receipt of a ping(Mm, Mi, pr) message from member Mm</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;"> Reply with an ack(Mm, Mi, pr) message to Mm</span></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqCSfj3DSShWUB6qoUPkp8yuajJTMlqZzqHT4wZuzrkLGGbwz5Tyt2eLVAu1qM4XiOlzhFmrFe31vxHOHlFFeyzD34n3Wh79t2ZxFyIdCFTgSrnUbGJBAM6aUkSzQ2Sb0DfrL5U2NMaQk/s1600/Screen+Shot+2015-09-29+at+13.35.36.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="278" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqCSfj3DSShWUB6qoUPkp8yuajJTMlqZzqHT4wZuzrkLGGbwz5Tyt2eLVAu1qM4XiOlzhFmrFe31vxHOHlFFeyzD34n3Wh79t2ZxFyIdCFTgSrnUbGJBAM6aUkSzQ2Sb0DfrL5U2NMaQk/s320/Screen+Shot+2015-09-29+at+13.35.36.png" width="320" /></a></div>
<div style="text-align: center;">
<span style="font-family: 'Helvetica Neue'; line-height: 1.57143em;">Fig. 1 Example of failure detection protocol period at Mi. This shows all the possible messages that a protocol period may initiate.</span></div>
<!--?xml version="1.0" encoding="UTF-8" standalone="no"?-->
<br />
Parameters T' and k of the algorithm can be formally calculated based on the given required quality of service parameters: average speed of failure detection T and accuracy PM(T). </div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br data-mce-bogus="1" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
This algorithm has uniform expected network load at all members. The worst-case network load occurs when, every T' time units, each member initiates steps (1-6) in the algorithm. Steps (1, 6) involve at most 2 messages, while steps (2-5) involve at most 4 messages per ping-req target member. Therefore, the worst-case network load imposed by this protocol (in messages/time unit) is:</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br data-mce-bogus="1" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<div data-mce-style="text-align: center;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px; text-align: center;">
<span data-mce-style="font-family: 'courier new', courier, monospace;" style="font-family: 'courier new', courier, monospace; line-height: 1.571428em;">L = n * [2 + 4 * k] * 1/T'</span></div>
<div data-mce-style="padding-left: 30px;" style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px 0px 0px 30px;">
<br data-mce-bogus="1" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
Which means linear overall network load O(n) produced by running algorithm on all nodes and constant network load at one particular member independent from the group size n.</div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<br clear="none" /></div>
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="line-height: 1.571428em;">References:</span></div>
<ol style="border: 0px; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><a data-mce-href="http://www.cs.cornell.edu/projects/quicksilver/public_pdfs/On%20Scalable.pdf" href="http://www.cs.cornell.edu/projects/quicksilver/public_pdfs/On%20Scalable.pdf" shape="rect" style="border: 0px; color: #047ac6; cursor: pointer; line-height: 1.571428em; margin: 0px; padding: 0px;" target="_blank">On Scalable and Efficient Distributed Failure Detectors (2001)</a></li>
</ol>
</div>
</div>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-86183260586443085022015-08-10T12:34:00.000-07:002015-08-10T12:34:02.911-07:00A Gossip-Style Failure Detector<div dir="ltr" style="text-align: left;" trbidi="on">
Failure Detector is a valuable component for system management, replication, load balancing, group communication services, various consensus algorithms and many other distributed services which rely on reasonably accurate detection of failed members.<div>
<br /></div>
<div>
<b>Algorithm:</b><br /><ol style="text-align: left;">
<li>Each member maintains a list with each member's address and heartbeat counter. At each time period <i>Tgossip</i> it increments its own heartbeat counter and sends its list to one random member. Common communication delay (time for sending messages) is typically much smaller then interval between gossiping rounds (<i>Tgossip</i>).</li>
<li>On receive of such gossip message, a member merges the list in the message with its own list by maximum heartbeat counter for each member.</li>
<li>Each member occasionally broadcasts its list in order to be located initially and also recover from network partitions. Decides about broadcast in probabilistic way with increased probability after longer period of time from last received broadcast. Instead of broadcast can be used few gossip servers with well-known addresses. </li>
<li>Each member also maintains for each other member in the list, the last time that its corresponding heartbeat counter has increased.</li>
<li>If the heartbeat counter wasn't increased for time <i>Tfail</i> the member is considered as failed. But failed members not removed from the list immediately.</li>
<li>Member removed from membership lists after time period <i>Tcleanup</i>.</li>
</ol>
It is possible to analyse probabilities of gossiping and calculate parameters <i>Tgossip</i> and <i>Tfail</i> with given quality of service requirement using epidemic theory. Proposed algorithm may be optimized to utilize IP address structure in order to reduce an amount of traffics between sub networks. </div>
<div>
<br /></div>
<div>
References:</div>
<div>
<ol style="text-align: left;">
<li><!--?xml version="1.0" encoding="UTF-8" standalone="no"?--><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.5411">A Gossip-Style Failure Detection Service (1998)</a></li>
</ol>
</div>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-12856644608159899632015-07-08T03:56:00.000-07:002015-07-08T04:14:52.498-07:00Quality of Service of Failure Detectors<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="border: 0px; color: #383838; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;">Basic Quality of Service (QoS) properties of Failure Detector are:</span></div>
<ul style="border: 0px; color: #383838; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">How fast it detects failure</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">How well it avoids false detection</span></li>
</ul>
<div style="border: 0px; color: #383838; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;">Assuming that process crash is permanent or in other words recovered processes will be the new identities.</span><br />
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;"><br clear="none" /></span></div>
<span style="font-family: inherit;">Primary QoS metrics of Failure Detector:</span></div>
<ul style="border: 0px; color: #383838; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;"><span style="line-height: 1.571428em;">Detection time (T<sub style="line-height: 1.571428em;">d</sub>)</span>. How long it takes to detect failure.</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;"><span style="line-height: 1.571428em;">Mistake recurrence time (T<sub style="line-height: 1.571428em;">mr</sub>).</span> Time between two consecutive mistakes.</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;"><span style="line-height: 1.571428em;">Mistake duration (T<sub style="line-height: 1.571428em;">m</sub>)</span>. Time it takes for failure detector to correct mistake.</span></li>
</ul>
<div style="border: 0px; color: #383838; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;">Derived metrics (can be computed from primary metrics):</span></div>
<ul style="border: 0px; color: #383838; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Average mistake rate.</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Query accuracy probability. Probability that failure detector's output is correct at random moment of time.</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Good period duration.</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Forward good period duration.</span></li>
</ul>
<div style="border: 0px; color: #383838; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;">The defined metrics do not depend on implementation-specific features of failure detection algorithm and can be used to compare failure detectors.</span><br />
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;"><br clear="none" /></span></div>
<span style="font-family: inherit;">QoS requirements to the Failure Detector can be expressed via primary metrics:</span></div>
<ul style="border: 0px; color: #383838; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Upper bound on the detection time (T<sub style="line-height: 1.571428em;">ud</sub>)</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Lower bound on the average mistake recurrence time (T<sub style="line-height: 1.571428em;">lmr</sub>)</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Upper bound on the average mistake duration (T<sub style="line-height: 1.571428em;">um</sub>)</span></li>
</ul>
<div style="border: 0px; color: #383838; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;">Together with probabilistic parameters of the network:</span></div>
<ul style="border: 0px; color: #383838; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Message loss probability [P<sub style="line-height: 1.571428em;">loss</sub>]</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Average message delay [E(D)]</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">Variance of message delay [V(D)]. Exponential distribution of message delays is very common.</span></li>
</ul>
<div style="border: 0px; color: #383838; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;">Also at the paper [1] was proposed a modified heartbeating algorithm for failure detection with input parameters:</span></div>
<ul style="border: 0px; color: #383838; line-height: 1.571428em; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">n - delay between consecutive heartbeats</span></li>
<li style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><span style="font-family: inherit;">ro - time shift of the heartbeat after which process declared as failed</span></li>
</ul>
<div style="border: 0px; color: #383838; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;">The goal was to compute based on (1) QoS requirements (T<sub style="line-height: 1.571428em;">ud</sub>, T<sub style="line-height: 1.571428em;">lmr</sub> and T<sub style="line-height: 1.571428em;">um</sub>) and (2) the probabilistic parameters of network (P<sub style="line-height: 1.571428em;">loss</sub>, E(D) and V(D)), the optimal failure detector parameters n and ro for the proposed algorithm.</span><br />
<div style="border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;">
<span style="font-family: inherit;"><br clear="none" /></span></div>
<span style="font-family: inherit;">It is possible to estimate links quality parameters analyzing heartbeat messages. So we can adjust failure detector parameters dynamically using link quality estimator:</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEju7m6ASnZPcVPvLMekCr7JUBnzVBdAx7qPVYlo7m_H8dc6emX-4iirZ2PEqveTjB2cOCWl0zq9EsU5G9FR9f1LkobgqRQH-k0ipgvliL-yDgOQ6o7fXjypxSmiSQiSdCPOu1w2Zsbe8tI/s1600/failure_detector_qos.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: inherit;"><img border="0" height="308" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEju7m6ASnZPcVPvLMekCr7JUBnzVBdAx7qPVYlo7m_H8dc6emX-4iirZ2PEqveTjB2cOCWl0zq9EsU5G9FR9f1LkobgqRQH-k0ipgvliL-yDgOQ6o7fXjypxSmiSQiSdCPOu1w2Zsbe8tI/s400/failure_detector_qos.png" width="400" /></span></a></div>
<div style="border: 0px; color: #383838; line-height: 21.9999923706055px; margin: 0px; padding: 0px;">
<span style="font-family: inherit;">References:</span></div>
<ol style="border: 0px; color: #383838; line-height: 21.9999923706055px; list-style-position: outside; margin: 0.2857em 0px 0.714285em 2em; padding: 0px;">
<li style="border-image-outset: initial; border-image-repeat: initial; border-image-slice: initial; border-image-source: initial; border-image-width: initial; border: 0px; line-height: 1.571428em; margin: 0px; padding: 0px;"><a href="http://research.microsoft.com/en-us/people/weic/ieeetc02_fdqos.pdf"><span style="font-family: inherit;">On the Quality of Service of Failure Detectors (2002)</span></a></li>
</ol>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-47791035090838625282015-06-22T05:54:00.000-07:002015-07-08T04:37:57.787-07:00The Twelve-Factor App<div dir="ltr" style="text-align: left;" trbidi="on">
In the modern era, software is commonly delivered as a service: called web apps, or software-as-a-service. The twelve-factor app is a methodology for building software-as-a-service apps proposed by <a href="https://www.heroku.com/">Heroku PaaS</a>.<br />
<br />
<div style="text-align: left;">
<b>1. Codebase.</b> One codebase tracked in revision control, many deploys.</div>
<br />
One-to-one correlation between the codebase and the app:<br />
<ul style="text-align: left;">
<li>If multiple codebases - it's not an app but distributed system. And each component in a distributed system is an app.</li>
<li>Multiple apps sharing the same code is violation of this principle. The solution is to factor code into libraries and include them via dependency management.</li>
</ul>
A deploy is the running instance of the app.<br />
<br />
<b>2. Dependencies.</b> Explicitly declare and isolate dependencies.<br />
<br />
Declare all dependencies completely and exactly via a dependency declaration manifest and use dependency isolation tool during execution.<br />
<br />
<b>3. Config.</b> Store config in the environment.<br />
<br />
An app's config is everything that is likely to vary between deploys:<br />
<ul style="text-align: left;">
<li>Resource handles to the database, Memcached, and other backing services</li>
<li>Credentials to external services such as Amazon S3 or Twitter</li>
<li>Per-deploy values such as canonical hostname for the deploy</li>
</ul>
Need strict separation of config from code. Config varies substantially across deploys, code doesn't. The twelve-factor app stores config in environment variables.<br />
<br />
<b>4. Backing Services</b>. Treat backing services as attached resources.<br />
<br />
It's an any service which app consumes over the network as part of its normal operation. The code of app makes no distinction between local (means deployed to localhost) and third party services.<br />
<br />
<b>5. Build, release, run.</b> Strictly separate build and run stages.<br />
<br />
For example, it is impossible to make changes to the code in the runtime. Release can not mutate once it is created and any changes must create a new release.<br />
<br />
<b>6. Processes.</b> Execute the app as one or more stateless processes.<br />
<br />
Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database. Sticky sessions are violation of twelve-factor and should never be used or relied upon. Session state data is a good candidate for a datastore that offers time-expiration, such as Memcached or Redis.<br />
<br />
<b>7. Port binding.</b> Export services via port binding.<br />
<br />
The twelve-factor app is completely self-contained and does not rely on runtime injection of a webserver into the execution environment. The web app exports HTTP as a service by binding to a port, and listening to requests coming in on that port.<br />
<br />
<b>8. Concurrency.</b> Scale out via the process model.<br />
<br />
In the twelve-factor app, processes are a first class citizen. Developer can architect their app to handle diverse workloads by assigning each type of work to a process type. This does not exclude individual processes from handling their own internal multiplexing, via threads inside the runtime VM, or the async/evented model. But an individual VM can only grow so large (vertical scale), so the application must also be able to span multiple processes running on multiple physical machines.<br />
<br />
The share-nothing, horizontally partitionable nature of twelve-factor app processes means that adding more concurrency is a simple and reliable operation.<br />
<br />
<b>9. Disposability.</b> Maximize robustness with fast startup and graceful shutdown.<br />
<br />
The twelve-factor app's processes are disposable, meaning they can be started or stopped at a moment's notice. Processes should strive to minimize startup time.<br />
<br />
<b>10. Dev/prod parity.</b> Keep development, staging, and production as similar as possible.<br />
<br />
Minimize gaps between development and production environments:<br />
<ul style="text-align: left;">
<li>Make time gap small. A developer may write code and have it deployed hours or even just minutes later.</li>
<li>Make personnel gap small: developers involved in deploying app and watching its behavior in production.</li>
<li>Make the tools gap small: keep dev and prod as similar as possible.</li>
</ul>
The twelve-factor developer resists the urge to use different backing services between development and production.<br />
<br />
<b>11. Logs.</b> Treat logs as event streams.<br />
<br />
Logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes and backing services.<br />
<br />
A twelve-factor app never concerns itself with routing or storage of its output stream. Most significantly, the stream can be sent to a log indexing and analysis system, or a general-purpose data warehousing system such as Hadoop/Hive.<br />
<br />
<b>12. Admin Processes.</b> Run admin/management tasks as one-off processes.<br />
<br />
In a local deploy, developers invoke one-off admin processes by a direct shell command inside the app’s checkout directory. In a production deploy, developers can use ssh or other remote command execution mechanism provided by that deploy’s execution environment to run such a process.<br />
<br />
References:<br />
<ol style="text-align: left;">
<li><a href="http://12factor.net/">The Twelve-Factor App</a></li>
</ol>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-28332991476167633342015-06-19T02:16:00.001-07:002015-07-08T04:40:43.931-07:00Criteria for Decomposing Systems into Modules<div dir="ltr" style="text-align: left;" trbidi="on">
Benefits from decomposing system into modules:<br />
<ul style="text-align: left;">
<li>Managerial - independent development</li>
<li>Flexibility - easier to change product</li>
<li>Comprehensibility - easier to understand product</li>
</ul>
Criteria for splitting system into modules:<br />
<ul style="text-align: left;">
<li>BAD: Each module corresponds to separate subroutine</li>
<li>GOOD: Module hides ("chunk") information</li>
</ul>
What module should hide:<br />
<ul style="text-align: left;">
<li><b>Important design decisions</b></li>
<li><b>Design decisions which is likely to change</b></li>
</ul>
What else:<br />
<ul style="text-align: left;">
<li>Data structures</li>
<li>Sequence of instructions to call a given routine and the routine itself</li>
<li>The sequence of items processing</li>
</ul>
etc.<br />
<div>
<br /></div>
<div>
Hierarchical structure of dependencies and "clean" decomposition into modules are two desirable but independent properties of the system.</div>
<div>
<br /></div>
<div>
It is almost always incorrect to do decomposition of the system based on the flow chart (each block in the flow chart is a separate module). Instead begin with the list of difficult design decisions or decisions which is likely to change. Each module then designed to hide such decisions from others. So subroutines and programs (blocks in flow chart) is an assembled collection of code from various modules.<br />
<br />
References:<br />
<ol style="text-align: left;">
<li><a href="https://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf">On the Criteria To Be Used in Decomposing Systems into Modules (1972)</a></li>
</ol>
</div>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-62607069042286910042015-06-18T04:45:00.000-07:002015-07-08T04:11:42.705-07:00Fallacies of Distributed Computing<div dir="ltr" style="text-align: left;" trbidi="on">
Above described assumptions which architects and designers of distributed systems are likely to make and which prove to be wrong in the long run:<br />
<br />
<ul style="text-align: left;">
<li><b>The network is reliable</b>. The network is unreliable and we need to address possible failure scenarios.</li>
<li><b>Latency is zero.</b> Try to make few as possible network calls since latency is not zero and huge comparing to in-memory calls.</li>
<li><b>Bandwidth is infinite.</b> Try to simulate real production environment.</li>
<li><b>The network is secure.</b> You need to build security into your solution from Day 1. You need to be aware about security concerns and its implications even so the architect of the system should not be a security expert.</li>
<li><b>Topology doesn't change.</b> In real environment topology can change. And you need to be aware regards it (e.g. use “Next Hop” routing or specify address by DNS name). </li>
<li><b>There is one administrator.</b> Administrators can constraint your options and you need to help them to manage your application.</li>
<li><b>Transport cost is zero.</b> There are costs associated with both computational resources and money spent on network maintenance.</li>
<li><b>The network is homogeneous.</b> Interoperability will be needed sooner or later. Do not rely on proprietary protocols but use standard technologies that are widely accepted.</li>
</ul>
<br />
References:<br />
<ol style="text-align: left;">
<li><a href="http://www.rgoarchitects.com/Files/fallacies.pdf">Fallacies of Distributed Computing Explained</a></li>
</ol>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.comtag:blogger.com,1999:blog-8977574602634067696.post-47373864328983328092015-06-16T03:08:00.003-07:002015-07-08T02:27:23.314-07:00Distributed vs. Non-Distributed Computing<div dir="ltr" style="text-align: left;" trbidi="on">
Distributed and non-distributed computing has conceptual differences:<br />
<ul style="text-align: left;">
<li>Latency</li>
<li>Memory access</li>
<li>Partial failure</li>
<li>Concurrency</li>
</ul>
Major problems in distributed computing correspond to this differences:<br />
<ul style="text-align: left;">
<li>Ensuring adequate performance</li>
<li>Dealing with differences in memory models between local and distributed entities</li>
<li>Dealing with partial failures and lack of a central resource manager</li>
<li>Dealing with problems of concurrency</li>
</ul>
Distributed application interface should reflect its distributed nature. Merging this two models leads to one of the following problems:<br />
<ul style="text-align: left;">
<li>Making local computing looks like distributed makes local computing unnecessary difficult.</li>
<li>Making distributed computing looks like local leads to the unreliable system.</li>
</ul>
<br />
References:<br />
<ol style="text-align: left;">
<li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7628">A Note on Distributed Computing (1994)</a></li>
</ol>
</div>
Antonhttp://www.blogger.com/profile/10711859959471898109noreply@blogger.com