
Setting up continuous integration for PHP using Hudson and Phing

Posted by Narendra Dhami on October 13, 2010

The difficulty with Unit Testing is keeping it up. It is very easy to slip into poor habits, and before you know it there’s a huge chunk of code with no tests. Possibly a huge, badly designed chunk of code that didn’t benefit from having tests written before it was coded. Before you know what’s going on, you end up with a project that you really can’t write tests for, because retrofitting the tests is near impossible.

For me, there are two critical reasons for Unit Testing:

  • Enforcing good design. To be able to write tests, you need to be able to zero in on a “unit” of code, isolating it from all the rest of your 1,000,000 lines of web application. Writing Unit Tests forces you to design systems that have loose coupling, because otherwise they are impossible to test.

  • Allowing changes to be made in confidence. Without Unit Tests, you get to the point where no one really wants to make any changes to the code. This is especially true in a commercial environment, where many people have worked on the code, including some key team member who has since left. Unit Tests allow you to make changes to one part of the code and be pretty convinced you haven’t messed up something else. (A minimal test sketch follows below.)
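
To make the idea of an isolated “unit” concrete, here is a minimal sketch of a PHPUnit test case of the kind a Phing build (and a Hudson job) would run automatically. The class and its tests are hypothetical, not taken from a real project, and they use the PHPUnit 3-style base class that was current when this was written.

    <?php
    // Hypothetical unit under test: a tiny class with no external dependencies.
    class PriceCalculator
    {
        public function totalWithTax($net, $taxRate)
        {
            if ($net < 0 || $taxRate < 0) {
                throw new InvalidArgumentException('Negative values are not allowed');
            }
            return round($net * (1 + $taxRate), 2);
        }
    }

    // The matching test case, runnable from the phpunit command line tool.
    class PriceCalculatorTest extends PHPUnit_Framework_TestCase
    {
        public function testAddsTaxToNetPrice()
        {
            $calc = new PriceCalculator();
            $this->assertEquals(11.90, $calc->totalWithTax(10.00, 0.19));
        }

        public function testRejectsNegativeInput()
        {
            $this->setExpectedException('InvalidArgumentException');
            $calc = new PriceCalculator();
            $calc->totalWithTax(-1, 0.19);
        }
    }

Because the whole suite runs from a single command, Phing can invoke it on every build and Hudson can fail the build whenever a test fails.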

    More …

    Posted in PHP | 1 Comment »

    The Seven Wastes of Software Development

    Posted by Narendra Dhami on October 7, 2010

    1. Partially Done Work
    2. Extra Features
    3. Relearning
    4. Handoffs
    5. Delays
    6. Task Switching
    7. Defects


    Posted in General | Leave a Comment »

    Top Ten Most Critical Web Application Security Vulnerabilities

    Posted by Narendra Dhami on March 25, 2010

    Web application security is often viewed incorrectly as a set of server and host-based security issues, rather than code-level and configuration-based security vulnerabilities. Although servers and hosts may still be the cause of exploitation, it is critical that security professionals recognize the major impact of poorly written web applications, as well as how their applications and servers are configured separately and in combination. The Internet is increasingly responsible for handling and storing sensitive information and files that require security and protection. Keeping hackers at bay and assuring the privacy of private and proprietary documents is paramount. Below are the top ten security vulnerabilities and how security programmers mitigate them to prevent exploitation.

    Security web programmers are often not given the clout or the attention they deserve. Security programmers apply a much higher degree of attention, detail, and time to programming, so secure software may require more time and money than insecure software. That cost must be weighed against the cost of an insecure web application bringing the business down or releasing sensitive information to potentially nefarious hackers.

    Don’t be misled by security misconceptions or be mistaken about your security requirements. Security factors can be well defined and explained at any level of your corporate structure. Emagined Security employs security programmers who are trained and experienced in developing secure software, including web and database applications. Our proven security programming techniques and multi-layered security development protocols ensure your web applications are protected and your sensitive information secured.

    Issue 1: Lack of Data Validation & Data Cleansing
    Information sent to your server from web requests that is not validated and cleansed before being handled by a web application creates an environment for attacks. The vast majority of compromised servers are breached through this security vulnerability. A common security misconception is that “my firewall will protect my server.” Firewalls may limit the traffic that reaches your web applications, but a poorly developed web application remains vulnerable to traffic that firewalls do not filter or prevent.

    Resolution 1: Wherever a user may submit information to your server, there is a requirement for data validation and data cleansing. From a programming point of view, any information that reaches a web application can be inspected and then filtered, rejected, or approved before it is handled by the expected web application modules. Where a firewall does not inspect the content of traffic, an IDS (Intrusion Detection System) can inspect traffic for predefined attack patterns before it reaches the web application, and its use is advised. A common security misconception is that SSL (Secure Sockets Layer) will protect your customers; in fact it can add another layer of vulnerability if traffic is decrypted after passing through the IDS rather than before. An IDS does not decrypt traffic, so SSL-encrypted traffic must be decrypted before it reaches the IDS, allowing the IDS to inspect the plain traffic data for attack patterns.

    At the programming level, web applications should inspect all incoming data for attacks and abuse to create a layer of web application security. Bear in mind that some scripts may accept traffic from sources other than the expected web forms and points of origin. A hacker may submit a form via a POST request directly to the server without ever visiting the web site or using a web form. Data types can be modified easily and re-submitted multiple times in an attack. The web application should be designed to handle a specific set of expected data types, not to accept any and all submitted data. Every value sent to the server arrives in a variable that can be checked individually for the expected format, pattern, content, and illegal characters.

    A simplified example of data submission, data validation, and data cleansing is the simple newsletter signup form. The user is expected to submit an email address, which the server places into a database for storage, perhaps also sending an alert email to the newsletter manager. Three critical points of programming come into play here: the email address is handled by code, the database is updated with the supplied email address, and an email is sent using the email address. The vulnerability arises when the insecure programmer assumes that the only data submitted will be an email address. The exploitation arises when the hacker submits something other than an email address, such as a string of code, an SQL injection, or another attack. Without programming-level measures in place, the web application may be compromised, along with the server, the database, and any sensitive information they hold.

    Data Validation is a security programming technique that inspects the submitted “email address” to determine whether the pattern matches that of a real email address format. If the data does not meet the requirements for being an email address, the data is dropped and the user is shown a message implying that the web application is not vulnerable, hopefully discouraging ongoing attacks. Data sent as an attack should certainly not be sent to the database, but it can be prepared for administrator review and inserted into security logs or emailed inside an alert. Data Cleansing is the process of removing illegal characters or patterns and rendering attack strings inert so they can be handled safely by the web application.
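
    Continuing the newsletter example, a minimal sketch of validation and cleansing in PHP might look like the following; the field name and rejection message are illustrative, not prescriptive.

        <?php
        // Cleanse and validate a submitted email address before it is stored
        // or used in a mail() call. Field and message names are illustrative.

        $email = isset($_POST['email']) ? trim($_POST['email']) : '';

        // Cleansing: strip characters that cannot appear in a legal address.
        $email = filter_var($email, FILTER_SANITIZE_EMAIL);

        // Validation: reject anything that still does not match an email pattern.
        if ($email === '' || filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            error_log('Rejected newsletter signup value: ' . $email); // for admin review
            exit('Please supply a valid email address.');
        }

        // Only now is $email handed to the database layer or the alert email.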

    Beware the common mistake of assuming that a radio button or checkbox cannot become a point of data tampering. Attackers can send anything they want to a web application, under any variable names they define, including those of radio buttons, checkboxes, and file upload fields, using standard POST/GET requests or AJAX connection methods. AJAX (and JavaScript in general) has created a large set of attack points, because insecure programmers assume, just as they do for form objects, that AJAX connections will not become points of attack.
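
    One hedged illustration of this point: even for a fixed-choice control such as a radio button, the server-side code can compare the submitted value against its own whitelist. The field name and allowed values below are invented for the example.

        <?php
        // Never trust that a radio button or checkbox limits what arrives.
        // Compare the submitted value against a server-side whitelist instead.

        $allowedPlans = array('free', 'basic', 'premium'); // illustrative options

        $plan = isset($_POST['plan']) ? $_POST['plan'] : '';

        if (!in_array($plan, $allowedPlans, true)) {
            // Anything outside the expected set is treated as an error or attack.
            error_log('Unexpected value for "plan": ' . $plan);
            exit('Invalid selection.');
        }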

    Issue 2: Broken or Lacking Access Controls
    Member-restricted and administrator-restricted access is often granted to provide control over websites, databases, and various information types. When there is a lack of control over who accesses data and which restrictions really apply, attackers may reach accounts, sensitive information, and levels of administration that they should not. A member who gains access to administration functions, deletes members, or downloads a copy of the database can destroy member and investor loyalty. This vulnerability will also fail HIPAA compliance, PCI compliance, and other mandates in industries such as the medical and financial sectors.

    Issue 3: Broken or Lacking Authentication and Session Management
    Restricting access to a web application, and the data it is supposed to protect, requires protecting the access credentials and authentication schemes. A common attack used to gain the access rights of other users is called Session Hijacking. Sessions are closely associated with cookies, and together these store user names, passwords, access keys, and other vital information used to allow or deny access to sensitive information. Cookies are stored in the user’s browser, whereas sessions are stored on the hosting server.

    Secure techniques store sensitive authentication tokens in the session only, not the cookie, and use the cookie only to store the session ID that connects the cookie (browser) to the correct server-side session. Secure server configuration is required to assure randomized session ID values that are not easily predicted and to assure that session storage is secure. Sessions should track a fingerprint of the user, including the IP address, browser type and language, platform, and other trackable data. Tracking the time the session started and restricting its duration of use is another useful security technique. The concern is that a hacker may copy the session ID from the transmission between the user and the server, then integrate that session into the traffic on another machine and access the user’s account and web application functions.

    Resolution 2 & 3: Securing sessions may impede Session Hijacking, but another layer of programming is required for authentication schemes to operate securely. Data Validation and Data Cleansing (above) must be employed in all transactions. Unique user names must be enforced to prevent users with the same user name from accessing the wrong accounts. A well-defined access control list should determine what the user may access, not a list of what the user may “not” access. Authentication and access control modules should be integrated into a single module or system and applied globally, allowing global updates and corrections. SSL encryption should be used to help prevent attackers from listening to the traffic for sensitive information, and SSL enforcement is advised. Unauthorized access attempts should be logged and alerts sent for review and potential remediation.

    When new modules are added for user access, a complete security review must be performed to assure vulnerabilities have not been introduced. When systems that require authentication are hosted on the same server as systems that do not, the security programmers should review the insecure modules for vulnerabilities to their systems, or else host the authentication-restricted media on a separate server. Security vulnerability assessments should be scheduled and performed as a routine security measure. Although there are other techniques and aspects to securing authentication systems, these measures are a good basic review.
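
    The session-handling measures described above (a server-side fingerprint, a limited lifetime, and a regenerated session ID) might look roughly like the following in PHP. The fingerprint fields and the one-hour limit are illustrative choices, not requirements.

        <?php
        session_start();

        // Fingerprint the client from request headers; the fields are illustrative.
        $fingerprint = sha1(
            $_SERVER['REMOTE_ADDR'] . '|' .
            (isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '')
        );

        if (!isset($_SESSION['fingerprint'])) {
            // First request of this session: record the fingerprint and start time.
            session_regenerate_id(true);          // avoid reusing a known session ID
            $_SESSION['fingerprint'] = $fingerprint;
            $_SESSION['started_at']  = time();
        } elseif ($_SESSION['fingerprint'] !== $fingerprint
               || (time() - $_SESSION['started_at']) > 3600) {
            // Fingerprint mismatch or session older than one hour: destroy it.
            session_destroy();
            exit('Session expired, please log in again.');
        }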

    Issue 4: Cross-Site Scripting (XSS) Vulnerabilities
    Cross-Site Scripting is an attack method that introduces remote web components into the host site, or transports the site user to a remote web site. The goal of the attacker is to make the user execute remote code or modules that reveal user authentication or sensitive information, or that create some form of access to the host site. Methods of attack include inserting XSS code into the host database using methods such as Code Injection or SQL Injection. The attacker may modify the web file code on the host server to include ActiveX controls, redirection methods, iframed content, or other attack types. The results of these XSS attacks range from revealing authentication credentials, to feeding sensitive information to a remote user or host, to propagating the attack to other members or administrators of the site, to creating man-in-the-middle attacks such as phishing. Some attacks become visible immediately, with or without repercussions, while others may reside actively or latently on the host server, feeding data to remote hackers regularly or waiting for a trigger to launch a more complex or timed attack.

    Resolution 4: Cross-Site Scripting requires the due diligence of a security programmer and draws on the other resolutions described above and below. The vast majority of web application exploitations result from poor programming. Server configurations such as Magic Quotes and strip-tags handling may reduce the chance of injections succeeding. A correctly placed and configured IDS with updated match patterns helps reduce the chance of attack traffic reaching the web application. Most importantly, Data Validation and Data Cleansing must be included at every level of security programming to protect the web application from a wide variety of attacks.
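
    Output encoding is a standard complement to the input validation and cleansing described here, even though it is not spelled out above: data is escaped again at the moment it is displayed, so any markup that slipped into storage is rendered harmless. The helper name and example string below are invented for the sketch.

        <?php
        // Escape characters with special meaning in HTML before output.
        function e($value)
        {
            return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        }

        // A stored attack string is printed as visible text, not executed.
        $comment = '<script>document.location="http://evil.example/";</script>';
        echo '<div class="comment">' . e($comment) . '</div>';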

    Issue 5: Buffer Overflow Vulnerabilities
    Some web application components in some languages do not correctly validate user-submitted input and may crash or freeze a server. In some situations a buffer overflow may be used to take control of a process, despite being difficult to execute. Vulnerable components can include CGI programs, DLL libraries, various drivers, and some web application server components. This attack is performed by submitting more information than a variable is expecting to receive, causing the “overflow” of data to write over a section of memory that another process may subsequently access and execute. If the overwritten section contains the right executable content for the process that happens to access it, the results can range from a defunct server or process to a compromised server with associated exploitations.

    Resolution 5: In most cases, buffer overflows are not the responsibility of the security programmer as much as that of the server administrator. The physical server itself must be correctly configured and updated with the latest security features and patches. Vulnerabilities in languages such as ASP, PHP, CFM and others must be addressed by the developers or communities, and the updates and upgrades applied quickly. Hackers are just as well informed about buffer overflow problems in software as security professionals, since buffer overflow problems are almost always public information.

    If the hacker knows what version of server software you utilize, and a security vulnerability is announced for your version, that hacker may immediately attack your server with a well-defined attack point. For this reason, your server should never announce which type or version of software is running. In fact, the option exists to mislead the attacker by declaring a different server software name or version.

    Buffer overflows are not easy to execute successfully and usually require multiple attack attempts. Keeping every software package on your server up to date is required, and using an updated and correctly placed IDS is important. There is not much a security programmer can do to mitigate buffer overflow attacks, but data validation may reveal attacks of this nature, and alerts may be used to bring attention to logs and advise admins to look for attacks and their sources.

    Issue 6: Code Injection & SQL Injection Vulnerabilities
    This is my favorite aspect of hacking, since it is very much a programming-level attack technique. Referring to the email address and newsletter signup example above, an incorrectly programmed handler of submitted data may allow unexpected data to exploit the web application or server. A Code Injection sends attacker-written code to the server inside a variable; if the web application (or IDS) does not identify the variable’s contents as invalid, the code that is sent may be executed by the web application. An SQL Injection is a specific type of Code Injection in which the web application’s database is the target of the attack.

    Code Injections may come in many flavors that include anything your programming language can perform, given permissions restrictions on the server. In the newsletter signup example, the expected value is a regular email address. However, if that variable contains a section of code that is correctly written to insert itself into the server-side web application code, the server may execute the targeted script including the attacker’s code. Given the range of possibilities defined by the language itself, the worst case scenario is that the code creates a situation allowing the upload of a root kit to the server and the whole server is owned by the attacker.

    SQL Injections are the same as Code Injections except that the target of the attack is the database. Using Structured Query Language (SQL), the injected code may connect to the database to delete all contents, feed the entire database to the remote hacker, insert whatever records the hacker defines, or modify existing records. Using an eCommerce system as another example, an attacker may wish to retrieve credit card profiles or create records causing the business to deliver products or services without being paid.

    Resolution 6: By employing the Data Validation and Data Cleansing techniques above, SQL Injections can be readily mitigated. Techniques include individual value inspection, configuring the web application to use Magic Quotes, and configuring the web server language and database to handle string escaping. Deprecated methods such as “mysql_escape_string()” must be replaced with modern code or stronger techniques. Data Cleansing and Validation is the bottom line for preventing Code and SQL Injection attacks. Where SQL Injection attacks are difficult to identify without an IDS or advanced injection-prevention programming techniques, escaping with slash notation may be required.

    Given a website that stores samples of SQL queries and commands for SQL researchers, a text field may contain SQL strings that are tough to differentiate from SQL Injection attempts. Although this is an uncommon situation, it illustrates that there are times when you simply accept the data and “add slashes” to it so that it is not executed by the server or database. Magic Quotes is another example of a server-language and programming-level technique for mitigating SQL or Code Injections.
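
    Alongside escaping, parameterized queries keep submitted data out of the SQL text entirely and are the modern replacement for calls like mysql_escape_string(). A minimal sketch with PDO follows; the connection details, table, and column names are invented for the example.

        <?php
        // Assume $email has already passed the validation shown earlier.
        $email = 'reader@example.com';

        $pdo = new PDO('mysql:host=localhost;dbname=newsletter', 'dbuser', 'dbpass');
        $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

        // The value is bound separately from the SQL statement, so it can
        // never change the meaning of the query.
        $stmt = $pdo->prepare('INSERT INTO subscribers (email) VALUES (:email)');
        $stmt->bindValue(':email', $email, PDO::PARAM_STR);
        $stmt->execute();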

    Issue 7: Poor or Lacking Error Handling
    Errors can occur in almost every web application, but the difference between a well-crafted web application and a poor one is how the error is handled. A server configured to operate in debug mode may offer considerable information about the error, which can reveal server and application pathways, file names, code segments, server types and versions, and other information. All of these add to the resources the attacker will use to exploit the web application and server. Once a poorly handled error is discovered, an attacker may provoke many types of errors to collect a wide range of information about the server.

    Some errors, especially errors crafted by an attacker, may result in denial of service or cause the server to fail in some form. Some security features of the web application may be voided by certain error situations, creating yet more vulnerabilities. The nature of the potential vulnerability depends on the nature of the mishandled error and the server or web application environment.

    Resolution 7: All errors must be handled, either with specific error handlers for expected errors or with broad error-handling mechanisms for unexpected errors. The server should not operate in debug mode once launched into production, and development systems must be kept inside intranets (or secured on the Internet) where they may operate safely in debug mode. At the server level, all errors can be written to error logs regardless of whether the server runs in production mode or debug mode: the amount of logging is independent of the operation mode, so logging may remain at debug-level detail while the server is not in debug mode. The pages returned for errors should be designed pages containing navigation back into the site, showing no debugging content to general users and revealing nothing about the true error. General users will return to the site unaware that there was an error, while attackers may be misled into thinking their attacks did not succeed.
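
    A minimal sketch of that approach in PHP might look like this: full details go to the server log, while the visitor sees a generic, navigable page. The handler name, messages, and settings are illustrative.

        <?php
        // Log everything an administrator needs; show the visitor nothing
        // useful to an attacker.
        function handleUncaughtException($e)
        {
            error_log(sprintf(
                "Unhandled exception: %s in %s:%d\n%s",
                $e->getMessage(), $e->getFile(), $e->getLine(), $e->getTraceAsString()
            ));

            header('HTTP/1.1 500 Internal Server Error');
            echo '<p>Something went wrong. <a href="/">Return to the home page</a>.</p>';
        }

        set_exception_handler('handleUncaughtException');

        ini_set('display_errors', '0'); // never echo raw errors in production
        ini_set('log_errors', '1');     // but keep logging at full detail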

    Issue 8: Insecure & Improper Data & File Storage
    All too often, poorly programmed websites provide easy access to credit card profiles, restricted areas of the web application, and sensitive documents and information. Information and file management is a growing concern and must be addressed. Information stored in databases may not be protected by authentication or encryption. Sensitive files stored on the server may be browsed directly, simply by side-stepping authentication. Authentication techniques may be incorrectly applied or insufficient to protect data and files.

    Resolution 8: Correctly securing data and files requires two different approaches to the same results. Data stored in the database requires different security techniques from files stored on the server, yet there are some techniques that are the same. The type of data stored inside the database record or file may affect the technique for securing the content.

    Protecting simple passwords may be as simple as using one-way or two-way encryption, depending on the type of server hosting the web application. With one-way encryption, the user’s submitted password is encrypted using the same one-way algorithm and compared against the encrypted password in the database; a forgotten password must then be changed via some other mechanism, which can create vulnerabilities in itself. With two-way encryption, the password stored in the database is decrypted and compared against the submitted password. If the application lives on a shared hosting server, one-way encryption is necessary, yet may become a moot point when a misconfigured web host allows neighboring web applications to browse the web files that contain the encryption/decryption code.
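
    A minimal sketch of the one-way approach, using a per-user salt and a standard hash function; the variable names are invented, and a purpose-built password hashing scheme is preferable where the platform offers one.

        <?php
        $password = isset($_POST['password']) ? $_POST['password'] : '';

        // At registration time: generate a random per-user salt and store
        // only the salt and the salted hash, never the password itself.
        $salt       = bin2hex(openssl_random_pseudo_bytes(16));
        $storedHash = hash('sha256', $salt . $password);
        // ... persist $salt and $storedHash for this user ...

        // At login time, after loading $salt and $storedHash for the user:
        $candidate = hash('sha256', $salt . $password);
        if ($candidate === $storedHash) {
            // Password matches; establish the authenticated session.
        } else {
            // Reject the attempt; never reveal which part was wrong.
        }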

    File storage is a somewhat different issue, since the information is stored as flat files on the file system. If possible, store the files outside of the web root, where permissions are more restricted. Do not store files under their original file name or file URI. Track the files and use a file name that is a tracking ID without a file extension. Stream the file contents to the user and name the file on the fly.
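
    A sketch of serving such a file by tracking ID from a directory outside the web root; the path, the lookup array standing in for a database, and the MIME type are all invented for the example.

        <?php
        $storageDir = '/var/private_uploads';   // not reachable by any URL

        // Map the opaque tracking ID to the real file; a database lookup
        // would normally stand in for this array.
        $files = array(
            'a1b2c3' => array('name' => 'report.pdf', 'mime' => 'application/pdf'),
        );

        $id = isset($_GET['id']) ? $_GET['id'] : '';

        if (!isset($files[$id])) {
            header('HTTP/1.1 404 Not Found');
            exit('Unknown document.');
        }

        // Name the file on the fly and stream its contents to the client.
        header('Content-Type: ' . $files[$id]['mime']);
        header('Content-Disposition: attachment; filename="' . $files[$id]['name'] . '"');
        readfile($storageDir . '/' . $id);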

    Issue 9: Denial of Service Vulnerability
    A denial-of-service (DoS) attack or distributed denial-of-service (DDoS) attack is an attempt to make a computer resource unavailable to its intended users. Although the means of performing a DoS attack may vary, it generally consists of concerted efforts to prevent an Internet site or service from functioning efficiently, or at all. DoS attackers typically target sites or services hosted on high-profile web servers such as banks, credit card payment gateways, and even root nameservers. The term is generally used with regard to computer networks, but it is not limited to that field; for example, it is also used in reference to CPU resource management.

    One common method of attack involves saturating the target (victim) machine with external communications requests, such that it cannot respond to legitimate traffic, or responds so slowly as to be rendered effectively unavailable. In general terms, DoS attacks are implemented by forcing the targeted computer(s) to reset, by consuming their resources so that they can no longer provide the intended service, or by obstructing the communication media between the intended users and the victim so that they can no longer communicate adequately.

    Resolution 9: Preventing DoS and DDoS attacks requires the integration of multiple prevention techniques. Firewalls can limit the traffic to the server. Switches and routers that can handle high-volume traffic are often able to detect and limit throughput based on excessive traffic or a detected DoS attack. Front-end application hardware, in conjunction with switches and routers, can classify traffic as normal, priority, or potentially dangerous. IP-based restrictions may help throttle traffic and limit or mitigate DoS attacks. Blackholing (sinkholing) may be used to redirect DoS traffic to dead-end IPs and destinations. Low-volume web environments may make more use of programming-level prevention, but DoS/DDoS prevention is mostly a server- and network-level remediation effort.

    Issue 10: Lacking or Poor Configuration Management
    A poorly or incorrectly configured server and its modules can create multiple security vulnerabilities. Network configuration is also important for securing the server and its environment. A secure web application requires a strong server configuration. Many servers and software packages are distributed with an insecure configuration out of the box. A default server may have many features turned on for convenience, including an open SMTP relay, anonymous FTP access, and other insecure features that may be desirable only after being reconfigured. Some default systems come with pre-installed software packages that should be removed. Operating a server with a default software setup and configuration may create multiple security vulnerabilities that hackers can exploit.

    Resolution 10: Server configuration is not typically the responsibility of the programmers, and should be addressed by server and network administrators. The host server should be assessed for security issues, which will usually reveal what packages are installed and their versions, missing patches and updates or upgrades, and provide a hit list for which packages should be removed, disabled or correctly configured. Before putting the server into production, a security vulnerability scan should be performed, followed by a code review of the web application.

    Original from:

    Posted in Security | Leave a Comment »

    25 Excellent PHP Tools That Enhance The Way You Develop

    Posted by Narendra Dhami on February 10, 2010

    PHP has grown to become one of the most widespread server-side scripting languages, used on a daily basis in a variety of ways. For the most part, almost every developer that’s introduced to this programming language uses it and becomes fond of it at some point. With the support of a very large community, this incredibly fast technology is taking over other server-side scripting languages. If you conduct a quick search on the subject you’ll find countless ready-to-use scripts and a fistful of quality frameworks.

    Here we’ve compiled 25 Excellent PHP Tools That Will Enhance The Way You Develop, in detail. Take a look at a few of these tools, assess which ones address your needs, and take advantage of them. After all, they’re there for us to use.


    PHP_Debug is a very useful open-source tool that allows you to output processing times of your PHP and SQL, check the performance of specific code blocks, and get variable dumps in graphical form.


    PHPUnit is a great testing framework that belongs to the xUnit family of testing frameworks. It’s perfect for writing and running automated tests on your PHP code.


    phpDocumentor, (phpdoc or phpdocu) is an auto-documentation tool for the php language. Similar to Javadoc, and written in php, phpDocumentor can be used from the command line or a web interface to create professional documentation from php source code.


    Scavenger is an open source real-time vulnerability management tool. It helps system administrators respond to vulnerability findings, track vulnerability findings, review accepted or false-positive answered vulnerabilities, and not ‘nag’ system administrators with old vulnerabilities.


    Xdebug is a PHP extension that helps you debug your scripts by providing a lot of valuable debug information. If a script fails to execute properly, Xdebug will print a full stack trace in the error message, along with function names, parameter values, source files, and line numbers.


    MODx helps you take control of your web content. It’s a fast and reliable Open Source PHP application framework, it frees you to build sites exactly how you want and make them 100% yours.

    PHP Object Generator

    PHP Object Generator (POG) is an open source PHP code generator which automatically generates clean & tested Object Oriented code for your PHP4/PHP5 application.


    CakePHP is one of the most widely used rapid development frameworks for PHP. It provides an extensible architecture for developing, maintaining, and deploying applications.

    PHP/SWF Charts

    PHP/SWF Charts is a simple and powerful PHP tool that lets you create attractive web charts and graphs from dynamic data. Use PHP scripts to generate or gather the data from databases, then pass it to this tool to generate Flash (swf) charts and graphs. This tool allows the integration of PHP scripts, and Flash provides the best graphic quality.


    BlueShoes is a comprehensive application framework and content management system. It is written in the widely used web-scripting language PHP. BlueShoes offers excellent support for the popular MySQL database as well as support for Oracle and MSSQL.


    Propel is an open-source Object-Relational Mapping (ORM) library written in PHP5, written on top of PDO. It allows you to access your database using a set of objects, providing a simple API for storing and retrieving data. Propel allows web application developers to work with databases in the same way they work with other classes and objects in PHP.


    PEAR is a framework and distribution system for reusable PHP components. Using its packages, you will be able to automatically format and clean up your PHP4 and PHP5 source code.


    Phing is a PHP project build system or build tool based on Apache Ant. You can do anything with it that you could do with a traditional build system like GNU make, and its use of simple XML build files and extensible PHP “task” classes make it an easy-to-use and highly flexible build framework.


    CodeIgniter is a very powerful PHP framework with a small footprint. It was built for PHP coders who need a simple and elegant toolkit to create full-featured web applications.


    Securimage is an open-source free PHP CAPTCHA script for generating complex images and CAPTCHA codes to protect forms from spam and abuse. It can be easily added into existing forms on your website to provide protection from spam bots.


    SimplePie is a clear-cut PHP class that can help you work with any RSS feed. It is written in PHP and is flexible, so beginners and advanced users alike can benefit from it.


    Pixy is a Java program that performs automatic scans of PHP 4 source code, aimed at the detection of XSS and SQL injection vulnerabilities. Pixy takes a PHP program as input, and creates a report that lists possible vulnerable points in the program, together with additional information for understanding the vulnerability.


    Phormer is a PHP-based photo gallery manager that helps you store, categorize, and trim your photos on the web, with various helpful features.

    xajax PHP Class Library

    xajax is pretty much a PHP class for simplifying your workflow when working with PHP AJAX applications. It gives you an easy-to-use API for quickly managing AJAX-related tasks.


    PHP-GTK is an extension for the PHP programming language that implements language bindings for GTK+. It provides an object-oriented interface to GTK+ classes and functions and greatly simplifies writing client-side cross-platform GUI applications.


    Minify is a PHP 5 app that combines multiple CSS or JavaScript files, removes unnecessary whitespace and comments, and serves them with gzip encoding and optimal client-side cache headers.


    gotAPI is a useful online tool for quickly looking up PHP functions and classes.


    PECL is a very useful directory of all known PHP extensions and a hosting facility for downloading and developing PHP extensions.


    Koders, the leading search engine for open source and other downloadable code, now contains over 2 billion lines of code in its repository. It is very useful when you’re searching for PHP source code to replace or enhance, or when you’re not sure which code to use: you can search, test, and compare.

    Few More

  • Valgrind
  • YSlow
  • Page Speed
  • Original From :

    Posted in PHP | Leave a Comment »

    PHP Frameworks

    Posted by Narendra Dhami on February 10, 2010

    Here is a list of some popular PHP frameworks:

    Posted in PHP | Leave a Comment »

    Color Theory for Designers, Part 1: The Meaning of Color

    Posted by Narendra Dhami on February 10, 2010

    Color in design is very subjective. What evokes one reaction in one person may evoke a very different reaction in someone else. Sometimes this is due to personal preference, and other times due to cultural background. Color theory is a science in itself. Studying how colors affect different people, either individually or as a group, is something some people build their careers on. And there’s a lot to it. Something as simple as changing the exact hue or saturation of a color can evoke a completely different feeling. Cultural differences mean that something that’s happy and uplifting in one country can be depressing in another. Read More …

    Posted in General | Leave a Comment »

    Web Performance Testing – Test objectives and Real Life Monitoring

    Posted by Narendra Dhami on January 28, 2010

    Web Performance Testing is executed to provide accurate information on the readiness of an application through testing the web site and monitoring the server side application. This article describes different techniques and aspects to be considered when performing Performance Testing on Web Applications.

    Author: Robin Bortz, QualiTest Group

    Today, when a telephone system is installed in a new neighborhood, it will always work perfectly. The phone will ring, the line will be OK, and people can communicate with each other even if all the neighbors are speaking at the same time. This is because, over time, algorithms have been developed so that it is known how much hardware is needed to support a specific area. Web technology is very different. There can be a different number of servers, different amounts of resources per server, different and new technologies; many different parameters can be configured to improve performance, and naturally code can always be improved. As the number of mission-critical web applications and sites increases, so too does the need to test these applications in order to meet business and marketing needs. As the web does not command the same loyalty as traditional trading, poor performance will let users easily move over to the competition.

    What is Web Performance Testing?

    Web Performance Testing is executed to provide accurate information on the readiness of an application, through testing the web site and monitoring the server-side application. This is done by simulating load as close as possible to real conditions in order to evaluate whether the application will support the expected load. It allows you to assess system performance, identify possible bottlenecks, and provide useful advice about how to fix problems (tuning system parameters, modifying software, or upgrading hardware).

    Testing is an art and a science, and there may be multiple objectives for testing. It is important to know what the objectives are before testing can begin. Naturally there is always interest in the different page times, and specifically the slowest ones; these slow pages can point to the bottlenecks in the application. An objective may also be to make sure pages are downloaded within a specific time. Companies always want to know what this desired time is, but it differs from application to application. It is also useful to know how much concurrency the application can support and at how many users application performance starts to decline. Does the page time grow from 1 to 3 seconds, which may be acceptable, or to more than 30 seconds? Another objective may be to know the number of users that can crash the application.

    Type of Load Testing

    Once the objectives have been set, then the type of testing can be determined.

    Smoke Test: A simple quick test, to check if the application is really ready to be tested. This is not normally mentioned, but without it, much time and resources are wasted.

    Load Test: This is the simplest form of performance testing. A load test is usually conducted to understand the behavior of the application under a specific expected load. This load can be the expected concurrent number of users on the application performing a specific number of transactions within the set duration. This test will give out the response times of all the important business critical transactions. If the database, application server, etc are also monitored, then this simple test can itself point towards the bottleneck in the application.

    Stress Test: This testing is normally used to break the application. Double the number of users is added to the application and the test is run again until the application breaks down. This kind of test is done to determine the application's robustness in times of extreme load and helps application administrators to determine if the application will perform sufficiently if the current load goes well above the expected load.

    Spike Testing: Spike testing, as the name suggests is done by spiking the number of users and understanding the behavior of the application, whether it will go down or will it be able to handle dramatic changes in load.

    Endurance Testing (Soak Testing): This test is usually done to determine if the application can sustain the continuous expected load. Generally this test is done to determine if there are any memory leaks in the application.

    Model Your Real Life Conditions

    If you want to get an accurate idea of how your Web site is going to perform in the real world, then you need to model the conditions that your site will experience. That means different work flows representing different users and different actions per flow. This will give the confidence that the application will react in a similar way as to the load conditions. In order to model the test correctly and accurately, the following aspects should be considered:

    Configuration management: You must make sure the testing environment is ready for the load that will be generated. Ensure the testing hardware is dedicated to the testing effort. Ensure the database is not changed and that it is populated as it would be in production. When changes are made to the application, make them one at a time between test runs.

    Application Work Flows (Scenarios): This is the most critical aspect of successful application testing, and input should be provided by all teams with the relevant knowledge, whether marketing, sales, or product management. For existing applications, scenarios should be modeled on known usage patterns, drawing on log files or other analytics tools. Load testing is not functional testing, and not every link should be clicked. However, the recorded scenarios should reflect typical usage of the application, and these scenarios should be run in the appropriate proportions.

    Sleep Times: Users spend time on web pages reading, filling in a form, watching a media clip, or interacting with a client-side object before the next click sends a server request. This time parameter is extremely influential on the load created. People interact at different speeds: some read all the content and some scan briefly over the text, and people using the same site repeatedly may click quickly through a number of screens. The sleep time in the test should therefore be realistic, often defined as a random number within a specific range based on what is possible on a specific page. Also note that a test run without sleep times is extremely difficult to quantify in terms of real users; it may be a good stress test but will not provide the answers needed for an accurate performance test. If the tools you are using do not record sleep times, try to obtain this important information from server logs.

    Usage Patterns in Web Applications: The web is not a 9:00 – 5:00 interface for doing business. Web sites may have a global reach or may be accessed only during typical work hours. Some applications are continually accessed, while others may be accessed once per week or less. For ecommerce, users use this always-open interface to shop at any time of day. There are, however, times when there may be fewer or more users, and there may be great variations as well as extreme peaks of load on the application. A marketing campaign for Christmas shopping or a large sporting event with web access can create such peaks. An educational application may have its greatest load at 8:00 in the morning when an online lesson begins. These patterns and peaks should be tested for in order to be ready for changing load as well as peak load periods.

    Browsers: The HTTP header sent together with a web request may result in a different response from the server. A request from a mobile telephone may require different server-side activity and return a different result set than a request from a desktop browser.

    Connection speeds: In early web testing, when homes were connected via a modem at 28.8 Kbps, simulating such a slow speed actually made it easier for web applications to respond: the slow times were a result of the slow connection, not a slow response from the application. Today, with ever-increasing access speeds and fiber to the home, the load should be generated at such speeds, and also at a mix of different speeds simulating different populations.

    Different, Remote Geographic Locations: Typically, web performance testing is done in a lab environment. However, different locations around the world may see longer times due to distance or web traffic at peak hours. There are ways to test this and get a true report of client-side response times. Some tools have a remote roaming client which can be installed anywhere on the web to run at the same time as the load test. If it is an internal application, tests can be run from various locations or branches of the company. You can ask someone at a remote location to access the application while the test is running; this is generally recommended even when testing in the lab. Recently, the development of cloud computing has opened some exciting new possibilities for solving this problem of remote geographical locations, as well as for generating large loads with enough bandwidth. Another option is to use the services of a third-party company that specializes in testing a web site from different locations around the world.

    Poor performance and slow reaction times both have negative consequences for companies. These could be decreased sales in web commerce, customer abandonment, a harmed reputation, and wasted resources. Good test planning, where the objectives have been set and the correct tests are based on those objectives, will ensure that the web application performs as expected. Once these tests are run as close as possible to real-life scenarios, you can be sure that the small insurance premium paid for application testing is a great investment.

    Posted in Testing | Leave a Comment »

    Continuous Integration

    Posted by Narendra Dhami on January 19, 2010

    Building a Feature with Continuous Integration

    The easiest way for me to explain what CI is and how it works is to show a quick example of how it works with the development of a small feature. Let’s assume I have to do something to a piece of software, it doesn’t really matter what the task is, for the moment I’ll assume it’s small and can be done in a few hours. (We’ll explore longer tasks, and other issues later on.)

    I begin by taking a copy of the current integrated source onto my local development machine. I do this by using a source code management system to check out a working copy from the mainline.

    The above paragraph will make sense to people who use source code control systems, but be gibberish to those who don’t. So let me quickly explain that for the latter. A source code control system keeps all of a project’s source code in a repository. The current state of the system is usually referred to as the ‘mainline’. At any time a developer can make a controlled copy of the mainline onto their own machine, this is called ‘checking out’. The copy on the developer’s machine is called a ‘working copy’. (Most of the time you actually update your working copy to the mainline – in practice it’s the same thing.)

    Now I take my working copy and do whatever I need to do to complete my task. This will consist of both altering the production code, and also adding or changing automated tests. Continuous Integration assumes a high degree of tests which are automated into the software: a facility I call self-testing code. Often these use a version of the popular XUnit testing frameworks.

    Once I’m done (and usually at various points when I’m working) I carry out an automated build on my development machine. This takes the source code in my working copy, compiles and links it into an executable, and runs the automated tests. Only if it all builds and tests without errors is the overall build considered to be good.

    With a good build, I can then think about committing my changes into the repository. The twist, of course, is that other people may, and usually have, made changes to the mainline before I get chance to commit. So first I update my working copy with their changes and rebuild. If their changes clash with my changes, it will manifest as a failure either in the compilation or in the tests. In this case it’s my responsibility to fix this and repeat until I can build a working copy that is properly synchronized with the mainline.

    Once I have made my own build of a properly synchronized working copy I can then finally commit my changes into the mainline, which then updates the repository.

    However my commit doesn’t finish my work. At this point we build again, but this time on an integration machine based on the mainline code. Only when this build succeeds can we say that my changes are done. There is always a chance that I missed something on my machine and the repository wasn’t properly updated. Only when my committed changes build successfully on the integration machine is my job done. This integration build can be executed manually by me, or done automatically by Cruise.

    If a clash occurs between two developers, it is usually caught when the second developer to commit builds their updated working copy. If not the integration build should fail. Either way the error is detected rapidly. At this point the most important task is to fix it, and get the build working properly again. In a Continuous Integration environment you should never have a failed integration build stay failed for long. A good team should have many correct builds a day. Bad builds do occur from time to time, but should be quickly fixed.

    The result of doing this is that there is a stable piece of software that works properly and contains few bugs. Everybody develops off that shared stable base and never gets so far away from that base that it takes very long to integrate back with it. Less time is spent trying to find bugs because they show up quickly.

    Practices of Continuous Integration

    The story above is the overview of CI and how it works in daily life. Getting all this to work smoothly is obviously rather more than that. I’ll focus now on the key practices that make up effective CI.

    Maintain a Single Source Repository.

    Software projects involve lots of files that need to be orchestrated together to build a product. Keeping track of all of these is a major effort, particularly when there’s multiple people involved. So it’s not surprising that over the years software development teams have built tools to manage all this. These tools – called Source Code Management tools, configuration management, version control systems, repositories, or various other names – are an integral part of most development projects. The sad and surprising thing is that they aren’t part of all projects. It is rare, but I do run into projects that don’t use such a system and use some messy combination of local and shared drives.

    So as a simple basis make sure you get a decent source code management system. Cost isn’t an issue as good quality open-source tools are available. The current open source repository of choice is Subversion. (The older open-source tool CVS is still widely used, and is much better than nothing, but Subversion is the modern choice.) Interestingly as I talk to developers I know most commercial source code management tools are liked less than Subversion. The only tool I’ve consistently heard people say is worth paying for is Perforce.

    Once you get a source code management system, make sure it is the well known place for everyone to go get source code. Nobody should ever ask “where is the foo-whiffle file?” Everything should be in the repository.

    Although many teams use repositories a common mistake I see is that they don’t put everything in the repository. If people use one they’ll put code in there, but everything you need to do a build should be in there including: test scripts, properties files, database schema, install scripts, and third party libraries. I’ve known projects that check their compilers into the repository (important in the early days of flaky C++ compilers). The basic rule of thumb is that you should be able to walk up to the project with a virgin machine, do a checkout, and be able to fully build the system. Only a minimal amount of things should be on the virgin machine – usually things that are large, complicated to install, and stable. An operating system, Java development environment, or base database system are typical examples.

    You must put everything required for a build in the source control system, however you may also put other stuff that people generally work with in there too. IDE configurations are good to put in there because that way it’s easy for people to share the same IDE setups.

    One of the features of version control systems is that they allow you to create multiple branches, to handle different streams of development. This is a useful, nay essential, feature – but it’s frequently overused and gets people into trouble. Keep your use of branches to a minimum. In particular have a mainline: a single branch of the project currently under development. Pretty much everyone should work off this mainline most of the time. (Reasonable branches are bug fixes of prior production releases and temporary experiments.)

    In general you should store in source control everything you need to build anything, but nothing that you actually build. Some people do keep the build products in source control, but I consider that to be a smell – an indication of a deeper problem, usually an inability to reliably recreate builds.

    Automate the Build

    Getting the sources turned into a running system can often be a complicated process involving compilation, moving files around, loading schemas into the databases, and so on. However like most tasks in this part of software development it can be automated – and as a result should be automated. Asking people to type in strange commands or clicking through dialog boxes is a waste of time and a breeding ground for mistakes.

    Automated environments for builds are a common feature of systems. The Unix world has had make for decades, the Java community developed Ant, the .NET community has had Nant and now has MSBuild. Make sure you can build and launch your system using these scripts using a single command.

    A common mistake is not to include everything in the automated build. The build should include getting the database schema out of the repository and firing it up in the execution environment. I’ll elaborate my earlier rule of thumb: anyone should be able to bring in a virgin machine, check the sources out of the repository, issue a single command, and have a running system on their machine.

    Build scripts come in various flavors and are often particular to a platform or community, but they don’t have to be. Although most of our Java projects use Ant, some have used Ruby (the Ruby Rake system is a very nice build script tool). We got a lot of value from automating an early Microsoft COM project with Ant.

    A big build often takes time, and you don’t want to do all of these steps if you’ve only made a small change. So a good build tool analyzes what needs to be changed as part of the process. The common way to do this is to check the dates of the source and object files and only compile if the source date is later. Dependencies then get tricky: if one object file changes, those that depend on it may also need to be rebuilt. Compilers may handle this kind of thing, or they may not.

    Depending on what you need, you may need different kinds of things to be built. You can build a system with or without test code, or with different sets of tests. Some components can be built stand-alone. A build script should allow you to build alternative targets for different cases.

    Many of us use IDEs, and most IDEs have some kind of build management process within them. However these files are always proprietary to the IDE and often fragile. Furthermore they need the IDE to work. It’s okay for IDE users to set up their own project files and use them for individual development. However it’s essential to have a master build that is usable on a server and runnable from other scripts. So on a Java project we’re okay with having developers build in their IDE, but the master build uses Ant to ensure it can be run on the development server.

    Make Your Build Self-Testing

    Traditionally a build means compiling, linking, and all the additional stuff required to get a program to execute. A program may run, but that doesn’t mean it does the right thing. Modern statically typed languages can catch many bugs, but far more slip through that net.

    A good way to catch bugs more quickly and efficiently is to include automated tests in the build process. Testing isn’t perfect, of course, but it can catch a lot of bugs – enough to be useful. In particular the rise of Extreme Programming (XP) and Test Driven Development (TDD) have done a great deal to popularize self-testing code and as a result many people have seen the value of the technique.

    Regular readers of my work will know that I’m a big fan of both TDD and XP, however I want to stress that neither of these approaches are necessary to gain the benefits of self-testing code. Both of these approaches make a point of writing tests before you write the code that makes them pass – in this mode the tests are as much about exploring the design of the system as they are about bug catching. This is a Good Thing, but it’s not necessary for the purposes of Continuous Integration, where we have the weaker requirement of self-testing code. (Although TDD is my preferred way of producing self-testing code.)

    For self-testing code you need a suite of automated tests that can check a large part of the code base for bugs. The tests need to be able to be kicked off from a simple command and to be self-checking. The result of running the test suite should indicate if any tests failed. For a build to be self-testing the failure of a test should cause the build to fail.

    Over the last few years the rise of TDD has popularized the XUnit family of open-source tools which are ideal for this kind of testing. XUnit tools have proved very valuable to us at ThoughtWorks and I always suggest to people that they use them. These tools, pioneered by Kent Beck, make it very easy for you to set up a fully self-testing environment.

    XUnit tools are certainly the starting point for making your code self-testing. You should also look out for other tools that focus on more end-to-end testing; there’s quite a range of these out there at the moment, including FIT, Selenium, Sahi, Watir, FitNesse, and plenty of others that I’m not trying to comprehensively list here.

    Of course you can’t count on tests to find everything. As it’s often been said: tests don’t prove the absence of bugs. However perfection isn’t the only point at which you get payback for a self-testing build. Imperfect tests, run frequently, are much better than perfect tests that are never written at all.

    Everyone Commits To the Mainline Every Day

    Integration is primarily about communication. Integration allows developers to tell other developers about the changes they have made. Frequent communication allows people to know quickly as changes develop.

    The one prerequisite for a developer committing to the mainline is that they can correctly build their code. This, of course, includes passing the build tests. As with any commit cycle the developer first updates their working copy to match the mainline, resolves any conflicts with the mainline, then builds on their local machine. If the build passes, then they are free to commit to the mainline.

    By doing this frequently, developers quickly find out if there’s a conflict between two developers. The key to fixing problems quickly is finding them quickly. With developers committing every few hours, a conflict can be detected within a few hours of it occurring; at that point not much has happened and it’s easy to resolve. Conflicts that stay undetected for weeks can be very hard to resolve.

    The fact that you build when you update your working copy means that you detect compilation conflicts as well as textual conflicts. Since the build is self-testing, you also detect conflicts in the running of the code. The latter conflicts are particularly awkward bugs to find if they sit for a long time undetected in the code. Since there’s only a few hours of changes between commits, there’s only so many places where the problem could be hiding. Furthermore since not much has changed you can use diff-debugging to help you find the bug.

    My general rule of thumb is that every developer should commit to the repository every day. In practice it’s often useful if developers commit more frequently than that. The more frequently you commit, the fewer places you have to look for conflict errors, and the more rapidly you fix conflicts.

    Frequent commits encourage developers to break down their work into small chunks of a few hours each. This helps track progress and provides a sense of progress. Often people initially feel they can’t do something meaningful in just a few hours, but we’ve found that mentoring and practice helps them learn.

    Every Commit Should Build the Mainline on an Integration Machine

    Using daily commits, a team gets frequent tested builds. This ought to mean that the mainline stays in a healthy state. In practice, however, things still do go wrong. One reason is discipline, people not doing an update and build before they commit. Another is environmental differences between developers’ machines.

    As a result you should ensure that regular builds happen on an integration machine and only if this integration build succeeds should the commit be considered to be done. Since the developer who commits is responsible for this, that developer needs to monitor the mainline build so they can fix it if it breaks. A corollary of this is that you shouldn’t go home until the mainline build has passed with any commits you’ve added late in the day.

    There are two main ways I’ve seen to ensure this: using a manual build or a continuous integration server.

    The manual build approach is the simplest one to describe. Essentially it’s a similar thing to the local build that a developer does before the commit into the repository. The developer goes to the integration machine, checks out the head of the mainline (which now houses his last commit) and kicks off the integration build. He keeps an eye on its progress, and if the build succeeds he’s done with his commit. (Also see Jim Shore’s description.)

    A continuous integration server acts as a monitor to the repository. Every time a commit against the repository finishes the server automatically checks out the sources onto the integration machine, initiates a build, and notifies the committer of the result of the build. The committer isn’t done until she gets the notification – usually an email.
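    Hudson or CruiseControl is the right tool for this job, but the mechanism itself is simple. Here is a deliberately toy PHP sketch of the loop a CI server runs; the repository layout, build command, working-copy path, and mail address are all assumptions for illustration:

        <?php
        // ci_poller.php -- toy illustration of a CI server's loop: notice a new
        // commit on the mainline, update, run the master build, notify the team.
        // Use a real CI server in practice; this only shows the mechanism.
        $workingCopy = '/var/ci/checkout';         // assumed integration working copy (a git clone)
        $buildCmd    = 'phing -f build.xml ci';    // assumed master build command
        $notify      = 'team@example.com';

        chdir($workingCopy);
        $lastSeen = '';
        while (true) {
            // Ask the remote repository for the tip of the mainline.
            $head = strtok(shell_exec('git ls-remote origin refs/heads/master'), "\t");
            if ($head && $head !== $lastSeen) {
                $lastSeen = $head;
                shell_exec('git pull origin master');      // bring the sources up to date
                $output = array();
                exec($buildCmd, $output, $status);         // run the master build
                $result = ($status === 0) ? 'SUCCESS' : 'FAILURE';
                mail($notify, "CI build $result for commit $head", implode("\n", $output));
            }
            sleep(60);                                     // poll once a minute
        }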

    At ThoughtWorks, we’re big fans of continuous integration servers – indeed we led the original development of CruiseControl and CruiseControl.NET, the widely used open-source CI servers. Since then we’ve also built the commercial Cruise CI server. We use a CI server on nearly every project we do and have been very happy with the results.

    Not everyone prefers to use a CI server. Jim Shore gave a well argued description of why he prefers the manual approach. I agree with him that CI is much more than just installing some software. All the practices here need to be in play to do Continuous Integration effectively. But equally many teams who do CI well find a CI server to be a helpful tool.

    Many organizations do regular builds on a timed schedule, such as every night. This is not the same thing as a continuous build and isn’t enough for continuous integration. The whole point of continuous integration is to find problems as soon as you can. Nightly builds mean that bugs lie undetected for a whole day before anyone discovers them. Once they are in the system that long, it takes a long time to find and remove them.

    A key part of doing a continuous build is that if the mainline build fails, it needs to be fixed right away. The whole point of working with CI is that you’re always developing on a known stable base. It’s not a bad thing for the mainline build to break, although if it’s happening all the time it suggests people aren’t being careful enough about updating and building locally before a commit. When the mainline build does break, however, it’s important that it gets fixed fast. To help avoid breaking the mainline you might consider using a pending head.

    When teams are introducing CI, this is often one of the hardest things to sort out. Early on, a team can struggle to get into the regular habit of working mainline builds, particularly if they are working on an existing code base. Patience and steady application do seem to do the trick regularly, so don’t get discouraged.

    Keep the Build Fast

    The whole point of Continuous Integration is to provide rapid feedback. Nothing sucks the blood of a CI activity more than a build that takes a long time. Here I must admit a certain crotchety old guy amusement at what’s considered to be a long build. Most of my colleagues consider a build that takes an hour to be totally unreasonable. I remember teams dreaming that they could get it so fast – and occasionally we still run into cases where it’s very hard to get builds to that speed.

    For most projects, however, the XP guideline of a ten minute build is perfectly within reason. Most of our modern projects achieve this. It’s worth putting in concentrated effort to make it happen, because every minute you take off the build time is a minute saved for each developer every time they commit. Since CI demands frequent commits, this adds up to a lot of time.

    If you’re staring at a one hour build time, then getting to a faster build may seem like a daunting prospect. It can even be daunting to work on a new project and think about how to keep things fast. For enterprise applications, at least, we’ve found the usual bottleneck is testing – particularly tests that involve external services such as a database.

    Probably the most crucial step is to start working on setting up a staged build. The idea behind a staged build (also known as build pipeline) is that there are in fact multiple builds done in sequence. The commit to the mainline triggers the first build – what I call the commit build. The commit build is the build that’s needed when someone commits to the mainline. The commit build is the one that has to be done quickly, as a result it will take a number of shortcuts that will reduce the ability to detect bugs. The trick is to balance the needs of bug finding and speed so that a good commit build is stable enough for other people to work on.

    Once the commit build is good then other people can work on the code with confidence. However there are further, slower, tests that you can start to do. Additional machines can run further testing routines on the build that take longer to do.

    A simple example of this is a two stage build. The first stage would do the compilation and run tests that are more localized unit tests with the database completely stubbed out. Such tests can run very fast, keeping within the ten minute guideline. However any bugs that involve larger scale interactions, particularly those involving the real database, won’t be found. The second stage build runs a different suite of tests that do hit the real database and involve more end-to-end behavior. This suite might take a couple of hours to run.
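    In PHP one way to express this split is with PHPUnit’s @group annotation: the commit build runs "phpunit --exclude-group db ExampleTests.php" while the secondary build runs the same file with "--group db". A sketch, using an in-memory SQLite connection only so the example stays self-contained (in a real secondary build this would be the project’s MySQL test database):

        <?php
        // ExampleTests.php -- splitting tests into a fast commit suite and a
        // slower secondary suite with PHPUnit @group annotations.
        //
        //   Commit build:    phpunit --exclude-group db ExampleTests.php
        //   Secondary build: phpunit --group db ExampleTests.php

        class StringFormattingTest extends PHPUnit_Framework_TestCase
        {
            // Fast, purely in-memory test: part of the commit suite by default.
            public function testWordsAreCapitalised()
            {
                $this->assertEquals('Continuous Integration', ucwords('continuous integration'));
            }
        }

        /**
         * @group db
         */
        class DatabaseRoundTripTest extends PHPUnit_Framework_TestCase
        {
            // Slower test that exercises a real database connection.
            public function testRowSurvivesInsertAndSelect()
            {
                $pdo = new PDO('sqlite::memory:');
                $pdo->exec('CREATE TABLE builds (id INTEGER PRIMARY KEY, label TEXT)');
                $pdo->exec("INSERT INTO builds (label) VALUES ('build-42')");
                $this->assertEquals('build-42', $pdo->query('SELECT label FROM builds')->fetchColumn());
            }
        }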

    In this scenario people use the first stage as the commit build and use this as their main CI cycle. The second-stage build is a secondary build which runs when it can, picking up the executable from the latest good commit build for further testing. If the secondary build fails, then this doesn’t have the same ‘stop everything’ quality, but the team does aim to fix such bugs as rapidly as possible, while keeping the commit build running. Indeed the secondary build doesn’t have to stay good, as long as each known bug is identified and dealt with within the next few days. As in this example, secondary builds are often pure tests, since these days it’s usually tests that cause the slowness.

    If the secondary build detects a bug, that’s a sign that the commit build could do with another test. As much as possible you want to ensure that any secondary build failure leads to a new test in the commit build that would have caught the bug, so the bug stays fixed in the commit build. This way the commit tests are strengthened whenever something gets past them. There are cases where there’s no way to build a fast-running test that exposes the bug, so you may decide to only test for that condition in the secondary build. Most of the time, fortunately, you can add suitable tests to the commit build.

    This example is of a two-stage build, but the basic principle can be extended to any number of later builds. The builds after the commit build can also be done in parallel, so if you have two hours of secondary tests you can improve responsiveness by having two machines that run half the tests each. By using parallel secondary builds like this you can introduce all sorts of further automated testing, including performance testing, into the regular build process. (I’ve run into a lot of interesting techniques around this as I’ve visited various ThoughtWorks projects over the last couple of years – I’m hoping to persuade some of the developers to write these up.)

    Test in a Clone of the Production Environment

    The point of testing is to flush out, under controlled conditions, any problem that the system will have in production. A significant part of this is the environment within which the production system will run. If you test in a different environment, every difference results in a risk that what happens under test won’t happen in production.

    As a result you want to set up your test environment to be as exact a mimic of your production environment as possible. Use the same database software, with the same versions; use the same version of the operating system. Put all the appropriate libraries that are in the production environment into the test environment, even if the system doesn’t actually use them. Use the same IP addresses and ports, and run it on the same hardware.

    Well, in reality there are limits. If you’re writing desktop software it’s not practicable to test in a clone of every possible desktop with all the third party software that different people are running. Similarly some production environments may be prohibitively expensive to duplicate (although I’ve often come across false economies by not duplicating moderately expensive environments). Despite these limits your goal should still be to duplicate the production environment as much as you can, and to understand the risks you are accepting for every difference between test and production.

    If you have a pretty simple setup without many awkward communications, you may be able to run your commit build in a mimicked environment. Often, however, you need to use test doubles because systems respond slowly or intermittently. As a result it’s common to have a very artificial environment for the commit tests for speed, and use a production clone for secondary testing.
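    A small PHP sketch of the idea: the code under test depends on an interface, so the commit tests can supply a fast in-memory double, while only the secondary suite talks to the real external service. All the class names are invented for the example.

        <?php
        // The commit build uses a fake gateway; only the secondary build would use
        // an implementation that calls the real, slow external service.

        interface ExchangeRateGateway
        {
            public function rateFor($currency);
        }

        // Fast fake used by the commit tests.
        class FakeExchangeRateGateway implements ExchangeRateGateway
        {
            private $rates;
            public function __construct(array $rates) { $this->rates = $rates; }
            public function rateFor($currency)        { return $this->rates[$currency]; }
        }

        class PriceConverter
        {
            private $gateway;
            public function __construct(ExchangeRateGateway $gateway) { $this->gateway = $gateway; }
            public function toEuros($amount, $currency)
            {
                return $amount * $this->gateway->rateFor($currency);
            }
        }

        class PriceConverterTest extends PHPUnit_Framework_TestCase
        {
            public function testConvertsUsingTheSuppliedRate()
            {
                $converter = new PriceConverter(new FakeExchangeRateGateway(array('USD' => 0.8)));
                $this->assertEquals(80.0, $converter->toEuros(100, 'USD'));
            }
        }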

    I’ve noticed a growing interest in using virtualization to make it easy to put together test environments. Virtualized machines can be saved with all the necessary elements baked into the virtualization. It’s then relatively straightforward to install the latest build and run tests. Furthermore this can allow you to run multiple tests on one machine, or simulate multiple machines in a network on a single machine. As the performance penalty of virtualization decreases, this option makes more and more sense.

    Make it Easy for Anyone to Get the Latest Executable

    One of the most difficult parts of software development is making sure that you build the right software. We’ve found that it’s very hard to specify what you want in advance and be correct; people find it much easier to see something that’s not quite right and say how it needs to be changed. Agile development processes explicitly expect and take advantage of this part of human behavior.

    To help make this work, anyone involved with a software project should be able to get the latest executable and be able to run it: for demonstrations, exploratory testing, or just to see what changed this week.

    Doing this is pretty straightforward: make sure there’s a well known place where people can find the latest executable. It may be useful to put several executables in such a store. For the very latest you should put the latest executable to pass the commit tests – such an executable should be pretty stable providing the commit suite is reasonably strong.
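    A minimal sketch of such a store in PHP (the paths are assumptions): after a green commit build, copy the packaged build into a shared, well-known directory and keep a ‘latest’ pointer.

        <?php
        // publish_build.php -- run after a successful commit build.
        // Usage: php publish_build.php build/myapp-r1234.tar.gz
        $artifact = $argv[1];
        $store    = '/shared/builds';                  // the well-known place (assumption)
        $target   = $store . '/' . basename($artifact);

        if (!copy($artifact, $target)) {
            fwrite(STDERR, "Could not publish $artifact\n");
            exit(1);
        }

        // 'latest' always points at the newest build that passed the commit tests.
        @unlink($store . '/latest');
        symlink($target, $store . '/latest');
        echo "Published " . basename($artifact) . " and updated 'latest'\n";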

    If you are following a process with well defined iterations, it’s usually wise to also put the end of iteration builds there too. Demonstrations, in particular, need software whose features are familiar, so then it’s usually worth sacrificing the very latest for something that the demonstrator knows how to operate.

    Everyone can see what’s happening

    Continuous Integration is all about communication, so you want to ensure that everyone can easily see the state of the system and the changes that have been made to it.

    One of the most important things to communicate is the state of the mainline build. If you’re using Cruise there’s a built-in web site that will show you if there’s a build in progress and what was the state of the last mainline build. Many teams like to make this even more apparent by hooking up a continuous display to the build system – lights that glow green when the build works, or red if it fails, are popular. A particularly common touch is red and green lava lamps – not only do these indicate the state of the build, but also how long it’s been in that state. Bubbles on a red lamp indicate the build’s been broken for too long. Each team makes its own choices on these build sensors – it’s good to be playful with your choice (recently I saw someone experimenting with a dancing rabbit).

    If you’re using a manual CI process, this visibility is still essential. The monitor of the physical build machine can show the status of the mainline build. Often you have a build token to put on the desk of whoever’s currently doing the build (again something silly like a rubber chicken is a good choice). Often people like to make a simple noise on good builds, like ringing a bell.

    CI servers’ web pages can carry more information than this, of course. Cruise provides an indication not just of who is building, but what changes they made. Cruise also provides a history of changes, allowing team members to get a good sense of recent activity on the project. I know team leads who like to use this to get a sense of what people have been doing and keep a sense of the changes to the system.

    Another advantage of using a web site is that those that are not co-located can get a sense of the project’s status. In general I prefer to have everyone actively working on a project sitting together, but often there are peripheral people who like to keep an eye on things. It’s also useful for groups to aggregate together build information from multiple projects – providing a simple and automated status of different projects.

    Good information displays are not only those on a computer screen. One of my favorite displays was for a project that was getting into CI. It had a long history of being unable to make stable builds. We put a calendar on the wall that showed a full year with a small square for each day. Every day the QA group would put a green sticker on the day if they had received one stable build that passed the commit tests, otherwise a red one. Over time the calendar revealed the state of the build process, showing a steady improvement until green squares were so common that the calendar disappeared – its purpose fulfilled.

    Automate Deployment

    To do Continuous Integration you need multiple environments, one to run commit tests, one or more to run secondary tests. Since you are moving executables between these environments multiple times a day, you’ll want to do this automatically. So it’s important to have scripts that will allow you to deploy the application into any environment easily.

    A natural consequence of this is that you should also have scripts that allow you to deploy into production with similar ease. You may not be deploying into production every day (although I’ve run into projects that do), but automatic deployment helps both speed up the process and reduce errors. It’s also a cheap option since it just uses the same capabilities that you use to deploy into test environments.

    If you deploy into production, one extra automated capability you should consider is automated rollback. Bad things do happen from time to time, and if smelly brown substances hit rotating metal, it’s good to be able to quickly go back to the last known good state. Being able to automatically revert also reduces a lot of the tension of deployment, encouraging people to deploy more frequently and thus get new features out to users quickly. (The Ruby on Rails community developed a tool called Capistrano that is a good example of a tool that does this sort of thing.)
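    The symlink-switching scheme that Capistrano popularized is easy to sketch: each deploy unpacks into its own timestamped release directory and a ‘current’ link is flipped to it, so rollback is just flipping the link back. A PHP sketch, with the directory layout as an assumption:

        <?php
        // deploy.php -- Capistrano-style deploy and rollback sketch.
        // Usage: php deploy.php deploy /path/to/build.tar.gz
        //        php deploy.php rollback
        $base     = '/var/www/myapp';                  // assumed layout
        $releases = "$base/releases";
        $current  = "$base/current";

        function activate($release, $current)
        {
            @unlink($current);                         // flip the 'current' symlink
            symlink($release, $current);
            echo "Now serving " . basename($release) . "\n";
        }

        if ($argv[1] === 'deploy') {
            $release = $releases . '/' . date('YmdHis');
            mkdir($release, 0755, true);
            // Unpack the new build into its own release directory.
            passthru('tar -xzf ' . escapeshellarg($argv[2]) . ' -C ' . escapeshellarg($release), $status);
            if ($status !== 0) {
                exit(1);
            }
            activate($release, $current);
        } elseif ($argv[1] === 'rollback') {
            // Point 'current' back at the previous release directory.
            $all = glob("$releases/*", GLOB_ONLYDIR);
            sort($all);
            if (count($all) >= 2) {
                activate($all[count($all) - 2], $current);
            }
        }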

    In clustered environments I’ve seen rolling deployments where the new software is deployed to one node at a time, gradually replacing the application over the course of a few hours.

    See Related Article: Evolutionary Database Design

    A common roadblock for many people doing frequent releases is database migration. Database changes are awkward because you can’t just change database schemas; you also have to ensure data is correctly migrated. This article describes techniques used by my colleague Pramod Sadalage to do automated refactoring and migration of databases. The article is an early attempt to capture the information that’s described in more detail by Pramod and Scott Ambler’s book on refactoring databases [ambler-sadalage].
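    The mechanical core of automated migration is small, whatever tool you use: keep the schema changes as numbered scripts in source control, record in the database which ones have been applied, and apply the missing ones as part of the automated deployment. A PHP sketch, where the connection details, directory, and table names are all assumptions:

        <?php
        // migrate.php -- minimal migration runner: applies any numbered SQL script
        // in db/migrations/ (e.g. 001_create_users.sql) that this database hasn't seen.
        $pdo = new PDO('mysql:host=localhost;dbname=myapp', 'myapp', 'secret');
        $pdo->exec('CREATE TABLE IF NOT EXISTS schema_version (version INT PRIMARY KEY)');

        $applied = $pdo->query('SELECT version FROM schema_version')->fetchAll(PDO::FETCH_COLUMN);

        foreach (glob('db/migrations/*.sql') as $file) {
            $version = (int) basename($file);              // leading number of the file name
            if (in_array($version, $applied)) {
                continue;                                  // already run against this database
            }
            echo "Applying $file\n";
            $pdo->exec(file_get_contents($file));          // run the schema change
            $pdo->exec("INSERT INTO schema_version (version) VALUES ($version)");
        }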

    A particularly interesting variation of this that I’ve come across with public web applications is the idea of deploying a trial build to a subset of users. The team then sees how the trial build is used before deciding whether to deploy it to the full user population. This allows you to test out new features and user interfaces before committing to a final choice. Automated deployment, tied into good CI discipline, is essential to making this work.

    Benefits of Continuous Integration

    On the whole I think the greatest and most wide ranging benefit of Continuous Integration is reduced risk. My mind still floats back to that early software project I mentioned in my first paragraph. There they were at the end (they hoped) of a long project, yet with no real idea of how long it would be before they were done.

    The trouble with deferred integration is that it’s very hard to predict how long it will take to do, and worse, it’s very hard to see how far you are through the process. The result is that you are putting yourself into a complete blind spot right at one of the tensest parts of a project – even if you’re one of the rare cases where you aren’t already late.

    Continuous Integration completely finesses this problem. There’s no long integration, you completely eliminate the blind spot. At all times you know where you are, what works, what doesn’t, the outstanding bugs you have in your system.

    Bugs – these are the nasty things that destroy confidence and mess up schedules and reputations. Bugs in deployed software make users angry with you. Bugs in work in progress get in your way, making it harder to get the rest of the software working correctly.

    Continuous Integration doesn’t get rid of bugs, but it does make them dramatically easier to find and remove. In this respect it’s rather like self-testing code. If you introduce a bug and detect it quickly it’s far easier to get rid of. Since you’ve only changed a small bit of the system, you don’t have far to look. Since that bit of the system is the bit you just worked with, it’s fresh in your memory – again making it easier to find the bug. You can also use diff debugging – comparing the current version of the system to an earlier one that didn’t have the bug.

    Bugs are also cumulative. The more bugs you have, the harder it is to remove each one. This is partly because you get bug interactions, where failures show as the result of multiple faults – making each fault harder to find. It’s also psychological – people have less energy to find and get rid of bugs when there are many of them – a phenomenon that the Pragmatic Programmers call the Broken Windows syndrome.

    As a result, projects with Continuous Integration tend to have dramatically fewer bugs, both in production and in process. However I should stress that the degree of this benefit is directly tied to how good your test suite is. You should find that it’s not too difficult to build a test suite that makes a noticeable difference. Usually, however, it takes a while before a team really gets to the low level of bugs that they have the potential to reach. Getting there means constantly working on and improving your tests.

    If you have continuous integration, it removes one of the biggest barriers to frequent deployment. Frequent deployment is valuable because it allows your users to get new features more rapidly, to give more rapid feedback on those features, and generally become more collaborative in the development cycle. This helps break down the barriers between customers and development – barriers which I believe are the biggest barriers to successful software development.

    Introducing Continuous Integration

    So you fancy trying out Continuous Integration – where do you start? The full set of practices I outlined above give you the full benefits – but you don’t need to start with all of them.

    There’s no fixed recipe here – much depends on the nature of your setup and team. But here are a few things that we’ve learned to get things going.

    One of the first steps is to get the build automated. Get everything you need into source control and get it so that you can build the whole system with a single command. For many projects this is not a minor undertaking – yet it’s essential for any of the other things to work. Initially you may only do the build occasionally on demand, or just do an automated nightly build. While these aren’t continuous integration, an automated nightly build is a fine step on the way.

    Introduce some automated testing into your build. Try to identify the major areas where things go wrong and get automated tests to expose those failures. Particularly on an existing project it’s hard to get a really good suite of tests going rapidly – it takes time to build tests up. You have to start somewhere though – all those cliches about Rome’s build schedule apply.

    Try to speed up the commit build. Continuous Integration on a build of a few hours is better than nothing, but getting down to that magic ten minute number is much better. This usually requires some pretty serious surgery on your code base to do as you break dependencies on slow parts of the system.

    If you are starting a new project, begin with Continuous Integration from the beginning. Keep an eye on build times and take action as soon as you start going slower than the ten minute rule. By acting quickly you’ll make the necessary restructurings before the code base gets so big that it becomes a major pain.

    Above all get some help. Find someone who has done Continuous Integration before to help you. Like any new technique it’s hard to introduce it when you don’t know what the final result looks like. It may cost money to get a mentor, but you’ll also pay in lost time and productivity if you don’t do it. (Disclaimer / Advert – yes we at ThoughtWorks do some consultancy in this area. After all we’ve made most of the mistakes that there are to make.)

    Final Thoughts

    In the years since Matt and I wrote the original paper on this site, Continuous Integration has become a mainstream technique for software development. Hardly any ThoughtWorks project goes without it – and we see others using CI all over the world. I’ve hardly ever heard negative things about the approach – unlike some of the more controversial Extreme Programming practices.

    If you’re not using Continuous Integration I strongly urge you to give it a try. If you are, maybe there are some ideas in this article that can help you do it more effectively. We’ve learned a lot about Continuous Integration in the last few years, and I hope there’s still more to learn and improve.

    Further Reading

    An essay like this can only cover so much ground. To explore Continuous Integration in more detail I suggest taking a look at Paul Duvall’s appropriately titled book on the subject (which won a Jolt award – more than I’ve ever managed). Not much has been written on staged builds so far, but there’s an essay by Dave Farley in The ThoughtWorks Anthology that’s useful (also available here).

    Original from

    Posted in Testing | Leave a Comment »

    PHP Patterns

    Posted by Narendra Dhami on January 18, 2010

    Below is a good explanation of the design patterns that can be implemented in PHP:

    Posted in PHP | Leave a Comment »

    Tips on MySQL

    Posted by Narendra Dhami on January 18, 2010

    Posted in MySQL | Leave a Comment »