The age of data science is here, and it is expanding into all areas of software development, including testing. This shouldn’t come as a surprise, since the two have a lot in common.
Testing is all about data and patterns, which puts testers in a good position to build on what they know with insights from the new discipline. Yet many testers fear this transition because they perceive it as a threat to their position. This shouldn’t be the case. A senior tester from A1QA explained that the general attitude towards data science should be more relaxed and curious. By becoming knowledgeable about data science, testers can sharpen their skills and extend their testing with new tools and methods.
The intersection of data science and testing
The importance of data science for testing is easiest to assess by starting with the five basic questions it can answer, because at its core it is that simple. A tester who is asked to use data science only needs to decide whether the problem at hand fits into one of these categories and then apply the corresponding algorithms.
Classification: Is this A or B?
Classification algorithms are a good fit for questions with a small, fixed set of possible answers. Usually there are two (yes/no), but some problems have more options, which is known as multi-class classification. Classification works best when the set of possibilities is small.
By learning classification algorithms, testers can sort errors into types and even narrow down likely causes. Since there are many classification algorithms, selecting the best one can be a problem. Luckily, recent research shows that a method called Active Testing can designate the winner in a tournament-like fashion: at each step, it compares outcomes and moves forward with the winner.
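The tournament idea can be illustrated with a minimal Python sketch. It is not the published Active Testing procedure, only a plain "champion versus challenger" loop over a handful of rule-based classifiers. The validation data, the feature choice (test duration and retry count), and all function names are hypothetical.

```python
# Hypothetical validation data: (test duration in seconds, retry count) -> label.
# 1 = environment failure, 0 = genuine product defect. All values are made up.
validation = [
    ((30.0, 3), 1),
    ((28.5, 2), 1),
    ((2.1, 0), 0),
    ((3.4, 0), 0),
    ((25.0, 0), 1),
    ((1.8, 1), 0),
]

# Three hypothetical candidate classifiers, each a simple rule.
def by_duration(features):
    duration, _ = features
    return 1 if duration > 10 else 0

def by_retries(features):
    _, retries = features
    return 1 if retries >= 1 else 0

def always_defect(features):
    return 0

def accuracy(classifier, data):
    """Share of examples for which the classifier predicts the right label."""
    correct = sum(1 for features, label in data if classifier(features) == label)
    return correct / len(data)

def tournament(candidates, data):
    """Keep a current champion and let each challenger try to beat it."""
    champion = candidates[0]
    for challenger in candidates[1:]:
        if accuracy(challenger, data) > accuracy(champion, data):
            champion = challenger
    return champion

winner = tournament([always_defect, by_retries, by_duration], validation)
print(winner.__name__, accuracy(winner, validation))
```

In practice the candidates would be trained models rather than hand-written rules, and the comparison would use data the models have not seen during training.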
Anomaly detection: Is this out of the ordinary?
The power of data science is that it can highlight patterns in vast amounts of unstructured information. Once the underlying structure is understood, it becomes easier to identify the pieces that don’t match. Such algorithms are already used by banks to prevent credit card fraud.
Testers can use these tools to quickly highlight results that don’t fall into the 'business as usual' category, which makes anomaly detection a more elegant way to track down bugs. It can also be useful during performance testing, to identify whether any part of the system is behaving differently from expectations. Lastly, testers can build it into their update routine to verify that upgrades did not hurt current performance significantly.
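As a rough illustration of flagging results outside 'business as usual', the sketch below scores response times with a modified z-score based on the median. This is one simple technique among many, chosen here as an assumption. The sample timings and the function name are hypothetical, and the constant 0.6745 and threshold 3.5 are a commonly used convention, not a rule.

```python
from statistics import median

def find_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    centre = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # zero spread makes the score undefined; this sketch flags nothing
    anomalies = []
    for index, value in enumerate(values):
        score = 0.6745 * (value - centre) / mad
        if abs(score) > threshold:
            anomalies.append((index, value))
    return anomalies

# Hypothetical response times in milliseconds from a performance run.
response_times_ms = [120, 118, 125, 122, 119, 121, 480, 123, 117, 120]
print(find_anomalies(response_times_ms))
```

The median is used instead of the mean so that a single extreme value does not distort the baseline it is being compared against.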
Regression: What is the size?
Data science is all about numbers, which become important when trying to forecast the future. By identifying the trend using historical data, algorithms can anticipate future values even in volatile conditions.
During performance testing, such forecasts can help determine whether a software product is behaving within the expected parameters or using too many resources. Regression can even be used to estimate the number of bugs in each development stage.
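A simple way to see regression at work is to fit a straight line to historical measurements and extrapolate. The sketch below uses ordinary least squares in plain Python. The build numbers, memory figures, and the 570 MB budget are hypothetical, and a straight line is an assumption that real data may not satisfy.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: build number -> peak memory use in MB.
builds = [1, 2, 3, 4, 5]
peak_memory_mb = [510, 520, 530, 540, 550]

slope, intercept = fit_line(builds, peak_memory_mb)
forecast = slope * 8 + intercept  # extrapolate to build 8

budget_mb = 570  # hypothetical resource budget
print(f"Forecast for build 8: {forecast:.0f} MB")
print("Over budget" if forecast > budget_mb else "Within budget")
```

The same fit could be applied to bug counts per development stage, as long as there is enough history to make a trend meaningful.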
Clustering: What is the structure?
Sometimes when we plot data, patterns emerge, or we can see points coming together towards a center. Usually, there are many ways to group data, depending on the number of groups we want to get at the end, or on where we draw the line regarding 'similar' behaviors.
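To show how the chosen number of groups changes the result, here is a minimal one-dimensional k-means sketch. K-means is only one of several clustering algorithms and is picked here as an assumption. The test durations, the function names, and the deterministic way the starting centres are chosen are all hypothetical simplifications.

```python
def assign(values, centres):
    """Put each value into the group of its nearest centre."""
    groups = [[] for _ in centres]
    for value in values:
        nearest = min(range(len(centres)), key=lambda i: abs(value - centres[i]))
        groups[nearest].append(value)
    return groups

def k_means_1d(values, k, iterations=20):
    """Group numbers into k clusters with plain k-means on a single dimension."""
    ordered = sorted(values)
    # Deterministic start (an assumption of this sketch): one centre from the
    # middle of each of k equal slices of the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = assign(values, centres)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return assign(values, centres)

# Hypothetical test durations in seconds.
durations = [1.0, 1.2, 0.9, 1.1, 9.8, 10.1, 10.3, 9.9]

two = k_means_1d(durations, k=2)
three = k_means_1d(durations, k=3)
print(two)
print(three)
```

With two groups the slow tests form one cluster; asking for three splits the slow tests further, which is the "where do we draw the line" decision described above.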
Defect clustering has been used in software testing for a long time. It relies on the assumption that most defects are concentrated in a small number of modules. Following the Pareto principle, roughly 80% of faults come from 20% of modules. Knowing how defects are grouped makes it easier to apply the same solution to similar problems.
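Whether a project actually follows the Pareto pattern can be checked directly from a defect log. The sketch below counts defects per module and finds the smallest set of top modules that covers a given share of all defects. The module names and counts are made up, and real projects may be more or less concentrated than 80/20.

```python
from collections import Counter

def modules_covering(defect_modules, share=0.8):
    """Return the top modules that together account for `share` of all defects."""
    counts = Counter(defect_modules)
    total = sum(counts.values())
    covered = 0
    selected = []
    # most_common() yields modules from the most to the least defect-prone.
    for module, count in counts.most_common():
        selected.append(module)
        covered += count
        if covered / total >= share:
            break
    return selected, covered / total

# Hypothetical defect log: one entry per defect, naming the module it was found in.
defect_log = (
    ["checkout"] * 9 + ["login"] * 7 + ["search"] * 2 + ["profile"] + ["reports"]
)

hot_modules, coverage = modules_covering(defect_log)
print(hot_modules, f"{coverage:.0%}")
```

The resulting list is a reasonable starting point for deciding where to focus additional test effort.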
Reinforcement learning: What is the next step?
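The long-term goal of AI is to learn reasoning patterns similar to those of humans. Reinforcement learning approaches this by looking not only at trends and events but at sequences of events and their consequences: a system tries actions, receives feedback in the form of rewards or penalties, and gradually learns which action works best in each situation. Once the current situation resembles one it has seen before, the next step can be determined.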
Testers can use this approach to create scripts that serve as a basis for intelligent machinery such as self-driving cars or household robots. By learning sequences, these tools will be able to mimic reasoning in a way that is not only convincing but also safe for humans to be around.
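The idea of learning the next step from feedback can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. The environment below is a toy assumption: a hypothetical five-screen wizard in which the only reward is given for reaching the last screen. The state layout, action names, and parameter values are all illustrative.

```python
import random

N_STATES = 5               # screens 0..4 of a hypothetical wizard; 4 is the goal
ACTIONS = ["back", "next"]

def step(state, action):
    """Toy environment: move along the wizard, reward 1 on reaching the goal."""
    if action == "next":
        new_state = min(state + 1, N_STATES - 1)
    else:
        new_state = max(state - 1, 0)
    reward = 1.0 if new_state == N_STATES - 1 else 0.0
    return new_state, reward

def greedy(q, state, rng):
    """Pick the best-known action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)      # explore
            else:
                action = greedy(q, state, rng)    # exploit what is known so far
            new_state, reward = step(state, action)
            best_next = max(q[(new_state, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate towards reward + discounted future.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = new_state
    return q

q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

After training, the learned policy chooses "next" on every screen, because that sequence of steps is the one that leads to the reward. Real systems such as robots involve far larger state spaces and need much more careful safety testing than this toy suggests.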
Although intimidating at first, data science can be useful for software testers and, with a bit of practice, could help them discover new ways of doing their job. Presumably, these will also be much faster, more accurate, and better suited to automation.
Yet, to be successful, it is essential to ask the right questions since data is an amorphous mass, much like a raw ingredient. You could take flour and make pizza, bread or bagels - it all depends on the process. It’s the same with data analysis. Different methods used with the same data can yield very different outcomes.
It is critical for a tester to start with the right question, not with a preferred method. The way you are going to answer the question is secondary, as it is just a tool, not the overall goal. The initial difficulty for a general tester will be to formulate the problem in a way that is relevant to the client’s business.
The disruption of software testing
The software testing field is going through a profound transformation at the moment. The good news is that, for now, all forms of testing will co-exist, and testers can still choose the direction that best suits their interests, working style, and expertise. Manual testers are not disappearing, since they remain in high demand for UX work.
Test automation is also growing, slowly replacing the repetitive and tedious parts of manual testing. On top of that, data science and AI are shaping a new generation of hybrid testers who are part data scientists. The best news for them is that they already have many of the technical skills required, since the programming languages used are the same - mostly R and Python.