This is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of "Best Practices in Data Mining." The speaker is Mark Flaherty, CMO at InetSoft.
Flaherty: And in that context, what we have seen is more and more of our customers coming to us to talk about big data, for example, where we have large volumes of data, different types of data, or data coming in at different speeds. So I think some of our more mature customers are also focusing on the best practices around sampling when it comes to big data.
When it comes to data visualization, what are some of the best methods to use? When it comes to transformations, there are questions such as how to handle missing values. That’s part of the data preparation process, and a lot of our customers are looking into the best practices on that end.
Moderator: When you talk about sampling, I presume you are talking about taking a small subset of your data and developing algorithms on that subset. Obviously, if you are trying to develop an algorithm based on a megabyte of data, it's going to run a lot faster than if you try to do that on a terabyte of data. When you do sampling, what’s a good percentage of the total? Is there a best practice there?
Flaherty: I think we have seen ranges from about 2% to 6% or 7%, but when we are trying to predict something rare, such as fraud, you will definitely need to pull a percentage from the higher end of that range. Customers are also asking whether they can model on the complete set of data at all. Is that even possible, and will they get a better model from a well-defined sample or from the complete set of data?
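The point about rare targets comes down to simple arithmetic: if 0.5% of 100,000 transactions are fraudulent, a 2% sample contains only about 10 fraud cases to learn from. The Python sketch below makes that concrete. The records, the `fraud` flag, and the `stratified_sample` helper are hypothetical illustrations, not part of any InetSoft product. It also shows stratified sampling, a common technique the speaker does not name, which keeps every rare case while sampling the common class at a low rate.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, rates, seed=0):
    """Sample each class at its own rate, without replacement.

    records  -- any sequence of records
    label_of -- function returning a record's class label
    rates    -- dict mapping class label -> sampling fraction (0.02 = 2%)
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)

    sample = []
    for label, group in by_class.items():
        # Exact per-class sample size; min() guards against rates above 100%.
        k = min(len(group), round(rates[label] * len(group)))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical data: 100,000 transactions, 500 (0.5%) of them fraudulent.
transactions = [{"id": i, "fraud": i < 500} for i in range(100_000)]
label = lambda t: "fraud" if t["fraud"] else "legit"

# Proportional samples at the low and high ends of the 2%-7% range.
low = stratified_sample(transactions, label, {"fraud": 0.02, "legit": 0.02})
high = stratified_sample(transactions, label, {"fraud": 0.07, "legit": 0.07})
# Stratified alternative: keep every fraud case, sample legit records at 2%.
boosted = stratified_sample(transactions, label, {"fraud": 1.0, "legit": 0.02})

print(sum(t["fraud"] for t in low), len(low))          # 10 2000
print(sum(t["fraud"] for t in high), len(high))        # 35 7000
print(sum(t["fraud"] for t in boosted), len(boosted))  # 500 2490
```

Boosting the rare class changes the class balance the model sees during training, so probability estimates from a model built on such a sample generally need to be recalibrated to the true prevalence before they are used.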
Helminthology research may not make headlines every day, but the study of parasitic worms is a critical pillar of both public health and agricultural science. Firms working in this domain collect staggering volumes of information: genetic sequences of helminths, infection prevalence data from rural communities, soil and water environmental samples, livestock treatment trials, and drug resistance markers. These datasets are vast, messy, and interconnected across disciplines ranging from molecular biology to epidemiology. For years, many research groups relied on platforms like Qlik to wrangle their analytics. Qlik provided a way to aggregate and visualize their core metrics, helping researchers make sense of sprawling spreadsheets and imported databases. Yet as helminthology shifted into an era of “big data”—powered by high-throughput sequencing, IoT-enabled monitoring devices, and real-time field data logging—Qlik began showing its limitations. This is the story of how one such firm made the switch to InetSoft’s Style Intelligence platform and, in doing so, reshaped its capacity to deliver insights at scale.
To appreciate the magnitude of the change, one must understand the complexity of the data landscape in helminthology. A single research program studying Schistosoma mansoni, for example, might involve terabytes of genomic sequences, survey data from thousands of individuals across endemic regions, climate sensor readings tracking freshwater habitats, and clinical data from treatment cohorts. On top of that, regulatory agencies demand precise traceability: every dataset must be documented, appropriately anonymized, and kept accessible for audits and for peer review during publication.
At the firm in question—let’s call it Paratech Analytics—these challenges were magnified by partnerships with universities, NGOs, and governments in over a dozen countries. Data arrived in multiple formats, languages, and standards. Qlik had served as their main BI tool for nearly eight years, producing dashboards on infection prevalence, intervention outcomes, and population-level modeling. But Qlik’s traditional associative engine struggled with the exponential growth of data streams, particularly when it came to real-time or near-real-time analysis. Integrating new data sources often meant weeks of scripting, custom connectors, and data preprocessing that slowed research momentum.
The team identified three core pain points with Qlik:
While Qlik remained powerful in certain business contexts, Paratech needed a platform purpose-built for flexibility, scalability, and real-time mashups. The search for a replacement began in earnest.
InetSoft’s Style Intelligence emerged as the frontrunner during the evaluation process. What appealed most to Paratech’s leadership was InetSoft’s unique positioning: a balance between robust enterprise-level analytics and nimble data mashup capabilities. The platform’s ability to integrate disparate sources—relational databases, big data stores, cloud APIs, IoT streams—without heavy reliance on ETL pipelines was a game-changer. For a research field characterized by constant data evolution, this agility mattered.
InetSoft also distinguished itself with cost efficiency. Compared to Qlik’s licensing model, InetSoft offered more flexible pricing that scaled predictably with usage. This mattered for a nonprofit-leaning research environment where budgets are often tied to grants. Furthermore, InetSoft’s visualizations could be delivered across devices, with lighter-weight dashboards that field researchers in sub-Saharan Africa or Southeast Asia could access on tablets without straining bandwidth.
The migration from Qlik to InetSoft unfolded in three phases:
Paratech’s IT and data science teams began by consolidating legacy Qlik dashboards into InetSoft’s environment. This required exporting raw datasets, re-mapping schemas, and building data mashups that reflected the organization’s complex data flows. Where Qlik required lengthy pre-processing, InetSoft allowed direct connections into both their SQL databases and their cloud-based genomic repositories. In just two months, the team recreated key dashboards on infection rates and treatment efficacy, now powered by live data streams.
Researchers accustomed to Qlik’s interface were initially skeptical. To ease the transition, InetSoft consultants conducted tailored workshops showing how dashboards could be built with drag-and-drop simplicity. Within weeks, epidemiologists who had never coded were creating their own visualizations of worm burden by age group and geographic location. Field coordinators learned to embed real-time dashboards into their mobile reporting tools, a feature Qlik had never supported without heavy customization.
By the third phase, Paratech began leveraging InetSoft’s predictive modeling capabilities. Using Style Intelligence’s built-in statistical functions, they modeled infection spread scenarios under varying climate conditions. They also integrated genomic resistance markers into dashboards that flagged emerging hotspots of drug-resistant helminths. These predictive insights fundamentally shifted their role: from descriptive analysis to forward-looking guidance for intervention strategies.
The switch to InetSoft delivered tangible benefits across multiple dimensions:
Most importantly, these improvements translated into scientific impact. In one case, predictive modeling dashboards alerted researchers to a spike in resistance markers in a Tanzanian region, prompting an early intervention campaign that averted a wider outbreak. In another, integration of water-quality sensor data with infection prevalence revealed correlations that guided sanitation investment decisions.
Beyond the technical wins, adopting InetSoft fostered a cultural transformation at Paratech. Researchers previously dependent on IT staff for dashboard updates began experimenting with their own visualizations. Weekly research meetings evolved into interactive sessions where hypotheses could be tested live against the data. The shift democratized analytics, empowering frontline scientists to become data storytellers rather than passive consumers.
Leadership also appreciated InetSoft’s flexibility in governance. With sensitive human subject data in play, compliance with data protection regulations was non-negotiable. InetSoft’s role-based access controls allowed the firm to tailor visibility for different stakeholders, ensuring that anonymized views were shared with external partners while sensitive details remained restricted.
Paratech’s roadmap includes further leveraging InetSoft’s advanced analytics. Plans are underway to build dashboards that integrate drone-collected imagery of snail habitats (the intermediate hosts of schistosomes) with machine learning models predicting outbreak zones. Blockchain-based data provenance is also being explored to assure funding agencies and journals of end-to-end data integrity.
The transition from Qlik to InetSoft was not simply a technological upgrade; it was a redefinition of how helminthology research can operate in a big data era. By moving to a platform that embraced agility, integration, and accessibility, Paratech positioned itself at the cutting edge of parasitology analytics. The implications extend well beyond one firm: they point to a future where neglected tropical disease research can harness modern BI to accelerate discoveries, interventions, and ultimately, healthier lives for vulnerable populations.