James Scott

Consulting and Litigation

I serve as a consulting statistical expert, often in litigation where data analysis or statistical modeling is central to the legal questions. I conduct and evaluate analyses to assess what the data can support, what assumptions are required, and whether another party’s conclusions are stronger than the evidence allows.

My work is most relevant when a case depends on samples, models, algorithms, audits, enforcement analyses, or large operational data sets. These issues often arise in disputes involving liability, damages, causation, compliance, fraud, collusion, class certification, or classwide proof.

Qualifications

I am Professor and Chair of Statistics and Data Sciences at The University of Texas at Austin, with faculty appointments in Statistics and Data Sciences and the McCombs School of Business. My academic work is about how to draw reliable conclusions from complex data, spanning statistics, machine learning, AI, and interdisciplinary scientific collaborations.

I take pride in explaining complex statistical and machine-learning ideas clearly to non-specialists. I have won multiple university teaching awards, including the Regents’ Outstanding Teaching Award, the highest teaching honor in the University of Texas System. I am also the coauthor of AIQ: How People and Machines Are Smarter Together, a book on the statistical foundations of artificial intelligence written for a broad audience and favorably reviewed by outlets such as the Times and The Wall Street Journal.

When statistical questions matter

Statistical questions become important in litigation when a party asks a court, agency, arbitrator, or jury to draw a conclusion from data. In some matters, counsel needs an affirmative analysis that can answer a specific factual question in a defensible way. In others, an opposing expert, regulator, agency, or other party has drawn a conclusion that requires careful statistical evaluation.

The examples below illustrate statistical issues that arise frequently in litigation. Some are based on matters from my own consulting work, with details modified to preserve confidentiality. Others are included because they involve similar statistical questions in related litigation settings.

Whether a statistical anomaly constitutes evidence of misconduct

Some disputes turn on whether an unusual data pattern is meaningful evidence of fraud, collusion, or other misconduct. The statistical task is to define an appropriate baseline, account for ordinary sources of variation, and assess whether alternative explanations have been addressed.

For example, a state agency investigates whether contractors coordinated bids on major infrastructure upgrades to the State Capitol Complex. The data include bid amounts, project characteristics, bidder identities, timing, geography, and historical bidding behavior. The analysis must address whether the observed bid patterns support an inference of collusion rather than ordinary variation, capacity constraints, shared reliance on common suppliers or bidding software, or project-specific factors.

Whether a model or algorithm performed as represented

Commercial disputes involving predictive models, machine learning, or AI often turn on how performance should be measured. The statistical issues include the choice of accuracy metric, whether the available outcome data measure the claimed failure mode, and whether observed business outcomes can be attributed to the model rather than to implementation choices, customer behavior, product mix, or other operational factors.

For example, a clothing retailer retains a software vendor to deploy a machine-learning system to make online size recommendations to customers. After the retailer observes high return rates attributed to size mismatch, a dispute arises over whether the vendor’s predictions met the promised standard of accuracy. The analysis addresses whether return data are a valid proxy for sizing errors, how prediction accuracy should be measured, and whether the observed outcomes show that the model was out of specification.
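One piece of this question can be sketched in code. The example below is a minimal illustration, not a description of any actual matter: it assumes a directly labeled evaluation sample exists (which is often the very point in dispute when only return data are available), and the counts and the 90% contractual threshold are hypothetical. It computes a Wilson score interval for the model's accuracy and compares it with the promised standard.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (default: about 95% coverage)."""
    p_hat = correct / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

# Hypothetical evaluation sample: 1,000 recommendations with a verified
# correct size, of which 870 matched the model's recommendation.
PROMISED_ACCURACY = 0.90  # hypothetical contractual standard
low, high = wilson_interval(correct=870, n=1000)

# If the whole interval sits below the promised standard, sampling noise
# alone is an unlikely explanation for the shortfall.
below_standard = high < PROMISED_ACCURACY
print(f"accuracy interval: ({low:.3f}, {high:.3f}); below standard: {below_standard}")
```

An interval of this kind speaks only to sampling variability. It does not resolve whether the evaluation sample is representative or whether returns are a valid proxy for sizing errors.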

Whether sampled products support conclusions about an underlying process

In technical or IP disputes, parties may ask whether finished products reveal something about an underlying design, method, or manufacturing process. The statistical issues include how the products should be sampled, how measurement variation should be handled, and how strongly observed similarities or differences support conclusions about the process that produced them.

For example, a semiconductor manufacturer alleges that a competitor copied a protected manufacturing process. Because the process itself cannot be observed directly in every relevant instance, the plaintiff samples and measures finished circuits from both manufacturers. The analysis must address whether the measurements thus obtained are consistent with equivalent manufacturing processes and how much uncertainty remains in that comparison.

Whether a sample can support extrapolation to a full population

Many disputes require conclusions about a large population that cannot be reviewed exhaustively. The statistical issues include the sampling frame, the selection process, the inspection or coding protocol, the treatment of ambiguous cases, and the uncertainty in any extrapolation to the full population.

For example, a logistics company alleges that a warehouse operator used leased scanning and tracking equipment at unauthorized facilities. Because a complete inspection of all facilities and devices is impractical, the parties inspect a sample of locations and use the results to estimate the prevalence of unauthorized use and associated damages across the full network. The analysis must address how large a sample is needed and, once the inspections are complete, whether the sample actually taken supports extrapolation to the full population of facilities and devices.
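The sample-size and extrapolation arithmetic can be sketched as follows. This is a simplified illustration that assumes simple random sampling of devices and a normal approximation; the function names and all numbers (2,000 devices, a five-point margin, 40 unauthorized devices found) are hypothetical. A real design might instead cluster by facility or stratify, which would change the formulas.

```python
import math

def sample_size_for_proportion(population, margin, p_guess=0.5, z=1.96):
    """Sample size to estimate a proportion within +/- margin (about 95% confidence)."""
    n0 = z ** 2 * p_guess * (1 - p_guess) / margin ** 2  # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)                 # finite population correction
    return math.ceil(n)

def extrapolate_count(population, sample_n, sample_hits, z=1.96):
    """Point estimate and interval for the number of affected units in the population."""
    p_hat = sample_hits / sample_n
    fpc = (population - sample_n) / (population - 1)     # sampling without replacement
    se = math.sqrt(p_hat * (1 - p_hat) / sample_n * fpc)
    low = max(0.0, p_hat - z * se)
    high = min(1.0, p_hat + z * se)
    return population * p_hat, population * low, population * high

# Hypothetical network of 2,000 leased devices.
n_needed = sample_size_for_proportion(population=2000, margin=0.05)
estimate, lower, upper = extrapolate_count(population=2000, sample_n=n_needed, sample_hits=40)
print(f"inspect {n_needed} devices; estimated unauthorized: {estimate:.0f} "
      f"(interval {lower:.0f} to {upper:.0f})")
```

Using `p_guess=0.5` is the conservative choice when the prevalence is unknown in advance, since it maximizes the variance of the estimated proportion.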

Whether an audit sample supports conclusions about classification and amounts

Audits often require conclusions about a large number of transactions, entries, claims, or records that cannot all be reviewed individually. The statistical issues include how the sample was selected, how records were classified, how dollar amounts were estimated, and whether the resulting estimates can be generalized to the full population.

For example, customs authorities audit a logistics firm to assess whether imported goods were classified correctly and whether duties were calculated properly. The audit depends on a sample of entries, with disputed issues involving both product classification and duty amounts. The analysis must address whether the sample was drawn from the right population, whether classification errors were coded consistently, and whether the estimated underpayment can be extrapolated to the full set of imports.

Whether sampled claims can support reimbursement or damages estimates

Health care, insurance, and reimbursement disputes often rely on samples of claims or records to estimate error rates, overpayments, underpayments, or damages. The statistical issues include the definition of the claim population, reviewer consistency, disputed classifications, stratification, and uncertainty in the extrapolated amount.

For example, a payer audits a provider and concludes that a sample of claims contains documentation or coding errors. The payer seeks to extrapolate the estimated error rate to a much larger population of claims. The analysis must address whether the reviewed claims are representative of the target population, whether the review criteria were applied consistently, and whether the extrapolated amount reflects the uncertainty in the sampling process.
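To show how uncertainty enters an extrapolated dollar amount, here is a minimal sketch of a stratified estimate of total overpayment. It assumes simple random sampling within each stratum and a normal-approximation interval; the function name, stratum sizes, and dollar figures are hypothetical and are not drawn from any actual audit.

```python
import math
from statistics import mean, variance

def stratified_total(strata, z=1.96):
    """Estimate a population total from per-stratum samples.

    strata: list of (stratum_size, sampled_overpayments) pairs, where each
    sample was drawn at random, without replacement, from its stratum.
    """
    total = 0.0
    var_total = 0.0
    for stratum_size, sample in strata:
        n = len(sample)
        total += stratum_size * mean(sample)
        # Variance of the stratum total, with finite population correction.
        var_total += stratum_size ** 2 * (1 - n / stratum_size) * variance(sample) / n
    se = math.sqrt(var_total)
    return total, total - z * se, total + z * se

# Hypothetical claim population split into low-dollar and high-dollar strata.
strata = [
    (100, [0, 0, 10, 10]),    # 100 claims, 4 reviewed, overpayment in dollars
    (50, [20, 20, 20, 20]),   # 50 claims, 4 reviewed
]
point, lower, upper = stratified_total(strata)
print(f"estimated overpayment: {point:.0f} (interval {lower:.0f} to {upper:.0f})")
```

The width of the interval matters as much as the point estimate: a demand based only on the point estimate ignores the sampling uncertainty that the interval makes explicit.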

Whether transaction data support classwide conclusions

Class actions often turn on whether data can establish common impact, common exposure, or damages across a proposed class. The statistical issues include whether the data measure the relevant conduct, whether individual variation can be accounted for, and whether the proposed analysis supports classwide conclusions rather than merely aggregate patterns.

For example, consumers allege that an online retailer’s pricing or recommendation system affected a large class of purchasers. The data include transaction records, customer histories, product attributes, pricing changes, and platform behavior over time. The analysis must address whether the alleged effect can be measured consistently across the proposed class and whether alternative explanations for observed differences have been accounted for.

Whether incomplete digital records support conclusions about scope or impact

Disputes involving digital systems often depend on logs, alerts, records, or event histories that are incomplete or difficult to reconcile across systems. The statistical issues include missing data, duplicate records, classification rules, sampling from large event sets, and uncertainty about the affected population.

For example, after a cybersecurity incident, the parties dispute how many accounts, records, devices, or transactions were affected. The available data include system logs, security alerts, forensic records, and exports from different systems that do not align perfectly. The analysis must address how affected records are identified, how missing or inconsistent logs are handled, and whether a sampled review can support a broader estimate of scope.

Contact

For litigation inquiries, please provide the party names, a short summary of the matter if available, the jurisdiction or forum, the side represented, relevant deadlines, and a brief description of the statistical or data-related issue. This information is needed for an initial conflict check before any substantive discussion.

© James Scott

 