Accessibility Score Methodology Deep Dive

AudioEye Technology Blog
9 min read · Jul 6, 2022

--

Web browser with an accessibility score of 82/100 and two pie charts to represent site data

At AudioEye, we want to make it easier to answer a basic question: How accessible is my site to people with disabilities? To help answer this question, we created the AudioEye Accessibility Score in 2021. The score is a snapshot summary of the accessibility of each visited page on a site, as well as of the site as a whole. We believe that helping others understand and measure accessibility will help make a more accessible world.

What follows is an in-depth look at the methodology for the AudioEye Accessibility Score. We discuss the challenges of creating a summary score, how we selected our approach, and how we aggregate results from page-level tests to a domain level. We end with examples that walk through the score calculation and compare our score with results from the Web Accessibility Evaluation (WAVE) tool.

Accessibility is a continuous journey, and the methodology used to measure it needs to evolve continuously as well. We plan to update the score on an ongoing basis and want your feedback to inform future iterations of our metrics development. This post is meant to start a conversation around accessibility scoring metrics, gather feedback on our methodology, and share our approach for others to adopt.

Keeping Accessibility Simple Isn’t Easy

Summarizing the overall accessibility of an entire website or a single page into a single measure is difficult. The Web Content Accessibility Guidelines (WCAG) provide a way to detect and fix accessibility issues but offer no guidance on summarizing the results into an overall measure. These guidelines are written for web developers and accessibility experts, not the general public. Results of an accessibility scan from a tool such as the WAVE scanner are exhaustive lists of issues, errors, and alerts with no indicator of overall accessibility. Someone may see a scan with 100 errors and issues but won't know whether that is good, bad, or typical. Without a metric that conveys impact or urgency, it is unclear what a user should infer from the information. Our accessibility score transforms exhaustive WCAG-based test results into an easy-to-understand metric that helps you prioritize issues and take action.

Score Methodology

In developing the methodology for the AudioEye Accessibility Score, we used the following guiding principles:

  1. Based on a well-known standard, such as WCAG
  2. Supported by accessibility metrics research literature
  3. Easy to interpret and understand, including how it changes over time
  4. Flexible enough to be useful with different test suites

The approach we developed meets our guiding principles. We provide both page-level scores and a single site-level score (which is an aggregate of page-level scores). Both are described below.

Page-Level Score

The page-level score is the accessibility score for an individual page, determined by an automated test suite. We developed the test suite to test elements as described in the WCAG success criteria. The AudioEye Accessibility Score is not strictly a WCAG conformance score; rather, it is loosely based on WCAG success criteria and WCAG-type tests.

There are many approaches one could take to the calculation. After doing an extensive literature review (see the reference section), we found a common “rates × weights” pattern that met our design criteria. The rates × weights approach has the added benefit of being test suite agnostic: any test suite can use this methodology to produce an accessibility score. At a high level, the page-level score is calculated by aggregating the results of our tests into failure rates, then multiplying each failure rate by a weight. The failure rates are determined by counting the number of elements that fail a given test. The weights are determined by the number of WCAG success criteria discovered on the page. More details are in the score calculation details section. This is an area we are looking for feedback on.

This approach does have limitations for WCAG criteria involving issues that span multiple pages (e.g., navigation), since each page is evaluated independently. Further, because the scoring is based on a test suite, the score only detects and incorporates issues that can be caught automatically. This underscores the importance of a full manual review for a more accurate assessment.

Weighting Selection
The weighting scheme is an important part of the overall score. In general, there are two primary ways to calculate weights: equal or priority. Equal weighting gives each success criterion the same weight regardless of how we think it affects the site’s accessibility. Conversely, priority weighting gives some success criteria larger weights based on how important we think they are to a site’s accessibility. Within priority weighting, there are numerous ways to assign priority. Selecting between equal and priority weighting requires understanding how each technique translates a user’s experience on the page into that page’s score.

We see some evidence that priority weighting is only marginally better in some cases, and in other cases equal weighting performs better. This is illustrated by the chart below, taken from “WAEM: A Web Accessibility Evaluation Metric Based on Partial User Experience Order,” which shows how equal and priority weighting techniques compare. The authors use a satisfied percentage (SP) metric: the percentage of user-experience-ranked page pairs whose ordering is matched by the corresponding weighted accessibility scores. The diagonal solid black line shows the performance of WAEM, and the scatter plots show the performance of other weighting schemes (equal and priority) in comparison. We see that priority weighting outperforms equal weighting sometimes, but not always. While this evidence isn’t comprehensive, it suggests that the additional complexity of priority weighting does not produce scores that correlate significantly better with user experience.

We decided to use equal weighting for two primary reasons: 1) it isn’t obvious that priority weighting translates into a score with a higher correlation to user experience, and 2) it is more easily understood and explained. Attempting to replicate a human’s more nuanced judgment of severity, which may be based on factors not available within the context of a single page, adds additional complexity.

Scatterplot of equal, priority, and random weight performance measured by satisfied percentages
Figure 1: Weighting schemes measured by satisfied percentage (SP)
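
As a rough illustration of how an SP-style comparison works, here is a minimal sketch, written by us as a simplification rather than taken from the WAEM paper. It counts how many user-experience-ranked page pairs are ordered the same way by a weighted accessibility score; the names `pageScores` and `rankedPairs` are illustrative.

```typescript
// Sketch: satisfied percentage (SP) over user-ranked page pairs.
// A pair [better, worse] is "satisfied" when the page users judged more
// accessible also receives the higher weighted accessibility score.

type PageId = string;

function satisfiedPercentage(
  pageScores: Map<PageId, number>,       // weighted accessibility score per page
  rankedPairs: Array<[PageId, PageId]>   // [moreAccessible, lessAccessible] pairs from user studies
): number {
  if (rankedPairs.length === 0) return 0;
  let satisfied = 0;
  for (const [better, worse] of rankedPairs) {
    const betterScore = pageScores.get(better) ?? 0;
    const worseScore = pageScores.get(worse) ?? 0;
    if (betterScore > worseScore) satisfied += 1;
  }
  return (satisfied / rankedPairs.length) * 100;
}
```

Under this framing, comparing equal and priority weighting amounts to computing SP once per weighting scheme and seeing which one satisfies more of the user-ranked pairs.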

Score Calculation Details
The calculation is as follows:

For any AudioEye test t run on a given page:

EFt = count of elements that fail test t

ETt = count of elements tested by test t

Let i be the number of tests related to a single WCAG success criterion that run on a given page. The failure rate rSC for that criterion is:

rSC = ( EFt1 + EFt2 + … + EFti ) / ( ETt1 + ETt2 + … + ETti )

Let S be the set of all WCAG success criteria evaluated on the page. The weight w is:

w = 1 / size of S

The score for a page with n success criteria is:

wXr = w × rSC_1 + w × rSC_2 + … + w × rSC_n

Score = (1 − wXr) × 100

A failure rate is calculated for every WCAG success criterion (based on element-level tests, which are aggregated at the success criterion level) and then multiplied by that criterion’s weight. This introduces the concept of a partial pass for any given WCAG success criterion, which gives a site credit for having accessible elements. Note that the concept of a partial pass departs from strict WCAG conformance, under which each success criterion either passes or fails.

Weights are determined by counting the number of WCAG success criteria evaluated on a given page and then distributing the weight evenly across them. For example, if our automated tests map to 15 different WCAG success criteria on a page, each criterion’s weight will be 1/15 = 0.0667.
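
To make the calculation concrete, below is a minimal sketch of the page-level score in TypeScript. It assumes each test result reports the WCAG success criterion it maps to, the number of elements it tested, and the number that failed; the type and function names are illustrative, not AudioEye’s production code.

```typescript
// Sketch of the page-level score: equal weights across success criteria,
// with failure rates aggregated from element-level test results.

interface TestResult {
  criterion: string;       // WCAG success criterion the test maps to, e.g. "1.1.1"
  elementsTested: number;  // ETt
  elementsFailed: number;  // EFt
}

function pageScore(results: TestResult[]): number {
  // Aggregate element counts per success criterion (rSC numerator/denominator).
  const byCriterion = new Map<string, { failed: number; tested: number }>();
  for (const r of results) {
    const agg = byCriterion.get(r.criterion) ?? { failed: 0, tested: 0 };
    agg.failed += r.elementsFailed;
    agg.tested += r.elementsTested;
    byCriterion.set(r.criterion, agg);
  }
  if (byCriterion.size === 0) return 100; // no criteria evaluated, nothing failed

  // Equal weight for every criterion evaluated on the page (w = 1 / |S|).
  const weight = 1 / byCriterion.size;

  // Weighted sum of failure rates, then flipped into a 0-100 score.
  let weightedFailure = 0;
  for (const { failed, tested } of byCriterion.values()) {
    weightedFailure += weight * (tested > 0 ? failed / tested : 0);
  }
  return (1 - weightedFailure) * 100;
}
```

For instance, a page where one criterion fails on half of its tested elements and a second criterion fully passes would score (1 − (0.5 × 0.5 + 0.5 × 0)) × 100 = 75.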

Site Level Score

Now that we understand page-level scores, let’s examine how pages are aggregated into a site-level score. We do this by weighting each page’s score by its pageviews, then averaging the pageview-weighted scores over a seven-day window. This captures the natural weekly cycles of pageviews and activity while remaining responsive to changes made to the site.
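
Here is a minimal sketch of that aggregation, assuming we have a page-level score and a pageview count for each page on each of the trailing seven days; the data shape and function name are illustrative.

```typescript
// Sketch of the site-level score: a pageview-weighted average of page-level
// scores across all pages and days in a trailing seven-day window.

interface PageDayStat {
  pageScore: number;  // page-level score (0-100) for that page on that day
  pageviews: number;  // pageviews for that page on that day
}

function siteScore(windowStats: PageDayStat[]): number {
  const totalViews = windowStats.reduce((sum, s) => sum + s.pageviews, 0);
  if (totalViews === 0) return 0; // no traffic in the window

  const weightedSum = windowStats.reduce(
    (sum, s) => sum + s.pageScore * s.pageviews,
    0
  );
  return weightedSum / totalViews;
}
```

Weighting by pageviews means heavily trafficked pages move the site-level score more than rarely visited ones, which keeps the score aligned with what most visitors actually experience.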

Example: How the Score Works

In the tables in Figure 2, we show the WCAG criteria and the number of failures we see per criterion based on our test suite.

Weights: Below we see tests associated with 19 WCAG criteria; the weight for each criterion is 1/19, or .052.

Rates: We see the failure rates here after AudioEye has applied remediations. For the first line, we see there are still 428 failing elements out of a total of 436 elements. Critically, for criterion “2.1.1 Keyboard,” we see 0 failures out of the 11 total elements found, a failure rate of zero.

A list of test criteria with pass and fail rates for item
Figure 2: WCAG success criteria associated with the example site

The precise calculation is:

A list of criteria with the columns of element count, failing, tested rate and rate times weight
Figure 3: Detailed score calculation for the example site
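
To see how one row feeds the total, take the first criterion above: 428 failing elements out of 436 tested gives a failure rate of 428/436 ≈ 0.98, which multiplied by the equal weight of 1/19 contributes roughly 0.052 to the weighted failure sum. The “2.1.1 Keyboard” criterion, with 0 failures out of 11 elements, contributes 0. Summing the weighted rates across all 19 criteria and computing (1 − sum) × 100 yields the page’s score.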

Communicating Accessibility: Visualizing conformance in WAVE vs. AudioEye

Let’s compare the AudioEye Accessibility Score with the WAVE tool when both are applied to the same site. While both WAVE and the AudioEye Accessibility Score are based on WCAG, the display of information makes a big difference in how we understand and interpret the results. The AudioEye Accessibility Score aggregates all of the information for customers, making it easier to understand their site’s accessibility at a glance. The WAVE information is much more granular. While that level of granularity is appropriate for accessibility experts and developers, it can be confusing for non-experts.

In Figures 4.1 and 5.1, we show the results of a WAVE tool scan before remediations are applied (note: these can be any remediations, not just AudioEye’s) and then a scan after remediations are applied. The number of errors went down, the number of alerts went up, and other counts changed, but it is difficult to tell how these changes affect the site’s overall accessibility.

A detailed results image of WAVE scan that lists the different test types and issues detected before remediations
Figure 4.1: WAVE tool scan before remediations — https://wave.webaim.org/
Page Accessibility Score represented by a gauge line graphic and the number 74 out of 100 written in red
Figure 4.2: AudioEye score before remediations in the customer administration website

Now let’s look at the AudioEye Accessibility Score for the same site. As with the WAVE scans, we show the score before and after remediations are applied (regardless of the source, not just AudioEye). The site scores 74 before remediations (Figure 4.2) and 84 after (Figure 5.2), an improvement of 10 points. Visually and contextually, the score change is an easier way for a user to gauge the improvement than comparing two WAVE scans side by side. The AudioEye score makes it much clearer that this site’s overall accessibility has increased.

In the appendix, we provide an under-the-hood look at how our test suite measures the improvements and how it impacts the score.

A detailed results image of WAVE scan that lists the different test types and issues detected after remediations
Figure 5.1: WAVE tool scan after remediations — https://wave.webaim.org/
Site Accessibility Score represented by a gauge line graphic and the number 84 out of 100 written in green
Figure 5.2: AudioEye score after remediations in the customer administration website

Conclusion

The AudioEye Accessibility Score helps our customers answer the question: “How accessible is my site?” It is grounded in WCAG fundamentals, similar to the WAVE tool, but more easily understood by the average person. Our methodology is informed by 20 years of accessibility metrics research. Finally, the methodology is test suite agnostic, allowing anyone to adopt it. Summarizing a site’s accessibility with automation plays a critical role in making the web more accessible for all. Please provide feedback on any aspect of our work; we look forward to leveraging your feedback and working together to create a more accessible future.

References

  1. Comparing Web Accessibility Tools and Evaluating the Accessibility of Web Pages: Proposed Frameworks. Abdullah Alsaeedi. 2019.
  2. Reliability Aware Web Accessibility Experience Metric. Song et al. 2018.
  3. WAEM: A Web Accessibility Evaluation Metric Based on Partial User Experience Order. Song et al. 2017.
  4. Automatic Web Accessibility Metrics: Where We Are and Where We Can Go. Vigo, Brajnik. 2011.
  5. An Evaluation of Web Accessibility Metrics Based on Their Attributes. Freire et al. 2008.
  6. Quantitative Metrics for Measuring Web Accessibility. Vigo et al. 2007.
  7. Unified Web Evaluation Methodology. Velleman, Snaprud. 2006.
  8. Metrics for Web Accessibility Evaluation. Parmanto. 2005.
  9. Barriers to Use: Usability and Content Accessibility on the Web’s Most Popular Sites. Sullivan, Matson. 2000.

Appendix

AudioEye’s internal testing tool showing the list of WCAG criteria with pass and failure rates for a given site, before remediations.
Figure A.1: AudioEye internal test results and Accessibility score before remediations
AudioEye’s internal testing view showing the list of WCAG criteria with pass and failure rates for a given site, after remediations.
Figure A.2: AudioEye internal test results and Accessibility Score after remediations
