When the Department of Veterans Affairs released the annual ratings of its hospitals this fall, the facility in Atlanta dropped to the bottom, while the one in West Haven, Conn., shot to the top. Why was something of a mystery.
The Atlanta hospital was downgraded to one star from three on the agency’s five-star scale, even though there had been only a “trivial change” in its quality data from the year before, according to the department. The Connecticut hospital climbed to five stars from three, even though numerous operations had to be performed elsewhere or canceled at the last minute because of problems with sterilization of surgical tools, according to an internal assessment and other accounts cited by Senator Richard Blumenthal in a letter to the agency.
Veterans Affairs set up the rating system in 2012 in the hope of pushing its hospitals to improve, and it has been increasingly aggressive in using the ratings to hold hospital managers accountable. Leaders with low ratings can be ousted, as happened last week in Atlanta, where the chief of staff and heads of the emergency department, primary care and clinical access services were removed because of low scores.
But former senior officials at the agency and experts in health care metrics say the system can be confusing, and so arbitrary that hospitals may gain and lose stars based only on statistical error. More than a dozen hospitals improved care but lost stars; another did not improve and gained one.
What worries some experts most is the role the star ratings now play in grading the performance of hospitals and their managers. They say the ratings create an incentive to conceal problems rather than grapple with them, in order to collect bonuses or sidestep penalties.
“It’s a big mistake,” said Dr. Ken Kizer, a former under secretary for health at Veterans Affairs who is widely credited with pioneering the use of health care quality metrics at the agency.
Dr. Kizer said that it made sense to track quality measures when the goal was improving patient outcomes, and the agency had made important strides in that way. But he said that using the data to single out hospitals for discipline could lead to problems like the 2014 wait-time scandal, when managers who could not meet goals for prompt scheduling of patient appointments started keeping secret off-the-books waiting lists.
“It’s the same pathology that perverted things then,” Dr. Kizer said. “As soon as you tie metrics to pay or performance, they become subject to gaming.”
The gaming can put patient care on the line. At the hospital in Roseburg, Ore., administrators turned away some of the sickest patients to keep them from affecting the facility’s scores, doctors there have said.
The chief of surgery at another veterans’ hospital in a major metropolitan area said in an interview that administrators had discussed forgoing certain operations because they could hurt the hospital’s quality statistics.
“That kind of thinking is driven by these ratings,” said the surgeon, who spoke on the condition of anonymity, adding that he feared being fired if he spoke publicly. “My life right now is continuously filling out reports and going to meetings, trying to figure out how to improve the numbers.”
There is broad consensus in health care that quality should be tracked and reported, but little agreement on the best way to do it. As in education and law enforcement, a drive to collect data and use it to direct strategy has led to both improvements and frustrations. Often, experts say, the way care is measured can alter the care itself, and not always for the better.
The Department of Veterans Affairs defended its hospital rating system, saying in a statement that it “has been successful in moving systemwide performance upward.” But the department declined to make key officials available to discuss the system.
The ratings may soon take on even more importance. A law signed in June may allow more veterans to get care from private providers if veterans’ hospitals fall short of performance standards.
Veterans Affairs has been tracking hundreds of health care metrics for decades, but it had no overall performance gauge for its 146 hospitals until 2012, when it started using a process called Strategic Analytics for Improvement and Learning, or Sail, to combine many of the metrics into a single score. Executive performance and pay were tied to Sail scores in 2015 in the wake of the wait-time scandal.
The department has reported steady improvement in Sail scores, noting in September that 71 percent of veterans’ hospitals did better this year than in 2017. But experts say some of that improvement may exist only on paper.
The former quality director of a large veterans’ hospital with a five-star rating, who spoke on the condition of anonymity to avoid harming a continuing relationship with the agency, said the hospital employed two analysts whose full-time job was to find ways to improve the Sail data. Some of their work focused on spotting ways that services could be improved, but much of it focused on finding ways to improve the numbers, such as by changing how patients’ conditions were entered in hospital records. “We learned how to take the test,” the director said.
Sail was designed by Dr. Peter Almenoff, a longtime hospital administrator who was moved to a quality control post in the department in 2008 despite questions about his track record. This spring he was also put in charge of the team that revamps hospitals that get low ratings.
The department refused multiple requests to interview Dr. Almenoff, and he did not respond to direct inquiries seeking comment.
Veterans Affairs now relies on Sail to warn about failing hospitals. But Dr. Stephan Fihn, who was the department’s chief quality and performance officer before he retired this year, says the system is not reliable.
“It has serious flaws and always has,” Dr. Fihn said. “The first is statistical: the numbers may not be mathematically sound. Second, it’s not transparent and lacks independent oversight.”
A draft internal evaluation in 2014 found that combining dozens of metrics into a single Sail score was “akin to adding apples and oranges and trying to express the total as the number of pineapples.”
An outside audit in 2015 found that many of the score’s ingredients had “never been assessed to see if they were actually valid measures of quality,” and that hospitals could gain or lose a star solely from statistical error.
According to the report, 70 percent of veterans’ hospital directors interviewed by the auditors with a promise of anonymity said Sail scores did not accurately reflect the quality of their hospitals.
The New York Times contacted eight veterans’ hospitals, including those in Atlanta and West Haven, asking to interview their directors about Sail. None were willing.
“A lot of people don’t like this system, but they won’t speak up because they are afraid of what will happen,” Dr. Fihn said.
Problems in measuring health care quality are not confined to veterans’ hospitals. A 2015 comparison of four popular commercial systems used by private hospitals found their ratings so inconsistent that not one of the 844 hospitals examined earned a top rating from all four.
Medicare tried to institute a five-star hospital grading system, but postponed releasing the latest results indefinitely in July after several hospitals threatened to sue, saying the grading method was inaccurate.
Veterans’ hospitals, however, do not have that option, nor can they choose among commercial rating systems.
The department says its star ratings help keep veterans informed. But Dr. David Shulkin, who was President Trump’s first secretary of veterans affairs, says the stars are not much help in gauging progress from year to year or in making comparisons with nearby civilian hospitals, because Sail grades veterans’ hospitals on a national curve.
“It’s not useful for our patients. It’s confusing. I wanted to move away from Sail,” said Dr. Shulkin, who clashed with political appointees in the department and was dismissed by Mr. Trump in March.
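The consequence of curve grading that Dr. Shulkin describes can be illustrated with a minimal sketch. The scores and the simple quintile rule below are hypothetical, not the department’s actual formula; the point is only that a hospital whose own composite score is unchanged can still lose stars when its peers’ scores shift.

```python
def stars_on_curve(scores, hospital):
    """Assign 1-5 stars by national quintile: rank a hospital's composite
    score against all hospitals, then map each fifth of the field to a star.
    (Hypothetical rule for illustration only.)"""
    below = sum(s < scores[hospital] for s in scores.values())
    percentile = below / len(scores)          # fraction of peers outranked
    return min(5, int(percentile * 5) + 1)    # bottom fifth -> 1 star, etc.

# Hospital "A" posts the same 0.50 composite score in both years.
scores_2016 = {"A": 0.50, "B": 0.40, "C": 0.45, "D": 0.60, "E": 0.70}
scores_2017 = {"A": 0.50, "B": 0.52, "C": 0.55, "D": 0.60, "E": 0.70}

print(stars_on_curve(scores_2016, "A"))  # 3 stars: A outranks two peers
print(stars_on_curve(scores_2017, "A"))  # 1 star: peers improved past A
```

Under such a rule, a rating says where a hospital stands relative to other veterans’ hospitals that year, not whether its own care got better or worse — which is why the stars are little help in tracking year-to-year progress or comparing against nearby civilian hospitals.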
Agency employees say that only Dr. Almenoff and a few members of his staff know exactly how the system weighs and adjusts the 60 publicly available measures that go into a score.
“That’s the problem with Sail — what happens to make the scores is invisible,” Dr. Fihn said. “A person could move the stars arbitrarily, and you would have no way of knowing.”
That lack of transparency became a problem for Lisa Nashton, who is in charge of tracking quality at the veterans’ hospital in Columbia, S.C.
After the hospital received one star, Dr. Almenoff visited the facility in 2016 to brief the staff on ways to improve. While he was there, Ms. Nashton said, she took him out to dinner to talk more about quality metrics.
The effort seemed to pay off. The hospital got its rating up to three stars that year, and it looked forward to a similar rating in 2017, Ms. Nashton said, because it had sustained its quality measures at basically the same level.
So when the word came that the hospital had actually lost a star, “it was a gut punch,” she said. “I kept going over the numbers again and again. I compared us to other hospitals. The math didn’t make sense.”
Ms. Nashton said she then alerted the department’s Office of Accountability and Whistleblower Protection that Sail was statistically unsound and open to gaming, and submitted a lengthy paper showing how a host of problems made the system a “credibility crisis waiting to happen.”
The reply came nearly a year later: The department planned to take no action.