Thursday 18 May 2006

Lots of Little Errors

From Space Daily, the story of a 99% success, and of how hard both software and rocket science are.
NASA officials released a summary report Tuesday identifying the causes of a collision on April 15, 2005, between the experimental Demonstration of Autonomous Rendezvous Technology spacecraft and its intended rendezvous target, MUBLCOM, an inactive military communications satellite.
...
"There were a lot of causes," Scott Croomes, who chaired the eight-member board, told reporters during a teleconference at NASA's Marshall Space Flight Center in Huntsville, Ala.

Croomes said the spacecraft's global-positioning receiver suffered from a "factory error," which caused DART to reset its position and speed continually, and thereby discard the real-time GPS data that could have kept it on a precise course for the rendezvous and avoided the collision.

Because of the error, DART's receiver consistently produced a velocity reading that was biased by about 0.6 meters per second from what it should have been. The spacecraft's onboard software could not reconcile the error with the real-time data, and hence kept firing the thrusters and using up its fuel.

The investigation board also found that although the DART team at Orbital Sciences Corp., the spacecraft's builder, knew about the error, they never attempted to correct it. This proved to be a critical misstep, because the software model that simulated the receiver during preflight testing assumed the receiver measured velocity perfectly, and that assumption was transferred to the spacecraft's software.

Combined with other errors and complications, the miscalibration caused DART to collide with MUBLCOM, the NASA board found – although the collision was minor. DART missed its 6.3 meter target envelope by less than 2 meters.

DART's design did include a collision-avoidance mechanism, but the software was dependent on the same navigational data source as the guidance system, so it was ineffective.

"The reasons for this inadequately-designed logic include the unanticipated potential for navigational errors and a lack of adequate design review," the board concluded in the summary report.

"This almost worked," Croomes told reporters. "Had any one of those causes not been there, (the rendezvous) would have worked."

As usual, the problems weren't primarily technical, though malfunctioning equipment was the immediate culprit. They were managerial: two teams were operating under different assumptions. One assumed that the equipment it was interfacing with would operate according to its specification, fulfilling its contractual promise as written in the interface documents. The other assumed that a minor error wouldn't matter.
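
To make the mismatch concrete, here is a toy sketch of how a test that models the receiver as perfect can pass while the same logic misbehaves with a small velocity bias. Everything in it is hypothetical: the function name `count_resets`, the one-dimensional motion, the 5-meter tolerance and the reset rule are illustrative assumptions, not DART's actual guidance software. Only the 0.6 m/s bias comes from the report.

```python
def count_resets(velocity_bias, true_velocity=1.0, steps=600, dt=1.0, tolerance=5.0):
    """Count how often a toy 1-D estimator discards its state and resets.

    Hypothetical model: the estimator propagates position using the
    receiver's velocity (true velocity plus a constant bias) and resets
    to the measured position whenever the two disagree by more than
    `tolerance` meters. Position measurements are assumed exact.
    """
    true_pos = 0.0
    est_pos = 0.0
    resets = 0
    for _ in range(steps):
        true_pos += true_velocity * dt
        # Propagate with the (possibly biased) velocity reading.
        est_pos += (true_velocity + velocity_bias) * dt
        # If the estimate and the measurement cannot be reconciled, reset.
        if abs(est_pos - true_pos) > tolerance:
            est_pos = true_pos
            resets += 1
    return resets


# Preflight-style test: the simulated receiver is assumed perfect.
perfect = count_resets(velocity_bias=0.0)

# Flight-style case: the receiver carries the 0.6 m/s bias from the report.
biased = count_resets(velocity_bias=0.6)

print("resets with perfect receiver model:", perfect)
print("resets with biased receiver model: ", biased)
```

With the idealized receiver the estimator never resets, so a test built on that model has nothing to complain about. Feed in the bias and the same logic resets over and over, which is the kind of behavior a simulation can only reveal if the known error is part of the model.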
