“We started five years ago, and now we are more mature in the industry in using MPP (massively parallel processing) systems, and we have shown significant ROI in being able to do complex analytics while managing the footprint,” said Werr.
“NYSE needs to store and analyze seven years of historical data and be able to search through approximately one terabyte of data per day, which amounts to hundreds in total,” added Werr. “The PureData System for Analytics powered by Netezza provides the scalability, simplicity and performance critical in being able to analyze our big data to deliver results eight hours faster than on the previous solution, which in our world is a game changer.”
NYSE’s initial focus was trading surveillance of market makers and of broker-dealers’ trading platforms. A second concern was capacity planning.
“The New York Stock Exchange SLAs (service level agreements) are stringent,” said Werr. “The system needs to be 100 percent fault tolerant. When systems cross capacity thresholds, additional capacity would be automatically engaged and trading would continue to flow without interruption.”
“Extremely large data volumes, data integration complexities, market surveillance and ad hoc analytics requirements took a large number of IT resources to babysit the environment and constantly tune it. The systems became too complex and slow,” Werr added.
To run analytics, data had to be extracted from the database into applications such as SAS and proprietary NYSE apps to perform the necessary analysis.
“Big data for us is augmentation between systems like Netezza and a set of technologies like Hadoop and a distributed file system and identifier tiers that orchestrate data access. NYSE big data is all about taking that to the next level and packaging it so it can be dropped into an organization and leveraged, so that organizations can continue to support innovation in big data.”
Phil Francisco, vice president of big data product management at IBM, said Werr had developed some interesting ways to load archival data into Netezza very quickly, so NYSE can run surveillance analytics against records from a few months or a few years back.
“NYSE continues to push the envelope for high performance, scalability and reliability,” Werr said. “NYSE has implemented large network pipes across data centers and trading systems. We can move data around very quickly. Data needs to move in and out of analytics systems (like Netezza) fast.”
NYSE Technologies makes its systems available for purchase and installation behind a firewall or as a service. The system is fast by analytics standards, but it is not designed for high-frequency trading: it refreshes at one-minute intervals, which counts as near real-time in the analytics world.