Samuel Pass

Mathematics and Computer Science Student | Quantitative Trader, Researcher & Developer

An Introduction to Trading Infrastructure

There is a significant gap between researching strategies on a free data source, such as Yahoo Finance or Alpaca (yes, they have free API keys), and running a systematic strategy (or a portfolio of systematic strategies) through a brokerage firm. For example, where do you get more granular data to verify your strategy further? How do you connect to a brokerage firm to execute orders systematically? What fees will you encounter when trading? How do you safely place orders, and what safety nets should you have built into your trading algorithm to prevent mistakes? If you have any of these questions, then this blog is for you! (Don’t worry, I don’t actually talk like this).

In this blog, I’ll provide a high-level overview of the infrastructure needed to create a fully systematic portfolio of trading strategies through Interactive Brokers (IB). Additionally, I’ll review trading fees, paper trading, account types, and algorithm safety nets that are standard practices for running such a portfolio.

Let’s start at the beginning: say you have a trading idea that you’ve spent some time researching in a Jupyter notebook. What’s the next step?

Granular Data

You will want to analyze your strategy further using more granular data and develop a backtest to see how it would have (roughly) performed on historical market data. Some of my personal favorite granular data providers are Polygon.io (monthly fee), Interactive Brokers (requires a brokerage account), Alpaca (monthly fee for the better subscription tier), and Databento (lets you pay a one-time fee for only the data you want).

*Polygon fun fact - you can pay for one month of a stock data subscription, download the flat files, which include the quote data for every stock in the stock universe (this sums to about 3 TB of data), and then cancel the subscription the next month. However, considering the size of these files, you will need an elegant way to store and stream the data from them (e.g., store the flat files on a cloud server, break the files up into days rather than years, and write an algorithm to stream chunks from these files).
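As a concrete illustration, here's a minimal sketch of that chunked-streaming idea, assuming the flat files are gzipped CSVs split into one file per day (the file path and column names below are hypothetical; adapt them to the actual flat-file schema):

```python
import pandas as pd

def stream_quotes(path, chunksize=1_000_000):
    """Yield a multi-gigabyte gzipped quote file in manageable chunks
    instead of loading the whole thing into memory."""
    yield from pd.read_csv(path, compression='gzip', chunksize=chunksize)

# Example: average bid-ask spread for one symbol, streamed one chunk at a time.
spreads = []
for chunk in stream_quotes('quotes/2023-01-03.csv.gz'):
    aapl = chunk[chunk['ticker'] == 'AAPL']
    spreads.append((aapl['ask_price'] - aapl['bid_price']).mean())
```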

Based on your specific data needs, I'm (relatively) confident that you'll be happy with one of the above data providers. After acquiring data subscriptions/licensing, you'll need a way to access this data quickly. I have had success both with cloud infrastructure services and with simply streaming the data from the data provider's API. I also like using cloud infrastructure (e.g., AWS) because it is secure (set up a VPN!) and an easy way to share access with the rest of your team. However, this will vary based on your individual needs. Now, you're ready to dive into the world of backtesting.

Backtesting & Fees

Now that you have an elegant setup for retrieving granular data, you will want to develop in-depth backtests to verify the performance of your strategy on historical market data. If you've never created a backtest before, check out my GitHub repository for a high-level statistical arbitrage pairs trading backtest (here). When making an in-depth backtest, it is essential to account for all types of fees your strategy will incur while trading. Here are two of the main costs you will need to account for if you are going to be putting on a taking-liquidity statistical arbitrage pairs trading strategy:

- Commissions: the per-share (or per-order) charges from your brokerage firm, plus the regulatory and exchange fees passed through with them.
- Spread costs: a taking-liquidity strategy buys at the ask and sells at the bid, so your backtest should fill orders at the dividend-adjusted* bid/ask prices rather than at last or midpoint prices.
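To make this concrete, here's a minimal sketch of a per-order cost model for a backtest. The commission and spread values below are placeholders rather than IB's actual schedule; pull the real numbers from your broker's published fee tables:

```python
def taker_cost(shares, commission_per_share=0.005, min_commission=1.00, half_spread=0.01):
    """Estimate the cost of a single liquidity-taking order:
    commission plus the cost of crossing half the spread."""
    commission = max(shares * commission_per_share, min_commission)
    spread_cost = shares * half_spread  # you buy at the ask and sell at the bid
    return commission + spread_cost

# Example: a 500-share order costs $2.50 in commission plus $5.00 in spread.
print(taker_cost(500))  # 7.50
```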

Check with your brokerage firm for additional fees. Additionally, you can significantly reduce trading fees if you adjust your strategy to create liquidity rather than take liquidity.

*There should be a different dividend factor for bid prices and ask prices.

After accounting for fees, I would spend some time programming the order execution logic you intend to use in your production algorithm into your backtest; I will talk more about this in the following section. Lastly, you will need to adjust your historical market data for corporate actions (dividends, stock splits, etc.). This ensures that corporate actions don't trigger any false signals in your backtest, which could distort your strategy performance metrics. Adjusted historical prices for a given dividend are calculated by multiplying all prices before the ex-dividend date by the dividend factor: 1 - (dividend amount / last price before the ex-dividend date).
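Here's a minimal sketch of that backward adjustment in pandas, assuming a price series with a date index; per the footnote above, you would compute separate factors from the last bid and last ask before the ex-date when adjusting bid and ask series:

```python
import pandas as pd

def adjust_for_dividend(prices: pd.Series, ex_date, dividend: float) -> pd.Series:
    """Backward-adjust a date-indexed price series for one cash dividend:
    scale all prices before the ex-dividend date by 1 - dividend / last close."""
    ex = pd.Timestamp(ex_date)
    last_close = prices.loc[prices.index < ex].iloc[-1]
    factor = 1 - dividend / last_close
    adjusted = prices.copy()
    adjusted.loc[adjusted.index < ex] *= factor
    return adjusted
```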

*I will soon be writing a blog about creating a probability distribution model to gauge your likelihood of getting filled at prices within the bid-ask interval for a given stock (based on historical trade data for that stock). I have been meaning to do this for a while, as it's incredibly hard to backtest and paper-trade a making-liquidity strategy.

Lastly, you will need to ensure that you have proper logic implemented in your backtests for calculating profit and loss (both unrealized and realized). These profit-and-loss values will give you your equity curve when combined with your initial portfolio capital. From here, you can calculate performance metrics such as the Sharpe ratio, Sortino ratio, drawdowns, etc. Now, it’s time to set up your brokerage firm account, graphical user interface (GUI), and paper trading algorithm(s).
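As a sketch, once you have a daily series of total pnl (realized plus unrealized), the equity curve and the headline performance metrics fall out in a few lines (this assumes daily data and, for simplicity, a zero risk-free rate):

```python
import numpy as np
import pandas as pd

def performance_summary(daily_pnl: pd.Series, initial_capital: float):
    """Turn a daily total-pnl series into an equity curve and headline metrics."""
    equity = initial_capital + daily_pnl.cumsum()           # equity curve
    returns = equity.pct_change().dropna()                  # daily returns
    sharpe = np.sqrt(252) * returns.mean() / returns.std()  # annualized, zero risk-free rate
    max_drawdown = (equity / equity.cummax() - 1).min()     # worst peak-to-trough drop
    return equity, sharpe, max_drawdown
```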

Live Trading Infrastructure

You will now need to set up a brokerage firm account. From this point forward, everything I say will only be relevant to Interactive Brokers (IB), as this is the brokerage firm I have direct experience with. I recommend starting with a stock margin account with no additional leverage, in which your cash value is what you are allowed to trade with. If you enter positions with a market value greater than your SMA/cash amount, your long positions will be partially auto-liquidated at 3:50 pm ET, by roughly the amount that your positions' market value exceeds the original cash amount in your portfolio. If trading with leverage is up your alley, you can see if you qualify for a portfolio margin account, which will give you leverage based on the risk of your portfolio (more information on how IB measures portfolio risk is available online).

Once you have a brokerage account set up, you will have to choose between IB Trader Workstation (TWS) and IB Gateway. TWS provides traders with a clean, user-friendly, real-time graphical user interface (GUI). This GUI provides users with information regarding open and closed trade positions, orders, bid/ask prices, last prices, time and sales data, and more.
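Whichever you choose, the API is the same. As a minimal example using the ib_insync Python package (7497 is the default TWS paper-trading port; TWS live, IB Gateway live, and IB Gateway paper conventionally use 7496, 4001, and 4002):

```python
from ib_insync import IB

ib = IB()
# 7497 = TWS paper trading; 7496 = TWS live; 4002/4001 = IB Gateway paper/live
ib.connect('127.0.0.1', 7497, clientId=1)

print(ib.accountSummary())  # quick sanity check that the session works
ib.disconnect()
```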

*TWS does provide you with realized and unrealized pnl; however, it's considered good practice to perform this accounting on your own (if you are using the Python IB package ib_insync, this can be done by using the ib.positions() and ib.executions() methods to collect information on your existing and pre-existing trades/positions). Some of the many reasons for calculating your own pnl are as follows: IB calculates unrealized pnl based on 'last' prices rather than the live bid/ask prices, and IB also adjusts your average position costs/prices based on wash sales (if you are unfamiliar with wash sales, here is an IB article on wash sales), which will significantly distort both your realized and unrealized pnl, though not your net liquidation value.
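For instance, here's a minimal sketch of marking unrealized pnl to the live bid/ask midpoint rather than IB's 'last'-price marks. It assumes live market data permissions and that routing the data request through SMART is acceptable for your contracts:

```python
from ib_insync import IB

ib = IB()
ib.connect('127.0.0.1', 7497, clientId=2)

for pos in ib.positions():
    contract = pos.contract
    contract.exchange = 'SMART'          # position contracts come without a routing exchange
    [ticker] = ib.reqTickers(contract)   # snapshot of live quotes
    mid = (ticker.bid + ticker.ask) / 2  # mark to the midpoint, not 'last'
    unrealized = (mid - pos.avgCost) * pos.position  # avgCost is per share for stocks
    print(f'{contract.symbol}: {unrealized:,.2f}')
```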

If you decide to use TWS because of the GUI, I recommend setting up a virtual network computing (VNC) server for your GUI to live on. This is especially important if you are working with a team of traders.

IB Gateway takes the headless (no GUI) approach, meaning you won't have a polished interface to look at when monitoring trading strategies. Instead, you will need to create some sort of elegant trading dashboard (which I will talk more about later) that displays and records the trading-related information relevant to you, pulled exclusively through IB's API. However, it is worth mentioning that because IB Gateway uses roughly 40-50% fewer system resources than TWS, you may see lower latency when placing orders through the API. There are plenty of well-written resources online that discuss the pros and cons of each.

Live Trading Safety Nets

Now that you have access to granular data and in-depth backtests and have set up the infrastructure needed to trade, it is time to start sculpting your production trading algorithm. Integrating the basic trading logic from your backtest with the brokerage firm's API will take time. Additionally, depending on the type of strategy you will be trading, developing elegant order execution logic and a discrete (or continuous) dynamic position sizing model will also take some time. Once your script is built, here are some of the safety nets to incorporate into your trading pipeline (whether through your central algorithm or through support algorithms running in parallel with it), with a sketch of one of them after this list:

- A maximum daily loss "kill switch" that cancels working orders, flattens positions, and halts trading once losses breach a preset threshold.
- Pre-trade sanity checks that reject any order whose price is far from the current bid/ask or whose size exceeds what your position sizing model should ever produce.
- Per-strategy and per-symbol position/exposure limits, checked before every order is sent.
- A heartbeat/watchdog monitor so that a crashed or frozen process is detected quickly (and its working orders canceled).
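As one example, here's a minimal sketch of a daily-loss kill switch using ib_insync. The loss threshold and polling interval are placeholders, and a production version would run as its own watchdog process alongside your central algorithm:

```python
from ib_insync import IB, MarketOrder

MAX_DAILY_LOSS = -5_000.0  # placeholder threshold in account currency

ib = IB()
ib.connect('127.0.0.1', 7497, clientId=3)

account = ib.managedAccounts()[0]
pnl = ib.reqPnL(account)  # subscribe to live daily pnl (fields are NaN until the first update)
ib.sleep(2)

while True:
    if pnl.dailyPnL < MAX_DAILY_LOSS:  # NaN compares False, so no update means no trigger
        ib.reqGlobalCancel()           # cancel all working orders
        for pos in ib.positions():     # flatten every open position at market
            pos.contract.exchange = 'SMART'
            action = 'SELL' if pos.position > 0 else 'BUY'
            ib.placeOrder(pos.contract, MarketOrder(action, abs(pos.position)))
        break                          # halt trading for the rest of the day
    ib.sleep(5)                        # poll every few seconds
```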

Some other things to consider are reconnect/disconnect logic, handling trading halts, continuously saving parameters and prices in a database, and ensuring that your pnl accounting is precise. Lastly, I'd like to mention the importance of a live-trading dashboard. Having a centralized place to access the unrealized pnl of each strategy in your portfolio, along with continually updated strategy performance metrics (Sharpe ratio, max drawdown, etc.), is a significant asset. There are some great online resources for making a live-trading dashboard (I might even make a blog about it at some point soon).

Conclusion

The objective of this blog was to give a high-level overview of the steps required to take your trading strategy from research to production. I acknowledge that there are many things I may have failed to include, but I hope this can serve as a beneficial one-stop shop for someone interested in learning the process of putting on a live trading strategy.