• Australian (ASX) Stock Market Forum

Python - getting started

Discussion in 'Software and Data' started by jjbinks, Jan 8, 2017.

  1. jjbinks

    jjbinks

    Posts:
    278
    Likes Received:
    246
    Joined:
    May 14, 2013
    There's been much talk about Python. Over the years I have completed a number of online courses on Python, and on data science with Python. However, I have never really used Python outside of these courses.

    I've finally decided to start using Python, first to backtest some ideas and then to try implementing machine learning tools to find new trading ideas.

    I am hoping to share my baby steps in using Python. I know that my methods may not be the most efficient way of doing things and there may be some gaps, so hopefully I can get some feedback. This may also be useful for others trying to get started with Python for trading.


    If you are interested in learning Python, there are many online courses. The courses I have done, mostly through edX:
    1) edX - Introduction to Computer Science - https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x-9
    2) edX - Programming with Python for Data Science - https://courses.edx.org/courses/course-v1:Microsoft+DAT210x+6T2016/info
    3) Udemy - Machine Learning for Trading (not that useful)
     
  2. jjbinks

    jjbinks

    Posts:
    278
    Likes Received:
    246
    Joined:
    May 14, 2013
    I am working with Premium Data's ASX stock data.

    I use the Premium Data Converter to convert the ASX data into one CSV file per ticker, all saved in a single folder.

    Code:
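    import pandas as pd

    # Read the ticker list, one symbol per line
    with open('Lists/SPASX100.asx.txt') as f:
        asx100 = f.read().splitlines()

    # Build a dictionary mapping each ticker to its DataFrame of price history
    asxdict = {}
    for c in asx100:
        filepath = 'Equities/' + c + '.csv'
        asxdict[c] = pd.read_csv(filepath)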
    This code builds a dictionary of DataFrames, one per ticker, which can then be used for further work.

    I have also tested the data using the code from https://www.quantstart.com/articles/Research-Backtesting-Environments-in-Python-with-pandas
    That code implements a very basic backtesting model. It can now be tweaked, and it can also be used to test other ideas.
    FYI - the code on that website has a few problems. PM me if you want to know how I tweaked it. I did not post the code since it would essentially just be copying.
     
    Last edited: Jan 8, 2017
  3. DaveDaGr8

    DaveDaGr8

    Posts:
    110
    Likes Received:
    34
    Joined:
    Jun 29, 2007
    Awesome idea. I am a C++ man myself, but I've recently started teaching the kids Python, which means teaching myself as well.

    For anyone who's interested in trying it out, install the Anaconda distribution of Python. It comes preloaded with NumPy, scikit-learn, pandas and a whole bunch of other libraries that will become useful. It saves a lot of messing around.
     
  4. Lone Wolf

    Lone Wolf

    Posts:
    404
    Likes Received:
    63
    Joined:
    Dec 4, 2008
    The first question every Python learner has - Which version of Python are you using?
     
  5. ThingyMajiggy

    ThingyMajiggy

    Posts:
    1,627
    Likes Received:
    43
    Joined:
    Nov 12, 2007
    Been doing the same myself.

    Some more courses/links:

    Andrew Ng's Machine Learning from Stanford on Coursera, one of the most popular and recommended courses on ML.

    Sentdex's YouTube channel. The guy is a Python whizz and has a lot of material on finance and machine learning. I followed along with one of the machine learning tutorials and got a chart that makes a "forecast", which is supposed to extend into the future, but the dates or something are stuffing up at the moment. He also runs Sentdex (where his channel name comes from), a "sentiment index" that gauges sentiment on stocks and indices from Twitter. Pretty cool.

    Quantiacs is more of a Python trading/backtesting platform, but it looks pretty sweet. The tutorial section has some good videos on ML, on building systems in general, and on using Quantiacs itself. They also run a competition for the best systems.

    Anaconda Distribution: definitely use this as your Python install. It comes with all the libraries you need for this stuff. They are all free anyway, but it saves the pain in the bum of installing each library manually.

    I'd go with Python 3. I used to say 2, but 3 seems to be taking charge lately. The only reason people stuck with 2 was that most libraries were written for it, but I think they're transitioning over now. The syntax differences are minimal anyway, so it's hardly any different.
     
  6. howardbandy

    howardbandy

    Posts:
    838
    Likes Received:
    93
    Joined:
    Jun 13, 2007
    Greetings --

    Perfect advice.

    The Anaconda distribution of Python is available for any operating system -- Linux, Windows, or MacOS.

    I recommend version 2.7.12. Version 3 might be the future, but eventually you will run into a library you want to use that has not yet been converted from V2 to V3. Many of the best tutorials are version-neutral, and many related to machine learning use V2.

    Best, Howard
     
  7. howardbandy

    howardbandy

    Posts:
    838
    Likes Received:
    93
    Joined:
    Jun 13, 2007
    Greetings --

    I have had several conversations with Richard Dale about having a Python / Premium Data API. I am a big fan of Premium Data and I hope that happens.

    Until it does, stock and fund data can be loaded directly from the quote vendor into a pandas DataFrame (a table, much like a labelled array) from Yahoo, Google, or Quandl. No local files are required -- although you can set them up if you so desire. All are available free. As usual, free is not always best. If the quality is not to your standards, Quandl has a premium offering available with a subscription.

    Best, Howard
     
  8. jjbinks

    jjbinks

    Posts:
    278
    Likes Received:
    246
    Joined:
    May 14, 2013
    I'm using Python 2.7.12 with Spyder/Anaconda.

    I started out using Eclipse but switched to Anaconda. I definitely prefer Anaconda.

    As for data, it would be nice if Premium Data had an API, especially for automatically updating your data. But I think it's not too hard to download the data and convert it into a usable format.
     
  9. DaveDaGr8

    DaveDaGr8

    Posts:
    110
    Likes Received:
    34
    Joined:
    Jun 29, 2007
    What do you need it to do? You can use the Windows Task Scheduler to run the Data Updater.
    You can also set up a JScript file and schedule it to automatically dump the AmiBroker (Premium Data) data.

    Is this something that would work, or do you need something different?
    All my updates and explorations are run at 5am for ASX and 5pm for US.
     
  10. jjbinks

    jjbinks

    Posts:
    278
    Likes Received:
    246
    Joined:
    May 14, 2013
    Yep, something like that would work. I'm not particularly familiar with JScript, but a script that automatically runs the Data Updater followed by the data converter would save a few minutes every day. :)
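
    A minimal sketch of that idea in Python rather than JScript: run the update step, and only run the conversion step if the update succeeded. The actual command lines for the Data Updater and the converter are not given in this thread, so the two commands below are hypothetical placeholders (trivial Python one-liners) to be swapped for the real ones. The helper name run_in_sequence is also made up for illustration.

```python
import subprocess
import sys


def run_in_sequence(commands):
    """Run each command in order; stop at the first one that fails.

    Returns the list of return codes for the commands that were run.
    """
    return_codes = []
    for command in commands:
        completed = subprocess.run(command)
        return_codes.append(completed.returncode)
        if completed.returncode != 0:
            break  # do not convert data if the update failed
    return return_codes


if __name__ == "__main__":
    # Hypothetical placeholders: substitute the real updater and converter
    # command lines. Two trivial Python commands stand in for them here.
    updater = [sys.executable, "-c", "print('update step')"]
    converter = [sys.executable, "-c", "print('convert step')"]
    print(run_in_sequence([updater, converter]))
```

    A scheduler such as the Windows Task Scheduler could then run this one script each morning instead of two separate jobs.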
     
  11. DaveDaGr8

    DaveDaGr8

    Posts:
    110
    Likes Received:
    34
    Joined:
    Jun 29, 2007
  12. jjbinks

    jjbinks

    Posts:
    278
    Likes Received:
    246
    Joined:
    May 14, 2013
  13. captain black

    captain black

    Posts:
    1,279
    Likes Received:
    742
    Joined:
    Oct 24, 2005
    Great thread jjbinks :)

    I've been using Python on Linux for years but haven't used it for trading systems at this stage. Watching with interest :)
     
  14. DaveDaGr8

    DaveDaGr8

    Posts:
    110
    Likes Received:
    34
    Joined:
    Jun 29, 2007
    You have to remember that amibroker uses a standard backtesting module. You can override it, but that would be a lot more than 2 lines of code.

    As you say, once you have a good Portfolio class you can just reuse it everywhere. You will end up with a handful of functions but really only use buy(xxx) or Sell(xxx).

    If you haven't already done so, google MVC (Model-View-Controller) architecture. Basically, split your data from your algorithms from your GUI/CRT output. It's hard at first, but things end up less like spaghetti and more like something you could pick up in a year and work on again. It also makes it easy to change your view module, or to have multiple views, such as a normal program, a web interface and a mobile phone interface.
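
    A rough sketch of that separation, assuming a very simple long-only Portfolio. Only the Portfolio class and its buy/sell methods come from the discussion above; render_text, run_orders, the order tuple layout and the BHP prices are hypothetical, chosen just to show the three roles.

```python
class Portfolio:
    """Model: holds cash and positions, knows nothing about display."""

    def __init__(self, cash):
        self.cash = cash
        self.positions = {}  # ticker -> number of shares held

    def buy(self, ticker, shares, price):
        cost = shares * price
        if cost > self.cash:
            raise ValueError("not enough cash")
        self.cash -= cost
        self.positions[ticker] = self.positions.get(ticker, 0) + shares

    def sell(self, ticker, shares, price):
        held = self.positions.get(ticker, 0)
        if shares > held:
            raise ValueError("not enough shares")
        self.cash += shares * price
        if shares == held:
            self.positions.pop(ticker, None)  # position fully closed
        else:
            self.positions[ticker] = held - shares


def render_text(portfolio):
    """View: turns the model into text; could be swapped for a web view."""
    lines = ["cash: {:.2f}".format(portfolio.cash)]
    for ticker in sorted(portfolio.positions):
        lines.append("{}: {}".format(ticker, portfolio.positions[ticker]))
    return "\n".join(lines)


def run_orders(portfolio, orders):
    """Controller: applies (side, ticker, shares, price) orders to the model."""
    for side, ticker, shares, price in orders:
        if side == "buy":
            portfolio.buy(ticker, shares, price)
        elif side == "sell":
            portfolio.sell(ticker, shares, price)
        else:
            raise ValueError("unknown side: " + side)
    return portfolio


if __name__ == "__main__":
    orders = [("buy", "BHP", 100, 25.0), ("sell", "BHP", 40, 26.0)]
    print(render_text(run_orders(Portfolio(10000.0), orders)))
```

    Because render_text only reads from the Portfolio, replacing it with a chart or a web page would not touch the trading logic.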
     
  15. Wysiwyg

    Wysiwyg Everyone wants money

    Posts:
    8,424
    Likes Received:
    260
    Joined:
    Aug 8, 2006
    Had a look at the link because I am always interested in learning something within my intelligence capabilities and found this quote disturbing
    4 million lines of code and then optimise the lookback periods. Everyone should know that optimising to fit the historical data is fruitloops and you can never beat the lag of an indicator that uses average price data.
     
    Last edited: Jan 10, 2017
  16. howardbandy

    howardbandy

    Posts:
    838
    Likes Received:
    93
    Joined:
    Jun 13, 2007
    Greetings --

    My Data Updater runs on its own schedule. It would be the conversion program that needs a local scheduled execution manager.

    We need Richard Dale to outline the best procedures for us. He regularly reads and enters conversations such as these.

    My understanding is the following, and I may be wrong:
    1. The "native" format for Premium Data stored on each subscriber's personal computer is MetaStock.
    2. Premium Data has some tools that convert to formats that many technical analysis and charting packages use. One of those tools is The Premium Data Converter. Webpage here:
    https://www.premiumdata.net/support/pdc.php
    3. Premium Data Converter can export into csv files.
    4. Python reads csv files easily. Information in several places. You will probably want the pandas version listed third:
    https://docs.python.org/2/library/csv.html
    http://www.pythonforbeginners.com/systems-programming/using-the-csv-module-in-python/
    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
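
    As a small illustration of step 4, here is a sketch using the standard-library csv module from the first link. The column names and the two sample rows are assumptions standing in for one exported ticker file; the real layout depends on how the converter is configured. read_prices is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample standing in for one exported ticker file.
SAMPLE = """Date,Open,High,Low,Close,Volume
2017-01-03,25.10,25.60,25.00,25.50,1200000
2017-01-04,25.50,25.90,25.30,25.40,980000
"""


def read_prices(handle):
    """Read OHLCV rows from an open CSV file into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "Date": row["Date"],
            "Open": float(row["Open"]),
            "High": float(row["High"]),
            "Low": float(row["Low"]),
            "Close": float(row["Close"]),
            "Volume": int(row["Volume"]),
        })
    return rows


prices = read_prices(io.StringIO(SAMPLE))
closes = [row["Close"] for row in prices]
```

    With a real file, pass open(path, newline="") instead of the StringIO object. pandas.read_csv(path) does the parsing and type conversion in one call and returns a DataFrame, which is why the pandas route is usually the more convenient one.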

    There have been a couple of MetaStock libraries for Python posted, but none has been kept up to date. Here is the most recent (two years since any maintenance):
    https://pypi.python.org/pypi/pyms

    Best, Howard
     
  17. jjbinks

    jjbinks

    Posts:
    278
    Likes Received:
    246
    Joined:
    May 14, 2013
    Yep not the best source for back testing ideas. But it provides a practical introduction in how to use python for back testing. Hopefully no one is trading with that system haha!

    What you described is exactly what I am doing.
     
  18. howardbandy

    howardbandy

    Posts:
    838
    Likes Received:
    93
    Joined:
    Jun 13, 2007
    Greetings --

    AmiBroker, and all the other charting and trading system development platforms, were designed for the specific purpose of reading, displaying, and manipulating financial data, and using a built-in model (decision tree) to interpret signals, generate buy and sell signals, prepare reports, and interface with brokerage applications.

    Python is a general purpose programming language. In order to generate trading signals, the support that is built-in to AmiBroker must be programmed into Python, creating a development platform.

    What we give up in moving from a traditional trading system development platform is all of the built-in applications. What we gain is the ability to expand beyond decision tree models. Whether that is worth while depends on how you use AmiBroker and how you want to use Python.

    You will need to provide the features and capabilities you currently use in AmiBroker in Python. But you will not need to replicate all of AmiBroker if you do not use all of AmiBroker.

    Do not misinterpret my comments. I think AmiBroker is the best of the traditional trading system development platforms. For traders who use the graphical capabilities and / or are satisfied with decision tree models, there is no need to leave AmiBroker for Python.

    The reason to use Python is that it provides the capability to use models other than decision trees, and also to do other calculations in the same program that generates the trades.

    Chapter 8, Model Development - Machine Learning, of my "Quantitative Technical Analysis" book, has fully disclosed and ready to run examples of the following sequence of programs and explanation:
    1. Use AmiBroker to read primary data for a tradable issue, compute the variable-lookback version of the RSI indicator, generate impulse Buy and Sell signals from that indicator for trading Market-on-Close, compute and plot the cumulative equity. Begin on page 274.
    2. Provide a flowchart, discussion, and template code necessary to implement a computer program that replicates the capabilities of the AmiBroker program in step 1. Begin on page 278.
    3. Use Python to implement a program that replicates the AmiBroker program from step 1. This is a very bare-bones trading system development platform. Begin on page 279.
    4. Use Python and state signals to do the same, verifying that state signals are equivalent to impulse signals. Begin on page 283.
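
    One way to picture the state/impulse equivalence in step 4, under the assumption that a "state" signal says on every bar whether the system is long (1) or flat (0), while an "impulse" signal appears only on the bar where that changes. This is an illustrative sketch with hypothetical function names, not the code from the book.

```python
def state_to_impulse(states):
    """Convert per-bar states (1 = long, 0 = flat) into impulse signals.

    Returns "buy" where the state turns on, "sell" where it turns off,
    and None elsewhere. The position is assumed flat before bar 0.
    """
    impulses = []
    previous = 0
    for state in states:
        if state == 1 and previous == 0:
            impulses.append("buy")
        elif state == 0 and previous == 1:
            impulses.append("sell")
        else:
            impulses.append(None)
        previous = state
    return impulses


def impulse_to_state(impulses):
    """Rebuild the per-bar state by carrying the last impulse forward."""
    states = []
    current = 0
    for impulse in impulses:
        if impulse == "buy":
            current = 1
        elif impulse == "sell":
            current = 0
        states.append(current)
    return states


states = [0, 1, 1, 1, 0, 0, 1, 0]
impulses = state_to_impulse(states)
```

    Converting to impulses and back returns the original list of states, which is the sense in which the two representations carry the same information.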

    The AmiBroker function for the program in step 1 to implement the variable-lookback RSI is about 20 lines. The block of control variables is another 12 lines. The code needed to set up the data, assign values to indicators, compute signals, generate trades is about 7 lines. Plotting takes another 5 lines. The whole program is a page and a half.

    The Python program in step 4 requires more code in order to implement those functions built-in to AmiBroker. Loading the libraries is about 8 lines. Computing the custom RSI is about 25 lines. Loading the data and setting the date range is about 20 lines. Computing the indicator and interpreting to generate Buy and Sell signals is about 20 lines. The loop that interprets the Buy and Sell signals, generates the trades, and manages the account balances is about 50 lines. The whole program is about four and a half pages.
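
    To give a feel for the trade loop described above, here is a much-reduced sketch, not the book's listing. It assumes all-in/all-out trades filled at the close of the signal bar, fractional shares, and no commission; run_backtest and the sample prices are hypothetical.

```python
def run_backtest(closes, signals, starting_cash=10000.0):
    """Trade all-in / all-out at the close of each signal bar.

    closes  : list of closing prices
    signals : list of "buy", "sell" or None, one per bar
    Returns the equity (cash plus holdings marked at the close) per bar.
    """
    cash = starting_cash
    shares = 0.0
    equity = []
    for close, signal in zip(closes, signals):
        if signal == "buy" and shares == 0:
            shares = cash / close  # fractional shares, no commission
            cash = 0.0
        elif signal == "sell" and shares > 0:
            cash = shares * close
            shares = 0.0
        equity.append(cash + shares * close)
    return equity


closes = [10.0, 10.0, 11.0, 12.0, 11.0, 10.0]
signals = [None, "buy", None, "sell", None, None]
equity = run_backtest(closes, signals)
```

    In this toy series the system buys at 10, sells at 12, and stays in cash while the price falls back, so the equity curve rises from 10,000 to 12,000 and then stays flat.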

    Moving on to machine learning.

    Everything done above uses a decision tree model. The Python program in step 4 illustrates a very basic decision tree. To use a different model, begin on page 365. Several functions that might be useful have been coded in Python and are included in the listing: variable-lookback RSI, RateOfChange, z-score, softmax, DetrendedPriceOscillator, NumberZeros, GainAhead, AverageTrueRange, and PriceChange. Even when the documentation is as long as the code, each function is about 30 lines long. Data is read on-the-fly from Quandl. The process of bringing previous values forward, so that each day is a complete and independent data point, is illustrated. Data is divided into in-sample and out-of-sample, and the target is isolated from the predictor variables, in about 10 lines. The Logistic Regression model is fit to the in-sample data in one line, and the out-of-sample data is tested in one line. Using more code than necessary, the confusion matrix showing the accuracy of the model takes about 30 lines (it could be one line).
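
    The data-preparation steps in that description can be sketched in plain Python. This is only an illustration of the ideas (lagging values so each day carries its own history, a chronological in-sample/out-of-sample split, and a confusion matrix for binary labels). The model-fitting step is left out, and all function names and sample values are hypothetical rather than taken from the book.

```python
def add_lags(values, n_lags):
    """Build one row per day holding today's value and the n_lags before it.

    The first n_lags days are dropped because they lack a full history.
    """
    rows = []
    for i in range(n_lags, len(values)):
        rows.append([values[i - k] for k in range(n_lags + 1)])
    return rows


def split_in_out(rows, targets, in_sample_fraction=0.7):
    """Chronological split: earlier rows in-sample, later rows out-of-sample."""
    cut = int(len(rows) * in_sample_fraction)
    return rows[:cut], targets[:cut], rows[cut:], targets[cut:]


def confusion_matrix(actual, predicted):
    """Count outcomes for binary labels (1 = positive, 0 = negative)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts


indicator = [1, 2, 3, 4, 5, 6]          # hypothetical indicator values
rows = add_lags(indicator, 2)           # each row: [today, 1 day ago, 2 days ago]
targets = [1, 0, 1, 1]                  # hypothetical labels, one per row
x_in, y_in, x_out, y_out = split_in_out(rows, targets, 0.5)
```

    The split is by position rather than random so that the out-of-sample rows always come after the in-sample rows in time.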

    Additional processing includes one line to plot, one line to compute terminal equity, one line to implement Monte Carlo analysis. Continuing on, if you wish, one page to implement dynamic position sizing.
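
    The post does not say which form of Monte Carlo analysis is used. As one common possibility, the sketch below resamples a list of trade returns with replacement and looks at the spread of terminal equity. The trade returns, the function names and the choice of bootstrap resampling are all assumptions made for illustration.

```python
import random


def terminal_equity(trade_returns, starting_equity=1.0):
    """Compound a sequence of fractional trade returns into final equity."""
    equity = starting_equity
    for r in trade_returns:
        equity *= 1.0 + r
    return equity


def monte_carlo(trade_returns, n_runs=1000, seed=0):
    """Resample the trades with replacement and collect terminal equities."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n = len(trade_returns)
    results = []
    for _ in range(n_runs):
        sample = [rng.choice(trade_returns) for _ in range(n)]
        results.append(terminal_equity(sample))
    return sorted(results)


trades = [0.02, -0.01, 0.03, -0.02, 0.01]  # hypothetical trade returns
outcomes = monte_carlo(trades)
median = outcomes[len(outcomes) // 2]
```

    Because the results are sorted, percentiles can be read off by index, for example outcomes[50] for roughly the 5th percentile of 1000 runs.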

    Caveats --

    What I have written and published is bare-bones, and is a template. You will be able to switch from Logistic Regression to Support Vector Machine, or any of the other learning algorithms found in scikit-learn, by changing exactly one line of code.

    But do not be fooled. This is a powerful tool, and there is a somewhat steep learning curve to using it properly and safely.

    Best regards, Howard
     
  19. jjbinks

    jjbinks

    Posts:
    278
    Likes Received:
    246
    Joined:
    May 14, 2013
    Hi guys,

    Just an update.
    So far I am working on trying to emulate amibroker.
    I loaded up stock prices as separate arrays and am now trying to code the functions to generate new arrays for things like moving averages, highest values, lowest values.
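
    For illustration, functions of this kind might look like the sketch below, written in plain Python over lists. The names moving_average and rolling and the sample closes are hypothetical, not the code actually used here.

```python
def moving_average(values, period):
    """Simple moving average; None until `period` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(None)
        else:
            window = values[i + 1 - period:i + 1]
            out.append(sum(window) / period)
    return out


def rolling(values, period, func):
    """Apply func (e.g. max or min) to each trailing window of `period` bars."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(None)
        else:
            out.append(func(values[i + 1 - period:i + 1]))
    return out


closes = [10.0, 11.0, 12.0, 11.0, 13.0]
ma3 = moving_average(closes, 3)
hhv3 = rolling(closes, 3, max)  # highest value over the last 3 bars
llv3 = rolling(closes, 3, min)  # lowest value over the last 3 bars
```

    With the prices in a pandas Series, the same results are available through Series.rolling(period) followed by .mean(), .max() or .min().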

    It is actually conceptually very simple, but because my Python is rusty (or perhaps was never very good to begin with) it is taking some time. However, with each function I write I feel my Python knowledge is improving.

    The next step would be to use these arrays to generate signals and try to create a backtesting model. Again, I don't think it will be too difficult to generate signals. The backtest model may be more difficult.
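
    As an example of what signal generation from such arrays could look like, here is a sketch of a simple channel breakout rule. The rule itself, the function name breakout_signals and the sample prices are assumptions picked for illustration, not a system anyone in this thread is trading.

```python
def breakout_signals(closes, period):
    """Impulse signals from a channel breakout.

    "buy"  when the close exceeds the highest close of the prior `period` bars,
    "sell" when it falls below the lowest close of the prior `period` bars,
    None otherwise (including the first `period` bars).
    """
    signals = []
    for i, close in enumerate(closes):
        if i < period:
            signals.append(None)
            continue
        window = closes[i - period:i]  # prior bars only, excludes today
        if close > max(window):
            signals.append("buy")
        elif close < min(window):
            signals.append("sell")
        else:
            signals.append(None)
    return signals


closes = [10.0, 11.0, 10.5, 12.0, 11.5, 10.0, 10.2]
signals = breakout_signals(closes, 3)
```

    The window deliberately excludes the current bar, so today's close is compared only with earlier data.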

    What I am doing so far can be done much more easily in AmiBroker. The reason I am trying to do it with Python is twofold:
    1) I want to improve my Python skills
    2) I think Python will offer more flexibility to customise algorithms in the future. More importantly, it will set up the framework for using some of the machine learning tools if I decide to go down that path, but that is a long, long way away!
     
  20. DaveDaGr8

    DaveDaGr8

    Posts:
    110
    Likes Received:
    34
    Joined:
    Jun 29, 2007
    Are you using Python to actually calculate moving averages, Bollinger Bands etc.? From an educational point of view this is OK, but I think you might suffer speed-wise. It would be good to do some simple timings of:
    1 - raw python with no library calls
    2 - Python with libraries
    3 - Raw Amibroker.
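
    Items 1 and 2 can be timed with the standard timeit module; AmiBroker would have to be timed separately. The sketch below compares a plain-Python moving average with a NumPy version. The function names, data size and period are arbitrary choices, and NumPy is treated as optional so the script still runs without it.

```python
import timeit

try:
    import numpy as np  # optional; the comparison is skipped without it
except ImportError:
    np = None


def sma_raw(values, period):
    """Simple moving average in plain Python, using a running sum."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= period:
            total -= values[i - period]  # drop the bar that left the window
        if i >= period - 1:
            out.append(total / period)
    return out


def sma_numpy(values, period):
    """Same moving average via a cumulative sum in NumPy."""
    csum = np.cumsum(np.insert(np.asarray(values, dtype=float), 0, 0.0))
    return (csum[period:] - csum[:-period]) / period


if __name__ == "__main__":
    data = [float(i % 50) for i in range(100000)]
    print("raw python:", timeit.timeit(lambda: sma_raw(data, 20), number=10))
    if np is not None:
        arr = np.array(data)
        print("numpy:", timeit.timeit(lambda: sma_numpy(arr, 20), number=10))
```

    Both functions return only the fully formed averages, so the output has len(values) - period + 1 entries. Actual timings will depend on the machine and the library versions.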

    Another reason to build a tool like this is that you may not want to run this on windows. Amibroker doesn't play nicely with linux at all.

    Also in 5 to 10 years time, who knows where amibroker will be. Tomasz seems to be a one man show, which is good for him, but when he goes amibroker will never be the same.
     