Why would you want to call Python from MATLAB? There could be a number of reasons.
First, as a single user. You might want to grab features available in Python. For instance, specialized libraries in fields like AI: Machine Learning with Scikit-Learn or XGBoost, Deep-Learning with TensorFlow or PyTorch, Reinforcement Learning with OpenAI Gym, …
Second, you may be working with colleagues who have developed Python functions that you would like to use from MATLAB without recoding them.
Third, you may be deploying your MATLAB application in a Python-based environment where some of the services are written in Python, for instance for data access, as in the weather app from the first chapter.
Since R2021b, you can run Python statements directly from MATLAB with pyrun. This is a convenient way to run short snippets of Python code without having to wrap them in a script.
pyrun("l = [1,2,3]")
pyrun("print(l)")
[1, 2, 3]
As you can see, the pyrun function is stateful: it keeps the variables defined in previous calls. You can retrieve a Python variable on the MATLAB side by naming it as the second argument:
pyrun("l2 = [k**2 for k in l]","l2")
ans =
Python list with values:
[1, 4, 9]
Use string, double or cell function to convert to a MATLAB array.
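A pitfall worth knowing when passing expressions to pyrun: the string is evaluated by Python, so Python operator rules apply, not MATLAB ones. In particular, `^` is the bitwise XOR operator in Python, while `**` is exponentiation. The short sketch below (variable names are illustrative) applies both to the list used above:

```python
l = [1, 2, 3]

# In Python, ^ is bitwise XOR, not a power operator
xor_result = [k ^ 2 for k in l]

# ** is the exponentiation operator
squares = [k ** 2 for k in l]

print(xor_result)  # [3, 0, 1]
print(squares)     # [1, 4, 9]
```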
You can retrieve the list of variables defined in the local scope with the function dir():
D = pyrun("d = dir()","d")
D =
Python list with values:
['__builtins__', '__name__', 'l', 'l2']
Use string, double or cell function to convert to a MATLAB array.
If it feels more convenient to paste your Python code snippet into a script, you can use pyrunfile.
Since MATLAB R2022a, you can develop your own custom live tasks. So, in mid-2021, we started prototyping a Python live task with Lucas Garcia. The truth is: I made a first crappy version, and Lucas turned it into something awesome (Lucas should get all the credit for this). Based on this Minimum Viable Product, we engaged with the development teams, both the MATLAB editor team and the Python interface team. We decided it would be best to release this prototype as open source on GitHub to get early feedback. The code is available at https://github.com/mathworks/MATLAB-Live-Task-for-Python
As of R2024a, you can find this live task directly in the product. To test it, create a new Live Script, and select Task in the Live Editor tab. You should see this icon under EXTERNAL LANGUAGE INTERFACES:
If you click on it, it will add the live task to your Live Script where the cursor is located. Alternatively, you can start typing “python” or “run” directly in your Live Script and select the task:
This is what the in-product version of this Python live task looks like:
You can try it out by entering raw Python code that takes a MATLAB input, like:
import textwrap as tw
format = tw.TextWrapper(width=30)
wrapped = format.wrap(T)
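To try the same snippet outside of MATLAB, here is a standalone sketch. The string assigned to T is a made-up stand-in for the MATLAB input, and the wrapper is named `wrapper` rather than `format` only to avoid shadowing the Python built-in of that name:

```python
import textwrap as tw

# Hypothetical stand-in for the MATLAB variable T passed to the live task
T = ("MATLAB can call Python functions directly, and the Python live task "
     "lets you do so interactively from a Live Script.")

wrapper = tw.TextWrapper(width=30)  # same width as in the snippet above
wrapped = wrapper.wrap(T)           # list of lines, each at most 30 characters

for line in wrapped:
    print(line)
```

The result is a Python list of strings, which MATLAB can convert with the string function.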
All Python function calls from MATLAB have the same basic syntax: the py. prefix, followed by the module name, the function name and its arguments.
The basic example that I usually give to kick things off is calling the square root function from the math module, which is part of the Python standard library. It makes little sense to call Python mathematics functions from MATLAB, but it is easy to compare the result with what you would get directly in MATLAB:
In the MATLAB Command Window:
>> py.math.sqrt(42)
In a MATLAB Live Script:
py.math.sqrt(42)
ans = 6.4807
We can create Python data structures from within MATLAB:
py.list([1,2,3])
ans =
Python list with values:
[1.0, 2.0, 3.0]
Use string, double or cell function to convert to a MATLAB array.
py.list({1,2,'a','b'})
ans =
Python list with values:
[1.0, 2.0, 'a', 'b']
Use string, double or cell function to convert to a MATLAB array.
s = struct('a', 1, 'b', 2)
s =
a: 1
b: 2
d = py.dict(s)
d =
Python dict with no properties.
{'a': 1.0, 'b': 2.0}
And we can run methods on those data structures from the MATLAB side:
methods(d)
Methods for class py.dict:
char copy eq get items le ne popitem struct values
clear dict ge gt keys lt pop setdefault update
Static methods:
fromkeys
Methods of py.dict inherited from handle.
d.get('a')
ans = 1
In this chapter, we will leverage a demo developed by a colleague of mine, Ian McKenna, who is a principal application engineer specialized in Finance at MathWorks. In this example, he is responsible for building enterprise web predictive analytics that other business critical applications can connect to as a web service. It follows the same structure as the weather example in chapter 2.
This web service is forecasting the price of cryptocurrencies:
It returns data in the following form (JSON):
[{"Time":"2022-01-21T12:00:00Z","predictedPrice":2466.17},
...
{"Time":"2022-01-21T17:00:00Z","predictedPrice":2442.25}]
The first step is to develop an application that simply shows the historical price movement of a particular cryptocurrency:
This allows you to monitor the evolution of the price over the last 24 hours and make decisions to buy or sell your crypto assets based on it. Then one day, your manager comes to you and says:
“Hey, I have an idea. If we had access to the predicted forward-looking data as opposed to the historical data, we could make additional profit beyond what we’re currently making, even if the prediction is not 100% accurate.”
Let’s assume the organization has a few quants that have extensive MATLAB expertise. And they know exactly how to build out such predictive models that the business users are looking for.
However, before we can get to that, our first challenge is to call the Python data scraping libraries and pull that data directly into MATLAB. Our first task at hand: parse the cryptocurrency URL that we are connecting to, and extract the domain name. For that, we want to use, from within MATLAB, a function from the Python standard library. In this case, we are going to call the package urllib. It contains a sub-module called parse, which in turn contains the function urlparse.
startDate = '2022-01-21T12:00:00Z';
stopDate = '2022-01-21T17:00:00Z';
url = "https://api.pro.coinbase.com/products/ETH-USD/candles?start="+startDate+"&end="+stopDate+"&granularity=60"
url = "https://api.pro.coinbase.com/products/ETH-USD/candles?start=2022-01-21T12:00:00Z&end=2022-01-21T17:00:00Z&granularity=60"
urlparts = py.urllib.parse.urlparse(url)
urlparts =
Python ParseResult with properties:
fragment
hostname
netloc
params
password
path
port
query
scheme
username
ParseResult(scheme='https', netloc='api.pro.coinbase.com', path='/products/ETH-USD/candles', params='', query='start=2022-01-21T12:00:00Z&end=2022-01-21T17:00:00Z&granularity=60', fragment='')
domain = urlparts.netloc
domain =
Python str with no properties.
api.pro.coinbase.com
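For comparison, this is what the same call looks like on the Python side. It is a minimal sketch that only uses urllib.parse from the standard library and the URL built above:

```python
from urllib.parse import urlparse

url = ("https://api.pro.coinbase.com/products/ETH-USD/candles"
       "?start=2022-01-21T12:00:00Z&end=2022-01-21T17:00:00Z&granularity=60")

parts = urlparse(url)   # ParseResult named tuple
domain = parts.netloc   # network location, i.e. the domain name

print(domain)  # api.pro.coinbase.com
```

The attributes listed by MATLAB (scheme, netloc, path, query, and so on) are the fields of this ParseResult object.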
To avoid the unnecessary back and forth of intermediate data between MATLAB and Python, we write a Python user-defined module, called dataLib.py, with a few functions in it:
jsonData = py.dataLib.getPriceData("ETH", startDate, stopDate)
data = py.dataLib.parseJson(jsonData, [0,4])
dataLib.py imports 1-minute bars from Coinbase Pro. Note that the API does not fetch the first minute specified by the start date, so the times span the interval (start, stop]. To return data, we use a variety of data structures, from NumPy arrays to lists and dictionaries, and even JSON.
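The source of dataLib.py is not listed here. To give an idea of what a function like parseJson could do, here is a simplified, standard-library-only sketch. Everything in it is an assumption made for illustration: the function name parse_json_sketch, the candle layout [time, low, high, open, close, volume], the made-up sample values, and the fact that it returns nested lists where the real module returns a NumPy array.

```python
import json

# Assumed column order of a candle, matching the checkboxes used in this chapter
COLUMN_NAMES = ["Date", "Low", "High", "Open", "Close", "Volume"]


def parse_json_sketch(json_data, selected_columns):
    """Hypothetical stand-in for dataLib.parseJson.

    Returns a tuple (rows, names): the selected columns of every candle
    as nested lists, and the matching column names.
    """
    candles = json.loads(json_data)
    columns = [int(c) for c in selected_columns]  # MATLAB doubles arrive as Python floats
    rows = [[candle[c] for c in columns] for candle in candles]
    names = [COLUMN_NAMES[c] for c in columns]
    return rows, names


# Two made-up candles: [time, low, high, open, close, volume]
sample = json.dumps([
    [1642766400, 2800.0, 2810.0, 2805.0, 2807.3, 12.5],
    [1642766340, 2801.0, 2812.0, 2806.0, 2810.8, 9.1],
])

rows, names = parse_json_sketch(sample, [0, 4])
print(names)  # ['Date', 'Close']
print(rows)
```

Returning both outputs as a tuple is what lets MATLAB index them with data{1} and data{2}.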
This is how you would call this function from MATLAB.
Note: dataLib.py must be on Python’s path
product = "ETH"
product = "ETH"
startDate = '2022-01-21T12:00:00Z';
stopDate = '2022-01-21T17:00:00Z';
jsonData = py.dataLib.getPriceData(product, startDate, stopDate);
If you want to add interactivity to your Live Script, you can add so-called Live Controls. They help point other people to the places where they may want to change parameters or select options for scenario analysis.
You can insert controls from the ribbon:
This is how you would parametrize the Live Control:
Another type of Live Control that is useful here is the checkbox, used to select the information we want to return from the parseJson function:
Dates = true;
Low = false;
High = false;
Open = false;
Close = true;
Volume = false;
% subtract 1 to convert to Python index starting at 0
selectedColumns = find([Dates Low High Open Close Volume])-1
selectedColumns = 1x2
0 4
Note that we subtract 1 from the resulting array to adapt to Python indexing, which starts at 0.
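The same selection can be expressed on the Python side, where indices are zero-based from the start. This small sketch (variable names are illustrative) shows both conventions for the checkbox values above:

```python
# Checkbox states in the order: Dates, Low, High, Open, Close, Volume
flags = [True, False, False, False, True, False]

# Python positions of the selected columns (indexing starts at 0)
zero_based = [i for i, flag in enumerate(flags) if flag]

# Positions as MATLAB's find would report them (indexing starts at 1)
one_based = [i + 1 for i in zero_based]

print(zero_based)  # [0, 4]
print(one_based)   # [1, 5]
```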
% this function returns back two outputs as a tuple
data = py.dataLib.parseJson(jsonData, selectedColumns);
The last thing we will do in this part of the story is to convert the Python function outputs to MATLAB data types (this is covered in the last section of this chapter, on mapping data between Python and MATLAB).
priceData = data{1}
priceData =
Python ndarray:
1.0e+09 *
1.6428 0.0000
1.6428 0.0000
...
1.6428 0.0000
1.6428 0.0000
Use details function to view the properties of the Python object.
Use double function to convert to a MATLAB array.
columnNames = data{2}
columnNames =
Python list with no properties.
['Date', 'Close']
Then we can convert the NumPy array into a MATLAB array simply by using the double command:
priceData = double(priceData)
priceData = 300x2
1.0e+09 *
1.6428 0.0000
1.6428 0.0000
1.6428 0.0000
1.6428 0.0000
1.6428 0.0000
1.6428 0.0000
1.6428 0.0000
1.6428 0.0000
1.6428 0.0000
1.6428 0.0000
Likewise, there are commands for converting lists, such as string (or cell before R2022a):
columnNames = string(columnNames);
Once we have the data in MATLAB, we convert it to a MATLAB table, which is basically the equivalent of a Pandas DataFrame:
data = array2table(priceData, 'VariableNames', columnNames);
Like tables, timetables are built-in data containers that were added to MATLAB in recent years to make both simple and complex tasks easier. If I want to deal with time zones and convert the times, which are expressed in UTC, to the local time of someone in New York, the datetime command performs that conversion:
data.Date = datetime(data.Date, 'ConvertFrom', 'posixtime', 'TimeZone', 'America/New_York')
| | Date | Close |
|---|---|---|
| 1 | 21-Jan-2022 12:00:00 | 2.8073e+03 |
| 2 | 21-Jan-2022 11:59:00 | 2.8108e+03 |
| 3 | 21-Jan-2022 11:58:00 | 2.8051e+03 |
| 4 | 21-Jan-2022 11:57:00 | 2.8071e+03 |
| 5 | 21-Jan-2022 11:56:00 | 2.8051e+03 |
| 6 | 21-Jan-2022 11:55:00 | 2.8028e+03 |
| 7 | 21-Jan-2022 11:54:00 | 2.7984e+03 |
| 8 | 21-Jan-2022 11:53:00 | 2.7983e+03 |
| 9 | 21-Jan-2022 11:52:00 | 2.8062e+03 |
| 10 | 21-Jan-2022 11:51:00 | 2.8054e+03 |
| 11 | 21-Jan-2022 11:50:00 | 2.8061e+03 |
| 12 | 21-Jan-2022 11:49:00 | 2.8012e+03 |
| 13 | 21-Jan-2022 11:48:00 | 2.8008e+03 |
| 14 | 21-Jan-2022 11:47:00 | 2.8030e+03 |
plot(data.Date, data.Close)
Reload Modified User-Defined Python Module
What if you’ve made modifications to the functions inside of your dataLib module? You call them again from MATLAB, but you don’t see any difference. This is because you need to reload the module:
mod = py.importlib.import_module('dataLib');
py.importlib.reload(mod);
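The reason lies in how Python caches imported modules: importing an already loaded module returns the cached copy, and only importlib.reload re-executes the source file. The following self-contained sketch demonstrates this with a throwaway module written to a temporary folder (the module name reload_demo_mod and its answer function are made up for this illustration):

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # keep this quick demo free of cached bytecode

with tempfile.TemporaryDirectory() as tmp:
    module_path = pathlib.Path(tmp) / "reload_demo_mod.py"
    module_path.write_text("def answer():\n    return 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    mod = importlib.import_module("reload_demo_mod")
    first = mod.answer()

    # Edit the module on disk, as you would in your editor
    module_path.write_text("def answer():\n    return 2\n")

    # A second import returns the cached module: the edit is not visible
    cached = importlib.import_module("reload_demo_mod").answer()

    # reload re-executes the source file: the edit is now visible
    reloaded = importlib.reload(mod).answer()

    sys.path.remove(tmp)

print(first, cached, reloaded)  # 1 1 2
```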
You may need to unload the module first, by clearing the classes. This will delete all variables, scripts and classes in your MATLAB workspace.
clear classes
If you’re running Python out-of-process, another approach is to simply terminate the process.
terminate(pyenv)
In some scientific fields like earth and climate sciences, we observe a growing Python community. But as programming skills may vary a lot among researchers and engineers, a MATLAB interface to Python community packages can open up some domain-specific capabilities to the 5M+ MATLAB community.
One great example of this is the Climate Data Store Toolbox developed by Rob Purser, a fellow MathWorker. Rob and I are part of the MathWorks Open Source Program. We are promoting open source, both to support the use of open-source software in MathWorks products and to help MathWorkers contribute their work on GitHub and the MATLAB File Exchange.
In this section, we will demonstrate with the Climate Data Store Toolbox how to build MATLAB toolboxes on top of Python packages. It relies on the CDS Python API created by the European Centre for Medium-Range Weather Forecasts (ECMWF). The toolbox will automatically configure Python, and download and install the cdsapi package (you can also do it manually using pip install cdsapi). You will need to create an account on https://cds.climate.copernicus.eu/ to retrieve data.
The first time you use it, it will prompt you for your CDS API credentials.
Select the data to download from the dataset (see this webpage for options). This is a “MATLABized” version of the Python structure that is generated in the API request.
datasetName ="satellite-sea-ice-thickness";
options.version = "1_0";
options.variable = "all";
options.satellite = "cryosat_2";
options.cdr_type = ["cdr","icdr"];
options.year = ["2011","2021"];
options.month = "03";
[downloadedFilePaths,citation] = climateDataStoreDownload(datasetName,options);
2021-12-03 14:45:47,502 INFO Welcome to the CDS
2021-12-03 14:45:47,504 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/satellite-sea-ice-thickness
2021-12-03 14:45:47,610 INFO Request is completed
2021-12-03 14:45:47,611 INFO Downloading https://download-0012.copernicus-climate.eu/cache-compute-0012/cache/data0/dataset-satellite-sea-ice-thickness-2e9e98de-6daf-4e4e-b54c-6e2d3717bda2.zip to C:\Users\rpurser\AppData\Local\Temp\tp047bab7e_df6a_405a_b357_cda6a03d28f6.zip (4.4M)
2021-12-03 14:45:50,037 INFO Download rate 1.8M/s
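To make the link with the Python side concrete, here is how the same options could be written as a Python dictionary and passed to the cdsapi client. This is a sketch under assumptions: how the toolbox builds its request internally is not shown in this chapter, the download helper and the target file name are hypothetical, and calling it requires the cdsapi package and valid CDS credentials.

```python
dataset_name = "satellite-sea-ice-thickness"

# Python counterpart of the MATLAB options struct above
request = {
    "version": "1_0",
    "variable": "all",
    "satellite": "cryosat_2",
    "cdr_type": ["cdr", "icdr"],
    "year": ["2011", "2021"],
    "month": "03",
}


def download(dataset, request, target):
    """Send the request with the cdsapi client and save the result to target."""
    import cdsapi  # imported lazily so the request can be inspected without it

    client = cdsapi.Client()
    client.retrieve(dataset, request, target)
    return target


# Example call (not executed here; the target file name is hypothetical):
# download(dataset_name, request, "sea_ice_thickness.zip")
```

MATLAB string arrays such as ["cdr","icdr"] correspond to Python lists of strings in this dictionary.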
Once downloaded through Python, the NetCDF files are read in MATLAB using ncread, and the information is stored as a timetable by the function readSatelliteSeaIceThickness:
ice2011 = readSatelliteSeaIceThickness("satellite-sea-ice-thickness\ice_thickness_nh_ease2-250_cdr-v1p0_201103.nc");
ice2021 = readSatelliteSeaIceThickness("satellite-sea-ice-thickness\ice_thickness_nh_ease2-250_icdr-v1p0_202103.nc");
head(ice2021)
| | time | lat | lon | thickness |
|---|---|---|---|---|
| 1 | 01-Mar-2021 | 47.6290 | 144.0296 | 2.4566 |
| 2 | 01-Mar-2021 | 47.9655 | 144.0990 | 2.5800 |
| 3 | 01-Mar-2021 | 50.5072 | 148.0122 | -0.0364 |
| 4 | 01-Mar-2021 | 50.8360 | 148.1187 | 1.0242 |
| 5 | 01-Mar-2021 | 50.3237 | 146.9969 | 0.0518 |
| 6 | 01-Mar-2021 | 51.1642 | 148.2269 | 0.2445 |
| 7 | 01-Mar-2021 | 50.9112 | 147.6573 | 0.8933 |
| 8 | 01-Mar-2021 | 50.6540 | 147.0948 | 0.1271 |
disp(citation)
Generated using Copernicus Climate Change Service information 2021
This toolbox leverages the beautiful geoplotting capabilities of MATLAB:
subplot(1,2,1)
geodensityplot(ice2011.lat,ice2011.lon,ice2011.thickness,"FaceColor","interp")
geolimits([23 85],[-181.4 16.4])
geobasemap("grayterrain")
title("Ice Thickness, March 2011")
subplot(1,2,2)
geodensityplot(ice2021.lat,ice2021.lon,ice2021.thickness,"FaceColor","interp")
geolimits([23 85],[-181.4 16.4])
geobasemap("grayterrain")
title("Ice Thickness, March 2021")
f = gcf;
f.Position(3) = f.Position(3)*2;
In a well-written toolbox like this one, you will find documentation packaged directly with it.
You can create your own toolbox and share it with others. These files can include MATLAB code, data, apps, examples, and documentation. When you create a toolbox, MATLAB generates a single installation file (.mltbx) that enables you or others to install your toolbox.
Read more on how to create and share toolboxes
One of the first difficulties you will face when developing bilingual applications is debugging across the language boundary. In the following examples, we will demonstrate how to attach a MATLAB session to a VS Code or Visual Studio process to debug the Python part of your app. In the next chapter, we will see how to do the opposite: debug the MATLAB part with the nice MATLAB Debugger.
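This section shows in 8 steps how to debug Python code called from MATLAB with VS Code:
1. Install VS Code and create a project.
See this tutorial for instructions on how to install Visual Studio Code, set up a Python project, select a Python interpreter, and create a launch.json file.
2. Install the debugpy module, for example with pip:
$ python -m pip install debugpy
3. Add the following debugging code to the top of your Python module:
import debugpy
debugpy.debug_this_thread()
4. Configure the launch.json file to attach to the MATLAB process:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Attach to MATLAB",
            "type": "python",
            "request": "attach",
            "processId": "${command:pickProcess}"
        }
    ]
}
5. Add breakpoints to your code.
6. Set up your Python environment in MATLAB and get the ProcessID. In this example, ExecutionMode is set to InProcess.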
```matlab
pyenv
ans = PythonEnvironment with properties:
Version: "3.8"
Executable: "C:\Users\ydebray\AppData\Local\Programs\Python\Python38\python.exe"
Library: "C:\Users\ydebray\AppData\Local\Programs\Python\Python38\python38.dll"
Home: "C:\Users\ydebray\AppData\Local\Programs\Python\Python38"
Status: Loaded
ExecutionMode: InProcess
ProcessID: "12772"
ProcessName: "MATLAB"
```
If you see `Status: NotLoaded`, execute any Python command to load the Python interpreter (for example `>> py.list` ) then execute the `pyenv` command to get the `ProcessID` for the MATLAB process.
7. Attach the MATLAB process to VS Code.
In VS Code, select "Run and Debug" (Ctrl+Shift+D), then select the arrow to Start Debugging (F5). In this example, the green arrow has the label "Attach to MATLAB". Note that this corresponds to the value of the "name" parameter that you specified in the `launch.json` file. Type "matlab" in the search bar of the dropdown menu and select the "MATLAB.exe" process that matches the ProcessID from the output of the pyenv command. Note that if you are using OutOfProcess execution mode, you will need to search for a "MATLABPyHost.exe" process.
**In-process:**
![](/matlab-with-python-book/media/image47.png)
**Out-of-Process:**
![](/matlab-with-python-book/media/image48.png)
8. Invoke the Python function from MATLAB. Execution should stop at the
breakpoint.
![](/matlab-with-python-book/media/image49.png)
Run the following MATLAB code to step into the Python function search:
```matlab
>> N = py.list({'Jones','Johnson','James'})
>> py.mymod.search(N)
```
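The mymod module is not listed in this chapter. As an assumption for illustration, a module along these lines would work with the call above: it defines a search function that keeps the names containing a given substring. The default pattern "son" is made up, and the two debugpy lines shown earlier would go at the top of the file when you want to debug it.

```python
# mymod.py (hypothetical contents, for illustration only)
# When debugging, the debugpy lines shown earlier go here, at the top of the file.


def search(words, pattern="son"):
    """Return the words that contain the given pattern."""
    matches = [word for word in words if pattern in word]  # a good place for a breakpoint
    return matches


print(search(["Jones", "Johnson", "James"]))  # ['Johnson']
```

With a breakpoint on the list comprehension, execution stops inside search as soon as MATLAB calls py.mymod.search(N).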
If you have access to Visual Studio and you are more familiar with it, you can do the same as before with Visual Studio. Open Visual Studio and create a new Python project from existing code. Then, select Attach to Process from the Debug menu:
In his book about Python for MATLAB Development, Albert Danial shares some clever functions to convert MATLAB variables into an equivalent Python-native variable with mat2py, and vice-versa with py2mat.
Converting data returned by Python functions in MATLAB may require understanding some of the differences between the native data types of the two languages:
Some specialized MATLAB data types like timetable or categorical will require some extra love and need to be converted manually. Of course, we can still use these data types in our functions, but the functions need to return types that the Python interpreter can understand.
The table below shows the mappings for common scalar data types:
By default, numbers in MATLAB are doubles, whereas numbers without a decimal point in Python are integers.
a = py.dataExchange.get_float()
a = 1
class(a)
ans = 'double'
b = py.dataExchange.get_complex()
b = 2.0000 + 0.0000i
class(b)
ans = 'double'
There are several kinds of integers in MATLAB, depending on the precision you require.
For instance, uint8 can only store integers between 0 and 255, whereas int8 covers the range [-2^7, 2^7-1].
The most general type for converting Python integers is int64, and you can perform this conversion explicitly.
c = py.dataExchange.get_integer()
c =
Python int with properties:
denominator: [1x1 py.int]
imag: [1x1 py.int]
numerator: [1x1 py.int]
real: [1x1 py.int]
3
class(c)
ans = 'py.int'
int64(c)
ans = 3
When getting a string from a Python function, the conversion isn’t obvious. It can be turned into either a char (character array) or a string.
You can distinguish them by the single quotation marks for chars and the double quotation marks for strings.
abc = py.dataExchange.get_string()
abc =
Python str with no properties.
abc
char(abc)
ans = 'abc'
class(char(abc))
ans = 'char'
string(abc)
ans = "abc"
class(string(abc))
ans = 'string'
Finally, the last basic data type, which holds logical information, is called a boolean in Python. It is converted to a MATLAB logical:
py.dataExchange.get_boolean()
ans =
1
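The dataExchange module itself is not listed in this chapter. A minimal sketch that would be consistent with the outputs shown above could look like this (the return values are inferred from the displayed results, so treat them as assumptions):

```python
# dataExchange.py (hypothetical contents, inferred from the outputs above)


def get_float():
    return 1.0            # Python float


def get_complex():
    return complex(2, 0)  # Python complex


def get_integer():
    return 3              # Python int


def get_string():
    return "abc"          # Python str


def get_boolean():
    return True           # Python bool
```

Each function returns one Python scalar type, which makes it easy to check how that type is displayed and converted on the MATLAB side.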
This is how containers map to each other between the two languages:
To illustrate the conversion of Python dictionaries and lists into MATLAB containers, we will reuse the example from chapter 2.
JSON data are really close to dictionaries in Python, which makes the data processing very easy when accessing data from web services.
url = webread("https://samples.openweathermap.org").products.current_weather.samples{1};
r = py.urllib.request.urlopen(url).read();
json_data = py.json.loads(r);
py.weather.parse_current_json(json_data)
ans =
Python dict with no properties.
{'temp': 280.32, 'pressure': 1012, 'humidity': 81, 'temp_min': 279.15, 'temp_max': 281.15, 'speed': 4.1, 'deg': 80, 'lon': -0.13, 'lat': 51.51, 'city': 'London', 'current_time': '2022-05-22 22:15:18.161296'}
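Because JSON objects map so directly onto Python dictionaries, parsing is a single call to json.loads. The sketch below uses a small, made-up payload; its nesting is an assumption loosely modeled on a weather response and not the actual service output, and the flattening step only illustrates the kind of work a parsing function does:

```python
import json

# Made-up JSON payload, for illustration only
payload = ('{"name": "London", '
           '"main": {"temp": 280.32, "pressure": 1012, "humidity": 81}, '
           '"wind": {"speed": 4.1, "deg": 80}}')

data = json.loads(payload)  # nested Python dictionaries

# Flatten the fields of interest into a single dictionary
flat = {
    "temp": data["main"]["temp"],
    "pressure": data["main"]["pressure"],
    "humidity": data["main"]["humidity"],
    "speed": data["wind"]["speed"],
    "deg": data["wind"]["deg"],
    "city": data["name"],
}

print(flat)
```

A flat dictionary like this one converts cleanly to a MATLAB struct, since every key is a valid field name.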
Dictionaries can contain scalars, but also other datatypes like lists.
url2 = webread("https://samples.openweathermap.org").products.forecast_5days.samples{1};
r2 = py.urllib.request.urlopen(url2).read();
json_data2 = py.json.loads(r2);
forecast = struct(py.weather.parse_forecast_json(json_data2))
forecast =
current_time: [1x40 py.list]
temp: [1x40 py.list]
deg: [1x40 py.list]
speed: [1x40 py.list]
humidity: [1x40 py.list]
pressure: [1x40 py.list]
forecastTemp = forecast.temp;
forecastTime = forecast.current_time;
Lists containing only numeric data can be converted into doubles since MATLAB R2022a:
double(forecastTemp)
ans = 1x40
261.4500 261.4100 261.7600 261.4600 260.9810 262.3080 263.7600 264.1820 264.6700 265.4360 266.1040 266.9040 268.1020 270.2690 270.5850 269.6610 269.1550 268.0560 265.8030 263.3810 261.8500 263.4550 264.0150 259.6840 255.1880 255.5940 256.9600 258.1090 259.5330 263.4380 264.2280 261.1530 258.8180 257.2180 255.7820 254.8190 257.4880 259.8270 261.2560 260.2600
Any list can be converted to a string array, even one containing a mix of text and numeric data.
forecastTimeString = string(forecastTime);
datetime(forecastTimeString)
ans = 1x40 datetime
30-Jan-2017 18:00:00   30-Jan-2017 21:00:00   31-Jan-2017 00:00:00   31-Jan-2017 03:00:00   31-Jan-2017 06:00:00   31-Jan-2017 09:00:00   31-Jan-2017 12:00:00   31-Jan-2017 15:00:00   31-Jan-2017 18:00:00   31-Jan-2017 21:00:00   01-Feb-2017 00:00:00   01-Feb-2017 03:00:00   01-Feb-2017 06:00:00   01-Feb-2017 09:00:00   01-Feb-2017 12:00:00   01-Feb-2017 15:00:00   01-Feb-2017 18:00:00   01-Feb-2017 21:00:00   02-Feb-2017 00:00:00   02-Feb-2017 03:00:00   02-Feb-2017 06:00:00   02-Feb-2017 09:00:00   02-Feb-2017 12:00:00   02-Feb-2017 15:00:00   02-Feb-2017 18:00:00   02-Feb-2017 21:00:00   03-Feb-2017 00:00:00   03-Feb-2017 03:00:00   03-Feb-2017 06:00:00   03-Feb-2017 09:00:00
Before MATLAB R2022a, Python lists needed to be converted into MATLAB cell arrays.
Cells can then be converted to doubles or strings with the cellfun function.
Up to R2021b, the previous code would look like this:
forecastTempCell = cell(forecastTemp)
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 261.4500 | 261.4100 | 261.7600 | 261.4600 | 260.9810 | 262.3080 | 263.7600 | 264.1820 | 264.6700 | 265.4360 | 266.1040 | 266.9040 | 268.1020 | 270.2690 | 270.5850 | 269.6610 | 269.1550 | 268.0560 | 265.8030 | 263.3810 | 261.8500 | 263.4550 | 264.0150 | 259.6840 | 255.1880 | 255.5940 | 256.9600 | 258.1090 | 259.5330 | 263.4380 |
cellfun(@double,forecastTempCell)
ans = 1x40
261.4500 261.4100 261.7600 261.4600 260.9810 262.3080 263.7600 264.1820 264.6700 265.4360 266.1040 266.9040 268.1020 270.2690 270.5850 269.6610 269.1550 268.0560 265.8030 263.3810 261.8500 263.4550 264.0150 259.6840 255.1880 255.5940 256.9600 258.1090 259.5330 263.4380 264.2280 261.1530 258.8180 257.2180 255.7820 254.8190 257.4880 259.8270 261.2560 260.2600
forecastTimeCell = cell(forecastTime)
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str | 1x19 str |
cellfun(@string,forecastTimeCell)
ans = 1x40 string
"2017-01-30 18:0… "2017-01-30 21:0… "2017-01-31 00:0… "2017-01-31 03:0… "2017-01-31 06:0… "2017-01-31 09:0… "2017-01-31 12:0… "2017-01-31 15:0… "2017-01-31 18:0… "2017-01-31 21:0… "2017-02-01 00:0… "2017-02-01 03:0… "2017-02-01 06:0… "2017-02-01 09:0… "2017-02-01 12:0… "2017-02-01 15:0… "2017-02-01 18:0… "2017-02-01 21:0… "2017-02-02 00:0… "2017-02-02 03:0… "2017-02-02 06:0… "2017-02-02 09:0… "2017-02-02 12:0… "2017-02-02 15:0… "2017-02-02 18:0… "2017-02-02 21:0… "2017-02-03 00:0… "2017-02-03 03:0… "2017-02-03 06:0… "2017-02-03 09:0…
By modifying the parse_forecast_json function in the weather module, we can output Python arrays instead of lists.
Base Python does indeed provide a native array data type, in the array module of the standard library.
forecast2 = struct(py.weather.parse_forecast_json2(json_data2))
forecast2 =
current_time: [1x40 py.list]
temp: [1x1 py.array.array]
deg: [1x1 py.array.array]
speed: [1x1 py.array.array]
humidity: [1x1 py.array.array]
pressure: [1x1 py.array.array]
The MATLAB double function will convert the Python array into a MATLAB array:
double(forecast2.temp)
ans = 1x40
261.4500 261.4100 261.7600 261.4600 260.9810 262.3080 263.7600 264.1820 264.6700 265.4360 266.1040 266.9040 268.1020 270.2690 270.5850 269.6610 269.1550 268.0560 265.8030 263.3810 261.8500 263.4550 264.0150 259.6840 255.1880 255.5940 256.9600 258.1090 259.5330 263.4380 264.2280 261.1530 258.8180 257.2180 255.7820 254.8190 257.4880 259.8270 261.2560 260.2600
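On the Python side, turning a list into such an array takes one call to the array constructor from the standard library. The sketch below is only an illustration (the parse_forecast_json2 function itself is not listed in this chapter); it uses the first few temperatures shown above and the type code 'd', which stands for double-precision floats:

```python
from array import array

temps_list = [261.45, 261.41, 261.76]

# 'd' is the type code for C doubles: every element must be a number
temps_array = array("d", temps_list)

print(temps_array.typecode)  # d
print(list(temps_array))     # [261.45, 261.41, 261.76]

# Unlike a list, a typed array rejects elements of another type
try:
    array("d", ["not a number"])
    rejected = False
except TypeError:
    rejected = True

print(rejected)  # True
```

Because every element has the same numeric type, such an array can be converted to a MATLAB array in one step.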
These data conversions also apply to NumPy arrays:
npA = py.numpy.array([1,2,3;4,5,6;7,8,9])
npA =
Python ndarray:
1 2 3
4 5 6
7 8 9
Use details function to view the properties of the Python object.
Use double function to convert to a MATLAB array.
double(npA)
ans = 3x3
1 2 3
4 5 6
7 8 9
One common question on data transfer is how to exchange data between MATLAB tables and Pandas DataFrames. Since R2024a, Pandas DataFrames can be cast into MATLAB tables:
d.a=[1,2,3];
d.b=[4,5,6];
df = py.pandas.DataFrame(d)
df =
Python DataFrame with properties:
T: [1x1 py.pandas.core.frame.DataFrame]
at: [1x1 py.pandas.core.indexing._AtIndexer]
attrs: [1x1 py.dict]
axes: [1x2 py.list]
columns: [1x1 py.pandas.core.indexes.base.Index]
dtypes: [1x1 py.pandas.core.series.Series]
empty: 0
flags: [1x1 py.pandas.core.flags.Flags]
iat: [1x1 py.pandas.core.indexing._iAtIndexer]
iloc: [1x1 py.pandas.core.indexing._iLocIndexer]
index: [1x1 py.pandas.core.indexes.range.RangeIndex]
loc: [1x1 py.pandas.core.indexing._LocIndexer]
ndim: [1x1 py.int]
shape: [1x2 py.tuple]
size: [1x1 py.numpy.int32]
style: [1x1 py.pandas.io.formats.style.Styler]
values: [1x1 py.numpy.ndarray]
a b
0 1.0 4.0
1 2.0 5.0
2 3.0 6.0
T = table(df)
| |a|b|
|:--:|:--:|:--:|
|1|1|4|
|2|2|5|
|3|3|6|

Before R2024a, the recommended solution was to rely on Parquet files. Parquet is a columnar storage format that makes it possible to store and transfer tabular data between languages. It is available to any project in the Hadoop big data ecosystem, regardless of the choice of data processing framework, data model or programming language (more on Parquet).
This example demonstrates a back and forth between Pandas DataFrames and MATLAB Tables:
pq_CreateDataFrame.py
import pandas as pd
import numpy as np
# create dataframe
df = pd.DataFrame({'column1': [-1, np.nan, 2.5],
'column2': ['foo', 'bar', 'tree'],
'column3': [True, False, True]})
print(df)
# save dataframe to parquet file via pyarrow library
df.to_parquet('data.parquet', index=False)
Read in parquet file
% info = parquetinfo('data.parquet')
data = parquetread('data.parquet')
| | column1 | column2 | column3 |
|---|---|---|---|
| 1 | -1 | "foo" | 1 |
| 2 | NaN | "bar" | 0 |
| 3 | 2.5000 | "tree" | 1 |
Examine datatype of a particular column
class(data.column2)
ans = 'string'
Change data in table
data.column2 = ["orange"; "apple"; "banana"];
Write the results back to parquet
parquetwrite('newdata.parquet', data)
Finally read the modified DataFrame back in Python:
pq_ReadTable.py
import pandas as pd
import os
# change to current directory
thisDirectory = os.path.dirname(os.path.realpath(__file__))
os.chdir(thisDirectory)
# read parquet file via pyarrow library
df = pd.read_parquet('newdata.parquet')
print(df)