<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://www.inderscience.com/current_issue_rss/index.php?journal=ijhpsa">
<title>Most recent issue published online for the International Journal of High Performance Systems Architecture.</title>
<description>International Journal of High Performance Systems Architecture</description>
<link>http://www.inderscience.com/browse/index.php?journalID=213&amp;year=2011&amp;vol=3&amp;issue=2/3</link>
<dc:publisher>Inderscience Publishers Ltd</dc:publisher>
<dc:language>en-GB</dc:language>
<prism:publicationName>International Journal of High Performance Systems Architecture</prism:publicationName>
<prism:issn>1751-6528</prism:issn>
<prism:eIssn>1751-6536</prism:eIssn>
<prism:copyright>&#169; 2011 Inderscience Publishers Ltd</prism:copyright>
<prism:rightsAgent>editor@inderscience.com</prism:rightsAgent>
<image rdf:resource="https://www.inderscience.com/images/files/coverImgs/ijhpsa_scoverijhpsa.jpg" />
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040460" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040461" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040462" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040463" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040464" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040465" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040466" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040467" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040468" />
<rdf:li rdf:resource="http://dx.doi.org/10.1504/IJHPSA.2011.040469" />
</rdf:Seq>
</items>
</channel>
<image rdf:about="https://www.inderscience.com/images/files/coverImgs/ijhpsa_scoverijhpsa.jpg">
<title>International Journal of High Performance Systems Architecture</title>
<url>https://www.inderscience.com/images/files/coverImgs/ijhpsa_scoverijhpsa.jpg</url>
<link>http://www.inderscience.com/browse/index.php?journalID=213&amp;year=2011&amp;vol=3&amp;issue=2/3</link>
</image>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040460">
<title>A minimalist cache coherent MPSoC designed for FPGAs</title>
<link>http://www.inderscience.com/link.php?id=40460</link>
<description>We describe the design and VHDL implementation of a cache coherent MPSoC named minimalist cache coherent MPSoC &#40;MCCM&#41;. The system comprises one to eight MIPS&#45;I processors, coherent primary data caches, memory management units, memory controller and the interconnection. We present a detailed account of the implementation, focusing on the shared memory subsystem. A simple benchmark is used to assess the overall system functionality. We compared the size of our design to that of a LEON3&#45;based multiprocessor and found that a four&#45;core LEON3 system needs roughly the same amount of logic/state as a six&#45; to eight&#45;core MCCM.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40460"><b>A minimalist cache coherent MPSoC designed for FPGAs</b></a><br />Jorge Tortato Junior, Roberto A. Hexsel<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 67 - 76</i><br />We describe the design and VHDL implementation of a cache coherent MPSoC named minimalist cache coherent MPSoC &#40;MCCM&#41;. The system comprises one to eight MIPS&#45;I processors, coherent primary data caches, memory management units, memory controller and the interconnection. We present a detailed account of the implementation, focusing on the shared memory subsystem. A simple benchmark is used to assess the overall system functionality. We compared the size of our design to that of a LEON3&#45;based multiprocessor and found that a four&#45;core LEON3 system needs roughly the same amount of logic/state as a six&#45; to eight&#45;core MCCM.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040460</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 67 - 76</dc:source>
<dc:creator>Jorge Tortato Junior</dc:creator>
<dc:creator>Roberto A. Hexsel</dc:creator>
<dc:contributor>Datacom Telematica, Av. Carlos de Carvalho 603, Sala 122, CEP 80430&#45;180, Curitiba, PR, Brazil. &#39; Departamento de Informatica, Universidade Federal do Parana &#40;UFPR&#41;, Caixa Postal 19.081 &#8211; CEP 81531&#45;990, Curitiba, PR, Brazil</dc:contributor>
<dc:subject>multicore</dc:subject>
<dc:subject>shared memory multiprocessors</dc:subject>
<dc:subject>cache coherence</dc:subject>
<dc:subject>FPGA</dc:subject>
<dc:subject>MPSoC</dc:subject>
<dc:subject>VHDL implementation</dc:subject>
<dc:subject>field programmable gate arrays</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>67</prism:startingPage>
<prism:endingPage>76</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040461">
<title>Dynamic workload balancing deques for branch and bound algorithms in the message passing interface</title>
<link>http://www.inderscience.com/link.php?id=40461</link>
<description>The message passing interface &#40;MPI&#41; is the standard in message passing parallel computation. MPI does not provide a canonical way to dynamically distribute run&#45;time generated workload evenly across all the participating computer nodes. This paper proposes an MPI library to provide near&#45;optimal dynamic workload balancing over branch and bound &#40;B&amp;B&#41; algorithms; B&amp;B potentially produces huge workload unbalance during a parallel execution. The library, named RaWSDM, provides a double ended queue &#40;deque&#41; data structure on which the programmer may pop, process, and later, pull back some parallel tasks; an underlying efficient system scheduler is responsible for keeping the workload balanced by exchanging elements among all deques. Theoretical bounds are traced and practical experiments are performed with the unlimited knapsack problem. Results show a performance gain of up to 80% &#40;best&#45;case scenario&#41; against a pure MPI implementation using round&#45;robin scheduling, with near linear speedup and memory consumption.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40461"><b>Dynamic workload balancing deques for branch and bound algorithms in the message passing interface</b></a><br />Stefano Mor, Nicolas Maillard<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 77 - 86</i><br />The message passing interface &#40;MPI&#41; is the standard in message passing parallel computation. MPI does not provide a canonical way to dynamically distribute run&#45;time generated workload evenly across all the participating computer nodes. This paper proposes an MPI library to provide near&#45;optimal dynamic workload balancing over branch and bound &#40;B&amp;B&#41; algorithms; B&amp;B potentially produces huge workload unbalance during a parallel execution. The library, named RaWSDM, provides a double ended queue &#40;deque&#41; data structure on which the programmer may pop, process, and later, pull back some parallel tasks; an underlying efficient system scheduler is responsible for keeping the workload balanced by exchanging elements among all deques. Theoretical bounds are traced and practical experiments are performed with the unlimited knapsack problem. Results show a performance gain of up to 80% &#40;best&#45;case scenario&#41; against a pure MPI implementation using round&#45;robin scheduling, with near linear speedup and memory consumption.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040461</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 77 - 86</dc:source>
<dc:creator>Stefano Mor</dc:creator>
<dc:creator>Nicolas Maillard</dc:creator>
<dc:contributor>Informatics Institute, Federal University of Rio Grande do Sul, Av. Bento Goncalves 9500, Porto Alegre, RS, Brazil. &#39; Informatics Institute, Federal University of Rio Grande do Sul, Av. Bento Goncalves 9500, Porto Alegre, RS, Brazil</dc:contributor>
<dc:subject>message passing interface</dc:subject>
<dc:subject>MPI</dc:subject>
<dc:subject>scheduling</dc:subject>
<dc:subject>work stealing</dc:subject>
<dc:subject>branch and bound</dc:subject>
<dc:subject>parallel tasks</dc:subject>
<dc:subject>dynamic workload balancing</dc:subject>
<dc:subject>workload unbalance</dc:subject>
<dc:subject>parallel computing</dc:subject>
<dc:subject>double ended queue</dc:subject>
<dc:subject>deque data structure</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>77</prism:startingPage>
<prism:endingPage>86</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040462">
<title>Challenges and solutions to improve the scalability of an operational regional meteorological forecasting model</title>
<link>http://www.inderscience.com/link.php?id=40462</link>
<description>This work investigates the parallel scalability of BRAMS, a limited area weather forecasting production code, from O&#40;100&#41; cores to O&#40;1,000&#41; cores on large grids &#40;20 km and 10 km resolution runs over South America&#41;. Initial experiments show lack of scalability at modest core count. Execution time profiling and source code examination revealed the causes of the limited scalability&#58; sequential algorithms and extensive memory requirements at scarcely used phases of the computation. As processor count increases, these &#39;secondary&#39; phases dominate execution time. Algorithm replacement and memory reduction generate a new code version that possesses strong and weak scaling. The new version achieved a speed&#45;up of 6 from 100 to 700 processors on the 20 km resolution grid and a speed&#45;up of 6.9 on the same processor range on the 10 km resolution grid. Results were confirmed at another machine with a distinct architecture. Further experiments show that the scalability of the 20 km resolution case is limited by load unbalancing at the most demanding computational phase.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40462"><b>Challenges and solutions to improve the scalability of an operational regional meteorological forecasting model</b></A><br />Alvaro L. Fazenda, Jairo Panetta, Daniel M. Katsurayama, Luiz F. Rodrigues, Luis F.G. Motta, Philippe O.A. Navaux<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 87 - 97</i><br />This work investigates the parallel scalability of BRAMS, a limited area weather forecasting production code, from O&#40;100&#41; cores to O&#40;1,000&#41; cores on large grids &#40;20 km and 10 km resolution runs over South America&#41;. Initial experiments show lack of scalability at modest core count. Execution time profiling and source code examination revealed the causes of the limited scalability&#58; sequential algorithms and extensive memory requirements at scarcely used phases of the computation. As processor count increases, these &#39;secondary&#39; phases dominate execution time. Algorithm replacement and memory reduction generate a new code version that possesses strong and weak scaling. The new version achieved a speed&#45;up of 6 from 100 to 700 processors on the 20 km resolution grid and a speed&#45;up of 6.9 on the same processor range on the 10 km resolution grid. Results were confirmed at another machine with a distinct architecture. Further experiments show that the scalability of the 20 km resolution case is limited by load unbalancing at the most demanding computational phase.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040462</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 87 - 97</dc:source>
<dc:creator>Alvaro L. Fazenda</dc:creator>
<dc:creator>Jairo Panetta</dc:creator>
<dc:creator>Daniel M. Katsurayama</dc:creator>
<dc:creator>Luiz F. Rodrigues</dc:creator>
<dc:creator>Luis F.G. Motta</dc:creator>
<dc:creator>Philippe O.A. Navaux</dc:creator>
<dc:contributor>Federal University of Sao Paulo, Institute of Science and Technology, Sao Jose dos Campos, Brazil. &#39; Brazilian National Institute for Space Research, Center for Weather Prediction and Climate Studies, Cachoeira Paulista, Brazil. &#39; Brazilian National Institute for Space Research, Center for Weather Prediction and Climate Studies, Cachoeira Paulista, Brazil. &#39; Brazilian National Institute for Space Research, Center for Weather Prediction and Climate Studies, Cachoeira Paulista, Brazil. &#39; Brazilian National Institute for Space Research, Center for Weather Prediction and Climate Studies, Cachoeira Paulista, Brazil. &#39; Federal University of Rio Grande do Sul, Informatics Institute, Porto Alegre, Brazil</dc:contributor>
<dc:subject>parallel scalability</dc:subject>
<dc:subject>regional weather forecasting</dc:subject>
<dc:subject>weather forecasting models</dc:subject>
<dc:subject>modelling</dc:subject>
<dc:subject>load unbalancing</dc:subject>
<dc:subject>meteorological forecasting</dc:subject>
<dc:subject>meteorology</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>87</prism:startingPage>
<prism:endingPage>97</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040463">
<title>Automated refactorings for high performance Fortran programmes</title>
<link>http://www.inderscience.com/link.php?id=40463</link>
<description>Refactoring is a software engineering technique aimed at improving the design of software applications, without changing their external behaviour. Several refactorings have been proposed for object&#45;oriented languages, but there are few related works focusing on procedural programming. Fortran is a procedural language heavily used in high performance computing, which is not fully explored considering refactoring support. In this paper, we describe a set of automated refactorings for Fortran based on the Photran plug&#45;in, which is integrated with the Eclipse integrated development environment &#40;IDE&#41;. We present a set of experiments to evaluate the impact of the proposed refactorings in third&#45;party Fortran applications. The results show that the proposed refactorings improve the design of existing applications without compromising their performance.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40463"><b>Automated refactorings for high performance Fortran programmes</b></A><br />Bruno Batista Boniati, Andrea Schwertner Charao, Benhur De Oliveira Stein, Gustavo Rissetti, Eduardo Kessler Piveta<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 98 - 109</i><br />Refactoring is a software engineering technique aimed at improving the design of software applications, without changing their external behaviour. Several refactorings have been proposed for object&#45;oriented languages, but there are few related works focusing on procedural programming. Fortran is a procedural language heavily used in high performance computing, which is not fully explored considering refactoring support. In this paper, we describe a set of automated refactorings for Fortran based on the Photran plug&#45;in, which is integrated with the Eclipse integrated development environment &#40;IDE&#41;. We present a set of experiments to evaluate the impact of the proposed refactorings in third&#45;party Fortran applications. The results show that the proposed refactorings improve the design of existing applications without compromising their performance.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040463</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 98 - 109</dc:source>
<dc:creator>Bruno Batista Boniati</dc:creator>
<dc:creator>Andrea Schwertner Charao</dc:creator>
<dc:creator>Benhur De Oliveira Stein</dc:creator>
<dc:creator>Gustavo Rissetti</dc:creator>
<dc:creator>Eduardo Kessler Piveta</dc:creator>
<dc:contributor>Colegio Agricola de Frederico Westphalen, Campus UFSM, Linha 7 de Setembro, BR 386, Km 40, 98400&#45;000 &#8211; Frederico Westphalen, RS, Brazil. &#39; PPGI, Universidade Federal de Santa Maria, Avenida Roraima, 1000, Cidade Universitaria, 97105&#45;900 &#8211; Santa Maria, RS, Brazil. &#39; PPGI, Universidade Federal de Santa Maria, Avenida Roraima, 1000, Cidade Universitaria, 97105&#45;900 &#8211; Santa Maria, RS, Brazil. &#39; PPGI, Universidade Federal de Santa Maria, Avenida Roraima, 1000, Cidade Universitaria, 97105&#45;900 &#8211; Santa Maria, RS, Brazil. &#39; PPGI, Universidade Federal de Santa Maria, Avenida Roraima, 1000, Cidade Universitaria, 97105&#45;900 &#8211; Santa Maria, RS, Brazil</dc:contributor>
<dc:subject>software refactoring</dc:subject>
<dc:subject>source code restructuring</dc:subject>
<dc:subject>high performance systems</dc:subject>
<dc:subject>Fortran programming</dc:subject>
<dc:subject>software design tools</dc:subject>
<dc:subject>integrated development environment</dc:subject>
<dc:subject>Eclipse IDE</dc:subject>
<dc:subject>software engineering</dc:subject>
<dc:subject>procedural programming</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>98</prism:startingPage>
<prism:endingPage>109</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040464">
<title>Assessing the influence of data access patterns and contention management policies on the performance of software transactional memory systems</title>
<link>http://www.inderscience.com/link.php?id=40464</link>
<description>Transactional memory was proposed as a means of easing the burden of traditional concurrency control mechanisms. The programmer has only to mark the code sections that are to be executed atomically, and the system takes care of the synchronisation details. As transactions are executed in parallel, some of them are likely to access resources in ways that cannot be reconciled. Conflicts among transactions are mediated by a contention manager. In this work, we present a novel approach to contention management &#40;CM&#41;, which binds different CM strategies to different data in a programme, based on the access patterns to these data. We show how it can be done in a way that introduces minimum overhead and present benchmark results to evaluate our implementation, also demonstrating how the best CM strategy may vary under different levels of contention, under a varying number of threads per processing core, and under different system architectures.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40464"><b>Assessing the influence of data access patterns and contention management policies on the performance of software transactional memory systems</b></a><br />Fernando Kronbauer, Sandro Rigo<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 110 - 121</i><br />Transactional memory was proposed as a means of easing the burden of traditional concurrency control mechanisms. The programmer has only to mark the code sections that are to be executed atomically, and the system takes care of the synchronisation details. As transactions are executed in parallel, some of them are likely to access resources in ways that cannot be reconciled. Conflicts among transactions are mediated by a contention manager. In this work, we present a novel approach to contention management &#40;CM&#41;, which binds different CM strategies to different data in a programme, based on the access patterns to these data. We show how it can be done in a way that introduces minimum overhead and present benchmark results to evaluate our implementation, also demonstrating how the best CM strategy may vary under different levels of contention, under a varying number of threads per processing core, and under different system architectures.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040464</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 110 - 121</dc:source>
<dc:creator>Fernando Kronbauer</dc:creator>
<dc:creator>Sandro Rigo</dc:creator>
<dc:contributor>Motorola Mobility, Jaguariuna, SP, Rod. SP 340 &#8211; Km 128,7, Tanquinho Velho, Jaguariuna, SP, 13820&#45;000, Brazil. &#39; Institute of Computing, University of Campinas, Av. Albert Einstein 1251, CEP 13083&#45;852, Campinas&#45;SP, Brazil</dc:contributor>
<dc:subject>contention management</dc:subject>
<dc:subject>parallel programming</dc:subject>
<dc:subject>concurrent programming</dc:subject>
<dc:subject>parallel architectures</dc:subject>
<dc:subject>software transactional memory</dc:subject>
<dc:subject>high performance systems</dc:subject>
<dc:subject>data access patterns</dc:subject>
<dc:subject>concurrency control</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>110</prism:startingPage>
<prism:endingPage>121</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040465">
<title>The impact of applications&#39; I/O strategies on the performance of the Lustre parallel file system</title>
<link>http://www.inderscience.com/link.php?id=40465</link>
<description>Parallel applications present multiple approaches regarding the management of data. Because of specific characteristics of parallel file systems, some approaches provide better performance than others by better matching the system&#39;s internals. One common situation is when each instance of an application accesses exclusive data stored in the file system. This paper studies some I/O techniques for this situation and evaluates them on the Lustre file system. We provide a guide to help developers tune their application to extract the best performance out of Lustre. Our results show significant performance gains related to the application&#39;s choice of access pattern. We present considerations on operation granularity, intra&#45;node concurrency and temporal behaviour of the application.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40465"><b>The impact of applications&#39; I/O strategies on the performance of the Lustre parallel file system</b></a><br />Francieli Zanon Boito, Rodrigo Virote Kassick, Philippe O.A. Navaux<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 122 - 136</i><br />Parallel applications present multiple approaches regarding the management of data. Because of specific characteristics of parallel file systems, some approaches provide better performance than others by better matching the system&#39;s internals. One common situation is when each instance of an application accesses exclusive data stored in the file system. This paper studies some I/O techniques for this situation and evaluates them on the Lustre file system. We provide a guide to help developers tune their application to extract the best performance out of Lustre. Our results show significant performance gains related to the application&#39;s choice of access pattern. We present considerations on operation granularity, intra&#45;node concurrency and temporal behaviour of the application.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040465</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 122 - 136</dc:source>
<dc:creator>Francieli Zanon Boito</dc:creator>
<dc:creator>Rodrigo Virote Kassick</dc:creator>
<dc:creator>Philippe O.A. Navaux</dc:creator>
<dc:contributor>Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil. &#39; Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil. &#39; Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil</dc:contributor>
<dc:subject>parallel file systems</dc:subject>
<dc:subject>PFS</dc:subject>
<dc:subject>Lustre</dc:subject>
<dc:subject>parallel I/O</dc:subject>
<dc:subject>access patterns</dc:subject>
<dc:subject>HPC</dc:subject>
<dc:subject>clustering</dc:subject>
<dc:subject>I/O granularity</dc:subject>
<dc:subject>scalability</dc:subject>
<dc:subject>I/O strategies</dc:subject>
<dc:subject>data organisation</dc:subject>
<dc:subject>input output</dc:subject>
<dc:subject>intra&#45;node concurrency</dc:subject>
<dc:subject>temporal behaviour</dc:subject>
<dc:subject>high performance systems</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>122</prism:startingPage>
<prism:endingPage>136</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040466">
<title>Trebuchet&#58; exploring TLP with dataflow virtualisation</title>
<link>http://www.inderscience.com/link.php?id=40466</link>
<description>Parallel programming has become mandatory to fully exploit the potential of multi&#45;core CPUs. The dataflow model provides a natural way to exploit parallelism. However, specifying dependences and control using fine&#45;grained instructions in dataflow programs can be complex and present unwanted overheads. To address this issue, we have designed TALM&#58; a coarse&#45;grained dataflow execution model to be used on top of widespread architectures. We implemented TALM as the Trebuchet virtual machine for multi&#45;cores. The programmer identifies code blocks that can run in parallel and connects them to form a dataflow graph, which allows one to have the benefits of parallel dataflow execution in a Von Neumann machine, with small programming effort. We parallelised a set of seven applications using our approach and compared with OpenMP implementations. Results show that Trebuchet can be competitive with state&#45;of&#45;the&#45;art technology, while providing the benefits of dataflow execution.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40466"><b>Trebuchet&#58; exploring TLP with dataflow virtualisation</b></A><br />Tiago A.O. Alves, Leandro A.J. Marzulo, Felipe M.G. Franca, Vitor Santos Costa<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 137 - 148</i><br />Parallel programming has become mandatory to fully exploit the potential of multi&#45;core CPUs. The dataflow model provides a natural way to exploit parallelism. However, specifying dependences and control using fine&#45;grained instructions in dataflow programs can be complex and present unwanted overheads. To address this issue, we have designed TALM&#58; a coarse&#45;grained dataflow execution model to be used on top of widespread architectures. We implemented TALM as the Trebuchet virtual machine for multi&#45;cores. The programmer identifies code blocks that can run in parallel and connects them to form a dataflow graph, which allows one to have the benefits of parallel dataflow execution in a Von Neumann machine, with small programming effort. We parallelised a set of seven applications using our approach and compared with OpenMP implementations. Results show that Trebuchet can be competitive with state&#45;of&#45;the&#45;art technology, while providing the benefits of dataflow execution.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040466</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 137 - 148</dc:source>
<dc:creator>Tiago A.O. Alves</dc:creator>
<dc:creator>Leandro A.J. Marzulo</dc:creator>
<dc:creator>Felipe M.G. Franca</dc:creator>
<dc:creator>Vitor Santos Costa</dc:creator>
<dc:contributor>Programa de Engenharia de Sistemas e Computacao, COPPE, Universidade Federal do Rio de Janeiro, Cidade Universitaria, Centro de Tecnologia, Bloco H, Sala 319, Rio de Janeiro, RJ, 21941&#45;972, Brazil. &#39; Programa de Engenharia de Sistemas e Computacao, COPPE, Universidade Federal do Rio de Janeiro, Cidade Universitaria, Centro de Tecnologia, Bloco H, Sala 319, Rio de Janeiro, RJ, 21941&#45;972, Brazil. &#39; Programa de Engenharia de Sistemas e Computacao, COPPE, Universidade Federal do Rio de Janeiro, Cidade Universitaria, Centro de Tecnologia, Bloco H, Sala 319, Rio de Janeiro, RJ, 21941&#45;972, Brazil. &#39; CRACS and INESC&#45;Porto LA, Faculdade de Ciencias, Universidade do Porto, Rua do Campo Alegre, 1021, 4169&#45;007 Porto, Portugal</dc:contributor>
<dc:subject>dataflow architectures</dc:subject>
<dc:subject>parallel programming</dc:subject>
<dc:subject>computer architecture</dc:subject>
<dc:subject>dataflow virtualisation</dc:subject>
<dc:subject>Trebuchet virtual machine</dc:subject>
<dc:subject>multicore</dc:subject>
<dc:subject>dataflow graphs</dc:subject>
<dc:subject>TLP</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>137</prism:startingPage>
<prism:endingPage>148</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040467">
<title>A RISC architecture for 2DLNS&#45;based signal processing</title>
<link>http://www.inderscience.com/link.php?id=40467</link>
<description>The multi&#45;dimensional logarithmic number system &#40;MDLNS&#41; reduces the size of the number representation and promises a lower&#45;cost realisation of arithmetic operations. The non&#45;linear nature of the representation and the independence of the parallel&#45;based computations, combined with multi&#45;digit extensions of the MDLNS representation and simplified arithmetic operations, make MDLNS suitable for some multiplication&#45;intensive DSP applications. This paper presents the design and implementation of a 2DLNS&#45;based processor architecture. The CPU combines a relatively simple architecture with a well&#45;designed organisation, which greatly improves the implementation of many DSP algorithms. An assembly programme is also written to implement a 2DLNS&#45;based filterbank architecture. This implementation demonstrates the efficiency and ease of use of the 2DLNS CPU in real applications.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40467"><b>A RISC architecture for 2DLNS&#45;based signal processing</b></a><br />M. Azarmehr, R. Muscedere<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 149 - 156</i><br />The multi&#45;dimensional logarithmic number system &#40;MDLNS&#41; reduces the size of the number representation and promises a lower&#45;cost realisation of arithmetic operations. The non&#45;linear nature of the representation and the independence of the parallel&#45;based computations, combined with multi&#45;digit extensions of the MDLNS representation and simplified arithmetic operations, make MDLNS suitable for some multiplication&#45;intensive DSP applications. This paper presents the design and implementation of a 2DLNS&#45;based processor architecture. The CPU combines a relatively simple architecture with a well&#45;designed organisation, which greatly improves the implementation of many DSP algorithms. An assembly programme is also written to implement a 2DLNS&#45;based filterbank architecture. This implementation demonstrates the efficiency and ease of use of the 2DLNS CPU in real applications.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040467</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 149 - 156</dc:source>
<dc:creator>M. Azarmehr</dc:creator>
<dc:creator>R. Muscedere</dc:creator>
<dc:contributor>Department of Electrical and Computer Engineering, University of Windsor, 401 Sunset Avenue, Windsor, Ontario, Canada. &#39; Department of Electrical and Computer Engineering, University of Windsor, 401 Sunset Avenue, Windsor, Ontario, Canada</dc:contributor>
<dc:subject>MDLNS representation</dc:subject>
<dc:subject>processors</dc:subject>
<dc:subject>reduced instruction set computer</dc:subject>
<dc:subject>RISC architecture</dc:subject>
<dc:subject>CPU</dc:subject>
<dc:subject>multiplication</dc:subject>
<dc:subject>FIR filter</dc:subject>
<dc:subject>multiply and accumulation</dc:subject>
<dc:subject>MAC</dc:subject>
<dc:subject>filterbank architecture</dc:subject>
<dc:subject>digital signal processing</dc:subject>
<dc:subject>DSP</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>149</prism:startingPage>
<prism:endingPage>156</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040468">
<title>GNLS&#58; a hybrid on&#45;chip communication architecture for SoC designs</title>
<link>http://www.inderscience.com/link.php?id=40468</link>
<description>In this paper, we propose the global network local bus &#40;GNLS&#41; communication architecture, for which a network interface is designed and DMA communication is provided. We also compare the performance of bus&#45;based and mesh&#45;based infrastructures with that of the GNLS NoC&#45;based infrastructure through theoretical analysis and simulation. The results show that the NoC&#45;based infrastructure performs better than the bus&#45;based one in terms of latency when the number of flits contained in the packets exceeds a certain threshold. In addition, the GNLS&#45;based infrastructure outperforms the mesh&#45;based one under the same condition, which verifies the correctness of the theoretical performance analysis. A design example is given to show that the proposed architecture performs better than bus&#45;based and mesh&#45;based architectures.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40468"><b>GNLS&#58; a hybrid on&#45;chip communication architecture for SoC designs</b></a><br />Ling Wang, Chunda Ding, Shenghai Zhong, Jianwen Zhang<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 157 - 166</i><br />In this paper, we propose the global network local bus &#40;GNLS&#41; communication architecture, for which a network interface is designed and DMA communication is provided. We also compare the performance of bus&#45;based and mesh&#45;based infrastructures with that of the GNLS NoC&#45;based infrastructure through theoretical analysis and simulation. The results show that the NoC&#45;based infrastructure performs better than the bus&#45;based one in terms of latency when the number of flits contained in the packets exceeds a certain threshold. In addition, the GNLS&#45;based infrastructure outperforms the mesh&#45;based one under the same condition, which verifies the correctness of the theoretical performance analysis. A design example is given to show that the proposed architecture performs better than bus&#45;based and mesh&#45;based architectures.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040468</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 157 - 166</dc:source>
<dc:creator>Ling Wang</dc:creator>
<dc:creator>Chunda Ding</dc:creator>
<dc:creator>Shenghai Zhong</dc:creator>
<dc:creator>Jianwen Zhang</dc:creator>
<dc:contributor>School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China. &#39; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China. &#39; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China. &#39; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China</dc:contributor>
<dc:subject>network&#45;on&#45;chip</dc:subject>
<dc:subject>NoC</dc:subject>
<dc:subject>global network local bus</dc:subject>
<dc:subject>GNLS</dc:subject>
<dc:subject>network interface</dc:subject>
<dc:subject>routers</dc:subject>
<dc:subject>SoC design</dc:subject>
<dc:subject>system&#45;on&#45;chip</dc:subject>
<dc:subject>simulation</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>157</prism:startingPage>
<prism:endingPage>166</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
<item rdf:about="http://dx.doi.org/10.1504/IJHPSA.2011.040469">
<title>A hardware architecture for subtractive clustering</title>
<link>http://www.inderscience.com/link.php?id=40469</link>
<description>Clustering algorithms are used extensively to organise and categorise abundant data. This paper describes a hardware implementation of the subtractive clustering algorithm. The solution developed in this paper targets automatic and fast identification of cluster centres. The proposed hardware is generic, so it can be used for any of the data classification problems that are omnipresent in identification systems.</description>
<content:encoded><![CDATA[<p><a href="http://www.inderscience.com/link.php?id=40469"><b>A hardware architecture for subtractive clustering</b></a><br />Marcos Santana Farias, Nadia Nedjah, Luiza De Macedo Mourelle<br /><i>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 167 - 173</i><br />Clustering algorithms are used extensively to organise and categorise abundant data. This paper describes a hardware implementation of the subtractive clustering algorithm. The solution developed in this paper targets automatic and fast identification of cluster centres. The proposed hardware is generic, so it can be used for any of the data classification problems that are omnipresent in identification systems.</p>]]></content:encoded>
<dc:identifier>10.1504/IJHPSA.2011.040469</dc:identifier>
<dc:source>International Journal of High Performance Systems Architecture, Vol. 3, No. 2/3 (2011) pp. 167 - 173</dc:source>
<dc:creator>Marcos Santana Farias</dc:creator>
<dc:creator>Nadia Nedjah</dc:creator>
<dc:creator>Luiza De Macedo Mourelle</dc:creator>
<dc:contributor>Rua Helio de Almeida, 75, Cidade Universitaria &#8211; Ilha do Fundao, Rio de Janeiro, RJ, CEP&#58; 21941&#45;906, Brazil. &#39; Rua Sao Francisco Xavier, 524, Sala 5145&#45;F, Maracana, Rio de Janeiro, RJ, CEP&#58; 20550&#45;900, Brazil. &#39; Rua Sao Francisco Xavier, 524, Sala 5145&#45;F, Maracana, Rio de Janeiro, RJ, CEP&#58; 20550&#45;900, Brazil</dc:contributor>
<dc:subject>subtractive clustering</dc:subject>
<dc:subject>reconfigurable hardware</dc:subject>
<dc:subject>data classification</dc:subject>
<dc:subject>cluster centres</dc:subject>
<dc:date>2011-05-30T23:20:50-05:00</dc:date>
<prism:volume>3</prism:volume>
<prism:number>2/3</prism:number>
<prism:startingPage>167</prism:startingPage>
<prism:endingPage>173</prism:endingPage>
<prism:publicationDate>2011-05-30T23:20:50-05:00</prism:publicationDate>
</item>
</rdf:RDF>

